Digest #280 2024-09-04
Contributions
Section 3.3.3.6 "Other transient regions" specifies that the region for WORD shall be at least 33 characters, but an ambiguous condition occurs only if the word exceeds the maximum length of a counted string (usually much longer). What is the behavior for words of intermediate length? An ambiguous condition (my understanding is that this is traditionally the case)? Truncation?
If I understand correctly, does this also mean that an implementation is not compliant if it returns counted strings where only the region they actually occupy is available to the program, even when that region is shorter than 33 characters?
Replies
requestClarification - State of other stacks after ABORT
Language similar to "empty the floating-point stack if it exists" has been used elsewhere and seems clear.
requestClarification - Behavior of EMPTY-BUFFERS when BLK is nonzero
Section 3.3.3.5 "Input buffers" says:
The address and length returned by SOURCE, the string returned by PARSE, and directly computed input-buffer addresses are valid only until the text interpreter does I/O to refill the input buffer or the input source is changed.
(emphasis mine)
On the other hand, LOAD restores the prior input source specification. Consequently, while an input buffer is being interpreted by LOAD, the input source remains the same for every word the Forth text interpreter encounters in that input buffer, no matter how many blocks are loaded by nested LOAD calls. And as long as REFILL (or RESTORE-INPUT) has not been executed, the directly computed input-buffer addresses shall be valid and contiguous, and their contents shall be the same, for every word encountered in that input buffer.
This requirement can be violated under the following conditions (depending on the particular implementation):
A block buffer was assigned to a block using BUFFER, then the block buffer was filled with some contents (intentionally without marking it as UPDATEd), and then LOAD was called for that block. Since a block buffer was already assigned to that block, LOAD does not read the block from the mass storage device and simply interprets this block buffer (see the request for clarification #180). Then, if this block buffer is assigned to another block in a nested LOAD, the contents of this input buffer might not be restored.
If the block buffer that is the input buffer is unassigned using EMPTY-BUFFERS (possibly even in a nested LOAD), the block being interpreted might be assigned a different block buffer. Input-buffer addresses computed before that can then become invalid, their contents can be modified, or input-buffer addresses computed afterwards will not be contiguous with the previous ones (even if the previous ones are still valid and their contents are unchanged).
All of these edge cases are handled correctly if a block buffer being interpreted is locked against unassignment until its interpretation is complete.
Perhaps this approach could be promoted to a new requirement for systems (see my post on ForthHub). Nowadays, memory limitations are not so restrictive.
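The locking approach suggested above can be modelled in a few lines. The following is a minimal C sketch under stated assumptions: the pool, the names `assign_buffer`, `empty_buffers`, `lock_for_load` and `unlock_after_load`, and the replacement policy are all hypothetical and not taken from any Forth system. LOAD is assumed to call `lock_for_load` on its input buffer before interpreting it and `unlock_after_load` when it finishes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUFS 2            /* deliberately tiny pool to force buffer reuse */
#define BLOCK_SIZE 1024
#define UNASSIGNED (-1)

/* Hypothetical model of a block-buffer pool, not any real system's layout. */
struct block_buffer {
    int blk;               /* assigned block number, or UNASSIGNED */
    int lock_count;        /* > 0 while a LOAD is interpreting this buffer */
    char data[BLOCK_SIZE];
};

static struct block_buffer bufs[NBUFS];

void init_buffers(void) {
    for (int i = 0; i < NBUFS; i++) {
        bufs[i].blk = UNASSIGNED;
        bufs[i].lock_count = 0;
        memset(bufs[i].data, ' ', BLOCK_SIZE);  /* blocks are space-filled */
    }
}

/* Models EMPTY-BUFFERS: unassign every buffer except locked ones. */
void empty_buffers(void) {
    for (int i = 0; i < NBUFS; i++)
        if (bufs[i].lock_count == 0)
            bufs[i].blk = UNASSIGNED;
}

/* Models buffer selection in BUFFER: reuse the buffer already assigned to
   blk, else an unassigned unlocked buffer, else any unlocked buffer.
   A locked buffer is never reassigned; returns NULL if all are locked. */
struct block_buffer *assign_buffer(int blk) {
    for (int i = 0; i < NBUFS; i++)
        if (bufs[i].blk == blk)
            return &bufs[i];
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < NBUFS; i++)
            if (bufs[i].lock_count == 0 &&
                (pass == 1 || bufs[i].blk == UNASSIGNED)) {
                /* A real system would first write back an UPDATEd buffer. */
                bufs[i].blk = blk;
                return &bufs[i];
            }
    return NULL;
}

/* Called by the (modelled) LOAD on entry and exit. */
void lock_for_load(struct block_buffer *b) { b->lock_count++; }

void unlock_after_load(struct block_buffer *b) {
    assert(b->lock_count > 0);  /* unbalanced unlock is a bug */
    b->lock_count--;
}
```

With this policy, neither a nested EMPTY-BUFFERS nor a nested BUFFER can unassign or reuse the buffer an outer LOAD is interpreting, so addresses computed into it stay valid and contiguous. The cost is that a deeply nested LOAD chain pins one buffer per level, which is the memory trade-off mentioned above.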
proposal - Exclude zero from the data types that are identifiers
Concerning zero as a file identifier
In POSIX, 0 is a possible value for a file descriptor, and this file descriptor is, by definition, the standard input (stdin). A program can close this file descriptor; it will then be reused and returned by the next file open operation, and the newly opened file becomes the standard input of the process. See also the StackOverflow question: Is it possible that linux file descriptor 0 1 2 not for stdin, stdout and stderr?
If a Forth system provides raw file descriptors to a program, and the program closes the fileid 0, the next file opened with open-file, create-file, or included will become the standard input of the process (since its file descriptor is 0) and probably the user input device of the Forth system. So the interpretation of source-id remains consistent. But such a program is not a standard program, and the consequences are outside the scope of the standard anyway. Therefore, a standard program cannot obtain the fileid 0 from the open-file or create-file words.
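The POSIX behaviour described above can be shown with a minimal C sketch. It is plain POSIX, not any Forth system's implementation, and the function name `reopen_stdin_from` is hypothetical. It relies on the POSIX rule that open() returns the lowest-numbered file descriptor not currently open for the process.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Close descriptor 0 (stdin), then open `path` read-only.
   Because open() returns the lowest free descriptor, the new file
   receives descriptor 0 and becomes the process's standard input. */
int reopen_stdin_from(const char *path) {
    assert(path != NULL);
    close(STDIN_FILENO);          /* fd 0 is now free */
    return open(path, O_RDONLY);  /* lowest free fd, i.e. 0 */
}
```

Calling `reopen_stdin_from("/dev/null")` returns 0, which is exactly the situation in which a Forth system handing out raw descriptors would return fileid 0 from open-file.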
Concerning zero as an address
There is one case where zero is possible instead of an address: a character string of zero length.
A character string is defined as a pair ( c-addr u ). If we exclude zero from the address data type, then ( 0 0 ) is not a valid character string. But nothing goes wrong when the value ( 0 0 ) is passed to a program that expects a character string as a stack parameter, and this fact is often relied on in practice.
Therefore, the value ( 0 0 ) should belong to the character string data type.
So I would also suggest updating the character string data type to ( c-addr u | 0 0 ), and adding a note that, for brevity in stack diagrams, when ( c-addr u ) denotes a character string, the value pair ( 0 0 ) is also allowed.
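Why ( 0 0 ) is harmless can be illustrated with a minimal C sketch that models a word consuming ( c-addr u ) as a function taking a pointer and a length. The function `count_char` is hypothetical and only stands in for any string-consuming word: when u is zero the loop body never executes, so the address is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a word taking ( c-addr u ): count occurrences
   of ch. With u == 0 the loop never runs, so addr is never read and
   (NULL, 0) behaves like any other empty string. */
size_t count_char(const char *addr, size_t u, char ch) {
    assert(addr != NULL || u == 0);  /* null address only with length 0 */
    size_t n = 0;
    for (size_t i = 0; i < u; i++)
        if (addr[i] == ch)
            n++;
    return n;
}
```

The explicit loop is a deliberate choice: in C (at least through C17), passing a null pointer to library functions such as memcpy is undefined even with a zero length, so the sketch avoids them. `count_char(NULL, 0, 'a')` returns 0, just as for a non-null empty string.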
Is there an example of a useful construct where cs-drop is used to drop an orig?