Digest #312 2025-08-21
Contributions
Is the behavior of this word:
: a ." multiline?
s" [char] " ;
ambiguous, depending on whether the compilation occurred from user input vs. evaluate, block, or a file? Or asked differently, is there any difference in compliance between an implementation that behaves as if it has automatic refill even from user input (which would output the string represented as s\" multiline?\ns" plus push the integer '"' to the stack), vs. an implementation that terminates the search for the closing " at the newline (which would output the string s" multiline?" plus push a c-addr and length 7 to the stack representing "[char] ")?
https://github.com/ForthHub/discussion/discussions/126 mentions how to portably deal with \ in multiline input contexts, but doesn't mention other scenarios where multiline vs. single line interpretations can alter how a word would be defined.
requestClarification - Is behavior ambiguous if a name cannot be parsed?
In my single-xt Forth implementation, my initial code for [UNDEFINED] reused code for ', and as a result,
: a S" [UNDEFINED]" EVALUATE ; a
resulted in -16 throw (attempt to use zero-length name) rather than pushing TRUE to the stack. But when I compared with gforth, I noticed that gforth throws -16 for ' but happily returns TRUE for [UNDEFINED] on a zero-length name when input is exhausted. Similarly, SwiftForth returns TRUE for [UNDEFINED] while leaving the stack empty after a bare '. Further inspection shows that ' ' catch leaves -2 on the stack (but the only indication that ' on a line by itself tried to abort" is that the prompt is ? rather than ok; no useful message appears, and not the -16 that gforth produces), while ' [undefined] catch leaves -1 0 on the stack.
Does the standard already include this in the realms of ambiguous behavior, or is supporting the use of [UNDEFINED] at the end of the input buffer on a zero-length name common enough practice to be a bug in my implementation, even though such a name cannot be ticked? It's already clear that the future FIND-NAME should allow a search for a zero-length name rather than throw, https://forth-standard.org/proposals/find-name#reply-1481, but that's because FIND-NAME is not a parsing word, and it is a different matter to supply an explicit zero-length string as an argument than it is to encounter end-of-input-buffer in a parsing word. This becomes more relevant when considering that a future standard is likely to require more consistency on when particular throw conditions occur, rather than leaving behavior ambiguous: https://github.com/ForthHub/standard-evolution/issues/2
The webpage for 11.6.1.2165 inlines this Rationale quote from A.11.6.1.2165 S":
Since an implementation may choose to provide only one buffer for interpreted strings, an interpreted string is subject to being overwritten by the next execution of S" in interpretation state.
But not visible on the page other than through a link to 11.3.4 is this:
The system provides transient buffers for S" and S\" strings. These buffers shall be no less than 80 characters in length, and there shall be at least two buffers. The system should be able to store two strings defined by sequential use of S" or S\".
As the Rationale is non-normative, but the chapter 11 header is a requirement, it is obvious that it is the Rationale that needs updating for consistency. The existing test listed here pre-dates version 0.11 (dated April 2015) of Gerry Jackson's testsuite which adds several tests of having at least two distinct buffers. https://github.com/gerryjackson/forth2012-test-suite/blob/9773f84dd123/src/filetest.fth#L235
Replies
referenceImplementation - Suggested reference implementation
Of course, doing this with a manual loop one char at a time can be slow. If you have an implementation where CHARS does not modify its input (that is, a char occupies one addressable unit), this is likely to be faster:
: FILL ( c-addr u char -- )
  OVER 0= IF DROP 2DROP EXIT THEN  \ special case for 0
  #2 PICK C!                       \ populate first char
  OVER SWAP 1 /STRING              ( c-addr c-addr+1 u-1 )
  CMOVE
;
referenceImplementation - Suggested reference implementation
where CHARS does not modify its input (that is, a char occupies one addressable unit)
This solution using CMOVE should work regardless of the char size, because CMOVE copies characters (not address units like MOVE), and 1 /STRING ( c-addr1 u1 -- c-addr2 u2 ) produces c-addr2 that is c-addr1 + char-size (if correctly implemented).
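The propagation that this relies on can be seen in isolation. A minimal sketch, assuming only the Core and String word sets; the names BUF and SMEAR are hypothetical and exist only for this illustration:

```forth
\ BUF and SMEAR are illustrative names, not part of any proposal.
CREATE BUF 8 CHARS ALLOT

: SMEAR ( char -- )
  BUF C!                   \ seed the first character
  BUF DUP CHAR+ 7 CMOVE    \ low-to-high copy: each char copies the one before it
;

CHAR x SMEAR
BUF 8 TYPE   \ all eight characters are now x
```

Because CMOVE is specified to proceed character by character from lower to higher addresses, the overlapping copy repeats the seed character through the whole buffer; CHAR+ keeps the step in character units rather than address units.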
Yes, this definition for a is ambiguous.
that behaves as if it has automatic refill
In general, the observable effects of standard words are always specified, and they cannot do anything that is observable (by a standard program) but was not specified.
Therefore, the words «s"», «."», «parse-name», «[']», «to», etc, cannot refill the input buffer, because this is not specified (and can be detected by a standard program). Thus, according to their specified behavior, they can extract a multi-line text (or a lexeme on another line) from the input buffer when the input source is a string or a block, but cannot do so when the input source is a file or the user input device (since in these cases the input buffer always contains a single line).
The rationale A.6.2.2295 to the word «to» says "Therefore TO and name must be contiguous and on the same line in the source text". Perhaps we should formally introduce such a rule as the corresponding ambiguous condition for all parsing words that never refill the input buffer.
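The string-source case can be illustrated with a short sketch, assuming a system that provides S\" (Forth 2012). QUOTED-LEN and STRING-SOURCE-DEMO are hypothetical names introduced only for this example:

```forth
\ Hypothetical helper: length of the text up to the next double quote.
: QUOTED-LEN ( "ccc<quote>" -- u )  [CHAR] " PARSE NIP ;

\ The evaluated string holds: QUOTED-LEN ab<newline>cd<quote>
: STRING-SOURCE-DEMO ( -- u )
  S\" QUOTED-LEN ab\ncd\"" EVALUATE ;

STRING-SOURCE-DEMO .   \ 5 when \n is a single character, 6 when it is two
```

Here the whole string is the input buffer, so PARSE runs past the embedded newline without any refill taking place. The same text typed at the terminal or read from a file would present only the first line to PARSE.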
requestClarification - Is behavior ambiguous if a name cannot be parsed?
I would treat the example as ambiguous.
Section 4.1.2 Ambiguous conditions says that an ambiguous condition can occur due to "unexpected end of input buffer, resulting in an attempt to use a zero-length string as a name". And in turn, on an ambiguous condition, any behavior is allowed (3.4.4).
Thus, different Forth systems may exhibit different behavior. And both of the actions mentioned — returning a flag and throwing an exception — are standard compliant.
I think in the future we should stick to throwing -16 in this case, or, better, introduce a new throw code "unexpected end of input buffer" that can be used for s", [char], and other words as well.
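To see which of the permitted behaviors a given system chooses, something like the following probe could be used. It is only a sketch, assuming the Exception word set and [UNDEFINED] are available; TRY-UNDEFINED and PROBE are hypothetical names:

```forth
\ TRY-UNDEFINED and PROBE are illustrative names only.
: TRY-UNDEFINED ( -- flag )
  S" [UNDEFINED]" EVALUATE ;   \ the parse area ends right after the word

: PROBE ( -- ior )
  ['] TRY-UNDEFINED CATCH      \ leaves  flag 0  or just  ior
  DUP 0= IF NIP THEN ;         \ no throw: discard the flag, keep the 0

PROBE .   \ 0 if a flag was returned, otherwise the throw code
```

Either result is consistent with the reading above that the condition is ambiguous; the probe only reports what the system does and says nothing about compliance.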