Digest #329 2026-05-19
Contributions
requestClarification - Resizing to/from Zero Address Units
No mention is made, here or for ALLOCATE, of what happens when the specified size is zero. I can see cases where data structures may request zero units (perhaps inadvertently). Can we assume that such requests will work properly? Should this be stated explicitly, or is it assumed to work, since nothing in the text says otherwise?
(If the standard declared that resizing to 0 units must return an a-addr2 equal to 0, and that resizing from an a-addr1 of 0 allocates a new region, then RESIZE would be the only necessary word: ALLOCATE could be 0 SWAP RESIZE, and FREE could be 0 RESIZE NIP.)
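Until the zero-size case is specified, a program can avoid depending on it. The sketch below is only one possible workaround, under the assumption that wasting one address unit is acceptable: it rounds a zero-size request up to one unit before calling the standard words. The names nonzero-size, allocate-nz and resize-nz are hypothetical and belong to no standard or system.

```forth
\ Hypothetical helpers (not standard, not from any system): round a
\ zero-size request up to one address unit so the result never
\ depends on how ALLOCATE/RESIZE treat u = 0.
: nonzero-size ( u -- u' )  dup 0= - ;   \ 0 -> 1; any other u unchanged (unsigned-safe)
: allocate-nz ( u -- a-addr ior )  nonzero-size allocate ;
: resize-nz ( a-addr1 u -- a-addr2 ior )  nonzero-size resize ;

0 allocate-nz throw      \ ( a-addr ) a real one-unit region
0 resize-nz throw        \ ( a-addr' ) "resized to zero", still a valid region
free throw
```

Using dup 0= - instead of 1 max keeps the adjustment correct for unsigned sizes with the high bit set, which MAX would treat as negative.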
Replies
From the description in the proposal section, the usage should be ' NAME_OF_WORD PERFORM rather than PERFORM ' NAME_OF_WORD.
Anyway, I don't see the reason for this new word: you can already execute a native word with EXECUTE. The alleged reason, being an alternative to CODE / END-CODE, concerns definition rather than execution, so the responsibility lies with : or some other word similar to CODE / END-CODE. So why reinvent the wheel?
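To illustrate the point that EXECUTE already covers this, here is a minimal portable sketch. A portable example can only use a colon definition; the assumption, consistent with the reply, is that a word defined with CODE ... END-CODE has an execution token that EXECUTE runs in exactly the same way. The names square and run-square are illustrative only.

```forth
\ Illustrative names only. Any named definition has an xt,
\ and EXECUTE runs it; no separate word is needed for that.
: square ( n -- n*n )  dup * ;
3 ' square execute .                         \ prints 9, same as: 3 square .
: run-square ( n -- n*n )  ['] square execute ;   \ same idea inside a definition
```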
It is not clear what the problem is that you want to solve.
A word defined with code ... end-code can be executed. It has to be defined in a system-specific way, including a system-specific way to end the word and continue with the code that follows: in a threaded-code system, code words end in the NEXT sequence; in a native-code system, they usually end with some kind of return-from-subroutine instruction.
You seem to want to define words that are invoked with jalr and that return to the instruction behind the jalr instruction, i.e., that end in a return-from-subroutine instruction, and want to call them in a threaded code system. It's not clear why you want to do that.
You might be interested in "ABI-CODE: Increasing the portability of assembly language words", but despite the fact that abi-code is in Gforth, I don't think there is enough common practice (or usage) to merit standardization at this time. If you implement abi-code, it becomes more common. But abi-code words can be executed and do not need a separate word for that purpose.
The current proposal uses the W prefix for an unsigned word fetch. Past and current systems have provided W@ with either unsigned-fetch or signed-fetch semantics. Current systems which use W@ for unsigned word fetch include Gforth and lxf64. Past and current systems which use W@ for signed word fetch include LMI 80386 UR/Forth and kForth-32/64. In keeping with the conventional standardization principle that a standard should avoid breaking existing code in use among the different implementations, I recommend against standardizing W@.
A simple remedy, using a "U" prefix for all of the unsigned fetches, would have the following benefits:
- It avoids breaking older code, and possibly existing code, for past and present Forth implementations.
- It does not increase the number of memory access words in the proposal, and has no other side effects.
- It gives programmers a hint that these are unsigned fetch words, helping to prevent programming mistakes.
Thus, the currently proposed words for standardization, W@ L@ and X@, would be renamed consistently to UW@ UL@ and UX@.
"Past and current systems which use W@ for signed word fetch"
So, in those Forth systems, C@ reads a character without sign extension, but W@ reads a word with sign extension. This seems inconsistent. To remain consistent, they should have used the name SW@.
"Thus, the currently proposed words for standardization, W@ L@ and X@, would be renamed consistently to UW@ UL@ and UX@."
But this introduces a mismatch between these words and C@ (that does not have a "U" prefix). That is, we are carrying over a naming inconsistency (a naming mistake) from some past Forth systems to all standard Forth systems. I would like to avoid this.
This was the reason I made W@ and L@ zero-extended in lxf64 and introduced <C@ <W@ and <L@ for the sign-extended versions!
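Whichever naming wins, a sign-extending fetch can be layered over a zero-extending one. The sketch below assumes 8-bit characters and uses the common (x XOR 2^(k-1)) - 2^(k-1) trick; sext8 and sext16 are hypothetical helper names, while C@S and W@S follow one of the naming ideas raised later in this thread. W@S is left commented out because W@ is not yet standard; the assumption is a system whose W@ zero-extends, as described for Gforth and lxf64.

```forth
\ Sketch, assuming 8-bit characters. Hypothetical helpers that turn a
\ zero-extended k-bit value into a sign-extended one.
: sext8  ( u -- n )  $80 xor $80 - ;       \ 8-bit: $FF -> -1, $7F -> 127
: sext16 ( u -- n )  $8000 xor $8000 - ;   \ 16-bit: $8000 -> -32768
: c@s ( c-addr -- n )  c@ sext8 ;          \ signed character fetch
\ On a system whose W@ zero-extends (assumption), e.g.:
\ : w@s ( addr -- n )  w@ sext16 ;
```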
This code is meant as a reference implementation for bracket_IF and friends
(NB: the right place for that suggestion is the [if] glossary entry, or the entry of another bracket-flow word.)
I think that implementation of the [if]-family words conflicts with the specification. In particular, it depends on the search order and relies on synonyms having the same xt, but the standard has no such requirements.
The following test case should pass on a standard Forth system (in the default environment), but will fail on that implementation:
vocabulary foo : only-foo ( -- ) only foo ;
also foo definitions
synonym [if] [if]
synonym [then] [then]
synonym only only
previous definitions
t{ only-foo 0 [if] 0 [else] 1 [then] only forth -> 1 }t
Have a look at another implementation that is correct and does not use string comparison.
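The xt point can be seen directly. In the sketch below (my-dup is an illustrative name), the synonym must behave like DUP, but per the reply above nothing in the standard requires ' to return the same xt for both names, so a portable program should not assume the comparison yields true.

```forth
\ Illustrative name only. SYNONYM gives my-dup the semantics of DUP;
\ whether both names share one xt is left to the system.
synonym my-dup dup
5 my-dup + .              \ prints 10 on any standard system
' my-dup ' dup = .        \ -1 or 0: do not rely on either
```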
requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?
@EricBlake wrote on 2025-07-30:
whether it might be possible, on a system where a-addr is always a positive value, to encode the difference between interpretation and compilation semantics, and/or identify IMMEDIATE words, based on whether an xt is positive or negative.
Technically, this is certainly possible. From the standard's point of view, the system itself determines what pieces of additional information are associated with execution tokens (or other identifiers) and what method is used to associate them. On the other hand, a standard program cannot access this information from an xt.
The question is: in what cases does that make sense?
Formally, immediacy, interpretation semantics and compilation semantics do not apply to execution tokens. They apply only to named definitions and name tokens (nt). So this method of associating the mentioned pieces of information could make sense if xt is implemented as a subtype of nt.
For example, a negative xt might mean that the system will throw an exception if this xt is executed in interpretation state. I.e., this behavior is the implemented interpretation semantics for the corresponding word (when its interpretation semantics are undefined by the standard). Note that «compile,» must be equivalent to «lit, postpone execute».
A negative xt may mean that the corresponding word is immediate, but this does not affect the implementation of either «compile,» or «execute».
Other possible meanings of a negative xt value:
- this may indicate that the system has an auxiliary function to efficiently compile this definition (information for «compile,»);
- this may indicate that the definition is implemented in machine code (information for «execute»);
- this may indicate that the definition is implemented in threaded code and the address interpreter must be used to execute this definition (information for «execute»).
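Mechanically, such a one-bit tag is cheap to set, test and strip. The sketch assumes, as in the question, that every untagged xt is a positive cell (sign bit clear); tag-bit, tag-xt, tagged? and untag-xt are hypothetical names, and which of the meanings listed above the tag carries is left to the system.

```forth
\ Hypothetical encoding, assuming every untagged xt has its sign bit clear.
0 invert 1 rshift invert constant tag-bit   \ a cell with only the sign bit set
: tag-xt   ( xt -- xt' )    tag-bit or ;          \ mark: xt' becomes negative
: tagged?  ( xt' -- flag )  0< ;                  \ a negative value means tagged
: untag-xt ( xt' -- xt )    tag-bit invert and ;  \ recover the plain xt
```

A system would call untag-xt inside its own EXECUTE and COMPILE, before dispatching, so the tag never reaches a standard program as visible behavior.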
In general, I like that idea: introduce a number of categories (labels) and mark each word with the categories to which it belongs.
Some additions: compare should also be in "Comparison"; words like < and > (which apply to integers) should also be in "Arithmetic"; words like cmove, move, fill and erase should also share some category with @ and !.
So, in essence, this proposal should suggest the list of categories, and the list of pairs ({word}, {category}).
But I'm not sure that this should be part of the standard document. Perhaps it is enough for this to be implemented and published independently.
Regarding the names <C@ <W@ and <L@, if I have to use these words, I would prefer names without a special character.
Some ideas:
- C@S W@S L@S, where the trailing S denotes the adjective "with sign extension" (i.e., "fetch with sign extension");
- SC@ SW@ SL@, where SC, SW, and SL denote the data types "signed character", "signed 16-bit value", and "signed 32-bit value", respectively.
@ruv One might consider C@ either an inconsistency or just a special word for fetching the primitive char type. For consistent naming, one could add UB@ for an unsigned byte fetch.
translation: The result of a recognizer; [...] it's a type that consists of a translation token at the top of the data stack and additional data on various stacks;
The standard uses the primitive notion "data object", and the term "data type" (an identifier for the set of values that a data object may have). What is the rationale for introducing the new term "translation token", if in fact it is a data object on the data stack and floating-point stack, plus a data type identifier (a single-cell value) at the top of the data stack?
On 2025-10-01 I suggested to introduce the term "qualified data object" using the existing terms "data object" and "data type". That seems much better.
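To make the described stack shape concrete: a data object, with a single-cell data type identifier on top. This is only an illustration under stated assumptions; rec-unsigned, dt-cell and dt-none are hypothetical names and not the recognizer proposal's API, and the sketch ignores overflow beyond one cell.

```forth
\ Hypothetical names; not the recognizer proposal's API. Shows only the
\ result shape: a data object with a data-type id on top of the stack.
1 constant dt-cell   \ hypothetical type id: "one cell on the data stack"
0 constant dt-none   \ hypothetical type id: "not recognized"

: rec-unsigned ( c-addr u -- u2 dt-cell | dt-none )
  dup 0= if 2drop dt-none exit then      \ empty string: reject
  0 0 2swap >number                      ( ud c-addr' u' )
  nip if 2drop dt-none exit then         \ leftover characters: reject
  drop dt-cell ;                         \ keep the low cell (no overflow check), id on top
```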