Digest #330 2026-06-14
Contributions
16.3.3 Finding definition names says:
When searching a word list for a definition name, the system shall search each word list from its last definition to its first.
The passages "a word list" and "each word list" appear to contradict each other. I think one of the following was intended:
- When searching a word list for a definition name, the system shall search the word list from its last definition to its first.
- When searching the search order for a definition name, the system shall search each word list from its last definition to its first.
Which one? (I'm inclined to the first one)
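To make the two readings concrete, here is a minimal C model. It is not taken from any Forth system, and the types `definition` and `wordlist` and the functions `search_wordlist` and `find_in_order` are hypothetical. Each definition links to the previous one, so a single word list is naturally searched from its last definition to its first, while the search order is handled by searching each word list in turn. Case-insensitive matching is ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each definition links to the one defined before it. */
typedef struct definition {
    const char *name;
    int value;                          /* stands in for the xt */
    const struct definition *prev;      /* previously defined word */
} definition;

typedef struct {
    const definition *latest;           /* last (most recent) definition */
} wordlist;

/* Search ONE word list from its last definition to its first,
   so a newer definition shadows an older one with the same name. */
static const definition *search_wordlist(const wordlist *wl, const char *name) {
    for (const definition *d = wl->latest; d != NULL; d = d->prev)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Search the search order: each word list in turn, first to last. */
static const definition *find_in_order(const wordlist *const *order, size_t n,
                                       const char *name) {
    for (size_t i = 0; i < n; i++) {
        const definition *d = search_wordlist(order[i], name);
        if (d != NULL)
            return d;
    }
    return NULL;
}
```

Under this model, the first reading describes `search_wordlist`, and the second describes `find_in_order` applying `search_wordlist` to each list.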
Replies
Concerning consistency, in cases where there are signed and unsigned versions of words, the unsigned version has the prefix u and the signed version usually has no prefix, e.g., u< vs. <, and um* vs. m*. C@ has no prefix only because there is no signed version.
Anyway, consistency is not paramount, avoiding conflicts with existing practice is. So I lean towards naming the new fetching words uw@ ul@ ux@.
"Data object" and "data type" are generic concepts that are not specific to the recognizer proposal. The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.
And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description; it also helps to think of it as one unit rather than as a collection of smaller data. To get these advantages, we introduce the name. Actually, this whole paragraph is an explanation of why we use abstractions.
requestClarification - Resizing to/from Zero Address Units
That's a good point. AFAIK POSIX and C have tightened the requirements (i.e., given more guarantees to users) for malloc() and realloc(). It may be a good idea to look at what standard C guarantees now and maybe tighten allocate and resize, too. As for existing practice, many Forth implementations call malloc() and realloc(), respectively, so they implement these guarantees already.
requestClarification - Resizing to/from Zero Address Units
But who is going to write the proposal?
"Data object" and "data type" are generic concepts that are not specific to the recognizer proposal. The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.
In this context, by qualification I mean assigning/adding a data type tag.
If I understand you correctly, your idea is that we can qualify the same data object in different ways for different purposes. For the purpose of translation we should use the identifiers of translation, but for a different purpose we should use identifiers of another kind to qualify the same data object.
I use data type conversion, for example, the word qany>xt ( qany -- xt ), where qany is a qualified data object. With your approach, I would have to implement a separate conversion method for each new purpose for which the data object is used.
But if we have a name token nt, then it is a name token, regardless of the purpose of its use. We are always talking about a name token. And any conversion to xt (to the single execution token of a word) is the same, regardless of any external purpose.
Assigning different data type tags to the exact same data object (of the same type) makes data conversion, mapping, and integration significantly harder. It forces us to write unnecessary conversion logic for things that are fundamentally identical.
Could you elaborate on your point now?
And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description;
Yes, sure! For this reason, I suggested (comp.lang.forth, 2020-12-09) formally introducing a separate data symbol sd for a character string ( c-addr u | 0 0 ).
But in cases such as
rec-name ( c-addr u -- translation )
rec-float ( c-addr u -- translation )
although the data type translation is correct, it is too broad, and it does not allow the reader to distinguish this specific recognizer from other recognizers.
So, I prefer to specify the stack diagram (an arrow type) for this recognizer as:
rec-name ( sd -- nt td-nt | 0 )
Note that the fact that ( nt td-nt ) is a subtype of translation follows from the data type relationships (i.e., by definition).
Then it makes sense to introduce UB@ as @KrishnaMyneni suggested. Using UB@ emphasizes that we're working with 8-bit bytes, not characters.
Please note, in practice, 0 is not an addr. And we have a proposal to formally exclude zero from the addr data type.
It should probably be stated that if a program relies on resize never returning 0 on success, then it has an environment dependency.
And, in the stack diagrams we should indicate 0 separately from addr.
For resize it should be:
( a-addr1|0 u -- a-addr2|0 ior )
Or, a more detailed option in arrow-type form:
( a-addr1|0 u\0 -- a-addr2 ior | a-addr1|0 0 -- a-addr2|0 ior )
that is, it may return 0 instead of a-addr only when the new size is 0.
An even more detailed option:
( a-addr1|0 u\0 -- a-addr2 0 | a-addr1|0 0 -- a-addr2|0 0 | a-addr1|0 0 -- x x ior\0 )
that is, if the top output parameter is not 0, the other two output parameters are unspecified cells.
- It's exactly ' NAME_OF_WORD PERFORM: tick places the xt of NAME_OF_WORD on top of the data stack, and PERFORM jumps to (or calls) the address on top of the stack.
- The use is clear, as above: it does the same as EXECUTE, but in native assembler code rather than Forth code, and not only at the end of a word.
But then should not we use the names uw>s, ul>s, ux>s instead of w>s, l>s, x>s?
If the standard declared that resizing to 0 units must return the a-addr2 equal to 0
Regarding "must return 0": it seems this would make most existing systems non-standard and would complicate some implementations.
There is no point in ub@, we already have c@, which (with the accepted 1-chars-is-1 proposal) is standardized to do what ub@ would do, on most machines; on machines with wider aus, one might want a b@ (or ub@) that masks the extra bits. I once considered adding such a word (under the name b@), but eventually decided against it. After an earlier draft Leon Wagner had implemented b@ in SwiftForth, but when I apologized for changing my mind, he said that he actually agrees that we don't need b@.
Concerning uw>s etc., there are no conflicts with the name w>s, and there is some existing practice for w>s. Plus, the point of uw@ w>s is that we actually want to load a signed number with this sequence, so the result of the uw@ in this case is a zero-extended signed number that w>s converts into a sign-extended signed number; does uw>s reflect that meaning better than w>s?
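The sequence uw@ w>s can be sketched in C as follows. The sketch assumes a cell is modeled as intptr_t, and uw_fetch and w_to_s are hypothetical stand-ins for the Forth words. The fetch zero-extends the 16-bit value, and w>s sign-extends its low 16 bits. Applying w_to_s to a value that is already sign-extended leaves it unchanged.

```c
#include <stdint.h>
#include <string.h>

typedef intptr_t cell;   /* assumption: a Forth cell modeled as intptr_t */

/* uw@ ( addr -- u ): fetch 16 bits in native byte order, zero-extended. */
static cell uw_fetch(const void *addr) {
    uint16_t w;
    memcpy(&w, addr, sizeof w);   /* avoids alignment and aliasing issues */
    return (cell)w;               /* zero extension */
}

/* w>s ( x -- n ): interpret the low 16 bits of x as signed, sign-extend. */
static cell w_to_s(cell x) {
    uint16_t low = (uint16_t)x;   /* well-defined modulo conversion */
    return low >= 0x8000 ? (cell)low - 0x10000 : (cell)low;
}
```

For example, fetching the bit pattern 0xFFFF gives 65535, and w_to_s turns that into -1.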
requestClarification - Resizing to/from Zero Address Units
I have now looked up C23. It says (for all allocation functions it defines):
If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
It says about realloc():
If ptr [the input pointer] is a null pointer, the realloc function behaves like the malloc function for the specified size.
It also says:
Otherwise, [...] if the size [the input parameter for the new size] is zero, the behavior is undefined.
That sounds pretty idiotic and contradicts the general guarantee; interestingly, for malloc(), C23 does not undefine the result if size is zero.
POSIX-2024 gives some additional guarantees, but they are marked as obsolescent, so it's not a good idea to take these as inspiration for future Forth standards.
I think that if we want to say anything about the behaviour if u=0, it should be the general guarantee of C23.
What we should be adding to resize is the guarantee that realloc() makes when ptr is a null pointer.
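Below is a minimal sketch of a resize built on realloc() that follows the C23 rules quoted above. The name forth_resize and the ior value -1 are hypothetical. Mapping u = 0 to a 1-byte request is one way to honor the general guarantee ("as if the size were some nonzero value") while avoiding realloc(ptr, 0), which C23 leaves undefined. Passing a null a-addr1 through to realloc() gives the allocate-like behavior proposed above.

```c
#include <stdlib.h>

/* Hypothetical RESIZE ( a-addr1 u -- a-addr2 ior ) built on realloc().
   Returns ior: 0 on success, nonzero on failure. */
static int forth_resize(void *a_addr1, size_t u, void **a_addr2) {
    /* Treat u = 0 as a small nonzero size; this avoids realloc(ptr, 0),
       whose behavior is undefined in C23 for a non-null ptr. */
    size_t n = (u == 0) ? 1 : u;
    /* realloc(NULL, n) behaves like malloc(n), so a null a_addr1
       makes this behave like ALLOCATE. */
    void *p = realloc(a_addr1, n);
    if (p == NULL) {
        *a_addr2 = a_addr1;   /* failure: a-addr2 = a-addr1, region unaffected */
        return -1;            /* hypothetical nonzero ior */
    }
    *a_addr2 = p;
    return 0;
}
```

With this design, a successful call never returns 0 as a-addr2. A program that relies on that property would still carry an environmental dependency unless the standard guarantees it.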
the point of uw@ w>s is that we actually want to load a signed number with this sequence, so the result of the uw@ in this case is a zero-extended signed number that w>s converts into a sign-extended signed number; does uw>s reflect that meaning better than w>s?
This word is similar to other type conversion words d>s and f>s. Note that "s" in these words denotes "signed single-cell integer number".
In the word name w>s, "w" would denote a signed 16-bit integer number in native byte order, and we convert it to a signed single-cell integer number.
And uw@ would mean that we read 16 bits without interpretation. Looks good.
Should not we change wbe to uwbe?
Rationale: the prefix uw would better emphasize that the input parameter is a bit pattern (without other interpretation).
Then, the sequence uw@ uwbe w>s means that we interpret the read value as 16-bit signed number in big-endian order (network order).
And the sequence uw@ uwbe means that we interpret the read value as a 16-bit bit pattern in big-endian order.
The words uw! and w! would be synonyms, because the same low 16 bits are stored whether the value is regarded as signed or unsigned, so only one of them is needed. I would prefer uw!, because it should be used after changing the byte order (... uwbe uw! when implementing network protocols), and because it better matches uw@.
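Here is a C sketch of the proposed sequences in a network-protocol setting. Cells are modeled as intptr_t, and uw_fetch, uwbe, w_to_s and uw_store are hypothetical stand-ins for uw@, uwbe, w>s and uw!. Reading a big-endian signed 16-bit field means fetching, converting the byte order, and then sign-extending. Storing needs only one word, since the stored low 16 bits do not depend on signedness.

```c
#include <stdint.h>
#include <string.h>

typedef intptr_t cell;   /* assumption: a Forth cell modeled as intptr_t */

static int host_is_little_endian(void) {
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* uw@: 16-bit fetch in native order, zero-extended. */
static cell uw_fetch(const void *addr) {
    uint16_t w;
    memcpy(&w, addr, sizeof w);
    return (cell)w;
}

/* uwbe: convert a 16-bit bit pattern between big-endian and native order
   (swaps bytes on a little-endian host, no-op on a big-endian host). */
static cell uwbe(cell x) {
    uint16_t w = (uint16_t)x;
    if (host_is_little_endian())
        w = (uint16_t)((w >> 8) | (w << 8));
    return (cell)w;
}

/* w>s: sign-extend the low 16 bits. */
static cell w_to_s(cell x) {
    uint16_t low = (uint16_t)x;
    return low >= 0x8000 ? (cell)low - 0x10000 : (cell)low;
}

/* uw!: store the low 16 bits; identical for signed and unsigned x. */
static void uw_store(cell x, void *addr) {
    uint16_t w = (uint16_t)x;
    memcpy(addr, &w, sizeof w);
}
```

Reading a field is then `w_to_s(uwbe(uw_fetch(p)))`, and writing one is `uw_store(uwbe(x), p)`.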
Author
Ruv
Change Log
(the latest at the bottom)
2019-10-08: Initial version
2020-08-28: Avoid ambiguous clause "xt is the execution token for name" in the case of a word with non-default interpretation semantics.
2021-04-18: Allow returning a different xt for any definition. Tighten the meaning of n in interpretation state. Avoid "implementation-dependent definition" and make the wording simpler.
2021-05-06: Correct meaning of n in interpretation state:
iff n is -1,
then xt identifies the execution semantics for name.
Eliminate the "default interpretation semantics" notion from the normative part.
2026-05-31: Major update.
Describe problems.
Simplify normative text using the updated execution semantics term description.
Update the glossary entry for search-wordlist.
Problem
The proposal
[251] Clarification for execution token
already addresses the problems related to the lack of ambiguous conditions in find and search-wordlist.
The remaining problems concern cases where
find returns different xt values depending on STATE.
The rationale A.6.1.1550 for 6.1.1550
FIND explains that a word may exist in two versions: a compiling version and an interpreting version. This means that each version of such a word has its own execution token that identifies its own semantics, and the phrase "its execution token" may refer to one of these versions depending on STATE. However, the normative parts of the standard imply that a standard word has at most one execution token, which identifies the execution semantics of the word. So the wording in the find specification is confusing. The phrase "if the definition is immediate" is misleading because, according to the rationale, it may refer to different versions of the word depending on STATE, but the normative parts of the standard do not reflect this conception.
Although find-name has been standardized,
it is still worth clarifying the semantics of find
as find is provided more broadly than find-name
and is usually implemented in new Forth systems
(where the standard is used as the reference).
For example, Forth implementations hosted on GitHub
provide find more than twice as often as find-name.
It should be noted that
incompatibilities caused by the mentioned problems
can only occur on Forth systems where find depends on STATE,
while on most Forth systems find does not depend on STATE.
In turn,
a problem with search-wordlist can arise in cases
where find for the same word depends on STATE.
In some such cases the top output parameter of search-wordlist
is 1,
but the word is not immediate
(i.e., performing its execution semantics in compilation state
does not perform the compilation semantics for the word).
Note: if a word is immediate, performing its execution semantics in compilation state performs the compilation semantics for the word.
Solution
Update the glossary entry for find.
Update and harmonize with find
the glossary entry for search-wordlist
(see also my comment).
Avoid referring to the execution token of the compiling version of a word (if any) as the execution token of the word ("its execution token").
Avoid using the term "immediate". Instead, specify how to perform the compilation semantics for the word.
Proposal
Update find in the Search-Order word set
In the glossary entry
16.6.1.1550 FIND
(in the optional
Search-Order word set),
remove the semantic description,
except the "See also" sub-section.
Rationale
This glossary entry duplicates the glossary entry for
core find,
with only one difference: it mentions the search order.
However, this is no longer necessary,
as the term "find" is now updated by the Search-Order word set,
per the proposal
[115] Remove the “rules of FIND”
accepted in 2020.
The glossary entry itself should be kept so that it continues to hold
the reference implementation
E.16.6.1.1550 FIND
from Annex E: Reference Implementations
and the test
F.16.6.1.1550 FIND
from Annex F: Test Suite.
See also: the comment [r1372] by Anton Ertl on 2024-11-25.
Note: If we remove the glossary entry 16.6.1.1550,
we should also remove the corresponding reference implementation and test,
but they are useful since find is actually indirectly updated
by the Search-Order word set
through updating the definition of the term "find".
Update find in the Core word set
In the glossary entry
6.1.1550 FIND,
replace the
semantic description
with the following:
( c-addr -- c-addr 0 | xt n )
Find a named Forth definition whose name matches the counted string at c-addr.
If the definition is not found, return c-addr and zero.
Otherwise, return the execution token xt
and n, which is either -1 or 1.
For a given string, the values returned while compiling may differ from those returned while interpreting.
If a definition is found, the following conditions shall be met:
If interpreting, xt is the execution token of the found definition, otherwise the relation between xt and the found definition is implementation dependent.
If n is -1, appending the execution semantics identified by xt to the current definition performs the compilation semantics for the found definition.
If compiling and n is 1, then:
- Executing xt in compilation state performs the compilation semantics for the found definition.
- An ambiguous condition exists if xt is executed in interpretation state and at least one of the following conditions is true:
  - a) interpretation semantics for the found definition are undefined by this standard;
  - b) xt is not the execution token for the found definition.
Note. A definition may be found while compiling but not found while interpreting.
See also:
3.4.2 Finding definition names,
3.1.3.5 Execution tokens,
3.4.3.1 Execution semantics,
3.4.3.2 Interpretation semantics,
3.4.3.3 Compilation semantics,
A.6.1.1550 FIND,
A.3.4.3.2 Interpretation semantics.
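The sketch below shows how a text interpreter might act on the results of find under the proposed wording. It is modeled in C; the enum and function names are hypothetical, and the actual compile and execute steps are omitted. Only compiling with n = -1 appends xt's execution semantics. In every other found case, executing xt gives the required behavior: interpretation semantics while interpreting, and compilation semantics while compiling with n = 1.

```c
/* Hypothetical outcome of handling one word found by FIND. */
typedef enum { ACT_NOT_FOUND, ACT_EXECUTE, ACT_COMPILE } action;

/* n: the top value returned by FIND (0, -1 or 1);
   compiling: nonzero if STATE is compilation state. */
static action interpret_found(int n, int compiling) {
    if (n == 0)
        return ACT_NOT_FOUND;          /* try other recognizers, or fail */
    if (compiling && n == -1)
        return ACT_COMPILE;            /* append xt's execution semantics */
    /* Interpreting: xt is the execution token of the found definition.
       Compiling with n == 1: executing xt performs compilation semantics. */
    return ACT_EXECUTE;
}
```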
Rationale
There is no need to repeat the ambiguous conditions declared in 3.4.3.1 Execution semantics (the updated version).
Update rationale for find in Core word set
In the section
A.6.1.1550 FIND,
add the following paragraphs at the end:
According to the rules for the values returned by find,
the following conditions are met.
If n is always -1 for a word (regardless of STATE), then xt always identifies the same semantics.
For an ordinary word in a single-xt system, n is always -1 and xt is always the same (regardless of STATE).
For an ordinary word in a dual-xt system, n is -1 while interpreting, but may be 1 while compiling (in which case xt changes).
For an immediate word, n is always 1; xt may change in a dual-xt system (but typically it is always the same).
For a word with defined interpretation semantics and special compilation semantics (like to and s") in a dual-xt system, n is always 1 and xt may change depending on STATE.
Update search-wordlist
In the glossary entry
16.6.1.2192 SEARCH-WORDLIST,
replace the
semantic description
with the following:
( c-addr u wid -- 0 | xt 1 | xt -1 )
Find a named Forth definition whose name matches the character string identified by ( c-addr u ) in the word list identified by wid.
If no such definition is found, return zero; otherwise return one of the other two options, where:
- xt is the execution token for the found definition;
- the top output parameter is minus-one (-1) if appending the execution semantics identified by xt to the current definition performs the compilation semantics for the found definition; otherwise the top output parameter is one (1).
See also:
3.4.2 Finding definition names,
3.1.3.5 Execution tokens,
3.4.3.1 Execution semantics,
A.6.1.2192 SEARCH-WORDLIST.
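The proposed search-wordlist contract can be modeled in C as follows. The def record, its ordinary flag, and the integer xt are hypothetical simplifications. The top output is -1 exactly when the found word is ordinary (default interpretation and compilation semantics). Otherwise it is 1, which includes immediate words.

```c
#include <stddef.h>
#include <string.h>

typedef int xt_t;   /* hypothetical: an integer stands in for an xt */

typedef struct def {
    const char *name;
    xt_t xt;
    int ordinary;   /* hypothetical flag: default interpretation and
                       compilation semantics */
    const struct def *prev;
} def;

/* SEARCH-WORDLIST ( c-addr u wid -- 0 | xt 1 | xt -1 ) modeled in C:
   returns 0, 1 or -1 and stores the xt through *xt_out when found. */
static int search_wordlist(const char *c_addr, size_t u,
                           const def *latest, xt_t *xt_out) {
    for (const def *d = latest; d != NULL; d = d->prev) {
        if (strlen(d->name) == u && memcmp(d->name, c_addr, u) == 0) {
            *xt_out = d->xt;
            /* -1 iff appending xt's execution semantics performs the
               compilation semantics, i.e., for ordinary words only. */
            return d->ordinary ? -1 : 1;
        }
    }
    return 0;
}
```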
Rationale
There is no need to repeat the ambiguous conditions declared in 3.4.3.1 Execution semantics (the updated version).
Update rationale for search-wordlist
In the section
A.6.1.2192 SEARCH-WORDLIST,
add the following paragraphs at the end:
If the found definition is an immediate word,
then the top output parameter is 1.
However, if the top output parameter is 1,
the found definition is not necessarily an immediate word,
since it may be a word (not a user-defined word in a standard program)
whose compilation semantics are implemented
using another definition.
If and only if the top output parameter is -1,
the found definition is an ordinary word
(a word with default interpretation semantics and default compilation semantics).
See also:
A.6.1.1550 FIND,
3.4.3.2 Interpretation semantics,
3.4.3.3 Compilation semantics.
Consequences
All classic Forth systems comply with this change.
Some dual-xt Forth systems provide an implementation of find
that does not comply with this change.
They should be updated to fix find or to remove it.
Testing
See find.test.fth.
I agree with Anton that UW>S is not needed since the W in W>S just indicates a word type of either signed or unsigned. For an unsigned word on the stack, W>S will sign extend it, and for a sign-extended word on the stack, W>S will have no effect.
But if we want to standardize the pure postfix variant of
create, why don't we standardize the postfix variants of other defining words?
Complementing :, there is :noname for creating a nameless colon definition.
However, there are no analogous methods for creating nameless definitions of the same kind as those created by the words create and defer.
I would suggest the following words (the names are tentative):
- create-xt ( -- xt1 )
  - xt1 Execution: ( -- a-addr.data-field )
    a-addr.data-field is the address of the data field associated with xt1. The execution semantics of xt1 may be extended by using does! ( xt2 xt1 -- ).
- defer-xt ( -- xt1 )
  - xt1 Execution: ( any1 -- any2 )
    Execute the xt that xt1 is set to execute. If xt1 has not been set to execute an xt, exception -82 is thrown.
- enlist ( xt1 sd.name -- )
  - Place a named definition into the compilation word list; the definition's name matches the character string sd.name, and the definition's execution semantics are equivalent to the execution semantics identified by xt1.
  - Rationale:
    - It does not guarantee that the execution token of the new definition is the same as xt1, since in some implementations xt is a subtype of nt (or these types are even equivalent), that is, the word name> is a nop.
    - It might allow placing into the compilation word list a named definition whose name is an empty string or a string containing whitespace or control characters. Such a definition can then be found using find-name-in or search-wordlist, but the Forth text interpreter will not find it.
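The execution behavior described for defer-xt can be sketched in C. The names deferred, defer_set, execute_deferred and sample_target are hypothetical. An xt is modeled as a function pointer that returns a throw code (0 meaning no exception), and "throwing" -82 is modeled by returning that code.

```c
#include <stddef.h>

typedef int (*xt_t)(void);   /* hypothetical: an xt returns a throw code */

/* What defer-xt would create: a nameless deferred word. */
typedef struct {
    xt_t target;             /* NULL until set */
} deferred;

/* Set the xt that the deferred word executes (cf. DEFER!). */
static void defer_set(deferred *d, xt_t xt) {
    d->target = xt;
}

/* Executing xt1: run the xt it is set to, or "throw" -82 if unset. */
static int execute_deferred(const deferred *d) {
    if (d->target == NULL)
        return -82;
    return d->target();
}

/* Hypothetical target used for demonstration; completes without throwing. */
static int sample_target(void) {
    return 0;
}
```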