Digest #330 2026-06-14

Contributions

[430] 2026-06-13 08:00:01 ruv wrote:

requestClarification - Wording in 16.3.3 Find definition names

When searching a word list for a definition name, the system shall search each word list from its last definition to its first.

The bolded passages (emphasis mine) appear to contradict each other. I think one of the following was intended:

When searching a word list for a definition name, the system shall search the word list from its last definition to its first.
When searching the search order for a definition name, the system shall search each word list from its last definition to its first.

Which one? (I'm inclined to the first one)
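
To make the two candidate readings concrete, here is a minimal Python sketch (a model only, not any real Forth system). It assumes that a word list is a Python list of (name, xt) pairs in definition order and that the search order is a list of such word lists. The function names and the sample data are made up for illustration, and name matching is exact here, although a real system may fold case.

```python
def search_wordlist_model(wordlist, name):
    """Search ONE word list from its last definition to its first."""
    for entry_name, xt in reversed(wordlist):
        if entry_name == name:
            return xt          # the most recent definition wins
    return None                # not found in this word list


def search_order_find(search_order, name):
    """Search EACH word list of the search order, first list to last list."""
    for wordlist in search_order:
        xt = search_wordlist_model(wordlist, name)
        if xt is not None:
            return xt
    return None


# Hypothetical word lists; the integers stand in for execution tokens.
forth = [("dup", 1), ("foo", 2), ("foo", 3)]   # "foo" was redefined later
extra = [("foo", 4), ("bar", 5)]
```

The first reading corresponds to `search_wordlist_model` alone; the second corresponds to `search_order_find`, which applies the last-to-first rule to each word list in turn.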

Replies

[r1663] 2026-05-19 05:29:49 AntonErtl replies:

proposal - Special memory access words

Concerning consistency, in cases where there are signed and unsigned versions of words, the unsigned version has the prefix u and the signed version usually has no prefix, e.g., u< < um* m*. C@ has no prefix only because there is no signed version.

Anyway, consistency is not paramount, avoiding conflicts with existing practice is. So I lean towards naming the new fetching words uw@ ul@ ux@.

[r1664] 2026-05-19 05:39:39 AntonErtl replies:

proposal - Recognizer committee proposal 2025-09-11

"Data object" and "data type" are generic concepts that are not specific to the recognizer proposal. The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.

And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description; it also helps in thinking when you think about it as one unit and not a collection of smaller data. And in order to have these advantages, we introduce the name. Actually, this whole paragraph is an explanation of why we use abstractions.

[r1665] 2026-05-19 05:51:04 AntonErtl replies:

requestClarification - Resizing to/from Zero Address Units

That's a good point. AFAIK POSIX and C have tightened the requirements (i.e., given more guarantees to users) for malloc() and realloc(). It may be a good idea to look at what standard C guarantees now and maybe tighten allocate and resize, too. As for existing practice, many Forth implementations call malloc() and realloc(), respectively, so they implement these guarantees already.

[r1666] 2026-05-19 05:51:40 AntonErtl replies:

requestClarification - Resizing to/from Zero Address Units

But who is going to write the proposal?

[r1667] 2026-05-19 10:10:42 ruv replies:

proposal - Recognizer committee proposal 2025-09-11

"Data object" and "data type" are generic concepts that are not specific to the recognizer proposal. The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.

In this context, by qualification I mean assigning/adding a data type tag.

If I understand you correctly, your idea is that we can qualify the same data object in different ways for different purposes. For the purpose of translation we should use the identifiers of translation, but for a different purpose we should use identifiers of another kind to qualify the same data object.

I use data type conversion, for example, the word qany>xt ( qany -- xt ), where qany is a qualified data object. With your approach, I would have to implement a separate conversion method for each new purpose of the data object use.

But if we have a name token nt, then it is a name token regardless of the purpose of its use. We are always talking about a name token. And any conversion to xt (to the single execution token of a word) is the same, regardless of any external purpose.

Assigning different data type tags to the exact same data object (of the same type) makes data conversion, mapping, and integration significantly harder. It forces us to write unnecessary conversion logic for things that are fundamentally identical.

Could you elaborate on your point now?

And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description;

Yes, sure! For this reason, I suggested (comp.lang.forth, 2020-12-09) to formally introduce a separate data symbol sd for a character string ( c-addr u | 0 0 ).

But in such cases as

rec-name ( c-addr u -- translation )
rec-float ( c-addr u -- translation )

although the data type translation is correct, it is too wide, and it does not allow the reader to distinguish this specific recognizer from other recognizers.

So, I prefer to specify the stack diagram (an arrow type) for this recognizer as:

rec-name ( sd -- nt td-nt | 0 )

Note that the fact that ( nt td-nt ) is a subtype of translation follows from the data type relationships (i.e., by definition).

[r1668] 2026-05-19 10:16:07 ruv replies:

proposal - Special memory access words

Then, it makes sense to introduce UB@ as @KrishnaMyneni suggested. Using UB@ emphasizes that we're working with 8-bit bytes, not characters.

[r1669] 2026-05-19 10:32:48 ruv replies:

requestClarification - Resizing to/from Zero Address Units

Please note, in practice, 0 is not an addr. And we have a proposal to formally exclude zero from the addr data type.

It should probably be stated that if a program relies on resize never returning 0 on success, then it has an environment dependency.

And, in the stack diagrams we should indicate 0 separately from addr. For resize it should be:

( a-addr1|0 u -- a-addr2|0 ior )

Or, a narrower option:

( a-addr1|0 u\0 -- a-addr2 ior | a-addr1|0 0 -- a-addr2|0 ior )

that is, it may return 0 instead of a-addr only when the new size is 0.

An even narrower option:

( a-addr1|0 u\0 -- a-addr2 0 | a-addr1|0 0 -- a-addr2|0 0 | a-addr1|0 0 -- x x ior\0 )

that is, if the top output parameter is not 0, the other two output parameters are unspecified cells.

[r1670] 2026-05-19 12:10:18 agsb replies:

proposal - word PERFORM

It's exactly " ' NAME_OF_WORD PERFORM ": tick places the xt of NAME_OF_WORD on top of the data stack, and PERFORM makes a jump/call to the address on top of the stack.
The use is clear, as above: it does the same as EXECUTE, but using native assembler code rather than Forth code, and not only at the end of a word.

[r1671] 2026-05-19 12:44:52 ruv replies:

proposal - Special memory access words

But then should not we use the names uw>s, ul>s, ux>s instead of w>s, l>s, x>s?

[r1672] 2026-05-19 12:57:49 ruv replies:

requestClarification - Resizing to/from Zero Address Units

If the standard declared that resizing to 0 units must return the a-addr2 equal to 0

Regarding "must return 0": it seems this would make most existing systems non-standard and would complicate some implementations.

[r1673] 2026-05-19 17:28:04 AntonErtl replies:

proposal - Special memory access words

There is no point in ub@, we already have c@, which (with the accepted 1-chars-is-1 proposal) is standardized to do what ub@ would do, on most machines; on machines with wider aus, one might want a b@ (or ub@) that masks the extra bits. I once considered adding such a word (under the name b@), but eventually decided against it. After an earlier draft Leon Wagner had implemented b@ in SwiftForth, but when I apologized for changing my mind, he said that he actually agrees that we don't need b@.

Concerning uw>s etc., there are no conflicts with the name w>s, and there is some existing practice for w>s. Plus, the point of uw@ w>s is that we actually want to load a signed number with this sequence, so the result of the uw@ in this case is a zero-extended signed number that w>s converts into a sign-extended signed number; does uw>s reflect that meaning better than w>s?
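
The uw@ w>s sequence described above can be sketched in Python. This is only a model under stated assumptions: memory is a `bytes` object, the native byte order is taken to be little-endian to keep the example deterministic, and the function names `uw_fetch` and `w_to_s` are stand-ins for the proposed words.

```python
def uw_fetch(memory: bytes, addr: int) -> int:
    """Model of uw@: fetch 16 bits and zero-extend (little-endian assumed)."""
    return int.from_bytes(memory[addr:addr + 2], "little")


def w_to_s(x: int) -> int:
    """Model of w>s: treat the low 16 bits of x as a signed number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


# Two 16-bit values stored back to back: 0xFFFE and 0x1234.
memory = bytes([0xFE, 0xFF, 0x34, 0x12])

zero_extended = uw_fetch(memory, 0)      # what uw@ leaves on the stack
sign_extended = w_to_s(zero_extended)    # what w>s turns it into
```

Here the intermediate value after the fetch is the zero-extended pattern, and only the conversion step gives it its signed reading.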

[r1674] 2026-05-20 07:38:06 AntonErtl replies:

requestClarification - Resizing to/from Zero Address Units

I have now looked up C23. It says (for all allocation functions it defines):

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

It says about realloc():

If ptr [the input pointer] is a null pointer, the realloc function behaves like the malloc function for the specified size.

It also says:

Otherwise, [...] if the size [the input parameter for the new size] is zero, the behavior is undefined.

That sounds pretty idiotic and contradicts the general guarantee; interestingly, for malloc(), C23 does not undefine the result if size is zero.

POSIX-2024 gives some additional guarantees, but they are marked as obsolescent, so it's not a good idea to take these as inspiration for future Forth standards.

I think that if we want to say anything about the behaviour if u=0, it should be the general guarantee of C23.

What we should be adding to resize is the guarantee that realloc() makes when ptr is a null pointer.
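
As a rough illustration of the two guarantees mentioned above, here is a toy Python model of allocate and resize. It is a sketch, not a proposal text: addresses are plain integers, `HeapModel` and its methods are hypothetical names, the nonzero ior value is arbitrary, and for a zero-size request the model picks the second alternative of the C23 wording (behave as if the size were nonzero).

```python
class HeapModel:
    """Toy model of allocate/resize; addresses are just integers."""

    def __init__(self):
        self._sizes = {}       # address -> size in address units
        self._next = 0x1000    # next fresh address (never 0)

    def allocate(self, u):
        addr = self._next
        # A zero-size request still consumes a distinct address, as if the
        # size were nonzero; the block must not be accessed.
        self._next += max(u, 1)
        self._sizes[addr] = u
        return addr, 0         # ior = 0 means success

    def resize(self, addr, u):
        if addr == 0:
            # Guarantee under discussion: a zero input behaves like allocate.
            return self.allocate(u)
        if addr not in self._sizes:
            return addr, -1    # hypothetical nonzero ior: invalid address
        if u <= self._sizes[addr]:
            self._sizes[addr] = u      # shrink in place
            return addr, 0
        del self._sizes[addr]          # grow by moving to a fresh block
        return self.allocate(u)


heap = HeapModel()
a, ior_a = heap.resize(0, 8)   # acts like allocate
b, ior_b = heap.resize(a, 0)   # resize to zero units still succeeds here
```

A system that chose the other C23 alternative would instead report an error for the zero-size request, which is why a program relying on either behaviour would have an environmental dependency.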

[r1675] 2026-05-20 08:57:24 ruv replies:

proposal - Special memory access words

the point of uw@ w>s is that we actually want to load a signed number with this sequence, so the result of the uw@ in this case is a zero-extended signed number that w>s converts into a sign-extended signed number; does uw>s reflect that meaning better than w>s?

This word is similar to other type conversion words d>s and f>s. Note that "s" in these words denotes "signed single-cell integer number".

In the word name w>s, "w" would denote a signed 16-bit integer number in native byte order, and we convert it to a signed single-cell integer number. And uw@ would mean that we read 16 bits without interpretation. Looks good.

Should not we change wbe to uwbe? Rationale: the prefix uw would better emphasize that the input parameter is a bit pattern (without other interpretation).

Then, the sequence uw@ uwbe w>s means that we interpret the read value as a 16-bit signed number in big-endian order (network order). And the sequence uw@ uwbe means that we interpret the read value as a 16-bit number in big-endian order.

The words uw! and w! would be synonyms due to sign encoding, so only one of them is sufficient. I would prefer uw!, because it should be used after changing the byte order: ... uwbe uw! (when implementing network protocols), and because it better matches uw@.
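
A small Python sketch may help with the two claims above: that ... uwbe uw! writes a value in network order, and that a signed and an unsigned store produce the same 16 bits. Assumptions: the host is modelled as little-endian by default, memory is a `bytearray`, and `uwbe` and `uw_store` are illustrative stand-ins for the proposed words.

```python
def uwbe(x: int, native: str = "little") -> int:
    """Model of uwbe: convert a 16-bit pattern between native and big-endian."""
    x &= 0xFFFF
    if native == "big":
        return x                          # already in network order
    return ((x & 0xFF) << 8) | (x >> 8)   # swap the two bytes


def uw_store(x: int, memory: bytearray, addr: int, native: str = "little") -> None:
    """Model of uw!: store the low 16 bits of x in native byte order."""
    memory[addr:addr + 2] = (x & 0xFFFF).to_bytes(2, native)


buf = bytearray(4)
uw_store(uwbe(0x1234), buf, 0)   # ... uwbe uw!  leaves 0x12 0x34 in memory
uw_store(-2, buf, 2)             # a signed input stores the same bits as 0xFFFE
```

Because the store keeps only the low 16 bits, the sign interpretation of the input does not matter, which is the reason a single store word is sufficient.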

[r1676] 2026-06-01 01:25:49 ruv replies:

proposal - Clarify FIND, more classic approach

Author

Ruv

Change Log

(the latest at the bottom)

2019-10-08: Initial version

2020-08-28: Avoid ambiguous clause "xt is the execution token for name" in the case of a word with non default interpretation semantics.

2021-04-18: Allow returning a different xt for any definition. Tighter meaning of n in interpretation state. Avoid "implementation-dependent definition" and make the wording simpler.

2021-05-06: Correct meaning of n in interpretation state: iff n is -1, then xt identifies the execution semantics for name. Eliminate the "default interpretation semantics" notion from the normative part.

2026-05-31: Major update. Describe problems. Simplify normative text using the updated execution semantics term description. Update the glossary entry for search-wordlist.

Problem

The proposal [251] Clarification for execution token already addresses the problems related to the lack of ambiguous conditions in find and search-wordlist.

The remaining problems concern cases where find returns different xt values depending on STATE.

The rationale A.6.1.1550 for 6.1.1550 FIND explains that a word may exist in two versions: a compiling version and an interpreting version. This means that each version of such a word has its own execution token that identifies its own semantics, and the phrase "its execution token" may refer to one of these versions depending on STATE. However, the normative parts of the standard imply that a standard word has at most one execution token, which identifies the execution semantics of the word. So, the wording in the find specification leads to confusion.
The phrase "if the definition is immediate" is misleading because, according to the rationale, it may refer to different versions of the word depending on STATE, but the normative parts of the standard do not reflect this concept.

Although find-name has been standardized, it is still worth clarifying the semantics of find, as find is provided more broadly than find-name and is usually implemented in new Forth systems (where the standard is used as the reference). For example, Forth implementations hosted on GitHub provide find more than twice as often as find-name.

find in 136 files (at the date)
find-name in 66 files (at the date)

It should be noted that incompatibilities caused by the mentioned problems can only occur on Forth systems where find depends on STATE, while on most Forth systems find does not depend on STATE.

In turn, a problem with search-wordlist can arise in cases where find for the same word depends on STATE. In some such cases the top output parameter of search-wordlist is 1, but the word is not immediate (i.e., performing its execution semantics in compilation state does not perform the compilation semantics for the word).

Note: if a word is immediate, performing its execution semantics in compilation state performs the compilation semantics for the word.

Solution

Update the glossary entry for find. Update the glossary entry for search-wordlist and harmonize it with find (see also my comment).

Avoid referring to the execution token of the compiling version of a word (if any) as the execution token of the word ("its execution token").
Avoid using the term "immediate". Instead, specify how to perform the compilation semantics for the word.

Proposal

Update `find` in the Search-Order word set

In the glossary entry 16.6.1.1550 FIND (in the optional Search-Order word set), remove the semantic description, except the "See also" sub-section.

Rationale

This glossary entry duplicates the glossary entry for core find, with only one difference: it mentions the search order. However, this is no longer necessary, as the term "find" is now updated by the Search-Order word set, per the proposal [115] Remove the “rules of FIND” accepted in 2020.

The glossary entry itself should be kept to contain the reference implementation E.16.6.1.1550 FIND from Annex E: Reference Implementations and the test F.16.6.1.1550 FIND from Annex F: Test Suite.

See also: the comment [r1372] by Anton Ertl on 2024-11-25.

Note: If we remove the glossary entry 16.6.1.1550, we should also remove the corresponding reference implementation and test, but they are useful since find is actually indirectly updated by the Search-Order word set through updating the definition of the term "find".

Update `find` in the Core word set

In the glossary entry 6.1.1550 FIND, replace the semantic description with the following:

( c-addr -- c-addr 0 | xt n )

Find a named Forth definition whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero. Otherwise, return the execution token xt and n, which is either -1 or 1.

For a given string, the values returned while compiling may differ from those returned while interpreting.

If a definition is found, the following conditions shall be met:

If interpreting, xt is the execution token of the found definition, otherwise the relation between xt and the found definition is implementation dependent.
If n is -1, appending the execution semantics identified by xt to the current definition performs the compilation semantics for the found definition.
If compiling and n is 1, then:
- Executing xt in compilation state performs the compilation semantics for the found definition.
- An ambiguous condition exists if xt is executed in interpretation state and at least one of the following conditions is true:
  - a) interpretation semantics for the found definition are undefined by this standard;
  - b) xt is not the execution token for the found definition.

Note. A definition may be found while compiling but not found while interpreting.
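
As an informal illustration of these conditions, the following Python sketch shows how a text interpreter might act on the (xt, n) pair returned by find. It is a model under assumptions: `process_found`, `execute` and `compile_comma` are hypothetical names (the last two stand in for EXECUTE and COMPILE,), the xt values are placeholder strings, and find is assumed to have been called in the same state in which its result is used.

```python
def process_found(xt, n, compiling, execute, compile_comma):
    """Decide what to do with the (xt, n) pair returned by find."""
    if not compiling:
        execute(xt)          # interpreting: xt is the definition's execution token
    elif n == -1:
        compile_comma(xt)    # appending xt performs the compilation semantics
    else:                    # compiling and n == 1
        execute(xt)          # executing xt performs the compilation semantics


# Usage with stand-in callbacks that only record what would happen.
actions = []


def execute(xt):
    actions.append(("execute", xt))


def compile_comma(xt):
    actions.append(("compile,", xt))


process_found("xt-of-dup", -1, True, execute, compile_comma)   # ordinary word, compiling
process_found("xt-of-if", 1, True, execute, compile_comma)     # n = 1, compiling
process_found("xt-of-dup", -1, False, execute, compile_comma)  # interpreting
```

Note that the sketch never executes, in interpretation state, an xt that was obtained while compiling with n equal to 1; that is the case the ambiguous condition above covers.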

Rationale

There is no need to repeat the ambiguous conditions declared in 3.4.3.1 Execution semantics (the updated version).

Update rationale for `find` in Core word set

In the section A.6.1.1550 FIND, add the following paragraphs at the end:

According to the rules for the values returned by find, the following conditions are met.

If n is always -1 for a word (regardless of STATE), then xt always identifies the same semantics.
For an ordinary word in a single-xt system, n is always -1 and xt is always the same (regardless of STATE).
For an ordinary word in a dual-xt system, n is -1 while interpreting, but may be 1 while compiling (in which case xt changes).
For an immediate word, n is always 1, xt may change in a dual-xt system (but typically it is always the same).
For a word with defined interpretation semantics and special compilation semantics (like to and s") in a dual-xt system, n is always 1 and xt may change depending on STATE.

Update `search-wordlist`

In the glossary entry 16.6.1.2192 SEARCH-WORDLIST, replace the semantic description with the following:

( c-addr u wid -- 0 | xt 1 | xt -1 )

Find a named Forth definition whose name matches the character string identified by ( c-addr u ) in the word list identified by wid.

If no such definition is found, return zero; otherwise return one of the other two options, where:

xt is the execution token for the found definition;
the top output parameter is minus-one (-1) if appending the execution semantics identified by xt to the current definition performs the compilation semantics for the found definition; otherwise the top output parameter is one (1).

Rationale

There is no need to repeat the ambiguous conditions declared in 3.4.3.1 Execution semantics (the updated version).

Update rationale for `search-wordlist`

In the section A.6.1.2192 SEARCH-WORDLIST, add the following paragraphs at the end:

If the found definition is an immediate word, then the top output parameter is 1.

However, if the top output parameter is 1, the found definition is not necessarily an immediate word, since it may be a word (not a user-defined word in a standard program) whose compilation semantics are implemented using another definition.

If and only if the top output parameter is -1, the found definition is an ordinary word (a word with default interpretation semantics and default compilation semantics).

Consequences

All classic Forth systems comply with this change.

Some dual-xt Forth systems provide an implementation of find that does not comply with this change. They should be updated to fix find or remove it.

Testing

See find.test.fth.

[r1677] 2026-06-03 13:06:38 KrishnaMyneni replies:

proposal - Special memory access words

I agree with Anton that UW>S is not needed since the W in W>S just indicates a word type of either signed or unsigned. For an unsigned word on the stack, W>S will sign extend it, and for a sign-extended word on the stack, W>S will have no effect.
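
The claim that W>S sign-extends an unsigned word and leaves an already sign-extended word unchanged can be checked with a small Python model. The 64-bit cell width and the name `w_to_s_cell` are assumptions made for this sketch; cells are modelled as raw unsigned bit patterns.

```python
CELL_BITS = 64                      # assumed cell width
CELL_MASK = (1 << CELL_BITS) - 1


def w_to_s_cell(x: int) -> int:
    """Model of W>S on a raw cell: sign-extend the low 16 bits to cell width."""
    low = x & 0xFFFF
    if low & 0x8000:
        low |= CELL_MASK & ~0xFFFF  # set every bit above bit 15
    return low


unsigned_word = 0xFFFE                      # zero-extended, as left by an unsigned fetch
extended_once = w_to_s_cell(unsigned_word)  # sign-extended cell
extended_twice = w_to_s_cell(extended_once) # applying W>S again changes nothing
```

Since the result depends only on the low 16 bits, both input forms of the same 16-bit value map to the same cell.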

[r1678] 2026-06-11 15:40:42 ruv replies:

proposal - Non parsing CREATE

But if we want to standardize the pure postfix variant of create, why don't we standardize the postfix variants of other defining words?

Complementing :, there is :noname for creating a nameless colon-definition.

However, there are no analogous methods for creating nameless definitions of the same kind as those created by the words create and defer.

I would suggest the following words (the names are tentative):

create-xt ( -- xt1 )
- xt1 Execution: ( -- a-addr.data-field )
  a-addr.data-field is the address of the data field associated with xt1. The execution semantics of xt1 may be extended by using does! ( xt2 xt1 -- ).
defer-xt ( -- xt1 )
- xt1 Execution: ( any1 -- any2 )
  Execute the xt that xt1 is set to execute. If xt1 has not been set to execute an xt, an exception -82 is thrown.
enlist ( xt1 sd.name -- )
- Place a named definition into the compilation word list; the definition's name matches the character string sd.name, and the definition's execution semantics are equivalent to the execution semantics identified by xt1.
- Rationale:
  - It does not guarantee that the execution token of a new definition is the same as xt1, since in some implementations xt is a subtype of nt (or even these types are equivalent), that is, the word name> is a nop.
  - It might allow placing into the compilation word list a named definition whose name is an empty string or a string containing whitespace or control characters. Then, such a definition can be found using find-name-in or search-wordlist, but the Forth text interpreter will not find it.
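
A rough Python model of two of the suggested words, defer-xt and enlist, may make the intended behaviour easier to picture. Everything here is an assumption for illustration: an xt is modelled as a callable taking the data stack (a list), `defer_store` stands in for defer!, the word list is a list of (name, xt) pairs, and the exception class is made up to carry the throw code -82 mentioned above.

```python
class ForthThrow(Exception):
    """Stand-in for a Forth exception carrying a throw code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


class Deferred:
    """Model of the nameless definition created by defer-xt."""

    def __init__(self):
        self.target = None

    def __call__(self, stack):
        if self.target is None:
            raise ForthThrow(-82)    # not yet set to execute an xt
        self.target(stack)


def defer_xt():
    """Model of defer-xt ( -- xt1 )."""
    return Deferred()


def defer_store(xt2, xt1):
    """Model of defer! ( xt2 xt1 -- ): set xt1 to execute xt2."""
    xt1.target = xt2


def enlist(wordlist, xt1, name):
    """Model of enlist ( xt1 sd.name -- ): add a named definition."""
    wordlist.append((name, xt1))


compilation_wordlist = []
answer = defer_xt()
defer_store(lambda stack: stack.append(42), answer)
enlist(compilation_wordlist, answer, "answer")
```

In this model the listed definition shares the xt of the nameless one; as the rationale notes, a real system would not have to guarantee that.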
