,---------------.
| Contributions |
`---------------´
,------------------------------------------
| 2026-06-13 08:00:01 ruv wrote:
| requestClarification - Wording in 16.3.3 Find definition names
| see: https://forth-standard.org/standard/search#contribution-430
`------------------------------------------
[16.3.3 Finding definition names](https://forth-standard.org/standard/search#search:find) says:

> When searching __a word list__ for a definition name, the system shall search __each word list__ from its last definition to its first.

The bolded passages (emphasis mine) appear to contradict each other. I think, one of the following was intended:

- When searching a word list for a definition name, the system shall search __the word list__ from its last definition to its first.
- When searching __the search order__ for a definition name, the system shall search each word list from its last definition to its first.

Which one? (I'm inclined to the first one)

,---------.
| Replies |
`---------´
,------------------------------------------
| 2026-05-19 05:29:49 AntonErtl replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1663
`------------------------------------------
Concerning consistency, in cases where there are signed and unsigned versions of words, the unsigned version has the prefix `u` and the signed version usually has no prefix, e.g., `u<` `<` `um*` `m*`. `C@` has no prefix only because there is no signed version.

Anyway, consistency is not paramount, avoiding conflicts with existing practice is. So I lean towards naming the new fetching words `uw@ ul@ ux@`.
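A minimal sketch of the `u` prefix convention mentioned above, using only standard Core words and assuming a two's-complement system: the same cell value is treated as signed by the unprefixed word and as unsigned by the `u`-prefixed one.

```forth
\ -1 is the all-bits-set cell on a two's-complement system.
-1 1 <  .    \ signed comparison: -1 is less than 1, prints -1 (true)
-1 1 u< .    \ unsigned comparison: all bits set is the largest value, prints 0 (false)
```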
,------------------------------------------
| 2026-05-19 05:39:39 AntonErtl replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1664
`------------------------------------------
"Data object" and "data type" are generic concepts that are not specific to the recognizer proposal.

The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.

And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description; it also helps in thinking when you think about it as one unit and not a collection of smaller data. And in order to have these advantages, we introduce the name.

Actually, this whole paragraph is an explanation of why we use abstractions.

,------------------------------------------
| 2026-05-19 05:51:04 AntonErtl replies:
| requestClarification - Resizing to/from Zero Address Units
| see: https://forth-standard.org/standard/memory/RESIZE#reply-1665
`------------------------------------------
That's a good point. AFAIK POSIX and C have tightened the requirements (i.e., given more guarantees to users) for `malloc()` and `realloc()`. It may be a good idea to look at what standard C guarantees now and maybe tighten `allocate` and `resize`, too. As for existing practice, many Forth implementations call `malloc()` and `realloc()`, respectively, so they implement these guarantees already.

,------------------------------------------
| 2026-05-19 05:51:40 AntonErtl replies:
| requestClarification - Resizing to/from Zero Address Units
| see: https://forth-standard.org/standard/memory/RESIZE#reply-1666
`------------------------------------------
But who is going to write the proposal?
,------------------------------------------
| 2026-05-19 10:10:42 ruv replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1667
`------------------------------------------
> "Data object" and "data type" are generic concepts that are not specific to the recognizer proposal.
> The reason for using specific concept and type names is such that the users of the standard know which data object and which data type we are talking about.

In this context, by qualification I mean assigning/adding a data type tag.

If I understand you correctly, your idea is that we can qualify the same data object in different ways for different purposes. And for the purpose of translation we should use the identifiers of translation. But for a different purpose we should use identifiers of another kind for qualifying the same data object.

I use data type conversion, for example, the word `qany>xt ( qany -- xt )`, where `qany` is a qualified data object. With your approach, I would have to implement a separate conversion method for each new purpose for which the data object is used.

But, if we have a name token **nt**, then it is a name token, regardless of the purpose of its use. We are always talking about a name token. And any conversion to xt (to the single execution token of a word) is the same, regardless of any external purpose.

Assigning different data type tags to the exact same data object (of the same type) makes data conversion, mapping, and integration significantly harder. It forces us to write unnecessary conversion logic for things that are fundamentally identical.

Could you elaborate on your point now?

-----

> And the reason for describing one data type by its name and not always enumerating all its components (even when the data type is not fully opaque, as in this case) is the convenience of a more concise description;

Yes, sure!
For this reason, I suggested ([`comp.lang.forth`, 2020-12-09](https://groups.google.com/g/comp.lang.forth/c/NUYqSpNah7E/m/G4Run1FGCAAJ)) formally introducing a separate data symbol `sd` for a character string ( c-addr u | 0 0 ).

But in such cases as
- `rec-name ( c-addr u -- translation )`
- `rec-float ( c-addr u -- translation )`

although the data type _translation_ is correct, it is **too wide**, and it does not allow the reader to distinguish this specific recognizer from other recognizers.

So, I prefer to specify the stack diagram (an [arrow type](https://en.wikipedia.org/wiki/Function_type)) for this recognizer as:
- `rec-name ( sd -- nt td-nt | 0 )`

Note that the fact that `( nt td-nt )` is a subtype of _translation_ follows from the data type relationships (i.e., by definition).

,------------------------------------------
| 2026-05-19 10:16:07 ruv replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1668
`------------------------------------------
Then, it makes sense to introduce `UB@` as @KrishnaMyneni suggested. Using `UB@` emphasizes that we're working with 8-bit bytes, not characters.

,------------------------------------------
| 2026-05-19 10:32:48 ruv replies:
| requestClarification - Resizing to/from Zero Address Units
| see: https://forth-standard.org/standard/memory/RESIZE#reply-1669
`------------------------------------------
Please note, in practice, `0` is not an _addr_. And we have a [proposal](https://forth-standard.org/proposals/exclude-zero-from-the-data-types-that-are-identifiers?hideDiff#reply-1533) to formally exclude zero from the _addr_ data type.

It should probably be stated that if a program relies on `resize` never returning 0 on success, then it has an _environment dependency_.

And, in the stack diagrams we should indicate 0 separately from addr.
For `resize` it should be:
- `( a-addr1|0 u -- a-addr2|0 ior )`

Or, a more arrow option:
- `( a-addr1|0 u\0 -- a-addr2 ior | a-addr1|0 0 -- a-addr2|0 ior )`

that is, it may return `0` instead of `a-addr` only when the new size is `0`.

An even more arrow option:
- `( a-addr1|0 u\0 -- a-addr2 0 | a-addr1|0 0 -- a-addr2|0 0 | a-addr1|0 0 -- x x ior\0 )`

that is, if the top output parameter is not `0`, the other two output parameters are unspecified cells.

,------------------------------------------
| 2026-05-19 12:10:18 agsb replies:
| proposal - word PERFORM
| see: https://forth-standard.org/proposals/word-perform#reply-1670
`------------------------------------------
1. It's exactly: " ' NAME_OF_WORD PERFORM " as: tick places the xt of name_of_word on top of the data stack and PERFORM makes a jump/call to the address on top of the stack

2. The use is clear, as above. It does the same as EXECUTE does, but using native code assembler, not Forth code, and not only at the end of a word

,------------------------------------------
| 2026-05-19 12:44:52 ruv replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1671
`------------------------------------------
But then should not we use the names `uw>s`, `ul>s`, `ux>s` instead of `w>s`, `l>s`, `x>s`?

,------------------------------------------
| 2026-05-19 12:57:49 ruv replies:
| requestClarification - Resizing to/from Zero Address Units
| see: https://forth-standard.org/standard/memory/RESIZE#reply-1672
`------------------------------------------
> If the standard declared that resizing to 0 units must return the a-addr2 equal to 0

Regarding **must return 0**. It seems this would make most existing systems non-standard and would complicate some implementations.
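Until the behaviour for a size of zero is settled, a program can sidestep the question by never passing `0` to `resize` and by judging success from _ior_ alone. The following is only a sketch under that assumption; the name `resize-nonzero` is hypothetical and is not part of any proposal in this thread.

```forth
\ Hypothetical helper: request at least one address unit, so the
\ system-dependent u=0 case of RESIZE is never reached.
: resize-nonzero ( a-addr1 u -- a-addr2 ior )
   dup 0= if drop 1 then   \ replace a zero size by 1
   resize ;

\ Usage: success is decided by ior only; the returned address is
\ never compared with 0.
16 allocate throw          ( a-addr )
0 resize-nonzero throw     ( a-addr' )  \ actually requests 1 address unit
free throw
```

The cost is that a logically empty buffer still occupies a small allocation, in exchange for not depending on how a given system treats a zero size.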
,------------------------------------------
| 2026-05-19 17:28:04 AntonErtl replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1673
`------------------------------------------
There is no point in `ub@`, we already have `c@`, which (with the accepted 1-chars-is-1 proposal) is standardized to do what `ub@` would do, on most machines; on machines with wider aus, one might want a `b@` (or `ub@`) that masks the extra bits. I once considered adding such a word (under the name `b@`), but eventually decided against it. After an earlier draft Leon Wagner had implemented `b@` in SwiftForth, but when I apologized for changing my mind, he said that he actually agrees that we don't need `b@`.

Concerning `uw>s` etc., there are no conflicts with the name `w>s`, and there is some existing practice for `w>s`. Plus, the point of `uw@ w>s` is that we actually want to load a signed number with this sequence, so the result of the `uw@` in this case is a zero-extended signed number that `w>s` converts into a sign-extended signed number; does `uw>s` reflect that meaning better than `w>s`?

,------------------------------------------
| 2026-05-20 07:38:06 AntonErtl replies:
| requestClarification - Resizing to/from Zero Address Units
| see: https://forth-standard.org/standard/memory/RESIZE#reply-1674
`------------------------------------------
I have now looked up [C23](https://open-std.org/JTC1/SC22/WG14/www/docs/n3854.pdf). It says (for all allocation functions it defines):

> If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

It says about `realloc()`:

> If ptr [the input pointer] is a null pointer, the realloc function behaves like the malloc function for the specified size.
It also says:

> Otherwise, [...] if the size [the input parameter for the new size] is zero, the behavior is undefined.

That sounds pretty idiotic and contradicts the general guarantee; interestingly, for `malloc()`, C23 does not undefine the result if size is zero.

POSIX-2024 gives some additional guarantees, but they are marked as obsolescent, so it's not a good idea to take these as inspiration for future Forth standards.

I think that if we want to say anything about the behaviour if u=0, it should be the general guarantee of C23. What we should be adding to `resize` is the guarantee that `realloc()` makes when ptr is a null pointer.

,------------------------------------------
| 2026-05-20 08:57:24 ruv replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1675
`------------------------------------------
> the point of `uw@ w>s` is that we actually want to load a signed number with this sequence, so the result of the `uw@` in this case is a zero-extended signed number that `w>s` converts into a sign-extended signed number; does `uw>s` reflect that meaning better than `w>s`?

This word is similar to other type conversion words `d>s` and `f>s`. Note that "s" in these words denotes "signed single-cell integer number".

In the word name `w>s`, "w" would denote a signed 16-bit integer number in native byte order, and we convert it to a signed single-cell integer number. And `uw@` would mean that we read 16 bits without interpretation. Looks good.

Should not we change `wbe` to `uwbe`? Rationale: the prefix `uw` would better emphasize that the input parameter is a bit pattern (without other interpretation).

Then, the sequence `uw@ uwbe w>s` means that we *interpret* the read value as a 16-bit signed number in big-endian order (*network order*). And the sequence `uw@ uwbe` means that we *interpret* the read value as a 16-bit number in big-endian order.
The words `uw!` and `w!` would be synonyms due to sign encoding, so only one of them is sufficient. I would prefer `uw!`, because it should be used after changing byte order: ` ... uwbe uw!` (when implementing network protocols), and because it better matches `uw@`.

,------------------------------------------
| 2026-06-01 01:25:49 ruv replies:
| proposal - Clarify FIND, more classic approach
| see: https://forth-standard.org/proposals/clarify-find-more-classic-approach#reply-1676
`------------------------------------------
## Author

Ruv

## Change Log

(the latest at the bottom)

2019-10-08: Initial version

2020-08-28: Avoid ambiguous clause "xt is the execution token for name" in the case of a word with non default interpretation semantics.

2021-04-18: Allow returning a different xt for any definition. Tighter meaning of _n_ in interpretation state. Avoid "implementation-dependent definition" and make the wording simpler.

2021-05-06: Correct meaning of _n_ in interpretation state: [iff](https://en.wikipedia.org/wiki/If_and_only_if) _n_ is `-1`, then _xt_ identifies the execution semantics for _name_. Eliminate the "default interpretation semantics" notion from the normative part.

2026-05-31: Major update. Describe problems. Simplify normative text using the updated _execution semantics_ term description. Update the glossary entry for `search-wordlist`.

## Problem

The proposal [[251] Clarification for execution token](https://forth-standard.org/proposals/clarification-for-execution-token?hideDiff#reply-1572) already addresses the problems related to the lack of ambiguous conditions in `find` and `search-wordlist`.

The remaining problems concern cases where `find` returns different xt values depending on STATE.

1. The rationale [A.6.1.1550](https://forth-standard.org/standard/rationale#rat:core:FIND) for [6.1.1550 `FIND`](https://forth-standard.org/standard/core/FIND) explains that a word **may exist in two versions: a compiling version and an interpreting version**. This means that each version of such a word has its own execution token that identifies its own semantics, and the phrase "its execution token" may refer to one of these versions depending on STATE. However, the normative parts of the standard imply that a standard word has at most one execution token, which identifies *the execution semantics* of the word. So, the wording in the `find` specification leads to confusion.

2. The phrase "if the definition is immediate" is misleading because, according to the rationale, it may refer to different versions of the word depending on STATE, but the normative parts of the standard do not reflect this concept.

Although `find-name` has been standardized, it is still worth clarifying the semantics of `find`, as `find` is provided more broadly than `find-name` and is usually implemented in new Forth systems (where the standard is used as the reference).

For example, Forth implementations hosted on GitHub provide `find` more than twice as often as `find-name`.
- `find` in [136 files](https://github.com/search?q=language%3Aforth+NOT+is%3Afork+%2F%3A+find+%2F&type=code) (at the date)
- `find-name` in [66 files](https://github.com/search?q=language%3Aforth+NOT+is%3Afork+%2F%3A+find-name+%2F&type=code) (at the date)

It should be noted that incompatibilities caused by the mentioned problems can only occur on Forth systems where `find` depends on STATE, while on most Forth systems `find` does not depend on STATE.

In turn, a problem with `search-wordlist` can arise in cases where `find` for the same word depends on STATE.
In some such cases the top output parameter of `search-wordlist` is `1`, but the word is not immediate (i.e., performing its execution semantics in compilation state does not perform the compilation semantics for the word).

Note: if a word is immediate, performing its execution semantics in compilation state performs the compilation semantics for the word.

## Solution

Update the glossary entry for `find`. Update and harmonize with `find` the glossary entry for `search-wordlist` (see also [my comment](https://forth-standard.org/standard/search/SEARCH-WORDLIST#reply-1246)).

1. Avoid referring to the execution token of the compiling version of a word (if any) as _the execution token of **the word**_ ("its execution token").
2. Avoid using the term "immediate". Instead, specify how to perform the compilation semantics for the word.

## Proposal

### Update `find` in the Search-Order word set

In the glossary entry [16.6.1.1550 `FIND`](https://forth-standard.org/standard/search/FIND) (in the optional [Search-Order word set](https://forth-standard.org/standard/search)), remove the [semantic description](https://forth-standard.org/standard/notation#subsubsection.2.2.4.2), except the "See also" sub-section.

#### Rationale

This glossary entry duplicates the glossary entry for [core `find`](https://forth-standard.org/standard/core/FIND), with only one difference: it mentions the search order. However, this is no longer necessary, as the term "find" is now updated by the Search-Order word set, per the proposal [[115] Remove the “rules of FIND”](https://forth-standard.org/proposals/remove-the-rules-of-find-?hideDiff#reply-900) accepted in 2020.
The glossary entry itself should be kept to contain the reference implementation [E.16.6.1.1550 `FIND`](https://forth-standard.org/standard/implement#imp:search:FIND) from [Annex E: Reference Implementations](https://forth-standard.org/standard/implement) and the test [F.16.6.1.1550 `FIND`](https://forth-standard.org/standard/testsuite#test:search:FIND) from [Annex F: Test Suite](https://forth-standard.org/standard/testsuite).

See also: the [comment [r1372]](https://forth-standard.org/standard/search/FIND#reply-1372) by Anton Ertl on 2024-11-25.

Note: If we remove the glossary entry 16.6.1.1550, we should also remove the corresponding reference implementation and test, but they are useful since `find` is actually indirectly updated by the Search-Order word set through updating the definition of the term "find".

### Update `find` in the Core word set

In the glossary entry [6.1.1550 `FIND`](https://forth-standard.org/standard/core/FIND), replace the [semantic description](https://forth-standard.org/standard/notation#subsubsection.2.2.4.2) with the following:

---

_( c-addr -- c-addr 0 | xt n )_

Find a named Forth definition whose name matches the counted string at _c-addr_. If the definition is not found, return _c-addr_ and zero. Otherwise, return the execution token _xt_ and _n_, which is either `-1` or `1`. For a given string, the values returned while compiling may differ from those returned while interpreting.

If a definition is found, the following conditions shall be met:
- If interpreting, _xt_ is *the execution token* of the found definition, otherwise the relation between _xt_ and the found definition is implementation dependent.
- If _n_ is `-1`, appending the execution semantics identified by _xt_ to the current definition performs the compilation semantics for the found definition.
- If compiling and _n_ is `1`, then:
  - Executing _xt_ in compilation state performs the compilation semantics for the found definition.
  - An ambiguous condition exists if _xt_ is executed in interpretation state and at least one of the following conditions is true:
    - a) interpretation semantics for the found definition are undefined by this standard;
    - b) _xt_ is not the execution token for the found definition.

Note. A definition may be found while compiling but not found while interpreting.

See also: [3.4.2 Finding definition names](https://forth-standard.org/standard/usage#usage:find), [3.1.3.5 Execution tokens](https://forth-standard.org/standard/usage#subsubsection.3.1.3.5), [3.4.3.1 Execution semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.1), [3.4.3.2 Interpretation semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.2), [3.4.3.3 Compilation semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.3), [A.6.1.1550 `FIND`](https://forth-standard.org/standard/rationale#rat:core:FIND), [A.3.4.3.2 Interpretation semantics](https://forth-standard.org/standard/rationale#paragraph.A.3.4.3.2).

-----

#### Rationale

There is no need to repeat the ambiguous conditions declared in [3.4.3.1 Execution semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.1) (the [updated](https://forth-standard.org/proposals/clarification-for-execution-token?hideDiff#reply-1572) version).

### Update rationale for `find` in Core word set

In the section [A.6.1.1550 `FIND`](https://forth-standard.org/standard/rationale#rat:core:FIND), add the following paragraphs at the end:

---

According to the rules for the values returned by `find`, the following conditions are met.
- If _n_ is always `-1` for a word (regardless of STATE), then _xt_ always identifies the same semantics.
- For an ordinary word in a **single-xt system**, _n_ is always `-1` and _xt_ is always the same (regardless of STATE).
- For an ordinary word in a **dual-xt system**, _n_ is `-1` while interpreting, but may be `1` while compiling (in which case _xt_ changes).
- For an immediate word, _n_ is always `1`, _xt_ may change in a **dual-xt system** (but typically it is always the same).
- For a word with defined interpretation semantics and special compilation semantics (like `to` and `s"`) in a **dual-xt system**, _n_ is always `1` and _xt_ may change depending on STATE.

### Update `search-wordlist`

In the glossary entry [16.6.1.2192 `SEARCH-WORDLIST`](https://forth-standard.org/standard/search/SEARCH-WORDLIST), replace the [semantic description](https://forth-standard.org/standard/notation#subsubsection.2.2.4.2) with the following:

---

_( c-addr u wid -- 0 | xt 1 | xt -1 )_

Find a named Forth definition whose name matches the character string identified by _( c-addr u )_ in the word list identified by _wid_. If no such definition is found, return zero; otherwise return one of the other two options, where:
- _xt_ is *the execution token* for the found definition;
- the top output parameter is minus-one (`-1`) if appending the execution semantics identified by _xt_ to the current definition performs the compilation semantics for the found definition; otherwise the top output parameter is one (`1`).

See also: [3.4.2 Finding definition names](https://forth-standard.org/standard/usage#usage:find), [3.1.3.5 Execution tokens](https://forth-standard.org/standard/usage#subsubsection.3.1.3.5), [3.4.3.1 Execution semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.1), [A.6.1.2192 `SEARCH-WORDLIST`](https://forth-standard.org/standard/rationale#rat:search:SEARCH-WORDLIST).

-----

#### Rationale

There is no need to repeat the ambiguous conditions declared in [3.4.3.1 Execution semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.1) (the [updated](https://forth-standard.org/proposals/clarification-for-execution-token?hideDiff#reply-1572) version).
### Update rationale for `search-wordlist`

In the section [A.6.1.2192 `SEARCH-WORDLIST`](https://forth-standard.org/standard/rationale#rat:search:SEARCH-WORDLIST), add the following paragraphs at the end:

-----

If the found definition is an immediate word, then the top output parameter is `1`. However, if the top output parameter is `1`, the found definition is not necessarily an immediate word, since it may be a word (not a user-defined word in a standard program) whose compilation semantics are implemented using another definition.

If and only if the top output parameter is `-1`, the found definition is an ordinary word (a word with default interpretation semantics and default compilation semantics).

See also: [A.6.1.1550 `FIND`](https://forth-standard.org/standard/rationale#rat:core:FIND), [3.4.3.2 Interpretation semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.2), [3.4.3.3 Compilation semantics](https://forth-standard.org/standard/usage#subsubsection.3.4.3.3).

-----

## Consequences

All classic Forth systems comply with this change.

Some dual-xt Forth systems provide an implementation for `find` that does not comply with this change. They should be updated to fix `find` or remove it.

## Testing

See [find.test.fth](https://gist.github.com/ruv/3c75b48f405ecd8842d8024f1dcd0692#file-find-test-fth).

,------------------------------------------
| 2026-06-03 13:06:38 KrishnaMyneni replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1677
`------------------------------------------
I agree with Anton that UW>S is not needed since the W in W>S just indicates a word type of either signed or unsigned. For an unsigned word on the stack, W>S will sign extend it, and for a sign-extended word on the stack, W>S will have no effect.
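The behaviour described for W>S can be sketched in standard Forth as follows. This is only an illustration, not the proposal's reference implementation: the name `w>s-sketch` is hypothetical, and the code assumes two's-complement arithmetic.

```forth
\ Hypothetical name; masks to 16 bits, then sign-extends into the cell.
: w>s-sketch ( x -- n )
   $FFFF and      \ keep the low 16 bits
   $8000 xor      \ flip the 16-bit sign bit
   $8000 - ;      \ subtract it back: values >= $8000 become negative

$7FFF w>s-sketch .   \ prints 32767
$FFFF w>s-sketch .   \ prints -1  (zero-extended input is sign-extended)
-1    w>s-sketch .   \ prints -1  (sign-extended input is unchanged)
```

Because the word first masks its input, applying it to an already sign-extended value gives the same value back, which matches the "no effect" case above.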
,------------------------------------------
| 2026-06-11 15:40:42 ruv replies:
| proposal - Non parsing CREATE
| see: https://forth-standard.org/proposals/non-parsing-create#reply-1678
`------------------------------------------
> But if we want to standardize the pure postfix variant of `create`, why don't we standardize the postfix variants of other defining words?

Complementing `:`, there is `:noname` for creating a nameless colon-definition. However, there are no analogous methods for creating nameless definitions of the same kind as those created by the words `create` and `defer`.

I would suggest the following words (the names are tentative):

- `create-xt ( -- xt1 )`
  - _xt1_ Execution: `( -- a-addr.data-field )`
    _a-addr.data-field_ is the address of *the data field* associated with _xt1_. The execution semantics of _xt1_ may be extended by using `does! ( xt2 xt1 -- )`.
- `defer-xt ( -- xt1 )`
  - _xt1_ Execution: `( any1 -- any2 )`
    Execute the _xt_ that _xt1_ is set to execute. If _xt1_ has not been set to execute an _xt_, an exception -82 is thrown.
- `enlist ( xt1 sd.name -- )`
  - Place a named definition into the compilation word list; the definition's name matches the character string _sd.name_, and the definition's execution semantics are equivalent to the execution semantics identified by _xt1_.
  - Rationale:
    - It does not guarantee that *the execution token* of a new definition is the same as _xt1_, since in some implementations xt is a subtype of nt (or even these types are equivalent), that is, the word `name>` is a nop.
    - It might allow placing into the compilation word list a named definition whose name is an empty string or a string containing whitespace or control characters. Then, such a definition can be found using `find-name-in` or `search-wordlist`, but the Forth text interpreter will not find it.