,---------------.
| Contributions |
`---------------´


,------------------------------------------
| 2025-09-12 03:56:07  AntonErtl  wrote:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#contribution-412
`------------------------------------------
The committee has found consensus on the following words.  I was asked
to write it up as a proposal, quickly.  Due to time limits this is
just a skeleton, and will not make sense to people new to the
discussion.  A more fleshed-out proposal will be submitted later.

## Translations and text-interpretation

Recognizers produce translations.  The text interpreter (and other
users, such as `postpone`) removes the translation from the stack(s),
and then performs the interpreting run-time, the compiling run-time,
or the postponing run-time.

Unless otherwise specified the compiling run-time compiles the
interpreting run-time.  The postponing run-time compiles the compiling
run-time.


## Types

**translation**: The result of a recognizer; the input of interpreting,
compiling, and postponing; it's a semi-opaque type that consists of a
translation token at the top of the data stack and additional
data on various stacks below.

**translation token**: Single-cell item that identifies a certain
  translation.  (This has formerly been called a rectype.)

## Words

### `rec-name` ( c-addr u -- translation )

If c-addr u is the name of a visible local or a visible named word,
translation represents the text-interpretation semantics
(interpreting, compiling, postponing) of that word (see
`translate-name`).  If not, translation is `translate-none`.
(Formerly called `rec-nt`.)

### `rec-number` ( c-addr u -- translation )

If c-addr u is a single or double number (without or with prefix), or
a character, all as described in section ..., translation represents
pushing that number at run-time (see `translate-cell`,
`translate-dcell`).  If not, translation is `translate-none`.

### `rec-float` ( c-addr u -- translation )

If c-addr u is a floating-point number, as described in section ...,
translation represents pushing that number at run-time (see
`translate-float`).  If c-addr u has the syntax of a double number
without prefix according to section ..., and a floating-point number r
corresponds to that string according to section ..., translation may
represent pushing r at run-time.  If c-addr u is not recognized as a
floating-point number, translation is `translate-none`.

### `rec-none` ( c-addr u -- translation )

This word does not recognize anything.  For its translation, see
`translate-none`.  (formerly known as notfound and r:fail)

### `recs` ( -- )

Print the recognizers in the recognizer sequence in `rec-forth`, the
first searched recognizer leftmost. (formerly known as .recognizers)

### `rec-forth` ( c-addr u -- translation )

This is a deferred word that contains the recognizer (sequence) that
is used by the Forth text interpreter.  (formerly forth-recognize)

### `rec-sequence:` ( xt_u ... xt_1 u "name" -- )

Define a recognizer sequence "name" containing u recognizers,
represented by their xts; xt_1 is searched first.  If `set-recs` is
implemented, the sequence must be able to accommodate at least 16
recognizers.

name execution: ( c-addr u -- translation )

Execute xt_1; if the resulting translation is the result of
`translate-none`, restore the data stack to ( c-addr u ) and try
the next xt.  If there is no next xt, remove c-addr u and
perform `translate-none`.
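To make the search loop concrete, here is a minimal sketch of what executing a sequence defined by `rec-sequence:` might look like.  It assumes a fixed table `my-recs` (a hypothetical name) holding the count followed by xt_1 ... xt_u, and it assumes that the translation produced by `translate-none` is the bare token with no further data.  The `[undefined]` stand-in for `translate-none` is hypothetical and models only the token, not its `-13 throw` run-times.

```forth
\ Stand-in for systems that do not provide TRANSLATE-NONE yet (hypothetical;
\ it models only the token, not the -13 THROW run-times).
[undefined] translate-none [if]
  create none-token
  : translate-none ( -- translation ) none-token ;
[then]

16 constant max-recs                      \ hypothetical capacity
create my-recs  max-recs 1+ cells allot   \ hypothetical table: u, xt_1 ... xt_u
0 my-recs !

: run-recs ( c-addr u -- translation )
  my-recs @ 0 ?do
    my-recs i 1+ cells + @  rot rot       ( xt c-addr u )
    2dup 2>r  rot execute                 ( translation ) ( R: c-addr u )
    dup translate-none = if
      drop 2r>                            \ not recognized: restore c-addr u
    else
      2r> 2drop unloop exit               \ recognized: return the translation
    then
  loop
  2drop translate-none ;                  \ no recognizer matched
```

Keeping the string on the return stack, instead of calling each recognizer on a `2dup`ed copy, is what makes the restore possible: a successful translation may leave an unknown number of items under its token, so the original string could not be retrieved from beneath it.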

### `translate-none` ( -- translation )

(formerly r:fail or notfound)

translation interpreting run-time: ( ... -- )

`-13 throw`

translation compiling run-time: ( ... --  )

`-13 throw`

translation postponing run-time: ( ... --  )

`-13 throw`

### `translate-cell` ( x -- translation )

(formerly translate-num)

translation interpreting run-time: ( -- x )

### `translate-dcell` ( xd -- translation )

(formerly translate-dnum)

translation interpreting run-time: ( -- xd )

### `translate-float` ( r -- translation )

translation interpreting run-time: ( -- r )

### `translate-name` ( nt -- translation )

(formerly translate-nt)

translation interpreting run-time: ( ... -- ... )

Perform the interpretation semantics of nt.

translation compiling run-time: ( ... -- ... )

Perform the compilation semantics of nt.

### `translate:` ( xt-int xt-comp xt-post "name" -- )

Define "name" (formerly `rectype:`).

"name" execution: ( i\*x -- translation )

translation interpreting run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-int.

translation compiling run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-comp.

translation postponing run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-post.

### `get-recs` ( xt -- xt_u ... xt_1 u )

xt is the execution token of a recognizer sequence.  xt_1 is the first
recognizer searched by this sequence, xt_u is the last one.

### `set-recs` ( xt_u ... xt_1 u xt -- )

xt is the execution token of a recognizer sequence.  Replace the
contents of this sequence with xt_u ... xt_1, where xt_1 is searched
first, and xt_u is searched last.  Throw ... if u exceeds the number
of elements supported by the recognizer sequence.

## Rationale

(This will also be fleshed out)

The committee has decided not to standardize words
that consume translations for now.  Such words would be useful for
defining a user-defined text interpreter, but the experience with
recognizers has shown that a recognizer-using text interpreter is
flexible enough that it is no longer necessary to write such text
interpreters, and such words are only used internally in the text
interpreter.

However, to give an idea of how all this works together, here are the
words that Gforth provides for that purpose:

### `interpreting` ( ... translation -- ... )

For a system-defined translation token, first remove the translation
from the stack(s), then perform the interpreting run-time specified
for the translation token.  For a user-defined translation token,
remove it from the stack and execute its int-xt.

### `compiling` ( ... translation -- ...  )

For a system-defined translation token, first remove the translation
from the stacks, then perform the compiling run-time specified for the
translation token, or, if none is specified, compile the
'interpreting' run-time.  For a user-defined translation token, remove
it from the stack and execute its xt-comp.

### `postponing` ( ... translation --  )

For a system-defined translation token, first consume the translation,
then compile the 'compiling' run-time.  For a user-defined translation
token, remove it from the stack and execute its xt-post.
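To show how a user-defined translation token relates to these consuming words, here is a minimal sketch in which the token created by a `translate:`-like word is simply the address of a three-cell table holding xt-int, xt-comp and xt-post.  All `my-` names are hypothetical, chosen so as not to clash with the system's own `translate:`, `interpreting`, `compiling` and `postponing`; the sketch covers only user-defined tokens, not system-defined ones.

```forth
\ Hypothetical names (MY-...) to avoid clashing with the system's own words.
: my-translate: ( xt-int xt-comp xt-post "name" -- )
  create  rot , swap , ,          \ body: xt-int, xt-comp, xt-post
  does> ( -- translation ) ;      \ the body address is the translation token

: my-interpreting ( ... translation -- ... )  @ execute ;          \ xt-int
: my-compiling    ( ... translation -- ... )  cell+ @ execute ;    \ xt-comp
: my-postponing   ( ... translation -- ... )  2 cells + @ execute ; \ xt-post
```

For example, after `' negate ' 1+ ' 2* my-translate: t-tr`, the phrase `5 t-tr my-interpreting` leaves -5; in real use xt-comp and xt-post would compile code rather than compute a value.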

## Examples

````
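\ Recognize "123" and perform its interpreting run-time (pushes 123)
s" 123" rec-forth ( translation ) interpreting

: rec-tick ( c-addr u -- translation )
    \ recognize `name and translate it into the xt of name
    dup 0= if rec-none exit then             \ empty string: not ours
    over c@ '`' <> if rec-none exit then     \ no leading backtick
    1 /string find-name ( nt|0 )
    dup 0= if drop translate-none exit then  \ unknown name
    name>interpret ( xt ) translate-cell ;

\ A possible definition of translate-cell using translate:
\ (lit, is Gforth's word for compiling a literal)
' noop                       ( x -- x )                             \ xt-int
' lit,                       ( compilation: x -- ; run-time: -- x ) \ xt-comp
:noname lit, postpone lit, ; ( postponing: x -- ;  run-time: -- x ) \ xt-post
translate: translate-cell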
````


,---------.
| Replies |
`---------´


,------------------------------------------
| 2025-09-08 10:48:10  EricBlake  replies:
| referenceImplementation - CMOVE implementation based on MOVE
| see: https://forth-standard.org/standard/string/CMOVE#reply-1529
`------------------------------------------
Each of the solutions proposed so far has relied on non-standard (but presumably obvious) words: [211] and [r722] use `u>=`, [r725] uses `not`, [r726] uses `-rot`, [r727] uses `?exit` and `umin`. I'm not sure if that means more words should be standardized, or if we just need to be more careful in writing examples.
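For comparison, most of these helpers can be written in terms of standard core words (a sketch, not a proposal for standard definitions).  `?exit` is left out because it cannot be written portably as an ordinary colon definition (an `EXIT` inside it would only leave `?exit` itself), so portable code spells it `IF EXIT THEN`; `not` is left out because systems disagree on whether it means `0=` or `INVERT`.

```forth
\ Portable definitions of common non-standard helpers (sketch).
: u>=  ( u1 u2 -- flag )        u< 0= ;                   \ not (u1 < u2)
: -rot ( x1 x2 x3 -- x3 x1 x2 ) rot rot ;
: umin ( u1 u2 -- u3 )          2dup u> if swap then drop ; \ keep the smaller
```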


,------------------------------------------
| 2025-09-08 12:13:22  EricBlake  replies:
| referenceImplementation - Possible Reference Implementation
| see: https://forth-standard.org/standard/core/MIN#reply-1530
`------------------------------------------
It's also possible to provide a branchless reference implementation, such as:

```forth
: MIN ( n1 n2 -- n3 ) OVER 2DUP < >R - R> AND + ;
: MAX ( n1 n2 -- n3 ) OVER 2DUP > >R - R> AND + ;
```
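The trick is easiest to follow with the stack written out after each step.  Below, the `MIN` above is repeated under the hypothetical name `bmin` (to avoid redefining `MIN`), with stack comments.  It assumes two's-complement arithmetic that wraps on overflow: n2-n1 may overflow, but n1+(n2-n1) still yields n2 modulo 2^(bits per cell).  `MAX` is identical except that `<` becomes `>`.

```forth
: bmin ( n1 n2 -- n3 )
  over      ( n1 n2 n1 )
  2dup <    ( n1 n2 n1 f )      \ f is true iff n2 < n1
  >r -      ( n1 n2-n1 ) ( R: f )
  r> and    ( n1 d )            \ d = n2-n1 if f, else 0
  + ;       ( n3 )              \ n1+d: n2 if n2 < n1, else n1
```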


,------------------------------------------
| 2025-09-10 08:48:02  AntonErtl  replies:
| proposal - Special memory access words
| see: https://forth-standard.org/proposals/special-memory-access-words#reply-1531
`------------------------------------------
## Author:

M. Anton Ertl

## Change Log:

2025-09-10 additional wording for systems with char-unit-bits>8;
           discussed signed number representation;
           discussed (but did not use) finer typing;
           continue using `c` prefix for byte access
2024-06-14 initial version

## Problem:

Data coming from or going to a file or another computer often contain
16-bit, 32-bit, and 64-bit integer values that may be signed or
unsigned, may be naturally aligned or not, and may be in big-endian or
little-endian instead of the native byte order.  Architectures tend to
provide convenient instructions for accessing these data, but the
Forth standard does not provide words for that, and synthesizing the
operations from C@ and C! is not just cumbersome, but also leads to
inefficient code.

## Solution:

This proposal targets primarily byte-addressed systems. See the
discussion below about larger address units.

We use the following prefixes:

| prefix | Meaning | informal name |
| ------ | ------- | ------------- |
| `c`    | 8 bits  | Byte          |
| `w`    | 16 bits | Wyde          |
| `l`    | 32 bits | Long          |
| `x`    | 64 bits | eXtended      |

The `l`-prefixed words are not useful on systems with cell size <32
bits, and such systems are therefore expected not to implement them.
Likewise for the `x`-prefixed words and cell size <64 bits.

For the `w` prefix this proposal specifies the following set of words:

`w@` `w!` for unaligned 16-bit memory accesses; `w@` zero-extends.

Right after `w@` or right before `w!` you can adjust the byte order:
`wbe` converts from big-endian to native byte order and from native to
big-endian byte order; `wle` is the corresponding word for
little-endian byte order.

When fetching, signed values can then be sign-extended with `w>s`.
Unsigned values are already in the proper zero-extended form.  When
storing, all the target bits are present in the cell, so no extension
is necessary.

These five words allow us to fetch and store big-endian, little-endian
or natively ordered signed and unsigned 16-bit values, with sequences
like:

````
w@           \ 16-bit unaligned unsigned native-order fetch
w@ wbe w>s   \ 16-bit unaligned   signed   big-endian fetch
>r wle r> w! \ 16-bit unaligned         little-endian store
w!           \ 16-bit unaligned          native-order store
````
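As a rough illustration of these semantics (not the reference implementation, which the proposal defers), here is a minimal sketch of the five `w` words built only from `C@` and `C!`.  It assumes 8-bit address units and cells of at least 16 bits; `endian-probe`, `little-endian?` and `wswap` are hypothetical helpers.  The `l` and `x` versions would follow the same pattern with 4 and 8 bytes.

```forth
\ Hypothetical helper: detect the native byte order once, at load time.
variable endian-probe  1 endian-probe !
endian-probe c@ 1 = constant little-endian?

: wswap ( u1 -- u2 )  \ swap the two low bytes; clears all higher bits
  dup 8 rshift $ff and  swap $ff and 8 lshift  or ;

: w@ ( c-addr -- u )  \ unaligned 16-bit native-order fetch, zero-extended
  dup c@ swap 1+ c@                          ( b0 b1 )
  little-endian? if 8 lshift or else swap 8 lshift or then ;

: w! ( x c-addr -- )  \ store the low 16 bits of x in native order
  >r  dup 8 rshift $ff and  swap $ff and     ( hi lo ) ( R: c-addr )
  little-endian? if  r@ c!  r> 1+ c!  else  r@ 1+ c!  r> c!  then ;

: wbe ( u1 -- u2 )  little-endian? if wswap then ;     \ no-op on big-endian
: wle ( u1 -- u2 )  little-endian? 0= if wswap then ;  \ no-op on little-endian

: w>s ( x -- n )  \ sign-extend the low 16 bits (two's complement)
  $ffff and  $8000 xor  $8000 - ;
```

Here `wbe` and `wle` clear the bits above the low 16 when they swap and leave them untouched otherwise, which stays within the "either untouched or 0" latitude the proposal gives these words.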

For the `c` prefix byte order and alignment are not issues, so there
are no words for that.

These words do not work properly if the data does not fit into a cell,
so a 16-bit system would only implement the `c` and `w` words, a
32-bit system only the `c`, `w` and `l` words, and only systems with
cell size >= 64 bits would implement all the words.

## Typical use: (Optional)

````
( c-addr ) l@ lle l>s \ 32-bit unaligned  signed little-endian fetch
( c-addr ) w@         \ 16-bit unaligned unsigned native-order fetch
( n|u c-addr ) >r xbe r> x! \ 64-bit unaligned big-endian   store
( n|u c-addr ) l!           \ 32-bit unaligned native-order store
````

## Discussion

### Previous work

The present proposal can be seen as another take on the problems
attacked with the following proposals.

#### [Memory Access](http://www.forth200x.org/memory-2010-06-26.txt)

Federico de Ceballos (with Stephen Pelc) has proposed a wordset for
solving the same problem by having words like

````
be-w@ \ 16-bit unaligned unsigned big-endian fetch
le-w! \ 16-bit unaligned       little-endian store
````

That would require 6 words `be-w@` `le-w@` `w@` `be-w!` `le-w!` `w!`,
but would still not work for fetching signed values, so you either
need `w>s` to be possibly used after any of the `...w@` words (for a
total of 7 `w` words, but it's still a composing approach), or it
would need a doubling of the `...w@` words (for a total of 9 `w`
words, but now everything is precomposed).

This proposal also includes words like `w,` `walign` `waligned`
`wfield:` discussed below.

This proposal has been met with significant resistance due to the
large number of words proposed.

#### [16-bit memory access](https://forth-standard.org/proposals/16-bit-memory-access#contribution-301)

This proposes `w@` `w!` as working with w-addr addresses that are not
defined in the proposal (I would expect them to require 16-bit
alignment, but neither `waligned` nor `walign` is proposed).  No
solution for byte order or sign extension is presented.  The proposal
includes `w,`, which requires 16-bit alignment of the data-space
pointer.

#### [32-bit memory operators](https://forth-standard.org/proposals/32-bit-memory-operators#contribution-302)

This is the 32-bit variant (using `l` as prefix) of the "16-bit memory
access" proposal discussed above.

### Efficiency

Does the proposed approach not lead to less efficient code than the
approach of the Memory Access proposal mentioned above?  The more
advanced Forth systems combine code sequences and produce efficient
code for them.  E.g., in the present case, for `l@ lle l>s`
gforth-fast on AMD64 produces:

````
movsx   r8,dword PTR [r8]
````

Simpler systems will indeed be less efficient when such special memory
accesses are performed, but the present proposal proposes fewer words,
which is often more in line with the philosophy behind many simple
systems.

Also, is a lot of time spent accessing input and output data?

### Larger address units

On some systems (in particular on word-addressed hardware) the address
unit is larger than one byte (8 bits).  How can these words work
there?  The only way I can see is to work with a special memory layout
where only the least significant 8 bits of each address unit are used,
and the other bits are ignored on fetching, and are set to 0 on
storing.  The reference implementation of the proposed words can be
used in such a setting.

This memory layout would be used between I/O words that produce or
consume this layout, and the words that use the special memory access
words to fetch from or store to this layout.  For the file words, this
layout could come into play through a file access mode modifier
(similar to `bin`).

To support this scheme, we would specify "bytewise" memory access for
words like `w@` and `w!`, a type b-addr for words that perform such
memory accesses, and some wording in "3.1.3.3 Addresses" about that.
We would also need a word `b@` (for zero-extending the least
significant 8 bits, while `c@` zero-extends possibly bigger units;
`b!` is not needed, `c!` is good enough), and a word `bytewise`, which
is the file access mode modifier mentioned above.

However, during the discussion no implementor of a system with larger
address units announced interest in implementing such a wordset.  So,
in order to avoid standardizing unnecessary words, `b@`, `b!` and
`bytewise` are not proposed.  Instead, the issue is discussed in the
Rationale.

### `c` vs. `b` prefix

Some people have expressed a preference for the `b` prefix even if that
means introducing synonyms of `c@` and `c!`.  However, in
the interest of avoiding superfluous words, such words are not
proposed.  Those interested in such words can define them themselves:

````
synonym b@ c@
synonym b! c!
synonym b>s c>s
````

### Require alignment or not

One might wonder whether we should not have versions of the fetch and
store words that require alignment as well as versions that do not,
but we have decided to only supply the words that do not require
alignment, for the following reasons.  All the surviving
general-purpose architectures (IA-32/AMD64, ARMv7-A/R ff. (since 2005),
RISC-V, Power, S/390x) have converged on supporting unaligned
accesses, so on these architectures both variants would use the same
instructions.

On other architectures `w@` will be slower than a hypothetical `w@a`,
but given that these words are not used that often, that these
machines are no longer widespread, and that alignment is sometimes
lost by embedding one structure inside another (as has occurred in
network protocols), we decided that `w@a` and friends are more trouble
than they are worth.

### Upper bit handling for the byte-order words

How do we specify the upper bits in the results for `wle`, `wbe` etc.?
E.g., on 64-bit Gforth I see:

````
$1234567890abcdef wbe hex. \ output: $EFCD  ok
$1234567890abcdef wle hex. \ output: $1234567890ABCDEF  ok
````

So in one case it sets the other bits to zero, in the other case it
leaves them alone.  However, we do not want to specify that the upper
bits can be anything, otherwise `w@ wle` would not work as unsigned
little-endian 16-bit fetch, and we would need to add a word `w>u` or
somesuch.

So we specify that the upper bits of the result are either untouched
or 0 (when applying `wbe`/`wle` to the result of `w@`, that produces
the same result in either case).

### Types

There has been some discussion whether 'c-addr' or 'addr' is more
appropriate (they are formally equivalent).  Using 'c-addr' makes the
intent of supporting unaligned accesses more obvious.

Another discussion is about 'u' vs. 'x'.  The 'u' (where used)
expresses that the values are zero-extended, even if they happen to be
a zero-extended representation of a possibly signed, possibly
byte-swapped integer.  If more precise typing is required,
[r1259](https://forth-standard.org/proposals/special-memory-access-words#reply-1259)
outlines a precise way to specify the typing.  However, it is doubtful
that such a specification would result in enough additional clarity
(if any) to justify the longer specification.

### Accesses to values larger than one cell

Gforth has `xd` words where the on-stack representation is a
double-cell.  This allows implementing 64-bit accesses on systems with
32-bit cells.  When I presented these words at the 2023 Forth200x
meeting, I was asked not to include them in this proposal.  So access
to values larger than a cell is not supported by the proposed words.

### Signed number representations

Given that the other Forth words (e.g. `+`) work for 2s-complement
numbers, any other signed-number representation in the input and
output requires additional conversion work.  No words for such
conversions are proposed, because there is no existing practice.

The description of `c>s` `w>s` `l>s` `x>s` as sign-extending makes it
clear what operation they perform.  Specifying a number format would be
redundant and would actually reduce generality (these words could also
be used for 1s-complement, but other words would be needed to then
convert the results into 2s-complement or perform computations in
1s-complement).
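As an illustration of the operation itself, sign extension from a u-bit field can be written as a mask, an xor and a subtraction on cells.  A minimal sketch, assuming two's-complement cells; `sext` is a hypothetical helper, and u must lie between 1 and the number of bits in a cell.

```forth
: sext ( x u -- n )  \ sign-extend the low u bits of x
  1- 1 swap lshift     ( x m )   \ m = 2^(u-1): the sign bit of the field
  tuck 2* 1- and       ( m x' )  \ x' = low u bits of x
  over xor swap - ;    ( n )     \ (x' xor m) - m

: c>s ( x -- n )  8 sext ;
: l>s ( x -- n ) 32 sext ;
```

In the same way `w>s` would be `16 sext`, and on systems with 64-bit cells `x>s` would be `64 sext`, which leaves the value unchanged.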

### Additional words

Gforth has the following words related to this proposal:

* `/w ( -- u )` specifies the size of a 16-bit value, i.e. `2`.

* `w, ( x -- )` allocates and stores a 16-bit value.  `wbe` or `wle`
  can be used before.  SwiftForth and VFX Forth also have `w,`.

* `walign ( -- )` naturally aligns the dictionary pointer to a 16-bit
  boundary.

* `waligned ( u1 -- u2 )` does the same for an address or offset on
  the stack.

* `*aligned ( u1 u2 -- u )`: *u* is the smallest value >= *u1* that
  is a multiple of *u2*.  The result is not specified if *u2* is not a
  power of two.

* `wfield: ( u1 "name" -- u2 )` defines a naturally (i.e., 16-bit)
  aligned 16-bit field.  `wfield:` is equivalent to `waligned /w
  +field`.

* `wvalue: ( u1 "name" -- u2 )` defines a naturally-aligned
  value-flavoured 16-bit field.  No easy way exists to define a
  value-flavoured field without imposing alignment.
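For readers unfamiliar with these Gforth words, here is a minimal sketch of `*aligned`, `/w`, `waligned` and `wfield:` in standard Forth; it is an illustration, not Gforth's source.  `*aligned` relies on u2 being a power of two, since it rounds up by adding u2-1 and then clearing the low bits with a mask.

```forth
: *aligned ( u1 u2 -- u )  \ round u1 up to a multiple of u2 (a power of two)
  1- tuck + swap invert and ;
2 constant /w                                         \ bytes accessed by W@
: waligned ( u1 -- u2 )  /w *aligned ;
: wfield:  ( u1 "name" -- u2 )  waligned /w +field ;  \ 16-bit aligned field
```

For example, `1 wfield: f` skips one byte of padding, gives `f` the offset 2, and leaves 4 as the new structure size.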

These words (and their `l` and `x` siblings) were not in my
presentation at the 2023 meeting, so I have not been asked to include
them in this proposal, and therefore I have not included them, but if
consensus emerges that we want to include some of them, I am prepared
to do that.  But do we need them and do we need them in this form?

* `/w` just means `2`, but documents the intent (number of bytes
  accessed by `w@`) better.

* `w,` is convenient in interactive usage, but for maintained code its
  usage often is problematic: In many cases it redundantly respecifies
  the layout of a data structure (already defined with the `field`
  words), which means that a change to the layout results in several
  changes in the code.

  There was some discussion of `w,` at the 2024 meeting, and the
  result of the discussion was not to include such words in this
  proposal.  They may be proposed separately.

* `walign` may be useful in connection with `w,`, but has the same
  problem of redundancy.

* `waligned` may be useful for influencing field layout, but one could
  also write `/w *aligned` (replacing three `aligned` words with one).
  Also, if the structure layout is coming from outside the Forth
  system, we probably just want to transfer it using the C interface
  rather than defining it the way we would a Forth-internal data
  structure.

* The automatic alignment of `wfield:` and `wvalue:` is in line with
  the automatic alignment of `field:` etc., but is at odds with
  the idea that these words are for data structures defined outside of
  Forth where fields may be unaligned.  Variable-flavoured fields for
  such data structures can be defined with `+field`, e.g., `15 0
  +field <name> drop`.  For value-flavoured fields an unaligned
  version of `wvalue:` would be useful, with the possible usage `15
  wvalue:u <name> drop`.

* Value-flavoured fields also inspire the idea that the byte order and
  signedness should be part of the field definition.

Do we want to add any such words to the proposal?

### FP memory accesses

The words `sf@` `sf!` `df@` `df!` are also intended for data exchange
with the outside world, but they require alignment and there is no
provision for dealing with different byte orders.

For dealing with alignment we could add support for unaligned accesses
to these words.  This would require a change in the standard.  What is
your opinion about that?

For dealing with different byte orders one can do the potential byte
swapping on the integer side, as follows:

````
create dfbuf 1 dfloats allot

: be-df@ ( c-addr -- r ) x@ xbe dfbuf x! dfbuf df@ ;
: be-df! ( r c-addr -- ) dfbuf df! dfbuf x@ xbe swap x! ;
````


## Proposal (Changes to the standard document):

Add the following words:

`w@` ( c-addr -- u ) "w-fetch"

   u is the zero-extended 16-bit value stored at c-addr.

`w!` ( x c-addr -- ) "w-store"

   Store the least significant 16 bits of x at c-addr.

`wbe` ( u1 -- u2 )

   Convert 16-bit value in u1 from native byte order to big-endian or
   from big-endian to native byte order (the same operation).  The
   other bits are either untouched or set to 0.

`wle` ( u1 -- u2  )

   Convert 16-bit value in u1 from native byte order to little-endian
   or from little-endian to native byte order (the same operation).
   The other bits are either untouched or set to 0.

`w>s` ( x -- n ) "w-to-s"

   Sign-extend the low-order 16 bits in x to the full cell width.

`l@` ( c-addr -- u ) "l-fetch"

   u is the zero-extended 32-bit value stored at c-addr.

`l!` ( x c-addr -- ) "l-store"

   Store the least significant 32 bits of x at c-addr.

`lbe` ( u1 -- u2 )

   Convert 32-bit value in u1 from native byte order to big-endian or
   from big-endian to native byte order (the same operation).  The
   other bits are either untouched or set to 0.

`lle` ( u1 -- u2  )

   Convert 32-bit value in u1 from native byte order to little-endian
   or from little-endian to native byte order (the same operation).
   The other bits are either untouched or set to 0.

`l>s` ( x -- n ) "l-to-s"

   Sign-extend the low-order 32 bits in x to the full cell width.

`x@` ( c-addr -- u ) "x-fetch"

   u is the zero-extended 64-bit value stored at c-addr.

`x!` ( x c-addr -- ) "x-store"

   Store the least significant 64 bits of x at c-addr.

`xbe` ( u1 -- u2 )

   Convert 64-bit value in u1 from native byte order to big-endian or
   from big-endian to native byte order (the same operation).  The
   other bits are either untouched or set to 0.

`xle` ( u1 -- u2  )

   Convert 64-bit value in u1 from native byte order to little-endian
   or from little-endian to native byte order (the same operation).
   The other bits are either untouched or set to 0.

`x>s` ( x -- n ) "x-to-s"

   Sign-extend the low-order 64 bits in x to the full cell width.

`c>s` ( x -- n ) "c-to-s"

   Sign-extend the low-order 8 bits in x to the full cell width.

Add the following Rationale for these words:

Typical use: (Optional)

````
( c-addr ) l@ lle l>s \ 32-bit unaligned signed little-endian fetch
( c-addr ) w@         \ 16-bit unaligned unsigned native-order fetch
( n|u c-addr ) >r xbe r> x! \ 64-bit unaligned big-endian   store
( n|u c-addr ) l!           \ 32-bit unaligned native-order store
````

Implementation on systems with address-unit-bits > 8:

This wordset primarily addresses byte-addressed machines, but can also
be used on others (in particular, word-addressed machines), by using
only the lower 8 bits of each address unit (e.g., each word).  The
application would use a file-access-mode modifier `bytewise` (not
standardized) to read the input into memory in that format, then use
`c@ $ff and` `w@` `l@` `x@` to convert from this 8-bit-per-au format
into data on the data stack, work with that, then use `c!` `w!` `l!`
`x!` to convert back to the 8-bit-per-au format, and finally submit
the data to the external destination by writing to a file opened in
`bytewise` mode.

## Reference implementation:

Will be provided at a later time.


## Testing: (Optional)

Will be provided at a later time.


,------------------------------------------
| 2025-09-11 16:19:26  AntonErtl  replies:
| proposal - Fix stack comments for N>R and NR>
| see: https://forth-standard.org/proposals/fix-stack-comments-for-n-r-and-nr-#reply-1532
`------------------------------------------
## Author:

* Anton Ertl
* Leon Wagner

## Change Log

* 2025-09-11 replaced +n with u (AE)
* 2024-06-06 replaced some `n` with `+n`; formatting changes (AE)
* 2023-09-14 Revision after discussion (AE)
* 2023-09-13 Initial proposal

## Problem:

The stack comments for N>R and NR> don't make it clear that _u_ items are moved between the data and return stacks.

## Solution:

The stack comments should more clearly indicate that _u_ data stack items are moved to or from the return stack.

## Proposal:

In the definition of [`N>R`](https://forth-standard.org/standard/tools/NtoR), replace

> `( i * n u -- ) ( R: -- j * x u )`

with

> `( x_u ... x_1 u -- ) ( R: -- j*x u )`

In the definition of  [`NR>`](https://forth-standard.org/standard/tools/NRfrom), replace

> `( -- i * x u ) ( R: j * x u -- )`

with

> `( -- x_u ... x_1 u ) ( R: j*x u -- )`

## Discussion

On the return stack we specify `j*x u`, because the data may be kept in a separate buffer with only its address on the return stack.  `u` stays on the return stack because the original specification puts it there, and changing that would be a substantial change.

On the data stack we specify `x_u ... x_1 u`, because that is the way we usually specify a variable number of cells counted by `u` (even for `u=0`).  See, e.g., [`get-order`](https://forth-standard.org/standard/search/GET-ORDER).
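A typical use of this stack shape is saving and restoring the search order, since `get-order` produces exactly `wid_n ... wid_1 n`.  A minimal sketch; `with-saved-order` is a hypothetical name, and it does not restore the order if the xt throws.

```forth
: with-saved-order ( i*x xt -- j*x )  \ run xt, then restore the search order
  get-order n>r      ( i*x xt ) ( R: j*x u )
  execute
  nr> set-order ;
```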


,------------------------------------------
| 2025-09-11 16:27:56  ruv  replies:
| proposal - Exclude zero from the data types that are identifiers
| see: https://forth-standard.org/proposals/exclude-zero-from-the-data-types-that-are-identifiers#reply-1533
`------------------------------------------
## Author

Ruv

## Change Log

- 2022-08-14 Initial version
- 2025-09-11 Add changes from comments, exclude _flag_ instead of zero.

## Problem

In many cases it is assumed that a data object cannot be equal to the single-cell value zero, but the corresponding data type allows it.

For example
  - [`NAME>INTERPRET`](https://forth-standard.org/standard/tools/NAMEtoINTERPRET) implies that an execution token cannot be zero.
  - [`FIND-NAME`](https://forth-standard.org/proposals/find-name?hideDiff#reply-174) implies that a name token cannot be zero.
  - Programs commonly assume that an address, a file identifier, or a word list identifier is nonzero.

In some cases it is assumed that a data object cannot be equal to `-1` or `TRUE` (in two's complement encoding).
  - According to FILE [SOURCE-ID](https://forth-standard.org/standard/file/SOURCE-ID), _fileid_ cannot be `-1`, because `-1` is reserved for the case where the input source is a string.
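The assumption is visible in code like the following hypothetical helper, which treats a zero nt from `find-name` as "not found" and a zero xt from `name>interpret` as "no interpretation semantics".  It is only correct if neither data type can have the value zero.

```forth
: run-word ( i*x c-addr u -- j*x )  \ perform the interpretation semantics of a named word
  find-name ?dup 0= if -13 throw then       \ nt = 0: undefined word
  name>interpret ?dup 0= if -14 throw then  \ xt = 0: no interpretation semantics
  execute ;
```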

## Solution

Explicitly exclude _false_ (zero, all bits clear) and _true_ (all bits set) from the address, execution token, name token, word list identifier, file identifier data types.
When zero is a valid value on the underlying level, it can be reserved from use, or filtered out in the wrappers over the API routines of the underlying level.


Also, fix incorrect wording in the "subtype" definition, since the subset relation applies to sets, not to their members:<br/>
"A data type _i_ is a subtype of type _j_ if and only if the members of _i_ are <del>a subset of</del> the members of _j_".

Also, add the missing data type relationships.


## Proposal

### Fix wording for "subtype"

In the section [3.1.1 Data-type relationships](https://forth-standard.org/standard/usage#subsection.3.1.1)

Replace the phrase:
> A data type _i_ is a subtype of type _j_ if and only if the members of _i_ are a subset of the members of _j_.

With the phrase:
> A data type _i_ is a subtype of type _j_ if and only if each member of _i_ is a member of _j_.

### Notation for difference between sets (relative complement)

In the section [3.1.1 Data-type relationships](https://forth-standard.org/standard/usage#subsection.3.1.1), add the following phrase at the end of the first paragraph:

> The notation "i \ j" is used to denote "the data type that includes all those and only those members of _i_ which are not members of _j_".

### Exclude _flag_ from the **address** and **execution token** data types

In the section [3.1.1 Data-type relationships](https://forth-standard.org/standard/usage#subsection.3.1.1)

Replace:
> a-addr => c-addr => addr => u ; 

With:
> a-addr => c-addr => addr => u \ flag ; 

Replace:
> xt => x; 

With:
> xt => x \ flag ; 

### Specify that `( 0 0 )` is a character string member

In the section [3.1.4.2 Character strings](https://forth-standard.org/standard/usage#subsubsection.3.1.4.2)

Replace:
> `(c-addr u)`

With:
> `(c-addr u | 0 0 )`


### Exclude _flag_ from the **name token** data type

At the end of section [15.3.1 Data types](https://forth-standard.org/standard/tools#subsection.15.3.1) add the following subsection:

> #### 15.3.1.1 Data-type relationships
> Add the following to the end of the list of subtype relationships in the section [3.1.1](https://forth-standard.org/standard/usage#subsection.3.1.1) Data-type relationships:
>
> `nt => x \ flag;`


### Exclude _flag_ from the **word list identifier** data type

At the end of section [16.3.1 Data types](https://forth-standard.org/standard/search#subsection.16.3.1) add the following subsection:

> #### 16.3.1.1 Data-type relationships
> Add the following to the end of the list of subtype relationships in the section [3.1.1](https://forth-standard.org/standard/usage#subsection.3.1.1) Data-type relationships:
> 
> `wid => x \ flag; `


### Exclude _flag_ from the **file identifier** data type

At the end of section [11.3.1 Data types](https://forth-standard.org/standard/file#subsection.11.3.1) **add** the following subsection:

> #### 11.3.1.1 Data-type relationships
> Add the following to the end of the list of subtype relationships in the section [3.1.1](https://forth-standard.org/standard/usage#subsection.3.1.1) Data-type relationships:
>
> `fam => x;`
> `fileid => x \ flag ;`
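
File identifiers work the same way. In the sketch below, the names `log-fid` and `log-line` are hypothetical. A `VALUE` holding 0 means "no file open", which relies on no valid fileid being 0.

```forth
\ Hypothetical logging hook: 0 in LOG-FID means "no log file open".
\ Relies on no valid fileid being 0 (the proposed fileid => x \ flag).
0 value log-fid

: log-line ( c-addr u -- )
  log-fid ?dup if write-line throw else 2drop then ;
```

Logging could then be enabled with, for example, `s" app.log" w/o create-file throw to log-fid`.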


,------------------------------------------
| 2025-09-11 16:54:20  PeterKnaggs  replies:
| proposal - Exclude zero from the data types that are identifiers
| see: https://forth-standard.org/proposals/exclude-zero-from-the-data-types-that-are-identifiers#reply-1534
`------------------------------------------
> In the section 3.1.4.2 Character strings
> Replace:
> (c-addr u)
> With
> (c-addr u | 0 0 )

Add a sentence to the effect that (0 0) is to be considered as an empty string.

> Exclude flag from the name token data type
> In the end of the section 15.3.1 Data types add the following subsection:

Add a sentence that a name token can evaluate to a *flag* (0 or -1)

> Exclude flag from the word list identifier data type
> In the end of the section 16.3.1 Data types add the following subsection:

Add a sentence that a word list ID can evaluate to a *flag* (0 or -1)

> In the end of the section 11.3.1 Data types add the following subsection:

Add a sentence that a file ID can evaluate to a *flag* (0 or -1)


,------------------------------------------
| 2025-09-12 04:05:22  AntonErtl  replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1535
`------------------------------------------
The committee has found consensus on the following words.  I was asked
to write it up as a proposal, quickly.  Due to time limits this is
just a skeleton, and will not make sense to people new to the
discussion.  A more fleshed-out proposal will be submitted later.

## Translations and text-interpretation

Recognizers produce translations.  The text interpreter (and other
users, such as `postpone`), removes the translation from the stack(s),
and then either performs the interpreting run-time, compiling
run-time, or postponing run-time.

Unless otherwise specified the compiling run-time compiles the
interpreting run-time.  The postponing run-time compiles the compiling
run-time.


## Types

**translation**: The result of a recognizer; the input of interpreting,
compiling, and postponing; it's a semi-opaque type that consists of a
translation token at the top of the data stack and additional
data on various stacks below.

**translation token**: Single-cell item that identifies a certain
  translation.  (This has formerly been called a rectype.)

## Words

### `rec-name` ( c-addr u -- translation )

If c-addr u is the name of a visible local or a visible named word,
translation represents the text-interpretation semantics
(interpreting, compiling, postponing) of that word (see
`translate-name`).  If not, translation is `translate-none`.
(formerly called rec-nt)

### `rec-number` ( c-addr u -- translation )

If c-addr u is a single or double number (without or with prefix), or
a character, all as described in section ..., translation represents
pushing that number at run-time (see `translate-cell`,
`translate-dcell`).  If not, translation is `translate-none`.

### `rec-float` ( c-addr u -- translation )

If c-addr u is a floating-point number, as described in section ...,
translation represents pushing that number at run-time (see
`translate-float`).  If c-addr u has the syntax of a double number
without prefix according to section ..., and r is the floating-point
number corresponding to that string according to section ...,
translation may represent pushing r at run-time.  If
c-addr u is not recognized as a floating-point number, translation is
`translate-none`.

### `rec-none` ( c-addr u -- translation )

This word does not recognize anything.  For its translation, see
`translate-none`.

### `recs` ( -- )

Print the recognizers in the recognizer sequence in `rec-forth`, the
first searched recognizer leftmost. (formerly known as .recognizers)

### `rec-forth` ( c-addr u -- translation )

This is a deferred word that contains the recognizer (sequence) that
is used by the Forth text interpreter.  (formerly forth-recognize)

### `rec-sequence:` ( xtu .. xt1 u "name" -- )

Define a recognizer sequence "name" containing u recognizers
represented by their xts.  If `set-recs` is implemented, the sequence
must be able to accommodate at least 16 recognizers.

name execution: ( c-addr u -- translation )

Execute xt1; if the resulting translation is the result of
`translate-none`, restore the data stack to ( c-addr u -- ) and try
the next xt.  If there is no next xt, remove ( c-addr u -- ) and
perform `translate-none`.
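
The search described above can be sketched in standard Forth as follows. This is a model, not the committee's or Gforth's implementation. All `my-` names are hypothetical. The sketch also assumes that the "none" translation is exactly one cell holding 0, as in Gforth at the time of writing, although the proposal does not mandate this.

```forth
\ Assumption (not in the proposal): the "none" translation is the single cell 0.
0 constant my-translate-none
: my-rec-none ( c-addr u -- translation ) 2drop my-translate-none ;

\ Try one recognizer; on failure drop the "none" token and restore the string.
: my-try ( c-addr u xt -- translation true | c-addr u false )
  rot rot 2dup 2>r rot execute          ( translation ) ( R: c-addr u )
  dup my-translate-none = if drop 2r> false else 2r> 2drop true then ;

\ Hypothetical body layout: [u][xt1]...[xtu]; xt1 is searched first.
: my-rec-sequence: ( xtu .. xt1 u "name" -- )
  create dup , 0 ?do , loop
  does> ( c-addr u body -- translation )
    dup cell+ swap @ cells over + swap   ( c-addr u end start )
    ?do i @ my-try if unloop exit then 1 cells +loop
    2drop my-translate-none ;

\ Toy usage: a recognizer that only accepts "42".
1 constant my-tok-cell                   \ hypothetical non-zero token
: my-rec-42 ( c-addr u -- translation )
  s" 42" compare if my-translate-none else 42 my-tok-cell then ;
' my-rec-none ' my-rec-42 2 my-rec-sequence: my-recs
```

Each recognizer consumes c-addr u, so `my-try` keeps a copy on the return stack. This lets it restore the string when a recognizer fails.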

### `translate-none` ( -- translation )

(formerly r:fail or notfound)

translation interpreting run-time: ( ... -- )

`-13 throw`

translation compiling run-time: ( ... --  )

`-13 throw`

translation postponing run-time: ( ... --  )

`-13 throw`

### `translate-cell` ( x -- translation )

(formerly translate-num)

translation interpreting run-time: ( -- x )

### `translate-dcell` ( xd -- translation )

(formerly translate-dnum)

translation interpreting run-time: ( -- xd )

### `translate-float` ( r -- translation )

translation interpreting run-time: ( -- r )

### `translate-name` ( nt -- translation )

(formerly translate-nt)

translation interpreting run-time: ( ... -- ... )

Perform the interpretation semantics of nt.

translation compiling run-time: ( ... -- ... )

Perform the compilation semantics of nt.

### `translate:` ( xt-int xt-comp xt-post "name" -- )

Define "name" (formerly rectype:)

"name" execution: ( i\*x -- translation )

translation interpreting run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-int.

translation compiling run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-comp.

translation postponing run-time: ( ... translation -- ... )

Remove the top of stack (the translation token) and execute xt-post.

### `get-recs` ( xt -- xt_u ... xt_1 u )

xt is the execution token of a recognizer sequence.  xt_1 is the first
recognizer searched by this sequence, xt_u is the last one.

### `set-recs` ( xt_u ... xt_1 u xt -- )

xt is the execution token of a recognizer sequence.  Replace the
contents of this sequence with xt_u ... xt_1, where xt_1 is searched
first, and xt_u is searched last.  Throw ... if u exceeds the number
of elements supported by the recognizer sequence.
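
The following sketches `get-recs` and `set-recs` over a fixed-capacity sequence. It assumes a hypothetical layout, not taken from the proposal, in which the sequence's body stores u followed by xt_1 .. xt_u. Searching is omitted. The proposal leaves the throw code elided, so the sketch uses `ABORT"` instead.

```forth
16 constant my-max-recs     \ the minimum capacity required when set-recs exists

\ Hypothetical data-only sequence: body = [u][xt_1]..[xt_16].
: my-empty-sequence ( "name" -- ) create 0 , my-max-recs cells allot ;

: my-set-recs ( xt_u ... xt_1 u xt -- )
  >body over my-max-recs > abort" too many recognizers"
  2dup ! cell+ swap          ( xt_u ... xt_1 addr u )
  0 ?do tuck ! cell+ loop    \ store xt_1 first, xt_u last
  drop ;

: my-get-recs ( xt -- xt_u ... xt_1 u )
  >body dup @ dup >r         ( body u ) ( R: u )
  cells +                    \ address of xt_u
  r@ 0 ?do dup @ swap 1 cells - loop
  drop r> ;
```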

## Rationale

(This will also be fleshed out)

The committee has decided not to standardize words
that consume translations for now.  Such words would be useful for
defining a user-defined text interpreter, but the experience with
recognizers has shown that a recognizer-using text interpreter is
flexible enough that it is no longer necessary to write such text
interpreters, and such words are only used internally in the text
interpreter.

However, to give an idea how all this works together, here are the words
that Gforth provides for that purpose:

### `interpreting` ( ... translation -- ... )

For a system-defined translation token, first remove the translation
from the stack(s), then perform the interpreting run-time specified
for the translation token.  For a user-defined translation token,
remove it from the stack and execute its int-xt.

### `compiling` ( ... translation -- ...  )

For a system-defined translation token, first remove the translation
from the stacks, then perform the compiling run-time specified for the
translation token, or, if none is specified, compile the
'interpreting' run-time.  For a user-defined translation token, remove
it from the stack and execute its comp-xt.

### `postponing` ( ... translation --  )

For a system-defined translation token, first consume the translation,
then compile the 'compiling' run-time.  For a user-defined translation
token, remove it from the stack and execute its post-xt.
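
The following minimal model in standard Forth shows how translation tokens, `translate:`, and the three consumers fit together. It assumes that a translation token is the address of a three-cell table holding xt-int, xt-comp, and xt-post. This representation is hypothetical; the proposal leaves it open. The `my-` names are also invented for the sketch. The last lines redo the `translate-num` example below, using `LITERAL` instead of Gforth's `lit,` and `noop`.

```forth
\ Hypothetical model: a token is the address of [xt-int][xt-comp][xt-post].
: my-translate: ( xt-int xt-comp xt-post "name" -- )
  create rot , swap , , ;    \ executing "name" pushes the table address

\ Each consumer removes the token and executes the matching xt.
: my-interpreting ( ... token -- ... ) @ execute ;
: my-compiling    ( ... token -- ... ) cell+ @ execute ;
: my-postponing   ( ... token -- ... ) 2 cells + @ execute ;

\ One text-interpreter step: choose the run-time according to STATE.
: my-translate ( ... token -- ... )
  state @ if my-compiling else my-interpreting then ;

\ The translate-num example, using only standard words.
: my-lit, ( x -- ) postpone literal ;     \ compile x as a literal
:noname ( x -- x ) ;                      \ int-xt: just leave x
' my-lit,                                 \ comp-xt: compile x
:noname ( x -- ) postpone literal ['] my-lit, compile, ;  \ post-xt
my-translate: my-translate-cell
```

Here the compiling run-time compiles the interpreting run-time (push x). The postponing run-time compiles code that, when it runs, compiles that run-time.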

## Examples

````
s" 123" rec-forth ( translation ) interpreting

: rec-tick ( addr u -- translation )
    over c@ '`' = if
        1 /string find-name dup if
            name>interpret ( xt ) translate-num then
        exit then
    \ 2drop notfound
    rec-none ;

' noop                       ( x -- x )                             \ int-xt
' lit,                       ( compilation: x -- ; run-time: -- x ) \ comp-xt
:noname lit, postpone lit, ; ( postponing: x -- ;  run-time: -- x ) \ post-xt
translate: translate-num
````


,------------------------------------------
| 2025-09-12 09:24:31  ruv  replies:
| proposal - Fix stack comments for N>R and NR>
| see: https://forth-standard.org/proposals/fix-stack-comments-for-n-r-and-nr-#reply-1536
`------------------------------------------
> 2025-09-11 replaced _+n_ with _u_ (AE)

Anton, could you please write a rationale for why it is proposed to replace _+n_ with _u_ in these words?

Note that in Forth-2012, in the definitions of [`N>R`](https://forth-standard.org/standard/tools/NtoR) and [`NR>`](https://forth-standard.org/standard/tools/NRfrom), the data type _+n_ is used, not _u_ (as this version  claims).


,------------------------------------------
| 2025-09-12 10:18:46  ruv  replies:
| proposal - Tick and undefined execution semantics
| see: https://forth-standard.org/proposals/tick-and-undefined-execution-semantics#reply-1537
`------------------------------------------
It would be better to specify what behavior the xt identifies when ticking words such as `s"` or `to`.


,------------------------------------------
| 2025-09-12 10:43:18  AntonErtl  replies:
| proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
| see: https://forth-standard.org/proposals/optional-ieee-754-binary-floating-point-word-set#reply-1538
`------------------------------------------
Some committee members will look at this proposal if they update their FP package.  But given the lack of a champion for this proposal at the moment, the committee has decided to retire this proposal for now.  If any champion steps up, move it to informal again.


,------------------------------------------
| 2025-09-12 15:04:45  EricBlake  replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1539
`------------------------------------------
I'm aware you intend to flesh this out further, but when doing so, the following observations may be useful.

> Examples
>
>      s" 123" rec-forth ( translation ) interpreting

It might be helpful to also show the stack effect after interpreting, as in:
```
s" 123" rec-forth ( translation ) interpreting ( n ) \ leaves 123 on the stack
```

If I understand the intent correctly, the difference between `rec-none` and `translate-none` is that both produce the same translation (which consists of a translation token and possibly additional cells), while only the former also consumes `addr u` before pushing that translation.  Taking it further, since translation is a semi-opaque type of one or more cells, I think I can still implement it where the translation token returned by `translate-none` is a literal 0 (trivially as `0 constant translate-none`), provided that my `interpreting`/`compiling`/`postponing` all recognize a literal 0 as the built-in translation whose effects are to result in `-13 throw`.  Or I could implement `:noname -13 throw ; dup dup translate: translate-none` where the translation token is a non-zero xt just like any other user-defined word created by `translate:`, and then interpreting/compiling/postponing don't have to special-case 0.  But either way, my implementation choice for translate-none is not unduly constrained by the standard, and not relevant or visible to the user; but it DOES place constraints on the user writing their own recognizers to use `rec-none` or `translate-none` instead of hard-coding 0 in their code, if they don't want an environmental dependency on the implementation.  I think I like how that turned out.

But given that analysis, the standard should not prohibit a translation token from being a literal 0; compared with the other recent work in [r1533](https://forth-standard.org/proposals/exclude-zero-from-the-data-types-that-are-identifiers#reply-1533) to designate that `xt => x \ flag`, the standard should be clear that `translation token => x` and not `translation token => xt`.

Continuing with the examples,

>     : rec-tick ( addr u -- translation )
>         over c@ '`' = if
>             1 /string find-name dup if
>                 name>interpret ( xt ) translate-num then

Why is this calling out translate-num instead of translate-cell?

>             exit then

This exit leaves the stack with either `xt translate-num` or `0`; the former makes sense but the latter assumes `translate-none` produces a literal 0, which I just argued above is a specific implementation choice rather than something the standard mandates.  Would it be better as:

```
... find-name ?dup if name>interpret ( xt ) translate-cell exit then translate-none exit ...
```

>         \ 2drop notfound
>         rec-none ;

I like how `rec-none` serves the same role as the former `2drop notfound`, but does leaving in the comment aid in understanding the example?


>     ' noop                       ( x -- x )                             \ int-xt
>     ' lit,                       ( compilation: x -- ; run-time: -- x ) \ comp-xt
>     :noname lit, postpone lit, ; ( postponing: x -- ;  run-time: -- x ) \ post-xt
>     translate: translate-num

Is this part of the example intended to be a potential reference implementation of translate-cell?  Or is it intended to supply the translate-num used in the rec-tick above, in which case the order of presentation should be swapped?


,------------------------------------------
| 2025-09-12 17:09:32  ruv  replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1540
`------------------------------------------
> `translate-cell ( x -- translation )`  
> `translate-name ( nt -- translation )`

The naming scheme `translate-***`  is inappropriate and confusing for these words; for example, the name `translate-name` implies that the word performs some translation, but this word actually does not perform any translation, it is just a **constant** (i.e., it simply pushes a single-cell token on the stack; and this should be indicated in its stack diagram).

We should find a better naming scheme for these words.

Possible options:
  - `***-recognized`
  - `***-tag`  or  `tag-***`  (because effectively this value is a data type tag for a data object)
  - `td-***` (from "token discriminator" or "token descriptor", similar to tag)

Other?

-----

@EricBlake [wrote](https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1539):

> I think I can still implement it where the translation token returned by `translate-none` is a literal `0` 

Yes, for example, in Gforth it is currently [implemented](https://github.com/forthy42/gforth/blob/cb1cd1f37c1a6b8735685fc1b889a718e47eaa95/kernel/recognizer.fs#L117) this way.

Using zero for failure simplifies checking the result: you can write «`dup if ...`» or «`if ...`» instead of «`dup tag-none <> if ...`». If most implementations stick to this approach, it can be standardized.

> Why is this calling out `translate-num` instead of `translate-cell`?

This is probably an oversight after renamings.


,------------------------------------------
| 2025-09-12 18:49:52  EricBlake  replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1541
`------------------------------------------
> it DOES place constraints on the user writing their own recognizers to use rec-none or translate-none instead of hard-coding 0 in their code, if they don't want an environmental dependency on the implementation. I think I like how that turned out.

But now in trying to code something up that uses recognizers from the user's point of view, I tried to write a quick word that accepts single-cell numbers but rejects character strings that would produce a double cell or not be recognized as a number.  Using gforth's implementation, it might be as simple as:

```
: single? ( c-addr u -- n true | 0 ) \ recognize only a single-cell integer, with a flag rather than translator on success
  rec-number
  case
    translate-cell of ( n ) true exit endof
    translate-none of ( -- ) 0 exit endof
    translate-dcell of ( d ) 2drop 0 exit endof
    abort" unexpected translation" endcase ;
```

But re-reading the proposed specification, translate-cell has a mandated stack effect of `( n -- translation )`, and my usage above did not satisfy that requirement.  So, what if I modify it along these lines, to ensure that every time translate-cell is used, there is an n on the stack before-hand (and then jumping through hoops to get it back off the stack)?

```
: token-none ( -- token ) \ determine the token produced by translate-none
  translate-none
;
: token-cell ( -- token ) \ determine the token produced by translate-cell
  0 translate-cell dup >r interpreting drop r>
;
: token-dcell ( -- token ) \ determine the token produced by translate-dcell
  #0. translate-dcell dup >r interpreting 2drop r>
;
: single? ( c-addr u -- n true | 0 ) \ recognize only a single-cell integer, with a flag rather than translator on success
  rec-number
  case
    token-cell of ( n ) true exit endof
    token-none of ( -- ) 0 exit endof
    token-dcell of ( d ) 2drop 0 exit endof
    abort" unexpected translation" endcase ;
```

Alas, even that seems like it is not portable to the proposed spec, but has an environmental dependency on gforth's implementation.  There's nothing in the proposed wording that requires the translation token cell to be identical regardless of what the rest of the overall translation represents.  In fact, I see nothing that requires translate-none to produce a single cell, nor for any other given flavor of translation to occupy a consistent number of cells regardless of the value being translated.  The proposal is very clear that `123 translate-cell interpreting` will leave 123 on the stack, but intentionally does not state how many cells of the stack are in use in between `translate-cell` and `interpreting` (only that it is a semi-opaque type of one or more cells).

Put differently, gforth's implementation of translate-cell happens to be idempotent and produces a translation that occupies two cells (namely, the value of the cell being translated, and a single-cell translation token); but based on just the proposed spec, what would prevent an alternative implementation in which a translation occupies exactly one cell (namely, a pointer to an internal struct that wraps multiple pieces of information, including the value to push, the current line/offset of the source at the time translate-cell was called, to produce friendlier SEE output, and so on)?  With such an implementation, I could argue that `0 translate-cell 0 translate-cell =` producing false is acceptable, because it results in two different pointers (there were two different source locations at the time of the two different invocations of translate-cell).  Or what would prevent an implementation where `0 translate-cell` occupies one cell, because it is frequently encountered and worth special-casing in the interpreter loop, vs. `123 translate-cell` occupying two cells, because it is infrequently encountered?

From the user's point of view, it would be a lot more powerful if we had a guarantee that a given translate-XXX produces the same translation token at the top of the stack (even if the rest of the stack is variable-length), and that a given rec-XXX produces idempotent output (maybe with limitations on how SOURCE and >IN can be changed between the recognizer and the action on the translation).  It would also be nice if we could guarantee that ALL instances of `translate-XXX` have the behavior of pushing a single cell to the stack, where that cell is constant for a given translate-XXX, and document that comparing translation tokens is well-defined, and that the rest of the stack diagram for a translator only matters if the resulting translation will be further passed to interpreting/compiling/postponing (i.e., `translate-cell drop` is always unambiguous and cannot cause stack underflow, but it is an ambiguous condition to attempt `translate-cell compiling` if there was not an n on the stack).

Finally, it would also be nice if there were an easy way to discard the entire stack effect of a given translation, if the result of a recognizer produces a different translation than desired.  Maybe `discard ( translation -- )`, so that I could rewrite my earlier example more compactly:

```
: single? ( c-addr u -- n true | 0 ) \ recognize only a single-cell integer, with a flag rather than translator on success
  rec-number
  dup translate-cell = if ( n token-cell ) drop true exit then
  ( 0 | d token-dcell -- ) discard 0 ;
```


,------------------------------------------
| 2025-09-12 19:19:02  ruv  replies:
| proposal - Recognizer committee proposal 2025-09-11
| see: https://forth-standard.org/proposals/recognizer-committee-proposal-2025-09-11#reply-1542
`------------------------------------------
> But re-reading the proposed specification, translate-cell has a mandated stack effect of `( n -- translation )`

The idea is that `translate-cell` has stack effect `( -- x.td )`,  and `( x x.td )` is a _translation_.  The name `translate-cell` with its stack diagram is very confusing. I think this proposal needed more work before publication.

See also [my suggestion](https://github.com/ForthHub/fep-recognizer/blob/master/terms-and-datatypes.md#table-xy1) for naming these data types.