,---------------.
| Contributions |
`---------------´


,------------------------------------------
| 2024-12-30 17:37:44  ruv  wrote:
| requestClarification - May `CODE` be a parsing word?
| see: https://forth-standard.org/standard/tools/CODE#contribution-372
`------------------------------------------
[15.6.2.0930 `CODE`](https://forth-standard.org/standard/tools/CODE) says:

> Those characters are processed in an implementation-defined manner, generating the corresponding machine code. The process continues, refilling the input buffer as needed, until an implementation-defined ending sequence is processed.

Does this imply that `code` actively parses the input source?

If it actively parses, is it standard-compliant?


,---------.
| Replies |
`---------´


,------------------------------------------
| 2024-12-06 16:07:14  JimPeterson  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1396
`------------------------------------------
I understand what you're saying, but in terms of conveying information to the reader, I think some indication that there may not be loop control parameters to `UNLOOP`, presented in the stack notation, may be of use.  I know that the text below it says as much, but `?DO` having the same stack notation as `DO` just feels wrong or misleading.

I know, from a machine's perspective, the notation is technically correct, but I feel like this documentation is being written for humans, and they often require (or at least benefit from) a little more hand-holding.  The change I suggest is also technically correct but more informative.


,------------------------------------------
| 2024-12-07 07:22:49  AntonErtl  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397
`------------------------------------------
@BerndPaysan:

If you eliminate the state-dependence of translators, then text
interpreters that use more than just the xt-int action (e.g., the one
for colorforth-bw, see below) can be written without having to deal with `state`.
And text interpreters that use xt-post can be written using the
proposed wordset rather than having to use a detour through `postpone`
(which is a parsing word, possibly introducing additional
complications).

The following is also relevant to @ruv:

[Ruv's colorforth-bw
implementation](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/example/recognize-colorforth-bw.fth)
demonstrates the shortcomings of the present proposal, because it does
not use recognizers nor translators at all for implementing
`recognize-colorforth-bw`; instead, it reimplements everything that the name
recognizer and the number recognizer already do internally, nicely
demonstrating that the present proposal buries the tools.  And it only
implements dealing with names and single-cell numbers.  Finally, the
implementation is so long (44 lines without putting it into
`forth-recognize`) that you have not shown it inline, but posted a
link to github.

By contrast, let's take much of the proposal from
[[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516),
but replace the state-dependent translators with the state-independent
rectypes of
[[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160).  With such a proposal, colorforth-bw might look as follows (untested):

```
defer recognizer1 forth-recognizer is recognizer1

: prefix>index ( c -- n )
  case
    '[' of  0 endof
    '_' of -1 endof
    ']' of -2 endof
    1 swap
  endcase ;
  
: rectype-colorforth-bw ( ... rectype index state -- ... )
  drop \ we use index, not the surrounding Forth interpreter's state
  swap execute ;

: recognize-colorforth-bw ( c-addr u -- )
  dup 0= if 2drop ['] notfound exit then
  over c@ prefix>index dup 0 > if 2drop drop ['] notfound exit then
  >r 1 /string recognizer1 r> ['] rectype-colorforth-bw ;

' recognize-colorforth-bw set-forth-recognize
```

This has only 20 lines (vs. 44), and it uses all the recognizers
originally present in `forth-recognizer` (name, integers (including
doubles), FP, etc.).  This demonstrates the superior expressive power
of the rectypes from [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160) over the translators from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516).

BTW, I find the presence of both `forth-recognize` and
`forth-recognizer` confusing, and would prefer to define
`forth-recognize` as a deferred word.  If you have to have getters and
setters, call the getter `get-forth-recognize`.

> In this approach, why do you need to write «[postpone _foo» instead
> of «]foo» ?

Nobody is suggesting that.  But you need to perform xt-post in order
to implement `]foo`.  In your implementation, you do it by
reimplementing xt-post for the two recognizers you implement
internally to `recognize-colorforth-bw`.  If you used a detour
through `postpone` instead, you would use the xt-post invoked in that
way.  And in my implementation above, xt-post is invoked directly.


,------------------------------------------
| 2024-12-07 07:29:52  AntonErtl  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1398
`------------------------------------------
You are correct, but originally failed to get your point across to me, and apparently to ruv, who addressed the issue that a loop-sys may have 0 items on the return stack.  But yes, if n1|u1 = n2|u2, no loop-sys is pushed by the run-time semantics of `?do`.


,------------------------------------------
| 2024-12-07 08:36:05  ruv  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1399
`------------------------------------------
JimPeterson, I now see what you mean. My argument about the size of _loop-sys_ is irrelevant.

> that there may not be loop control parameters to `UNLOOP`

This is impossible. `UNLOOP` can only be used in a loop body. And if `?DO` Run-time semantics do not place _loop-sys_, the loop body does not gain control.

```forth
  program1 ( S: u.limit u.initial ) ( R: 0*x ) ?DO ( S: 0*x ) ( R: loop-sys ) program2 ( R: loop-sys ) LOOP ( R: 0*x )
```
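A small sketch may make the skipped-body case concrete. The word `find-zero` and the array `data` are hypothetical names chosen for illustration: when the count is zero, `?DO` transfers control past `LOOP`, so the `UNLOOP` in the body is never executed without loop control parameters.

```forth
\ Hypothetical helper: index of the first zero cell among u cells, or -1.
: find-zero ( a-addr u -- index | -1 )
  0 ?do                          \ u = 0: the body is skipped entirely
    dup i cells + @ 0= if
      drop i unloop exit         \ reached only inside the loop body
    then
  loop
  drop -1 ;

create data  3 , 0 , 7 ,         \ hypothetical test data

data 3 find-zero .               \ prints 1
data 0 find-zero .               \ prints -1, the body never runs
```

With `DO` in place of `?DO`, the second call would enter the loop body even though limit and index are equal.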

By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram `( R: -- loop-sys | )` would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Take a look:
```forth
program1 ( S: param-type.1 ) program2 ( S: param-type.2 ) program3
```
When we specify a stack diagram for `program2`, which is `( param-type.1 -- param-type.2 )`, we indicate:
   - the input parameter type for `program2`, which should be provided by `program1`,
   - the output parameter type of `program2`, which is available for `program3`,
   - we can indicate a case when `program3` does not gain control, but we have to indicate **in prose** who will gain control and what parameters will be passed.

For example:
```
: ?return-true ( -- ) ]] if true exit then [[ ; immediate
```
- A correct stack diagram for `?return-true` Run-time semantics:
   - `( x -- )`
- A narrower stack diagram for `?return-true` Run-time semantics:
   - `( 0 -- | x\0 -- true never )`
   - But this diagram does not tell us where the output parameter `true` is available (if any).
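For readers on systems without the non-standard `]]` ... `[[` notation (it is available in Gforth, for example), the same word can be sketched with `POSTPONE`. The caller `nonzero?` is a hypothetical name added only to show who receives `true`:

```forth
\ The example word, spelled with POSTPONE instead of ]] ... [[
: ?return-true ( -- )
  postpone if  postpone true  postpone exit  postpone then ; immediate

\ Hypothetical caller: FALSE is reached only when control continues past ?return-true.
: nonzero? ( x -- flag )  ?return-true false ;

0 nonzero? .    \ prints 0: control continues to FALSE
5 nonzero? .    \ prints -1: TRUE goes to the caller of nonzero?
```

In the second call the fragment after `?return-true` never gains control, which is the case the `never` annotation describes.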

See also [my post](https://github.com/ForthHub/discussion/discussions/171#discussioncomment-10882259) about the _never_ data type.

-----

NB: `( n1 | u1 n2 | u2 -- )` is incorrect due to excessive spaces, it's an [editorial issue](https://forth-standard.org/proposals/formatting-spaces-in-data-type-symbols#contribution-250).


,------------------------------------------
| 2024-12-07 15:46:16  ruv  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1400
`------------------------------------------
> - A correct stack diagram for ?return-true Run-time semantics:
>    - `( x -- )`

It could be unclear why this diagram is correct if the `?return-true` Run-time semantics do not return control in some cases.

The answer is that the data type _never_ is a subtype of any other data type, and it is a subtype of `0*x`.

`( x -- )` ⟺ `( x -- 0*x )`  ⟺ `( x -- 0*x|never )` ⟺ `(  x --  |  x -- never  )`

A stack diagram by itself does not guarantee that a word returns control. It only guarantees that if the word returns control, it returns a parameter of specified type.


,------------------------------------------
| 2024-12-08 10:46:00  AntonErtl  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1401
`------------------------------------------
> By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Counterexample: [`THROW`](https://forth-standard.org/standard/exception/THROW)


,------------------------------------------
| 2024-12-08 18:25:20  ruv  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1402
`------------------------------------------
Anton, thank you for this example. Yes, I'm wrong that it "would be incorrect". _Formally_, it is still correct. Because specifying a wider data type than possible does not introduce contradictions. But it does introduce **confusion**.  I think that data types should be specified as narrowly as possible.

The stack diagram `( k*x n -- k*x | i*x n )`  is justified only if neither type of `( i*x n )` and `( k*x )` is a subtype of the other.  This holds for the word `throw` because  `( i*x n )` is never returned to the caller.

But specifying two data types in [the union](https://github.com/ForthHub/discussion/discussions/171#discussioncomment-11030626) so that the members of one are sometimes returned and the members of the other are never returned **is useless and just confusing**.

A better diagram for [`throw`](https://forth-standard.org/standard/exception/THROW) is `(  k*x 0 -- k*x  |  k*x n1\0 -- i*x n1 never  )`. This diagram explicitly says that the output parameter of the type `( i*x n1 )` (where the value that corresponds to `n1` is the same in the input and in the output parameter) is not available to the caller. 
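The two cases of this diagram can be observed with a short sketch. The names `reached`, `maybe-throw` and `guarded` are hypothetical and serve only the demonstration:

```forth
variable reached                     \ hypothetical flag for the demonstration

: maybe-throw ( n -- )
  throw                              \ returns here only when n is 0
  true reached ! ;

: guarded ( n -- n' )
  false reached !
  ['] maybe-throw catch              ( 0 | x n )
  dup if  nip  then ;                \ after a throw, drop the undefined x under n

0 guarded .  reached @ .             \ prints 0 -1
5 guarded .  reached @ .             \ prints 5 0
```

For a non-zero `n`, the code after `throw` in `maybe-throw` never runs, and `n` becomes available only at the matching `catch`.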

A better diagram for `?do` Run-time semantics is `(  n1 n2 -- ; R: -- loop-sys ;  |  u1 u2 -- ; R: -- loop-sys ;  |  x1 x1 -- never ;  )`.  This diagram specifies:
   - an ambiguous condition exists if the input parameter has type neither `( n n )` nor `( u u )`,
   - if the first input parameter and the second input parameter are identical, then the loop body may not gain control.


,------------------------------------------
| 2024-12-08 19:17:59  ruv  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1403
`------------------------------------------
By "a wider data type than possible" I mean a data type that has **more members** than another suitable data type. How best to phrase that?


,------------------------------------------
| 2024-12-08 20:45:55  ruv  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1404
`------------------------------------------
> The stack diagram `( k*x n -- k*x | i*x n )` is justified only if neither type of `( i*x n )` and `( k*x )` is a subtype of the other.

Actually, this **always** holds.

`( k*x n -- k*x | i*x n )`  ⟺  `( k*x n -- k*x  |  k*x n -- i*x n )`

To prove that neither type from the union is a subtype of the other, it is enough to give two examples, one of which is a member of only `( k*x n -- k*x  )` (in the union), and the other is a member of only `( k*x n -- i*x n )` (in the union).

Here are such examples:
   - the mapping `( 0 ↦ )`  is a member of only `( k*x n -- k*x  )` (in the union)
   - the mapping `( 1 ↦ 1 )` is a member of  only `( k*x n -- i*x n )` (in the union)

There are also members that belong to both. For example, the following mappings:
   - `( 0 0 ↦ 0 )`
   - `( 1 1 ↦ 1 )`
   - `( 123 0 0 ↦ 123 0 )`


,------------------------------------------
| 2024-12-08 22:59:20  ruv  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1405
`------------------------------------------
@AntonErtl [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397):

> [Ruv's colorforth-bw implementation](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/example/recognize-colorforth-bw.fth) demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing `recognize-colorforth-bw`; instead, it reimplements everything that the name recognizer and the number recognizer already do internally,

This is wrong. Have a look at [L18-L19](https://github.com/ForthHub/fep-recognizer/blob/fed494a7b545c8fe9338a12bee2254fc838baace/implementation/example/recognize-colorforth-bw.fth#L18-L19):
```forth
  \ Reuse a recognizer for numbers
  ['] recognize-number-n-prefixed apply-recognizer-cf dup 0= if exit then
```
It **uses** the recognizer for numbers.  And it uses `find-name` instead of the recognizer for names (Forth words) just because it's simpler in this case. It does not reuse _token translators_.

>  And it only implements dealing with names and single-cell numbers.

Because your original example implemented only that. And I just rewrote your original example.

> Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github.

Why count 10 lines of comments at the beginning of the file?  Without comments, 31 lines, the same as in your example (lexical size is greater due to nt vs xt, and improvements in the behavior).


> By contrast, let's take much of the proposal from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516), but replace the state-dependent translators with the state-independent rectypes of [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160). With such a proposal, colorforth-bw might look as follows (untested):

[...]

> This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160) over the translators from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1081).

(I corrected the [r1081] link in the citation above)

This comparison is incorrect. Below is an [implementation](https://github.com/ForthHub/fep-recognizer/commit/df96c6ab2c9613f4552933e42fc24ac38cc41c02) against the latest API version (except for `compile-postpone-qtoken`, which is a variation of the discussed `postpone-qtoken` and should be either present or implementable in any variant of the API):

```forth
: cf-prefix>tt? ( c -- tt true | c false )
  case
    '[' of ['] execute-interpreting endof
    '_' of ['] execute-compiling endof
    ']' of ['] compile-postpone-qtoken endof
    0 exit
  endcase true
;

defer recognize-default  perceptor is recognize-default

: recognize-colorforth-bw ( sd.lexeme -- qt|0 )
  dup 0= if nip exit then
  over c@ cf-prefix>tt? 0= if drop 2drop 0 exit then
  >r 1 /string recognize-default dup if r> exit then rdrop
;
```

**16 lines.**

Can be tested in Gforth too:
```
gforth index.fth example/recognize-colorforth-bw.fth

:noname cf( _1. _drop _s" foo" ) ; execute s" foo" compare 0=  .s \ prints "1 -1"
```


,------------------------------------------
| 2024-12-09 07:07:25  AntonErtl  replies:
| requestClarification - Return Stack Notation Mildly Inaccurate
| see: https://forth-standard.org/standard/core/qDO#reply-1406
`------------------------------------------
It is obvious that the idea behind the stack diagram of `throw` is to specify what happens on the data stack in both cases, including the case where the control flow does not continue sequentially.  And I think it's a good idea to specify the stack effect for that case, and it should also be done for `?do`.

Whether the `|` syntax as used for `throw` is good enough or whether we should have separate stack diagrams for the two cases is something one might discuss.  However, I have not seen confused questions about what the stack effect of `throw` means, and I would not expect confusion from a similar usage of `|` for the stack effect of `?do`.  In both cases the prose makes it clear enough which part of the stack effect diagram corresponds to which case.  Actually for the `?do` case it has taken 3 decades until someone asked about the lack of a stack-effect diagram for the case when index=limit.


,------------------------------------------
| 2024-12-09 07:25:04  AntonErtl  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1407
`------------------------------------------
The latest proposal is [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1081) and it does not contain `execute-interpreting`, `execute-compiling`, `compile-postpone-qtoken`, or `perceptor`.  And that's what we were tasked with discussing and giving feedback on.  And that's what I did.


,------------------------------------------
| 2024-12-09 09:38:01  ruv  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1408
`------------------------------------------
> The latest proposal is [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1081) and it does not contain `execute-interpreting`, `execute-compiling`, `compile-postpone-qtoken`, or `perceptor`. And that's what we were tasked with discussing and giving feedback on. And that's what I did.

I see, thank you.  Actually, [r1081] is outdated, a new version will be prepared soon and then it should be discussed (as was noted in the recognizer chat). Nevertheless, my example implementation for `recognize-colorforth-bw` above is compatible with [r1081] with the following exceptions: it relies on `0` instead of `NOTFOUND` (you should note how it makes things simpler), and it uses the method `compile-postpone-qtoken` that appends the compilation semantics of a qualified token to the current definition (this method is missing in [r1081]). The word `perceptor` is simply a better name than `forth-recognizer` in [r1081] (I just posted in ForthHub/fep-recognizer a [rationale](https://github.com/ForthHub/fep-recognizer/issues/23) from the chat).

The words `execute-interpreting` and `execute-compiling` are general words that are [needed](https://forth-standard.org/standard/tools/NAMEtoINTERPRET#contribution-364) anyway to perform interpretation or compilation semantics regardless of the initial `STATE`; they are [implemented](https://github.com/ForthHub/fep-recognizer/blob/81355bfa4ee639d822e25da2a32f2ec4d9526815/implementation/lib/compat/core.translator.fth) in standard Forth as:
```
: compilation ( comp: true ; S: -- true ; | comp: false ; S: -- false ; )  state @ 0<> ;
: enter-compilation ( comp: false -- true ; S: -- ; | comp: true  ; S: -- ; )  ] ;
: leave-compilation ( comp: true -- false ; S: -- ; | comp: false ; S: -- ; )  postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation
;
: execute-compiling ( i*x xt -- j*x )
  compilation if execute exit then
  enter-compilation execute leave-compilation
;
```
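As a rough check of these definitions, the sketch below repeats them (with shortened stack comments) so that it loads on its own, and adds a hypothetical probe word `state-flag` that reports the state it is executed in:

```forth
\ Definitions repeated from the post so that the example is self-contained.
: compilation ( -- flag )  state @ 0<> ;
: enter-compilation ( -- )  ] ;
: leave-compilation ( -- )  postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if  execute exit  then
  leave-compilation execute enter-compilation ;
: execute-compiling ( i*x xt -- j*x )
  compilation if  execute exit  then
  enter-compilation execute leave-compilation ;

\ Hypothetical probe: true when executed in compilation state.
: state-flag ( -- flag )  state @ 0<> ;

' state-flag execute-interpreting .   \ prints 0
' state-flag execute-compiling .      \ prints -1
state @ .                             \ prints 0: the initial state is restored
```

Both words return to the state they started in, so they can be used from the interpreter as well as from immediate words.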


,------------------------------------------
| 2024-12-09 10:00:24  ruv  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1409
`------------------------------------------
@AntonErtl [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397):

> If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state.

Token translators cannot be written without having to deal with state (possibly indirectly), by the term definition. A token translator shall perform different actions depending on the state, and it does not matter how the state is passed to the translator: through the data stack, through a separate stack intended for this purpose, or through an internal variable. The state does not matter in only one case: if the translator shall perform the same action regardless of the state.

Moreover, if you pass a parameter that encodes compilation state or interpretation state not through `STATE`, you have to keep `STATE` in sync with this parameter to guarantee that STATE-dependent words are translated correctly.


,------------------------------------------
| 2024-12-10 15:17:03  ruv  replies:
| requestClarification - The case of undefined interpretation semantics
| see: https://forth-standard.org/standard/tools/BracketDEFINED#reply-1410
`------------------------------------------
The initial problem in `[defined]` was fixed by the proposal [Remove the “rules of FIND”](https://forth-standard.org/proposals/remove-the-rules-of-find-?hideDiff#reply-900). Close.


,------------------------------------------
| 2024-12-10 15:17:09  ruv  replies:
| proposal - Remove the “rules of FIND”
| see: https://forth-standard.org/proposals/remove-the-rules-of-find-#reply-1411
`------------------------------------------
[15.6.2.2534 `[UNDEFINED]`](https://forth-standard.org/standard/tools/BracketUNDEFINED) should also be updated according to the new wording in `[DEFINED]`.


,------------------------------------------
| 2024-12-15 22:00:00  BerndPaysan  replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1412
`------------------------------------------
# Minimalistic Recognizer API

## Author:

Bernd Paysan

## Change Log:

* 2020-09-06 initial version
* 2020-09-08 taking ruv's approach and vocabulary at translators
* 2020-09-08 replace the remaining rectypes with translators
* 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
* 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
* 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
* 2022-09-10 More complete reference implementation
* 2022-09-10 Add use of extended words in reference implementation
* 2022-09-10 Typo fixed
* 2022-09-12 Fix for search order reference implementation
* 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
* 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
* 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
* 2023-09-13 Make clear that `TRANSLATE:` is the only way to define a standard-conforming translator.
* 2023-09-15 Add list of example recognizers and their names.
* 2024-12-15 Take comments after freezing the proposal into account

## Problem

The Forth compiler can be extended easily. The Forth interpreter however has a
fixed set of capabilities as outlined in section 3.4 of the standard text:
Words from the dictionary and some number formats.

It's not possible to use the Forth text interpreter in an application or
system extension context. Most interpreters in existing systems use a number
of hooks to extend the interpreter.  That makes it possible to use a loadable
library to implement new data types to be handled like the built-in ones. One
example is floating-point numbers. They have their own parsing and data
handling words including a stack of their own.

Furthermore, applications need to use system provided and system specific
words or have to re-invent the wheel to get numbers with a sign or
hex numbers with the $ prefix. The building blocks (`FIND`, `COMPILE,`,
`>NUMBER` etc) are available but there is a gap between them and what
the Forth interpreter already does.

The Forth interpreter is stateful, but the API should avoid the problems of
the `STATE` variable.  In particular, an implementation without `STATE` should
be possible, and there is only one place where the stateful dispatch is
necessary.

## Solution

The monolithic design of the Forth interpreter is factored into three major
blocks:

  1. The interpreter. It extracts sub-strings (lexemes) from `SOURCE`, hands
    them over to the data parsing and processes the results.

  2. The actual data parsing. It analyses lexemes to determine whether they match the
    criteria for a certain token type.  These words, called recognizers, can
    be grouped to achieve an order of invocation.

  3. The result of the recognizer, a translator and associated data, is handed
    over to the interpreter.

There is no strict 1:1 relation between a recognizer and the returned
translator.  A translator for e.g. single cell numbers can be used by
different recognizers, a recognizer can return different translators
(e.g. single and double cell numbers).

Whenever the Forth text interpreter is mentioned, the standard
words `EVALUATE` (CORE), `'` (tick, CORE), `INCLUDE-FILE`
(FILE), `INCLUDED` (FILE), `LOAD` (BLOCK) and `THRU` (BLOCK)
are expected to act likewise. This proposal does not change
these words, but provides the tools to do so. As long as the
standard feature set is used, a complete replacement with
recognizers is possible.

Important changes to the Matthias Trute proposal:

  * Make the translators executable to dispatch according to the state (interpreting, compiling, postponing) themselves
  * Use dedicated invocation methods to call a translator for a particular state
  * Make the recognizer sequence executable with the same effect as a recognizer
  * Make sure the API is not mandating any particular implementation

The core principle is that the recognizer is not aware of state, and the
returned translator is.  If you have for some reason legacy code that looks
like

    : recognize-xt ( addr u -- translator-stub | 0 )
      here place  here find dup IF
          0< state @ and  IF  compile,  ELSE  execute  THEN  ['] noop
      THEN ;

then you should factor the part starting with `STATE @` out and return it as
translator:

    : translate-xt ( xt flag -- )
      0< state @ and  IF  compile,  ELSE  execute  THEN ;
    : recognize-xt ( addr u -- ... translator | 0 )
      here place  here find dup IF  [']  translate-xt  THEN ;

In a second step, you need to remove the `STATE @` entirely and use
`TRANSLATE:`.  If you don't know what to do on postpone in this stage,
use `-48 throw`, otherwise define a postpone action:

    :noname ( xt flag -- ) drop execute ;
    :noname ( xt flag -- ) 0< IF  compile,  ELSE  execute  THEN ;
    :noname ( xt flag -- ) 0< IF  postpone literal postpone compile,  ELSE  compile,  THEN ;
    translate: translate-xt
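The proposal leaves the state dispatch system-specific. One way a `TRANSLATE:` could be modelled in standard Forth is sketched below; it is not the proposal's reference implementation, and the value `postponing?` is a hypothetical stand-in for however a system tracks postpone state.

```forth
\ Hypothetical sketch only: dispatch on STATE plus a flag for postpone state.
0 value postponing?                  \ assumption: true while postponing

: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,                       \ stored as: xt-post, xt-comp, xt-int
  does> ( j*x i*x a-addr -- k*x )
    postponing? if  @ execute exit  then
    state @ if  cell+ @ execute exit  then
    2 cells + @ execute ;

\ The translator from the text above, defined with this sketch.
:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF  compile,  ELSE  execute  THEN ;
:noname ( xt flag -- ) 0< IF  postpone literal postpone compile,  ELSE  compile,  THEN ;
translate: translate-xt

3 ' 1+ -1 translate-xt .             \ prints 4: the interpretation action runs 1+
```

In this sketch the child word created by `translate:` is the single place where `STATE` is consulted; the three actions themselves stay state-free.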

## Typical use

The standard interpreter loop should look like this:

    : interpret ( i*x -- j*x )
      BEGIN  parse-name dup  WHILE  forth-recognize ?found execute  REPEAT
      2drop ;

with the usual additions to check e.g. for empty stacks and such.

To operate a recognizer in a particular state, e.g. to postpone a single word,
do

    : postpone ( "name" -- )
      parse-name forth-recognize ?found postponing ; immediate

To obtain an xt for a name, use something like this:

    : ' ( "name" -- xt )
      parse-name forth-recognize ?found
      ['] translate-nt <> #-32 and throw
      name>interpret ;

## Proposal:

# XY. The optional Recognizer Wordset #

# XY.1 Introduction #

Recognizers have the form

`REC-`*SOMETYPE* ( addr len -- i\*x j\*r translate-xt | 0/NOTFOUND )

A recognizer takes the string *addr len* of a lexeme and on success returns a
translator *translate-xt* and additional data on the data and floating point
stack.

### [IF] NOTFOUND=0 ###

If it fails, it returns 0.

### [ELSE] NOTFOUND=xt ###

If it fails, it returns the xt of `NOTFOUND`.

For clarity, unless this issue is decided, the non-success return value of a
recognizer is notated as 0/NOTFOUND.  The reference implementation uses the
option 0.

### [THEN] notfound ###

### [IF] side-effect ###

A recognizer shall not have a side effect.

Rationale: Side effects are supposed to all happen inside the translators.
This promise allows trying to recognize something and failing if the result is
not desired without having to roll back unknown changes.  Examples: The tick
and `TO` recognizers pass a substring of the string to be translated to
`FORTH-RECOGNIZE`, and fail if the result is not a name type.

### [THEN] side-effect ###

# XY.3 Additional usage requirements

## XY.3.1 Translator

**translator:** named subtype of xt, and executes with the following stack
effect:

*name* ( j\*x i\*x -- k\*x )

A translator xt that interprets, compiles or postpones the action of the thing according to the state the system is in.

*i\*x* is the additional information provided by the recognizer, *j\*x* and *k\*x* are the stack inputs and outputs of interpreting/compiling or postponing the recognized lexeme.

# XY.6 Glossary

## XY.6.1 Recognizer Words

**FORTH-RECOGNIZE** ( addr len -- i\*x translator-xt | 0/NOTFOUND ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and
additional information if successful, or 0/NOTFOUND if not.

### [IF] defer

`FORTH-RECOGNIZE` is a deferred word.  Changing the system recognizer can be
done with `IS FORTH-RECOGNIZE`, obtaining the system recognizer with
`ACTION-OF FORTH-RECOGNIZE`.

Rationale: use existing API to change it; most simple systems have this
available, and advanced systems have capabilities to work around limitations.

### [ELSE] setter and getter ###

**SET-FORTH-RECOGNIZE** ( xt -- ) RECOGNIZER EXT

Assign the recognizer *xt* to FORTH-RECOGNIZE.

**FORTH-RECOGNIZER** ( -- xt ) RECOGNIZER EXT

Obtain the recognizer *xt* that is assigned to `FORTH-RECOGNIZE`.

Rationale: not sufficiently advanced systems can work around the limitations
of `IS` and `ACTION-OF` better with this API.

### [THEN]

**TRANSLATE:** ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER

Create a translator word under the name "name".  This word is the only
standard way to define a general purpose translator.

"name:" ( j\*x i\*x -- k\*x ) performs xt-int in interpretation, xt-comp in
compilation and xt-post in postpone state using a system-specific way to
determine the current state.

Rationale: By far the most common usage of translators is inside the outer
interpreter, and this default mode of operation is called by `EXECUTE` to keep
the API small.  You cannot simply set `STATE`, use `EXECUTE` and afterwards
restore `STATE` to perform interpretation or compilation semantics, because
words can change `STATE`, so you need the words `INTERPRETING` and `COMPILING`
defined below.  This problem does not apply to `POSTPONING`, so systems that
only want to implement direct access to `POSTPONE` mode can get away without
`TRANSLATE:`.
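As a rough illustration of the dispatch described above, here is a minimal sketch. It assumes the table-based `TRANSLATE:` from the reference implementation further down and a `STATE` that holds 0 while interpreting, -1 while compiling and -2 while postponing. The names `demo-int`, `demo-comp`, `demo-post` and `translate-demo` are hypothetical and exist only for this example.

```forth
\ Sketch only: mirrors the table layout of the reference implementation.
\ Assumes STATE holds 0 (interpret), -1 (compile) or -2 (postpone).
: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,                           \ body: xt-post, xt-comp, xt-int
  does> state @ 2 + cells + @ execute ;  \ pick the slot for the current state

\ Hypothetical actions that only report which slot was selected.
: demo-int  ( -- n ) 1 ;
: demo-comp ( -- n ) 2 ;
: demo-post ( -- n ) 3 ;

' demo-int ' demo-comp ' demo-post translate: translate-demo

translate-demo .   \ in interpretation state this runs demo-int
```

A system that keeps the postpone state somewhere other than `STATE` would need a different selector in the `DOES>` part, which is why the glossary leaves the mechanism system-specific.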

### [IF] NOTFOUND=0 ###

**?FOUND** ( translator-xt -- translator-xt  |  0 -- never ) RECOGNIZER

Check if the recognizer was successful, and if not, perform a `-13 THROW` or
display an appropriate error message if the exception wordset is not present.

### [THEN] NOTFOUND=0 ###

## XY.6.2 Recognizer Extension Words

### [IF] NOTFOUND=0 ###

**?NOTFOUND** ( translator-xt -- translator-xt  |  0 -- addr u notfound-xt )

Check if the recognizer was successful.  If not, replace the 0 result with the
*addr u* of the last scanned lexeme, and put the xt of the `NOTFOUND`
translator on top of the stack.

**NOTFOUND** ( -- never ) RECOGNIZER

Translator for unsuccessful recognizers: perform a `-13 THROW`.

### [THEN] NOTFOUND=0 ###

**POSTPONE** ( "<spaces>lexeme" -- ) RECOGNIZER

Compilation: recognize *lexeme*.  On success, perform the postpone action of
the returned translator, otherwise `-13 THROW` or display the appropriate
error message if the exception wordset is not present.

**RECOGNIZER-SEQUENCE:** ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when
executed, tries to recognize strings starting with *xtn* on stack and
proceeding towards *xt1* until successful.

**SET-RECOGNIZER-SEQUENCE** ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of *xt-seq* to xt1 .. xtn.

**GET-RECOGNIZER-SEQUENCE** ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence from *xt-seq* as *xt1 .. xtn n*.

**TRANSLATE-NT** ( j\*x nt -- k\*x ) RECOGNIZER EXT

Translates a name token:

Interpretation: perform the interpretation semantics of the word

Compilation: perform the compilation semantics of the word

Postpone: append the compilation semantics above to the current definition

**REC-NT** ( addr u -- nt translate-nt | 0/NOTFOUND ) RECOGNIZER EXT

Search the dictionary for the string *addr u*.  If successful, return the *nt*
and the xt of `TRANSLATE-NT`.  If the search fails, return 0/NOTFOUND.

**TRANSLATE-NUM** ( x -- x | ) RECOGNIZER EXT

Translates a number:

Interpretation: keep the number on the stack

Compilation: Append the run-time defined in `LITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**TRANSLATE-DNUM** ( x1 x2 -- x1 x2 | ) RECOGNIZER EXT

Translates a double number:

Interpretation: keep the numbers on the stack

Compilation: Append the run-time defined in `2LITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**REC-NUM** ( addr u -- x translate-num | xd translate-dnum | 0/NOTFOUND ) RECOGNIZER EXT

Convert *addr u* to a number *x* and the xt of `TRANSLATE-NUM` as specified in
3.4.1.3 or a double number *xd* and the xt of `TRANSLATE-DNUM` as
specified in 8.3.1 if the double number wordset is available.  If the
conversion fails, return 0/NOTFOUND.

**TRANSLATE-FLOAT** ( r -- r | ) RECOGNIZER EXT

Translates a floating point number:

Interpretation: Keep *r* on the stack

Compilation: Append the run-time defined in `FLITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**REC-FLOAT** ( addr u -- r translate-float | 0/NOTFOUND ) RECOGNIZER EXT

Convert *addr u* to a number *r* as specified in 12.3.7 if the float wordset is
available; if the conversion fails, return 0/NOTFOUND.

**SCAN-TRANSLATE-STRING** ( addr1 u1 string-rest<"> -- addr2 u2 | ) RECOGNIZER EXT

Complete parsing a string: *addr1 u1* consists of the starting quote and
additional characters up to the first space in the string.  *addr2 u2*
consists of the entire string without the starting quote up to (but not
including) the final quote, with the escape sequences translated according to
the rules of `S\"`.  `>IN` is modified appropriately, and points just after
the final quote.  If there's no final quote in the current line, `REFILL` can
be used to read in more lines, adding corresponding newlines into the string.
The final quote can be inside *addr1 u1*, setting `>IN` backwards in that
case.

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in `SLITERAL` to the current definition

Postpone: Append the compilation semantics stated above to the current definition

**TRANSLATE-STRING** ( addr1 u1 -- addr1 u1 | ) RECOGNIZER EXT

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in `SLITERAL` to the current definition

Postpone: Append the compilation semantics stated above to the current definition

**?SCAN-STRING** ( addr1 u1 scan-translate-string string-rest<"> -- addr2 u2 translate-string  | ... translator -- ... translator ) RECOGNIZER

If the recognized token is an incomplete string, complete the scanning as
defined for `SCAN-TRANSLATE-STRING` and replace the translator with the xt of
`TRANSLATE-STRING`.

**REC-STRING** ( addr u -- addr u scan-translate-string | 0/NOTFOUND ) RECOGNIZER EXT

Check if *addr u* starts with a quote, and return that string and the xt of
`SCAN-TRANSLATE-STRING` if it does, 0/NOTFOUND otherwise.

### [IF] Optional API for direct access of translator states ###

**INTERPRETING** ( j\*x xt -- k\*x ) RECOGNIZER EXT

Execute *xt-int* of the translator *xt*.  If *xt* is not a translator, do `-21
THROW`, or a best-effort attempt to execute *xt* in interpreting state.

**COMPILING** ( j\*x xt -- ) RECOGNIZER EXT

Execute *xt-comp* of the translator *xt*.  If *xt* is not a translator, do `-21
THROW`, or a best-effort attempt to execute *xt* in compiling state.

**POSTPONING** ( j\*x xt -- ) RECOGNIZER EXT

Execute *xt-post* of the translator *xt*.  If *xt* is not a translator, do `-21
THROW`, or a best-effort attempt to execute *xt* in postponing state.

**GET-STATE** ( -- xt ) RECOGNIZER EXT

Obtain the operation *xt* performed when translating.

**SET-STATE** ( xt -- ) RECOGNIZER EXT

Makes *xt* the operation performed when translating. If *xt* is not related to
`' INTERPRETING`, `' COMPILING`, or `' POSTPONING`, do `-12 THROW`.

### [THEN] optional API for direct access of translator states ###

**]]** ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: Set the system into postpone state.  The interpreter
will then perform *xt-post* of all translators found.  Compilation state
resumes when `[[` is recognized.  This word may change `STATE` and the
recognizer sequence to reflect the change of this state.

**[[** ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: undefined

Postpone semantics: enter compilation state, see `]`; all changes to `STATE`
and recognizer sequence done by `]]` are reverted.

Note: `[[` needs special treatment in postpone mode, so it might also use a
non-standard translator and not be a word at all.

**STATE** ( -- addr ) RECOGNIZER

If `]]` uses `STATE` to store the postpone state, this extends the semantics of
6.1.2250 by adding a second non-zero value.  `]]` enters this state, and `[[`
leaves it.  Only translators and the code responsible for displaying the
prompt can see this third state, as all other words are postponed in this
state.

## Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system
that handles only words and single numbers without base prefix.  This
implementation takes only interpret and compile state into account, and
uses the `STATE` variable to distinguish them.  It uses NOTFOUND=0.

    Defer forth-recognize ( addr u -- i*x translator-xt / 0 )
    : ?found ( translator -- translator  |  0 -- never )
      dup 0= IF  -13 throw  THEN ;
    : interpret ( i*x -- j*x )
      BEGIN
          parse-name dup  WHILE
          forth-recognize ?found execute
      REPEAT ;
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , ,
      does> state @ 2 + cells + @ execute ;

An alternative implementation for `TRANSLATE:` can use a deferred word:

    Defer do-translate
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , , does> do-translate ;
    : set-state ( xt -- ) dup is do-translate  >body @ 2 - state ! ;
    : get-state ( -- xt ) action-of do-translate ;

## Extensions reference implementation:

    : ]] -2 state ! ; immediate
    : [[ -1 state ! ; immediate
    : lit,  ( n -- )  postpone literal ;
    :noname name>interpret execute ;
    :noname name>compile execute ;
    :noname dup name>interpret ['] [[ =
      IF    name>interpret execute \ special case
      ELSE  name>compile swap lit, compile,  THEN ;
    translate: translate-nt ( nt -- )
    ' noop
    ' lit,
    :noname lit, postpone lit, ;
    translate: translate-num ( n -- )

    : rec-nt ( addr u -- nt nt-translator | 0 )
      forth-wordlist find-name-in dup IF  ['] translate-nt  THEN ;
    : rec-num ( addr u -- n num-translator | 0 )
      0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop 0  THEN ;

    : minimal-recognize ( addr u -- nt nt-translator | n num-translator | 0 )
      2>r 2r@ rec-nt dup 0= IF  drop 2r@ rec-num  THEN  2rdrop ;

    ' minimal-recognize is forth-recognize

    : translate-method: ( n -- )
      Create , DOES> @ cells + >body @ execute ;
    0 translate-method: postponing
    1 translate-method: compiling
    2 translate-method: interpreting

    : set-state ( xt -- )
      >body @ 2 - state ! ;
    : get-state ( -- xt )
      case state @
          0  of ['] interpreting  endof
          -1 of ['] compiling     endof
          -2 of ['] postponing    endof
          -11 throw
      endcase ;

    : postpone ( "name" -- )
      parse-name forth-recognize ?found postponing ; immediate

This reference implementation uses a table dispatch only.  Note that this can
give surprising results when you directly apply a particular state, and one of
the words executed (translator or nt/xt found) is a state-smart word.  If you
want to use combined translators, like

    : translate-dnum ( d -- )  >r translate-num r> translate-num ;

you can't do it like this.  Neither does this work if you execute state-smart
words, as they expect `STATE` to be set accordingly.  Instead, you'll use
something like

    : translate-method: ( n -- )
      Create , DOES> @ dup state @ = IF  drop execute  EXIT  THEN
      state @ >r state ! execute r> state ! ;

This will definitely work for combined literal translators, because those
don't change state anyways.

This will also work for `POSTPONE`, because apart from the translator, no word
is actually executed in one-shot `POSTPONE`, and therefore, no state change is
possible.

This will also work for `[` and `]` (and words using them) while interpreting
and compiling, because if you are already in the state from which the state is
changed away, you will not restore the state.  If you are in the state this
will change to, this will work, too, because the state is restored after
`EXECUTE`.  This will not work if you are interpreting, and you do a `s" ]]"
forth-recognize ?found compiling`, because that transitions to postponing, and
then is reverted to interpreting.
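The save-and-restore idea can also be tried in isolation. The following is a sketch only, assuming a system where `STATE` can be written directly (which the code above also assumes, although a standard program may not do so). The words `with-state` and `probe-state` are hypothetical names introduced for this example.

```forth
\ Sketch only: run xt with STATE temporarily set to n, then restore it.
\ Writing STATE directly is not portable; it is assumed to work here.
: with-state ( i*x xt n -- j*x )
  dup state @ = IF  drop execute  EXIT  THEN   \ already in that state
  state @ >r  state !  execute  r> state ! ;   \ switch, run, switch back

\ Hypothetical probe that reports the STATE it was executed in.
: probe-state ( -- n ) state @ ;

' probe-state -1 with-state .   \ the probe runs with STATE = -1
state @ .                       \ STATE is back to its previous value
```

If the executed word throws, `STATE` is not restored in this sketch; a more careful version would wrap the `EXECUTE` in `CATCH`.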

### [IF] setter and getter

    : set-forth-recognize ( xt -- )
      is forth-recognize ;
    : forth-recognizer ( -- xt )
      action-of forth-recognize ;

### [THEN] setter and getter

### Stack library

    : STACK: ( size "name" -- )
      CREATE 0 , CELLS ALLOT ;

    : SET-STACK ( item-n .. item-1 n stack-id -- )
      2DUP ! CELL+ SWAP CELLS BOUNDS
      ?DO I ! CELL +LOOP ;

    : GET-STACK ( stack-id -- item-n .. item-1 n )
      DUP @ >R R@ CELLS + R@ BEGIN
        ?DUP
      WHILE
        1- OVER @ ROT CELL - ROT
      REPEAT
      DROP R> ;

### Recognizer sequences

    : recognize ( addr len rec-seq-id -- i*x translator-xt | 0 )
      DUP >R @
      BEGIN
        DUP
      WHILE
        DUP CELLS R@ + @
        2OVER 2>R SWAP 1- >R
        EXECUTE DUP IF
          2R> 2DROP 2R> 2DROP EXIT
        THEN
        DROP R> 2R> ROT
      REPEAT
      DROP 2DROP R> DROP 0
    ;
    #10 Constant min-sequence#
    : recognizer-sequence: ( rec1 .. recn n "name" -- )
      min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
      DOES>  recognize ;
    : ?defer@ ( xt1 -- xt2 )
      BEGIN dup is-defer? WHILE  defer@  REPEAT ;
    : set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
      ?defer@ >body set-stack ;
    : get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
      ?defer@ >body get-stack ;

Once you have recognizer sequences, define

    ' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
    ' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order.  To do so, you define wordlists such that a wid is an execution token which searches the wordlist and returns the appropriate translator.

    : find-name-in ( addr u wid -- nt / 0 )
      execute dup IF  drop  THEN ;
    root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
    : find-name ( addr u -- nt / 0 )
      ['] search-order find-name-in ;
    : get-order ( -- wid1 .. widn n )
      ['] search-order get-recognizer-sequence ;
    : set-order ( wid1 .. widn n -- )
      ['] search-order set-recognizer-sequence ;

### Recognizer examples

Apart from the standardized recognizers above, here are some more examples of
recognizers:

**REC-TICK** ( addr u -- xt translate-num | 0/NOTFOUND ) If *addr u* starts with a `` ` `` (backtick), search the search order for the name specified by the rest of the string, and if found, return its *xt* and *translate-num*.

**REC-SCOPE** ( addr u -- nt translate-nt | 0/NOTFOUND ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order).  The string *addr u* has the form *vocabulary*`:`*name*.  Apart from specifying the vocabulary to be searched, `REC-SCOPE` is identical in effect to `REC-NT`.

**REC-TO** ( addr u -- xt n translate-to | 0/NOTFOUND ) Handle the following syntax of `TO`-like operations of value-like words:

  * `->`*name* as `TO `*name*
  * `=>`*name* as `IS `*name*
  * `+>`*name* as `+TO `*name*
  * `'>`*name* as `ADDR `*name*
  * `@>`*name* as `ACTION-OF `*name*

*xt* is the execution token of the value found, *n* indexes which variant of a `TO`-like operation is meant, and *translate-to* is the corresponding translator.

**REC-ENV** ( addr u -- addr1 u1 translate-env | 0/NOTFOUND ) Takes a pattern in the form of `${`*name*`}` and provides the *name* as *addr1 u1* on the stack.  The corresponding translator `TRANSLATE-ENV` is responsible for looking up that name in the operating system's environment variable array, or compiling appropriate code to do so.

**REC-COMPLEX** ( addr u -- rr ri translate-complex | 0/NOTFOUND ) Converts a pair of floating point numbers in the form of *float1*`+`*float2*`i` into a complex number on the stack, and returns the xt of `TRANSLATE-COMPLEX` on success.
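To make the shape of such a recognizer concrete, here is a minimal sketch of the pattern matching that a `REC-ENV`-like recognizer might do, using the NOTFOUND=0 variant. It is not taken from any existing implementation: `rec-env-sketch` and `translate-env-stub` are hypothetical names, and the stub leaves the name on the stack instead of looking it up in the environment.

```forth
\ Sketch only, NOTFOUND=0 variant.  TRANSLATE-ENV-STUB is a placeholder,
\ not the real TRANSLATE-ENV: it leaves the name on the stack untouched.
: translate-env-stub ( addr u -- addr u ) ;

: rec-env-sketch ( addr u -- addr1 u1 xt | 0 )
  dup 4 < IF  2drop 0  EXIT  THEN                   \ shortest match is ${x}
  over 2 s" ${" compare IF  2drop 0  EXIT  THEN     \ must start with ${
  2dup + 1- c@ [char] } <> IF  2drop 0  EXIT  THEN  \ must end with }
  2 /string 1-                                      \ strip ${ and }
  ['] translate-env-stub ;

s" ${HOME}" rec-env-sketch drop type   \ shows the extracted name
```

A real translator would replace the stub with three actions defined through `TRANSLATE:`, so that the lookup happens at once when interpreting and is compiled when compiling.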

## Testing

```
T{ 0 recognizer-sequence: RS -> }T

T{ :noname 1 ;  :noname 2 ;  :noname 3  ; translate: translate-1 -> }T
T{ :noname 10 ; :noname 20 ; :noname 30 ; translate: translate-2 -> }T

\ really stupid: 1 character length or 2 characters
T{ : rec-1 NIP 1 = IF ['] translate-1 ELSE 0 THEN ; -> }T
T{ : rec-2 NIP 2 = IF ['] translate-2 ELSE 0 THEN ; -> }T

T{ ' translate-1 interpreting  -> 1 }T
T{ ' translate-1 compiling     -> 2 }T
T{ ' translate-1 postponing    -> 3 }T

\ set and get methods
T{ 0 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> 0 }T

T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 1 }T

T{ ' rec-1 ' rec-2 2 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 ' rec-2 2 }T

\ testing RECOGNIZE
T{         0 ' RS set-recognizer-sequence -> }T
T{ S" 1"     RS   -> 0 }T
T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ S" 1"     RS   -> ' translate-1 }T
T{ S" 10"    RS   -> 0 }T
T{ ' rec-2 ' rec-1 2 ' RS set-recognizer-sequence -> }T
T{ S" 10"    RS   -> ' translate-2 }T
```