Digest #290 2024-12-31

Contributions

[372] 2024-12-30 17:37:44 ruv wrote:

requestClarification - May `CODE` be a parsing word?

15.6.2.0930 CODE says:

Those characters are processed in an implementation-defined manner, generating the corresponding machine code. The process continues, refilling the input buffer as needed, until an implementation-defined ending sequence is processed.

Does this imply that code actively parses the input source?

If it actively parses, is it standard-compliant?

Replies

[r1396] 2024-12-06 16:07:14 JimPeterson replies:

requestClarification - Return Stack Notation Mildly Inaccurate

I understand what you're saying, but in terms of conveying information to the reader, I think some indication that there may not be loop control parameters to UNLOOP, presented in the stack notation, may be of use. I know that the text below it says as much, but ?DO having the same stack notation as DO just feels wrong or misleading.

I know, from a machine's perspective, the notation is technically correct, but I feel like this documentation is being written for humans, and they often require (or at least benefit from) a little more hand-holding. The change I suggest is also technically correct but more informative.


[r1397] 2024-12-07 07:22:49 AntonErtl replies:

proposal - minimalistic core API for recognizers

@BerndPaysan:

If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state. And text interpreters that use xt-post can be written using the proposed wordset rather than having to use a detour through postpone (which is a parsing word, possibly introducing additional complications).

The following is also relevant to @ruv:

Ruv's colorforth-bw implementation demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing recognize-colorforth-bw; instead, it reimplements everything that the name recognizer and the number recognizer already do internally, nicely demonstrating that the present proposal buries the tools. And it only implements dealing with names and single-cell numbers. Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github.

By contrast, let's take much of the proposal from [r1081], but replace the state-dependent translators with the state-independent rectypes of [160]. With such a proposal, colorforth-bw might look as follows (untested):

defer recognizer1 forth-recognizer is recognizer1

: prefix>index ( c -- n )
  case
    '[' of  0 endof
    '_' of -1 endof
    ']' of -2 endof
    1 swap
  endcase ;
  
: rectype-colorforth-bw ( ... rectype index state -- ... )
  drop \ we use index, not the surrounding Forth interpreter's state
  swap execute ;

: recognize-colorforth-bw ( c-addr u -- )
  dup 0= if 2drop ['] notfound exit then
  over c@ prefix>index dup 0 > if 2drop drop ['] notfound exit then
  >r 1 /string recognizer1 r> ['] rectype-colorforth-bw ;

' recognize-colorforth-bw set-forth-recognize

This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [160] over the translators from [r1081].

BTW, I find the presence of both forth-recognize and forth-recognizer confusing, and would prefer to define forth-recognize as a deferred word. If you have to have getters and setters, call the getter get-forth-recognize.

In this approach, why do you need to write «[postpone _foo» instead of «]foo» ?

Nobody is suggesting that. But you need to perform xt-post in order to implement ]foo. In your implementation, you do it by reimplementing xt-post for the two recognizers you implement internally in recognize-colorforth-bw. If you used a detour through postpone instead, you would use the xt-post invoked in that way. And in my implementation above, xt-post is invoked directly.


[r1398] 2024-12-07 07:29:52 AntonErtl replies:

requestClarification - Return Stack Notation Mildly Inaccurate

You are correct, but originally failed to get your point across to me, and apparently to ruv, who addressed the issue that a loop-sys may have 0 items on the return stack. But yes, if n1|u1 = n2|u2, no loop-sys is pushed by the run-time semantics of ?do.
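
The behaviour under discussion can be observed with a small sketch in standard Forth. The names count-iterations, find-zero and sample are hypothetical and only serve as illustration. When limit and start are equal, ?DO skips the loop body, so the UNLOOP inside find-zero is never reached without a loop-sys being present.

```forth
\ Hypothetical helper names; only standard words are used.
: count-iterations ( limit start -- n )
  0 rot rot              \ ( 0 limit start )
  ?do 1+ loop ;

: find-zero ( addr u -- index | -1 )
  0 ?do
    dup i cells + @ 0= if
      drop i unloop exit \ a loop-sys exists here, so UNLOOP is valid
    then
  loop
  drop -1 ;

create sample 3 , 0 , 7 ,

5 2 count-iterations .   \ prints 3
5 5 count-iterations .   \ prints 0: the body never ran
sample 3 find-zero .     \ prints 1
sample 0 find-zero .     \ prints -1: ?DO skipped the body and UNLOOP
```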


[r1399] 2024-12-07 08:36:05 ruv replies:

requestClarification - Return Stack Notation Mildly Inaccurate

JimPeterson, I now see what you mean. My argument about size of loop-sys is irrelevant.

that there may not be loop control parameters to UNLOOP

This is impossible. UNLOOP can only be used in a loop body. And if ?DO Run-time semantics do not place loop-sys, the loop body does not gain control.

  program1 ( S: u.limit u.initial ) ( R: 0*x ) ?DO ( S: 0*x ) ( R: loop-sys ) program2 ( R: loop-sys ) LOOP ( R: 0*x )

By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Take a look:

program1 ( S: param-type.1 ) program2 ( S: param-type.2 ) program3

When we specify a stack diagram for program2, which is ( param-type.1 -- param-type.2 ), we indicate:

  • the input parameter type for program2, which should be provided by program1,
  • the output parameter type of program2, which is available for program3,
  • we can indicate a case when program3 does not gain control, but we have to indicate in prose who will gain control and what parameters will be passed.

For example:

: ?return-true ( -- ) ]] if true exit then [[ ; immediate
  • A correct stack diagram for ?return-true Run-time semantics:
    • ( x -- )
  • A narrower stack diagram for ?return-true Run-time semantics:
    • ( 0 -- | x\0 -- true never )
    • But this diagram does not tell us where the output parameter true is available (if any).

See also my post about the never data type.


NB: ( n1 | u1 n2 | u2 -- ) is incorrect due to excessive spaces, it's an editorial issue.


[r1400] 2024-12-07 15:46:16 ruv replies:

requestClarification - Return Stack Notation Mildly Inaccurate

  • A correct stack diagram for ?return-true Run-time semantics:
    • ( x -- )

It could be unclear why this diagram is correct if the ?return-true Run-time semantics do not return control in some cases.

The answer is that the data type never is a subtype of any other data type, and it is a subtype of 0*x.

( x -- )
( x -- 0*x )
( x -- 0*x|never )
( x -- | x -- never )

A stack diagram by itself does not guarantee that a word returns control. It only guarantees that if the word returns control, it returns a parameter of specified type.
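
As an illustration of this point, here is a minimal sketch using the standard CATCH and THROW. The word must-be-zero is a hypothetical example, not something from the thread. Its diagram is ( x -- ), yet it returns control to the following code only when x is zero.

```forth
\ Hypothetical example word: returns only for x = 0.
: must-be-zero ( x -- )
  if -24 throw then ;    \ for non-zero x, control leaves via THROW

0 ' must-be-zero catch .       \ prints 0: the word returned normally
1 ' must-be-zero catch nip .   \ prints -24: the word never returned to its caller
```

After a caught THROW, the cell under the throw code is unspecified, which is why the second line uses NIP before printing.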


[r1401] 2024-12-08 10:46:00 AntonErtl replies:

requestClarification - Return Stack Notation Mildly Inaccurate

By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Counterexample: THROW


[r1402] 2024-12-08 18:25:20 ruv replies:

requestClarification - Return Stack Notation Mildly Inaccurate

Anton, thank you for this example. Yes, I'm wrong that it "would be incorrect". Formally, it is still correct, because specifying a wider data type than possible does not introduce contradictions. But it does introduce confusion. I think that data types should be specified as narrowly as possible.

The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other. This holds for the word throw because ( i*x n ) is never returned to the caller.

But specifying two data types in the union so that the members of one are sometimes returned and the members of the other are never returned is useless and just confusing.

A better diagram for throw is ( k*x 0 -- k*x | k*x n1\0 -- i*x n1 never ). This diagram explicitly says that the output parameter of the type ( i*x n1 ) (where the value that corresponds to n1 is the same in the input and in the output parameter) is not available to the caller.

A better diagram for ?do Run-time semantics is ( n1 n2 -- ; R: -- loop-sys ; | u1 u2 -- ; R: -- loop-sys ; | x1 x1 -- never ; ). This diagram specifies:

  • an ambiguous condition exists if the input parameter has type neither ( n n ) nor ( u u ),
  • if the first input parameter and the second input parameter are identical, then the loop body may not gain control.

[r1403] 2024-12-08 19:17:59 ruv replies:

requestClarification - Return Stack Notation Mildly Inaccurate

By "a wider data type than possible" I mean a data type that has more members than another suitable data type. How best to phrase that?


[r1404] 2024-12-08 20:45:55 ruv replies:

requestClarification - Return Stack Notation Mildly Inaccurate

The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other.

Actually, this always holds.

( k*x n -- k*x | i*x n )
( k*x n -- k*x | k*x n -- i*x n )

To prove that neither type from the union is a subtype of the other, it is enough to give two examples, one of which is a member of only ( k*x n -- k*x ) (in the union), and the other is a member of only ( k*x n -- i*x n ) (in the union).

Here are such examples:

  • the mapping ( 0 ↦ ) is a member of only ( k*x n -- k*x ) (in the union)
  • the mapping ( 1 ↦ 1 ) is a member of only ( k*x n -- i*x n ) (in the union)

There are also members that belong to both. For example, the following mappings:

  • ( 0 0 ↦ 0 )
  • ( 1 1 ↦ 1 )
  • ( 123 0 0 ↦ 123 0 )

[r1405] 2024-12-08 22:59:20 ruv replies:

proposal - minimalistic core API for recognizers

@AntonErtl writes:

Ruv's colorforth-bw implementation demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing recognize-colorforth-bw; instead, it reimplements everything that the name recognizer and the number recognizer already do internally,

It's wrong. Have a look at L18-L19:

  \ Reuse a recognizer for numbers
  ['] recognize-number-n-prefixed apply-recognizer-cf dup 0= if exit then

It uses the recognizer for numbers. And it uses find-name instead of the recognizer for names (Forth words) just because it's simpler in this case. It does not reuse token translators.

And it only implements dealing with names and single-cell numbers.

Because your original example implemented only that. And I just rewrote your original example.

Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github.

Why count 10 lines of comments at the beginning of the file? Without comments, 31 lines, the same as in your example (lexical size is greater due to nt vs xt, and improvements in the behavior).

By contrast, let's take much of the proposal from [r1081], but replace the state-dependent translators with the state-independent rectypes of [160]. With such a proposal, colorforth-bw might look as follows (untested):

[...]

This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [160] over the translators from [r1081].

(I corrected the [r1081] link in the citation above)

This comparison is incorrect. Below is an implementation against the latest API version (except compile-postpone-qtoken, which is a variation of the discussed postpone-qtoken and should be either present or implementable in any variant of the API):

: cf-prefix>tt? ( c -- tt true | c false )
  case
    '[' of ['] execute-interpreting endof
    '_' of ['] execute-compiling endof
    ']' of ['] compile-postpone-qtoken endof
    0 exit
  endcase true
;

defer recognize-default  perceptor is recognize-default

: recognize-colorforth-bw ( sd.lexeme -- qt|0 )
  dup 0= if nip exit then
  over c@ cf-prefix>tt? 0= if drop 2drop 0 exit then
  >r 1 /string recognize-default dup if r> exit then rdrop
;

16 lines.

Can be tested in Gforth too:

gforth index.fth example/recognize-colorforth-bw.fth

:noname cf( _1. _drop _s" foo" ) ; execute s" foo" compare 0=  .s \ prints "1 -1"

[r1406] 2024-12-09 07:07:25 AntonErtl replies:

requestClarification - Return Stack Notation Mildly Inaccurate

It is obvious that the idea behind the stack diagram of throw is to specify what happens on the data stack in both cases, including the case where the control flow does not continue sequentially. And I think it's a good idea to specify the stack effect for that case, and it should also be done for ?do.

Whether the | syntax as used for throw is good enough or whether we should have separate stack diagrams for the two cases is something one might discuss. However, I have not seen confused questions about what the stack effect of throw means, and I would not expect confusion from a similar usage of | for the stack effect of ?do. In both cases the prose makes it clear enough which part of the stack effect diagram corresponds to which case. Actually for the ?do case it has taken 3 decades until someone asked about the lack of a stack-effect diagram for the case when index=limit.


[r1407] 2024-12-09 07:25:04 AntonErtl replies:

proposal - minimalistic core API for recognizers

The latest proposal is [r1081] and it does not contain execute-interpreting, execute-compiling, compile-postpone-qtoken, or perceptor. And that's what we were tasked with discussing and giving feedback on. And that's what I did.


[r1408] 2024-12-09 09:38:01 ruv replies:

proposal - minimalistic core API for recognizers

The latest proposal is [r1081] and it does not contain execute-interpreting, execute-compiling, compile-postpone-qtoken, or perceptor. And that's what we were tasked with discussing and giving feedback on. And that's what I did.

I see, thank you. Actually, [r1081] is outdated, a new version will be prepared soon and then it should be discussed (was noted in the recognizer chat). Nevertheless, my example implementation for recognize-colorforth-bw above is compatible with [r1081] with the following exceptions: it relies on 0 instead of NOTFOUND (you should note how it makes things simpler), and it uses the method compile-postpone-qtoken that appends the compilation semantics of a qualified token to the current definition (this method is missing in [r1081]). The word perceptor is simply a better name than forth-recognizer in [r1081] (I just posted in ForthHub/fep-recognizer a rationale from the chat).

The words execute-interpreting and execute-compiling are general words that are needed anyway to perform interpretation or compilation semantics regardless of the initial STATE. They can be implemented in standard Forth as:

: compilation ( comp: true ; S: -- true ; | comp: false ; S: -- false ; )  state @ 0<> ;
: enter-compilation ( comp: false -- true ; S: -- ; | comp: true  ; S: -- ; )  ] ;
: leave-compilation ( comp: true -- false ; S: -- ; | comp: false ; S: -- ; )  postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation
;
: execute-compiling ( i*x xt -- j*x )
  compilation if execute exit then
  enter-compilation execute leave-compilation
;
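
A small usage sketch may make the effect visible. It repeats three of the definitions above so that it runs on its own; the words probe-plain, probe-interp, t1 and t2 are hypothetical names introduced only for this illustration. Both probes run while a definition is being compiled, but the second one executes the STATE query through execute-interpreting.

```forth
\ Repeated from the post above so the sketch is self-contained.
: compilation ( -- flag ) state @ 0<> ;
: enter-compilation ( -- ) ] ;
: leave-compilation ( -- ) postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation ;

\ Hypothetical probes: each compiles the flag it observed as a literal.
: probe-plain ( -- )
  compilation postpone literal ; immediate
: probe-interp ( -- )
  ['] compilation execute-interpreting postpone literal ; immediate

: t1 ( -- flag ) probe-plain ;    \ the probe saw compilation state
: t2 ( -- flag ) probe-interp ;   \ the probe saw interpretation state

t1 .   \ prints -1
t2 .   \ prints 0
```

Compilation state is restored after the probe runs, which is why the rest of t2 still compiles normally.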

[r1409] 2024-12-09 10:00:24 ruv replies:

proposal - minimalistic core API for recognizers

@AntonErtl writes:

If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state.

Token translators cannot be written without having to deal with state (possibly indirectly), by the definition of the term. A token translator shall perform different actions depending on the state, and it does not matter how the state is passed to the translator: through the data stack, through a separate stack intended for this purpose, or through an internal variable. The state does not matter in only one case: if the translator shall perform the same action regardless of the state.

Moreover, if you pass a parameter that encodes compilation state or interpretation state not through STATE, you have to keep STATE in sync with this parameter to guarantee that STATE-dependent words are translated correctly.


[r1410] 2024-12-10 15:17:03 ruv replies:

requestClarification - The case of undefined interpretation semantics

The initial problem in [defined] was fixed by the proposal Remove the “rules of FIND”. Close.


[r1411] 2024-12-10 15:17:09 ruv replies:

proposal - Remove the “rules of FIND”

15.6.2.2534 [UNDEFINED] should also be updated according to the new wording in [DEFINED].


[r1412] 2024-12-15 22:00:00 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Minimalistic Recognizer API

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
  • 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
  • 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
  • 2023-09-13 Make clear that TRANSLATE: is the only way to define a standard-conforming translator.
  • 2023-09-15 Add list of example recognizers and their names.
  • 2024-12-15 Take comments after freezing the proposal into account

Problem

The Forth compiler can be extended easily. The Forth interpreter however has a fixed set of capabilities as outlined in section 3.4 of the standard text: Words from the dictionary and some number formats.

It's not possible to use the Forth text interpreter in an application or system extension context. Most interpreters in existing systems use a number of hooks to extend the interpreter. That makes it possible to use a loadable library to implement new data types to be handled like the built-in ones. An example is floating point numbers: they have their own parsing and data handling words, including a stack of their own.

Furthermore applications need to use system provided and system specific words or have to re-invent the wheel to get numbers with a sign or hex numbers with the $ prefix. The building blocks (FIND, COMPILE,, >NUMBER etc) are available but there is a gap between them and what the Forth interpreter already does.

The Forth interpreter is stateful, but the API should avoid the problems of the STATE variable. In particular, an implementation without STATE should be possible, and there is only one place where the stateful dispatch is necessary.

Solution

The monolithic design of the Forth interpreter is factored into three major blocks:

  1. The interpreter. It extracts sub-strings (lexemes) from SOURCE, hands them over to the data parsing and processes the results.

  2. The actual data parsing. It checks whether lexemes match the criteria for a certain token type. The words that do this, called recognizers, can be grouped to achieve an order of invocation.

  3. The result of the recognizer, a translator and associated data, is handed over to the interpreter.

There is no strict 1:1 relation between a recognizer and the returned translator. A translator for e.g. single cell numbers can be used by different recognizers, a recognizer can return different translators (e.g. single and double cell numbers).

Whenever the Forth text interpreter is mentioned, the standard words EVALUATE (CORE), ' (tick, CORE), INCLUDE-FILE (FILE), INCLUDED (FILE), LOAD (BLOCK) and THRU (BLOCK) are expected to act likewise. This proposal does not aim to change these words, but to provide the tools to do so. As long as the standard feature set is used, a complete replacement with recognizers is possible.

Important changes to the Matthias Trute proposal:

  • Make the translators executable to dispatch according to the state (interpreting, compiling, postponing) themselves
  • Use dedicated invocation methods to call a translator for a particular state
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating any particular implementation

The core principle is that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: recognize-xt ( addr u -- translator-stub | 0 )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] noop
  THEN ;

then you should factor the part starting with STATE @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: recognize-xt ( addr u -- ... translator | 0 )
  here place  here find dup IF  [']  translate-xt  THEN ;

In a second step, you need to remove the STATE @ entirely and use TRANSLATE:. If you don't know what to do on postpone in this stage, use -48 throw, otherwise define a postpone action:

:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF  compile,  ELSE  execute  THEN ;
:noname ( xt flag -- ) 0< IF  postpone literal postpone compile,  ELSE  compile,  THEN ;
translate: translate-xt

Typical use

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize ?found execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.
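
The loop can be tried out with a minimal sketch in standard Forth, assuming the NOTFOUND=0 option. All names ending in -demo are hypothetical stand-ins and are not part of the proposal; the recognizer only accepts unsigned digits in the current BASE, and its translator only covers interpretation.

```forth
\ Hypothetical stand-ins; none of these names come from the proposal.
: translate-num-demo ( x -- x ) ;        \ interpretation only: keep x

: rec-num-demo ( addr u -- x xt | 0 )    \ unsigned digits in the current BASE
  0 0 2swap >number                      \ ( ud addr2 u2 )
  nip if 2drop 0 exit then               \ leftover characters: not a number
  drop ['] translate-num-demo ;          \ keep the low cell, return the translator

defer forth-recognize-demo
' rec-num-demo is forth-recognize-demo

: ?found-demo ( xt -- xt | 0 -- never )
  dup 0= if -13 throw then ;

: interpret-demo ( i*x -- j*x )
  begin parse-name dup while
    forth-recognize-demo ?found-demo execute
  repeat 2drop ;

\ The system interpreter runs interpret-demo, which then consumes
\ the rest of the evaluated string with the demo recognizer.
s" interpret-demo 10 20 30" evaluate     \ leaves 10 20 30 on the stack
+ + .                                    \ prints 60
```

A lexeme the demo recognizer rejects, such as a word name, ends in -13 THROW, which matches the role of ?FOUND described in the proposal.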

To operate a recognizer in a particular state, e.g. to postpone a single word, do

: postpone ( "name" -- )
  parse-name forth-recognize ?found postponing ; immediate

To obtain an xt for a name, use something like this:

: ' ( "name" -- xt )
  parse-name forth-recognize ?found
  ['] translate-nt <> #-32 and throw
  name>interpret ;

Proposal:

XY. The optional Recognizer Wordset

XY.1 Introduction

Recognizers have the form

REC-SOMETYPE ( addr len -- i*x j*r translate-xt | 0/NOTFOUND )

A recognizer takes the string addr len of a lexeme and on success returns a translator translate-xt and additional data on the data and floating point stack.

[IF] NOTFOUND=0

If it fails, it returns 0.

[ELSE] NOTFOUND=xt

If it fails, it returns the xt of NOTFOUND.

For clarity, unless this issue is decided, the non-success return value of a recognizer is notated as 0/NOTFOUND. The reference implementation uses the option 0.

[THEN] notfound

[IF] side-effect

A recognizer shall not have a side effect.

Rationale: Side effects are supposed to all happen inside the translators. This promise makes it possible to try to recognize something and fail if the result is not desired, without having to roll back unknown changes. Examples: The tick and TO recognizers pass a substring of the string to be translated to FORTH-RECOGNIZE, and fail if the result is not a name type.

[THEN] side-effect

XY.3 Additional usage requirements

XY.3.1 Translator

translator: named subtype of xt, and executes with the following stack effect:

name ( j*x i*x -- k*x )

A translator xt interprets, compiles or postpones the action of the thing according to the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the recognized lexeme.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | 0/NOTFOUND ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or 0/NOTFOUND if not.

[IF] defer

FORTH-RECOGNIZE is a deferred word. Changing the system recognizer can be done with IS FORTH-RECOGNIZE, obtaining the system recognizer with ACTION-OF FORTH-RECOGNIZE.

Rationale: use the existing API to change it; most simple systems have this available, and advanced systems have capabilities to work around limitations.

[ELSE] setter and getter

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale: not sufficiently advanced systems can work around the limitations of IS and ACTION-OF better with this API.

[THEN]

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER

Create a translator word under the name "name". This word is the only standard way to define a general purpose translator.

"name:" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current state.

Rationale: The by far most common usage of translators is inside the outer interpreter, and this default mode of operation is called by EXECUTE to keep the API small. You can not simply set STATE, use EXECUTE and afterwards restore STATE to perform interpretation or compilation semantics, because words can change STATE, so you need the words INTERPRETING and COMPILING defined below. This problem does not apply to POSTPONING, so systems that only want to implement direct access to POSTPONE mode can get away without TRANSLATE:.

[IF] NOTFOUND=0

?FOUND ( translator-xt -- translator-xt | 0 -- never ) RECOGNIZER

Check if the recognizer was successful, and if not, perform a -13 THROW or display an appropriate error message if the exception wordset is not present.

[THEN] NOTFOUND=0

XY.6.2 Recognizer Extension Words

[IF] NOTFOUND=0

?NOTFOUND ( translator-xt -- translator-xt | 0 -- addr u notfound-xt )

Check if the recognizer was successful. If not, replace the 0 result with the addr u of the last scanned lexeme, and put the xt of the NOTFOUND translator on top of the stack.

NOTFOUND ( -- never ) RECOGNIZER

Translator for unsuccessful recognizers: perform a -13 THROW.

[THEN] NOTFOUND=0

POSTPONE ( "<spaces>lexeme" -- ) RECOGNIZER

Compilation: recognize lexeme. On success, perform the postpone action of the returned translator, otherwise -13 THROW or display the appropriate error message if the exception wordset is not present.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn on stack and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence from xt-seq as xt1 .. xtn n.

TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT

Translates a name token:

Interpretation: perform the interpretation semantics of the word

Compilation: perform the compilation semantics of the word

Postpone: append the compilation semantics above to the current definition

REC-NT ( addr u -- nt translate-nt | 0/NOTFOUND ) RECOGNIZER EXT

Search the dictionary for the string addr u. If successful, return the nt and the xt of TRANSLATE-NT. If the search fails, return 0/NOTFOUND.

TRANSLATE-NUM ( x -- x | ) RECOGNIZER EXT

Translates a number:

Interpretation: keep the number on the stack

Compilation: Append the run-time defined in LITERAL to the current definition

Postpone: Append the compilation semantics above to the current definition

TRANSLATE-DNUM ( x1 x2 -- x1 x2 | ) RECOGNIZER EXT

Translates a double number:

Interpretation: keep the numbers on the stack

Compilation: Append the run-time defined in 2LITERAL to the current definition

Postpone: Append the compilation semantics above to the current definition

REC-NUM ( addr u -- x translate-num | xd translate-dnum | 0/NOTFOUND ) RECOGNIZER EXT

Convert addr u to a number x and the xt of TRANSLATE-NUM as specified in 3.4.1.3 or a double number xd and the xt of TRANSLATE-DNUM as specified in 8.3.1 if the double number wordset is available. If the conversion fails, return 0/NOTFOUND.

TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT

Translates a floating point number:

Interpretation: Keep r on the stack

Compilation: Append the run-time defined in FLITERAL to the current definition

Postpone: Append the compilation semantics above to the current definition

REC-FLOAT ( addr u -- r translate-float | 0/NOTFOUND ) RECOGNIZER EXT

Convert addr u to a number r as specified in 12.3.7 if the floating-point word set is available; if the conversion fails, return 0/NOTFOUND.

SCAN-TRANSLATE-STRING ( addr1 u1 string-rest<"> -- addr2 u2 | ) RECOGNIZER EXT

Complete parsing a string: addr1 u1 consists of the starting quote and additional characters up to the first space in the string. addr2 u2 consists of the entire string without the starting quote up to (but not including) the final quote, with the escape sequences translated according to the rules of S\". >IN is modified appropriately, and points just after the final quote. If there's no final quote in the current line, REFILL can be used to read in more lines, adding corresponding newlines into the string. The final quote can be inside addr1 u1, setting >IN backwards in that case.

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in SLITERAL to the current definition

Postpone: Append the compilation semantics stated above to the current definition

TRANSLATE-STRING ( addr1 u1 -- addr1 u1 | ) RECOGNIZER EXT

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in SLITERAL to the current definition

Postpone: Append the compilation semantics stated above to the current definition

?SCAN-STRING ( addr1 u1 scan-translate-string string-rest<"> -- addr2 u2 translate-string | ... translator -- ... translator ) RECOGNIZER

If the recognized token is an incomplete string, complete the scanning as defined for SCAN-TRANSLATE-STRING and replace the translator with the xt of TRANSLATE-STRING.

REC-STRING ( addr u -- addr u translate-string | 0/NOTFOUND ) RECOGNIZER EXT

Check if addr u starts with a quote, and return that string and the xt of SCAN-TRANSLATE-STRING if it does, 0/NOTFOUND otherwise.

[IF] Optional API for direct access of translator states

INTERPRETING ( j*x xt -- k*x ) RECOGNIZER EXT

Execute xt-int of the translator xt. If xt is not a translator, do -21 THROW, or a best-effort attempt to execute xt in interpreting state.

COMPILING ( j*x xt -- ) RECOGNIZER EXT

Execute xt-comp of the translator xt. If xt is not a translator, do -21 THROW, or a best-effort attempt to execute xt in compiling state.

POSTPONING ( j*x xt -- ) RECOGNIZER EXT

Execute xt-post of the translator xt. If xt is not a translator, do -21 THROW, or a best-effort attempt to execute xt in postponing state.

GET-STATE ( -- xt ) RECOGNIZER EXT

Obtain the operation xt performed when translating.

SET-STATE ( xt -- ) RECOGNIZER EXT

Makes xt the operation performed when translating. If xt is not related to ' INTERPRETING, ' COMPILING, or ' POSTPONING, do -12 THROW.

[THEN] optional API for direct access of translator states

]] ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: Set the system into postpone state. The interpreter will then perform xt-post of all translators found. Compilation state resumes when [[ is recognized. This word may change STATE and the recognizer sequence to reflect the change of this state.

[[ ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: undefined

Postpone semantics: enter compilation state, see ]; all changes to STATE and recognizer sequence done by ]] are reverted.

Note: [[ needs special treatment in postpone mode, so it might also use a non-standard translator and not be a word at all.

STATE ( -- addr ) RECOGNIZER

If ]] uses STATE to store the postpone state, this extends the semantics of 6.1.2250 by adding a second non-zero value. ]] enters this state, and [[ leaves it. Only translators and the code responsible for displaying the prompt can see this third state, as all other words are postponed in this state.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. It takes only interpret and compile state into account and uses the STATE variable to distinguish them. It uses NOTFOUND=0.

Defer forth-recognize ( addr u -- i*x translator-xt / 0 )
: ?found ( translator -- translator  |  0 -- never )
  dup 0= IF  -13 throw  THEN ;
: interpret ( i*x -- j*x )
  BEGIN
      parse-name dup  WHILE
      forth-recognize ?found execute
  REPEAT ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

An alternative implementation for TRANSLATE: can use a deferred word:

Defer do-translate
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , , does> do-translate ;
: set-state ( xt -- ) dup is do-translate  >body @ 2 - state ! ;
: get-state ( -- xt ) action-of do-translate ;

Extensions reference implementation:

: ]] -2 state ! ; immediate
: [[ -1 state ! ; immediate
: lit,  ( n -- )  postpone literal ; \ defined first, it is used by the translators below
:noname name>interpret execute ;
:noname name>compile execute ;
:noname dup name>interpret ['] [[ =
  IF    name>interpret execute \ special case
  ELSE  name>compile swap lit, compile,  THEN ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )

: rec-nt ( addr u -- nt nt-translator | 0 )
  forth-wordlist find-name-in dup IF  ['] translate-nt  THEN ;
: rec-num ( addr u -- n num-translator | 0 )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop 0  THEN ;

: minimal-recognize ( addr u -- nt nt-translator | n num-translator | 0 )
  2>r 2r@ rec-nt dup 0= IF  drop 2r@ rec-num  THEN  2rdrop ; \ NOTFOUND is 0 here

' minimal-recognize is forth-recognize

: translate-method: ( n "name" -- )
  Create , DOES> @ cells swap >body + @ execute ;
0 translate-method: postponing
1 translate-method: compiling
2 translate-method: interpreting

: set-state ( xt -- )
  >body @ 2 - state ! ;
: get-state ( -- xt )
  case state @
      0  of ['] interpreting  endof
      -1 of ['] compiling     endof
      -2 of ['] postponing    endof
  -11 throw
  endcase ;

: postpone ( "name" -- )
  parse-name forth-recognize ?found postponing ; immediate

This reference implementation uses a table dispatch only. Note that this can give surprising results when you directly apply a particular state, and one of the words executed (translator or nt/xt found) is a state-smart word. If you want to use combined translators, like

: translate-dnum ( d -- ) >r translate-num r> translate-num ;

you can't do it like this. Neither does this work if you execute state-smart words, as they expect STATE to be set accordingly. Instead, you'll use something like

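: translate-method: ( state "name" -- ) \ state is the STATE value this method selects
  Create ,
  DOES> ( j*x xt body -- k*x )
    @ dup state @ = IF  drop execute EXIT  THEN \ already in that state
    state @ >r  state !  execute  r> state ! ;  \ switch, execute, restore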

This will definitely work for combined literal translators, because those don't change state anyways.

This will also work for POSTPONE, because apart from the translator, no word is actually executed in one-shot POSTPONE, and therefore, no state change is possible.

This will also work for [ and ] (and words using them) while interpreting and compiling: if you are already in the state that the method selects, the state is neither saved nor restored, so the change made by the word persists. If you are in the state the word changes to, this will work, too, because that state is restored after EXECUTE. This will not work if you are interpreting and you do a s" ]]" forth-recognize ?found compiling, because ]] transitions to postponing, and the state is then reverted to interpreting.
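The save-and-restore behaviour described above can be tried in isolation. The following is a minimal sketch, not part of the proposal: it repeats TRANSLATE: from the core reference implementation, assumes that each method is created with the STATE value it selects (0, -1, -2) rather than a table index, and assumes the system allows a program to store to STATE, as the reference implementation already does. The name demo-translator is hypothetical.

```forth
\ Table-dispatch translator, as in the core reference implementation
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

\ State-switching method; the stored value is the STATE value to select
: translate-method: ( state "name" -- )
  create ,
  does> ( j*x xt body -- k*x )
    @ dup state @ = IF  drop execute EXIT  THEN
    state @ >r  state !  execute  r> state ! ;

 0 translate-method: interpreting
-1 translate-method: compiling
-2 translate-method: postponing

\ Hypothetical translator whose three actions just leave a marker value
:noname 1 ; :noname 2 ; :noname 3 ; translate: demo-translator

' demo-translator compiling   \ runs the xt-comp action, leaving 2
state @                       \ STATE is back to 0 (interpreting)
```

Because the translator still dispatches on STATE, a combined translator built from several such translators sees the same state in every part, which is the property the table-only dispatch lacks.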

[IF] setter and getter

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: forth-recognizer ( -- xt )
  action-of forth-recognize ;

[THEN] setter and getter

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;
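A short round trip through the stack library, as a sketch only. The three words are repeated from the listing above so that the block loads on its own. CELL and BOUNDS are not standard words, so the block defines them when the host system lacks them; that fallback and the name demo-stack are assumptions made for this example.

```forth
\ Fallbacks for the two non-standard helpers used by the library
[UNDEFINED] CELL   [IF]  1 CELLS CONSTANT CELL  [THEN]
[UNDEFINED] BOUNDS [IF]  : BOUNDS ( addr u -- end start ) OVER + SWAP ;  [THEN]

\ Stack library, repeated from the listing above
: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;
: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

4 STACK: demo-stack                \ hypothetical stack with room for 4 items
10 20 30 3 demo-stack SET-STACK    \ first cell holds the count, items follow
demo-stack GET-STACK               \ leaves 10 20 30 3 again
```

SET-STACK does not check n against the size given to STACK:, so the caller has to keep n within the allotted space.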

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | 0 )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP 0
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
  ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
  ?defer@ >body get-stack ;

Once you have recognizer sequences, define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute dup IF  drop  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Recognizer examples

Apart from the standardized recognizers above, here are some more examples of recognizers:

REC-TICK ( addr u -- xt translate-num | 0/NOTFOUND ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.

REC-SCOPE ( addr u -- nt translate-nt | 0/NOTFOUND ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order). The string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched, REC-SCOPE is identical in effect to REC-NT.

REC-TO ( addr u -- xt n translate-to | 0/NOTFOUND ) Handle the following syntax of TO-like operations of value-like words:

  • ->name as TO name
  • =>name as IS name
  • +>name as +TO name
  • '>name as ADDR name
  • @>name as ACTION-OF name

xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.
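The token-splitting step of such a recognizer might look like the sketch below. It only separates the two-character prefix from the name and maps the prefix to an index; looking up the name and returning translate-to is left out. The words prefix-index and to-prefix# are hypothetical, and numbering the operations 0 to 4 in the order of the list above is an assumption, since the text does not fix the values of n.

```forth
\ Map the first prefix character to an operation index.
\ Assumption: indices 0..4 follow the order  ->  =>  +>  '>  @>
: prefix-index ( c -- n true | false )
  5 0 DO
    dup s" -=+'@" drop i chars + c@ = IF
      drop i true UNLOOP EXIT
    THEN
  LOOP
  drop false ;

\ Split a token such as "+>name" into the name and the operation index.
: to-prefix# ( addr u -- addr' u' n true | addr u false )
  dup 3 < IF  false EXIT  THEN                    \ prefix plus at least one char
  over char+ c@ [char] > <> IF  false EXIT  THEN  \ second char must be >
  over c@ prefix-index 0= IF  false EXIT  THEN    \ first char must be a known prefix
  >r 2 /string r> true ;

s" +>counter" to-prefix#   \ leaves the string "counter", the index 2, and true
```

A complete REC-TO would continue from here by searching for the name, checking that it is a value-like word, and returning its xt together with n and the translator.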

REC-ENV ( addr u -- addr1 u1 translate-env | 0/NOTFOUND ) Takes a pattern in the form of ${name} and provides the name as addr1 u1 on the stack. The corresponding translator TRANSLATE-ENV is responsible for looking up that name in the operating system's environment variable array, or compiling appropriate code to do so.

REC-COMPLEX ( addr u -- rr ri translate-complex | 0/NOTFOUND ) Converts a pair of floating point numbers in the form of float1+float2i into a complex number on the stack, and returns the xt of TRANSLATE-COMPLEX on success.

Testing

T{ 0 recognizer-sequence: RS -> }T

T{ :noname 1 ;  :noname 2 ;  :noname 3  ; translate: translate-1 -> }T
T{ :noname 10 ; :noname 20 ; :noname 30 ; translate: translate-2 -> }T

\ really stupid: 1 character length or 2 characters
T{ : rec-1 NIP 1 = IF ['] translate-1 ELSE 0 THEN ; -> }T
T{ : rec-2 NIP 2 = IF ['] translate-2 ELSE 0 THEN ; -> }T

T{ ' translate-1 interpreting  -> 1 }T
T{ ' translate-1 compiling     -> 2 }T
T{ ' translate-1 postponing    -> 3 }T

\ set and get methods
T{ 0 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> 0 }T

T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 1 }T

T{ ' rec-1 ' rec-2 2 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 ' rec-2 2 }T

\ testing RECOGNIZE
T{         0 ' RS set-recognizer-sequence -> }T
T{ S" 1"     RS   -> 0 }T
T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ S" 1"     RS   -> ' translate-1 }T
T{ S" 10"    RS   -> 0 }T
T{ ' rec-2 ' rec-1 2 ' RS set-recognizer-sequence -> }T
T{ S" 10"    RS   -> ' translate-2 }T