,---------------.
| Contributions |
`---------------´

,------------------------------------------
| 2023-09-13 12:09:40 LeonWagner wrote:
| proposal - Fix stack comments for N>R and NR>
| see: https://forth-standard.org/proposals/fix-stack-comments-for-n-r-and-nr-#contribution-307
`------------------------------------------
## Author:

Leon Wagner

## Change Log

2023-09-13 Initial proposal

## Problem:

The stack comments for N>R and NR> don't make it clear that _n_ items are moved between the data and return stacks.

## Solution:

The stack comments should more clearly indicate that _n_ data stack items are moved to or from the return stack.

## Proposal:

Change the stack comments for 15.6.2.1908 N>R to _`( n*x +n -- ) ( R: -- n*x +n )`_ and 15.6.2.1940 NR> to _`( -- n*x +n ) ( R: n*x +n -- )`_

,---------.
| Replies |
`---------´

,------------------------------------------
| 2023-09-11 20:47:25 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1035
`------------------------------------------
I removed the access words to the xts for a reason: We don't do it that way in Gforth, and we actually found little use of those words.

There are (at least) three ways to create an interface to these operations:

1. Field-like, i.e. you can read and write the xts for interpretation/compilation/postpone. The typical usage is ( translator ) ``TRANSLATE-``*\* ``@ EXECUTE``.
2. Valuefield-like, as in the original Trute proposal. You can read the xts for interpretation/compilation/postpone without an extra ``@``, but you can't write them, unless your access word also implements ``TO``. The typical usage is ( translator ) ``TRANSLATE-``*\* ``EXECUTE``.
3. Deferfield-like, which is what Gforth does. Here, not only the ``@`` is part of the operation, but also the ``EXECUTE`` (really a tail-call variant of it).
You can neither read nor write the xts, unless the access words also implement ``IS`` and ``ACTION-OF``. The typical usage is ( translator ) ``TRANSLATE-``*\*, and that looks about right. Gforth uses different names to not collide with the proposal here.

As an extension, Gforth allows adding more states and thus more access words. That extension also adds ``IS``, ``TO`` (which are synonyms) and ``ACTION-OF`` to the existing access word (there is only one, because only the postpone state needs an explicit one), and also implements the other two for interpret and compile, which are never used on their own. Of course when you add a new state, you need to specify what existing translators do on that state, so ``IS`` becomes necessary, and ``ACTION-OF`` just comes for free through Gforth's way of implementing ``TO`` and variants, of which ``ACTION-OF`` is one. This extension is non-standard and not proposed here; it is used for creating obscured (“tokenized”) source code and reading name=value-style config files.

The experience so far is that outside of this extension, only one of those three access words is needed at all, which is ``TRANSLATE-POSTPONE``, and it is exclusively needed inside the standard word ``POSTPONE`` itself, a word where the implementation is left up to the system anyway. So the usage of these words is extremely limited.

Therefore, I deleted them and suggest not to standardize these words, following the “don't speculate” rule and the topic of this proposal to make a minimalistic API, which contains only what's necessary. These words are of little use, and therefore there's no need to standardize them.
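The three interface styles can be contrasted with a small sketch. It assumes, purely for illustration, that a translator is the address of a three-cell table holding the interpretation, compilation and postpone xts in that order. The names `post-field`, `post-value`, `post-defer` and `demo-translator` are hypothetical; they are neither part of the proposal nor Gforth's actual words.

```forth
\ Assumption: a translator is the address of three cells: xt-int, xt-comp, xt-post.
\ All names below are hypothetical and only illustrate the three styles.

\ 1. Field-like: returns the address of the cell, so the xt can be read and written.
: post-field ( translator -- addr )  2 cells + ;

\ 2. Valuefield-like: returns the xt itself; writing would need TO support.
: post-value ( translator -- xt )  2 cells + @ ;

\ 3. Deferfield-like: fetches and executes the xt in one step.
: post-defer ( i*x translator -- j*x )  2 cells + @ execute ;

\ A demo translator whose three actions just push a marker number.
:noname 1 ;  :noname 2 ;  :noname 3 ;   ( xt-int xt-comp xt-post )
create demo-translator  rot , swap , ,

demo-translator post-field @ execute .   \ field-like usage
demo-translator post-value execute .     \ valuefield-like usage
demo-translator post-defer .             \ deferfield-like usage
```

All three usage lines select the same postpone action; the difference is only how much of `@ EXECUTE` is folded into the access word, and therefore how much of the table layout the caller gets to see.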

,------------------------------------------
| 2023-09-11 20:58:32 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1036
`------------------------------------------
## Author:

Bernd Paysan

## Change Log:

* 2020-09-06 initial version
* 2020-09-08 taking ruv's approach and vocabulary at translators
* 2020-09-08 replace the remaining rectypes with translators
* 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
* 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
* 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
* 2022-09-10 More complete reference implementation
* 2022-09-10 Add use of extended words in reference implementation
* 2022-09-10 Typo fixed
* 2022-09-12 Fix for search order reference implementation
* 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
* 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
* 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM

## Problem:

The current recognizer proposal has received a fair amount of criticism. One point is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more fancy stuff to be implemented as extensions.

The problem this proposal tries to solve is the same as with the original recognizer proposal. This proposal is therefore not a full proposal, but sketches some changes to the original proposal.

## Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

* Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
* Make the recognizer sequence executable with the same effect as a recognizer
* Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

    : rec-xt ( addr u -- translator )
      here place here find dup IF
        0< state @ and IF compile, ELSE execute THEN
        ['] drop
      ELSE drop ['] notfound THEN ;

then you should factor the part starting with state @ out and return it as translator:

    : translate-xt ( xt flag -- )
      0< state @ and IF compile, ELSE execute THEN ;
    : rec-xt ( addr u -- ... translator )
      here place here find dup IF ['] translate-xt
      ELSE drop ['] notfound THEN ;

In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unclear about what to do on postpone in this stage, use ``-48 throw``, otherwise define a postpone action:

    :noname ( xt flag -- ) drop execute ;
    :noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
    :noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
    translate: translate-xt

The standard interpreter loop should look like this:

    : interpret ( i*x -- j*x )
      BEGIN parse-name dup WHILE forth-recognize execute REPEAT
      2drop ;

with the usual additions to check e.g. for empty stacks and such.

## Typical use

TBD

## Proposal:

XY.
The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for `NOTFOUND`):

    REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

# XY.3 Additional usage requirements

## XY.3.1 Translator

**translator:** subtype of xt, and executes with the following stack effect:

*TRANSLATE-THING* ( j\*x i\*x -- k\*x )

A translator xt that interprets, compiles or postpones the action of the thing according to the state the system is in. `i*x` is the additional information provided by the recognizer, `j*x` and `k*x` are the stack inputs and outputs of interpreting/compiling or postponing the thing.

# XY.6 Glossary

## XY.6.1 Recognizer Words

**FORTH-RECOGNIZE** ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or `NOTFOUND` if not.

**NOTFOUND** ( -- ) RECOGNIZER

Performs `-13 THROW`. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

**TRANSLATE:** ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

## XY.6.2 Recognizer Extension Words

**SET-FORTH-RECOGNIZE** ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using `IS FORTH-RECOGNIZE`.

**FORTH-RECOGNIZER** ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the current behavior instead of using `ACTION-OF FORTH-RECOGNIZE`. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support `TO FORTH-RECOGNIZER`, too.

**RECOGNIZER-SEQUENCE:** ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

**SET-RECOGNIZER-SEQUENCE** ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

**GET-RECOGNIZER-SEQUENCE** ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

**TRANSLATE-NT** ( j*x nt -- k*x ) RECOGNIZER EXT

Translates a name token.

**TRANSLATE-NUM** ( j*x x -- k*x ) RECOGNIZER EXT

Translates a number.

## Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

    Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
    : interpret ( i*x -- j*x )
      BEGIN
        ?stack parse-name dup WHILE
        forth-recognize execute
      REPEAT 2drop ;
    : lit, ( n -- ) postpone literal ;
    : notfound ( state -- ) -13 throw ;
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , ,
      does> state @ 2 + cells + @ execute ;
    :noname name>interpret execute ;
    :noname name>compile execute ;
    :noname name>compile swap lit, compile, ;
    translate: translate-nt ( nt -- )
    ' noop
    ' lit,
    :noname lit, postpone lit, ;
    translate: translate-num ( n -- )
    : rec-nt ( addr u -- nt nt-translator / notfound )
      forth-wordlist find-name-in dup IF ['] translate-nt ELSE drop ['] notfound THEN ;
    : rec-num ( addr u -- n num-translator / notfound )
      0. 2swap >number 0= IF 2drop ['] translate-num ELSE 2drop drop ['] notfound THEN ;
    : minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
      2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2rdrop ;
    ' minimal-recognize is forth-recognize

## Extensions reference implementation:

    : set-forth-recognize ( xt -- )
      is forth-recognize ;
    : forth-recognizer ( -- xt )
      action-of forth-recognize ;

### Stack library

    : STACK: ( size "name" -- )
      CREATE 0 , CELLS ALLOT ;
    : SET-STACK ( item-n .. item-1 n stack-id -- )
      2DUP ! CELL+ SWAP CELLS BOUNDS
      ?DO I ! CELL +LOOP ;
    : GET-STACK ( stack-id -- item-n .. item-1 n )
      DUP @ >R R@ CELLS + R@ BEGIN
        ?DUP
      WHILE
        1- OVER @ ROT CELL - ROT
      REPEAT
      DROP R> ;

### Recognizer sequences

    : recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
      DUP >R @
      BEGIN
        DUP
      WHILE
        DUP CELLS R@ + @
        2OVER 2>R SWAP 1- >R
        EXECUTE DUP ['] NOTFOUND <> IF
          2R> 2DROP 2R> 2DROP EXIT
        THEN
        DROP R> 2R> ROT
      REPEAT
      DROP 2DROP R> DROP ['] NOTFOUND ;
    #10 Constant min-sequence#
    : recognizer-sequence: ( rec1 .. recn n "name" -- )
      min-sequence# stack:
      min-sequence# 1+ cells negate here + set-stack
      DOES> recognize ;
    : ?defer@ ( xt1 -- xt2 ) BEGIN dup is-defer?
    WHILE defer@ REPEAT ;
    : set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
      ?defer@ >body set-stack ;
    : get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
      ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

    ' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
    ' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

    : find-name-in ( addr u wid -- nt / 0 )
      execute ['] notfound = IF 0 THEN ;
    root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
    : find-name ( addr u -- nt / 0 )
      ['] search-order find-name-in ;
    : get-order ( -- wid1 .. widn n )
      ['] search-order get-recognizer-sequence ;
    : set-order ( wid1 .. widn n -- )
      ['] search-order set-recognizer-sequence ;

## Testing

TBD

,------------------------------------------
| 2023-09-13 04:35:13 AntonErtl replies:
| proposal - 2023 Standards meeting agenda (2023-09-13 to 2023-09-15)
| see: https://forth-standard.org/proposals/2023-standards-meeting-agenda-2023-09-13-to-2023-09-15-#reply-1037
`------------------------------------------
We probably should also look at the [proposals in state CfV](https://forth-standard.org/contributions/proposal/voting)

,------------------------------------------
| 2023-09-13 05:08:28 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1038
`------------------------------------------
Making the translators depend on `state` is a bad idea. It means that everything using the translators becomes infected with this state-dependency.
It also means that you cannot implement `postpone` or `]]`...`[[` as standard-compliant code (while, with `state`-independent translators you could).

Moreover, when you write a state-independent text interpreter, such as a polyForth-style text interpreter, or [colorforth-bw](https://www.complang.tuwien.ac.at/forth/colorforth-bw), you would have to set `state` before executing the translators, which is perverse. And in the case of colorforth-bw, again there is no standard way to set `state` to get the translator to perform xt-post.

,------------------------------------------
| 2023-09-13 09:34:11 GeraldWodni replies:
| proposal - 2023 Standards meeting agenda (2023-09-13 to 2023-09-15)
| see: https://forth-standard.org/proposals/2023-standards-meeting-agenda-2023-09-13-to-2023-09-15-#reply-1039
`------------------------------------------
__Please note that the session times have changed__ a bit. The most up to date schedule can be found here: https://euro.theforth.net/program/

,------------------------------------------
| 2023-09-13 10:32:51 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1040
`------------------------------------------
The experience with the usage in Gforth (non-standard extensions excluded) shows that direct calls to translators with a specific state are limited to ``postpone``, which is compile-only and therefore

    : postpone ( "name" -- )
      -2 state ! parse-name forth-recognize execute -1 state ! ; immediate compile-only

is not generating surprises (``postpone`` is expected to leave the system in compilation state after it has done its work).

In Gforth, ``]]`` and ``[[`` are implemented by changing state. To recognize the super-immediate ``[[``, a special recognizer is added to the stack; it returns a translator whose postpone action changes back to compilation state and drops the additional recognizer from the stack.

    ' noop dup :noname ] forth-recognizer stack> drop ; translate: translate-[[

The ``state``-dependent invocation is the 99.9% case for translators, and that includes ``]]`` and ``[[``. The Forth outer interpreter depends on ``state`` (or a similar internal representation). The object that deals with the different actions depending on ``state`` is the translator.

The proposal allows you to implement other ways to access the individual methods of a translator, if you need them. It no longer encourages using translators as building blocks for other translators, and we can add wording that only translators created by ``translate:`` are standard-conforming. Since there's little use for these other access methods, the proposal does not suggest standardizing them.

,------------------------------------------
| 2023-09-13 10:37:35 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1041
`------------------------------------------
## Author:

Bernd Paysan

## Change Log:

* 2020-09-06 initial version
* 2020-09-08 taking ruv's approach and vocabulary at translators
* 2020-09-08 replace the remaining rectypes with translators
* 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
* 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
* 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
* 2022-09-10 More complete reference implementation
* 2022-09-10 Add use of extended words in reference implementation
* 2022-09-10 Typo fixed
* 2022-09-12 Fix for search order reference implementation
* 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
* 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
* 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
* 2023-09-13 Make clear that ``TRANSLATE:`` is the only way to define a standard-conforming translator.

## Problem:

The current recognizer proposal has received a fair amount of criticism. One point is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more fancy stuff to be implemented as extensions.

The problem this proposal tries to solve is the same as with the original recognizer proposal. This proposal is therefore not a full proposal, but sketches some changes to the original proposal.

## Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

* Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
* Make the recognizer sequence executable with the same effect as a recognizer
* Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

    : rec-xt ( addr u -- translator )
      here place here find dup IF
        0< state @ and IF compile, ELSE execute THEN
        ['] drop
      ELSE drop ['] notfound THEN ;

then you should factor the part starting with state @ out and return it as translator:

    : translate-xt ( xt flag -- )
      0< state @ and IF compile, ELSE execute THEN ;
    : rec-xt ( addr u -- ... translator )
      here place here find dup IF ['] translate-xt
      ELSE drop ['] notfound THEN ;

In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unclear about what to do on postpone in this stage, use ``-48 throw``, otherwise define a postpone action:

    :noname ( xt flag -- ) drop execute ;
    :noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
    :noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
    translate: translate-xt

The standard interpreter loop should look like this:

    : interpret ( i*x -- j*x )
      BEGIN parse-name dup WHILE forth-recognize execute REPEAT
      2drop ;

with the usual additions to check e.g. for empty stacks and such.

## Typical use

TBD

## Proposal:

XY.
The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for `NOTFOUND`):

    REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

# XY.3 Additional usage requirements

## XY.3.1 Translator

**translator:** subtype of xt, and executes with the following stack effect:

*TRANSLATE-THING* ( j\*x i\*x -- k\*x )

A translator xt that interprets, compiles or postpones the action of the thing according to the state the system is in. `i*x` is the additional information provided by the recognizer, `j*x` and `k*x` are the stack inputs and outputs of interpreting/compiling or postponing the thing.

# XY.6 Glossary

## XY.6.1 Recognizer Words

**FORTH-RECOGNIZE** ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or `NOTFOUND` if not.

**NOTFOUND** ( -- ) RECOGNIZER

Performs `-13 THROW`. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

**TRANSLATE:** ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a translator.

"name" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

Rationale:

By far the most common usage of translators is inside the outer interpreter, and this default mode of operation is invoked by ``EXECUTE`` to keep the API small. There may be other, non-standard modes of operation, where the individual component xts are accessed ``STATE``-independently (e.g. for implementing ``POSTPONE``). Such access only works on translators created by ``TRANSLATE:``, so any other way to define a translator is non-standard.
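As an illustration of how a user-defined translator might be built from three xts, here is a minimal sketch for double-cell numbers. To stay self-contained it defines its own `sketch-translate:`, a hypothetical stand-in that copies the technique of the reference implementation's `translate:`. It assumes that `STATE` holds 0 while interpreting and -1 while compiling, and that -2 encodes postpone mode; these values come from the reference implementation and are not guaranteed by the standard. The names `2lit,`, `translate-dnum`, `compile-5.` and `five` are likewise made up for this example.

```forth
\ Sketch only, not the proposal's normative definition.
\ Assumption: STATE is 0 (interpret) or -1 (compile); -2 means postpone mode.
: sketch-translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,                       \ body cells: xt-post, xt-comp, xt-int
  does> ( i*x body -- j*x )
    state @ 2 + cells + @ execute ;  \ 0 -> xt-int, -1 -> xt-comp, -2 -> xt-post

: 2lit, ( d -- )  postpone 2literal ;

' noop                                   \ interpreting: leave d on the stack
' 2lit,                                  \ compiling: compile d as a literal
:noname ( d -- ) 2lit, postpone 2lit, ;  \ postponing: compile code that compiles d
sketch-translate: translate-dnum ( d -- )

\ Usage while interpreting: the translator leaves the number alone.
5. translate-dnum d.

\ Usage while compiling: an immediate helper runs the translator at compile time,
\ so the literal ends up inside FIVE.
: compile-5. ( -- )  5. translate-dnum ; immediate
: five ( -- d )  compile-5. ;
five d.
```

The postpone action is not exercised here, because switching the system into postpone mode is exactly the part that the proposal leaves system-specific.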

## XY.6.2 Recognizer Extension Words

**SET-FORTH-RECOGNIZE** ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using `IS FORTH-RECOGNIZE`.

**FORTH-RECOGNIZER** ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the current behavior instead of using `ACTION-OF FORTH-RECOGNIZE`. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support `TO FORTH-RECOGNIZER`, too.

**RECOGNIZER-SEQUENCE:** ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

**SET-RECOGNIZER-SEQUENCE** ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

**GET-RECOGNIZER-SEQUENCE** ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

**TRANSLATE-NT** ( j*x nt -- k*x ) RECOGNIZER EXT

Translates a name token.

**TRANSLATE-NUM** ( j*x x -- k*x ) RECOGNIZER EXT

Translates a number.

## Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

    Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
    : interpret ( i*x -- j*x )
      BEGIN
        ?stack parse-name dup WHILE
        forth-recognize execute
      REPEAT 2drop ;
    : lit, ( n -- ) postpone literal ;
    : notfound ( state -- ) -13 throw ;
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , ,
      does> state @ 2 + cells + @ execute ;
    :noname name>interpret execute ;
    :noname name>compile execute ;
    :noname name>compile swap lit, compile, ;
    translate: translate-nt ( nt -- )
    ' noop
    ' lit,
    :noname lit, postpone lit, ;
    translate: translate-num ( n -- )
    : rec-nt ( addr u -- nt nt-translator / notfound )
      forth-wordlist find-name-in dup IF ['] translate-nt ELSE drop ['] notfound THEN ;
    : rec-num ( addr u -- n num-translator / notfound )
      0. 2swap >number 0= IF 2drop ['] translate-num ELSE 2drop drop ['] notfound THEN ;
    : minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
      2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2rdrop ;
    ' minimal-recognize is forth-recognize

## Extensions reference implementation:

    : set-forth-recognize ( xt -- )
      is forth-recognize ;
    : forth-recognizer ( -- xt )
      action-of forth-recognize ;

### Stack library

    : STACK: ( size "name" -- )
      CREATE 0 , CELLS ALLOT ;
    : SET-STACK ( item-n .. item-1 n stack-id -- )
      2DUP ! CELL+ SWAP CELLS BOUNDS
      ?DO I ! CELL +LOOP ;
    : GET-STACK ( stack-id -- item-n .. item-1 n )
      DUP @ >R R@ CELLS + R@ BEGIN
        ?DUP
      WHILE
        1- OVER @ ROT CELL - ROT
      REPEAT
      DROP R> ;

### Recognizer sequences

    : recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
      DUP >R @
      BEGIN
        DUP
      WHILE
        DUP CELLS R@ + @
        2OVER 2>R SWAP 1- >R
        EXECUTE DUP ['] NOTFOUND <> IF
          2R> 2DROP 2R> 2DROP EXIT
        THEN
        DROP R> 2R> ROT
      REPEAT
      DROP 2DROP R> DROP ['] NOTFOUND ;
    #10 Constant min-sequence#
    : recognizer-sequence: ( rec1 .. recn n "name" -- )
      min-sequence# stack:
      min-sequence# 1+ cells negate here + set-stack
      DOES> recognize ;
    : ?defer@ ( xt1 -- xt2 ) BEGIN dup is-defer?
    WHILE defer@ REPEAT ;
    : set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
      ?defer@ >body set-stack ;
    : get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
      ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

    ' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
    ' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

    : find-name-in ( addr u wid -- nt / 0 )
      execute ['] notfound = IF 0 THEN ;
    root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
    : find-name ( addr u -- nt / 0 )
      ['] search-order find-name-in ;
    : get-order ( -- wid1 .. widn n )
      ['] search-order get-recognizer-sequence ;
    : set-order ( wid1 .. widn n -- )
      ['] search-order set-recognizer-sequence ;

## Testing

TBD

,------------------------------------------
| 2023-09-13 17:02:48 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1042
`------------------------------------------
Anton writes:

> Making the translators depend on state is a bad idea.

It's a matter of terminology. A translator depends on state **by [definition](https://github.com/ForthHub/fep-recognizer/blob/master/terms-and-datatypes.md)**:

> to **translate a token**: to interpret the token if interpreting, or to compile the token if compiling.

> **token translator**: a Forth definition that translates a token; also, depending on context, the execution token for this Forth definition.

-----

If you want a recognizer to return not an execution token, but some opaque identifier, I suggest calling it "descriptor".

> **token descriptor object**: an implementation dependent data object (a set of information) that describes how to interpret and how to compile a token.

> **token descriptor**: a value that identifies a token descriptor object; also, less formally and depending on context, a Forth definition that just returns this value (i.e., a constant), or a token descriptor object itself.

,------------------------------------------
| 2023-09-13 20:49:55 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1043
`------------------------------------------
As said, this is just about moving things around. There's little difference whether you use ``translator-execute`` or ``execute`` on a translator as the specified way to get from the translator to its state-dependent action. It's all system-dependent and hidden. Systems might implement it without even referring to ``STATE``, only updating ``STATE`` to reflect compilation and interpretation state and otherwise never looking at it, and the way the system internally keeps its state can be completely different. The most obvious difference is that with ``translator-execute``, you need another word.

The fact that some abstract data type is executable does not mean ``EXECUTE`` is the only way to operate on it. Recognizer sequences are executable in this proposal, and they can still be read out and set with ``GET``/``SET-RECOGNIZER-SEQUENCE``. So you can't just define them as colon definitions, you need to go through ``RECOGNIZER-SEQUENCE:`` to define them.

Though I don't propose to standardize this, the proposal also suggests to make word list ids executable, and put them together in a recognizer sequence called ``search-order``. Word list ids still have to be used in other ways, e.g. to add new words to them, and the details are left to the system; but it is clear that they can't be normal colon definitions.

Not providing an abstraction like either ``translator-execute`` or ``execute``, and instead putting it directly as ``state @ abs cells + @ execute`` into the outer interpreter is a really bad idea, because all details of the reference implementation in which this sequence works then become part of the standard. Other ways to implement it, which may have performance advantages or not expose the postpone state in ``STATE``, would then not be allowed. The following implementation should be standard, too:

    : do-translate ( translate-body -- ) 0 + @ execute ;
    : state! ( state -- ) dup state ! abs cells ['] do-translate >body cell+ ! ; \ assume threaded code
    : translate: ( int-xt comp-xt post-xt "name" -- )
      create swap rot , , , does> do-translate ;
    : [ 0 state! ; immediate
    : ] -1 state! ;
    : ]] 2 cells ['] do-translate >body cell+ ! ; immediate \ STATE left as is

How to recognize ``[[`` is left as an exercise to the reader. Hint: a recognizer is a good idea, because it actually provides something that is executed at postpone time.
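
The implementation above patches threaded code and is therefore tied to one kind of system. The same idea, a translation mode that lives somewhere other than ``STATE``, can be sketched portably. In the sketch below the mode is kept in a hypothetical variable ``translate-mode`` and ``mode-translate:`` is a hypothetical stand-in for ``TRANSLATE:``; neither name is part of the proposal, and a real system would update the mode from ``[``, ``]``, ``]]`` and ``POSTPONE`` rather than by hand.

```forth
\ Hypothetical: the mode is kept outside STATE.
\ 0 = interpret, 1 = compile, 2 = postpone.
variable translate-mode   0 translate-mode !

: mode-translate: ( xt-int xt-comp xt-post "name" -- )
  create rot , swap , ,              \ body cells: xt-int, xt-comp, xt-post
  does> ( i*x body -- j*x )
    translate-mode @ cells + @ execute ;

\ Demo actions that only report which one was selected.
:noname ." interpret " ;
:noname ." compile " ;
:noname ." postpone " ;
mode-translate: translate-demo

0 translate-mode !  translate-demo   \ selects xt-int
1 translate-mode !  translate-demo   \ selects xt-comp
2 translate-mode !  translate-demo   \ selects xt-post
0 translate-mode !
```

Code that only ever invokes the translator by executing it cannot tell this variant from one that indexes by ``STATE``, which is the reason for hiding the dispatch behind the translator itself.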