,---------------. | Contributions | `---------------´ ,------------------------------------------ | 2024-12-30 17:37:44 ruv wrote: | requestClarification - May `CODE` be a parsing word? | see: https://forth-standard.org/standard/tools/CODE#contribution-372 `------------------------------------------ [15.6.2.0930 `CODE`](https://forth-standard.org/standard/tools/CODE) says: > Those characters are processed in an implementation-defined manner, generating the corresponding machine code. The process continues, refilling the input buffer as needed, until an implementation-defined ending sequence is processed. Does this imply that `code` actively parses the input source? If it actively parses, is it standard-compliant? ,---------. | Replies | `---------´ ,------------------------------------------ | 2024-12-06 16:07:14 JimPeterson replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1396 `------------------------------------------ I understand what you're saying, but in terms of conveying information to the reader, I think some indication that there may not be loop control parameters to `UNLOOP`, presented in the stack notation, may be of use. I know that the text below it says as much, but `?DO` having the same stack notation as `DO` just feels wrong or misleading. I know, from a machine's perspective, the notation is technically correct, but I feel like this documentation is being written for humans, and they often require (or at least benefit from) a little more hand-holding. The change I suggest is also technically correct but more informative. 
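For readers following the `?DO` thread, here is a minimal sketch in standard Forth of the case under discussion: when limit and index are equal, the loop body never gains control. The name `count-iterations` is hypothetical and used only for this illustration.

```forth
\ Hypothetical helper: count how often the body of a ?DO loop runs.
: count-iterations ( limit index -- n )
  0 rot rot          \ place a counter below the loop parameters
  ?do 1+ loop ;      \ the body increments the counter once per iteration

5 0 count-iterations .   \ prints 5
5 5 count-iterations .   \ prints 0 (the body is skipped entirely)
```

In the second call no loop body runs, which is the situation the proposed change to the stack notation is meant to make visible.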
,------------------------------------------ | 2024-12-07 07:22:49 AntonErtl replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397 `------------------------------------------ @BerndPaysan: If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with `state`. And text interpreters that use xt-post can be written using the proposed wordset rather than having to use a detour through `postpone` (which is a parsing word, possibly introducing additional complications). The following is also relevant to @ruv: [Ruv's colorforth-bw implementation](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/example/recognize-colorforth-bw.fth) demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing `recognize-colorforth-bw`; instead, it reimplements everything that the name recognizer and the number recognizer already do internally, nicely demonstrating that the present proposal buries the tools. And it only implements dealing with names and single-cell numbers. Finally, the implementation is so long (44 lines without putting it into `forth-recognize`) that you have not shown it inline, but posted a link to github. By contrast, let's take much of the proposal from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516), but replace the state-dependent translators with the state-independent rectypes of [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160).
With such a proposal, colorforth-bw might look as follows (untested):

```
defer recognizer1
forth-recognizer is recognizer1

: prefix>index ( c -- n )
  case
    '[' of 0 endof
    '_' of -1 endof
    ']' of -2 endof
    1 swap endcase ;

: rectype-colorforth-bw ( ... rectype index state -- ... )
  drop \ we use index, not the surrounding Forth interpreter's state
  swap execute ;

: recognize-colorforth-bw ( c-addr u -- )
  dup 0= if 2drop ['] notfound exit then
  over c@ prefix>index dup 0 > if 2drop drop ['] notfound exit then
  >r 1 /string recognizer1 r> ['] rectype-colorforth-bw ;

' recognize-colorforth-bw set-forth-recognize
```

This has only 20 lines (vs. 44), and it uses all the recognizers originally present in `forth-recognizer` (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160) over the translators from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516).

BTW, I find the presence of both `forth-recognize` and `forth-recognizer` confusing, and would prefer to define `forth-recognize` as a deferred word. If you have to have getters and setters, call the getter `get-forth-recognize`.

> In this approach, why do you need to write «[postpone _foo» instead
> of «]foo» ?

Nobody is suggesting that. But you need to perform xt-post in order to implement `]foo`. In your implementation, you do it by reimplementing xt-post for the two recognizers you implement internally to `recognize-colorforth-bw`. If you used a detour through `postpone` instead, you would use the xt-post invoked in that way. And in my implementation above, xt-post is invoked directly.
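The prefix mapping in the example above uses only standard Forth words, so it can be checked on its own even though the recognizer words around it are hypothetical. The sketch below repeats `prefix>index` and prints its result for each prefix. The role comments (interpret, compile, postpone) are an assumption inferred from ruv's version of the same example, not something stated in this reply.

```forth
\ Copy of the prefix mapping from the example above (standard Forth only).
: prefix>index ( c -- n )
  case
    '[' of  0 endof   \ assumed role: interpret
    '_' of -1 endof   \ assumed role: compile
    ']' of -2 endof   \ assumed role: postpone
    1 swap            \ any other character: positive value, no prefix
  endcase ;

'[' prefix>index .   \ prints 0
'_' prefix>index .   \ prints -1
']' prefix>index .   \ prints -2
'a' prefix>index .   \ prints 1
```

A positive result is what makes `recognize-colorforth-bw` fall through to `notfound` for lexemes without one of the three prefixes.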
,------------------------------------------ | 2024-12-07 07:29:52 AntonErtl replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1398 `------------------------------------------ You are correct, but originally failed to get your point across to me, and apparently to ruv, who addressed the issue that a loop-sys may have 0 items on the return stack. But yes, if n1|u1 = n2|u2, no loop-sys is pushed by the run-time semantics of `?do`. ,------------------------------------------ | 2024-12-07 08:36:05 ruv replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1399 `------------------------------------------ JimPeterson, I now see what you mean. My argument about size of _loop-sys_ is irrelevant. > that there may not be loop control parameters to `UNLOOP` This is impossible. `UNLOOP` can only be used in a loop body. And if `?DO` Run-time semantics do not place _loop-sys_, the loop body does not gain control. ```forth program1 ( S: u.limit u.initial ) ( R: 0*x ) ?DO ( S: 0*x ) ( R: loop-sys ) program2 ( R: loop-sys ) LOOP ( R: 0*x ) ``` By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram `( R: -- loop-sys | )` would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment. 
Take a look: ```forth program1 ( S: param-type.1 ) program2 ( S: param-type.2 ) program3 ``` When we specify a stack diagram for `program2`, which is `( param-type.1 -- param-type.2 )`, we indicate: - the input parameter type for `program2`, which should be provided by `program1`, - the output parameter type of `program2`, which is available for `program3`, - we can indicate a case when `program3` does not gain control, but we have to indicate **in prose** who will gain control and what parameters will be passed. For example: ``` : ?return-true ( -- ) ]] if true exit then [[ ; immediate ``` - A correct stack diagram for `?return-true` Run-time semantics: - `( x -- )` - A more narrow stack diagram for `?return-true` Run-time semantics: - `( 0 -- | x\0 -- true never )` - But this diagram does not tell us where the output parameter `true` is available (if any). See also [my post](https://github.com/ForthHub/discussion/discussions/171#discussioncomment-10882259) about the _never_ data type. ----- NB: `( n1 | u1 n2 | u2 -- )` is incorrect due to excessive spaces, it's an [editorial issue](https://forth-standard.org/proposals/formatting-spaces-in-data-type-symbols#contribution-250). ,------------------------------------------ | 2024-12-07 15:46:16 ruv replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1400 `------------------------------------------ > - A correct stack diagram for ?return-true Run-time semantics: > - `( x -- )` It could be unclear why this diagram is correct if the `?return-true` Run-time semantics do not return control in some cases. The answer is that the data type _never_ is a subtype of any other data type, and it is a subtype of `0*x`. `( x -- )` ⟺ `( x -- 0*x )` ⟺ `( x -- 0*x|never )` ⟺ `( x -- | x -- never )` A stack diagram by itself does not guarantee that a word returns control. 
It only guarantees that if the word returns control, it returns a parameter of the specified type. ,------------------------------------------ | 2024-12-08 10:46:00 AntonErtl replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1401 `------------------------------------------ > By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment. Counterexample: [`THROW`](https://forth-standard.org/standard/exception/THROW) ,------------------------------------------ | 2024-12-08 18:25:20 ruv replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1402 `------------------------------------------ Anton, thank you for this example. Yes, I was wrong that it "would be incorrect". _Formally_, it is still correct, because specifying a wider data type than possible does not introduce contradictions. But it does introduce **confusion**. I think that data types should be specified as narrowly as possible. The stack diagram `( k*x n -- k*x | i*x n )` is justified only if neither type of `( i*x n )` and `( k*x )` is a subtype of the other. This holds for the word `throw` because `( i*x n )` is never returned to the caller. But specifying two data types in [the union](https://github.com/ForthHub/discussion/discussions/171#discussioncomment-11030626) so that the members of one are sometimes returned and the members of the other are never returned **is useless and just confusing**. A better diagram for [`throw`](https://forth-standard.org/standard/exception/THROW) is `( k*x 0 -- k*x | k*x n1\0 -- i*x n1 never )`.
This diagram explicitly says that the output parameter of the type `( i*x n1 )` (where the value that corresponds to `n1` is the same in the input and in the output parameter) is not available to the caller. A better diagram for `?do` Run-time semantics is `( n1 n2 -- ; R: -- loop-sys ; | u1 u2 -- ; R: -- loop-sys ; | x1 x1 -- never ; )`. This diagram specifies: - an ambiguous condition exists if the input parameter has type neither `( n n )` nor `( u u )`, - if the first input parameter and the second input parameter are identical, then the loop body may not gain control. ,------------------------------------------ | 2024-12-08 19:17:59 ruv replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1403 `------------------------------------------ By "a wider data type than possible" I mean a data type that has **more members** than another suitable data type. How best to phrase that? ,------------------------------------------ | 2024-12-08 20:45:55 ruv replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1404 `------------------------------------------ > The stack diagram `( k*x n -- k*x | i*x n )` is justified only if neither type of `( i*x n )` and `( k*x )` is a subtype of the other. Actually, this **always** holds. `( k*x n -- k*x | i*x n )` ⟺ `( k*x n -- k*x | k*x n -- i*x n )` To prove that neither type from the union is a subtype of the other, it is enough to give two examples, one of which is a member of only `( k*x n -- k*x )` (in the union), and the other is a member of only `( k*x n -- i*x n )` (in the union). Here are such examples: - the mapping `( 0 ↦ )` is a member of only `( k*x n -- k*x )` (in the union) - the mapping `( 1 ↦ 1 )` is a member of only `( k*x n -- i*x n )` (in the union) There are also members that belong to both.
For example, the following mappings: - `( 0 0 ↦ 0 )` - `( 1 1 ↦ 1 )` - `( 123 0 0 ↦ 123 0 )` ,------------------------------------------ | 2024-12-08 22:59:20 ruv replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1405 `------------------------------------------ @AntonErtl [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397): > [Ruv's colorforth-bw implementation](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/example/recognize-colorforth-bw.fth) demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing `recognize-colorforth-bw`; instead, it reimplements everything that the name recognizer and the number recognizer already do internally, It's wrong. Have a look at [L18-L19](https://github.com/ForthHub/fep-recognizer/blob/fed494a7b545c8fe9338a12bee2254fc838baace/implementation/example/recognize-colorforth-bw.fth#L18-L19):

```forth
\ Reuse a recognizer for numbers
['] recognize-number-n-prefixed apply-recognizer-cf dup 0= if exit then
```

It **uses** the recognizer for numbers. And it uses `find-name` instead of the recognizer for names (Forth words) just because it's simpler in this case. It does not reuse _token translators_. > And it only implements dealing with names and single-cell numbers. Because your original example implemented only that. And I just rewrote your original example. > Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github. Why count 10 lines of comments at the beginning of the file? Without comments, 31 lines, the same as in your example (lexical size is greater due to nt vs xt, and improvements in the behavior).
> By contrast, let's take much of the proposal from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-516), but replace the state-dependent translators with the state-independent rectypes of [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160). With such a proposal, colorforth-bw might look as follows (untested): [...] > This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [[160]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#contribution-160) over the translators from [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1081). (I corrected the [r1081] link in the citation above) This comparison is incorrect. Below is an [implementation](https://github.com/ForthHub/fep-recognizer/commit/df96c6ab2c9613f4552933e42fc24ac38cc41c02) against the latest API version (except `compile-postpone-qtoken` that is a variation of discussed `postpone-qtoken`, which should be either present or implementable in any variant of API): ```forth : cf-prefix>tt? ( c -- tt true | c false ) case '[' of ['] execute-interpreting endof '_' of ['] execute-compiling endof ']' of ['] compile-postpone-qtoken endof 0 exit endcase true ; defer recognize-default perceptor is recognize-default : recognize-colorforth-bw ( sd.lexeme -- qt|0 ) dup 0= if nip exit then over c@ cf-prefix>tt? 0= if drop 2drop 0 exit then >r 1 /string recognize-default dup if r> exit then rdrop ; ``` **16 lines.** Can be tested in Gforth too: ``` gforth index.fth example/recognize-colorforth-bw.fth :noname cf( _1. 
_drop _s" foo" ) ; execute s" foo" compare 0= .s \ prints "1 -1"
```
,------------------------------------------ | 2024-12-09 07:07:25 AntonErtl replies: | requestClarification - Return Stack Notation Mildly Inaccurate | see: https://forth-standard.org/standard/core/qDO#reply-1406 `------------------------------------------ It is obvious that the idea behind the stack diagram of `throw` is to specify what happens on the data stack in both cases, including the case where the control flow does not continue sequentially. And I think it's a good idea to specify the stack effect for that case, and it should also be done for `?do`. Whether the `|` syntax as used for `throw` is good enough or whether we should have separate stack diagrams for the two cases is something one might discuss. However, I have not seen confused questions about what the stack effect of `throw` means, and I would not expect confusion from a similar usage of `|` for the stack effect of `?do`. In both cases the prose makes it clear enough which part of the stack effect diagram corresponds to which case. Actually for the `?do` case it has taken 3 decades until someone asked about the lack of a stack-effect diagram for the case when index=limit. ,------------------------------------------ | 2024-12-09 07:25:04 AntonErtl replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1407 `------------------------------------------ The latest proposal is [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1081) and it does not contain `execute-interpreting`, `execute-compiling`, `compile-postpone-qtoken`, or `perceptor`. And that's what we were tasked with discussing and giving feedback on. And that's what I did.
,------------------------------------------ | 2024-12-09 09:38:01 ruv replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1408 `------------------------------------------ > The latest proposal is [[r1081]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1081) and it does not contain `execute-interpreting`, `execute-compiling`, `compile-postpone-qtoken`, or `perceptor`. And that's what we were tasked with discussing and giving feedback on. And that's what I did. I see, thank you. Actually, [r1081] is outdated, a new version will be prepared soon and then it should be discussed (was noted in the recognizer chat). Nevertheless, my example implementation for `recognize-colorforth-bw` above is compatible with [r1081] with the following exceptions: it relies on `0` instead of `NOTFOUND` (you should note how it makes things simpler), and it uses the method `compile-postpone-qtoken` that appends the compilation semantics of a qualified token to the current definition (this method is missing in [r1081]). The word `perceptor` is simply a better name than `forth-recognizer` in [r1081] (I just posted in ForthHub/fep-recognizer a [rationale](https://github.com/ForthHub/fep-recognizer/issues/23) from the chat). 
The words `execute-interpreting` and `execute-compiling` are general words that are [needed](https://forth-standard.org/standard/tools/NAMEtoINTERPRET#contribution-364) anyway to perform interpretation or compilation semantics regardless of the initial `STATE`; they are [implemented](https://github.com/ForthHub/fep-recognizer/blob/81355bfa4ee639d822e25da2a32f2ec4d9526815/implementation/lib/compat/core.translator.fth) in standard Forth as:

```
: compilation ( comp: true ; S: -- true ; | comp: false ; S: -- false ; )
  state @ 0<> ;
: enter-compilation ( comp: false -- true ; S: -- ; | comp: true ; S: -- ; )
  ] ;
: leave-compilation ( comp: true -- false ; S: -- ; | comp: false ; S: -- ; )
  postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation ;
: execute-compiling ( i*x xt -- j*x )
  compilation if execute exit then
  enter-compilation execute leave-compilation ;
```

,------------------------------------------ | 2024-12-09 10:00:24 ruv replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1409 `------------------------------------------ @AntonErtl [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1397): > If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state. Token translators cannot be written without having to deal with state (possibly indirectly), by the term definition. A token translator shall perform different actions depending on the state, and it does not matter how the state is passed to the translator: through the data stack, through a separate stack intended for this purpose, or through an internal variable.
The state does not matter in only one case: if the translator shall perform the same action regardless of the state. Moreover, if you pass a parameter that encodes compilation state or interpretation state not through `STATE`, you have to keep `STATE` in sync with this parameter to guarantee that STATE-dependent words are translated correctly. ,------------------------------------------ | 2024-12-10 15:17:03 ruv replies: | requestClarification - The case of undefined interpretation semantics | see: https://forth-standard.org/standard/tools/BracketDEFINED#reply-1410 `------------------------------------------ The initial problem in `[defined]` was fixed by the proposal [Remove the “rules of FIND”](https://forth-standard.org/proposals/remove-the-rules-of-find-?hideDiff#reply-900). Close. ,------------------------------------------ | 2024-12-10 15:17:09 ruv replies: | proposal - Remove the “rules of FIND” | see: https://forth-standard.org/proposals/remove-the-rules-of-find-#reply-1411 `------------------------------------------ [15.6.2.2534 `[UNDEFINED]`](https://forth-standard.org/standard/tools/BracketUNDEFINED) should also be updated according to the new wording in `[DEFINED]`.
,------------------------------------------ | 2024-12-15 22:00:00 BerndPaysan replies: | proposal - minimalistic core API for recognizers | see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1412 `------------------------------------------

# Minimalistic Recognizer API

## Author:

Bernd Paysan

## Change Log:

* 2020-09-06 initial version
* 2020-09-08 taking ruv's approach and vocabulary at translators
* 2020-09-08 replace the remaining rectypes with translators
* 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
* 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
* 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
* 2022-09-10 More complete reference implementation
* 2022-09-10 Add use of extended words in reference implementation
* 2022-09-10 Typo fixed
* 2022-09-12 Fix for search order reference implementation
* 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
* 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
* 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
* 2023-09-13 Make clear that `TRANSLATE:` is the only way to define a standard-conforming translator.
* 2023-09-15 Add list of example recognizers and their names.
* 2024-12-15 Take comments after freezing the proposal into account

## Problem

The Forth compiler can be extended easily. The Forth interpreter however has a fixed set of capabilities as outlined in section 3.4 of the standard text: Words from the dictionary and some number formats.

It's not possible to use the Forth text interpreter in an application or system extension context. Most interpreters in existing systems use a number of hooks to extend the interpreter.
That makes it possible to use a loadable library to implement new data types to be handled like the built-in ones. One example is floating point numbers. They have their own parsing and data handling words including a stack of their own.

Furthermore, applications need to use system-provided and system-specific words or have to re-invent the wheel to get numbers with a sign or hex numbers with the $ prefix. The building blocks (`FIND`, `COMPILE,`, `>NUMBER` etc.) are available but there is a gap between them and what the Forth interpreter already does.

The Forth interpreter is stateful, but the API should avoid the problems of the `STATE` variable. In particular, an implementation without `STATE` should be possible, and there is only one place where the stateful dispatch is necessary.

## Solution

The monolithic design of the Forth interpreter is factored into three major blocks:

1. The interpreter. It extracts sub-strings (lexemes) from `SOURCE`, hands them over to the data parsing and processes the results.
2. The actual data parsing. It analyses lexemes to determine whether they match the criteria for a certain token type. These words, called recognizers, can be grouped to achieve an order of invocation.
3. The result of the recognizer, a translator and associated data, is handed over to the interpreter.

There is no strict 1:1 relation between a recognizer and the returned translator. A translator for e.g. single cell numbers can be used by different recognizers, a recognizer can return different translators (e.g. single and double cell numbers).

Whenever the Forth text interpreter is mentioned, the standard words `EVALUATE` (CORE), `'` (tick, CORE), `INCLUDE-FILE` (FILE), `INCLUDED` (FILE), `LOAD` (BLOCK) and `THRU` (BLOCK) are expected to act likewise. This proposal does not change these words, but provides the tools to do so. As long as the standard feature set is used, a complete replacement with recognizers is possible.
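The split between blocks 2 and 3 can be sketched in standard Forth as follows. This is only an illustration under stated assumptions: the names `my-rec-num`, `my-translate-num` and `demo` are hypothetical, the recognizer handles unsigned numbers in the current `BASE` only, failure is signalled with 0, and the translator consults `STATE` directly, which corresponds to the intermediate, legacy-style step rather than to a translator defined with `TRANSLATE:`.

```forth
\ Hypothetical translator: leave x when interpreting,
\ compile it as a literal when compiling.
: my-translate-num ( x -- x | )
  state @ if postpone literal then ;

\ Hypothetical recognizer: unsigned numbers in the current BASE only.
: my-rec-num ( c-addr u -- x xt | 0 )
  dup 0= if 2drop 0 exit then       \ empty lexeme: not recognized
  0 0 2swap >number                 ( ud c-addr2 u2 )
  nip if 2drop 0 exit then          \ leftover characters: not a number
  drop ['] my-translate-num ;       \ keep the low cell, return the translator

\ Hypothetical usage: DEMO runs in interpretation state, so the translator
\ leaves the number on the stack.
: demo ( -- )
  s" 123" my-rec-num execute .      \ prints 123 when BASE is decimal
  s" 12x" my-rec-num . ;            \ prints 0: the lexeme is not recognized
decimal demo
```

The recognizer itself never looks at `STATE`; only the returned translator does, which is the division of labour the three blocks describe.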
Important changes to the Matthias Trute proposal:

* Make the translators executable to dispatch according to the state (interpreting, compiling, postponing) themselves
* Use dedicated invocation methods to call a translator for a particular state
* Make the recognizer sequence executable with the same effect as a recognizer
* Make sure the API is not mandating any particular implementation

The core principle is that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

    : recognize-xt ( addr u -- translator-stub | 0 )
      here place here find dup IF
        0< state @ and IF compile, ELSE execute THEN
        ['] noop
      THEN ;

then you should factor the part starting with `STATE @` out and return it as translator:

    : translate-xt ( xt flag -- )
      0< state @ and IF compile, ELSE execute THEN ;
    : recognize-xt ( addr u -- ... translator | 0 )
      here place here find dup IF ['] translate-xt THEN ;

In a second step, you need to remove the `STATE @` entirely and use `TRANSLATE:`. If you don't know what to do on postpone in this stage, use `-48 throw`, otherwise define a postpone action:

    :noname ( xt flag -- ) drop execute ;
    :noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
    :noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
    translate: translate-xt

## Typical use

The standard interpreter loop should look like this:

    : interpret ( i*x -- j*x )
      BEGIN parse-name dup WHILE forth-recognize ?found execute REPEAT
      2drop ;

with the usual additions to check e.g. for empty stacks and such.

To operate a recognizer in a particular state, e.g. to postpone a single word, use:

    : postpone ( "name" -- ) parse-name forth-recognize ?found postponing ; immediate

To obtain an xt for a name, use something like this:

    : ' ( "name" -- xt )
      parse-name forth-recognize ?found
      ['] translate-nt <> #-32 and throw
      name>interpret ;

## Proposal:

# XY.
The optional Recognizer Wordset #

# XY.1 Introduction #

Recognizers have the form

`REC-`*SOMETYPE* ( addr len -- i\*x j\*r translate-xt | 0/NOTFOUND )

A recognizer takes the string *addr len* of a lexeme and on success returns a translator *translate-xt* and additional data on the data and floating point stack.

### [IF] NOTFOUND=0 ###

If it fails, it returns 0.

### [ELSE] NOTFOUND=xt ###

If it fails, it returns the xt of `NOTFOUND`. For clarity, until this issue is decided, the non-success return value of a recognizer is notated as 0/NOTFOUND. The reference implementation uses the option 0.

### [THEN] notfound ###

### [IF] side-effect ###

A recognizer shall not have a side effect.

Rationale: Side effects are supposed to all happen inside the translators. This guarantee makes it possible to try to recognize something and to fail if the result is not desired, without having to roll back unknown changes. Examples: The tick and `TO` recognizers pass a substring of the string to be translated to `FORTH-RECOGNIZE`, and fail if the result is not a name type.

### [THEN] side-effect ###

# XY.3 Additional usage requirements

## XY.3.1 Translator

**translator:** named subtype of xt, and executes with the following stack effect:

*name* ( j\*x i\*x -- k\*x )

A translator xt that interprets, compiles or postpones the action of the thing according to the state the system is in.

*i\*x* is the additional information provided by the recognizer, *j\*x* and *k\*x* are the stack inputs and outputs of interpreting/compiling or postponing the recognized lexeme.

# XY.6 Glossary

## XY.6.1 Recognizer Words

**FORTH-RECOGNIZE** ( addr len -- i\*x translator-xt | 0/NOTFOUND ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or 0/NOTFOUND if not.

### [IF] defer

`FORTH-RECOGNIZE` is a deferred word. Changing the system recognizer can be done with `IS FORTH-RECOGNIZE`, obtaining the system recognizer with `ACTION-OF FORTH-RECOGNIZE`.
Rationale: use existing API to change it; most simple systems have this available, and advanced systems have capabilities to work around limitations.

### [ELSE] setter and getter ###

**SET-FORTH-RECOGNIZE** ( xt -- ) RECOGNIZER EXT

Assign the recognizer *xt* to FORTH-RECOGNIZE.

**FORTH-RECOGNIZER** ( -- xt ) RECOGNIZER EXT

Obtain the recognizer *xt* that is assigned to `FORTH-RECOGNIZE`.

Rationale: not sufficiently advanced systems can work around the limitations of `IS` and `ACTION-OF` better with this API.

### [THEN]

**TRANSLATE:** ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER

Create a translator word under the name "name". This word is the only standard way to define a general purpose translator.

"name:" ( j\*x i\*x -- k\*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current state.

Rationale: By far the most common usage of translators is inside the outer interpreter, and this default mode of operation is called by `EXECUTE` to keep the API small. You cannot simply set `STATE`, use `EXECUTE` and afterwards restore `STATE` to perform interpretation or compilation semantics, because words can change `STATE`, so you need the words `INTERPRETING` and `COMPILING` defined below. This problem does not apply to `POSTPONING`, so systems that only want to implement direct access to `POSTPONE` mode can get away without `TRANSLATE:`.

### [IF] NOTFOUND=0 ###

**?FOUND** ( translator-xt -- translator-xt | 0 -- never ) RECOGNIZER

Check if the recognizer was successful, and if not, perform a `-13 THROW` or display an appropriate error message if the exception wordset is not present.

### [THEN] NOTFOUND=0 ###

## XY.6.2 Recognizer Extension Words

### [IF] NOTFOUND=0 ###

**?NOTFOUND** ( translator-xt -- translator-xt | 0 -- addr u notfound-xt )

Check if the recognizer was successful.
If not, replace the 0 result with the *addr u* of the last scanned lexeme, and put the xt of the `NOTFOUND` translator on top of the stack.

**NOTFOUND** ( -- never ) RECOGNIZER

Translator for unsuccessful recognizers: perform a `-13 THROW`.

### [THEN] NOTFOUND=0 ###

**POSTPONE** ( "lexeme" -- ) RECOGNIZER

Compilation: recognize *lexeme*. On success, perform the postpone action of the returned translator; otherwise perform a `-13 THROW` or display an appropriate error message if the exception wordset is not present.

**RECOGNIZER-SEQUENCE:** ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize a string by applying the recognizers starting with *xtn* (the one on top of the stack) and proceeding towards *xt1* until one succeeds.

**SET-RECOGNIZER-SEQUENCE** ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of *xt-seq* to *xt1 .. xtn*.

**GET-RECOGNIZER-SEQUENCE** ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence from *xt-seq* as *xt1 .. xtn n*.

**TRANSLATE-NT** ( j\*x nt -- k\*x ) RECOGNIZER EXT

Translates a name token:

Interpretation: perform the interpretation semantics of the word

Compilation: perform the compilation semantics of the word

Postpone: append the compilation semantics above to the current definition

**REC-NT** ( addr u -- nt translate-nt | 0/NOTFOUND ) RECOGNIZER EXT

Search the dictionary for the string *addr u*. If successful, return the *nt* and the xt of `TRANSLATE-NT`. If the search fails, return 0/NOTFOUND.
**TRANSLATE-NUM** ( x -- x | ) RECOGNIZER EXT

Translates a number:

Interpretation: keep the number on the stack

Compilation: Append the run-time defined in `LITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**TRANSLATE-DNUM** ( x1 x2 -- x1 x2 | ) RECOGNIZER EXT

Translates a double number:

Interpretation: keep the numbers on the stack

Compilation: Append the run-time defined in `2LITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**REC-NUM** ( addr u -- x translate-num | xd translate-dnum | 0/NOTFOUND ) RECOGNIZER EXT

Convert *addr u* to a number *x* as specified in 3.4.1.3 and return it with the xt of `TRANSLATE-NUM`, or, if the double number wordset is available, to a double number *xd* as specified in 8.3.1 and return it with the xt of `TRANSLATE-DNUM`. If the conversion fails, return 0/NOTFOUND.

**TRANSLATE-FLOAT** ( r -- r | ) RECOGNIZER EXT

Translates a floating point number:

Interpretation: Keep *r* on the stack

Compilation: Append the run-time defined in `FLITERAL` to the current definition

Postpone: Append the compilation semantics above to the current definition

**REC-FLOAT** ( addr u -- r translate-float | 0/NOTFOUND ) RECOGNIZER EXT

Convert *addr u* to a floating point number *r* as specified in 12.3.7 if the floating point wordset is available, and return it with the xt of `TRANSLATE-FLOAT`; if the conversion fails, return 0/NOTFOUND.

**SCAN-TRANSLATE-STRING** ( addr1 u1 string-rest<"> -- addr2 u2 | ) RECOGNIZER EXT

Complete parsing a string: *addr1 u1* consists of the starting quote and additional characters up to the first space in the string. *addr2 u2* consists of the entire string without the starting quote up to (but not including) the final quote, with escape sequences translated according to the rules of `S\"`. `>IN` is modified appropriately, and points just after the final quote. If there's no final quote in the current line, `REFILL` can be used to read in more lines, adding corresponding newlines into the string.
The final quote can be inside *addr1 u1*, setting `>IN` backwards in that case.

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in `SLITERAL` to the current definition

Postpone: Append the compilation semantics stated above to the current definition

**TRANSLATE-STRING** ( addr1 u1 -- addr1 u1 | ) RECOGNIZER EXT

Translate the string:

Interpretation: keep the string on the stack

Compilation: Append the run-time defined in `SLITERAL` to the current definition

Postpone: Append the compilation semantics stated above to the current definition

**?SCAN-STRING** ( addr1 u1 scan-translate-string string-rest<"> -- addr2 u2 translate-string | ... translator -- ... translator ) RECOGNIZER

If the recognized token is an incomplete string, complete the scanning as defined for `SCAN-TRANSLATE-STRING` and replace the translator with the xt of `TRANSLATE-STRING`.

**REC-STRING** ( addr u -- addr u translate-string | 0/NOTFOUND ) RECOGNIZER EXT

Check if *addr u* starts with a quote, and return that string and the xt of `SCAN-TRANSLATE-STRING` if it does, 0/NOTFOUND otherwise.

### [IF] Optional API for direct access of translator states ###

**INTERPRETING** ( j\*x xt -- k\*x ) RECOGNIZER EXT

Execute *xt-int* of the translator *xt*. If *xt* is not a translator, do `-21 THROW`, or a best-effort attempt to execute *xt* in interpreting state.

**COMPILING** ( j\*x xt -- ) RECOGNIZER EXT

Execute *xt-comp* of the translator *xt*. If *xt* is not a translator, do `-21 THROW`, or a best-effort attempt to execute *xt* in compiling state.

**POSTPONING** ( j\*x xt -- ) RECOGNIZER EXT

Execute *xt-post* of the translator *xt*. If *xt* is not a translator, do `-21 THROW`, or a best-effort attempt to execute *xt* in postponing state.

**GET-STATE** ( -- xt ) RECOGNIZER EXT

Obtain the operation *xt* performed when translating.

**SET-STATE** ( xt -- ) RECOGNIZER EXT

Makes *xt* the operation performed when translating.
If *xt* is not related to `' INTERPRETING`, `' COMPILING`, or `' POSTPONING`, do `-12 THROW`.

### [THEN] optional API for direct access of translator states ###

**]]** ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: Set the system into postpone state. The interpreter will then perform *xt-post* of all translators found. Compilation state resumes when `[[` is recognized. This word may change `STATE` and the recognizer sequence to reflect the change of this state.

**[[** ( -- ) RECOGNIZER EXT

Interpretation semantics: undefined

Compilation semantics: undefined

Postpone semantics: enter compilation state, see `]`; all changes to `STATE` and the recognizer sequence done by `]]` are reverted.

Note: `[[` needs special treatment in postpone mode, so it might also use a non-standard translator and not be a word at all.

**STATE** ( -- addr ) RECOGNIZER

If `]]` uses `STATE` to store postpone state, this extends the semantics of 6.1.2250 by adding a second non-zero value. `]]` enters this state, and `[[` leaves it. Only translators and the code responsible for displaying the prompt can see this third state, as all other words are postponed in this state.

## Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation takes only interpret and compile state into account, and uses the `STATE` variable to distinguish between them. It uses NOTFOUND=0.

    Defer forth-recognize ( addr u -- i*x translator-xt / 0 )
    : ?found ( translator -- translator | 0 -- never )
      dup 0= IF -13 throw THEN ;
    : interpret ( i*x -- j*x )
      BEGIN
          parse-name dup
      WHILE
          forth-recognize ?found execute
      REPEAT 2drop ;
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , ,
      does> state @ 2 + cells + @ execute ;

An alternative implementation for `TRANSLATE:` can use a deferred word:

    Defer do-translate
    : translate: ( xt-interpret xt-compile xt-postpone "name" -- )
      create , , ,
      does> do-translate ;
    : set-state ( xt -- )
      dup is do-translate >body @ 2 - state ! ;
    : get-state ( -- xt )
      action-of do-translate ;

## Extensions reference implementation:

    : ]] -2 state ! ; immediate
    : [[ -1 state ! ; immediate

    : lit, ( n -- ) postpone literal ;

    :noname name>interpret execute ;
    :noname name>compile execute ;
    :noname dup name>interpret ['] [[ = IF
          name>interpret execute \ special case
      ELSE
          name>compile swap lit, compile,
      THEN ;
    translate: translate-nt ( nt -- )

    ' noop ' lit, :noname lit, postpone lit, ; translate: translate-num ( n -- )

    : rec-nt ( addr u -- nt nt-translator | 0 )
      forth-wordlist find-name-in dup IF ['] translate-nt THEN ;
    : rec-num ( addr u -- n num-translator | 0 )
      0. 2swap >number 0= IF
          2drop ['] translate-num
      ELSE
          2drop drop 0
      THEN ;
    : minimal-recognize ( addr u -- nt nt-translator | n num-translator | 0 )
      2>r 2r@ rec-nt dup 0= IF drop 2r@ rec-num THEN 2rdrop ;
    ' minimal-recognize is forth-recognize

    : translate-method: ( n -- )
      Create , DOES> @ cells + >body @ execute ;
    0 translate-method: postponing
    1 translate-method: compiling
    2 translate-method: interpreting

    : set-state ( xt -- )
      >body @ 2 - state ! ;
    : get-state ( -- xt )
      case state @
          0 of ['] interpreting endof
          -1 of ['] compiling endof
          -2 of ['] postponing endof
          -11 throw
      endcase ;

    : postpone ( "name" -- )
      parse-name forth-recognize ?found postponing ; immediate

This reference implementation uses a table dispatch only.
Note that this can give surprising results when you directly apply a particular state, and one of the words executed (translator or nt/xt found) is a state-smart word. If you want to use combined translators, like

    : translate-dnum ( d -- ) >r translate-num r> translate-num ;

you can't do it like this. Neither does this work if you execute state-smart words, as they expect `STATE` to be set accordingly. Instead, you'll use something like

    : translate-method: ( n -- )
      Create , DOES> @ dup state @ = IF drop execute EXIT THEN
      state @ >r state ! execute r> state ! ;

This will definitely work for combined literal translators, because those don't change state anyway. This will also work for `POSTPONE`, because apart from the translator, no word is actually executed in one-shot `POSTPONE`, and therefore no state change is possible. This will also work for `[` and `]` (and words using them) while interpreting and compiling, because if you are already in the state from which the state is changed away, you will not restore the state. If you are in the state this will change to, this will work, too, because the state is restored after `EXECUTE`. This will not work if you are interpreting and you do a `s" ]]" forth-recognize ?found compiling`, because that transitions to postponing, and is then reverted to interpreting.

### [IF] setter and getter

    : set-forth-recognize ( xt -- ) is forth-recognize ;
    : forth-recognizer ( -- xt ) action-of forth-recognize ;

### [THEN] setter and getter

### Stack library

    : STACK: ( size "name" -- )
      CREATE 0 , CELLS ALLOT ;
    : SET-STACK ( item-n .. item-1 n stack-id -- )
      2DUP ! CELL+ SWAP CELLS BOUNDS
      ?DO I ! CELL +LOOP ;
    : GET-STACK ( stack-id -- item-n ..
      item-1 n )
      DUP @ >R R@ CELLS + R@
      BEGIN
          ?DUP
      WHILE
          1- OVER @ ROT CELL - ROT
      REPEAT
      DROP R> ;

### Recognizer sequences

    : recognize ( addr len rec-seq-id -- i*x translator-xt | 0 )
      DUP >R @
      BEGIN
          DUP
      WHILE
          DUP CELLS R@ + @
          2OVER 2>R SWAP 1- >R
          EXECUTE DUP IF
              2R> 2DROP 2R> 2DROP EXIT
          THEN
          DROP R> 2R> ROT
      REPEAT
      DROP 2DROP R> DROP 0 ;
    #10 Constant min-sequence#
    : recognizer-sequence: ( rec1 .. recn n "name" -- )
      min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
      DOES> recognize ;
    : ?defer@ ( xt1 -- xt2 )
      BEGIN dup is-defer? WHILE defer@ REPEAT ;
    : set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
      ?defer@ >body set-stack ;
    : get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
      ?defer@ >body get-stack ;

Once you have recognizer sequences, define

    ' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
    ' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

    : find-name-in ( addr u wid -- nt / 0 )
      execute dup IF drop THEN ;
    root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
    : find-name ( addr u -- nt / 0 )
      ['] search-order find-name-in ;
    : get-order ( -- wid1 .. widn n )
      ['] search-order get-recognizer-sequence ;
    : set-order ( wid1 .. widn n -- )
      ['] search-order set-recognizer-sequence ;

### Recognizer examples

Apart from the standardized recognizers above, here are some more examples of recognizers:

**REC-TICK** ( addr u -- xt translate-num | 0/NOTFOUND )

If *addr u* starts with a `` ` `` (backtick), search the search order for the name specified by the rest of the string, and if found, return its *xt* and *translate-num*.
**REC-SCOPE** ( addr u -- nt translate-nt | 0/NOTFOUND )

Search for words in specified vocabularies (the vocabulary needs to be found in the current search order). The string *addr u* has the form *vocabulary*`:`*name*. Other than specifying the vocabulary to be searched, `REC-SCOPE` is identical in effect to `REC-NT`.

**REC-TO** ( addr u -- xt n translate-to | 0/NOTFOUND )

Handle the following syntax of `TO`-like operations of value-like words:

* `->`*name* as `TO `*name*
* `=>`*name* as `IS `*name*
* `+>`*name* as `+TO `*name*
* `'>`*name* as `ADDR `*name*
* `@>`*name* as `ACTION-OF `*name*

*xt* is the execution token of the value found, *n* indexes which variant of a `TO`-like operation is meant, and *translate-to* is the corresponding translator.

**REC-ENV** ( addr u -- addr1 u1 translate-env | 0/NOTFOUND )

Takes a pattern in the form of `${`*name*`}` and provides the *name* as *addr1 u1* on the stack. The corresponding translator `TRANSLATE-ENV` is responsible for looking up that name in the operating system's environment variable array, or compiling appropriate code to do so.

**REC-COMPLEX** ( addr u -- rr ri translate-complex | 0/NOTFOUND )

Converts a pair of floating point numbers in the form of *float1*`+`*float2*`i` into a complex number on the stack, and returns the xt of `TRANSLATE-COMPLEX` on success.
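As an illustration of how small such a recognizer can be, here is a minimal sketch of the string-matching half of `REC-ENV` as described above. It is not taken from any existing implementation. It assumes NOTFOUND=0, rejects an empty name (`${}`), and defines `translate-env` only as a do-nothing placeholder so that there is an xt to return; a real translator would perform or compile the environment lookup.

```forth
\ Placeholder translator (an assumption of this sketch): it leaves the
\ name string untouched instead of looking it up in the environment.
: translate-env ( addr u -- addr u ) ;

\ Recognize ${name}; on success return the name and the translator xt,
\ otherwise return 0 (the NOTFOUND=0 convention).
: rec-env ( addr u -- addr1 u1 translate-env | 0 )
  dup 4 < IF 2drop 0 EXIT THEN                          \ shortest match is ${x}
  over c@ [char] $ <> IF 2drop 0 EXIT THEN              \ first char must be $
  over char+ c@ [char] { <> IF 2drop 0 EXIT THEN        \ second char must be {
  2dup + 1 chars - c@ [char] } <> IF 2drop 0 EXIT THEN  \ last char must be }
  3 - swap 2 chars + swap                               \ strip "${" and "}"
  ['] translate-env ;
```

With these definitions, `s" ${HOME}" rec-env` leaves the string `HOME` and the xt of the placeholder translator on the stack, while `s" HOME" rec-env` leaves 0. Note that the recognizer itself has no side effects, in line with the side-effect option discussed in the introduction.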
## Testing

```
T{ 0 recognizer-sequence: RS -> }T

T{ :noname 1 ; :noname 2 ; :noname 3 ; translate: translate-1 -> }T
T{ :noname 10 ; :noname 20 ; :noname 30 ; translate: translate-2 -> }T

\ really stupid: 1 character length or 2 characters
T{ : rec-1 NIP 1 = IF ['] translate-1 ELSE 0 THEN ; -> }T
T{ : rec-2 NIP 2 = IF ['] translate-2 ELSE 0 THEN ; -> }T

T{ ' translate-1 interpreting -> 1 }T
T{ ' translate-1 compiling -> 2 }T
T{ ' translate-1 postponing -> 3 }T

\ set and get methods
T{ 0 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> 0 }T

T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 1 }T

T{ ' rec-1 ' rec-2 2 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 ' rec-2 2 }T

\ testing RECOGNIZE
T{ 0 ' RS set-recognizer-sequence -> }T
T{ S" 1" RS -> 0 }T

T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ S" 1" RS -> ' translate-1 }T
T{ S" 10" RS -> 0 }T

T{ ' rec-2 ' rec-1 2 ' RS set-recognizer-sequence -> }T
T{ S" 10" RS -> ' translate-2 }T
```