,---------------.
| Contributions |
`---------------´

,------------------------------------------
| 2025-03-21 13:49:08 antonio wrote:
| example - First test for D2*
| see: https://forth-standard.org/standard/double/DTwoTimes#contribution-375
`------------------------------------------
Maybe this has already been discussed, but to me the first test should be:

T{ 0. D2* -> 0. }T

or maybe

T{ 0. D2* -> 0 0 }T

because the current test is the same on both sides.

,---------.
| Replies |
`---------´

,------------------------------------------
| 2025-01-23 08:00:15 AntonErtl replies:
| requestClarification - where definition is compiled?
| see: https://forth-standard.org/standard/core/ColonNONAME#reply-1417
`------------------------------------------
The standard generally leaves it to the system where it puts the compiled code. It might be in the dictionary, but it could also be elsewhere. Or it could be in the dictionary and elsewhere (e.g., Gforth puts threaded code in the dictionary, and native code elsewhere).

The standard gives very few guarantees about this, so it also says very little about it: there is "code space" in "2.1 Definition of Terms", and there is "3.3.2 Code Space", which does not give any guarantees.

Your best bet at reclaiming code space is to use FORGET or MARKER, but there is no guarantee that these words actually reclaim code space. And given the implementation complexity arising from that, I am thinking about changing Gforth such that it does not reclaim the native code when MARKER is used.

I think this answers the question, so I am closing it. If anything is still unclear, write a reply and reopen it.
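A minimal sketch of the MARKER approach applied to a `:NONAME` definition (the names `result`, `scratch` and `double-xt` are made up for illustration). Executing the marker restores the dictionary allocation pointers, so everything defined after it disappears; whether separately stored code (such as Gforth's native code) is reclaimed as well remains up to the system.

```forth
variable result                    \ defined before the marker, so it survives
marker scratch                     \ remember the current dictionary state
:noname ( n -- 2n ) 2 * ; constant double-xt
21 double-xt execute result !      \ RESULT now holds 42
scratch                            \ removes SCRATCH, DOUBLE-XT and the :NONAME definition
                                   \ (reclaiming any separate code space is system-dependent)
```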

,------------------------------------------
| 2025-01-23 08:09:47 AntonErtl replies:
| requestClarification - `NAME>STRING` result is transient
| see: https://forth-standard.org/standard/tools/NAMEtoSTRING#reply-1418
`------------------------------------------
I think the idea was to allow systems that store definition names in a representation other than the one returned by `NAME>STRING`. One example would be the fig-Forth representation of names, which sets the high bit of the last byte (fig-Forth only supports names in ASCII).

Now that we have had `NAME>STRING` in the standard for a decade, we can look at the systems that actually implement this word. If they all return a name that lives as long as the definition, we could enhance this word by giving that guarantee. But who will examine the systems and make the proposal?

,------------------------------------------
| 2025-01-23 08:29:12 ruv replies:
| requestClarification - where definition is compiled?
| see: https://forth-standard.org/standard/core/ColonNONAME#reply-1419
`------------------------------------------
> how it is implemented, specially where the definition list of the word created using `:NONAME` is compiled.

The standard intentionally does not specify many options, which fall into _implementation-defined_ options (which shall be documented) and _implementation-dependent_ options (which might be undocumented).

Simple implementations typically reserve a big memory region for the dictionary and use data space as code space too. Then, in direct and indirect threaded code, the words `compile,` and `lit,` are defined simply as:

```
: compile, ( xt -- ) , ;
: lit, ( x -- ) ['] lit compile, , ;
```

And

```forth
:noname 2 * ;
```

is equivalent to

```forth
:noname [ 2 lit, ' * compile, ] ;
```

> is there any way to free the memory?

This is possible using [`marker`](https://forth-standard.org/standard/core/MARKER):

```forth
marker restore-dict
: foo 123 . ;
foo           \ prints "123"
restore-dict
foo           \ error: not found
restore-dict  \ error: not found
```

In some Forth systems you can create many dictionaries and free them independently of each other.

,------------------------------------------
| 2025-01-23 09:57:57 ruv replies:
| requestClarification - `NAME>STRING` result is transient
| see: https://forth-standard.org/standard/tools/NAMEtoSTRING#reply-1420
`------------------------------------------
I have checked. There are about 22 Forth systems on GitHub that provide `name>string` _implemented in Forth_ (see the [search results](https://github.com/search?q=language%3Aforth+%2F%28%5E%7C+%29%3A+name%3Estring%2F&type=code)). Among these systems there is only one, namely [solo-forth](https://github.com/programandala-net/solo-forth) (description: "Standard Forth system for ZX Spectrum 128 and compatible computers, with disk drives"), in which the word `name>string` [returns a string in a transient buffer](https://github.com/programandala-net/solo-forth/blob/a193fec867e75f0c9a0df9244070cf272789c3bd/src/lib/compilation.fs#L649). This system has to copy the resulting string into the transient buffer **because the header space and the data space are located in different address spaces**.

,------------------------------------------
| 2025-01-23 14:13:32 ruv replies:
| requestClarification - `NAME>STRING` result is transient
| see: https://forth-standard.org/standard/tools/NAMEtoSTRING#reply-1421
`------------------------------------------
So, the only advantage of a transient result is that **it allows saving memory** in some cases. And it only makes sense when saving 10-100 KiB of memory (say, 8% of compiled code size) matters.

An example of an approach that can benefit is the use of a [trie](https://en.wikipedia.org/wiki/Trie) data structure (prefix tree) to implement efficient searching of word lists. This data structure does not need to store entire strings.
Therefore, to save memory, a transient string for a word name can be constructed each time it is needed.

,------------------------------------------
| 2025-02-13 13:14:50 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1422
`------------------------------------------
## Re `interpreting` (similar arguments apply to the word `compiling` too)

From the proposal's "Problem" section:

> The Forth interpreter is stateful, but the API should avoid the problems of the `STATE` variable. In particular, an implementation without `STATE` should be possible, and there is only one place where the stateful dispatch is necessary.

We should consider that Forth words **may do stateful dispatch** by themselves and they may rely on the value of `STATE`. Usually, the Forth system itself cannot determine whether a user-defined word performs stateful dispatch. Therefore, it is essential for the Forth system to ensure that the `STATE` variable is correctly set to reflect the formal state of the Forth text interpreter when executing a user-defined word.

The assumption that the value of `STATE` is irrelevant when _xt-int_ is executed (because _xt-int_ does not perform stateful dispatch itself) is flawed. This is because, when _xt-int_ is executed, it may invoke a user-defined word that performs stateful dispatch.

The suggested word `INTERPRETING ( j*x xt -- k*x )` is confusing and useless, because it just executes _xt-int_ (obtained from _xt_) and does not ensure that the value of `STATE` is `0` before a user-defined word is invoked by _xt-int_.

I [suggested](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1363) the word `execute-interpreting` that applies to **any** _xt_. When it applies to a token translator, the corresponding interpretation semantics are performed.
And this works correctly even when a user-defined word that performs stateful dispatch is invoked by the token translator.

-----

From the rationale to `TRANSLATE:`:

> The by far most common usage of translators is inside the outer interpreter, and this default mode of operation is called by `EXECUTE` to keep the API small. You can not simply set `STATE`, use `EXECUTE` and afterwards restore `STATE` to perform interpretation or compilation semantics, because words can change `STATE`, so you need the words `INTERPRETING` and `COMPILING` defined below.

The provided specification does not guarantee that `interpreting` and `compiling` solve this problem, and in the reference implementation they do not solve it. The words `execute-interpreting` and `execute-compiling` solve the problem, and they do not need to know _xt-int_ or _xt-comp_ of a token translator.

## Re "postponing" state

1. What is the rationale for formally introducing a "postponing" state? If you need it only for `]] ... [[`, then it's better to extract them into a separate proposal, and also to explain why this approach is better than implementing `]]` as a parsing word.
2. Why do you need to specify that `]]` changes `STATE` to a third value if user-defined words do not see this value and cannot analyze it?

## Re `set-state` and `get-state`

It's unclear how these words can be used. It seems that the word `SET-STATE` is underspecified. Also, its name is confusing, because formally it is not allowed to change `STATE` (at the moment).

## Re translators and `translate:`

> **translator**: named subtype of xt, and executes with the following stack effect: name ( j*x i*x – k*x )

Why do you require a translator to be named? I use anonymous token translators (defined as quotations) and find them very useful.

From the spec of `TRANSLATE:`:

> Create a translator word under the name "name". This word is the only standard way to define a general purpose translator.
It's necessary to define what a "general purpose translator" is, and it should be clear how a translator that is not a general purpose translator can be defined in a standard way.

## Minimize core

To reduce the scope of discussion, we should minimize the core API. So, it is better to put the words `]]`, `[[`, `RECOGNIZER-SEQUENCE`, etc. into separate proposals.

But a recognizer for local variables should be added, because locals are already standardized and a Forth system that supports local variables must recognize them.

,------------------------------------------
| 2025-02-13 14:17:51 GeraldWodni replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1423
`------------------------------------------
@ruv: I agree that it would be nicer not to have sequences etc. in the proposal. _However_: they are a good example of how recognizers can be used in practice, which makes me think they should stay inside to give some guidance to `rec:newbies`.

We can still ask Bernd for further modifications, but I think we should do so grouped together after the meeting, to avoid unnecessary edits.

,------------------------------------------
| 2025-02-14 11:21:35 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1424
`------------------------------------------
I don't see a problem with separating the proposal into several smaller ones, especially with taking out optional parts that belong together.

The postpone mode can indeed be implemented either loop-style (i.e. like PolyForth's `]`) or with a state; it shouldn't be necessary to specify the details.
If you have `STATE`-smart words in your system or user-defined such words, the only way to get the correct interpretation and compilation semantics involves having `STATE` as expected, you can't just call `INTERPRETING` or `COMPILING` on a translator or use some table index mechanism as in the Trute proposal to call the right slot.

If you don't have such things and have a Forth system where `STATE`-free replacement mechanisms are used for dual-semantics words (e.g. Gforth or VFX), and you don't define `STATE`-smart words yourself, you can actually use that API. That's why I think such an API can be actually standardized before we make `STATE` obsolescent and have standardized replacements available.

,------------------------------------------
| 2025-02-14 15:18:17 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1425
`------------------------------------------
> can actually be standardized

I mean can't. We need to phase out `STATE` and define possible replacements before we can have a `STATE`less API.

,------------------------------------------
| 2025-02-15 09:57:48 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1426
`------------------------------------------
At the online meeting on 2025-02-13 I was asked to present a subproposal for factoring the state-dependent component out of `TRANSLATE:`. There are many possible ways to skin this cat, e.g., the one in Matthias Trute's proposal, or the way that the present proposal used up to [v4](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-872) and earlier. Here I present a way that requires relatively few changes to the current version of this proposal.
### XY.3.1 Definition of terms

Replace the definition of **translator** with:

**translator**: a cell-sized opaque token that represents how a recognized lexeme can be interpreted, compiled, or postponed. A translator usually needs additional data about the recognized lexeme that is deeper in the stacks.

Replace uses of *translator-xt* in `?NOTFOUND` with *translator*, and likewise for other words that, in [[r1412]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1412), consume or push the xt of a translator.

### XY.6 Glossary

#### TRANSLATOR:

Replace the definition of `TRANSLATE:` with

`TRANSLATOR:` ( *xt-int xt-comp xt-post "name"* -- )

Skip leading space delimiters. Parse *name* delimited by a space. Create a definition for *name* with the execution semantics defined below. *name* is referred to as a translator.

*name* Execution: ( -- *translator* )

*translator* represents a translator with interpretation action *xt-int*, compilation action *xt-comp*, and postpone action *xt-post*.

#### Modified words:

`INTERPRETING` ( *i\*x translator* -- *k\*x* )

Execute *xt-int* of *translator*.

`COMPILING` ( *j\*x translator* -- *l\*x* )

Execute *xt-comp* of *translator*.

`POSTPONING` ( *j\*x translator* -- )

Execute *xt-post* of *translator*.

#### STATE-TRANSLATING

Add:

`STATE-TRANSLATING` ( *i\*x translator* -- *j\*x* )

Remove *translator* from the stack. If the system has a postpone state and is currently in postpone state, execute *xt-post* of *translator*. Otherwise, if the system is in interpretation state, execute *xt-int* of *translator*. Otherwise, execute *xt-comp* of *translator*.

### Discussion

The benefit of having each translator word return a translator token is that one does not need to tick the translator words in all the recognizers.
A slight improvement in writability and readability with no downside (compared to [[r1412]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1412)).

The benefit of factoring out `state-translating` is that the `state` dependence can be confined to the place(s) that actually need it: the standard Forth text interpreter (and user-defined text interpreters that are intended to work similarly). It does not infect all translators.

### Typical use

The standard interpreter loop:

````
: interpret ( i*x -- j*x )
  BEGIN parse-name dup WHILE
    forth-recognize ?found state-translating
  REPEAT 2drop ;
````

Implementation of `POSTPONE` is the same as in the existing proposal:

````
: postpone ( "name" -- )
  parse-name forth-recognize ?found postponing ; immediate
````

The implementation of `'` becomes slightly shorter (no need to tick `translate-nt`):

````
: ' ( "name" -- xt )
  parse-name forth-recognize ?found
  translate-nt <> #-32 and throw name>interpret ;
````

Now for interpreter loops that do not use `STATE`.
First, the polyForth division of interpreter and compiler:

````
: parse-name-refill ( -- c-addr u )
  begin parse-name dup 0= while
    2drop refill 0= if 0 0 exit then
  repeat ;

: ] ( i*x -- j*x )
  BEGIN parse-name-refill dup WHILE
    2dup "[" str= 0= WHILE
    forth-recognize ?found compiling
  REPEAT THEN 2drop ;

: pf-interpret ( i*x -- j*x )
  BEGIN parse-name-refill dup WHILE
    forth-recognize ?found interpreting
  REPEAT 2drop ;
````

And here's one for colorforth-bw:

````
: cfbw-interpret ( i*x -- j*x )
  begin parse-name dup while
    over c@ >r 1 /string forth-recognize ?found r>
    case
      '[' of interpreting endof
      '_' of compiling    endof
      ']' of postponing   endof
      -13 throw
    endcase
  repeat 2drop ;
````

The problem with these interpreters is that there is no standardized or proposed way to plug this `interpret` into the existing infrastructure (e.g., `included`), so the benefit of being able to write this is limited to one line (in the case of colorforth-bw) or the rest of the file (in the case of the polyForth-style interpreter). But the recognizer proposal allows replacing `forth-recognize`, and this allows us to plug colorforth-bw into the text interpreter until further notice.
I presented a way to do it with an earlier version of this proposal in [[r1397]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1397); here's a way of doing it with [[r1412]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1412) modified by this sub-proposal:

````
defer recognizer1
action-of forth-recognize is recognizer1

: translator-bw1 ( i*x translator c -- j*x )
  case
    '[' of interpreting endof
    '_' of compiling    endof
    ']' of postponing   endof
    -13 throw
  endcase ;

' translator-bw1 dup dup translator: translator-bw

: recognize-colorforth-bw ( c-addr u -- translator )
  dup 0= if 2drop 0 exit then
  over c@ >r 1 /string recognizer1 r>
  over if translator-bw else drop then ;

' recognize-colorforth-bw is forth-recognize
````

### Reference implementation:

A straightforward implementation is:

````
: translator: ( xt-int xt-comp xt-post "name" -- )
  create , , , ;

: state-translating ( i*x translator -- j*x )
  state @ if compiling else interpreting then ;
````

This does not cover a potential postpone state; if a system has a postpone state and can enter the standard text interpreter in this state, then the implementation of `state-translating` should be extended accordingly.

Of course, this implementation of `state-translating` is far too inefficient for some tastes, so here's a more clever one:

````
: state-translating ( i*x translator -- j*x )
  \ cells 0, 1, 2 of the translator hold xt-post, xt-comp, xt-int
  2 state @ 0<> + cells + @ execute ;
````

For even more efficiency we can redefine `]` and `[`:

````
defer state-translating
: [ ( -- )
  postpone [ ( old implementation )
  ['] interpreting is state-translating ; immediate
[ \ initialize state-translating
: ] ( -- )
  ] ( old implementation )
  ['] compiling is state-translating ;
````

If there is a word that sets the postpone state, that word should also set `state-translating` accordingly. Likewise, words such as `:` and `;`, which were defined earlier and therefore do not call these new `[` and `]`, would have to set `state-translating` too.

There are also changes involving words that push literal translator tokens.
In [[r1412]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1412) the translator word needs to be ticked; in this subproposal you do not do that. E.g., `rec-nt` now looks as follows:

````
: rec-nt ( addr u -- nt nt-translator | 0 )
  forth-wordlist find-name-in dup IF translate-nt THEN ;
````

,------------------------------------------
| 2025-02-16 18:45:17 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1427
`------------------------------------------
## STATE-dependence

[[r1412]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1412) still contains a defining word for state-dependent translators (and none for translators without this mistake), which are unacceptable to me. I have suggested an improvement in [[r1426]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1426).

## Dividing the proposal?

There have been some discussions about dividing the proposal. I don't think that that's a good idea for the discussion, but in usage I see the division into the following hierarchy of use cases, which require different words; the later use cases usually also require implementing the words for the earlier use cases:

1. Programs that use the default recognizers. For them we need to specify a standard recognizer sequence (including how to deal with locals): `REC-NT` `REC-NUM` `REC-FLOAT` (if present) corresponds to Forth-2012. I expect systems that have `REC-STRING` and `REC-TICK` to put these into their recognizer sequence, too. How do we document in the program documentation which recognizers are needed?
   Probably we need to extend the program documentation requirements (until now, the recognition of doubles, floats and locals has been coupled with documenting the double, float and local wordsets, respectively, but for `REC-STRING` and `REC-TICK` that's probably not the way to go). The new `POSTPONE` is also at that usage level.
2. Programs that change which of the existing recognizers are used and in what order. For them we need the names of the existing recognizers (not sure about the translators), `FORTH-RECOGNIZE`, `SET-RECOGNIZER-SEQUENCE`, `GET-RECOGNIZER-SEQUENCE`, `.RECOGNIZERS` (not yet proposed) and maybe `RECOGNIZER-SEQUENCE:`. If all the standardized recognizers are in `FORTH-RECOGNIZE` by default, there will probably not be much of this kind of usage, except maybe to put `REC-FLOAT` in front of `REC-NUM` (to recognize "1." as float; `REC-FLOAT` would have to be defined in more detail for that to work).
3. Programs that define new recognizers that use existing translators. This usage needs the names of the translators.
4. Programs that define new translators. This usage needs `TRANSLATE:` (or `TRANSLATOR:`).
5. Programs that define text interpreters and programming tools that have to deal with recognizers (such as a recognizer-aware `postpone`). These programs need `INTERPRETING`, `COMPILING`, `POSTPONING` or `STATE-TRANSLATING`.

A system with recognizers is a program of all these types, so all these words will be present in every such system (with the exception of some recognizers and related translators); therefore there is little point in making most of these words optional (except `rec-float`, `rec-string`, `rec-tick` and translators used only by those recognizers). But it is still a good idea to present the words divided by these usages.

We usually present words in alphabetical order in the document. Should we continue this tradition for these words? If so, the division of words above should probably be documented in the rationale.
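To make use cases 4 and 5 concrete, here is a minimal, self-contained sketch of `TRANSLATOR:` and the four execution words from r1426, following its straightforward reference implementation (so it has no postpone state). The number translator `translate-n` and its actions `n-int`, `n-comp` and `n-post` are hypothetical examples, not part of any proposal.

```forth
\ As in the r1426 reference implementation, the translator token is the
\ body address of the CREATEd word; cells 0, 1, 2 hold xt-post, xt-comp, xt-int.
: translator: ( xt-int xt-comp xt-post "name" -- )
  create , , , ;
: postponing   ( j*x translator -- )     @ execute ;
: compiling    ( j*x translator -- l*x ) cell+ @ execute ;
: interpreting ( i*x translator -- k*x ) 2 cells + @ execute ;
: state-translating ( i*x translator -- j*x )   \ no postpone state here
  state @ if compiling else interpreting then ;

\ Hypothetical translator for a single-cell number n:
: n-int  ( n -- n ) ;                    \ interpret: leave n on the stack
: n-comp ( n -- ) postpone literal ;     \ compile: compile n as a literal
: n-post ( n -- )                        \ postpone: compile code that compiles n
  postpone literal ['] n-comp compile, ;
' n-int ' n-comp ' n-post translator: translate-n

\ Usage:
3 translate-n interpreting .             \ prints 3
: seven [ 7 translate-n compiling ] ;    \ SEVEN pushes 7
: lit5 [ 5 translate-n postponing ] ; immediate
: five lit5 ;                            \ FIVE pushes 5
```

Only `state-translating` looks at `STATE`; the other three words are usable directly by text interpreters and tools such as `postpone`.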
## For word counters

Given that usage 5 above is rare in user programs, word counters may prefer to replace the four words `INTERPRETING`, `COMPILING`, `POSTPONING` and `STATE-TRANSLATING` with one word `TRANSLATING ( i*x translator n -- j*x )`, where

* `0 TRANSLATING` is equivalent to `INTERPRETING`
* `-1 TRANSLATING` is equivalent to `COMPILING`
* `-2 TRANSLATING` is equivalent to `POSTPONING`
* `STATE @ 0<> TRANSLATING` is equivalent to the reference implementation of `STATE-TRANSLATING`

A simple Forth system has only one use of `POSTPONING` (in `POSTPONE`) and one use of `STATE-TRANSLATING` (in `INTERPRET`), so defining 4 words for the purpose may seem excessive. And replacing them with `TRANSLATING` saves a tiny bit of source code and memory.

OTOH, there is no standard way to use `TRANSLATING` for `STATE-TRANSLATING` in the general case, where the system has a postpone state, because there is no standard way to determine postpone state. Moreover, the specification of `TRANSLATING` is not so nice (that's why I left it out in the above), and the code using it will be less readable.

## Gerund

It's not clear to me why the gerund form is used (`INTERPRETING` etc.), although I kept it for my suggestions (for consistency). I would use an imperative form; and because "interpret", "compile" and "postpone" are already taken, maybe something like `TRANSLATOR>INTERPRET` or somesuch, which would parallel `NAME>INTERPRET`. However, the latter pushes an xt while the former executes it, so either we let `TRANSLATOR>INTERPRET` also produce an xt, or we use a slightly different naming scheme, such as `TRANSLATOR*INTERPRET`.

## GET-STATE SET-STATE

It's unclear what `get-state` and `set-state` do, and their names suggest the stack effects ( -- f ) and ( f -- ). The reference implementation does not make that any clearer; in particular, the reference implementation of `set-state` does not make any sense at all, and I would not know why anybody would want to use `get-state`.
## [IF] parts

This makes the proposal hard to understand and discuss. Take a decision (possibly after asking around, but I doubt that anyone but you and maybe ruv has a proper basis for an opinion), put it in the proposal, and give a rationale for the decision in a section *Discussion*.

## Side effects

I do not see a good way to specify in the normative part of the document that a recognizer must not have a side effect. The proposal mentions "supposed to" and "promise". The normative part says what specific words do (or that there is an ambiguous condition). It seems to me that the discussion about side effects should go into the non-normative rationale. It's clear enough what happens when somebody uses a word that invokes a recognizer, and that recognizer has a side effect; there is no need for an ambiguous condition.

## NOTFOUND

I have no preference here, but I remember that Matthias Trute presented a case for notfound, and that sounded convincing. Why do his arguments no longer hold (or did they not hold in the first place)?

## `FORTH-RECOGNIZE`, deferred or getter and setter?

I see no benefits to having a getter and setter here. Deferred words are fine.

## Presentation

The "Solution" chapter is not comprehensible except to those deep into the discussion: it is full of unexplained terms, such as "data parsing" and "token type". And "translator" is not comprehensible to anybody who comes fresh to the proposal, nor even to those who have seen some earlier recognizer proposals.

The second part of "Solution" should be a separate section "Transition for some implementors/users of Matthias Trute's proposal".

## More `NOTFOUND` stuff

The proposal defines `?FOUND`, `?NOTFOUND`, and `NOTFOUND` only for NOTFOUND=0. This looks like a bug to me.

The stack effect of `?FOUND` and other words: we do not have "never" in the standard. What's that supposed to mean?

## XY.3.1 Translator

"named subtype"? What's that?

The rest of the wording is woefully inadequate.
A careful specification would reveal the complexity that you get with state-dependent translators.

## ?NOTFOUND

`?NOTFOUND` has a horrible stack effect. This word is not shown in any of the typical use examples. Is it needed? If it is needed, maybe the stack effects of the other words can be changed to make it unnecessary; although, admittedly, when I worked on combining recognizers, I did not find a solution with a nice stack flow (and I have tried). Hmm, maybe with a variant of `case` with a specialized variant of `of`?

## POSTPONE

"if the exception wordset is not present". The exception wordset has been a required part of Forth200x for several years.

## SET-RECOGNIZER-SEQUENCE

As specified, the sequence will always fit. Can the sequence fail to fit? If so, specify what happens.

## REC-NUM

Should this be the all-singing, all-dancing variant (including doubles, number prefixes and '')? Given existing practice and the legacy code base, yes. OTOH, with recognizers it seems a conceptually attractive option to have `rec-num` be a decomposable sequence consisting of the various cases. But given nestable recognizer sequences, that's always an option for the future.

## SCAN-TRANSLATE-STRING

This should follow C conventions for newlines like the rest of the string syntax, i.e., escape newlines with \\. If other conventions are desired (e.g. what may or may not be JSON syntax), that would be for another recognizer and another translator.

The specification should be clear about what it does: "`REFILL` can be used to read in more lines" is neither here nor there.

## TRANSLATE-STRING ?SCAN-STRING

What are these words good for? `REC-STRING` apparently does not need them.

## [[

A word with neither interpretation nor compilation semantics?

Should we specify whether there is a postpone state, or alternatively that `]]` has its own text interpreter loop? There are ways to distinguish these two kinds of implementation; does it matter?
Maybe if you want to `EVALUATE` something in postpone state or somesuch.

`]]` and `[[` should probably go into a separate proposal.

## STATE

Changing the specification of `state` such that there is at least one non-zero value that does not mean "compilation state" is not an extension of the current specification of `state`, but a change. However, existing practice of systems which use -2 as postpone state suggests that this does not break existing code in practice. That's probably because so little existing code actually uses postpone state. With wider use of postpone state, some breakage may actually turn up.

The safe option would be to represent postpone state (if we have it at all) in a way other than through a value of `STATE`. E.g., have another variable `POSTPONE-STATE`: if it's false, then `STATE` determines the state; if it's true, the system is in postpone state.

In any case, if we put `]]` in another proposal, that's where we should have this discussion.

,------------------------------------------
| 2025-02-16 23:03:46 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1428
`------------------------------------------
# Multiline strings

I don't think C is setting a good example. Nobody took C's syntax for proper multiline strings, not even C++. C is still an important legacy language, but COBOL is also in the top 20, and you don't want to have multiline strings like COBOL.

* C++11 got raw strings, and gcc supports them even in C. The syntax has `R"(` as start and `)"` as end (with the option of adding more letters to disambiguate the string ending). Raw strings don't translate backslash+character sequences, which is often what you want, because the multiline string is actually some other programming language, and the editor is fine inserting all the characters you want there without escapes.
  Note that you need some way to disambiguate the string ending in a raw string, as you can't escape `"`.
* Rust, Visual Basic (≥14), R, Ruby, and PHP strings are multiline by default (inserting newlines where the string has line breaks)
* JavaScript (using template literals) and Go use `` ` `` (backtick) for multiline (raw) strings
* C# uses `@"` to start a multiline string
* SQL uses `'` (single quote) for multiline strings
* Java 15 has text blocks (with `"""` as start and end)
* Python uses either `"""` or `'''` for multiline strings

Nobody makes proper multiline strings like C. Really nobody. Not even recent C compilers; they follow C++. I have now gone through the top 20 of the TIOBE index, and most languages nowadays have multiline strings one way or the other.

Getting Emacs to recognize multiline strings was easy: just remove the `\n` from the end-of-string pattern. Emacs likes multiline strings.

JSON variants with multiline strings are likely from developers who use Ruby or PHP. You have to deal with this sort of stuff.

The most popular option seems to be multiline strings by default, when legacy (e.g. through a C-like syntax) isn't a problem. As we are adding a new syntax for string literals, we don't need to care about backwards compatibility. One popular feature is to remove blanks from auto-indented strings, as editors indent these strings.

Strictly speaking, if we support non-raw multiline strings, we could even parse C strings, if a \ as last character is defined as “don't add a newline here” (instead of “unfinished escape sequence”).
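To illustrate that last rule, here is a minimal sketch of assembling a multiline string literal line by line: a trailing `\` joins the line to the next one without a newline, and every other line ends with a newline (character 10). The buffer `mbuf` and the words `m+str` and `m+line` are hypothetical; the sketch does no overflow checking and does not treat an escaped backslash at the end of a line specially.

```forth
create mbuf 256 allot            \ hypothetical buffer (no overflow check)
variable mlen  0 mlen !          \ number of characters currently in MBUF

: m+str ( c-addr u -- )          \ append a string to MBUF
  tuck mbuf mlen @ + swap move  mlen +! ;

: m+line ( c-addr u -- )         \ append one source line of a multiline literal
  dup if
    2dup + 1- c@ [char] \ = if   \ trailing backslash: join with the next line
      1- m+str exit
    then
  then
  m+str  10 mbuf mlen @ + c!  1 mlen +! ;   \ otherwise end the line with LF

s" first line\" m+line
s" second" m+line
mbuf mlen @ type                 \ prints "first linesecond" and a newline
```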

,------------------------------------------
| 2025-02-17 00:37:20 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1429
`------------------------------------------

Bernd [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1424):

> If you have `STATE`-smart words in your system or user-defined such words, the only way to get the correct interpretation and compilation semantics involves having `STATE` as expected, you can't just call `INTERPRETING` or `COMPILING` on a translator or use some table index mechanism as in the Trute proposal to call the right slot.

Right.

> If you don't have such things and have a Forth system where STATE-free replacement mechanisms are used for dual-semantics words (e.g. Gforth or VFX), and you don't define STATE-smart words yourself, you can actually use that API.

In Forth, you almost always have such things, because you have `EVALUATE` and `INCLUDE-FILE`, which depend on `STATE`. `INCLUDE-FILE` translates a file, `EVALUATE` translates a string. In practice, it's also necessary to translate a single lexeme, or even a single semantic token (such as a number, xt, or nt).

Bernd [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1425):

> We need to phase out `STATE` and define possible replacements before we can have a `STATE`less API.

Recognizers already don't depend on `STATE`. Only _some_ token translators depend on `STATE`. But we cannot avoid them in a Forth system, and we cannot eliminate `STATE`.

The existence of interpretation semantics and compilation semantics of Forth words is tied to the two modes (states) of the Forth text interpreter: interpretation state and compilation state. **The only way** to essentially eliminate `STATE` is to eliminate one of these modes and the corresponding semantics.
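The `STATE` dependence of `EVALUATE` mentioned above can be observed in a few lines of standard Forth. In this sketch the word names are made up for illustration: the same immediate word pushes 5 when it runs in interpretation state, and compiles a literal 5 when it runs while a colon definition is being compiled, because the text interpreter that `EVALUATE` starts consults `STATE`.

```forth
\ Hypothetical word names; standard Forth-2012 words only.
: eval-five ( -- 5 | ) s" 5" evaluate ; immediate

eval-five .                   \ interpretation state: 5 is pushed; prints 5
: five ( -- n ) eval-five ;   \ compilation state: EVALUATE compiles 5 into FIVE
five .                        \ prints 5
```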

For example, one could remove interpretation state and the interpretation semantics of words. This is possible, but the resulting language will not be backwards compatible with Standard Forth, since every parsing word must be an "immediate" word in this language. For example, without interpretation state it's impossible to translate the following program:

```forth
: my' ['] ' execute ;
my' my' constant mytick-xt
```

Changing the search order outside of definitions is also problematic:

```forth
also myvoc myword ( x ) previous constant my-x
```

In this line, `myword` must be recognized in the modified search order. This is only possible in interpretation state, which means that the next lexeme is recognized only after the previous lexeme has been recognized and executed.

**Factor** is an example of a Forth-like language without interpretation state. There, ordinary words are always "compiled" (added to the AST), and parsing words (and syntax words) are always executed immediately. See: [Factor / Syntax / Parser algorithm](https://docs.factorcode.org/content/article-parser-algorithm.html).

,------------------------------------------
| 2025-02-17 01:31:51 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1430
`------------------------------------------

Anton [writes](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1426):

> Add:
> `STATE-TRANSLATING` _( i*x translator -- j*x )_

Why is this better than making _translator_ a subtype of _xt_, and using `EXECUTE` instead of `STATE-TRANSLATING`?

The benefits of making _translator_ a subtype of _xt_:

- no need for a separate word (for word counters);
- a translator can be defined as a quotation or an anonymous definition (sometimes this is very convenient);
- a new translator can simply be defined using other translators;
  - an example to illustrate the idea:

    ```forth
    : translate-2lit ( 2*x -- 2*x | ) >r translate-lit r> translate-lit ;
    ```

    in some of my implementations ([example](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/variant.gamma/postpone/auto.via-mmode.fth)), `postpone` correctly applies to a lexeme that is recognized into a qualified semantic token with this translator.
- the Forth text interpreter loop can be reused for other purposes;
  - just for illustration, reuse the Forth text interpreter to count lexemes in a string:

    ```forth
    : count-lexemes ( sd.string -- u )
      0 rot rot ['] example.evaluate [: 2drop 1+ ['] noop ;] apply-perceptor ;
    s" a b c d" count-lexemes . \ prints "4"
    ```

    See the `apply-perceptor` word definition in [recognizer-api-ext.fth](https://github.com/ForthHub/fep-recognizer/blob/master/implementation/variant.gamma/recognizer-api-ext.fth)

,------------------------------------------
| 2025-02-17 06:33:43 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1431
`------------------------------------------

Anton [writes in [r1426], 2025-02-15](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1426), in the "Discussion" sub-section:

> The benefit of factoring out `state-translating` is that the `state` dependence can be confined to the place(s) that actually need state dependence: The standard Forth text interpreter (and user-defined text interpreters that are intended to work similarly). It does not infect all translators.

This seems irrelevant to the question of whether _translator_ is a subtype of _xt_ or not.

I don't see any benefit of using `state-translating` over `execute` for the API users. Please note that this is irrelevant to the question of whether _translator_ is a subtype of _xt_ or not. For example, the translator is `1+`:

```
: count-lexemes ( sd.string -- u )
  0 rot rot ['] example.evaluate [: 2drop ['] 1+ ;] apply-perceptor ;
```

I provided above an example of a translator that is an _xt_, and the **only difference** for the API users is whether `state-translate` or `execute` is used. And the latter provides more useful use cases.

,------------------------------------------
| 2025-02-17 06:45:34 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1432
`------------------------------------------

The above message is a draft that was sent accidentally. A better edition is below -)

Anton [writes in [r1426], 2025-02-15](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1426), in the "Discussion" sub-section:

> The benefit of factoring out `state-translating` is that the `state` dependence can be confined to the place(s) that actually need state dependence: The standard Forth text interpreter (and user-defined text interpreters that are intended to work similarly). It does not infect all translators.

This seems irrelevant to the question of whether _translator_ is a subtype of _xt_ or not.

I don't see any benefit of using `state-translating` over `execute` for the API users. For example, in the case of `execute`, a translator can be as simple as `1+`:

```
: count-lexemes ( sd.string -- u )
  0 rot rot ['] evaluate [: 2drop ['] 1+ ;] apply-perceptor ;
```

This translator is infected neither by `state` nor by a dummy triple _( xt-int xt-comp xt-post )_.
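`apply-perceptor` comes from ruv's fep-recognizer repository and is not reproduced here. To illustrate only the idea that a plain xt such as `1+` can act as a translator, here is a self-contained sketch that does not reuse the system's text interpreter; it walks the blank-delimited lexemes of a string itself. All names (`pending-input`, `skip-blanks`, `split-lexeme`, `next-lexeme`, `for-each-lexeme`, `rec-count`) are hypothetical. The recognizer is assumed to return a translator xt or 0, and a non-zero result is simply `EXECUTE`d.

```forth
\ Hypothetical sketch, not ruv's apply-perceptor.
\ The remaining input lives in a variable, so this is not reentrant.
2variable pending-input

: skip-blanks ( c-addr u -- c-addr' u' )
  begin dup 0> while over c@ bl > 0= while 1 /string repeat then ;

: split-lexeme ( c-addr u -- rest-addr rest-u lex-addr lex-u )
  2dup begin dup 0> while over c@ bl > while 1 /string repeat then
  2swap 2 pick - ;

: next-lexeme ( -- c-addr u )   \ u = 0 once the input is exhausted
  pending-input 2@ skip-blanks split-lexeme 2swap pending-input 2! ;

\ xt-rec ( c-addr u -- i*x translator-xt | 0 )
: for-each-lexeme ( i*x c-addr u xt-rec -- j*x )
  >r pending-input 2!
  begin next-lexeme dup while
    r@ execute  ?dup if execute then
  repeat 2drop r> drop ;

\ Every lexeme is "recognized" with 1+ as its translator.
: rec-count ( c-addr u -- xt ) 2drop ['] 1+ ;
: count-lexemes ( c-addr u -- u2 ) 0 rot rot ['] rec-count for-each-lexeme ;

s" a b c d" count-lexemes .   \ prints 4
```

The translator here never looks at `STATE`; it just increments the counter that sits below the input on the data stack.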

,------------------------------------------
| 2025-02-17 15:29:33 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1433
`------------------------------------------

I use recognizers for non-Forth languages. These languages are usually state-free, i.e. they are interpret-only or compile-only. Using a quotation for the translator is completely sufficient. E.g. the recognizer in net2o's chat messages that matches URLs has

```
[: rework-% $, msg-url ;]
```

as its translator. No need to define a triple-entry translator table. And the translators are indeed all that short, and there's no reusability (a token translates 1:1 to a command plus a way to add the corresponding data).

This thing used to be a bit more complex when it was still based on the Trute recognizers, because then I always needed a table and used only one slot of it (I ended up with the generic name-translator and just put the xt I wanted to execute on the stack underneath, so it worked in interpretation state, but was actually compiling the message into a buffer). The text messages are parsed by the standard `EVALUATE`, but with a language-specific recognizer stack that contains not a single Forth recognizer.

Therefore I disagree with Anton that the current translator concept ties `STATE` to every translator: it's the other way round. It ties `STATE` only to full-blown Forth translators that work in a mixed interpreter/compiler language, where there is a state (and there, it is inevitable; you can only move that dispatch around). You can define translators used by Forth with `TRANSLATE:`, but you can define translators used by other (single-state) languages just as ordinary xts, with a single action for translation. There's no need for the table and dispatch if your language has no state at all; it's just `EXECUTE` of the one single action.

When you want to reuse slots of system translators in Gforth (e.g.
for a Color-Forth clone), you can use `action-of interpreting`/`compiling`/`postponing` ( translator -- xt ) to access the subfields. That's because all these accessing words are identical to the defer field for value-style structures. E.g. Anton's example could be

```
: cf-recognizer ( <[_]>addr u -- data translator | 0 )
  sp@ fp@ {: sp' fp' :}
  over c@ >r  1 /string recognizer1
  dup 0= IF  rdrop EXIT  THEN
  case r>
    '[' of action-of interpreting endof
    '_' of action-of compiling endof
    ']' of action-of postponing endof
    fp' fp! sp' sp! 2drop 0 dup
  endcase ;
```

and that works (the vocabularies used by the colorForth core wouldn't have any `STATE` in them). That way, you don't need to write your own colorForth outer interpreter; the standard Forth interpreter does it.

My design assumption was that making all new data types (recognizer sequences, translators) subtypes of xt, and therefore executable, would pay off, and it did.

,------------------------------------------
| 2025-02-19 21:50:22 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1434
`------------------------------------------

## Multiline strings

Checking on Python3, I see that it uses C's syntax for strings starting with `"`. In particular, if you just put a newline in the middle of a string without escaping it, you get an error:

```
>>> print("abc
  File "<stdin>", line 1
    print("abc
          ^
SyntaxError: unterminated string literal (detected at line 1)
```

An escaped newline is ignored, and you need to write `\n` to get an actual newline. I expect that it's the same for most other languages you mention, because they all use a different syntax for "proper multiline strings".

I have no problem with an additional recognizer for "proper multiline strings" with a distinguishable syntax (such as `"""`); I can even live with `rec-string` handling the additional syntax, but I think that there might be others who will disparage it as a WIBNI or somesuch. But I think that, for `"`-delimited strings, `rec-string` should either not do multi-line strings at all or do them the C/Python3/etc. way.

## STATE-TRANSLATING

> Why is this better than making translator a subtype of xt, and using
> EXECUTE instead of STATE-TRANSLATING?

It is better because it isolates the state-dependence in the word(s) calling `state-translating`, rather than having it in the translator coming out of the recognizer and potentially being invoked through any `execute`, `compile,`, `is` or `defer!` in the system (with data-flow analysis necessary to reduce the number of potential invocations, and the result of that analysis probably still showing more occurrences than what searching for `state-translating` would otherwise give us). It's similar to the difference between arming a bomb at the factory and arming it only just before dropping it (which may never happen).

## Examples of translators not produced with `translate:`

The proposal states about `translate:`

```
This word is the only standard way to define a general purpose translator.
```

Any argument based on defining translators in other ways is therefore not in line with the proposal. This applies to [[r1432]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1432) as well as [[r1433]](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1433). So the usages you show may work on some particular implementation, but may fail on a different implementation of the proposal.
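To make the contrast concrete, here is a hypothetical, stripped-down model of the design Anton describes, written in standard Forth. It is not the proposal's reference implementation: this `translate:` stores only an interpretation xt and a compilation xt (the postponing slot is omitted), and `num-translator` and `compile-num` are names made up for the example. What it shows is that the translator itself is just data, and `STATE` is read only inside `state-translating`.

```forth
\ Hypothetical two-slot model; the postponing slot is left out.
: translate: ( xt-int xt-comp "name" -- )
  create , , ;            \ cell 0: xt-comp, cell 1: xt-int

\ The only word in this sketch that reads STATE.
: state-translating ( i*x translator -- j*x )
  state @ if @ else cell+ @ then execute ;

: compile-num ( n -- ) postpone literal ;
' noop ' compile-num translate: num-translator

42 num-translator state-translating .   \ interpretation state: prints 42
```

Whether one prefers this, or translators that are xts run with `EXECUTE`, is exactly the disagreement in this thread; the sketch only shows where the `STATE` test ends up.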

And if you are willing to design an implementation for some convenient code of your interpretation-only recognizers, I am sure that you are able to design an implementation of recognizers with `state-translating` that's just as convenient.

,------------------------------------------
| 2025-02-20 16:40:03 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1435
`------------------------------------------

## Making _translator_ a subtype of _xt_

>> Why is this better than making _translator_ a subtype of _xt_

> It is better because it isolates the state-dependence in the word(s) calling `state-translating` rather than having it in the translator coming out of the recognizer and potentially being invoked through any `execute`, `compile,`, `is` or `defer!` in the system (with data-flow analysis necessary to reduce the number of potential invocations, and the result of that analysis probably still showing more occurrences than what searching for `state-translating` would otherwise give us).

1\. It **does not isolate** the state-dependence in the word(s) calling `state-translating`, because `interpreting` and `compiling` will also exhibit state-dependent behavior on some arguments. At the same time, `state-translating` will exhibit **state-independent** behavior on some arguments.

2\. If the user prefers the word `state-translating` because it allows him to find invocations of translators in his code, he can define this word as `synonym state-translating execute` and use it in his code.

3\. Given the choice between the ability to find invocations of translators and the [set of benefits](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1430) that an _xt_ subtype provides, I would prefer the latter.

> Any argument based on defining translators in other ways is therefore not in line with the proposal.

Yes, but they are aimed at changing the proposal ))

> And if you are willing to design an implementation for some convenient code of your interpretation-only recognizers, I am sure that you are able to design an implementation of recognizers with state-translating that's just as convenient.

Bernd [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1433): "This thing used to be a bit more complex when it was still based on the Trute recognizers, because then, I always needed a table, and used only one slot of it".

## Side effects

Anton [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1427):

> It seems to me that the discussion about side effects should go into the non-normative rationale.

Agreed. A standard word cannot have an unspecified side effect that can be detected by a standard program. Therefore, it's sufficient to specify the allowed effects for standard recognizers and for the perceptor in the standard Forth system.

## NOTFOUND

Anton [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1427):

> I have no preference here, but I remember that Matthias Trute presented a case for notfound, and that sounded convincing. Why do his arguments no longer hold (or did they not hold in the first place)?

In Matthias Trute's [proposal](https://forth-standard.org/proposals/recognizer?hideDiff#reply-897) I don't see any arguments for why NOTFOUND is better than zero. See my arguments for why zero is better in the section "Special data object on failure considered harmful" of my comment [[r1351] 2024-10-08](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1351).

## FORTH-RECOGNIZE, deferred or getter and setter?

Anton [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1427):

> I see no benefits to having a getter and setter here. Deferred words are fine.

Anton, you [wrote](https://www.novabbs.com/devel/article-flat.php?id=28039&group=comp.lang.forth#28039) in comp.lang.forth on 2024-10-05: "I wish they had defined `GET-BASE` and `SET-BASE` instead of `BASE`". You seem to see shortcomings of `BASE`. The shortcomings of the deferred word `FORTH-RECOGNIZE` are similar: if additional actions are needed when setting or getting the value, this is difficult to implement in a system and almost impossible in a program. And this word cannot be redefined by a program.

,------------------------------------------
| 2025-02-20 21:39:26 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1436
`------------------------------------------

# Multiline Strings

Anton, you seem to have missed the largest group that does it identically: Rust, Visual Basic (≥14), R, Ruby, and PHP. A number of languages that started with C-like strings did not continue to follow that example and then obviously needed a different syntax to stay backwards compatible (VB didn't have C-like strings to begin with, and the multiline extension was compatible). If we introduce multiline strings as a new feature, we should not first copy a bad example and then add another syntax to fix it.

The primary reason why C's multiline string style is so weird is the preprocessor: the preprocessor has a rudimentary understanding of the language, **and** it uses `\` at the end of the line to concatenate multiple lines. That makes it create single-line entries out of strings, and then it understands that all this is just one string it shouldn't look inside (C macros aren't replaced in strings). There's absolutely no need to copy C's weird strings, caused by its weird preprocessor approach, into Forth.

,------------------------------------------
| 2025-02-22 09:09:43 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1437
`------------------------------------------

Checking Ruby (the only one of these languages that I have installed), I see that it indeed has strings that include newlines. So yes, there are programming languages that allow unescaped newlines in their most popular string syntax, instead of introducing a separate syntax for multi-line strings.

Why is this a mistake? The common case is that a string ends on the same line where it started. If the string terminator is missing on that line, it is often a mistake, and a friendly programming language has a syntax for single-line strings that allows catching the mistake right on that line. By contrast, in Ruby I get:

````
[~:155788] ruby
puts 'hello,
puts 'world'
-:2: syntax error, unexpected local variable or method, expecting end-of-input
puts 'world'
````

So it gives me a misleading error message on a different line from where the mistake happened, possibly several lines later. That's why many languages require either escaping the newline or a different syntax for multi-line strings.

The reason for that has nothing to do with backwards compatibility: these languages report an error if there is an unescaped newline in a string with the most popular syntax. Defining that case to do what you want rather than as an error does not break any existing, working programs.

The reason also has nothing to do with the C preprocessor. The C preprocessor has to know when something is inside or outside a string (it must not do macro expansion inside string literals), so it could just as well accept a newline inside a string.

I don't have an opinion on whether we should use an alternative delimiting syntax for multi-line strings, escape the newlines in the syntax for single-line strings, or have both options.

,------------------------------------------
| 2025-02-22 10:04:11 AntonErtl replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1438
`------------------------------------------

## Making translator a subtype of xt

1. Yes, translators that are not `state`-dependent do not prevent people from writing `state`-dependent code (and they should not). But what they give me is that if I avoid such code (and I do), I do not have to worry about state-dependence in every `execute`, `compile,`, `is` and `defer!`, only in `state-translating`.
2. Defining `state-translate` as an alias of `execute` does not help, because the `state`-dependence is in the words defined with `translate:`. Every other `execute`, `compile,`, `is` or `defer!` might still do something `state`-dependent because of that, even if I have no other source of `state`-dependence in my program.
3. My preference is for translators without `state`-dependence.

As for Bernd Paysan simplifying code when rewriting it, sure, that's his way. That's why I expect that, if he puts his mind to it, he will design an implementation of recognizers without `state`-dependent `translate:` children that's just as convenient.

## NOTFOUND

After complaints about the proposal being too long, Matthias Trute removed lots of the rationale in one version of his proposal, including the rationale for not using 0. Of course the same person who had earlier complained about the length then complained about incomprehensibility. Anyway, you can find earlier versions of the proposal through [Forth200x](http://www.forth200x.org/); there is also a link to the split-out comments there.

## FORTH-RECOGNIZE, deferred or getter and setter?

For `base`, for the optimization I have in mind, one would have to check on every use of `#` whether `base` has changed in the meantime, and that cost would be substantial compared to the benefit of the optimization. And it's not just `set-base` that would avoid the problem: if `base` were a uvalue (not a uvarue), it would be relatively easy to eliminate the check in Gforth.

For `forth-recognize`, I have no such optimization in mind. If I had, it would be relatively easy to implement in Gforth without a change check for `forth-recognize`, because `forth-recognize` is a deferred word, not a variable. But even on a system where you cannot attach the optimization to the `defer!` method of `forth-recognize`, inserting a change check would be much less of a problem than for `base`: `forth-recognize` tends to be much more expensive than `#`, so any optimization with noticeable benefit will also reduce the cycles per invocation much more, easily amortizing the change check.

Anyway, given that nobody has presented any actual benefit of having a getter and setter, we should follow Chuck Moore's advice here: Do not speculate. In this case, this means not introducing a getter and setter.

,------------------------------------------
| 2025-02-22 21:41:29 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1439
`------------------------------------------

## Making translator a subtype of xt

> Defining `state-translate` as an alias of `execute` does not help, because the state-dependence is in the words defined with `translate:`. Every other `execute`, `compile,`, `is` or `defer!` might still do something state-dependent because of that even if I have no other source of state-dependence in my program.

Does this mean that, according to your idea, a Forth system is not allowed to define `state-translate` as an alias of `execute`?

Otherwise, if a Forth system is allowed to provide such an implementation, then defining `state-translate` as an alias of `execute` in your program is indistinguishable from such a system's implementation. Other parts of your program simply should not know whether `state-translate` is an alias of `execute` or not, and so should not depend on that fact.

In general, other parts can do something state-dependent regardless of whether `state-translate` is an alias of `execute`. One can write:

```forth
defer foo
: bar ... state-translate ... ;
' bar is foo          \ `foo` is state-dependent now (in the general case)
: baz ... foo ... ;   \ `baz` is state-dependent (in the general case)
```

On the other hand, if you do not have other sources of state-dependence in your program (including `evaluate` and `include-file`), and you only perform translators using `state-translate`, **how** can `execute` do anything state-dependent in your program other than by calling something that calls `state-translate`?

## NOTFOUND

> Of course the same person who had earlier complained about the length then complained about incomprehensibility.

Just in case, it wasn't me who complained about the length ;-)

> Anyway, you can find earlier versions of the proposal through Forth200x; there is also a link to the split-out comments there.

Thank you, there is a [`RECTYPE-NULL` necessity](http://www.forth200x.org/Recognizer-rfc-D-comments.html#rectype-null-necessity) section in the split-out comments. In this section the author argues that `RECTYPE-NULL` (as opposed to `0`) simplifies the implementation. But the author only considers cases where the result of recognizing is used for translation. He **does not consider** cases where the result of recognizing is used to obtain the semantic token itself (a number, xt, nt, etc.). Thus his argument did not hold in the first place, because there is no point in simplifying a small part of a program at the expense of complicating a larger part.
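ruv's argument concerns code that uses a recognizer's result directly rather than translating it. A minimal sketch, assuming a recognizer that returns `n xt` on success and `0` on failure (`rec-unum`, `translate-num` and `unum?` are hypothetical names, and the recognizer accepts only unsigned numbers in the current `BASE`): because the failure value is 0, the translator xt doubles as the success flag.

```forth
\ Hypothetical names; unsigned numbers in the current BASE only.
: translate-num ( n -- n | ) state @ if postpone literal then ;

: rec-unum ( c-addr u -- n xt | 0 )
  dup 0= if 2drop 0 exit then
  0 0 2swap >number nip if 2drop 0 exit then   \ leftover chars: fail
  drop ['] translate-num ;

\ Using the result itself: the translator xt doubles as the flag.
: unum? ( c-addr u -- n true | false )
  rec-unum dup if drop true then ;

s" 123" unum? . .   \ prints -1 123
```

With a non-zero failure token such as `RECTYPE-NULL`, the same check would need an explicit comparison against that token before the result could be used.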

As I have shown, using `0` simplifies programs as a whole, and it is more consistent.

## FORTH-RECOGNIZE, deferred or getter and setter?

Anton, I see that you consider only optimization, and only in Gforth. I consider programs that extend standard Forth systems in general. For example, if I want to implement `append-perceptor ( xt-recognizer -- )` and `prepend-perceptor ( xt-recognizer -- )`, I may have to redefine the setter `set-perceptor ( xt-recognizer -- )` and the getter `perceptor ( -- xt-recognizer )`. This is impossible without a getter and setter.

,------------------------------------------
| 2025-02-23 16:28:37 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1440
`------------------------------------------

The C preprocessor is line-oriented by design and can't see beyond a single line. This is unlike most modern programming languages, which aren't line-oriented anymore (Fortran and COBOL, e.g., are line-oriented languages and need line continuation characters: either `&` at the end of the line in Fortran, or `-` in column 7 of the next line in COBOL). Forth is in many respects not a line-oriented language, but it has some line-oriented limitations (e.g. with `PARSE`).

What we should talk about is escaping line breaks if they shouldn't go into the output and are only in the string to facilitate editability. Then you can copy-paste a C multiline string, and it also works.

And when you forget the closing quote of a string in Forth, you get weird errors, even within the same line. The way to figure out what went wrong is to use a syntax-highlighting editor that knows about strings (and whether they go multi-line).

```
: .error-line ( line# error# -- ) ." error . ." in line" . ;
*the terminal*:2:23: error: Undefined word
." error . ." in >>>line"<<< . ;
```

Yes, I forgot the closing quote after “error”.
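ruv's point about a getter and setter can be sketched in standard Forth with `DEFER@` and `DEFER!`. All names here are hypothetical: `perceptor` and `set-perceptor` follow the names used in his message, while `perceptor-hook` and `perceptor-changes` are made up for the example. A program can redefine the setter to attach an extra action (here, counting changes); code compiled after the redefinition uses the new setter, while code compiled earlier keeps calling the old one.

```forth
\ Hypothetical names; DEFER@ and DEFER! are Forth-2012 CORE EXT words.
defer perceptor-hook
: perceptor ( -- xt ) ['] perceptor-hook defer@ ;
: set-perceptor ( xt -- ) ['] perceptor-hook defer! ;

\ A program later redefines the setter to attach an extra action.
\ Inside the new definition, SET-PERCEPTOR still refers to the old one.
variable perceptor-changes  0 perceptor-changes !
: set-perceptor ( xt -- ) 1 perceptor-changes +!  set-perceptor ;

' noop set-perceptor
perceptor-changes @ .   \ prints 1
```

With only a deferred word, a program has no single place to hook such an action, because any code may use `is` or `defer!` on it directly.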

,------------------------------------------
| 2025-02-24 00:39:15 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1441
`------------------------------------------

# Making translator a subtype of xt

> As for Bernd Paysan simplifying code when rewriting it, sure, that's his way. That's why I expect that, if he puts his mind to it, he will design an implementation of recognizers without `state`-dependent `translate:` children that's just as convenient.

We had that before. It was less convenient. The whole point is that the translator is or isn't `state`-dependent, depending on the language you are creating (if it is Forth, it is). Moving the state dispatch around showed that this is the place where you can actually get rid of it when your language doesn't have states.

You don't actually get rid of the state-infested translator if you say “this is a table, and in order to handle what's in there in the interpreter, you need `state-translating`”. It's still infested with the concept of states. By putting `state-translating` into the interpreter, which is a reusable component (you can just replace the entire recognizer stack and read in different languages with normal words like `included` or `evaluate`), you force this concept upon all translators, whether their language has that concept or not.

We now have these direct access words (`interpreting`, `compiling`, and `postponing`), and their use is very limited. Two of the three serve as text for the prompt. `postponing` is used in `postpone`. And there's the possibility in Gforth to extend these tables to further states for other languages that reuse existing recognizers, by patching their operation into the additional field. The newly created operator is used to populate the tables and to set the state, and that's it.

If you want to get rid of `state` completely in the long run, put it into the Forth-specific translators. If your modified Forth-like language doesn't need `state` anymore, your translators won't need it either. And then, it's just gone.

,------------------------------------------
| 2025-02-26 16:36:23 ruv replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1442
`------------------------------------------

## Named translators

Anton [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1426) on 2025-02-15:

> The benefit of having each translator word return a translator token is that one does not need to tick the translator words in all the recognizers. A slight improvement in writability and readability with no downside.

This has several disadvantages:

- when you use a translator to translate a semantic token, you have to do it via `execute` (or compile a call directly using `compile,`);
  - e.g.: `xt-translator execute`;
- when a new translator is defined using other translators, you have to call them via `execute` (or `compile,`);
- if you define new translators as colon definitions (which is very convenient), these translators perform translation on execution; if standard named translators return an xt on execution, this leads to inconsistency.

On the other hand, the need to tick the translator words in recognizers is mitigated when we use a tick recognizer. Thus, instead of `['] translate-xt` we can write `'translate-xt` (or, in Gforth's parlance, use a backtick instead of `'`).

## Gerund

Anton [wrote](https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers?hideDiff#reply-1427) on 2025-02-16:

> It's not clear to me why the gerund form is used (`INTERPRETING` etc.),

I think they are temporary, quick-and-dirty names.
According to the naming convention for standard words, the names of these words must begin with an English verb, or be just an English verb, because they perform actions with side effects, i.e. they change some state (the reverse is not true).

,------------------------------------------
| 2025-03-06 01:10:47 BerndPaysan replies:
| proposal - minimalistic core API for recognizers
| see: https://forth-standard.org/proposals/minimalistic-core-api-for-recognizers#reply-1443
`------------------------------------------

# Gerund

One reason for using this form is that the usage of these words has turned out to be extremely limited (other than `postponing`, which is used once in `postpone`), and one of the remaining use cases was to print the current state as readable text in the prompt by just doing `get-state id.` (`id.` gets from the xt to the nt and then does `name>string type`). Grammar-wise, it also looks more natural to use the gerund here.