,---------------.
| Contributions |
`---------------´

,------------------------------------------
| 2020-09-05 15:09:39 AndrewHaley wrote:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#contribution-159
`------------------------------------------

As I've said more than once, I think the proposal is too complex and does too much. But I've been challenged to come up with an alternative I think is better.

I'd like it to:

- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and be as non-intrusive as possible.

With regard to that last one, I would like recognizers to be associated with wordlists, so that when a recognizer is defined it is in the current wordlist, and when that wordlist is visible so is the recognizer. Recognizer ordering therefore should follow the search order and not use a separate ordering and visibility mechanism. When a recognizer is forgotten, it should simply disappear. This is ideal for libraries that define recognizers and live in their own wordlist: to use a library, add its wordlist to the search order, and the recognizers come too.

A rough draft of my proposed change is at

https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt

The new part is section 3.X.X.

An example of such a recognizer for characters quoted with backquotes is

```
\ Interpreting (mode 1): leave n on the stack.
\ Compiling (mode 2): compile n as a literal.
: rectype-num ( n mode -- n | )
   2 = if  postpone literal  then ;

\ Recognize a three-character token: backquote, character, backquote.
: [recognize] ( a n mode -- x...x true | false )
   -rot  3 = if                      \ ( mode a ) length must be 3
      dup c@ [char] ` = if           \ opening backquote?
         dup 2 + c@ [char] ` = if    \ closing backquote?
            1+ c@  swap rectype-num  true exit
         then
      then
   then
   2drop 0 ;                         \ not recognized: discard a and mode
```

For simplicity I haven't included POSTPONE actions or recognizer queries in this proposal, but they are trivial to add if we ever agree that they should be.
[And yes, the numbered modes should really be named constants, but that makes no difference to the idea.]

An implementation of this proposal (for SwiftForth) follows. I suspect it'd be similar in most Forths.

```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
   s" [recognize]" rot hashed
   {: a n mode link :}
   link begin
      @rel dup @ while
      dup l>name count s" [recognize]" compare(cs) 0= if
         to link
         a n mode  link link> execute if ( success?)
            link link> exit
         then
         link
      then
   repeat
   drop a n mode 0 ;

\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
   context #order @ cells over + swap do
      i @ (execute-recognizers) ?dup if unloop exit then
   cell +loop
   drop 0 ;
```

[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS, but I haven't included that here. ]

,---------.
| Replies |
`---------´

,------------------------------------------
| 2020-09-04 11:53:28 ruv replies:
| proposal - Nestable Recognizer Sequences
| see: https://forth-standard.org/proposals/nestable-recognizer-sequences#reply-489
`------------------------------------------

> `get-rec-sequence ( xt -- xt1 .. xtn n )`
> If xt refers to a recognizer sequence, return the contained recognizers. If xt refers to a deferred word, perform DEFER@ followed by GET-REC-SEQUENCE (i.e., GET-REC-SEQUENCE works through deferred words).
> If xt refers to neither, return 0.

If recognizer sequences are immutable, a recognizer that is not a sequence can be viewed as a sequence with a single element. I.e., `get-rec-sequence` can have the effect `( xt -- xt 1 )` for such a recognizer. Could that be useful?

,------------------------------------------
| 2020-09-04 17:52:36 UlrichHoffmann replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-490
`------------------------------------------

The output of the scanner is often called "token class":

From Wikipedia ([Lexical Analysis](https://en.wikipedia.org/wiki/Lexical_analysis))

> Lexers and parsers are most often used for compilers, but can be used for other computer language tools, such as prettyprinters or linters. Lexing can be divided into two stages: the scanning, which segments the input string into syntactic units called lexemes and categorizes these into **token classes**; and the evaluating, which converts lexemes into processed values.

Also see ["Modern Compiler Design" Second Edition](https://www.amazon.com/Modern-Compiler-Design-Dick-Grune/dp/1461446988) by Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J.H. Jacobs and Koen Langendoen; they use the same term.

So it might be reasonable to call `RECTYPE:` just `TOKEN-CLASS`.

,------------------------------------------
| 2020-09-04 20:21:59 UlrichHoffmann replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-491
`------------------------------------------

or maybe `TOKENCLASS`?

,------------------------------------------
| 2020-09-05 15:23:52 AndrewHaley replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-492
`------------------------------------------

As I've said more than once, I think the proposal is too complex and does too much. But I've been challenged to come up with an alternative I think is better.

I'd like it to:

- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and be as non-intrusive as possible.

With regard to that last one, I would like recognizers to be associated with wordlists, so that when a recognizer is defined it is in the current wordlist, and when that wordlist is visible so is the recognizer. Recognizer ordering therefore should follow the search order and not use a separate ordering and visibility mechanism. When a recognizer is forgotten, it should simply disappear. This is ideal for libraries that define recognizers and live in their own wordlist: to use a library, add its wordlist to the search order, and the recognizers come too.

Here's what I suggest as a revised standard. Note that the only change to the text interpreter is step d). Section 3.X.X is the new part.

```
Text interpretation (see 6.1.1360 EVALUATE and 6.1.2050 QUIT) shall
repeat the following steps until either the parse area is empty or an
ambiguous condition exists:

a) Skip leading spaces and parse a name (see 3.4.1);

b) Search the dictionary name space (see 3.4.2). If a definition name
   matching the string is found:

   1) if interpreting, perform the interpretation semantics of the
      definition (see 3.4.3.2), and continue at a).

   2) if compiling, perform the compilation semantics of the
      definition (see 3.4.3.3), and continue at a).

c) If a definition name matching the string is not found, attempt to
   convert the string to a number (see 3.4.1.3). If successful:

   1) if interpreting, place the number on the data stack, and
      continue at a);

   2) if compiling, compile code that when executed will place the
      number on the stack (see 6.1.1780 LITERAL), and continue at a);

d) Execute any recognizers in the dictionary name space (see 3.X.X).
   If any of them succeeds, continue at a).

e) If unsuccessful, an ambiguous condition exists (see 3.4.4).

3.X.X User-defined recognizers

Do the following until no more recognizers are found or one of them
succeeds:

1) Search the dictionary name space (see 3.4.2) for a definition whose
   name is "[RECOGNIZE]". This definition is called the found
   recognizer. If none is found, the search is terminated.

2) If found, perform the interpretation semantics of the found
   recognizer, passing it the string which has been parsed and a mode
   which is 1 if interpreting, 2 if compiling:

      [RECOGNIZE] ( a n mode -- flag)

   If the flag returned is true, terminate the search.

   [ Comment: The found recognizer performs some kind of action,
   perhaps compiling something into the dictionary, then returns true
   or false. ]

   If the flag returned is false, continue at 1), but searching the
   dictionary name space from the point of the definition which
   precedes the found recognizer.

[ Discussion:

This isn't as inefficient as it might seem at first glance because the
result of searching the dictionary for words called [RECOGNIZE] can be
cached. The dictionary only needs to be re-scanned when the search
order is changed or a new definition of [RECOGNIZE] is added to the
dictionary.

Obviously, this removes any need for recognizer stacks. If a
recognizer is defined by a library that has its own wordlist, the
recognizer becomes visible to the interpreter (and therefore active)
when the library's wordlist is added to the search order. This is, I
believe, in most cases exactly what will be desired: the visibility of
library-defined recognizers changes with the visibility of the words
in the library.

However, if it's necessary to define a recognizer which can be added
or removed from view independently of any other definitions, it can be
defined in a wordlist which contains only one word, the recognizer.
This wordlist can be added or removed from the search order as
required.

If it's necessary to have POSTPONE actions, another mode can be added,
and the specification of POSTPONE amended to search for user-defined
POSTPONE recognizers. ]
```

https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt

The new part is section 3.X.X.

An example of such a recognizer for characters quoted with backquotes is

```
\ Interpreting (mode 1): leave n on the stack.
\ Compiling (mode 2): compile n as a literal.
: rectype-num ( n mode -- n | )
   2 = if  postpone literal  then ;

\ Recognize a three-character token: backquote, character, backquote.
: [recognize] ( a n mode -- x...x true | false )
   -rot  3 = if                      \ ( mode a ) length must be 3
      dup c@ [char] ` = if           \ opening backquote?
         dup 2 + c@ [char] ` = if    \ closing backquote?
            1+ c@  swap rectype-num  true exit
         then
      then
   then
   2drop 0 ;                         \ not recognized: discard a and mode
```

For simplicity I haven't included POSTPONE actions or recognizer queries in this proposal, but they are trivial to add if we ever agree that they should be.

[And yes, the numbered modes should really be named constants, but that makes no difference to the idea.]

An implementation of this proposal (for SwiftForth) follows. I suspect it'd be similar in most Forths.

```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
   s" [recognize]" rot hashed
   {: a n mode link :}
   link begin
      @rel dup @ while
      dup l>name count s" [recognize]" compare(cs) 0= if
         to link
         a n mode  link link> execute if ( success?)
            link link> exit
         then
         link
      then
   repeat
   drop a n mode 0 ;

\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
   context #order @ cells over + swap do
      i @ (execute-recognizers) ?dup if unloop exit then
   cell +loop
   drop 0 ;
```

[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS, but I haven't included that here.
]

,------------------------------------------
| 2020-09-05 15:29:50 AndrewHaley replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-493
`------------------------------------------

As I've said more than once, I think the proposal is too complex and does too much. But I've been challenged to come up with an alternative I think is better.

I'd like it to:

- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and be as non-intrusive as possible.

With regard to that last one, I would like recognizers to be associated with wordlists, so that when a recognizer is defined it is in the current wordlist, and when that wordlist is visible so is the recognizer. Recognizer ordering therefore should follow the search order and not use a separate ordering and visibility mechanism. When a recognizer is forgotten, it should simply disappear. This is ideal for libraries that define recognizers and live in their own wordlist: to use a library, add its wordlist to the search order, and the recognizers come too.

Here's what I suggest as a revised standard. Note that the only change to the text interpreter is step d). Section 3.X.X is the new part.

```
Text interpretation (see 6.1.1360 EVALUATE and 6.1.2050 QUIT) shall
repeat the following steps until either the parse area is empty or an
ambiguous condition exists:

a) Skip leading spaces and parse a name (see 3.4.1);

b) Search the dictionary name space (see 3.4.2). If a definition name
   matching the string is found:

   1) if interpreting, perform the interpretation semantics of the
      definition (see 3.4.3.2), and continue at a).

   2) if compiling, perform the compilation semantics of the
      definition (see 3.4.3.3), and continue at a).

c) If a definition name matching the string is not found, attempt to
   convert the string to a number (see 3.4.1.3). If successful:

   1) if interpreting, place the number on the data stack, and
      continue at a);

   2) if compiling, compile code that when executed will place the
      number on the stack (see 6.1.1780 LITERAL), and continue at a);

d) Execute any recognizers in the dictionary name space (see 3.X.X).
   If any of them succeeds, continue at a).

e) If unsuccessful, an ambiguous condition exists (see 3.4.4).

3.X.X User-defined recognizers

Do the following until no more recognizers are found or one of them
succeeds:

1) Search the dictionary name space (see 3.4.2) for a definition whose
   name is "[RECOGNIZE]". This definition is called the found
   recognizer. If none is found, the search is terminated.

2) If found, perform the interpretation semantics of the found
   recognizer, passing it the string which has been parsed and a mode
   which is 1 if interpreting, 2 if compiling:

      [RECOGNIZE] ( a n mode -- flag)

   If the flag returned is true, terminate the search.

   If the flag returned is false, continue at 1), but searching the
   dictionary name space from the point of the definition which
   precedes the found recognizer.
```

https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt

The new part is section 3.X.X.

An example of such a recognizer for characters quoted with backquotes is

```
\ Interpreting (mode 1): leave n on the stack.
\ Compiling (mode 2): compile n as a literal.
: rectype-num ( n mode -- n | )
   2 = if  postpone literal  then ;

\ Recognize a three-character token: backquote, character, backquote.
: [recognize] ( a n mode -- x...x true | false )
   -rot  3 = if                      \ ( mode a ) length must be 3
      dup c@ [char] ` = if           \ opening backquote?
         dup 2 + c@ [char] ` = if    \ closing backquote?
            1+ c@  swap rectype-num  true exit
         then
      then
   then
   2drop 0 ;                         \ not recognized: discard a and mode
```

For simplicity I haven't included POSTPONE actions or recognizer queries in this proposal, but they are trivial to add if we ever agree that they should be.

[And yes, the numbered modes should really be named constants, but that makes no difference to the idea.]

An implementation of this proposal (for SwiftForth) follows. I suspect it'd be similar in most Forths.

```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
   s" [recognize]" rot hashed
   {: a n mode link :}
   link begin
      @rel dup @ while
      dup l>name count s" [recognize]" compare(cs) 0= if
         to link
         a n mode  link link> execute if ( success?)
            link link> exit
         then
         link
      then
   repeat
   drop a n mode 0 ;

\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
   context #order @ cells over + swap do
      i @ (execute-recognizers) ?dup if unloop exit then
   cell +loop
   drop 0 ;
```

[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS, but I haven't included that here. ]

,------------------------------------------
| 2020-09-05 21:40:18 ruv replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-494
`------------------------------------------

TL;DR: **token class** from lexers is a wrong association.

### The "token" term in lexical analysis

1\. In computer science, the **token** term is used in [various meanings](https://en.wikipedia.org/wiki/Token#Computing).

2\. In lexical analysis (and compiler theory), **token** is actually a shorthand for **lexical token**. (In the same way, in the Forth context the "definition" term is a shorthand for "Forth definition".)

3\. A **lexical token** is a tuple of a lexeme and the _kind_ of this lexeme
(it's my rewording [from Wikipedia](https://en.wikipedia.org/wiki/Lexical_token#Token), and also: "The lexeme's type combined with its value is what properly constitutes a token"). A lexical token usually doesn't bear any semantic information; it bears only its lexical kind: whether it is an identifier, a number, a string literal, or a particular keyword. Lexical tokens are not used in Forth, since Forth doesn't distinguish lexemes by kind.

4\. In Forth, the **token** term is not used in the sense of **lexical token**. As we can conclude from the "execution token" and "name token" terms, in the Forth standard a _token_ is just a kind of identifier, symbol, or something that represents something else (in the general case; numbers, however, can represent themselves).

### Lexical token class

> The output of the scanner is often called "token class"

It's not quite correct. The output of the scanner (i.e. a lexer) is a sequence of **lexical tokens**.

As for **"token class"**, it is a shorthand for **"lexical token class"**. And actually, token class, token type, token category, and token name (in the famous Dragon book) all refer to the same thing, which I have called *lexeme kind* above.

### Qualification of the tokens in Forth

We need to name the entity that qualifies a token. We can call it "token class" (but without referring to "token class" in lexers).

Initially I [called](https://groups.google.com/forum/message/raw?msg=comp.lang.forth/8orqw1vjTOY/wMskqvDWCAAJ) this entity "token type". But then I realized that "token descriptor" is far better. This entity can be created at run time, and it describes how to translate a token; it is not an abstraction, it is an actual object (and the identifier of this object). So the "create descriptor" shorthand sounds better than "create class" or "create type" (which look like more abstract things).

Only when we have found names for the terms (in human language, in the language of the standard) can we find good names for words.
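
A minimal sketch of what a "token descriptor" in the above sense could look like as a run-time object, in standard Forth. The names `descriptor:`, `translate`, `num-descriptor` and `[forty-two]` are hypothetical and appear in no proposal; the layout (one action per text-interpreter state) is only an assumption made for illustration.

```forth
\ Hypothetical names throughout; this only illustrates the idea.
\ A token descriptor is an object created at run time that holds
\ one action per text-interpreter state.
: descriptor: ( xt-interpret xt-compile "name" -- )
   create  , , ;          \ cell 0: xt-compile, cell 1: xt-interpret

\ Translate an already-converted token according to STATE.
: translate ( i*x desc -- j*x )
   state @ if  @  else  cell+ @  then  execute ;

\ Descriptor for single-cell numbers.
:noname ( n -- n ) ;                     \ interpreting: leave n
:noname ( n -- )   postpone literal ;    \ compiling: compile n
descriptor: num-descriptor

\ Interpretation state: the number stays on the stack.
42 num-descriptor translate .            \ prints 42

\ Compilation state: an immediate word runs TRANSLATE while ANSWER
\ is being compiled, so 42 is compiled into ANSWER as a literal.
: [forty-two] ( -- )  42 num-descriptor translate ; immediate
: answer ( -- n )  [forty-two] ;
answer .                                 \ prints 42
```

The descriptor here is an ordinary data object identified by its address, which matches the "actual object (and the identifier of this object)" wording; a real design would also have to decide how POSTPONE is handled.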

,------------------------------------------
| 2020-09-05 23:06:38 ruv replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-495
`------------------------------------------

1\. The "recognizer" term is used with a different meaning, and its definition is not provided.

In my terminology, a Forth definition `( i*x c-addr u mode -- j*x true | i*x false )` tries to translate the lexeme _( c-addr u )_ according to the _mode_. I like the idea of translators.

2\. Ambiguous conditions

This approach introduces a very big and ugly ambiguous condition: if a program uses a word named "[recognize]" for its own purposes, it may crash. Forth didn't have reserved names. But now some particular names cannot be used by a program.

A similar approach is used in SP-Forth, which has a magic name, "NOTFOUND".

3\. Low reuse factor

What should a program use to recognize a number? How can a user-defined Forth text interpreter be created? How can the system's Forth text interpreter be reused? This approach doesn't make these things simple.

4\. Too limited a range of use cases.

E.g. a recognizer for the `'X` form cannot be defined so that it recognizes X in any currently available form (e.g. the `wordlist::word` form).
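
To make point 2 concrete, here is a minimal sketch in standard Forth of a program that uses the name for its own purposes. The word and its meaning are hypothetical; the only assumption is that the program was written without knowledge of the proposal.

```forth
\ A program-private word that happens to be named [RECOGNIZE].
\ It classifies a character and has nothing to do with the text
\ interpreter.
: [recognize] ( char -- flag )  [char] 0 [char] 9 1+ within ;

char 7 [recognize] .   \ prints -1 (the character is a digit)
```

On a system implementing the proposed 3.X.X, the text interpreter would find this definition and execute it with `( a n mode )` for every name that is neither found in the dictionary nor convertible to a number. The word consumes only one cell, so the string address and length would be left on the stack behind the returned flag.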