,---------------.
| Contributions |
`---------------´
,------------------------------------------
| 2020-09-05 15:09:39 AndrewHaley wrote:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#contribution-159
`------------------------------------------
As I've said more than once, I think the proposal is too complex and
does too much. But I've been challenged to up with an alternative I
think is better.
I'd like it to:
- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and is as non-intrusive as possible.
With regard to that last one, I would like recognizers to be
associated with wordlists, so that when a recognizer is defined it is
in the current wordlist, and when that wordlist is visible so is the
recognizer. Recognizer ordering therefore should follow the search
order and not use a separate ordering and visibility mechanism. When a
recognizer is forgotten, it should simply disappear. This is ideal for
libraries that define recognizers and live in their own wordlist: to
use a library add its wordlist to the search order, and the
recognizers come too.
A rough draft of my proposed change is at
https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt
The new part is section 3.X.X.
An example of such a recognizer for ` quoted characters is
```
: rectype-num ( n mode -- n | ) 2 = if postpone literal then ;
: [recognize] ( a n mode - x...x t )
-rot 3 = if
dup c@ [char] ` = if
dup 2 + c@ [char] ` = if
1+ c@ swap rectype-num true exit
then then
then drop 0 ;
```
For simplicity I haven't included POSTPONE actions or recognizer
queries in this proposal, but they are trivial to add if we ever agree
that they should be.
[And yes, the numbered modes should really be named constants, but
that makes no difference to the idea.]
An implementation of this proposal (for SwiftForth) follows. I suspect
it'd be similar in most Forths.
```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
s" [recognize]" rot hashed {: a n mode link :}
link
begin @rel
dup @ while
dup l>name count s" [recognize]" compare(cs) 0= if
to link
a n mode link link> execute
if ( success?) link link> exit then
link
then
repeat drop
a n mode 0 ;
\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
context #order @ cells over + swap do
i @ (execute-recognizers)
?dup if unloop exit then
cell +loop
drop 0 ;
```
[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS,
but I haven't included that here. ]
,---------.
| Replies |
`---------´
,------------------------------------------
| 2020-09-04 11:53:28 ruv replies:
| proposal - Nestable Recognizer Sequences
| see: https://forth-standard.org/proposals/nestable-recognizer-sequences#reply-489
`------------------------------------------
> `get-rec-sequence ( xt -- xt1 .. xtn n )`
> If xt refers to a recognizer sequence, return the contained recognizers. If xt refers to a deferred word, perform DEFER@ followed by GET-REC-SEQUENCE (i.e., GET-REC-SEQUENCE works through deferred words).
> IF xt refers to neither, return 0.
If recognizer sequences are immutable, a recognizer that is not a sequence can be viewed as a sequence with a single element.
I.e., `get-rec-sequence` can have effect `( xt -- xt 1 )` for such recognizer. Can it be useful?
,------------------------------------------
| 2020-09-04 17:52:36 UlrichHoffmann replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-490
`------------------------------------------
The output of the scanner is often called "token class":
From Wikipedia ([Lexical Analysis](https://en.wikipedia.org/wiki/Lexical_analysis))
> Lexers and parsers are most often used for compilers, but can be used for other computer language tools, such as prettyprinters or linters. Lexing can be divided into two stages: the scanning, which segments the input string into syntactic units called lexemes and categorizes these into **token classes**; and the evaluating, which converts lexemes into processed values.
Also see ["Modern Compiler Design" Second Edition](https://www.amazon.com/Modern-Compiler-Design-Dick-Grune/dp/1461446988), Dick Grune • Kees van Reeuwijk • Henri E. Bal • Ceriel J.H. Jacobs • Koen Langendoen they use the same term.
So it might be reasonable to call `RECTYPE:` just `TOKEN-CLASS`.
,------------------------------------------
| 2020-09-04 20:21:59 UlrichHoffmann replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-491
`------------------------------------------
or maybe `TOKENCLASS` ?
,------------------------------------------
| 2020-09-05 15:23:52 AndrewHaley replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-492
`------------------------------------------
As I've said more than once, I think the proposal is too complex and
does too much. But I've been challenged to up with an alternative I
think is better.
I'd like it to:
- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and is as non-intrusive as possible.
With regard to that last one, I would like recognizers to be
associated with wordlists, so that when a recognizer is defined it is
in the current wordlist, and when that wordlist is visible so is the
recognizer. Recognizer ordering therefore should follow the search
order and not use a separate ordering and visibility mechanism. When a
recognizer is forgotten, it should simply disappear. This is ideal for
libraries that define recognizers and live in their own wordlist: to
use a library add its wordlist to the search order, and the
recognizers come too.
Here's what I suggest as a revised standard. Note that the only change to the test interpreter is Section d.
Section 3.X.X is the new part
```
Text interpretation (see 6.1.1360 EVALUATE and 6.1.2050 QUIT) shall
repeat the following steps until either the parse area is empty or an
ambiguous condition exists:
a) Skip leading spaces and parse a name (see 3.4.1);
b) Search the dictionary name space (see 3.4.2). If a definition name
matching the string is found:
1) if interpreting, perform the interpretation semantics of the
definition (see 3.4.3.2), and continue at a).
2) if compiling, perform the compilation semantics of the
definition (see 3.4.3.3), and continue at a).
c) If a definition name matching the string is not found, attempt to
convert the string to a number (see 3.4.1.3). If successful:
1) if interpreting, place the number on the data stack, and
continue at a);
2) if compiling, compile code that when executed will place the
number on the stack (see 6.1.1780 LITERAL), and continue at a);
d) Execute any recognizers in the dictionary name space (see 3.X.X) If
any of them succeeds, continue at a).
e) If unsuccessful, an ambiguous condition exists (see 3.4.4).
3.X.X User-defined recognizers
Do the following until no more recognizers are found or one of them
succeeds:
1) Search the dictionary name space (see 3.4.2) for a definition
whose name is "[RECOGNIZE]". This defintion is called the found
recognizer. If none is found, the search is terminated.
2) If found, perform the interpretation sematics of the found
recognizer, passing it the string which has been parsed and a mode
which is 1 if interpreting, 2 if compiling:
[RECOGNIZE] ( a n mode -- flag)
If the flag returned is true, terminate the search.
[ Comment: The found recognizer performs some kind of action,
perhaps compiling something into the dictionary, then returns true
or false. ]
If the flag returned is false, continue at 1), but searching the
dictionary name space from the point of the definition which
precedes the found recognizer.
[ Discussion:
This isn't as inefficient as might seem at first glance because the
result of searching the dictionary for words called [RECOGNIZE] can
be cached. The dictionary only needs to be re-scanned when the
search order is changed or a new definition of [RECOGNIZE] is added
to the dictionary.
Obviously, this removes any need for recognizer stacks. If a
recognizer is defined by a library that has its own wordlist, the
recognizer becomes visible to the interpreter (and therefore
active) when the library's wordlist is added to the search
order. This is, I believe, in most cases exactly what will be
desired: the visibility of library-defined recognizers changes with
the visibility of the words in the library.
However, if it's necessary to define a recognizer which can be
added or removed from view independently of any other definitions,
it can be defined in a wordlist which contains ony one word, the
recognizer. This wordlist can be added or removed from the search
order as required.
If it's necessary to have POSTPONE actions, another mode can be
added, and the specification of POSTPONE amended to search for
user-defined POSTPONE recognizers.
]
```
https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt
The new part is section 3.X.X.
An example of such a recognizer for ` quoted characters is
```
: rectype-num ( n mode -- n | ) 2 = if postpone literal then ;
: [recognize] ( a n mode - x...x t )
-rot 3 = if
dup c@ [char] ` = if
dup 2 + c@ [char] ` = if
1+ c@ swap rectype-num true exit
then then
then drop 0 ;
```
For simplicity I haven't included POSTPONE actions or recognizer
queries in this proposal, but they are trivial to add if we ever agree
that they should be.
[And yes, the numbered modes should really be named constants, but
that makes no difference to the idea.]
An implementation of this proposal (for SwiftForth) follows. I suspect
it'd be similar in most Forths.
```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
s" [recognize]" rot hashed {: a n mode link :}
link
begin @rel
dup @ while
dup l>name count s" [recognize]" compare(cs) 0= if
to link
a n mode link link> execute
if ( success?) link link> exit then
link
then
repeat drop
a n mode 0 ;
\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
context #order @ cells over + swap do
i @ (execute-recognizers)
?dup if unloop exit then
cell +loop
drop 0 ;
```
[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS,
but I haven't included that here. ]
,------------------------------------------
| 2020-09-05 15:29:50 AndrewHaley replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-493
`------------------------------------------
As I've said more than once, I think the proposal is too complex and
does too much. But I've been challenged to up with an alternative I
think is better.
I'd like it to:
- Allow user-defined literals, dot parsers, etc.
- Make small changes to the standard.
- Allow simple implementations.
- Have a small API surface.
- Be easy to use.
- Avoid adding any ambiguous conditions.
- Work well with the rest of the system and is as non-intrusive as possible.
With regard to that last one, I would like recognizers to be
associated with wordlists, so that when a recognizer is defined it is
in the current wordlist, and when that wordlist is visible so is the
recognizer. Recognizer ordering therefore should follow the search
order and not use a separate ordering and visibility mechanism. When a
recognizer is forgotten, it should simply disappear. This is ideal for
libraries that define recognizers and live in their own wordlist: to
use a library add its wordlist to the search order, and the
recognizers come too.
Here's what I suggest as a revised standard. Note that the only change to the test interpreter is Section d.
Section 3.X.X is the new part
```
Text interpretation (see 6.1.1360 EVALUATE and 6.1.2050 QUIT) shall
repeat the following steps until either the parse area is empty or an
ambiguous condition exists:
a) Skip leading spaces and parse a name (see 3.4.1);
b) Search the dictionary name space (see 3.4.2). If a definition name
matching the string is found:
1) if interpreting, perform the interpretation semantics of the
definition (see 3.4.3.2), and continue at a).
2) if compiling, perform the compilation semantics of the
definition (see 3.4.3.3), and continue at a).
c) If a definition name matching the string is not found, attempt to
convert the string to a number (see 3.4.1.3). If successful:
1) if interpreting, place the number on the data stack, and
continue at a);
2) if compiling, compile code that when executed will place the
number on the stack (see 6.1.1780 LITERAL), and continue at a);
d) Execute any recognizers in the dictionary name space (see 3.X.X) If
any of them succeeds, continue at a).
e) If unsuccessful, an ambiguous condition exists (see 3.4.4).
3.X.X User-defined recognizers
Do the following until no more recognizers are found or one of them
succeeds:
1) Search the dictionary name space (see 3.4.2) for a definition
whose name is "[RECOGNIZE]". This defintion is called the found
recognizer. If none is found, the search is terminated.
2) If found, perform the interpretation sematics of the found
recognizer, passing it the string which has been parsed and a mode
which is 1 if interpreting, 2 if compiling:
[RECOGNIZE] ( a n mode -- flag)
If the flag returned is true, terminate the search.
If the flag returned is false, continue at 1), but searching the
dictionary name space from the point of the definition which
precedes the found recognizer.
```
https://sourceforge.net/p/concurrentforth/code/ci/default/tree/recognizers/forth-interp.txt
The new part is section 3.X.X.
An example of such a recognizer for ` quoted characters is
```
: rectype-num ( n mode -- n | ) 2 = if postpone literal then ;
: [recognize] ( a n mode - x...x t )
-rot 3 = if
dup c@ [char] ` = if
dup 2 + c@ [char] ` = if
1+ c@ swap rectype-num true exit
then then
then drop 0 ;
```
For simplicity I haven't included POSTPONE actions or recognizer
queries in this proposal, but they are trivial to add if we ever agree
that they should be.
[And yes, the numbered modes should really be named constants, but
that makes no difference to the idea.]
An implementation of this proposal (for SwiftForth) follows. I suspect
it'd be similar in most Forths.
```
\ Execute all of the recognizers defined in wordlist WID until one
\ succeeds.
: (execute-recognizers) ( a n mode wid -- x...x xt | a n mode 0 )
s" [recognize]" rot hashed {: a n mode link :}
link
begin @rel
dup @ while
dup l>name count s" [recognize]" compare(cs) 0= if
to link
a n mode link link> execute
if ( success?) link link> exit then
link
then
repeat drop
a n mode 0 ;
\ Execute all of the recognizers in the search order until one
\ succeeds.
: recognizers ( a n mode -- x...x xt | a n 0 )
context #order @ cells over + swap do
i @ (execute-recognizers)
?dup if unloop exit then
cell +loop
drop 0 ;
```
[ I also had to patch the SwiftForth interpreter to call RECOGNIZERS,
but I haven't included that here. ]
,------------------------------------------
| 2020-09-05 21:40:18 ruv replies:
| proposal - Recognizer
| see: https://forth-standard.org/proposals/recognizer#reply-494
`------------------------------------------
TL;DR: **token class** from lexers is a wrong association.
### The "token" term in lexical analysis
1\. in computer science, the **token** term is used in [various meanings](https://en.wikipedia.org/wiki/Token#Computing).
2\. In lexical analysis (and compilers theory), **token** is actually a shorthand for **lexical token**. (It's the same as in the Forth topic, the "definition" term is a shorthand for "Forth definition").
3\. A **lexical token** is a tuple of a lexeme and the _kind_ of this lexeme
(it's my rewording [from Wikipedia](https://en.wikipedia.org/wiki/Lexical_token#Token), and also: "The lexeme's type combined with its value is what properly constitutes a token").
A lexical token usually doesn't bear any semantic information, it bears only lexical kind — it it an identifier, a number, string literal, or a particular key word.
Lexical tokens are not used in Forth since Forth doesn't distinguish lexemes by the different kinds.
4\. In Forth, the **token** term is not used in the sense of **lexical token**. As we can conclude from the "execution token" and "name token" terms, in the Forth standard a _token_ is just a kind of identifier, symbol, or something that represents something another (in general case; but numbers can represent themselves).
### Lexical token class
> The output of the scanner is often called "token class"
It's not quite correct. Output of the scanner (i.e. a lexer) is a sequence of **lexical tokens**.
Concerning **"token class"** — it is a shorthand for **"lexical token class"**.
And actually, token class, token type, token category, token name (in the famous Dragon book) — all of them refers to the same thing, that I have called *lexeme kind* above.
### Qualification of the tokens in Forth
We need to name the entity that qualifies a token.
We can call it "token class" (but without referring to "token class" in lexers). Initially I [called](https://groups.google.com/forum/message/raw?msg=comp.lang.forth/8orqw1vjTOY/wMskqvDWCAAJ) this entity "token type". But then I realized that "token descriptor" is far better.
This entity can be created in run time, and it describes how to translate a token, — it is not an abstraction, it is an actual object (and the identifier of this object). So "create descriptor" shorthand sounds better than "create class" or "create type" (that look as more abstract things).
Only when we find names for the terms (in the human language, in the language of the standard ), we can find good names for words.
,------------------------------------------
| 2020-09-05 23:06:38 ruv replies:
| proposal - An alternative to the RECOGNIZER proposal
| see: https://forth-standard.org/proposals/an-alternative-to-the-recognizer-proposal#reply-495
`------------------------------------------
1\. "recognizer" term is used in another meaning and its definition is not provided.
By my terminology, a Forth definition `( i*x c-addr u mode -- j*x true | i*x false )` tries to translate the lexeme _( c-addr u )_ according to the _mode_.
I like the idea of translators.
2\. Ambiguous conditions
This approach introduces a very big and ugly ambiguous condition: if a program uses a word with the "[recognize]" name for its own purposes, it may crash.
Forth didn't have reserved names. But now some particular names cannot be used by a program.
A similar approach is used in SP-Forth. There is a magic name "NOTFOUND".
3\. Low reusing factor
What a program should use to recognize a number? How to create a user-defined Forth text interpreter? How to reuse the system's Forth text interpreter?
This approach doesn't make these things simple.
4\. Too limited area of use cases.
E.g. a recognizer for `'X` form cannot be defined to recognize X in any currently available form (e.g. `wordlist::word` form).