Proposal: [160] minimalistic core API for recognizers

Informal

This page is dedicated to discussing this specific proposal

ContributeContributions

BerndPaysanavatar of BerndPaysan [160] minimalistic core API for recognizersProposal2020-09-06 09:40:07

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned data type id is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- rectype )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;

then be told that this is not the right way, even though it looks like it is working.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes a string and returns a rectype+additional data on the stack (no additional data for RECTYPE-NULL):

REC-SOMETYPE ( addr len -- i*x RECTYPE-SOMETYPE | RECTYPE-NULL )

XY.3 Additional usage requirements

XY.3.1 Data type id

rectype: subtype of xt, and executes with the following stack effect:

RECTYPE-SOMETYPE ( i*x state -- j*x )

state is:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i?x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x rectype | RECTYPE-NULL ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

RECTYPE-NULL ( state -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x rectype / rectype-null )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer state @ swap execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: rectype-null ( state -- ) -13 throw ;
: rectype-nt ( nt state -- )
  case
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;
: rectype-num ( n state -- )
  case
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt rectype-nt / rectype-null )
  forth-wordlist find-name-in dup IF  ['] rectype-nt  ELSE  drop ['] rectype-null  THEN ;
: rec-num ( addr u -- n rectype-num / rectype-null )
  0. 2swap >number 0= IF  2drop ['] rectype-num  ELSE  2drop drop ['] rectype-null  THEN ;

: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
  2>r
  2r@ rec-nt dup ['] rectype-null <> IF  EXIT  THEN  drop
  2r@ rec-num dup ['] rectype-null <> IF  EXIT  THEN  drop
  2r> 2drop ['] rectype-null ;

' minimal-recognizer is forth-recognizer

Testing

JennyBrienavatar of JennyBrien

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

I don't think so. It doesn't make much difference in application, because you (almost always?) need to consume the rec-type immediately to use whatever else might be on the stack(s). It you already know what you've got, but, for example, can't remember the words to POSTPONE it you could with an active RECTYPE do something like:

    -2 RECTYPE-X

But mostly you'll have the RECTYPE sitting passively on the stack as a return for a recognizer, and I don't see a great deal of difference between:

    : postponed  -2 swap execute ; 

and

    : postponed  @ execute ;

Passive rectypes are easier to use (no need to remember to when to tick them) and easier to code (no need to check for a bogus mode on the stack)

Compare:

: rectype-nt ( nt state -- )
  case
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;

with:

 : rectype: create , , , ;
 :noname name>interpret execute ;
 ;noname name>compile execute ;
 ;noname name>compile swap lit, compile, ;  rectype: rectype-nt

BerndPaysanavatar of BerndPaysan

One possible thing is to have an automatic postpone for literals.

: rectype-lit: ( compile-xt "name" -- )
  create ,
  does> @ swap
  case
      0  of  drop  endof
      -1 of  execute  endof
      -2 of  dup >r execute r> compile,  endof
  endcase ;

' lit, rectype-lit: rectype-num
' 2lit, rectype-lit: rectype-dnum
' flit, rectype-lit: rectype-float
' slit, rectype-lit: rectype-string

This works with this method, but not with the previous way.

BerndPaysanavatar of BerndPaysan

Furthermore, obviously anyone sane who doesn't want to be 100% minimal would instantly define

: rectype: ( xt-int xt-comp xt-post "name" -- )
  create , , , does> swap 2 + cells + @ execute ;

and then define generic rectypes just like in Matthias Trute's version with rectype:

JennyBrienavatar of JennyBrien

: rectype-lit: ( xt -- )  ['] noop swap dup >r :noname r@ compile, r> postpone literal postpone compile, postpone ; rectype: ;

not so straightforward, but possible.

ruvavatar of ruv

Previous works

In general, I like the approach of active "rectype", i.e. when you can execute it to translate a token — so a "rectype" is a token translator: ( i*x token -- j*x ). I described this approach in comp.lang.forth in 2018 (news:pngvcc$pta$1@gioia.aioe.org).

Bernd should also remember comparison of version D with Resolvers API, where I specified this approach, and even several POCs.

and then define generic rectypes just like in Matthias Trute's version with rectype:

I also shown, just for illustration, a hybrid variant, when "rectype" can be executed and be an argument of the accessors (and it also is compatible with version D, i.e. it is a "passive rectype" as JennyBrien mentioned above).

But the accessors from version D exclude some implementation approaches. Actually these accessors are useless when the higher methods are provided. Getting an xt and then executing this xt has an excessive step without any profit in the most cases. Let's provide the corresponding methods instead of the accessors.

This works with this method, but not with the previous way.

Don't sure what you refer to, but "automatic postpone for literals" can be implemented in version D too.

: create-rectype-for-literal ( xt-compiler "name" -- )
  ['] noop swap dup rectype:
;

Token translator

Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves

RECTYPE-SOMETYPE ( i*x state -- j*x )

By convention, the name for such a word should start from an English verb.

Concerning passing the state. In my Resolvers API, the state is passed indirectly, i.e. not via the stack. It makes more easy the combinations of translators.

E.g.:

: tt-3lit ( 3*x -- 3*x | ) >r tt-2lit  r> tt-lit ;

VS

: tt-3lit-s ( 3*x state -- 3*x | ) dup >r swap >r tt-2lit-s  r> r>  tt-lit-s ;

Passing the state is cumbersome. Also, take into account that it's usually already kept in a variable in any way. Why do you need to pass it via the stack again and again? What is a rationale for passing it directly?

Terminology

Please stop using the confusing terminology such as "data type id" (in "The core principle is still that the recognizer is not aware of state, and the returned data type id is"). This terminology is not compatible with the language of the standard. I suggested the proper terminology before and have published on forth-standard.org now the proposal, let's use it (and let's make it better, if any), or let's accurately define another terminology. The fact is that all the proposals about recognizers can share the same terminology.

Another example is "recognizer types" term. If a recognizer is a Forth definition having particular behavior, then "recognizer type" is "type of a recognizer", that is a type of a Forth definition, something like function type. But actually you mean a "token descriptor", that is "descriptor of a token", that tells something about the corresponding token, and tells nothing about the recognizers (as Forth definitions).

ruvavatar of ruv

Advantages

A huge advantage of this approach (but when the state is passed indirectly) is that the most user-defined token translators can be created far easily than the corresponding descriptors ("rectypes"). You don't need to cope with three actions, and you don't need to cope with the state at all, since any token translator can be created via other already defined translators!

BerndPaysanavatar of BerndPaysan

Yes, I proposed that kind of solution years ago. In effect, both ways have the same expressive power, but one does it by creation of noname words, the other by normal code. Acceptance may differ.

ruvavatar of ruv

@JennyBrien wrote

Compare: [...] with:

 : rectype: create , , , ;
 :noname name>interpret execute ;
 :noname name>compile execute ;
 :noname name>compile swap lit, compile, ;  rectype: rectype-nt

(sic: the full postpone action).

This comparison is incorrect since in the proposed API rectype: (that generates a token translator) can be defined as the following:

: rectype: ( xt-executer xt-compiler xt-postponer "name" -- )
  >r >r >r : ]]
    0  of  [[ r> xt, ]] endof
    -1 of  [[ r> xt, ]] endof
    -2 of  [[ r> xt, ]] endof
    -22 throw
  endcase [[ postpone ;
;

And you can use the same your code to define your rectype-nt or anything else.

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned data type id is. If you have for some reason legacy code that looks like

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- rectype )
: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;

then be told that this is not the right way, even though it looks like it is working.

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- rectype )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes a string and returns a rectype+additional data on the stack (no additional data for RECTYPE-NULL):

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x RECTYPE-SOMETYPE | RECTYPE-NULL )
REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3 Additional usage requirements

XY.3.1 Data type id

XY.3.1 Translator

rectype: subtype of xt, and executes with the following stack effect:

translator: subtype of xt, and executes with the following stack effect:

RECTYPE-SOMETYPE ( i*x state -- j*x )
SOME-TRANSLATOR ( i*x -- j*x )

state is:

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i?x is the additional information provided by the recognizer.

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6 Glossary

XY.6.1 Recognizer Words

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x rectype | RECTYPE-NULL ) RECOGNIZER

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

RECTYPE-NULL ( state -- ) RECOGNIZER

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x rectype / rectype-null )
Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer state @ swap execute
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: rectype-null ( state -- ) -13 throw ;
: rectype-nt ( nt state -- )
  case
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;
: rectype-num ( n state -- )
  case
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;
: rec-nt ( addr u -- nt rectype-nt / rectype-null )
  forth-wordlist find-name-in dup IF  ['] rectype-nt  ELSE  drop ['] rectype-null  THEN ;
: rec-num ( addr u -- n rectype-num / rectype-null )
  0. 2swap >number 0= IF  2drop ['] rectype-num  ELSE  2drop drop ['] rectype-null  THEN ;
: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
  2>r
  2r@ rec-nt dup ['] rectype-null <> IF  EXIT  THEN  drop
  2r@ rec-num dup ['] rectype-null <> IF  EXIT  THEN  drop
  2r> 2drop ['] rectype-null ;
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;
' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Testing

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- rectype )
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Testing

BerndPaysanavatar of BerndPaysan

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult. Instead of

: postpone ( "name" -- ) parse-name forth-recognizer -2 swap execute ; immediate

it is more convoluted

: postpone ( "name" -- )
  parse-name forth-recognizer
  state @ >r -2 state !  catch  r> state !  throw ; immediate

How to detect [[ at the end of a postpone sequence is also not so trivial.

ruvavatar of ruv

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult.

It's OK. Actually, we distribute complexity among various parts. When we make one thing less complex, we make another thing more complex. But due to the different numbers of occurrences of various things (in systems, libraries, programs) the summary complexity can be less or more.

This approach also makes some things more complex, but the summary complexity decreases, I believe.

Concerning POSTPONE. I think, some useful parts should be factored out.

Also, we don't need to catch exception — usually, it's a stop error, and the state is ambiguous in any case. QUIT resets all the internal states. Concerning programs — we need a standard way to reset the internal states of the Forth text interpreter, regardless of Recognizers proposal.

In my "lexeme resolvers" implementation I use conception of postponing level that can be 0, 1, 2, and introduce the words to increment and to decrement this level. So, POSTPONE is defined as the following:

: postpone  ( " name" --      )   parse-name inc-state translate-lexeme dec-state ( flag ) ?nf ; immediate

Where translate-lexeme is defined as the following:

: perceive-lexeme ( c-addr u -- k*x xt-tt | c-addr u 0 )
  perceptor dup if execute then
;
: translate-lexeme ( i*x c-addr u -- j*x true | c-addr u 0 )
  perceive-lexeme dup if execute true then
;

(Note that in contrast of this proposal, resolvers return ( c-addr u 0 ) on fail)

How to detect [[ at the end of a postpone sequence is also not so trivial.

An appropriate approach is that the word ]] is a parsing word.

: ]] ( -- )
  inc-state begin
    next-lexeme 2dup s" [[" equals 0= while
    translate-lexeme ?nf
  repeat 2drop dec-state
; immediate

So we don't have any problem to detect [[ at the end.

An advantage of the postponing level conception is that the following code works as expected:

: foo [  ]] 123 . [[  ]  ;   foo \ prints 123

In the message news:rdcur5$ga4$1@dont-email.me (the full message: news:rdcn35$sd2$1@dont-email.me) I showed another approach, when postponing action is not required at all (i.e., -2 state in this proposal).

ruvavatar of ruv

translator: subtype of xt, and executes with the following stack effect:
SOME-TRANSLATOR ( i*x -- j*x )

It's correct in the general case, but it makes a little sense, since any definition meets this stack effect.

So I think we should distinguish the parameters of a translator itself from the effect of translating of the code that is passed to the translator. Possible variants:

\ We can define 'token' data type
TRANSLATE-SOMETOKEN ( i*x token -- j*x )

\ Some hybrid variant
TRANSLATE-SOMETOKEN  ( i*x token{k*x} -- j*x )

\ Only low level data types
TRANSLATE-SOMETOKEN  ( i*x k*x -- j*x ) 

(NB: I use a conventional naming {verb}-{noun} for such a words).

It should be also noted that these x may be distributed in all the stacks: the data stack, the floating-pint stack, the control-flow stack (except token k*x, that cannot be in the contrlo-fow stack).

BerndPaysanavatar of BerndPaysan

Indeed, TRANSLATE-SOMETHING sounds better than SOMETHING-TRANSLATOR.

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.

ruvavatar of ruv

"FORTH-RECOGNIZER" name

I thought about FORTH-RECOGNIZER name. It makes a strong impression that this word is similar to FORTH-WORDLIST ( -- wid ). The problem is that it isn't.

FORTH-WORDLIST is a constant (it always return the same value), that indicates a one the same word list among all the word lists. This word list can be included into the search order, and it can be absent in the search order.

By analogy, FORTH-RECOGNIZER should be a constant that indicates a one the same recognizer among all the recognizers. This recognizer can be included into the recognizer that is used by the Forth text interpreter, and it can be absent in the recognizer that is used by the Forth text interpreter. (In accordance with the conception that a sequence of recognizers is also a recognizer).

All these should be right to hold consistent naming. But actually it is wrong. It means, that this name breaks consistency and isn't inappropriate for the proposed word.

FORTH-RECOGNIZER ( -- xt ) can be a word that returns xt of the system's recognizer that is used by the Forth text interpreter by default (i.e. initially).

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.

Also, it makes a strong impression that it returns a recognizer. But it's wrong. Also, it's result is analyzed much more often than it's followed by EXECUTE.

Basic methods

By no means, we need

  1. a method that tells the Forth text interpreter to use a given recognizer.
  2. a method that returns the recognizer that is currently used by the Forth text interpreter,
  3. a method that performs the recognizer that is currently used by the Forth text interpreter

A one differed word (a vector) X can solve it:

  1. set: IS X
  2. get: ACTION-OF X
  3. perform: X

But I insist that this approach limits implementations too much. A Forth system can want to perform its internal actions on switching the recognizer that is used by the Forth text interpreter. But it cannot do it, if this recognizer is switched via IS X method. For that, the different getter and setter words are usually provided in the Standard (except very ancient BASE and >IN — due to back compatibility). Yes, perhaps Gforth can attach any additional internal actions for IS X phrase. But we shouldn't complicate all Forth system implementations.

A possible implementation via deferred word and distinct getter and setter words:

defer perceive ( c-addr u -- k*x tt )
: perceptor ( -- xt ) action-of perceive ;
: set-perceptor ( xt -- ) is perceive ;

Perhaps, the more specific names are better (?):

defer perceive-lexeme ( c-addr u -- k*x tt )
: lexeme-perceptor ( -- xt ) action-of perceive-lexeme ;
: set-lexeme-perceptor ( xt -- ) is perceive-lexeme ;

ruvavatar of ruv

Correction: pleas read "By anyway, we need" instead of "By no means, we need".

BerndPaysanavatar of BerndPaysan

´DEFERis a core word now, so usingDEFER` for such a thing is ok. We don't need a special getter and setter for everything.

The implication that FORTH-RECOGNIZER returns a recognizer (and does not, it executes one) is a valid point. A better name is needed. At the moment it is a VALUE and does return a recognizer. Now, it is a deferred word, and does recognize strings. We should keep it with Anton's unification: a sequence of recognizers can be combined to one recognizer. Just because it's now recognizing more different things, it's still a recognizer. No need to find another synonym. Takes string, returns data+translator token ? is a recognizer.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.

ruvavatar of ruv

DEFER is a core word now, so using DEFER for such a thing is ok.

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

Back to my first argument, what do you suggest if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter?

You can ask, do I have an example of such requirement. Yes, I do. I want to provide a method to undo such switching in my system. It's similar to effect of the "PREVIOUS" word for the search order. Perhaps you can suggest some solution with the deferred word?

Anton's unification: a sequence of recognizers can be combined to one recognizer.

Yes. I too said that any sequence of recognizers seq-x (from API v4) can be represented as a single recognizer : recognize-x seq-x recognize ;. So, sequences are excessive in the basic API, — a Forth system doesn't need to know is it a sequence or not.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.

It's better. But it recognizes not valid FORTH, but anything what the Forth text interpreter can currently recognize (and only that).

Conceptually, this word isn't just a recognizer. There is a single special system's slot for a recognizer that is used by the Forth text interpreter. We can put any recognizer into this slot. We can also perform the recognizer that is placed into this slot. So this word performs the recognizer from this slot. I incline to call this slot "perceptor". And after that the word that performs the recognizer from this slot becomes "perceive".

All recognizer names have the pattern RECOGNIZE-*. The idea is to not put this special word on a par with all other recognizers. For that, its better to find a name that is distinct from the RECOGNIZE-SOMETHING pattern. What do you think?

ruvavatar of ruv

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

This argument is that a Forth system can be implemented as a minimal kernel and additional libraries. And DEFER, IS, ACTION-OF can be available via a library. But when we put a deferred word into this API, we force a system's author to put DEFER, IS, ACTION-OF into the kernel too. But actually they isn't required in the kernel. It would be too restrictive limitation on the implementations.

ruvavatar of ruv

Locate

locate cannot work for lexemes that can be recognized (translated) according to this proposal.

ruvavatar of ruv

The last comment was intend for the proposal of AndrewHaley, and it was mistakenly placed here.

BerndPaysanavatar of BerndPaysan

The recognizer will be an option, as well. At the moment, FORTH-RECOGNIZER is proposed to be a value. That's also a CORE EXT word (as is TO).

A minimalistic system that wants to implement recognizers needs FORTH-RECOGNIZER to be a deferred word. I.e. it needs code for DODEFER. It can load the rest of the deferred word stuff later as extension.

ruvavatar of ruv

Certainly, recognizers is an option. I didn't mean that some required part requires an optional part. I mean that one optional part requires another complex optional part without any good and fair ground.

Yes, a minimalistic system that wants to provide a deferred word needs only code for DODEFER. But it still makes bootstrapping of this system more complex. Hence, when we put a deferred word into API, we make things more complex for some implementations. But we don't even have a rationale for that.

Also, with deferred word we still don't have a solution if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.

BerndPaysanavatar of BerndPaysan

CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e.

forth-recognizer @ execute execute

Clumsy interfaces can not be changed if you have better things at hand. You can probably wrap around the clumsy interface, e.g.

Defer recognize-forth
addr recognize-forth Constant forth-recognizer

if you can use ADDR to access the deferred word's xt storage location. But then you have another interface, less clumsy, and only available when you have DEFER+ADDR (and ADDR is not even part of the standard).

A minimalistic API, as what I am looking for here is one where you don't have to document much. The less uniform an API is, the more you have to document. The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect. And combinations of recognizers have the same effect. And the system's recognizer is just another one, which you can swap in and out. And you can define a REC-SEQUENCE, where you can manipulate the sequence, and put that into the system's recognizer.

This uniformity is broken when you don't use a deferred word for the system's recognizer — you can't just call that one as you can call the others. You need @ EXECUTE. This is clumsy.

ruvavatar of ruv

CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e. forth-recognizer @ execute execute

I don't suggest to use a variable in the interface, — it's even worse than a defer. When a variable is used to change something, this changing cannot be effectively detected. But the requirement is: an ability for a system to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.

For that I would prefer to have the separate words in the API: a setter, a getter and a "performer" (a word that performs the recognizer that is currently used by the Forth text interpreter).

What are your objections to have several separate words in the minimalistic API?

The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect.

I strongly support this approach (and I myself suggested this approach too, with slightly different stack effects).

This uniformity is broken when you don't use a deferred word for the system's recognizer

It seems, the set of words like the following (the names may vary):

perceive ( c-addr u -- k*x tt )
set-perceptor ( xt -- )
perceptor ( -- xt )

doesn't brake the mentioned uniformity. Please, clarify.

BerndPaysanavatar of BerndPaysan

Using special setters and getters means you have another (special purpose) DEFER mechanism here. Of course you can implement that with

variable current-perceptor
: perceive ( addr u -- i*j token ) current-perceptor @ execute ;
: set-perceptor ( xt -- ) current-perceptor ! ;
: perceptor ( -- xt ) current-perceptor @ ;

which is probably a bit less implementation effort than DEFER, IS, and ACTION-OF. Or really?

State-Smart:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body state @ if  ]] literal ! [[  else  !  then ; immediate
: action-of  ' >body state @ if  ]] literal @ [[  else  @  then ; immediate

or with NDCS:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body ! ; ndcs: ' >body ]] literal ! [[ ;
: action-of  ' >body @ ;  ndcs: ' >body ]] literal @ [[ ;

DEFER is really a lightweight way to define words that can be changed.

These three lines of code are doing more than the three lines of code you need in addition when you have your special-purpose setter and getter, but they are still one-liners.

Forthers like to reinvent the wheel. But don't overdo this.

ruvavatar of ruv

Using special setters and getters means you have another (special purpose) DEFER mechanism here.

Not necessary. It's up to an author/implementer. It can be just wrappers over standard DEFER, as I shown earlier. So it doesn't mean reinventing the wheel. The implementation details are just hidden.

So the arguments concerning implementation of DEFER mechanism say nothing against three separate words in the minimalistic API.

BTW, having translators for the basic data types, the words is and action-of can be even shorter:

: is  ' >body tt-lit ['] ! tt-xt ; immediate
: action-of  ' >body tt-lit ['] @ tt-xt ; immediate

Well, in any case I would agree that the arguments concerning complexity are more or less weak.

A strong argument (that wasn't yet commented) is about additional actions that a system needs to perform in the setter. What do you thing in this regard?

ruvavatar of ruv

One more strong argument against DEFER word in the API, and pro the different getter and setter is following.

Having DEFER in the API, we cannot define this API over another API at all. But having the different getter and setter (and "executer") — it's possible to defined this API over some other APIs.

Example: news:rn1csa$b02$1@dont-email.me

BerndPaysanavatar of BerndPaysan

Gforth's new header structure allows to overload TO, IS (which are essentially the same) and DEFER@, so we can use the DEFER API to access similar changeable execution patterns implemented differently. So for us, it makes sense to use these access words, regardless how it is implemented.

Other systems may not have this capability, though the way the standard now extends TO for FVALUE and others, you need to have one way or the other to deal with that. Same, when you have an UDEFER in your system for user-specific deferred words.

For me, it is needless clutter of the dictionary and the mental space of the programmer to add setters and getters for things where you already have a generic one. But I see the point that not every system can do this.

ruvavatar of ruv

needless clutter of the dictionary and the mental space of the programmer

I used an approach when a defined word creates two words — a getter and a setter. It's something like after the phrase create-prop x the words x and set-x are created. I didn't noticed any mental space clutter in this regard. Sometimes I redefined set-x to add additional checks or actions.

Concerning dictionary space — I don't see any problem.

But I see the point that not every system can do this.

True. And even if a system can do this, it's done in some system specific way only.

So, due to the combination of all reasons, it's better to have distinct ordinary words in the standard API.

StephenPelcavatar of StephenPelc

If people are interested, I can arrange a virtual meeting for recognisers. They have been workshopped at various Forth Standards meetings but little of substance has emerged so far. I would suggest that such a meeting concentrate on finding what we can agree on.

Note that Forth-200x meetings are public, and the use of real names is strongly encouraged.

ruvavatar of ruv

If people are interested, I can arrange a virtual meeting for recognisers. ... concentrate on finding what we can agree on.

I like this idea.

If people are interested, I will prepare before the meeting a proof of concept — an implementation of Recognizer API v4, Nestable Recognizer Sequences, or some other over this API.

Perhaps, somebody could share his list of questions before the meeting. My list at GitHub.

StefanKavatar of StefanK

A small remark to the POSTPONE test.

We can factor postpone in two parts with state-execute similiar to base-execute:

   : state-execute ( xt s -- )  state@ >r state ! catch r> state ! throw ;
   : POSTPONE ( "name" -- ) parse-name forth-recognizer -2 state-execute ; immediate

That's not very difficult anymore.

StefanKavatar of StefanK

IMHO the idea to use a deferred forth-recognize is good and more flexible than a stack of recognizers. But the translator xt makes postpone more difficult. But we can factor postpone into two parts. One that restores the stack contents at runtime similar to lit,, and one that does the compilation. If we use rectype, similar to the proposal of recognizers from 2018, but with lit, as third method, we get an easy postpone and '. Here, we can reuse the compile method directly by the second factor of postpone.

variable state

: translate>interpret @ ;
: translate>compile   cell+ @ ;
: translate>lit,      cell+ cell+ @ ;
\ Well, its translate>*lit, in fact; i.e. regenerate ( i*x ) at runtime.

Defer forth-recognizer ( addr u -- i*x translator / notfound )
Defer perform ( i*x translator -- j*x )

: perform>interpret translate>interpret execute ;
: perform>compile   translate>compile   execute ;

: on  -1 swap ! ;
: off  0 swap ! ;

: [ ['] perform>interpret is perform state off ; IMMEDIATE
: ] ['] perform>compile   is perform state on ;

\ alternativly:
\ :noname is perform state ; dup
\ : [ ['] perform>interpret [ compile, ]  off ; IMMEDIATE
\ : ] ['] perform>compile   [ compile, ]  on ;
\ another alternative:
\ : perform state @ IF translator>compile ELSE translator>interpret THEN execute ;

' [ execute \ initialize state and perform
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer perform
  REPEAT ;

: lit,  ( n -- )  lit lit , ,  ; \ or postpone literal
: throw-13 -13 throw ;

: translator ( xt-*lit, xt-compile  xt-interpret "name" -- )
    create , , , ;

' throw-13 dup dup translator notfound

' lit,
:noname ( nt -- xt-execute | xt-compile, ) dup >cfa swap immediate? IF execute ELSE compile, THEN ;
:noname ( i*x nt -- j*x ) >cfa execute ;
translator translate-nt

' lit,
' lit,
:name ; \ noop
translator translate-const-cell

: rec-nt ( addr u -- nt translate-nt | notfound )
  forth-wordlist find-name-in dup IF  translate-nt  ELSE  drop notfound  THEN ;
: rec-num ( addr u -- n translate-const-cell | notfound )
  0. 2swap >number 0= IF  2drop translate-const-cell  ELSE  2drop drop notfound  THEN ;

: minimal-recognizer ( addr u -- nt nt-translator | n num-translator | notfound )
  2>r 2r@ rec-nt dup notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

\ simple postpone
: postpone ( "name" -- )
    parse'n'recognize
    dup translator>compile >r translator>lit, execute r> compile,
; IMMEDIATE

\ postpone optimized for immedate words
: postpone ( "name" -- ) \ optimized for immediate words
    parse'n'recognize \ ( i*x translator )
    dup translate-nt = IF ( nt translator )
        over immediate? IF drop >cfa compile, exit THEN
    THEN
    dup translator>compile >r translator>lit, execute r> compile, ;
; IMMEDIATE

: ' ( "name" -- xt ) parse'n'recognize translate-nt <> IF throw-13 THEN >cfa ; IMMEDIATE

ruvavatar of ruv

@StefanK, thank you for your participation. But it looks like you have missed too many arguments discussed above.

For example, a deferred word forth-recognizer has a confusing name, and it cannot be acceptable in the API, since it's difficult for the Forth system to detect when its value is changed (NB: it isn't an argument in favor of "stack of recognizers").


: translator ( xt-*lit, xt-compile  xt-interpret "name" -- )
    create , , , ;
' lit,
:noname ( nt -- xt-execute | xt-compile, ) dup >cfa swap immediate? IF execute ELSE compile, THEN ;
:noname ( i*x nt -- j*x ) >cfa execute ;
translator translate-nt

Also, could you please stick to a consistent and clear terminology?

In this example you create not a token translator, but a named token descriptor (and the corresponding token descriptor object). See Common terminology for recognizers (improvements and critics are welcome).

   token descriptor object: an implementation dependent data object that describes how to interpret, how to compile and how to postpone (if any) a token .

I also proposed the following naming convention for the corresponding words:

  • For token translators use names in the form tt-* — that is the abbreviation of translate-token-*; for example, tt-lit, tt-nt.
  • For token descriptors use names in the form td-*— that is the abbreviation of token-descriptor-*; (for example, td-lit, td-nt)

The employed approach in your example to create a token descriptor can be called "three components" approach. A significant disadvantage of this approach is that it doesn't provide a way to reuse old descriptors when you create a new descriptor. Compare to token translators — they can be easily reused to create new token translators. For example, a token translator for a pair ( nt nt ) can be created using the token translator tt-nt for a single nt as:

: tt-2nt ( i*x nt nt -- j*x ) >r tt-nt r> tt-nt ;

To create a token descriptor td-2nt in the three components approach, you need to put in a lot more effort, and you cannot reuse td-nt descriptor.

One possible solution is to don't expose the three components approach in the API and instead provide a special method to create a descriptor from another descriptors. For td-2nt it can look as:

  tt-nt dup 2 descriptor constant tt-2nt
  \ or
  td{ tt-nt tt-nt }td constant tt-2nt

It seems, a user never needs to provide three components for a new descriptor since any new descriptor is always based on some already defined descriptors.

But the approach based on the token translators is far simpler.


By the way, a well known word to get xt from nt is name> ( nt -- xt )(see Forth-83 / "C. Experimental proposal" / "Definition field address conversion operators").

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

A recognizer takes the string of a lexeme and returns a recognized xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )
REC-SOMETYPE ( addr len -- i*x recognized | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

XY.3.1 Recognized

translator: subtype of xt, and executes with the following stack effect:

recognized: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )
RECOGNIZED-THING ( j*x i*x state -- k*x )

A translator depends on STATE to translate the given arguments:

A recognized xt acts on the state passed to it on the stack

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

i*x is the additional information provided by the recognizer, jx and kx are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

FORTH-RECOGNIZE ( addr len -- i*x recognized-xt | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or NOTFOUND if not.

Takes a string and tries to recognize it, returning the recognized xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Performs -13 THROW. An ambiguous condition exists if the exception word set is not available.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

GET-FORTH-RECOGNIZE ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE.

REC-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-REC-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-REC-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

RECOGNIZED: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a recognized word under the name "name", which performs xt-int for state=0, xt-comp for state=-1 and xt-post for state=-2.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

Extensions reference implementation:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: recognized: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Testing

Stacks TBD.

Testing

TBD

ruvavatar of ruv

A recognized xt acts on the state passed to it on the stack

  1. A proper term for "recognized xt" ("recognized execution token") should be chosen. "recognized xt" means "xt that is recognized", but we don't recognize execution tokens, but recognize lexemes. This xt just is a result of recognizing a lexeme. And it should be named according what it does, not according who produces it.

  2. There is no reason to pass state on the stack — we discussed that, and the reference implementation reflect that.

BerndPaysanavatar of BerndPaysan

The STATE discussion in the 2021 workshop concluded that words or xt executed should not depend on STATE. The reference implementation needs to be adjusted.

For the name of the result values we might want to have another round of bikeshedding. In particular with more native speakers. The current wording represents the last round of bikeshedding.

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a recognized xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x recognized | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Recognized

recognized: subtype of xt, and executes with the following stack effect:

RECOGNIZED-THING ( j*x i*x state -- k*x )

A recognized xt acts on the state passed to it on the stack

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer, jx and kx are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x recognized-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the recognized xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. An ambiguous condition exists if the exception word set is not available.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

GET-FORTH-RECOGNIZE ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE.

REC-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-REC-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-REC-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

RECOGNIZED: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a recognized word under the name "name", which performs xt-int for state=0, xt-comp for state=-1 and xt-post for state=-2.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
: recognized-nt ( nt state -- )
  case
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
: recognized-num ( n state -- )
  case
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
  forth-wordlist find-name-in dup IF  ['] recognized-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;
  0. 2swap >number 0= IF  2drop ['] recognized-num  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: recognized: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Stacks TBD.

Testing

TBD

ruvavatar of ruv

The STATE discussion in the 2021 workshop concluded that words or xt executed should not depend on STATE.

I see the following in the report by @ulli on 2021-09-08:

Given the two variations to handle STATE (either in RECOGNIZER:'s DOES> part or in INTERPRET), yesterdays participants favoured to have the single occurrence of STATE in INTERPRET. Further investigation and model implementations will show whether on or the other is beneficial.

So it implies further investigation and model implementations.

Could someone provide a rationale in favor to pass state (better say "mode") via the stack?


My rationale against mode on the stack is following:

  1. It makes combination of token translators cumbersome. E.g. a definition : tt-3lit ( 3*x -- 3*x | ) >r tt-2lit r> tt-lit ; becomes far more complex.
  2. In most cases a program doesn't need to execute a token translator in a mode that is different from the current mode (counter examples are welcome, except postpone).
  3. The current mode is already held by the system anyway.
  4. (most importantly) It introduces unnecessary coupling between the Forth text interpreter loop and the Recognizer API. This loop does not need to know anything about modes and STATE at all. If we are replacing the system's lexeme translator (along with the system's set of token translators), we should be able to replace it along with the system's STATE (and the set of the system's modes) too. Moreover, a token translator can technically ignore the passed value and use it's own set of modes. And even such a simpler mode-beyond-stack API can be implemented over one that passes mode via the stack.

On the other hand I don't think that including (mentioning) STATE in a new API is a good choice. STATE returns a read-only address, and it's provided for back compatibility only. So a better method instead of STATE is required anyway.

Actually, the system's token translators are the only ones who depend on the system's set of modes. In most cases user-defined token translators are defined via system's token translators (which should be standardized) and they need to know nothing about system's set of modes, and about STATE at all. In the same time, a user is able to define own set of recognizers and set of token translators that don't depend on system's set of modes, but introduce own set of modes.

So, the specification for Recognizer API should not mention nether STATE nor a set of magic values like {0, -1, -2}.


Concerning your mode -2 — I believe, the standard word postpone doesn't need an own mode. But in postponing mode, if any, string literals (like s" foo bar") and comments should be properly treated.

ruvavatar of ruv

  1. It introduces unnecessary coupling between the Forth text interpreter loop and the Recognizer API.

See a block-based illustration of this idea in my Gist.

ruvavatar of ruv

: recognized: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

In the reference implementation you keep the mode {0,-1,-2} in the state variable. But it's problematic, regardless how the mode is passed into token translators (directly via the stack, or indirectly using a dedicated method).

Since, when interpretation state is set by [ (so state is set to 0), and then compilation state is set by ], the mode should be the same as before [. If it was -2, it should be set to -2. But information that the mode was -2 is lost. So another variable should be used to keep a flag whether "POSTPONE" mode is active or not.

Actually, the mode of compilation/interpretation and "POSTPONE" mode are not mutual-exclusive. They can be set independently of each other.

For example, the code:

: foo postpone bar [ postpone baz ] ;

conceptually can be pretty clear defined (see my comment). In this fragment, for the lexeme bar "POSTPONE" mode is active in compilation state, for baz "POSTPONE" mode is active in interpretation state.

So, if "POSTPONE" mode is employed, a different variable for it should be used for this reason too.


On the other hand I'm not convinced that we need "POSTPONE" mode at all. Except to implement the word postpone itself, where and how this mode can be used? Even for the questionable construct ]] ... [[ the mode "POSTPONE" isn't needed.

BerndPaysanavatar of BerndPaysan

OK, the most convincing argument is that STATE can go away as specified thing. You can use and combine system translators, and you can create table-driven translators, but STATE is an implementation detail.

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a recognized xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x recognized | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Recognized

XY.3.1 Translator

recognized: subtype of xt, and executes with the following stack effect:

translator: subtype of xt, and executes with the following stack effect:

RECOGNIZED-THING ( j*x i*x state -- k*x )
THING-TRANSLATOR ( j*x i*x -- k*x )

A recognized xt acts on the state passed to it on the stack

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer, jx and kx are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x recognized-xt | NOTFOUND-xt ) RECOGNIZER

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the recognized xt and additional information if successful, or NOTFOUND if not.

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. An ambiguous condition exists if the exception word set is not available.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

GET-FORTH-RECOGNIZE ( -- xt ) RECOGNIZER EXT

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE.

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused.

REC-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-REC-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-REC-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

RECOGNIZED: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a recognized word under the name "name", which performs xt-int for state=0, xt-comp for state=-1 and xt-post for state=-2.

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognizer ( addr u -- i*x translator / notfound )
Defer forth-recognizer ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: recognized-nt ( nt state -- )
  case
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip // do nothing if state is unknown; possible error handling goes here
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: recognized-num ( n state -- )
  case
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;
: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] recognized-nt  ELSE  drop ['] notfound  THEN ;
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] recognized-num  ELSE  2drop drop ['] notfound  THEN ;
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: recognized: ( xt-interpret xt-compile xt-postpone "name" -- )
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  2 cells + @ execute ;
: translate-comp ( translate-xt -- )  cell+ @ execute ;
: translate-post ( translate-xt -- )  @ execute ;

Stacks TBD.

Stacks TBD, copy from Trute proposal.

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a recognized xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x recognized | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

THING-TRANSLATOR ( j*x i*x -- k*x )

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, jx and kx are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. An ambiguous condition exists if the exception word set is not available.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused.

REC-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-REC-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-REC-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognizer ( addr u -- i*x translator-xt / notfound )
Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  2 cells + @ execute ;
: translate-comp ( translate-xt -- )  cell+ @ execute ;
: translate-post ( translate-xt -- )  @ execute ;
: translate-int ( translate-xt -- )  >body 2 cells + @ execute ;
: translate-comp ( translate-xt -- )  >body cell+ @ execute ;
: translate-post ( translate-xt -- )  >body @ execute ;

Stacks TBD, copy from Trute proposal.

Stack library

: STACK: ( size "name" -- )
  CREATE 1+ ( size ) CELLS ALLOT
  0 OVER ! \ empty stack
;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize   ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP NOTFOUND
;
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  dup stack: dup cells negate here + set-stack
  DOES>  recognize ; 
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) >body get-stack ;

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a recognized xt and additional data on the stack (no additional data for NOTFOUND):

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x recognized | NOTFOUND )
REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

THING-TRANSLATOR ( j*x i*x -- k*x )

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, jx and kx are the stack inputs and outputs of interpreting/compiling or postponing the thing.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. An ambiguous condition exists if the exception word set is not available.

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused.

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  >body 2 cells + @ execute ;
: translate-comp ( translate-xt -- )  >body cell+ @ execute ;
: translate-post ( translate-xt -- )  >body @ execute ;

Defining translators

Once you have TRANSLATE:, and the associated invocation tools, you shall define the translators using it:

: lit, ( n -- ) postpone Literal ;
' noop ' lit, :noname lit, postpone lit, ; translate: translate-num
:noname name>interpret execute ;
:noname name>compile execute ;
:noname lit, postpone name>compile postpone execute ; translate: translate-nt

Stack library

: STACK: ( size "name" -- )
  CREATE 1+ ( size ) CELLS ALLOT
  0 OVER ! \ empty stack
;
  CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize   ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP NOTFOUND <> IF
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP NOTFOUND
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  dup stack: dup cells negate here + set-stack
  DOES>  recognize ; 
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) >body get-stack ;
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
' root ' forth ' forth 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;
' minimal-recognizer is forth-recognizer
' minimal-recognizer is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  >body 2 cells + @ execute ;
: translate-comp ( translate-xt -- )  >body cell+ @ execute ;
: translate-post ( translate-xt -- )  >body @ execute ;

Defining translators

Once you have TRANSLATE:, and the associated invocation tools, you shall define the translators using it:

: lit, ( n -- ) postpone Literal ;
' noop ' lit, :noname lit, postpone lit, ; translate: translate-num
:noname name>interpret execute ;
:noname name>compile execute ;
:noname lit, postpone name>compile postpone execute ; translate: translate-nt

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
' root ' forth ' forth 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
  action-of forth-recognize ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  >body 2 cells + @ execute ;
: translate-comp ( translate-xt -- )  >body cell+ @ execute ;
: translate-post ( translate-xt -- )  >body @ execute ;

Defining translators

Once you have TRANSLATE:, and the associated invocation tools, you shall define the translators using it:

: lit, ( n -- ) postpone Literal ;
' noop ' lit, :noname lit, postpone lit, ; translate: translate-num
:noname name>interpret execute ;
:noname name>compile execute ;
:noname lit, postpone name>compile postpone execute ; translate: translate-nt

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
' root ' forth ' forth 3 recognizer-sequence: search-order
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

AntonErtlavatar of AntonErtl

It seems to me that, given the reference implementation

' translate-nt translate-int
' translate-num translate-int
' translate-dnum translate-int

does not work (nor with translate-comp nor translate-post). Assuming you solve this, do you really want me to define, e.g.,

:noname ['] translate-nt translate-int ;

to get an xt equivalent to one of the xts that has been passed to translate:?

How do you implement POSTPONE (IIRC Matthias Trute has a reference implementation for that)?

What problem is solved by making all the translators state-smart? The problem I see is that you can only access the individual actions by saving state, setting state, executing the translator, and restoring the state. That's not a good design.

The specification of translate: mentions a "current mode". Where do I find out what a "mode" is? This is non-standard terminology.

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that performs or compiles the action of the thing according to what the state the system is in.

A translator xt that interprets, compiles or postpones the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

INTERPRET-TRANSLATOR ( tanslate-xt -- xt-interpret ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

TRANSLATE-INT ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

Translate as in interpretation state

TRANSLATE-COMP ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

COMPILE-TRANSLATOR ( tanslate-xt -- xt-compile ) RECOGNIZER EXT

Translate as in compilation state

TRANSLATE-POST ( jx ix translator-xt -- k*x ) RECOGNIZER EXT

POSTPONE-TRANSLATOR ( tanslate-xt -- xt-postpone ) RECOGNIZER EXT

Translate as in postpone state

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate-nt ( nt -- )
  case state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: translate-num ( n -- )
  case state @
      -1 of   lit,  endof
  endcase ;
: translate-dnum ( d -- )
  \ example of a composite translator using existing translators
  >r translate-num r> translate-num ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap literal compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )
: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: get-forth-recognize ( -- xt )
: forth-recognizer ( -- xt )
  action-of forth-recognize ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: translate-int ( translate-xt -- )  >body 2 cells + @ execute ;
: translate-comp ( translate-xt -- )  >body cell+ @ execute ;
: translate-post ( translate-xt -- )  >body @ execute ;
: interpret-translator ( tanslate-xt -- xt-interpret ) >body 2 cells + @ ;
: compile-translator ( translate-xt -- xt-compile) >body 1 cells + @ ;
: postpone-translator ( translate-xt -- xt-postpone ) >body 0 cells + @ ;

Defining translators

Once you have TRANSLATE:, and the associated invocation tools, you shall define the translators using it:

: lit, ( n -- ) postpone Literal ;
' noop ' lit, :noname lit, postpone lit, ; translate: translate-num
:noname name>interpret execute ;
:noname name>compile execute ;
:noname lit, postpone name>compile postpone execute ; translate: translate-nt

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that interprets, compiles or postpones the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

INTERPRET-TRANSLATOR ( tanslate-xt -- xt-interpret ) RECOGNIZER EXT

Translate as in interpretation state

Get the interpreter xt from the translator

COMPILE-TRANSLATOR ( tanslate-xt -- xt-compile ) RECOGNIZER EXT

Translate as in compilation state

Get the compiler xt from the translator

POSTPONE-TRANSLATOR ( tanslate-xt -- xt-postpone ) RECOGNIZER EXT

Translate as in postpone state

Get the postpone xt from the translator

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap literal compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: forth-recognizer ( -- xt )
  action-of forth-recognize ;
: interpret-translator ( tanslate-xt -- xt-interpret ) >body 2 cells + @ ;
: compile-translator ( translate-xt -- xt-compile) >body 1 cells + @ ;
: postpone-translator ( translate-xt -- xt-postpone ) >body 0 cells + @ ;

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

BerndPaysanavatar of BerndPaysanNew Version

Hide differences

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately

Problem:

The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that interprets, compiles or postpones the action of the thing according to what the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a user-defined translator from scratch.

"name:" ( jx ix -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise, can use this word to change the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with xtn and proceeding towards xt1 until successful.

SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as xt1 .. xtn n.

INTERPRET-TRANSLATOR ( tanslate-xt -- xt-interpret ) RECOGNIZER EXT

Get the interpreter xt from the translator

COMPILE-TRANSLATOR ( tanslate-xt -- xt-compile ) RECOGNIZER EXT

Get the compiler xt from the translator

POSTPONE-TRANSLATOR ( tanslate-xt -- xt-postpone ) RECOGNIZER EXT

Get the postpone xt from the translator

TANSLATE-NT ( jx nt -- kx ) RECOGNIZER EXT

Translates a name token; system component you can use to construct other translators of.

TRANSLATE-NUM ( jx x -- kx ) RECOGNIZER EXT

Translates a number; system component you can use to construct other translators of.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix. This implementation does only take interpret and compile state into account, and uses the STATE variable to distinguish.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap literal compile, ;
:noname name>compile swap lit, compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: forth-recognizer ( -- xt )
  action-of forth-recognize ;
: interpret-translator ( tanslate-xt -- xt-interpret ) >body 2 cells + @ ;
: compile-translator ( translate-xt -- xt-compile) >body 1 cells + @ ;
: postpone-translator ( translate-xt -- xt-postpone ) >body 0 cells + @ ;

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Testing

TBD

Reply New Version