Proposal: minimalistic core API for recognizers

Informal

This page is dedicated to discussing this specific proposal


BerndPaysan, 2020-09-06 09:40:07

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. This proposal therefore tries to define a very minimalistic API for a core recognizer and allows fancier features to be implemented as extensions. The problem it tries to solve is the same as in the original recognizer proposal, so this is not a full proposal; it only sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned data type id is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- rectype )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;

then be told that this is not the right way, even though it looks like it is working.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes a string and returns a rectype+additional data on the stack (no additional data for RECTYPE-NULL):

REC-SOMETYPE ( addr len -- i*x RECTYPE-SOMETYPE | RECTYPE-NULL )

XY.3 Additional usage requirements

XY.3.1 Data type id

rectype: subtype of xt, and executes with the following stack effect:

RECTYPE-SOMETYPE ( i*x state -- j*x )

state is:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.
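As an illustration of this stack effect, here is a minimal sketch of a rectype for double-cell numbers. The names `2lit,` and `rectype-dnum` are assumptions made for this example and are not part of the proposal; the state values are the ones listed above.

```forth
\ Sketch only: 2lit, and rectype-dnum are hypothetical names.
: 2lit, ( d -- )  postpone 2literal ;   \ needs the Double-Number word set

: rectype-dnum ( d state -- | d )
  case
      -1 of  2lit,  endof                  \ compilation: compile d as a literal
      -2 of  2lit, postpone 2lit,  endof   \ POSTPONE: compile code that compiles d
      \ 0 (interpretation) and unknown states fall through:
      \ ENDCASE drops state and d stays on the stack
  endcase ;
```

In interpretation state the rectype simply leaves the recognized double on the stack, which mirrors how the single-number rectype in the reference implementation behaves.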

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x rectype | RECTYPE-NULL ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

RECTYPE-NULL ( state -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x rectype / rectype-null )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer state @ swap execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: rectype-null ( state -- ) -13 throw ;
: rectype-nt ( nt state -- )
  case
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: rectype-num ( n state -- )
  case
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt rectype-nt / rectype-null )
  forth-wordlist find-name-in dup IF  ['] rectype-nt  ELSE  drop ['] rectype-null  THEN ;
: rec-num ( addr u -- n rectype-num / rectype-null )
  0. 2swap >number 0= IF  2drop ['] rectype-num  ELSE  2drop drop ['] rectype-null  THEN ;

: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
  2>r
  2r@ rec-nt dup ['] rectype-null <> IF  2r> 2drop EXIT  THEN  drop
  2r@ rec-num dup ['] rectype-null <> IF  2r> 2drop EXIT  THEN  drop
  2r> 2drop ['] rectype-null ;

' minimal-recognizer is forth-recognizer
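To show how a new recognizer could be plugged in through the deferred word, here is a sketch that adds character literals such as 'a'. It assumes the reference implementation above is loaded; `rec-char` and `extended-recognizer` are hypothetical names, not part of the proposal.

```forth
\ Sketch only; relies on rectype-num, rectype-null and minimal-recognizer above.
: rec-char ( addr u -- c rectype-num / rectype-null )
  dup 3 = IF
      over c@ [char] ' = IF
          over 2 chars + c@ [char] ' = IF
              drop char+ c@ ['] rectype-num  EXIT
          THEN
      THEN
  THEN
  2drop ['] rectype-null ;

: extended-recognizer ( addr u -- i*x rectype / rectype-null )
  2dup 2>r minimal-recognizer
  dup ['] rectype-null = IF  drop 2r> rec-char  EXIT  THEN
  2r> 2drop ;

' extended-recognizer is forth-recognizer
```

The character literal reuses the existing number rectype, so no new rectype is needed; only the deferred word is re-pointed.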

Testing

JennyBrien

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

I don't think so. It doesn't make much difference in application, because you (almost always?) need to consume the rec-type immediately to use whatever else might be on the stack(s). If you already know what you've got but, for example, can't remember the words to POSTPONE it, you could do something like this with an active RECTYPE:

    -2 RECTYPE-X

But mostly you'll have the RECTYPE sitting passively on the stack as a return for a recognizer, and I don't see a great deal of difference between:

    : postponed  -2 swap execute ; 

and

    : postponed  @ execute ;

Passive rectypes are easier to use (no need to remember when to tick them) and easier to code (no need to check for a bogus mode on the stack).

Compare:

: rectype-nt ( nt state -- )
  case
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;

with:

 : rectype: create , , , ;
 :noname name>interpret execute ;
 :noname name>compile execute ;
 :noname name>compile swap lit, compile, ;  rectype: rectype-nt

BerndPaysan

One possible thing is to have an automatic postpone for literals.

: rectype-lit: ( compile-xt "name" -- )
  create ,
  does> @ swap
  case
      0  of  drop  endof
      -1 of  execute  endof
      -2 of  dup >r execute r> compile,  endof
  endcase ;

' lit, rectype-lit: rectype-num
' 2lit, rectype-lit: rectype-dnum
' flit, rectype-lit: rectype-float
' slit, rectype-lit: rectype-string

This works with this method, but not with the previous way.

BerndPaysan

Furthermore, obviously anyone sane who doesn't want to be 100% minimal would instantly define

: rectype: ( xt-int xt-comp xt-post "name" -- )
  create , , , does> swap 2 + cells + @ execute ;

and then define generic rectypes just like in Matthias Trute's version with rectype:

JennyBrien

: rectype-lit: ( xt -- )  ['] noop swap dup >r :noname r@ compile, r> postpone literal postpone compile, postpone ; rectype: ;

not so straightforward, but possible.

ruv

Previous works

In general, I like the approach of active "rectype", i.e. when you can execute it to translate a token — so a "rectype" is a token translator: ( i*x token -- j*x ). I described this approach in comp.lang.forth in 2018 (news:pngvcc$pta$1@gioia.aioe.org).

Bernd should also remember the comparison of version D with the Resolvers API, where I specified this approach and even provided several POCs.

and then define generic rectypes just like in Matthias Trute's version with rectype:

I also showed, just for illustration, a hybrid variant in which a "rectype" can be executed and can also be an argument of the accessors (so it is compatible with version D, i.e. it is a "passive rectype" as JennyBrien mentioned above).

But the accessors from version D exclude some implementation approaches. Actually, these accessors are useless when the higher-level methods are provided. Getting an xt and then executing it is an extra step without any benefit in most cases. Let's provide the corresponding methods instead of the accessors.

This works with this method, but not with the previous way.

I'm not sure what you refer to, but "automatic postpone for literals" can be implemented in version D too.

: create-rectype-for-literal ( xt-compiler "name" -- )
  ['] noop swap dup rectype:
;

Token translator

Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves

RECTYPE-SOMETYPE ( i*x state -- j*x )

By convention, the name for such a word should start with an English verb.

Concerning passing the state: in my Resolvers API, the state is passed indirectly, i.e. not via the stack. This makes combining translators easier.

E.g.:

: tt-3lit ( 3*x -- 3*x | ) >r tt-2lit  r> tt-lit ;

VS

: tt-3lit-s ( 3*x state -- 3*x | ) dup >r swap >r tt-2lit-s  r> r>  tt-lit-s ;

Passing the state is cumbersome. Also, take into account that it is usually already kept in a variable anyway. Why do you need to pass it via the stack again and again? What is the rationale for passing it directly?

Terminology

Please stop using confusing terminology such as "data type id" (in "The core principle is still that the recognizer is not aware of state, and the returned data type id is"). This terminology is not compatible with the language of the standard. I suggested the proper terminology before and have now published the proposal on forth-standard.org; let's use it (and improve it where needed), or let's accurately define another terminology. The fact is that all the proposals about recognizers can share the same terminology.

Another example is the term "recognizer types". If a recognizer is a Forth definition with a particular behavior, then "recognizer type" is "type of a recognizer", that is, a type of a Forth definition, something like a function type. But you actually mean a "token descriptor", that is, a "descriptor of a token", which tells something about the corresponding token and nothing about the recognizers (as Forth definitions).

ruv

Advantages

A huge advantage of this approach (but only when the state is passed indirectly) is that most user-defined token translators can be created far more easily than the corresponding descriptors ("rectypes"). You don't need to cope with three actions, and you don't need to cope with the state at all, since any token translator can be built from other, already defined translators!

BerndPaysan

Yes, I proposed that kind of solution years ago. In effect, both ways have the same expressive power, but one does it by creation of noname words, the other by normal code. Acceptance may differ.

ruv

@JennyBrien wrote

Compare: [...] with:

 : rectype: create , , , ;
 :noname name>interpret execute ;
 :noname name>compile execute ;
 :noname name>compile swap lit, compile, ;  rectype: rectype-nt

(sic: the full postpone action).

This comparison is incorrect, since in the proposed API rectype: (which generates a token translator) can be defined as follows:

: rectype: ( xt-executer xt-compiler xt-postponer "name" -- )
  >r >r >r : ]] case
    0  of  [[ r> xt, ]] endof
    -1 of  [[ r> xt, ]] endof
    -2 of  [[ r> xt, ]] endof
    -22 throw
  endcase [[ postpone ;
;

And you can use your same code to define your rectype-nt or anything else.

BerndPaysan (New Version)

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. This proposal therefore tries to define a very minimalistic API for a core recognizer and allows fancier features to be implemented as extensions. The problem it tries to solve is the same as in the original recognizer proposal, so this is not a full proposal; it only sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- rectype )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;
: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;
' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Testing

BerndPaysan (New Version)

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. This proposal therefore tries to define a very minimalistic API for a core recognizer and allows fancier features to be implemented as extensions. The problem it tries to solve is the same as in the original recognizer proposal, so this is not a full proposal; it only sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
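As a usage sketch for this dispatcher, the number translator from the reference implementation could be rebuilt from three small actions. It assumes `translator:` and `lit,` as defined above; the name `dispatched-num-translator` is hypothetical.

```forth
\ Sketch only: dispatched-num-translator is a hypothetical name.
:noname ( n -- n ) ;                      \ interpret: leave n on the stack
:noname ( n -- )  lit, ;                  \ compile: compile n as a literal
:noname ( n -- )  lit, postpone lit, ;    \ postpone: compile code that compiles n
translator: dispatched-num-translator
```

Like the table lookup itself, this relies on STATE taking only the values 0, -1 and -2.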

Testing

BerndPaysan

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult. Instead of

: postpone ( "name" -- ) parse-name forth-recognizer -2 swap execute ; immediate

it is more convoluted

: postpone ( "name" -- )
  parse-name forth-recognizer
  state @ >r -2 state !  catch  r> state !  throw ; immediate

How to detect [[ at the end of a postpone sequence is also not so trivial.

ruv

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult.

It's OK. Actually, we distribute complexity among various parts. When we make one thing less complex, we make another thing more complex. But because the various things occur with different frequency (in systems, libraries, programs), the total complexity can end up lower or higher.

This approach also makes some things more complex, but the total complexity decreases, I believe.

Concerning POSTPONE. I think, some useful parts should be factored out.

Also, we don't need to catch the exception: usually it's a stop error, and the state is ambiguous in any case. QUIT resets all the internal states. Concerning programs: we need a standard way to reset the internal states of the Forth text interpreter, regardless of the Recognizers proposal.

In my "lexeme resolvers" implementation I use the concept of a postponing level that can be 0, 1, or 2, and introduce words to increment and decrement this level. So, POSTPONE is defined as follows:

: postpone  ( " name" --      )   parse-name inc-state translate-lexeme dec-state ( flag ) ?nf ; immediate

Where translate-lexeme is defined as follows:

: perceive-lexeme ( c-addr u -- k*x xt-tt | c-addr u 0 )
  perceptor dup if execute then
;
: translate-lexeme ( i*x c-addr u -- j*x true | c-addr u 0 )
  perceive-lexeme dup if execute true then
;

(Note that, in contrast to this proposal, resolvers return ( c-addr u 0 ) on failure.)

How to detect [[ at the end of a postpone sequence is also not so trivial.

An appropriate approach is that the word ]] is a parsing word.

: ]] ( -- )
  inc-state begin
    next-lexeme 2dup s" [[" equals 0= while
    translate-lexeme ?nf
  repeat 2drop dec-state
; immediate

So we don't have any problem detecting [[ at the end.

An advantage of the postponing level concept is that the following code works as expected:

: foo [  ]] 123 . [[  ]  ;   foo \ prints 123

In the message news:rdcur5$ga4$1@dont-email.me (the full message: news:rdcn35$sd2$1@dont-email.me) I showed another approach, in which a postponing action (i.e., the -2 state in this proposal) is not required at all.

ruv

translator: subtype of xt, and executes with the following stack effect:
SOME-TRANSLATOR ( i*x -- j*x )

It's correct in the general case, but it makes little sense, since any definition meets this stack effect.

So I think we should distinguish the parameters of a translator itself from the effect of translating the code that is passed to the translator. Possible variants:

\ We can define 'token' data type
TRANSLATE-SOMETOKEN ( i*x token -- j*x )

\ Some hybrid variant
TRANSLATE-SOMETOKEN  ( i*x token{k*x} -- j*x )

\ Only low level data types
TRANSLATE-SOMETOKEN  ( i*x k*x -- j*x ) 

(NB: I use the conventional naming {verb}-{noun} for such words.)

It should also be noted that these x may be distributed over all the stacks: the data stack, the floating-point stack, and the control-flow stack (except for the token k*x, which cannot be on the control-flow stack).

BerndPaysan

Indeed, TRANSLATE-SOMETHING sounds better than SOMETHING-TRANSLATOR.

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.

ruv

"FORTH-RECOGNIZER" name

I thought about FORTH-RECOGNIZER name. It makes a strong impression that this word is similar to FORTH-WORDLIST ( -- wid ). The problem is that it isn't.

FORTH-WORDLIST is a constant (it always returns the same value) that identifies one and the same word list among all the word lists. This word list can be included in the search order, and it can be absent from the search order.

By analogy, FORTH-RECOGNIZER should be a constant that identifies one and the same recognizer among all the recognizers. This recognizer can be included in the recognizer that is used by the Forth text interpreter, and it can be absent from it. (In accordance with the concept that a sequence of recognizers is also a recognizer.)

All of this would have to hold for the naming to be consistent, but it does not. This means that the name breaks consistency and is inappropriate for the proposed word.

FORTH-RECOGNIZER ( -- xt ) can be a word that returns xt of the system's recognizer that is used by the Forth text interpreter by default (i.e. initially).

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.

Also, it gives a strong impression that it returns a recognizer. But that's wrong. Also, its result is analyzed much more often than it's followed by EXECUTE.

Basic methods

By no means, we need

  1. a method that tells the Forth text interpreter to use a given recognizer.
  2. a method that returns the recognizer that is currently used by the Forth text interpreter,
  3. a method that performs the recognizer that is currently used by the Forth text interpreter

A single deferred word (a vector) X can solve it:

  1. set: IS X
  2. get: ACTION-OF X
  3. perform: X

But I insist that this approach limits implementations too much. A Forth system may want to perform internal actions when the recognizer used by the Forth text interpreter is switched, but it cannot do so if this recognizer is switched via IS X. For that reason, distinct getter and setter words are usually provided in the Standard (except for the very ancient BASE and >IN, for backward compatibility). Yes, perhaps Gforth can attach additional internal actions to the IS X phrase. But we shouldn't complicate all Forth system implementations.

A possible implementation via deferred word and distinct getter and setter words:

defer perceive ( c-addr u -- k*x tt )
: perceptor ( -- xt ) action-of perceive ;
: set-perceptor ( xt -- ) is perceive ;

Perhaps, the more specific names are better (?):

defer perceive-lexeme ( c-addr u -- k*x tt )
: lexeme-perceptor ( -- xt ) action-of perceive-lexeme ;
: set-lexeme-perceptor ( xt -- ) is perceive-lexeme ;
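To illustrate the kind of internal action meant here, a minimal sketch: a distinct setter can record the previous recognizer so that a switch can be undone, which a bare IS would bypass. The names `perceive-nothing`, `prev-perceptor` and `undo-perceptor` are assumptions for this example, and only one level of undo is kept.

```forth
\ Sketch only: perceive-nothing, prev-perceptor and undo-perceptor are hypothetical.
defer perceive ( c-addr u -- k*x tt | c-addr u 0 )
: perceive-nothing ( c-addr u -- c-addr u 0 )  0 ;   \ default: recognizes nothing
' perceive-nothing is perceive

variable prev-perceptor            \ one-level undo slot
' perceive-nothing prev-perceptor !

: perceptor ( -- xt )  action-of perceive ;
: set-perceptor ( xt -- )
  perceptor prev-perceptor !       \ internal action: remember the old recognizer
  is perceive ;
: undo-perceptor ( -- )  prev-perceptor @ is perceive ;
```

A program that switches the recognizer with IS perceive directly would skip the bookkeeping in set-perceptor, which is the limitation described above.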

ruv

Correction: please read "By anyway, we need" instead of "By no means, we need".

BerndPaysan

DEFER is a core word now, so using DEFER for such a thing is ok. We don't need a special getter and setter for everything.

The implication that FORTH-RECOGNIZER returns a recognizer (it does not; it executes one) is a valid point. A better name is needed. At the moment it is a VALUE and does return a recognizer. Now, it is a deferred word, and does recognize strings. We should keep it with Anton's unification: a sequence of recognizers can be combined to one recognizer. Just because it's now recognizing more different things, it's still a recognizer. No need to find another synonym. Takes a string, returns data+translator token: that is a recognizer.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.

ruv

DEFER is a core word now, so using DEFER for such a thing is ok.

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

Back to my first argument, what do you suggest if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter?

You can ask, do I have an example of such requirement. Yes, I do. I want to provide a method to undo such switching in my system. It's similar to effect of the "PREVIOUS" word for the search order. Perhaps you can suggest some solution with the deferred word?

Anton's unification: a sequence of recognizers can be combined to one recognizer.

Yes. I too said that any sequence of recognizers seq-x (from API v4) can be represented as a single recognizer : recognize-x seq-x recognize ;. So sequences are superfluous in the basic API: a Forth system doesn't need to know whether it is a sequence or not.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.

It's better. But it recognizes not valid FORTH, but anything that the Forth text interpreter can currently recognize (and only that).

Conceptually, this word isn't just a recognizer. There is a single special system's slot for a recognizer that is used by the Forth text interpreter. We can put any recognizer into this slot. We can also perform the recognizer that is placed into this slot. So this word performs the recognizer from this slot. I incline to call this slot "perceptor". And after that the word that performs the recognizer from this slot becomes "perceive".

All recognizer names have the pattern RECOGNIZE-*. The idea is to not put this special word on a par with all other recognizers. For that, it's better to find a name that is distinct from the RECOGNIZE-SOMETHING pattern. What do you think?

ruv

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

This argument is that a Forth system can be implemented as a minimal kernel and additional libraries, and DEFER, IS, and ACTION-OF can be available via a library. But when we put a deferred word into this API, we force the system's author to put DEFER, IS, and ACTION-OF into the kernel too, although they aren't actually required in the kernel. It would be too restrictive a limitation on implementations.

ruv

Locate

locate cannot work for lexemes that can be recognized (translated) according to this proposal.

ruv

The last comment was intended for the proposal of AndrewHaley, and it was mistakenly placed here.

BerndPaysan

The recognizer will be an option, as well. At the moment, FORTH-RECOGNIZER is proposed to be a value. That's also a CORE EXT word (as is TO).

A minimalistic system that wants to implement recognizers needs FORTH-RECOGNIZER to be a deferred word. I.e. it needs code for DODEFER. It can load the rest of the deferred word stuff later as extension.

ruv

Certainly, recognizers are optional. I didn't mean that some required part requires an optional part. I mean that one optional part requires another complex optional part without any good and fair ground.

Yes, a minimalistic system that wants to provide a deferred word needs only code for DODEFER. But it still makes bootstrapping of this system more complex. Hence, when we put a deferred word into API, we make things more complex for some implementations. But we don't even have a rationale for that.

Also, with deferred word we still don't have a solution if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.

BerndPaysan

CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e.

forth-recognizer @ execute execute

Clumsy interfaces cannot be changed later, even when you have better things at hand. You can probably wrap around the clumsy interface, e.g.

Defer recognize-forth
addr recognize-forth Constant forth-recognizer

if you can use ADDR to access the deferred word's xt storage location. But then you have another interface, less clumsy, and only available when you have DEFER+ADDR (and ADDR is not even part of the standard).

A minimalistic API, which is what I am looking for here, is one where you don't have to document much. The less uniform an API is, the more you have to document. The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect. And combinations of recognizers have the same effect. And the system's recognizer is just another one, which you can swap in and out. And you can define a REC-SEQUENCE, where you can manipulate the sequence, and put that into the system's recognizer.

This uniformity is broken when you don't use a deferred word for the system's recognizer — you can't just call that one as you can call the others. You need @ EXECUTE. This is clumsy.
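
The difference at the call site can be sketched roughly as follows. This is only an illustration under assumptions: rec-none is a hypothetical recognizer that rejects everything and returns 0 as a stand-in for a real translator xt, and the names with -d and -v suffixes are invented here to show both variants side by side.

```forth
\ Hypothetical recognizer: rejects every lexeme; 0 stands in for a translator xt.
: rec-none ( c-addr u -- 0 )  2drop 0 ;

\ Variant 1 (assumed name): the system's recognizer is a deferred word.
defer forth-recognizer-d
' rec-none is forth-recognizer-d

\ Variant 2 (assumed name): the system's recognizer xt is kept in a variable.
variable forth-recognizer-v
' rec-none forth-recognizer-v !

\ Call sites, wrapped in definitions so they can be compared:
: try-deferred ( c-addr u -- i*x tt )  forth-recognizer-d ;
: try-variable ( c-addr u -- i*x tt )  forth-recognizer-v @ execute ;
```

In the deferred variant the system's recognizer is called exactly like rec-none itself; the variable variant needs the extra @ EXECUTE at every call site.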

ruv

> CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e. forth-recognizer @ execute execute

I don't suggest using a variable in the interface; it's even worse than a defer. When a variable is used to change something, the change cannot be reliably detected. But the requirement is that a system must be able to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.

For that I would prefer to have separate words in the API: a setter, a getter and a "performer" (a word that performs the recognizer that is currently used by the Forth text interpreter).

What are your objections to having several separate words in the minimalistic API?

> The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect.

I strongly support this approach (and I myself suggested this approach too, with slightly different stack effects).

> This uniformity is broken when you don't use a deferred word for the system's recognizer

It seems, the set of words like the following (the names may vary):

perceive ( c-addr u -- k*x tt )
set-perceptor ( xt -- )
perceptor ( -- xt )

doesn't break the mentioned uniformity. Please clarify.

BerndPaysan

Using special setters and getters means you have another (special purpose) DEFER mechanism here. Of course you can implement that with

variable current-perceptor
: perceive ( addr u -- i*x token ) current-perceptor @ execute ;
: set-perceptor ( xt -- ) current-perceptor ! ;
: perceptor ( -- xt ) current-perceptor @ ;

which is probably a bit less implementation effort than DEFER, IS, and ACTION-OF. Or is it really?

State-Smart:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body state @ if  ]] literal ! [[  else  !  then ; immediate
: action-of  ' >body state @ if  ]] literal @ [[  else  @  then ; immediate

or with NDCS:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body ! ; ndcs: ' >body ]] literal ! [[ ;
: action-of  ' >body @ ;  ndcs: ' >body ]] literal @ [[ ;

DEFER is really a lightweight way to define words that can be changed.

These three lines of code are doing more than the three lines of code you need in addition when you have your special-purpose setter and getter, but they are still one-liners.

Forthers like to reinvent the wheel. But don't overdo this.

ruv

> Using special setters and getters means you have another (special purpose) DEFER mechanism here.

Not necessarily. It's up to the author/implementer. They can be just wrappers over the standard DEFER, as I showed earlier. So it doesn't mean reinventing the wheel. The implementation details are just hidden.

So the arguments concerning the implementation of the DEFER mechanism say nothing against three separate words in the minimalistic API.

BTW, having translators for the basic data types, the words is and action-of can be even shorter:

: is  ' >body tt-lit ['] ! tt-xt ; immediate
: action-of  ' >body tt-lit ['] @ tt-xt ; immediate

Well, in any case I would agree that the arguments concerning complexity are more or less weak.

A strong argument (that hasn't been commented on yet) is about additional actions that a system needs to perform in the setter. What do you think in this regard?
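
As a rough sketch of that argument, the three words can be wrappers over the standard DEFER, with the setter doing extra work. Everything beyond the three API names is an assumption made for illustration: rec-none is a hypothetical recognizer returning 0 in place of a translator, and the perceptor-switches counter merely stands in for whatever internal action a real system would need.

```forth
\ Hypothetical recognizer used as the initial perceptor; 0 stands in for a translator.
: rec-none ( c-addr u -- 0 )  2drop 0 ;

defer perceive ( c-addr u -- k*x tt )
' rec-none is perceive

\ Hypothetical bookkeeping: counts how often the perceptor was switched.
variable perceptor-switches   0 perceptor-switches !

: perceptor ( -- xt )  ['] perceive defer@ ;
: set-perceptor ( xt -- )
  ['] perceive defer!           \ store the new recognizer
  1 perceptor-switches +! ;     \ stand-in for the system's extra action
```

In this sketch a user could still apply IS to perceive directly and bypass the counter, so a system relying on the extra action would have to avoid documenting perceive as a deferred word.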

ruv

One more strong argument against a deferred word in the API, and in favor of a separate getter and setter, is the following.

With a deferred word in the API, we cannot define this API over another API at all. But with a separate getter and setter (and "executer"), it is possible to define this API over some other APIs.

Example: news:rn1csa$b02$1@dont-email.me

BerndPaysan

Gforth's new header structure allows overloading TO, IS (which are essentially the same) and DEFER@, so we can use the DEFER API to access similar changeable execution patterns that are implemented differently. So for us, it makes sense to use these access words, regardless of how it is implemented.

Other systems may not have this capability, though given the way the standard now extends TO for FVALUE and others, you need to have one way or another to deal with that. The same holds when you have a UDEFER in your system for user-specific deferred words.

For me, it is needless clutter of the dictionary and the mental space of the programmer to add setters and getters for things where you already have a generic one. But I see the point that not every system can do this.

ruv

> needless clutter of the dictionary and the mental space of the programmer

I have used an approach where a defining word creates two words: a getter and a setter. For example, after the phrase create-prop x the words x and set-x are created. I didn't notice any mental space clutter in this regard. Sometimes I redefined set-x to add additional checks or actions.

Concerning dictionary space, I don't see any problem.

> But I see the point that not every system can do this.

True. And even if a system can do this, it's done only in some system-specific way.

So, due to the combination of all reasons, it's better to have distinct ordinary words in the standard API.
