Digest #121 2020-09-17

Contributions

[162] 2020-09-16 15:33:45 JennyBrien wrote:

requestClarification - Extending MARKER

Many years ago, when I was modifying F-83 to be ANS-compliant, I had to temporarily patch several parts of the core. I used:

   CHANGED ( n addr -- )  \ add addr and old value to a linked list, then store n at addr

I then modified FORGET to run down the list and restore any values changed after that point.

Is there any common practice on how to add similar further "landmark information" that MARKER should restore?
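
One way to picture the scheme described above is the following minimal sketch. It is not the original F-83 patch: the names change-list and restore-to are hypothetical, the undo records are simply laid down at HERE, and it assumes that the record passed to restore-to is still part of the list (or is 0, meaning "undo everything").

```forth
\ Sketch only: CHANGE-LIST and RESTORE-TO are illustrative names.
variable change-list   0 change-list !   \ newest undo record, 0 = none

: changed ( n addr -- )
  align here  change-list @ ,   \ field 0: link to the previous record
  over ,                        \ field 1: the patched address
  over @ ,                      \ field 2: the value before the patch
  change-list !                 \ the new record becomes the head
  ! ;                           \ finally store n at addr

: restore-to ( node -- )        \ undo every change recorded after NODE
  begin  change-list @ over <>  while
    change-list @               \ newest record
    dup 2 cells + @             \ its saved old value
    over cell+ @ !              \ put it back at the patched address
    @ change-list !             \ unlink the record
  repeat drop ;

\ Usage: remember the list head as a landmark, patch, then roll back.
variable demo   1 demo !
change-list @ constant landmark
5 demo changed                  \ demo now holds 5
landmark restore-to             \ demo holds 1 again
```

A MARKER-like word could save the current value of change-list when it is defined and pass it to restore-to when it is executed; how that hooks into a given system's MARKER is left open here.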

Replies

[r514] 2020-09-08 08:36:42 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more fancy stuff to be implemented as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal; this proposal is therefore not a full proposal, but sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] rectype-null  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- rectype )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

Testing


[r515] 2020-09-08 08:39:23 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more fancy stuff to be implemented as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal; this proposal is therefore not a full proposal, but sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-nt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: word-translator ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-word ( addr u -- ... translator )
  here place  here find dup IF  [']  word-translator
  ELSE  drop ['] notfound  THEN ;

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

A translator depends on STATE to translate the given arguments:

  • 0 for interpretation
  • -1 for compilation
  • -2 for POSTPONE

i*x is the additional information provided by the recognizer.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER

This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW if the exception wordset is available.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:

Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognizer execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
  case  state @
      0  of  name>interpret execute  endof
      -1 of  name>compile execute  endof
      -2 of  name>compile swap lit, compile,  endof
      nip \ do nothing if state is unknown; possible error handling goes here
  endcase ;
: num-translator ( n -- )
  case  state @
      -1 of   lit,  endof
      -2 of   lit, postpone lit,  endof
  endcase ;

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] nt-translator  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] num-translator  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognizer is forth-recognizer

The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
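
As an illustration of how the dispatcher might be used, the sketch below builds a number translator out of three small actions. TRANSLATOR: is repeated so the fragment is self-contained; the names num-int, num-comp, num-post and translate-num are illustrative only, and the postpone action relies on the proposal's convention of STATE being -2 while postponing.

```forth
\ TRANSLATOR: as given above; the NUM-* words are illustrative.
: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,              \ body order: postpone, compile, interpret
  does> state @ 2 + cells + @ execute ;

: lit,     ( n -- )    postpone literal ;
: num-int  ( n -- n )  ;                      \ interpreting: leave n on the stack
: num-comp ( n -- )    lit, ;                 \ compiling: compile n as a literal
: num-post ( n -- )    lit, postpone lit, ;   \ postponing: compile code that compiles n

' num-int ' num-comp ' num-post translator: translate-num

42 translate-num .          \ STATE is 0 here, so NUM-INT runs
```

Because the three xts are stored with the postpone action first, a STATE of -2, -1 or 0 indexes cell 0, 1 or 2 of the body respectively.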

Testing


[r516] 2020-09-08 11:33:30 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult. Instead of

: postpone ( "name" -- ) parse-name forth-recognizer -2 swap execute ; immediate

it is more convoluted

: postpone ( "name" -- )
  parse-name forth-recognizer
  state @ >r -2 state !  catch  r> state !  throw ; immediate

How to detect [[ at the end of a postpone sequence is also not so trivial.


[r517] 2020-09-08 14:48:20 ruv replies:

proposal - minimalistic core API for recognizers

Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult.

It's OK. Actually, we distribute complexity among various parts. When we make one thing less complex, we make another thing more complex. But because these things occur with different frequency (in systems, libraries, programs), the total complexity can end up lower or higher.

This approach also makes some things more complex, but the total complexity decreases, I believe.

Concerning POSTPONE. I think, some useful parts should be factored out.

Also, we don't need to catch the exception: usually it's a stop error, and the state is ambiguous in any case. QUIT resets all the internal states. Concerning programs: we need a standard way to reset the internal states of the Forth text interpreter, regardless of the Recognizers proposal.

In my "lexeme resolvers" implementation I use the concept of a postponing level that can be 0, 1, or 2, and introduce words to increment and decrement this level. So POSTPONE is defined as follows:

: postpone  ( " name" --      )   parse-name inc-state translate-lexeme dec-state ( flag ) ?nf ; immediate

Where translate-lexeme is defined as follows:

: perceive-lexeme ( c-addr u -- k*x xt-tt | c-addr u 0 )
  perceptor dup if execute then
;
: translate-lexeme ( i*x c-addr u -- j*x true | c-addr u 0 )
  perceive-lexeme dup if execute true then
;

(Note that, in contrast to this proposal, resolvers return ( c-addr u 0 ) on failure.)

How to detect [[ at the end of a postpone sequence is also not so trivial.

An appropriate approach is that the word ]] is a parsing word.

: ]] ( -- )
  inc-state begin
    next-lexeme 2dup s" [[" equals 0= while
    translate-lexeme ?nf
  repeat 2drop dec-state
; immediate

So we don't have any problem detecting [[ at the end.

An advantage of the postponing level concept is that the following code works as expected:

: foo [  ]] 123 . [[  ]  ;   foo \ prints 123

In the message news:rdcur5$ga4$1@dont-email.me (the full message: news:rdcn35$sd2$1@dont-email.me) I showed another approach, when postponing action is not required at all (i.e., -2 state in this proposal).


[r518] 2020-09-08 16:45:16 ruv replies:

proposal - minimalistic core API for recognizers

translator: subtype of xt, and executes with the following stack effect:

SOME-TRANSLATOR ( i*x -- j*x )

It's correct in the general case, but it makes little sense, since any definition meets this stack effect.

So I think we should distinguish the parameters of a translator itself from the effect of translating the code that is passed to the translator. Possible variants:

\ We can define 'token' data type
TRANSLATE-SOMETOKEN ( i*x token -- j*x )

\ Some hybrid variant
TRANSLATE-SOMETOKEN  ( i*x token{k*x} -- j*x )

\ Only low level data types
TRANSLATE-SOMETOKEN  ( i*x k*x -- j*x ) 

(NB: I use the conventional naming {verb}-{noun} for such words).

It should also be noted that these x may be distributed over all the stacks: the data stack, the floating-point stack, the control-flow stack (except token k*x, which cannot be on the control-flow stack).


[r519] 2020-09-08 20:14:11 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Indeed, TRANSLATE-SOMETHING sounds better than SOMETHING-TRANSLATOR.

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.


[r520] 2020-09-09 08:13:21 ruv replies:

proposal - minimalistic core API for recognizers

"FORTH-RECOGNIZER" name

I thought about the FORTH-RECOGNIZER name. It gives a strong impression that this word is similar to FORTH-WORDLIST ( -- wid ). The problem is that it isn't.

FORTH-WORDLIST is a constant (it always returns the same value) that identifies one and the same word list among all the word lists. This word list can be included in the search order, and it can be absent from the search order.

By analogy, FORTH-RECOGNIZER should be a constant that identifies one and the same recognizer among all the recognizers. This recognizer can be included in the recognizer that is used by the Forth text interpreter, and it can be absent from it. (In accordance with the concept that a sequence of recognizers is also a recognizer.)

All this would have to hold for the naming to be consistent. But actually it does not. It means that this name breaks consistency and is inappropriate for the proposed word.

FORTH-RECOGNIZER ( -- xt ) can be a word that returns xt of the system's recognizer that is used by the Forth text interpreter by default (i.e. initially).

FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.

Also, it gives a strong impression that it returns a recognizer. But that's wrong. Also, its result is analyzed much more often than it's followed by EXECUTE.

Basic methods

By no means, we need

  1. a method that tells the Forth text interpreter to use a given recognizer.
  2. a method that returns the recognizer that is currently used by the Forth text interpreter,
  3. a method that performs the recognizer that is currently used by the Forth text interpreter

A single deferred word (a vector) X can solve it:

  1. set: IS X
  2. get: ACTION-OF X
  3. perform: X

But I insist that this approach limits implementations too much. A Forth system may want to perform internal actions when the recognizer used by the Forth text interpreter is switched. But it cannot do so if this recognizer is switched via the IS X method. For that reason, separate getter and setter words are usually provided in the Standard (except for the very ancient BASE and >IN, kept for backward compatibility). Yes, perhaps Gforth can attach additional internal actions to the IS X phrase. But we shouldn't complicate all Forth system implementations.

A possible implementation via deferred word and distinct getter and setter words:

defer perceive ( c-addr u -- k*x tt )
: perceptor ( -- xt ) action-of perceive ;
: set-perceptor ( xt -- ) is perceive ;

Perhaps, the more specific names are better (?):

defer perceive-lexeme ( c-addr u -- k*x tt )
: lexeme-perceptor ( -- xt ) action-of perceive-lexeme ;
: set-lexeme-perceptor ( xt -- ) is perceive-lexeme ;

[r521] 2020-09-09 08:25:37 ruv replies:

proposal - minimalistic core API for recognizers

Correction: please read "By anyway, we need" instead of "By no means, we need".


[r522] 2020-09-10 12:50:53 KrishnaMyneni replies:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

@ruv, that's a good point. Originally, I thought it might make writing the implementation more consistent between 32 and 64 bit Forths. However, from the user point of view it is easier to deal with one double integer rather than two singles. I will rewrite the proposal to specify that the bits of udfraction specify the binary fraction for the floating point datum. Of course the MAKE-IEEE-DFLOAT word will still check for illegal values. The order of the inputs will also be changed.


[r523] 2020-09-10 15:43:15 ruv replies:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )

  1. uexp should be n-exp (i.e. a signed number).

  2. Is there any benefit in having signbit ud-mantissa instead of d-mantissa? (i.e. taking the sign from the mantissa).

  3. What is the radix for the exponent? 2 or 10? (it should be mentioned).

  4. Yes, it's better if error is a throw code.

  5. What is the value of r in the case of error? What is better: 0 or NaN?

  6. Does it make any sense to use this function in a recognizer for floating point numbers (if the radix of the exponent is 2)?

HEX 0 54442D18 921FB 1 MAKE-IEEE-DFLOAT fconstant pi

How can we get 3.14 from these numbers?
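
A worked reading of these numbers, under assumptions that the text quoted here does not confirm: the exponent radix is 2, uexp = 1 is the unbiased exponent, and udfraction holds the 52 fraction bits of a normalized binary64 number. The value is then (1 + fraction / 2^52) * 2^uexp. The sketch needs 64-bit cells and the floating-point word set; fraction>pi is a hypothetical name.

```forth
\ Sketch: decode the example by hand. Assumes 64-bit cells,
\ radix 2 and an unbiased exponent of 1 (see the assumptions above).
: fraction>pi ( F: -- r )
  $921FB54442D18 s>d d>f    \ the 52 fraction bits as a float
  2e 52e f** f/             \ scale to fraction / 2^52
  1e f+                     \ add the implicit leading 1
  2e f* ;                   \ multiply by 2^uexp = 2^1

fraction>pi f.              \ displays 3.14159...
```

On a 32-bit system the same fraction is the double-cell number written in the example as 54442D18 921FB (low cell first).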


[r524] 2020-09-10 21:36:12 BerndPaysan replies:

proposal - minimalistic core API for recognizers

DEFER is a core word now, so using DEFER for such a thing is ok. We don't need a special getter and setter for everything.

The implication that FORTH-RECOGNIZER returns a recognizer (and does not, it executes one) is a valid point. A better name is needed. At the moment it is a VALUE and does return a recognizer. Now, it is a deferred word, and does recognize strings. We should keep it with Anton's unification: a sequence of recognizers can be combined to one recognizer. Just because it's now recognizing more different things, it's still a recognizer. No need to find another synonym. Takes a string, returns data+translator token: that is a recognizer.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.


[r525] 2020-09-11 03:46:35 ruv replies:

proposal - minimalistic core API for recognizers

DEFER is a core word now, so using DEFER for such a thing is ok.

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

Back to my first argument, what do you suggest if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter?

You may ask whether I have an example of such a requirement. Yes, I do. I want to provide a method to undo such switching in my system. It's similar to the effect of the "PREVIOUS" word for the search order. Perhaps you can suggest some solution with the deferred word?
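
The requirement can be illustrated with a minimal sketch in which the setter does extra work on every switch, here by recording the previous recognizer so that the switch can be undone. Only perceive, perceptor and set-perceptor are names used in this thread; previous-perceptor, perceptor-history, perceive-nothing and the history depth of 8 are assumptions made for the example.

```forth
\ Sketch only: PREVIOUS-PERCEPTOR, PERCEPTOR-HISTORY and
\ PERCEIVE-NOTHING are illustrative; the depth of 8 is arbitrary.
8 constant #history
create perceptor-history  #history cells allot
variable history-depth   0 history-depth !

: perceive-nothing ( c-addr u -- c-addr u 0 )  0 ;   \ rejects every lexeme
variable current-perceptor   ' perceive-nothing current-perceptor !

: perceptor ( -- xt )  current-perceptor @ ;
: perceive  ( c-addr u -- k*x tt | c-addr u 0 )  perceptor execute ;

: set-perceptor ( xt -- )
  history-depth @ #history < 0= abort" perceptor history full"
  perceptor  perceptor-history history-depth @ cells + !  \ remember the old one
  1 history-depth +!
  current-perceptor ! ;                                   \ then switch

: previous-perceptor ( -- )                               \ undo the last switch
  history-depth @ 0= abort" no earlier perceptor"
  -1 history-depth +!
  perceptor-history history-depth @ cells + @ current-perceptor ! ;
```

With a plain deferred word, IS would overwrite the slot without giving the system a chance to record anything, which is the point being argued.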

Anton's unification: a sequence of recognizers can be combined to one recognizer.

Yes. I too said that any sequence of recognizers seq-x (from API v4) can be represented as a single recognizer : recognize-x seq-x recognize ;. So sequences are excessive in the basic API: a Forth system doesn't need to know whether it is a sequence or not.

Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.

It's better. But it recognizes not valid FORTH, but anything that the Forth text interpreter can currently recognize (and only that).

Conceptually, this word isn't just a recognizer. There is a single special system's slot for a recognizer that is used by the Forth text interpreter. We can put any recognizer into this slot. We can also perform the recognizer that is placed into this slot. So this word performs the recognizer from this slot. I am inclined to call this slot "perceptor". And after that the word that performs the recognizer from this slot becomes "perceive".

All recognizer names have the pattern RECOGNIZE-*. The idea is to not put this special word on a par with all other recognizers. For that, it's better to find a name that is distinct from the RECOGNIZE-SOMETHING pattern. What do you think?


[r526] 2020-09-11 04:10:31 ruv replies:

proposal - minimalistic core API for recognizers

Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.

This argument is that a Forth system can be implemented as a minimal kernel and additional libraries. And DEFER, IS, ACTION-OF can be available via a library. But when we put a deferred word into this API, we force a system's author to put DEFER, IS, ACTION-OF into the kernel too. But actually they aren't required in the kernel. It would be too restrictive a limitation on the implementations.


[r527] 2020-09-11 12:23:36 ruv replies:

proposal - minimalistic core API for recognizers

Locate

locate cannot work for lexemes that can be recognized (translated) according to this proposal.


[r528] 2020-09-11 17:45:12 ruv replies:

proposal - minimalistic core API for recognizers

The last comment was intended for the proposal of AndrewHaley, and it was mistakenly placed here.


[r529] 2020-09-11 21:55:54 BerndPaysan replies:

proposal - minimalistic core API for recognizers

The recognizer will be an option, as well. At the moment, FORTH-RECOGNIZER is proposed to be a value. That's also a CORE EXT word (as is TO).

A minimalistic system that wants to implement recognizers needs FORTH-RECOGNIZER to be a deferred word. I.e. it needs code for DODEFER. It can load the rest of the deferred word stuff later as extension.


[r530] 2020-09-12 07:46:25 ruv replies:

proposal - minimalistic core API for recognizers

Certainly, recognizers are an option. I didn't mean that some required part requires an optional part. I mean that one optional part requires another complex optional part without any good and fair ground.

Yes, a minimalistic system that wants to provide a deferred word needs only code for DODEFER. But it still makes bootstrapping of this system more complex. Hence, when we put a deferred word into the API, we make things more complex for some implementations. But we don't even have a rationale for that.

Also, with a deferred word we still don't have a solution if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.


[r531] 2020-09-12 13:12:35 ruv replies:

proposal - Nestable Recognizer Sequences

Binary constructor

: two-recognizers ( xt1 xt2 "name" -- )
  create , ,
does>
  dup >r @ execute dup rectype-null <> if
    r> drop exit then
  r> cell+ @ execute ;

This constructor expects that a recognizer doesn't consume ( c-addr u ) on rejection.

Otherwise (if a recognizer consumes ( c-addr u) in any case) the definition will be a bit more complex:

: two-recognizers ( xt1 xt2 "name" -- )
    create , ,
  does> ( c-addr u  a-addr-body )
    dup >r -rot 2dup 2>r rot
    @ execute dup rectype-null <> if
      rdrop rdrop rdrop exit
    then drop
    2r> r> cell+ @ execute
;

Nevertheless, I'm inclined to agree that if a recognizer consumes ( c-addr u ) in any case, the overall code seemingly becomes shorter.

Whether to pass the first recognizer on top or bottom is also unclear

It is clearer if they are passed left to right, i.e., we place them on the stack in the same order in which they should be executed: the first placed is executed first, the second placed is executed second (if any), the last placed (that is topmost) is executed last.

This situation is similar to the order of local variables (in declaration): direct mapping is more clear.


[r532] 2020-09-12 22:51:48 ruv replies:

proposal - Traverse-wordlist does not find unnamed/unfinished definitions

I would suggest avoiding the "named word" pleonasm in "for every named word that can be found", since an unnamed definition cannot be found. I.e., if a definition can be found, then it certainly has a name.

A possible variant of this part:

"Execute xt once for every word that can be found,"

A possible variant that unites both corrections into a single one:

"Execute xt once for every word that can be found in the word list wid, and for every word whose name matches the name of a found word but placed earlier in this word list,"

The phrase "same name" is inappropriate since it doesn't take into account possible case insensitivity. However, names matching is described in 3.4.2 Finding definition names.

Also, the following typo can be corrected:

"words with the same name are called in the order newest-to-oldest (possibly with other words in between)"

->

"words with the matched names are visited in the order newest-to-oldest (possibly with other words in between)"


[r533] 2020-09-13 06:49:49 AntonErtl replies:

proposal - Traverse-wordlist does not find unnamed/unfinished definitions

The proposal was voted on and accepted 10Y/0N/1A. The vote was closed on 2020-09-03. If you think that the voted-on version is unclear enough to be improved, you need to make a new proposal.

I think it is clear enough, though. "Named word" may be a pleonasm, but it is clear. The way that "same name" is used in the voted-on version makes it clear that all matching names are considered to be the same.

Concerning "are called": Yes, "are visited" is intended, so one could make another proposal for fixing that. But nobody seems to have been confused by "are called" yet.


[r534] 2020-09-13 16:19:15 AntonErtl replies:

proposal - Traverse-wordlist does not find unnamed/unfinished definitions

If someone proposes another revision, one could write:

When a word becomes findable, it also becomes traversable. The word then stays traversable until it is deleted.

and then define the rest in terms of "traversable", in particular:

Execute xt once for every traversable word in the wordlist wid,


[r535] 2020-09-14 09:12:42 AndrewHaley replies:

proposal - An alternative to the RECOGNIZER proposal

Firstly, there is a need for user-defined literals and some other kinds of prefix notation. Anyone who needs anything more exotic (or powerful) and wants it to be standardized had better provide evidence that it's needed for Forth programs. A good design will have everything you need and nothing more.

Secondly, 'a::b would just work. Any system supporting a::b as wordlist::word would have to redefine FIND to break the tokens apart: a recognizer for '-prefixed words would call FIND, which would find the word.


[r536] 2020-09-14 09:20:04 AndrewHaley replies:

proposal - An alternative to the RECOGNIZER proposal

Re Jenny's point. It's necessary to define some mechanism by which "performing the interpretation semantics" of some rec-type might be performed. It seems to me more appropriate to specify exactly how that gets done here: it gets done by the called recognizer word. The "semantics" are whatever the recognizer does.


[r537] 2020-09-14 13:21:26 ruv replies:

proposal - An alternative to the RECOGNIZER proposal

Are you objecting to the use of the common word "recognize"?

Though the common word "recognize" is used in an unusual meaning. Your "recognizer" does not just recognize a lexeme, but also performs interpretation or compilation semantics for the lexeme. It's confusing that performing semantics is a part of recognizing in your interpretation.

Firstly, there is a need for user-defined literals and some other kinds of prefix notation

What is a literal?

At first glance, 'X is a literal, a::b is a literal, 'a::b is a literal too: the run-time semantics for all of them is just to put a number (an xt) onto the stack.

Any system supporting a::b as wordlist::word would have to redefine FIND

Do you mean that it should be done in a non standard way (i.e., not over the API you are proposing)?

An issue with your API is that we cannot define the 'X format in the general form: '<any-literal-that-is-mapped-to-single-xt>. Likewise, we cannot define the wordlist::word format in the general form <any-literal-that-is-mapped-to-single-xt>::name.

Re Jenny's point.

Jenny is right substantially (since "rectype" is not used in this proposal). The idea is that the found "[RECOGNIZE]" word should perform interpretation semantics for the lexeme if interpreting, and compilation semantics if compiling.

"If found, perform the interpretation sematics of the found recognizer"

The Forth text interpreter only performs interpretation semantics if interpreting, and compilation semantics if compiling. So this phrase in the specification makes things too confusing. Better to say: "perform the execution semantics".


[r538] 2020-09-14 13:25:16 ruv replies:

proposal - An alternative to the RECOGNIZER proposal

Correction: At first glance, 'X is a literal, 'a::b is a literal too: the run-time semantics for all of them is just to put a number (an xt) onto the stack.

Re a::b: its run-time semantics may be different.


[r539] 2020-09-14 21:33:55 BerndPaysan replies:

proposal - minimalistic core API for recognizers

CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e.

forth-recognizer @ execute execute

Clumsy interfaces can not be changed if you have better things at hand. You can probably wrap around the clumsy interface, e.g.

Defer recognize-forth
addr recognize-forth Constant forth-recognizer

if you can use ADDR to access the deferred word's xt storage location. But then you have another interface, less clumsy, and only available when you have DEFER+ADDR (and ADDR is not even part of the standard).

A minimalistic API, which is what I am looking for here, is one where you don't have to document much. The less uniform an API is, the more you have to document. The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect. And combinations of recognizers have the same effect. And the system's recognizer is just another one, which you can swap in and out. And you can define a REC-SEQUENCE, where you can manipulate the sequence, and put that into the system's recognizer.

This uniformity is broken when you don't use a deferred word for the system's recognizer — you can't just call that one as you can call the others. You need @ EXECUTE. This is clumsy.
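
The uniformity argument can be made concrete with a small sketch: a recognizer, a combination of recognizers, and the system's deferred recognizer all share the same stack effect and are called the same way. The names rec-tick-char, rec-none, old-recognizer and extended-recognizer are illustrative only, and translate-num here handles just the interpret and compile states.

```forth
\ Sketch only: REC-TICK-CHAR, REC-NONE, OLD-RECOGNIZER and
\ EXTENDED-RECOGNIZER are illustrative names.
: notfound ( -- )  -13 throw ;
: lit, ( n -- )  postpone literal ;
: translate-num ( n -- n | )  state @ if lit, then ;  \ POSTPONE state not handled

\ Recognizes lexemes of the form 'c' and returns the character code.
: rec-tick-char ( c-addr u -- c translate-num | notfound )
  3 = if
    dup c@ [char] ' =  over 2 + c@ [char] ' =  and if
      1+ c@ ['] translate-num exit
    then
  then
  drop ['] notfound ;

: rec-none ( c-addr u -- notfound )  2drop ['] notfound ;

defer forth-recognizer
' rec-none is forth-recognizer

\ Put a new recognizer in front of whatever is installed right now.
action-of forth-recognizer constant old-recognizer
: extended-recognizer ( c-addr u -- i*x translator )
  2>r 2r@ rec-tick-char dup ['] notfound = if
    drop 2r@ old-recognizer execute
  then 2r> 2drop ;
' extended-recognizer is forth-recognizer

s" 'A'" forth-recognizer execute .   \ recognize, then translate; prints 65
```

Nothing in extended-recognizer needs to know whether old-recognizer is a single recognizer or itself a combination.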


[r540] 2020-09-15 10:04:01 ruv replies:

proposal - minimalistic core API for recognizers

CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e. forth-recognizer @ execute execute

I don't suggest using a variable in the interface; it's even worse than a defer. When a variable is used to change something, the change cannot be effectively detected. But the requirement is: an ability for a system to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.

For that I would prefer to have the separate words in the API: a setter, a getter and a "performer" (a word that performs the recognizer that is currently used by the Forth text interpreter).

What are your objections to having several separate words in the minimalistic API?

The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect.

I strongly support this approach (and I myself suggested this approach too, with slightly different stack effects).

This uniformity is broken when you don't use a deferred word for the system's recognizer

It seems, the set of words like the following (the names may vary):

perceive ( c-addr u -- k*x tt )
set-perceptor ( xt -- )
perceptor ( -- xt )

doesn't break the mentioned uniformity. Please, clarify.


[r541] 2020-09-16 14:26:03 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Using special setters and getters means you have another (special purpose) DEFER mechanism here. Of course you can implement that with

variable current-perceptor
: perceive ( addr u -- i*x token ) current-perceptor @ execute ;
: set-perceptor ( xt -- ) current-perceptor ! ;
: perceptor ( -- xt ) current-perceptor @ ;

which is probably a bit less implementation effort than DEFER, IS, and ACTION-OF. Or really?

State-Smart:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body state @ if  ]] literal ! [[  else  !  then ; immediate
: action-of  ' >body state @ if  ]] literal @ [[  else  @  then ; immediate

or with NDCS:

: defer  Create ['] noop ,  does> @ execute ;
: is  ' >body ! ; ndcs: ' >body ]] literal ! [[ ;
: action-of  ' >body @ ;  ndcs: ' >body ]] literal @ [[ ;

DEFER is really a lightweight way to define words that can be changed.

These three lines of code are doing more than the three lines of code you need in addition when you have your special-purpose setter and getter, but they are still one-liners.

Forthers like to reinvent the wheel. But don't overdo this.


[r542] 2020-09-16 17:17:50 JennyBrien replies:

proposal - Wording: declare undefined interpretation semantics for locals

POSTPONEing a local doesn't/shouldn't work either.


[r543] 2020-09-16 20:25:35 JennyBrien replies:

proposal - Nestable Recognizer Sequences

The similarity between wordlists and a search order has inspired the idea of nestable search orders: Several wordlists could be combined into a sequence that itself would work like a wordlist in other search orders. However, the search order words had already been standardized, so this idea never made it out of the concept stage.

The similarity between the search order and recognizer sequences has led to the present recognizer proposal containing the words GET-RECOGNIZER and SET-RECOGNIZER, which are mostly modeled on GET-ORDER and SET-ORDER.

At first glance, it's simple to convert a wordlist into a recognizer, so recognizer sequences would also give nestable search orders. If WORDLIST returned the xt of an anonymous recognizer... but there would still be problems deciding how to SET-CURRENT. There would still have to be a difference between recognizers that search the dictionary (called by REC-NAME or similar) and other recognizers, otherwise there can be no concept of a 'current search order'.

So, do we need a FORTH-RECOGNIZER that combines the two? Is it sufficient to replace the 'word-not-found' portion of the interpreter? So far, I have only seen one use-case for a user-written recognizer to precede REC-NAME and I suspect such users would be better served by having their own interpreter loop rather than patching into the system one. Maybe all that is needed is the ability to add a recognizer to the current stack and leave it there until it is removed by MARKER or the stack is reset by QUIT, in which case:

   : +RECOGNIZER (  _name_ -- )  ' action-of recognized two-recognizers ;