Digest #121 2020-09-17
Contributions
Many years ago, when I was modifying F-83 to be ANS-compliant, I had to temporarily patch several parts of the core. I used:
CHANGED \ n addr -- ; add addr and old value to a linked list and then store n at addr
I then modified FORGET to run down the list and restore any values changed after that point.
Is there any common practice on how to add similar further "landmark information" that MARKER should restore?
Replies
Author:
Bernd Paysan
Change Log:
- 2020-09-06 initial version
- 2020-09-08 taking ruv's approach and vocabulary at translators
Problem:
The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows fancier features to be implemented as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal. It is therefore not a full proposal, but sketches some changes to the original proposal.
Solution:
Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.
Important changes to the original proposal:
- Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
- Make the recognizer sequence executable with the same effect as a recognizer
- Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences
This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.
The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like
: rec-nt ( addr u -- translator )
here place here find dup IF
0< state @ and IF compile, ELSE execute THEN ['] drop
ELSE drop ['] rectype-null THEN ;
then you should factor the part starting with state @ out and return it as translator:
: word-translator ( xt flag -- )
0< state @ and IF compile, ELSE execute THEN ;
: rec-word ( addr u -- rectype )
here place here find dup IF ['] word-translator
ELSE drop ['] notfound THEN ;
Typical use
TBD
Proposal:
XY. The optional Recognizer Wordset
A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):
REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )
XY.3 Additional usage requirements
XY.3.1 Translator
translator: subtype of xt, and executes with the following stack effect:
SOME-TRANSLATOR ( i*x -- j*x )
A translator depends on STATE to translate the given arguments:
- 0 for interpretation
- -1 for compilation
- -2 for POSTPONE
i*x is the additional information provided by the recognizer.
XY.6 Glossary
XY.6.1 Recognizer Words
FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER
This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or RECTYPE-NULL if not.
NOTFOUND ( -- ) RECOGNIZER
Performs -13 THROW if the exception wordset is available.
Reference implementation:
This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:
Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
BEGIN
?stack parse-name dup WHILE
forth-recognizer execute
REPEAT ;
: lit, ( n -- ) postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
case state @
0 of name>interpret execute endof
-1 of name>compile execute endof
-2 of name>compile swap lit, compile, endof
nip \ do nothing if state is unknown; possible error handling goes here
endcase ;
: num-translator ( n -- )
case state @
-1 of lit, endof
-2 of lit, postpone lit, endof
endcase ;
: rec-nt ( addr u -- nt nt-translator / notfound )
forth-wordlist find-name-in dup IF ['] nt-translator ELSE drop ['] notfound THEN ;
: rec-num ( addr u -- n num-translator / notfound )
0. 2swap >number 0= IF 2drop ['] num-translator ELSE 2drop drop ['] notfound THEN ;
: minimal-recognizer ( addr u -- nt rectype-nt / n rectype-num / rectype-null )
2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2rdrop ;
' minimal-recognizer is forth-recognizer
The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:
: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , ,
does> state @ 2 + cells + @ execute ;
Testing
Author:
Bernd Paysan
Change Log:
- 2020-09-06 initial version
- 2020-09-08 taking ruv's approach and vocabulary at translators
- 2020-09-08 replace the remaining rectypes with translators
Problem:
The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows fancier features to be implemented as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal. It is therefore not a full proposal, but sketches some changes to the original proposal.
Solution:
Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.
Important changes to the original proposal:
- Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
- Make the recognizer sequence executable with the same effect as a recognizer
- Make the system's forth-recognizer a deferred word to allow plugging in new recognizer sequences
This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.
The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like
: rec-nt ( addr u -- translator )
here place here find dup IF
0< state @ and IF compile, ELSE execute THEN ['] drop
ELSE drop ['] notfound THEN ;
then you should factor the part starting with state @ out and return it as translator:
: word-translator ( xt flag -- )
0< state @ and IF compile, ELSE execute THEN ;
: rec-word ( addr u -- ... translator )
here place here find dup IF ['] word-translator
ELSE drop ['] notfound THEN ;
Typical use
TBD
Proposal:
XY. The optional Recognizer Wordset
A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):
REC-SOMETYPE ( addr len -- i*x translator | NOTFOUND )
XY.3 Additional usage requirements
XY.3.1 Translator
translator: subtype of xt, and executes with the following stack effect:
SOME-TRANSLATOR ( i*x -- j*x )
A translator depends on STATE to translate the given arguments:
- 0 for interpretation
- -1 for compilation
- -2 for POSTPONE
i*x is the additional information provided by the recognizer.
XY.6 Glossary
XY.6.1 Recognizer Words
FORTH-RECOGNIZER ( addr len -- i*x translator | NOTFOUND-xt ) RECOGNIZER
This is a deferred word. It takes a string and tries to recognize it, returning the recognized recognizer type and additional information if successful, or NOTFOUND if not.
NOTFOUND ( -- ) RECOGNIZER
Performs -13 THROW if the exception wordset is available.
Reference implementation:
This is a minimalistic core implementation for a recognizer-enabled system, that handles only words and single numbers without base prefix:
Defer forth-recognizer ( addr u -- i*x translator / notfound )
: interpret ( i*x -- j*x )
BEGIN
?stack parse-name dup WHILE
forth-recognizer execute
REPEAT ;
: lit, ( n -- ) postpone literal ;
: notfound ( state -- ) -13 throw ;
: nt-translator ( nt -- )
case state @
0 of name>interpret execute endof
-1 of name>compile execute endof
-2 of name>compile swap lit, compile, endof
nip \ do nothing if state is unknown; possible error handling goes here
endcase ;
: num-translator ( n -- )
case state @
-1 of lit, endof
-2 of lit, postpone lit, endof
endcase ;
: rec-nt ( addr u -- nt nt-translator / notfound )
forth-wordlist find-name-in dup IF ['] nt-translator ELSE drop ['] notfound THEN ;
: rec-num ( addr u -- n num-translator / notfound )
0. 2swap >number 0= IF 2drop ['] num-translator ELSE 2drop drop ['] notfound THEN ;
: minimal-recognizer ( addr u -- nt nt-translator / n num-translator / notfound )
2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2rdrop ;
' minimal-recognizer is forth-recognizer
The different actions during interpret/compile/postpone can be factored out easily, and used by a common dispatcher:
: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , ,
does> state @ 2 + cells + @ execute ;
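As a rough illustration of how this dispatcher might be used, the number translator from the reference implementation could be rebuilt from three separate actions. This is only a sketch: the name translate-num is made up here, noop is defined locally in case the system lacks it, and it assumes STATE can hold -2 while postponing, which is this proposal's convention and not standard behaviour.

```forth
: noop ( -- ) ;                          \ local no-op; many systems already provide one
: lit, ( n -- ) postpone literal ;

: translator: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,                           \ body: postpone, compile, interpret actions
  does> state @ 2 + cells + @ execute ;  \ -2 selects cell 0, -1 cell 1, 0 cell 2

' noop                                   \ interpreting: leave n on the stack
' lit,                                   \ compiling: compile n as a literal
:noname lit, postpone lit, ;             \ postponing: compile code that compiles n
translator: translate-num                \ hypothetical name

5 translate-num .                        \ interpreting: n is simply left on the stack
```

The order of the three xts on the stack mirrors the order of the STATE values 0, -1, -2, so the table lookup needs no extra mapping.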
Testing
Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult. Instead of
: postpone ( "name" -- ) parse-name forth-recognizer -2 swap execute ; immediate
it becomes more convoluted:
: postpone ( "name" -- )
parse-name forth-recognizer
state @ >r -2 state ! catch r> state ! throw ; immediate
How to detect [[ at the end of a postpone sequence is also not so trivial.
Downside of using STATE right in the dispatcher: POSTPONE becomes more difficult.
It's OK. Actually, we distribute complexity among various parts. When we make one thing less complex, we make another thing more complex. But because these things occur with different frequency (in systems, libraries, programs), the total complexity can end up lower or higher.
This approach also makes some things more complex, but the total complexity decreases, I believe.
Concerning POSTPONE: I think some useful parts should be factored out.
Also, we don't need to catch the exception: usually it's a stop error, and the state is ambiguous in any case. QUIT resets all the internal states. Concerning programs: we need a standard way to reset the internal states of the Forth text interpreter, regardless of the Recognizers proposal.
In my "lexeme resolvers" implementation I use the concept of a postponing level that can be 0, 1, or 2, and introduce words to increment and decrement this level.
So, POSTPONE is defined as follows:
: postpone ( " name" -- ) parse-name inc-state translate-lexeme dec-state ( flag ) ?nf ; immediate
Where translate-lexeme is defined as follows:
: perceive-lexeme ( c-addr u -- k*x xt-tt | c-addr u 0 )
perceptor dup if execute then
;
: translate-lexeme ( i*x c-addr u -- j*x true | c-addr u 0 )
perceive-lexeme dup if execute true then
;
(Note that in contrast to this proposal, resolvers return ( c-addr u 0 ) on failure.)
How to detect [[ at the end of a postpone sequence is also not so trivial.
An appropriate approach is that the word ]] is a parsing word.
: ]] ( -- )
inc-state begin
next-lexeme 2dup s" [[" equals 0= while
translate-lexeme ?nf
repeat 2drop dec-state
; immediate
So we don't have any problem detecting [[ at the end.
An advantage of the postponing level concept is that the following code works as expected:
: foo [ ]] 123 . [[ ] ; foo \ prints 123
In the message news:rdcur5$ga4$1@dont-email.me (the full message: news:rdcn35$sd2$1@dont-email.me) I showed another approach, when postponing action is not required at all (i.e., -2 state in this proposal).
translator: subtype of xt, and executes with the following stack effect:
SOME-TRANSLATOR ( i*x -- j*x )
It's correct in the general case, but it makes little sense, since any definition meets this stack effect.
So I think we should distinguish the parameters of a translator itself from the effect of translating the code that is passed to the translator. Possible variants:
\ We can define 'token' data type
TRANSLATE-SOMETOKEN ( i*x token -- j*x )
\ Some hybrid variant
TRANSLATE-SOMETOKEN ( i*x token{k*x} -- j*x )
\ Only low level data types
TRANSLATE-SOMETOKEN ( i*x k*x -- j*x )
(NB: I use the conventional {verb}-{noun} naming for such words).
It should also be noted that these x may be distributed across all the stacks: the data stack, the floating-point stack, the control-flow stack (except the token k*x, which cannot be on the control-flow stack).
Indeed, TRANSLATE-SOMETHING sounds better than SOMETHING-TRANSLATOR.
FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.
"FORTH-RECOGNIZER" name
I thought about the FORTH-RECOGNIZER name.
It gives a strong impression that this word is similar to FORTH-WORDLIST ( -- wid ). The problem is that it isn't.
FORTH-WORDLIST is a constant (it always returns the same value) that identifies one and the same word list among all the word lists. This word list can be included in the search order, and it can be absent from the search order.
By analogy, FORTH-RECOGNIZER should be a constant that identifies one and the same recognizer among all the recognizers. This recognizer can be included in the recognizer that is used by the Forth text interpreter, and it can be absent from it. (This is in accordance with the concept that a sequence of recognizers is also a recognizer.)
All of this should hold for the naming to be consistent. But actually it does not. It means that this name breaks consistency and is inappropriate for the proposed word.
FORTH-RECOGNIZER ( -- xt ) can be a word that returns the xt of the system's recognizer that is used by the Forth text interpreter by default (i.e. initially).
FORTH-RECOGNIZER is ok, because it's followed by EXECUTE, so this is a noun.
Also, it gives a strong impression that it returns a recognizer. But that's wrong. Also, its result is analyzed much more often than it is followed by EXECUTE.
Basic methods
By no means, we need
- a method that tells the Forth text interpreter to use a given recognizer,
- a method that returns the recognizer that is currently used by the Forth text interpreter,
- a method that performs the recognizer that is currently used by the Forth text interpreter.
A single deferred word (a vector) X can solve it:
- set: IS X
- get: ACTION-OF X
- perform: X
But I insist that this approach limits implementations too much. A Forth system may want to perform internal actions when the recognizer used by the Forth text interpreter is switched. It cannot do so if this recognizer is switched via the IS X method. For that reason, distinct getter and setter words are usually provided in the Standard (except for the very old BASE and >IN, kept for backward compatibility).
Yes, perhaps Gforth can attach additional internal actions to the IS X phrase. But we shouldn't complicate all Forth system implementations.
A possible implementation via deferred word and distinct getter and setter words:
defer perceive ( c-addr u -- k*x tt )
: perceptor ( -- xt ) action-of perceive ;
: set-perceptor ( xt -- ) is perceive ;
Perhaps, the more specific names are better (?):
defer perceive-lexeme ( c-addr u -- k*x tt )
: lexeme-perceptor ( -- xt ) action-of perceive-lexeme ;
: set-lexeme-perceptor ( xt -- ) is perceive-lexeme ;
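One way a system could hook an internal action onto switching, here a single-level undo, is to keep the slot private behind the setter. This is a sketch only: the two variables and undo-perceptor are hypothetical names, not part of any proposal, and a real system would probably keep a stack of earlier recognizers instead of one saved value.

```forth
variable current-perceptor     \ hypothetical private slot for the active recognizer
variable previous-perceptor    \ hypothetical: recognizer in use before the last switch

: perceptor ( -- xt ) current-perceptor @ ;

: set-perceptor ( xt -- )
  perceptor previous-perceptor !   \ internal action: remember the old recognizer
  current-perceptor ! ;

: undo-perceptor ( -- )            \ restore what the last set-perceptor replaced
  previous-perceptor @ current-perceptor ! ;

: perceive ( c-addr u -- k*x tt ) perceptor execute ;
```

Because every switch goes through set-perceptor, the bookkeeping cannot be bypassed the way it could be with a bare IS on a deferred word.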
Correction: please read "By anyway, we need" instead of "By no means, we need".
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
@ruv, that's a good point. Originally, I thought it might make writing the implementation more consistent between 32 and 64 bit Forths. However, from the user point of view it is easier to deal with one double integer rather than two singles. I will rewrite the proposal to specify that the bits of udfraction specify the binary fraction for the floating point datum. Of course the MAKE-IEEE-DFLOAT word will still check for illegal values. The order of the inputs will also be changed.
MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )
uexp should be n-exp (i.e. a signed number).
Is there any benefit in having signbit ud-mantissa instead of d-mantissa? (i.e. taking the sign from the mantissa).
What is the radix for the exponent? 2 or 10? (it should be mentioned).
Yes, it's better if error is a throw code.
What is the value of r in the case of error? What is better: 0 or NaN?
Does it make any sense to use this function in a recognizer for floating point numbers (if the radix of the exponent is 2)?
HEX 0 54442D18 921FB 1 MAKE-IEEE-DFLOAT fconstant pi
How can we get 3.14 from these numbers?
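One possible reading of this example, assuming the exponent is unbiased with radix 2 and the fraction is the 52-bit IEEE binary64 fraction with an implicit leading 1 (the text quoted here does not confirm either point): take the double number 921FB54442D18 (hex), divide it by 2^52, add 1, and multiply by 2^1. The check below uses only standard floating-point words and is not part of the proposal.

```forth
\ Assumed interpretation: value = (1 + fraction / 2^52) * 2^exp, with exp = 1
hex 921FB54442D18. decimal    \ the 52-bit fraction as a double-cell integer
d>f 4503599627370496e f/      \ fraction / 2^52
1e f+                         \ add the implicit leading 1
2e f*                         \ scale by 2^1
f.                            \ displays a value close to 3.14159
```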
DEFER is a core word now, so using DEFER for such a thing is ok. We don't need a special getter and setter for everything.
The implication that FORTH-RECOGNIZER returns a recognizer (it does not, it executes one) is a valid point. A better name is needed. At the moment it is a VALUE and does return a recognizer. Now, it is a deferred word, and does recognize strings. We should keep it with Anton's unification: a sequence of recognizers can be combined into one recognizer. Just because it's now recognizing more different things, it's still a recognizer. No need to find another synonym. Anything that takes a string and returns data plus a translator token is a recognizer.
Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.
DEFER is a core word now, so using DEFER for such a thing is ok.
Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.
Back to my first argument, what do you suggest if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter?
You may ask whether I have an example of such a requirement. Yes, I do. I want to provide a method to undo such switching in my system. It's similar to the effect of the "PREVIOUS" word for the search order. Perhaps you can suggest some solution with the deferred word?
Anton's unification: a sequence of recognizers can be combined to one recognizer.
Yes. I too said that any sequence of recognizers seq-x (from API v4) can be represented as a single recognizer : recognize-x seq-x recognize ; . So, sequences are excessive in the basic API: a Forth system doesn't need to know whether it is a sequence or not.
Maybe RECOGNIZE-FORTH is the corresponding verb. It takes a string and recognizes it if this is valid FORTH.
It's better. But it recognizes not valid FORTH, but anything that the Forth text interpreter can currently recognize (and only that).
Conceptually, this word isn't just a recognizer. There is a single special system slot for a recognizer that is used by the Forth text interpreter. We can put any recognizer into this slot. We can also perform the recognizer that is placed into this slot. So this word performs the recognizer from this slot. I am inclined to call this slot "perceptor". And after that the word that performs the recognizer from this slot becomes "perceive".
All recognizer names have the pattern RECOGNIZE-*. The idea is to not put this special word on a par with all other recognizers. For that, it's better to find a name that is distinct from the RECOGNIZE-SOMETHING pattern. What do you think?
Actually, DEFER, as well as TO, is a Core extension word, so it's optional. But it's another argument.
This argument is that a Forth system can be implemented as a minimal kernel and additional libraries. And DEFER, IS, ACTION-OF can be available via a library. But when we put a deferred word into this API, we force a system's author to put DEFER, IS, ACTION-OF into the kernel too. But actually they aren't required in the kernel. It would be too restrictive a limitation on implementations.
Locate
locate cannot work for lexemes that can be recognized (translated) according to this proposal.
The last comment was intended for the proposal of AndrewHaley, and it was mistakenly placed here.
The recognizer will be an option, as well. At the moment, FORTH-RECOGNIZER is proposed to be a value. That's also a CORE EXT word (as is TO).
A minimalistic system that wants to implement recognizers needs FORTH-RECOGNIZER to be a deferred word. I.e. it needs code for DODEFER. It can load the rest of the deferred word stuff later as an extension.
Certainly, recognizers are an option. I didn't mean that some required part requires an optional part. I mean that one optional part requires another complex optional part without any good and fair ground.
Yes, a minimalistic system that wants to provide a deferred word needs only code for DODEFER. But it still makes bootstrapping of this system more complex. Hence, when we put a deferred word into the API, we make things more complex for some implementations. But we don't even have a rationale for that.
Also, with deferred word we still don't have a solution if a system needs to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.
Binary constructor
: two-recognizers ( xt1 xt2 "name" -- )
create , ,
does>
dup >r @ execute dup rectype-null <> if
r> drop exit then
r> cell+ @ execute ;
This constructor expects that a recognizer doesn't consume ( c-addr u ) on rejection.
Otherwise (if a recognizer consumes ( c-addr u ) in any case) the definition will be a bit more complex:
: two-recognizers ( xt1 xt2 "name" -- )
create , ,
does> ( c-addr u a-addr-body )
dup >r -rot 2dup 2>r rot
@ execute dup rectype-null <> if
rdrop rdrop rdrop exit
then drop
2r> r> cell+ @ execute
;
Nevertheless, I'm inclined to agree that if a recognizer consumes ( c-addr u ) in any case, it seems to reduce the total size of the overall code.
Whether to pass the first recognizer on top or bottom is also unclear
It is clearer if they are passed left to right, i.e., we place them onto the stack in the same order in which they should be executed: the first placed is executed first, the second placed is executed second (if any), the last placed (that is topmost) is executed last.
This situation is similar to the order of local variables (in declaration): direct mapping is more clear.
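A left-to-right variant along these lines might look as follows. It is a sketch under this proposal's conventions (a recognizer always consumes the string and returns the xt of notfound on failure); rec-never, rec-length, keep-length and rec-demo are throwaway names used only to exercise it.

```forth
: notfound ( -- ) -13 throw ;

: two-recognizers ( xt1 xt2 "name" -- )
  create , ,                      \ body: xt2 in the first cell, xt1 in the second
  does> ( c-addr u body -- i*x translator )
    2@ >r                         \ ( c-addr u xt1 )  R: xt2
    -rot 2dup 2>r                 \ ( xt1 c-addr u )  R: xt2 c-addr u
    rot execute                   \ run xt1, the recognizer that was pushed first
    dup ['] notfound <> if
      2r> 2drop r> drop exit      \ success: discard the saved string and xt2
    then
    drop 2r> r> execute ;         \ failure: retry the same string with xt2

\ Throwaway recognizers for demonstration only
: keep-length ( u -- u ) ;
: rec-length ( c-addr u -- u xt ) nip ['] keep-length ;
: rec-never  ( c-addr u -- xt )   2drop ['] notfound ;

' rec-never ' rec-length two-recognizers rec-demo   \ rec-never is tried first
s" abc" rec-demo execute .
```

Fetching both xts with 2@ before touching the string keeps the body address off the return stack, so only the second recognizer and the saved string need to survive the first call.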
proposal - Traverse-wordlist does not find unnamed/unfinished definitions
I would suggest to avoid "named word" pleonasm in "for every named word that can be found", since an unnamed definition cannot be found. I.e., if a definition can be found, then it certainly has a name.
A possible variant of this part:
"Execute xt once for every word that can be found,"
A possible variant that unites both corrections into a single one:
"Execute xt once for every word that can be found in the word list wid, and for every word whose name matches the name of a found word but placed earlier in this word list,"
The phrase "same name" is inappropriate since it doesn't take into account possible case insensitivity. However, names matching is described in 3.4.2 Finding definition names.
Also, the following typo can be corrected:
"words with the same name are called in the order newest-to-oldest (possibly with other words in between)"
→
"words with the matched names are visited in the order newest-to-oldest (possibly with other words in between)"
proposal - Traverse-wordlist does not find unnamed/unfinished definitions
The proposal was voted on and accepted 10Y/0N/1A. The vote was closed on 2020-09-03. If you think that the voted-on version is unclear enough to be improved, you need to make a new proposal.
I think it is clear enough, though. "Named word" may be a pleonasm, but it is clear. The way that "same name" is used in the voted-on version makes it clear that all matching names are considered to be the same.
Concerning "are called": Yes, "are visited" is intended, so one could make another proposal for fixing that. But nobody seems to have been confused by "are called" yet.
proposal - Traverse-wordlist does not find unnamed/unfinished definitions
If someone proposes another revision, one could write:
When a word becomes findable, it also becomes traversable. The word then stays traversable until it is deleted.
and then define the rest in terms of "traversable", in particular:
Execute xt once for every traversable word in the wordlist wid,
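For readers less familiar with the word under discussion, here is a minimal usage sketch of TRAVERSE-WORDLIST as specified in Forth 2012: the xt receives an nt and returns a flag that says whether to continue. count-word and words-in are hypothetical helper names.

```forth
: count-word ( n nt -- n+1 true ) drop 1+ true ;   \ true means: keep traversing
: words-in ( wid -- n ) 0 ['] count-word rot traverse-wordlist ;

forth-wordlist words-in .    \ number of words visited in FORTH-WORDLIST
```

Whatever wording is chosen, it determines which definitions a helper like count-word gets to see, for example whether a definition still being compiled is visited.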
Firstly, there is a need for user-defined literals and some other kinds of prefix notation. Anyone who needs anything more exotic (or powerful) and wants it to be standardized had better provide evidence that it's needed for Forth programs. A good design will have everything you need and nothing more.
Secondly, 'a::b would just work. Any system supporting a::b as wordlist::word would have to redefine FIND to break the tokens apart: a recognizer for '-prefixed words would call FIND, which would find the word.
Re Jenny's point. It's necessary to define some mechanism by which "performing the interpretation semantics" of some rec-type might be performed. It seems to me more appropriate to specify exactly how that gets done here: it gets done by the called recognizer word. The "semantics" are whatever the recognizer does.
Are you objecting to the use of the common word "recognize"?
Though the common word "recognize" is used with an unusual meaning. Your "recognizer" does not just recognize a lexeme, but also performs interpretation or compilation semantics for the lexeme. It's confusing that, in your interpretation, performing semantics is a part of recognizing.
Firstly, there is a need for user-defined literals and some other kinds of prefix notation
What is a literal?
At first glance, 'X is a literal, a::b is a literal, 'a::b is a literal too: the run-time semantics for all of them is just to put a number (an xt) onto the stack.
Any system supporting a::b as wordlist::word would have to redefine FIND
Do you mean that it should be done in a non standard way (i.e., not over the API you are proposing)?
An issue of your API is that we cannot define the 'X format in the general form '<any-literal-that-is-mapped-to-single-xt>. Likewise, we cannot define the wordlist::word format in the general form <any-literal-that-is-mapped-to-single-xt>::name.
Re Jenny's point.
Jenny is right substantially (since "rectype" is not used in this proposal). The idea is that the found "[RECOGNIZE]" word should perform interpretation semantics for the lexeme if interpreting, and compilation semantics if compiling.
"If found, perform the interpretation sematics of the found recognizer"
The Forth text interpreter only performs interpretation semantics if interpreting, and compilation semantics if compiling. So this phrase in the specification makes things too confusing. Better to say: "perform the execution semantics".
Correction:
At first glance, 'X is a literal, 'a::b is a literal too: the run-time semantics for all of them is just to put a number (an xt) onto the stack.
Re a::b: its run-time semantics may be different.
CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e.
forth-recognizer @ execute execute
Clumsy interfaces can not be changed if you have better things at hand. You can probably wrap around the clumsy interface, e.g.
Defer recognize-forth
addr recognize-forth Constant forth-recognizer
if you can use ADDR to access the deferred word's xt storage location. But then you have another interface, less clumsy, and only available when you have DEFER+ADDR (and ADDR is not even part of the standard).
A minimalistic API, which is what I am looking for here, is one where you don't have to document much. The less uniform an API is, the more you have to document. The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect. And combinations of recognizers have the same effect. And the system's recognizer is just another one, which you can swap in and out. And you can define a REC-SEQUENCE, where you can manipulate the sequence, and put that into the system's recognizer.
This uniformity is broken when you don't use a deferred word for the system's recognizer: you can't just call that one as you can call the others. You need @ EXECUTE. This is clumsy.
CORE has only VARIABLE as option for storing things to change. As a result, the interface to use FORTH-RECOGNIZER has to be clumsy, i.e. forth-recognizer @ execute execute
I don't suggest using a variable in the interface; it's even worse than a defer. When a variable is used to change something, the change cannot be effectively detected. But the requirement is the ability for a system to perform internal actions on switching the recognizer that is currently used by the Forth text interpreter.
For that I would prefer to have the separate words in the API: a setter, a getter and a "performer" (a word that performs the recognizer that is currently used by the Forth text interpreter).
What are your objections to have several separate words in the minimalistic API?
The uniformity here is that a recognizer is a word that has ( addr u -- i*x translator-xt ) as stack effect.
I strongly support this approach (and I myself suggested this approach too, with slightly different stack effects).
This uniformity is broken when you don't use a deferred word for the system's recognizer
It seems, the set of words like the following (the names may vary):
perceive ( c-addr u -- k*x tt )
set-perceptor ( xt -- )
perceptor ( -- xt )
doesn't break the mentioned uniformity. Please clarify.
Using special setters and getters means you have another (special purpose) DEFER mechanism here. Of course you can implement that with
variable current-perceptor
: perceive ( addr u -- i*x token ) current-perceptor @ execute ;
: set-perceptor ( xt -- ) current-perceptor ! ;
: perceptor ( -- xt ) current-perceptor @ ;
which is probably a bit less implementation effort than DEFER, IS, and ACTION-OF. Or is it really?
State-Smart:
: defer Create ['] noop , does> @ execute ;
: is ' >body state @ if ]] literal ! [[ else ! then ; immediate
: action-of ' >body state @ if ]] literal @ [[ else @ then ; immediate
or with NDCS:
: defer Create ['] noop , does> @ execute ;
: is ' >body ! ; ndcs: ' >body ]] literal ! [[ ;
: action-of ' >body @ ; ndcs: ' >body ]] literal @ [[ ;
DEFER is really a lightweight way to define words that can be changed.
These three lines of code are doing more than the three lines of code you need in addition when you have your special-purpose setter and getter, but they are still one-liners.
Forthers like to reinvent the wheel. But don't overdo this.
proposal - Wording: declare undefined interpretation semantics for locals
POSTPONEing a local doesn't/shouldn't work either.
The similarity between wordlists and a search order has inspired the idea of nestable search orders: Several wordlists could be combined into a sequence that itself would work like a wordlist in other search orders. However, the search order words had already been standardized, so this idea never made it out of the concept stage.
The similarity between the search order and recognizer sequences has led to the present recognizer proposal containing the words GET-RECOGNIZER and SET-RECOGNIZER, which are mostly modeled on GET-ORDER and SET-ORDER.
At first glance, it's simple to convert a wordlist into a recognizer, so recognizer sequences would also give nestable search orders. If WORDLIST returned the xt of an anonymous recognizer... but there would still be problems deciding how to SET-CURRENT. There would still have to be a difference between recognizers that search the dictionary (called by REC-NAME or similar) and other recognizers, otherwise there can be no concept of a 'current search order'.
So, do we need a FORTH-RECOGNIZER that combines the two? Is it sufficient to replace the 'word-not-found' portion of the interpreter? So far, I have only seen one use-case for a user-written recognizer to precede REC-NAME and I suspect such users would be better served by having their own interpreter loop rather than patching into the system one. Maybe all that is needed is the ability to add a recognizer to the current stack and leave it there until it is removed by MARKER or the stack is reset by QUIT, in which case:
: +RECOGNIZER ( _name_ -- ) ' action-of recognized two-recognizers ;