Digest #232 2023-09-16
Contributions
Hint: Please delete the blockquote explanations; they are just for your convenience while writing the proposal.
Author:
The name of the author(s) of the proposal.
Change Log:
A list of changes since the last published edition of the proposal.
Problem:
This states what problem the proposal addresses.
Solution:
A short informal description of the proposed solution to the problem identified by the proposal.
This gives the rationale for specific decisions you have taken in the proposal (often in response to comments), or discusses specific issues that have not been decided yet.
Typical use: (Optional)
Shows a typical use of the word or feature proposed; this should make the formal wording easier to understand.
Proposal:
This should enumerate the changes to the document.
For the wording of word definitions, use existing word definitions as a template. Where possible, include the rationale for the definition.
Reference implementation:
This makes it easier for system implementors to adopt the proposal. Where possible, the reference implementation should be provided in standard Forth. Where this is not possible because system specific knowledge is required or non-standard words are used, this should be documented.
Testing: (Optional)
This should test the words or features introduced by the proposal, in particular, it should test boundary conditions. Test cases should work with the test harness in Appendix F.
Replies
As you note, this state-smart approximation to the implementation of S" does not quite implement what the standard requires, so it is better to do something like:
Typical use:
: quine [ source ] sliteral type ;
Note that you can use 2literal instead of sliteral if the string has a permanent lifetime.
Actually, reading through the discussion of the request, Jim Peterson wants to implement N>R as copying to a separate buffer and leaving the address on the return stack. This kind of implementation is suggested by the rationale of N>R.
By comparison, with the first version of this proposal, postpone can be implemented like this:
: postpone parse-name forth-recognize -2 swap execute ; immediate
which would not contain non-standard usage like -2 state !, and it would also work in interpret state (not the most important feature, but a feature nonetheless). And ]] could also be implemented as a standard program.
I don't want to restrict the usage of rectypes/translators to state-dependent outer interpreters. Other uses may be rare, but they exist, and people may come up with more over time if we make the interface flexible enough. The proposal does not propose to standardize state-independent ways to get at the functionality. Therefore, if the proposal is accepted, they don't exist for standard programs, and therefore they are not counterarguments against the disadvantages of the proposed state-dependent-only translators. The fact that this state-dependence means that you cannot use rectypes/translators to build other rectypes/translators is another (minor) argument against the state-dependence.
Concerning having a state-independent rectype as an abstract data type, the first version of this proposal proposed that a rectype is an executable word with the stack effect ( i*x state -- j*x ), where state would be 0, -1, or -2. This does not expose anything about the internals, and even allows defining rectypes without using a special defining word. The invocation in the text interpreter is ( i*x rectype ) state @ swap execute, and in postpone it's as shown above.
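A minimal sketch of such an executable rectype for single-cell numbers, assuming (as in that first version) that 0 selects interpretation, -1 compilation and -2 postponing. The names lit, and rectype-num are hypothetical and only serve this illustration:

```forth
: lit, ( n -- ) postpone literal ;        \ compile n as a literal

\ Hypothetical first-version rectype: ( n state -- ), state is 0, -1 or -2
: rectype-num ( n state -- )
  case
     0 of                         endof   \ interpret: n stays on the stack
    -1 of lit,                    endof   \ compile: append n as a literal
    -2 of lit, ['] lit, compile,  endof   \ postpone: compile code that compiles n later
    -13 throw                             \ any other state value is an error here
  endcase ;
```

A text interpreter would then run ( n rectype ) state @ swap execute, and a postpone implementation would run -2 swap execute, without either of them knowing anything about the rectype's internals.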
Alternatively, if the rectype is the address of some data structure, yes, we would need an additional word, maybe rectype-translate ( i*x rectype n -- j*x ), that performs the access to the data structure. The usage in the text interpreter would be ( i*x rectype ) state @ rectype-translate and the usage in postpone would be ( i*x rectype ) -2 rectype-translate.
Bernd writes:
The most obvious difference is that with translator-execute, you need another word.
Yes, essentially I agree concerning the translator-execute and execute alternatives.
Yet another difference is that with translator-execute the Forth text interpreter (the outer loop) has to know this additional word (which probably means a higher degree of coupling), whereas with execute it does not need to know any additional word.
to make word list ids executable [...] but it is clear that they can't be normal colon definitions
Another example is deferred words (words created by defer), which are executable but are not normal colon definitions: defer! and defer@ can be applied to their xt.
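A small standard-Forth illustration of that point (hello, goodbye and greet are made-up names): the xt of a deferred word can be executed like any other xt, and the same xt also serves as a handle for defer! and defer@ to store and read the xt it forwards to.

```forth
: hello   ( -- n ) 1 ;
: goodbye ( -- n ) 2 ;
defer greet                        \ executable, but not a colon definition

' hello ' greet defer!             \ store an xt through greet's own xt
greet .                            \ executes hello, prints 1
' greet defer@ ' hello = .         \ reads the stored xt back, prints -1
' goodbye ' greet defer!
greet .                            \ now executes goodbye, prints 2
```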
The following implementation should be standard, too
The provided implementation of ]] is system-dependent, namely it depends on the implementation of the Recognizer API.
But anyway, Gforth's ]] can be implemented in a standard way via postpone.
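A much simplified sketch of that idea, under the assumptions that only words (no numbers or other recognized lexemes) appear between ]] and [[ and that both are on the same line. It builds the text "postpone name" in a scratch buffer and EVALUATEs it while STATE is compiling; ]]-buf and ]]-postpone are hypothetical helper names, and this is not Gforth's implementation.

```forth
create ]]-buf 80 chars allot             \ hypothetical scratch buffer

: ]]-postpone ( c-addr u -- )
  \ Build "postpone <name>" in ]]-buf and evaluate it (STATE is compiling).
  dup 70 > abort" ]]: name too long"
  s" postpone " >r ]]-buf r@ cmove        \ copy the prefix, keep its length on R
  tuck ]]-buf r@ chars + swap cmove       \ append the name, keep its length
  r> + ]]-buf swap evaluate ;

: ]] ( "name ... [[" -- )
  begin
    parse-name dup 0= abort" ]]: missing [[ on this line"
    2dup s" [[" compare                   \ stop at [[
  while
    ]]-postpone
  repeat
  2drop ; immediate
```

With this sketch, : twice ]] dup + [[ ; immediate makes : four 2 twice ; compile 2 dup +.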
A translator is the address of a data structure, which also happens to be executable. This is not a contradiction! And there was a proposed standard way to access the fields directly, renamed from the Trute proposal (but with otherwise identical, value-field-like semantics) to INTERPRET-TRANSLATOR, COMPILE-TRANSLATOR, and POSTPONE-TRANSLATOR. The reason I deleted these is that we don't even use them in Gforth; we only use >POSTPONE, which has a different effect (it does not read out the xt, it executes it right away). If there is consensus that this is the right interface (not a value-field, but a defer-field), I can add this back to the proposal, as well as a standard way to set the state without knowing the internals of the system, for which the file recognizer-ext.fs in Gforth also provides a suggestion:
: translate-state ( translator-access-xt -- )
\ takes a translator access xt, and may check if that actually is one
>body @ cell/ negate state ! ;
The hypothetical more performant implementation in Reply 1043 would have a different translate-state, which would contain something like
>body @ ['] do-translate >body cell+ !
and only change STATE for interpret/compile.
This proposal is minimalistic on purpose and does not cover all corner cases, especially not those where no consensus has been reached yet.
I consider the magic-number dispatch method proposed earlier inappropriate: it is tied to a specific implementation and is not a good interface. Method invocation or field access should be done by named access words, not by numbers.
Anton writes:
postpone can be implemented like this
postpone can be implemented in any variant of the Recognizer API, with more or less code.
A difference is whether the behavior of postpone can be extended/changed without redefining postpone.
My point: if users need to extend the behavior of postpone without redefining it, then a special method can be specified for that. OTOH, postpone (and ]]) is a poor man's "postponing mode". An example of a more convenient tool is my c-state PoC, which provides a better tool for users, and even supports any new user-defined special words.
I don't want to restrict the usage of rectypes/translators to state-dependent outer interpreters.
It's not an argument, since the API can provide words like compile-token, execute-token, and postpone-token, with the stack effect ( i*x xt.translator -- j*x ) or ( i*x rectype -- j*x ), which are state-independent and don't restrict usage in the mentioned way.
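A sketch of how such state-independent words could look, assuming translators are built with TRANSLATE: as in the reference implementation of the recognizer proposal (a CREATEd three-cell table holding xt-post, xt-comp and xt-int, in that order). The word names are the ones suggested above; their implementation here is only an illustration, not part of any proposal.

```forth
: lit, ( n -- ) postpone literal ;

\ Translator table as in the reference implementation: cell 0 = xt-post,
\ cell 1 = xt-comp, cell 2 = xt-int; the default action dispatches on STATE.
: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

\ STATE-independent access: one named word per mode
: execute-token  ( i*x xt.translator -- j*x ) >body 2 cells + @ execute ;
: compile-token  ( i*x xt.translator -- j*x ) >body 1 cells + @ execute ;
: postpone-token ( i*x xt.translator -- j*x ) >body @ execute ;

:noname ( n -- n ) ;                      \ interpret: keep n
' lit,                                    \ compile: compile n as a literal
:noname ( n -- ) lit, postpone lit, ;     \ postpone: compile code that compiles n
translate: translate-num
```

Because each mode has its own named word, a caller such as a POSTPONE implementation never touches STATE; only the default action of the translator looks at it.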
@enoch wrote:
Readability of the source code is my main concern.
I rarely need a forward definition due to mutual recursion. And I also value readability.
My latest variant is to use recognizers to refer to a forward (not yet defined) definition, for example using the prefix fw: as follows:
: foo ... fw:bar ... ;
: bar ... foo ... ;
In this solution we don't need any separate definition.
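For comparison, this is the kind of separate forward definition that fw: would make unnecessary: in standard Forth, mutual recursion is commonly written with DEFER and IS. A small sketch (even?, odd? and (odd?) are made-up names):

```forth
defer odd? ( u -- flag )                  \ the separate forward declaration
: even?  ( u -- flag ) dup 0= if drop true  else 1- odd?  then ;
: (odd?) ( u -- flag ) dup 0= if drop false else 1- even? then ;
' (odd?) is odd?                          \ resolve the forward reference
```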
The committee thinks this idea looks promising and that it pops up repeatedly in discussions.
We encourage the author or anybody else interested to track past practice like f: and r:, or forward <foo> and : <foo>, and to create a full proposal.
The committee asks the author to work the comments into the proposal and update it, and to provide a full reference implementation.
Anton wrote:
Letting all synonyms have the same xt is certainly a good solution. Should we require it? I don't know a reason for or against (in particular, I don't think that there is a system that has SYNONYM where synonyms of the same word have different xts), so in the spirit of tightening, we probably should require it.
This requirement is too restrictive for systems, without a profit for programs.
In some systems, the xt is equivalent to the nt for the same word, and the standard allows that. So, in such a system, a synonym with a different name cannot have the same xt as the original word.
Also, without this requirement, a system may have different xts for synonyms, which allows it to show the correct name in a decompilation tool.
I wrote above:
To support this feature, it is enough to just create the lower-case synonyms for the standard words.
I was wrong. Actually, it depends on the system: it is probably enough only if a synonym has the same xt as the original word. Otherwise, synonym cannot help.
In general, this proposal goes halfway toward encouraging systems to support lower-case spelling, as modern programming languages do.
proposal - Let us adopt the Gerry Jackson test suite as part of Forth 200x
The committee thinks that the test harness should be proposed as standard words and be part of an existing word set (e.g. tools) or a new one (e.g. testing).
Furthermore, the test cases shall be easily usable by systems and appear under the words they apply to. Existing test cases in the standard will also be moved into the new test system.
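To give an idea of what such standard words might cover, here is a heavily simplified sketch of the T{ ... -> ... }T notation used by the test harness in Appendix F. Unlike the real harness, it aborts on the first failure instead of reporting it, handles at most 32 results, and ignores the floating-point stack; results, #results and depth-before are hypothetical helper names.

```forth
create results 32 cells allot             \ actual results, top of stack first
variable #results                         \ number of actual results
variable depth-before                     \ data stack depth at T{

: T{ ( -- ) depth depth-before ! ;

: -> ( i*x -- )                           \ save the actual results
  depth depth-before @ -
  dup 0 33 within 0= abort" bad number of results"
  dup #results !
  0 ?do results i cells + ! loop ;

: }T ( j*x -- )                           \ compare expected with saved results
  depth depth-before @ - #results @ <> abort" wrong number of results"
  #results @ 0 ?do results i cells + @ <> abort" incorrect result" loop ;
```

For example, T{ 1 2 swap -> 2 1 }T passes silently, while a wrong expected value aborts.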
Just for reference, see also the discussion concerning addresses between different runs from the same saved image. Probably, this issue should be discussed further and reflected in the rationale too.
The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.
proposal - F>R and FR> to support dynamically-scoped floating point variables
The brace notation (really {: and :}) replaced words like (LOCAL), and with it you can define floating-point locals by putting an F: before them, e.g. {: F: r1 F: r2 :} creates two floating-point locals called r1 and r2.
Gforth, bigForth and VFX support this notation; SwiftForth doesn't.
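A short example of the notation on a system that supports it, such as Gforth; the word name fdist is made up for illustration. Both arguments are taken from the floating-point stack into F: locals:

```forth
\ Euclidean length of the vector (x, y), using floating-point locals
: fdist ( F: x y -- r )
  {: F: x F: y :}
  x x f*  y y f*  f+  fsqrt ;
```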
The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.
All classic single-xt systems conform to this specification, and it's easy to implement the specified find in dual-xt systems.
This proposal is not formal enough; it should be made more formal.
Author
Ruv
Change Log
- 2020-02-20 Initial comment for NAME>INTERPRET
- 2023-09-14 Make this proposal more formal
Problem
Currently, the specification for name>interpret says that the returned "xt represents the interpretation semantics of the word nt".
But actually, in some cases a Forth system cannot provide an xt that performs the defined interpretation semantics for the corresponding word regardless of STATE.
This is particularly the case when words like s" or to are implemented as STATE-dependent immediate words. Technically, it is possible to return a correct xt according to the current specification (e.g. by generating the corresponding definition on the fly), but that can be too much of a burden.
Another minor problem is that it's not clear what the word "represents" means. In the language of the standard, an xt identifies some semantics.
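To make the problem concrete, here is a minimal STATE-smart string word of the kind described above (my-s" is a hypothetical name, not a standard word). Its only xt, which is what a simple name>interpret would return, pushes the string only when executed in interpretation state; executed in compilation state, the same xt compiles the string instead.

```forth
\ STATE-smart: one immediate word that checks STATE when it runs
: my-s" ( "ccc<quote>" -- c-addr u | )
  [char] " parse                      \ parse up to the closing quote
  state @ if postpone sliteral then ; immediate
```

So ' my-s" yields an xt that performs the interpretation semantics only while STATE is zero, which is what the second option in the Solution would allow.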
Solution
The specification for name>interpret can be adjusted to solve the mentioned problem. There are two options:
- Allow returning 0 if the system cannot return an xt that identifies the interpretation semantics for the word identified by nt.
- Allow returning a STATE-dependent xt, which performs the interpretation semantics in interpretation state only.
Proposal
Replace the following phrase in section 15.6.2.1909.20 NAME>INTERPRET:
xt represents the interpretation semantics of the word nt. If nt has no interpretation semantics, NAME>INTERPRET returns 0.
by the following phrase:
xt identifies the execution semantics for the word identified by nt. When this xt is executed in interpretation state, the interpretation semantics for the word are performed. If the system does not provide execution semantics for the word, NAME>INTERPRET returns 0.
A new idea to discuss
Initially, I suggested declaring an ambiguous condition when Tick is applied to any word for which execution semantics and interpretation semantics are not both defined.
But instead, we can specify these cases precisely to reduce ambiguity to some degree.
We can say that if the execution semantics for the word are not specified by the standard, the returned xt identifies some system-dependent execution semantics, and when this xt is executed in interpretation state, the interpretation semantics for the word are performed. Executing this xt in compilation state is ambiguous.
This idea is close to the proposal [212] Tick and undefined execution semantics - 2.
It says: "If name has no execution semantics, the behavior of xt is implementation dependent and may lead to an ambiguous condition".
The difference is that I suggest specifying the behavior in interpretation state.
If the standard does not specify interpretation semantics for the word, then system-defined interpretation semantics are performed.
Concerning this proposal in general, see also my comment about reducing ambiguity to some degree.
The committee decided to put this proposal in formal state. The author decides when to put it into community vote.
proposal - 2023 Standards meeting agenda (2023-09-13 to 2023-09-15)
Archived for posterity
proposal - Agenda Forth-200x interim Meeting 2023-02-17T15:00Z
Archived for posterity
Archived for posterity
proposal - Agenda Forth-200x interim Meeting 2020-02-18T14:00Z
Archived for posterity
In the version of 2022-09-16, we retain referencing of CREATE. But this adds some complexity for SYNONYM (proposal), since a synonym for CREATE is another word.
The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.
proposal - Revert rewording the term "execution token"
The committee considers this proposal formal.
The committee considers this proposal formal.
proposal - Obsolescence for SAVE-INPUT and RESTORE-INPUT
The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.
Note: The committee would like to point out that these words cannot be made informal, as they are used to implement interpreted loops.
proposal - Include a revised 79-STANDARD Specification for "><" To "Core Ext"
In networking code, you need to switch the endianness of an integer of a particular width (in bits), but the cell size can vary.
Probably, we need such a word for each of the widths 16, 32, and 64 bits.
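A sketch of such width-specific words in standard Forth; the names bswap16, bswap32 and bswap64 are hypothetical. It assumes cells of at least 32 bits, and bswap64 additionally assumes 64-bit cells. Each word reverses the bytes of the low 16, 32 or 64 bits and clears any higher bits:

```forth
: bswap16 ( u -- u' )                     \ swap the two low bytes
  dup 8 rshift $ff and  swap $ff and 8 lshift  or ;

: bswap32 ( u -- u' )                     \ swap halves, then bytes within halves
  dup $ffff and bswap16 16 lshift
  swap 16 rshift $ffff and bswap16  or ;

: bswap64 ( u -- u' )                     \ assumes 64-bit cells
  dup $ffffffff and bswap32 32 lshift
  swap 32 rshift bswap32  or ;
```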
Author:
Anton Ertl, Leon Wagner
Change Log
- 2023-09-14 Revision after discussion (AE)
- 2023-09-13 Initial proposal
Problem:
The stack comments for N>R and NR> don't make it clear that n items are moved between the data and return stacks.
Solution:
The stack comments should more clearly indicate that n data stack items are moved to or from the return stack.
Proposal:
In the definition of N>R, replace
( i * n +n -- ) ( R: -- j * x +n )
with
( x_n ... x_1 n -- ) ( R: -- j * x +n )
In the definition of NR>, replace
( -- i * x +n ) ( R: j * x +n -- )
with
( -- x_n ... x_1 +n ) ( R: j * x +n -- )
Discussion
On the return stack, j*x +n, because the data may be in a separate buffer, with only the address and +n on the return stack. +n is on the return stack because the original specified that, and changing that would be a substantial change.
On the data stack, x_n ... x_1 +n, because that is the way we usually specify a counted number of cells (even for +n=0). See, e.g., get-order.
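A small usage sketch on a system that provides N>R and NR> (Programming-Tools extension words in Forth-2012, available e.g. in Gforth); under-items is a made-up name. It moves x_n ... x_1 and n out of the way, works on the item below them, and restores them with NR>, leaving the x_n ... x_1 +n layout that the proposed stack comments describe:

```forth
\ Double y, which sits below n counted items, without disturbing them.
: under-items ( y x_n ... x_1 n -- 2y x_n ... x_1 n )
  n>r  2*  nr> ;
```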
Author:
Bernd Paysan
Change Log:
- 2020-09-06 initial version
- 2020-09-08 taking ruv's approach and vocabulary at translators
- 2020-09-08 replace the remaining rectypes with translators
- 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
- 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
- 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
- 2022-09-10 More complete reference implementation
- 2022-09-10 Add use of extended words in reference implementation
- 2022-09-10 Typo fixed
- 2022-09-12 Fix for search order reference implementation
- 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
- 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
- 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
- 2023-09-13 Make clear that TRANSLATE: is the only way to define a standard-conforming translator.
- 2023-09-15 Add list of example recognizers and their names.
Problem:
The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more fancy stuff to be implemented as extensions. The problem this proposal tries to solve is the same as that of the original recognizer proposal; therefore this proposal is not a full proposal, but sketches some changes to the original proposal.
Solution:
Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.
Important changes to the original proposal:
- Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
- Make the recognizer sequence executable with the same effect as a recognizer
- Make sure the API is not mandating a special implementation
This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.
The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like
: rec-xt ( addr u -- translator )
here place here find dup IF
0< state @ and IF compile, ELSE execute THEN ['] drop
ELSE drop ['] notfound THEN ;
then you should factor out the part starting with state @ and return it as a translator:
: translate-xt ( xt flag -- )
0< state @ and IF compile, ELSE execute THEN ;
: rec-xt ( addr u -- ... translator )
here place here find dup IF ['] translate-xt
ELSE drop ['] notfound THEN ;
In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unsure what to do on postpone at this stage, use -48 throw; otherwise, define a postpone action:
:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
:noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
translate: translate-xt
The standard interpreter loop should look like this:
: interpret ( i*x -- j*x )
BEGIN parse-name dup WHILE forth-recognize execute REPEAT
2drop ;
with the usual additions to check e.g. for empty stacks and such.
Typical use
TBD
Proposal:
XY. The optional Recognizer Wordset
A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):
REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )
XY.3 Additional usage requirements
XY.3.1 Translator
translator: subtype of xt, and executes with the following stack effect:
TRANSLATE-THING ( j*x i*x -- k*x )
A translator xt that interprets, compiles or postpones the action of the thing according to the state the system is in.
i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.
XY.6 Glossary
XY.6.1 Recognizer Words
FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER
Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.
NOTFOUND ( -- ) RECOGNIZER
Performs -13 THROW. If the exception word set is not present, the system shall use a best-effort approach to display an adequate error message.
TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT
Create a translator word under the name "name". This word is the only standard way to define a translator.
"name:" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.
Rationale: By far the most common usage of translators is inside the outer interpreter, and this default mode of operation is invoked by EXECUTE to keep the API small. There may be other, non-standard modes of operation where the individual component xts are accessed STATE-independently (e.g. for implementing POSTPONE); these only work on translators created by TRANSLATE:, so any other way to define a translator is non-standard.
XY.6.2 Recognizer Extension Words
SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT
Assign the recognizer xt to FORTH-RECOGNIZE.
Rationale:
FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.
FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT
Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.
Rationale:
FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value), and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.
RECOGNIZER-SEQUENCE: ( n*xt n "name" -- ) RECOGNIZER EXT
Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with the topmost xt on stack and proceeding towards the bottommost xt until successful.
SET-RECOGNIZER-SEQUENCE ( n*xt n xt-seq -- ) RECOGNIZER EXT
Set the recognizer sequence of xt-seq to xt1 .. xtn.
GET-RECOGNIZER-SEQUENCE ( xt-seq -- n*xt n ) RECOGNIZER EXT
Obtain the recognizer sequence xt-seq as n*xt n.
TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT
Translates a name token.
TRANSLATE-NUM ( n -- n | ) RECOGNIZER EXT
Translates a number.
TRANSLATE-DNUM ( d -- d | ) RECOGNIZER EXT
Translates a double number.
TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT
Translates a floating point number.
TRANSLATE-STRING ( addr u -- addr u | ) RECOGNIZER EXT
Translates a string.
Reference implementation:
This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without a base prefix. This implementation only takes interpret and compile state into account, and uses the STATE variable to distinguish them.
Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
BEGIN
?stack parse-name dup WHILE
forth-recognize execute
REPEAT
2drop ; \ drop the empty string that parse-name returns at the end of the input
: lit, ( n -- ) postpone literal ;
: notfound ( -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , ,
does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap lit, compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )
: rec-nt ( addr u -- nt nt-translator / notfound )
forth-wordlist find-name-in dup IF ['] translate-nt ELSE drop ['] notfound THEN ;
: rec-num ( addr u -- n num-translator / notfound )
0. 2swap >number 0= IF 2drop ['] translate-num ELSE 2drop drop ['] notfound THEN ;
: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2r> 2drop ;
' minimal-recognize is forth-recognize
Extensions reference implementation:
: set-forth-recognize ( xt -- )
is forth-recognize ;
: forth-recognizer ( -- xt )
action-of forth-recognize ;
Stack library
: STACK: ( size "name" -- )
CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
2DUP ! CELL+ SWAP CELLS BOUNDS
?DO I ! CELL +LOOP ;
: GET-STACK ( stack-id -- item-n .. item-1 n )
DUP @ >R R@ CELLS + R@ BEGIN
?DUP
WHILE
1- OVER @ ROT CELL - ROT
REPEAT
DROP R> ;
Recognizer sequences
: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
DUP >R @
BEGIN
DUP
WHILE
DUP R@ @ SWAP - 1+ CELLS R@ + @ \ fetch the topmost untried recognizer first
2OVER 2>R SWAP 1- >R
EXECUTE DUP ['] NOTFOUND <> IF
2R> 2DROP 2R> 2DROP EXIT
THEN
DROP R> 2R> ROT
REPEAT
DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
DOES> recognize ;
: ?defer@ ( xt1 -- xt2 )
BEGIN dup is-defer? WHILE defer@ REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;
Once you have recognizer sequences, you can define:
' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize
The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.
: find-name-in ( addr u wid -- nt / 0 )
execute ['] notfound = IF 0 THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
['] search-order set-recognizer-sequence ;
Recognizer examples
REC-NT ( addr u -- nt translate-nt | notfound ) Search the locals wordlist if locals have been defined, and then the search order for a definition matching the string addr u, and provide that name token as result.
REC-NUM ( addr u -- n translate-num | d translate-dnum | notfound ) Try converting addr u into a number, and on success return either a single number n and translate-num, or a double number d and translate-dnum.
REC-FLOAT ( addr u -- r translate-float | notfound ) Try converting addr u into a floating point number, and on success return that number r and translate-float.
REC-STRING ( addr u "string"<"> -- addrs us translate-string | notfound "string"<"> ) Convert quoted strings (i.e. addr u starts with '"') in the input stream into string literals, performing the same escape handling as S\" and on success return the converted string as addrs us and translate-string.
REC-TICK ( addr u -- xt translate-num | notfound ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.
REC-SCOPE ( addr u -- nt translate-nt | notfound ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order); the string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched, REC-SCOPE is identical in effect to REC-NT.
REC-TO ( addr u -- xt n translate-to | notfound ) Handle the following syntax of TO-like operations on value-like words:
* ->value as TO value or IS value
* +>value as +TO value
* '>value as ADDR value
* @>value as ACTION-OF value
xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.
REC-ENV ( addr u -- addrs us translate-env | notfound ) Takes a pattern in the form ${name} and provides the name as addrs us on the stack. The corresponding translator translate-env is responsible for looking up that name in the operating system's environment variable array.
REC-COMPLEX ( addr u -- rr ri translate-complex | notfound ) Converts a pair of floating-point numbers in the form float1+float2i into a complex number on the stack, and returns translate-complex on success.
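As an illustration of how one of these examples could be written on top of the minimal reference implementation, here is a sketch of REC-TICK. It repeats the pieces of that implementation it needs so that it stands alone, and it assumes FIND-NAME and NAME>INTERPRET from Forth-200x (available in Gforth); it is not taken from any system.

```forth
: lit, ( n -- ) postpone literal ;
: notfound ( -- ) -13 throw ;
: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname ( n -- n ) ;                      \ interpret: keep the value
' lit,                                    \ compile: compile it as a literal
:noname ( n -- ) lit, postpone lit, ;     \ postpone
translate: translate-num

\ `name -> xt of name, then handled like a number literal
: rec-tick ( addr u -- xt translate-num | notfound )
  dup 2 < if 2drop ['] notfound exit then              \ need ` plus a name
  over c@ [char] ` <> if 2drop ['] notfound exit then
  1 /string find-name dup 0= if drop ['] notfound exit then
  name>interpret ['] translate-num ;
```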
Testing
TBD
Author:
Bernd Paysan
Change Log:
- 2020-09-06 initial version
- 2020-09-08 taking ruv's approach and vocabulary at translators
- 2020-09-08 replace the remaining rectypes with translators
- 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
- 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
- 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
- 2022-09-10 More complete reference implementation
- 2022-09-10 Add use of extended words in reference implementation
- 2022-09-10 Typo fixed
- 2022-09-12 Fix for search order reference implementation
- 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
- 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
- 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
- 2023-09-13 Make clear that
TRANSLATE:
is the only way to define a standard-conforming translator. - 2023-09-15 Add list of example recognizers and their names.
Problem:
The current recognizer proposal has received a number of critics. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows to implement more fancy stuff as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal, this proposal is therefore not a full proposal, but sketches down some changes to the original proposal.
Solution:
Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.
Important changes to the original proposal:
- Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
- Make the recognizer sequence executable with the same effect as a recognizer
- Make sure the API is not mandating a special implementation
This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.
The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like
: rec-xt ( addr u -- translator )
here place here find dup IF
0< state @ and IF compile, ELSE execute THEN ['] drop
ELSE drop ['] notfound THEN ;
then you should factor the part starting with state @ out and return it as translator:
: translate-xt ( xt flag -- )
0< state @ and IF compile, ELSE execute THEN ;
: rec-xt ( addr u -- ... translator )
here place here find dup IF ['] translate-xt
ELSE drop ['] notfound THEN ;
In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unclear about what to do on postpone in this stage, use -48 throw
, otherwise define a postpone action:
:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
:noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
translate: translate-xt
The standard interpreter loop should look like this:
: interpret ( i*x -- j*x )
BEGIN parse-name dup WHILE forth-recognize execute REPEAT
2drop ;
with the usual additions to check e.g. for empty stacks and such.
Typical use
TBD
Proposal:
XY. The optional Recognizer Wordset
A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND
):
REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )
XY.3 Additional usage requirements
XY.3.1 Translator
translator: subtype of xt, and executes with the following stack effect:
TRANSLATE-THING ( j*x i*x -- k*x )
A translator xt that interprets, compiles or postpones the action of the thing, according to the state the system is in.
i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.
XY.6 Glossary
XY.6.1 Recognizer Words
FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER
Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.
NOTFOUND ( -- ) RECOGNIZER
Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.
TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT
Create a translator word under the name "name". This word is the only standard way to define a translator.
name Execution: ( j*x i*x -- k*x ) Performs xt-int in interpretation state, xt-comp in compilation state and xt-post in postpone state, using a system-specific way to determine the current mode.
Rationale: By far the most common use of translators is inside the outer interpreter. To keep the API small, this default mode of operation is invoked by EXECUTE. There may be other, non-standard modes of operation in which the individual component xts are accessed independently of STATE (e.g. for implementing POSTPONE). These only work on translators created by TRANSLATE:, so any other way to define a translator is non-standard.
XY.6.2 Recognizer Extension Words
SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT
Assign the recognizer xt to FORTH-RECOGNIZE.
Rationale:
FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.
FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT
Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.
Rationale:
FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the current behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value), and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.
RECOGNIZER-SEQUENCE: ( n*xt n "name" -- ) RECOGNIZER EXT
Create a named recognizer sequence under the name "name". When executed, it tries to recognize strings starting with the topmost xt on the stack and proceeding towards the bottommost xt until successful.
SET-RECOGNIZER-SEQUENCE ( n*xt n xt-seq -- ) RECOGNIZER EXT
Set the recognizer sequence of xt-seq to xt1 .. xtn.
GET-RECOGNIZER-SEQUENCE ( xt-seq -- n*xt n ) RECOGNIZER EXT
Obtain the recognizer sequence xt-seq as n*xt n.
TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT
Translates a name token.
TRANSLATE-NUM ( n -- n | ) RECOGNIZER EXT
Translates a number.
TRANSLATE-DNUM ( d -- d | ) RECOGNIZER EXT
Translates a double number.
TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT
Translates a floating point number.
TRANSLATE-STRING ( addr u -- addr u | ) RECOGNIZER EXT
Translates a string.
Reference implementation:
This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation only takes interpretation and compilation state into account, and uses the STATE variable to distinguish them.
Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
BEGIN
?stack parse-name dup WHILE
forth-recognize execute
REPEAT 2drop ;
: lit, ( n -- ) postpone literal ;
: notfound ( -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , ,
does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap lit, compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )
: rec-nt ( addr u -- nt nt-translator / notfound )
forth-wordlist find-name-in dup IF ['] translate-nt ELSE drop ['] notfound THEN ;
: rec-num ( addr u -- n num-translator / notfound )
0. 2swap >number 0= IF 2drop ['] translate-num ELSE 2drop drop ['] notfound THEN ;
: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
2>r 2r@ rec-nt dup ['] notfound = IF drop 2r@ rec-num THEN 2r> 2drop ;
' minimal-recognize is forth-recognize
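The same pattern extends to double numbers. The following sketch defines TRANSLATE-DNUM with TRANSLATE: and adds a matching recognizer. It assumes the standard text-interpreter convention that a trailing "." marks a double-cell number, and that STATE holds 0 or -1. REC-DNUM and 2LIT, are hypothetical names. TRANSLATE: is repeated from above so the block runs on its own.

```forth
\ Prelude, repeated from the minimal reference implementation.
: notfound ( -- ) -13 throw ;
: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;   \ assumes STATE is 0 or -1

\ Hypothetical helper, the double-cell counterpart of LIT,
: 2lit, ( d -- ) postpone 2literal ;

:noname ( d -- d ) ;                      \ interpret: leave d on the stack
' 2lit,                                   \ compile: compile d as a literal
:noname ( d -- ) 2lit, postpone 2lit, ;   \ postpone: compile code that compiles d later
translate: translate-dnum ( d -- d | )

\ Hypothetical recognizer: unsigned digits followed by a trailing "." form a double.
: rec-dnum ( addr u -- d translate-dnum | notfound )
  dup 2 < IF 2drop ['] notfound EXIT THEN                \ need a digit plus "."
  2dup 1- chars + c@ [char] . <> IF 2drop ['] notfound EXIT THEN
  1- 0. 2swap >number nip                                \ convert all but the "."
  IF 2drop ['] notfound EXIT THEN                        \ leftover chars: no number
  ['] translate-dnum ;
```

Like REC-NUM above, this handles neither signs nor base prefixes.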
Extensions reference implementation:
: set-forth-recognize ( xt -- )
is forth-recognize ;
: forth-recognizer ( -- xt )
action-of forth-recognize ;
Stack library
: STACK: ( size "name" -- )
CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
2DUP ! CELL+ SWAP CELLS BOUNDS
?DO I ! CELL +LOOP ;
: GET-STACK ( stack-id -- item-n .. item-1 n )
DUP @ >R R@ CELLS + R@ BEGIN
?DUP
WHILE
1- OVER @ ROT CELL - ROT
REPEAT
DROP R> ;
Recognizer sequences
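: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
\ SET-STACK puts the topmost xt in slot 1, so walk upwards from there
DUP CELL+ SWAP @ \ addr len a k ( a: next slot, k: slots left )
BEGIN
DUP
WHILE
1- SWAP DUP @ SWAP CELL+ ROT \ addr len xt a' k'
2>R ROT ROT 2DUP 2>R ROT \ addr len xt R: a' k' addr len
EXECUTE DUP ['] NOTFOUND <> IF
2R> 2DROP 2R> 2DROP EXIT
THEN
DROP 2R> 2R> \ addr len a' k'
REPEAT
2DROP 2DROP ['] NOTFOUND
;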
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
DOES> recognize ;
: ?defer@ ( xt1 -- xt2 )
BEGIN dup is-defer? WHILE defer@ REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;
Once you have recognizer sequences, you can define the Forth recognizer as a sequence:
' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize
The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. To do so, wordlists are defined so that a wid is an execution token that searches the wordlist and returns the appropriate translator.
: find-name-in ( addr u wid -- nt / 0 )
execute ['] notfound = IF 0 THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
['] search-order set-recognizer-sequence ;
Recognizer examples
REC-NT ( addr u -- nt translate-nt | notfound ) Search the locals wordlist if locals have been defined, and then the search order for a definition matching the string addr u, and provide that name token as result.
REC-NUM ( addr u -- n translate-num | d translate-dnum | notfound ) Try converting addr u into a number, and on success return either a single number n and translate-num, or a double number d and translate-dnum.
REC-FLOAT ( addr u -- r translate-float | notfound ) Try converting addr u into a floating point number, and on success return that number r and translate-float.
REC-STRING ( addr u "string"<"> -- addrs us translate-string | notfound "string"<"> ) Convert quoted strings (i.e. addr u starts with '"') in the input stream into string literals, performing the same escape handling as S\" and on success return the converted string as addrs us and translate-string.
REC-TICK ( addr u -- xt translate-num | notfound ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.
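One possible shape for REC-TICK is sketched below. It assumes the system provides FIND-NAME and NAME>INTERPRET, as Gforth does. As the description says, it returns the xt with TRANSLATE-NUM. The prelude repeats NOTFOUND, LIT, and TRANSLATE: from the minimal reference implementation so the block is self-contained, which also carries over the assumption that STATE holds 0 or -1.

```forth
\ Prelude, repeated from the minimal reference implementation.
: notfound ( -- ) -13 throw ;
: translate: ( xt-int xt-comp xt-post "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;   \ assumes STATE is 0 or -1
: lit, ( n -- ) postpone literal ;
:noname ( n -- n ) ;                      \ interpret: leave n
' lit,                                    \ compile: compile n as a literal
:noname ( n -- ) lit, postpone lit, ;     \ postpone
translate: translate-num

: rec-tick ( addr u -- xt translate-num | notfound )
  dup 2 < IF 2drop ['] notfound EXIT THEN             \ need "`" plus a name
  over c@ [char] ` <> IF 2drop ['] notfound EXIT THEN
  1 /string find-name                                  \ search the search order
  dup 0= IF drop ['] notfound EXIT THEN
  name>interpret                                       \ 0 if no interpretation semantics
  dup 0= IF drop ['] notfound EXIT THEN
  ['] translate-num ;
```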
REC-SCOPE ( addr u -- nt translate-nt | notfound ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order). The string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched in, REC-SCOPE is identical in effect to REC-NT.
REC-TO ( addr u -- xt n translate-to | notfound ) Handle the following syntax of TO-like operations on value-like words:
- ->value as TO value or IS value
- +>value as +TO value
- '>value as ADDR value
- @>value as ACTION-OF value
xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.
REC-ENV ( addr u -- addrs us translate-env | notfound ) Takes a pattern in the form of ${name} and provides the name as addrs us on the stack. The corresponding translator translate-env is responsible for looking up that name in the operating system's environment variable array.
REC-COMPLEX ( addr u -- rr ri translate-complex | notfound ) Converts a pair of floating point numbers in the form of float1+float2i into a complex number on the stack, and returns translate-complex on success.
Testing
TBD
Things to discuss, because there are still too many variables.
ToDo:
- Rename recognizers from REC-result to RECOGNIZE-result. A solution for .RECOGNIZERS drowning the reader in recognize- could be to skip that prefix, because all recognizers are supposed to have the same prefix anyway.
- Revert the name of translators to rectypes or some similar word showing that this does describe a type?
- Add mode/state-specific access words to the translators again and decide on how they work. I prefer defer-field-like words, which right away execute the corresponding action, rather than putting an xt on the stack for consumption. Defer-fields could work together with IS and ACTION-OF to access the xts within (in Gforth, they do).
Answers to some questions:
A lot of thoughts went into it to make different subsets of this proposal useful on their own, and allow different implementation strategies. The answer to “can I do without feature X” is most likely yes. You can use the subset of the features you want. Stripping away too much results in a subset no longer usable.
- Opening up the whole idea to small systems is useful to gain wider use.
- FORTH-RECOGNIZE is a deferred word in the reference implementation on purpose: that allows changing it without adding more words. For other implementation options, you can use the (optional) setter and getter words to swap named sequences in and out if you don't want to implement it as a deferred word.
- The recognizer sequences do have words to get and set the sequence, so you can just work with a single sequence and set/get it if you like. The nesting capability comes from the magical fact that a recognizer sequence has the same stack effect as a recognizer.
- You can do without both, because recognizer sequences can be written as colon definitions “by hand”.
- Named sequences are useful, especially when you swap in recognizer sequences for applications that do something completely different from the Forth recognizer sequence. If you do not want to support named sequences, you can still provide the one single named sequence FORTH-RECOGNIZE, and allow SET-RECOGNIZER-SEQUENCE and GET-RECOGNIZER-SEQUENCE to operate just on that. That is also an option where recognizers are useful without FORTH-RECOGNIZE being deferred and without RECOGNIZER-SEQUENCE:.
- The NOTFOUND return for failure is there so that you can always EXECUTE the result of FORTH-RECOGNIZE and don't have to check for errors there.
Tough question: The string recognizer has a side effect, which is not good. Moving that side effect to the translator causes other problems, because TRANSLATE-STRING no longer has the corresponding string on the stack and has to parse it later. Actually, parsing should happen in PARSE-NAME. It still seems to be a hack that doesn't have a perfect solution.
Rename Recognizers from REC-result to RECOGNIZE-result
In general, an abbreviation or acronym may be acceptable to me. But in this case I prefer RECOGNIZE- rather than REC-. The main disadvantage of rec is that it has misleading associations. And the main advantage of recognize is that it's a whole English word that is very appropriate for our case.
The part referred to as "result" should not be a result (of recognizing), but the expected type of the input lexeme. Have a look at your examples: REC-NUM and REC-TICK produce the same result type translate-num, but they accept different types of input lexemes. These types are identified by the NUM and TICK symbols respectively.
Thus, the naming form for recognizers can be expressed as RECOGNIZE-{lexeme-type-symbol}.
Revert the name of translators to rectypes or some similar word showing that this does describe a type?
What is it a type of? It describes the type of a token i*x, which is the result of recognizing. Actually, a token translator identifies the type of a token i*x. So a token translator is a token type at the same time.
If we want to reflect this idea, we can use the acronym tt, which stands for both token translator and token type. Then, token translators can be named according to the form TT-{token-type-symbol}. It looks elegant to me.
The names of translators are used for two purposes. The first is to call a translator (for example, when we define a new translator via existing translators). The second is to obtain the xt of a translator (which is also an identifier for a token type) in order to analyze the result of recognizing. The prefix tt- looks good in both cases.