Digest #290 2024-12-31
Contributions
15.6.2.0930 CODE says:
"Those characters are processed in an implementation-defined manner, generating the corresponding machine code. The process continues, refilling the input buffer as needed, until an implementation-defined ending sequence is processed."
Does this imply that CODE actively parses the input source? If it actively parses, is it standard-compliant?
Replies
requestClarification - Return Stack Notation Mildly Inaccurate
I understand what you're saying, but in terms of conveying information to the reader, I think some indication, presented in the stack notation, that there may not be loop control parameters to UNLOOP may be of use. I know that the text below it says as much, but ?DO having the same stack notation as DO just feels wrong or misleading.
I know, from a machine's perspective, the notation is technically correct, but I feel like this documentation is being written for humans, and they often require (or at least benefit from) a little more hand-holding. The change I suggest is also technically correct but more informative.
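To make the two cases concrete, here is a small sketch (COUNT-?DO is a hypothetical helper, chosen only for illustration) that counts how often a ?DO loop body runs. When limit and index are equal, ?DO skips the body entirely, so no loop-sys is ever on the return stack for UNLOOP to discard. With DO instead, equal arguments would enter the body and typically run through the whole number range.

```forth
\ Hypothetical helper: how many times does the ?DO loop body run?
: count-?do ( limit start -- n )
  0 -rot ?do 1+ loop ;

5 0 count-?do .   \ prints 5
5 5 count-?do .   \ prints 0: ?DO skips the body, no loop-sys is pushed
```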
@BerndPaysan:
If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state. And text interpreters that use xt-post can be written using the proposed wordset rather than having to take a detour through postpone (which is a parsing word, possibly introducing additional complications).
The following is also relevant to @ruv:
Ruv's colorforth-bw implementation demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing recognize-colorforth-bw; instead, it reimplements everything that the name recognizer and the number recognizer already do internally, nicely demonstrating that the present proposal buries the tools. And it only implements dealing with names and single-cell numbers. Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github.
By contrast, let's take much of the proposal from [r1081], but replace the state-dependent translators with the state-independent rectypes of [160]. With such a proposal, colorforth-bw might look as follows (untested):
defer recognizer1 forth-recognizer is recognizer1
: prefix>index ( c -- n )
case
'[' of 0 endof
'_' of -1 endof
']' of -2 endof
1 swap
endcase ;
: rectype-colorforth-bw ( ... rectype index state -- ... )
drop \ we use index, not the surrounding Forth interpreter's state
swap execute ;
: recognize-colorforth-bw ( c-addr u -- ... rectype n xt | xt-notfound )
dup 0= if 2drop ['] notfound exit then
over c@ prefix>index dup 0 > if 2drop drop ['] notfound exit then
>r 1 /string recognizer1 r> ['] rectype-colorforth-bw ;
' recognize-colorforth-bw set-forth-recognize
This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [160] over the translators from [r1081].
BTW, I find the presence of both forth-recognize and forth-recognizer confusing, and would prefer to define forth-recognize as a deferred word. If you have to have getters and setters, call the getter get-forth-recognize.
In this approach, why do you need to write «[postpone _foo» instead of «]foo»?
Nobody is suggesting that. But you need to perform xt-post in order to implement ]foo. In your implementation, you do it by reimplementing xt-post for the two recognizers you implement internally to recognize-colorforth-bw. If you used a detour through postpone instead, you would use the xt-post invoked in that way. And in my implementation above, xt-post is invoked directly.
requestClarification - Return Stack Notation Mildly Inaccurate
You are correct, but you originally failed to get your point across to me, and apparently to ruv, who addressed the issue that a loop-sys may have 0 items on the return stack. But yes, if n1|u1 = n2|u2, no loop-sys is pushed by the run-time semantics of ?do.
requestClarification - Return Stack Notation Mildly Inaccurate
JimPeterson, I now see what you mean. My argument about size of loop-sys is irrelevant.
that there may not be loop control parameters to UNLOOP
This is impossible. UNLOOP can only be used in a loop body. And if the run-time semantics of ?DO do not place a loop-sys, the loop body does not gain control.
program1 ( S: u.limit u.initial ) ( R: 0*x ) ?DO ( S: 0*x ) ( R: loop-sys ) program2 ( R: loop-sys ) LOOP ( R: 0*x )
By default, the "after" part of a stack diagram indicates only the parameters that are available to the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and one that is not available to the next code fragment.
Take a look:
program1 ( S: param-type.1 ) program2 ( S: param-type.2 ) program3
When we specify a stack diagram for program2, which is ( param-type.1 -- param-type.2 ), we indicate:
- the input parameter type for program2, which should be provided by program1;
- the output parameter type of program2, which is available for program3;
- optionally, a case when program3 does not gain control, but then we have to state in prose who gains control and what parameters are passed.
For example:
: ?return-true ( -- ) ]] if true exit then [[ ; immediate
- A correct stack diagram for ?return-true Run-time semantics: ( x -- )
- A narrower stack diagram for ?return-true Run-time semantics: ( 0 -- | x\0 -- true never )
- But this diagram does not tell us where the output parameter true is available (if any).
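A minimal usage sketch, assuming Gforth (where ]] ... [[ is available); NONZERO->FLAG is a hypothetical word. It exercises both cases of the run-time diagram: for a nonzero x, control leaves the caller via EXIT with true on the stack, and for 0, x is consumed and execution falls through.

```forth
\ ?return-true as posted above; ]] ... [[ is Gforth's postponing syntax.
: ?return-true ( -- ) ]] if true exit then [[ ; immediate

\ Hypothetical example: true for a nonzero x, false for 0.
: nonzero->flag ( x -- flag ) ?return-true false ;

7 nonzero->flag .   \ prints -1 (EXIT path: true is returned by NONZERO->FLAG)
0 nonzero->flag .   \ prints 0  (x consumed, control fell through to FALSE)
```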
See also my post about the never data type.
NB: ( n1 | u1 n2 | u2 -- ) is incorrect due to excessive spaces (it should be ( n1|u1 n2|u2 -- )); it's an editorial issue.
requestClarification - Return Stack Notation Mildly Inaccurate
- A correct stack diagram for ?return-true Run-time semantics: ( x -- )
It could be unclear why this diagram is correct if the ?return-true Run-time semantics do not return control in some cases.
The answer is that the data type never is a subtype of any other data type; in particular, it is a subtype of 0*x.
( x -- )
⟺ ( x -- 0*x )
⟺ ( x -- 0*x|never )
⟺ ( x -- | x -- never )
A stack diagram by itself does not guarantee that a word returns control. It only guarantees that if the word returns control, it returns a parameter of specified type.
requestClarification - Return Stack Notation Mildly Inaccurate
By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.
Counterexample: THROW
requestClarification - Return Stack Notation Mildly Inaccurate
Anton, thank you for this example. Yes, I was wrong that it "would be incorrect". Formally, it is still correct, because specifying a wider data type than possible does not introduce contradictions. But it does introduce confusion. I think that data types should be specified as narrowly as possible.
The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other. This holds for the word throw because ( i*x n ) is never returned to the caller.
But specifying two data types in the union so that the members of one are sometimes returned and the members of the other are never returned is useless and just confusing.
A better diagram for throw is ( k*x 0 -- k*x | k*x n1\0 -- i*x n1 never ). This diagram explicitly says that the output parameter of the type ( i*x n1 ) (where the value that corresponds to n1 is the same in the input and in the output parameter) is not available to the caller.
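The two branches of the throw diagram can be observed with CATCH. In this sketch (THROW-ZERO and THROW-SEVEN are hypothetical names), 0 THROW returns to its caller, so the code after it runs; a nonzero THROW never returns to its caller, and n1 only reaches the enclosing CATCH.

```forth
\ Hypothetical helpers for the two branches of THROW's stack diagram.
: throw-zero  ( -- 42 ) 0 throw 42 ;   \ 0 THROW returns, so 42 is pushed
: throw-seven ( -- )    7 throw 42 ;   \ 7 THROW never returns here; 42 is unreachable

' throw-zero  catch . .   \ prints 0 42
' throw-seven catch .     \ prints 7: only the enclosing CATCH sees n1
```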
A better diagram for ?do Run-time semantics is ( n1 n2 -- ; R: -- loop-sys ; | u1 u2 -- ; R: -- loop-sys ; | x1 x1 -- never ; ). This diagram specifies:
- an ambiguous condition exists if the input parameter has type neither ( n n ) nor ( u u );
- if the first input parameter and the second input parameter are identical, then the loop body does not gain control.
requestClarification - Return Stack Notation Mildly Inaccurate
By "a wider data type than possible" I mean a data type that has more members than another suitable data type. How best to phrase that?
requestClarification - Return Stack Notation Mildly Inaccurate
The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other.
Actually, this always holds.
( k*x n -- k*x | i*x n )
⟺ ( k*x n -- k*x | k*x n -- i*x n )
To prove that neither type from the union is a subtype of the other, it is enough to give two examples, one of which is a member of only ( k*x n -- k*x ) (in the union), and the other is a member of only ( k*x n -- i*x n ) (in the union).
Here are such examples:
- the mapping ( 0 ↦ ) is a member of only ( k*x n -- k*x ) (in the union);
- the mapping ( 1 ↦ 1 ) is a member of only ( k*x n -- i*x n ) (in the union).
There are also members that belong to both. For example, the following mappings:
( 0 0 ↦ 0 )
( 1 1 ↦ 1 )
( 123 0 0 ↦ 123 0 )
@AntonErtl writes:
Ruv's colorforth-bw implementation demonstrates the shortcomings of the present proposal, because it does not use recognizers nor translators at all for implementing recognize-colorforth-bw; instead, it reimplements everything that the name recognizer and the number recognizer already do internally,
It's wrong. Have a look at L18-L19:
\ Reuse a recognizer for numbers
['] recognize-number-n-prefixed apply-recognizer-cf dup 0= if exit then
It uses the recognizer for numbers. And it uses find-name instead of the recognizer for names (Forth words) just because it's simpler in this case. It does not reuse token translators.
And it only implements dealing with names and single-cell numbers.
Because your original example implemented only that. And I just rewrote your original example.
Finally, the implementation is so long (44 lines without putting it into forth-recognize) that you have not shown it inline, but posted a link to github.
Why count the 10 lines of comments at the beginning of the file? Without comments, it is 31 lines, the same as in your example (the lexical size is greater due to nt vs. xt, and to improvements in the behavior).
By contrast, let's take much of the proposal from [r1081], but replace the state-dependent translators with the state-independent rectypes of [160]. With such a proposal, colorforth-bw might look as follows (untested):
[...]
This has only 20 lines (vs. 44), and it uses all the recognizers originally present in forth-recognizer (name, integers (including doubles), FP, etc.). This demonstrates the superior expressive power of the rectypes from [160] over the translators from [r1081].
(I corrected the [r1081] link in the citation above)
This comparison is incorrect. Below is an implementation against the latest API version (except for compile-postpone-qtoken, which is a variation of the discussed postpone-qtoken and should be either present or implementable in any variant of the API):
: cf-prefix>tt? ( c -- tt true | c false )
case
'[' of ['] execute-interpreting endof
'_' of ['] execute-compiling endof
']' of ['] compile-postpone-qtoken endof
0 exit
endcase true
;
defer recognize-default perceptor is recognize-default
: recognize-colorforth-bw ( sd.lexeme -- qt|0 )
dup 0= if nip exit then
over c@ cf-prefix>tt? 0= if drop 2drop 0 exit then
>r 1 /string recognize-default dup if r> exit then rdrop
;
16 lines.
Can be tested in Gforth too:
gforth index.fth example/recognize-colorforth-bw.fth
:noname cf( _1. _drop _s" foo" ) ; execute s" foo" compare 0= .s \ prints "1 -1"
requestClarification - Return Stack Notation Mildly Inaccurate
It is obvious that the idea behind the stack diagram of throw is to specify what happens on the data stack in both cases, including the case where the control flow does not continue sequentially. And I think it's a good idea to specify the stack effect for that case, and it should also be done for ?do.
Whether the | syntax as used for throw is good enough or whether we should have separate stack diagrams for the two cases is something one might discuss. However, I have not seen confused questions about what the stack effect of throw means, and I would not expect confusion from a similar usage of | for the stack effect of ?do. In both cases the prose makes it clear enough which part of the stack effect diagram corresponds to which case. Actually, for the ?do case it has taken 3 decades until someone asked about the lack of a stack-effect diagram for the case when index=limit.
The latest proposal is [r1081] and it does not contain execute-interpreting, execute-compiling, compile-postpone-qtoken, or perceptor. And that's what we were tasked with discussing and giving feedback on. And that's what I did.
I see, thank you. Actually, [r1081] is outdated; a new version will be prepared soon, and then it should be discussed (as was noted in the recognizer chat). Nevertheless, my example implementation of recognize-colorforth-bw above is compatible with [r1081] with the following exceptions: it relies on 0 instead of NOTFOUND (note how this makes things simpler), and it uses the method compile-postpone-qtoken, which appends the compilation semantics of a qualified token to the current definition (this method is missing in [r1081]). The word perceptor is simply a better name than forth-recognizer in [r1081] (I just posted a rationale from the chat in ForthHub/fep-recognizer).
The words execute-interpreting and execute-compiling are general words that are needed anyway to perform interpretation or compilation semantics regardless of the initial STATE; they can be implemented in standard Forth as:
: compilation ( comp: true ; S: -- true ; | comp: false ; S: -- false ; ) state @ 0<> ;
: enter-compilation ( comp: false -- true ; S: -- ; | comp: true ; S: -- ; ) ] ;
: leave-compilation ( comp: true -- false ; S: -- ; | comp: false ; S: -- ; ) postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
compilation 0= if execute exit then
leave-compilation execute enter-compilation
;
: execute-compiling ( i*x xt -- j*x )
compilation if execute exit then
enter-compilation execute leave-compilation
;
@AntonErtl writes:
If you eliminate the state-dependence of translators, then text interpreters that use more than just the xt-int action (e.g., the one for colorforth-bw, see below) can be written without having to deal with state.
Token translators cannot be written without having to deal with state (possibly indirectly), by the definition of the term. A token translator shall perform different actions depending on the state, and it does not matter how the state is passed to the translator: through the data stack, through a separate stack intended for this purpose, or through an internal variable. The state does not matter in only one case: if the translator shall perform the same action regardless of the state.
Moreover, if you pass a parameter that encodes compilation state or interpretation state other than through STATE, you have to keep STATE in sync with this parameter to guarantee that STATE-dependent words are translated correctly.
requestClarification - The case of undefined interpretation semantics
The initial problem in [DEFINED] was fixed by the proposal Remove the “rules of FIND”. Close.
5.6.2.2534 [UNDEFINED] should also be updated according to the new wording in [DEFINED].
Minimalistic Recognizer API
Author:
Bernd Paysan
Change Log:
- 2020-09-06 initial version
- 2020-09-08 taking ruv's approach and vocabulary at translators
- 2020-09-08 replace the remaining rectypes with translators
- 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
- 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
- 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
- 2022-09-10 More complete reference implementation
- 2022-09-10 Add use of extended words in reference implementation
- 2022-09-10 Typo fixed
- 2022-09-12 Fix for search order reference implementation
- 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
- 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
- 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
- 2023-09-13 Make clear that TRANSLATE: is the only way to define a standard-conforming translator.
- 2023-09-15 Add list of example recognizers and their names.
- 2024-12-15 Take comments after freezing the proposal into account
Problem
The Forth compiler can be extended easily. The Forth interpreter, however, has a fixed set of capabilities as outlined in section 3.4 of the standard text: words from the dictionary and some number formats.
It's not possible to use the Forth text interpreter in an application or system extension context. Most interpreters in existing systems use a number of hooks to extend the interpreter. That makes it possible to use a loadable library to implement new data types to be handled like the built-in ones. An example is floating-point numbers: they have their own parsing and data handling words, including a stack of their own.
Furthermore, applications need to use system-provided and system-specific words or have to re-invent the wheel to get numbers with a sign or hex numbers with the $ prefix. The building blocks (FIND, COMPILE,, >NUMBER etc.) are available, but there is a gap between them and what the Forth interpreter already does.
The Forth interpreter is stateful, but the API should avoid the problems of the STATE variable. In particular, an implementation without STATE should be possible, and there is only one place where the stateful dispatch is necessary.
Solution
The monolithic design of the Forth interpreter is factored into three major blocks:
- The interpreter. It extracts sub-strings (lexemes) from SOURCE, hands them over to the data parsing, and processes the results.
- The actual data parsing. It analyses lexemes to determine whether they match the criteria for a certain token type. These words, called recognizers, can be grouped to achieve an order of invocation.
- The result of the recognizer, a translator and associated data, which is handed over to the interpreter.
There is no strict 1:1 relation between a recognizer and the returned translator. A translator for e.g. single-cell numbers can be used by different recognizers, and a recognizer can return different translators (e.g. for single- and double-cell numbers).
Whenever the Forth text interpreter is mentioned, the standard words EVALUATE (CORE), ' (tick, CORE), INCLUDE-FILE (FILE), INCLUDED (FILE), LOAD (BLOCK) and THRU (BLOCK) are expected to act likewise. This proposal is not about changing these words, but about providing the tools to do so. As long as the standard feature set is used, a complete replacement with recognizers is possible.
Important changes to the Matthias Trute proposal:
- Make the translators executable to dispatch according to the state (interpreting, compiling, postponing) themselves
- Use dedicated invocation methods to call a translator for a particular state
- Make the recognizer sequence executable with the same effect as a recognizer
- Make sure the API is not mandating any particular implementation
The core principle is that the recognizer is not aware of state, and the returned translator is. If, for some reason, you have legacy code that looks like
: recognize-xt ( addr u -- translator-stub | 0 )
here place here find dup IF
0< state @ and IF compile, ELSE execute THEN ['] noop
ELSE nip THEN ;
then you should factor out the part that uses STATE @ and return it as a translator:
: translate-xt ( xt flag -- )
0< state @ and IF compile, ELSE execute THEN ;
: recognize-xt ( addr u -- ... translator | 0 )
here place here find dup IF ['] translate-xt ELSE nip THEN ;
In a second step, you need to remove the STATE @ entirely and use TRANSLATE:. If you don't know what to do on postpone at this stage, use -48 throw; otherwise define a postpone action:
:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF compile, ELSE execute THEN ;
:noname ( xt flag -- ) 0< IF postpone literal postpone compile, ELSE compile, THEN ;
translate: translate-xt
Typical use
The standard interpreter loop should look like this:
: interpret ( i*x -- j*x )
BEGIN parse-name dup WHILE forth-recognize ?found execute REPEAT
2drop ;
with the usual additions to check e.g. for empty stacks and such.
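As a self-contained illustration of this loop, the following sketch models the recognizer/translator split with hypothetical DEMO- names (so it does not shadow a system's own FORTH-RECOGNIZE or ?FOUND). It assumes the NOTFOUND=0 convention, uses a single recognizer for unsigned numbers in the current BASE, and reads its lexemes from the rest of the input line.

```forth
\ Hypothetical stand-in for a number translator (interpretation only).
: demo-translate-keep ( n -- n ) ;

\ Hypothetical recognizer for unsigned numbers in BASE (NOTFOUND=0).
: demo-rec-unsigned ( addr u -- n xt | 0 )
  0. 2swap >number nip IF 2drop 0 ELSE drop ['] demo-translate-keep THEN ;

\ Like ?FOUND: pass a translator through, or THROW -13 on 0.
: demo-?found ( xt -- xt | 0 -- never ) dup 0= IF -13 throw THEN ;

\ The loop above, fed from the rest of the current input line.
: demo-interpret ( "lexeme ..." -- i*x )
  BEGIN parse-name dup WHILE demo-rec-unsigned demo-?found execute REPEAT 2drop ;

demo-interpret 10 20 30
+ + .   \ prints 60
```

Swapping DEMO-REC-UNSIGNED for a recognizer sequence is all it takes to accept more token types; the loop itself never looks at STATE.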
To operate a recognizer in a particular state, e.g. to postpone a single word, use:
: postpone ( "name" -- )
parse-name forth-recognize ?found postponing ; immediate
To obtain an xt for a name, use something like this:
: ' ( "name" -- xt )
parse-name forth-recognize ?found
['] translate-nt <> #-32 and throw
name>interpret ;
Proposal:
XY. The optional Recognizer Wordset
XY.1 Introduction
Recognizers have the form
REC-SOMETYPE ( addr len -- i*x j*r translate-xt | 0/NOTFOUND )
A recognizer takes the string addr len of a lexeme and, on success, returns a translator translate-xt and additional data on the data and floating-point stacks.
[IF] NOTFOUND=0
If it fails, it returns 0.
[ELSE] NOTFOUND=xt
If it fails, it returns the xt of NOTFOUND.
For clarity, until this issue is decided, the non-success return value of a recognizer is notated as 0/NOTFOUND. The reference implementation uses the option 0.
[THEN] notfound
[IF] side-effect
A recognizer shall not have a side effect.
Rationale: Side effects are supposed to happen only inside the translators. This guarantee makes it possible to try to recognize something and fail if the result is not desired, without having to roll back unknown changes. Examples: the tick and TO recognizers pass a substring of the string to be translated to FORTH-RECOGNIZE, and fail if the result is not a name type.
[THEN] side-effect
XY.3 Additional usage requirements
XY.3.1 Translator
translator: named subtype of xt; it executes with the following stack effect:
name ( j*x i*x -- k*x )
A translator xt interprets, compiles or postpones the action of the thing, according to the state the system is in.
i*x is the additional information provided by the recognizer; j*x and k*x are the stack inputs and outputs of interpreting, compiling or postponing the recognized lexeme.
XY.6 Glossary
XY.6.1 Recognizer Words
FORTH-RECOGNIZE ( addr len -- i*x translator-xt | 0/NOTFOUND ) RECOGNIZER
Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or 0/NOTFOUND if not.
[IF] defer
FORTH-RECOGNIZE is a deferred word. Changing the system recognizer can be done with IS FORTH-RECOGNIZE; the system recognizer is obtained with ACTION-OF FORTH-RECOGNIZE.
Rationale: use the existing API to change it; most simple systems have this available, and advanced systems have capabilities to work around limitations.
[ELSE] setter and getter
SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT
Assign the recognizer xt to FORTH-RECOGNIZE.
FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT
Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.
Rationale: systems that are not sufficiently advanced can work around the limitations of IS and ACTION-OF better with this API.
[THEN]
TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER
Create a translator word under the name "name". This word is the only standard way to define a general purpose translator.
"name:" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current state.
Rationale: By far the most common usage of translators is inside the outer interpreter, and this default mode of operation is called by EXECUTE to keep the API small. You cannot simply set STATE, use EXECUTE and afterwards restore STATE to perform interpretation or compilation semantics, because words can change STATE, so you need the words INTERPRETING and COMPILING defined below. This problem does not apply to POSTPONING, so systems that only want to implement direct access to POSTPONE mode can get away without TRANSLATE:.
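A minimal sketch of the table approach behind TRANSLATE: and the direct-access methods, assuming a hypothetical mode cell DEMO-MODE in place of the system state (0 interpreting, 1 compiling, 2 postponing). All DEMO- names are illustrative, and the three actions merely add 0, 100 or 200 so that the selected slot is visible; real translators would interpret, compile or postpone. Note that the direct-access method selects a slot without touching DEMO-MODE, which is exactly what setting and restoring STATE around EXECUTE cannot guarantee.

```forth
\ Hypothetical mode cell standing in for the system state.
variable demo-mode  0 demo-mode !

\ Sketch of TRANSLATE:: store three xts, dispatch on DEMO-MODE when executed.
: demo-translate: ( xt-int xt-comp xt-post "name" -- )
  create rot , swap , ,                  \ layout: xt-int, xt-comp, xt-post
  does> ( j*x i*x a-addr -- k*x ) demo-mode @ cells + @ execute ;

\ Sketch of INTERPRETING / COMPILING / POSTPONING: pick a slot directly.
: demo-method: ( n "name" -- )
  create ,
  does> ( j*x i*x xt a-addr -- k*x ) @ cells swap >body + @ execute ;
0 demo-method: demo-interpreting
1 demo-method: demo-compiling
2 demo-method: demo-postponing

\ Stand-in actions that only tag the value, so the chosen slot is visible.
:noname ( n -- n ) ;
:noname ( n -- n' ) 100 + ;
:noname ( n -- n' ) 200 + ;
demo-translate: demo-translate-tag

5 demo-translate-tag .                     \ prints 5   (DEMO-MODE is 0)
1 demo-mode !  5 demo-translate-tag .      \ prints 105
0 demo-mode !
5 ' demo-translate-tag demo-postponing .   \ prints 205, DEMO-MODE untouched
```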
[IF] NOTFOUND=0
?FOUND ( translator-xt -- translator-xt | 0 -- never ) RECOGNIZER
Check if the recognizer was successful, and if not, perform a -13 THROW or display an appropriate error message if the exception wordset is not present.
[THEN] NOTFOUND=0
XY.6.2 Recognizer Extension Words
[IF] NOTFOUND=0
?NOTFOUND ( translator-xt -- translator-xt | 0 -- addr u notfound-xt )
Check if the recognizer was successful. If not, replace the 0 result with the addr u of the last scanned lexeme, and put the xt of the NOTFOUND translator on top of the stack.
NOTFOUND ( -- never ) RECOGNIZER
Translator for unsuccessful recognizers: perform a -13 THROW.
[THEN] NOTFOUND=0
POSTPONE ( "<spaces>lexeme" -- ) RECOGNIZER
Compilation: recognize lexeme. On success, perform the postpone action of the returned translator; otherwise perform -13 THROW or display the appropriate error message if the exception wordset is not present.
RECOGNIZER-SEQUENCE: ( xt1 .. xtn n "name" -- ) RECOGNIZER EXT
Create a named recognizer sequence under the name "name", which, when executed, tries to recognize a string with the recognizers, starting with xtn (top of stack) and proceeding towards xt1 until one succeeds.
SET-RECOGNIZER-SEQUENCE ( xt1 .. xtn n xt-seq -- ) RECOGNIZER EXT
Set the recognizer sequence of xt-seq to xt1 .. xtn.
GET-RECOGNIZER-SEQUENCE ( xt-seq -- xt1 .. xtn n ) RECOGNIZER EXT
Obtain the recognizer sequence from xt-seq as xt1 .. xtn n.
TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT
Translates a name token:
Interpretation: perform the interpretation semantics of the word
Compilation: perform the compilation semantics of the word
Postpone: append the compilation semantics above to the current definition
REC-NT ( addr u -- nt translate-nt | 0/NOTFOUND ) RECOGNIZER EXT
Search the dictionary for the string addr u. If successful, return the nt and the xt of TRANSLATE-NT. If the search fails, return 0/NOTFOUND.
TRANSLATE-NUM ( x -- x | ) RECOGNIZER EXT
Translates a number:
Interpretation: keep the number on the stack
Compilation: Append the run-time defined in LITERAL to the current definition
Postpone: Append the compilation semantics above to the current definition
TRANSLATE-DNUM ( x1 x2 -- x1 x2 | ) RECOGNIZER EXT
Translates a double number:
Interpretation: keep the numbers on the stack
Compilation: Append the run-time defined in 2LITERAL to the current definition
Postpone: Append the compilation semantics above to the current definition
REC-NUM ( addr u -- x translate-num | xd translate-dnum | 0/NOTFOUND ) RECOGNIZER EXT
Convert addr u to a number x and the xt of TRANSLATE-NUM as specified in 3.4.1.3, or to a double number xd and the xt of TRANSLATE-DNUM as specified in 8.3.1 if the double number wordset is available. If the conversion fails, return 0/NOTFOUND.
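The REC-NUM behaviour can be sketched as follows. This is an illustration under stated assumptions, not a conforming implementation: it handles only an optional leading - and a trailing . for double numbers in the current BASE, ignores the base prefixes and character literals of 3.4.1.3, uses NOTFOUND=0, and returns hypothetical stand-in translators instead of TRANSLATE-NUM and TRANSLATE-DNUM.

```forth
\ Hypothetical stand-ins for TRANSLATE-NUM and TRANSLATE-DNUM (no real dispatch).
: demo-translate-num  ( x -- x ) ;
: demo-translate-dnum ( x1 x2 -- x1 x2 ) ;

\ Sketch of REC-NUM (NOTFOUND=0): optional leading '-', optional trailing '.'
\ for a double number; base prefixes ($ # % 'c') are not handled.
: demo-rec-num ( addr u -- x xt | xd xt | 0 )
  dup 0= IF 2drop 0 EXIT THEN
  over c@ '-' = >r                     \ R: true if the lexeme is negative
  r@ IF 1 /string THEN
  dup 0= IF 2drop r> drop 0 EXIT THEN  \ a lone "-" is not a number
  over c@ '.' = IF 2drop r> drop 0 EXIT THEN  \ no digit before the "."
  0. 2swap >number                     ( ud addr2 u2 )
  dup 0= IF                            \ every character was a digit
    2drop r> IF dnegate THEN drop ['] demo-translate-num EXIT
  THEN
  1 = IF                               \ exactly one character left over
    c@ '.' = IF r> IF dnegate THEN ['] demo-translate-dnum EXIT THEN
  ELSE drop THEN
  2drop r> drop 0 ;                    \ not a number: discard ud and the sign

s" -42"  demo-rec-num ' demo-translate-num  = . .   \ prints -1 -42
s" 123." demo-rec-num ' demo-translate-dnum = . d.  \ prints -1 123
s" 12x"  demo-rec-num .                             \ prints 0
```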
TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT
Translates a floating point number:
Interpretation: Keep r on the stack
Compilation: Append the run-time defined in FLITERAL to the current definition
Postpone: Append the compilation semantics above to the current definition
REC-FLOAT ( addr u -- r translate-float | 0/NOTFOUND ) RECOGNIZER EXT
Convert addr u to a floating-point number r as specified in 12.3.7 and return r and the xt of TRANSLATE-FLOAT, if the float wordset is available; if the conversion fails, return 0/NOTFOUND.
SCAN-TRANSLATE-STRING ( addr1 u1 string-rest<"> -- addr2 u2 | ) RECOGNIZER EXT
Complete parsing a string: addr1 u1 consists of the starting quote and additional characters up to the first space in the string. addr2 u2 consists of the entire string without the starting quote, up to (but not including) the final quote, with the escape sequences translated according to the rules of S\". >IN is modified appropriately and points just after the final quote. If there is no final quote in the current line, REFILL can be used to read in more lines, adding corresponding newlines to the string. The final quote can be inside addr1 u1; in that case >IN is set backwards.
Translate the string:
Interpretation: keep the string on the stack
Compilation: Append the run-time defined in SLITERAL to the current definition
Postpone: Append the compilation semantics stated above to the current definition
TRANSLATE-STRING ( addr1 u1 -- addr1 u1 | ) RECOGNIZER EXT
Translate the string:
Interpretation: keep the string on the stack
Compilation: Append the run-time defined in SLITERAL to the current definition
Postpone: Append the compilation semantics stated above to the current definition
?SCAN-STRING ( addr1 u1 scan-translate-string string-rest<"> -- addr2 u2 translate-string | ... translator -- ... translator ) RECOGNIZER
If the recognized token is an incomplete string, complete the scanning as defined for SCAN-TRANSLATE-STRING and replace the translator with the xt of TRANSLATE-STRING.
REC-STRING ( addr u -- addr u translate-string | 0/NOTFOUND ) RECOGNIZER EXT
Check if addr u starts with a quote, and return that string and the xt of SCAN-TRANSLATE-STRING if it does, 0/NOTFOUND otherwise.
[IF] Optional API for direct access of translator states
INTERPRETING ( j*x xt -- k*x ) RECOGNIZER EXT
Execute xt-int of the translator xt. If xt is not a translator, do -21 THROW, or make a best-effort attempt to execute xt in interpreting state.
COMPILING ( j*x xt -- ) RECOGNIZER EXT
Execute xt-comp of the translator xt. If xt is not a translator, do -21 THROW, or make a best-effort attempt to execute xt in compiling state.
POSTPONING ( j*x xt -- ) RECOGNIZER EXT
Execute xt-post of the translator xt. If xt is not a translator, do -21 THROW, or make a best-effort attempt to execute xt in postponing state.
GET-STATE ( -- xt ) RECOGNIZER EXT
Obtain the operation xt performed when translating.
SET-STATE ( xt -- ) RECOGNIZER EXT
Make xt the operation performed when translating. If xt is not related to ' INTERPRETING, ' COMPILING, or ' POSTPONING, do -12 THROW.
[THEN] optional API for direct access of translator states
]] ( -- ) RECOGNIZER EXT
Interpretation semantics: undefined
Compilation semantics: Set the system into postpone state. The interpreter will then perform the xt-post of all translators found. Compilation state resumes when [[ is recognized. This word may change STATE and the recognizer sequence to reflect the change of this state.
[[ ( -- ) RECOGNIZER EXT
Interpretation semantics: undefined
Compilation semantics: undefined
Postpone semantics: enter compilation state, see ]; all changes to STATE and the recognizer sequence made by ]] are reverted.
Note: [[ needs special treatment in postpone mode, so it might also use a non-standard translator and might not be a word at all.
STATE ( -- addr ) RECOGNIZER
If ]] uses STATE to store postpone state, this extends the semantics of 6.1.2250 by adding a second non-zero value. ]] enters this state, and [[ leaves it. Only translators and the code responsible for displaying the prompt can see this third state, as all other words are postponed in this state.
Reference implementation:
This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation takes only interpretation and compilation state into account, and uses the STATE variable to distinguish them. It uses NOTFOUND=0.
Defer forth-recognize ( addr u -- i*x translator-xt / 0 )
: ?found ( translator -- translator | 0 -- never )
dup 0= IF -13 throw THEN ;
: interpret ( i*x -- j*x )
BEGIN
parse-name dup WHILE
forth-recognize ?found execute
REPEAT 2drop ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , ,
does> state @ 2 + cells + @ execute ;
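To see the table dispatch on its own, here is a minimal sketch that repeats TRANSLATE: and applies a toy translator in each of the three states. TRANSLATE-DEMO and the helper IN-STATE are made-up names for this illustration only. IN-STATE stores directly into STATE, which a standard program may not do, and, like the reference implementation, the sketch assumes STATE holds 0 (interpreting), -1 (compiling) or -2 (postponing).

```forth
\ TRANSLATE: as in the reference implementation: the xts end up in memory
\ as xt-postpone, xt-compile, xt-interpret, and STATE + 2 selects the cell.
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

\ Toy translator whose three actions just push a tag.
:noname ( -- n ) 1 ;   \ interpreting action
:noname ( -- n ) 2 ;   \ compiling action
:noname ( -- n ) 3 ;   \ postponing action
translate: translate-demo

\ Hypothetical helper, not part of the proposal: execute xt with STATE
\ temporarily set to n, then restore the previous STATE.
: in-state ( i*x xt n -- j*x )
  state @ >r  state !  execute  r> state ! ;
```

For instance, ' translate-demo -1 in-state leaves 2, the tag of the compiling action, because -1 + 2 selects the middle cell.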
An alternative implementation of TRANSLATE: can use a deferred word:
Defer do-translate
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
create , , , does> do-translate ;
: set-state ( xt -- ) dup is do-translate >body @ 2 - state ! ;
: get-state ( -- xt ) action-of do-translate ;
Extensions reference implementation:
: ]] -2 state ! ; immediate
: [[ -1 state ! ; immediate
: lit, ( n -- ) postpone literal ; \ defined first, the postpone action below uses it
:noname name>interpret execute ;  \ interpreting
:noname name>compile execute ;    \ compiling
:noname dup name>interpret ['] [[ = \ postponing
IF name>interpret execute \ special case: [[ ends postpone state
ELSE name>compile swap lit, compile, THEN ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )
: rec-nt ( addr u -- nt nt-translator | 0 )
forth-wordlist find-name-in dup IF ['] translate-nt THEN ;
: rec-num ( addr u -- n num-translator | 0 )
0. 2swap >number nip IF 2drop 0 ELSE drop ['] translate-num THEN ;
: minimal-recognize ( addr u -- nt nt-translator | n num-translator | 0 )
2>r 2r@ rec-nt dup 0= IF drop 2r@ rec-num THEN 2r> 2drop ;
' minimal-recognize is forth-recognize
: translate-method: ( n "name" -- )
Create , DOES> ( i*x translator-xt -- j*x ) @ cells swap >body + @ execute ;
0 translate-method: postponing
1 translate-method: compiling
2 translate-method: interpreting
: set-state ( xt -- )
>body @ 2 - state ! ;
: get-state ( -- xt )
case state @
0 of ['] interpreting endof
-1 of ['] compiling endof
-2 of ['] postponing endof
-11 throw
endcase ;
: postpone ( "name" -- )
parse-name forth-recognize ?found postponing ; immediate
This reference implementation uses table dispatch only. Note that this can give surprising results when you apply a particular state directly and one of the words executed (the translator or the nt/xt found) is state-smart. If you want to use combined translators like
: translate-dnum ( d -- ) >r translate-num r> translate-num ;
you can't do it this way. Nor does this work if you execute state-smart words, as they expect STATE to be set accordingly. Instead, use something like
: translate-method: ( n "name" -- )
  Create ,
  DOES> ( i*x translator-xt -- j*x )
    @ 2 -                               \ 0/1/2 -> STATE value -2/-1/0
    dup state @ = IF drop execute EXIT THEN
    state @ >r  state !  execute  r> state ! ;
This will definitely work for combined literal translators, because those don't change state anyway.
This will also work for POSTPONE, because apart from the translator, no word is actually executed in one-shot POSTPONE, and therefore no state change is possible.
This will also work for [ and ] (and words using them) while interpreting and compiling. If you are already in the state the word switches away from, STATE is not saved and restored, so the switch persists. If you are in the state the word switches to, it works too, because STATE is restored after EXECUTE. It will not work if you are interpreting and execute s" ]]" forth-recognize ?found compiling, because ]] switches to postponing state, and STATE is then reverted to interpreting.
[IF] setter and getter
: set-forth-recognize ( xt -- )
is forth-recognize ;
: forth-recognizer ( -- xt )
action-of forth-recognize ;
[THEN] setter and getter
Stack library
: STACK: ( size "name" -- )
CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
2DUP ! CELL+ SWAP CELLS BOUNDS
?DO I ! CELL +LOOP ;
: GET-STACK ( stack-id -- item-n .. item-1 n )
DUP @ >R R@ CELLS + R@ BEGIN
?DUP
WHILE
1- OVER @ ROT CELL - ROT
REPEAT
DROP R> ;
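The sketch below checks the cell layout of the stack library. It repeats the three definitions so that it loads on its own (BOUNDS and CELL are not in Forth-2012, so a system such as Gforth that provides them is assumed) and round-trips three items through DEMO-STACK, a name made up for this example.

```forth
\ Stack library as defined above, repeated so this block is self-contained.
: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;
: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;
: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

\ Hypothetical example stack with room for 3 items.
3 STACK: demo-stack
\ item-3 = 10 (deepest) ... item-1 = 30 (top of the data stack)
10 20 30 3 demo-stack SET-STACK
\ Layout: cell 0 holds the count 3, cell 1 holds item-1 (30),
\ cell 3 holds item-3 (10).
```

Since item-1, the top of the data stack, is stored in the first cell after the count, GET-STACK walks the cells from the last one down and hands the items back in their original order: 10 20 30 3.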
Recognizer sequences
: recognize ( addr len rec-seq-id -- i*x translator-xt | 0 )
DUP >R @
BEGIN
DUP
WHILE
DUP CELLS R@ + @
2OVER 2>R SWAP 1- >R
EXECUTE DUP IF
2R> 2DROP 2R> 2DROP EXIT
THEN
DROP R> 2R> ROT
REPEAT
DROP 2DROP R> DROP 0
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
DOES> recognize ;
: ?defer@ ( xt1 -- xt2 )
BEGIN dup is-defer? WHILE defer@ REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- )
?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n )
?defer@ >body get-stack ;
Once you have recognizer sequences, define
' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize
The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. To do so, wordlists are defined so that a wid is an execution token that searches the wordlist and returns the nt together with the appropriate translator, or 0 if the name is not found.
: find-name-in ( addr u wid -- nt / 0 )
execute dup IF drop THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
['] search-order set-recognizer-sequence ;
Recognizer examples
Apart from the standardized recognizers above, here are some more examples of recognizers:
REC-TICK ( addr u -- xt translate-num | 0/NOTFOUND ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.
REC-SCOPE ( addr u -- nt translate-nt | 0/NOTFOUND ) Search for words in specified vocabularies (the vocabulary itself must be found in the current search order); the string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched, REC-SCOPE is identical in effect to REC-NT.
REC-TO ( addr u -- xt n translate-to | 0/NOTFOUND ) Handle the following syntax of TO-like operations of value-like words:
->name as TO name
=>name as IS name
+>name as +TO name
'>name as ADDR name
@>name as ACTION-OF name
xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.
REC-ENV ( addr u -- addr1 u1 translate-env | 0/NOTFOUND ) Takes a pattern of the form ${name} and provides the name as addr1 u1 on the stack. The corresponding translator TRANSLATE-ENV is responsible for looking up that name in the operating system's environment variables, or for compiling appropriate code to do so.
REC-COMPLEX ( addr u -- rr ri translate-complex | 0/NOTFOUND ) Converts a pair of floating-point numbers of the form float1+float2i into a complex number on the stack, and returns the xt of TRANSLATE-COMPLEX on success.
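Two of the recognizers above, REC-TICK and REC-ENV, can be sketched in a few lines each. The block repeats the table-dispatch TRANSLATE: and TRANSLATE-NUM so that it loads on its own (with :NONAME ; standing in for the non-standard NOOP), and it assumes Gforth's FIND-NAME ( c-addr u -- nt | 0 ) and GETENV ( c-addr1 u1 -- c-addr2 u2 ). TRANSLATE-ENV here is a hypothetical translator: it looks the variable up when interpreting, compiles a lookup when compiling, and simply does -21 THROW when postponing. Neither sketch is meant to reproduce Gforth's own versions of these recognizers.

```forth
\ Table-dispatch TRANSLATE: and LIT, from the reference implementation.
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: lit, ( x -- ) postpone literal ;

\ TRANSLATE-NUM as in the reference implementation.
:noname ( x -- x ) ;                    \ interpreting: leave x
' lit,                                   \ compiling: compile x as a literal
:noname ( x -- ) lit, postpone lit, ;   \ postponing
translate: translate-num

\ REC-TICK sketch: "`name" -> xt translate-num, anything else -> 0.
: rec-tick ( addr u -- xt translate-num | 0 )
  dup 2 < IF 2drop 0 EXIT THEN              \ need the backtick plus a name
  over c@ [char] ` <> IF 2drop 0 EXIT THEN  \ must start with a backtick
  1 /string find-name dup 0= IF EXIT THEN   \ unknown name: leave 0
  name>interpret dup 0= IF EXIT THEN        \ no interpretation semantics
  ['] translate-num ;

\ Hypothetical TRANSLATE-ENV built on Gforth's GETENV.
:noname ( addr u -- addr2 u2 ) getenv ;                    \ interpreting
:noname ( addr u -- ) postpone sliteral postpone getenv ;  \ compiling
:noname ( addr u -- ) 2drop -21 throw ;                    \ postponing: unsupported
translate: translate-env

\ REC-ENV sketch: "${name}" -> addr1 u1 translate-env, anything else -> 0.
: rec-env ( addr u -- addr1 u1 translate-env | 0 )
  dup 4 < IF 2drop 0 EXIT THEN                   \ shortest match is "${x}"
  over 2 s" ${" compare IF 2drop 0 EXIT THEN     \ must start with "${"
  2dup + 1- c@ [char] } <> IF 2drop 0 EXIT THEN  \ must end with "}"
  2 /string 1- ['] translate-env ;               \ strip "${" and "}"
```

Both recognizers follow the NOTFOUND=0 convention of the reference implementation: on failure they consume the string and leave a single 0, so they can be placed in a recognizer sequence next to REC-NT and REC-NUM.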
Testing
T{ 0 recognizer-sequence: RS -> }T
T{ :noname 1 ; :noname 2 ; :noname 3 ; translate: translate-1 -> }T
T{ :noname 10 ; :noname 20 ; :noname 30 ; translate: translate-2 -> }T
\ deliberately trivial recognizers: rec-1 matches 1-character strings, rec-2 2-character strings
T{ : rec-1 NIP 1 = IF ['] translate-1 ELSE 0 THEN ; -> }T
T{ : rec-2 NIP 2 = IF ['] translate-2 ELSE 0 THEN ; -> }T
T{ ' translate-1 interpreting -> 1 }T
T{ ' translate-1 compiling -> 2 }T
T{ ' translate-1 postponing -> 3 }T
\ set and get methods
T{ 0 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> 0 }T
T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 1 }T
T{ ' rec-1 ' rec-2 2 ' RS set-recognizer-sequence -> }T
T{ ' RS get-recognizer-sequence -> ' rec-1 ' rec-2 2 }T
\ testing RECOGNIZE
T{ 0 ' RS set-recognizer-sequence -> }T
T{ S" 1" RS -> 0 }T
T{ ' rec-1 1 ' RS set-recognizer-sequence -> }T
T{ S" 1" RS -> ' translate-1 }T
T{ S" 10" RS -> 0 }T
T{ ' rec-2 ' rec-1 2 ' RS set-recognizer-sequence -> }T
T{ S" 10" RS -> ' translate-2 }T