Digest #232 2023-09-16

Contributions

[308] 2023-09-15 08:23:13 PhillipEaton wrote:

proposal - Test

Hint: Please delete the blockquote explanations, they are just for your convenience while writing the proposal

Author:

The name of the author(s) of the proposal.

Change Log:

A list of changes to the last published edition on the proposal.

Problem:

This states what problem the proposal addresses.

Solution:

A short informal description of the proposed solution to the problem identified by the proposal.

This gives the rationale for specific decisions you have taken in the proposal (often in response to comments), or discusses specific issues that have not been decided yet.

Typical use: (Optional)

Shows a typical use of the word or feature proposed; this should make the formal wording easier to understand.

Proposal:

This should enumerate the changes to the document.

For the wording of word definitions, use existing word definitions as a template. Where possible, include the rationale for the definition.

Reference implementation:

This makes it easier for system implementors to adopt the proposal. Where possible, the reference implementation should be provided in standard Forth. Where this is not possible because system specific knowledge is required or non-standard words are used, this should be documented.

Testing: (Optional)

This should test the words or features introduced by the proposal, in particular, it should test boundary conditions. Test cases should work with the test harness in Appendix F.

Replies

[r1044] 2023-09-14 05:19:08 AntonErtl replies:

proposal - Update rationale for SLITERAL

As you note, this state-smart approximation to the implementation of S" does not quite implement what the standard requires, so it would be better to do something like:

Typical use:

: quine [ source ] sliteral type ;

Note that you can use 2literal instead of sliteral if the string has permanent lifetime.
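The same technique works for any string that is available while the definition is being compiled. A minimal sketch, assuming the interpretive S" of the FILE word set is available; GREETING is a hypothetical name used only for illustration:

```forth
\ Sketch: embed a string known at compile time into a definition.
\ GREETING is a hypothetical name, not part of any proposal.
: greeting ( -- c-addr u )
  [ s" hello" ]     \ interpretive S" leaves c-addr u in a transient buffer
  sliteral ;        \ SLITERAL copies the string into the definition

greeting type       \ displays: hello
```

Because SLITERAL copies the characters, the compiled string stays valid after the transient buffer is reused.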


[r1045] 2023-09-14 05:35:33 AntonErtl replies:

proposal - Fix stack comments for N>R and NR>

Actually, reading through the discussion of the request, Jim Peterson wants to implement N>R as copying to a separate buffer and leaving the address on the return stack. This kind of implementation is suggested by the rationale of N>R.


[r1046] 2023-09-14 06:27:22 AntonErtl replies:

proposal - minimalistic core API for recognizers

By comparison, with the first version of this proposal postpone can be implemented like this:

: postpone parse-name forth-recognize -2 swap execute ; immediate

which would not contain non-standard usage like -2 state !, and it would also work in interpret state (not the most important feature, but a feature nonetheless). And ]] could also be implemented as a standard program.

I don't want to restrict the usage of rectypes/translators to state-dependent outer interpreters. Other uses may be rare, but they exist, and people may come up with more over time if we make the interface flexible enough. The proposal does not propose to standardize state-independent ways to get at the functionality. Therefore, if the proposal is accepted, they don't exist for standard programs, and therefore they are not counterarguments against the disadvantages of the proposed state-dependent-only translators. The fact that this state-dependence means that you cannot use rectypes/translators to build other rectypes/translators is another (minor) argument against the state-dependence.

Concerning having a state-independent rectype as an abstract data type, the first version of this proposal proposed that a rectype is an executable word with stack effect ( i*x state -- j*x ) where state would be 0, -1, or -2. This does not expose anything about the internals, and even allows defining rectypes without using a special defining word. The invocation in the text interpreter is ( i*x rectype ) state @ swap execute, and in postpone it's as shown above.
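As an illustration of the executable-rectype idea, here is a minimal sketch of a rectype for single-cell numbers. RECTYPE-NUM, FIVE and T1 are hypothetical names, and the encoding 0 = interpret, -1 = compile, -2 = postpone is taken from the paragraph above:

```forth
\ Sketch of a rectype as an ordinary colon definition with the
\ stack effect ( i*x state -- j*x ); all names are hypothetical.
: lit, ( n -- )  postpone literal ;

: rectype-num ( n state -- n | )
  case
     0 of                      endof  \ interpret: leave n on the stack
    -1 of  lit,                endof  \ compile: compile n as a literal
    -2 of  lit, postpone lit,  endof  \ postpone: compile code that compiles n
  endcase ;

\ Usage: an immediate word that compiles the literal 5 via the rectype.
: five ( -- )  5 -1 rectype-num ; immediate
: t1 ( -- 5 )  five ;
```

No defining word is needed here: the rectype is just an xt, and the caller picks the mode by passing the state value explicitly.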

Alternatively, if the rectype is the address of some data structure, yes, we would need an additional word, maybe rectype-translate ( i*x rectype n -- j*x ) that performs the access to the data structure. The usage in the text interpreter would be ( i*x rectype ) state @ rectype-translate and the usage in postpone would be ( i*x rectype ) -2 rectype-translate.


[r1047] 2023-09-14 07:00:33 ruv replies:

proposal - minimalistic core API for recognizers

Bernd writes:

The most obvious difference is that with translator-execute, you need another word.

Yes, essentially I agree concerning translator-execute and execute alternatives.

Yet another difference is that with translator-execute the Forth text interpreter (the outer loop) has to know this additional word, which probably means a higher degree of coupling. With execute, it does not need to know any additional word.

to make word list ids executable [...] but it is clear that they can't be normal colon definitions

Another example is defer-words (words created by defer), which are executable but are not normal colon definitions — defer! and defer@ can be applied to their xt.
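A small sketch of that point, using only the standard words DEFER, DEFER! and DEFER@; GREET, HELLO and GREET-ACTION are hypothetical names:

```forth
\ Sketch: the xt of a deferred word is executable, and it is also
\ a handle that DEFER! and DEFER@ accept.
defer greet
: hello ( -- )  ." hello" ;

' hello ' greet defer!          \ set the action through the xt of GREET

: greet-action ( -- xt )        \ read the current action back
  ['] greet defer@ ;
```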

The following implementation should be standard, too

The provided implementation for ]] is system dependent, namely it depends on implementation of Recognizers API. But, anyway, Gforth's ]] can be implemented in a standard way via postpone.


[r1048] 2023-09-14 07:42:43 BerndPaysan replies:

proposal - minimalistic core API for recognizers

A translator is the address of a data structure, which also happens to be executable. This is not a contradiction! And there was a proposed standard way to access fields directly, renamed from the Trute proposal (but with otherwise identical, value-field like semantics) to INTERPRET-TRANSLATOR, COMPILE-TRANSLATOR, and POSTPONE-TRANSLATOR. The reason I deleted these is that we don't even use them in Gforth, we only use >POSTPONE, which has a different effect (it does not read out the xts, it executes it right away). If there is consensus that this is the right interface (not a value-field, but a defer-field), I can add this back to the proposal; as well as adding a standard way to set the state without knowing the internals of the system, for which the file recognizer-ext.fs in Gforth also provides a suggestion:

: translate-state ( translator-access-xt -- )
    \ takes a translator access xt, and may check if that actually is one
    >body @ cell/ negate state ! ;

The hypothetical more performant implementation in Reply 1043 would have a different translate-state, which would contain something like

>body @ ['] do-translate >body cell+ !

and only change STATE for interpret/compile.

This proposal is minimalistic on purpose and does not cover all corner cases, especially not those where no consensus has been reached yet.

I consider the magic number dispatch method proposed earlier as not appropriate: this is tied to a specific implementation, and not a good interface. Method invocation or field access should be done by named access words, not by numbers.


[r1049] 2023-09-14 08:00:39 ruv replies:

proposal - minimalistic core API for recognizers

Anton writes:

postpone can be implemented like this

postpone can be implemented in any variant of the Recognizer API, with more or less code.

A difference is whether the behavior of postpone can be extended/changed without redefinition of postpone.

My point: if users need to extend the behavior of postpone without redefinition, then a special method can be specified for that. OTOH, postpone (and ]]) is a poor man's "postponing mode". An example of a more convenient tool is my c-state PoC, which provides a better tool for users, and it even supports any new user-defined special words.

I don't want to restrict the usage of rectypes/translators to state-dependent outer interpreters.

It's not an argument, since the API can provide words like compile-token, execute-token, postpone-token, having ( i*x xt.translator -- j*x ) or ( i*x rectype -- j*x ), which are state-independent and don't restrict usage in the mentioned way.


[r1050] 2023-09-14 08:26:40 ruv replies:

proposal - DEFER this not :-)

@enoch wrote:

Readability of the source code is my main concern.

I rarely need a forward definition due to mutual recursion. And I also value readability.

My latest variant is to use recognizers to refer to a forward (not yet defined) definition, for example with a prefix fw:

: foo ... fw:bar ... ;
: bar ... foo ... ;

In this solution we don't need any separate definition.


[r1051] 2023-09-14 08:26:56 GeraldWodni replies:

proposal - DEFER this not :-)

The committee thinks this idea looks promising and that it pops up repeatedly in discussions. We encourage the author or anybody else interested to track past practice like f: and r: or forward <foo> and : <foo> and create a full proposal.


[r1052] 2023-09-14 08:36:39 GeraldWodni replies:

proposal - Directory experiemental proposal

The committee asks the author to please work the comments into your proposal and update it. Also please provide a full reference implementation.


[r1053] 2023-09-14 09:36:46 ruv replies:

proposal - Tighten the specification of SYNONYM (version 1)

Anton wrote

Letting all synonyms have the same xt is certainly a good solution. Should we require it? I don't know a reason for or against (in particular, I don't think that there is a system that has SYNONYM where synonyms of the same word have different xts), so in the spirit of tightening, we probably should require it.

This requirement is too restrictive for systems, without any benefit for programs.

In some systems the xt is equivalent to the nt for the same word, and the standard allows that. So, in such a system a synonym with a different name cannot have the same xt as the original word.

Also, without this requirement a system may have different xt for synonyms, which allows it to show a correct name in a decompilation tool.


[r1054] 2023-09-14 10:02:18 ruv replies:

proposal - Case sensitivity

I wrote above:

To support this feature, it is enough to just create the lower-case synonyms for the standard words.

I was wrong. Actually, it depends on the system. It's probably enough only if a synonym has the same xt as the original word. Otherwise a synonym cannot help.


In general, this proposal goes halfway towards encouraging systems to support lower-case spelling, as modern programming languages do.


[r1055] 2023-09-14 10:04:02 GeraldWodni replies:

proposal - Let us adopt the Gerry Jackson test suite as part of Forth 200x

The committee thinks that the test harness should be proposed as standard words and be part of an existing wordset (e.g. tools) or a new one (e.g. testing).

Furthermore the testcases shall be easily usable by systems and appear under the words they apply to. Existing testcases in the standard also will be moved into the new test-system.


[r1056] 2023-09-14 10:07:13 ruv replies:

proposal - Revise Rationale of Buffer:

Just for reference, see also the discussion concerning addresses between different runs from the same saved image. Probably, this issue should be discussed further and reflected in the rationale too.


[r1057] 2023-09-14 12:36:49 GeraldWodni replies:

proposal - Case sensitivity

The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.


[r1058] 2023-09-14 12:39:05 BerndPaysan replies:

proposal - F>R and FR> to support dynamically-scoped floating point variables

The brace notation (really {: and :}) replaced words like (LOCAL), and there, you can define floating point locals with a F: word before them, i.e. {: F: r1 F: r2 :} creates two floating point locals called r1 and r2.
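A minimal sketch of that notation, assuming a system that accepts F: inside {: ... :} (this is a system extension, not standard Forth-2012); FHYP is a hypothetical name:

```forth
\ Sketch: two floating-point locals taken from the FP stack.
\ Requires a system that supports F: in locals definitions.
: fhyp ( F: a b -- c )
  {: F: a F: b :}
  a a f*  b b f*  f+  fsqrt ;   \ sqrt(a*a + b*b)
```

With such locals the values stay accessible by name for the whole definition, so no F>R/FR> style juggling is needed in this case.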

Gforth, bigForth and VFX support this notation; SwiftForth doesn't.


[r1059] 2023-09-14 13:02:16 GeraldWodni replies:

proposal - Case insensitivity

The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.


[r1060] 2023-09-14 13:03:55 ruv replies:

proposal - Clarify FIND, more classic approach

All classic single-xt systems conform to this specification. And it's easy to implement the specified find in dual-xt systems.


[r1061] 2023-09-14 13:21:08 ruv replies:

proposal - NAME>INTERPRET wording

This proposal is not formal enough; it should be made more formal.


[r1062] 2023-09-14 14:05:21 ruv replies:

proposal - NAME>INTERPRET wording

Author

Ruv

Change Log

  • 2020-02-20 Initial comment for NAME>INTERPRET
  • 2023-09-14 Make this proposal more formal

Problem

Currently the specification for name>interpret says that returned "xt represents the interpretation semantics of the word nt".

But actually, in some cases a Forth system cannot provide an xt that performs the defined interpretation semantics for the corresponding word regardless of the STATE.

This is particularly the case when words like s" or to are implemented as STATE-dependent immediate words. Technically it is possible to return a correct xt according to the current specification (e.g. by generating the corresponding definition on the fly), but that can be too burdensome.

Another minor problem is that it's not clear what the word "represent" means. According to the language of the standard, an xt identifies some semantics.

Solution

The specification for name>interpret can be adjusted to solve the mentioned problem. There are two options:

  1. Allow returning 0 if the system cannot return an xt that identifies the interpretation semantics for the word identified by nt.

  2. Allow returning a state-dependent xt, which performs the interpretation semantics in interpretation state only.

Proposal

Replace the following phrase in the section 15.6.2.1909.20 NAME>INTERPRET:

xt represents the interpretation semantics of the word nt. If nt has no interpretation semantics, NAME>INTERPRET returns 0.

by the following phrase:

xt identifies the execution semantics for the word identified by nt. When this xt is executed in interpretation state, the interpretation semantics for the word is performed. If the system does not provide execution semantics for the word, NAME>INTERPRET returns 0.
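Under the proposed wording, a program would check for 0 and execute the returned xt in interpretation state only. A minimal sketch, assuming FIND-NAME ( c-addr u -- nt | 0 ) is available (it is not part of Forth-2012); INTERPRET-NAME is a hypothetical name:

```forth
\ Sketch: perform the interpretation semantics of a named word.
\ Intended to be called in interpretation state only.
: interpret-name ( i*x c-addr u -- j*x )
  find-name       dup 0= -13 and throw   \ -13: undefined word
  name>interpret  dup 0= -14 and throw   \ -14: interpreting a compile-only word
  execute ;

2 3 s" +" interpret-name .               \ displays: 5
```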


[r1063] 2023-09-14 14:11:05 StephenPelc replies:

proposal - 2020 Forth Standards meeting agenda


[r1064] 2023-09-14 14:28:07 ruv replies:

proposal - Tick and undefined execution semantics

A new idea to discuss

Initially I suggested to declare ambiguity when Tick is applied to any word for which are not defined both execution semantics and interpretation semantics.

But instead we can specify these cases precisely, to reduce ambiguity to some degree.

We can say that if execution semantics for the word are not specified by the standard, the returned xt identifies some system-dependent execution semantics, and when this xt is performed in interpretation state, the interpretation semantics for the word are performed. Performing this xt in compilation state is ambiguous.


[r1065] 2023-09-14 14:44:03 ruv replies:

proposal - Tick and undefined execution semantics

This idea is close to the proposal [212] Tick and undefined execution semantics - 2

It says: "If name has no execution semantics, the behavior of xt is implementation dependent and may lead to an ambiguous condition"

The difference is that I suggest to specify the behavior in interpretation state.

If the standard does not specify interpretation semantics for the word, then system-defined interpretation semantics are performed.


[r1066] 2023-09-14 14:52:11 StephenPelc replies:

proposal - 2021 Standards meeting agenda


[r1067] 2023-09-14 14:54:03 ruv replies:

proposal - Tick and undefined execution semantics - 2

Concerning this proposal in general: see also my comment about reducing ambiguity to some degree.


[r1068] 2023-09-14 14:55:13 UlrichHoffmann replies:

proposal - EMIT and non-ASCII values

The committee decided to put this proposal in formal state. The author decides when to put it into community vote.


[r1069] 2023-09-14 14:59:17 LeonWagner replies:

proposal - 2023 Standards meeting agenda (2023-09-13 to 2023-09-15)

Archived for posterity


[r1070] 2023-09-14 15:00:39 LeonWagner replies:

proposal - Agenda Forth-200x interim Meeting 2023-02-17T15:00Z

Archived for posterity


[r1071] 2023-09-14 15:00:57 LeonWagner replies:

proposal - 2022 Standards meeting agenda

Archived for posterity


[r1072] 2023-09-14 15:01:10 LeonWagner replies:

proposal - Agenda Forth-200x interim Meeting 2020-02-18T14:00Z

Archived for posterity


[r1073] 2023-09-14 15:01:45 ruv replies:

proposal - Better wording for "data field" term

In the version of 2022-09-16 we retain referencing of CREATE. But it adds some complexity for SYNONYM (proposal), since a synonym for CREATE is another word.


[r1074] 2023-09-14 15:06:56 UlrichHoffmann replies:

proposal - Better wording for "data field" term

The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.


[r1075] 2023-09-14 15:17:27 UlrichHoffmann replies:

proposal - Revert rewording the term "execution token"

The committee considers this proposal formal.


[r1076] 2023-09-14 15:18:24 UlrichHoffmann replies:

proposal - Clarification for execution token

The committee considers this proposal formal.


[r1077] 2023-09-14 15:33:36 GeraldWodni replies:

proposal - Obsolescence for SAVE-INPUT and RESTORE-INPUT

The committee considers this proposal formal and asks the author to change its status to "CfV - Call for Votes" whenever he deems it ready.

Note: The committee would like to point out that these words cannot be made informal, as they are used to implement interpreted loops.


[r1078] 2023-09-14 15:50:33 ruv replies:

proposal - Include a revised 79-STANDARD Specification for "><" To "Core Ext"

In networking code you need to switch the endianness of an integer of a particular width (in bits). But the cell size can vary.

Probably, we need such a word for each of the widths 16, 32, and 64 bits.
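A minimal sketch of what width-specific words could look like; BSWAP16 and BSWAP32 are hypothetical names, and BSWAP32 assumes a cell of at least 32 bits:

```forth
\ Sketch: byte swapping for a fixed width, independent of cell size.
\ Bits above the given width are ignored and come back as zero.
: bswap16 ( u1 -- u2 )
  dup 8 rshift $FF and          \ high byte moves down
  swap $FF and 8 lshift  or ;   \ low byte moves up

: bswap32 ( u1 -- u2 )
  dup           $FF and 24 lshift
  over  8 rshift $FF and 16 lshift or
  over 16 rshift $FF and  8 lshift or
  swap 24 rshift $FF and           or ;
```

For example, $1234 bswap16 gives $3412 and $11223344 bswap32 gives $44332211.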


[r1079] 2023-09-14 16:42:37 AntonErtl replies:

proposal - Fix stack comments for N>R and NR>

Author:

Anton Ertl, Leon Wagner

Change Log

  • 2023-09-14 Revision after discussion (AE)
  • 2023-09-13 Initial proposal

Problem:

The stack comments for N>R and NR> don't make it clear that n items are moved between the data and return stacks.

Solution:

The stack comments should more clearly indicate that n data stack items are moved to or from the return stack.

Proposal:

In the definition of N>R, replace

( i * n +n -- ) ( R: -- j * x +n )

with

( x_n ... x_1 +n -- ) ( R: -- j * x +n )

In the definition of NR>, replace

( -- i * x +n ) ( R: j * x +n -- )

with

( -- x_n ... x_1 +n ) ( R: j * x +n -- )

Discussion

On the return stack, the comment stays j*x +n, because the data may be kept in a separate buffer, with only the buffer address and +n on the return stack. +n remains on the return stack because the original specified that, and changing it would be a substantial change.

On the data stack x_n ... x_1 +n because that is the way we usually specify a numbered number of cells (even for +n=0). See, e.g., get-order.
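A minimal sketch of the behaviour the revised stack comments describe; N-ROUNDTRIP is a hypothetical name:

```forth
\ Sketch: N>R removes n items plus the count from the data stack,
\ and NR> brings them back in their original order.
: n-roundtrip ( x_n ... x_1 +n -- x_n ... x_1 +n )
  n>r      \ data stack no longer holds the items or the count
  nr> ;    \ items and count are restored

1 2 3 3 n-roundtrip   \ leaves 1 2 3 3
```

The program may not look at what N>R left on the return stack; only the pairing with NR> is specified.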


[r1080] 2023-09-15 07:51:05 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
  • 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
  • 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
  • 2023-09-13 Make clear that TRANSLATE: is the only way to define a standard-conforming translator.
  • 2023-09-15 Add list of example recognizers and their names.

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. So this proposal tries to create a very minimalistic API for a core recognizer, and allows more elaborate features to be implemented as extensions. The problem this proposal tries to solve is the same as with the original recognizer proposal; this proposal is therefore not a full proposal, but sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unclear about what to do on postpone in this stage, use -48 throw, otherwise define a postpone action:

:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF  compile,  ELSE  execute  THEN ;
:noname ( xt flag -- ) 0< IF  postpone literal postpone compile,  ELSE  compile,  THEN ;
translate: translate-xt

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt interprets, compiles or postpones the action of the thing according to the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a translator.

"name" ( j*x i*x -- k*x ) performs xt-int in interpretation state, xt-comp in compilation state and xt-post in postpone state, using a system-specific way to determine the current mode.

Rationale: The by far most common usage of translators is inside the outer interpreter, and this default mode of operation is called by EXECUTE to keep the API small. There may be other, non-standard modes of operation, where the individual component xts are accessed STATE-independently, which only works on translators created by TRANSLATE: (e.g. for implementing POSTPONE), so any other way to define a translator is non-standard.

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the current behavior instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.

RECOGNIZER-SEQUENCE: ( n*xt n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with the topmost xt on stack and proceeding towards the bottommost xt until successful.

SET-RECOGNIZER-SEQUENCE ( n*xt n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- n*xt n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as n*xt n.

TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT

Translates a name token.

TRANSLATE-NUM ( n -- n | ) RECOGNIZER EXT

Translates a number.

TRANSLATE-DNUM ( d -- d | ) RECOGNIZER EXT

Translates a double number.

TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT

Translates a floating point number.

TRANSLATE-STRING ( addr u -- addr u | ) RECOGNIZER EXT

Translates a string.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation only takes interpret and compile state into account, and uses the STATE variable to distinguish between them.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT  2drop ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap lit, compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognize is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: forth-recognizer ( -- xt )
  action-of forth-recognize ;

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;
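The following sketch shows how the three stack words fit together and in which order items come back. The definitions are restated with 1 CELLS and OVER + SWAP in place of the non-standard CELL and BOUNDS so that the block loads on its own; DEMO-STACK and DEMO are hypothetical names:

```forth
\ Sketch: the stack library restated in standard words, plus a usage demo.
: stack: ( size "name" -- )
  create 0 , cells allot ;            \ first cell holds the item count

: set-stack ( item-n .. item-1 n stack-id -- )
  2dup !  cell+ swap cells over + swap
  ?do  i !  1 cells +loop ;           \ item-1 lands in the first slot

: get-stack ( stack-id -- item-n .. item-1 n )
  dup @ >r  r@ cells +  r@            \ start at the last slot
  begin  ?dup  while
    1- over @ rot 1 cells - rot
  repeat
  drop r> ;

4 stack: demo-stack
: demo ( -- 10 20 30 3 )
  10 20 30 3 demo-stack set-stack
  demo-stack get-stack ;
```

GET-STACK returns exactly what SET-STACK was given, which is the same convention GET-ORDER and SET-ORDER use.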

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you shall define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Recognizer examples

REC-NT ( addr u -- nt translate-nt | notfound ) Search the locals wordlist if locals have been defined, and then the search order for a definition matching the string addr u, and provide that name token as result.

REC-NUM ( addr u -- n translate-num | d translate-dnum | notfound ) Try converting addr u into a number, and on success return either a single number n and translate-num, or a double number d and translate-dnum.

REC-FLOAT ( addr u -- r translate-float | notfound ) Try converting addr u into a floating point number, and on success return that number r and translate-float.

REC-STRING ( addr u "string"<"> -- addrs us translate-string | notfound "string"<"> ) Convert quoted strings (i.e. addr u starts with '"') in the input stream into string literals, performing the same escape handling as S\" and on success return the converted string as addrs us and translate-string.

REC-TICK ( addr u -- xt translate-num | notfound ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.

REC-SCOPE ( addr u -- nt translate-nt | notfound ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order); the string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched, REC-SCOPE is identical in effect to REC-NT.

REC-TO ( addr u -- xt n translate-to | notfound ) Handle the following syntax of TO-like operations of value-like words:

  • ->value as TO value or IS value
  • +>value as +TO value
  • '>value as ADDR value
  • @>value as ACTION-OF value

xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.

REC-ENV ( addr u -- addrs us translate-env | notfound ) Takes a pattern in the form of ${name} and provides the name as addrs us on the stack. The corresponding translator translate-env is responsible for looking up that name in the operating system's environment variable array.

REC-COMPLEX ( addr u -- rr ri translate-complex | notfound ) Converts a pair of floating point numbers in the form of float1+float2i into a complex number on the stack, and returns translate-complex on success.

Testing

TBD


[r1081] 2023-09-15 08:18:00 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Author:

Bernd Paysan

Change Log:

  • 2020-09-06 initial version
  • 2020-09-08 taking ruv's approach and vocabulary at translators
  • 2020-09-08 replace the remaining rectypes with translators
  • 2022-09-08 add the requested extensions, integrate results of bikeshedding discussion
  • 2022-09-08 adjust reference implementation to results of last bikeshedding discussion
  • 2022-09-09 Take comments from ruv into account, remove specifying STATE involvement
  • 2022-09-10 More complete reference implementation
  • 2022-09-10 Add use of extended words in reference implementation
  • 2022-09-10 Typo fixed
  • 2022-09-12 Fix for search order reference implementation
  • 2022-09-15 Revert to Trute's table approach to call specific modes deliberately
  • 2023-08-08 Remove names for table access words; there's no usage outside POSTPONE seen; POSTPONE can do that without a standardized way.
  • 2023-09-11 Remove the role of system components for TRANSLATE-NT and TRANSLATE-NUM
  • 2023-09-13 Make clear that TRANSLATE: is the only way to define a standard-conforming translator.
  • 2023-09-15 Add list of example recognizers and their names.

Problem:

The current recognizer proposal has received a number of criticisms. One is that its API is too big. This proposal therefore tries to create a very minimalistic API for a core recognizer, and allows fancier features to be implemented as extensions. The problem this proposal tries to solve is the same as that of the original recognizer proposal; it is therefore not a full proposal, but sketches some changes to the original proposal.

Solution:

Define the essentials of the recognizer in a RECOGNIZER word set, and allow building upon that. Common extensions go to the RECOGNIZER EXT wordset.

Important changes to the original proposal:

  • Make the recognizer types executable to dispatch the methods (interpret, compile, postpone) themselves
  • Make the recognizer sequence executable with the same effect as a recognizer
  • Make sure the API is not mandating a special implementation

This replaces one poor man's method dispatch with another poor man's method dispatch, which is maybe less daunting and more flexible.

The core principle is still that the recognizer is not aware of state, and the returned translator is. If you have for some reason legacy code that looks like

: rec-xt ( addr u -- translator )
  here place  here find dup IF
      0< state @ and  IF  compile,  ELSE  execute  THEN  ['] drop
  ELSE  drop ['] notfound  THEN ;

then you should factor the part starting with state @ out and return it as translator:

: translate-xt ( xt flag -- )
  0< state @ and  IF  compile,  ELSE  execute  THEN ;
: rec-xt ( addr u -- ... translator )
  here place  here find dup IF  [']  translate-xt
  ELSE  drop ['] notfound  THEN ;

In a second step, you need to remove the STATE @ entirely and use TRANSLATE:, because otherwise POSTPONE won't work. If you are unclear about what to do on postpone in this stage, use -48 throw, otherwise define a postpone action:

:noname ( xt flag -- ) drop execute ;
:noname ( xt flag -- ) 0< IF  compile,  ELSE  execute  THEN ;
:noname ( xt flag -- ) 0< IF  postpone literal postpone compile,  ELSE  compile,  THEN ;
translate: translate-xt

The standard interpreter loop should look like this:

: interpret ( i*x -- j*x )
  BEGIN  parse-name dup  WHILE  forth-recognize execute  REPEAT
  2drop ;

with the usual additions to check e.g. for empty stacks and such.

Typical use

TBD

Proposal:

XY. The optional Recognizer Wordset

A recognizer takes the string of a lexeme and returns a translator xt and additional data on the stack (no additional data for NOTFOUND):

REC-SOMETYPE ( addr len -- i*x translate-xt | NOTFOUND )

XY.3 Additional usage requirements

XY.3.1 Translator

translator: subtype of xt, and executes with the following stack effect:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator is an xt that interprets, compiles or postpones the action of the thing, according to the state the system is in.

i*x is the additional information provided by the recognizer, j*x and k*x are the stack inputs and outputs of interpreting/compiling or postponing the thing.

XY.6 Glossary

XY.6.1 Recognizer Words

FORTH-RECOGNIZE ( addr len -- i*x translator-xt | NOTFOUND-xt ) RECOGNIZER

Takes a string and tries to recognize it, returning the translator xt and additional information if successful, or NOTFOUND if not.

NOTFOUND ( -- ) RECOGNIZER

Performs -13 THROW. If the exception word set is not present, the system shall use a best effort approach to display an adequate error message.

TRANSLATE: ( xt-int xt-comp xt-post "name" -- ) RECOGNIZER EXT

Create a translator word under the name "name". This word is the only standard way to define a translator.

"name" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state, using a system-specific way to determine the current mode.

Rationale: By far the most common usage of translators is inside the outer interpreter, and this default mode of operation is invoked by EXECUTE to keep the API small. There may be other, non-standard modes of operation, where the individual component xts are accessed STATE-independently (e.g. for implementing POSTPONE). This only works on translators created by TRANSLATE:, so any other way to define a translator is non-standard.
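
As a minimal sketch of how TRANSLATE: might be used, the following defines a translator for double numbers. It copies TRANSLATE: from the reference implementation of this proposal and therefore assumes that STATE holds 0 while interpreting and -1 while compiling. The helper names noop and 2lit, are illustrative only, and a real system may define TRANSLATE-DNUM differently.

```forth
\ TRANSLATE: as in the reference implementation; assumes STATE is 0 or -1
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;

: noop ( -- ) ;                          \ illustrative helper
: 2lit, ( d -- )  postpone 2literal ;    \ illustrative helper

' noop                                   \ interpreting: leave d on the stack
' 2lit,                                  \ compiling: compile d as a literal
:noname ( d -- ) 2lit, postpone 2lit, ;  \ postponing: compile code that compiles d
translate: translate-dnum ( d -- d | )
```

When interpreting, 1234. translate-dnum simply leaves the double number on the stack; when the translator is executed in compilation state, the number is compiled into the current definition as a literal.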

XY.6.2 Recognizer Extension Words

SET-FORTH-RECOGNIZE ( xt -- ) RECOGNIZER EXT

Assign the recognizer xt to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to change the behavior instead of using IS FORTH-RECOGNIZE.

FORTH-RECOGNIZER ( -- xt ) RECOGNIZER EXT

Obtain the recognizer xt that is assigned to FORTH-RECOGNIZE.

Rationale:

FORTH-RECOGNIZE is likely a deferred word, but systems that implement it otherwise can use this word to obtain the current recognizer instead of using ACTION-OF FORTH-RECOGNIZE. The old API has this function under the name FORTH-RECOGNIZER (as a value) and this name is reused. Systems that want to continue to support the old API can support TO FORTH-RECOGNIZER, too.
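
One possible use of the getter and setter together is to install a recognizer temporarily. The sketch below assumes the deferred-word implementation from the extensions reference implementation; with-recognizer is a hypothetical helper name, not part of the proposal.

```forth
Defer forth-recognize ( addr u -- i*x translator-xt | notfound-xt )
: set-forth-recognize ( xt -- )  is forth-recognize ;
: forth-recognizer ( -- xt )  action-of forth-recognize ;

\ Hypothetical helper: run xt with xt-rec installed as the recognizer,
\ restoring the previous recognizer even if xt throws.
: with-recognizer ( i*x xt-rec xt -- j*x )
  forth-recognizer >r          \ remember the current recognizer
  swap set-forth-recognize     \ install xt-rec
  catch                        \ run xt, trapping any exception
  r> set-forth-recognize       \ restore the previous recognizer
  throw ;                      \ re-raise the exception, if there was one
```

Because only SET-FORTH-RECOGNIZE and FORTH-RECOGNIZER are used inside with-recognizer, the same helper should also work on a system where FORTH-RECOGNIZE is not a deferred word.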

RECOGNIZER-SEQUENCE: ( n*xt n "name" -- ) RECOGNIZER EXT

Create a named recognizer sequence under the name "name", which, when executed, tries to recognize strings starting with the topmost xt on stack and proceeding towards the bottommost xt until successful.

SET-RECOGNIZER-SEQUENCE ( n*xt n xt-seq -- ) RECOGNIZER EXT

Set the recognizer sequence of xt-seq to xt1 .. xtn.

GET-RECOGNIZER-SEQUENCE ( xt-seq -- n*xt n ) RECOGNIZER EXT

Obtain the recognizer sequence xt-seq as n*xt n.

TRANSLATE-NT ( j*x nt -- k*x ) RECOGNIZER EXT

Translates a name token.

TRANSLATE-NUM ( n -- n | ) RECOGNIZER EXT

Translates a number.

TRANSLATE-DNUM ( d -- d | ) RECOGNIZER EXT

Translates a double number.

TRANSLATE-FLOAT ( r -- r | ) RECOGNIZER EXT

Translates a floating point number.

TRANSLATE-STRING ( addr u -- addr u | ) RECOGNIZER EXT

Translates a string.

Reference implementation:

This is a minimalistic core implementation for a recognizer-enabled system that handles only words and single numbers without base prefix. This implementation takes only interpret and compile state into account, and uses the STATE variable to distinguish between them.

Defer forth-recognize ( addr u -- i*x translator-xt / notfound )
: interpret ( i*x -- j*x )
  BEGIN
      ?stack parse-name dup  WHILE
      forth-recognize execute
  REPEAT ;

: lit,  ( n -- )  postpone literal ;
: notfound ( state -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
:noname name>interpret execute ;
:noname name>compile execute ;
:noname name>compile swap lit, compile, ;
translate: translate-nt ( nt -- )
' noop
' lit,
:noname lit, postpone lit, ;
translate: translate-num ( n -- )

: rec-nt ( addr u -- nt nt-translator / notfound )
  forth-wordlist find-name-in dup IF  ['] translate-nt  ELSE  drop ['] notfound  THEN ;
: rec-num ( addr u -- n num-translator / notfound )
  0. 2swap >number 0= IF  2drop ['] translate-num  ELSE  2drop drop ['] notfound  THEN ;

: minimal-recognize ( addr u -- nt nt-translator / n num-translator / notfound )
  2>r 2r@ rec-nt dup ['] notfound = IF  drop 2r@ rec-num  THEN  2rdrop ;

' minimal-recognize is forth-recognize

Extensions reference implementation:

: set-forth-recognize ( xt -- )
  is forth-recognize ;
: forth-recognizer ( -- xt )
  action-of forth-recognize ;

Stack library

: STACK: ( size "name" -- )
  CREATE 0 , CELLS ALLOT ;

: SET-STACK ( item-n .. item-1 n stack-id -- )
  2DUP ! CELL+ SWAP CELLS BOUNDS
  ?DO I ! CELL +LOOP ;

: GET-STACK ( stack-id -- item-n .. item-1 n )
  DUP @ >R R@ CELLS + R@ BEGIN
    ?DUP
  WHILE
    1- OVER @ ROT CELL - ROT
  REPEAT
  DROP R> ;

Recognizer sequences

: recognize ( addr len rec-seq-id -- i*x translator-xt | NOTFOUND )
  DUP >R @
  BEGIN
    DUP
  WHILE
    DUP CELLS R@ + @
    2OVER 2>R SWAP 1- >R
    EXECUTE DUP ['] NOTFOUND <> IF
      2R> 2DROP 2R> 2DROP EXIT
    THEN
    DROP R> 2R> ROT
  REPEAT
  DROP 2DROP R> DROP ['] NOTFOUND
;
#10 Constant min-sequence#
: recognizer-sequence: ( rec1 .. recn n "name" -- )
  min-sequence# stack: min-sequence# 1+ cells negate here + set-stack
  DOES>  recognize ;
: ?defer@ ( xt1 -- xt2 )
  BEGIN dup is-defer? WHILE  defer@  REPEAT ;
: set-recognizer-sequence ( rec1 .. recn n rec-seq-xt -- ) ?defer@ >body set-stack ;
: get-recognizer-sequence ( rec-seq-xt -- rec1 .. recn n ) ?defer@ >body get-stack ;

Once you have recognizer sequences, you should define

' rec-num ' rec-nt 2 recognizer-sequence: default-recognize
' default-recognize is forth-recognize

The recognizer stack looks surprisingly similar to the search order stack, and Gforth uses a recognizer stack to implement the search order. In order to do so, you define wordlists in a way that a wid is an execution token which searches the wordlist and returns the appropriate translator.

: find-name-in ( addr u wid -- nt / 0 )
  execute ['] notfound = IF  0  THEN ;
root-wordlist forth-wordlist dup 3 recognizer-sequence: search-order
: find-name ( addr u -- nt / 0 )
  ['] search-order find-name-in ;
: get-order ( -- wid1 .. widn n )
  ['] search-order get-recognizer-sequence ;
: set-order ( wid1 .. widn n -- )
  ['] search-order set-recognizer-sequence ;

Recognizer examples

REC-NT ( addr u -- nt translate-nt | notfound ) Search the locals wordlist if locals have been defined, and then the search order for a definition matching the string addr u, and provide that name token as result.

REC-NUM ( addr u -- n translate-num | d translate-dnum | notfound ) Try converting addr u into a number, and on success return either a single number n and translate-num, or a double number d and translate-dnum.

REC-FLOAT ( addr u -- r translate-float | notfound ) Try converting addr u into a floating point number, and on success return that number r and translate-float.

REC-STRING ( addr u "string"<"> -- addrs us translate-string | notfound "string"<"> ) Convert quoted strings (i.e. addr u starts with '"') in the input stream into string literals, performing the same escape handling as S\" and on success return the converted string as addrs us and translate-string.

REC-TICK ( addr u -- xt translate-num | notfound ) If addr u starts with a ` (backtick), search the search order for the name specified by the rest of the string, and if found, return its xt and translate-num.
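
A rough sketch of how such a recognizer could look is given below. It is a simplification under stated assumptions: it searches only FORTH-WORDLIST via SEARCH-WORDLIST rather than the whole search order, and it repeats NOTFOUND, TRANSLATE: and TRANSLATE-NUM from the reference implementation (with STATE assumed to be 0 or -1) so that the fragment stands on its own.

```forth
: notfound ( -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: noop ( -- ) ;
: lit, ( n -- )  postpone literal ;
' noop  ' lit,  :noname lit, postpone lit, ;
translate: translate-num ( n -- n | )

: rec-tick ( addr u -- xt translate-num | notfound )
  dup 2 u< IF  2drop ['] notfound EXIT  THEN   \ need a backtick plus a name
  over c@ [char] ` <> IF  2drop ['] notfound EXIT  THEN
  1 /string                                    \ strip the leading backtick
  forth-wordlist search-wordlist               \ simplification: one wordlist only
  IF  ['] translate-num  ELSE  ['] notfound  THEN ;
```

The xt is handed to TRANSLATE-NUM because an execution token is a single cell, so it is treated like a number literal: left on the stack when interpreting, compiled as a literal when compiling.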

REC-SCOPE ( addr u -- nt translate-nt | notfound ) Search for words in specified vocabularies (the vocabulary needs to be found in the current search order). The string addr u has the form vocabulary:name. Apart from specifying the vocabulary to be searched, REC-SCOPE is identical in effect to REC-NT.

REC-TO ( addr u -- xt n translate-to | notfound ) Handle the following syntax of TO-like operations of value-like words:

  • ->value as TO value or IS value
  • +>value as +TO value
  • '>value as ADDR value
  • @>value as ACTION-OF value

xt is the execution token of the value found, n indexes which variant of a TO-like operation is meant, and translate-to is the corresponding translator.

REC-ENV ( addr u -- addrs us translate-env | notfound ) Takes a pattern in the form of ${name} and provides the name as addrs us on the stack. The corresponding translator translate-env is responsible for looking up that name in the operating system's environment variable array.

REC-COMPLEX ( addr u -- rr ri translate-complex | notfound ) Converts a pair of floating point numbers in the form of float1+float2i into a complex number on the stack, and returns translate-complex on success.

Testing

TBD


[r1082] 2023-09-15 14:03:42 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Things to discuss, because there are still too many variables.

ToDo:

  • Rename Recognizers from REC-result to RECOGNIZE-result. A solution for .RECOGNIZERS drowning the reader in recognize- could be to skip that prefix, because all recognizers are supposed to have the same prefix, anyways.
  • Revert the name of translators to rectypes or some similar word showing that this does describe a type?
  • Add mode/state-specific access words to the translators again and decide on how they work. I prefer defer-field likes, which right away execute the corresponding action, and not put an xt on the stack for consumption. Defer-fields could work together with IS and ACTION-OF to access the xts within (in Gforth, they do).

Answers to some questions:

A lot of thoughts went into it to make different subsets of this proposal useful on their own, and allow different implementation strategies. The answer to “can I do without feature X” is most likely yes. You can use the subset of the features you want. Stripping away too much results in a subset no longer usable.

  • Opening up the whole idea to small systems is useful to gain wider use.
  • FORTH-RECOGNIZE is a deferred word in the reference implementation on purpose, and that allows changing it without adding more words. To add more implementation options, you can use the setter and getter words (which are optional) if you don't want to implement it as deferred word to swap in and out named sequences.
  • The recognizer sequences do have words to get and set the sequence, so you can just work with a single sequence and set/get it if you like. The nesting capability comes by the magical fact that a recognizer sequence has the same stack effect as a recognizer.
  • You can do without both, because recognizer sequences can be written as colon definitions “by foot”.
  • Named sequences are useful, especially when you swap in recognizer sequences for applications that do something completely different than the Forth recognizer sequence. If you do not want to support named sequences, you can still provide the one single named sequence FORTH-RECOGNIZE, and allow SET-RECOGNIZER-SEQUENCE and GET-RECOGNIZER-SEQUENCE to operate just on that. That's also an option where recognizers are useful without having FORTH-RECOGNIZE being deferred and no RECOGNIZER-SEQUENCE:.
  • The NOTFOUND return for failure is there so that you can always EXECUTE the result of FORTH-RECOGNIZE and don't have to check for errors there.
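
To illustrate the point about writing a recognizer sequence "by foot", here is a small sketch of a two-element sequence as a plain colon definition. The recognizers rec-one and rec-two are hypothetical toys invented for this example; NOTFOUND, TRANSLATE: and TRANSLATE-NUM are repeated from the reference implementation (assuming STATE is 0 or -1) to keep the fragment self-contained.

```forth
: notfound ( -- ) -13 throw ;
: translate: ( xt-interpret xt-compile xt-postpone "name" -- )
  create , , ,
  does> state @ 2 + cells + @ execute ;
: noop ( -- ) ;
: lit, ( n -- )  postpone literal ;
' noop  ' lit,  :noname lit, postpone lit, ;
translate: translate-num ( n -- n | )

\ Two hypothetical toy recognizers
: rec-one ( addr u -- n translate-num | notfound )
  s" one" compare 0= IF  1 ['] translate-num  ELSE  ['] notfound  THEN ;
: rec-two ( addr u -- n translate-num | notfound )
  s" two" compare 0= IF  2 ['] translate-num  ELSE  ['] notfound  THEN ;

\ The sequence "by foot": try rec-one first, then rec-two
: rec-one-or-two ( addr u -- n translate-num | notfound )
  2dup 2>r rec-one dup ['] notfound = IF
    drop 2r> rec-two           \ first recognizer failed, try the next
  ELSE
    2r> 2drop                  \ success, discard the saved string
  THEN ;
```

Since rec-one-or-two has the same stack effect as a single recognizer, it can itself be assigned to FORTH-RECOGNIZE or nested inside another such definition.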

Tough question: The string recognizer has a side effect, which is not good. Moving that side effect to the translator causes other problems, because TRANSLATE-STRING then no longer has the corresponding string on the stack, but needs to parse it later. Actually, parsing should happen in PARSE-NAME. It still seems to be a hack that doesn't have a perfect solution.


[r1084] 2023-09-16 00:38:53 ruv replies:

proposal - minimalistic core API for recognizers

Rename Recognizers from REC-result to RECOGNIZE-result

In general, an abbreviation or acronym may be acceptable to me. But in this case I prefer RECOGNIZE- rather than REC-. The main disadvantage of rec is that it has misleading associations. And the main advantage of recognize is that it's a whole English word that is very appropriate for our case.

The part referred to as "result" should not be a result (of recognizing), but the expected type of the input lexeme. Have a look at your examples: REC-NUM and REC-TICK produce the same result type translate-num, but they accept different types of input lexemes, and these types are identified by the NUM and TICK symbols respectively.

Thus, the naming form for recognizers can be expressed as RECOGNIZE-{lexeme-type-symbol}.


Revert the name of translators to rectypes or some similar word showing that this does describe a type?

It does describe a type of what? It describes the type of a token i*x, which is a result of recognizing. Actually, a token translator identifies the type of a token i*x, which is a result of recognizing. So a token translator is a token type at the same time.

If we want to reflect this idea, we can use the acronym tt, which stands for both: token translator and token type. Then, token translators can be named according to the form TT-{token-type-symbol}. It looks elegant to me.

The names of translators are used for two purposes: to call a translator (for example, when we define a new translator via existing translators), and to obtain the xt of a translator (which is at the same time an identifier for a token type) in order to analyze a result of recognizing. The prefix tt- looks good in both of these cases.