Digest #286 2024-11-26

Contributions

[368] 2024-11-25 22:50:28 BerndPaysan wrote:

proposal - Known undefined word XLERB

Author:

Bernd Paysan

Change Log:

2024-11-25 initial version

Problem:

For documentation purposes, running tests, and provoking exceptions there is a need for a word that is never in the dictionary.

Solution:

Reserve XLERB as a word that is not in the dictionary. An ambiguous condition exists if the user tries to define XLERB (a warning is probably sufficient). The name XLERB has been used in Starting Forth for this documentation purpose.

Typical use:

XLERB comic in Starting Forth

T{ ' ' catch xlerb -> -13 }T
T{ s" xlerb" find-name -> 0 }T
: e-xlerb s" 3 to xlerb" evaluate ;
T{ ' e-xlerb catch -> -13 }T

Proposal:

Append the following text to 3.3.1.2:

Neither the system nor programs shall create a definition with the name XLERB, which is reserved for documentation and tests of well-known undefined definitions.

Append the following ambiguous condition to 4.1.2, after item 2:

The definition name is XLERB.

Reference implementation:

empty

Testing:

T{ ' ' catch xlerb -> -13 }T
T{ s" xlerb" find-name -> 0 }T
: e-xlerb s" 3 to xlerb" evaluate ;
T{ ' e-xlerb catch -> -13 }T

Replies

[r1359] 2024-11-04 01:16:31 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Concerning explicit access methods to xt-int/xt-comp/xt-post, I can offer the following compromise, as a result of observations made:

It turns out that you can not access xt-int and xt-comp by setting STATE, executing the translator, and then reverting STATE to the value before, because words can change STATE as part of their interpretation or compilation semantics, and in that case, the state change is a desired result of performing interpretation or compilation semantics.

However, it turns out that you can access xt-post that way, because the only word that possibly changes that state is [[, and that token a) is not visible at all to POSTPONE, and b) changes the state back to compilation state, the state POSTPONE was in anyhow.

So if your system allows full explicit access to all three possible states, all translators have to be defined by 'TRANSLATE:', and I can offer you three access methods. If you only want to implement POSTPONE, the following definition actually works:

: postpone ( "string" -- )
  parse-name forth-recognize ?found
  state @ >r -2 state ! execute r> state ! ; immediate

Further observations:

Gforth has >INTERPRET and >COMPILE but doesn't use them; only >POSTPONE is used, and in exactly one place: POSTPONE. All other invocations are through EXECUTE or only take the data. The rest is implementation, including the extension towards more of those access methods for more, user-defined states. The question is whether you need to standardize a tool that has no use case, even if you don't bury it.

A possible way to deal with this is to move this out to a separate proposal.

What has been quite useful is the EXECUTE interface for user-written interpreters, because these are interpret-only, and don't need the complication of state-dependent translators at all.

[r1360] 2024-11-04 04:06:09 JimPeterson replies:

testcase - Incorrect test case

MSB is not -1. It's the same as MIN-INT.
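
For reference, a minimal sketch of how such a constant is commonly derived, modeled on the bit-pattern constants used in the core tests. The name MY-MSB is hypothetical, and the last line assumes two's-complement wraparound.

```forth
\ Hypothetical name; two's-complement arithmetic is assumed.
0 invert 1 rshift invert constant my-msb   \ only the most significant bit set

my-msb -1 = .     \ prints 0: -1 has every bit set, MSB only the top one
my-msb 0< .       \ prints -1: MSB is a negative number
my-msb 1- 0< .    \ prints 0: MSB 1- wraps around to the largest positive integer
```

The expression that defines MY-MSB is the same one usually used for MIN-INT, which is why the two values coincide.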

[r1361] 2024-11-04 11:21:24 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Sleeping over it added a few ideas:

The invocation through changing STATE and restoring it works (in general) for translators that will definitely not change STATE as part of their own operation, e.g. translators for literals. It also works (as a special case) for POSTPONE, so a standard implementation of POSTPONE using that method is possible. The postpone mode itself, which needs to change STATE at [[, relies on the dispatch through STATE without setting and restoring STATE around the invocation, so it also works.

The question here is not if that implementation is a quality implementation, but whether it's not so bad that it is another bag full of inconsistencies. IMHO, TRANSLATE-NT will have demonstrable inconsistencies when not using the clean TRANSLATE: interface, but combined literal translators won't. For the cleaner interface outside of POSTPONE itself (which is special case enough to not require the cleaner interface), we have to demonstrate that there is an actual use case. So far, we don't have one.

Both POSTPONE with the additional functionality and the postpone mode ]] … [[ will become part of the proposal.

[r1362] 2024-11-04 20:33:24 ruv replies:

proposal - New words: latest-name and latest-name-in

The question is: what do you want to solve?

I want to solve the problem of obtaining the nt of the definition that was placed into the compilation word list most recently. This is possible in any system if you know the name of this definition, but is unreasonably difficult to do in a standard program.

I have given many examples where this functionality is in demand.

One problem here is that while it is still defined, you can't even search for it.

Concerning xt, this problem is solved with the suggested word germ. Concerning nt — do you have an example when you need nt of the current definition?

Another problem is that you may never search for it, because it doesn't even have a name.

In a standard program such a definition can only be created with :noname, and you have xt of this definition on the stack under colon-sys. It is also possible to obtain this xt with the suggested word germ during compilation of this definition.

And you want to access the last definition even after it was completed, so there's no current definition any longer.

The xt of an anonymous definition is on the stack. What is the problem?

Don't implement it that way, and that problem goes away.

What problem goes away? The way you mentioned is pretty perfect, I think.

So if you use LATEST-NAME-IN in Gforth by looking into the current wordlist, you'd never get the current definition

Yes, it is by design. Because there is no sound meaning what to return.

If you want to return nt for the definition compilation for which was started most recently, then you have to

The definition you want to access is the one you just defined (it may be incomplete or complete), and that's the last one in time. It always has an xt, it might not have an nt (and if it doesn't have one, that should be 0).

The problem is that the last one in time (i.e., the definition whose compilation has been started most recently) is different before and after compilation of a nested definition.

An example:

: foo
  [ ( here the last nt is the nt for "foo" ) ]
  [: bar
    [ ( here the last nt is 0 ) ]
  ;]
  [ ( here the last nt is 0 ) ]
;  ( here the last nt is 0 )

If we will have multiple dictionary sections, a library can create other named definitions in other sections during compilation of the user's definition. And what definition will be last in time?

The word germ will return values as follows:

: foo
  [ ( here germ returns xt for "foo" ) ]
  [: bar
    [ ( here germ returns xt for the quotation ) ]
  ;]
  [ ( here germ returns xt for "foo" ) ]
;  ( here germ returns 0 )

If we will have multiple dictionary sections, it will not change anything for germ, because germ returns the xt of the definition to which compile, would add semantics.

So what we could implement is a LATEST-NAME-IN which returns the nt of the latest definition if the wid matches the current incomplete definition's wordlist.

If you mean to return the nt for the current definition, then implementing LATEST-NAME-IN will require changing Forth systems that create the nt only at ; (semicolon). And what's the use of this?

The first thing needed here is to really figure out what people actually want: The last element of a wordlist, the last incomplete definition of a wordlist (i.e. the element with the smudge bit set), or the last definition in time, regardless if it is completed or not, and what wordlist it goes into once completed.

I have provided a number of examples where the last element of the compilation word list is needed or sufficient.

Do you have practical examples when you need the nt of the current definition, or 0 if this definition is anonymous?

[r1363] 2024-11-05 01:26:19 ruv replies:

proposal - minimalistic core API for recognizers

@BerndPaysan writes:

It turns out that you can not access xt-int and xt-comp by setting STATE, executing the translator, and then reverting STATE to the value before, because words can change STATE as part of their interpretation or compilation semantics, and in that case, the state change is a desired result of performing interpretation or compilation semantics.

This is wrong. Yes, the state change can be a desired result of interpretation or compilation semantics, but this does not prevent us from performing the interpretation or compilation semantics regardless of the initial value of STATE, as I have shown many times.

We can use the following helpers for that.

\ Useful factors
: compilation ( -- flag )  state @ 0<> ;
: enter-compilation ( comp: false -- true  |  comp: true -- true )  ] ;
: leave-compilation ( comp: true -- false  |  comp: false -- false )  postpone [ ;

\ For the execution semantics identified by xt,
\ perform the part that can be observed in interpreted state.
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation
;

\ For the execution semantics identified by xt,
\ perform the part that can be observed in compilation state.
: execute-compiling ( i*x xt -- j*x )
  compilation if execute exit then
  enter-compilation execute leave-compilation
;

If we have a result of recognizing with the xt of a translator at the top (i.e., a fully qualified token), and we want to perform the corresponding interpretation semantics regardless of the current value of STATE, we should execute this xt with execute-interpreting. If we want to perform the corresponding compilation semantics regardless of the current value of STATE, we should execute it with execute-compiling. If we want to perform the semantics according to STATE, we should just execute this xt with execute.

The key point in the implementation of execute-interpreting and execute-compiling is that we do not save/restore STATE if it matches the semantics we want to perform; if changing STATE is part of the semantics, STATE will be changed. On the other hand, if STATE does not match the semantics we want to perform, we change STATE and then restore it; if changing STATE is part of the semantics, then it will change STATE to the same value that was saved and that we restore it to. Thus, the resulting STATE will be as expected!

NB: execute-interpreting and execute-compiling are also required if we want to perform the interpretation semantics or compilation semantics from an nt, regardless of the current value of STATE. Moreover, these words are required even in the old approach for Recognizer API, which provides the words RECTYPE>INT and RECTYPE>COMP, because these words have the same flaw for state-dependent words as NAME>INTERPRET and NAME>COMPILE.
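
The claim about the resulting STATE can be checked with a small sketch. The helpers are repeated from above so that it loads on its own; the probe words and NOTHING are hypothetical names introduced only for this illustration. Each probe forces a starting state, performs the interpretation semantics via execute-interpreting, reports the resulting state, and returns to interpretation state.

```forth
\ Helpers repeated from the post above.
: compilation ( -- flag )  state @ 0<> ;
: enter-compilation ( -- )  ] ;
: leave-compilation ( -- )  postpone [ ;
: execute-interpreting ( i*x xt -- j*x )
  compilation 0= if execute exit then
  leave-compilation execute enter-compilation ;

\ Hypothetical probes: flag is true if STATE ends up as compilation state.
: probe-from-interpretation ( xt -- flag )
  leave-compilation  execute-interpreting  compilation  leave-compilation ;
: probe-from-compilation ( xt -- flag )
  enter-compilation  execute-interpreting  compilation  leave-compilation ;

: nothing ( -- ) ;   \ hypothetical word that never touches STATE

' ]       probe-from-interpretation .   \ prints -1: ] switched to compilation state
' ]       probe-from-compilation    .   \ prints -1: same result from the other state
' nothing probe-from-interpretation .   \ prints 0: STATE left as it was
' nothing probe-from-compilation    .   \ prints -1: STATE restored to the caller's
```

For ] the resulting state is the same whichever state the probe starts in, while for a word that does not touch STATE the caller's state is preserved.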

[r1364] 2024-11-05 03:29:18 ruv replies:

proposal - minimalistic core API for recognizers

@AntonErtl writes about token translators:

if you specify exactly what happens, it leads to lengthy texts that explain the state-dependence, and the three different cases. And you cannot even specify when xt-post is performed, because there is no "postpone state" in the standard. On the contrary the current document specifies that STATE is either 0 (interpretation state) or non-zero (compilation state), without any values left for a postpone state, and specifies only words for getting into interpretation state and compilation state, not postpone state.

This is reasonable. And we also discussed in the Recognizer chat group that the standard does not imply such a state as postponing (for the Forth text interpreter).

In my opinion, these problems can be avoided.

We should specify "to translate a token" and "token translator" in the common sections of term definitions, data types and usage requirements. Then, we do not need to repeat that for every token translator. It will be enough to specify that a word is a token translator, and the data type of the token (that it translates).
We can have a word like postpone-token ( qt -- ) that appends the compilation semantics of a lexeme, which was recognized as qt, to the current definition. (qt is a qualified token, which is a pair of an unqualified token and token translator ( uq tt ))

So, any additional state, if any, is encapsulated into postpone-token. The standard should not specify it.

Thus, postpone can be defined like this (in my parlance):

: postpone ( "name" -- )
  parse-lexeme perceive ?found postpone-token
;

How postpone-token finds/performs the postponing action from tt — it's an internal problem of implementation. The word postpone-token should throw the exception -32 "invalid name argument" if a postponing action is not associated with tt.

We need to provide a way to associate a postponing action (an xt) with a tt, or to create a new tt from an xt and tt. The postponing action should be optional. The user needs to provide a postponing action only if they want to make postpone applicable to the corresponding lexemes.

For example, we can have an optional word postponable ( tt1 xt.postponing -- tt2 ). Probably, this word shall return the same tt2 for the same input pair ( tt1 xt.postponing ). This word is optional, because it can be implemented along with postpone-token in a standard program, and postpone can be redefined to use them.

[r1365] 2024-11-07 13:43:25 achowe replies:

proposal - Incorrect use of semantics terms

Circular references in implementation examples are hugely confusing when used before the definition exists. In the above example the definitions of LIT, and leave-compilation reference POSTPONE before it is defined. A word like POSTPONE is conceptually so complex, compared to say SWAP, that many misunderstandings by novice developers arise from these circular references.

It has been commented before that circular references in implementation examples are acceptable and might make for quick clean short code, but IMO they simply compound the confusion for anyone who is not a Committee member and has not seen the evolution of the language. This harms learning and adoption by newcomers.

Would this be clearer?

: lit, ( C: x -- ; S: -- x ) ['] LIT COMPILE, , ;
: leave-compilation ( -- ) 0 STATE ! ;

[r1366] 2024-11-07 22:05:41 BerndPaysan replies:

proposal - New words: latest-name and latest-name-in

Nested definitions require some form of nested dictionary, and in fact, it looks like the easiest way to implement some form of LATEST in that context is to tie it to that nesting.

There are only few cases in Gforth, where the latest nt or 0 is needed, and REVEAL is one of them: it only needs to do something when LATEST is non-zero.

[r1367] 2024-11-08 11:19:32 ruv replies:

proposal - New words: latest-name and latest-name-in

There are only few cases in Gforth, where the latest nt or 0 is needed, and REVEAL is one of them: it only needs to do something when LATEST is non-zero.

So, REVEAL in Gforth should only have an observable effect when the current definition exists, is a named definition, and is not already placed into the compilation wordlist. That is, it should not add it into the compilation wordlist twice.

Taking into account that xt is a subtype of nt in Gforth, REVEAL can use the word GERM ( -- xt | 0 ) that returns the xt of the current definition (that is also an nt in Gforth).

: reveal ( -- )
  germ 0= if exit then \ no current definition
  germ name>string nip 0= if exit then \ unnamed definition
  germ get-current latest-name-in = if exit then \ already revealed
  germ ( nt ) ... \ place nt into the compilation word list
;

Note that we rely on the fact that latest-name-in cannot return an nt that is not available to traverse-wordlist (and hence to find-name-in).

Of course, the check if nt is already revealed may be different in Gforth.

[r1368] 2024-11-08 13:16:17 ruv replies:

proposal - Incorrect use of semantics terms

Circular references in implementation examples are hugely confusing when used before the definition exists.

Actually, it is a polyfill that intentionally redefines postpone.

Yes, to make it clearer, a system-dependent implementation can be provided. For example, assuming that [ and literal are immediate words:

: compilation  ( -- flag ) state @ 0<> ;
: enter-compilation  ( -- )      ] ;
: leave-compilation  ( -- )  ['] [ execute ;
: execute-compiling ( i*x xt -- j*x )
  compilation    if  execute  exit  then
  enter-compilation  execute  leave-compilation
;
[undefined] lit, [if]
  : lit, ( x -- ) ['] literal execute-compiling ;
[then]

Is it simpler?

postpone is the same.

['] LIT COMPILE, , is more system specific and could be difficult to understand.

[r1369] 2024-11-13 09:32:32 ruv replies:

proposal - minimalistic core API for recognizers

@AntonErtl writes:

In that case there is also no need for the translators to actually be executable. The recognizer could return an opaque translation token,

I researched this approach.

In general, a recognizer returns a qualified token (qt) on success, where a qualified token is a pair of an unqualified token (ut) and a token descriptor (td).

Data type relations:

unqualified token: ut => ( S: i*x F: k*r )
token descriptor: td => x\0
qualified token: qt => ( ut td )

It is always possible to define a word translate-qtoken ( any qt -- any ), which translates a qualified token (i.e., performs the interpretation or compilation semantics for the corresponding recognized lexeme depending on STATE). And as practice shows, it is very useful and in demand.

Additionally, in Forth, it is always technically possible to make the token descriptor also a token translator (that is a subtype of the execution token), without any loss (see an example).

token translator: tt => xt ; td = tt

So, instead of using a separate word translate-qtoken, we can use the word execute. And the Forth text interpreter simply executes the token translator (instead of applying translate-qtoken to qt). Note that regardless of whether the token descriptor is a subtype of the execution token, the token descriptor is opaque for the Forth text interpreter. The only difference is whether translate-qtoken or execute is used by the Forth text interpreter.

The big advantage of token translators is that they can be defined inline as quotations, and they can be used to define other token translators. This simplifies programs and reduces the lexical size of programs.

Also, token translators allow us to define dual-semantics words more simply. For example, this is a definition for ['], which has the expected interpretation semantics:

: ['] ( -- xt | ) '  tt-xt ; immediate

See also in my gist the word missing(, which has the expected interpretation and compilation semantics. Without token translators such words are more difficult to implement.
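
To make the ['] example concrete, here is a minimal sketch of what a single-cell token translator might look like. MY-TT-X is a hypothetical stand-in for tt-xt (whose actual definition is not shown here), and the tick variant is named MY['] so that the standard ['] is not redefined.

```forth
\ Hypothetical single-cell token translator (assumed behaviour, not ruv's code):
\ interpreting: leave x on the stack; compiling: compile x as a literal.
: my-tt-x ( x -- x | )  state @ if postpone literal then ;

\ Dual-semantics tick built on top of it.
: my['] ( "name" -- xt | )  ' my-tt-x ; immediate

my['] dup  ' dup = .           \ interpreting: prints -1
: dup-xt ( -- xt )  my['] dup ;
dup-xt  ' dup = .              \ compiled as a literal: prints -1
```

The same definition serves both states because the state dependence is confined to the translator.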

[r1370] 2024-11-20 16:49:28 AntonErtl replies:

proposal - CS-DROP (revised 2019-08-22)

In the 2020 meeting the committee accepted this proposal in vote #18 with 11Y:0N:1A.

[r1371] 2024-11-23 18:46:34 AntonErtl replies:

proposal - Remove the “rules of FIND”

The committee accepted this wording change in the 2020 meeting with vote #10 10Y:0:0

[r1372] 2024-11-25 08:36:33 AntonErtl replies:

requestClarification - diff from CORE FIND?

In the latest draft, there is really no difference between core FIND and search-order FIND, because the difference has been factored out into Sections 3.4.2 and 16.3.3, as well as into 2.1 and 16.2.

Therefore I propose to delete 16.6.1.1550 FIND, and instead add a link to 16.3.3 to 6.1.1550 FIND. Given that we already have this contribution, I just reopen it instead of creating a new proposal.