Digest #284 2024-11-03

Contributions

[366] 2024-11-01 20:33:42 CandidMoe wrote:

example - Bad test case

The test

T{ MSB 2* -> 0S }T

fails, because the right result is -2, not 0.

Replies

[r1314] 2024-09-26 09:30:40 AntonErtl replies:

proposal - Allow the text interpreter to use `WORD` and the pictured numeric output

In the discussion at the committee meeting on 2024-09-26, the following suggestions were made: Split the proposal into one for WORD and one for the pictured numeric output (PNO) buffer. For the PNO buffer, enumerate the words that can clobber it, rather than making # and friends unusable interactively.
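
For readers less familiar with the PNO buffer, here is a minimal sketch of the kind of interactive use that the second suggestion aims to keep working. The word name u>str is an illustrative assumption, not part of the proposal.

```forth
\ Convert an unsigned single-cell number to a string in the PNO buffer.
: u>str ( u -- c-addr len )
  0 <# #S #> ;            \ extend to a double with a zero high cell, convert all digits

DECIMAL 255 u>str TYPE    \ displays 255
```

The returned string lives in a transient region, so it has to be consumed (here by TYPE) before another word overwrites that region.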


[r1315] 2024-09-26 10:00:47 AntonErtl replies:

proposal - Support for single line comments during `evaluate`

Currently, the standard does not support control characters (including newlines) in evaluated strings, see New Line characters in a string passed to EVALUATE. So your proposal would have to change this first. We are therefore moving it to considered. One way to strengthen your proposal would be to discuss existing practice: What do systems do when newlines are passed to evaluate?


[r1316] 2024-09-26 10:58:00 BerndPaysan replies:

proposal - Special memory access words

Wyde comes from here: https://en.wiktionary.org/wiki/wyde. I found it in Unicode documents, and that's also the quotation in Wiktionary.


[r1317] 2024-09-26 11:38:48 AntonErtl replies:

proposal - Special memory access words

The feedback I got at the Forth200x meeting is to have b@ b>s b! (with some dissenting opinions, but c@ c! will still work), to change c to b in the table of conventions, to use u, not x, and to use c-addr, not addr.
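
A rough sketch of what a word like b>s might do, assuming it sign-extends a byte (the post itself does not spell this out) and assuming a two's-complement system. The name my-b>s is hypothetical and chosen so as not to collide with a system's own definition.

```forth
\ Hypothetical stand-in for the proposed b>s: sign-extend the low 8 bits.
: my-b>s ( u -- n )
  $FF AND                 \ keep only the low byte
  DUP $80 AND IF          \ is the sign bit of the byte set?
    $FF INVERT OR         \ then fill all higher bits with ones
  THEN ;

DECIMAL
$7F my-b>s .    \ displays 127
$FF my-b>s .    \ displays -1
```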


[r1318] 2024-09-26 12:40:16 AntonErtl replies:

proposal - Test

This was a test of the system.


[r1319] 2024-09-26 12:41:54 flaagel replies:

proposal - Include a revised 79-STANDARD Specification for "><" To "Core Ext"

Redundant with the "Special Memory Access" word set.


[r1320] 2024-09-26 12:55:26 StephenPelc replies:

proposal - 32-bit memory operators


[r1321] 2024-09-26 12:56:46 StephenPelc replies:

proposal - 16-bit memory access


[r1322] 2024-09-26 13:28:51 GeraldWodni replies:

proposal - Tick and undefined execution semantics - 2

The author wants to incorporate the following replies:


[r1323] 2024-09-26 13:31:54 GeraldWodni replies:

proposal - Common terminology for recognizers discurse and specifications

Ongoing discussion


[r1324] 2024-09-26 13:37:10 ruv replies:

requestClarification - How to perform the interpretation semantics for a word

I think the wording should be adjusted so that everywhere the term interpretation semantics means observable interpretation semantics. Similarly for "compilation semantics". This will eliminate some confusion, and many questions and disputes.

I have prepared a proposal to fix Incorrect use of semantics terms.


[r1325] 2024-09-26 13:55:18 GeraldWodni replies:

proposal - minimalistic core API for recognizers

The committee thanks the authors for all the work. Here is the timetable:

  • Everybody interested in this proposal: please submit your comments by end of October.
  • Bernd (main author): please work this into a new version by the end of the year (2024).
  • The committee will have a special interim meeting for this very proposal in February (the final date will be announced on Mattermost).

[r1326] 2024-09-26 14:04:20 AntonErtl replies:

proposal - CS-DROP (revised 2019-08-22)

Please work in the comments.


[r1327] 2024-09-26 14:05:54 StephenPelc replies:

proposal - CS-DROP (revised 2019-08-22)

CASE ... NEXTCASE - loop that consumes orig from CASE
CASE ... ENDCASE - not a loop, discards orig CASE


[r1328] 2024-09-26 14:26:31 ruv replies:

proposal - Incorrect use of semantics terms

Author

Ruv

Change Log

  • 2024-09-25 Initial revision
  • 2024-09-26 Add sentences about an environmental restriction and environmental dependency.

Problem

The section 2.1 Definitions of terms gives the following definitions:

  • interpretation semantics: The behavior of a Forth definition when its name is encountered by the text interpreter in interpretation state.
  • compilation semantics: The behavior of a Forth definition when its name is encountered by the text interpreter in compilation state.

These definitions are very good. Essentially, they talk about observable behavior, behavior that the Forth system exhibits under the specified conditions (namely, the word name is encountered in interpretation state or the word name is encountered in compilation state) and that a standard program or user can detect.

Note that if some effect cannot be detected under the specified conditions, then this effect is not a part of the corresponding semantics, and if some effect is a part of the semantics, it can be detected under the corresponding conditions.

In almost all cases when these terms are used in the normative parts, they are used correctly.

The problem is that in a few places these terms are used incorrectly. This produces inconsistency, confusion, and many questions and disputes.

There are two places where they are used incorrectly:

  • about default interpretation semantics,
  • about compilation semantics of an immediate word.

About default interpretation semantics

The section 3.4.3.2 Interpretation semantics says:

  • Unless otherwise specified in an "Interpretation:" section of the glossary entry, the interpretation semantics of a Forth definition are its execution semantics.

We can create a word whose execution semantics, when performed in compilation state, exhibit effects that cannot be detected when the name of this word is encountered by the Forth text interpreter in interpretation state. This means that these effects are not a part of the interpretation semantics for this word. Thus, the interpretation semantics for this word are not the execution semantics of this word.

An example:

: foo1 ( -- 1 | ) s" 1" evaluate ;

When the execution semantics of foo1 are performed in compilation state, something is appended to the current definition. This effect is not a part of the interpretation semantics of this word. Thus, the interpretation semantics of this word are not the execution semantics of this word.
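
The difference can be observed with a small sketch. The helper foo1-imm and the word bar are hypothetical names introduced only to get foo1 executed while the system is in compilation state.

```forth
: foo1 ( -- 1 | ) s" 1" evaluate ;        \ as in the example above
: foo1-imm ( -- ) foo1 ; immediate        \ hypothetical helper: runs foo1 at compile time

foo1 .                       \ interpretation state: 1 is pushed, displays 1
: bar ( -- 1 ) foo1-imm ;    \ compilation state: foo1 appends the literal 1 to bar
bar .                        \ displays 1
```

In the second case nothing is left on the data stack while bar is being compiled; the effect of foo1 shows up only later, when bar runs.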

So, that is an inconsistency (or perhaps an omission or mistake) in the use of a formal term.

Discussion

There is a point of view that the section 3.4.3.2 defines a different meaning for the term "interpretation semantics" for the case when "Interpretation" section is absent. But this is not true. Every term is defined separately and explicitly. In this section the term "interpretation semantics" is used rather than defined.

About compilation semantics for an immediate word

The section 2.1 Definitions of terms says:

  • immediate word: A Forth word whose compilation semantics are to perform its execution semantics.

This definition does not say that the execution semantics shall be performed in compilation state. This means that they can be performed in interpretation state too. But we can create a word whose execution semantics, when performed in interpretation state, exhibit effects that cannot be detected when the name of this word is encountered by the Forth text interpreter in compilation state. This means that these effects are not a part of the compilation semantics of this word. Thus, the compilation semantics of this word are not to perform the execution semantics of this word (regardless of the state).

An example:

: foo2 ( -- 2 | ) s" 2" evaluate ; immediate

When the execution semantics of foo2 are performed in interpretation state, the number 2 is placed on the data stack. This effect is not a part of the compilation semantics of this word. Thus, the compilation semantics of this word are not to perform the execution semantics of this word.
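
A minimal sketch of both situations; baz is a hypothetical name used only for illustration.

```forth
: foo2 ( -- 2 | ) s" 2" evaluate ; immediate   \ as in the example above

: baz ( -- 2 ) foo2 ;      \ name encountered in compilation state: literal 2 goes into baz
' foo2 execute .           \ execution semantics in interpretation state: displays 2
baz .                      \ displays 2
```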

So, that is an inconsistency (or perhaps an omission or mistake) in the use of a formal term.

Discussion

There is a point of view that the definition for the term "immediate word" also defines another meaning for the term "compilation semantics" that applies to immediate words only. But this is not true, because every term is defined separately and explicitly. In this definition the term "compilation semantics" is used rather than defined.

Solution

Change wording in the mentioned cases in such a way that the "interpretation semantics" and "compilation semantics" terms are used correctly:

  • Unless otherwise specified in an "Interpretation:" section of the glossary entry, the interpretation semantics of a Forth definition are to perform its execution semantics in interpretation state.
  • immediate word: A Forth word whose compilation semantics are to perform its execution semantics in compilation state.

Add notes about an environmental restriction and environmental dependency concerning behavior of POSTPONE (see also "Consequences" below).

Proposal

In the section 3.4.3.2 Interpretation semantics, replace the sentence:

Unless otherwise specified in an "Interpretation:" section of the glossary entry, the interpretation semantics of a Forth definition are its execution semantics.

with the sentence:

Unless otherwise specified in an "Interpretation:" section of the glossary entry, the interpretation semantics of a Forth definition are to perform its execution semantics in interpretation state.

In the section 2.1 Definitions of terms, replace the sentence:

immediate word: A Forth word whose compilation semantics are to perform its execution semantics.

with the sentence:

immediate word: A Forth word whose compilation semantics are to perform its execution semantics in compilation state.

In the "Compilation" section of the glossary entry 6.1.2033 POSTPONE add the paragraph:

A Forth system that provides POSTPONE that, when applied to an immediate word, appends the execution semantics of the word to the current definition, imposes an environmental restriction on programs: this definition (to which POSTPONE appends the semantics) shall be executed in compilation state only.
A program, which assumes that POSTPONE, when applied to an immediate word, appends the execution semantics of the word to the current definition, has an environmental dependency.

Consequences

The inconsistency in the term "immediate word" was used to argue that the word POSTPONE, when applied to an immediate word, shall append the execution semantics of the word to the current definition, because they are allegedly always equivalent to the compilation semantics of the word.

This inconsistency cannot be used anymore.

If a Forth system provides the word POSTPONE, which when applied to an immediate word appends the execution semantics of the word to the current definition, this perhaps adds an environmental restriction to the Forth system.

On the other hand, it is easy to implement POSTPONE according to the specification:

[undefined] lit, [if]
  : lit, ( x -- ) postpone literal ;
[then]
: compilation  ( -- flag ) state @ 0<> ;
: enter-compilation  ( -- )           ] ;
: leave-compilation  ( -- )  postpone [ ;
: execute-compiling ( i*x xt -- j*x )
  compilation    if  execute  exit  then
  enter-compilation  execute  leave-compilation
;
: postpone ( "name" -- )
  parse-name find-name dup 0= -13 and throw name>compile ( x xt.compiler )
  compilation if swap lit, lit, ['] execute-compiling compile, exit then
  execute-compiling
; immediate

(see other variants in my post at ForthHub)


[r1329] 2024-09-26 14:52:56 AntonErtl replies:

proposal - Revise Rationale of Buffer:

Here's a possible text for the revised Rationale:

BUFFER: provides a means of defining an uninitialized buffer. Embedded systems can take advantage of the lack of initialization and lack of contiguity of the memory area while hosted systems are permitted to ALLOCATE a buffer.

As a programmer, you can use BUFFER: instead of CREATE ALLOT for image size reduction on some systems, if continuity is required.

As a system implementer, you can put the BUFFER: data in uninitialized memory. You may put this data in any writable region of memory that you deem appropriate.


[r1330] 2024-09-26 15:02:15 AntonErtl replies:

proposal - Revise Rationale of Buffer:

Replace A.6.2.0825 Buffer: with the following text:

BUFFER: provides a means of defining an uninitialized buffer. Embedded systems can take advantage of the lack of initialization and lack of contiguity of the memory area while hosted systems are permitted to ALLOCATE a buffer.

As a programmer, you can use BUFFER: instead of CREATE ALLOT for image size reduction on some systems, if contiguity with other allocations is not required.

As a system implementer, you can put the BUFFER: data in uninitialized memory. You may put this data in any writable region of memory that you deem appropriate.
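
A short usage sketch of the two alternatives mentioned in the text; the names /line, line-buf and line-tab are made up for illustration.

```forth
64 CONSTANT /line                 \ illustrative size in address units
/line BUFFER: line-buf            \ uninitialized; need not be contiguous with other data
CREATE line-tab /line ALLOT       \ data space; contiguous with whatever is allotted next

line-buf /line BL FILL            \ a program has to initialize the buffer itself
line-buf C@ BL = .                \ displays -1 (true)
```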


[r1331] 2024-09-26 15:03:53 AntonErtl replies:

proposal - Revise Rationale of Buffer:

This is just a wording change.


[r1332] 2024-09-26 15:33:13 BerndPaysan replies:

proposal - minimalistic core API for recognizers

Concerning the setters and getters: I would prefer to make it mandatory that FORTH-RECOGNIZE actually is a deferred word, and drop the additional getters and setters completely. DEFER, IS, and ACTION-OF are all CORE EXT; so if you implement the recognizers, you have a dependency on those. The previous proposals had VALUE and TO as the interface, which are also CORE EXT.

Gforth could support IS and ACTION-OF on recognizer sequences, too (i.e. assign n elements in order), through its polymorphous approach at all those words for value-style words (TO, +TO, ADDR, IS, ACTION-OF all can do different things on different classes of values), but I guess that would be too much.

Can those setters and getters be optional in case you don't want to support DEFER, and how can a program be written to work in both cases? If you have TOOLS EXT available, you can use

[DEFINED] is [IF]
    is forth-recognize
[ELSE]
    [DEFINED] to [DEFINED] forth-recognizer and [IF]
        to forth-recognizer
    [ELSE]
        set-forth-recognizer
    [THEN]
[THEN]

Yes, this is ugly and shows that having different options is not a good idea.

For the reworked proposal, I will need to restructure the proposal in a way that optional parts I rather want to remove are outlined as such, so that the final rewrite is easy.


[r1333] 2024-09-26 15:48:45 AntonErtl replies:

proposal - F>R and FR> to support dynamically-scoped floating point variables

Please advance the proposal to be more formal, along the lines of ruv's reply. One thing that you should also mention is that the return stack pointer may not comply with the usual FP alignment requirements, and your words must cope with that.


[r1334] 2024-09-26 16:03:36 GeraldWodni replies:

proposal - Multi-Tasking Proposal

On behalf of the Committee: Needs new proposer to work in comments and current systems implementation comparisons


[r1335] 2024-09-26 16:21:49 AntonErtl replies:

proposal - BL rationale is wrong

Replace the rationale of BL with:

BL (for BLank) indicates the intent of providing a character while the number #32 or $20 does not.


[r1336] 2024-09-26 16:23:52 AntonErtl replies:

proposal - BL rationale is wrong

This is just a wording change.


[r1337] 2024-09-26 16:38:53 AntonErtl replies:

proposal - WLSCOPE -- wordlists switching made easier

As discussed in r32, the committee has discussed this.


[r1338] 2024-09-26 16:46:27 AntonErtl replies:

proposal - Implementations requiring BOTH 32 bit single floats and 64 bit double floats.

On behalf of the committee: There is no common practice for arithmetic operations on several FP sizes. F@, DF@, SF@ all put the same sized FP value on the FP stack. A proposal for multiple arithmetic sizes is unlikely to gain consensus.
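
To illustrate the point about SF@ and DF@, here is a small sketch that assumes the optional Floating-Point and Floating-Point Extensions word sets are present. The names sf-slot and df-slot are hypothetical.

```forth
DECIMAL
SFALIGN HERE 1 SFLOATS ALLOT CONSTANT sf-slot   \ 32-bit storage
DFALIGN HERE 1 DFLOATS ALLOT CONSTANT df-slot   \ 64-bit storage

1.5E0 sf-slot SF!           \ the storage format matters only in memory
sf-slot SF@ df-slot DF!     \ the fetched value is an ordinary FP stack item
df-slot DF@ F.              \ displays 1.5
```

The value 1.5 is exactly representable in both formats, so the round trip does not lose anything here.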


[r1339] 2024-09-26 16:53:11 AntonErtl replies:

proposal - Core-ext S\" should reference File-ext S\"

This is work for the editor.


[r1340] 2024-09-26 21:39:01 GeraldWodni replies:

proposal - Test

Testing


[r1341] 2024-09-26 21:40:14 GeraldWodni replies:

proposal - Test

Vote answer testing


[r1342] 2024-09-26 21:40:51 GeraldWodni replies:

proposal - Test

yes!


[r1343] 2024-09-26 22:02:11 BerndPaysan replies:

proposal - Relax documentation requirements of Ambiguous Conditions

I still propose to reduce the number of ambiguous conditions the standard allows, but when the standard allows these, having documentation about these is helpful.


[r1344] 2024-09-27 10:07:57 AntonErtl replies:

proposal - Relax documentation requirements of Ambiguous Conditions


[r1345] 2024-09-27 10:10:08 PeterKnaggs replies:

proposal - Revised Proposal Process

Authors

  • Andrew Haley
  • Peter Knaggs
  • Leon Wagner
  • Gerald Wodni

Change Log

  • 14/09/2018 Text worked out in workshop at the 2018 Forth Standards Meeting
  • 27/09/2024 Removed process pending further discussion

Problem

Now that the proposal process has been moved onto forth-standard.org the Proposal Process no longer describes the process.

Solution

Replace the Proposal Process in the front matter of the document with a new process that correctly describes the revised process of proposing change via this site.

Proposal

Replace the text of the Proposal Process in the front matter with the following:

The Proposal

In the initial proposal, some issues could be left undecided, leaving them open for discussion. These issues should be mentioned in the Problem or Solution section as well as in the Proposal section.

If you want to leave something open to the system implementor, make that explicit, e.g., by making it implementation dependent.

A proposal should include the following sections.

Author:

The name of the author(s) of the proposal.

Change Log:

A list of changes to the last published edition of the proposal.

Problem:

This states what problem the proposal addresses.

Solution:

A short informal description of the proposed solution to the problem identified by the proposal.

This gives the rationale for specific decisions you have taken in the proposal (often in response to comments), or discusses specific issues that have not been decided yet.

Typical use: (Optional)

Shows a typical use of the word or feature proposed; this should make the formal wording easier to understand.

Document Changes:

This should enumerate the changes to the document.

For the wording of word definitions, use existing word definitions as a template. Where possible, include the rationale for the definition.

Reference implementation: (if applicable)

This makes it easier for system implementors to adopt the proposal. Where possible, the reference implementation should be provided in standard Forth. Where this is not possible because system specific knowledge is required or non-standard words are used, this should be documented.

Testing: (if applicable)

This should test the words or features introduced by the proposal, in particular, it should test boundary conditions. Test cases should work with the test harness in Appendix F.


[r1346] 2024-09-27 10:26:28 ruv replies:

proposal - Relax documentation requirements of Ambiguous Conditions

There is an inconsistency in the section 4.1.2 Ambiguous conditions, which says:

A system shall document the system action taken upon each of the general or specific ambiguous conditions identified in this standard.

and the introduction to chapter 4 Documentation requirements, which says:

When it is impossible or infeasible for a system or program to define a particular behavior itself, it is permissible to state that the behavior is unspecifiable and to explain the circumstances and reasons why this is so.

(emphasis mine in both cases)

If 4.1.2 said "should" instead of "shall", the inconsistency would be eliminated. Another way is to add to 4.1.2 the option "or state that the behavior is unspecifiable". It is probably also better to use either "system action" or "behavior" consistently in both places.

We should take into account that it's impossible to document the behavior in some cases because it depends on many factors. For example: "An ambiguous condition exists if an incorrectly typed data object is encountered" — what behavior can be documented in this case?

For example, SP-Forth/4 documents behavior in some ambiguous conditions, but not in all (see online version, source code). In SP-Forth, we will continue to document behavior in some ambiguous conditions.


Personally, I usually don't consult the documentation, but test the system interactively, and in some rare cases I view the corresponding source code.


[r1347] 2024-09-27 12:51:10 ruv replies:

requestClarification - How to perform the interpretation semantics for a word

@BerndPaysan wrote:

As current pre-1.0-Gforth shows, you can have an xt as single identifier for a word and executing that xt always gives you the interpretation semantics;

How to perform the observable interpretation semantics from this xt in Gforth? In the general case, should I set interpretation state or not?

It is obvious that for some words I can execute this xt in compilation state and that performs the observable interpretation semantics. But not for words test1 and test2 defined above, as far as I can see.

If I should set the interpretation state in the general case, what does this follow from? Shouldn't this be clear from the section 15.6.2.1909.20 (along with other sections)?


[r1348] 2024-10-04 13:04:14 ruv replies:

requestClarification - Behavior of EMPTY-BUFFERS when BLK is nonzero

Perhaps the section 7.3.2 Block buffer regions removes the guarantees that 3.3.3.5 provides. It says: “If the input source is a block, these restrictions also apply to the address returned by SOURCE”.

The word LOAD is not mentioned among the restrictions. So, it's unclear whether use of LOAD invalidates the address that was returned by SOURCE.


[r1349] 2024-10-07 14:49:31 ruv replies:

requestClarification - Behavior of EMPTY-BUFFERS when BLK is nonzero

it's unclear whether use of LOAD invalidates the address that was returned by SOURCE

Just for reference: this was discussed in 2019 regarding the addresses returned by BLOCK and BUFFER.


[r1350] 2024-10-07 15:09:31 ruv replies:

comment - Exception word set is not optional any more

For reference, I collect the cases where ambiguous condition can be eliminated in the issue "Eliminating of some ambiguous conditions" on ForthHub/standard-evolution.


[r1351] 2024-10-08 08:43:25 ruv replies:

proposal - minimalistic core API for recognizers

Deferred words in API considered harmful

make it mandatory that FORTH-RECOGNIZE actually is a deferred word

As we have discussed, the main problem with a deferred word is that it can't be redefined by wrappers that have additional actions when setting or getting the value. In this respect, such a word in an API is as bad as an address-flavoured variable (like BASE).
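
A minimal sketch of the kind of wrapper meant here. All names (rec-xt, set-rec, get-rec, rec-changes) are hypothetical and are not taken from any recognizer proposal.

```forth
VARIABLE rec-xt                        \ stand-in storage for the current recognizer
: set-rec ( xt -- ) rec-xt ! ;         \ plain setter
: get-rec ( -- xt ) rec-xt @ ;         \ plain getter

VARIABLE rec-changes  0 rec-changes !  \ counts how often the recognizer was replaced
: set-rec ( xt -- )                    \ wrapper with the same name
  1 rec-changes +!  set-rec ;          \ the inner set-rec is the earlier definition

' DUP set-rec                          \ any xt will do for the demonstration
rec-changes @ .                        \ displays 1
```

Code that calls set-rec picks up the additional action without being changed. With a deferred word, IS stores the new xt directly, so there seems to be no comparable place to hook in.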

There is also a recent discussion in comp.lang.forth (link) under subjects "value-flavoured approach" and "value-flavoured structures".

Special data object on failure considered harmful

A question is what to return on failure (unsuccess): a special data object (xt of notfound) or a common data object 0 (zero).

Below is a copy of my rationale from 2023, with some rewording.

There are two strong arguments against a special data object:

  • consistency with other similar words;
  • impact on the overall lexical size of programs.

Consistency

Many standard words return some data object on success, or 0 (zero) on unsuccess/failure. This is possible because this data object cannot be 0.

For example:

  • name>interpret ( nt -- xt | 0 )
  • find-name ( sd.name -- nt | 0 )
  • find-name-in ( sd.name wid -- nt | 0 )
  • find ( c-addr -- xt n | c-addr 0 )
  • search-wordlist ( sd.name wid -- xt n | 0 )
  • source-id ( -- fileid | -1 | 0 ) — not a fail, but also an example when zero was chosen instead of a special object.

Also, it is a common approach in practice. This allows common higher-order functions to operate on the common failure result 0.
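
As a small illustration of this convention, the following sketch tests the result of search-wordlist against zero. The word known? is a hypothetical helper, and the use of s" outside a definition assumes the File-Access word set.

```forth
\ True if the name is found in the Forth word list.
: known? ( c-addr u -- flag )
  FORTH-WORDLIST SEARCH-WORDLIST   \ ( 0 | xt n )
  DUP IF NIP THEN                  \ on success drop the xt and keep the non-zero n
  0<> ;                            \ normalize to a well-formed flag

S" DUP" known? .                   \ displays -1
S" no-such-word-xyz" known? .      \ displays 0
```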

Why should not recognizers follow this practice? Why should they return a special id on failure rather than zero?

Lexical code size

Returning notfound on failure makes the code shorter (in terms of lexemes) in some places. But the point is that it makes code longer in more places.

I checked the source code of Gforth (as of 2023-09-17), which includes both the implementation and usage of a Recognizer API. In its code:

  • ['] notfound with = or <> is used 10 times, and without checking — 32 times.
  • forth-recognize execute is used 3 times.

If we use 0 (zero) instead of the notfound xt, then:

  • ['] notfound <> is removed 5 times, which eliminates 15 lexemes;
  • ['] notfound = is replaced with 0= 5 times, which eliminates 10 lexemes;
  • ['] notfound is replaced with 0 32 times, which eliminates 32 lexemes;
  • the definition for notfound is removed, a definition for ?found is added: : ?found ( x.some\0 -- x.some | 0 -- never ) dup 0= -13 and throw ;, which adds not more than +3 lexemes;
  • forth-recognize execute is replaced with forth-recognize ?found execute 3 times, which adds +3 lexemes;
  • the word ?found can be also used after find, search-wordlist, find-name, find-name-in — when the user needs to execute their result at once, and unsuccess should produce an exception.

Thus, replacing of notfound by zero reduces the overall lexical code size in Gforth by more than 51 lexemes, which is more than 0.4KiB in absolute size (as on 2023-09-17).
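
For illustration, here is the ?found definition from the list above applied to search-wordlist, one of the words it is said to combine with. The word execute-name is a hypothetical helper.

```forth
: ?found ( x.some | 0 -- x.some ) dup 0= -13 and throw ;

\ Look a name up in the Forth word list and execute it at once;
\ an unknown name produces exception -13 (undefined word).
: execute-name ( i*x c-addr u -- j*x )
  FORTH-WORDLIST SEARCH-WORDLIST ?found   \ ( xt n ), or throw on 0
  DROP EXECUTE ;

DECIMAL
2 3 S" +" execute-name .                  \ displays 5
```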

So why should we prefer an approach that increases the overall lexical size of programs?


[r1352] 2024-10-31 18:45:59 AntonErtl replies:

proposal - minimalistic core API for recognizers

About the proposal text

The "Problem" section does not describe a problem of Forth-2012 that the proposal wants to solve, but considers a problem with some other recognizer proposal. Similarly, the "Solution" section refers to some other recognizer proposal. This makes these sections useless for readers who have not first read up on the other proposal, which is not even linked here. Parts of the "Solution" section might be useful in another section on transitioning from the earlier proposal.

Instead, the "Problem" and "Solution" sections should describe what benefits this proposal adds to the standard, and how. A possible "Discussion" section and its subsections should describe the benefits of the present approach over possible alternative approaches (if that's too detailed, lazy system implementors will complain about the length of the proposal, but some complaints should just be ignored).

"Typical use" should of course be presented.

State-dependence

The proposal in its present form is unacceptable to me because it defines a defining word TRANSLATE: for state-dependent words, and expects recognizers to produce the xt of state-dependent words. This makes the translators hard to use anywhere except in INTERPRET; the proposed-for-standard interface is even hard (actually impossible with standard means) to use in POSTPONE, which is an intended user of translators, as the proposal admits itself:

POSTPONE can do that without a standardized way

Another problem with the state-dependent translators is that it leads to either handwaving specifications of what they do, as evidenced in XY.3.1:

TRANSLATE-THING ( j*x i*x -- k*x )

A translator xt that interprets, compiles or postpones the action of the thing according to what the state the system is in.

in the non-specification of what translator-xt does in FORTH-RECOGNIZE and the handwaving specification of "name:" in TRANSLATE:

"name:" ( j*x i*x -- k*x ) performs xt-int in interpretation, xt-comp in compilation and xt-post in postpone state using a system-specific way to determine the current mode.

and the nonspecification of what TRANSLATE-NT, TRANSLATE-NUM, TRANSLATE-DNUM, TRANSLATE-FLOAT, and TRANSLATE-STRING do.

Or if you specify exactly what happens, it leads to lengthy texts that explain the state-dependence, and the three different cases. And you cannot even specify when xt-post is performed, because there is no "postpone state" in the standard. On the contrary the current document specifies that STATE is either 0 (interpretation state) or non-zero (compilation state), without any values left for a postpone state, and specifies only words for getting into interpretation state and compilation state, not postpone state.
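
The two states that the standard does provide can be observed with a small sketch; compiling?, .state and dummy are hypothetical names.

```forth
: compiling? ( -- flag ) STATE @ 0<> ;   \ non-zero STATE means compilation state
: .state ( -- )                          \ immediate, so it also runs during compilation
  compiling? IF ." compiling " ELSE ." interpreting " THEN ; IMMEDIATE

.state                 \ displays: interpreting
: dummy .state ;       \ displays: compiling (while dummy is being compiled)
```

A standard program has no third value of STATE to test for here.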

If you really believe that the state-dependent approach is a good idea, please specify all these words exactly; the editor won't do it for you.

Opaque solution

If there is no need to make POSTPONE implementable in a standardized way, there is no need to make INTERPRET (which is not even standardized) implementable in a standardized way, either, and the translators can become a completely opaque thing that the standard does not document. In that case there is also no need for the translators to actually be executable. The recognizer could return an opaque translation token, and standard programs can only use that for implementing recognizers, but not for implementing text interpreters, POSTPONE, or anything else.

Transparent solutions

Alternatively, we might heed "Don't bury your tools!" and have a more useful interface for translators, like what we have seen in earlier drafts and other recognizer proposals.

POSTPONE

If the idea of the proposal is that xt-post is actually used by POSTPONE, the proposal should specify the change to POSTPONE.

Standardize recognizers

I expect that more people will want to compose existing recognizers into recognizer sequences than to define new recognizers, but they usually need to know about existing recognizers in order to do that. Therefore the proposal (or an accompanying proposal) should not just propose standard translators, but also standard recognizers.


[r1353] 2024-11-02 07:27:39 AntonErtl replies:

example - Bad test case

MSB is the value where only the most significant bit is set:

0 INVERT CONSTANT 1S
1S 1 RSHIFT INVERT CONSTANT MSB
msb h. \ output on a 64-bit system: $8000000000000000  ok
msb 2* . \ output: 0  ok

For Forth-2012 one might argue that this case is an overflow of an arithmetic operation and therefore the result should be implementation-defined (so such a test case should not exist), but 2* is specified as a shift (i.e., a bitwise operation), and for a shift the most significant bit should clearly be shifted out by 2*.
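
A portable variant of the check, as a sketch: it avoids the Gforth word h. and contrasts MSB with the all-ones value. The last line assumes a two's-complement system.

```forth
0 INVERT CONSTANT 1S                 \ all bits set
1S 1 RSHIFT INVERT CONSTANT MSB      \ only the most significant bit set

DECIMAL
MSB 2* .          \ displays 0: the single set bit is shifted out
MSB 1 LSHIFT .    \ displays 0: the same result from an explicit shift
1S 2* .           \ displays -2: all bits except the lowest remain set
```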


[r1354] 2024-11-02 13:18:51 BerndPaysan replies:

proposal - New words: latest-name and latest-name-in

One reason to have LATESTNT/LATESTXT in Gforth is that it works even when you switch the current wordlist in between, and that you don't even need to have a definition header (i.e. it also works on :NONAME definitions). The state of them is tied to what RECURSE compiles — RECURSE is an alias to the latest definition.

There's another extension that makes this important: the NONAME word which renders the next definition unnamed, no matter how it is created. This allows defining unnamed definitions that are not colon definitions. To access their xt, you need LATESTXT. I suggest removing the relation from a wordlist, it is an implementation detail that doesn't really help.


[r1355] 2024-11-02 18:54:41 ruv replies:

proposal - New words: latest-name and latest-name-in

One reason to have LATESTNT/LATESTXT in Gforth is that it works even when you switch the current wordlist in between, and that you don't even need to have a definition header (i.e. it also works on :NONAME definitions). The state of them is tied to what RECURSE compiles — RECURSE is an alias to the latest definition.

  1. LATESTNT and LATESTXT are Gforth-specific words and may continue to be used by the Gforth core.

  2. LATEST-NAME cannot replace these words, because LATEST-NAME gives you the nt of a named and completed definition only (i.e., not a definition for which compilation has been started and not yet finished, and not an anonymous definition).


I see a drawback to LATESTXT in that it may return the xt of a definition that is not a current definition. That is, it does not provide useful information about whether a current definition exists. I would like to propose standardizing a word that returns either the xt of the current definition or zero (if there is no current definition).

Formal definitions (draft):

GERM ( -- xt|0 )
If the current definition exists, return its xt, otherwise return zero. The returned xt may be processed by COMPILE,. The returned xt shall not be executed (directly or indirectly) until compilation of the corresponding definition has ended.

The current definition: the definition whose compilation has been started most recently and not yet ended.

End of the formal definitions.

So, GERM returns a correct value even while a quotation is being compiled. Using this word, RECURSE can be defined as follows:

: recurse  germ dup if compile, exit then  true abort" there is no current definition" ; immediate

(NB: we should introduce a throw code for such a case)

Also, I would like to clarify DOES> to avoid ambiguity concerning what is the current definition after the compilation semantics of DOES> are performed.

For example, replace in 6.1.1250:

Consume colon-sys1 and produce colon-sys2.

With:

End compilation of the current definition, consuming colon-sys1. Start compilation of the new definition, producing colon-sys2.
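
To illustrate why the wording matters, here is an ordinary defining word in standard Forth (const and five are hypothetical names). Under the proposed text, the code before DOES> and the code after it would count as two separate definitions, so a word such as the proposed GERM would refer to the second one once DOES> has been compiled. That reading is an interpretation of the proposal, not established behaviour.

```forth
: const ( x "name" -- )
  create ,          \ first definition: builds the child and stores x
  does> ( -- x )    \ under the proposed wording: ends the first definition, starts a new one
  @ ;               \ second definition: runs when a child word executes

5 const five
five .              \ prints 5
```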


[r1356] 2024-11-02 19:39:02 ruv replies:

proposal - New words: latest-name and latest-name-in

There's another extension that makes this important: the NONAME word, which renders the next definition unnamed, no matter how it is created. This makes it possible to create unnamed definitions that are not colon definitions. To access their xt, you need LATESTXT. I suggest removing the relation to a wordlist; it is an implementation detail that doesn't really help.

Why not continue using LATESTXT to access the xt of these words?

Anyway, LATESTXT cannot be used instead of the proposed LATEST-NAME.

Usage example for LATEST-NAME (from my post in comp.lang.forth):

: vocabulary>wordlist ( xt.vocabulary -- wid )
  also execute  get-order swap >r 1- set-order r>
;

: exch-current ( wid -- wid )
  get-current swap set-current
;

wordlist constant (labels-for-wordlists)

: wordlist-labeled ( sd.label -- wid )
  (labels-for-wordlists) exch-current >r
  ['] vocabulary execute-parsing
  latest-name name>interpret ( xt.vocabulary )
  vocabulary>wordlist ( wid )
  r> exch-current drop
;

The word wordlist-labeled allows us to create a wordlist with a label (a string without spaces), and this label is what order displays for this wordlist. A user of wordlist-labeled would not expect latest-name to return different values before and after wordlist-labeled is executed.


[r1357] 2024-11-02 21:50:25 BerndPaysan replies:

proposal - New words: latest-name and latest-name-in

The question is: what do you want to solve? The usual problem is that you have created a word and want to get a handle to it. One problem here is that while it is still being defined, you can't even search for it. Another problem is that you may never be able to search for it, because it doesn't even have a name. And you want to access the last definition even after it has been completed, when there is no current definition any longer.

The traditional implementation of the latest name/xt was to go into the current wordlist, take the linked list, and return the last element thereof. That's why the standard says there's an ambiguous condition if you change the current wordlist between : and ;. Don't implement it that way, and that problem goes away.

There's another traditional technique for hiding the current definition: the smudge bit. The smudge bit is unimplementable in flash-based systems. Gforth doesn't have a smudge bit; it appends the current definition to the corresponding wordlist at ; and not before. So if you implement LATEST-NAME-IN in Gforth by looking into the current wordlist, you'd never get the current definition: it's not there yet. It only gets there once it stops being the current definition.
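
Whichever mechanism a system uses, the effect is observable from standard code, because the standard requires that the current definition not be findable until it is ended. A minimal sketch (foo is an arbitrary name; a system may warn about the redefinition):

```forth
: foo ( -- n ) 1 ;
: foo ( -- n ) foo 1+ ;   \ the inner FOO finds the earlier definition, not this one
foo .                     \ prints 2
```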

This proposal standardizes on a traditional implementation which never was a very good idea. The definition you want to access is the one you just defined (it may be incomplete or complete), and that's the last one in time. It always has an xt; it might not have an nt (and if it doesn't have one, that should be 0). Making all the changes for the new header in Gforth was a bit complicated, because such a transition is fragile, and I think it requires another round of review before 1.0 is ready. That applies especially to the naming of LATESTNT and LATEST, which both return a unified nt/xt data type, but LATEST returns 0 if the last definition really has no name. It is easy to tell whether the current definition is incomplete or completed; that could be helpful. It's also possible to tell which wordlist an incomplete definition will go into when it is completed (it remembers the current wordlist at : and will go there at ;).

So what we could implement is a LATEST-NAME-IN which returns the nt of the latest definition if the wid matches the current incomplete definition's wordlist. Once that definition is completed, LATEST-NAME-IN would no longer return a non-zero value. I can't tell you how often we need that functionality, but it looks like that's not quite the right one. Most uses occur after a definition is completed.

The first thing needed here is to really figure out what people actually want: the last element of a wordlist, the last incomplete definition of a wordlist (i.e. the element with the smudge bit set), or the last definition in time, regardless of whether it is completed or not and of which wordlist it goes into once completed.


[r1358] 2024-11-02 22:37:33 ruv replies:

proposal - minimalistic core API for recognizers

@AntonErtl writes:

The proposal in its present form is unacceptable to me because it defines a defining word TRANSLATE: for state-dependent words, and expects recognizers to produce the xt of state-dependent words.

I do not like TRANSLATE: either, but for a different reason. Sometimes it is very convenient to define a translator as a quotation (right inside the recognizer), and if translators can only be defined with TRANSLATE:, you cannot define one as a quotation.

This makes the translators hard to use anywhere except in INTERPRET;

Could you provide some examples, please? It seems that this is no harder than performing the observable interpretation semantics using the result of name>interpret.