Digest #122 2020-10-29

Contributions

[163] 2020-10-29 00:28:43 ruv wrote:

proposal - Tick and undefined execution semantics

Author

Ruv

Change Log

Problem

Applying the word ' (Tick) is intrinsically ambiguous in some cases, but the standard does not declare this explicitly.

The word ' returns an execution token that identifies the execution semantics of the word given as its argument. Therefore, if the standard does not define execution semantics for a word, a program cannot rely on Tick returning anything for that word, or even on it returning control at all.

Solution

Declare an ambiguous condition for the case when Tick is applied to a word with undefined execution semantics.

For most words whose execution semantics are undefined, the interpretation semantics are undefined too, and the standard already has a general rule that applying Tick to a word with undefined interpretation semantics is ambiguous. For the remaining words, whose interpretation semantics are defined but whose execution semantics are not, the standard declares the ambiguous condition individually for each word. The only word covered by neither is S" from the File-Access word set.

The right approach is to remove these individual workarounds and introduce a single general rule.

In fact, it would be enough to declare an ambiguous condition only for undefined execution semantics, and to remove the ambiguous condition for undefined interpretation semantics. But we cannot do that while the compilation semantics of some words with undefined interpretation semantics are still defined via their execution semantics (e.g., >R).
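A small sketch in standard Forth makes the distinction concrete. Ticking a colon definition is well defined, because its execution semantics are specified; the commented-out lines list the kinds of Tick uses discussed above that a program must not rely on. The word square is a hypothetical example, not part of the proposal.

```forth
\ Well defined: a colon definition has specified execution semantics,
\ so ' returns an xt that a program may EXECUTE.
: square ( n -- n*n ) dup * ;
' square constant square-xt
7 square-xt execute .   \ prints 49

\ Ambiguous: a program must not rely on any of these.
\ ' IF   \ interpretation semantics undefined (existing general rule)
\ ' >R   \ interpretation semantics undefined; compilation semantics
\        \ are specified via its execution semantics
\ ' TO   \ execution semantics undefined, declared individually for TO
\ ' S"   \ File-Access S": interpretation semantics defined, execution
\        \ semantics undefined, no ambiguous condition declared today
```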


Side note

The situation with these words can be improved in one of the following ways:

a. Rename the Execution section to Run-time, and add a Compilation section to each of these words.

b. Rename the Execution section to Run-time, and change the default compilation semantics to append the run-time semantics, with the default run-time semantics being to perform the execution semantics.

c. Rename the Execution section to Run-time, and change the default compilation semantics to append the run-time semantics if they are defined, or the execution semantics otherwise.


So, for now, the general rule is that applying Tick to a word with undefined execution semantics or undefined interpretation semantics is ambiguous.

After refactoring the specifications for the words mentioned above, the part concerning undefined interpretation semantics can be removed.

All of the above applies to ['] (BracketTick) as well as to Tick.

Proposal

In the glossary entries 6.1.0070 ' (Tick) and 6.1.2510 ['] (BracketTick)

Remove the phrase:

An ambiguous condition exists if name is not found.

And add the following note:

Note

An ambiguous condition exists if name is not found, or execution semantics for name are undefined, or interpretation semantics for name are undefined.

In the glossary entries 6.2.2295 TO, 6.2.1725 IS, 6.2.0698 ACTION-OF,

replace the phrase:

An ambiguous condition exists if any of POSTPONE, [COMPILE], ' or ['] are applied to TO.

by the phrase:

An ambiguous condition exists if POSTPONE or [COMPILE] are applied to TO.

Rationale: the standard does not repeat the general ambiguous conditions in the entry of each word they apply to.

In the section 4.1.2 Ambiguous conditions

replace the phrase:

attempting to obtain the execution token, (e.g., with 6.1.0070 ', 6.1.1550 FIND, etc. of a definition with undefined interpretation semantics;

by the phrase:

attempting to obtain the execution token (e.g., with 6.1.0070 ', 6.1.1550 FIND, etc.) of a definition with undefined execution semantics or undefined interpretation semantics;

(the excessive comma and missing right parenthesis are also fixed).

Replies

[r544] 2020-09-17 05:34:07 MarcelHendrix replies:

requestClarification - Extending MARKER

As iForth supports FORGET and making executables, it has lots of support for saving 'structural' details, of which MARKER is just one example. Off the top of my head: each word has FORGET and REVISION fields, and each word has the option of defining a FORGET> section with code that is executed when it is forgotten (like the DOES> construct). Words can be hooked into several lists (chains) that are walked through when certain events occur (like COLD). Included files can be marked as a revision that is automatically unhooked when the file is reloaded. When Forth is used as an application shell and is allowed to compile, one frequently wants to clean up the dictionary when the user wants to work on another file.


[r545] 2020-09-19 12:24:09 ruv replies:

proposal - minimalistic core API for recognizers

Using special setters and getters means you have another (special purpose) DEFER mechanism here.

Not necessarily. It's up to the author/implementer. They can be just wrappers over the standard DEFER, as I have shown earlier, so it doesn't mean reinventing the wheel; the implementation details are simply hidden.

So the arguments concerning the implementation of the DEFER mechanism say nothing against having three separate words in the minimalistic API.

BTW, given translators for the basic data types, the words is and action-of can be defined even more briefly:

: is  ' >body tt-lit ['] ! tt-xt ; immediate
: action-of  ' >body tt-lit ['] @ tt-xt ; immediate
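The two definitions above rely on the translators tt-lit and tt-xt, which are not defined in this post. Below is a minimal sketch, assuming that tt-lit leaves a value on the stack when interpreting and compiles it as a literal when compiling, and that tt-xt executes an xt when interpreting and compiles it when compiling (adequate for non-immediate words like ! and @). It also assumes a deferred word whose body holds its current xt; to stay within the standard, my-defer, my-is, my-action-of, do-nothing, greet, hello and reset-greet are hypothetical CREATE-based stand-ins, since >BODY is only specified for words defined with CREATE.

```forth
\ Hypothetical translators; their behaviour depends on STATE.
: tt-lit ( x -- x | ) state @ if postpone literal then ;
: tt-xt  ( i*x xt -- j*x ) state @ if compile, else execute then ;

\ Hypothetical CREATE-based deferred word: its body holds the current xt.
: do-nothing ( -- ) ;
: my-defer ( "name" -- ) create ['] do-nothing , does> @ execute ;

\ The two definitions from the post, renamed to avoid clashing with IS and ACTION-OF.
: my-is        ' >body tt-lit ['] ! tt-xt ; immediate
: my-action-of ' >body tt-lit ['] @ tt-xt ; immediate

my-defer greet
: hello ( -- n ) 42 ;
' hello my-is greet                         \ interpreted: stores the xt at once
greet .                                     \ prints 42
: reset-greet ['] do-nothing my-is greet ;  \ compiled: appends "<addr> literal !"
```

The same two definitions serve both states because all state handling is pushed down into the translators.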

Well, in any case I would agree that the arguments concerning complexity are more or less weak.

A stronger argument (not yet commented on) concerns the additional actions that a system may need to perform in the setter. What do you think about this?


[r546] 2020-09-20 17:35:38 KrishnaMyneni replies:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

The arguments to MAKE-IEEE-DFLOAT have the same binary representation as the corresponding fields of the IEEE 754 binary specification for double precision float. Perhaps this wasn't clear from the description.

Only standard number recognition by the Forth system is needed for the arguments to MAKE-IEEE-DFLOAT. The output will go to a memory buffer instead of onto the floating point stack, per our previous discussion on c.l.f. You may write a recognizer for some equivalent representation such as the hexfloat format. MAKE-IEEE-(X)FLOAT can serve as a factor for writing such a recognizer.
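The point that the arguments share the binary layout of the IEEE 754 double-precision fields can be illustrated by packing the three fields (1-bit sign, 11-bit biased exponent, 52-bit fraction) into one 64-bit pattern. This is only a sketch: dfloat-bits is a hypothetical name, it is not the proposed MAKE-IEEE-DFLOAT (whose exact stack effect is not given here), and it assumes 64-bit cells.

```forth
\ Hypothetical helper; assumes 64-bit cells.
\ sign: 0 or 1; exp: biased exponent 0..2047; frac: 52-bit fraction field
: dfloat-bits ( sign exp frac -- u )
  >r                           \ save the fraction field
  2047 and 52 lshift           \ exponent occupies bits 62..52
  swap 1 and 63 lshift or      \ sign occupies bit 63
  r> $FFFFFFFFFFFFF and or ;   \ fraction occupies bits 51..0

\ 1.0e0 has sign 0, biased exponent 1023 and fraction 0.
0 1023 0 dfloat-bits hex u. decimal
```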


[r547] 2020-09-21 13:23:29 AntonErtl replies:

proposal - Recognizer

A proof-of-concept dot-parser recognizer

There have been questions about writing a dot-parser as a recognizer, in particular wrt. POSTPONE and the way that works in the present proposal. Here I present a proof-of-concept of such a recognizer.

A proof-of-concept ClassVFX subset and its dot-parser

A dot-parser is used in the context of a way to control name spaces, as is typical for an object-oriented package. So first we implement a subset of ClassVFX (which has a dot-parser), without implementing the ClassVFX syntax exactly. Here's an example of its usage:

type: point:
  int: x
  int: y
end-type

type: line:
  point: start
  point: end
end-type

instance: point: p1
instance: point: p2
instance: line: l1

This is essentially a structure package. The fields x and y are in a wordlist private to point:; likewise, start and end are private to line:. So how do we access these fields? We access them by writing point:.x and line:.start. The dot-parser is responsible for recognizing these "words".

In order to avoid having to write line:.start @ point:.x, you can combine these into line:.start.x (for arbitrarily long sequences).

In this proof-of-concept, the start and end fields in line: contain the address of a point: instance, not the point: itself. For the recognizer, this is the harder problem, because the dot-expression corresponds to a sequence like offset1 + @ offset2 + @ offset3 + (with the length of the sequence depending on the number of dots), while with the other variant a word equivalent to offset + would be sufficient.
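To make the expansion concrete without the ClassVFX subset, the same layout can be written with the Forth-2012 structure words. This is a sketch with hypothetical field names (point-x, point-y, line-start, line-end); it shows by hand the sequence the dot-parser has to produce for line:.start.y, namely the start offset, a fetch, and then the y offset.

```forth
begin-structure point%
  field: point-x
  field: point-y
end-structure

begin-structure line%
  field: line-start   \ holds the address of a point, not the point itself
  field: line-end
end-structure

create p1 point% allot
create l1 line% allot

p1 l1 line-start !             \ corresponds to: p1 l1 line:.start !
8 l1 line-start @ point-y !    \ corresponds to: 8 l1 line:.start.y !
```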

You can find the complete implementation with examples (tested on gforth 0.7.9_20200917) in dot-parser.fs. Below you find the implementation of the recognizer.

Dot-parser implementation

The questions were about writing the dot-parser recognizer, so here I'll explain the code in more detail.

A central point is how to represent the dot-parsed "word". I represent it as a sequence of xts plus an integer specifying the number of xts in the sequence. E.g., line:.start.x is represented by the xts of start @ x (where start is in the line:-private wordlist, and x in the point:-private wordlist) followed by 3 (the count).

The recognizer itself has the stack effect

rec-dot-parser ( c-addr u -- xt1 ... xtn n rectype-dp | rectype-null )

For the rectype-dp we need three actions. The first is the action for interpreting the "word"; it just executes the xts, starting with xt1:

: dp-int ( ... xt1 .. xtn n -- ... )
    \ remove xt1 .. xtn n from the data stack, then execute xt1 .. xtn.
    dup if
    swap >r 1- recurse r> execute exit then
    drop ;

The action for compiling the "word" compiles these xts into the current definition:

: dp-comp ( xt1 .. xtn n -- )
    \ compile, xt1 .. xtn, in this order
    dup if
    swap >r 1- recurse r> compile, exit then
    drop ;
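A quick way to check the ordering (xt1 is executed or compiled first) is to run the two words on ordinary xts. The definitions are copied from above so that the block runs on its own; square-it is a hypothetical test word, and calling dp-comp between [ and ] simply appends to the definition being compiled.

```forth
: dp-int ( ... xt1 .. xtn n -- ... )
    dup if
    swap >r 1- recurse r> execute exit then
    drop ;
: dp-comp ( xt1 .. xtn n -- )
    dup if
    swap >r 1- recurse r> compile, exit then
    drop ;

\ Interpreting: executes DUP first, then *.
3 ' dup ' * 2 dp-int .        \ prints 9
\ Compiling: appends DUP then * to the current definition.
: square-it [ ' dup ' * 2 dp-comp ] ;
5 square-it .                 \ prints 25
```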

POSTPONE compiles dp-comp into the current definition (rather than executing it). But because this compiled dp-comp needs the xts at a different time than when they are available from the recognizer, we need a literal-like action to carry the xts from recognizer time to the time when this dp-comp finally runs. This literal-like action is called the "postpone action" in the proposal. Anyway, its implementation is similar to that of dp-int and dp-comp:

: dp-lit1 ( x1 .. xn n -- )
    \ compile x1 .. xn as literals, in this order
    dup if
    swap >r 1- recurse r> postpone literal exit then
    drop ;

t{ :noname [ 2 3 4 3 dp-lit1 ] ; execute -> 2 3 4 }t

: dp-lit ( xt1 .. xtn n -- )
    \ compile xt1 .. xtn n as literals
    dup >r dp-lit1 r> postpone literal ;

Once all these actions exist, we can define rectype-dp:

' dp-int ' dp-comp ' dp-lit rectype: rectype-dp

The recognizer itself is a more complex piece of code, but not particularly important for understanding how postponing a dot-parsed piece of code works, so I won't explain it in detail:

: split ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 c-addr4 u4 true | c-addr1 u1 false )
    \ If c-addr2 u2 is found in c-addr1 u1, return true, and c-addr3
    \ u3 and c-addr4 u4 are the parts to the left and the right of the
    \ found string.  If not, return c-addr1 u1 and false
    2over 2>r dup >r search if
    over swap r> /string 2r> 2swap 2>r drop tuck - 2r> true
    else
    r> drop 2r> 2drop false
    then ;

: rec-dot-parser ( c-addr u -- xt1 ... xtn n rectype-dp | rectype-null )
    \ this leaves out the handling of a number of cases resulting in
    \ rectype-null in the interest of showing the successful case more clearly
    s" ." split 0= if
    2drop rectype-null exit then
    2swap find-name \ !! deal with not-found and not-<typename>
    name>interpret >body @ >r 1 -rot begin ( xt1 .. xtn n c-addr1 u1 r:wid )
    s" ." split while
        2swap r> find-name-in \ !! deal with not-found and not-<fieldname>
        name>interpret dup >body cell+ @ @ >r
        -rot 2>r ['] @ rot 2 + 2r>
    repeat
    r> find-name-in \ !! deal with not-found and not-<fieldname>
    name>interpret swap rectype-dp ;

Note that this recognizer does not properly handle the cases where the string contains a dot, but should not be recognized by the dot-parser (it's a proof-of-concept).
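The split helper is what lets the recognizer peel off one dot-separated component at a time. A short usage check, with split copied from above so the block runs on its own (the usage line relies on the interpretive S" of the File-Access word set):

```forth
: split ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 c-addr4 u4 true | c-addr1 u1 false )
    2over 2>r dup >r search if
    over swap r> /string 2r> 2swap 2>r drop tuck - 2r> true
    else
    r> drop 2r> 2drop false
    then ;

\ Left part "line:", right part "start.x", flag true.
s" line:.start.x" s" ." split
drop 2swap type space type cr   \ prints: line: start.x
```

Applying split again to the right part yields "start" and "x", which is how the loop in rec-dot-parser walks an arbitrarily long dot sequence.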

In gforth 0.7.9_20200917, this recognizer is searched last with

' rec-dot-parser get-recognizers 1+ set-recognizers

Usage examples

Using the definitions of point:, line:, their fields, and instances l1, p1, p2, you can do:

\ interpretive uses:
p1 l1 line:.start !
8  l1 line:.start.y !

\ compiled use:
: foo line:.start.y @ ;

\ postpone use:
: bar postpone line:.start.x ; immediate
: flip bar ;

Gforth's see decompiles foo, bar, and flip as:

: foo  
  start @ y @ ;
: bar  ['] start ['] @ ['] x 3 
  dp-comp ; immediate
: flip  
  start @ x ;

[r548] 2020-09-21 16:55:37 ruv replies:

proposal - Recognizer

Just for comparison, in the minimalistic API, instead of the dp-int, dp-comp, dp-lit1, dp-lit and rectype-dp words (five words in total) we need to define only one general-purpose word, tt-nxt:

: tt-nxt ( ... xt1 .. xtn n -- ... )
    \ remove xt1 .. xtn n from the data stack, then translate xt1 .. xtn one by one in this order.
    dup if
    swap >r 1- recurse r> tt-xt exit then
    drop ;

The only change in rec-dot-parser is that rectype-dp is replaced by ['] tt-nxt.
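A runnable sketch of tt-nxt in use. The post does not define tt-xt here, so the one below is an assumption: it executes the xt in interpretation state and compiles it in compilation state. sq-body and square are hypothetical test words; making sq-body immediate lets it be invoked in both states.

```forth
\ Assumed translator for a single xt (not defined in the post).
: tt-xt ( i*x xt -- j*x ) state @ if compile, else execute then ;

\ tt-nxt as given above: translate xt1 .. xtn in this order.
: tt-nxt ( ... xt1 .. xtn n -- ... )
    dup if
    swap >r 1- recurse r> tt-xt exit then
    drop ;

\ Hypothetical test word: translates DUP then *.
: sq-body ['] dup ['] * 2 tt-nxt ; immediate

3 sq-body .          \ interpreting: executes DUP and *, prints 9
: square sq-body ;   \ compiling: appends DUP and * to square
5 square .           \ prints 25
```

Because tt-xt consults STATE, the one word covers the interpreting, compiling and (via a literal of the xts) postponing cases.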


[r549] 2020-09-22 00:00:44 ruv replies:

proposal - Nestable Recognizer Sequences

Is it sufficient to replace the 'word-not-found' portion of the interpreter?

I think, no.

Some system may have a word 'X. If a program has a word X and a recognizer for '<ccc>, the phrase 'X in this program will be translated incorrectly when the recognizer doesn't precede "REC-NAME".

So, a program should have ability to override any system's recognizers.

Also, as I wrote before (news:rduhlf$hor$1@dont-email.me), it can be useful to reuse the system's interpreter loop, since otherwise too many words would have to be re-implemented in some cases.

Maybe all that is needed is the ability to add a recognizer to the current stack and leave it there until it is removed by MARKER or the stack is reset by QUIT.

For libraries (independent modules), it's critical to be able to revert the system's recognizers.


[r550] 2020-09-22 09:44:37 JennyBrien replies:

proposal - Nestable Recognizer Sequences

Some system may have a word 'X. If a program has a word X and a recognizer for '<ccc>, the phrase 'X in this program will be translated incorrectly when the recognizer doesn't precede "REC-NAME".

This runs counter to the user's expectation that they can name a definition anything printable and have it recognized.

So, a program should have ability to override any system's recognizers.

Which may raise more theoretical questions such as whether or not FIND can find locals :)

For libraries (independent modules), it's critical to be able to revert the system's recognizers.

As I see it there are two possible kinds of module:

  1. Included modules that search the CURRENT wordlist and add their definitions to it. They do not alter recognizers.
  2. Required modules that create their definitions on their own wordlist and add it to the search order. They may also set recognizers.

That's something we need to discuss elsewhere.

The effects of a Required module are local to the module that Requires it.


[r551] 2020-09-22 16:01:25 PeterKnaggs replies:

proposal - 2020 Forth Standards meeting agenda

Draft minutes are now available on GitHub


[r552] 2020-09-22 17:05:31 ruv replies:

proposal - Nestable Recognizer Sequences

This runs counter to the user's expectation that they can name a definition anything printable and have it recognized.

Don't confuse a system (and its user) with a program (and its user). You are talking about a system's user. I am talking about a program (and perhaps a user of the program).

A standard system is not allowed to recognize '<ccc> before trying to find it as a word. At the same time, the system is allowed to provide the word 'FOO in the FORTH-WORDLIST.

A standard program is allowed to configure the Forth text interpreter to properly translate source codes of this program (or DSL from a user of this program).

So, the program may have the word FOO and may configure a recognizer for '<ccc>. If this recognizer takes control after the recognizer for Forth words, 'FOO will be resolved incorrectly on a system that provides the word 'FOO. NB: the program knows nothing about the 'FOO word, since it's a standard program and the standard doesn't specify such a word.

Actually, this problem existed before recognizers too. A system may provide a word FED. A program may use the hexadecimal number FED (i.e., when BASE is 16). Such a standard program will be translated incorrectly on a standard system that provides a word FED, but correctly on other systems.
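The FED case can be reproduced in a few lines of standard Forth: the text interpreter looks a token up as a word before trying to convert it as a number, so a definition named FED shadows the hexadecimal literal. The colon definition here is a hypothetical stand-in for a system-provided word.

```forth
HEX FED DECIMAL .     \ no word FED yet: converted as a number, prints 4077
: FED ( -- n ) 1 ;    \ stands in for a word a system might provide
HEX FED DECIMAL .     \ now the word is found first: prints 1
```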

I think that for words this can be solved (independently of recognizers) by some kind of declaration that a program requires the standard environment. Alternatively, FORTH-WORDLIST would contain only standard words, and SYSTEM-WORDLIST could contain all other words.


[r553] 2020-09-25 16:40:00 AntonErtl replies:

comment - Ambiguous conditions

The committee discussed and voted on whether to remove this ambiguous condition. The result of the vote was 6Y/4N/2A, which is not enough for a consensus, so the ambiguous condition is not being removed. The TRAVERSE-WORDLIST issue has been discussed and decided separately.


[r554] 2020-09-25 17:11:24 AntonErtl replies:

referenceImplementation - Reference implementation of SYNONYM

The committee decided (vote #6, 12Y:0:0) to delete the existing reference implementation and replace it with this rationale text:

The implementation of SYNONYM requires detailed knowledge of the host implementation, which is why it is standardized.


[r555] 2020-09-25 17:16:55 AntonErtl replies:

proposal - VOCABULARY

The committee felt there was no question that this is common practice, so it skipped the CfV part and went directly to committee vote. Vote #7: 12Y:0:0 Accepted


[r556] 2020-09-25 17:46:06 AntonErtl replies:

proposal - Licence to use reference implementations

Vote #19: 12Y:0:0 Accepted


[r557] 2020-09-25 17:57:53 AntonErtl replies:

comment - wording - "current region" term is undefined

The committee accepted the following wording change (vote #14 11Y:0N:1A):

Proposed wording change in REPLACES. Replace:

This breaks the contiguity of the current region and is not allowed during compilation of a colon definition

with

Therefore REPLACES cannot be performed during compilation of a colon definition or in the middle of a contiguous region.


[r558] 2020-09-25 18:05:08 AntonErtl replies:

proposal - Input values other than true and false

The committee accepted this proposal: Vote #16 12Y:0:0


[r559] 2020-09-25 18:13:57 AntonErtl replies:

proposal - Better wording for Colon

Instead of the proposed wording, the committee accepted the following wording change (Vote #13: 11Y:0N:1A):

Replace the first paragraph of 6.1.0450 : (colon)

Skip leading space delimiters. Parse name delimited by a space. Create a definition for name, called a "colon definition". Enter compilation state and start the current definition, producing colon-sys. Append the initiation semantics given below to the current definition.

with the following

Skip leading space delimiters. Parse name delimited by a space. Create a definition for name. Enter compilation state and start the current definition, producing colon-sys. Append the initiation semantics given below to the current definition.


[r560] 2020-09-25 18:37:37 AntonErtl replies:

proposal - NAME>INTERPRET wording

I have drafted Proposal: Reword the term "execution token" (latest version) to address the execution token issue.

The interpretation semantics of a STATE-dependent immediate word are STATE-dependent. It is trivial for NAME>INTERPRET to return that. In a classic single-xt+immediate-flag system NAME>INTERPRET will never return 0. Returning 0 allows systems with compile-only flags where the text interpreter produces an error when it encounters such a word in interpretation state (e.g., gforth-0.7).

Your proposed change makes no sense; on the usage side, it would mean that a text interpreter that uses FIND-NAME and NAME>INTERPRET is not guaranteed to work. On the implementation side, NAME>INTERPRET does not know whether a word is STATE-dependent, so it cannot return 0 for such words anyway.


[r561] 2020-09-26 16:16:33 AntonErtl replies:

proposal - Wording: declare undefined interpretation semantics for locals

The committee accepted the following wording change (Vote #17, 12Y:0:0)

This is true for (LOCAL) so we should add:

local Interpretation:

Interpretation semantics for this word are undefined.

LOCALS| refers to (LOCAL) so (LOCAL) covers the case.

For {: we need to add:

name Interpretation

The interpretation semantics of name are undefined

then remove the ambiguous condition in name Execution.


[r562] 2020-09-26 16:59:35 AntonErtl replies:

comment - Defer Implementation

Reference implementations are always just possible implementations. Forth-2012 is not an implementation standard. A potential for stack overflow does not rule this implementation out (otherwise pretty much every reference implementation in Forth would be ruled out).

There is a myth that the usual implementation uses a direct jump instruction, with IS changing the target address inside that jump. I have looked at several implementations, and I have yet to see this one.

System implementors are free to define throw codes in the range -4095..-256, including for this purpose. However, many implementations initialize deferred words with noop.
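The two initialization choices can be sketched with a CREATE-based deferred word. This is illustrative only: my-defer, hook, unset-defer and do-nothing are hypothetical names, and -4000 is an arbitrary code from the -4095..-256 range that, as noted above, is for system implementors to assign, so a standard program should not pick such a code itself.

```forth
\ Default action an implementor might choose: signal "deferred word not set".
-4000 constant unset-defer-code   \ arbitrary value in the system range
: unset-defer ( -- ) unset-defer-code throw ;

\ Deferred word whose body holds the current xt, initialized to unset-defer.
: my-defer ( "name" -- ) create ['] unset-defer , does> @ execute ;

my-defer hook
' hook catch .   \ prints -4000: calling hook before setting it throws

\ The common alternative: start out as a no-op.
: do-nothing ( -- ) ;
' do-nothing ' hook >body !
hook             \ now does nothing
```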


[r563] 2020-09-26 17:10:03 AntonErtl replies:

referenceImplementation - Please fix word spelling in F.1 second paragraph second word.

Thanks for the report. Will be fixed.


[r564] 2020-10-22 03:09:51 coconut replies:

proposal - F>R and FR> to support dynamically-scoped floating point variables

Which Forth compiler supports FLOCAL? I tried Gforth, SwiftForth, VFX Forth, and bigFORTH. None of them has FLOCAL defined. Would you please give me the definition of it?


[r565] 2020-10-29 00:48:42 ruv replies:

proposal - minimalistic core API for recognizers

One more strong argument against a DEFER word in the API, and for separate getter and setter words, is the following.

With DEFER in the API, this API cannot be defined on top of another API at all. But with separate getter and setter (and "executer") words, it is possible to define this API on top of some other APIs.

Example: news:rn1csa$b02$1@dont-email.me