Digest #141 2021-04-16

Contributions

[186] 2021-04-14 17:41:19 TG9541 wrote:

requestClarification - Does the standard assume that DEFER was created with CREATE?

The Implementation part of DEFER@ is as follows:

Implementation:
: DEFER@ ( xt1 -- xt2 )
   >BODY @ ; 

This is surprising as >BODY states the following:

>BODY
( xt -- a-addr )

a-addr is the data-field address corresponding to xt. An ambiguous condition exists if xt is not for a word defined via CREATE. 

This question is relevant for STM8 eForth, an STC system, since >BODY wouldn't work with anything created by DEFER.


[187] 2021-04-15 08:17:45 AntonErtl wrote:

proposal - Reference implementations are not normative

Author:

M. Anton Ertl

Change Log:

2021-04-15 First version

Problem:

The Introduction E.1 of the reference implementations creates the impression that reference implementations are normative. IIRC that is not what we decided. Moreover, it would be a bad idea, because implementations by necessity overspecify the behaviour. E.g., the reference implementation of DEFER uses CREATE, but that does not mean that every implementation of DEFER has to use CREATE; the reference implementation of DEFER@ is to perform >BODY @, but that does not mean that a standard program can just use >BODY @ instead of DEFER@.

A related problem is that proposals contain reference implementations of several words that are intended to work together (e.g., the reference implementation of DEFER@ is designed to work with the reference implementation of DEFER), but they are split into individual words in Annex E. This should be explained in the introduction.

In the long run it is probably better to avoid this splitting, because it makes the reference implementations harder to understand; on forth-standard.org we should then probably show the full section (all the reference implementations of words that are designed together). The splitting has led to requests for clarification here (e.g., "Does the standard assume that DEFER was created with CREATE?"), so it obviously is not as clear as we would like.

Solution:

Change the wording of E.1 accordingly. This is just a wording change.

Proposal:

In E.1, replace

System implementors are free to implement any operation in a manner that suits their system, but it must exhibit the same behavior as the reference implementation given here.

with

These reference implementations are not normative. System implementors are free to implement an operation in any manner that conforms with the specification.

Some of these reference implementations are designed to work together, and rely on implementation details of the reference implementations of related words. They are best understood together.

Replies

[r629] 2021-04-05 02:08:59 MitraArdron replies:

proposal - EMIT and non-ASCII values

It would seem like a mistake to me to use 16-bit strings in Forth. Pretty much everything else seems to use UTF8, and it makes the transition from existing ASCII MUCH easier, since the default case is to treat the string exactly as before.

As an example, for the JavaScript case in Node, I use process.stdout.setEncoding('utf8'); to tell it that the output is UTF8, then TextDecoder().decode(this.buff8(byteStart, byteEnd - byteStart)) to turn the Forth string into a JavaScript string, then process.stdout.write(s) to output that string. This has the advantage of working just the same whether the string is old-style Forth (1 byte per ASCII character) or UTF8.

If we used 16 bit strings, we'd need versions of all the string words - S", abort" etc for 16 bit rather than just fixing the output routines NOT to convert strings back to characters.


[r630] 2021-04-05 10:12:01 ruv replies:

proposal - EMIT and non-ASCII values

It seems that, to support characters beyond pchar in a JavaScript-based Forth implementation, we need to introduce our own buffer before output to ensure that only completed xchars are passed to JS. This holds regardless of whether UTF-8 or UTF-16 is used in Forth.

Obviously, for better performance type should not be implemented over emit. But type should also check for uncompleted characters since s\" \xC3" type s\" \xA4" type should be equivalent to $C3 emit $A4 emit. So emit can be implemented over type without noticeable performance loss.

The word -TRAILING-GARBAGE can be used to separate the completed part from the last uncompleted xchar.
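
A minimal sketch of that separation, assuming a system that provides the optional Extended-Character word set with UTF-8 as the xchar encoding. The names split-complete, emit-buf and emit-via-type are hypothetical and only serve the illustration:

```forth
\ Split a string into the part made of complete xchars and the
\ trailing bytes of an incomplete xchar (the second string may be empty).
: split-complete ( c-addr u -- c-addr u1 c-addr2 u2 )
   2DUP -TRAILING-GARBAGE    \ c-addr u c-addr u1
   DUP >R 2SWAP R>           \ c-addr u1 c-addr u u1
   /STRING ;                 \ c-addr u1 c-addr+u1 u-u1

\ EMIT expressed through TYPE, using a one-character buffer.
CREATE emit-buf 1 CHARS ALLOT
: emit-via-type ( char -- )  emit-buf C!  emit-buf 1 TYPE ;
```

An output routine could pass the first string on to the host and keep the second one buffered, to be prepended to the next chunk.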


[r631] 2021-04-05 11:10:42 ruv replies:

proposal - EMIT and non-ASCII values

Certainly, this buffering before output can also be implemented on the JS side. It should ensure that the last uncompleted xchar is not passed to decode(), but is buffered to be concatenated with the next part.


[r632] 2021-04-06 06:23:08 AntonErtl replies:

requestClarification - Stack effect of LEAVE during compilation

Consider

: leave1 save-data-stack POSTPONE leave is-data-stack-same-as-saved ; immediate
: test ?do leave1 loop ;

So if your control-flow stack is on the data stack, a standard program can see if LEAVE changes control-flow stack items, so such a system would not conform to the standard. I don't see how to do such a check in a standard program for a separate control-flow stack, so you may be able to use such an approach there.

Alternative approaches: A classic technique (probably used by many systems) is to use the space for the target address of each leave branch to store a link to the previous LEAVE. Gforth stores more than fits there, so it uses a separate LEAVE stack.
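
The classic linking technique can be sketched in standard Forth as a simulation in data space. This is not the code of any particular system: the names leave-head, leave-slot, and resolve-leaves are hypothetical, a real compiler would lay down an actual branch instead of a bare cell, and DO would have to save and restore the chain head for nested loops.

```forth
VARIABLE leave-head   0 leave-head !     \ most recent unresolved slot, 0 = none

\ Compile-time part of a hypothetical LEAVE: reserve the cell for the
\ branch target and let it temporarily hold the link to the previous slot.
: leave-slot, ( -- )
   HERE  leave-head @ ,  leave-head ! ;

\ Done by a hypothetical LOOP: walk the chain and patch every slot.
: resolve-leaves ( target -- )
   leave-head @
   BEGIN DUP WHILE       \ target slot
      DUP @ >R           \ remember the link to the previous slot
      OVER SWAP !        \ overwrite the link with the real target
      R>                 \ target previous-slot
   REPEAT 2DROP
   0 leave-head ! ;

ALIGN HERE CONSTANT first-slot
leave-slot, leave-slot, leave-slot,      \ three simulated LEAVEs
1234 resolve-leaves                      \ 1234 stands in for the loop exit address
first-slot @ .                           \ prints 1234
```

No extra compile-time stack items are needed here, because the unresolved branches themselves carry the list.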


[r633] 2021-04-06 11:54:14 ruv replies:

requestClarification - Stack effect of LEAVE during compilation

Yes, a program can see if LEAVE changes the control-flow stack items if they are on the data stack. But I'm not sure that a standard program is allowed to rely on this change due to 3.1.5.1 System-compilation types:

These data types denote zero or more items on the control-flow stack (see 3.2.3.2). The possible presence of such items on the data stack means that any items already there shall be unavailable to a program until the control-flow-stack items are consumed.

Then your save-data-stack is allowed to access neither the data objects of system data types nor the items that were on the data stack before the system data types were placed on the control-flow stack.

It's obvious that when the compilation semantics for LEAVE are performed, the control-flow stack should contain at least one do-sys: ( C: do-sys i*x ). This is because LEAVE is connected with the innermost syntactically enclosing DO/?DO ... LOOP/+LOOP, and so it's connected with the topmost do-sys among several do-sys in the control-flow stack.

Then we can claim that in a standard system LEAVE compilation should have stack effect either ( C: do-sys i*x -- do-sys i*x ) or ( C: do-sys1 i*x -- do-sys2 i*x ).

By the current wording for LEAVE, it looks like if a system uses the second variant, then it should either use the separate control-flow stack, or the same size of do-sys1 and do-sys2. These limitations look irrelevant, superfluous, and actually they don't exist.

Another argument is that if the standard allows ( C: do-sys1 i*x -- do-sys2 i*x ) when the separate control flow stack is used, then it should allow the same when the control-flow stack is united with the data stack.


[r634] 2021-04-07 08:16:04 AntonErtl replies:

requestClarification - Stack effect of LEAVE during compilation

This sentence in 3.1.5.1 is somewhat self-contradictory. The "means that" indicates that it describes a consequence of an earlier normative statement, while the "shall" indicates that the sentence is intended to be normative itself. However, I don't see a reason why the Forth-94 committee should have made such a normative restriction, so I lean towards the interpretation that it is intended as a description of a consequence of the fact that the size of system-compilation types is not known to standard programs. Of course, you can lean towards the normative interpretation.

But even with the normative interpretation, I don't think you can keep the information about all LEAVEs on the data stack. Consider:

: test ?do [ depth ] leave leave leave leave [ depth - . ] loop ;

Compiling this has to print -1.


[r635] 2021-04-07 10:09:00 StephenPelc replies:

proposal - EMIT and non-ASCII values

I support the idea that EMIT and KEY use pchars. Note that pchars are already defined in the standard.

Many Forth systems support redirectable I/O. It is almost impossible to guarantee that all comms channels handle xchars. In particular, both TCP/IP and USB may have breaks in the middle of UTF-8 characters.


[r636] 2021-04-07 10:17:39 StephenPelc replies:

requestClarification - Stack effect of LEAVE during compilation

For use in a standard program is there any reason at all for LEAVE to modify a stack at compile time? And if you do so, do you change any entitlements? I think that you do, so my answer is that the behaviour is not permitted.

For use in a non-standard program, we do not need to care.


[r637] 2021-04-07 12:53:15 ruv replies:

requestClarification - Stack effect of LEAVE during compilation

Well, it's an arguable question.

For use in a standard program is there any reason at all for LEAVE to modify a stack at compile time?

I'm talking not about a program, but about a standard system.

The reason for LEAVE to modify the control-flow stack (in a compatible manner) is to simplify the implementation. For example, Gforth uses a separate stack solely for LEAVE. If LEAVE is allowed to modify the control-flow stack, then some implementations can be simpler since there is no need for a separate stack.

Well, I would suggest a proposal to have ( C: do-sys1 i*x -- do-sys2 i*x ) for LEAVE compilation. Do you think such a system can break some programs? (except artificial examples like Anton's example above).

With this stack effect in the specification, a system is allowed to throw an exception if the control-flow stack doesn't contain a do-sys during compilation of LEAVE. At the moment, we don't have such an ambiguous condition explicitly, but with this stack effect a system can rely on the clause "An ambiguous condition exists if an incorrectly typed data object is encountered" (from 3.1 Data types).


[r638] 2021-04-08 11:25:35 AntonErtl replies:

requestClarification - Stack effect of LEAVE during compilation

I don't expect that there are production programs that would break on a system like you envision, but I could be wrong.

OTOH, the benefit to systems of such a change would be close to zero. All existing systems manage to implement LEAVE without this restriction. And if this restriction was standardized, we would not make use of it in Gforth, because it's simpler to implement a separate stack than to keep track of the latest do-sys in the data stack, make room for additional information by moving the closer-to-top stack items, and storing the leave information there.

LEAVE outside of a DO...LOOP is not a common problem, although I remember one user who intentionally did it because he thought that LEAVE would leave the dynamically enclosing DO...LOOP. One way to deal with that would be to let an unresolved LEAVE branch to an appropriate error throw (maybe "LEAVE unresolved (?DO ... LOOP missing)").


[r639] 2021-04-10 09:14:58 ruv replies:

requestClarification - Stack effect of LEAVE during compilation

it's simpler to implement a separate stack than to keep track of the latest do-sys in the data stack, make room for additional information by moving the closer-to-top stack items, and storing the leave information there.

I'm not sure that a separate stack is simpler. The actual code is even shorter than your description:

: leave postpone 2r> cs-cnt n>r postpone ahead nr> drop ac+ ; immediate

And it's far less than the separate LEAVE-stack in cond.fs of Gforth.

A default branch for LEAVE is an interesting solution. But it's a run-time error (and it leaks an item of the LEAVE-stack), whereas a system could also raise a compile-time error.


[r640] 2021-04-15 07:34:17 AntonErtl replies:

requestClarification - Does the standard assume that DEFER was created with CREATE?

This reference implementation is designed to work with the reference implementation of DEFER, which uses CREATE. Actually the reference implementations of these two words and a few more were originally proposed as a unit.

Implementations (including reference implementations) usually have additional properties beyond the specified and intended behaviour, and this is an example.
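
For orientation, the related reference implementations read roughly as follows when put side by side (reproduced from memory of the standard's reference section, so treat the exact layout as approximate). The usage lines at the end, with the names greet and hello, are hypothetical additions:

```forth
\ Reference-style definitions that are designed to work together:
\ DEFER stores the current xt in the body created by CREATE.
: DEFER ( "name" -- )
   CREATE ['] ABORT ,
DOES> ( ... -- ... )
   @ EXECUTE ;

\ These two rely on that body layout, hence the use of >BODY.
: DEFER@ ( xt1 -- xt2 )  >BODY @ ;
: DEFER! ( xt2 xt1 -- )  >BODY ! ;

\ Hypothetical usage
DEFER greet
: hello ( -- ) ." hello" ;
' hello ' greet DEFER!
greet                          \ prints hello
' greet DEFER@ ' hello = .     \ prints -1
```

Read in isolation, the >BODY in DEFER@ looks like a requirement on DEFER; read together, it is just a detail shared by this particular set of definitions.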


[r641] 2021-04-15 10:29:49 ruv replies:

requestClarification - Does the standard assume that DEFER was created with CREATE?

Does the standard assume that DEFER was created with CREATE?

A short answer: the standard does not require a system to implement DEFER via CREATE. A consequence is that a standard program cannot assume that >BODY may be applied to the xt of a word that is created with DEFER, or that DOES> can change its behavior.
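
As an illustration of that consequence, a program that wants to stay portable would go through the words specified for deferred words rather than through >BODY. A small sketch, with hook, quiet and loud as hypothetical names:

```forth
DEFER hook
: quiet ( -- ) ;
: loud  ( -- ) ." hook called" CR ;

' loud IS hook                 \ set the action by name
' quiet ' hook DEFER!          \ set the action via the xt of hook
ACTION-OF hook ' quiet = .     \ prints -1
' hook DEFER@ ' quiet = .      \ prints -1

\ Not portable: ' hook >BODY @
\ ( ambiguous unless the system happens to define hook via CREATE )
```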


[r642] 2021-04-15 12:47:32 ruv replies:

proposal - Reference implementations are not normative

Agree. Just for reference: another proposal also removes the part concerning "it must exhibit the same behavior".