Digest #141 2021-04-16
Implementation part of DEFER@ is as follows:
Implementation: : DEFER@ ( xt1 -- xt2 ) >BODY @ ;
This is surprising as >BODY states the following:
>BODY ( xt -- a-addr ) a-addr is the data-field address corresponding to xt. An ambiguous condition exists if xt is not for a word defined via CREATE.
This question is relevant for STM8 eForth, an STC system, since >BODY wouldn't work with anything created by DEFER.
M. Anton Ertl
2021-04-15 First version
The Introduction E.1 of the reference implementations creates the impression that reference implementations are normative. IIRC that is not what we decided. Moreover, it would be a bad idea, because implementations by necessity overspecify the behaviour. E.g., the reference implementation of DEFER uses CREATE, but that does not mean that every implementation of DEFER has to use CREATE; the reference implementation of DEFER@ is to perform >BODY @, but that does not mean that a standard program can just use >BODY @ instead of DEFER@.
A related problem is that proposals contain reference implementations of several words that are intended to work together (e.g., the reference implementation of DEFER@ is designed to work with the reference implementation of DEFER), but they are split into individual words in Annex E. This should be explained in the introduction.
In the long run it is probably better to avoid this splitting, because it makes the reference implementations harder to understand; on forth-standard.org we should then probably show the full section (all the reference implementations of words that are designed together). The splitting has led to requests for clarification here (e.g., "Does the standard assume that DEFER was created with CREATE?"), so it obviously is not as clear as we would like.
Change the wording of E.1 accordingly. This is just a wording change.
In E.1, replace
System implementors are free to implement any operation in a manner that suits their system, but it must exhibit the same behavior as the reference implementation given here.
with
These reference implementations are not normative. System implementors are free to implement an operation in any manner that conforms with the specification.
Some of these reference implementations are designed to work together, and rely on implementation details of the reference implementations of related words. They are best understood together.
It would seem like a mistake to me to use 16 bit strings in Forth. Pretty much everything else seems to use UTF8, and it makes transition from existing Ascii MUCH easier, since the default case is to treat the string exactly as before.
process.stdout.setEncoding('utf8'); tells it that the output is UTF8, and process.stdout.write(s) then outputs that string. This has the advantage of working just the same whether the string is old-style Forth (1 byte per ASCII character) or UTF8.
If we used 16 bit strings, we'd need versions of all the string words - S", abort" etc for 16 bit rather than just fixing the output routines NOT to convert strings back to characters.
Obviously, for better performance type should not be implemented over emit. type should also check for uncompleted characters, since s\" \xC3" type s\" \xA4" type should be equivalent to $C3 emit $A4 emit. So emit can be implemented over type without noticeable performance loss.
-TRAILING-GARBAGE can be used to separate the completed part from the last uncompleted xchar.
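As a rough illustration of this use of -TRAILING-GARBAGE, the following sketch splits a byte string into the part that ends on a complete xchar and the leftover bytes that would have to be buffered until the next call. It assumes a system that provides the XCHAR word set with UTF-8 encoding; split-complete is a hypothetical name, not a standard word.

```forth
\ Assumes the XCHAR word set and UTF-8 encoding.
\ split-complete is a hypothetical helper, not a standard word.
: split-complete ( c-addr u -- c-addr u1 c-addr2 u2 )
   2dup -trailing-garbage   \ c-addr u c-addr u1   (u1 <= u)
   2swap                    \ c-addr u1 c-addr u
   2 pick /string ;         \ c-addr u1 c-addr+u1 u-u1
```

For a string such as s\" a\xC3", the specification of -TRAILING-GARBAGE suggests that the first result is the 1-byte string a and the second is the lone $C3 byte, which a buffering type could prepend to the next string it receives.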
Certainly, this buffering before output can also be implemented on the JS side. It should ensure that the last uncompleted xchar is not passed to decode(), but is buffered to be concatenated with the next part.
: leave1 save-data-stack POSTPONE leave is-data-stack-same-as-saved ; immediate
: test ?do leave1 loop ;
So if your control-flow stack is on the data stack, a standard program can see if LEAVE changes control-flow stack items, so such a system would not conform to the standard. I don't see how to do such a check in a standard program for a separate control-flow stack, so you may be able to use such an approach there.
Alternative approaches: A classic technique (probably used by many systems) is to use the space for the target address of each leave branch to store a link to the previous LEAVE. Gforth stores more than fits there, so it uses a separate LEAVE stack.
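The link-chaining technique can be modelled in standard Forth without touching a real compiler. In this sketch, which is only an illustration, data space stands in for code space and a plain cell stands in for the branch target field. The names leave-link, leave, and resolve-leaves are hypothetical. A real system would also have to save and restore leave-link (for example in its do-sys) to handle nested loops.

```forth
\ Model only: a cell in data space plays the role of a branch target field.
variable leave-link   0 leave-link !    \ head of the chain, 0 = empty

\ Reserve one "target field" and store the link to the previous one in it.
: leave, ( -- )
   here  leave-link @ ,  leave-link ! ;

\ Walk the chain and overwrite every link with the real target.
: resolve-leaves ( target -- )
   leave-link @
   begin dup while          \ target link
      dup @ >r              \ remember the previous link
      over swap !           \ patch this field with the target
      r>                    \ target previous-link
   repeat
   2drop  0 leave-link ! ;

\ Usage: three pending LEAVEs, all resolved to the current address.
leave, leave, leave,
here resolve-leaves
```

The chain costs no extra memory because each unresolved field is unused until the loop end is compiled, which is presumably why the technique is popular.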
Yes, a program can see if LEAVE changes the control-flow stack items if they are on the data stack.
But I'm not sure that a standard program is allowed to rely on this change, due to 3.1.5.1 System-compilation types:
These data types denote zero or more items on the control-flow stack (see 3.2.3.2). The possible presence of such items on the data stack means that any items already there shall be unavailable to a program until the control-flow-stack items are consumed.
save-data-stack is allowed to access neither the data objects of system data types nor the items that were on the data stack before the system data types were placed on the control-flow stack.
It's obvious that when the compilation semantics for LEAVE are performed, the control-flow stack should contain at least one do-sys: ( C: do-sys i*x ), since LEAVE is connected with the innermost syntactically enclosing DO ... LOOP or DO ... +LOOP. And so it's connected with the topmost do-sys among several do-sys in the control-flow stack.
Then we can claim that in a standard system LEAVE compilation should have stack effect either ( C: do-sys i*x -- do-sys i*x ) or ( C: do-sys1 i*x -- do-sys2 i*x ).
By the current wording for LEAVE, it looks like if a system uses the second variant, then it should either use the separate control-flow stack, or the same size of do-sys1 and do-sys2. These limitations look irrelevant and superfluous, and actually they don't exist.
Another argument is that if the standard allows ( C: do-sys1 i*x -- do-sys2 i*x ) when the separate control-flow stack is used, then it should allow the same when the control-flow stack is united with the data stack.
This sentence in 3.1.5.1 is somewhat self-contradictory. The "means that" indicates that it describes a consequence of an earlier normative statement, while the "shall" indicates that the sentence is intended to be normative itself. However, I don't see a reason why the Forth-94 committee should have made such a normative restriction, so I lean towards the interpretation that it is intended as a description of a consequence of the fact that the size of system-compilation types is not known to standard programs. Of course, you can lean towards the normative interpretation.
But even with the normative interpretation, I don't think you can keep the information about all LEAVEs on the data stack. Consider:
: test ?do [ depth ] leave leave leave leave [ depth - . ] loop ;
Compiling this has to print -1.
I support the idea that EMIT and KEY use pchars. Note that pchars are already defined in the standard.
Many Forth systems support redirectable I/O. It is almost impossible to guarantee that all comms channels handle xchars. In particular, both TCP/IP and USB may have breaks in the middle of UTF-8 characters.
For use in a standard program is there any reason at all for LEAVE to modify a stack at compile time? And if you do so, do you change any entitlements? I think that you do, so my answer is that the behaviour is not permitted.
For use in a non-standard program, we do not need to care.
Well, it's an arguable question.
For use in a standard program is there any reason at all for LEAVE to modify a stack at compile time?
I'm talking not about a program, but about a standard system.
The reason for LEAVE to modify the control-flow stack (in a compatible manner) is to simplify the implementation. For example, Gforth uses a separate stack solely for LEAVE. If LEAVE is allowed to modify the control-flow stack, then some implementations can be simpler, since there is no need for a separate stack.
Well, I would suggest a proposal to have ( C: do-sys1 i*x -- do-sys2 i*x ) for LEAVE compilation. Do you think such a system can break some programs (except artificial examples like Anton's example above)?
Having this stack effect in the specification, a system is allowed to throw an exception if the control-flow stack doesn't contain do-sys during compilation of LEAVE. At the moment, we don't have such an ambiguous condition explicitly, but having this stack effect a system can rely on the clause "An ambiguous condition exists if an incorrectly typed data object is encountered" (from 3.1 Data types).
I don't expect that there are production programs that would break on a system like you envision, but I could be wrong.
OTOH, the benefit to systems of such a change would be close to zero. All existing systems manage to implement LEAVE without this restriction. And if this restriction was standardized, we would not make use of it in Gforth, because it's simpler to implement a separate stack than to keep track of the latest do-sys in the data stack, make room for additional information by moving the closer-to-top stack items, and storing the leave information there.
LEAVE outside of a DO...LOOP is not a common problem, although I remember one user who intentionally did it because he thought that LEAVE would leave the dynamically enclosing DO...LOOP. One way to deal with that would be to let an unresolved LEAVE branch to an appropriate error throw (maybe "LEAVE unresolved (?DO ... LOOP missing)").
it's simpler to implement a separate stack than to keep track of the latest do-sys in the data stack, make room for additional information by moving the closer-to-top stack items, and storing the leave information there.
I'm not sure that a separate stack is simpler. The actual code is even shorter than your description:
: leave postpone 2r> cs-cnt n>r postpone ahead nr> drop ac+ ; immediate
And it's far less than the separate LEAVE-stack in cond.fs of Gforth.
A default branch for LEAVE is an interesting solution. But it's a run-time error (and it leaks an item of the LEAVE-stack), whereas a system could also raise a compile-time error.
This reference implementation is designed to work with the reference implementation of DEFER, which uses CREATE. Actually the reference implementations of these two words and a few more were originally proposed as a unit.
Implementations (including reference implementations) usually have additional properties beyond the specified and intended behaviour, and this is an example.
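To make the dependency concrete, here are the reference implementations of DEFER, DEFER@, and DEFER! read together as a unit, reproduced here as a sketch of why >BODY works in this particular case. The word greet and the test at the end are illustrative additions, not part of the standard text.

```forth
\ Reference-style definitions; they only work as a set.
: DEFER ( "name" -- )
   CREATE ['] ABORT ,            \ body holds the xt to execute
   DOES> ( ... -- ... ) @ EXECUTE ;

: DEFER@ ( xt1 -- xt2 ) >BODY @ ;   \ valid only because DEFER used CREATE
: DEFER! ( xt2 xt1 -- ) >BODY ! ;

\ Illustrative use with the definitions above.
DEFER greet
: hello ( -- ) ." hello" ;
' hello ' greet DEFER!
' greet DEFER@ ' hello = .          \ prints -1 (true)
```

On a system whose native DEFER is not built with CREATE (for example an STC system that compiles a jump), only the native DEFER@ and DEFER! are guaranteed to work; a portable program should therefore call those words rather than spell out >BODY @ itself.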
Does the standard assume that DEFER was created with CREATE?
A short answer: the standard does not require a system to implement DEFER using CREATE. A consequence is that a standard program cannot assume that >BODY can be applied to the xt of a word that is created with DEFER, or that DOES> can change its behavior.
Agree. Just for reference: another proposal also removes the part concerning "it must exhibit the same behavior".