Digest #256 2024-02-19
Contributions
requestClarification - The Standard page does not show Chapters 1-5
Hello,
I am using the latest Chrome on Windows. The Standard page lists "List of chapter 1-5," but does not show links to chapters 1-5. It may help beginners to list the chapters with a short explanation for each grouping, e.g. chapters 1-5 contain the base standard, chapters 7-18 contain the optional word sets that can be implemented by a Forth, etc.
Kind Regards, Josef
Replies
If recognizers are ever standardized, they provide a way for user-defining the recognizing part of the text interpreter. However, at least with the current proposals, the parsing is done outside the recognizers (i.e., by the system), and this is good design. WRT "clarify find", given the lack of consensus we have seen in that proposal, my guess is that even with recognizers there will be no consensus that users should be able to use FIND for the general dictionary-search recognizer.
If a user applies a user-defined text interpreter to some string, and that text interpreter uses WORD, they should be aware that this text interpreter clobbers the WORD buffer; whoever writes this text interpreter should document this property, but that is not something that the standard needs to say anything about.
If there is ever a standard way to plug the parsing part of a user-defined text interpreter into the system, that plugging is again under the program's control. So the program's author should be aware of whether that text interpreter clobbers the WORD buffer, and write the rest of the program to cope with that. So even if we had such a standard feature, it would not be directly relevant to the question at hand. And given that we don't, it's certainly not relevant.
In iForth64 the following happens:
FORTH> : x]] postpone ]] ; immediate ok
FORTH> : to, [ 123 . x]] to [[ 456 . ] ; 123 456 ok
[2]FORTH> .s Data: 554736352 19651504 --- System: --- Float: --- ok
-marcel
requestClarification - An ambiguous condition in LSHIFT
In fact there is some variation among architectures. x86 use the least significant n bits
requestClarification - An ambiguous condition in LSHIFT
(Oops sorry clicked too early)
x86 uses the least significant n bits, except on the original 8086 and 8088 when the shift count is in the CL register, in which case all 8 bits are used (this was how the 80186 was distinguished from the 8086). Using just n bits is common enough among modern architectures, but 32-bit ARM (e.g. in ARM Cortex M0+ or M4 microcontrollers) is an exception, since it uses the low eight bits of the operand (but still ignores the higher bits). Enforcing a result of zero when the count is higher than the register width is certainly doable but requires some extra instructions, so it is not done in general.
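To illustrate the "extra instructions" mentioned above, here is a minimal sketch of a shift word that always yields zero once the count reaches the cell width. The names COUNT-CELL-BITS, BITS/CELL and SAFE-LSHIFT are hypothetical and not part of any standard; the cell width is measured by shifting a single bit one position at a time, which stays clear of the ambiguous condition.

```forth
\ Hypothetical helper: count how many 1-bit shifts push a set bit
\ out of the cell. Each individual shift is by 1, so it is well defined.
: COUNT-CELL-BITS ( -- n )
  0 1 BEGIN DUP WHILE 1 LSHIFT SWAP 1+ SWAP REPEAT DROP ;

COUNT-CELL-BITS CONSTANT BITS/CELL

\ Hypothetical wrapper: shift counts of BITS/CELL or more give 0
\ instead of relying on whatever the underlying CPU does.
: SAFE-LSHIFT ( x1 u -- x2 )
  DUP BITS/CELL U< IF LSHIFT ELSE 2DROP 0 THEN ;
```

The extra comparison and branch are the cost a system would pay on every shift if the standard required this behaviour.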
Hello, I noticed this proposal has been accepted for well over 3 years now, and yet the relevant section has not been added. Is this a mistake?
The document has not been updated since 2019. I became the editor some time ago, but I have not gotten around to it.
Moreover, if you mean this website, it reflects the contents of the Forth-2012 document. When the next release (rather than draft) of the standard document happens, this will be updated. There are also ideas of having a way to let it reflect the latest draft, but for now the process from the document to this website requires too much work to make this realistic.
proposal - S( "Request for Discussion" (revised 2018-08-16)
S( "Request for Discussion"
Change History
2018-08-16 Improve with additional explanation, rewording
2018-07-09 First version
Problem
There have been extensive discussions about correct implementations (cf. [Ertl98], [Pelc17], [clf18]) of Forth words that have both
- an interpretation semantics and
- a compilation semantics that is different from adding the word's execution semantics to the current definition (sometimes called non-default compilation semantics, NDCS).
The Forth-94 word S" is an example of this, as it has a defined compilation semantics (and an explicitly undefined interpretation semantics) in the CORE word set and a defined interpretation semantics in the FILE word set (cf. [Forth-94]).
Words like this can have the so-called copy&paste property: Program phrases that have been tested in interpretation mode can be copied unchanged into definitions and continue to work there in a seemingly identical way. This is attractive to quite a few developers.
The desire to have words with diverging compilation semantics and interpretation semantics led many system implementors to state-smart immediate words (which inspect the variable STATE to distinguish between compilation and interpretation), but these have the drawback of failing unexpectedly in corner cases [Ertl98].
Identifying whether or not a word is state-smart typically requires studying its implementation or documentation. Problems arise when these state-smart words are used as building blocks for more sophisticated words, e.g. by means of POSTPONE-ing them. The distinction between interpret and compile state is then also postponed and happens at inappropriate times.
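The corner case described above can be sketched as follows. S"-SMART and COMPILE-STRING are hypothetical names used only for illustration, and the sketch assumes SLITERAL from the STRING word set is available. In interpretation state the string stays in the input buffer, so it is only valid for the rest of the line.

```forth
\ Hypothetical state-smart string word: it inspects STATE when it runs.
: S"-SMART ( "ccc<quote>" -- c-addr u | )
  [CHAR] " PARSE
  STATE @ IF POSTPONE SLITERAL THEN ; IMMEDIATE

\ Direct use behaves as expected in both states.
S"-SMART hello" TYPE
: GREET ( -- ) S"-SMART hello" TYPE ;
GREET

\ POSTPONE-ing the state-smart word defers the STATE check
\ until COMPILE-STRING is executed.
: COMPILE-STRING ( "ccc<quote>" -- ) POSTPONE S"-SMART ;

\ If COMPILE-STRING runs while STATE is zero, as between [ and ] in
\ the commented-out definition below, the string is left on the data
\ stack at compile time instead of being compiled into the definition.
\ : BROKEN ( -- ) [ COMPILE-STRING hello" ] TYPE ;
```

The last definition is left commented out because what happens next depends on the system; the point is only that the decision between interpreting and compiling is taken at the wrong moment.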
There are ways to structure the outer interpreter so that words with non-default compilation semantics can be implemented without failures in corner cases. Systems such as the gforth development version and VFX version 5 deal with this issue (cf. [Pelc17], [clf18]).
Looking at the Forth-94 standard, only a few words actually have non-default compilation semantics, namely the already mentioned S" and TO. Forth-2012 adds others.
In general, Forth-94 and Forth-2012 define execution semantics for normal words. When such a word is interpreted, its execution semantics is performed; when it is compiled, its execution semantics is appended to the current definition (the default compilation semantics). For most compiling words the standards explicitly leave the interpretation semantics undefined and define only a compilation semantics. This compilation semantics might include appending an additionally defined run-time semantics to the current definition. Colloquially these words are called compile-only. The standards also explicitly define immediate words: words that have identical interpretation and compilation semantics, both of which perform the word's execution semantics. Finally, there are special words with non-default compilation semantics (NDCS), where interpretation semantics and compilation semantics diverge.
The following table summarizes this situation:
word kind | interpretation semantics | compilation semantics | example | comment |
---|---|---|---|---|
normal | perform execution semantics | compile execution semantics | DUP | normal :-definitions |
immediate | perform execution semantics | perform execution semantics | .( | IMMEDIATE definitions |
compile-only | undefined | perform execution semantics | IF | interpretation semantics undefined |
NDCS | perform execution semantics for interpretation | perform execution semantics for compilation | S" | divergent interpretation, compilation |
S" and TO seem to be very special: For other words such as ' (tick), CHAR, or .( (dot-paren) the standards define similar compilation words, namely ['] (bracket-tick), [CHAR] (bracket-char), or ." (dot-quote), for use in compiling mode inside definitions.
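The pairing of interpretive and compiling words can be seen in a short sketch. LETTER-A, XT-OF-DUP and HELLO are hypothetical names chosen for the example, and DECIMAL is assumed to be in effect.

```forth
\ Interpretive forms, used outside definitions.
CHAR A .              \ displays the character code of A
' DUP DROP            \ ' leaves the xt of DUP; dropped again here
.( hello)             \ displays the text while interpreting

\ Compiling counterparts, used inside definitions.
: LETTER-A  ( -- char ) [CHAR] A ;
: XT-OF-DUP ( -- xt )   ['] DUP ;
: HELLO     ( -- )      ." hello" ;
```

Each pair needs two names for one idea, which is the duplication that a single word with divergent semantics such as S" avoids.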
Some argue that complicating the outer interpreter or dictionary design is not beneficial, especially in small memory-restricted systems, and either live with the failures of state-smart words in corner cases or avoid implementing an interpretive S" to stay standard compliant.
Solution
For small systems it might be reasonable and simpler not to implement an S" with non-default compilation semantics as defined in the FILE word set, but to define two differently named words that capture the compilation semantics and the interpretation semantics of FILE S" respectively.
Possible naming choices could be
- S" (s-quote) for interpretation and [S"] (bracket-s-quote-bracket) for compilation, used in the form S" hello" and : xxx ... [S"] hello" ... ;
or
- S" (s-quote) for interpretation and [S" (bracket-s-quote) for compilation, used in the form S" hello" and : xxx ... [S" hello"] ... ;
neither of which is really appealing.
Instead, similar to ." for string output in definitions and .( for string output outside definitions, we propose to keep CORE S" with its behaviour and to standardize S( with the interpretation semantics of FILE S" but using ) (right parenthesis) as delimiter.
Proposal
Add the word S( to the CORE Extension Word Set (CORE EXT):
S( "s-paren" CORE EXT
Interpretation: Perform the execution semantics given below.
Compilation: Perform the execution semantics given below.
Execution:
( "ccc<paren>" -- c-addr u )
Parse ccc delimited by ) (right parenthesis). Store the resulting string in a transient buffer c-addr u.
S( is an immediate word.
See: Parsing, S"
Typical Use
Typical use of S( would be interactive, when temporary strings are required, for example for use with INCLUDED:
S( s-paren.fs) INCLUDED
Remarks
As S( is proposed to be standardized in the CORE extension word set, no standard system is required to provide S(. However, if a system chooses to implement the S" compilation and interpretation semantics with two separately named words, it could choose the standard name S" and the (not yet standardized) name S( for this.
Defining the interpretation semantics explicitly in the glossary entry above is not strictly necessary as both Forth-94 and Forth-2012 state:
Unless otherwise specified in an “Interpretation:” section of the glossary entry, the interpretation semantics of a Forth definition are its execution semantics.
Reference implementation
With only a single string buffer, a minimal S( implementation could look like this:
CREATE buf DECIMAL 80 CHARS ALLOT
: S( ( "ccc<paren>" -- c-addr len )
  [CHAR] ) PARSE 80 MIN >R buf R@ MOVE buf R> ; IMMEDIATE \ immediate, as the glossary entry specifies
A more elaborate implementation using multiple string buffers in a circular fashion is:
DECIMAL 80 CHARS CONSTANT |buf|
4 CONSTANT #bufs
CREATE bufs |buf| #bufs * ALLOT
VARIABLE buf# 0 buf# !
: buf ( -- c-addr )
  bufs buf# @ |buf| * + ;
: bump ( -- )
  buf# @ 1+ #bufs MOD buf# ! ;
: str ( char "ccc<char>" -- c-addr u )
  bump buf SWAP PARSE |buf| MIN >R OVER R@ MOVE R> ;
: S( ( "ccc<paren>" -- c-addr len )
  [CHAR] ) str ; IMMEDIATE \ immediate, as the glossary entry specifies
Testing
The following tests ensure that S( pushes the desired c-addr u:
CREATE s 3 c, CHAR a c, CHAR b c, CHAR c c,
t{ 99 S( abc) SWAP DROP -> 99 3 }t
t{ 99 S( abc) s COUNT COMPARE -> 99 0 }t
Experience
S( is not yet defined in any of the contemporary systems such as
- gForth
- VFX
- PFE
- DXForth
- FLT
- SwiftForth
- Win32Forth
- noForth
- amForth
- camelForth
- ciForth
- mecrisp
so it has no common use. However, the name S( seems to be available in all these systems.
Discussion
The proposal avoids issues with the NDCS word S" by providing S( as an alternative notation for an interpretive S".
It is intended for resource-restricted standard systems that want to support interpretive strings but are not able to provide the FILE word set S".
S( is very simple to implement, so this proposal is about standardizing the name S( with the intended functionality rather than about a sophisticated feature.
One can argue for removing S" from the FILE word set; however, this is not proposed here. Forth systems that provide the FILE word set are hopefully capable of providing a complete and correct S" implementation.
References
[Forth-94]: "American National Standard for Information Systems — Programming Languages — Forth", ANSI X3.215-1994
[clf18]: discussion about special words in comp.lang.forth, https://groups.google.com/forum/#!topic/comp.lang.forth/Gb9Hvj3Wm_Y%5B1-25%5D
[Ertl98]: "State-smartness - Why it is Evil and How to Exorcise it", Anton Ertl, euroForth 1998
[Pelc17]: "Special Words in Forth", Stephen Pelc, euroForth 2017
Author
Ulrich Hoffmann uho@xlerb.de
Thanks @AlexDyachenko,
your proposal has been discussed at the Standards Meeting in Rome in September 2023.
CATCH and THROW only deal with stack handling; no other part of the system state is saved or restored. All other parts, such as STATE, the values of local variables, file-open states, BASE, etc., are considered to be application specific and thus should be handled by the application program.
In case you want to preserve STATE, a construct like
: STATE-CATCH ( i * x xt -- j * x 0 | i * x n )
  STATE @ >R CATCH R> IF ] ELSE POSTPONE [ THEN ;
would preserve STATE over the execution of XT.
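As a rough illustration of the construct above, consider a word that switches to compilation state and then throws. FLIP-AND-FAIL is a hypothetical name; STATE-CATCH is repeated so that the sketch is self-contained.

```forth
: STATE-CATCH ( i * x xt -- j * x 0 | i * x n )
  STATE @ >R CATCH R> IF ] ELSE POSTPONE [ THEN ;

\ Hypothetical word: enters compilation state, then throws code 1.
: FLIP-AND-FAIL ( -- ) ] 1 THROW ;

\ Typed while interpreting: STATE-CATCH returns the throw code and
\ puts the system back into interpretation state, so STATE @ on the
\ same line is interpreted and yields 0.
' FLIP-AND-FAIL STATE-CATCH . STATE @ .
```

With a plain CATCH in place of STATE-CATCH, the system would still be in compilation state after the throw, and the rest of the line would be compiled rather than interpreted.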
Anton Ertl discussed the committee's opinion thoroughly in the proposal's discussion.
This proposal will be put into state considered. It could be revived if further discussion should be required.
Regards, Ulrich Hoffmann
requestClarification - "... the remainder of the current line."?
Hello @JimPeterson,
your contribution has been discussed in the interim standard's meeting in February 2023.
Source code stored in blocks does not contain any newline or carriage-return characters relevant for the line structure. The line structure is completely imposed by the fixed character count in each line, typically 64 characters.
"... the remainder of the current line." actually means. Does it mean to imply that \ will skip >IN to the next multiple of 64, or would CR characters terminate a line?
When compiling from blocks (BLK is non-zero), \ is supposed to skip to the next multiple of the fixed line length, typically 64 characters. No CRs or other control characters are to be considered.
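The behaviour described above might be sketched like this. C/L, NEXT-LINE and SKIP-LINE are hypothetical names, and the line length of 64 is the typical value rather than a requirement; the sketch does not try to handle a backslash sitting in the very last column of a line.

```forth
64 CONSTANT C/L   \ assumed fixed line length of a block

\ Hypothetical helper: offset of the start of the line following
\ the one that contains offset u.
: NEXT-LINE ( u1 -- u2 ) C/L / 1+ C/L * ;

\ Hypothetical stand-in for the comment word: in a block, advance >IN
\ to the next line boundary; otherwise discard the rest of the parse area.
: SKIP-LINE ( -- )
  BLK @ IF  >IN @ NEXT-LINE >IN !
  ELSE  SOURCE NIP >IN !
  THEN ; IMMEDIATE
```

Only the arithmetic on >IN decides where the comment ends, which is why control characters inside the block play no role.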
This request for clarification will be put to state closed. It could be revived at any time if further discussion should be required.