Digest #144 2021-04-20
Contributions
I didn't implement my DO/LOOP to store limits/counters on the return stack (limited system, no room there, the standard does not require it). If I were to THROW up through a DO/LOOP (particularly a nested one), using the above specification for THROW as my bare-bones attempt at implementing it, I know that "an ambiguous condition" would then most certainly exist.
This is my own fault, but shouldn't there at least be a note in the standard along the lines of "The system should implement an appropriate number of UNLOOP operations when throwing an exception up through DO/LOOP constructs."? For most systems, this is implied by adjusting the return stack, and systems implementors would just think "easy", but at least it would be explicitly stated.
Certainly, any similar user-made constructs have identical issues that are entirely the user's responsibility. Maybe this is an argument for making CATCH/THROW compatible with DEFER!? How else would user code that acquires global resources properly interoperate in the presence of THROW?
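For comparison, the usual idiom that user code can already apply is to wrap the protected call in its own CATCH, release the resource, and re-THROW. The sketch below assumes a hypothetical flag resource-held standing in for a real global resource; none of these names come from the post or the standard.

```forth
\ Hypothetical names; resource-held stands in for a real global resource.
VARIABLE resource-held
: acquire ( -- ) TRUE  resource-held ! ;
: release ( -- ) FALSE resource-held ! ;

\ Run xt with the resource held; release it on normal return and on THROW.
\ After a normal return CATCH leaves 0, and 0 THROW does nothing.
: with-resource ( i*x xt -- j*x )
   acquire  CATCH  release  THROW ;

: boom ( -- ) 42 THROW ;
: guarded-boom ( -- ) ['] boom with-resource ;
```

This only covers code that opts in by calling with-resource; it does not help with constructs that are unaware of THROW, which is the gap the question points at.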
Replies
The most complete discussion of these topics is given by the documents written by Stephen Pelc (MPE) and Elizabeth Rather (Forth Inc) for the OTA project in the late 1990s. The documents are a result of a drive to source compatibility between the MPE and FI cross compilers. Most of the action took place on white boards in Waterloo and in pubs in Brussels. The documents are available at:
https://www.mpeforth.com/resource-links/downloads/
Go to the Cross Compilation section and download the documents XCapp5.PDF and XCtext5.PDF.
The basis is that there are three memory types: CDATA, IDATA, UDATA (code, initialised RAM, uninitialised RAM) and that at least one SECTION of each type exists. The behaviour of CREATE and the memory access words may change according to the current section type.
The current MPE and FI cross compilers broadly follow this design. I have been reluctant to bring this to full standards level because
- There are cross compiler designs for which the notation may not be appropriate
- There are not that many serious cross compilers in use with a body of users beyond MPE and FI
- Leon and I can usually resolve our differences over coffee and wine.
If people are interested, I can arrange a virtual meeting. Note that Forth-200x meetings are public, and the use of real names is strongly encouraged.
If people are interested, I can arrange a virtual meeting for recognisers. They have been workshopped at various Forth Standards meetings but little of substance has emerged so far. I would suggest that such a meeting concentrate on finding what we can agree on.
Note that Forth-200x meetings are public, and the use of real names is strongly encouraged.
Thanks for raising this issue. I ran into the same problem with webForth, where the intent is that a dictionary can be created, then ROMmed, then, potentially after a reboot, extended in RAM. The standard does appear ambiguous to me: it seems to implicitly assume that the dictionary is in RAM, but with RAM being so limited on many processors, the last thing I want to do is copy my dictionary from flash to RAM.
Here is how I solved it, and I'm not claiming it's the best way. Note that my implementation started with eForth before being made Forth2012 compliant (it passes the test suite) and ROMmable.
- vCREATE uses vALLOT, which explicitly allocates in RAM.
- CREATE uses ",", which allocates wherever the dictionary is being built (typically an area that will be flashed at first, and RAM once booted from flash).
- I keep a separate VP, which is like CP but is set during the pre-flash build to point at a separate (uninitialized) RAM area.
The code is in https://github.com/mitra42/webForth/blob/master/index.js if it's useful to look at.
Words like VARIABLE and BUFFER: use vCREATE, or CREATE followed by vALLOT, so their header is in ROM but their data space is in RAM. This means VARIABLEs aren't initialized, but eForth also has USER variables, which are initialized from flash to RAM.
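To illustrate the split between header space and RAM data space, here is a small model that runs on an ordinary hosted Forth. It is not webForth's actual code: ram-area, vp, vhere, vallot and vvariable are hypothetical names, and a CREATEd buffer stands in for the uninitialised RAM region.

```forth
\ Hypothetical model; not webForth's actual code.
CREATE ram-area 64 CELLS ALLOT    \ stands in for the uninitialised RAM region
VARIABLE vp   ram-area vp !       \ RAM-space pointer, analogous to the post's VP

: vhere  ( -- addr ) vp @ ;
: vallot ( n -- )    vp +! ;

\ The header and a one-cell body go wherever the dictionary is being built;
\ the body only records the address of a cell reserved in the RAM region.
: vvariable ( "name" -- )
   CREATE  vhere ,  1 CELLS vallot
   DOES> ( -- addr ) @ ;

vvariable a   vvariable b
```

The extra indirection through the body is what would let the header live in ROM while a and b return RAM addresses.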
>BODY has to be clever enough to know where to look.
There are words useRam, which move pointers around but are intended to be used just once, when running in Ram, building a rommable image, before switching to Ram for further work.
I'm in the process of adding useEeprom to define words that write to the non-volatile EEPROM area, along with eePromSave which writes the eeprom (initially just on ESP8266).
comment - Note incompatability (double v single) with some older Forth's.
Leon, Anton - does eForth predate those standards? There seem to be a lot of systems based on it, probably because it's one of the few (if not the only one) that is both simple and relatively complete. I've happily converted webForth, which was based on eForth but is now Forth2012 compliant, but this tripped me up, as it's one of only a couple of places where the same word is used in eForth and Forth2012 with different stack effects. (I was asked to write the opening contribution as a constructive place to document that difference here, so that it doesn't trip other people up.)
comment - Note incompatability (double v single) with some older Forth's.
As far as I remember, eForth is later than most Forth standards, all of which use double-cell numbers for <# # #> and friends. Even on 64-bit systems, there is an argument for <# # #> and friends to use double-cell numbers: it can make floating-point conversion more accurate, because the mantissa can be converted as a 128-bit integer.
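For anyone porting single-cell eForth code, a minimal sketch of the standard usage: the single-cell value is widened to a double before entering <# ... #>. The word names u>str and n>str are made up for this example.

```forth
\ Hypothetical helper names; the pictured-output words are standard.
: u>str ( u -- c-addr len )     \ unsigned single to string
   0 <# #S #> ;                 \ 0 supplies the high cell of the double

: n>str ( n -- c-addr len )     \ signed single to string
   DUP ABS 0 <# #S ROT SIGN #> ;
```

The returned string lives in the transient pictured-output buffer, so it should be used or copied before the next numeric conversion.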
The design decisions of eForth and JonesForth should not influence the standard, nor should these be treated as model implementations of anything other than themselves.
ROMmed systems are out-of-scope for the existing standards. There have been discussions about standardizing cross-compilation for a long time, and I presume that includes ROMmable systems, but nothing even close to consensus has been reached (my impression as an outsider is that this is not because of fundamental differences, but because the motivation is insufficient; it seems to suffice only for "I do it this way; let's standardize that").
So Forth-2012 is based on the idea that data space is mutable. There is a difference between uninitialized (BUFFER:, VARIABLE) and initialized (ALLOT, ",") data, but that's all there is in Forth-2012.
The way all the dictionary words work assumes that the dictionary is mutable.
In a classical cross-compiled system (not in the standard) there is first a cross-compilation phase where the dictionary is mutable, and then a run-time phase where parts of it are read-only.
In a compile-to-flash system (with byte granularity for writes) there are ways to write-once to the dictionary, or to tell the system that something should be in RAM.
There have also been discussions about declaring some memory regions read-only in order to allow optimizing read accesses to this memory.
There have also been discussions about what is preserved in an image (using SAVESYSTEM on some systems, or other non-standard mechanisms on others); the classical answer is that the dictionary is preserved and ALLOCATEd memory is not. I don't know if all systems preserve BUFFER: and VARIABLE contents, however.
I think that all these issues should be considered when proposing something for one of these issues (so ideally we can address the needs of several or all of them with one proposal), but I am not going to write that proposal.
comment - Note incompatability (double v single) with some older Forth's.
According to eforth.src, eForth is from 1990, based on bForth from 1990, a decade after Forth-79 standardized # as working on doubles (and all standards since Forth-79 have kept this #, with the only variation being that Forth-83 let # work on +d, while Forth-79, Forth-94, and Forth-2012 let it work on ud).
Some examples
The examples to illustrate this version of the specification
| Implementation approach: | single-xt | dual-nt (and then dual-xt) | dual-xt (but single-nt) |
|---|---|---|---|
| Comment: | Some words are immediate | The second nt is optional, and it's possible that it's associated with the same xt as the first | The second xt can be optional, and it can be the same as the first |
| Result of FIND in the different state | | | |
| For «DUP» | | | |
| DUP interpretation | ( xt-exe -1 ) | ( xt-exe -1 ) | ( xt-exe -1 ) |
| DUP compilation | ( xt-exe -1 ) | ( xt-exe -1 \| xt-comp 1 ) | ( xt-exe -1 \| xt-comp 1 ) |
| For «[IF]» | | | |
| [IF] interpretation | ( xt-exe 1 ) | ( xt-exe 1 ) | ( xt-exe 1 ) |
| [IF] compilation | ( xt-exe 1 ) | ( xt-exe 1 \| xt-comp 1 ) | ( xt-exe 1 ) |
| For «S"» | | | |
| S" interpretation | ( xt-exe 1 ) | ( xt-int -1 ) | ( xt-int -1 ) |
| S" compilation | ( xt-exe 1 ) | ( xt-comp 1 ) | ( xt-comp 1 ) |
| For «IF» | | | |
| IF interpretation | ( xt-exe 1 ) | ( xt-comp 1 \| xt-int -1 \| c-addr 0 ) | ( xt-comp 1 \| xt-int -1 \| c-addr 0 ) |
| IF compilation | ( xt-exe 1 ) | ( xt-comp 1 ) | ( xt-comp 1 ) |
(Concerning the meaning of xt-int and xt-comp, see also my other comment.)
Can such results be easily implemented, or should the n value in interpretation state be looser?
In classical Forths and in VFX you can tick any word that has a name - provided that you have the search order correct. This is valuable for tools such as LOCATE and XREF. It is not an error to tick a word. There may be an error if the returned result is used incorrectly. In a dangerous language such as Forth, this is the programmer's problem.
With this usage, there are no ambiguous conditions when ticking a word. There may be ambiguous conditions when using the returned result. There are no ambiguous conditions in having a unique result.
I think that we just have to live with this - after all it's Forth.
There are no ambiguous conditions when ticking a word.
That's wrong. According to Forth-2012, ticking TO is ambiguous, as is ticking many other words.
And the general rule is that an ambiguous condition exists when ticking a word that doesn't have default interpretation semantics. The only exception to this rule is the word S". I suggest eliminating this strange exception and introducing a general rule (I will prepare the next version of the proposal with better wording).
The result of ticking S" (as well as the result of ticking TO) cannot be used correctly in compilation state in any standard system. If that's enough to make ticking TO ambiguous, why isn't it enough to make ticking S" ambiguous?
The str notation was rejected in 1999 because of the potential confusion between caddr/len strings and counted strings.
Caddr/u for strings implies that the length is only bounded by the cell size, so the caddr/len notation was introduced to indicate that string lengths may be bounded by the Forth system. As yet, caddr/len has not been formally adopted.
If people are interested, I can arrange a virtual meeting for recognisers. ... concentrate on finding what we can agree on.
I like this idea.
If people are interested, I will prepare a proof of concept before the meeting: an implementation of Recognizer API v4, Nestable Recognizer Sequences, or some other proposal on top of this API.
Perhaps somebody could share their list of questions before the meeting. My list is on GitHub.
I don't mean to extend this discussion further than it needs to go, but I'm confused by the original issue. Maybe I misunderstand what is meant by "dictionary" (defined in 2.1 as "An extensible structure that contains definitions and associated data space."), but I don't see any details in the definitions of ALLOT, CREATE, HERE, or , that say anything about where compiled code or word lists ("dictionary"?) are stored relative to the data space. Therefore, I don't understand why the original complaint is "ALLOT assumes that the data space for mutable data and for compiled code is one and the same". My own system keeps the word list in a completely different section of memory than what is pointed to by HERE, and I could have also easily put the executable code at yet another location (though I chose not to do so) without seeing any deviation from the standard or violation of the tests.
Certainly, if you do HERE 1 ALLOT HERE, as shown in the test code, your stack would be left with ( x x+1 )? Subsequently setting constants 2NDA to x+1 and 1STA to x would allow for the given tests to succeed regardless of where compiled code goes?
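Spelled out, the sequence under discussion looks roughly like the following, written from memory in the spirit of the core test suite rather than copied from it. Nothing in it depends on where compiled code or headers are placed; it only observes HERE before and after ALLOT.

```forth
HERE 1 ALLOT HERE      ( x x+1 )
CONSTANT 2nda          \ data-space pointer after the ALLOT
CONSTANT 1sta          \ data-space pointer before the ALLOT

1sta 2nda U<  .        \ true flag if HERE grew
1sta 1+ 2nda = .       \ true flag if it grew by exactly one address unit
```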
While it is convenient to be able to make/compile the basic, ROM-ed portion of a system using ALLOT, etc., isn't this rather easily done just by adjusting what HERE points to before creating the ROM-able portion of the system and then switching it back before flashing the ROM (or having the boot process set it), when you're ready to make stuff in RAM? Either way, I see no assumption in the standard that claims compiled code must go anywhere near HERE.
What's more, to say that VARIABLE carray 4 CELLS ALLOT would "define a mutable array with 10 chars" seems to assume that the data space reserved for the cell that stores carray is somehow guaranteed to be adjacent to HERE, and therefore also adjacent to the cells reserved with the subsequent ALLOT. The definition of VARIABLE doesn't explicitly say that the one cell of data space reserved for storage actually came from the data-space pointer. Maybe that's the default/assumed behavior when the standard says that something "reserves one cell of data space", but that is quite ambiguous if it is the intent. My system will actually interject a small amount of code between the storage used for carray and the four cells allocated with ALLOT, such that it appears as though the intent of that line would fail. Is my system non-standard in that sense? I don't understand what portion of the standard I have violated in doing so.
It feels as though the standard may need to be far more explicit about what operations are guaranteed to not change the data-space pointer and what operations are not guaranteed to not change the data-space pointer. If I do 10 ALLOT <other_stuff> 10 ALLOT, under what conditions on <other_stuff> am I guaranteed that the two calls to ALLOT will provide me with contiguous regions? Just about any call to define new words (:, :NONAME, CONSTANT, DEFER, MARKER, VARIABLE, VALUE, possibly even S") feels like it could inject something on systems that intermingle code and data, but it feels like calling CREATE might not inject something except on systems that also intermingle word list entries. The standard explicitly states that space pointed to by a CREATE-d word is right where HERE pointed just after executing CREATE, but it says nothing about VARIABLE, VALUE, etc.
Sorry... there's a whole can of worms.
If I do 10 ALLOT <other_stuff> 10 ALLOT, under what conditions on <other_stuff> am I guaranteed that the two calls to ALLOT will provide me with contiguous regions?
3.3.3.2 Contiguous regions says: “Since an implementation is free to allocate data space for use by code, the above operators need not produce contiguous regions of data space if definitions are added to or removed from the dictionary between allocations”.
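A small sketch of what that wording means in practice, using made-up names (gap-plain, gap-with-def, filler). With nothing defined between the two allocations the regions are contiguous; once a definition is added in between, a portable program can no longer rely on the distance.

```forth
\ Hypothetical names, for illustration only.
HERE 10 ALLOT  HERE 10 ALLOT       ( a1 a2 ) no definitions in between
SWAP - CONSTANT gap-plain          \ contiguous, so the distance is 10

HERE 10 ALLOT
: filler ;                         \ a definition is added to the dictionary
HERE 10 ALLOT
SWAP - CONSTANT gap-with-def       \ 10 on some systems, larger on others
```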
Also, the following words are allowed to allot data space: WORDLIST, REPLACES, INCLUDED (and then several other standard words that perform the function of INCLUDED).
Apart from , C, XC, and ALLOT (that is, the words comma, C-comma, XC-comma and ALLOT), which are intended to allot data space, there are ALIGN, FALIGN, SFALIGN and DFALIGN, which also allot data space in order to align it.
FILE S" in interpretation state is not allowed to allot data space (i.e., to change the data-space pointer).
CREATE is a defining word: it adds a new definition to the dictionary, and so it is allowed to use data space for its internal purposes.
Concerning variables, 3.3.3.3 explicitly says: “The region allocated for a variable may be non-contiguous with regions subsequently allocated with , (comma) or ALLOT”.
Perhaps this section should be referenced in the glossary entry for VARIABLE.
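As a sketch of the portable alternative to VARIABLE carray 4 CELLS ALLOT: reserve the whole region after CREATE, whose data field starts at HERE, so the cells are contiguous by construction. The name carray[] is made up for this example.

```forth
CREATE carray  5 CELLS ALLOT          \ five contiguous cells starting at carray

: carray[] ( i -- addr )              \ hypothetical indexing helper
   CELLS carray + ;

11 0 carray[] !                       \ first cell
55 4 carray[] !                       \ last cell
```

Unlike with VARIABLE, the address returned by carray and the ALLOTted cells are guaranteed to form one region, as long as no definition is added between CREATE and ALLOT.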
Here's how I handle it in cforth. When you save the image that will go into ROM, everything that is currently in the dictionary becomes immutable. Then when you run that image, the dictionary pointer is set so it points to RAM and anything you incrementally compile thereafter is mutable. As a usage requirement, if you want to precompile something that will be mutable, do not use ALLOT or , (comma). BUFFER: usually does what you want.
It is not an error to tick a word. There may be an error if the returned result is used incorrectly.
I understand this point. But an ambiguous condition is a formal thing.
And concerning Tick there are two options:
- declare an ambiguous condition for executing the result of ticking in certain cases;
- declare an ambiguous condition for ticking in certain cases.
The standard chose the second option. And I agree that this choice is better than the other, since it's simpler, less dangerous, and leaves more options for implementations.