Digest #247 2023-12-27
Contributions
[…] words with the same name are called in the order newest-to-oldest
Does this mean the system needs to keep around word definitions even after they are shadowed by a re-definition (thus precluding a simple hash-table implementation of wordlists)...?
Replies
The specification of throw
specifies the case when there is no exception frame from the program on the exception stack under the label "there is no exception frame on the exception stack", but systems are free to implement throw
and catch
in any way that ensures that programs see the specified behaviour. In case of throw
an efficient implementation is to handle the "no exception frame" case with a system-supplied exception frame so that throw
does not have to distinguish these cases. The suggested reference implementation of throw
does not separate out the "no exception frame" case, and the suggested reference implementation of quit
provides this system-supplied exception frame. What is missing from the reference implementation of quit
is the emptying of the data stack that is specified in throw
when there is no exception frame. I'll do another iteration on the reference implementation to fix this.
3.1.5.1 explicitly mentions colon-sys, do-sys, case-sys, of-sys, orig and dest as system-compilation types, and these live on the control-flow stack (usually the data stack). There are also system-execution types (nest-sys and loop-sys) that live on the return stack.
State
just selects whether the text interpreter performs interpretation or compilation semantics, and it has no other meaning in the standard (and, e.g., in Gforth). Programs can switch the state with ]
and [
(more typically postpone [
) around calls to standard words, and if the program switches the state back before the text intrerpreter is reentered again, such switching should not make a difference except for the result of state @
and the few words (such as to
) where it is specified that performing their interpretation or compilation semantics in a certain state is ambiguous.
In particular, there are no separate rules for manipulating the data stack for different states. The data stack has traditionally been used as control-flow stack because programs usually do not use the data stack for anything else while defining a word, so the restrictions imposed by using the data stack as control-flow stack are not particularly burdensome. But they exist, because programs are allowed to use the data stack while compiling words. And certain uses are not allowed in a standard program, e.g.
5 : foo literal ;
because the colon-sys blocks the access to the value 5. Instead, you have to write
: foo [ 5 ] literal ;
(Of course, for a literal value like 5, you would normally compile it directly, but let's assume that the value is the result of a call to a word or somesuch).
BTW, the term "primitive" is used for words implemented in a lower-level language (typically assembler) than Forth, not for all pre-defined words. And in most Forth systems, the words dealing with the control-flow stack are defined in Forth.
referenceImplementation - The reference implementation is incorrect
[r327] still misses the clearing of the data stack as discussed in [r1154]. The present version also makes the clearing of the exception stack explicit (the parenthesis could be misunderstood as describing the following code).
One other thing to note here is that handler
(from the reference implementation of throw
) is not reset, because there is never a throw
that goes below the exception frame pushed by the catch
here; because in the reference implementation the return stack is used as the exception stack, no additional work is necessary to clear the exception stack (partially addressing [259]; however, the specification of quit
also needs to be updated).
: QUIT
RP0 @ RP! ( empty the return stack )
... ( set the input source to the user input device )
POSTPONE [
BEGIN
REFILL
WHILE
['] INTERPRET CATCH
CASE
0 OF STATE @ 0= IF ." OK" THEN CR ENDOF
POSTPONE [
-1 OF ( Aborted ) ENDOF
-2 OF ( display message from ABORT" ) ENDOF
( default ) DUP ." Exception # " .
ENDCASE
SP0 @ SP! ( perform the "clear the data stack" action of ABORT )
REPEAT BYE
;
where sp0
and rp0
are (user) variables containing the addresses that point to the bottom of the data and return stacks, respectively.
Look at this reference implementation together with the one for catch
and throw
.
A wordlist may be dynamically allocated in data space
, which means it can come from the dictionary space mentioned by marker
(this terminological inconsistency should be fixed). And the wid may be the address of that location, so marker
is allowed to delete wids and programs cannot rely on their continued existence. Is marker
guaranteed to do that? I find no trace of such a guarantee in the specification (but of course that would require the specification of marker
to be updated in the chapter on the search-order wordset).
Consequently, if marker
deletes wids, subsequent invocations of wordlist
may or may not reuse deleted wids.
Overall, my take on marker
is that it is a programmer convenience for those cases where exiting the session and restarting is from the beginning (what I usually do) is too inconvenient. Apart from the minimum guarantees required by the standard, implement as little functionality as you want without unduly inconveniencing your users and as much as you can without unnecessarily complicating your system. Or maybe don't implement marker
at all.
Thanks, that makes sense. I guess I had in mind a simple "increasing counter" implementation of wid
's. I'll restate the question as: once a wid
is freed up by MARKER
, can it still count towards the (8, or whatever) WORDLIST limit the system imposes? Or must the system only limit "active" wid's (if at all)?
For example, a system implementing u8int wid
's (as an index into a fixed vector) would be forced, in the latter case, to reuse wid
's after at most 255 paired WORDLIST
/ MARKER
invocations. If the system is allowed to not reuse wid
's, OTOH, then it can declare overflow after 255 such invocations, since it would run out of "fresh" wid
's.
The CfV says:
Whether duplicated nodes are visited or not by TRAVERSE-WORDLIST is undefined.
The text of the CfV was condensed, and this particular clause was omitted, but IIRC there was no intent to change the meaning.
However, note that if you implement marker
or forget
, you may want to keep the shadowed names anyway.
As mentioned, there is no guarantee, so a system could run out of wordlists after 8 calls to WORDLIST
, whether there were markers used or not.
However, if you use a counter for generating wids, its straightforward to implement marker
such that it saves the counter in the generated marker and that the marker restores the counter.
if you implement marker or forget, you may want to keep the shadowed names anyway
Yes, but I can implement MARKER
as a "clojure" around deep copies of
- the
wid
-indexed "all-wordlists" vector of word-name -> name-token hashtables - the "current-wid" pointer to the top of the "all-wordlists" vector (as per the
MARKER
comment https://forth-standard.org/standard/core/MARKER#reply-1159 — I previously thought that someMARKER
usage patterns could leave "holes" in this vector, thus requiring the maintenance of some kind of free-list; upon reflecting this doesn't seem possible), and - the "search-order" vector of
wid
's
It's probably still necessary to maintain shadowed name-tokens (or at least shadowed XTs?) since the user could have saved them using FIND
, SEARCH-WORDLIST
or the FIND-NAME[-IN]
extensions