Digest #254 2024-01-31
Contributions
This is a simple way to fulfill the INTERPRET
part of the reference implementation, plus ]]
… [[
via a special STATE
of -2. FWIW, it relies on as few predefined words as possible. It computes "actions" to undertake for each token/state, with a single EXECUTE
in the main loop. I think it might be useful; there are other sections of the spec that are relevant but (1) I haven't seen a self-contained version of INTERPRET
and (2) those sections are rather large, so I thought I'd leave this here. I've tested it in GForth (with a suitable prelude). Prerequisites:
- compile(ish) words:
: ; [ ] PARSE-NAME NAME>COMPILE NAME>INTERPRET COMPILE,
: FIND-NAME ( c-addr u -- nt ) ( find a name in search order ) ;
: LIT, ( n -- ) ( ... ) ; \ non-immediate, non-STATE-bound LITERAL
: PARSE-NUM?#-13 ( c-addr u -- n ) ( parse num or #-13 THROW ) ;
: LITERAL ( n -- ) LIT, ; IMMEDIATE RESTRICT
: FIND-NAME?#-13 ( c-addr u -- nt ) \ FIND-NAME or #-13 THROW
FIND-NAME DUP 0= #-13 AND THROW
;
: PARSE-NAME?#-16 ( "name" -- c-addr u ) \ PARSE-NAME or #-16 THROW
PARSE-NAME DUP 0= #-16 AND THROW
;
: NAME' ( c-addr u -- nt ) PARSE-NAME?#-16 FIND-NAME?#-13 ;
: ' ( "name" -- ixt ) NAME' NAME>INTERPRET ;
: NAME>COMPILED ( nt -- cxt ) NAME>COMPILE DROP ;
: COMPILE' ( "name" -- cxt ) NAME' NAME>COMPILED ;
: [NAME'] ( c-addr u -- nt ) NAME' LIT, ; IMMEDIATE RESTRICT
: ['] ( "name" -- ixt ) ' LIT, ; IMMEDIATE RESTRICT
: [COMPILE'] ( "name" -- xt ) COMPILE' LIT, ; IMMEDIATE RESTRICT
: [COMPILE] ( "name" -- xt ) COMPILE' COMPILE, ; IMMEDIATE RESTRICT
: SWAP&LIT&COMP ( x xt -- )
DUP ['] EXECUTE =
IF DROP \ optimization
ELSE SWAP LIT,
THEN COMPILE,
;
: POSTPONE-NAME ( nt -- ) NAME>COMPILE SWAP&LIT&COMP ;
: POSTPONE ( "name" -- ) NAME' POSTPONE-NAME ; IMMEDIATE RESTRICT
: NAME>STATE?ACTION ( nt -- x xt )
STATE @ IF
NAME>COMPILE
ELSE
NAME>INTERPRET DUP 0= #-14 AND THROW
['] EXECUTE
THEN
;
: NUMBER>STATE?ACTION ( n -- n xt )
STATE @ IF ['] LIT,
ELSE ['] NOOP
THEN
;
: ]] #-2 STATE ! ; IMMEDIATE RESTRICT
: [[ TRUE STATE ! ; IMMEDIATE RESTRICT
: _XT_[[ [COMPILE'] [[ ; \ CONSTANT ...
: POSTPONE?ACTION ( x xt -- x xt pxt )
\ takes an \<x xt\> pair as produced by NAME>COMPILE
\ adds SWAP&LIT&COMP if STATE = -2 and word <> [[
OVER _XT_[[ <>
STATE @ #-2 =
AND IF ['] SWAP&LIT&COMP
ELSE ['] EXECUTE
THEN
;
: STRING>STATE?ACTION ( c-addr u -- x xt pxt )
2DUP FIND-NAME
?DUP 0= IF
PARSE-NUM?#-13 NUMBER>STATE?ACTION
ELSE NIP NIP NAME>STATE?ACTION
THEN
POSTPONE?ACTION
;
: INTERPRET
BEGIN
PARSE-NAME DUP 0= IF 2DROP EXIT THEN
STRING>STATE?ACTION EXECUTE
AGAIN
;
Replies
example - "make the user input device the input source"
Sounds right. The input source specification is explicitly mentioned in throw
, which binds it to the exception stack and thus the return stack. It would be better to specify this in load
, included
etc., because that's where the implementations actually ensure this requirement (by catch
ing, restoring, and rethrow
ing), and the current specification has led to confusion among some readers. Another possible fix to the document.
by catching, restoring, and rethrowing
I've been thinking about this. It seems that for both the data and the return stack, there is a need for a "structured" version (i.e. one that can somehow accomodate blobs / arrays / higher-level data structures, whatever you want to call them) as opposed to just integers.
For the data stack, the structured version is the control stack. What is the structured version of the return stack? I think it is the exception stack. This would make catching and re-throwing unnecessary, and abiding by the spec would become automatic.
Specifically, pushing a new N>R
item, pushing a new input source, and pushing an exception frame would all push to the "enhanced" exception stack (call it estk
).
There are two ways estk
can be popped. THROW
has to search from the top of estk
downward to the most recent item that is an exception frame, discard everything above it, and reset the data stack / control stack / return stack pointers as specified in the exception frame thus located. The action to take when discarding an item from estk
varies (for a N>R
item it means freeing up a buffer; for an input source it means closing the file etc).
Now let's see what happens in non-throw situations. NR>
needs to pop off a N>R
item from estk
, INCLUDE/EVALUATE
need to pop off an input source from estk
when REFILL
returns 0, CATCH
needs to pop off an exception frame from estk
when the XT returns. If the item popped does not match the expected tag / type, we throw an error. The questions is whether these actions could happen out of order in a standard program — and tt seems to me they couldn't.
There are some other details to take care of (e.g. when popping off an input source from estk
, the next most-recent one has to be searched for and set as stdin
; when popping off a N>R
item, the "dumb" return stack also has to be popped).
This seems tempting to implement … provided I haven't missed things.
example - "make the user input device the input source"
The usual implementation is to have control-flow stack items as a bunch of cells on the data stack and exception stack items as a bunch of cells on the return stack; the words that deal with them know the structure. If you instead use several stacks, you have to save all their depths on catch
and restore these depths on throw
.
The usual way to find the current exception frame is by having a (user) variable that points to the frame; when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one and stored in the variable. No need for a search.
Concerning the input sources, what you have to restore when include
etc. are left depends on the implementation. E.g., usually a file has to be closed. This does not happen automatically when throw
ing, unless you put the restoring action into throw
, which would slow down throw
and catch
even for cases where the input source does not change. That's why the common approach is catch
, restore, rethrow
in included
etc.
Concerning the order of popping, yes, they are popped in the reverse order of pushing and there cannot be any reorderings. That's because catch
puts the stuff on the return/exception stack before execute
ing the xt, and restores it afterwards. Likewise, included
etc. put the input stream stuff on the return stack before interpreting the contents of the file and restore them afterwards; so this is all fully nested. The restrictions on n>r
nr>
also mean that no reordering can happen.
Your idea reminds me of my EuroForth 2022 paper, where I sketch having separate stacks to avoid subverting memory safety. The paper does not discuss these things in depth, though.
[ … ] control-flow stack items as a bunch of cells [ … ] use several stacks, you have to save all their depths on catch and restore these depths on throw
Of course — and also clear the ones with return-stack semantics on QUIT
. I've also thought about adding a word to create custom stacks, but then the cleanup problem multiplies (I'll have to look a little closer at what GForth does).
when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one
Ah, an implicit linked-list of exception frames; the same can be done for input-source frames.
This does not happen automatically when throwing, unless you put the restoring action into throw, which would slow down throw and catch even for cases where the input source does not change
This I'm not sure I understand. Items placed on this "enhanced" return stack estk
absolutely need "destructors". If a N>R
frame was placed on estk
and gets discarded by THROW
, the memory needs to be free()
'd somewhere. If an input-source frame gets discarded by THROW
, someone needs to close the file (except for EVALUATE
) and restore stdin
.
If so, what standard programs would get slowed down by moving the destructor invocations into THROW
? For a program that doesn't leave behind any INCLUDE
, EVALUATE
or N>R
frames between (1) CATCH
(which pushes the exception frame onto estk
) and (2) whatever causes the THROW
, there would be no destructors to invoke (or actually at all) on estk
— so no slowdown...?
Also, how does an "allocating" N>R
ensure memory is free()d
in a single "dumb" return stack implementation? By also catching and rethrowing...?
reminds me of my EuroForth 2022 paper
Thanks, that's quite relevant. Is it still just a "paper design"?
I think the comma already aligns, so, while we're onto this particular horse, the following is probably even more efficient:
: VARIABLE ( "\<spaces\> name" -- )
0 , HERE CELL- CONSTANT
;
... and the above also proves an ironic point about the requirement for stack diagrams: one can't actually input <
and >
right before or after a non-whitespace char within triple-backtick sections with the current web front-end (not with ampersand-gt / lt, not with plain less-than). It gets backslash-ified. So one can't actually provide "accurate" stack diagrams (accurate in the sense that I tried to copy the stack diagram from the reference above itself).
referenceImplementation - Possible Reference Implementation
,
is not guaranteed to align (and, e.g., in Gforth it does not). This is explicity stated in 6.1.0150:
An ambiguous condition exists if the data-space pointer is not aligned prior to execution of
,
.
The reference implementations by Jim Peterson are standard programs, your reference implementation could be described as having an environmental dependency on an aligning ,
.
example - "make the user input device the input source"
Gforth's throw just checks whether the ball is non-zero, and if so, mainly restores a few virtual-machine registers. It does not need to check whether there is any destructor and invoke it; instead, destructors are invoked through the idiom
( xt ) catch destruct throw
which means that you have an exception frame for every destructor, but given that exception frames are cheap, that's fine.
For n>r
you cannot use the idiom above, because n>r
nr>
are a pair of words rather than one word that wraps around some inner execution mechanism (whether it's the invocation of an xt or the text interpretation of some source code). But then I don't know of any n>r
implementation that allocate
s.
Safe Forth is still a paper design and likely to stay one for at least several more years. So many ideas, so little time:-).