Digest #254 2024-01-31

Contributions

[334] 2024-01-30 22:30:13 lmr wrote:

referenceImplementation - INTERPRET

This is a simple way to fulfill the INTERPRET part of the reference implementation, plus ]] … [[ via a special STATE of -2. FWIW, it relies on as few predefined words as possible. It computes "actions" to undertake for each token/state, with a single EXECUTE in the main loop. I think it might be useful; there are other sections of the spec that are relevant but (1) I haven't seen a self-contained version of INTERPRET and (2) those sections are rather large, so I thought I'd leave this here. I've tested it in GForth (with a suitable prelude). Prerequisites:

compile(ish) words: : ; [ ] PARSE-NAME NAME>COMPILE NAME>INTERPRET COMPILE,
: FIND-NAME ( c-addr u -- nt ) ( find a name in search order ) ;
: LIT, ( n -- ) ( ... ) ; \ non-immediate, non-STATE-bound LITERAL
: PARSE-NUM?#-13 ( c-addr u -- n ) ( parse num or #-13 THROW ) ;

: LITERAL ( n -- ) LIT, ; IMMEDIATE RESTRICT

: FIND-NAME?#-13   ( c-addr u -- nt )      \ FIND-NAME or #-13 THROW
  FIND-NAME  DUP 0= #-13 AND THROW
;
: PARSE-NAME?#-16  ( "name" -- c-addr u )  \ PARSE-NAME or #-16 THROW
  PARSE-NAME DUP 0= #-16 AND THROW
;

: NAME'      ( c-addr u -- nt  ) PARSE-NAME?#-16 FIND-NAME?#-13 ;
: '          ( "name"   -- ixt ) NAME' NAME>INTERPRET ;
: NAME>COMPILED ( nt    -- cxt ) NAME>COMPILE DROP ;
: COMPILE'   ( "name"   -- cxt ) NAME' NAME>COMPILED ;
: [NAME']    ( c-addr u -- nt  ) NAME' LIT, ; IMMEDIATE RESTRICT
: [']        ( "name"   -- ixt ) ' LIT, ; IMMEDIATE RESTRICT
: [COMPILE'] ( "name"   -- xt  ) COMPILE' LIT, ; IMMEDIATE RESTRICT
: [COMPILE]  ( "name"   -- xt  ) COMPILE' COMPILE, ; IMMEDIATE RESTRICT

: SWAP&LIT&COMP  ( x xt -- )
  DUP ['] EXECUTE =
  IF DROP  \ optimization
  ELSE SWAP LIT,
  THEN COMPILE,
;
: POSTPONE-NAME ( nt -- )     NAME>COMPILE SWAP&LIT&COMP ;
: POSTPONE      ( "name" -- ) NAME' POSTPONE-NAME ; IMMEDIATE RESTRICT

: NAME>STATE?ACTION  ( nt -- x xt )
  STATE @ IF
    NAME>COMPILE
  ELSE
    NAME>INTERPRET DUP 0= #-14 AND THROW
    ['] EXECUTE
  THEN
;
: NUMBER>STATE?ACTION  ( n -- n xt )
  STATE @ IF ['] LIT,
  ELSE       ['] NOOP
  THEN
;

: ]] #-2  STATE ! ; IMMEDIATE RESTRICT
: [[ TRUE STATE ! ; IMMEDIATE RESTRICT
: _XT_[[ [COMPILE'] [[ ;  \ CONSTANT ...
: POSTPONE?ACTION  ( x xt -- x xt pxt )
  \ takes an \<x xt\> pair as produced by NAME>COMPILE
  \ adds SWAP&LIT&COMP if STATE = -2 and word <> [[
    OVER _XT_[[ <>
    STATE @ #-2 =
  AND IF ['] SWAP&LIT&COMP
  ELSE   ['] EXECUTE
  THEN
;

: STRING>STATE?ACTION  ( c-addr u -- x xt pxt )
  2DUP FIND-NAME
  ?DUP 0= IF
    PARSE-NUM?#-13 NUMBER>STATE?ACTION
  ELSE NIP NIP NAME>STATE?ACTION
  THEN
  POSTPONE?ACTION
;

: INTERPRET
  BEGIN
    PARSE-NAME DUP 0= IF 2DROP EXIT THEN
    STRING>STATE?ACTION EXECUTE
  AGAIN
;

Replies

[r1177] 2024-01-25 08:25:18 AntonErtl replies:

example - "make the user input device the input source"

Sounds right. The input source specification is explicitly mentioned in throw, which binds it to the exception stack and thus the return stack. It would be better to specify this in load, included etc., because that's where the implementations actually ensure this requirement (by catching, restoring, and rethrowing), and the current specification has led to confusion among some readers. Another possible fix to the document.

[r1178] 2024-01-25 18:43:17 lmr replies:

example - "make the user input device the input source"

by catching, restoring, and rethrowing

I've been thinking about this. It seems that for both the data and the return stack, there is a need for a "structured" version (i.e. one that can somehow accomodate blobs / arrays / higher-level data structures, whatever you want to call them) as opposed to just integers.

For the data stack, the structured version is the control stack. What is the structured version of the return stack? I think it is the exception stack. This would make catching and re-throwing unnecessary, and abiding by the spec would become automatic.

Specifically, pushing a new N>R item, pushing a new input source, and pushing an exception frame would all push to the "enhanced" exception stack (call it estk).

There are two ways estk can be popped. THROW has to search from the top of estk downward to the most recent item that is an exception frame, discard everything above it, and reset the data stack / control stack / return stack pointers as specified in the exception frame thus located. The action to take when discarding an item from estk varies (for a N>R item it means freeing up a buffer; for an input source it means closing the file etc).

Now let's see what happens in non-throw situations. NR> needs to pop off a N>R item from estk, INCLUDE/EVALUATE need to pop off an input source from estk when REFILL returns 0, CATCH needs to pop off an exception frame from estk when the XT returns. If the item popped does not match the expected tag / type, we throw an error. The questions is whether these actions could happen out of order in a standard program — and tt seems to me they couldn't.

There are some other details to take care of (e.g. when popping off an input source from estk, the next most-recent one has to be searched for and set as stdin; when popping off a N>R item, the "dumb" return stack also has to be popped).

This seems tempting to implement … provided I haven't missed things.

[r1179] 2024-01-26 08:57:41 AntonErtl replies:

example - "make the user input device the input source"

The usual implementation is to have control-flow stack items as a bunch of cells on the data stack and exception stack items as a bunch of cells on the return stack; the words that deal with them know the structure. If you instead use several stacks, you have to save all their depths on catch and restore these depths on throw.

The usual way to find the current exception frame is by having a (user) variable that points to the frame; when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one and stored in the variable. No need for a search.

Concerning the input sources, what you have to restore when include etc. are left depends on the implementation. E.g., usually a file has to be closed. This does not happen automatically when throwing, unless you put the restoring action into throw, which would slow down throw and catch even for cases where the input source does not change. That's why the common approach is catch, restore, rethrow in included etc.

Concerning the order of popping, yes, they are popped in the reverse order of pushing and there cannot be any reorderings. That's because catch puts the stuff on the return/exception stack before executeing the xt, and restores it afterwards. Likewise, included etc. put the input stream stuff on the return stack before interpreting the contents of the file and restore them afterwards; so this is all fully nested. The restrictions on n>r nr> also mean that no reordering can happen.

Your idea reminds me of my EuroForth 2022 paper, where I sketch having separate stacks to avoid subverting memory safety. The paper does not discuss these things in depth, though.

[r1180] 2024-01-26 12:55:29 lmr replies:

example - "make the user input device the input source"

[ … ] control-flow stack items as a bunch of cells [ … ] use several stacks, you have to save all their depths on catch and restore these depths on throw

Of course — and also clear the ones with return-stack semantics on QUIT. I've also thought about adding a word to create custom stacks, but then the cleanup problem multiplies (I'll have to look a little closer at what GForth does).

when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one

Ah, an implicit linked-list of exception frames; the same can be done for input-source frames.

This does not happen automatically when throwing, unless you put the restoring action into throw, which would slow down throw and catch even for cases where the input source does not change

This I'm not sure I understand. Items placed on this "enhanced" return stack estk absolutely need "destructors". If a N>R frame was placed on estk and gets discarded by THROW, the memory needs to be free()'d somewhere. If an input-source frame gets discarded by THROW, someone needs to close the file (except for EVALUATE) and restore stdin.

If so, what standard programs would get slowed down by moving the destructor invocations into THROW? For a program that doesn't leave behind any INCLUDE, EVALUATE or N>R frames between (1) CATCH (which pushes the exception frame onto estk) and (2) whatever causes the THROW, there would be no destructors to invoke (or actually at all) on estk — so no slowdown...?

Also, how does an "allocating" N>R ensure memory is free()d in a single "dumb" return stack implementation? By also catching and rethrowing...?

reminds me of my EuroForth 2022 paper

Thanks, that's quite relevant. Is it still just a "paper design"?

[r1181] 2024-01-26 17:27:40 lmr replies:

referenceImplementation - Possible Reference Implementation

I think the comma already aligns, so, while we're onto this particular horse, the following is probably even more efficient:

: VARIABLE ( "\<spaces\> name" -- )
  0 , HERE CELL- CONSTANT
;

... and the above also proves an ironic point about the requirement for stack diagrams: one can't actually input < and > right before or after a non-whitespace char within triple-backtick sections with the current web front-end (not with ampersand-gt / lt, not with plain less-than). It gets backslash-ified. So one can't actually provide "accurate" stack diagrams (accurate in the sense that I tried to copy the stack diagram from the reference above itself).

[r1182] 2024-01-27 11:33:51 AntonErtl replies:

referenceImplementation - Possible Reference Implementation

, is not guaranteed to align (and, e.g., in Gforth it does not). This is explicity stated in 6.1.0150:

An ambiguous condition exists if the data-space pointer is not aligned prior to execution of ,.

The reference implementations by Jim Peterson are standard programs, your reference implementation could be described as having an environmental dependency on an aligning ,.

[r1183] 2024-01-27 12:00:18 AntonErtl replies:

example - "make the user input device the input source"

Gforth's throw just checks whether the ball is non-zero, and if so, mainly restores a few virtual-machine registers. It does not need to check whether there is any destructor and invoke it; instead, destructors are invoked through the idiom

( xt ) catch destruct throw

which means that you have an exception frame for every destructor, but given that exception frames are cheap, that's fine.

For n>r you cannot use the idiom above, because n>r nr> are a pair of words rather than one word that wraps around some inner execution mechanism (whether it's the invocation of an xt or the text interpretation of some source code). But then I don't know of any n>r implementation that allocates.

Safe Forth is still a paper design and likely to stay one for at least several more years. So many ideas, so little time:-).