,---------------. | Contributions | `---------------´ ,------------------------------------------ | 2024-01-30 22:30:13 lmr wrote: | referenceImplementation - INTERPRET | see: https://forth-standard.org/standard/core/QUIT#contribution-334 `------------------------------------------ This is a simple way to fulfill the `INTERPRET` part of the reference implementation, plus `]]` … `[[` via a special `STATE` of -2. FWIW, it relies on as few predefined words as possible. It computes "actions" to undertake for each token/state, with a single `EXECUTE` in the main loop. I think it might be useful; there are other sections of the spec that are relevant but (1) I haven't seen a self-contained version of `INTERPRET` and (2) those sections are rather large, so I thought I'd leave this here. I've tested it in GForth (with a suitable prelude). Prerequisites: * compile(ish) words: `: ; [ ] PARSE-NAME NAME>COMPILE NAME>INTERPRET COMPILE,` * `: FIND-NAME ( c-addr u -- nt ) ( find a name in search order ) ;` * `: LIT, ( n -- ) ( ... ) ; \ non-immediate, non-STATE-bound LITERAL` * `: PARSE-NUM?#-13 ( c-addr u -- n ) ( parse num or #-13 THROW ) ;` ``` : LITERAL ( n -- ) LIT, ; IMMEDIATE RESTRICT : FIND-NAME?#-13 ( c-addr u -- nt ) \ FIND-NAME or #-13 THROW FIND-NAME DUP 0= #-13 AND THROW ; : PARSE-NAME?#-16 ( "name" -- c-addr u ) \ PARSE-NAME or #-16 THROW PARSE-NAME DUP 0= #-16 AND THROW ; : NAME' ( c-addr u -- nt ) PARSE-NAME?#-16 FIND-NAME?#-13 ; : ' ( "name" -- ixt ) NAME' NAME>INTERPRET ; : NAME>COMPILED ( nt -- cxt ) NAME>COMPILE DROP ; : COMPILE' ( "name" -- cxt ) NAME' NAME>COMPILED ; : [NAME'] ( c-addr u -- nt ) NAME' LIT, ; IMMEDIATE RESTRICT : ['] ( "name" -- ixt ) ' LIT, ; IMMEDIATE RESTRICT : [COMPILE'] ( "name" -- xt ) COMPILE' LIT, ; IMMEDIATE RESTRICT : [COMPILE] ( "name" -- xt ) COMPILE' COMPILE, ; IMMEDIATE RESTRICT : SWAP&LIT&COMP ( x xt -- ) DUP ['] EXECUTE = IF DROP \ optimization ELSE SWAP LIT, THEN COMPILE, ; : POSTPONE-NAME ( nt -- ) NAME>COMPILE SWAP&LIT&COMP ; : POSTPONE ( "name" -- ) NAME' POSTPONE-NAME ; IMMEDIATE RESTRICT : NAME>STATE?ACTION ( nt -- x xt ) STATE @ IF NAME>COMPILE ELSE NAME>INTERPRET DUP 0= #-14 AND THROW ['] EXECUTE THEN ; : NUMBER>STATE?ACTION ( n -- n xt ) STATE @ IF ['] LIT, ELSE ['] NOOP THEN ; : ]] #-2 STATE ! ; IMMEDIATE RESTRICT : [[ TRUE STATE ! ; IMMEDIATE RESTRICT : _XT_[[ [COMPILE'] [[ ; \ CONSTANT ... : POSTPONE?ACTION ( x xt -- x xt pxt ) \ takes an pair as produced by NAME>COMPILE \ adds SWAP&LIT&COMP if STATE = -2 and word <> [[ OVER _XT_[[ <> STATE @ #-2 = AND IF ['] SWAP&LIT&COMP ELSE ['] EXECUTE THEN ; : STRING>STATE?ACTION ( c-addr u -- x xt pxt ) 2DUP FIND-NAME ?DUP 0= IF PARSE-NUM?#-13 NUMBER>STATE?ACTION ELSE NIP NIP NAME>STATE?ACTION THEN POSTPONE?ACTION ; : INTERPRET BEGIN PARSE-NAME DUP 0= IF 2DROP EXIT THEN STRING>STATE?ACTION EXECUTE AGAIN ; ``` ,---------. | Replies | `---------´ ,------------------------------------------ | 2024-01-25 08:25:18 AntonErtl replies: | example - "make the user input device the input source" | see: https://forth-standard.org/standard/core/QUIT#reply-1177 `------------------------------------------ Sounds right. The input source specification is explicitly mentioned in `throw`, which binds it to the exception stack and thus the return stack. It would be better to specify this in `load`, `included` etc., because that's where the implementations actually ensure this requirement (by `catch`ing, restoring, and re`throw`ing), and the current specification has led to confusion among some readers. Another possible fix to the document. ,------------------------------------------ | 2024-01-25 18:43:17 lmr replies: | example - "make the user input device the input source" | see: https://forth-standard.org/standard/core/QUIT#reply-1178 `------------------------------------------ > by catching, restoring, and rethrowing I've been thinking about this. It seems that for both the data and the return stack, there is a need for a "structured" version (i.e. one that can somehow accomodate blobs / arrays / higher-level data structures, whatever you want to call them) as opposed to just integers. For the data stack, the structured version is the *control* stack. What is the structured version of the return stack? I think it is the *exception* stack. This would make catching and re-throwing unnecessary, and abiding by the spec would become automatic. Specifically, pushing a new `N>R` item, pushing a new input source, and pushing an exception frame would all push to the "enhanced" exception stack (call it `estk`). There are two ways `estk` can be popped. `THROW` has to search from the top of `estk` downward to the most recent item that is an exception frame, discard everything above it, and reset the data stack / control stack / return stack pointers as specified in the exception frame thus located. The action to take when discarding an item from `estk` varies (for a `N>R` item it means freeing up a buffer; for an input source it means closing the file etc). Now let's see what happens in non-throw situations. `NR>` needs to pop off a `N>R` item from `estk`, `INCLUDE/EVALUATE` need to pop off an input source from `estk` when `REFILL` returns 0, `CATCH` needs to pop off an exception frame from `estk` when the XT returns. If the item popped does not match the expected tag / type, we throw an error. The questions is whether these actions could happen out of order in a standard program — and tt seems to me they couldn't. There are some other details to take care of (e.g. when popping off an input source from `estk`, the next most-recent one has to be searched for and set as `stdin`; when popping off a `N>R` item, the "dumb" return stack also has to be popped). This seems tempting to implement … provided I haven't missed things. ,------------------------------------------ | 2024-01-26 08:57:41 AntonErtl replies: | example - "make the user input device the input source" | see: https://forth-standard.org/standard/core/QUIT#reply-1179 `------------------------------------------ The usual implementation is to have control-flow stack items as a bunch of cells on the data stack and exception stack items as a bunch of cells on the return stack; the words that deal with them know the structure. If you instead use several stacks, you have to save all their depths on `catch` and restore these depths on `throw`. The usual way to find the current exception frame is by having a (user) variable that points to the frame; when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one and stored in the variable. No need for a search. Concerning the input sources, what you have to restore when `include` etc. are left depends on the implementation. E.g., usually a file has to be closed. This does not happen automatically when `throw`ing, unless you put the restoring action into `throw`, which would slow down `throw` and `catch` even for cases where the input source does not change. That's why the common approach is `catch`, restore, re`throw` in `included` etc. Concerning the order of popping, yes, they are popped in the reverse order of pushing and there cannot be any reorderings. That's because `catch` puts the stuff on the return/exception stack before `execute`ing the xt, and restores it afterwards. Likewise, `included` etc. put the input stream stuff on the return stack before interpreting the contents of the file and restore them afterwards; so this is all fully nested. The restrictions on `n>r` `nr>` also mean that no reordering can happen. Your idea reminds me of my [EuroForth 2022 paper](http://www.euroforth.org/ef22/papers/ertl.pdf), where I sketch having separate stacks to avoid subverting memory safety. The paper does not discuss these things in depth, though. ,------------------------------------------ | 2024-01-26 12:55:29 lmr replies: | example - "make the user input device the input source" | see: https://forth-standard.org/standard/core/QUIT#reply-1180 `------------------------------------------ > [ … ] control-flow stack items as a bunch of cells [ … ] use several stacks, you have to save all their depths on catch and restore these depths on throw Of course — and also clear the ones with return-stack semantics on `QUIT`. I've also thought about adding a word to create custom stacks, but then the cleanup problem multiplies (I'll have to look a little closer at what GForth does). > when the exception frame is dropped, the pointer to the previous frame is taken from the dropped one Ah, an implicit linked-list of exception frames; the same can be done for input-source frames. > This does not happen automatically when throwing, unless you put the restoring action into throw, which would slow down throw and catch even for cases where the input source does not change This I'm not sure I understand. Items placed on this "enhanced" return stack `estk` absolutely need "destructors". If a `N>R` frame was placed on `estk` and gets discarded by `THROW`, the memory needs to be `free()`'d somewhere. If an input-source frame gets discarded by `THROW`, someone needs to close the file (except for `EVALUATE`) and restore `stdin`. If so, what standard programs would get slowed down by moving the destructor invocations into `THROW`? For a program that doesn't leave behind any `INCLUDE`, `EVALUATE` or `N>R` frames between (1) `CATCH` (which pushes the exception frame onto `estk`) and (2) whatever causes the `THROW`, there would be no destructors to invoke (or actually at all) on `estk` — so no slowdown...? Also, how does an "allocating" `N>R` ensure memory is `free()d` in a single "dumb" return stack implementation? By also catching and rethrowing...? > reminds me of my EuroForth 2022 paper Thanks, that's quite relevant. Is it still just a "paper design"? ,------------------------------------------ | 2024-01-26 17:27:40 lmr replies: | referenceImplementation - Possible Reference Implementation | see: https://forth-standard.org/standard/core/VARIABLE#reply-1181 `------------------------------------------ I think the comma already aligns, so, while we're onto this particular horse, the following is probably even *more* efficient: ``` : VARIABLE ( " name" -- ) 0 , HERE CELL- CONSTANT ; ``` ... and the above also proves an ironic point about the requirement for stack diagrams: one can't actually input `<` and `>` right before or after a non-whitespace char within triple-backtick sections *with the current web front-end* (not with ampersand-gt / lt, not with plain less-than). It gets backslash-ified. So one can't actually provide "accurate" stack diagrams (accurate in the sense that I tried to copy the stack diagram from the reference above itself). ,------------------------------------------ | 2024-01-27 11:33:51 AntonErtl replies: | referenceImplementation - Possible Reference Implementation | see: https://forth-standard.org/standard/core/VARIABLE#reply-1182 `------------------------------------------ `,` is not guaranteed to align (and, e.g., in Gforth it does not). This is explicity stated in [6.1.0150](https://forth-standard.org/standard/core/Comma): > An ambiguous condition exists if the data-space pointer is not aligned prior to execution of `,`. The reference implementations by Jim Peterson are standard programs, your reference implementation could be described as having an environmental dependency on an aligning `,`. ,------------------------------------------ | 2024-01-27 12:00:18 AntonErtl replies: | example - "make the user input device the input source" | see: https://forth-standard.org/standard/core/QUIT#reply-1183 `------------------------------------------ Gforth's throw just checks whether the ball is non-zero, and if so, mainly restores a few virtual-machine registers. It does not need to check whether there is any destructor and invoke it; instead, destructors are invoked through the idiom ``` ( xt ) catch destruct throw ``` which means that you have an exception frame for every destructor, but given that exception frames are cheap, that's fine. For `n>r` you cannot use the idiom above, because `n>r` `nr>` are a pair of words rather than one word that wraps around some inner execution mechanism (whether it's the invocation of an xt or the text interpretation of some source code). But then I don't know of any `n>r` implementation that `allocate`s. Safe Forth is still a paper design and likely to stay one for at least several more years. So many ideas, so little time:-).