Digest #301 2025-07-30

Contributions

[388] 2025-07-29 20:21:10 EricBlake wrote:

requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?

The test implies that the identity of the xt passed to DEFER! (or IS) should be preserved:

T{ DEFER defer4 -> }T
T{ ' + IS defer4 -> }T
T{ ' defer4 DEFER@ -> ' + }T

But elsewhere, in 2.1, we read

execution token:
    A value that identifies the execution semantics of a definition.

and 3.1 is clear that an xt is 1 cell. However, I'm working on an implementation of Forth where the implied requirement of identical tokens is overly restrictive. In my implementation, I can dispatch to words faster if every execution token is treated as the address of a pair of cells: one cell holding a pointer to the code handler to execute, and the other cell holding a parameter to be (optionally) used by that handler. For example, defining a CONSTANT creates a pair "do-lit, value", where do-lit is a handler that knows how to push value to the stack (although COMPILE, can bypass the do-lit handler and just compile code that pushes value directly). Likewise, a VALUE creates a pair "do-val, addr", where addr is a one-cell location reserved at the time the word was defined, and where the do-val handler performs (or COMPILE, inlines) "addr @"; and so forth.

The interesting aspect of this is that for most handlers, copying the two-cell contents to any other address preserves the semantics of the two cells in their original location. It doesn't matter whether I use a pointer to the two cells "do-lit, 5" that were compiled into 5 CONSTANT five, or a pointer to the two cells "do-lit, 5" that were compiled as part of the body of : doit 5 ; - any time my execution engine sees that two-cell sequence, it has the same semantics: pushing 5 to the stack.

Note that in my scheme, a VALUE stores an address in the parameter field of its 2-cell representation (where that address is basically ALIGN HERE 1 CELLS ALLOT at the time VALUE was run), and not the current value set by the most recent TO. That's because I have planned for the contents of an xt to be copied around while preserving the semantics, regardless of the address where that copy of the 2-cell xt contents lives. Put another way, the compilation of 5 VALUE v : getv v ; must not hard-code a 5 as the value that v happened to have when getv was compiled, but rather must compile code that looks up the value that v has at the time getv is executed. It is nevertheless more efficient for the compiled body of getv to hold the two-cell sequence "do-val, addr", where do-val does "addr @", than the two-cell sequence "do-call, ' v".
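The late-binding requirement on VALUE described above can be checked with standard words only. This is a minimal sketch; it exercises the observable behavior any system must show and says nothing about how do-val is implemented internally.

```forth
5 VALUE v
: getv ( -- n ) v ;   \ must fetch the current value of v at run time

7 TO v                \ change v after getv has been compiled
getv .                \ prints 7, not the 5 that v held at compile time
```

A system that hard-coded the compile-time value into getv would print 5 here, which is why the parameter cell has to hold the address of the storage rather than the value itself.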

As fallout of that design, in my system, two distinct single-cell values can both be considered equivalent xts if the two cells they each point to have the same contents:

: xt= ( xt2 xt1 -- flag ) \ determine if xt1 and xt2 have the same execution semantics. False negatives are possible, but not false positives
  2@ ROT 2@ D= ;  \ implementation-dependent

With that background, my implementation of DEFER could be as simple as:

: DEFER ( "name" -- )
  [: abort" defer not assigned yet" ;] 2@ 2VALUE  \ share the same dictionary implementation as 2VALUE...
  do-defer latest !  \ ...except that I swap the handler from do-val2 to do-defer
  \ where do-val2 performs "addr 2@", do-defer performs the equivalent of "addr execute"
  \ in this implementation, "' name >BODY" gives addr
  ;
: DEFER! ( xt2 xt1 -- )
  >R 2@ R> >BODY 2! ;
: DEFER@ ( xt1 -- xt2 )
  >BODY ;

However, that implementation fails the testsuite as written: the two cells residing in the body of a DEFERred word have the same contents, and thus the same semantics, as the xt that was passed to DEFER!, but they live at a different address (although the test passed ' + to IS defer4, DEFER@ gives back ' defer4 >BODY). Observe that on my system ' + ' defer4 DEFER@ xt= correctly yields a true flag, but the xt= predicate is not portable to other implementations, and thus is not viable for the testsuite.

I can argue that section 2.1 merely requires that an xt be "A value that identifies the execution semantics of a definition.", and not "The unique value..."; thus, I see no compelling reason that consecutive calls to ' word must return the same immutable value for that word. In fact, I could envision a Forth system that provides an extension to optimize existing words in the dictionary, recompiling them to better code and changing the xt that future uses of ' word will produce, while still preserving execution semantics. It's also not hard to argue that, with word lists and the use of SYNONYM to copy a definition from one list to another while keeping the name, ' word may produce different results depending on the current search order, even when those various xts all resolve to the execution semantics of the original word that the other word lists copied from.
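To make the SYNONYM argument concrete, here is a small sketch. The name plus is hypothetical, and for brevity it creates a differently named synonym in the same word list rather than a same-named one in another word list. It assumes a system that provides SYNONYM from the Programming-Tools extension word set.

```forth
SYNONYM plus +          \ plus has the execution semantics of +

1 2 plus .              \ prints 3
1 2 ' plus EXECUTE .    \ prints 3: the xt of plus behaves like +

\ Whether the two xts are the same cell value is the open question;
\ this line only displays what the system at hand happens to do.
' plus ' + = .
```

Only the first two results are something a portable test could rely on; the flag printed by the last line is exactly the kind of identity comparison whose portability is being questioned.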

In the case of my system, I believe that as long as I have the same two-cell contents passed to the execution engine, then the address of those two cells forms an xt of name even if that address differs from the one that ' name returns. If I'm right, my implementation complies with the standard but fails the testsuite, meaning the testsuite is too strict; in which case any use of -> ' in the testsuite is non-portable, and the most it can portably do is assert things like T{ 1 2 ' defer4 DEFER@ EXECUTE -> 3 }T after defer4 has been directed to ' +. But if I'm wrong, I could change my implementation to instead share the implementation of DEFER with VALUE (only 1 CELLS ALLOT instead of 2). The cost is that every time my execution engine encounters the cell pair "do-defer addr", it must execute the slower sequence "addr @ execute" (an extra indirection from addr to xt, compared to my earlier implementation, where "addr execute" treats addr as the xt to dereference).
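A behavior-only version of the defer4 test, along the lines suggested above, could look like the following sketch. It uses only standard words and compares results rather than xt values, so it should pass both on systems where DEFER@ returns the identical xt and on systems where it returns a different xt with the same semantics.

```forth
DEFER defer4
' + IS defer4

1 2 ' defer4 DEFER@ EXECUTE .   \ prints 3: the stored xt behaves like +
1 2 defer4 .                    \ prints 3: executing defer4 itself agrees
```

The trade-off is that such a test can only sample behavior on particular inputs; it cannot prove that the returned xt is equivalent to ' + in general.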

So, I'm asking for clarification on whether DEFER@ must return verbatim the xt value that was passed to DEFER!, or whether the standard permits any other xt value so long as its execution semantics are the same.