Digest #302 2025-07-31

Contributions

[389] 2025-07-30 03:21:46 EricBlake wrote:

requestClarification - 3.2.1.1 vs. Double-cell integer behavior on unusual integer architectures

I'm aware that the proposal to mandate twos-complement in the next version of Forth has already been voted on, back in 2015: http://www.forth200x.org/twos-complement.html, which changes the permission that 3.2.1.1 gave for sign-magnitude representations. But in the context of Forth-2012, I'm trying to determine if the following setup can be declared as a compliant implementation.

Background: I'm attempting to implement at least Core Forth on top of any implementation of Eric Wastl's IntCode virtual machine described at https://adventofcode.com/2019/day/5, using https://www.reddit.com/r/adventofcode/comments/enzgxw/forth_to_intcode_compiler_in_intcode/ as my starting point. This virtual machine has a rather loose specification: it merely guarantees that the set of problems designed to be solved by IntCode (12 in all for that Advent of Code) only needed to perform signed integer math, where magnitudes larger than 2^32 would be encountered, but where no math operation should ever exceed 2^53. This looseness was intentional - it allows for things like implementing IntCode in Javascript (where the only native numeric type is an IEEE double, with 53 bits of integer precision before you run into rounding issues); since IEEE doubles are sign-magnitude rather than twos-complement, this means that a Forth designed to run on any arbitrary IntCode implementation is not guaranteed to have typical 32- or even 64-bit twos-complement cell behaviors. That said, a single cell in IntCode already meets the requirement for representing all minimum-required signed and unsigned double integer values in the ranges {-2147483647 ... +2147483647} and {0 ... 4294967295}, although a portable Forth program would have to use double integers or else declare an environmental dependency on 32-bit cells, if it depends on avoiding 16-bit overflow. The standard is also clear in 3.2.2.2 that overflow and underflow are ignored, with implementation-defined behavior.
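The rounding hazard mentioned above can be made concrete. A small Python sketch (purely illustrative, not part of the IntCode spec) shows that IEEE doubles represent every integer exactly only up to 2^53, after which consecutive integers start colliding:

```python
# IEEE binary64 has a 53-bit significand, so every integer up to 2**53
# is exactly representable; beyond that, consecutive integers collide.
exact_limit = 2 ** 53

assert float(exact_limit - 1) != float(exact_limit)  # still distinguishable
assert float(exact_limit) == float(exact_limit + 1)  # first collision

# An IntCode host written in Javascript inherits exactly this behavior,
# so arithmetic results above 2**53 may be silently rounded.
```

This is why a Forth built on an arbitrary IntCode host cannot assume exact integer behavior above the 2^53 boundary.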

I also have an implementation of the IntCode VM that uses just m4's define: https://repo.or.cz/aoc_eblake.git/blob/HEAD:/2019/intcode.barem4 (running a Forth engine programmed with just a single macro is a convincing proof on top of Doug McIlroy's demonstration that the single m4 macro is Turing complete: https://www.cs.dartmouth.edu/~doug/barem4.m4). True, it can take minutes for that engine to compute what an IntCode engine written in C can do in under a second - but Forth is notorious for being usable on arcane setups! Of note, my m4 engine supports arbitrary-width integers in sign-magnitude (although performance degrades quadratically with the length of the integer). Also, the IntCode VM supports addition and multiplication, but has no native operation for division, which means division has to be implemented on top (my Forth-on-IntCode requires 16 levels of recursion to compute "HEX FFFF 2 /"; LSHIFT is trivial but RSHIFT is slow, regardless of whether IntCode is running on barem4 with unusual integers, or on C with twos-complement 64-bit integers).

I'm aware that the testsuite documents in F.3.1 that it assumes a twos-complement cell; and right away, I run into issues when running on an IntCode VM with arbitrary-width cells, because things like:

F.3.3
1S 1 RSHIFT INVERT CONSTANT MSB 

make no logical sense - there is no way to determine the maximum number of bits or the MSB when a cell has no fixed size. But then there's A.3.2.1: "There is no requirement to implement circular unsigned arithmetic, nor to set the range of unsigned numbers to the full size of a cell.", even if it is non-normative rationale.

So my idea was to pick some arbitrary cutoff (maybe 2^52) which I declare as my point of overflow, and let addition and multiplication pass unchecked to the IntCode VM: if the VM uses arbitrary-width integers, there is no actual overflow; if the VM uses 64-bit integers, you get twos-complement; if the VM uses IEEE double, you may get weird rounding; but either way I don't check it. Meanwhile, for division, I would declare that any dividend larger than my arbitrary cutoff represents overflow (and my implementation-defined behavior is that I refuse to attempt it, because it might recurse deeper than my return stack).

But since that arbitrary cutoff fits within one cell, would it be compliant to state that the only valid representations I'm willing to accept for a double-cell integer "d | ud" are the stack values "u 0 | +d 0 | d -1", where a negative double can have either only the top cell negative or both cells negative? And if I make the restriction that I'm unwilling to handle any bit pattern other than 0 or -1 in the top cell of a double, I can then implement things like UM/MOD using only division of the lower cell, and neither have to make /MOD more complex to deal with single vs. double dividends, nor worry about having to implement double operations up to a full range of 2^106. Simplifying my implementation by only worrying about math on single cells, after validating that the top cell of a double is one of two values, would be nice - especially if I can still declare it to be a compliant implementation.
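The restricted double-cell scheme described above can be sketched in Python (rather than Forth, purely for illustration; the cutoff value and function names are my own, not from any implementation):

```python
# Sketch of the restricted double-cell scheme: the only accepted doubles
# are "u 0", "+d 0", or "d -1" -- the top cell is 0 or -1 and all of the
# magnitude lives in the lower cell.
CUTOFF = 2 ** 52  # arbitrary overflow point declared by the implementation

def to_double(n):
    """Encode an in-range integer as a (low, high) cell pair."""
    if abs(n) >= CUTOFF:
        raise OverflowError("beyond the declared overflow point")
    return (n, -1 if n < 0 else 0)

def um_div_mod(d_low, d_high, divisor):
    """UM/MOD restricted to doubles whose top cell is 0:
    only lower-cell division is ever needed."""
    if d_high != 0:
        raise ValueError("top cell must be 0 in this scheme")
    return d_low % divisor, d_low // divisor  # ( rem quot )

assert to_double(5) == (5, 0)
assert to_double(-5) == (-5, -1)
assert um_div_mod(17, 0, 5) == (2, 3)
```

With the top cell validated to be 0 or -1, every double operation reduces to single-cell arithmetic, which is the simplification the post is asking to have blessed as compliant.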


[390] 2025-07-30 19:25:54 agsb wrote:

comment - Change F~ to F~=

Rationale:

The word F~ may be confused with a word that performs bitwise complement (also known as one's complement) on its operand.

Just change it to F~= for clarity, to show that it tests its operands for approximate equality.

Replies

[r1456] 2025-07-30 01:54:44 EricBlake replies:

requestClarification - Same execution token

See also https://forth-standard.org/standard/core/DEFERFetch#contribution-388 for a detailed description of an implementation case where allowing two bitwise-distinct values to represent the same execution token may make sense, and where constraining the system to provide bitwise identical xt incurs a minor pessimization.


[r1457] 2025-07-30 03:28:21 EricBlake replies:

requestClarification - 3.2.1.1 vs. Double-cell integer behavior on unusual integer architectures

I'm aware that the proposal to mandate twos-complement in the next version of Forth has already been voted on, back in 2015: http://www.forth200x.org/twos-complement.html, which changes the permission that 3.2.1.1 gave for sign-magnitude representations. But in the context of Forth-2012, I'm trying to determine if the following setup can be declared as a compliant implementation.

Background: I'm attempting to implement at least Core Forth on top of any implementation of Eric Wastl's IntCode virtual machine described at https://adventofcode.com/2019/day/5, using https://www.reddit.com/r/adventofcode/comments/enzgxw/forth_to_intcode_compiler_in_intcode/ as my starting point. This virtual machine has a rather loose specification: it merely guarantees that the set of problems designed to be solved by IntCode (12 in all for that Advent of Code) only needed to perform signed integer math, where magnitudes larger than 2^32 would be encountered, but where no math operation should ever exceed 2^53. This looseness was intentional - it allows for things like implementing IntCode in Javascript (where the only native numeric type is an IEEE double, with 53 bits of integer precision before you run into rounding issues); since IEEE doubles are sign-magnitude rather than twos-complement, this means that a Forth designed to run on any arbitrary IntCode implementation is not guaranteed to have typical 32- or even 64-bit twos-complement cell behaviors. That said, a single cell in IntCode already meets the requirement for representing all minimum-required signed and unsigned double integer values in the ranges {-2147483647 ... +2147483647} and {0 ... 4294967295}, although a portable Forth program would have to use double integers or else declare an environmental dependency on 32-bit cells, if it depends on avoiding 16-bit overflow. The standard is also clear in 3.2.2.2 that overflow and underflow are ignored, with implementation-defined behavior.

I also have an implementation of the IntCode VM that uses just m4's define: https://repo.or.cz/aoc_eblake.git/blob/HEAD:/2019/intcode.barem4 (running a Forth engine programmed with just a single macro is a convincing proof on top of Doug McIlroy's demonstration that the single m4 macro is Turing complete: https://www.cs.dartmouth.edu/~doug/barem4.m4). True, it can take minutes for that engine to compute what an IntCode engine written in C can do in under a second - but Forth is notorious for being usable on arcane setups! Of note, my m4 engine supports arbitrary-width integers in sign-magnitude (although performance degrades quadratically with the length of the integer). Also, the IntCode VM supports addition and multiplication, but has no native operation for division, which means division has to be implemented on top (my Forth-on-IntCode requires 16 levels of recursion to compute "HEX FFFF 2 /"; LSHIFT is trivial but RSHIFT is slow, regardless of whether IntCode is running on barem4 with unusual integers, or on C with twos-complement 64-bit integers).

I'm aware that the testsuite documents in F.3.1 that it assumes a twos-complement cell; and right away, I run into issues when running on an IntCode VM with arbitrary-width cells, because things like:

F.3.3
1S 1 RSHIFT INVERT CONSTANT MSB 

make no logical sense - there is no way to determine the maximum number of bits or the MSB when a cell has no fixed size. But then there's A.3.2.1: "There is no requirement to implement circular unsigned arithmetic, nor to set the range of unsigned numbers to the full size of a cell.", even if it is non-normative rationale.

So my idea was to pick some arbitrary cutoff (maybe 2^52) which I declare as my point of overflow, and let addition and multiplication pass unchecked to the IntCode VM: if the VM uses arbitrary-width integers, there is no actual overflow; if the VM uses 64-bit integers, you get twos-complement; if the VM uses IEEE double, you may get weird rounding; but either way I don't check it. Meanwhile, for division, I would declare that any dividend larger than my arbitrary cutoff represents overflow (and my implementation-defined behavior is that I refuse to attempt it, because it might recurse deeper than my return stack).

But since that arbitrary cutoff fits within one cell, would it be compliant to state that the only valid representations I'm willing to accept for a double-cell integer "d | ud" are the stack values "u 0 | +d 0 | d -1", where a negative double can have either only the top cell negative or both cells negative? And if I make the restriction that I'm unwilling to handle any bit pattern other than 0 or -1 in the top cell of a double, I can then implement things like UM/MOD using only division of the lower cell, and neither have to make /MOD more complex to deal with single vs. double dividends, nor worry about having to implement double operations up to a full range of 2^106. Simplifying my implementation by only worrying about math on single cells, after validating that the top cell of a double is one of two values, would be nice - especially if I can still declare it to be a compliant implementation.


[r1458] 2025-07-30 03:29:04 EricBlake replies:

requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?

The test implies that the identity of the xt passed to DEFER! (or IS) should be preserved:

T{ DEFER defer4 -> }T
T{ ' + IS defer4 -> }T
T{ ' defer4 DEFER@ -> ' + }T

But elsewhere, in 2.1, we read

execution token:
    A value that identifies the execution semantics of a definition.

and 3.1 is clear that an xt is 1 cell. However, I'm working on an implementation of Forth where the implied requirement of identical tokens is overly restrictive. In my implementation, I can dispatch to words faster if every execution token is treated as the address of a pair of cells: one cell holding a pointer to the code handler to execute, and the other cell holding a parameter to be (optionally) used by that handler. For example, implementing a CONSTANT creates a pair "do-lit, value" where do-lit is a handler that knows how to push value to the stack (but where COMPILE, can bypass calling the do-lit handler and just compile code that directly pushes value to the stack); a VALUE creates a pair "do-val, addr" where addr is a one-cell location reserved at the time the word was defined, and where the do-val handler performs (or COMPILE, inlines) "addr @", and so forth.

The interesting aspect of this is that for most other handlers, copying the two-cell contents to any other address still has the same semantics as the two cells in their original location (it doesn't matter whether I use a pointer to the two cells "do-lit, 5" that were compiled into 5 CONSTANT five, or a pointer to the two cells "do-lit, 5" that were compiled as part of the body of : doit 5 ; - any time my execution engine sees that two-cell sequence, it has the same semantics of pushing 5 to the stack). Note that in my scheme, VALUEs store an address in the parameter field of their 2-cell representation (where that address is basically ALIGN HERE 1 CELLS ALLOT at the time VALUE was run), and not the current value set by the most recent TO; that's because I have planned for the contents of an xt to be copied around while still preserving the semantics, regardless of the address where that copy of the 2-cell xt contents lives. Put another way, the compilation of 5 VALUE v : getv v ; must not hard-code a 5 as the value that v happened to have when getv was compiled, but rather must compile code that looks up the current value that v has at the time getv is executed; but it is more efficient for the compiled body of getv to hold the two-cell sequence "do-val, addr" where do-val does "addr @" than it is for it to hold the two-cell sequence "do-call, ' v".
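The two-cell dispatch just described can be sketched in Python (purely as an illustration; the handler names follow the post, but the dict-based engine and the address 100 are assumptions of mine):

```python
# Sketch of the two-cell xt scheme: every xt is (a reference to) a pair,
# one cell naming a code handler and one holding that handler's parameter.
stack = []
memory = {}  # simulated data space: addr -> cell

def do_lit(param):   # CONSTANT handler: push the parameter itself
    stack.append(param)

def do_val(param):   # VALUE handler: parameter is an address; push its contents
    stack.append(memory[param])

def execute(xt):     # the engine only ever looks at the pair's contents
    handler, param = xt
    handler(param)

five = (do_lit, 5)   # like: 5 CONSTANT five
memory[100] = 7      # like: 7 VALUE v , storage at hypothetical addr 100
v = (do_val, 100)

execute(five); execute(v)
assert stack == [5, 7]
memory[100] = 9      # like: 9 TO v -- later executions see the new value
execute(v)
assert stack == [5, 7, 9]
```

Note how copying the pair (do_val, 100) anywhere preserves semantics, because the pair refers to v's storage cell rather than to a snapshot of its value.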

As fallout of that design, in my system, two distinct single-cell values can both be considered equivalent xts if the two cells they each point to have the same contents:

: xt= ( xt2 xt1 -- flag ) \ determine if xt1 and xt2 have the same execution semantics. False negatives are possible, but not false positives
  2@ ROT 2@ D= ;  \ implementation-dependent

With that background, my implementation of DEFER could be as simple as:

: DEFER ( "name" -- )
  [: abort" defer not assigned yet" ;] 2@ 2VALUE  \ share the same dictionary implementation as 2VALUE...
  do-defer latest !  \ ...except that I swap the handler from do-val2 to do-defer
  \ where do-val2 performs "addr 2@", do-defer performs the equivalent of "addr execute"
  \ in this implementation, "' name >BODY" gives addr
  ;
: DEFER! ( xt2 xt1 -- )
  >R 2@ R> >BODY 2! ;
: DEFER@ ( xt1 -- xt2 )
  >BODY ;

However, that implementation fails the testsuite as written: the two cells residing in the body of a DEFERred word have the same contents and thus the same semantics as the xt that was passed to DEFER!, but live at a different address (although the test passed ' + to IS defer4, DEFER@ gives back ' defer4 >BODY). Observe that my ' + ' defer4 DEFER@ xt= predicate correctly reports a true flag, but that xt= predicate is not portable to other implementations, and thus is not viable for the testsuite.

I can argue that section 2.1 merely requires that an xt be "A value that identifies the execution semantics of a definition.", not "The unique value..."; thus, I see no compelling reason that consecutive calls to ' word must return the same immutable value for that word. In fact, I could envision a Forth system that provides an extension to optimize existing words in the dictionary, recompiling them to better code and changing the xt that future ' word will produce, even while preserving execution semantics. And it's also not hard to argue that with word-lists and the use of SYNONYM to copy a definition from one list to another while keeping the name, ' word may produce different results based on the current word-list order, even when those various xts still resolve to the same execution semantics of the original word that all the other word-lists copied from.

In the case of my system, I believe that as long as I have the same two-cell contents passed to the execution engine, then the address of those two cells forms an xt of name even if that address differs from the one that ' name returns. If I'm right, my implementation complies with the standard but fails the testsuite, meaning the testsuite is too strict; in which case any use of -> ' in the testsuite is non-portable, and the most it can portably do is assert things like T{ 1 2 ' defer4 DEFER@ EXECUTE -> 3 }T after defer4 has been directed to ' +. But if I'm wrong, I could change my implementation to instead share the implementation of DEFER with VALUE (only 1 CELLS ALLOT instead of 2), at the expense that every time my execution engine encounters the cell pair "do-defer addr", it must execute the slower sequence "addr @ execute" (an extra indirection from addr to xt, compared to my earlier implementation's "addr execute", which treats addr as the xt to dereference).

So, I'm asking clarification on whether the xt value passed to DEFER! must be preserved verbatim to that given by DEFER@, or whether the standard permits any other xt value so long as its execution semantics are the same.


[r1459] 2025-07-30 08:39:25 AntonErtl replies:

requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?

Before answering your question:

Your implementation of defer@ is incorrect even if there is only a requirement for EXECUTE equivalence, but no requirement for xt equality. Consider

T{ 1 2 ' defer4 DEFER@ ' - ' defer4 DEFER! EXECUTE -> 3 }T

Changing defer4 after the defer@ should not change the behaviour of the xt produced by defer@, which should represent +. But in your implementation, it does. Note that in your implementation, you would always get the same xt for a given deferred word, so it would represent the behaviour of the deferred word, not the behaviour that the deferred word is set to.
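The failure mode can be sketched in Python (purely illustrative; closures stand in for the cell pairs, and all names are mine): a DEFER@ that keeps pointing into the deferred word's body tracks later DEFER! updates instead of snapshotting the xt stored at query time.

```python
# Simulated body of the deferred word (one storage slot).
body = {"defer4": None}

def defer_store(name, xt):                # like DEFER! / IS
    body[name] = xt

def broken_defer_fetch(name):             # returns (a handle into) the body:
    return lambda a, b: body[name](a, b)  # still indirect -- tracks updates!

def correct_defer_fetch(name):            # returns the stored xt itself:
    return body[name]                     # a point-in-time snapshot

add = lambda a, b: a + b                  # like ' +
sub = lambda a, b: a - b                  # like ' -

defer_store("defer4", add)
snap_broken = broken_defer_fetch("defer4")
snap_ok = correct_defer_fetch("defer4")
defer_store("defer4", sub)                # like: ' - ' defer4 DEFER!

assert snap_ok(1, 2) == 3                 # snapshot still behaves as +
assert snap_broken(1, 2) == -1            # broken version followed the update
```

The test in the reply above is exactly the one that distinguishes these two behaviors.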

In general, creating a new xt from a pair of code and data (each requiring one cell) requires two cells of memory every time you do it, so even if there is no requirement for defer@ to produce the same xt that was given to defer!, I don't see a useful way that does not satisfy this requirement: Neither defer! nor defer@ allocates memory (unless the Forth system employs garbage collection for xts), so the obvious way for implementing defer@ is to produce the same xt.

I am trying to imagine an implementation where you work foremost with code/data pairs, and in most cases you just provide a pointer to that pair, but for defer@ you have a lookup table and it gets the xt from there. Now if you have two words with the same code/data pairs but a different xt (at the moment I have trouble thinking up an example; maybe synonyms, but why not have the same xt for them?), the lookup would only produce one of these xts, and defer@ could produce an xt that defer! did not store there. I am not aware of such an implementation, or anybody wanting to implement such an implementation; it also does not have any merits compared to existing implementation strategies that would make me want to hope that it complies with the standard.

An alternative implementation of DEFER etc. for your implementation strategy is to have three cells: the two cells for code and data pointer (used for, e.g., execute) and a cell for the original xt (used for defer@).

In general, we have had to deal with the problem of how to represent code and data in a single cell from the very earliest Forth implementations, and I have published a paper about this problem at EuroForth 2024.

Onwards to your question:

For defer! the specification says:

Set the word xt1 to execute xt2.

For defer@ the specification says:

xt2 is the execution token xt1 is set to execute.

This sounds to me that the xt2 produced by defer@ should be the same as the one that was last stored with defer! or is. That certainly was my thinking when I wrote the proposal, and the wording also points in that direction: "the execution token ...", not "an execution token equivalent to the one that ...".

Relevance: Are there any programs that rely on xt equality? While I don't remember writing code that checks the result of defer@ or action-of for equality, I can imagine that some people might do it for good reasons. There is no standard xt= that would allow such code to be easily written without xt equality, so one would have to replace it with code that is significantly more cumbersome to write.


[r1460] 2025-07-30 09:08:30 AntonErtl replies:

requestClarification - 3.2.1.1 vs. Double-cell integer behavior on unusual integer architectures

I find very few requirements about the internal representation of doubles in the text (but I have only looked in the most obvious places), so what you suggest looks to be standard-conforming to me.

However, if you want to run existing Forth programs, you may find out that they have (declared or undeclared) "environmental dependencies" (as the standard puts it) on the internal representation of integers, including double-cell integers. OTOH, several people have decided to forego implementing double-cell integers at all (including implementing single-cell (i.e., non-standard) variants of core words that use double-cell integers, such as #), so at least some people don't worry about running existing Forth programs that use double-cell integers.

I don't expect that the committee will come up with more enlightening comments, but there is one thing this question points to that we might want to discuss: Should we require that double-cell integers have twice the number of significant bits as single-cell integers? It seems to me that this is not required now, not even with the 2s-Complement Wrap-Around Integers proposal.


[r1461] 2025-07-30 12:56:14 EricBlake replies:

requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?

Your implementation of defer@ is incorrect even if there is only a requirement for EXECUTE equivalence, but no requirement for xt equality. Consider

T{ 1 2 ' defer4 DEFER@ ' - ' defer4 DEFER! EXECUTE -> 3 }T

Thanks - that's a useful test that should be added, and indeed, is reason enough for me to add in the extra indirection.

I am trying to imagine an implementation where you work foremost with code/data pairs, and in most cases you just provide a pointer to that pair, but for defer@ you have a lookup table and it gets the xt from there. Now if you have two words with the same code/data pairs but a different xt (at the moment I have trouble thinking up an example; maybe synonyms, but why not have the same xt for them?), the lookup would only produce one of these xts, and defer@ could produce an xt that defer! did not store there. I am not aware of such an implementation, or anybody wanting to implement such an implementation; it also does not have any merits compared to existing implementation strategies that would make me want to hope that it complies with the standard.

Implementing a (useful!) SEE requires the ability to look up a word's name from either an xt or a code/data pair - SEE does not have to be fast, but I certainly find it more useful if it can disassemble code back into guesses for which words were compiled in the first place, rather than just outputting raw instruction values (at least the standard was wise enough to say that SEE produces implementation-defined output, so it is a quality-of-implementation rather than a conformance issue if SEE maps a code/data pair in compiled code to the wrong word out of multiple words that happen to share the same code/data implementation). But that is a lookup from xt -> nt (and could be done with a TRAVERSE-WORDLIST that checks if a given xt or code/data matches the code/data of each nt in succession), not a lookup from xt -> xt, and is not relevant to the speed of EXECUTE.

An alternative implementation of DEFER etc. for your implementation strategy is to have three cells: the two cells for code and data pointer (used for, e.g., execute) and a cell for the original xt (used for defer@).

Indeed, tracking 3 or more cells (one for DEFER@, and two or more as a trampoline for less indirection during EXECUTE) appears to be a viable optimization strategy, in the vein of trading increased memory usage for runtime speed. Off-hand, I'm guessing that most uses of DEFER see more instances of execution (whether compiled or through EXECUTE) than of DEFER@ or DEFER!, so the extra time taken in DEFER! to update more than one cell in order to make execution faster is worthwhile in a smart COMPILE,. But my takeaway is also that whatever COMPILE, and EXECUTE do with a deferred word, the implementation must ensure that DEFER@ produces a point-in-time snapshot (this is the xt at the time you queried, and the semantics of that xt no longer depend on the future of the deferred word) while execution remains dynamic (the effects of running the deferred word must correspond to the xt that was most recently installed, even if the installation occurred after the point where the deferred word was compiled). At any rate, in the short term I've gone with just one cell (the xt) and the extra indirection, as that was the simplest approach that meets the intended semantics; at which point DEFER@ returns a bit-identical copy of ' + if that is the xt originally stored by DEFER!.

In general, we have had to deal with the problem of how to represent code and data in a single cell from the very earliest Forth implementations, and I have published a paper about this problem at EuroForth 2024.

Very useful reading.

Relevance: Are there any programs that rely on xt equality? While I don't remember writing code that checks the result of defer@ or action-of for equality, I can imagine that some people might do it for good reasons. There is no standard xt= that would allow such code to be easily written without xt equality, so one would have to replace it with code that is significantly more cumbersome to write.

https://forth-standard.org/proposals/input-values-other-than-true-and-false#reply-1083 has a proposed reference implementation for [IF] that uses xt comparisons during [if]-decode. Admittedly, that is performing xt comparisons on the result of WORD and FIND, rather than on the result of DEFER@, but it goes a long way towards demonstrating that Forth programs depend on being able to reliably compare xts for equality. And it may be possible to implement FIND in such a way that it uses DEFER@ on objects in each nt of the dictionary. It may indeed help if the standard either documents additional guarantees (once a word is defined, bitwise equality testing on the results of FIND and DEFER@ is reliable) or else standardizes XT= (which can then abstract away the implementation magic behind why two equivalent xts would ever have different bit values).

One thought I had while playing with all this was whether it might be possible, on a system where a-addr is always a positive value, to encode the difference between interpretation and compilation semantics, and/or identify IMMEDIATE words, based on whether an xt is positive or negative. That is, there may be optimizations possible if FIND is allowed to report two pieces of information in one query: the xt being positive or negative conveying something orthogonal to whether the top stack cell was 1 or -1. In those scenarios, xt equivalence might be : xt= ( xt1 xt2 -- flag ) abs swap abs = ;.
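That hypothetical encoding can be sketched in Python (an illustration only, not from any real system; the dictionary contents and addresses are invented):

```python
# Hypothetical scheme: on a machine where all aligned addresses are
# positive, FIND could negate the xt to flag an IMMEDIATE word, and
# equivalence would compare magnitudes.
def find(name, dictionary):
    """Return the xt, negated when the word is immediate (illustrative)."""
    addr, immediate = dictionary[name]
    return -addr if immediate else addr

def xt_eq(xt1, xt2):  # like: : xt= ( xt1 xt2 -- flag ) abs swap abs = ;
    return abs(xt1) == abs(xt2)

words = {"DUP": (1040, False), "IF": (2080, True)}  # invented addresses
assert find("DUP", words) == 1040
assert find("IF", words) == -2080    # negative xt marks IMMEDIATE
assert xt_eq(find("IF", words), 2080)
```

The sign bit carries the immediacy flag while xt_eq strips it back off for equivalence testing.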


[r1462] 2025-07-30 13:41:26 EricBlake replies:

requestClarification - Must the xt returned by DEFER@ have the same value as the one passed to DEFER!, or merely the same behavior?

One thought I had while playing with all this was whether it might be possible, on a system where a-addr is always a positive value, to encode the difference between interpretation and compilation semantics, and/or identify IMMEDIATE words, based on whether an xt is positive or negative. That is, there may be optimizations possible if FIND is allowed to report two pieces of information in one query: the xt being positive or negative conveying something orthogonal to whether the top stack cell was 1 or -1. In those scenarios, xt equivalence might be : xt= ( xt1 xt2 -- flag ) abs swap abs = ;.

That said, if I need a way to look up multiple pieces of information for a given parsed word, it's probably better to have my own internal functions or accessors on an nt on top of traverse-wordlist or search-wordlist (among others, taking c-addr u rather than a counted string, and possibly taking a flag for case-sensitive vs. case-insensitive lookup), and then implement FIND as a thin shim around those internal functions. At that point, even if an internal function encodes multiple pieces of information in an internal xt (such as whether it is native code, a Forth thread with no locals, a Forth thread that needs local cleanup on exit, whether the xt has an associated nt or was created by :noname, ...), the version of the xt exposed via FIND and SEARCH-WORDLIST is scrubbed to mask out any transient bits that help the internal interpret/compile loop but do not impact how EXECUTE would use the result. There would need to be a strong reason to standardize an XT= rather than live with the existing = for xt comparison, and I don't think I've come up with any such reason.


[r1463] 2025-07-30 15:21:17 EricBlake replies:

requestClarification - 3.2.1.1 vs. Double-cell integer behavior on unusual integer architectures

I don't expect that the committee will come up with more enlightening comments, but there is one thing this question points to that we might want to discuss: Should we require that double-cell integers have twice the number of significant bits as single-cell integers? It seems to me that this is not required now, not even with the 2s-Complement Wrap-Around Integers proposal.

I would much rather keep the existing status quo (a single-cell integer has at least 16 bits, a double-cell integer has at least 32 bits, but beyond that an implementation is not obligated to use the full width of a cell or cell pair for numeric values, and therefore a double need not add any precision on top of a single when the single already has sufficient precision). Do we need wording that guarantees that a double has at least as many bits of precision as a single? (I don't know why anyone would want to implement a single with 64-bit integer math but a double with 53 bits of precision via IEEE double - but the standard doesn't seem to forbid that, given its current permission for an implementation to use less than the full width of a cell.) And even if we decide to mandate that if a machine provides 32-bit single cells, then double must provide additional precision, I'd still favor wording that a double must provide "the minimum of 48 bits and twice the bits of single precision", so as to keep doubles at 32 bits on a 16-bit platform while allowing the option of double integers using IEEE double floating-point arithmetic as the workhorse on a 32-bit machine, rather than having to implement 64-bit integers on that machine. And I'm certainly not in favor of mandating that a 64-bit single must imply a 128-bit double (or that a 53-bit single must imply a 106-bit double). An implementation can provide twice the precision if it wants (and I enjoy that gforth has 128-bit doubles on a 64-bit machine), but allowing is different from mandating.
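The precision rule favored above can be written as a one-line formula (the function name and the floor of 32 bits are my own framing of the proposal):

```python
# "The minimum of 48 bits and twice the single-cell precision", with the
# standard's existing 32-bit floor for doubles kept in place.
def required_double_bits(single_bits):
    return max(32, min(48, 2 * single_bits))

assert required_double_bits(16) == 32  # 16-bit platform: doubles stay 32-bit
assert required_double_bits(32) == 48  # 32-bit platform: IEEE double (53 bits) suffices
assert required_double_bits(64) == 48  # no mandated 128-bit doubles
```

Under this rule, a 32-bit system may use IEEE double arithmetic for its double integers, since 53 bits of precision exceeds the required 48.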

Another platform I thought of: implementing Forth on OCaml would need to deal with the fact that OCaml defaults to a native int of 31 or 63 bits (the representation in memory is shifted to reserve a flag bit to make detection of integers vs pointers easier, https://stackoverflow.com/questions/3773985/why-is-an-int-in-ocaml-only-31-bits); it also has an Int32 and Int64 but those take up more space (an int on a 32-bit platform fits in 4 bytes, but Int32 requires 8 bytes on both 32- and 64-bit platforms because it is a pointer to boxed storage).

Another consideration: C23 has standardized <stdckdint.h>, which gives options for performing math where overflow triggers specific behavior. There are times when I don't care about overflow (use '+' in C) and times when I do (use ckd_add); in that same vein, Forth should let me use + when I want the fastest operation possible (where ambiguous results on overflow are my problem), but also let me write other operators that guarantee twos-complement wraparound, and/or throw on overflow, and then select whichever operator is best for my context. This is one case where Forth's ability to define words with non-alphanumerics makes adding operators easy.
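The two flavors of addition can be sketched in Python (C23 would spell the checked form ckd_add; the 16-bit width and function names here are illustrative choices of mine):

```python
# A fast wrapping "+" versus an operator that throws on signed overflow,
# sketched for a 16-bit cell.
BITS = 16
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def wrap_add(a, b):
    """Twos-complement wraparound addition, like an unchecked +."""
    r = (a + b) & MASK
    return r - (1 << BITS) if r & SIGN else r

def checked_add(a, b):
    """Addition that throws on signed overflow, in the spirit of ckd_add."""
    r = a + b
    if not (-SIGN <= r < SIGN):
        raise OverflowError("signed overflow")
    return r

assert wrap_add(32767, 1) == -32768   # silent wraparound
assert wrap_add(-32768, -1) == 32767
try:
    checked_add(32767, 1)             # the checked flavor refuses
except OverflowError:
    pass
```

In Forth, both flavors can coexist as distinct words, letting each call site pick its overflow policy.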

Ultimately, it is a potential burden to mandate that overflows must massage results back into a twos-complement result regardless of the underlying precision used in performing the arithmetic; and it is an orthogonal burden to mandate that an implementation must provide double integers with twice the precision of single cells. On an underlying machine that already has twos-complement arithmetic, writing code to mask a 64-bit value down to 48 bits of usable precision is much simpler than writing code to implement 128-bit double integers on top of 64-bit integers. But it is a harder question to answer which of those two burdens is more onerous on niche architectures that are not your run-of-the-mill 32- or 64-bit twos-complement arithmetic of current mainstream computing devices. Part of the standardization process is choosing which tradeoffs to make to maximize portability of a program deemed standard, vs. alienating the platforms where those requirements are hard to meet, and with an eye towards not needlessly breaking backwards compatibility.