Digest #311 2025-08-19

Contributions

[403] 2025-08-18 12:10:38 EricBlake wrote:

referenceImplementation - Possible reference implementation

This implementation requires SPACES to gracefully ignore negative input, although it has been questioned whether the standard intends that: https://forth-standard.org/standard/core/SPACES#contribution-337. Note that this implementation does not use S>D; doing so would produce the wrong result on a two's-complement machine for the minimum integer value.

: .R ( n1 n2 -- ) \ "dot-r"
  SWAP DUP >R ABS 0 <# #S R> SIGN #> ( n2 c-addr u )
  ROT OVER - SPACES TYPE
;
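
To illustrate the S>D remark, here is a minimal sketch, assuming S>D were substituted for the literal 0 that follows ABS. The names min-int, mag-zero and mag-sign are hypothetical helpers, not part of the contribution, and a two's-complement system where ABS leaves the minimum integer unchanged is assumed.

```forth
\ Hypothetical helpers; assumes two's-complement arithmetic.
-1 1 RSHIFT INVERT CONSTANT min-int  \ only the top bit set: the most negative single

: mag-zero ( n -- c-addr u )  ABS 0   <# #S #> ;  \ zero-extend, as .R above does
: mag-sign ( n -- c-addr u )  ABS S>D <# #S #> ;  \ sign-extend instead

min-int mag-zero TYPE  \ the magnitude of min-int, as intended
min-int mag-sign TYPE  \ a far larger number, because ABS left min-int negative
```

For every other input the two helpers agree; only the minimum integer exposes the difference, since it is the one value whose ABS result still has the sign bit set.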

[404] 2025-08-18 14:54:18 EricBlake wrote:

referenceImplementation - Possible reference implementation

The standard does not directly provide any word for dividing a double and getting a double quotient and a single remainder simultaneously, so a reference implementation of that operation is useful. One possibility is to compute a smaller intermediate ud as the dividend to pass to the standard UM/MOD without triggering an ambiguous condition, plus an optimization that avoids the overhead of two divisions when ud is already small enough.

: ud/mod ( ud1 u1 -- u2 ud2 )
  \ Compute quotient ud2 and remainder u2 such that ud2*u1+u2=ud1
  OVER 0= IF UM/MOD 0 EXIT THEN \ fast version when ud1 fits in single
  DUP >R 0 SWAP ( u.lo ud.t1 u1 ) ( R: u1 ) \ create ud.t1 from upper half of ud1
  UM/MOD ( u.lo u.remhi u.quohi ) ( R: u1 ) \ perform first division
  R> SWAP >R ( ud.t2 u1 ) ( R: u.quohi ) \ create ud.t2 from lower half of ud1 and upper remainder
  UM/MOD R> ( u2 ud2 ) \ second division, ud2 constructed from two halves of quotient
;
: # ( ud1 -- ud2 ) \ "number-sign"
  BASE @ ud/mod ROT ( ud2 u.rem )
  DUP #10 < IF '0' ELSE [ 'A' #10 - ] LITERAL THEN + ( ud2 char )
  HOLD
;
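
As a worked instance of the two-step division, the sketch below uses hypothetical values (the double 25 * 2^cellbits + 7, divisor 10) and a hypothetical word name. It shows why the second UM/MOD cannot overflow: the upper half of its dividend is the remainder of the first division, which is always smaller than the divisor.

```forth
\ Hypothetical demo word with hard-coded values; decimal base assumed.
: demo-ud/10 ( -- u.rem ud.quot )
  25 0 10 UM/MOD    ( u.remhi u.quohi )  \ upper half: 25 / 10 gives rem 5, quo 2
  >R                ( u.remhi ) ( R: u.quohi )
  7 SWAP            ( ud.t2 )            \ lower half 7 beneath the upper remainder 5
  10 UM/MOD         ( u.rem u.quolo )    \ 5 < 10, so this quotient fits in one cell
  R>                ( u.rem u.quolo u.quohi )
;
```

Multiplying the resulting quotient by 10 and adding the remainder gives back the original double, whatever the cell width.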

Alternatively, using the double word set, and demonstrating a different bit-twiddling technique for branchless conversion of a digit to ASCII:

: # ( ud1 -- ud2 ) \ "number-sign"
  2DUP 1 BASE @ M*/ ( ud1 ud2 ) \ determine quotient
  2DUP BASE @ NEGATE 1 M*/ 2ROT ( ud2 ud ud1 ) \ ud = -ud2*base
  D+ DROP ( ud2 rem ) \ determine remainder
  DUP #9 > #7 AND + '0' + ( ud2 char ) \ exploit that 'A' - '9' = 8
  HOLD
;
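
The branchless digit conversion can be looked at in isolation. This is a minimal sketch with a hypothetical word name, digit>char; it relies on a true flag being all bits set, so that ANDing it with 7 yields either 7 or 0.

```forth
\ Hypothetical helper name; relies on true being all bits set (-1).
: digit>char ( u -- char )
  DUP #9 >      ( u flag )   \ true for 10 and above, else false
  #7 AND        ( u 0|7 )    \ the mask keeps 7 only when the flag is true
  + '0' +       ( char )     \ 7 skips the seven characters between '9' and 'A'
;

 9 digit>char EMIT  \ 9
10 digit>char EMIT  \ A
35 digit>char EMIT  \ Z
```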

Replies

[r1499] 2025-08-17 15:40:24 EricBlake replies:

referenceImplementation - Possible reference implementation

The following uses a couple of words from the double and string sets; the alternative using only words in core is more verbose.

: >digit ( char -- +n true | 0 ) \ "to-digit"
  \ convert char to a digit according to base followed by true, or false if out of range
  DUP [ '9' 1+ ] LITERAL <
  IF '0' - \ convert '0'-'9'
    DUP 0< IF DROP 0 EXIT THEN \ reject < '0'
  ELSE
    BL OR \ convert to lowercase, exploiting ASCII
    'a' -
    DUP 0< IF DROP 0 EXIT THEN \ reject non-letter < 'a'
    #10 + \ convert 'a'-'z'
  THEN
  DUP BASE @ < DUP 0= IF NIP THEN ( +n true | false ) \ reject beyond base
;
: >NUMBER ( ud1 c-addr1 u1 -- ud2 c-addr2 u2 ) \ "to-number"
  2SWAP 2>R
  BEGIN ( c-addr u ) ( R: ud.accum )
    DUP WHILE \ character left to inspect
      OVER C@ >digit
    WHILE \ digit parsed within base
      2R> BASE @ 1 M*/ ( c-addr u n.digit ud.accum ) \ scale accum by base
      ROT M+ 2>R \ add current digit to accum
      1 /STRING ( c-addr1+1 u1-1 )
  REPEAT THEN
  2R> 2SWAP ( ud2 c-addr2 u2 )
;
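
A short usage sketch of >NUMBER, with a hypothetical demo word and input string: conversion stops at the first character that is not a digit in the current base, leaving the unconverted tail on the stack.

```forth
\ Hypothetical demo word: convert as many leading digits as the base allows.
: parse-demo ( -- ud c-addr u )
  0 0 S" 123xyz" >NUMBER ;

DECIMAL parse-demo  ( 123 0 c-addr 3 )
TYPE                \ prints the unconverted tail: xyz
D.                  \ prints the accumulated value: 123
```

A non-zero u2 on return is how a caller can tell that the string was not fully converted.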

[r1500] 2025-08-17 17:51:39 mykesx replies:

requestClarification - Getting the block contents

It's not clear to me what happens if you LOAD a block that contains a (nested) LOAD.

To allow this would require BLK to be something like a stack.

Also to consider is when you INCLUDED a file from a LOADed block that does a LOAD...


[r1501] 2025-08-18 09:47:45 ruv replies:

requestClarification - Is it ambiguous to execute the use of does> after :NONAME or FORGET?

So, my request for clarification includes figuring out whether this code should become well-defined or remain ambiguous:

CREATE a
: does1 DOES> DROP ." in does body" ;
:NONAME ." in anon" ;  ( xt1 )
does1
' a ( xt1 xt2 )
." a: " CATCH ( xt1 X ior ) 2DROP ( xt1 )
." anon xt: " EXECUTE

This example clearly contains an ambiguous condition, because when does1 (and thus the run-time semantics of does>) is executed, the latest named definition is does1, which was not created with create.

If we change the example to:

: does1 DOES> DROP ." in does body" ;
CREATE a
:NONAME ." in anon" ;  ( xt1 )
does1

It becomes questionable.

From the clause "Replace the execution semantics of the most recent definition, referred to as name" I would conclude that it concerns only a named definition. But 6.1.1710 implies that the most recent definition can be an anonymous definition. Therefore, I would consider this example ambiguous.
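
For contrast with the two examples above, here is a minimal sketch of an ordering that is not in question: nothing is defined between CREATE and the execution of the word containing DOES>. The names does2 and b are hypothetical, chosen to avoid clashing with the examples.

```forth
\ Hypothetical names does2 and b; decimal base assumed.
: does2 ( -- )  DOES> ( a-addr -- n ) @ 1+ ;
CREATE b  41 ,
does2          \ the most recent definition is b, made by CREATE
b .            \ 42
```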

The cases where a current definition exists (a definition whose compilation has been started but not yet finished) seem ambiguous for the does> run-time too. I think does> and immediate should be clarified in this regard too. The term "most recent definition" should also be formally defined.

is the execution of a DOES> after a marker is executed supposed to reliably affect the most recent name (assuming that name was defined by CREATE) prior to the creation of the marker, or should the standard explicitly mention the ambiguity of not having a definitive most recent named word after a marker:

Formally, this case is not ambiguous at the moment. If some Forth systems do not behave correctly, we can ask their authors to fix this, or make a proposal to declare the corresponding ambiguous condition.


[r1502] 2025-08-18 10:11:03 ruv replies:

proposal - New words: latest-name and latest-name-in

@EricBlake wrote in the comment [r1493] (to the does> glossary entry):

Are there any implementations where the act of making b IMMEDIATE moves it out of one wordlist (the list of normal words) and into another (the list of immediate words), such that the proposed wording of LATEST-NAME changes its view of which name is encountered first in search order,

Good catch!

This is possible only in a Forth system that does not provide words from the Search-Order word set, so a program cannot detect such a move. I don't know of any such standard-compliant Forth systems.

I see two possible solutions:

  • extend the specification of LATEST-NAME to cover the case when the compilation word list is not available;
  • introduce the formal term "most recent named definition" (covering both cases), and base LATEST-NAME on this term.

[r1503] 2025-08-18 12:13:44 EricBlake replies:

requestClarification - Behavior of `0 SPACES`

Given that Forth-94 made all these changes for the other words but chose "n" for SPACES, I suspect that the choice was deliberate rather than an oversight, and that there was existing practice using SPACES with n<0.

One such possibility is the implementation of .R, which is slightly easier if SPACES accepts negative input: https://forth-standard.org/standard/core/DotR#contribution-403


[r1504] 2025-08-18 12:33:12 EricBlake replies:

referenceImplementation -

This shorter implementation appeals to me (but only because .R then takes care of the <# ... #>):

: .. ( n -- ) \ "dot"
  0 .R SPACE
;

[r1505] 2025-08-18 12:37:54 EricBlake replies:

referenceImplementation -

It would help if I didn't typo .. where . was intended.


[r1506] 2025-08-18 15:32:56 EricBlake replies:

referenceImplementation - Possible reference implementation

For implementations that do not want to support full double arithmetic, it appears that Forth-2012 permits an implementation where a double uses fewer bits than twice the number of bits in a single. That is, I argue that as long as a single has at least 32 bits (the minimum required range of a double), an implementation-defined encoding of doubles where the only valid representations of a double are ( u 0 ) and ( n.negative -1 ) is possible (even if the test suite needs more work before all tests pass on such a system). With that definition, an implementation could be as simple as:

: # ( ud1 -- ud2 ) \ Implementation for a system where all significant bits of a double fit in a single
  dup 0<> -24 and throw \ reject input that is not an unsigned double in implementation-defined form
  base @ um/mod 0 rot ( ud1 -- ud2 rem ) \ single division adequate in this implementation
  dup #9 > #7 and + '0' + hold ;

[r1507] 2025-08-18 18:27:35 ruv replies:

requestClarification - Getting the block contents

what happens if you LOAD a block that contains a (nested) LOAD.

LOAD saves the current input source specification (typically, to the return stack), and restores it at the end. And INCLUDED does the same.
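
The save-and-restore behaviour can be observed without block storage by nesting EVALUATE, which is likewise specified to save the current input source specification and restore it afterwards. This is only an analogy to nested LOAD, and the names trace, mark, inner and outer are hypothetical.

```forth
\ Hypothetical names; EVALUATE stands in for LOAD so no block storage is needed.
VARIABLE trace  0 trace !
: mark  ( n -- )  trace @ 10 * + trace ! ;   \ append one decimal digit to the trace
: inner ( -- )    S" 2 mark" EVALUATE ;
: outer ( -- )    S" 1 mark inner 3 mark" EVALUATE ;

outer  trace @ .   \ 123
```

After the nested EVALUATE finishes, interpretation of the outer string resumes at "3 mark", which is why the trace reads 123 rather than stopping at 12.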

"Input source specification" is a formal term (see 2.1 Definitions of terms). The contents of BLK are part of the input source specification, so they also are saved and restored. Concerning exceptions and throw — see my other comment.