Digest #264 2024-06-18

Contributions

[344] 2024-06-17 20:47:11 ruv wrote:

proposal - Support for single line comments during `evaluate`

Author

Ruv

Change Log

2024-06-17 Initial version

Problem

Sometimes evaluate must be applied to a string that contains multiple text lines (text fragments separated by a line terminator sequence) and single-line comments (i.e., the word "\"). In this case the word "\" skips all text up to the end of the string. The desired behavior is that it skips the text only up to (and including) the nearest line terminator, if the parse area contains one.
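A minimal sketch of this situation, assuming a system that provides S\" (Forth-2012 CORE EXT). The helper name results is hypothetical and not part of the proposal. It evaluates a two-line string and reports how many items the evaluated text left on the data stack.

```forth
: results ( -- n )
  \ Hypothetical helper: count the items left by the evaluated text.
  depth >r
  s\" 1 \\ note\n 2 " evaluate
  depth r> -                 \ ( i*x n )  n = number of new items
  dup >r  0 ?do drop loop    \ discard the i*x items
  r>
;
results .
```

Under the current wording of 6.2.2535 this should print 1, because \ discards the second line together with the comment. Under the proposed wording it would print 2. Some systems may already behave the second way.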

Previous work

Some known discussions/posts on this problem:

2021-11-22 New Line characters in a string passed to EVALUATE
2022-10-16 Portable line-oriented parsing

Solution

Change the glossary entry 6.2.2535 \ (in CORE EXT) to include the functionality from 7.6.2.2535 \, so that this word works as expected regardless of the kind of input source. Namely, include this functionality:

parse and discard the portion of the parse area corresponding to the remainder of the current line.

The behavior will not change when the input source is a file, since in this case the input buffer contains only a single line.

Proposal

Remove the glossary entry 7.6.2.2535 \ (in BLOCK EXT)

In the glossary entry 6.2.2535 \ (in CORE EXT), replace the text description for the Execution semantics with the following:

Parse and discard the portion of the parse area corresponding to the remainder of the current line. \ is an immediate word.

Reference implementation

A portable implementation (redefinition) of the word \ follows.

: evaluation ( -- flag )
  \ Return a flag: is the input source a string being evaluated.
  [defined] blk [if] blk @ 0<> if false exit then [then]
  source-id -1 =
;
: source-following ( -- sd )
  \ Return the parse area (a string).
  \ NB: the returned string may contain a line-terminator sequence in any position.
  source >in @ /string
;
: skip-source-line ( -- )
  \ Discard a part of the parse area that belongs to the current line.
  evaluation 0= if ['] \ execute exit then
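  source-following  over >r           \ ( c-addr u )  R: c-addr
  s\" \n"  dup >r  search             \ ( c-addr2 u2 flag )  R: c-addr u-nl
  if drop r@ then  +  r> drop         \ ( c-addr-end )  R: c-addr
  r> -  >in +!                        \ advance >in past the current line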
;
: \ ( -- )
  skip-source-line
; immediate

Testing

t{ s\" 1 \\ \n drop 0 " evaluate -> 0 }t

Replies

[r1233] 2024-06-17 17:03:32 ruv replies:

proposal - Special memory access words

Data types

The standard word @ has the stack diagram ( a-addr -- x ), ditto for !, lshift, rshift — they operate on the most general single-cell data type x.

The proposed words w>s ( x -- n ), l>s ( x -- n ) also operate on a parameter of the data type x.

Thus, the following words probably should also have the data type x instead of the data type u in their stack diagrams:

w@ ( c-addr -- u )
l@ ( c-addr -- u )
x@ ( c-addr -- u )
wbe ( u1 -- u2 )
wle ( u1 -- u2 )
lbe ( u1 -- u2 )
lle ( u1 -- u2 )
xbe ( u1 -- u2 )
xle ( u1 -- u2 )

Alternatively, new data types x8, x16, x32, x64 can be introduced and used for the fetch operations.

Also, addr should be used instead of c-addr, because either these data types are equal (in a Forth-2019 system), or these words should work on addr (in a Forth-2012 system).

Wording

w>s ( x -- n ) Sign-extend the 16-bit value in x to cell n.

Does it follow from this that $1234567890abcdef w>s shall produce $ffffffffffffcdef on a 64-bit system?

I ask because the phrase "16-bit value in x" gives the impression that the value in x, if interpreted as an unsigned number, must belong to the range { 0 ... 65535 }.
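One possible reading, sketched below, is that the bits above the low 16 are simply ignored. This is an assumption made for illustration, not the wording of the proposal. It also assumes two's-complement arithmetic and a cell of at least 32 bits.

```forth
: w>s ( x -- n )
  \ Assumption: bits above the low 16 are ignored, not required to be zero.
  $ffff and      \ keep the least significant 16 bits
  $8000 xor      \ flip the 16-bit sign bit
  $8000 -        \ subtract the bias, which sign-extends the value
;
```

Under this reading, $1234567890abcdef w>s on a 64-bit system gives $ffffffffffffcdef. A stricter reading would instead make inputs outside { 0 ... 65535 } ambiguous.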

16-bit and 32-bit systems

For l-family and x-family words, a note should mention that the word can only be provided by the system if the cell size is not less than 32 and 64 bits, respectively.

If the address unit of a Forth system is 32 bits (for example, in a JavaScript-based Forth system), may this system provide the w@ and w! words? Of course, in such a system these words can only read and write the 16 least significant bits of an address unit.

Characters and bytes

c>s ( x -- n ) Sign-extend the 8-bit value in x to cell n.

Characters are not bound to 8 bits in the standard. For example, the character size can be the same as the cell size (on a cell-addressed Forth system). So, the text description for this word shall say: "Sign-extend the character value in x to cell n"

I suggest adding the words b! ( x addr -- ), b@ ( addr -- x ), b>s ( x -- n ) to operate on octets.

If the character size is 8 bits, they can be aliases of the c-words; otherwise they either have their own implementation or are not provided by the system.

The word c@ should not be used to read octets without testing the environment.
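A sketch of such an environment test follows, assuming the system accepts S" in interpretation state and provides [IF] and [THEN]. The constant name bits/char is hypothetical. Falling back to 8 when the ADDRESS-UNIT-BITS query is not recognized is also an assumption.

```forth
\ Character size in bits; 8 is assumed when the environment query is unknown.
s" ADDRESS-UNIT-BITS" environment? 0= [if] 8 [then]
1 chars *  constant bits/char

bits/char 8 = [if]
  \ Characters are octets here, so the b-words can reuse the c-words.
  : b@  ( addr -- x )  c@ ;
  : b!  ( x addr -- )  c! ;
  : b>s ( x -- n )  $ff and  $80 xor  $80 - ;
[then]
```

On a system where bits/char is not 8, this fragment defines nothing, which matches the case where the words have their own implementation or are not provided.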

[r1234] 2024-06-17 17:06:40 AntonErtl replies:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

At the 2023 meeting the committee decided to suggest that you (the proponent) should remove references to the wordset and that you move the specification of MAKE-IEEE-DFLOAT to the proposal section, and then I should promote the proposal to "formal".

However, the proposal seems clear enough, so I am promoting it to "formal" directly.

[r1235] 2024-06-17 17:35:59 AntonErtl replies:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

Looking at what other programming languages do, in the C world I see:

double ldexp(double x, int exp);

which returns the result of multiplying the floating-point number x by 2 raised to the power exp.

Interestingly, C does not provide a function where the mantissa is provided as an integer number.

One problem with the proposed word that I see is that it produces a native FP value of the system, but the name claims that it produces an IEEE-DFLOAT (which I would interpret as IEEE binary64 number). A number of Forth systems use the Intel extended format (which is also a variant of the IEEE format, but it's not binary64), and on those a word similar to ldexp() may be more appropriate.
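For illustration, a word similar to ldexp() could be sketched as below. The name fscale2 is hypothetical, and the sketch assumes the FLOATING EXT words S>F and F** are available. The result is in the system's native FP format, and its accuracy depends on how the system implements F**.

```forth
: fscale2 ( n -- ) ( F: r1 -- r2 )
  \ r2 = r1 * 2^n, computed in the system's native FP format.
  s>f            \ F: r1 n
  2e fswap f**   \ F: r1 2^n
  f*             \ F: r1*2^n
;
```

For example, 3e 4 fscale2 leaves a value close to 48 on the floating-point stack.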

[r1236] 2024-06-17 17:37:58 ruv replies:

proposal - Special memory access words

w! ( x c-addr -- ) Store the bottom 16 bits of x at c_addr.

The standard uses the term "least significant" instead of "bottom". Taking into account my previous comment, I would suggest the following specifications:

w! ( x addr -- ) "w-store"
Store the least significant 16 bits of x at addr. When the address unit size is greater than 16 bits, only the 16 least significant bits at addr can be modified.

w@ ( addr -- x ) "w-fetch"
x is the zero-extended 16-bit value stored at addr. When the address unit size is greater than 16 bits, only the 16 least significant bits from addr are transferred.

If this wording is acceptable, it should also be used for l-family and x-family words.
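A minimal sketch of these two specifications for the common case only, assuming 8-bit address units, 8-bit characters, and little-endian byte order. None of these assumptions come from the proposal, and a real system would normally use native code.

```forth
: w@ ( addr -- x )
  dup c@                       \ low octet
  swap char+ c@  8 lshift      \ high octet, moved to bits 8..15
  or                           \ zero-extended 16-bit result
;
: w! ( x addr -- )
  over $ff and  over c!                 \ store bits 0..7 of x
  swap 8 rshift $ff and  swap char+ c!  \ store bits 8..15 of x
;
```

Bits of x above the low 16 are discarded by w!, so storing $12345 and fetching it back gives $2345.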

[r1237] 2024-06-17 19:42:29 ruv replies:

proposal - Obsolescence for SAVE-INPUT and RESTORE-INPUT

what this proposal has to do with "interpreted loops"

As far as I know, "interpreted loops" are not used in standard programs (i.e., Forth programs that are independent of a particular Forth system). And the systems that provide save-input and restore-input may continue to provide these words.

An alternative way to implement an interpreted loop is to parse the input stream (the loop body) into a buffer and evaluate the buffer.

Concerning possible problems, see #217 New Line characters in a string passed to EVALUATE.
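The evaluate-based alternative can be sketched as follows. The name times-evaluate is hypothetical, and the example assumes the system accepts S" in interpretation state. Here the loop body is already a string, where a full implementation would first parse it from the input stream into a buffer.

```forth
: times-evaluate ( i*x c-addr u n -- j*x )
  \ Hypothetical helper: evaluate the string c-addr u n times.
  0 ?do
    2dup 2>r  evaluate  2r>   \ keep the string across each evaluation
  loop
  2drop
;
0 s" 1 +" 3 times-evaluate .
```

The last line adds 1 to the initial 0 three times and prints 3. If the body spans several lines and contains \ comments, the problem referenced above applies.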