Digest #264 2024-06-18
Contributions
Author
Ruv
Change Log
- 2024-06-17 Initial version
Problem
Sometimes it is necessary to apply evaluate to a string that contains multiple text lines (text fragments separated by a line terminator sequence) and single-line comments (i.e., the word "\").
In this case the word "\" skips all text up to the end of the string. The desired behavior is that it skips the text up to (and including) the nearest line terminator only, if the parse area contains a line terminator.
Previous work
Some known discussions/posts on this problem:
- 2021-11-22 New Line characters in a string passed to EVALUATE
- 2022-10-16 Portable line-oriented parsing
Solution
Change the glossary entry 6.2.2535 \ (in CORE EXT) to include the functionality from 7.6.2.2535 \ (in BLOCK EXT), so that this word works as expected regardless of the kind of input source. Namely, include this functionality:
parse and discard the portion of the parse area corresponding to the remainder of the current line.
The behavior will not change when the input source is a file, since in this case the input buffer contains only a single line.
Proposal
Remove the glossary entry 7.6.2.2535 \ (in BLOCK EXT).
In the glossary entry 6.2.2535 \ (in CORE EXT), replace the text description for the Execution semantics with the following:
Parse and discard the portion of the parse area corresponding to the remainder of the current line.
\ is an immediate word.
Reference implementation
A portable implementation (redefinition) of the word \ follows.
: evaluation ( -- flag )
  \ Return a flag: is the input source a string being evaluated.
  [defined] blk [if] blk @ 0<> if false exit then [then]
  source-id -1 =
;
: source-following ( -- sd )
  \ Return the parse area (a string).
  \ NB: the returned string may contain a line-terminator sequence in any position.
  source >in @ /string
;
: skip-source-line ( -- )
  \ Discard a part of the parse area that belongs to the current line.
  evaluation 0= if ['] \ execute exit then
  source-following over >r s\" \n" dup >r search if drop r@ then + r> drop
  r> - >in +!
;
: \ ( -- )
  skip-source-line
; immediate
Testing
t{ s\" 1 \\ \n drop 0 " evaluate -> 0 }t
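Two further cases may be worth checking. This is only a sketch: it assumes that the redefinition of \ given above has already been loaded, that the test harness providing t{ ... }t is available, and that s\" expands \n to the same line terminator sequence that skip-source-line searches for.

```forth
\ No line terminator in the string: the comment runs to the end of the string.
t{ s\" 1 \\ 2 3" evaluate -> 1 }t

\ Two commented lines: each comment ends at its own line terminator.
t{ s\" 1 \\ a\n2 \\ b\n3" evaluate -> 1 2 3 }t
```

The first case checks that the old behavior is kept when the parse area holds no line terminator; the second checks that more than one comment can appear in the same evaluated string.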
Replies
Data types
The standard word @ has the stack diagram ( a-addr -- x ), ditto for !, lshift, rshift: they operate on the most general single-cell data type x.
The proposed words w>s ( x -- n ), l>s ( x -- n ) also operate on a parameter of the data type x.
Thus, the following words probably should also have the data type x instead of the data type u in their stack diagrams:
w@ ( c-addr -- u )
l@ ( c-addr -- u )
x@ ( c-addr -- u )
wbe ( u1 -- u2 )
wle ( u1 -- u2 )
lbe ( u1 -- u2 )
lle ( u1 -- u2 )
xbe ( u1 -- u2 )
xle ( u1 -- u2 )
Alternatively, new data types x8, x16, x32, x64 can be introduced and used for the fetch operations.
Also, addr should be used instead of c-addr, because either these data types are equal (in a Forth-2019 system), or these words should work on addr (in a Forth-2012 system).
Wording
w>s ( x -- n )
Sign-extend the 16-bit value in x to cell n.
Does it follow from this that $1234567890abcdef w>s shall produce $ffffffffffffcdef on a 64-bit system?
I ask because the part "16-bit value in x" gives the impression that the value in x, if interpreted as an unsigned number, must belong to the range { 0 ... 65535 }.
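To make the question concrete, here is a minimal sketch of one possible reading, in which the bits of x above bit 15 are simply ignored. It is not taken from the proposal; it only shows what "yes" to the question above would mean.

```forth
\ One possible reading (an assumption, not the proposal's wording):
\ keep the 16 least significant bits of x and sign-extend them.
: w>s ( x -- n )
  $ffff and        \ discard everything above bit 15
  $8000 xor        \ flip the sign bit of the 16-bit value
  $8000 - ;        \ subtracting it back propagates the sign

$cdef w>s .        \ prints -12817 (with base set to decimal)
```

Under this reading, $1234567890abcdef w>s on a 64-bit system also gives -12817, i.e. $ffffffffffffcdef. Under the stricter reading, passing a value outside { 0 ... 65535 } would not be covered by the description at all.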
16-bit and 32-bit systems
For the l-family and x-family words, a note should mention that the word can be provided by the system only if the cell size is not less than 32 and 64 bits respectively.
If the Forth system has a 32-bit address unit (for example, a JavaScript-based Forth system), may this system provide the w@ and w! words?
Of course, these words can only read and write the 16 least significant bits of an address unit in such a system.
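The sizes in question can be inspected with the standard environment query ADDRESS-UNIT-BITS. The following is a rough sketch; the names au-bits and cell-bits are made up for this example, and the fallback to 8 bits when the query is not supported is an assumption.

```forth
\ Bits per address unit; falls back to 8 if the query is unsupported (assumption).
: au-bits ( -- u )
  s" ADDRESS-UNIT-BITS" environment? 0= if 8 then ;

\ Bits per cell.
: cell-bits ( -- u )  au-bits 1 cells * ;

cell-bits 32 < [if] .( l-family words cannot be provided here ) cr [then]
cell-bits 64 < [if] .( x-family words cannot be provided here ) cr [then]
au-bits 16 > [if] .( w@ and w! can only reach the low 16 bits of an address unit ) cr [then]
```

A system or a library could use checks of this kind to decide which of the words to define.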
Characters and bytes
c>s ( x -- n )
Sign-extend the 8-bit value in x to cell n.
Characters are not bound to 8 bits in the standard. For example, the character size can be the same as the cell size (on a cell-addressed Forth system). So the text description for this word should say: "Sign-extend the character value in x to cell n".
I suggest adding the words b! ( x addr -- ), b@ ( addr -- x ), b>s ( x -- n ) to operate on octets.
If the character size is 8 bits, they are aliased to the c-words; otherwise they either have their own implementation or are not provided by the system.
The word c@ should not be used to read octets without testing the environment.
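A minimal sketch of how such words could be provided on a system with 8-bit characters. The helper char-bits is a made-up name, the fallback to 8 bits when the environment query is unsupported is an assumption, and b>s is written out directly because c>s is itself only a proposed word.

```forth
\ Bits per character; assumes 8-bit address units if the query is unsupported.
: char-bits ( -- u )
  s" ADDRESS-UNIT-BITS" environment? 0= if 8 then  1 chars * ;

char-bits 8 = [if]
  : b@ ( addr -- x )  c@ ;
  : b! ( x addr -- )  c! ;
  : b>s ( x -- n )  $ff and  $80 xor  $80 - ;   \ sign-extend the low octet
[else]
  .( characters are not octets here; the b-words need their own implementation ) cr
[then]
```

A program that uses b@ instead of c@ then either works or fails visibly at load time on a system where characters are wider than an octet.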
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
At the 2023 meeting the committee decided to suggest that you (the proponent) should remove references to the wordset and move the specification of MAKE-IEEE-DFLOAT to the proposal section, and that I should then promote the proposal to "formal".
However, the proposal seems clear enough, so I am promoting it to "formal" directly.
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
Looking at what other programming languages do, in the C world I see:
double ldexp(double x, int exp);
which returns the result of multiplying the floating-point number x by 2 raised to the power exp.
Interestingly, C does not provide a function where the mantissa is provided as an integer number.
One problem that I see with the proposed word is that it produces a native FP value of the system, but the name claims that it produces an IEEE-DFLOAT (which I would interpret as an IEEE binary64 number). A number of Forth systems use the Intel extended format (which is also a variant of the IEEE format, but not binary64), and on those a word similar to ldexp() may be more appropriate.
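For illustration, a word in the spirit of ldexp() could look roughly like this in Forth. The name fldexp is hypothetical, a separate floating-point stack is assumed, and f** comes from the FLOATING EXT word set; a real implementation would more likely adjust the exponent field directly rather than rely on f** being exact for powers of two.

```forth
\ Hypothetical sketch: scale r1 by 2 raised to the power n, in the native FP format.
: fldexp ( n -- ) ( F: r1 -- r2 )
  2E0              \ F: r1 2.0
  s>d d>f          \ F: r1 2.0 n (as float)
  f**              \ F: r1 2^n
  f* ;             \ F: r1*2^n

3E0 4 fldexp f.    \ should print a value close to 48
```

Such a word makes no claim about the storage format of its result, which avoids the naming problem described above.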
w! ( x c-addr -- )
Store the bottom 16 bits of x at c_addr.
The standard uses the term "least significant" instead of "bottom". Taking into account my previous comment, I would suggest the following specifications:
w! ( x addr -- ) "w-store"
Store the least significant 16 bits of x at addr.
When the address unit size is greater than 16 bits, only the 16 least significant bits can be modified at addr.
w@ ( addr -- x ) "w-fetch"
x is the zero-extended 16-bit value stored at addr.
When the address unit size is greater than 16 bits, only the 16 least significant bits from addr are transferred.
If this wording is acceptable, it should also be used for the l-family and x-family words.
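The suggested wording can be illustrated with a small sketch built on c@ and c!. It assumes 8-bit address units and characters, and it picks little-endian octet order purely for the example; the byte order actually used is up to the proposal and the system. The buffer name wbuf is made up.

```forth
: w! ( x addr -- )
  over $ff and over c!        \ least significant octet goes to addr
  swap 8 rshift $ff and       \ ( addr hi )
  swap char+ c! ;             \ next octet goes to addr+1; higher bits of x are ignored

: w@ ( addr -- x )
  dup c@  swap char+ c@  8 lshift or ;   \ the result is zero-extended

create wbuf 2 chars allot
$12345678 wbuf w!  wbuf w@    \ leaves $5678 (a cell of at least 32 bits is assumed for the literal)
```

Only the 16 least significant bits of x reach memory, and the fetched value has all higher bits clear, matching the two descriptions above.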
what this proposal has to do with "interpreted loops"
As far as I know, "interpreted loops" are not used in standard programs (i.e., Forth programs that do not depend on a particular Forth system). And the systems that provide save-input and restore-input may continue to provide these words.
An alternative way to implement an interpreted loop is to parse the input stream (the loop body) into a buffer and evaluate the buffer.
Concerning possible problems, see #217 New Line characters in a string passed to EVALUATE.
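The buffer-based alternative can be sketched as follows. The names times, loop-buf and loop-len are hypothetical, the buffer size is arbitrary, and the loop body is taken to be the rest of the current parse area (obtained with 0 parse, assuming it contains no NUL character).

```forth
create loop-buf 256 chars allot
variable loop-len

\ Evaluate the rest of the parse area u times.
: times ( u "ccc" -- )
  0 parse 256 min             \ ( u c-addr len ) loop body, truncated to the buffer size
  dup loop-len !
  loop-buf swap chars move    \ copy the body out of the input buffer
  0 ?do  loop-buf loop-len @ evaluate  loop ;

3 times 1 .                   \ prints 1 1 1
```

When times is itself used inside an evaluated multi-line string, 0 parse takes everything up to the end of the string rather than the end of the line, which is the same issue as the one referenced above.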