Digest #151 2021-05-15
Contributions
The test cases appear to assume a two's complement implementation of doubles even though the rationale for the existence of this word is that the implementation might not be two's complement. A similar situation exists for S>D, whose entry says even less about the possibility of signed-magnitude doubles. I think the only viable test cases would have S>D followed by D>S. Without the presence of D>S in the core word set, I can't think of any feasible test case for S>D that does not rely on the double word set.
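A minimal sketch of the round-trip test described above. It assumes D>S from the Double-Number word set is available; the name s>d-roundtrip? is hypothetical. It only checks that S>D followed by D>S gives back the original single, so it makes no assumption about how doubles are represented.

```forth
\ Hypothetical helper: true if n survives S>D followed by D>S.
\ Needs D>S, which is in the Double-Number word set, not in Core.
: s>d-roundtrip? ( n -- flag )  dup s>d d>s = ;

\ Each line should print a true flag on a conforming system.
0  s>d-roundtrip? .
1  s>d-roundtrip? .
-1 s>d-roundtrip? .
```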
Alternatively, the standard could state that the implementation of doubles must be two's complement, and make this word obsolete. Are there any systems in use that don't use two's complement?
Replies
It's closer to the standard document (which I find helpful, because I sometimes need to see the section number and the wordset). However, all the links I tried are broken.
requestClarification - Return value of XC-WIDTH for control characters
An exhaustive definition of control characters for (say) UTF-8 will be a big job. See for example
https://www.unicode.org/versions/Unicode12.0.0/ch23.pdf
If you can't do a proper job, then I suggest that we need an undefined/unknown return value, for which the obvious value is -1. This will certainly work for embedded systems.
requestClarification - Return value of XC-WIDTH for control characters
I suggest that we need an undefined/unknown return value, for which the obvious value is -1.
It should be noted that the reference implementation for X-WIDTH is admissible if XC-WIDTH returns 1 or 0 for control codes, but it isn't admissible if XC-WIDTH returns -1 for control codes.
When I print a string with control codes in the console, all but a few of them take one em.
-1 is easily transformed to 1 via ABS.
But -1 can be an actual width for DEL (in the sense that it decreases the length of the string by one). Is it then acceptable as a replacement for unknown values?
If a program handles some control characters, it knows these characters and it probably will not apply XC-WIDTH to these characters at all. If the program gets a special "unknown" value from XC-WIDTH, what can it do with that?
So I'm not convinced that we really need a special code for "unknown". It is probably enough to return some not-too-bad fallback value.
requestClarification - Return value of XC-WIDTH for control characters
I have done some more research into what other languages do. All use -1 except Julia, which returns 0. Based on this I will implement -1 as the return value for the characters $01 to $1F and $80 to $9F. -1 will signify that a width cannot be determined. As a consequence, X-WIDTH will also return -1 if such a character is found in the string. The same will also apply to the range $D800-$DFFF, the surrogate range; these codes can never appear in a valid string.
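As a rough sketch of the range check this implies (not the actual lxf code): the word name no-width? is hypothetical, the ranges are taken literally from the post above, and the $ hex prefix assumes a Forth-2012 system. WITHIN comes from the Core extension word set and treats its upper limit as exclusive, hence $20, $A0 and $E000.

```forth
\ Hypothetical: true if xchar falls in one of the ranges for which
\ the post proposes that XC-WIDTH return -1.
: no-width? ( xchar -- flag )
  dup $01   $20   within if drop true exit then   \ C0 controls $01..$1F
  dup $80   $A0   within if drop true exit then   \ C1 controls $80..$9F
      $D800 $E000 within ;                        \ surrogates $D800..$DFFF
```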
requestClarification - Return value of XC-WIDTH for control characters
X-WIDTH will return -1 if such a character is found in the string.
Actually, the value -1 (or any other negative value) semantically violates the current specification, since a negative "number of monospace ASCII characters" is nonsense.
At the moment, a program may use code like ... X-WIDTH BLANK or ... XC-WIDTH BLANK. Such a program is standard compliant, but it will fail on your system for some characters.
requestClarification - Return value of XC-WIDTH for control characters
If XC-WIDTH (or X-WIDTH) may return -1 as a special value, then in most cases this word should be followed by IF, as:
... XC-WIDTH DUP -1 = IF DROP ( workaround ) ... ELSE ( use the width ) ... THEN
Perhaps a better way (and closer to Forth, where "functions" can return several values) would be to introduce another word that also returns a flag.
Some possible variants:
XC-WIDTH? ( xchar -- u true | false ), similar to ENVIRONMENT? ( c-addr u -- i*x true | false )
XC>WIDTH ( xchar -- u true | xchar false ), similar to EKEY>CHAR ( x -- char true | x false ) and EKEY>FKEY ( x -- u true | x false )
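For illustration, the first variant could be layered on top of an existing XC-WIDTH. This is only a sketch: it assumes a system whose XC-WIDTH returns a negative value when the width is unknown, and the word .width is a hypothetical usage example.

```forth
\ Sketch of the proposed XC-WIDTH? on top of XC-WIDTH.
\ Assumption: XC-WIDTH returns a negative number for "unknown width".
: xc-width? ( xchar -- u true | false )
  xc-width dup 0< if drop false else true then ;

\ Hypothetical usage: print the width, or a note if it is unknown.
: .width ( xchar -- )
  xc-width? if . else ." unknown width" then ;
```

With the flag on top of the stack, the caller no longer needs the DUP -1 = test before IF.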
requestClarification - Return value of XC-WIDTH for control characters
There are probably very few occasions where these words are really needed. One example is a command-line editor. I recently ported the editor from lxf to lxf64 and needed these redefinitions. In lxf64, xc-width returns -1 for control chars:
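: xc-width0 ( xchar -- u )  xc-width 0 max ;   \ clamp -1 to 0
: x-width0 ( addr u -- n )
  over + >r 0 swap                \ R: end-addr   stack: sum addr
  begin dup r@ < while xc@+ xc-width0 rot + swap repeat
  r> 2drop ;                      \ drop addr and end-addr, leave sum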
I implemented the XCHAR wordset when it was discussed on CLF. I do not remember any discussion of the description of XC-WIDTH beyond that it was a word that should be in the standard. Now that emojis are available, easy to input, and take the space of 2 chars, we need to revisit these words.
Just a side note. On the primary web page we can find: Discuss the functions of website itself in the Meta Discussion.