Digest #151 2021-05-15

Contributions

[199] 2021-05-14 20:22:32 JimPeterson wrote:

testcase - Inaccurate Test Cases?

The test cases appear to assume a two's complement implementation of doubles even though the rationale for the existence of this word is that the implementation might not be two's complement. A similar situation exists for S>D, whose entry says even less about the possibility of signed-magnitude doubles. I think the only viable test cases would have S>D followed by D>S. Without the presence of D>S in the core word set, I can't think of any feasible test case for S>D that does not rely on the double word set.
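As a minimal sketch of the round-trip idea, the check below converts a single to a double and back, then compares the result with the original. The word name roundtrip? is made up for illustration. D>S still comes from the double word set, so the dependency noted above remains; the sketch only avoids assuming any particular representation of negative doubles.

```forth
\ Hypothetical helper: true if n survives S>D followed by D>S.
\ D>S is from the optional Double-Number word set.
: roundtrip? ( n -- flag )  dup s>d d>s = ;

\ Each line should print a true flag on a conforming system.
0  roundtrip? .
1  roundtrip? .
-1 roundtrip? .
0 invert 1 rshift roundtrip? .   \ largest positive single
```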

Alternatively, the standard could state that the implementation of doubles must be two's complement, and make this word obsolete. Are there any systems in use that don't use two's complement?

Replies

[r683] 2021-05-07 07:49:42 AntonErtl replies:

requestClarification - Web site problem?

It's closer to the standard document (which I find helpful, because I sometimes need to see the section number and the wordset). However, all the links I tried are broken.


[r684] 2021-05-07 11:10:14 StephenPelc replies:

requestClarification - Return value of XC-WIDTH for control characters

An exhaustive definition of control characters for (say) UTF-8 will be a big job. See for example

https://www.unicode.org/versions/Unicode12.0.0/ch23.pdf

If you can't do a proper job, then I suggest that we need an undefined/unknown return value, for which the obvious value is -1. This will certainly work for embedded systems.


[r685] 2021-05-07 11:10:38 StephenPelc replies:

requestClarification - Return value of XC-WIDTH for control characters

An exhaustive definition of control characters for (say) UTF-8 will be a big job. See for example

https://www.unicode.org/versions/Unicode12.0.0/ch23.pdf

If you can't do a proper job, then I suggest that we need an undefined/unknown return value, for which the obvious value is -1. This will certainly work for embedded systems.


[r686] 2021-05-07 14:54:39 ruv replies:

requestClarification - Return value of XC-WIDTH for control characters

I suggest that we need an undefined/unknown return value, for which the obvious value is -1.

It should be noted that the reference implementation for X-WIDTH is admissible if XC-WIDTH returns 1 or 0 for control codes, but it isn't admissible if XC-WIDTH returns -1 for control codes.

When I print a string with control codes in the console, all but a few of them take one em.

-1 is easily transformed to 1 via ABS.

But -1 can be an actual width for DEL (in the sense that it decreases the length of the string by one). Is it then acceptable as a replacement for unknown values?

If a program handles some control characters, it knows these characters and it probably will not apply XC-WIDTH to these characters at all. If the program gets a special "unknown" value from XC-WIDTH, what can it do with it?

So I'm not convinced that we really need a special code for "unknown". It is probably enough to just return some not-too-bad fallback value.
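The two options discussed here can be sketched as thin wrappers. Both word names are hypothetical, and both assume a system whose XC-WIDTH returns a negative value such as -1 when the width is unknown; on a system that never returns a negative width they simply pass the width through.

```forth
\ Hypothetical wrappers around a system XC-WIDTH that may return -1.

\ Map -1 to 1, as suggested via ABS.
: xc-width-abs ( xchar -- u )  xc-width abs ;

\ Substitute a caller-supplied fallback when the width is unknown.
: xc-width-or ( xchar u-fallback -- u )
    swap xc-width            \ u-fallback width
    dup 0< if drop else nip then ;
```

With the second form the caller picks the fallback, for example `xc 0 xc-width-or` or `xc 1 xc-width-or` for some xchar xc, without needing a distinct "unknown" code in its own logic.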


[r687] 2021-05-07 17:14:50 PeterFalth replies:

requestClarification - Return value of XC-WIDTH for control characters

I have done some more research into what other languages do. All use -1 except Julia, which returns 0. Based on this I will implement -1 as the return value for the characters $01 to $1F and $80 to $9F; -1 will signify that a width cannot be determined. As a consequence, X-WIDTH will also return -1 if such a character is found in the string. The same will apply to the surrogate range $D800-$DFFF, since these codes can never appear in a valid string.
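The ranges listed above can be expressed as a small predicate. This is only an illustrative sketch, not the lxf64 implementation: the name unknown-width? is an assumption, and the code relies on WITHIN treating its upper limit as exclusive.

```forth
\ Hypothetical predicate: true for code points in the ranges named above.
\ WITHIN tests lower <= x < upper, so each upper limit is one past the range.
: unknown-width? ( xchar -- flag )
    dup  $01   $20   within        \ C0 controls $01..$1F
    over $80   $A0   within or     \ C1 controls $80..$9F
    swap $D800 $E000 within or ;   \ surrogates  $D800..$DFFF
```

An XC-WIDTH built this way would test unknown-width? first and return -1 when it is true, falling through to the normal width lookup otherwise.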


[r688] 2021-05-07 17:53:36 ruv replies:

requestClarification - Return value of XC-WIDTH for control characters

X-WIDTH will return -1 if such a character is found in the string.

Actually, the value -1 (or any other negative value) semantically violates the current specification, since a negative "number of monospace ASCII characters" is nonsense.

At the moment, a program may use code like ... X-WIDTH BLANK or ... XC-WIDTH BLANK. Such a program is standard compliant, but it will fail on your system for some characters.


[r689] 2021-05-07 20:17:28 ruv replies:

requestClarification - Return value of XC-WIDTH for control characters

If XC-WIDTH (or X-WIDTH) may return -1 as a special value, then in most cases this word should be followed by IF, as in:

   ... XC-WIDTH DUP -1 = IF DROP ( workaround ) ... ELSE ( use the width ) ... THEN

Perhaps a better way (and closer to Forth, where "functions" can return several values) would be to introduce another word that also returns a flag.

Some possible variants:

  • XC-WIDTH? ( xchar -- u true | false ) — similar to ENVIRONMENT? ( c-addr u -- i*x true | false )
  • XC>WIDTH ( xchar -- u true | xchar false ) — similar to EKEY>CHAR ( x -- char true | x false ) and EKEY>FKEY ( x -- u true | x false )
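Either variant could be layered on top of an existing XC-WIDTH. The definitions below are only a sketch: they assume the underlying XC-WIDTH signals "unknown" with a negative value, which is the behaviour under discussion here and not something the current specification provides.

```forth
\ Sketches of the two proposed words, assuming XC-WIDTH returns a
\ negative value when the width cannot be determined.

: xc-width? ( xchar -- u true | false )
    xc-width dup 0< if drop false else true then ;

: xc>width ( xchar -- u true | xchar false )
    dup xc-width             \ xchar width
    dup 0< if drop false else nip true then ;
```

A caller then branches on the flag instead of comparing against a magic value, for example `xc-width? if ( use the width ) else ( workaround ) then`.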

[r690] 2021-05-07 22:10:06 PeterFalth replies:

requestClarification - Return value of XC-WIDTH for control characters

There are probably very few occasions where these words are really needed. One example is a command-line editor. I recently ported the editor from lxf to lxf64 and needed these redefinitions. In lxf64, XC-WIDTH returns -1 for control characters:

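: xc-width0 ( xchar -- u )  xc-width 0 max ;   \ clamp the "unknown" value -1 to 0

: x-width0 ( addr u -- n )   \ sum of the clamped widths of all xchars in the string
    over + >r 0 swap
    begin dup r@ < while xc@+ xc-width0 rot + swap repeat
    r> 2drop ;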

I implemented the XCHAR wordset when it was discussed on CLF. I do not remember any discussion of the description of XC-WIDTH beyond that it was a word that should be in the standard. Now that emojis are available, easy to input, and take up the space of two characters, we need to revisit these words.


[r691] 2021-05-09 12:27:46 ruv replies:

requestClarification - Web site problem?

Just a side note. On the primary web page we can find: "Discuss the functions of website itself in the Meta Discussion."