Digest #149 2021-05-06
This entry on the web site seems a bit messed up; it is formatted differently from all the other entries. It looks like something went wrong.
What should the width be for control characters ($1 to $1F and $80 to $9F)?
Whatever it should be, it should be specified: zero, one, or system-dependent.
The space that a control character takes depends on the environment and context. E.g. Tab (0x9) can take from 1 to 8 character positions (depending on the current position on the display and on the display properties). Bell (0x7) usually takes 0 positions.
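The context dependence of Tab can be made concrete: its width is whatever distance remains to the next tab stop. A minimal sketch in standard Forth, assuming 8-column tab stops; TAB-ADVANCE is a hypothetical helper, not a standard word:

```forth
8 CONSTANT TABSTOP

\ Column after emitting a tab at the given column:
\ round up to the next multiple of TABSTOP.
: TAB-ADVANCE ( col -- col' )
  TABSTOP / 1+ TABSTOP * ;
```

So `0 TAB-ADVANCE` gives 8 (a full 8-position tab), while `7 TAB-ADVANCE` also gives 8 (a 1-position tab), which is why no single fixed width is correct.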
Obviously, a Forth system cannot return the actual width for control characters in all cases.
If a program knows the environment and wants to calculate the actual width of a string, it should implement its own version of X-WIDTH. Then it doesn't matter what XC-WIDTH returns for control characters; whatever XC-WIDTH returns only needs to be good enough as a fallback.

Perhaps a good enough approach is to assume that the control characters don't have any "control" function, that they are mapped to some real characters, and that they are displayed as ordinary characters. In that case XC-WIDTH should return 1 for them.
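Such a fallback can be sketched in standard Forth. CTRL? and FALLBACK-XC-WIDTH are hypothetical names for illustration; a real system would build this behavior into XC-WIDTH itself:

```forth
\ True for the control ranges discussed above: $1..$1F and $80..$9F.
: CTRL? ( xchar -- flag )
  DUP $1 $20 WITHIN  SWAP $80 $A0 WITHIN  OR ;

\ Treat control characters as ordinary single-cell glyphs;
\ defer to the system's XC-WIDTH for everything else.
: FALLBACK-XC-WIDTH ( xchar -- n )
  DUP CTRL? IF DROP 1 ELSE XC-WIDTH THEN ;
```

A program that knows its environment would still override this with its own X-WIDTH, e.g. one that expands tabs by cursor position.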
A side note: in XML, most control characters are illegal due to their unclear semantics in the context of XML (see the useful comments on StackOverflow).
Note that where the standard mentions characters, it nearly always means "primitive characters", or pchars for short. With the AU=1 decision, this means bytes in all practical cases. UTF-16 systems need to use XCHAR operations. KEY and EMIT are still there to handle 8-bit operations such as Telnet. Note also that on most implementations, TCP/IP and USB operations can have packet breaks in the middle of UTF-16 characters.