Digest #17 2017-02-16


[28] 2017-02-14 19:53:17 alextangent wrote:

requestClarification - UTF-8 and Unicode codepoint maximum

U+10FFFF is the maximum Unicode codepoint, and anything above this is not valid UTF-8. The implementation of X-SIZE should therefore return a -77 Malformed xchar error for anything beyond the UTF-8 sequence F4 8F BF BF.
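
To make the boundary concrete, here is a minimal sketch of the check in Python rather than Forth. The names x_size and ForthThrow are hypothetical stand-ins for X-SIZE and a Forth THROW, not part of any standard. Throwing on an empty or truncated buffer is an assumption of this sketch, and overlong forms and surrogates are deliberately not checked.

```python
MALFORMED_XCHAR = -77      # throw code named in the post above
MAX_CODEPOINT = 0x10FFFF   # encoded in UTF-8 as F4 8F BF BF


class ForthThrow(Exception):
    """Hypothetical stand-in for a Forth THROW carrying a numeric code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def x_size(buf):
    """Return the byte length of the first UTF-8 sequence in buf.

    Raises ForthThrow(-77) if the sequence is truncated, has a bad lead or
    continuation byte, or would decode to a value above U+10FFFF.
    """
    if not buf:
        raise ForthThrow(MALFORMED_XCHAR)  # assumption: empty input throws
    lead = buf[0]
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        size, value = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        size, value = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF7:
        size, value = 4, lead & 0x07
    else:
        # 0x80..0xBF is a stray continuation byte; 0xF8..0xFF never starts
        # a sequence that could stay at or below U+10FFFF.
        raise ForthThrow(MALFORMED_XCHAR)
    if len(buf) < size:
        raise ForthThrow(MALFORMED_XCHAR)
    for byte in buf[1:size]:
        if byte & 0xC0 != 0x80:
            raise ForthThrow(MALFORMED_XCHAR)
        value = (value << 6) | (byte & 0x3F)
    if value > MAX_CODEPOINT:
        # Covers F4 90 80 80 and everything with lead bytes F5..F7.
        raise ForthThrow(MALFORMED_XCHAR)
    return size
```

With this sketch, x_size(b"\xF4\x8F\xBF\xBF") returns 4, while x_size(b"\xF4\x90\x80\x80") raises ForthThrow with code -77.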


[r75] 2017-02-15 01:27:46 BerndPaysan replies:

requestClarification - UTF-8 and Unicode codepoint maximum

Yes, though that's an artefact of UTF-16, and future Unicode standards might lift that limitation (when they run out of code points... and deprecate UTF-16 or provide a means to expand its code point range). With the current Unicode standard, a -77 throw for code points above $10FFFF is correct and makes for a high-quality implementation.

Note that the Postel principle suggests that you accept slightly wrong data, but you shall not produce it, so it's more important to throw on XC!+ and XEMIT.
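
The producing side can be sketched the same way, again in Python rather than Forth. The name xc_encode is a hypothetical analogue of the encoding step inside XC!+ or XEMIT, and reusing throw code -77 there is an assumption of this sketch. Surrogate code points are not rejected here.

```python
MALFORMED_XCHAR = -77      # assumed throw code for the encoding side as well
MAX_CODEPOINT = 0x10FFFF


class ForthThrow(Exception):
    """Hypothetical stand-in for a Forth THROW carrying a numeric code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def xc_encode(xchar):
    """Encode one code point as UTF-8, refusing anything above U+10FFFF."""
    if xchar < 0 or xchar > MAX_CODEPOINT:
        # Strict on output: never produce a sequence beyond F4 8F BF BF.
        raise ForthThrow(MALFORMED_XCHAR)
    if xchar < 0x80:
        return bytes([xchar])
    if xchar < 0x800:
        return bytes([0xC0 | (xchar >> 6),
                      0x80 | (xchar & 0x3F)])
    if xchar < 0x10000:
        return bytes([0xE0 | (xchar >> 12),
                      0x80 | ((xchar >> 6) & 0x3F),
                      0x80 | (xchar & 0x3F)])
    return bytes([0xF0 | (xchar >> 18),
                  0x80 | ((xchar >> 12) & 0x3F),
                  0x80 | ((xchar >> 6) & 0x3F),
                  0x80 | (xchar & 0x3F)])
```

Here xc_encode(0x10FFFF) yields the bytes F4 8F BF BF, and xc_encode(0x110000) raises ForthThrow with code -77 instead of emitting F4 90 80 80.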