Digest #139 2021-04-04

Contributions

[184] 2021-04-03 15:34:40 AntonErtl wrote:

proposal - EMIT and non-ASCII values

Author:

Anton Ertl

Change Log:

2021-04-03 Original proposal

Problem:

The first ideas for the xchar wordset had EMIT behave like (current) XEMIT. Then Stephen Pelc pointed out that EMIT is used in a number of programs for dealing with raw bytes, so we introduced XEMIT for dealing with extended characters. But the wording and stack effect of EMIT suggest that EMIT should deal with (possibly extended) characters rather than raw bytes. This is at odds with a number of implementations, and if EMIT dealt with extended characters, there would be hardly any reason to keep both EMIT and XEMIT.

Solution:

Define EMIT to deal with raw bytes.

I leave a corresponding proposal for KEY to interested parties.

Typical use:

$c3 emit $a4 emit \ outputs ä on a UTF-8 system

Proposal:

Change the definition of EMIT into:

EMIT ( char -- )

Send char as a raw byte to the user output device.

Rationale:

EMIT supports low-level communication of arbitrary contents, not limited to specific encodings; it corresponds to TYPEing one char/byte. To print multi-byte extended characters, the straightforward way is to use TYPE or XEMIT, but you can also print the individual bytes with multiple EMITs.
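The three routes mentioned above can be sketched side by side. This is only an illustration, assuming a UTF-8 system that provides the xchar wordset and an EMIT that passes raw bytes through as proposed; the names a-umlaut-bytes, a-umlaut-type, a-umlaut-xemit and a-umlaut-emit are hypothetical and not part of the proposal.

```forth
\ Assumption: UTF-8 system, xchar wordset present, EMIT sends raw bytes.
\ All word names below are hypothetical, chosen for this sketch only.

\ The UTF-8 encoding of ä (U+00E4) is the two bytes $C3 $A4.
create a-umlaut-bytes  $c3 c, $a4 c,

: a-umlaut-type  ( -- )  a-umlaut-bytes 2 type ;   \ whole string at once
: a-umlaut-xemit ( -- )  $e4 xemit ;               \ one xchar (code point)
: a-umlaut-emit  ( -- )  $c3 emit $a4 emit ;       \ one raw byte at a time
```

On a system that meets these assumptions, each of the three words should send the same two bytes to the user output device.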

Reference implementation:

create emit-buf 1 allot  \ one-char scratch buffer

: emit ( char -- )
  \ store char in the buffer, then TYPE it as a one-char string
  emit-buf c! emit-buf 1 type ;

Existing practice:

Gforth, SwiftForth, and VFX implement EMIT as dealing with raw bytes (tested with the "typical use" above), but Peter Fälth's system implements EMIT as an alias of XEMIT, and iForth prints two funny characters. It is unclear if there are any existing programs affected by the proposed change.

Testing:

This cannot be tested from a standard program, because there is no way to inspect the output of EMIT.

Replies

[r625] 2021-04-03 08:37:52 AntonErtl replies:

referenceImplementation - Suggested reference implementation

Actually, implementing TYPE using EMIT is fine even for strings containing xchars. Such an implementation of TYPE emits the individual chars (bytes) of a string, one byte at a time, which fits nicely with C@ and CHAR+ (1+). That's the magic of the string representation of xchars: strings are still arrays of chars (bytes). If you don't need individual xchars (and you rarely do), you can just treat them as such; there is no need for xchar-specific words. That's why we have no XTYPE.
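
A minimal sketch of such a TYPE, assuming EMIT sends raw bytes as proposed. The names my-type and demo-str are hypothetical; a real system would define TYPE itself.

```forth
\ Hypothetical name my-type; assumes EMIT outputs one raw byte per call.
: my-type ( c-addr u -- )
  0 ?do            \ loop u times; skipped entirely when u = 0
    dup c@ emit    \ output the char (byte) at the current address
    char+          \ advance to the next char
  loop
  drop ;           \ discard the final address

\ Usage: the two bytes of ä in UTF-8, emitted one after the other.
create demo-str  $c3 c, $a4 c,
demo-str 2 my-type
```

The loop never looks at xchar boundaries; a multi-byte character comes out intact simply because its bytes are sent in order.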