Proposal: [305] Include a revised 79-STANDARD Specification for "><" To "Core Ext"

Informal


flaagel, Proposal, 2023-08-04 14:24:50

Author:

Francois Laagel

Change Log:

N/A.

Problem:

In networking code, it is often necessary to convert integers to the canonical network representation (big endian). The most commonly used platforms (Intel systems) are little endian, so some byte reordering is necessary. 79-STANDARD covered this with a specification for "><", but it was restricted to 16-bit cell targets.

Solution:

Add the following specification for "><" to the "Core Ext" word set:

>< u1 -- u2 "byte-swap" Return u2 as a representation of u1 in which all individual bytes appear in reverse order.

This allows one to convert a cell from little to big endian byte ordering and conversely.

Typical use: (Optional)

SHA-1 is specified in RFC 3174. The algorithm operates on 512 bit long blocks, the last one of which is to be padded with an extra '1' bit and as many '0' bits as needed so as to end up with a complete 512 bit block. However, the last 64 bits of the last block being 'digested' are to specify the total message length (expressed in bits and not including the padding '1' bit) as a big endian integer.
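The length field described above could be written roughly as follows. This is only a sketch under several assumptions: 64-bit cells, 8-bit address units, a little-endian host, and a `><` that behaves as specified in this proposal and is already loaded. The names `sha-block` and `store-bit-length` are hypothetical and do not come from RFC 3174.

```forth
\ Sketch only. Assumes 64-bit cells, a little-endian host and
\ a previously loaded >< as specified in this proposal.

CREATE sha-block 64 ALLOT     \ hypothetical buffer for the final 512-bit block

: store-bit-length ( nbytes -- )
   8 *                        \ message length in bits
   ><                         \ little endian to big endian
   sha-block 56 + ! ;         \ last 64 bits of the block (cell-aligned here)
```

On a big-endian host the `><` would have to be left out, which is the portability concern raised in the replies.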

Proposal:

This should enumerate the changes to the document.

For the wording of word definitions, use existing word definitions as a template. Where possible, include the rationale for the definition.

Reference implementation:

On a 64 bit cell system (GNU Forth 0.7.3), the following code produces the expected results:

: >< ( u1 -- u2 )

   >R
   R@ $FF AND 56 LSHIFT
   R@ $FF00 AND 40 LSHIFT OR
   R@ $FF0000 AND 24 LSHIFT OR
   R@ $FF000000 AND 8 LSHIFT OR
   R@ $FF00000000 AND 8 RSHIFT OR
   R@ $FF0000000000 AND 24 RSHIFT OR
   R@ $FF000000000000 AND 40 RSHIFT OR
   R> $FF00000000000000 AND 56 RSHIFT OR ;
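The definition above is tied to 64-bit cells. A width-independent variant might look like the sketch below, which moves one byte per loop iteration. It assumes 8-bit address units, so that `1 CELLS` is the number of bytes in a cell. It is not part of the proposal text, and loading it after the definition above simply redefines `><`.

```forth
\ Sketch: reverse the bytes of a cell of any width.
\ Assumes 8-bit address units (1 CELLS = bytes per cell).
: >< ( u1 -- u2 )
   0                          \ u1 acc
   1 CELLS 0 DO
      8 LSHIFT                \ make room in the accumulator
      OVER $FF AND OR         \ append the current low byte of u1
      SWAP 8 RSHIFT SWAP      \ discard that byte from u1
   LOOP
   NIP ;
```

The loop trades speed for portability; a production system would more likely use a native byte-swap instruction where one exists.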

Testing: (Optional)

This should test the words or features introduced by the proposal, in particular, it should test boundary conditions. Test cases should work with the test harness in Appendix F.
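As an illustration of what such tests could look like, here is a hedged sketch in the `T{ ... -> ... }T` notation of the Appendix F harness. It assumes the harness and a `><` definition (for example the reference implementation) are already loaded. The cases are suggestions, not part of the proposal as submitted.

```forth
\ Sketch of boundary tests; requires the Appendix F test harness
\ and a loaded definition of >< .

T{  0 >< ->  0 }T                          \ all bytes zero
T{ -1 >< -> -1 }T                          \ all bytes $FF
T{ 1 >< -> 1 1 CELLS 1- 8 * LSHIFT }T      \ low byte moves to the top byte
T{ $1234 >< >< -> $1234 }T                 \ applying >< twice is the identity

1 CELLS 8 = [IF]                           \ 64-bit cells only
T{ $0102030405060708 >< -> $0807060504030201 }T
[THEN]
```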

MitchBradley

Have you looked at other current implementations to see how they solve this problem?

rcfg7943

Do you want a word equivalent to htonl, htons, ntohl and ntohs in C as included in arpa/inet.h or netinet/in.h?

Or do you want a word to swap bytes of cells?

The former has to take the type of processor into account whereas the latter does not.

If the former then I don't think it belongs in CORE EXT.

If the latter then why? I'm a strong believer in less is more.

And I think Forth-83 had that word too.
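The difference between the two readings can be sketched as follows. A plain cell swap is unconditional, whereas an hton-style word must first find out the host byte order. The names `probe`, `little-endian?` and `host>be` are hypothetical, `><` is assumed to be loaded already, and 8-bit address units are assumed. Note that this is a cell-wide analogue, not a fixed 32-bit `htonl`.

```forth
\ Sketch with hypothetical names. Assumes 8-bit address units
\ and a previously loaded >< .

VARIABLE probe   1 probe !
probe C@ 1 = CONSTANT little-endian?   \ true when the lowest address holds the low byte

: host>be ( u1 -- u2 )                 \ host order to big endian, cell-wide
   little-endian? IF >< THEN ;
```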

MitchBradley

I am the author of Open Firmware, a Forth-based system that became a national standard (IEEE 1275-1994). Open Firmware was used on millions of computers, of different byte orders, word sizes, and alignment restrictions, produced by many different companies. It supported portable device drivers that would work unmodified across the full range of systems. Thus I can claim relevant experience on this topic.

The typical use of byte-swizzling is to prepare or decode in-memory data structures that need to be passed to and from another device, whether another computer across a network or an I/O device that is controlled by memory-based descriptors. The data structure will have a defined byte order that might or might not match the host computer's byte order. Furthermore, the host might impose alignment restrictions different from the data structure. If you want to write portable code, then you cannot use a byte-swizzling primitive directly. You need a set of access wrappers akin to the "hton"/"ntoh" suite. Open Firmware drivers typically combine the host-dependent swizzling with unaligned access, resulting in a set of words like "be-w@", "be-w!", "le-w@", "le-w!", and similar words for 32-bit and perhaps 64-bit access. It is a lot of words, but it actually solves the problem. Just having one swizzler primitive is "minimal", but doesn't actually get the job done. It can be useful as an implementation factor, but it shouldn't appear directly in portable code.
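To make the idea concrete, here is a minimal sketch of such accessors built only from `C@` and `C!`, so they work on any host byte order and at any alignment. This is an illustration, not Open Firmware's actual source. It assumes 8-bit address units, and `be-l@` additionally assumes cells of at least 32 bits.

```forth
\ Sketch only: byte-wise accessors, independent of host byte order
\ and alignment. Assumes 8-bit address units.

: be-w@ ( addr -- u )   \ fetch 16 bits, most significant byte first
   DUP C@ 8 LSHIFT  SWAP CHAR+ C@  OR ;

: be-w! ( u addr -- )   \ store the low 16 bits of u, big endian
   OVER 8 RSHIFT $FF AND OVER C!
   CHAR+ SWAP $FF AND SWAP C! ;

: le-w@ ( addr -- u )   \ fetch 16 bits, least significant byte first
   DUP C@  SWAP CHAR+ C@ 8 LSHIFT  OR ;

: le-w! ( u addr -- )   \ store the low 16 bits of u, little endian
   OVER $FF AND OVER C!
   CHAR+ SWAP 8 RSHIFT $FF AND SWAP C! ;

: be-l@ ( addr -- u )   \ fetch 32 bits, big endian (needs cells >= 32 bits)
   DUP be-w@ 16 LSHIFT  SWAP 2 CHARS + be-w@  OR ;
```

Because every access goes through single bytes, no swizzling primitive and no endianness test appear in the calling code.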

Before proposing something, it would be helpful to study modern Forth implementations like VFX Forth, Swift Forth, gforth, etc, to see how they solve this common problem. A proposal that conforms to a modern system's usage is much more likely to succeed than something modeled on a 40+ year old standard that never achieved much commercial traction.

ruv

In networking code you need to switch endianness for an integer number of particular width (in bits). But a cell size can vary.

We probably need such a word for each of the widths 16, 32, and 64 bits.
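A family of fixed-width words along these lines might be sketched as below. The names `bswap16`, `bswap32` and `bswap64` are hypothetical and not taken from any existing system. `bswap32` assumes cells of at least 32 bits, and `bswap64` is only compiled when cells are 64 bits wide.

```forth
\ Sketch with hypothetical names: byte swaps of fixed width,
\ independent of the cell size (within the stated limits).

: bswap16 ( u1 -- u2 )   \ swap the two low bytes, discard the rest
   DUP $FF AND 8 LSHIFT  SWAP 8 RSHIFT $FF AND  OR ;

: bswap32 ( u1 -- u2 )   \ reverse the four low bytes
   DUP $FFFF AND bswap16 16 LSHIFT
   SWAP 16 RSHIFT $FFFF AND bswap16  OR ;

1 CELLS 8 = [IF]         \ 64-bit cells only
: bswap64 ( u1 -- u2 )   \ reverse all eight bytes
   DUP $FFFFFFFF AND bswap32 32 LSHIFT
   SWAP 32 RSHIFT bswap32  OR ;
[THEN]
```

Each wider word is built from the next narrower one, so only `bswap16` manipulates individual bytes.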

ruv

>< fooo
