Digest #158 2021-07-31

Contributions

[206] 2021-07-30 11:19:29 UlrichHoffmann wrote:

proposal - PLACE +PLACE

PLACE +PLACE request for discussion

Standardizing common use

Change History

2021-07-28 revising historical description, correcting typos
2021-07-26 adding history
2021-07-22 initial version

Problem

Traditional Forth represents strings in memory as so-called counted strings: a leading count byte giving the length of the string (0 to 255), followed by the characters of the string. No character value plays a special role (in contrast to zero-terminated strings, where the character NUL terminates the string). The limitation of a single count byte, and thus a maximal string length of 255 characters, imposes no restriction on a broad range of applications, especially in embedded systems. Of course, there are applications that require processing longer strings. The programmer should then use another string representation (such as c-addr len) that does not have the length restriction of counted strings.

The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.

The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND and COUNT.
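
As a minimal sketch of the layout described above, the counted string "ABC" can be built by hand and then read back. The name cstr is an assumption made for this illustration only; it is not part of the proposal.

```forth
\ Lay down the counted string "ABC": the length character first,
\ followed by the characters themselves.
CREATE cstr  3 C,  CHAR A C,  CHAR B C,  CHAR C C,

cstr C@ .          \ fetch the length character; prints 3
cstr COUNT TYPE    \ COUNT ( c-addr1 -- c-addr2 u ); prints ABC
```

A single cell (the address cstr) is enough to pass the whole string around; COUNT recovers the c-addr u pair when it is needed.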

The Informational Annex of Forth-94 says in A.3.1.3.4 about counted strings

Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.

What is missing is an easy way to place a sequence of characters into memory using the counted string representation and also appending a sequence of characters to an already existing counted string in memory.

Solution

The Forth community already has a solution for placing strings as counted strings into memory and for appending characters to counted strings: the words PLACE and +PLACE. PLACE appeared around 1984, e.g. in F83. +PLACE was proposed by Wil Baden in his Tool Belt Extensions (latest version at [1], also under the name APPEND)[^1]. Since then many systems have adopted PLACE and +PLACE with exactly the same behaviour.

So, we are proposing PLACE and +PLACE here for standardization.

Proposal

Please add the following specifications to the string extension word set.

PLACE STRING EXT

              ( c-addr1 u c-addr2 -- )

Store u as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u is greater than zero, copy u consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+1, proceeding character-by-character from lower addresses to higher addresses.

+PLACE "plus place" STRING EXT

              ( c-addr1 u1 c-addr2 -- )

Read the length character at c-addr2 as u2.

Store u1+u2 as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.

end of proposal

Rationale

PLACE places the string identified by c-addr1 u as a counted string at c-addr2 managing the length character. Essentially PLACE converts from c-addr len representation to counted string representation.

PLACE complements COUNT which essentially converts counted strings in memory to their c-addr u representations.

+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2) managing the updated string length in the length character at c-addr2.

The application must take care of any string overflow situation, where the length of the resulting string would exceed the maximal length that can be stored in a length character. It also has to ensure that enough memory is available to store the resulting string.
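
One way an application might guard against the overflow described above is a checked wrapper around +PLACE. The following is only a sketch under stated assumptions: the names MAX-COUNT and ?+PLACE are hypothetical, the length character is assumed to hold at most 255, and the reference definition of +PLACE is repeated so that the fragment stands on its own. It does not check that the destination buffer is large enough; that remains the caller's job.

```forth
\ Reference definition from the proposal, repeated for self-containment.
: +PLACE ( c-addr1 u1 c-addr2 -- )
   DUP COUNT + >R   2DUP C@ + SWAP C!   R> SWAP MOVE ;

255 CONSTANT MAX-COUNT   \ assumption: the length character is one byte

\ Hypothetical wrapper: append only if the resulting length still fits
\ into the length character; the flag is true if the append happened.
: ?+PLACE ( c-addr1 u1 c-addr2 -- flag )
   2DUP C@ +  MAX-COUNT >  IF  DROP 2DROP FALSE EXIT  THEN
   +PLACE TRUE ;

CREATE line 256 ALLOT   0 line C!      \ start with an empty counted string
S" hello" line ?+PLACE DROP            \ fits, so the string is appended
line COUNT TYPE
```

Returning a flag instead of throwing keeps the sketch within the core word set plus TRUE and FALSE; an application could equally well choose THROW or ABORT" here.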

Note that the proposal adds PLACE and +PLACE to the string extensions word set, which has been empty up to now. A compliant Forth system is free to provide or not to provide these words individually. If it chooses to provide them, they must have the above behaviour. Providing the words could be done by making the reference implementation available in source form or by means of an appropriate alias definition of an already provided word.

Typical Use

    S" The quick brown"  PAD PLACE
    S"  fox jumps over" PAD +PLACE
    S"  the lazy dog." PAD +PLACE

    PAD COUNT TYPE  ( The quick brown fox jumps over the lazy dog.)

Reference Implementation

PLACE and +PLACE can be implemented in Forth-94 as follows:

    : PLACE ( c-addr1 u c-addr2 -- )
         2DUP C!  CHAR+ SWAP MOVE ;

    : +PLACE ( c-addr1 u c-addr2 -- )
         DUP COUNT + >R    2DUP C@ + SWAP C!  R> SWAP MOVE ;

Of course systems are expected to provide optimized implementations.

Testing

The following tests check the basic functionality of PLACE and +PLACE:

    CREATE buf 64 ALLOT

    T{ S" ABC" buf PLACE  buf COUNT S" ABC" COMPARE -> 0 }T

    T{ S" ABC" buf PLACE  buf C@ -> 3 }T

    T{ S" ABC" buf PLACE  
       S" DEF" buf +PLACE  buf COUNT S" ABCDEF" COMPARE -> 0 }T

Experience

PLACE and +PLACE have been implemented in numerous systems with the proposed functionality. +PLACE is sometimes called APPEND. A non-exhaustive list:

<table>
<tr> <td>System</td> <td>PLACE</td> <td>+PLACE</td> </tr>
<tr> <td>gforth</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>SwiftForth</td> <td>yes</td> <td>provided as APPEND</td> </tr>
<tr> <td>VFX Forth</td> <td>yes</td> <td>provided as APPEND in ToolBelt.fth</td> </tr>
<tr> <td>Win32Forth</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>4th</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>PFE</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>F-PC</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>F83</td> <td>yes</td> <td></td> </tr>
<tr> <td>volksForth-83</td> <td>yes</td> <td></td> </tr>
<tr> <td>DXForth</td> <td>yes</td> <td></td> </tr>
</table>

Discussion

Why standardize PLACE and +PLACE?

Standard programs that deal with counted strings often also make use of PLACE and +PLACE. As these words are not known to be provided by the underlying system, the standard program has to bring its own, possibly less efficient, implementation with it. This can be done with a prelude that uses the above reference implementation.

Note that the typical phrase

    [UNDEFINED] PLACE [IF]
        «reference implementation for PLACE»
    [THEN]

    [UNDEFINED] +PLACE [IF]
        «reference implementation for +PLACE»
    [THEN]

is not sufficient, as the underlying system might provide PLACE or +PLACE but with a different behaviour, rendering the rest of the standard program invalid.

If PLACE and +PLACE are standardized, then a compliant system (if it decides to provide them) has to define them with exactly the proposed behaviour. The above phrase would then be sufficient, and standard programs could leverage efficient system-provided implementations.

Restriction on the Length of Counted Strings

Forth-94 and Forth-2012 both do not require characters to be of byte size, and so allow for systems with larger character sizes and thereby also for counted strings with a maximal length exceeding 255 characters. For this reason both standards use the term “length character” and not “count byte”.

Since then, in 2016, the Forth-200x committee, in favour of eliminating ambiguous conditions, has decided to require “1 CHARS = 1”, thus making systems with character sizes other than one non-compliant with future Forth-200x standards [2][3]. Requiring standard systems to have byte-size characters limits counted strings to the familiar maximal length of 255 characters.

References

[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt

[2] http://www.forth200x.org/rfds.html

[3] http://www.forth200x.org/char-is-1.html

Authors

Hans Bezemer <thebeez@xs4all.nl>

Ulrich Hoffmann <uho@xlerb.de>

Notes

[^1]: If the reader is aware of the detailed history of +PLACE aka APPEND, the authors would be happy to learn about that.

Replies

[r708] 2021-07-18 12:21:20 JeanJonethal replies:

requestClarification - Why "[" is specified using immediacy?

"[" is used to switch from compiling context to interpreter mode. This allows inlining constants or code into a word definition:

    : date-created [ 20210718 ] LITERAL ;

Depending on the threading model, one could create special words that contain machine code instructions:

    : my-special-op [ machine-code param + , ] ;

So, from the user's perspective, the code between "[" and "]" is executed at compile time. If "[" were not immediate, the Forth state could not be switched from compilation mode to interpreter mode; the compiler would compile a call to [ into the code.
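
A small runnable variant of the first example, as a sketch: the name MINUTES/DAY is made up for this illustration. The expression between [ and ] is evaluated while the definition is being compiled, and LITERAL compiles the result into the definition.

```forth
\ 60 24 * is computed once, at compile time; the definition itself
\ only contains the literal 1440.
: MINUTES/DAY ( -- n )  [ 60 24 * ] LITERAL ;

MINUTES/DAY .   \ prints 1440
```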

[r709] 2021-07-19 08:23:12 AntonErtl replies:

requestClarification - Why "[" is specified using immediacy?

The question is: Why are the interpretation semantics undefined? Is there any standard system where [ has no interpretation semantics or where they are different from the compilation semantics?

[r710] 2021-07-19 17:20:10 ruv replies:

requestClarification - Why "[" is specified using immediacy?

It seems I have realized why [ was specified as immediate: this word had the "I" (immediate) attribute in Forth-83, and this property was simply inherited in Forth-94. But it's unnecessary even in the current wording.

@JeanJonethal writes:

> if "[" was not immediate - forth state could not be switched from compilation mode to interpreter mode.

That's wrong. For example, take a look at the IF word: immediacy isn't specified for this word, and it is implemented as non-immediate (in the standard notion; this can be tested) in many Forth systems, although it may be implemented as immediate. See also A.6.1.1550: "POSTPONE allowed de-specification of immediacy or non-immediacy for all but a few Forth words whose behavior must be STATE-independent". And indeed, immediacy should not be specified for a word having STATE-dependent behavior (or one that is allowed to have STATE-dependent behavior, such as a word having undefined interpretation semantics).

Moreover, immediacy cannot be tested by a standard program for a word with undefined interpretation semantics, since the execution token cannot be obtained for such a word. So there is no sense in mentioning immediacy for [ even in the current wording.

@AntonErtl writes:

> The question is: Why are the interpretation semantics undefined?

It's obvious: in a cmForth-like Forth system this word may not be found in interpretation state; in cmForth it was actually defined only in the COMPILER wordlist. Therefore, to support this model, the standard has to leave the interpretation semantics of this word undefined.

> Is there any standard system where [ has no interpretation semantics or where they are different from the compilation semantics?

Strictly speaking, in real Forth systems "no interpretation semantics" is nonsense: even if a Forth system raises an error in interpretation state for some word, that behavior constitutes the interpretation semantics for this word. But that's another topic.

In many real Forth systems the actual interpretation semantics for [ are to do nothing, whereas the compilation semantics for this word are to enter interpretation state. Hence, they are different. See more reasoning in my post "About POSTPONE semantics in edge cases" on ForthHub.

[r711] 2021-07-30 13:47:38 UlrichHoffmann replies:

proposal - PLACE +PLACE

PLACE +PLACE request for discussion

Standardizing common use

Change History

2021-07-29 improving Markdown formatting
2021-07-28 revising historical description, correcting typos
2021-07-26 adding history
2021-07-22 initial version

Problem

The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.

The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND and COUNT.

The Informational Annex of Forth-94 says in A.3.1.3.4 about counted strings

Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.

Solution

So, we are proposing PLACE and +PLACE here for standardization.

Proposal

Please add the following specifications to the string extension word set.

PLACE STRING EXT

              ( c-addr1 u c-addr2 -- )

Store u as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

+PLACE "plus place" STRING EXT

              ( c-addr1 u1 c-addr2 -- )

Read the length character at c-addr2 as u2.

Store u1+u2 as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.

end of proposal

Rationale

PLACE complements COUNT which essentially converts counted strings in memory to their c-addr u representations.

+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2) managing the updated string length in the length character at c-addr2.

Typical Use

    S" The quick brown"  PAD PLACE
    S"  fox jumps over" PAD +PLACE
    S"  the lazy dog." PAD +PLACE

    PAD COUNT TYPE  ( The quick brown fox jumps over the lazy dog.)

Reference Implementation

PLACE and +PLACE can be implemented in Forth-94 as follows:

    : PLACE ( c-addr1 u c-addr2 -- )
         2DUP C!  CHAR+ SWAP MOVE ;

    : +PLACE ( c-addr1 u c-addr2 -- )
         DUP COUNT + >R    2DUP C@ + SWAP C!  R> SWAP MOVE ;

Of course systems are expected to provide optimized implementations.

Testing

The following tests check the basic functionality of PLACE and +PLACE:

    CREATE buf 64 ALLOT


    T{ S" ABC" buf PLACE  buf COUNT S" ABC" COMPARE -> 0 }T

    T{ S" ABC" buf PLACE  buf C@ -> 3 }T

    T{ S" ABC" buf PLACE  
       S" DEF" buf +PLACE  buf COUNT S" ABCDEF" COMPARE -> 0 }T

Experience

Discussion

Why standardize PLACE and +PLACE?

Note that the typical phrase

    [UNDEFINED] PLACE [IF]
        «reference implementation for PLACE»
    [THEN]

    [UNDEFINED] +PLACE [IF]
        «reference implementation for +PLACE»
    [THEN]

is not sufficient, as the underlying system might provide PLACE or +PLACE but with a different behaviour, rendering the rest of the standard program invalid.

Restriction on the Length of Counted Strings

References

[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt

[2] http://www.forth200x.org/rfds.html

[3] http://www.forth200x.org/char-is-1.html

Authors

Hans Bezemer <thebeez@xs4all.nl>

Ulrich Hoffmann <uho@xlerb.de>

Notes

[^1]: If the reader is aware of the detailed history of +PLACE aka APPEND, the authors would be happy to learn about that.
