Proposal: PLACE +PLACE
UlrichHoffmann [206] PLACE +PLACE, Proposal, 2021-07-30 11:19:29
PLACE +PLACE request for discussion
Standardizing common use
Change History
- 2021-07-28 revising historical description, correcting typos
- 2021-07-26 adding history
- 2021-07-22 initial version
Problem
Traditional Forth represents strings in memory as so-called counted strings: a leading count byte gives the length of the string (0 to 255) and is followed by the characters of the string. No character value plays a special role (in contrast to zero-terminated strings, where the character NUL terminates the string). The limit of a single count byte, and thus a maximal string length of 255 characters, imposes no restriction on a broad range of applications, especially in embedded systems. Of course, some applications require processing longer strings. The programmer should then use another string representation (such as c-addr len) without the length restriction of counted strings.
The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.
The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND and COUNT.
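As a minimal sketch of the representation described above, the following lays down a counted string by hand and reads it back. The name `greeting` is only an illustration and is not part of the proposal.

```forth
\ A counted string laid down by hand: length character first, then the text.
CREATE greeting  5 C,  CHAR H C,  CHAR e C,  CHAR l C,  CHAR l C,  CHAR o C,

greeting C@ .          \ the length character: prints 5
greeting COUNT TYPE    \ COUNT ( c-addr1 -- c-addr2 u ); prints Hello
```

Only the single address `greeting` has to be passed around; COUNT recovers the address and length pair when it is needed.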
The Informational Annex of Forth-94 says about counted strings in A.3.1.3.4:
Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.
What is missing is an easy way to place a sequence of characters into memory using the counted string representation and to append a sequence of characters to an already existing counted string in memory.
Solution
The Forth community already has a solution for placing strings as counted strings into memory and for appending characters to counted strings: the words <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong>. <strong><code>PLACE</code></strong> appeared around 1984, e.g. in F83. <strong><code>+PLACE</code></strong> was proposed by Wil Baden in his Tool Belt Extensions (latest version at [1], also under the name <strong><code>APPEND</code></strong>)[^1]. Since then many systems have adopted <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> with exactly the same behaviour.
So we propose <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> here for standardization.
Proposal
Please add the following specifications to the string extension word set.
PLACE STRING EXT
( c-addr1 u c-addr2 -- )
Store u as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.
If u is greater than zero, copy u consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+1, proceeding character-by-character from lower addresses to higher addresses.
+PLACE "plus place" STRING EXT
( c-addr1 u1 c-addr2 -- )
Read the length character at c-addr2 as u2.
Store u1+u2 as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.
If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.
end of proposal
Rationale
PLACE places the string identified by c-addr1 u as a counted string at c-addr2, managing the length character. Essentially, PLACE converts from c-addr len representation to counted string representation.
PLACE complements COUNT, which essentially converts counted strings in memory to their c-addr u representation.
+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2), managing the updated string length in the length character at c-addr2.
The application must take care of any string overflow situation where the length of the resulting string would exceed the maximal length that can be stored in a length character. It must also ensure that enough memory is available to store the resulting string.
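One way an application might guard against length overflow is sketched below. The names `MAX-COUNT` and `+PLACE-CLIPPED` are hypothetical and not part of the proposal, and the limit of 255 assumes byte-sized length characters. The sketch silently drops whatever does not fit; it does not check the size of the destination buffer, which remains the caller's responsibility.

```forth
\ Reference implementations from this proposal, repeated so the sketch is self-contained.
: PLACE  ( c-addr1 u c-addr2 -- )   2DUP C!  CHAR+ SWAP MOVE ;
: +PLACE ( c-addr1 u1 c-addr2 -- )  DUP COUNT + >R  2DUP C@ + SWAP C!  R> SWAP MOVE ;

255 CONSTANT MAX-COUNT   \ assumption: byte-sized length character

\ Hypothetical helper: append only as many characters as the length
\ character can still account for; the rest of the source is dropped.
: +PLACE-CLIPPED ( c-addr1 u1 c-addr2 -- )
   >R                    \ R: c-addr2
   MAX-COUNT R@ C@ -     \ c-addr1 u1 room
   MIN                   \ c-addr1 u1'   with u1' = min(u1, room)
   R> +PLACE ;

CREATE line  MAX-COUNT 1+ CHARS ALLOT   \ length character plus 255 characters

: FILL-UP ( -- )  30 0 DO  S" 0123456789" line +PLACE-CLIPPED  LOOP ;

S" 0123456789" line PLACE
FILL-UP
line C@ .   \ prints 255: the string stopped growing at the limit
```

An application that must not lose data could instead signal an error when `u1` exceeds the remaining room; truncation is chosen here only to keep the sketch short.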
Note that the proposal adds <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> to the so far empty <em>string extensions</em> word set. A compliant Forth system is free to provide or not to provide these words individually. If it chooses to provide them, they must have the above behaviour. A system could provide the words by making the reference implementation available in source form or by defining an appropriate alias for a word it already provides.
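The alias route could look like the following sketch. It assumes, and this has to be verified for each system, that the host already provides a word `APPEND` with the stack effect `( c-addr1 u1 c-addr2 -- )` and exactly the behaviour specified for `+PLACE`. `SYNONYM`, `[DEFINED]` and `[UNDEFINED]` are taken from the Forth-2012 Programming-Tools extension word set.

```forth
\ Assumption: the host system's APPEND has the stack effect
\ ( c-addr1 u1 c-addr2 -- ) and the behaviour specified for +PLACE.
[DEFINED] APPEND  [UNDEFINED] +PLACE  AND [IF]
   SYNONYM +PLACE APPEND
[THEN]
```

On a system without `APPEND`, or one that already defines `+PLACE`, the fragment does nothing.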
Typical Use
S" The quick brown " PAD PLACE
S" fox jumps over " PAD +PLACE
S" the lazy dog." PAD +PLACE
PAD COUNT TYPE ( The quick brown fox jumps over the lazy dog.)
Reference Implementation
<strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> can be implemented in Forth-94 as follows:
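: PLACE ( c-addr1 u c-addr2 -- )
   2DUP C!            \ store u as the length character at c-addr2
   CHAR+ SWAP MOVE ;  \ copy u characters to c-addr2+1
: +PLACE ( c-addr1 u1 c-addr2 -- )
   DUP COUNT + >R     \ R: address just past the existing string
   2DUP C@ + SWAP C!  \ store u1+u2 as the new length character
   R> SWAP MOVE ;     \ copy u1 characters to the end of the old string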
Of course systems are expected to provide optimized implementations.
Testing
The following tests check the basic functionality of <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong>:
CREATE buf 64 ALLOT
T{ S" ABC" buf PLACE buf COUNT S" ABC" COMPARE -> 0 }T
T{ S" ABC" buf PLACE buf C@ -> 3 }T
T{ S" ABC" buf PLACE
S" DEF" buf +PLACE buf COUNT S" ABCDEF" COMPARE -> 0 }T
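The specification treats a length of zero as a special case, so a few additional edge-case tests may be worthwhile. The following is only a sketch: it assumes that `T{`, `->` and `}T` from the Forth-2012 test harness are loaded, it repeats the reference implementation so that it runs on its own, and the buffer name `ebuf` is made up for the example.

```forth
\ Reference implementations from this proposal, repeated so the sketch is self-contained.
: PLACE  ( c-addr1 u c-addr2 -- )   2DUP C!  CHAR+ SWAP MOVE ;
: +PLACE ( c-addr1 u1 c-addr2 -- )  DUP COUNT + >R  2DUP C@ + SWAP C!  R> SWAP MOVE ;

CREATE ebuf 64 ALLOT

\ Placing an empty string leaves a length character of zero.
T{ S" " ebuf PLACE  ebuf C@ -> 0 }T

\ Appending an empty string changes nothing.
T{ S" ABC" ebuf PLACE  S" " ebuf +PLACE  ebuf COUNT S" ABC" COMPARE -> 0 }T

\ Appending to an empty counted string behaves like PLACE.
T{ S" " ebuf PLACE  S" XYZ" ebuf +PLACE  ebuf COUNT S" XYZ" COMPARE -> 0 }T

\ Placing a shorter string replaces the old length character.
T{ S" ABCDEF" ebuf PLACE  S" AB" ebuf PLACE  ebuf C@ -> 2 }T
```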
Experience
<strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> have been implemented in numerous systems with the proposed functionality. <strong><code>+PLACE</code></strong> is sometimes called <strong><code>APPEND</code></strong>. A non-exhaustive list is:
<table>
<tr> <td>System</td> <td>PLACE</td> <td>+PLACE</td> </tr>
<tr> <td>gforth</td> <td>✓</td> <td>✓</td> </tr>
<tr> <td>SwiftForth</td> <td>✓</td> <td>provided as APPEND</td> </tr>
<tr> <td>VFX Forth</td> <td>✓</td> <td>provided as APPEND in ToolBelt.fth</td> </tr>
<tr> <td>Win32Forth</td> <td>✓</td> <td>✓</td> </tr>
<tr> <td>4th</td> <td>✓</td> <td>✓</td> </tr>
<tr> <td>PFE</td> <td>✓</td> <td>✓</td> </tr>
<tr> <td>F-PC</td> <td>✓</td> <td>✓</td> </tr>
<tr> <td>F83</td> <td>✓</td> <td></td> </tr>
<tr> <td>volksForth-83</td> <td>✓</td> <td></td> </tr>
<tr> <td>DXForth</td> <td>✓</td> <td></td> </tr>
</table>
Discussion
<strong>Why standardize <code>PLACE</code> and <code>+PLACE</code>?</strong>
Standard programs that deal with counted strings often also make use of <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong>. As these words are not known to be provided by the underlying system, the standard program has to bring its own, possibly less efficient, implementation with it. This can be done with a prelude that uses the above reference implementation.
Note that the typical phrase
[UNDEFINED] PLACE [IF]
«reference implementation for PLACE»
[THEN]
[UNDEFINED] +PLACE [IF]
«reference implementation for +PLACE»
[THEN]
is not sufficient, as the underlying system might provide <strong><code>PLACE</code></strong> or <strong><code>+PLACE</code></strong> <strong>but with a different behaviour</strong>, rendering the rest of the standard program invalid.
If <strong><code>PLACE</code></strong> and <strong><code>+PLACE</code></strong> are standardized, then a compliant system (if it decides to provide them) has to define them with exactly the proposed behaviour. The above phrase would then be sufficient, and standard programs could leverage efficient system-provided implementations.
Restriction on the Length of Counted Strings
Forth-94 and Forth-2012 do not require characters to be of byte size and so allow for systems with larger character sizes and, with them, counted strings whose maximal length exceeds 255 characters. For this reason both standards use the term “length character” and not “count byte”.
Since then, in 2016, the Forth-200x committee, in favour of eliminating ambiguous conditions, has decided to require “1 CHARS = 1”, thus making systems with character sizes other than one non-compliant with future Forth-200x standards [2][3]. Requiring standard systems to have byte-sized characters limits counted strings to the familiar maximal length of 255 characters.
References
[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt
[2] http://www.forth200x.org/rfds.html
[3] http://www.forth200x.org/char-is-1.html
Author
Hans Bezemer <thebeez@xs4all.nl>
Ulrich Hoffmann <uho@xlerb.de>
Notes
[^1]: If the reader is aware of the detailed history of <strong><code>+PLACE</code></strong> aka <strong><code>APPEND</code></strong>, the authors would be happy to learn about that.