Proposal: PLACE +PLACE

Formal

This page is dedicated to discussing this specific proposal


UlrichHoffmann [206] PLACE +PLACE (Proposal) 2021-07-30 11:19:29

PLACE +PLACE request for discussion

Standardizing common use

Change History

2021-07-28 revising historical description, correcting typos
2021-07-26 adding history
2021-07-22 initial version

Problem

Traditional Forth represents strings in memory as so-called counted strings: a leading count byte giving the length of the string (0 to 255), followed by the characters of the string. No character value plays a special role (in contrast to zero-terminated strings, where the character NUL terminates the string). The limitation to a single count byte, and thus to a maximal string length of 255 characters, imposes no restriction on a broad range of applications, especially in embedded systems. Of course, there are applications that require processing longer strings. The programmer should then use another string representation (such as c-addr len) without the length restriction of counted strings.

The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.

The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND, and COUNT.

The Informational Annex of Forth-94 says about counted strings in A.3.1.3.4:

Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.

What is missing is an easy way to place a sequence of characters into memory using the counted string representation, and to append a sequence of characters to an already existing counted string in memory.

Solution

The Forth community already has a solution for placing strings as counted strings into memory and appending characters to counted strings by means of the words `PLACE` and `+PLACE`. `PLACE` appeared around 1984, e.g. in F83. `+PLACE` was proposed by Wil Baden in his Tool Belt Extensions (latest version at [1], also under the name `APPEND`)[^1]. Since then many systems have adopted `PLACE` and `+PLACE` with exactly the same behaviour.

So, we are proposing `PLACE` and `+PLACE` here for standardization.

Proposal

Please add the following specifications to the string extension word set.

PLACE STRING EXT

              ( c-addr1 u c-addr2 -- )

Store u as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u is greater than zero, copy u consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+1, proceeding character-by-character from lower addresses to higher addresses.

+PLACE "plus place" STRING EXT

              ( c-addr1 u1 c-addr2 -- )

Read the length character at c-addr2 as u2.

Store u1+u2 as a count character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.

end of proposal

Rationale

PLACE places the string identified by c-addr1 u as a counted string at c-addr2 managing the length character. Essentially PLACE converts from c-addr len representation to counted string representation.

PLACE complements COUNT which essentially converts counted strings in memory to their c-addr u representations.
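As a minimal sketch of this round trip, assuming 8-bit characters: the buffer name `demo` is hypothetical, and the guarded definition repeats the revised reference implementation from the 2021-09-08 version of this proposal for systems that lack `PLACE`.

```forth
[UNDEFINED] PLACE [IF]
: PLACE ( c-addr1 u c-addr2 -- )
    2DUP 2>R  CHAR+ SWAP CHARS MOVE  2R> C! ;
[THEN]

CREATE demo 16 CHARS ALLOT      \ hypothetical scratch buffer

S" ABC" demo PLACE              \ c-addr u  ->  counted string in memory
demo C@ .                       \ length character: prints 3
demo CHAR+ C@ EMIT              \ first character:  prints A
demo COUNT TYPE                 \ counted string -> c-addr u: prints ABC
```

After `PLACE`, the buffer holds the length character 3 followed by the characters A, B and C; `COUNT` turns the single address back into the c-addr u pair.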

+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2) managing the updated string length in the length character at c-addr2.

The application must take care of any string overflow situation where the length of the resulting string would exceed the maximal length that can be stored in a length character. Also it has to assure that the memory is available to store the resulting string.
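One way an application might perform such a check is sketched below. The word `?+PLACE` and the buffer `line` are hypothetical names, not part of this proposal; the sketch assumes 8-bit length characters (maximum 255) and a target buffer of at least 256 characters, and it falls back to the reference implementation of `+PLACE` where the system does not provide one.

```forth
[UNDEFINED] +PLACE [IF]
: +PLACE ( c-addr1 u1 c-addr2 -- )
    DUP COUNT + >R  2DUP C@ + SWAP C!  R> SWAP MOVE ;
[THEN]

\ Hypothetical helper: append only if the new length still fits into
\ an 8-bit length character. flag is true if the append took place.
: ?+PLACE ( c-addr1 u1 c-addr2 -- flag )
    2DUP C@ +  255 U>                  \ would u1+u2 exceed 255?
    IF  DROP 2DROP FALSE  EXIT  THEN   \ yes: leave the target untouched
    +PLACE TRUE ;

CREATE line 256 CHARS ALLOT            \ hypothetical buffer: 1 length char + 255 chars
0 line C!                              \ start with an empty counted string
S" abc" line ?+PLACE .                 \ prints -1 (appended)
```

The check only guards the length character; ensuring that the buffer itself is large enough remains the caller's job.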

Note that the proposal adds `PLACE` and `+PLACE` to the so far empty string extension word set. A compliant Forth system is free to provide or not to provide these words individually. If it chooses to provide them, they must have the above behaviour. Providing the words could be done by making the reference implementation available in source form or by means of an appropriate alias definition of an already provided word.

Typical Use

    S" The quick brown"  PAD PLACE
    S"  fox jumps over" PAD +PLACE
    S"  the lazy dog." PAD +PLACE

    PAD COUNT TYPE  ( The quick brown fox jumps over the lazy dog.)

Reference Implementation

`PLACE` and `+PLACE` can be implemented in Forth-94 as follows:

    : PLACE ( c-addr1 u c-addr2 -- )
         2DUP C!  CHAR+ SWAP MOVE ;

    : +PLACE ( c-addr1 u c-addr2 -- )
         DUP COUNT + >R    2DUP C@ + SWAP C!  R> SWAP MOVE ;

Of course systems are expected to provide optimized implementations.

Testing

The following tests assure basic functionality of `PLACE` and `+PLACE`:

    CREATE buf 64 ALLOT

    T{ S" ABC" buf PLACE  buf COUNT S" ABC" COMPARE -> 0 }T

    T{ S" ABC" buf PLACE  buf C@ -> 3 }T

    T{ S" ABC" buf PLACE  
       S" DEF" buf +PLACE  buf COUNT S" ABCDEF" COMPARE -> 0 }T

Experience

`PLACE` and `+PLACE` have been implemented in numerous systems with the proposed functionality. `+PLACE` is sometimes called `APPEND`. A non-exhaustive list is:

<table>
<tr> <td>System</td> <td>PLACE</td> <td>+PLACE</td> </tr>
<tr> <td>gforth</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>SwiftForth</td> <td>yes</td> <td>provided as APPEND</td> </tr>
<tr> <td>VFX Forth</td> <td>yes</td> <td>provided as APPEND in ToolBelt.fth</td> </tr>
<tr> <td>Win32Forth</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>4th</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>PFE</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>F-PC</td> <td>yes</td> <td>yes</td> </tr>
<tr> <td>F83</td> <td>yes</td> <td></td> </tr>
<tr> <td>volksForth-83</td> <td>yes</td> <td></td> </tr>
<tr> <td>DXForth</td> <td>yes</td> <td></td> </tr>
</table>

Discussion

Why standardize `PLACE` and `+PLACE`?

Standard programs that deal with counted strings often also make use of `PLACE` and `+PLACE`. As these words are not known to be provided by the underlying system, the standard program has to bring along its own, possibly less efficient, implementation. This can be done with a prelude that uses the above reference implementation.

Note that the typical phrase

[UNDEFINED] PLACE [IF]

    «reference implementation for PLACE»

[THEN]

[UNDEFINED] +PLACE [IF]

    «reference implementation for +PLACE»

[THEN]

is not sufficient as the underlying system might provide `PLACE` or `+PLACE`, but with a different behaviour, rendering the rest of the standard program invalid.

If `PLACE` and `+PLACE` are standardized, then a compliant system (if it decides to provide them) has to define them with the exact proposed behaviour. The above phrase would be sufficient and standard programs can leverage efficient system provided implementations.
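For illustration, such a prelude might look like the following sketch once the placeholders are filled in. It uses the `+PLACE` reference implementation given above and the revised `PLACE` definition from the 2021-09-08 version of this proposal; `[UNDEFINED]` is assumed to be available (Forth-2012 TOOLS EXT).

```forth
[UNDEFINED] PLACE [IF]
: PLACE ( c-addr1 u c-addr2 -- )
    2DUP 2>R  CHAR+ SWAP CHARS MOVE  2R> C! ;   \ copy first, then store the length
[THEN]

[UNDEFINED] +PLACE [IF]
: +PLACE ( c-addr1 u1 c-addr2 -- )
    DUP COUNT + >R                  \ remember the end of the existing string
    2DUP C@ + SWAP C!               \ store the updated length u1+u2
    R> SWAP MOVE ;                  \ append the new characters
[THEN]
```

On a system that already provides both words, the prelude compiles nothing and the program uses the system's own definitions.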

Restriction on the Length of Counted Strings

Neither Forth-94 nor Forth-2012 requires characters to be of byte size, so both allow for systems with larger character sizes and thus for counted strings with a maximal length that exceeds 255 characters. For this reason both standards use the term “length character” and not “count byte”.

Since then, in 2016, the Forth-200x committee, in favour of eliminating ambiguous conditions, decided to require “1 CHARS = 1”, thus making systems with character sizes other than one non-compliant with future Forth-200x standards [2][3]. Requiring standard systems to have byte-size characters limits counted strings to the known maximal length of 255 characters.
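The effect of the length character size can be seen in a short sketch, assuming a system with 8-bit characters; `len-char` is a hypothetical name. A length of 300 does not fit into 8 bits, so only its low-order bits survive: 300 mod 256 = 44.

```forth
CREATE len-char 1 CHARS ALLOT   \ hypothetical one-character scratch location

300 len-char C!                 \ only the low-order 8 bits are transferred
len-char C@ .                   \ prints 44 (300 - 256) with 8-bit characters
```

A counted string built with a length above 255 would therefore report a wrong length on such a system, which is why the application has to guard against overflow itself.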

References

[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt

[2] http://www.forth200x.org/rfds.html

[3] http://www.forth200x.org/char-is-1.html

Author

Hans Bezemer <thebeez@xs4all.nl>

Ulrich Hoffmann <uho@xlerb.de>

Notes

[^1]: If the reader is aware of the detailed history of `+PLACE` aka `APPEND`, the authors would be happy to learn about that.

UlrichHoffmann New Version: PLACE +PLACE [r711] 2021-07-30 13:47:38


PLACE +PLACE request for discussion

Standardizing common use

Change History

2021-07-29 improving Markdown formatting
2021-07-28 revising historical description, correcting typos
2021-07-26 adding history
2021-07-22 initial version

Problem

The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.

The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND, and COUNT.

The Informational Annex of Forth-94 says about counted strings in A.3.1.3.4:

Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.

Solution

So, we are proposing `PLACE` and `+PLACE` here for standardization.

Proposal

Please add the following specifications to the string extension word set.

PLACE STRING EXT

              ( c-addr1 u c-addr2 -- )

Store u as a length character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

+PLACE "plus place" STRING EXT

              ( c-addr1 u1 c-addr2 -- )

Read the length character at c-addr2 as u2.

Store u1+u2 as a count character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.

end of proposal

Rationale

PLACE complements COUNT which essentially converts counted strings in memory to their c-addr u representations.

+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2) managing the updated string length in the length character at c-addr2.

Typical Use

    S" The quick brown"  PAD PLACE
    S"  fox jumps over" PAD +PLACE
    S"  the lazy dog." PAD +PLACE

    PAD COUNT TYPE  ( The quick brown fox jumps over the lazy dog.)

Reference Implementation

`PLACE` and `+PLACE` can be implemented in Forth-94 as follows:

    : PLACE ( c-addr1 u c-addr2 -- )
         2DUP C!  CHAR+ SWAP MOVE ;

    : +PLACE ( c-addr1 u c-addr2 -- )
         DUP COUNT + >R    2DUP C@ + SWAP C!  R> SWAP MOVE ;

Of course systems are expected to provide optimized implementations.

Testing

The following tests assure basic functionality of `PLACE` and `+PLACE`:

    CREATE buf 64 ALLOT


    T{ S" ABC" buf PLACE  buf COUNT S" ABC" COMPARE -> 0 }T

    T{ S" ABC" buf PLACE  buf C@ -> 3 }T

    T{ S" ABC" buf PLACE  
       S" DEF" buf +PLACE  buf COUNT S" ABCDEF" COMPARE -> 0 }T

Experience

Discussion

Why standardize `PLACE` and `+PLACE`?

Note that the typical phrase

[UNDEFINED] PLACE [IF]

    «reference implementation for PLACE»

[THEN]

[UNDEFINED] +PLACE [IF]

    «reference implementation for +PLACE»

[THEN]

is not sufficient as the underlying system might provide `PLACE` or `+PLACE`, but with a different behaviour, rendering the rest of the standard program invalid.

Restriction on the Length of Counted Strings

References

[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt

[2] http://www.forth200x.org/rfds.html

[3] http://www.forth200x.org/char-is-1.html

Authors

Hans Bezemer <thebeez@xs4all.nl>

Ulrich Hoffmann <uho@xlerb.de>

Notes

[^1]: If the reader is aware of the detailed history of `+PLACE` aka `APPEND`, the authors would be happy to learn about that.

Formal

StephenPelc [r718] 2021-09-01 10:15:24

I support this proposal, but much prefer APPEND to +PLACE.

UlrichHoffmann New Version: PLACE +PLACE [r745] 2021-09-08 21:15:27


PLACE +PLACE request for discussion

Standardizing common use

Change History

2021-09-08 remove conditional phrase from PLACE definition following the committee's comments
2021-09-08 change reference implementation for PLACE to avoid the overlapping issue, code by Wil Baden
2021-07-29 improving Markdown formatting
2021-07-28 revising historical description, correcting typos
2021-07-26 adding history
2021-07-22 initial version

Problem

The advantage of counted strings is that they are identified by a single cell (the address of the count byte) on the data stack and so help minimize stack rearrangements.

The Forth-94 and Forth-2012 standards make use of counted strings and even specify core words that use the counted string representation, such as WORD, FIND, and COUNT.

The Informational Annex of Forth-94 says about counted strings in A.3.1.3.4:

Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the “_c-addr u_” representation.

Solution

So, we are proposing `PLACE` and `+PLACE` here for standardization.

Proposal

Please add the following specifications to the string extension word set.

PLACE STRING EXT

              ( c-addr1 u c-addr2 -- )

Store u as a length character at c-addr2. Only the number of low-order bits corresponding to character size are transferred.

+PLACE "plus place" STRING EXT

              ( c-addr1 u1 c-addr2 -- )

Read the length character at c-addr2 as u2.

Store u1+u2 as a count character at c-addr2. When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

If u1 is greater than zero, copy u1 consecutive characters from the data space starting at c-addr1 to that starting at c-addr2+u2+1, proceeding character-by-character from lower addresses to higher addresses.

end of proposal

Rationale

PLACE complements COUNT which essentially converts counted strings in memory to their c-addr u representations.

+PLACE appends the string identified by c-addr1 u1 to the counted string at c-addr2 (with length u2) managing the updated string length in the length character at c-addr2.

Typical Use

    S" The quick brown"  PAD PLACE
    S"  fox jumps over" PAD +PLACE
    S"  the lazy dog." PAD +PLACE

    PAD COUNT TYPE  ( The quick brown fox jumps over the lazy dog.)

Reference Implementation

`PLACE` and `+PLACE` can be implemented in Forth-94 as follows:

    : PLACE ( c-addr1 u c-addr2 -- )
         2DUP 2>R  CHAR+  SWAP CHARS MOVE  2R> C! ;

    : +PLACE ( c-addr1 u c-addr2 -- )
         DUP COUNT + >R    2DUP C@ + SWAP C!  R> SWAP MOVE ;

Of course systems are expected to provide optimized implementations.

Testing

The following tests assure basic functionality of `PLACE` and `+PLACE`:

    CREATE buf 64 ALLOT


    T{ S" ABC" buf PLACE  buf COUNT S" ABC" COMPARE -> 0 }T

    T{ S" ABC" buf PLACE  buf C@ -> 3 }T

    T{ S" ABC" buf PLACE  
       S" DEF" buf +PLACE  buf COUNT S" ABCDEF" COMPARE -> 0 }T

Experience

Discussion

Why standardize `PLACE` and `+PLACE`?

Note that the typical phrase

[UNDEFINED] PLACE [IF]

    «reference implementation for PLACE»

[THEN]

[UNDEFINED] +PLACE [IF]

    «reference implementation for +PLACE»

[THEN]

is not sufficient as the underlying system might provide `PLACE` or `+PLACE`, but with a different behaviour, rendering the rest of the standard program invalid.

Restriction on the Length of Counted Strings

References

[1] “Toolbelt”, Wil Baden/Neil Bawd, http://www.wilbaden.com/neil_bawd/tool2002.txt

[2] http://www.forth200x.org/rfds.html

[3] http://www.forth200x.org/char-is-1.html

Authors

Hans Bezemer <thebeez@xs4all.nl>

Ulrich Hoffmann <uho@xlerb.de>

Notes

[^1]: If the reader is aware of the detailed history of `+PLACE` aka `APPEND`, the authors would be happy to learn about that.

CfV - Call for votes

AntonErtl [r746] 2021-09-09 08:42:32

The specification requires a specific (and probably not useful) behaviour when the start of the target address is in [c-addr1, c-addr1+u). The reference implementation uses MOVE, which guarantees a different (and probably more useful) behaviour in that case. The two should be reconciled, and in this case I think it's better to change the specification to agree with the (more useful) reference implementation. For +PLACE there is the additional complication of whether the original or updated count should be copied if it is in [c-addr1,c-addr1+u), but at least in this case the specification and the implementation agree that the updated count is copied; Gforth also behaves that way, but SwiftForth's APPEND updates the count only afterwards.

AntonErtl [r747] 2021-09-09 11:06:55

Also, for PLACE, the specification specifies that the count byte is stored first, while the reference implementation, the Gforth implementation and the SwiftForth implementation store it afterwards. This makes a difference if c-addr2 is in [c-addr1,c-addr1+u).

ruv [r778] 2021-11-10 18:36:28

1. Problem of overflow

The "Rationale" section says:

The application must take care of any string overflow situation where the length of the resulting string would exceed the maximal length that can be stored in a length character. Also it has to assure that the memory is available to store the resulting string.

But the "Typical Use" section doesn't show how to check for overflow, especially when +PLACE is used. Could you please fill this gap?

The most robust behavior for an application in a case of overflow is to throw an exception. It seems, these words are useless when an application needs to throw an exception if a string doesn't fit a target buffer.

I think, the following words can be far more useful and less error prone (their names are placeholders):

place-counted ( c-addr1 u1 c-addr2 u2 -- )

+place-counted ( c-addr1 u1 c-addr2 u2 -- )

Where (c-addr2 u2) is the target buffer starting address and size. These words should check for overflow and throw an exception if any.

The words place and +place, in turn, just encourage bad practice, since almost nobody checks for overflow when using them, I think. So it's better not to include these words in the standard.
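A minimal sketch of what such a checked word might look like is given below. It is an illustration only: the THROW code `string-overflow` is a hypothetical application-defined value, 8-bit length characters are assumed, and `PLACE` falls back to the proposal's revised reference implementation where the system lacks it.

```forth
[UNDEFINED] PLACE [IF]
: PLACE ( c-addr1 u c-addr2 -- )
    2DUP 2>R  CHAR+ SWAP CHARS MOVE  2R> C! ;
[THEN]

1 CONSTANT string-overflow       \ hypothetical application-defined THROW code

: place-counted ( c-addr1 u1 c-addr2 u2 -- )
    2 PICK 1+ U<                 \ buffer smaller than u1+1 characters?
    2 PICK 255 U> OR             \ or u1 too large for the length character?
    IF  string-overflow THROW  THEN
    PLACE ;
```

A caller would wrap the word in CATCH to handle the overflow case, for example `S" ABC" buf 4 ' place-counted CATCH`.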

2. Specify what instead of how

I support Anton's idea that we should not specify how to copy the characters, i.e. "from lower addresses to higher addresses". In this regard, these words should be similar to MOVE rather than to CMOVE.

So it's better to specify the result and not to specify how to achieve it. The reference implementation should be corrected accordingly (i.e. store the length character after copying the characters).

3. Note concerning char size

It seems, in the subsection "Restriction on the Length of Counted Strings", “1 CHARS = 1 byte” is assumed.

But “1 CHARS = 1” means that the size of 1 character is 1 address unit (not 1 byte). For example, an address unit may be 2 bytes.

In this regard, it's unclear what the phrase "Requesting standard systems to have byte size characters limit counted strings to the known maximal length of 255 characters" means.

AntonErtl [r923] 2022-09-18 09:46:22

Here's an implementation of +PLACE that has the following nice properties:

it limits its writing to the 256-byte region starting at c-addr2
it copies the stuff that was originally at c-addr1 u1 even in the case of overlap

Maybe specifying +PLACE to have these properties is a good idea.

: +place {: c-addr1 u1 c-addr2 -- :} \ gforth-obsolete plus-place
    c-addr2 count {: c-addr u2 :}
    u2 u1 + $ff min {: u :}
    c-addr1 c-addr u u2 /string move
    u c-addr2 c! ;

Formal

AntonErtl [r924] 2022-09-18 10:08:58

On a more general note: While PLACE and maybe also +PLACE may be common practice, I think they are bad practice, for the following reasons:

They are designed to create counted strings. Counted strings may be seductive because you need to pass only one cell on the stack and store only one cell, but their length limitation means that they are not generally useful, so we need another set of words for dealing with longer strings, and we have it in the form of words that deal with c-addr u strings. But once we have a set of words for general strings, do we really want another set of words for another string representation? In the best case, these words will remain unused and just sow confusion. In the worst case, they are used, and users then suffer from their limitations. I suspect that PLACE was in more common use in 1994 than it is now, but the Forth-94 committee chose not to standardize it, probably for the reasons above. We should not standardize it, either.
These words have no way to check the length of the result buffer (admittedly, neither does MOVE), so they are a buffer overflow waiting to happen. That goes doubly for +PLACE, where it's even harder to avoid a buffer overflow. If you want to add such words, give them stack effects like ( c-addr u c-buf-addr u-buf -- ) and specify that they do not write outside [c-buf-addr,c-buf-addr+u-buf). But then the "common practice" argument no longer holds.

Concerning common practice: Gforth contains 3 uses of PLACE and 0 uses of +PLACE, compared to 45 uses of MOVE.

albert [r1131] 2023-11-07 09:46:37

I'm vehemently opposed. The first character is used as count. This is wrong. If the string is stored with a count, the count should be an int. So the practice since the 1970s is continued. If a word like PLACE is standardised, it guarantees backwardness. I applaud that standardisation is moving away from words that require "counted strings" and towards an "addr u" description. Let's continue.

ruv [r1134] 2023-11-09 18:47:36

the underlying system might provide PLACE or +PLACE, but with a different behaviour, rendering the rest of the standard program invalid.

If PLACE and +PLACE are standardized, then a compliant system (if it decides to provide them) has to define them with the exact proposed behaviour.

Actually, this problem exists for many well-known words. If we add them all to the standard, it will bloat the standard too much (not to mention the support of bad practice). Therefore, this problem should be solved in another way.

A possible solution is to support URI in include, require, etc. If the system provides an efficient implementation for a module, it maps module's URI on a local file (or an internal entity); otherwise it downloads and caches the implementation. Also, some fallback mechanism can be supported (e.g., to provide local implementation).

The specification for each module or library is maintained separately from the standard and specifications of other modules.

So a program can use:

require https://theforth.net/stdlib/cstring/place/?v=1.*

: foo ... place ... ;
: bar ... +place ... ;
