S"

Extend the semantics of 6.1.2165 S" to be:

Interpretation:

( "ccc<quote>" -- c-addr u )

Parse ccc delimited by " (double quote). Store the resulting string in a transient buffer c-addr u.

Compilation:

( "ccc<quote>" -- )

Parse ccc delimited by " (double quote). Append the run-time semantics given below to the current definition.

Run-time:

( -- c-addr u )

Return c-addr and u that describe a string consisting of the characters ccc.

See:

Rationale:

Typical use: ... S" ccc" ...

The interpretation semantics for S" are intended to provide a simple mechanism for entering a string in the interpretation state. Since an implementation may choose to provide only one buffer for interpreted strings, an interpreted string is subject to being overwritten by the next execution of S" in interpretation state. It is intended that no standard words other than S" should in themselves cause the interpreted string to be overwritten. However, since words such as EVALUATE, LOAD, INCLUDE-FILE and INCLUDED can result in the interpretation of arbitrary text, possibly including instances of S", the interpreted string may be invalidated by some uses of these words.

When the possibility of overwriting a string can arise, it is prudent to copy the string to a "safe" buffer allocated by the application.
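The advice above can be sketched as follows: an interpreted string is copied into a buffer owned by the application. The names safe-name, #safe-name, save-name and saved-name, the 80-character capacity and the file name config.fs are all hypothetical. In this sketch, strings longer than the buffer are silently truncated.

```forth
\ Hypothetical application-owned buffer (80 characters) and its length.
CREATE safe-name  80 CHARS ALLOT
VARIABLE #safe-name

: save-name ( c-addr u -- )        \ copy a possibly transient string
   80 MIN  DUP #safe-name !        \ truncate to the buffer size, remember length
   safe-name SWAP CHARS MOVE ;     \ copy the characters into safe-name

: saved-name ( -- c-addr u )  safe-name #safe-name @ ;

S" config.fs" save-name            \ this copy survives later uses of S"
```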

Testing:

T{ S" A String"2DROP -> }T    \ There is no space between the " and 2DROP

Contributions

UlrichHoffmann: S( "Request for Discussion" (revised 2018-08-16), Proposal, 2018-08-17 16:27:53

S( "Request for Discussion"

Change History

2018-08-16 Improve with additional explanation, rewording
2018-07-09 First version

Problem

There have been extensive discussions about correct implementations (cf. [Ertl98], [Pelc17], [clf18]) of Forth words that have both

  • an interpretation semantics and
  • a compilation semantics that is different from adding the word's execution semantics to the current definition (sometimes called non-default compilation semantics, NDCS).

The Forth-94 word S" is an example of this, as it has defined compilation semantics (and explicitly undefined interpretation semantics) in the CORE word set and defined interpretation semantics in the FILE word set (cf. [Forth-94]).

Words like this can have the so-called copy&paste property: program phrases that have been tested in interpretation mode can be copied unchanged into definitions and continue to work there in a seemingly identical way. This is attractive to many developers.

The desire to have words with diverging compilation and interpretation semantics led many system implementors to use state-smart immediate words (which inspect the variable STATE to distinguish between compilation and interpretation), but these have the drawback of failing unexpectedly in corner cases [Ertl98].

Identifying whether or not a word is state-smart typically requires studying its implementation or documentation. Problems arise when state-smart words are used as building blocks for more sophisticated words, e.g. by POSTPONE-ing them. The distinction between interpretation and compilation state is then also postponed and happens at inappropriate times.
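The corner case can be made concrete with a sketch. SS" below is a hypothetical state-smart string word (named so that it does not clash with the system's S"), and string-literal, is a hypothetical helper built on it with POSTPONE; SLITERAL comes from the optional STRING word set. Both behave as expected in the obvious cases, but the helper misbehaves when it is executed in interpretation state, for example from inside [ ... ].

```forth
CREATE ss-buf  80 CHARS ALLOT      \ single transient buffer for SS"

: SS" ( "ccc<quote>" -- c-addr u )  \ compiling: ( "ccc<quote>" -- )
   [CHAR] " PARSE
   STATE @ IF  POSTPONE SLITERAL  EXIT  THEN      \ compiling: compile a literal
   80 MIN >R  ss-buf R@ CHARS MOVE  ss-buf R> ;  \ interpreting: copy to ss-buf
IMMEDIATE

: greeting ( -- c-addr u )  SS" hello" ;   \ works: STATE is true here
SS" hello" TYPE                            \ works: STATE is false here

\ A factor meant to perform the compilation semantics of SS":
: string-literal, ( "ccc<quote>" -- )  POSTPONE SS" ;

\ Called inside [ ... ], the STATE test in SS" sees interpretation state:
\ the string is copied to ss-buf and c-addr u are left on the data stack
\ at compile time instead of being compiled (system-dependent fallout).
\ : broken ( -- c-addr u )  [ string-literal, hello" ] ;
```

Under the standard's definition of POSTPONE, the same helper written with an S" that has genuine non-default compilation semantics would perform those compilation semantics whenever it runs, independent of STATE.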

There are ways to structure the outer interpreter so that words with non-default compilation semantics can be implemented without failures in corner cases. Systems such as the Gforth development version and VFX version 5 deal with this issue (cf. [Pelc17], [clf18]).

In the Forth-94 standard only a few words actually have non-default compilation semantics, namely the already mentioned S" and TO. Forth-2012 adds others, such as S\", IS and ACTION-OF.

In general, Forth-94 and Forth-2012 define execution semantics for normal words. When such a word is interpreted, its execution semantics is performed; when it is compiled, its execution semantics is appended to the current definition (the default compilation semantics). For most compiling words the standards explicitly leave the interpretation semantics undefined and define only compilation semantics, which may include appending separately defined run-time semantics to the current definition. Colloquially these words are called compile-only. The standards also explicitly define immediate words: words whose interpretation and compilation semantics are identical, both performing the word's execution semantics. Finally, there are special words with non-default compilation semantics (NDCS), whose interpretation and compilation semantics diverge.

The following table summarizes this situation:


| word kind | interpretation semantics | compilation semantics | example | comment |
|---|---|---|---|---|
| normal | perform execution semantics | compile execution semantics | DUP | normal :-definitions |
| immediate | perform execution semantics | perform execution semantics | .( | IMMEDIATE definitions |
| compile-only | undefined | perform execution semantics | IF | interpretation semantics undefined |
| NDCS | perform execution semantics for interpretation | perform execution semantics for compilation | S" | divergent interpretation and compilation semantics |

S" and TO seem to be special cases: for other words such as ' (tick), CHAR, or .( (dot-paren), the standards define separate compiling counterparts, namely ['] (bracket-tick), [CHAR] (bracket-char), and ." (dot-quote), for use in compilation state inside definitions.

Some argue that complicating the outer interpreter or dictionary design is not worthwhile, especially on small memory-restricted systems, and either live with the failures of state-smart words in corner cases or avoid implementing an interpretive S" in order to stay standard-compliant.

Solution

For small systems it might be reasonable and simpler not to implement an S" with non-default compilation semantics as defined in the FILE word set, but instead to define two differently named words that capture the compilation semantics and the interpretation semantics of FILE S", respectively.

Possible naming choices could be

  • S" (s-quote) for interpretation and
    [S"] (bracket-s-quote-bracket) for compilation
    used in the form S" hello" and : xxx ... [S"] hello" ... ; or

  • S" (s-quote) for interpretation and
    [S" (bracket-s-quote) for compilation
    used in the form S" hello" and : xxx ... [S" hello"] ... ;

neither of which is really appealing.

Instead, analogous to ." for string output inside definitions and .( for string output outside definitions, we propose to keep CORE S" with its current behaviour and to standardize S( with the interpretation semantics of FILE S", but using ) (right parenthesis) as the delimiter.


Proposal

Add the word S( to the CORE Extension Word Set (CORE EXT):


S( "s-paren" CORE EXT

  • Interpretation: Perform the execution semantics given below.

  • Compilation: Perform the execution semantics given below.

  • Execution: ( "ccc<paren>" -- c-addr u )
    Parse ccc delimited by ) (right parenthesis). Store the resulting string in a transient buffer c-addr u.

    S( is an immediate word.

See: Parsing, S"


Typical Use

S( would typically be used interactively when temporary strings are required, for example as arguments to INCLUDED:

S( s-paren.fs) INCLUDED

Remarks

As S( is proposed for the CORE extension word set, no standard system is required to provide it. However, if a system chooses to implement the compilation and interpretation semantics of S" as two separately named words, it could use the standard name S" and the (not yet standardized) name S( for them.

Defining the interpretation semantics explicitly in the glossary entry above is not strictly necessary, as both Forth-94 and Forth-2012 state:

Unless otherwise specified in an “Interpretation:” section of the glossary entry, the interpretation semantics of a Forth definition are its execution semantics.

Reference implementation

With only a single string buffer, a minimal S(-implementation could look like this:

DECIMAL
CREATE buf  80 CHARS ALLOT   \ the single transient string buffer

: S( ( "ccc<paren>" -- c-addr len )   \ parse, truncate, copy into buf
    [CHAR] ) PARSE  80 MIN >R  buf R@ CHARS MOVE  buf R> ; IMMEDIATE

A more elaborate implementation using multiple string buffers in a circular fashion is:

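DECIMAL 80 CONSTANT |buf|       \ characters per buffer
         4 CONSTANT #bufs       \ number of buffers used in rotation

CREATE bufs  |buf| #bufs * CHARS ALLOT
VARIABLE buf#  0 buf# !         \ index of the current buffer

: buf ( -- c-addr )             \ start of the current buffer
   bufs  buf# @ |buf| * CHARS + ;

: bump ( -- )                   \ advance to the next buffer, wrapping around
   buf# @  1+  #bufs MOD  buf# ! ;

: str ( char "ccc<char>" -- c-addr u )   \ parse up to char, copy to next buffer
   bump   buf SWAP PARSE  |buf| MIN >R  OVER R@ CHARS MOVE  R> ;

: S( ( "ccc<paren>" -- c-addr len )
   [CHAR] ) str ; IMMEDIATE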

Testing

The following tests check that S( returns the expected c-addr u:

CREATE s   3 C, CHAR a C, CHAR b C, CHAR c C,

T{ 99 S( abc) SWAP DROP -> 99 3 }T
T{ 99 S( abc) s COUNT COMPARE -> 99 0 }T

Experience

S( is not yet defined in any of these contemporary systems:

  • Gforth
  • VFX
  • PFE
  • DXForth
  • FLT
  • SwiftForth
  • Win32Forth
  • noForth
  • amForth
  • camelForth
  • ciForth
  • mecrisp

so it is not in common use. However, the name S( seems to be available in all of these systems.

Discussion

The proposal avoids issues with the NDCS word S" by providing S( as an alternative notation for an interpretive S".

It is intended for resource-restricted standard systems that want to support interpretive strings but cannot provide the FILE word set S".

S( is very simple to implement, so this proposal is about standardizing the name S( for the intended functionality rather than about a sophisticated feature.

One could argue for removing S" from the FILE word set; however, this is not proposed here. Forth systems that provide the FILE word set are hopefully capable of providing a complete and correct S" implementation.

References

[Forth-94]: "American National Standard for Information Systems — Programming Languages — Forth", ANSI X3.215-1994

[clf18]: discussion about special words in comp.lang.forth, https://groups.google.com/forum/#!topic/comp.lang.forth/Gb9Hvj3Wm_Y%5B1-25%5D

[Ertl98]: "State-smartness - Why it is Evil and How to Exorcise it", Anton Ertl, euroForth 1998

[Pelc17]: "Special Words in Forth", Stephen Pelc, euroForth 2017

Author

Ulrich Hoffmann uho@xlerb.de

ruv 2018-08-18 08:23:33

Perhaps it makes sense to mention that the closing bracket ')' must be on the same line? Also, why not mention the case when the current parse does not contain ')' at all?

And it obviously needs to be mentioned that the transient buffer should not be smaller than the input buffer.

ruv 2018-08-19 09:34:36

Correction for the above message:

the current parse

should be read as

the current parse area

StephenPelc 2018-08-20 10:52:29

It is a mistake to have the category "compile-only". Such words do not have defined interpretation actions, but may have a non-standard system-specific action. There are plenty of Forth systems with interpreted versions of IF ... THEN and DO ... LOOP. Long may they continue. Now that we know how to implement such words in a standard-compliant way, we should be encouraging emergent behaviour rather than denying it by claiming something that the standard(s) do not say.

Perhaps we should hold back on the separation of compilation and interpretation words (e.g. S" and [S"]) until the NDCS discussions have stabilised. I already regret proposing to obsolete [COMPILE] because [COMPILE] is useful, e.g. inside POSTPONE, and I made the proposal before I had completed the investigation that led to the NDCS paper.

BerndPaysan 2018-08-21 14:13:45

What might be useful is a common word that puts a string into the system string buffer. I'd name it >STRING-BUF or similar. It takes a string from a parse area or some other transient area and moves it to the system string buffer, which holds at least two strings (as specified in FILE S" and S\").

Systems usually have such a word (it's a reasonable factor), but not under a common name.
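A sketch of such a factor, assuming the name >STRING-BUF suggested above and a hypothetical stack effect ( c-addr1 u1 -- c-addr2 u2 ). It alternates between two 80-character buffers, matching the "at least two strings" mentioned above, and silently truncates longer strings; the buffer names are likewise hypothetical. With it, S( reduces to a one-line definition.

```forth
80 CONSTANT /sbuf                  \ characters per buffer (hypothetical)
CREATE sbufs  /sbuf 2 * CHARS ALLOT
VARIABLE sbuf#  0 sbuf# !          \ index of the buffer used last

: >STRING-BUF ( c-addr1 u1 -- c-addr2 u2 )
   sbuf# @ 1+ 1 AND  DUP sbuf# !   \ switch to the other buffer
   /sbuf CHARS * sbufs +           \ ( c-addr1 u1 c-addr2 )
   SWAP /sbuf MIN                  \ ( c-addr1 c-addr2 u2 ) truncate
   2DUP 2>R  CHARS MOVE  2R> ;     \ copy, return the buffered string

: S( ( "ccc<paren>" -- c-addr u )  [CHAR] ) PARSE >STRING-BUF ; IMMEDIATE

S( a.fs) S( b.fs)  2SWAP TYPE SPACE TYPE   \ prints: a.fs b.fs
```

The last line shows that the two most recent strings stay valid at the same time; a third S( would overwrite the first.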

BerndPaysan 2018-08-21 18:51:42

Words found so far that help to implement words like S(, S" (interactive version), or S\":

  • Gforth: SAVE-MEM
  • bigForth: >SSTRINGBUF (hidden, buried word)
  • VFX: >SYSPAD
  • SwiftForth: >QPAD COUNT
  • Win32Forth: uses $NEW to get a new buffer and PLACE to put the string there; the only Forth that doesn't follow the common-sense factoring here.

We are pretty good at burying our tools.

AntonErtl 2018-08-22 10:31:33

Stephen Pelc is an example of the position that values copy-pasteability higher than the implementation simplicity of having only normal and immediate words. The present proposal is an example of how the opposite position approaches the problem. Both positions have put their stamps on Forth-94 and Forth-2012: the copy-paste position gave us S" TO IS ACTION-OF S\", while the implementation-simplicity position gave us ' ['] ." .( CHAR [CHAR].

There is a third approach: Avoid using parsing words for these purposes; instead, use recognizers. This approach has also found a way into the Forth standards and given us integer literals like 123, doubles such as 123., and fp literals such as 123e, and, in Forth-2012, character literals such as 'A'. For strings, several systems implement a recognizer for "abc def" (some using the recognizer proposal, others using system-specific ways).

I think we should first find consensus over which approach we prefer. If there is consensus that pairs of parsing words are the way to go, then we should go ahead with this proposal (and introduce similar ones for TO etc., and proposals to replace the existing recognizers with pairs of words); if there is no such consensus (and I don't think there is), I don't see a point in further pursuing this proposal.
