6.1.2450 WORD CORE

( char "<chars>ccc<char>" -- c-addr )

Skip leading delimiters. Parse characters ccc delimited by char. An ambiguous condition exists if the length of the parsed string is greater than the implementation-defined length of a counted string.

c-addr is the address of a transient region containing the parsed word as a counted string. If the parse area was empty or contained no characters other than the delimiter, the resulting string has a zero length. A program may replace characters within the string.

See:

Rationale:

Typical use: char WORD ccc<char>
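
A minimal sketch of the typical use, assuming a system where WORD is available. The names NEXT-LEN and SHOW-WORD are hypothetical helpers introduced only for illustration. Both consume the counted string before control returns to the text interpreter, so they do not rely on how long the transient region stays valid.

```forth
\ Hypothetical helper: length of the next blank-delimited word.
: NEXT-LEN ( "<spaces>name" -- u )
   BL WORD     \ c-addr: counted string in the transient region
   C@ ;        \ the first character of a counted string holds its length

\ Hypothetical helper: display the next blank-delimited word.
: SHOW-WORD ( "<spaces>name" -- )
   BL WORD COUNT TYPE ;   \ COUNT converts c-addr to c-addr+1 u

NEXT-LEN hello .    \ parses "hello" and leaves 5, which . displays
SHOW-WORD hello     \ parses and displays "hello"
```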

Testing:

: GS3 WORD COUNT SWAP C@ ;
T{ BL GS3 HELLO -> 5 CHAR H }T
T{ CHAR " GS3 GOODBYE" -> 7 CHAR G }T
T{ BL GS3 
   DROP -> 0 }T
\ Blank lines return zero-length strings

Contributions

[315] WORD and the text interpreter (Request for clarification)

AntonErtl, 2023-11-27 18:02:20

In traditional implementations, the text interpreter uses WORD and thus clobbers the buffer used by word. This can be seen with the following test:

: ctype count type ; cr bl word uno ctype

If the text interpreter does not clobber the word buffer, this test outputs "uno"; if the text interpreter uses the WORD buffer, it outputs "ctype". Here are the results for different Forth systems:

output system
uno    Gforth 0.7.3, Copyright (C) 1995-2008 ...
ctype  iForth-5.1-mini
uno    lxf 1.6-982-823 Compiled on 2017-12-03
ctype  SwiftForth x64-Linux 4.0.0-RC52 20-Sep-2022
uno    VFX Forth 64 5.11 RC2 [build 0112] 2021-05-02 for Linux x64

So two systems clobber the WORD buffer in the text interpreter (as is traditional).

The reason for this request is that I fail to find any hint in the standard that the WORD buffer may be clobbered by parsing by the text interpreter. An obvious place would be 3.3.3.6, and it does mention certain circumstances in which the contents of the WORD buffer may become invalid, but these circumstances do not include parsing by the text interpreter. A less obvious place would be 3.4.1, but I don't find any such provision there, either. If the standard contains such a provision, it is well hidden and that should be fixed.

If the standard does not contain such a provision, there are two options:

  1. Fix the systems to avoid clobbering the WORD buffer in the text interpreter.
  2. Change the standard to allow clobbering the WORD buffer by parsing in the text interpreter.

Given that I have seen confused questions by users over the clobbering behaviour by some systems several times, I would prefer option 1. If you prefer option 2, make a proposal for such a change.
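
Independently of which option is chosen, a program can avoid depending on either behaviour by copying the counted string out of the transient region before control returns to the text interpreter. The following is only a sketch: WORD-COPY and SAVE-WORD are hypothetical names, and the 256-character buffer assumes that counted strings on the system are at most 255 characters long.

```forth
CREATE WORD-COPY 256 CHARS ALLOT   \ hypothetical private buffer (assumes length <= 255)

\ Like WORD, but returns the address of a private copy of the counted string.
: SAVE-WORD ( char "<chars>ccc<char>" -- c-addr )
   WORD                          \ c-addr of the transient counted string
   DUP C@ 1+                     \ c-addr n: length plus the count character
   WORD-COPY SWAP CHARS MOVE     \ copy count and characters to WORD-COPY
   WORD-COPY ;

: CTYPE ( c-addr -- ) COUNT TYPE ;
CR BL SAVE-WORD uno CTYPE        \ displays "uno" whether or not the system clobbers the WORD buffer
```

The copy happens inside SAVE-WORD, so parsing CTYPE afterwards can no longer affect the string that CTYPE receives.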

ruv

I think the initial intention was that the Forth text interpreter may itself use word (and find, as well as the pictured numeric output string buffer). These means were standardized not as separate facilities for programs, but as facilities that Forth systems already used themselves. So it is simply an oversight that the corresponding restriction is not mentioned normatively.

Another argument is that a user might implement a user-defined text interpreter using word and find (or the Recognizer API), and this must not make the system non-standard.

> Given that I have seen confused questions by users over the clobbering behaviour by some systems several times,

This means that users still use word. They can then use word in their own text interpreter, so they should be warned.

AntonErtl

I agree that this is an oversight in (and since) Forth-94. I think that we can therefore remove the guarantee that the text interpreter (and, I guess, some parsing words) does not clobber the WORD buffer without making this guarantee obsolescent for one version of the standard. The question is whether we want that.

As for users writing their own text interpreter and the result still being a standard system:

  • When dealing with the "clarify FIND" proposal, it turned out that there is no consensus that the Forth standard should support writing a user-defined text interpreter, and that there is no common practice that would allow that.

  • There is currently no way to plug a user-defined text interpreter into the system, so your user-defined text interpreter will not change how the system parses.

Yes, users use WORD, but not many use it for writing a text interpreter, and if they do (e.g., in Bernd Paysan's OOF), the result does not change the system, and therefore the WORD-using user-defined text interpreter does not make the system non-standard.

ruv

> As for users writing their own text interpreter and the result still being a standard system:
>
>   • there is no consensus that the Forth standard should support writing a user-defined text interpreter,

If the standard specifies the Recognizer API, it will de facto support writing a user-defined text interpreter.

> There is currently no way to plug a user-defined text interpreter into the system, so your user-defined text interpreter will not change how the system parses.

In many use cases a user-defined text interpreter is just called to translate a string or a part of the input source, and then it returns control to the caller. The Forth code being translated may in turn use WORD.

The main idea is the following. If the user wants to use WORD, they should be warned that other components or libraries may also use it (including the text interpreter that translates the user's code), so the result is transient. A better choice is probably PARSE or PARSE-NAME.
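
As a rough illustration of that alternative, the earlier test can be rewritten with PARSE-NAME, assuming the system provides it (it is a CORE EXT word). PARSE-NAME returns a string inside the input buffer instead of a counted string in the WORD buffer, so parsing the following word does not disturb it. NAME-LEN is a hypothetical helper used only for this sketch.

```forth
\ Interpretive use: the result points into the input buffer, not the WORD buffer.
CR PARSE-NAME uno TYPE      \ displays: uno

\ Hypothetical helper for use with a following name.
: NAME-LEN ( "<spaces>name" -- u )
   PARSE-NAME NIP ;         \ drop the address, keep only the length

NAME-LEN hello .            \ displays: 5
```

The string stays usable only while the current contents of the input buffer remain in place, so a program that needs it beyond the current line still has to copy it.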

AntonErtl

If recognizers are ever standardized, they provide a way for user-defining the recognizing part of the text interpreter. However, at least with the current proposals, the parsing is done outside the recognizers (i.e., by the system), and this is good design. WRT "clarify find", given the lack of consensus we have seen in that proposal, my guess is that even with recognizers there will be no consensus that users should be able to use find for the general dictionary-search recognizer.

If a user applies a user-defined text interpreter to some string, and that text interpreter uses word, they should be aware that it clobbers the word buffer; whoever writes the text interpreter should document this property, but that is not something the standard needs to say anything about.

If there is ever a standard way to plug the parsing part of a user-defined text interpreter into the system, that plugging is again under the program's control. So the program's author should be aware of whether that text interpreter clobbers the word buffer, and write the rest of the program to cope with that. So even if we had such a standard feature, it would not be directly relevant to the question at hand. And given that we don't, it's certainly not relevant.
