6.1.2450 WORD CORE

( char "<chars>ccc<char>" -- c-addr )

Skip leading delimiters. Parse characters ccc delimited by char. An ambiguous condition exists if the length of the parsed string is greater than the implementation-defined length of a counted string.

c-addr is the address of a transient region containing the parsed word as a counted string. If the parse area was empty or contained no characters other than the delimiter, the resulting string has a zero length. A program may replace characters within the string.

See:

Rationale:

Typical use: char WORD ccc<char>
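
As an illustration of the typical use above, here is a minimal sketch. WORD-LEN is a hypothetical helper name, not a standard word; it only returns the length of the parsed counted string.

```forth
\ WORD-LEN is a hypothetical helper, not part of the standard.
\ It parses text delimited by char and returns only its length.
: WORD-LEN ( char "<chars>ccc<char>" -- u )
  WORD      \ c-addr of the counted string in the transient region
  C@ ;      \ the first character of a counted string holds the length

BL WORD-LEN HELLO .          \ space-delimited
CHAR " WORD-LEN GOODBYE" .   \ quote-delimited
```

The two interpreted lines should display 5 and 7, the same lengths the Testing section expects from GS3.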

Testing:

: GS3 WORD COUNT SWAP C@ ;
T{ BL GS3 HELLO -> 5 CHAR H }T
T{ CHAR " GS3 GOODBYE" -> 7 CHAR G }T
T{ BL GS3 
   DROP -> 0 }T
\ Blank lines return zero-length strings

Contributions

AntonErtl [315] WORD and the text interpreter (Request for clarification, 2023-11-27 18:02:20)

In traditional implementations, the text interpreter uses WORD and thus clobbers the buffer used by word. This can be seen with the following test:

: ctype count type ; cr bl word uno ctype

If the text interpreter does not clobber the word buffer, this test outputs "uno"; if the text interpreter uses the WORD buffer, it outputs "ctype". Here are the results for different Forth systems:

output system
uno    Gforth 0.7.3, Copyright (C) 1995-2008 ...
ctype  iForth-5.1-mini
uno    lxf 1.6-982-823 Compiled on 2017-12-03
ctype  SwiftForth x64-Linux 4.0.0-RC52 20-Sep-2022
uno    VFX Forth 64 5.11 RC2 [build 0112] 2021-05-02 for Linux x64

So two systems clobber the WORD buffer in the text interpreter (as is traditional).
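
A program that has to run on both kinds of systems can copy the counted string out of the transient region before anything else is parsed. The following is only a sketch: WORD-COPY and WORD>COPY are hypothetical names, and the 256-character buffer assumes that a counted string is at most 255 characters long.

```forth
\ Assumption: the count is at most 255, so 256 characters hold
\ any counted string that WORD can return on this system.
CREATE WORD-COPY 256 CHARS ALLOT   \ hypothetical private buffer

: WORD>COPY ( char "<chars>ccc<char>" -- c-addr )
  WORD                   \ c-addr1: transient counted string
  DUP C@ 1+ CHARS        \ c-addr1 u: size including the count character
  WORD-COPY SWAP MOVE    \ copy before any further parsing can happen
  WORD-COPY ;            \ return the address of the stable copy

: ctype count type ;
cr bl word>copy uno ctype
```

With this variant the test line should output "uno" whether or not the text interpreter uses the WORD buffer, because ctype reads the private copy.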

The reason for this request is that I fail to find any hint in the standard that the WORD buffer may be clobbered by parsing by the text interpreter. An obvious place would be 3.3.3.6, and it does mention certain circumstances when the contents of the WORD buffer may become invalid, but these circumstances do not include parsing by the text interpreter. A not so good place would be 3.4.1, but I don't find any such provision there, either. If the standard contains such a provision, it is well hidden and that should be fixed.

If the standard does not contain such a provision, there are two options:

  1. Fix the systems to avoid clobbering the WORD buffer in the text interpreter
  2. Change the standard to allow clobbering the word buffer by parsing in the text interpreter.

Given that I have seen confused questions by users over the clobbering behaviour by some systems several times, I would prefer option 1. If you prefer option 2, make a proposal for such a change.

ruv

I think the initial intention was that the Forth text interpreter may itself use word (and find, as well as the pictured numeric output string buffer). These means were standardized not as separate facilities for programs, but as facilities that Forth systems already used themselves. So it's simply an oversight that the corresponding restriction is not normatively mentioned.

Another argument is that a user might implement a user-defined text interpreter using word and find (or the Recognizer API), and this must not make the system non-standard.

> Given that I have seen confused questions by users over the clobbering behaviour by some systems several times,

This means users still use word. Since they can also use word in their own text interpreter, they should be warned.

AntonErtl

I agree that this is an oversight in (and since) Forth-94. I think that we can therefore remove the guarantee that the text interpreter (and, I guess, some parsing words) do not clobber the WORD buffer without making this guarantee obsolescent for one version of the standard. The question is whether we want that.

As for users writing their own text interpreter and the result still being a standard system:

  • When dealing with the "clarify FIND" proposal, it turned out that there is no consensus that the Forth standard should support writing a user-defined text interpreter, and that there is no common practice that would allow that.

  • There is currently no way to plug a user-defined text interpreter into the system, so your user-defined text interpreter will not change how the system parses.

Yes, users use WORD, but not many use it for writing a text interpreter, and if they do (e.g., in Bernd Paysan's OOF), the result does not change the system, and therefore the WORD-using user-defined text interpreter does not make the system non-standard.

ruv

> As for users writing their own text interpreter and the result still being a standard system:
>
>   • there is no consensus that the Forth standard should support writing a user-defined text interpreter,

If the standard specifies the Recognizer API, it will de facto support writing a user-defined text interpreter.

> There is currently no way to plug a user-defined text interpreter into the system, so your user-defined text interpreter will not change how the system parses.

In many use cases a user-defined text interpreter is just called to translate a string or a part of the input source, and then it returns control to the caller. Forth code that is translated may use WORD in its turn.

The main idea is the following. If the user wants to use the word WORD, he should be warned that other components or libraries can also use this word (including the text interpreter that translates the user's code), so the result is transient. A better choice is probably to use PARSE or PARSE-NAME.
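
To make the PARSE-NAME alternative concrete, here is a minimal sketch. NAME-LEN is a hypothetical helper name. PARSE-NAME returns an address and length inside the input buffer instead of a counted string in the WORD region, so later parsing on the same line does not overwrite the result.

```forth
\ NAME-LEN is a hypothetical helper, not part of the standard.
: NAME-LEN ( "<spaces>name<space>" -- u )
  PARSE-NAME   \ c-addr u: a string inside the input buffer, no copy made
  NIP ;        \ keep only the length

\ Counterpart of the "uno" test from the start of this thread:
cr parse-name uno type
```

The last line should display "uno" on any system that provides PARSE-NAME. The string stays usable only until the input buffer is refilled, so a program that needs it for longer still has to copy it.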

AntonErtl

If recognizers are ever standardized, they provide a way for user-defining the recognizing part of the text interpreter. However, at least with the current proposals, the parsing is done outside the recognizers (i.e., by the system), and this is good design. WRT "clarify find", given the lack of consensus we have seen in that proposal, my guess is that even with recognizers there will be no consensus that users should be able to use find for the general dictionary-search recognizer.

If a user applies a user-defined text interpreter to some string, and that text interpreter uses word, they should be aware that this text interpreter clobbers the word buffer; whoever writes this text interpreter should document this property, but that is not something that the standard needs to say anything about.

If there is ever a standard way to plug the parsing part of a user-defined text interpreter into the system, that plugging is again under the program's control. So the program's author should be aware of whether that text interpreter clobbers the word buffer, and write the rest of the program to cope with that. So even if we had such a standard feature, it would not be directly relevant to the question at hand. And given that we don't, it's certainly not relevant.


soundwave [362] Behavior on oversized WORD (Request for clarification, 2024-09-03 23:23:24)

3.3.3.6 Other transient regions specifies that the region for WORD shall be at least 33 characters, but an ambiguous condition is only to occur if the word exceeds the maximum length of a counted string (usually considerably longer). What is the behavior for words of intermediate length? Ambiguous condition (my understanding is this is the case traditionally)? Truncation?

If I'm understanding correctly this also means that an implementation that, e.g., returns counted strings where only the region they occupy is available to the program even if this should be shorter than 33 characters is not compliant?

ruv

> What is the behavior for words of intermediate length? Ambiguous condition (my understanding is this is the case traditionally)?

It's up to the system.

A system is allowed to support definition names longer than 31 characters. Then, the size of the region identified by WORD must be at least the maximum supported definition name length. An ambiguous condition exists if the system encounters a name that is longer than the maximum supported length, as [4.1.2 Ambiguous conditions] says:

  • a definition name exceeded the maximum length allowed (3.3.1.2 Definition names);

In this case a system may truncate the name, may throw an exception, or may take other actions (see 3.4.4 Possible actions on an ambiguous condition).

In the future, systems may be required to throw an exception with a specific throw code in this case (as part of eliminating some ambiguous conditions).

3.3.1.2 Definition names says:

  • Programs with definition names longer than 31 characters have an environmental dependency.

> If I'm understanding correctly this also means that an implementation that, e.g., returns counted strings where only the region they occupy is available to the program even if this should be shorter than 33 characters is not compliant?

I don't see any statement that forbids a program from modifying the region identified by WORD. If a program is allowed to do so, then yes, this region shall be at least 33 characters long, regardless of how many characters are occupied.

ruv

I missed it: there is a better option in section 4.1.2 Ambiguous conditions:

  • string longer than a counted string returned by 6.1.2450 WORD;

Also, the section 4.1.1 Implementation-defined options says that a standard system shall document the size of buffer for WORD:

  • size of buffer at 6.1.2450 WORD (3.3.3.6 Other transient regions);
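
The size of the WORD buffer has to be looked up in the system's documentation, but the counted-string limit named in the ambiguous condition above can be queried at run time with the "/COUNTED-STRING" environmental query. A minimal sketch, where COUNTED-MAX is a hypothetical helper name:

```forth
\ COUNTED-MAX is a hypothetical helper, not part of the standard.
\ It returns the maximum counted-string length, or 0 if the
\ system does not answer the query.
: COUNTED-MAX ( -- n )
  S" /COUNTED-STRING" ENVIRONMENT?   \ n true | false
  0= IF 0 THEN ;                     \ substitute 0 for an unknown query

COUNTED-MAX .
```

The value reported is the counted-string limit only; it says nothing about how large the transient region used by WORD actually is.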

It is worth nothing that a program is not allowed to modify the contents of the region identified by WORD when its address becomes invalid as specified in 3.3.3.6 Other transient regions. In some systems this buffer's location is sliding, since it is offset from HERE, and compiled code may reside at past addresses of the buffer.

soundwave

The condition "string longer than a counted string returned by 6.1.2450 WORD" would seem to apply only to words longer than a counted string, not necessarily to words larger than the transient region for WORD.

The relevant ambiguous condition seems to actually be

  • parsed string overflow;

which afaict can only apply to WORD, which makes sense. Just a little confusing as it does not seem to be mentioned anywhere but in the list of ambiguous conditions that parsing can overflow, not even in the section for parsing. Furthermore, there are multiple passages that speak of WORD being limited by counted string length, which is fair as this will certainly be true for some systems, but won't be for minimal ones.

Closed

ruv

Correction: "worth nothing" should be read as "worth noting"


> The relevant ambiguous condition seems to actually be
>
>   • parsed string overflow;
>
> which afaict can only apply to WORD, which makes sense.

I cannot agree. Have a look at the following pairs:

  • About parsed string:
    • Implementation-defined option: maximum size of a parsed string (3.4.1 Parsing);
    • Ambiguous condition: parsed string overflow;
  • About WORD:
    • Implementation-defined option: size of buffer at 6.1.2450 WORD (3.3.3.6 Other transient regions);
    • Ambiguous condition: string longer than a counted string returned by 6.1.2450 WORD;

The association of "parsed string overflow" with "size of buffer at 6.1.2450 WORD" seems incorrect.

Also, 3.4.1 Parsing says: "the number of characters parsed may be from zero to the implementation-defined maximum length of a counted string". I think this applies to any parsed string (including strings parsed with PARSE, PARSE-NAME, S", C", etc.), while parsing with WORD just imposes stronger restrictions.


> Just a little confusing as

Agreed. This should be fixed.
