Proposal: Clarify FIND, more classic approach

Informal

This proposal has been moved into this section. Its former address was: /standard/core/FIND

This page is dedicated to discussing this specific proposal


ruv: Clarify FIND, more classic approach (Proposal, 2019-10-08 11:01:25)

It is an alternative proposal to the one from Anton.

Problem

  1. The existing specification of FIND is unclear about how the returned xt is connected with the interpretation and compilation semantics of the corresponding word.

  2. In some popular Forth systems n=1 does not mean that the word is immediate.

Solution

Use the new wording in the specification for FIND.

Keep the original notion of immediacy, but use a different (looser) wording for the meaning of n in compilation state. The new wording allows words with undefined execution semantics to be implemented as "dual-xt" words, and still allows them (as before) to be implemented as immediate STATE-dependent words. It also allows a system to have special definitions that compile the words with undefined interpretation semantics and defined execution semantics (like EXIT), and to return proper values for them from FIND.

No need to mention that FIND may return c-addr 0 for the words with undefined interpretation semantics. It is a consequence of the clause that the returned values depend on STATE, and it is mentioned in the Rationale.

Don't change 4.1.2. Perhaps make FIND tighter later, when it becomes optional. The original specification guarantees that a user-defined text interpreter can interpret at least all ordinary and user-defined words.

Many Forth systems no longer use FIND themselves and provide it only for old-fashioned programs. There is not much sense in restricting the implementation options of modern Forth systems for the sake of an outdated approach. I think modern Forth systems will tend to use the Recognizer/Resolver approach for special syntaxes and special words.

Proposal

Replace the text in the specification of FIND with the following.

( c-addr -- c-addr 0 | xt n )

Find the definition name whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero. Otherwise the definition is found, xt is the execution token for name, and n is 1 or -1. All returned values may differ between interpretation and compilation state.

When the definition is found in interpretation state: if the definition is immediate then n is 1, otherwise n is -1; performing xt in interpretation state performs the interpretation semantics for name.

When the definition is found in compilation state: if n is -1, appending the semantics identified by xt to the current definition performs the compilation semantics for name, otherwise performing xt in compilation state performs the compilation semantics for name.

"Performing xt" means performing the execution semantics identified by the execution token xt.
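
The wording above could be used by a user-defined text interpreter roughly as follows. This is only a sketch, not part of the proposal: the names perform-found and interpret-word are hypothetical, COMPILE, comes from CORE EXT, THROW comes from the Exception word set, and -13 is the standard throw code for an undefined word.

```forth
\ Hypothetical helper: act on the values returned by FIND for a found word.
: perform-found ( i*x xt n -- j*x )
  state @ if
    0< if compile, else execute then   \ n = -1: append xt; n = 1: perform xt
  else
    drop execute                       \ interpretation state: perform xt
  then ;

\ Hypothetical helper: look up a counted string and act on the result.
: interpret-word ( i*x c-addr -- j*x )
  find ?dup if perform-found else drop -13 throw then ;
```

For example, after the definition : sq dup * ; the phrase 3 c" sq" interpret-word inside a definition that is later executed in interpretation state performs sq.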


Please check the gist on GitHub for updates and corrections to this proposal.

ruv

Have a look at the next version of the FIND specification that I have designed. This variant seems more accurate and better overall.

ruv: New Version


It is an alternative proposal to the one from Anton.

Problem

  1. The existing specification of FIND is unclear about how the returned xt is connected with the interpretation and compilation semantics of the corresponding word.

  2. In some popular Forth systems n=1 does not mean that the word is immediate.

Solution

Use the new wording in the specification for FIND.

Keep the original notion of immediacy, but use a different (looser) wording for the meaning of n in compilation state. The new wording allows words with undefined execution semantics to be implemented as "dual-xt" words, and still allows them (as before) to be implemented as immediate STATE-dependent words. It also allows a system to have special definitions that compile the words with undefined interpretation semantics and defined execution semantics (like EXIT), and to return proper values for them from FIND.

Some differences from Anton's proposal

More accurate wording that is closer to the language of the standard.

Use the "default interpretation semantics" criterion instead of referring to POSTPONE (item 3 in my comment).

Allow words without interpretation semantics (e.g., IF) to be implemented as immediate STATE-dependent words (as before).

Specify what n means in all possible cases (news:qnko0l$jk2$1@dont-email.me).

Don't change 4.1.2, since FIND cannot and doesn't return the xt for a definition with non-default interpretation semantics. The new specification guarantees that a user-defined text interpreter can interpret any word that is found by FIND. Also, 4.1.2 should be updated independently.

Nowadays many Forth systems don't use FIND themselves and provide it only for old-fashioned programs. There is not much sense in restricting the implementation options of modern Forth systems for the sake of an outdated approach. I think modern Forth systems will tend to use the Recognizer/Resolver approach for special syntaxes and special words.

Proposal

Replace the text in the specification of FIND with the following.


FIND

( c-addr -- c-addr 0 | xt n )

Find the definition name whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero. Otherwise the definition is found, return xt and n.

If name has default interpretation semantics, xt is the execution token for name, and n is 1 if name is an immediate word, -1 otherwise. The returned values are the same regardless of whether the definition is found in interpretation state or in compilation state.

If name has other than default interpretation semantics, xt is the execution token for an unspecified implementation-dependent definition, and n is 1 or -1, and the following conditions are met.

  1. When the definition is found in compilation state: if n is 1, performing xt in compilation state performs the compilation semantics for name, otherwise n is -1, and appending the execution semantics identified by xt to the current definition performs the compilation semantics for name.

  2. When the definition is found in interpretation state: if n is 1, xt and n are the same when the definition is found in compilation state, otherwise n is -1, and both xt and n may be different when the definition is found in compilation state; performing xt in interpretation state performs the interpretation semantics for name.

  3. The definition may be not found in interpretation state but found in compilation state, and vice versa. Also, a definition may be not found at all.

An ambiguous condition exists if xt is performed under conditions that do not meet the conditions specified above.


"Performing xt" means performing the execution semantics identified by the execution token xt.

A definition has default interpretation semantics if and only if the "Interpretation:" section is absent in the corresponding glossary entry, and the "Execution:" section is present (see also 3.4.3.1). Default interpretation semantics for a definition is to perform its execution semantics in interpretation state (see also 3.4.3.2).

If interpretation semantics are undefined for a definition, a Forth system is allowed to provide implementation-defined interpretation semantics for this definition (see A.3.4.3.2). In such case, when the definition is found in interpretation state, performing the returned xt in interpretation state performs the implementation-defined interpretation semantics for name.

A program is allowed to apply FIND to any string. A definition may be not found by FIND even if a Forth system provides interpretation or compilation semantics for the corresponding name (for example, in the case of locals).

ruv

A problem with this version is that it doesn't allow FIND to return different xts in interpretation and compilation state for words with default interpretation semantics, while the original intention was that this is allowed. See also: news:2020Oct24.191314@mips.complang.tuwien.ac.at.

ruv: New Version


Author

Ruv

Change Log

2019-10-08: Initial version

2020-08-28: Avoid ambiguous clause "xt is the execution token for name" in the case of a word with non default interpretation semantics.

2021-04-18: Allow to return the different xt for any definition. More tight meaning of n in interpretation state. Avoid "implementation-dependent definition" and make the wording simpler.

Problem

The descriptions of the problem and solution are the same as in the previous version

Proposal

Replace the text in the specification of FIND with the following.


FIND

( c-addr -- c-addr 0 | xt n )

Find the definition name whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero.

Otherwise, return xt and n, where xt is an execution token and n is -1 or 1. The returned values may differ between interpretation and compilation state, and the following conditions shall be met:

  • if the definition is found in interpretation state, then
    • if and only if name is immediate, n is 1, otherwise n is -1;
    • if name has default interpretation semantics, xt identifies the execution semantics for name;
    • performing xt in interpretation state performs the interpretation semantics for name;
  • if the definition is found in compilation state, then
    • if n is 1, performing xt in compilation state performs the compilation semantics for name;
    • if n is -1, appending the execution semantics identified by xt to the current definition performs the compilation semantics for name.

A definition may be found in compilation state but not found in interpretation state (or vice versa).


"Performing xt" means performing the execution semantics identified by the execution token xt.

A definition has default interpretation semantics if and only if the "Interpretation:" section is absent in the corresponding glossary entry (see 3.4.3.2).

If interpretation semantics are undefined for a definition, a Forth system is allowed to provide implementation-defined interpretation semantics for this definition (see A.3.4.3.2). In such case, when the definition is found in interpretation state, performing the returned xt in interpretation state performs the implementation-defined interpretation semantics for name.

If immediacy is not specified for a definition with non default interpretation semantics, a Forth system is still allowed to implement this definition as an immediate word by providing implementation-dependent execution semantics for this definition (see A.6.1.2033, A.6.1.1550).

ruv

Some examples

The examples to illustrate this version of the specification

Result of FIND in each state, by implementation approach:

  • single-xt: some words are immediate.
  • dual-nt (and then dual-xt): the second nt is optional, and it's possible that it's associated with the same xt as the first.
  • dual-xt (but single-nt): the second xt can be optional, and it can be the same as the first.

Word  State           single-xt      dual-nt                                dual-xt
DUP   interpretation  ( xt-exe -1 )  ( xt-exe -1 )                          ( xt-exe -1 )
DUP   compilation     ( xt-exe -1 )  ( xt-exe -1 | xt-comp 1 )              ( xt-exe -1 | xt-comp 1 )
[IF]  interpretation  ( xt-exe 1 )   ( xt-exe 1 )                           ( xt-exe 1 )
[IF]  compilation     ( xt-exe 1 )   ( xt-exe 1 | xt-comp 1 )               ( xt-exe 1 )
S"    interpretation  ( xt-exe 1 )   ( xt-int -1 )                          ( xt-int -1 )
S"    compilation     ( xt-exe 1 )   ( xt-comp 1 )                          ( xt-comp 1 )
IF    interpretation  ( xt-exe 1 )   ( xt-comp 1 | xt-int -1 | c-addr 0 )   ( xt-comp 1 | xt-int -1 | c-addr 0 )
IF    compilation     ( xt-exe 1 )   ( xt-comp 1 )                          ( xt-comp 1 )

(Concerning the meaning of xt-int and xt-comp, see also my other comment)


Can such results be implemented easily, or should the n value in interpretation state be looser?

ruv

Concerning meaning of n in interpretation state

First of all, execution token for a word means the execution token that identifies the execution semantics for this word (regardless whether the standard defines these semantics or not).


Without a doubt, a standard program may rely on the fact that for a user-defined word, n is 1 if this word is immediate, and n is -1 otherwise:

t{ : bar c" foo" find nip ;  : foo ; bar immediate bar -> -1 1 }t

The standard testcase also confirms this point in F.6.1.1550 FIND.


If we extend this rule to all words with default interpretation semantics, then for such a word: if it is immediate, then n shall be 1, otherwise n shall be -1.


So we only have some options for a standard word with non default interpretation semantics (NB: we only consider the case when a word is found in interpretation state). These options are the following:

  • A. If the word is implemented as an immediate word, then n is 1, otherwise n is -1.

  • B. In any case, n is 1.

  • C. n is unspecified among 1 and -1.

Depending on this choice in the specification, a system has different obligations, and a program can have different assumptions.


So if any word (of unknown kind) is found by a program in interpretation state, the program can make the following assumptions from the n value alone:

  • in option A: if n is 1, then xt is the execution token for this word and this word is immediate; otherwise n is -1, and no assumptions can be made.

  • in option B: if n is -1, then xt is the execution token for this word and this word is not immediate; otherwise n is 1, and no assumptions can be made.

  • in option C: regardless of whether n is 1 or -1, no assumptions can be made.


What is the most preferable option?


In this version I specified the option A since it seems to have simplest wording. But I think now the option B would be more useful.

ruv

Choice of the n value in interpretation state

Clarification and correction

We should distinguish the declared semantics (in the glossary entry for a word) and the implemented semantics (in a particular Forth system, for this word). A declaration itself does not limit the implementation ways. And FIND returns n value not according to the declaration, but according to the implementation.

It means that some words with non default interpretation semantics can (and may) be implemented as an ordinary word, e.g. COMPILE, or EXIT, and then FIND should return n value -1 for them.

So, option B from the message above should be corrected.


The possible options for n value in interpretation state for the words with non default interpretation semantics in their glossary entries, after correction:

  • A. If the word is implemented as an immediate word, then n is 1, otherwise n is -1.

  • B. If the word is implemented as an ordinary word, then n is -1, otherwise n is 1.

  • C. n is unspecified among 1 and -1.


We consider these options for only standard words with non default interpretation semantics, since for other words we already don't have any choice, the value of n is determined: -1 for ordinary words, 1 for immediate words.

Implementation factors

Let's consider a simple cmForth-like system that doesn't explicitly maintain an immediacy flag, but relies on different word lists only. For such a system the simplest approach is to return the same n regardless of the kind of word, for example -1. This approach is the simplest because otherwise the system has to make a second search in another word list to detect whether the word is ordinary or not. But this approach is already unacceptable, since FIND should return n value 1 for immediate words. So this system has to perform the second search in any case. But then, in some simple implementations, this system cannot distinguish an immediate word from a non-immediate word; that is, it has no information on whether the xt (found in another word list) identifies the execution semantics for name, or only performs the compilation semantics for name. It means that option A cannot be implemented in this very simple system. Then option B is the only one possible in this case.
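
A FIND of this kind might look roughly like the sketch below. It is an illustration only, not taken from cmForth: the word lists ordinary-wl and compiling-wl and the word find-b are hypothetical, and SEARCH-WORDLIST and WORDLIST are assumed to be available from the Search-Order word set. The n returned by SEARCH-WORDLIST is deliberately ignored, because the modelled system keeps no immediacy flag.

```forth
wordlist constant ordinary-wl    \ hypothetical: ordinary words
wordlist constant compiling-wl   \ hypothetical: words performed while compiling

\ Hypothetical FIND that follows option B using two word lists only.
: find-b ( c-addr -- c-addr 0 | xt n )
  dup count                                  ( c-addr c-addr1 u )
  state @ if
    2dup compiling-wl search-wordlist
    if nip nip nip 1 exit then               \ perform xt to compile
    ordinary-wl search-wordlist
    if nip -1 exit then                      \ ordinary word: append xt
    0 exit                                   \ not found
  then
  2dup ordinary-wl search-wordlist
  if nip nip nip -1 exit then                \ ordinary word: n is -1
  compiling-wl search-wordlist
  if nip 1 exit then                         \ anything else: n is 1
  0 ;                                        \ not found
```

In interpretation state, what performing the xt of a word found only in compiling-wl does is implementation-defined in this sketch.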

In a more complex system, either of options A and B is possible at equal cost. Option C means that a system is allowed to implement either of options A and B, as well as something else, e.g. returning a random number among 1 and -1 every time (which is pretty useless).

If a program implements some advanced technique in a portable way (e.g., a recognizer, see a comment), it needs to determine the interpretation semantics and the compilation semantics at the same time. With option A this program has to perform FIND twice for all but immediate words. With option B this program has to perform FIND twice for all but ordinary words. But ordinary words make up the majority of words. So option B gives better performance.
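
Performing FIND in a chosen state could be sketched as follows. The names find-in-int, find-in-comp and ordinary? are hypothetical and not part of the proposal; the sketch assumes that the standard words ] and [ are the only means needed to switch the state that FIND observes, and ordinary? is meaningful only if option B is adopted.

```forth
\ Hypothetical helpers: apply FIND in a forced state, then restore the state.
: find-in-comp ( c-addr -- c-addr 0 | xt n )
  state @ >r  ]  find  r> 0= if postpone [ then ;

: find-in-int ( c-addr -- c-addr 0 | xt n )
  state @ >r  postpone [  find  r> if ] then ;

\ Under option B only: n = -1 in interpretation state means that the word
\ is ordinary, so a second FIND in compilation state is not needed.
: ordinary? ( c-addr -- flag )
  find-in-int nip 0< ;
```

A program would call find-in-comp only when ordinary? returns false, which is the saving that option B offers.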

So the option B is better for implementations in the general case.

Usefulness factors

Between the options A and B: it's far more important to distinguish ordinary words from non ordinary, than immediate words from non immediate. Also, immediacy is just a way to implement non ordinary words.

Concerning the option C: obviously, it's far less useful.

So, the option B is preferable.

Backward compatibility factors

It seems, dual-xt systems already implement the option B (please, check).

Then the option B is better.

Conclusion

I suggest sticking with option B.

ruv: New Version


Author

Ruv

Change Log

2019-10-08: Initial version

2020-08-28: Avoid ambiguous clause "xt is the execution token for name" in the case of a word with non default interpretation semantics.

2021-04-18: Allow to return the different xt for any definition. More tight meaning of n in interpretation state. Avoid "implementation-dependent definition" and make the wording simpler.

2021-05-06: Correct meaning of n in interpretation state: iff n is -1, then xt identifies the execution semantics for name. Eliminate the "default interpretation semantics" notion from the normative part.

Problem

The descriptions of the problem and solution are the same as in the previous version

Proposal

Replace the text in the specification of FIND with the following.


FIND

( c-addr -- c-addr 0 | xt n )

Find the definition name whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero.

Otherwise, return xt and n, where xt is an execution token and n is -1 or 1. The returned values may differ between interpretation and compilation state, and the following conditions shall be met:

  • if the definition is found in interpretation state, then
    • if and only if xt identifies the execution semantics for name and name is not immediate, n is -1, otherwise n is 1;
    • performing xt in interpretation state performs the interpretation semantics for name;
  • if the definition is found in compilation state, then
    • if n is 1, performing xt in compilation state performs the compilation semantics for name;
    • if n is -1, appending the execution semantics identified by xt to the current definition performs the compilation semantics for name.

A definition may be found in compilation state but not found in interpretation state (or vice versa). A program is allowed to apply FIND to any string.


"Performing xt" means performing the execution semantics identified by the execution token xt.

If interpretation semantics are undefined for a definition, a Forth system is allowed to provide implementation-defined interpretation semantics for this definition (see A.3.4.3.2). In such case, when the definition is found in interpretation state, performing the returned xt in interpretation state performs the implementation-defined interpretation semantics for name.

Neither immediacy nor non-immediacy is specified for most definitions by this standard, so a Forth system is allowed to implement any definition as an immediate word if this implementation meets the specification for this word (see A.6.1.2033, A.6.1.1550).
