Proposal: Clarify FIND
This proposal has been moved into this section. Its former address was: /standard/core/FIND
This page is dedicated to discussing this specific proposal
AntonErtl [55] Clarify FIND, Proposal, 2018-05-23 17:04:29
Problem
The existing specification of FIND is unclear wrt what xts are returned under what conditions. It also uses a different notion of immediacy from the one in the Definition of Terms. From the rationale of FIND, it is obvious that cmForth inspired the idea that two different xts can be returned. The rationale of COMPILE, shows that the intention is that FIND can be usable for the user-defined text interpreter. But FIND as specified does not guarantee that. This proposal would fix this problem; it is also phrased in a way that includes systems like cmForth and Mark Humphries' system.
Proposal
Replace the text in the specification of FIND with:
Find the definition named in the counted string at c-addr. If the definition is not found, return c-addr and zero. If the definition is found, return xt 1 or xt -1. The returned values may differ between interpretation and compilation state. In interpretation state, EXECUTEing the returned xt performs the interpretation semantics of the word. In compilation state, the returned values represent the compilation semantics: if xt 1 is returned, then EXECUTEing xt performs the compilation semantics; if xt -1 is returned, then COMPILE,ing xt performs the compilation semantics.
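The classic user-defined text interpreter that this wording is meant to support can be sketched as follows. This is a minimal sketch, not text from the standard. INTERPRET-FOUND and INTERPRET-WORD are hypothetical names. Number conversion is omitted, and an unknown word simply throws -13 (undefined word). The sketch relies only on FIND, EXECUTE and COMPILE, behaving as the proposed text specifies.

```forth
\ Hypothetical sketch: dispatch on the values FIND returns,
\ as the proposed wording requires.
: interpret-found ( i*x xt +-1 -- j*x )
  state @ if                 \ compilation state
    0> if execute            \ xt 1: EXECUTE performs the compilation semantics
    else compile, then       \ xt -1: COMPILE, performs them
  else                       \ interpretation state
    drop execute             \ EXECUTE performs the interpretation semantics
  then ;

: interpret-word ( i*x c-addr -- j*x )
  find ?dup if interpret-found exit then
  drop -13 throw ;           \ not found (number conversion omitted)
```

An immediate word that calls INTERPRET-WORD while a definition is being compiled exercises the compilation-state branch. Calling it at the top level exercises the interpretation-state branch.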
JennyBrien
Looks good to me, but the cmForth approach (of separate compilation and interpretation wordlists and searching one first) cannot be made Standard. It will not always find the most recent use of the name. Mark's doesn't either, but it fails by ignoring a more recent compile-only version while interpreting. Other Forths would attempt to execute the compile-only version, which would also be an error.
Thinking in terms of name tokens, it seems that NAME>COMPILE can take four possible forms:
xt compile, \ the default
xt optimiser, \ set by set-compile
xt execute \ a normal immediate word
ext execute. \ set by set-ndcs
FIND should return a flag of -1 for the first two cases (no optimisation is done unless COMPILE, is intelligent enough to derive the optimising xt from the original xt alone) and 1 in the other two cases, but in the fourth case only while compiling.
AntonErtl
A cmForth-like system could be standard by first looking up user-defined words, then (in compile state) the compiler words, then (in both states) the system-defined words. This does not allow adding new compiler words, but the standard does not allow that either.
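The lookup order described above can be sketched with standard word lists and SEARCH-WORDLIST. This is a minimal sketch, not cmForth's actual code. The names USER-WORDS, COMPILER-WORDS, SYSTEM-WORDS and CM-FIND are hypothetical. Returning 1 for compiler words reflects the assumption that they are always executed while compiling.

```forth
\ Hypothetical sketch of a cmForth-like FIND: three word lists
\ searched in a fixed order (all names are illustrative only).
wordlist constant user-words      \ user-defined words
wordlist constant compiler-words  \ words that act only while compiling
wordlist constant system-words    \ the other system-defined words

: cm-find ( c-addr -- c-addr 0 | xt 1 | xt -1 )
  dup count user-words search-wordlist
  ?dup if rot drop exit then          \ user words shadow everything
  state @ if
    dup count compiler-words search-wordlist
    if nip 1 exit then                \ compiler words: always EXECUTE while compiling
  then
  dup count system-words search-wordlist
  ?dup if rot drop exit then
  0 ;                                 \ not found: c-addr 0
```

In interpretation state a compiler word is simply not found. In compile state it is found with 1, so the interpreter EXECUTEs it.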
Concerning your description of NAME>COMPILE, optimization (set by SET-OPTIMIZER in Gforth or SET-COMPILER in VFX) affects COMPILE,, not NAME>COMPILE, so your second case does not exist. Otherwise yes, if we accept the present proposal, the xt2 produced by NAME>COMPILE ( nt -- xt1 xt2 ) has to be EXECUTE or COMPILE, to match this FIND; otherwise NAME>COMPILE would produce compilation tokens that cannot be represented by the results of FIND.
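The constraint on NAME>COMPILE can be made concrete with a small conversion from a name token to the values FIND would have to return in compilation state. NT>FIND-COMP is a hypothetical name. The sketch assumes a system that provides NAME>COMPILE (Forth-2012 TOOLS EXT). It throws -21 (unsupported operation) when the compilation token is neither EXECUTE nor COMPILE,, because such a token could not be represented by the results of FIND.

```forth
\ Hypothetical sketch: map NAME>COMPILE's result onto FIND's
\ compilation-state result.
: nt>find-comp ( nt -- xt 1 | xt -1 )
  name>compile                             ( xt1 xt2 )
  dup ['] execute = if drop 1 exit then    \ EXECUTEing xt1 performs the comp. semantics
  ['] compile, = if -1 exit then           \ COMPILE,ing xt1 performs them
  -21 throw ;                              \ not representable by FIND's result
```

On Gforth, which provides FIND-NAME ( c-addr u -- nt | 0 ) to obtain the nt, a non-immediate colon definition should yield its xt and -1. An immediate one should yield its xt and 1.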
StephenPelc
It's too early to rewrite FIND. It may be better to find an alternative and to mark FIND as obsolescent. Modern Forths have a range of flags other than IMMEDIATE, for example the NDCS flag. We have an opportunity to expose other flags and to replace xt value with xt mask where mask is a bitmask containing IMMEDIATE and NDCS bits. This gives more opportunities to systems. We also need to revisit COMPILE, and to provide a compiling word for words (e.g. IF and friends) that parse or produce/consume data at compile time. The standard's wording for compile time is mystically unclear.
AntonErtl
One benefit of this proposal is that it does not use "immediate" in a way that is at odds with the use in the rest of the document. AFAICT it also reflects the intent of the Forth-94 committee better than the current text. And 24 years after Forth-94, it is certainly not too early to make that clarification. It may be unnecessary, but there are people who make wide-reaching claims, because "immediate" is used here with a different meaning than in the rest of the document, so maybe such a clarification is necessary after all.
As for the alternative, I'll propose that Real Soon Now.
StephenPelc
I'm really not trying to stop people from improving FIND. I just need some breathing space to get to the next stage of VFX NDCS. This involves rebuilding the cross compiler on it, and it ain't pretty. Once that's done, I'll have a pretty good idea where the problems lie.
One of the problems in changing FIND now is the law of unintended consequences; basically what else are you going to damage? The safest course may well be to find a new word that we can agree on, and then to mark FIND as obsolescent.
AntonErtl
Do you want the traditional user-defined text interpreter (e.g., as outlined in the rationale of COMPILE,) to work on VFX? The new description specifies exactly as much as is necessary to guarantee that, nothing more. If you need more breathing space, you would break these user-defined text interpreters.
As for unintended consequences, if someone uses a blemish in the standard like the current description of FIND as an excuse to damn the whole standard, a consequence is to fix the blemish.
StephenPelc
Since we cannot use COMPILE, for words that parse or affect the stack, surely NAME>COMPILE has to be able to return the xt of words such as the suggested NDCS, which can handle parsing or stack effects.
AntonErtl
Let's make it concrete:
: s"-int '"' parse save-mem ;
: s"-comp '"' parse postpone sliteral ;
' s"-int ' s"-comp interpret/compile: s"
s\" s\"" find-name name>compile ( xt1 xt2 )
A correct result at the end of this piece of code is: xt1 is the xt of S"-comp and xt2 is the xt of execute. In compilation state, FIND returns the xt of S"-comp and 1.
Now your idea seems to give a different result for the piece of code above: xt1 should be the xt of S", and xt2 should be the xt of NDCS,. That may be correct for NAME>COMPILE, but what does your FIND do if you want to support the traditional user-defined text interpreter (which uses only EXECUTE and COMPILE,)?
JennyBrien
I can see the point of centralising the optimising of default words in COMPILE, because that allows the action to take place within a definition, apart from the compiling loop. If no optimisation is set, it compiles a call, which results in code that produces the same output, if more slowly.
But what should xt NDCS, do?
In particular, what should it do if presented with an xt that has no explicit compiling action?
It would seem most sensible to compile it, which is after all 'performing its compilation semantics'. But in that case, the whole of the compiler, from the point of obtaining a valid token, is bundled in the one word.
NDCS, is equivalent to NAME>COMPILE EXECUTE
The more important question is: does this token have an explicit compiling action set, and if so, what is it? With name tokens, that is simple:
: EXPLICIT? ( nt -- int-xt 0 | comp-xt -1 )
NAME>COMPILE ['] EXECUTE = ;
AntonErtl
NAME>COMPILE is a standard way to get at the compilation semantics, and it works for all words. There is no need for NDCS, as a standard interface. Some systems may have it as internal factor, however.
But that's for a discussion on replacing FIND, and is not relevant to the present proposal on clarifying FIND.
AntonErtl, New Version: Clarify FIND
ChangeLog
2018-05-29: Specify FIND for words without interpretation semantics, and loosen it for TO IS ACTION-OF. Added Remarks section for a rationale of these additions.
2018-05-23: Initial version
Problem
The existing specification of FIND is unclear wrt what xts are returned under what conditions. It also uses a different notion of immediacy from the one in the Definition of Terms. From the rationale of FIND, it is obvious that cmForth inspired the idea that two different xts can be returned. The rationale of COMPILE, shows that the intention is that FIND can be usable for the user-defined text interpreter. But FIND as specified does not guarantee that. This proposal would fix this problem; it is also phrased in a way that includes systems like cmForth and Mark Humphries' system.
Proposal
Replace the text in the specification of FIND with:
Find the definition named in the counted string at c-addr. If the definition is not found, return c-addr and zero. If the definition is found, return xt 1 or xt -1. The returned values may differ between interpretation and compilation state. In interpretation state, EXECUTEing the returned xt performs the interpretation semantics of the word. In compilation state, the returned values represent the compilation semantics: if xt 1 is returned, then EXECUTEing xt performs the compilation semantics; if xt -1 is returned, then COMPILE,ing xt performs the compilation semantics.
In interpretation STATE, FIND may produce c-addr 0 if the definition has no interpretation semantics; if it produces xt 1 or xt -1, the returned xt represents a system-dependent action.
If the definition is for a word for which POSTPONE is ambiguous, it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND.
In 4.1.2 Ambiguous conditions, add the ambiguous condition above, and remove "6.1.1550 FIND" from
attempting to obtain the execution token, (e.g., with 6.1.0070 ', 6.1.1550 FIND, etc. of a definition with undefined interpretation semantics;
Remarks
The removal of FIND from the clause in 4.1.2 ensures that we can text-interpret (in compile STATE) words without interpretation semantics, such as IF. The description of the behaviour of FIND for these words in interpretation STATE allows implementations that do not find such words, implementations that return the xt for an error, implementations that return the xt for the compilation semantics, and implementations that return the xt for some system-specific interpretation semantics.
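One of the behaviours described above is returning the xt for an error in interpretation state. It can be sketched as a wrapper around FIND. This is a minimal sketch under stated assumptions: words without interpretation semantics are assumed to live in a separate word list, and COMP-ONLY-WORDS, NO-INTERPRETATION and FIND-SKETCH are hypothetical names. The value -14 is the standard THROW code for interpreting a compile-only word.

```forth
\ Hypothetical sketch: in interpretation state, words without
\ interpretation semantics yield the xt of an error word.
wordlist constant comp-only-words        \ hypothetical word list
: no-interpretation ( -- ) -14 throw ;   \ -14: interpreting a compile-only word
: find-sketch ( c-addr -- c-addr 0 | xt 1 | xt -1 )
  dup count comp-only-words search-wordlist ?dup if
    rot drop                             ( xt +-1 )
    state @ 0= if                        \ interpretation state:
      2drop ['] no-interpretation -1     \ hand back the error word instead
    then exit
  then
  find ;                                 \ everything else: the system FIND
```

In compilation state the word's own xt and flag are returned, so the text interpreter can still compile words such as IF.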
The ambiguous condition allows STATE-smart implementations of TO, IS and ACTION-OF (as Forth-94 and Forth-2012 do).
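A STATE-smart word in the style of a Forth-94 TO shows why this ambiguous condition is needed. MY-TO and VAL are hypothetical names. VAL is a CREATE-based stand-in for VALUE, so that >BODY is applicable. MY-TO is immediate, so FIND returns its xt with 1 in either STATE. Whether EXECUTEing that xt compiles a store or performs one therefore depends on STATE at the time of EXECUTE, not at the time of FIND.

```forth
\ Hypothetical CREATE-based stand-in for VALUE.
: val ( x "name" -- ) create , does> ( -- x ) @ ;

\ Hypothetical STATE-smart word in the style of a Forth-94 TO.
: my-to ( x "name" -- | "name" -- )
  ' >body                        \ address of name's data
  state @ if
    postpone literal postpone !  \ compiling: compile "addr !"
  else
    !                            \ interpreting: store x now
  then ; immediate
```

Used interpretively, `7 my-to v1` stores at once. Inside a definition it compiles the store.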
Note that this does not allow STATE-smart implementations of words without interpretation semantics (e.g., IF), but then, that's already forbidden by POSTPONE and [COMPILE].
JennyBrien
Editing suggestion:
In interpretation state, EXECUTEing the returned xt performs the interpretation semantics of the word. If the definition has no interpretation semantics FIND may produce c-addr 0; if it produces xt 1 or xt -1, the returned xt represents a system-dependent action.
In compilation state, the returned values represent the compilation semantics: if xt 1 is returned, then EXECUTEing xt performs the compilation semantics; if xt -1 is returned, then COMPILE,ing xt performs the compilation semantics.
If the definition is for a word for which POSTPONE is ambiguous, it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND.
AntonErtl, New Version: Clarify FIND
ChangeLog
2018-05-17: Wording changes following the suggestion by JennyBrien
2018-05-29: Specify FIND for words without interpretation semantics, and loosen it for TO IS ACTION-OF. Added Remarks section for a rationale of these additions.
2018-05-23: Initial version
Problem
The existing specification of FIND is unclear wrt what xts are returned under what conditions. It also uses a different notion of immediacy from the one in the Definition of Terms. From the rationale of FIND, it is obvious that cmForth inspired the idea that two different xts can be returned. The rationale of COMPILE, shows that the intention is that FIND can be usable for the user-defined text interpreter. But FIND as specified does not guarantee that. This proposal would fix this problem; it is also phrased in a way that includes systems like cmForth and Mark Humphries' system.
Proposal
Replace the text in the specification of FIND with:
Find the definition named in the counted string at c-addr. If the definition is not found, return c-addr and zero. If the definition is found, return xt 1 or xt -1. The returned values may differ between interpretation and compilation state:
In interpretation state: If the definition has interpretation semantics, FIND returns xt 1 or xt -1, and EXECUTEing xt performs the interpretation semantics of the word. If the definition has no interpretation semantics, FIND may produce c-addr 0; if it produces xt 1 or xt -1, EXECUTEing xt performs a system-dependent action.
In compilation state, the returned values represent the compilation semantics: if xt 1 is returned, then EXECUTEing xt performs the compilation semantics; if xt -1 is returned, then COMPILE,ing xt performs the compilation semantics.
If the definition is for a word for which POSTPONE is ambiguous, it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND.
In 4.1.2 Ambiguous conditions, add the ambiguous condition above, and remove "6.1.1550 FIND" from
attempting to obtain the execution token, (e.g., with 6.1.0070 ', 6.1.1550 FIND, etc. of a definition with undefined interpretation semantics;
Remarks
The removal of FIND from the clause in 4.1.2 ensures that we can text-interpret (in compile STATE) words without interpretation semantics, such as IF. The description of the behaviour of FIND for these words in interpretation STATE allows implementations that do not find such words, implementations that return the xt for an error, implementations that return the xt for the compilation semantics, and implementations that return the xt for some system-specific interpretation semantics.
The ambiguous condition allows STATE-smart implementations of TO, IS and ACTION-OF (as Forth-94 and Forth-2012 do).
Note that this does not allow STATE-smart implementations of words without interpretation semantics (e.g., IF), but then, that's already forbidden by POSTPONE and [COMPILE].
ruv
1. Immediacy notion.
It also uses a different notion of immediacy from the one in the Definition of Terms.
I believe this is an inconsistency. But in the message news:2019Aug2.082728@mips.complang.tuwien.ac.at you wrote:
I don't think there is an inconsistency in the normative text of the standard here. It's just that the normative text does not reflect the intention of the Forth-94/2012 committees.
Using immediacy in a different sense is an inconsistency regardless of the intention. But if you have changed your mind, the wording in the proposal should perhaps be updated accordingly.
Another possible interpretation is the following. Suppose that when FIND returns different xts depending on STATE, it returns xts for different Forth definitions. Then each of them can independently be immediate (or not immediate) in the sense of the Definition of Terms. This idea also conforms to the conception that a Forth definition can have at most one execution semantics. Also, in cmForth and Mark Humphries' system these actually are different Forth definitions (words from different word lists having the same name, or words having the same name but different flags).
2. An ambiguous condition on obtaining the execution token
It seems that removing "6.1.1550 FIND" is not enough, since the statement mentions just some examples, as "e.g." and "etc." indicate. All the possible variants need to be enumerated exactly.
OTOH, in the proposed specification FIND does not return execution tokens for the corresponding Forth definition (in the general case). On success it returns a pair of values that represents the compilation semantics or the interpretation semantics (although the latter is expressed less clearly), or something else. But from the returned values we can say nothing about execution semantics (according to the proposed specification).
3. Performing the returned xt in the different STATE
The cases of ambiguous POSTPONE
can be eliminated in the future, but it will not affect the limitation on performing the corresponding tokens in the same state only. Also this limitation should be applied for some other words too (e.g. for words with undefined interpretation semantics, since these words may be implemented as "state-smart").
So, it is better to refer them via undefined (or unspecified) execution semantics.
E.g.: "If the definition is for a word for which execution semantics is not specified by the Standard, it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND"
Another issue is that in this case the returned values do not represent the compilation semantics (since from these values we don't know whether we should set compilation state to perform these semantics), but the text says that "In compilation state, the returned values represent the compilation semantics".
4. Implementing some words via recognizer
It seems that the wording of the current specification allows implementing the words with undefined interpretation (or even execution) semantics via the recognizer mechanism, in which case they may be unfindable by FIND. But the proposed specification excludes such an implementation approach. Could it be changed so as not to exclude this approach?
The idea: If the definition has no execution semantics, FIND may produce c-addr 0;
5. Terminology
Why is the "represent" term used instead of the "identify" term?
E.g. "the returned pair of values identifies the compilation semantics".
Have a look at the following normative text from the current Standard: "execution token: A value that identifies the execution semantics of a definition"
ruv
Note that this does not allow STATE-smart implementations of words without interpretation semantics (e.g., IF), but then, that's already forbidden by POSTPONE and [COMPILE].
It is wrong that it's already forbidden. POSTPONE (and [COMPILE]) does not forbid STATE-smart implementations of the words with undefined interpretation semantics (like IF). I suggested a simple implementation of POSTPONE that supports STATE-smartness for these words (like IF or ACTION-OF, etc.).
Therefore, the updated specification should allow the STATE-smart implementations for the words with undefined execution semantics.
AntonErtl
Immediacy notion.
You can implement FIND in a way that produces 1 only for words where the compilation semantics are to perform the execution semantics. In that sense the standard is not inconsistent. It's just that such a FIND is totally useless for the classic user-defined text interpreter, and also, in all existing systems, including systems that implement e.g., TO as STATE-smart word, as well as systems like cmForth, FIND does not behave in that way.
ambiguous condition
Good catch about the "e.g.".
It is the intention of the proposal that FIND returns values that represent the interpretation semantics or the compilation semantics, because that's what is needed for the user-defined text interpreter. Execution semantics (if they exist) are just helper semantics for defining interpretation and compilation semantics through the default mechanism; you can see that by the absence of execution semantics for nearly all words where both interpretation and compilation semantics are specified directly.
Performing the returned xt in the different STATE
Words with undefined interpretation semantics cannot be implemented as STATE-smart words as long as POSTPONE is allowed for them. POSTPONE allows performing the compilation semantics in interpretation state, and such implementations would not work correctly then.
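This argument can be illustrated with a hypothetical STATE-smart word, SMART-42, and a word COMPILE-42 that uses POSTPONE to perform SMART-42's compilation semantics. Both names are made up for this sketch. With an ordinary POSTPONE, COMPILE-42 performed in interpretation state (e.g., between [ and ] inside a definition) should compile the literal 42. SMART-42 inspects STATE when it runs, however, so it pushes 42 instead. ruv contends that a differently implemented POSTPONE could avoid this.

```forth
\ Hypothetical STATE-smart word: compiles 42 as a literal while
\ compiling, pushes 42 while interpreting.
: smart-42 ( -- | -- 42 )
  42 state @ if postpone literal then ; immediate

\ Meant to perform SMART-42's compilation semantics whenever it runs.
: compile-42 ( -- ) postpone smart-42 ;
```

Executed at the top level (interpretation state), COMPILE-42 leaves 42 on the stack and compiles nothing. That is the interpretation behaviour, not the compilation semantics that POSTPONE promised.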
The values returned in compilation state represent the compilation semantics. I could not follow what you mean in the paragraph about that.
Recognizer
The recognizer stuff is used in the system-defined text interpreter (or a recognizer-aware user-defined text interpreter), not inside FIND. I explored the idea of doing the recognizer inside FIND in a EuroForth 2016 paper, but that's incompatible with some user-defined text interpreters using FIND, and in any case, the current recognizer proposal proposes putting it in the text interpreter.
It's unclear to me what you think is excluded, and why, and what it has to do with recognizers.
Terminology
No particular reason. If the committee prefers, they can replace "represent" with "identify".
POSTPONE
Your suggested implementation of POSTPONE does not work correctly (I will address that issue there). So no, a STATE-smart IF is non-standard. A STATE-smart ACTION-OF is standard, because there is an ambiguous condition on POSTPONE ACTION-OF.
ruv
6. POSTPONE and STATE-smartness
@Anton, in your comment you have not shown that the "suggested implementation of POSTPONE does not work correctly", but rather the opposite: "this can work". I'm even going to provide a working Forth system for your test later.
So, the original specification for FIND and POSTPONE does not disallow STATE-smart implementations of the words with unspecified execution semantics.
Hence, the updated specification should not disallow them either.
3. Performing the returned xt in the different STATE
I meant the following statement from your proposal:
If the definition is for a word for which POSTPONE is ambiguous, it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND.
It has the following issues:
a) The rule "POSTPONE is ambiguous" isn't robust enough (since this ambiguity can be eliminated), and it isn't sufficient anyway (due to possible STATE-smart implementations of the words with undefined interpretation semantics).
b) The statement "it is ambiguous to perform the xt returned by FIND in a STATE different from the STATE during FIND" means that the returned pair (xt 1), which can be produced in compilation state, does not represent the compilation semantics: from this pair you don't know whether you need to set compilation state before executing xt.
4. Possible implementation of some words (like TO) via the Recognizer API
I should have mentioned the examples.
a) See dual-semantics-via-recognizer.example
b) See c.l.f. messages: news://qi9c8i$df1$1@gioia.aioe.org, news://qi9am4$703$1@gioia.aioe.org.
It seems that such an implementation of TO is not disallowed by the Forth-2012 standard, but would be disallowed by the proposed updated specification for FIND.
AntonErtl
Over the course of the last year it has become clear that a significant number of systems don't support user-defined text interpreters based on FIND that are as capable as the system-defined text interpreter; e.g., in some systems FIND does not find locals (and there is no other standardized way to do that), and IIRC VFX5 handles S" other than intended in such a user-defined text interpreter. So basically, there is no common practice for a FIND-based user-defined text interpreter that can handle all of Forth. There is also no consensus that this is something that should be supported with FIND.
So this proposal is based on wrong assumptions about the intentions of the committee and the community at large. Therefore I retract it.