Digest #80 2019-09-07
Contributions
CS-DROP "Request for Discussion"
Change History
2019-08-22 CS-DROP again drops both control-flow dest and orig items. Proposal extended to include a CS-PICK modification.
2018-08-20 CS-DROP now operates only on control-flow dest items; the typical use example has a simpler control-flow structure.
2017-07-27 First version
Problem
Forth-94 and Forth-2012 provide explicit access to the control-flow stack by means of the words CS-PICK and CS-ROLL in the Programming-Tools extension wordset (TOOLS EXT). These words allow copying and rearranging control-flow (orig and dest) items.
Control-structure words put dest items (BEGIN) or orig items (IF, AHEAD, WHILE, ...) onto the control-flow stack. An orig item always goes along with a not-yet-resolved forward branch; a dest item marks a backward-branch target.
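To make the orig/dest distinction concrete, the following sketch expresses WHILE and REPEAT in terms of IF, CS-ROLL, AGAIN and THEN. It is only an illustration: my-while, my-repeat and sum-to are hypothetical names chosen so as not to redefine the system's own words.

```forth
\ my-while, my-repeat and sum-to are hypothetical names (illustration only).
: my-while ( C: dest -- orig dest )
   POSTPONE IF   \ C: dest orig   (IF adds an orig for its forward branch)
   1 CS-ROLL     \ C: orig dest   (the BEGIN dest is back on top)
; IMMEDIATE

: my-repeat ( C: orig dest -- )
   POSTPONE AGAIN   \ resolve the dest: backward branch to BEGIN
   POSTPONE THEN    \ resolve the orig: the loop exit lands here
; IMMEDIATE

\ Sum n + (n-1) + ... + 1.
: sum-to ( n -- sum )
   0 SWAP                 ( sum n )
   BEGIN DUP my-while     \ leave the loop when n = 0
      TUCK + SWAP 1-      ( sum' n-1 )
   my-repeat
   DROP ;
```

With these definitions, 3 sum-to leaves 6: my-while places the orig of the loop exit beneath the BEGIN dest, and my-repeat resolves the dest with a backward branch and then the orig with a forward one.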
There is, however, no way to remove an item from the control-flow stack without actually resolving it. This limits the ability to define more complex control structures within the scope of the standard.
In Forth-94 and Forth-2012, CS-PICK has defined behaviour only when copying dest items to the top of the control-flow stack:
15.6.2.1015 CS-PICK “c-s-pick” TOOLS EXT
Interpretation: Interpretation semantics for this word are undefined.
Execution: ( C: dest_u ... orig_0|dest_0 -- dest_u ... orig_0|dest_0 dest_u ) ( S: u -- )
Remove u. Copy dest_u to the top of the control-flow stack. An ambiguous condition exists if there are less than u+1 items, each of which shall be an orig or dest, on the control-flow stack before CS-PICK is executed.
If the control-flow stack is implemented using the data stack, u shall be the topmost item on the data stack.
See: A.15.6.2.1015 CS-PICK.
u has to index a dest item (dest_u in the above standard text). Trying to copy an orig item with CS-PICK results in an ambiguous condition, as this would violate type compatibility with the dest_u input parameter:
4.1.2 Ambiguous conditions
A system shall document the system action taken upon each of the general or specific ambiguous conditions identified in this standard. See 3.4.4 Possible actions on an ambiguous condition.
The following general ambiguous conditions could occur because of a combination of factors:
[...]
– argument type incompatible with specified input parameter, e.g., passing a flag to a word expecting an n (3.1 Data types);
CS-DROPping orig items
So, given this Forth-94 and Forth-2012 behaviour of CS-PICK (trying to copy an orig item is an ambiguous condition), it seems reasonable to deal with orig items only by resolving them, e.g., with THEN or ELSE.
Simply dropping an orig item leaves an unresolved forward branch, which is malformed code and will eventually crash when executed.
If, however, CS-PICK could also copy orig items in a defined way, multiple identical orig items could exist on the control-flow stack, and dropping the surplus copies later would be perfectly reasonable. The programmer would ensure that exactly one of these orig items is resolved and all the others are dropped wherever convenient in her program.
CS-DROPping dest items
In addition, there are situations where control-flow dest items have been generated or duplicated by CS-PICK and then need no further resolution, so they should simply be removed. As dest items only designate branch targets, dropping one that is no longer needed is harmless.
Solution
A control-flow stack operator, CS-DROP, that discards the topmost control-flow item can be defined to supply the missing functionality.
Combined with the duplication capability of CS-PICK and the rearrangement capability of CS-ROLL, this would allow arbitrary changes to the top items of the control-flow stack (down to the first colon-sys, do-sys, case-sys, or of-sys).
Being able to CS-DROP both orig and dest items from the control-flow stack calls for tightening the definition of CS-PICK so that it can copy orig items as well as dest items. This would eliminate the ambiguous condition of CS-PICK operating on orig items.
Proposal
Revise the word CS-PICK in the Tools Extension wordset (TOOLS EXT) so that it can copy both orig and dest items (replace 15.6.2.1015 with the following paragraph):
15.6.2.1015 CS-PICK “c-s-pick” TOOLS EXT
Interpretation: Interpretation semantics for this word are undefined.
Execution:
( C: orig_u|dest_u ... orig_0|dest_0 -- orig_u|dest_u ... orig_0|dest_0 orig_u|dest_u ) ( S: u -- )
Remove u. Copy orig_u|dest_u to the top of the control-flow stack. An ambiguous condition exists if there are less than u+1 items, each of which shall be an orig or dest, on the control-flow stack before CS-PICK is executed.
If the control-flow stack is implemented using the data stack, u shall be the topmost item on the data stack.
See: A.15.6.2.1015 CS-PICK.
Add the word CS-DROP to the Tools Extension wordset (TOOLS EXT).
CS-DROP "c-s-drop" TOOLS EXT
Interpretation: Interpretation semantics for this word are undefined.
Execution: ( C: orig|dest -- )
Remove the top item (orig or dest) from the control-flow stack.
An ambiguous condition exists if the top control-flow stack item is neither a dest nor an orig, or if the control-flow stack is empty.
Typical Use
Typical use of CS-DROP would be in defining elaborate control structures.
As an example of the use of CS-DROP, we create a simple control structure that allows branching back multiple times to an enclosing BEGIN. A corresponding END drops the control-flow dest item generated by BEGIN:
: END ( C: dest -- )                 \ Compilation
      ( -- )                         \ Run-time
   CS-DROP ; IMMEDIATE
: ?{ ( C: dest -- dest orig dest )   \ Compilation
     ( f -- )                        \ Run-time
   POSTPONE IF 1 CS-PICK ; IMMEDIATE
: }* ( C: orig dest -- )             \ Compilation
     ( -- )                          \ Run-time: branch back to BEGIN
   POSTPONE AGAIN POSTPONE THEN ; IMMEDIATE
This can, for example, be used to define a word that prints the Collatz sequence:
: even? ( u -- f ) 1 AND 0= ;
: collatz ( u -- )
   BEGIN
      DUP .
      DUP even? ?{ 2 / }*
      DUP 1 <> ?{ 3 * 1+ }*
   END
   DROP ;
19 collatz ( 19 58 29 88 44 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1 ok )
Remarks
Several Forth-94 and Forth-2012 systems already define CS-DROP, or the same functionality under a different name. It is already common practice, so it is only consistent to standardize it.
Neither Forth-94 nor Forth-2012 specifies the size of orig or dest items. They are not even required to have identical sizes. Different sizes would complicate the implementation of CS-ROLL (and of the proposed revised CS-PICK). Most systems therefore implement orig and dest items with the same size. The implementation of the proposed revised CS-PICK should thus be straightforward in most systems (and in no case more complicated than CS-ROLL).
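As a rough illustration of that claim, the sketch below assumes a hypothetical system in which every orig and dest occupies exactly two cells on the data stack; cs-pick-2, cs-roll-2 and cs-drop-2 are illustrative names, not the words of any particular system. Because all items have the same size, picking an orig is no different from picking a dest, and the picking code mirrors the rolling code.

```forth
\ Hypothetical system: every orig and dest is exactly two cells (x y, y on top)
\ on the data stack. The names below are illustrative only.
: cs-pick-2 ( x_u y_u ... x_0 y_0 u -- x_u y_u ... x_0 y_0 x_u y_u )
   2* 1+ DUP >R PICK   \ copy x_u, which sits at cell index 2u+1
   R> PICK ;           \ y_u has moved to index 2u+1; copy it too
: cs-roll-2 ( x_u y_u ... x_0 y_0 u -- ... x_0 y_0 x_u y_u )
   2* 1+ DUP >R ROLL   \ move x_u to the top
   R> ROLL ;           \ y_u is now at index 2u+1; move it as well
: cs-drop-2 ( x y -- ) 2DROP ;
```

For example, with three items 1 2 3 4 5 6 on the stack, 1 cs-pick-2 appends a copy of the item 3 4.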
Reference implementation
As standard systems are free
- to choose an appropriate representation for control-flow dest and orig stack items, and
- to use either the data stack or a separate stack as the control-flow stack,
a standard definition for CS-DROP cannot be provided.
For dest items only, CS-DROP can be approximated by either of the following definitions, which, however, compile (ineffective) branch code into the dictionary:
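\ Variant 1: compile a conditional backward branch that is never taken
: CS-DROP ( C: dest -- ) POSTPONE TRUE POSTPONE UNTIL ;
\ Variant 2: jump forward over an unreachable backward branch
: CS-DROP ( C: dest -- ) POSTPONE AHEAD 1 CS-ROLL POSTPONE AGAIN POSTPONE THEN ;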
CS-DROP can easily be implemented in a system-specific way if knowledge about the system's control-flow stack implementation is available.
For example, SwiftForth represents each control-flow item as a single cell on the data stack. A SwiftForth definition of CS-DROP that also takes compiler security into account would be:
: CS-DROP ( C: orig|dest -- ) DROP -BAL ;
Win32Forth uses two cells on the data stack for each control-flow item, one of them for compiler security, so a definition of CS-DROP in Win32Forth would be:
: CS-DROP ( C: orig|dest -- ) 1 ?PAIRS DROP ;
Testing
The following test checks that CS-DROP actually removes the topmost dest item from the control-flow stack:
t{ 99 :NONAME BEGIN [ CS-DROP ] ; DROP -> 99 }t
The following test checks that CS-PICK can copy orig items and CS-DROP can discard them (IF leaves the only orig or dest item, so it is picked with index 0):
t{ 99 :NONAME IF [ 0 CS-PICK CS-DROP ] THEN ; DROP -> 99 }t
Experience
CS-DROP is already available in the following systems:
- Gforth version 0.7.9 (not in 0.7.3)
- VFX version 4.7.2
- PFE version 0.33.71
- DXForth version 4.30
Similar functionality with a different name is supported by:
- FLT version 1.3.2 as (delete-cs-item)
CS-DROP is not (yet) supported in:
- SwiftForth version 3.6.3, sample definition given above
- Win32Forth version 6.15.04, sample definition given above
The Gforth 0.7.9 implementation of CS-PICK includes a check (?NON-ORIG) that ensures only dest items are copied. Its CS-DROP implementation does not check for dest items.
There are numerous discussions on comp.lang.forth (e.g. [3][4]) about implementing control structures with control-flow stack manipulations. Among the non-standard, system-specific words mentioned in this context, CS-DROP is widely accepted.
There seems to have been a similar earlier proposal, probably by Guido U. Draheim, as the PFE Forth documentation [2] suggests.
Discussion
Fall 2018
In its 2018 (12–14 September) meeting in Edinburgh the Forth standards committee discussed the (2018-08-20 revised) CS-DROP proposal with the following outcome:
CS-DROP
Allowing CS-DROP to drop an orig was discussed, this would require allowing CS-PICK to pick an orig. Referred to author for further consideration.
Fall 2017
The initial version of the proposal was presented at the fall 2017 standards meeting. It proposed that CS-DROP drop both dest and orig items. Concerns were raised that using CS-DROP with control-flow orig items would leave unresolved branch origins that will eventually cause run-time errors when executed. Every orig created (e.g., by IF, AHEAD, ELSE, WHILE) should be resolved exactly once (e.g., by THEN, ELSE, REPEAT).
The standardized behaviour of CS-PICK (15.6.2.1015) in both Forth-94 and Forth-2012 does not allow copying control-flow orig stack items but requires a control-flow destination item dest_u to be copied. Although it is not explicitly stated, we assume that copying orig items is an ambiguous condition. Thus control-flow orig items cannot be copied within the scope of the Forth-94 and Forth-2012 standards; only control-flow dest items can. It is therefore reasonable to restrict the CS-DROP to be standardized to dropping only control-flow dest items as well.
References
[1] http://dxforth.netbay.com.au/cfsext.html
[2] http://forth.sourceforge.net/word/cs-drop/
[3] https://groups.google.com/forum/#!topic/comp.lang.forth/64GKthsYVFs
[4] https://groups.google.com/forum/#!msg/comp.lang.forth/QCrKjzxodj0/RpPpq8Jp0AoJ
Author
Ulrich Hoffmann uho@xlerb.de
Problem
This is an alternative proposal for the same problem as the proposal "Case sensitivity".
Forth-2012 states:
Programs that use lower case for standard definition names or depend on the case-sensitivity properties of a system have an environmental dependency.
This differs from common practice:
It is common practice for programs to use lower case for standard definition names, and also not uncommon practice to use capitalized (i.e., mixed-case) names for some standard definition names.
It is common practice for systems to support case insensitivity for ASCII characters, either by default (Gforth, iForth, SwiftForth, VFX), or after invoking a special command (SP-Forth).
Solution
Standardize the common practice of systems.
Typical use
Create a 5 cells allot
Remarks
What about non-ASCII characters? They are treated case-sensitively.
The advantages of this approach are: This approach is common practice. The implementation is relatively simple (especially if you consider the complexity of locale-dependent case insensitivity in UTF-8). Forth source files work independent of the encoding and locale, i.e., the system does not need to know the encoding to know whether a word matches a dictionary entry (of course, the application itself may be locale-dependent). The main purpose
The disadvantage of this approach is that users might be confused by the difference in case sensitivity between ASCII and non-ASCII characters. E.g., "WIEN" would match "Wien", but "KÖLN" would not match "Köln".
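A minimal sketch of the proposed matching rule, assuming names are compared byte by byte: only the bytes A to Z are folded, so the bytes of multi-byte UTF-8 sequences (all $80 or above) are compared exactly. ascii-lower and name-match? are hypothetical helper names; a real system would fold case inside its own dictionary search.

```forth
\ Hypothetical helpers illustrating ASCII-only case-insensitive matching.
: ascii-lower ( c -- c' )
   \ Fold only A..Z; every other byte (including UTF-8 bytes >= $80) is unchanged.
   DUP [CHAR] A [CHAR] Z 1+ WITHIN IF $20 OR THEN ;
: name-match? ( c-addr1 u1 c-addr2 u2 -- flag )
   ROT OVER <> IF DROP 2DROP FALSE EXIT THEN   \ lengths differ
   0 ?DO                                       ( c-addr1 c-addr2 )
      OVER I + C@ ascii-lower
      OVER I + C@ ascii-lower
      <> IF 2DROP FALSE UNLOOP EXIT THEN
   LOOP
   2DROP TRUE ;
```

With this rule, S" WIEN" S" Wien" name-match? yields true, while S" KÖLN" S" Köln" name-match? yields false, because the UTF-8 bytes of Ö and ö differ and are not folded.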
Comparison with the Case sensitivity proposal
The present proposal covers the practice of using mixed-case names. It makes this part of the standard air-tight rather than being unnecessarily loose: while having special case-sensitivity rules for standard words and other rules for other words has been discussed, the common and simpler practice is to just implement case insensitivity for ASCII characters.
Proposal
In 3.3.1.2, delete
Programs that use lower case for standard definition names or depend on the case-sensitivity properties of a system have an environmental dependency.
In 3.4.2, replace
The case sensitivity (whether or not the upper-case letters match the lower-case letters) is implementation defined. A system may be either case sensitive, treating upper- and lower-case letters as different and not matching, or case insensitive, ignoring differences in case while searching.
The matching of upper- and lower-case letters with alphabetic characters in character set extensions such as accented international characters is implementation defined.
A system shall be capable of finding the definition names defined by this standard when they are spelled with upper-case letters.
with
ASCII characters are matched case-insensitively. All other characters are matched exactly (case sensitively).
Reference implementation
System-dependent
Testing
T{ 1 constant case-insensitive -> }T
T{ 2 Constant Case-INSENSITIVE -> }T
T{ case-insensitive -> 2 }T
Experience
Gforth has implemented this approach since its inception. Several other systems (SwiftForth, VFX, iForth) have also done so for as long as I have used them.
Many published programs use lower-case or mixed-case system words.
Replies
DOES> and synonym
But, in the general case if not impossible, that would cause severe implementation difficulties.
If >BODY and IS work correctly for synonyms, what is a possible problem with DOES>?
If you know the BODY address of oldname, isn't it enough to know whatever else and change the behavior?
If a synonym (newname) works correctly when you change oldname via IS, why can't newname work in the case of DOES> (i.e., when DOES> changes the behavior of oldname)?
requestClarification - description of "nt" in the standard
Changing the description of "nt" to the following is acceptable.
"A name token is a single-cell value that identifies a named definition."
Does it not follow then that "nt" is not a token that identifies a :NONAME definition?
requestClarification - description of "nt" in the standard
@Ruvim: Yes, "a named definition" is probably a more frequently used meaning than "the name of a Forth definition". I tend to use "word" also for unnamed definitions (e.g., "of :NONAME words" below), and my impression is that others do that, too. In other words, we use "word" as synonym for "definition". Whether we want to follow this usage in the standard is up to discussion. IIRC this difference has not led to any confusion yet.
@KrishnaMyneni: There is no standard way to get an nt of a :NONAME definition, and I don't expect that there will ever be (because there are systems where there is nothing that could serve as nt of a :NONAME word).
If your question is whether the definition forbids that systems provide nts of :NONAME definitions, the answer is no. Beyond the requirements of the standard, systems are free to provide any functionality their maintainers find appropriate. So if a system has a non-standard word for getting the nt of a :NONAME definition, it's no skin off the standard's nose.
Please refer to the 2019-08-22 version below.
It's not specified as ambiguous condition, but the effect as far as standard programs are concerned is the same. As far as standard systems are concerned, the difference is that a system needs to document the result of an ambiguous condition (a pretty pointless exercise IMO).
Yes, I think that the file-id returned from SOURCE-ID should not be used for pretty much any file operation. I don't know of anybody who used SOURCE-ID for file operations, but absence of evidence is not evidence of absence.
It's unclear to me what the purpose of providing the file-id through SOURCE-ID was. It looks to me like a case where the committee had a specific implementation (line-at-a-time) in mind, and provided access to more implementation detail than actually needed. But if that was the case, why not also allow REPOSITION-FILE?
In any case, we may want to make the allowance to use other file words on the file-id returned by SOURCE-ID obsolescent.
It turns out that different systems behave differently for your example: Gforth, SwiftForth, and VFX make BAR immediate (i.e., it's the most recently started definition), while iForth makes FOO immediate (i.e., it's the most recently completed definition).
Does it matter? I do not see a good reason for starting another definition, then making the previous definition immediate. I can see a case for making the most recently started definition immediate in metaprogramming, e.g.,:
: macro: : immediate ;
However, I am not aware that anybody has ever written such code (but as usual, absence of evidence is not evidence of absence).
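For illustration, this is how such a macro: word might be used. The sketch only works on systems where IMMEDIATE applies to the most recently started definition (Gforth, SwiftForth and VFX, according to the reply above); twice and double are hypothetical names.

```forth
\ Works only where IMMEDIATE marks the most recently *started* definition.
: macro: : immediate ;        \ start a definition and make it immediate at once
macro: twice ( -- )           \ twice is immediate: it runs while compiling
   POSTPONE DUP POSTPONE + ;  \ ... and appends DUP + to the current definition
: double ( n -- 2n ) twice ;  \ compiles as DUP +
```

Here 5 double leaves 10, because twice was already immediate while double was being compiled.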
If you allow CS-PICK and CS-DROP to deal with origs, you have to cover the case where the same orig is not resolved at all, or resolved more than once. Either make it explicitly ambiguous, or define what happens.
I have not yet thought through Automatic scoping of locals (http://www.complang.tuwien.ac.at/papers/ertl94l.ps.Z) in the presence of these enhanced CS-PICK and new CS-DROP capabilities.
Current Gforth also tests for ?NON-ORIG in CS-DROP, but that can be changed.
requestClarification - description of "nt" in the standard
In any case, where a system does provide a 'dummy' nt for :NONAME definitions, that nt conveys no more information than that it is a member of the set of :NONAME definitions.
It makes no sense for an anonymous definition to be IMMEDIATE, or to have special compiling semantics, or a data field, or a DOES> action. All these can only be set for named definitions. Therefore, the 'most recent definition' is always the most recent named definition.
requestClarification - description of "nt" in the standard
@AntonErtl, @JennyBrien: I request that you post your replies to c.l.f. also in response to the recent thread, "Clarification on CREATE and colon :". I think your replies help to lessen the confusion about the use of "nt"s in the context of :NONAME definitions, and what can be done with those "nt"s.
@AntonErtl: I agree that the standard does not prohibit creation of "nt"s for :NONAME definitions. However, I am confused about their purpose for such definitions. Is it simply to discover their existence? I think Jenny Brien's use of the term "dummy nt" to refer to them helps indicate their restricted use. In connection with a future proposal to standardize "LATEST" (or another suitable name), there appears to still be confusion on whether the returned nt should be zero in the case of the most recent definition being a :NONAME definition, or whether it should be the (non-zero) nt of the last named definition.
NAME>INTERPRET ( nt -- xt ) and NAME>COMPILE ( nt -- xt1 xt2 ) are fine with a STATE-smart implementation of TO: Xt and xt1 are the same, and represent the STATE-smart definition. Xt2 is the xt of EXECUTE. If the standard allowed a STATE-smart implementation of FILE S", it would be the same.
It is not the proponent's job to decide which wordset a word should be added to, but the committee's. If the committee sees this as an optional feature, it will probably add these words to TOOLS EXT. If it sees this in the long term as a replacement for FIND and SEARCH-WORDLIST, it will probably add FIND-NAME to CORE and FIND-NAME-IN to SEARCH.
SEARCH-WORDLIST problems:
The interface of SEARCH-WORDLIST is inferior to that of FIND when you do not immediately have a DUP IF or somesuch following the SEARCH-WORDLIST, but want to AND the flag with something you have on the stack. With FIND, which always returns two items, you could do ROT AND, but SEARCH-WORDLIST returns only 0 when the word is not found, so you cannot. And indeed, a stack effect ( c-addr u wid -- c-addr u false | xt false true | xt true true ) would be better.
What you write does not have anything to do with the specification of SEARCH-WORDLIST. SEARCH-WORDLIST does not return an xt representing the compilation semantics; and it does not have the text that FIND has that suggests that it may be STATE-dependent, and may return an xt representing the compilation semantics. As stated in the proposal, in Gforth the result of SEARCH-WORDLIST represents the interpretation semantics of the found word, plus an immediate flag.
The practical usage case for SEARCH-WORDLIST (as specified) that I have found is a lookup table. Put a number of words in a wordlist, then use SEARCH-WORDLIST to look them up. I have not used IMMEDIATE in that usage, but it could be used to put an additional bit of information on the word that you do not want to store in the data field. So yes, the 1/-1 result could be used for something.
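A minimal sketch of such a lookup table, assuming three named constants as the table entries; colors and color>n are hypothetical names. The found/immediate indicator (1 or -1) is reduced to a plain flag here; an entry marked IMMEDIATE could carry the extra bit of information mentioned above.

```forth
\ Hypothetical lookup table built on a wordlist.
WORDLIST CONSTANT colors
GET-CURRENT colors SET-CURRENT   \ new definitions go into colors
1 CONSTANT red
2 CONSTANT green
3 CONSTANT blue
SET-CURRENT                      \ restore the previous compilation wordlist

: color>n ( c-addr u -- n true | false )
   colors SEARCH-WORDLIST        ( 0 | xt 1 | xt -1 )
   IF EXECUTE TRUE ELSE FALSE THEN ;
```

S" green" color>n leaves 2 and true, while a name that is not in the table leaves just false.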
2. Well, it is better to discuss this aspect of Gforth standard compliance in another place.
3. Yes, in practical usage the -1|1 difference is detected and used very rarely. By my rough estimate, in no more than 1% of cases (excluding implementations of the standard features).
The original term definition can look like:
name token: A single-cell value that identifies a named word.
For comparison, a similar normative definition:
execution token: A value that identifies the execution semantics of a definition.
Possible variants
name token: A value that identifies the name of a definition.
name token: A value that identifies the name of a word.
Rationale
- It is unclear at first glance why we need two different identifiers for almost the same thing. The proposed term definitions show that they are different things.
- The size 1 cell is mentioned in the data type table, no need to mention it in the term definition.
Immediacy notion.
You can implement FIND in a way that produces 1 only for words where the compilation semantics are to perform the execution semantics. In that sense the standard is not inconsistent. It's just that such a FIND is totally useless for the classic user-defined text interpreter, and also, in all existing systems, including systems that implement e.g., TO as STATE-smart word, as well as systems like cmForth, FIND does not behave in that way.
ambiguous condition
Good catch about the "e.g.".
It is the intention of the proposal that FIND returns values that represent the interpretation semantics or the compilation semantics, because that's what is needed for the user-defined text interpreter. Execution semantics (if they exist) are just helper semantics for defining interpretation and compilation semantics through the default mechanism; you can see that by the absence of execution semantics for nearly all words where both interpretation and compilation semantics are specified directly.
Performing the returned xt in the different STATE
Words with undefined interpretation semantics cannot be implemented as STATE-smart words as long as POSTPONE is allowed for them. POSTPONE allows to perform the compilation semantics in interpretation state, and such implementations would not work correctly then.
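The following toy example, with hypothetical names throughout, illustrates the point: smart decides at run time by inspecting STATE, so when its compilation semantics are performed in interpretation state (via a word that POSTPONEs it), it takes the interpretation branch.

```forth
\ Hypothetical names. smart is STATE-smart: one immediate word that checks STATE.
VARIABLE branch-taken
: smart ( -- )
   STATE @ IF 1 ELSE 2 THEN branch-taken ! ; IMMEDIATE
\ smart-comp performs the compilation semantics of smart (via POSTPONE).
: smart-comp ( -- ) POSTPONE smart ;
\ Perform those compilation semantics in interpretation state:
: demo ( -- ) [ smart-comp ] ;
```

After demo has been compiled, branch-taken contains 2: smart behaved as if it were being interpreted, although its compilation semantics were requested.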
The values returned in compilation state represent the compilation semantics. I could not follow what you mean in the paragraph about that.
Recognizer
The recognizer stuff is used in the system-defined text interpreter (or a recognizer-aware user-defined text interpreter), not inside FIND. I explored the idea of doing the recognizer inside FIND in a EuroForth 2016 paper, but that's incompatible with some user-defined text interpreters using FIND, and in any case, the current recognizer proposal proposes putting it in the text interpreter.
It's unclear to me what you think is excluded, and why, and what it has to do with recognizers.
Terminology
No particular reason. If the committee prefers, they can replace "represent" with "identify".
POSTPONE
Your suggested implementation of POSTPONE does not work correctly (I will address that issue there). So no, a STATE-smart IF is non-standard. A STATE-smart ACTION-OF is standard, because there is an ambiguous condition on POSTPONE ACTION-OF.
requestClarification - description of "nt" in the standard
@JennyBrien: A system that gives you some way to produce an nt for a :NONAME word can do whatever it likes, so the following is just my preference: It should return an nt for which NAME>INTERPRET produces the xt of the :NONAME word. Gforth behaves that way.
@KrishnaMyneni: There is a desire for a uniform token that represents a definition (probably because in the old times, words were simple enough to be represented with a single token). Various newfangled features such as nameless definitions, dual-semantics words, or synonyms make it necessary to differentiate between xt and nt, but the desire is still there. In the new Gforth header, the idea was to make nt=xt where possible, with reasonable behaviour for words that take xts when you give them an nt that is not an xt, and reasonable behaviour for words that take nts when you give them an xt. Over the last few years we have filled various holes that still used to be there (e.g., if you use a snapshot from 2018, you cannot EXECUTE the nt of a synonym, with a recent snapshot you can). We had no particular application usage in mind, just a well-rounded system. We will see how that translates into real-world benefits.
:noname ; latest returns 0 in Gforth; LATESTNT returns the nt of the :NONAME definition, and LATESTXT its xt.
@Ruvim:
You want to move the first sentence of 15.3.1 into a new section "15.2.1 Definition of terms"?
The variants are nonsense. The nt does not identify the name, but the named word (or named definition, if you prefer), which (unlike the name) has an interpretation semantics and a compilation semantics.
nt and xt are different things, even in recent Gforth, which tries to make them as much the same as possible. Maybe we should add a Rationale to 15.3.1 that explains that.
Does it matter? I do not see a good reason for starting another definition, then making the previous definition immediate. I can see a case for making the most recently started definition immediate in metaprogramming, e.g.,:
: macro: : immediate ;
However, I am not aware that anybody has ever written such code (but as usual, absence of evidence is not evidence of absence).
I used to do that very thing many years ago, on the basis of 'say what you are doing as soon as you can'. It's a nice thing to have, and I'm not sure why iForth doesn't allow it. (a combination of expecting LATEST to be the top of the CURRENT wordlist, and only linking it there with ; ?)