Digest #111 2020-08-23
Nestable Recognizer Sequences
M. Anton Ertl
There are similarities between a word list, a recognizer, a search order, and a recognizer sequence: all of them take a string as input and either recognize it or not. If they recognize it, word lists and the search order produce a name token (or an xt and an immediate flag), while recognizers and recognizer sequences produce some data and a rectype.
The similarity between wordlists and a search order has inspired the idea of nestable search orders: Several wordlists could be combined into a sequence that itself would work like a wordlist in other search orders. However, the search order words had already been standardized, so this idea never made it out of the concept stage.
The similarity between the search order and recognizer sequences has led to the present recognizer proposal containing the words GET-RECOGNIZER and SET-RECOGNIZER, which are mostly modeled on GET-ORDER and SET-ORDER.
As an alternative, this proposal introduces nestable (but not necessarily changeable) recognizer sequences.
Add the following words:
rec-sequence ( xt1 .. xtn n "name" -- )
Defines a recognizer "name".
"name" execution: ( c-addr u -- ... rectype )
Tries to recognize c-addr u using the recognizers xtn...xt1 (in this order). The first successful recognizer in the sequence returns from "name" with its result. If no recognizer succeeds, return RECTYPE-NULL.
[On the order of xts: This order is modeled on the order in the search
order, but one could use the reverse order without suffering
disadvantages; I am leaving this open to
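The execution semantics above can be sketched in standard Forth. This is a minimal illustration, not the Gforth implementation: RECTYPE-NULL is modeled as a plain constant, a failing recognizer is assumed to leave only RECTYPE-NULL on the stack, and RECTYPE-CHAR, RECTYPE-LEN, REC-CHAR, REC-LONG, REC-DEMO and DEMO are hypothetical names used only for the demonstration.

```forth
0 constant rectype-null   \ assumption: a real system supplies its own rectype

: rec-sequence ( xt1 .. xtn n "name" -- )
  create dup ,            \ store the count n
  0 ?do , loop            \ store xtn first, xt1 last (the search order)
does> ( c-addr u -- ... rectype )
  dup cell+ swap @ cells  ( c-addr u start nbytes )
  over + swap             ( c-addr u end start )
  ?do                     ( c-addr u )
    i @ rot rot           ( xt c-addr u )
    2dup 2>r rot execute  ( ... rectype ) \ string saved on the return stack
    dup rectype-null <> if
      2r> 2drop unloop exit   \ first success: return its result
    then
    drop 2r>              ( c-addr u )    \ failure: try the next recognizer
  1 cells +loop
  2drop rectype-null ;

\ Two toy recognizers (hypothetical):
1 constant rectype-char
2 constant rectype-len
: rec-char ( c-addr u -- c rectype-char | rectype-null )
  1 = if c@ rectype-char else drop rectype-null then ;
: rec-long ( c-addr u -- u rectype-len | rectype-null )
  nip dup 3 > if rectype-len else drop rectype-null then ;

' rec-long ' rec-char 2 rec-sequence rec-demo   \ REC-CHAR is tried first

: demo
  s" x"     rec-demo . .    \ REC-CHAR succeeds
  s" hello" rec-demo . .    \ REC-CHAR fails, REC-LONG succeeds
  s" ab"    rec-demo . ;    \ both fail: RECTYPE-NULL
demo   \ prints 1 120 2 5 0
```

Because a word defined this way has the same stack effect as a recognizer, its xt can itself be passed to REC-SEQUENCE, which is what makes the sequences nestable.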
get-rec-sequence ( xt -- xt1 .. xtn n )
If xt refers to a recognizer sequence, return the contained recognizers. If xt refers to a deferred word, perform DEFER@ followed by GET-REC-SEQUENCE (i.e., GET-REC-SEQUENCE works through deferred words). If xt refers to neither, return 0.
FORTH-RECOGNIZER now contains the xt of a recognizer or a rec-sequence. RECOGNIZE is unnecessary, because its functionality is performed by running a rec-sequence. GET-RECOGNIZER, SET-RECOGNIZER, and NEW-RECOGNIZER-SEQUENCE are replaced by the words above.
Define a recognizer sequence for the classical text interpreter:
' rec-num ' rec-nt 2 rec-sequence rec-forth-cm ( c-addr u -- ... rectype )
Extend it with FP numbers:
' rec-float ' rec-forth-cm 2 rec-sequence rec-forth ( c-addr u -- ... rectype )
Make this the text interpreter:
' rec-forth to forth-recognizer
Have a dot-parser to be searched first:
' rec-forth ' rec-dot 2 rec-sequence rec-.forth ( c-addr u -- ... rectype )
Put a user-defined recognizer REC-USER behind the currently active recognizers, temporarily:
' rec-user forth-recognizer 2 rec-sequence rec-forthuser
forth-recognizer ( old )
' rec-forthuser to forth-recognizer
\ some code that uses REC-USER:
...
\ now restore the old recognizer sequence
( old ) to forth-recognizer
You can insert a recognizer in the middle of a sequence by picking the existing sequence apart and using it for constructing a new recognizer:
' rec-forth-cm get-rec-sequence swap ' rec-foo swap rot 1+ rec-sequence rec-FOOrth-cm
This inserts REC-FOO to be searched as second recognizer (after REC-NT). This approach has the disadvantage that you need to know pretty well what the recognizer currently contains (it shares this disadvantage with the GET-RECOGNIZER interface). It also has the disadvantage that you have no easy way to update all the recognizer sequences that contain REC-FORTH-CM. To avoid these disadvantages, you can put deferred words into recognizer sequences from the start:
: rec-nothing ( c-addr u -- rectype-null ) 2drop rectype-null ;
defer rec-foo-deferred
' rec-nothing is rec-foo-deferred
' rec-num ' rec-foo-deferred ' rec-nt 3 rec-sequence rec-forth-cm
Then you can plug in REC-FOO:
' rec-foo is rec-foo-deferred
And you can deactivate it again later. Of course, this approach works only if you have the foresight to insert REC-FOO-DEFERRED from the start, or if you can change the source code of REC-FORTH-CM later.
An alternative would be to be able to change the rec-sequences in words defined with REC-SEQUENCE; for that we would need something like SET-REC-SEQUENCE. It's not clear to me that this is really needed, though.
TBD (if this informal proposal is actually popular enough to merit further development).
A word REC-SEQUENCE: (but without GET-REC-SEQUENCE) has been in Gforth since 2016. It has not been used; instead, the mainstream GET-RECOGNIZER SET-RECOGNIZER interface was used.
Ruvim has recently suggested something in this vein, rekindling my interest in this kind of interface.
If you don't implement SYNONYM, and don't implement FILE S" (i.e., the interpretation semantics of S") or other dual-semantics words, you can implement xt=nt. For an extensive discussion, read Section 3 of The new Gforth Header.
I don't see any of the references. Is this text incomplete?
@MarcelHendrix, sorry I have formatting problems as well as the lack of references. I will update the proposal to fix these issues.
2020-08-18 v 0.0.1 first draft posted on comp.lang.forth
2020-08-21 v 0.0.2 revised to introduce an essential word for defining double-precision IEEE floating point values.
2020-08-22 v 0.0.3 fix code formatting and add references
The IEEE 754-2008 standard for floating point arithmetic provides numerous advantages for those who write numerical floating point programs, including standardized floating point number formats which have been widely adopted for several decades, as well as a significantly simpler approach to dealing with exceptions in floating point arithmetic. Although significant parts of an optional IEEE floating point word set for Forth have been developed as RfDs since 2009, the proposal(s) have languished now for 11 years without any progress towards including their features within a standard. This RfD takes the view that the lack of progress is primarily a result of two factors:
the complexity of the problem in specifying even a partially complete solution for support of the features of the IEEE 754-2008 standard, particularly with the setup and enabling of traps for floating point arithmetic exceptions, and
the relatively low use of floating point arithmetic, and specifically of programs which require more than simple floating point numerical calculations, within the Forth user community.
Even in language standards which have adopted many of the IEEE 754 arithmetic features, support is often incomplete. One such example is the C99 standard, which specifies extensions for features such as setting the rounding modes and masking floating point exceptions, but does not specify a way to enable and disable floating point exception traps.
Several Forth systems have already extended their floating point capabilities to include IEEE 754 features such as special binary values representing signed infinity (+/-INF) and "not a number" (NAN) values, with possibly different names.
Instead of a more or less comprehensive proposal, specifying words to provide most of the functionality within the IEEE 754 standard, we propose the formal inclusion within the standard of the "optional IEEE 754 binary floating-point word set", initially containing a minimal set of words to allow creating IEEE binary floating point values with bit-level precision. Further functionality provided by the IEEE 754-2008 standard may be added by subsequent proposals.
In addition to the inclusion of the "optional IEEE 754 binary floating-point word set", this proposal adds the word MAKE-IEEE-DFLOAT, which permits the creation of any recognized double precision floating point value. It allows the definition of the special IEEE 754 floating point values which are returned by default upon certain arithmetic exceptions (+/-INF, NAN) and are useful for detecting an arithmetic exception. The IEEE 754 standard also provides other mechanisms for detecting and dealing with floating point arithmetic exceptions.
Subsequent proposals can incrementally add IEEE 754-2008 or IEEE 754-2019 functionality to the standard optional word set. For example, another proposal can add standardized named constants for special binary values returned upon arithmetic exceptions. Another proposal may formally update the specifications for existing floating point arithmetic words for consistency with the IEEE 754 standard, and yet another proposal may add words for exception detection by providing access to the exception flags of the floating point unit. Such changes may be introduced individually so that the problem of providing floating point arithmetic consistent with the IEEE 754 standard can be tackled in pieces rather than all at once. Given the substantial amount of work already done towards such an optional word set, the problem can be reduced to identifying groups of words which may be added separately to provide enhanced capabilities.
The adoption of an "optional IEEE 754 binary floating-point word set" into the Forth 20xx standard, initially with minimal provisions, will be immediately useful for practitioners of numerical floating-point computation in Forth. The proposed addition to the new word set is
MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )
which will return an IEEE 754 double precision floating point value from the specified bit fields for the sign, binary fraction, and exponent. It will also validate the binary fraction and exponent fields for consistency with the IEEE binary format and return an error value on the data stack: 0 for no error, and non-zero values to indicate the type of failure. The least significant bit of the signbit value represents the sign of the floating point value (0 is positive, 1 is negative). The lower 32 bits of each cell value of udfraction are concatenated to provide the binary fraction bits of the mantissa, and uexp provides the binary representation of the exponent.
HEX 0 54442D18 921FB 400 MAKE-IEEE-DFLOAT drop fconstant pi
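The field layout described above can be checked with integer arithmetic alone. The following sketch assembles the upper 32 bits of a double precision value from its fields, assuming cells of at least 32 bits and assuming that uexp is the 11-bit biased exponent field, as the range check against 800 (hex) in the reference implementation suggests. IEEE-HI is a hypothetical helper for illustration and is not part of the proposal.

```forth
hex
: ieee-hi ( signbit ufrac-hi uexp -- u32 )
  14 lshift or              \ exponent into bits 20..30, fraction bits below it
  swap 1 and 1F lshift or   \ least significant bit of signbit into bit 31
;
0 921FB 400 ieee-hi u.   \ prints 400921FB  (pi; the low word is 54442D18)
0 0     7FF ieee-hi u.   \ prints 7FF00000  (+INF; the low word is 0)
1 0     7FF ieee-hi u.   \ prints FFF00000  (-INF; the low word is 0)
decimal
```

The first result, followed by the low word 54442D18, is the 64-bit pattern 400921FB54442D18, which is the double precision value closest to pi.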
Adopt the Optional IEEE 754 binary floating point word set into the Forth 20xx standard.
The new word set will provide the word MAKE-IEEE-DFLOAT with the specifications given above.
The reference implementation is specific to a 32-bit, little-endian Forth system.
HEX
\ Make an IEEE 754 double precision floating point value from
\ the specified bits for the sign, binary fraction, and exponent.
\ Return the fp value and error code with the following meaning:
\   0  no error
\   1  exponent out of range
\   2  fraction out of range
fvariable temp
: MAKE-IEEE-DFLOAT ( signbit udfraction uexp -- r nerror )
    dup 800 u< invert IF 2drop 2drop F=ZERO 1 EXIT THEN
    14 lshift 3 pick 1F lshift or >r
    dup 100000 u< invert IF r> 2drop 2drop F=ZERO 2 EXIT THEN
    r> or [ temp cell+ ] literal ! temp ! drop
    temp df@ 0 ;
- IEEE, 754-2008 - IEEE Standard for Floating-Point Arithmetic - Redline, https://ieeexplore.ieee.org/document/5976968 (2008).
- W. Kahan, Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF (1997).
- D. N. Williams, Proposal for an Optional IEEE 754 Binary Floating-Point Word Set, v 0.5.4, http://www.forth200x.org/ieee-fp.txt (2009); Also, see older and more recent versions of this draft proposal and another proposal for supporting IEEE 754 exceptions and exception handling at http://www-personal.umich.edu/~williams/archive/forth/ieeefp-drafts/.
- Based on discussions in comp.lang.forth during August 2020, the following systems appear to be able to output IEEE 754 values for signed INF: iForth, gforth, lxf, kForth-32. Only iForth appears to support an intrinsic definition of fp values for +/-INF.
Confirmed: The Section numbers are incorrect (counting up faster than they should). The word numbers are correct (AFAICT), which makes the section number problems more obvious (you get lower-numbered words in higher-numbered sections).
The "RECOGNIZER-SEQUENCE:" proposal is now online: Nestable Recognizer Sequences.
This proposal is intended to modify the Recognizer proposal.
That doesn't look like the preview! This view is not satisfactory. I will crosspost this to comp.lang.forth.