Digest #112 2020-08-29
Contributions
How can we determine from an nt whether identifiable execution semantics are defined for the corresponding definition? And how do we get the corresponding xt, if any?
By "identifiable" I mean that these semantics can be identified by an execution token xt. This excludes words like EXIT, >R, etc., which have only nominal execution semantics (see also news:r2u5p3$n4p$1@dont-email.me).
Replies
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
Author:
Krishna Myneni
Change Log:
2020-08-18 v 0.0.1 first draft posted on comp.lang.forth
2020-08-21 v 0.0.2 revised to introduce essential word for defining double-precision IEEE floating point value.
2020-08-22 v 0.0.3 fix code formatting and add references
2020-08-23 v 0.0.4 fix reference
Problem:
The IEEE 754-2008 standard for floating point arithmetic [1] provides numerous advantages for those who write numerical floating point programs [2], including standardized floating point number formats which have been widely adopted for several decades, as well as a significantly simpler approach to dealing with exceptions in floating point arithmetic. Although significant parts of an optional IEEE floating point word set for Forth have been developed as RfD's since 2009, the proposal(s) [3] have languished now for 11 years without any progress towards including their features within a standard. This RfD takes the view that the lack of progress is primarily a result of two factors:
- the complexity of specifying even a partially complete solution for supporting the features of the IEEE 754-2008 standard, particularly the setup and enabling of traps for floating point arithmetic exceptions, and
- the relatively low use of floating point arithmetic, and specifically of programs which require more than simple floating point numerical calculations, within the Forth user community.
Even in language standards which have adopted many of the IEEE 754 arithmetic features, support is often incomplete. One such example is the C99 standard, which specifies extensions for features such as setting the rounding modes and masking floating point exceptions, but does not specify a way to enable and disable floating point exception traps.
Several Forth systems [4] have already extended their floating point capabilities to include IEEE 754 features such as special binary values representing signed infinity (+/-INF) and "not a number" (NAN) values, with possibly different names.
Solution:
Instead of a more or less comprehensive proposal, specifying words to provide most of the functionality within the IEEE 754 standard, we propose the formal inclusion within the standard of the "optional IEEE 754 binary floating-point word set", initially containing a minimal set of words to allow creating IEEE binary floating point values with bit-level precision. Further functionality provided by the IEEE 754-2008 standard may be added by subsequent proposals.
In addition to including the "optional IEEE 754 binary floating-point word set", this proposal adds the word MAKE-IEEE-DFLOAT, which permits the creation of any recognized double precision floating point value. It will allow definition of the special IEEE 754 floating point values which are returned by default upon certain arithmetic exceptions (+/-INF, NAN) and are useful for detecting an arithmetic exception. The IEEE 754 standard also provides other mechanisms for detecting and dealing with floating point arithmetic exceptions.
Subsequent proposals can incrementally add IEEE 754-2008 or IEEE 754-2019 functionality to the standard optional word set. For example, another proposal can add standardized named constants for special binary values returned upon arithmetic exceptions. Another proposal may formally update the specifications for existing floating point arithmetic words for consistency with the IEEE 754 standard, and yet another proposal may add words for exception detection by providing access to the exception flags of the floating point unit. Such changes may be introduced individually so that the problem of providing floating point arithmetic consistent with the IEEE 754 standard can be tackled in pieces rather than all at once. Given the substantial amount of work already done towards such an optional word set [3], the problem can be reduced to identifying groups of words which may be added separately to provide enhanced capabilities.
The adoption of an "optional IEEE 754 binary floating-point word set" into the Forth 20xx standard, initially with minimal provisions, will be immediately useful for practitioners of numerical floating-point computation in Forth. The proposed addition to the new word set is
MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )
which will return an IEEE 754 double precision floating point value from the specified bit fields for the sign, binary fraction, and exponent. It will also validate the binary fraction and exponent fields for consistency with the IEEE binary format and return an error value on the data stack: 0 for no error, and non-zero values to indicate the type of failure. The least significant bit of the signbit value represents the sign of the floating point value (0 is positive, 1 is negative), the lower 32 bits of each cell value of udfraction are concatenated to provide the binary fraction bits of the mantissa, and uexp provides the binary representation of the exponent.
Typical use:
HEX
0 54442D18 921FB 400 MAKE-IEEE-DFLOAT drop fconstant pi  \ uexp is the raw (biased) exponent field; drop discards the error code
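The bit layout can be cross-checked outside Forth. The following is a minimal Python sketch, not part of the proposal: it assumes that uexp is the raw 11-bit biased exponent field (as the range check against 800 hex in the reference implementation suggests) and that the fraction is passed as a single 52-bit integer instead of two cells. The function name make_ieee_dfloat and its return convention are illustrative only.

```python
import struct

def make_ieee_dfloat(signbit, fraction, exp):
    """Assemble an IEEE 754 binary64 value from raw bit fields.

    Returns (value, error): error is 0 on success, 1 if the exponent
    field does not fit in 11 bits, 2 if the fraction does not fit in
    52 bits (the same error numbering as the reference implementation).
    """
    if not 0 <= exp < 0x800:
        return 0.0, 1
    if not 0 <= fraction < (1 << 52):
        return 0.0, 2
    # Only the least significant bit of signbit is used.
    bits = ((signbit & 1) << 63) | (exp << 52) | fraction
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value, 0

# Fraction 921FB 54442D18 (hex) with exponent field 400 (hex) is pi.
pi, err = make_ieee_dfloat(0, 0x921FB54442D18, 0x400)
print(pi, err)
```

Under these assumptions the assembled bit pattern is 400921FB54442D18 hex, the double closest to pi.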
Proposal:
Adopt the Optional IEEE 754 binary floating point word set into the Forth 20xx standard.
The new word set will provide the word MAKE-IEEE-DFLOAT with the specifications given above.
Reference implementation:
The reference implementation is specific to a 32-bit, little-endian Forth system.
HEX
\ Make an IEEE 754 double precision floating point value from
\ the specified bits for the sign, binary fraction, and exponent.
\ Return the fp value and error code with the following meaning:
\ 0 no error
\ 1 exponent out of range
\ 2 fraction out of range
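fvariable temp
: MAKE-IEEE-DFLOAT ( signbit udfraction uexp -- r nerror )
    dup 800 u< invert IF 2drop 2drop F=ZERO 1 EXIT THEN  \ exponent needs more than 11 bits
    14 lshift 3 pick 1F lshift or >r    \ exponent to bits 20..30, sign to bit 31 of the high word
    dup 100000 u< invert IF             \ high fraction cell needs more than 20 bits
        r> 2drop 2drop F=ZERO 2 EXIT
    THEN
    r> or [ temp cell+ ] literal ! temp !   \ store high word, then low word (little-endian)
    drop temp df@ 0 ;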
Testing: (Optional)
References
[1] IEEE, 754-2008 - IEEE Standard for Floating-Point Arithmetic - Redline, https://ieeexplore.ieee.org/document/5976968 (2008).
[2] W. Kahan, Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF (1997).
[3] D. N. Williams, Proposal for an Optional IEEE 754 Binary Floating-Point Word Set, v 0.5.4, http://www.forth200x.org/ieee-fp.txt (2009); Also, see older and more recent versions of this draft proposal and another proposal for supporting IEEE 754 exceptions and exception handling at http://www-personal.umich.edu/~williams/archive/forth/ieeefp-drafts/.
[4] Based on discussions in comp.lang.forth during August 2020, the following systems appear to be able to output IEEE 754 values for signed INF: iForth, gforth, lxf, kForth-32. Only iForth appears to support an intrinsic definition of fp values for +/-INF.
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
Re: MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )
I fully support the need for such a word (and its reciprocal) but having covered several IEEE handling CPUs, the natural way to look at an IEEE value is
sign exp fraction
and this also applies to non-IEEE values as used for some embedded systems without FPUs. I also worry about the error return. What do NaN and Inf result in? If there is no error, then +/-NaN is surely the right thing to return, and the error code can disappear, leaving no problems with ambiguous conditions or error codes. The TC voted some years back that error codes should be unique so that they can be used as THROW codes.
The reciprocal operation is also useful:
FLOAT>PARTS ( F: r -- ; -- sign exp fraction )
2020 Forth Standards Meeting
1-3 Sept 2020, Online
We expect to be using BigBlueButton or Zoom or Webex or some such. If you want to rant or rave about online meeting tools, it's outside the scope of this document. For reasonable discussion, contact Stephen Pelc, stephen@mpeforth.com
Schedule
The standards meeting will be Tuesday-Thursday from 2:30 pm to 6:30 pm CEST with a short bio-break at 4:30. This solution fits with at least one committee member who doesn't do Mondays!
Participants
Review of Procedures
- Covid consequences
- Brexit consequences
- Payment for services/licences
Reports
- Chair
- Editor
- Technical
- Treasurer
Review of Proposals and Activities
- Recognisers
Stay as experimental proposal?
Separate POSTPONE action?
Impact of dot parser on POSTPONE?
Multi-tasking from APH
Ambiguous condition and IMMEDIATE
TC answer by Bernd, 2019-09-12 15:19:24
Move from RUV, any further action?
- CS-DROP from UH
say orig and dest must be same size
Go to vote?
- Case insensitivity
ASCII case insensitivity only.
Go to vote?
- Remove the “rules of FIND” (BP)
Locals word set?
Go to vote?
- Reference implementation of SYNONYM (AE, RUV)
Broken reference implementation.
New reference implementation.
VOCABULARY (UH)
Unfindable definitions (RUV)
Case sensitivity in [IF] and friends.
FIND
FIND-NAME
License (JK, RUV)
String, REPLACES (RUV)
Error if macro does not exist during compilation?
Why RECURSE is needed (BI) Pick a TC answer.
Input values other than true and false [IF]
Pick flag as z/nz, vote, TC response
- sample implementation that can also be interpreted (MAX)
Adopt RUV's response as TC answer.
Better wording for Colon (RUV)
NAME>INTERPRET wording (RUV)
The parts of execution semantics and the calling definition (RUV)
Recognizer RfD rephrase 2020 (UH)
Move to recogniser workshop
- "(" typo in a testcase (RUV)
Assign to editor
- Wording: declare undefined interpretation semantics for locals (RUV)
Remove ambiguous conditions
Word set of S>D word (RUV) Leave as is?
Same name token for different words (RUV)
Recognizer for locals (RUV)
There is error in testing SM/REM (MB)
Pass to editor
Defer Implementation (Tolich)
Recogniser (BP)
Move to recogniser workshop.
Does wording imply that if you SYNONYM a word with the same name (JN)
What happens when parse reaches the end of the parse area? (JN)
TEST instead of TEAT in F.1 para 2 (JN)
Pass to editor
Workshop Topics
Workshops are topics for discussion outside the formal meeting.
Future Document Format
Stack comments
stack comments should be parseable
Stack naming S: D: F: N: R:
stack effect notation
stack effect conventions
- Test suites
Philosophy
J Hayes sequencing
G Jackson suite
- Workshop reports
Consideration of proposals + CfV votes
Matters arising
Any other business
Date of next meeting
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
Changing the order of the inputs on the data stack for MAKE-IEEE-DFLOAT to sign uexp udfraction is fine if it proves to be more convenient. I have no objection to such a change.
With respect to FLOAT>PARTS, I anticipate the following words will be needed to fetch the individual binary fields of a floating point value:
FSIGNBIT (contained in DNW's IEEE 754 proposal v0.5.5, section 8.7) FEXPONENT FFRACTION
While MAKE-IEEE-DFLOAT assembles a specific type of IEEE binary float format (double precision), a generic word such as FLOAT>PARTS should work with the default binary format used by the system, i.e. the same format as the values stored on the floating point stack. The above words, FSIGNBIT FEXPONENT FFRACTION, should also apply to the default floating point format. They have not been included in this proposal, but they probably should be.
The considerations above raise the question of whether or not MAKE-IEEE-SFLOAT should also be included. Are there Forth systems with a floating point stack for single precision floats?
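To make the field extraction concrete, here is a small Python sketch of the decomposition for the binary64 format only. It is an illustration under the assumption that the default float format is IEEE double precision; the function name float_to_parts is hypothetical and merely mirrors what FSIGNBIT, FEXPONENT and FFRACTION would each return.

```python
import struct

def float_to_parts(value):
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                   # 1 bit
    exp = (bits >> 52) & 0x7FF          # 11-bit biased exponent field
    fraction = bits & ((1 << 52) - 1)   # 52-bit fraction field
    return sign, exp, fraction

print([hex(field) for field in float_to_parts(-1.5)])
# prints ['0x1', '0x3ff', '0x8000000000000']
```

A system whose default format is not binary64 would need different field widths, which is the reason for keeping the generic words separate from MAKE-IEEE-DFLOAT.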
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
Sorry, I accidentally changed the status to Retired. This proposal is not retired.
proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET
To answer the question of whether NaNs should return an error code from MAKE-IEEE-DFLOAT, my thinking is that all IEEE 754 special values should be considered valid. This allows the use of MAKE-IEEE-DFLOAT for defining named constants for the special values. If the TC wants the error codes to be reserved throw codes, that's fine.
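As an illustration of why the special values need to pass validation, the sketch below (Python, binary64 only) builds them from raw fields: an all-ones exponent field with a zero fraction is an infinity, and with a non-zero fraction it is a NaN. The helper name from_fields is an assumption made for this example and does no range checking.

```python
import math
import struct

def from_fields(sign, exp, fraction):
    """Assemble a binary64 value from raw fields (no validation)."""
    bits = (sign << 63) | (exp << 52) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

pos_inf = from_fields(0, 0x7FF, 0)          # exponent all ones, fraction zero
neg_inf = from_fields(1, 0x7FF, 0)
quiet_nan = from_fields(0, 0x7FF, 1 << 51)  # exponent all ones, fraction non-zero

print(pos_inf, neg_inf, math.isnan(quiet_nan))
```

A validator that only checks the field widths, as the reference implementation does, accepts all three.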
The existence of the sentence "The system need not maintain ..." only makes sense if "they" refers to the locals. The other dictionary entries can be found with the normal dictionary searching processes anyway, so the sentence would be redundant if "they" referred to the other dictionary entries.
Two arguments for requiring that FIND-NAME/FIND find locals:
FIND-NAME/FIND are "normal dictionary searching processes", so these words should find the locals according to that sentence.
The intent of the Forth-94 committee (and also my intent when proposing FIND-NAME) was that it is possible to write a user-defined text interpreter, as outlined in the rationale for COMPILE,. There is no other standard way to find locals, so FIND and FIND-NAME should include this functionality.
An argument against:
- We did not mention locals in the specifications of FIND and FIND-NAME. But given the intent, that was an oversight, certainly on my part.
Looking at what existing systems do, I wrote the small test
: x c" fluffystunk" ; : foo locals| fluffystunk | [ x find cr .s 2drop ] ;
In VFX and Gforth, FIND finds fluffystunk; in SwiftForth, iForth, and lxf, FIND does not.
Apparently this has not been an issue for 26 years, so apparently nobody has used a user-defined text interpreter to process a program using locals on SwiftForth, iForth, or lxf; nor has anybody written a program that relies on FIND not finding locals, and let it run on VFX or Gforth. Still, I think that the standard should nail this issue down.
would an implementation that treated name tokens and execution tokens as being the same thing cause any problems?
A Forth system can be implemented in such a way that nt and xt are the same thing without any problems with compliance to the standard. Some operations would just be a bit slower.
NAME>INTERPRET and NAME>COMPILE would still be in the wordlist but do nothing to the token.
In such a case, NAME>COMPILE for the FILE S" word cannot return the xt of EXECUTE at the top. It should return the xt of some special EXECUTE-COMPILING word.
: token-compile-for-s" ( -- x xt ) nt-of-s" ['] execute-compiling ;
Another variant is that NAME>COMPILE for S" returns another nt and the xt of EXECUTE.
: token-compile-for-s" ( -- x xt ) nt-of-s"-compiling ['] execute ;
If you don't implement SYNONYM
Also, I don't see any problem with implementing SYNONYM.
requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?
When loading from a file, my Forth's SOURCE returns the start address of the buffer the file was loaded into and the length of the entire file.
If you want to be standard compliant, just use another name for your flavor, and provide SOURCE with standard behavior. Ditto for other standard words. The Forth text interpreter is not obligated to use all these standard words; they can be provided just for programs.
You can even implement SOURCE in a lazily evaluated manner (with memoization): that is, it calculates the length on the first call after each pass of the line terminator.
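A rough model of this idea, written in Python rather than Forth and not taken from any actual system: the whole file stays in one buffer, the current line is tracked by its start offset, and its length is computed only when SOURCE is first asked for it. The class and method names are hypothetical.

```python
class LineSource:
    """Presents one line at a time from a whole-file buffer."""

    def __init__(self, text):
        self.text = text
        self.start = 0        # offset of the current line within the file
        self._length = None   # memoized length of the current line

    def source(self):
        """Return (start, length) of the current line, computed lazily."""
        if self._length is None:
            end = self.text.find("\n", self.start)
            if end == -1:
                end = len(self.text)
            self._length = end - self.start
        return self.start, self._length

    def refill(self):
        """Move to the next line; return False when the file is exhausted."""
        start, length = self.source()
        nxt = start + length + 1      # skip the line terminator
        if nxt >= len(self.text):
            return False
        self.start = nxt
        self._length = None           # forget the memoized length
        return True

src = LineSource(": sq dup * ;\n3 sq .\n")
lines = []
while True:
    start, length = src.source()
    lines.append(src.text[start:start + length])
    if not src.refill():
        break
print(lines)
```

Here refill only moves an offset, and the scan for the line terminator happens at most once per line.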
Your REFILL can just adjust your pointer in the entire file.
>IN is harder to virtualize (it would be better to have a setter and getter). When >IN is used, PARSE, PARSE-NAME and WORD should check for changes of the >IN value.
OTOH, the corresponding overhead will take place in some programs only, and not in the system itself.
Also, in my Forth, if PARSE ends on a line terminator, >IN will have the offset of the line terminator. In other standard Forths, that would technically be the length of the current input buffer and not be pointing to a valid character.
A standard program cannot read a character beyond the input buffer. So it doesn't matter whether it is a valid character or not.
requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?
I did some more reading and found that the interpreter section refers you to QUIT, and if you look up all the terms, a line is defined as a sequence of characters followed by a line terminator or implied line terminator. My forth doesn't implement the file extension word set so I didn't read that until today. I do something different.
I don't see why this is important to do... or if I do the work arounds above, why they are needed. The only thing I can think of is older Forths use preallocated memory regions to hold input. My Forth has dynamic memory allocation so I am not restricted by that. When I did my Forth I was thinking someone someday may be working on a terminal device and have something more complicated where they can put line terminators and other characters into the input stream. If you are using an operating system function to do ACCEPT like I am, I thought it would be a good idea to not assume that operating system would always follow the rules. I suppose you could pre-parse the 'line' returned from the operating system as if it were a file to pull out each line, but why?
Why is it so important that SOURCE refer to a line and >IN be the offset from the beginning of a line?
Who needs this behavior? Isn't enough that SOURCE and >IN refer to the current parsing position?
In reality this is a restriction that line terminators can't be in the input buffer.
In any case, I don't think my Forth will be following this part of the standard. It's just kind of a bummer. I thought I did a good job of reading the standard and following everything, but it turns out I missed something.
Two arguments for requiring that FIND-NAME/FIND find locals
As I can see, we have some arguments not for requiring, but for allowing.
I agree that "they" refers to identifiers of the locals. But it seems to me that "as long as they can be found" means that the corresponding statement is only applicable when locals "can be found by normal dictionary searching processes". It does not require that they be found by "normal dictionary searching processes". OTOH, what are "normal dictionary searching processes"? And do we have non-normal dictionary searching processes? This wording is too fuzzy to produce strong arguments.
Actually, there is no standard way to find locals, despite the intention. So I suggest introducing such a way via a recognizer.
Concerning FIND: I think that when the Search-Order word set is provided, we can allow FIND-NAME to find locals only via the search order (i.e., when they are implemented as some special word list that is added at the top of the search order). Otherwise, FIND-NAME cannot be implemented via TRAVERSE-WORDLIST. Also, some existing implementations would become nonstandard.
So, we should allow, but should not require, this approach.
I think we should provide a recognizer for locals that will work in any case.
Correction: Otherwise (i.e., if FIND-NAME is required to search locals and allowed to do it beyond the search order), the function of FIND-NAME cannot be implemented via GET-ORDER and TRAVERSE-WORDLIST.
Effect on Performance
I have measured this in <2002Nov22.175007@a0.complang.tuwien.ac.at>:
Gforth stores words in the original case, and uses a case-insensitive compare. I did some timings in Gforth on an Athlon:
searching for "execute" in a (case-sensitive) wordlist that contains only "execute": 2009 cycles
searching for "execute" in a (case-insensitive) wordlist that contains only "execute": 2042 cycles
searching for "execute" in forth-wordlist (case-insensitive): 2117 cycles.
It is an alternative proposal to the one from Anton.
Problem
The existing specification of FIND is unclear about how the returned xt is connected with the interpretation and compilation semantics of the corresponding word.
In some popular Forth systems n=1 does not mean that the word is immediate.
Solution
Use the new wording in the specification for FIND.
Keep the original immediacy notion, but use looser wording for the meaning of n in compilation state. The new wording allows the words with undefined execution semantics to be implemented as "dual-xt" words, and still allows (as before) implementing them as immediate STATE-dependent words. It also allows special definitions to compile the words with undefined interpretation semantics and defined execution semantics (like EXIT), and to return proper values for them from FIND.
Some differences from Anton's proposal
More accurate wording that is closer to the language of the standard.
Use "default interpretation semantics" criteria instead of referring to POSTPONE (item 3 in my comment).
Allow to implement words without interpretation semantics (e.g., IF) as immediate STATE-dependent words (as it was before).
Do specify what n means in all possible cases (news:qnko0l$jk2$1@dont-email.me).
Don't change 4.1.2, since FIND cannot and doesn't return an xt for a definition with non-default interpretation semantics. The new specification guarantees that a user-defined text interpreter can interpret any word that is found by FIND. Also, 4.1.2 should be updated independently.
Nowadays many Forth systems don't use FIND themselves but provide it for old-fashioned programs only. There is not much sense in restricting the implementation options of modern Forth systems for the sake of the outdated approach. I think modern Forth systems will tend to use the Recognizer/Resolver approach for special syntaxes and special words.
Proposal
Replace the text in the specification of FIND with the following.
FIND
( c-addr -- c-addr 0 | xt n )
Find the definition name whose name matches the counted string at c-addr. If the definition is not found, return c-addr and zero. Otherwise the definition is found, return xt and n.
If name has default interpretation semantics, xt is the execution token for name, and n is 1 if name is an immediate word, -1 otherwise. The returned values are the same regardless of whether the definition is found in interpretation state or in compilation state.
If name has other than default interpretation semantics, xt is the execution token for an unspecified implementation-dependent definition, and n is 1 or -1, and the following conditions are met.
When the definition is found in compilation state: if n is 1, performing xt in compilation state performs the compilation semantics for name, otherwise n is -1, and appending the execution semantics identified by xt to the current definition performs the compilation semantics for name.
When the definition is found in interpretation state: if n is 1, xt and n are the same when the definition is found in compilation state, otherwise n is -1, and both xt and n may be different when the definition is found in compilation state; performing xt in interpretation state performs the interpretation semantics for name.
The definition may be not found in interpretation state but found in compilation state, and vice versa. Also, a definition may not be found at all.
An ambiguous condition exists if xt is performed under conditions that do not meet the conditions specified above.
"Performing xt" means performing the execution semantics identified by the execution token xt.
A definition has default interpretation semantics if and only if the "Interpretation:" section is absent in the corresponding glossary entry, and the "Execution:" section is present (see also 3.4.3.1). Default interpretation semantics for a definition is to perform its execution semantics in interpretation state (see also 3.4.3.2).
If interpretation semantics are undefined for a definition, a Forth system is allowed to provide implementation-defined interpretation semantics for this definition (see A.3.4.3.2). In such case, when the definition is found in interpretation state, performing the returned xt in interpretation state performs the implementation-defined interpretation semantics for name.
A program is allowed to apply FIND to any string. A definition may not be found by FIND even if a Forth system provides interpretation or compilation semantics for the corresponding name (for example, in the case of locals).
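The rules for n in the proposed FIND specification amount to a small decision table for a user-defined text interpreter. The Python sketch below is only a model of that table, not Forth code; the function name find_action and the returned strings are made up for illustration, and it assumes FIND was called in the same state in which the result is used.

```python
def find_action(n, compiling):
    """Return what a text interpreter does with the xt returned by FIND.

    n is the second result of FIND (0, 1 or -1); compiling is True in
    compilation state and False in interpretation state.
    """
    if n == 0:
        return "not found"
    if compiling and n == -1:
        # Appending the execution semantics of xt performs the
        # compilation semantics for the name.
        return "append: COMPILE, the xt"
    # n == 1 in compilation state, or any found word in interpretation
    # state: performing xt gives the required semantics.
    return "perform: EXECUTE the xt"

for compiling in (False, True):
    for n in (1, -1):
        print(compiling, n, find_action(n, compiling))
```

This is the same dispatch a classic FIND-based interpreter uses; the proposal changes what the returned xt may identify, not the dispatch itself.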
There are a number of discussions going on elsewhere, and I want to make sure the discussion is going to be here.
There are too many questions to discuss all of them on a single page.
I opened several issues on the subject of review API v4. Please check.
Some notable ones:
- side effects are not acceptable (Issue #7)
- arguments against VALUE in API (Issue #5)
- accessors are not needed (Issue #6)
- new arguments concerning "unrecognized vs zero" (Issue #4)
- choosing better names (Issue #3)
I suggest the following road map:
- Terms definitions and data types
- A minimal essential part
- Choose among the different approaches (e.g., I suggest to discard both postponing and reproducing actions in the public API).
- Choose solutions for some problems
- Choose names
I strongly support this approach: to switch the recognizer that the Forth text interpreter uses, we pass the xt of another recognizer to the system.
But, as I said before, it's better to have a separate getter and setter instead of a single value that is changed via TO.
GET-REC-SEQUENCE and Co. can comprise a totally separate proposal. It is worth extracting them into a separate proposal to reduce the size of the basic proposal and the number of conflicts.
Two definitions can have the different execution semantics when the interpretation semantics for them are the same and the compilation semantics are the same [1].
So, we should mention the same execution semantics too:
The different words may have the same name token if their names are identical, the actual interpretation semantics for them are equivalent, the actual compilation semantics for them are equivalent, and if the execution semantics defined for one of them, the execution semantics for another one are equivalent to the first one.
Or, in a shorter variant:
The different words may have the same name token if they are synonyms having identical names.
The idea is that if two words have the same name (what about case-sensitivity?), and their behavior is indistinguishable, they may have the same nt.
OTOH, I don't insist that we should specifically allow the same nt for different words. But I think we should either explicitly allow or explicitly prohibit the same nt.
[1] Example:
: foo 0= ;
: bar ['] 0= state @ if compile, else execute then ; immediate
\ testcase that shows the different execution semantics
: [e] execute ; immediate
0 ' foo ] [e] [ . \ prints -1
0 ' bar ] [e] [ . \ prints 0 (or even throws an exception)
I remember that in Anton's view (last year at least), the interpretation semantics for them are different, and the compilation semantics for them are different too. But it seems that this view isn't even supported by the authors of the original reference implementation for SYNONYM.