Digest #110 2020-08-22

Contributions

[147] 2020-08-21 18:41:19 GerryJackson wrote:

requestClarification - Annex A section and paragraph numbering has gone wrong

Using Firefox, the section and paragraph numbering has gone berserk, e.g. after Section A.6 it goes to A.9, then A.11, which after A.11.6 goes to A.8.6.1.0360, then A.13, A.9.6.1.2275, then A.15, etc. (too many errors to list completely).

[148] 2020-08-21 21:03:51 KrishnaMyneni wrote:

proposal - OPTIONAL IEEE 754 BINARY FLOATING-POINT WORD SET

Author:

Krishna Myneni

Change Log:

2020-08-18 v 0.0.1 first draft posted on comp.lang.forth
2020-08-21 v 0.0.2 revised to introduce essential word for defining double-precision IEEE floating point value.

Problem:

The IEEE 754-2008 standard for floating point arithmetic [1] provides numerous advantages for those who write numerical floating point programs, including standardized floating point number formats which have been widely adopted for several decades, as well as a significantly simpler approach to dealing with exceptions in floating point arithmetic. Although significant parts of an optional IEEE floating point word set for Forth have been developed as RfD's since 2009, the proposal(s) [2] have languished now for 11 years without any progress towards including their features within a standard. This RfD takes the view that the lack of progress is primarily a result of two factors:

the complexity of the problem in specifying even a partially complete solution for support of the features of the IEEE 754-2008 standard, particularly with the setup and enabling of traps for floating point arithmetic exceptions, and
the relatively low use of floating point arithmetic, and specifically of programs which require more than simple floating point numerical calculations, within the Forth user community.

It should be noted that even in language standards which have adopted many of the IEEE 754 arithmetic features, support is often incomplete. One such example is the C99 standard, which specifies extensions for features such as setting the rounding modes and masking floating point exceptions, but does not specify a way to enable and disable floating point exception traps.

Several Forth systems [3] have already extended their floating point capabilities to include IEEE 754 features such as special binary values representing signed infinity (+/-INF) and "not a number" (NAN) values, with possibly different names.

Solution:

Instead of a more or less comprehensive proposal, specifying words to provide most of the functionality within the IEEE 754 standard, we propose the formal inclusion within the standard of the "optional IEEE 754 binary floating-point word set", initially containing a minimal set of words to allow creating IEEE binary floating point values with bit-level precision. Further functionality provided by the IEEE 754-2008 standard may be added by subsequent proposals.

In this proposal, in addition to the inclusion of the "optional IEEE 754 binary floating-point word set", we also propose adding the word MAKE-IEEE-DFLOAT, which permits the creation of any recognized double precision floating point value. It will allow definition of the special IEEE 754 floating point values which are returned by default upon certain arithmetic exceptions (+/-INF, +/-NAN), and which are useful for detecting an arithmetic exception. The IEEE 754 standard also provides other mechanisms for detecting and dealing with floating point arithmetic exceptions.

Subsequent proposals can incrementally add IEEE 754-2008 or IEEE 754-2019 functionality to the standard optional word set. For example, another proposal can add standardized named constants for special binary values returned upon arithmetic exceptions. Another proposal may formally update the specifications for existing floating point arithmetic words for consistency with the IEEE 754 standard, and yet another proposal may add words for exception detection by providing access to the exception flags of the floating point unit. Such changes may be introduced individually so that the problem of providing floating point arithmetic consistent with the IEEE 754 standard can be tackled in pieces rather than all at once. Given the substantial amount of work already done towards such an optional word set [2], the problem can be reduced to identifying groups of words which may be added separately to provide enhanced capabilities.

The adoption of an "optional IEEE 754 binary floating-point word set" into the Forth 20xx standard, initially with minimal provisions, will be immediately useful for practitioners of numerical floating-point computation in Forth. The proposed addition to the new word set is

MAKE-IEEE-DFLOAT ( F: -- r ) ( signbit udfraction uexp -- error )

which will return an IEEE 754 double precision floating point value from the specified bit fields for the sign, binary fraction, and exponent. It will also validate the binary fraction and exponent fields for consistency with the IEEE binary format and return an error value on the data stack: 0 for no error, and non-zero values to indicate the type of failure. The least significant bit of the "signbit" value represents the sign of the floating point value (0 is positive, 1 is negative), the lower 32 bits of each cell value of udfraction are concatenated to provide the binary fraction bits of the mantissa, and uexp provides the binary representation of the exponent.

Typical use:

HEX 0 54442D18 921FB 400 MAKE-IEEE-DFLOAT DROP fconstant pi
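As a rough cross-check of how the three arguments map onto the binary64 layout, the sketch below packs the sign bit, the upper 20 fraction bits, and the 11-bit exponent field into the high 32 bits of a double, using only data-stack operations. DFLOAT-HI is a hypothetical helper and not part of the proposal. It assumes that uexp is the raw (biased) exponent field, as the range check in the reference implementation suggests; under that assumption pi, which is about 1.57 x 2^1, carries the exponent field 400 hex.

```forth
HEX
\ Hypothetical helper: build the high 32 bits of an IEEE 754 binary64
\ value from its sign bit, upper 20 fraction bits, and 11-bit biased
\ exponent field. No range checking is done here.
: DFLOAT-HI ( signbit ufrac-hi uexp -- u )
    14 LSHIFT OR            \ exponent field occupies bits 20..30
    SWAP 1 AND 1F LSHIFT    \ only the least significant sign bit is used
    OR ;

0 921FB 400 DFLOAT-HI U.    \ pi:        high word 400921FB
1 0 7FF DFLOAT-HI U.        \ -infinity: high word FFF00000
DECIMAL
```

The low 32 bits of the value are simply the first cell of udfraction (54442D18 for pi), so no packing is needed for them.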

Proposal:

Adopt the Optional IEEE 754 binary floating point word set into the Forth 20xx standard.
The new word set will provide the word MAKE-IEEE-DFLOAT with the specifications given above.

Reference implementation:

The reference implementation is specific to a 32-bit, little-endian Forth system.

HEX
\ Make an IEEE 754 double precision floating point value from
\ the specified bits for the sign, binary fraction, and exponent.
\ Return the fp value and error code with the following meaning:
\   0  no error
\   1  exponent out of range
\   2  fraction out of range
fvariable temp

: MAKE-IEEE-DFLOAT ( signbit udfraction uexp -- r nerror )
    dup 800 u< invert IF 2drop 2drop F=ZERO 1 EXIT THEN
    14 lshift 3 pick 1F lshift or >r
    dup 100000 u< invert IF r> 2drop 2drop F=ZERO 2 EXIT THEN
    r> or [ temp cell+ ] literal ! temp !
    drop temp df@ 0 ;
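Since the reference implementation is tied to 32-bit cells, a variant for other systems may be of interest. What follows is only a sketch under stated assumptions, not part of the proposal: it assumes 64-bit cells, a separate floating point stack, the Floating-Point Extensions words DF@, DFALIGN and DFLOATS, and that integers and floats share the same byte order in memory. The names MAKE-IEEE-DFLOAT64 and DTEMP are hypothetical. The error codes follow the reference implementation.

```forth
HEX
DFALIGN HERE 1 DFLOATS ALLOT CONSTANT DTEMP  \ scratch space for one binary64

\ Hypothetical 64-bit-cell variant; returns 0.0 together with a
\ non-zero error code when a field is out of range.
: MAKE-IEEE-DFLOAT64 ( signbit ulo uhi uexp -- nerror ) ( F: -- r )
    DUP 800 U< INVERT IF 2DROP 2DROP 0 S>D D>F 1 EXIT THEN
    OVER 100000 U< INVERT IF 2DROP 2DROP 0 S>D D>F 2 EXIT THEN
    34 LSHIFT                 \ exponent field to bits 52..62
    SWAP 20 LSHIFT OR         \ upper 20 fraction bits to bits 32..51
    SWAP FFFFFFFF AND OR      \ lower 32 fraction bits
    SWAP 1 AND 3F LSHIFT OR   \ sign to bit 63
    DTEMP ! DTEMP DF@ 0 ;

0 54442D18 921FB 400 MAKE-IEEE-DFLOAT64 DROP FCONSTANT PI64
DECIMAL
```

Building the whole bit pattern in one cell and storing it with a single ! avoids the explicit little-endian assumption of the two-store approach, at the cost of requiring 64-bit cells.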

Testing: (Optional)

Replies

[r399] 2020-08-12 08:33:49 AntonErtl replies:

proposal - 2020 Forth Standards meeting agenda

September 1-3 is Tuesday-Thursday; elsewhere it says Monday to Thursday. Please make the dates and the days agree.

[r400] 2020-08-12 09:27:46 AntonErtl replies:

requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?

Yes, SOURCE-ID is optional. Another issue is that REFILL during EVALUATE returns false, while REFILL during INCLUDED returns true and changes the input buffer to the next line if there is one.
Words that parse must not parse beyond the end of line in a standard system (unless they also refill). And REFILL has to work as described in the standard. As for avoiding copying, I think it is possible to read (or mmap) the whole file into a buffer, and then treat each line in that file as input buffer. REFILL then changes the address that SOURCE returns. No copying of buffer contents happening.
Looking at some individual test files, they mention distribution terms at the start of the file (typically public domain). If you want a LICENSE file, maybe you could contribute one.
If no existing programs are broken because your implementations of standard words satisfy the requirements of the standard, then there is no need to change the standard. If no existing programs are broken, because no existing programs exercise the areas where your implementations of the standard words deviate from the requirements, then you could make a proposal for changing the standard, and (to get it accepted) would have to convince people that there is really no program around that exercises these areas.
I meant

Otherwise, the string continues up to and including the last character in the parse area, and the number in >IN is changed to the length of the input buffer, thus emptying the parse area.
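One way to see whether a given system behaves as that wording describes is a small check like the one below. This is only a sketch: PARSE-EMPTIES? is a hypothetical word, and the test assumes that the character | does not occur in the rest of the line being interpreted.

```forth
\ Hypothetical check: parse with a delimiter that is absent from the
\ rest of the line, then compare >IN with the input buffer length.
: PARSE-EMPTIES? ( "ccc" -- flag )
    [CHAR] | PARSE 2DROP    \ delimiter not found: string runs to the end
    >IN @ SOURCE NIP = ;    \ true if the parse area is now empty

PARSE-EMPTIES? no vertical bar on this line
.                           \ a standard system is expected to print -1
```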

Concerning the parse area being a line when including a file, look at INCLUDED.

The sentence about counted strings probably refers to restrictions that systems have that primarily use WORD to parse (in most of the words that parse). PARSE and PARSE-NAME probably should "note otherwise". Of course, it is good practice for systems to use PARSE and PARSE-NAME instead of WORD, but systems that implement, say .", by calling WORD can still be standard.

[r401] 2020-08-12 12:27:26 JamesNorris replies:

requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?

"6) Looking at some individual test files, they mention distribution terms at the start of the file (typically public domain). If you want a LICENSE file, maybe you could contribute one."

This is from GitHub:

"Public repositories on GitHub are often used to share open source software. For your repository to truly be open source, you'll need to license it so that others are free to use, change, and distribute the software."

In other words, it is illegal to copy and use software unless the author specifically gives permission. The standard way to do that is with a license file, and it has to come from the author. If I wrote one and posted it, that would be identity theft and forgery, which is a felony. I know some people don't care about this, but some day down the line it could become a problem for you if you don't take the time to make sure you really have the author's permission to copy and use their work.

"4) Words that parse must not parse beyond the end of line in a standard system (unless they also refill). And REFILL has to work as described in the standard. As for avoiding copying, I think it is possible to read (or mmap) the whole file into a buffer, and then treat each line in that file as input buffer. REFILL then changes the address that SOURCE returns. No copying of buffer contents happening."

I'm still not understanding what difference it makes if you pass the entire file to PARSE and treat line delimiters as white space or if you pass lines and do REFILLS when the strings returned are exactly the same either way. I honestly can't think of a test case that would be able to test the difference that matters in practical use. Again, the standard does not say you have to do it this way, and why are you dictating implementation instead of end behavior? The stuff in section 3.4.1 is a limitation on what parsing returns, not on how much stuff you pass to PARSE.

[r402] 2020-08-12 12:30:50 JamesNorris replies:

requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?

I mean treat line terminators as additional end delimiters, not white space. (I make mistakes :-)

[r403] 2020-08-13 13:22:05 AntonErtl replies:

requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?

SOURCE will produce different results. >IN @ and >IN ! will produce different results. REFILL will produce different results.

Of course, if your implementation makes sure that they do not (maybe I misunderstood your description of it), it may be standard-compliant.
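A short sketch of program code that is sensitive to this difference: a user-defined line comment that skips input by setting >IN to the length that SOURCE reports. SKIP-LINE is a hypothetical name. Where the input buffer is a single line, it discards the rest of that line; on a system whose SOURCE describes the whole file, the same definition would discard everything up to the end of the file.

```forth
\ Hypothetical user-defined line comment: discard the rest of the
\ current input buffer by moving >IN to its end.
: SKIP-LINE ( -- )  SOURCE NIP >IN ! ; IMMEDIATE

1 SKIP-LINE 2 3
.   \ prints 1 where the input buffer is a single line
```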

Concerning "prescribing the implementation", that's not what is happening: An implementation that reads individual lines into memory on REFILL can be standard; an implementation that mmaps everything and then copies each line can be standard; an implementation that reads everything into a buffer and then lets SOURCE point into that buffer, in a different place after every REFILL can be standard. The implementation is not prescribed.

I think you are mistaken that you would perform "identity theft and forgery" by adding a LICENSE file, even if it was not an accurate summary of the licenses of the individual source files.

But in any case, why demand of Gerry Jackson what you think you are not allowed to do yourself? He is not the author of all files in his collection, so by your reasoning he is not allowed to write a LICENSE file, either.

[r404] 2020-08-13 18:38:31 JamesNorris replies:

requestClarification - What happens when parse reaches the end of the parse area and the parse delimiter was not found?

"4) SOURCE will produce different results. >IN @ and >IN ! will produce different results. REFILL will produce different results."

My source does not support REFILL since it is based on the Forth 94 draft standard and there it's an optional word. If I did add REFILL it would always return false since someone is only supposed to call REFILL when the >IN value is equal to the length returned from SOURCE and in my Forth that means >IN is at the end of the file.

When loading from a file, my Forth's SOURCE returns the start address of the buffer the file was loaded into and the length of the entire file. If you use PARSE or PARSE-NAME in my Forth using the values returned from SOURCE, you will get an address pointing to the start of the named string and the same length as in any Forth following the standard. >IN will have the offset of the correct next character following any PARSE or PARSE-NAME call (that is, once I upload the next version with the fixes to make PARSE single line only). The only difference is that my Forth handles the case where there are line terminators in the strings passed to EVALUATE, PARSE, and PARSE-NAME. In the standard, I'm guessing this is an ambiguous condition?

Also, in my Forth, if PARSE ends on a line terminator, >IN will have the offset of the line terminator. In other standard Forths, that would technically be the length of the current input buffer and not be pointing to a valid character.

One of the ways above mentioned loading the entire file into a buffer, like mine, and having SOURCE return the starting addresses of each line and >IN being the offset in the current line. REFILL in this situation would then technically consume the line terminator? That's looking at the buffer twice, mine only looks once but looks harder.

Hmm yes technically that would be a difference, >IN is the offset in the current line in the standard, and SOURCE the start address of each line. Is there anything in the standard that says this? And is there anything that depends on this behavior? PARSE, PARSE-NAME, and EVALUATE do not depend on this behavior in my Forth. Line comment in my forth also works correctly. If someone were to write their own line comment and wanted to compare the >IN offset with the length returned from SOURCE then yes it would be a problem. If they did 0 PARSE to skip to the end of the line, it would work fine.

My suggestion for the standard is that it not specify that SOURCE be the start of each line and that >IN be the offset in the line. The only change I'm suggesting is that the definition of PARSE above say it goes to the end of the line if a delimiter is not found before then. That wasn't clear to me when I read the standard. If someone has a reason for why having SOURCE and >IN work this way is important then I'll probably change my mind, but really, I kind of like how my implementation only needs to look at stuff once, and how REFILL is not necessary.

On the copyright issue, I did not write the files. I can't tell the public that the author has given permission to copy and use the files without actually asking the author. Legally I would need something from each author of each file in writing to do something like that (an email from the author would work too). That's how US copyright law works. I know most people don't care about this and I've had problems on some of my jobs because they wanted me to copy or use copyrighted stuff and I said I'd need to contact the authors, or said no when it was a commercial application they wanted me to pirate. I've taken the time to contact authors in some of these situations and they usually say yes and are happy you want to use their stuff, especially if it's for a non profit use. I'm not interested in going through all that to use these test suites though.

[r407] 2020-08-15 09:54:18 GeraldWodni replies:

proposal - find-name

Accepted 9/0/1 (Yes/No/Abstain) in 2018

[r408] 2020-08-15 17:10:48 GeraldWodni replies:

example - An example

Thank you for your example, it is deemed valid and therefore closed.

[r409] 2020-08-16 16:20:59 JamesNorris replies:

proposal - find-name

Diaperglu has words similar to FIND-NAME and FIND-NAME-IN except they return an execution token. This execution token is the index of a name value pair, where the name is the word's name, and the value is the header containing the PFA and CFA for the word. I tried to digest everything that was said above, my question is, would an implementation that treated name tokens and execution tokens as being the same thing cause any problems? Basically NAME>INTERPRET and NAME>COMPILE would still be in the wordlist but do nothing to the token.
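For reference, a program typically goes from a name to execution through the name token as sketched below, whatever representation a system chooses for that token. The sketch assumes FIND-NAME ( c-addr u -- nt | 0 ) as described in the proposal and NAME>INTERPRET ( nt -- xt | 0 ) from Forth-2012; INTERPRET-NAME is a hypothetical word.

```forth
\ Hypothetical helper: look a name up and perform its interpretation
\ semantics, going through the name token.
: INTERPRET-NAME ( i*x c-addr u -- j*x )
    FIND-NAME DUP 0= ABORT" name not found"
    NAME>INTERPRET DUP 0= ABORT" no interpretation semantics"
    EXECUTE ;

2 3 S" +" INTERPRET-NAME .   \ prints 5
```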

Also I have to agree with the comment that NAME>INTERPRET and NAME>COMPILE are misnamed if NAME is going to be shorthand for a name token for these definitions, and mean something else in others. If the term NAME is going to be shorthand for name token across the board then I think it's a good idea. Up to now I was thinking the term 'name' was the replacement for 'word' which meant a series of non white-space delimiter characters. If name is going to mean name token, what is the replacement term for a series of non white-space delimiter characters? (Please do not pick NW-SDC :-)

[r410] 2020-08-19 07:10:46 PeterKnaggs replies:

referenceImplementation - ASCII version of BL

BL has been in Forth since 1976; the rationale in the '79 document is:

Leave the ASCII character value for blank (decimal 32).

It was originally included as you cannot use CHAR or [CHAR] to obtain the value. We have kept it in the standard for backward compatibility.

If you must have a reference implementation, I would use:

DECIMAL 32 CONSTANT BL

Particularly given the prefix notations are optional.

[r411] 2020-08-21 17:08:53 AntonErtl replies:

referenceImplementation - ASCII version of BL

AFAICS, number prefixes are obligatory in Forth-2012, so either reference implementation is ok.