12 The optional Floating-Point word set

12.1 Introduction

12.2.1 Definition of terms

float-aligned address:
The address of a memory location at which a floating-point number can be accessed.

double-float-aligned address:
The address of a memory location at which a 64-bit IEEE double-precision floating-point number can be accessed.

single-float-aligned address:
The address of a memory location at which a 32-bit IEEE single-precision floating-point number can be accessed.

IEEE floating-point number:
A single- or double-precision floating-point number as defined in ANSI/IEEE 754-1985.

12.2.2 Notation

12.2.2.2 Stack notation

Floating-point stack notation is:
( F: before -- after )

A unified stack notation is provided for systems with the environmental restriction that the floating-point numbers are kept on the data stack.
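
As a minimal sketch of how the notation reads in source, the two words below carry stack comments for the floating-point stack, and for both stacks where both are touched. The names `FSQUARE` and `FSCALE-BY` are hypothetical and chosen only for illustration; `F.` comes from the Floating-Point Extensions word set.

```forth
\ Hypothetical helper words; the names are illustrative, not part of the standard.

\ Square the top floating-point stack item; the data stack is untouched.
: FSQUARE ( F: r -- r*r )   FDUP F* ;

\ Scale a float by an integer taken from the data stack.
\ One stack comment per stack: data stack first, then the FP stack.
: FSCALE-BY ( n -- ) ( F: r1 -- r2 )   S>D D>F F* ;

3E FSQUARE        \ FP stack now holds 9
2 FSCALE-BY       \ FP stack now holds 18
F.                \ print and drop the result
```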

12.3.1 Data types

Append table 12.1 to table 3.1.

Table 12.1: Data Types

The set of float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a floating-point number to a float-aligned address shall produce a float-aligned address.

The set of double-float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a 64-bit IEEE double-precision floating-point number to a double-float-aligned address shall produce a double-float-aligned address.

The set of single-float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a 32-bit IEEE single-precision floating-point number to a single-float-aligned address shall produce a single-float-aligned address.

12.3.1.2 Floating-point numbers

The internal representation of a floating-point number, including the format and precision of the significand and the format and range of the exponent, is implementation defined.

Any rounding or truncation of floating-point numbers is implementation defined.

12.3.2 Floating-point operations

"Round to nearest" means round the result of a floating-point operation to the representable value nearest the result. If the two nearest representable values are equally near the result, the one having zero as its least significant bit shall be delivered.

"Round toward negative infinity" means round the result of a floating-point operation to the representable value nearest to and no greater than the result.

"Round toward zero" means round the result of a floating-point operation to the representable value nearest to zero, frequently referred to as "truncation".

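The three rules can be observed with the standard rounding words. This is an illustrative sketch only: the values in the comments are what the definitions above call for, `FTRUNC` and `F.` belong to the Floating-Point Extensions word set, and the printed format of `F.` varies between systems.

```forth
\ Illustrative only; commented values follow the rounding definitions above.
2.5E  FROUND F.    \ round to nearest, tie goes to the even value: 2
3.5E  FROUND F.    \ round to nearest, tie goes to the even value: 4
-2.5E FLOOR  F.    \ round toward negative infinity: -3
-2.7E FTRUNC F.    \ round toward zero (truncation): -2
```
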
12.3.3 Floating-point stack

A last in, first out list that shall be used by all floating-point operators.

The width of the floating-point stack is implementation-defined. The floating-point stack shall be separate from the data and return stacks.

The size of a floating-point stack shall be at least 6 items.

A program that depends on the floating-point stack being larger than six items has an environmental dependency.

12.3.4 Environmental queries

Append table 12.2 to table 3.4.

Table 12.2: Environmental Query Strings
| String | Value data type | Constant? | Meaning |
|---|---|---|---|
| `FLOATING-STACK` | n | yes | the maximum depth of the separate floating-point stack. On systems with the environmental restriction of keeping floating-point items on the data stack, n = 0. |
| `MAX-FLOAT` | r | yes | largest usable floating-point number |
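
A program might probe these two strings with `ENVIRONMENT?` roughly as follows. This is a sketch, not a required idiom: the word name `.FP-ENVIRONMENT` is hypothetical, and `FS.` is taken from the Floating-Point Extensions word set.

```forth
\ Report the two floating-point environmental queries, if the system knows them.
\ .FP-ENVIRONMENT is a hypothetical name used only for this sketch.
: .FP-ENVIRONMENT ( -- )
  S" FLOATING-STACK" ENVIRONMENT? IF        \ query known: n is on the data stack
    ?DUP IF    ." FP stack depth: " .
         ELSE  ." floats are kept on the data stack"  THEN
  ELSE  ." FLOATING-STACK: unknown"  THEN  CR
  S" MAX-FLOAT" ENVIRONMENT? IF             \ query known: r is on the FP stack
    ." largest float: " FS.
  ELSE  ." MAX-FLOAT: unknown"  THEN  CR ;

.FP-ENVIRONMENT
```

Because `ENVIRONMENT?` returns only a false flag for an unrecognised string, both branches leave the stacks balanced.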

Since the address returned by a CREATEd word is not necessarily aligned for any particular class of floating-point data, a program shall align the address (to be float aligned, single-float aligned, or double-float aligned) before accessing floating-point data at the address.
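
One way to follow this rule is to over-allocate slightly and align the address on every access. The sketch below assumes that the padding `FALIGNED` may add never exceeds the size of one float; the names `raw-samples` and `sample` are hypothetical.

```forth
\ Reserve room for three floats plus one extra float of padding, so that
\ rounding the start address up to a float-aligned address stays in bounds.
\ Assumption: alignment padding never exceeds the size of one float.
CREATE raw-samples  4 FLOATS ALLOT

\ Hypothetical accessor: float-aligned address of element i (0..2).
: sample ( i -- f-addr )   FLOATS  raw-samples FALIGNED  + ;

1.5E 0 sample F!
2.5E 1 sample F!
0 sample F@  1 sample F@  F+  F.    \ prints the sum of the two stored values
```

Each element address stays float-aligned because adding the size of a floating-point number to a float-aligned address yields another float-aligned address.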

12.3.6 Variables

A program may address memory in data space regions made available by FVARIABLE. These regions may be non-contiguous with regions subsequently allocated with , (comma) or ALLOT. See: 3.3.3.3 Variables.
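
A minimal usage sketch, with the hypothetical variable name `radius`:

```forth
FVARIABLE radius            \ hypothetical name; reserves one float-aligned float
2E radius F!                \ store 2 in the variable
radius F@ FDUP F*  F.       \ fetch, square and print the value: 4
```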

12.3.7 Text interpreter input number conversion

If the Floating-Point word set is present in the dictionary and the current base is DECIMAL, the input number-conversion algorithm shall be extended to recognize floating-point numbers in this form:

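Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]

<exponent> := E[<sign>]<digits0>

<sign> := { + | - }

<digits> := <digit><digits0>

<digits0> := <digit>*

<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }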

These are examples of valid representations of floating-point numbers in program source:

`1E      1.E      1.E0      +1.23E-1      -1.23E+1`
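
Typed at the text interpreter, the same forms behave as sketched below. The word `tenth` is a hypothetical name, `F.` is from the Floating-Point Extensions word set, and the comments give the values rather than the exact printed format, which differs between systems.

```forth
DECIMAL              \ float literals are only recognised when BASE is ten
1E       F.          \ value 1
1.E0     F.          \ value 1
+1.23E-1 F.          \ value 0.123
-1.23E+1 F.          \ value -12.3

\ Inside a definition the literal is compiled and pushed at run time.
: tenth ( F: -- r )   1E-1 ;
tenth F.             \ value 0.1

\ A string such as 1.23 (no E) does not match the grammar above, so it is
\ not converted as a floating-point number by this rule.
```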

12.4.1 System documentation

12.4.1.4 Environmental restrictions

• Keeping floating-point numbers on the data stack.

12.4.2 Program documentation

12.4.2.1 Environmental dependencies

• requiring the floating-point stack to be larger than six items (12.3.3 Floating-point stack);
• requiring floating-point numbers to be kept on the data stack, with n cells per floating point number.

12.5 Compliance and labeling

12.5.1 Forth-2012 systems

The phrase "Providing the Floating-Point word set" shall be appended to the label of any Standard System that provides all of the Floating-Point word set.

The phrase "Providing name(s) from the Floating-Point Extensions word set" shall be appended to the label of any Standard System that provides portions of the Floating-Point Extensions word set.

The phrase "Providing the Floating-Point Extensions word set" shall be appended to the label of any Standard System that provides all of the Floating-Point and Floating-Point Extensions word sets.

12.5.2 Forth-2012 programs

The phrase "Requiring the Floating-Point word set" shall be appended to the label of Standard Programs that require the system to provide the Floating-Point word set.

The phrase "Requiring name(s) from the Floating-Point Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the Floating-Point Extensions word set.

The phrase "Requiring the Floating-Point Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the Floating-Point and Floating-Point Extensions word sets.

JennyBrien | Mistake in the specification of significand? | Example | 2016-07-03 16:20:50

Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]

<exponent> := E[<sign>]<digits0>

<sign> := { + | - }

<digits> := <digit><digits0>

<digits0> := <digit>*

<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

Surely it should be [<sign>]<digit>[.<digits0>] ? Only one digit allowed before the decimal point?

AntonErtl 2016-12-09 11:19:56

We also want to guarantee to convert 123e, 12.3e, 123e3 etc., so the specification is not a mistake. What I wonder about is if we should not also allow ".123e".

zhtoor | Implementations requiring BOTH 32 bit single floats and 64 bit double floats. | Proposal | 2016-12-21 14:39:40

Here the problem arises of which kind of floats F F/ etc. refer to: single (32-bit) or double (64-bit). Would it not be better to use separate naming conventions for the two precisions, such as G G/ for double operations, C C/ for single complex numbers and Z Z/ for double complex numbers? Please advise.

BerndPaysan 2016-12-22 01:09:07

The floats on the floating point stack are all one size, usually double. Only memory operations like `SF@` and `SF!` convert to and from single.

zhtoor 2016-12-22 09:44:54

My question concerned a Forth implementation that provides BOTH 32-bit single floats AND 64-bit double floats. How do you propose to name their respective words? For example, sf+ for single-float +, df+ for double-float +, and f+ pointing to either one by means of SYNONYM etc.

AntonErtl 2016-12-22 18:42:35

Yes, the convention you propose is what comes first to my mind. The only question is how you would name the corresponding @ and ! words.

I would recommend against having different FP sizes in a Forth system. It complicates matters and buys little to nothing on current hardware; the main benefit of smaller FP numbers is in memory and memory bandwidth requirements, and we have SF@ SF! for that already.

zhtoor 2016-12-23 00:47:30

You already have df@, df!, sf@, sf!, f@ and f!, so there are no problems there. Let the f@ f! f+ ... family be the "default" float behavior, the sf@ sf! sf+ ... family be the single-float behavior, and the df@ df! df+ ... family be the double-float behavior. Moreover, sfstack?, dfstack? and fstack? would indicate which of these have a separate stack (keeping future 64-bit implementations in mind, in case doubles could be kept on the main stack). The rationale for having both implementations is analogous to single-cell vs. double-cell numbers, together with some practical requirements of implementing on modern cores such as the ARM Cortex-A9 with VFPv3, which have both options available in some cases. Some vector/DSP operations are also implemented as single floats (in graphics processors, for example). Since I am actually facing this problem in my implementation, I thought the standards community might help. Regards, and thanks for your responses.

AntonErtl 2016-12-31 11:41:05

We already have SF@ SF! DF@ DF!, and they put standard floats on the FP stack, not single or double floats. So you would need other words to put single and double floats on the FP stack.

Also, we have standardized a separate FP stack, because writing portable programs for a shared FP stack is not practical. For single and double floats it's probably best to use the same FP stack, with one FP stack item per single or double, like FP registers are organized on most architectures; of course, separate stacks for these types are also possible, but would require additional stack pointers and stack manipulation words.

None of these variants looks particularly attractive, which is probably why there is very little, if any, practice in this direction.

AntonErtl 2016-12-31 11:57:29

Reasons for hardware to have 32-bit FP numbers (and smaller):

1) Existing software uses it; and existing programming languages have types for it; I.e., continue a tradition that had technical reasons once upon a time; and 32-bit FP is cheap to implement once you have implemented 64-bit FP. That's no reason to have it in Forth.

2) For vector operations you can deal with more values in the same amount of hardware. I don't see this as a reason to have it in Forth for scalar FP operations, because I don't think we should go for auto-vectorization in Forth. If we add vector operations to Forth, we can think about vectors of 32-bit FP numbers, but that's still no reason to have several scalar on-stack FP types.

gnuarm 2017-04-15 19:03:39

There are a number of smaller CPUs that implement 32 bit IEEE floating point in hardware and not 64 bit. They work exclusively in 32 bit without the extended precision intermediate values used in x86 and similar CPUs. Seems 32 bit floating point is adequately addressed by the present standard to support these smaller devices. Or am I missing something?

AntonErtl 2017-04-16 07:54:38

Sure, if you have a 32-bit-FP-only FPU, you (as systems implementor) will probably go for 32-bit floats. The question at hand was if we should have multiple FP types for FPUs that support 64-bit and 32-bit floats.

gnuarm 2017-04-18 02:35:54

Which hardware systems do you know of that support higher-precision floats but can do the math in 32-bit floats, rather than using higher precision for calculations and only saving the results in 32 bits? The latter, I believe, is already supported by the existing Forth standard.

I certainly am not familiar with all systems, but the ones I am familiar with that support higher precision FP calculations will perform those higher precision FP calculations just as fast as 32 bit calculations.

I'm not clear what systems we are trying to not exclude? Graphics chips perhaps?

StephenPelc 2017-04-19 16:26:29

On current hardware FP that has both 64 and 32 bit operations, I can see no reason to implement the FP stack with 32 bit operations. The only area in which the 64/32 bit distinctions become significant is in memory access, for which we already have the SFx and DFx operations. Provision of 32 bit only FP for embedded hardware is likely to be a transient of a few years until silicon improves. We are already seeing this in the transition from Cortex-M4F (universally 32 bit until now) to Cortex-M7 (mostly 64 bit and 32 bit).

I'm tired of writing ARM and Cortex FP packages.

kc5tjaF>R and FR> to support dynamically-scoped floating point variablesProposal2019-03-03 06:20:52

In writing an implementation of map and reduce operations for some floating point vectors, I've had a need to save and restore dynamic variables on the R-stack. Some of these variables are floating point variables.

The lack of a floating-point stack equivalent to R> and >R made this more difficult than it should have been, I think. Therefore, I'd like to propose the following two words for consideration in the FLOATING-EXT wordset:

| Word | Run-time Semantics |
|---|---|
| `F>R` | Pushes the top of the floating point stack onto the return stack. |
| `FR>` | Pops the return stack, pushing the value removed onto the floating point stack. |

On systems where the return stack cell size differs from the floating point stack cell size, multiple cells may need to be pushed onto the R-stack, padding as appropriate. Because of alignment issues, `FR>` and `F>R` are not guaranteed to be fast, as the implementation may have to store the floating point value in smaller parcels (e.g., storing a 64-bit or 80-bit FP value as a series of 16-bit cells on a 16-bit Forth).

Here's my current implementation written in 64-bit GForth 0.7.0 on x86-64 platform:

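``````
\ Assumes one float occupies exactly one cell (true for 64-bit Gforth on x86-64).
FVARIABLE realvar   \ scratch float used to move the value between the stacks
\ Each word sets aside its own return address with R> and restores it with >R.
: F>R ( F: r -- ) ( R: -- r )  R> realvar F!  realvar @ >R >R ;
: FR> ( F: -- r ) ( R: r -- )  R> R> realvar !  realvar F@ >R ;
``````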

I'd love to hear your thoughts. Thanks for entertaining my idea.

MarcelHendrix 2019-04-13 07:49:18

Why didn't you use an FLOCAL ?

MarcelHendrix 2019-04-14 06:12:35

Why not use FLOCALs ?

BerndPaysan 2019-04-14 06:33:14

Probably, because there is no FLOCAL in the standard, either.