12 The optional Floating-Point word set

12.1 Introduction

12.2 Additional terms and notation

12.2.1 Definition of terms

float-aligned address:
The address of a memory location at which a floating-point number can be accessed.

double-float-aligned address:
The address of a memory location at which a 64-bit IEEE double-precision floating-point number can be accessed.

single-float-aligned address:
The address of a memory location at which a 32-bit IEEE single-precision floating-point number can be accessed.

IEEE floating-point number:
A single- or double-precision floating-point number as defined in ANSI/IEEE 754-1985.

12.2.2 Notation

12.2.2.2 Stack notation

Floating-point stack notation is:
( F: before -- after )

A unified stack notation is provided for systems with the environmental restriction that the floating-point numbers are kept on the data stack.

12.3 Additional usage requirements

12.3.1 Data types

Append table 12.1 to table 3.1.

Table 12.1: Data Types

Symbol    Data type                      Size on stack

df-addr   double-float-aligned address   1 cell
f-addr    float-aligned address          1 cell
r         floating-point number          implementation-defined
sf-addr   single-float-aligned address   1 cell

12.3.1.1 Addresses

The set of float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a floating-point number to a float-aligned address shall produce a float-aligned address.

The set of double-float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a 64-bit IEEE double-precision floating-point number to a double-float-aligned address shall produce a double-float-aligned address.

The set of single-float-aligned addresses is an implementation-defined subset of the set of aligned addresses. Adding the size of a 32-bit IEEE single-precision floating-point number to a single-float-aligned address shall produce a single-float-aligned address.
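The closure rule above (aligned address plus item size is again aligned) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that aligned addresses are the multiples of a fixed alignment and that the item size is a multiple of that alignment; the standard leaves the actual set implementation-defined. The names align_up, ALIGNMENT and FLOAT_SIZE are hypothetical and do not come from the standard.

```python
# Hypothetical values: an 8-byte float stored on 8-byte boundaries.
ALIGNMENT = 8
FLOAT_SIZE = 8


def align_up(addr, alignment=ALIGNMENT):
    """Return the smallest multiple of `alignment` that is >= addr."""
    return (addr + alignment - 1) // alignment * alignment


def is_aligned(addr, alignment=ALIGNMENT):
    """True if addr is a multiple of `alignment`."""
    return addr % alignment == 0


# Align an arbitrary address, then step through three consecutive floats.
base = align_up(13)
addresses = [base + i * FLOAT_SIZE for i in range(3)]
print(addresses, all(is_aligned(a) for a in addresses))
```

Under these assumptions every address reached by repeatedly adding the item size stays aligned, which is the property the three paragraphs above require.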

12.3.1.2 Floating-point numbers

The internal representation of a floating-point number, including the format and precision of the significand and the format and range of the exponent, is implementation defined.

Any rounding or truncation of floating-point numbers is implementation defined.

12.3.2 Floating-point operations

"Round to nearest" means round the result of a floating-point operation to the representable value nearest the result. If the two nearest representable values are equally near the result, the one having zero as its least significant bit shall be delivered.

"Round toward negative infinity" means round the result of a floating-point operation to the representable value nearest to and no greater than the result.

"Round toward zero" means round the result of a floating-point operation to the representable value nearest to zero, frequently referred to as "truncation".
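As an illustration only, the three rounding rules can be sketched in Python by letting the integers stand in for the "representable values" (an even integer is one whose least significant bit is zero). The function names are hypothetical; the sketch relies on Python's built-in round(), which resolves ties toward the even integer, and on math.floor and math.trunc.

```python
import math


def round_to_nearest(x):
    """Nearest integer; ties go to the even integer (least significant bit zero)."""
    return round(x)


def round_toward_negative_infinity(x):
    """Nearest integer that is no greater than x."""
    return math.floor(x)


def round_toward_zero(x):
    """Discard the fractional part (truncation)."""
    return math.trunc(x)


for value in (2.5, 3.5, -2.5):
    print(value,
          round_to_nearest(value),
          round_toward_negative_infinity(value),
          round_toward_zero(value))
```

The values 2.5, 3.5 and -2.5 are exactly representable in binary floating point, so each one is a true tie and shows where the three rules differ.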

12.3.3 Floating-point stack

The floating-point stack is a last-in, first-out list that shall be used by all floating-point operators.

The width of the floating-point stack is implementation-defined. The floating-point stack shall be separate from the data and return stacks.

The size of a floating-point stack shall be at least 6 items.

A program that depends on the floating-point stack being larger than six items has an environmental dependency.

12.3.4 Environmental queries

Append table 12.2 to table 3.4.

See: 3.2.6 Environmental queries.

Table 12.2: Environmental Query Strings

String           Value data type   Constant?   Meaning

FLOATING-STACK   n                 yes         the maximum depth of the separate floating-point stack. On systems with the environmental restriction of keeping floating-point items on the data stack, n = 0.
MAX-FLOAT        r                 yes         largest usable floating-point number

12.3.5 Address alignment

Since the address returned by a CREATEd word is not necessarily aligned for any particular class of floating-point data, a program shall align the address (to be float aligned, single-float aligned, or double-float aligned) before accessing floating-point data at the address.

See: 3.3.3.1 Address alignment, 12.3.1.1 Addresses.

12.3.6 Variables

A program may address memory in data space regions made available by FVARIABLE. These regions may be non-contiguous with regions subsequently allocated with , (comma) or ALLOT. See: 3.3.3.3 Variables.

12.3.7 Text interpreter input number conversion

If the Floating-Point word set is present in the dictionary and the current base is DECIMAL, the input number-conversion algorithm shall be extended to recognize floating-point numbers in this form:

Convertible string := <significand><exponent>
<significand> := [<sign>]<digits>[.<digits0>]
<exponent> := E[<sign>]<digits0>
<sign> := { + | - }
<digits> := <digit><digits0>
<digits0> := <digit>*
<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

These are examples of valid representations of floating-point numbers in program source:

1E      1.E      1.E0      +1.23E-1      -1.23E+1

See: 3.4.1.3 Text interpreter input number conversion, 12.6.1.0558 >FLOAT.
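The grammar above can be sketched as a regular expression. The following Python fragment is a hedged illustration, not a Forth implementation: the names FLOAT_RE and parse_forth_float are hypothetical, and the sketch accepts only an uppercase E because that is what the grammar as written specifies. An exponent with no digits is treated as zero.

```python
import re

# <significand> := [<sign>]<digits>[.<digits0>]
# <exponent>    := E[<sign>]<digits0>
FLOAT_RE = re.compile(
    r"(?P<significand>[+-]?[0-9]+(?:\.[0-9]*)?)"
    r"E(?P<exponent>[+-]?[0-9]*)"
)


def parse_forth_float(text):
    """Return the value of a convertible string, or None if it does not match."""
    match = FLOAT_RE.fullmatch(text)
    if match is None:
        return None
    exponent = match.group("exponent")
    if exponent in ("", "+", "-"):
        # <digits0> may be empty; treat a missing exponent value as zero.
        exponent = "0"
    return float(match.group("significand") + "e" + exponent)


for source in ("1E", "1.E", "1.E0", "+1.23E-1", "-1.23E+1", ".123E", "123"):
    print(repr(source), parse_forth_float(source))
```

The last two strings are rejected: ".123E" has no digit before the decimal point, and "123" has no exponent part, so the text interpreter would not treat either as a floating-point number under this grammar.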

12.4 Additional documentation requirements

12.4.1 System documentation

12.4.1.1 Implementation-defined options

12.4.1.2 Ambiguous conditions

12.4.1.3 Other system documentation

  • no additional requirements.

12.4.1.4 Environmental restrictions

  • Keeping floating-point numbers on the data stack.

12.4.2 Program documentation

12.4.2.1 Environmental dependencies

  • requiring the floating-point stack to be larger than six items (12.3.3 Floating-point stack);
  • requiring floating-point numbers to be kept on the data stack, with n cells per floating-point number.

12.4.2.2 Other program documentation

  • no additional requirements.

12.5 Compliance and labeling

12.5.1 Forth-2012 systems

The phrase "Providing the Floating-Point word set" shall be appended to the label of any Standard System that provides all of the Floating-Point word set.

The phrase "Providing name(s) from the Floating-Point Extensions word set" shall be appended to the label of any Standard System that provides portions of the Floating-Point Extensions word set.

The phrase "Providing the Floating-Point Extensions word set" shall be appended to the label of any Standard System that provides all of the Floating-Point and Floating-Point Extensions word sets.

12.5.2 Forth-2012 programs

The phrase "Requiring the Floating-Point word set" shall be appended to the label of Standard Programs that require the system to provide the Floating-Point word set.

The phrase "Requiring name(s) from the Floating-Point Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the Floating-Point Extensions word set.

The phrase "Requiring the Floating-Point Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the Floating-Point and Floating-Point Extensions word sets.

12.6 Glossary

12.6.1 Floating-Point words

12.6.2 Floating-Point extension words

Contributions

JennyBrien: Mistake in the specification of significand? (Example, 2016-07-03 16:20:50)

Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]

<exponent> := E[<sign>]<digits0>

<sign> := { + | - }

<digits> := <digit><digits0>

<digits0> := <digit>*

<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

Surely it should be [<sign>]<digit>[.<digits0>] ? Only one digit allowed before the decimal point?

AntonErtl (2016-12-09 11:19:56)

We also want to guarantee to convert 123e, 12.3e, 123e3 etc., so the specification is not a mistake. What I wonder about is if we should not also allow ".123e".


zhtoor: Implementations requiring BOTH 32-bit single floats and 64-bit double floats. (Proposal, 2016-12-21 14:39:40)

The problem that arises here is which kind of float is referred to by F F/ etc.: single (32-bit) or double (64-bit). Would it not be better to use separate naming conventions for the two precisions, such as G G/ for double operations, C C/ for single complex numbers and Z Z/ for double complex numbers? Please advise.

BerndPaysan (2016-12-22 01:09:07)

The floats on the floating point stack are all one size, usually double. Only memory operations like SF@ and SF! convert to and from single.

zhtoor (2016-12-22 09:44:54)

My question concerned a Forth implementation that implements BOTH 32-bit single floats AND 64-bit double floats. How do you propose to name their respective words? For example, sf+ for single float +, df+ for double float +, and f+ may point to either by using SYNONYM etc.

AntonErtl (2016-12-22 18:42:35)

Yes, the convention you propose is what comes first to my mind. The only question is how you would name the corresponding @ and ! words.

I would recommend against having different FP sizes in a Forth system. It complicates matters and buys little to nothing on current hardware; the main benefit of smaller FP numbers is in memory and memory bandwidth requirements, and we have SF@ SF! for that already.

zhtoor (2016-12-23 00:47:30)

You already have df@, df!, sf@, sf!, f@ and f!, so there are no problems there. Let the f@ f! f+ ... family be the "default" float behavior, the sf@ sf! sf+ ... family the single-float behavior, and the df@ df! df+ ... family the double-float behavior. Moreover, sfstack?, dfstack? and fstack? indicate which of these have a separate stack (keeping future 64-bit implementations in mind, in case doubles could be kept on the main stack). The rationale for having both implementations is analogous to single-cell vs. double-cell numbers, together with some practical requirements of implementing on modern cores such as the ARM Cortex-A9 with VFPv3, which have both options available in some cases; some vector/DSP operations are also implemented as single floats (graphics processors, for example). Since I am actually facing this problem in my implementation, I thought the standards community might help. Regards, and thanks for your responses.

AntonErtl (2016-12-31 11:41:05)

We already have SF@ SF! DF@ DF!, and they put standard floats on the FP stack, not single or double floats. So you would need other words to put single and double floats on the FP stack.

Also, we have standardized a separate FP stack, because writing portable programs for a shared FP stack is not practical. For single and double floats it's probably best to use the same FP stack, with one FP stack item per single or double, like FP registers are organized on most architectures; of course, separate stacks for these types are also possible, but would require additional stack pointers and stack manipulation words.

None of these variants looks particularly attractive, which is probably why there is very little, if any, practice in this direction.

AntonErtl (2016-12-31 11:57:29)

Reasons for hardware to have 32-bit FP numbers (and smaller):

1) Existing software uses it, and existing programming languages have types for it; i.e., it continues a tradition that once had technical reasons. Also, 32-bit FP is cheap to implement once you have implemented 64-bit FP. That's no reason to have it in Forth.

2) For vector operations you can deal with more values in the same amount of hardware. I don't see this as a reason to have it in Forth for scalar FP operations, because I don't think we should go for auto-vectorization in Forth. If we add vector operations to Forth, we can think about vectors of 32-bit FP numbers, but that's still no reason to have several scalar on-stack FP types.
