Proposal: [26] Implementations requiring BOTH 32 bit single floats and 64 bit double floats.


This proposal has been moved into this section. Its former address was: /standard/float

This page is dedicated to discussing this specific proposal.


zhtoor: [26] Implementations requiring BOTH 32 bit single floats and 64 bit double floats. (Proposal, 2016-12-21 14:39:40)

The problem here is which kind of float F* F/ etc. refer to: single (32-bit) or double (64-bit). Would it not be better to use separate names for the two precisions, such as G* G/ for double operations, C* C/ for single complex numbers and Z* Z/ for double complex numbers? Please advise.

BerndPaysan

The floats on the floating point stack are all one size, usually double. Only memory operations like SF@ and SF! convert to and from single.
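What SF! and SF@ do can be sketched in C, under the assumption that C's double stands in for a 64-bit FP stack item and float for the 32-bit memory format. The helper names sf_store and sf_fetch are hypothetical: storing rounds the value to 32 bits, and fetching widens it back without further loss.

```c
#include <assert.h>
#include <string.h>

/* Assumption: float is IEEE binary32, occupying 4 bytes of memory. */
static_assert(sizeof(float) == 4, "binary32 float expected");

/* Hypothetical SF! analogue: round a stack float (double) to binary32
   and store its 4 bytes at addr. */
static void sf_store(double r, unsigned char addr[4]) {
    float single = (float)r;              /* rounds to nearest binary32 */
    memcpy(addr, &single, sizeof single);
}

/* Hypothetical SF@ analogue: fetch 4 bytes of binary32 from addr and
   widen them to a stack float (double). Widening is exact. */
static double sf_fetch(const unsigned char addr[4]) {
    float single;
    memcpy(&single, addr, sizeof single);
    return (double)single;
}
```

Storing 0.1 with sf_store and fetching it back yields about 0.10000000149 rather than 0.1: the value on the stack keeps full precision, and only the trip through 32-bit memory loses it.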

zhtoor

My question concerns a Forth implementation that implements BOTH 32-bit single floats AND 64-bit double floats. How do you propose to name their respective words? For example, sf+ for single-float +, df+ for double-float +, and f+ could point to either by means of SYNONYM etc.

AntonErtl

Yes, the convention you propose is what comes first to my mind. The only question is how you would name the corresponding @ and ! words.

I would recommend against having different FP sizes in a Forth system. It complicates matters and buys little to nothing on current hardware; the main benefit of smaller FP numbers is in memory and memory bandwidth requirements, and we have SF@ SF! for that already.
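The memory-and-bandwidth benefit can be illustrated in C, assuming float and double correspond to the 32-bit and 64-bit IEEE formats. The hypothetical function sum_single_table reads a table stored as 32-bit floats, which takes half the space of the same table in doubles, widens each element on load much as SF@ does, and does all arithmetic in double.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: binary32 storage is half the size of binary64 storage. */
static_assert(sizeof(float) * 2 == sizeof(double), "float is half a double");

/* Hypothetical: sum a table kept in memory as 32-bit floats. Each element
   is widened to double on load; all arithmetic is done in double. */
static double sum_single_table(const float *table, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += table[i];                  /* implicit float -> double */
    return acc;
}
```

A table of 1000 elements thus occupies 4000 bytes instead of 8000, and no separate set of single-precision arithmetic operations is needed.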

zhtoor

You already have df@, df!, sf@, sf!, f@ and f!, so there is no problem there. Let the f@ f! f+ ... family be the "default" float behavior, the sf@ sf! sf+ ... family the single-float behavior, and the df@ df! df+ ... family the double-float behavior. Moreover, sfstack?, dfstack? and fstack? would indicate which of these have a separate stack (keeping future 64-bit implementations in mind, in case doubles could be kept on the main stack).

As for the rationale, having both is analogous to single-cell vs. double-cell numbers, and there are practical requirements when implementing on modern cores like the ARM Cortex-A9 with VFPv3, which in some cases offer both options. Also, some vector/DSP operations are implemented for single floats (as on graphics processors). Since I am actually facing this problem in my implementation, I thought the standards community might help. Regards, and thanks for your responses.

AntonErtl

We already have SF@ SF! DF@ DF!, and they put standard floats on the FP stack, not single or double floats. So you would need other words to put single and double floats on the FP stack.

Also, we have standardized a separate FP stack, because writing portable programs for a shared FP stack is not practical. For single and double floats it's probably best to use the same FP stack, with one FP stack item per single or double, like FP registers are organized on most architectures; of course, separate stacks for these types are also possible, but would require additional stack pointers and stack manipulation words.

None of these variants looks particularly attractive, which is probably why there is very little, if any, practice in this direction.
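The shared-stack variant, one FP stack item per single or double value, can be sketched in C. Everything here is hypothetical: FpStack, fpush, fpop and the operations sf_plus and df_plus (standing in for words like the proposed SF+ and DF+) are for illustration only, and C's float and double are assumed to be IEEE binary32 and binary64.

```c
#include <assert.h>

#define FP_DEPTH 16

/* Hypothetical shared FP stack: every slot is wide enough for a double,
   and a single value also takes exactly one slot. */
typedef struct {
    double slot[FP_DEPTH];
    int sp;                              /* number of items on the stack */
} FpStack;

static void fpush(FpStack *s, double r) {
    assert(s->sp < FP_DEPTH);            /* FP stack overflow */
    s->slot[s->sp++] = r;
}

static double fpop(FpStack *s) {
    assert(s->sp > 0);                   /* FP stack underflow */
    return s->slot[--s->sp];
}

/* Hypothetical SF+ : treat the top two items as singles and add them
   in binary32; the result occupies one slot. */
static void sf_plus(FpStack *s) {
    float b = (float)fpop(s);
    float a = (float)fpop(s);
    float sum = a + b;                   /* binary32 addition */
    fpush(s, sum);
}

/* Hypothetical DF+ : add the top two items in binary64. */
static void df_plus(FpStack *s) {
    double b = fpop(s);
    double a = fpop(s);
    fpush(s, a + b);
}
```

In this sketch the stack does not record which format an item has, so the program must keep track, as it must with FP registers. Adding 1.0 to 1.0e8 with sf_plus leaves 1.0e8, because binary32 values near 1.0e8 are spaced 8 apart, while df_plus gives 100000001.0.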

AntonErtl

Reasons for hardware to have 32-bit FP numbers (and smaller):

  1. Existing software uses it, and existing programming languages have types for it; i.e., it continues a tradition that had technical reasons once upon a time. Also, 32-bit FP is cheap to implement once you have implemented 64-bit FP. That's no reason to have it in Forth.

  2. For vector operations you can deal with more values in the same amount of hardware. I don't see this as a reason to have it in Forth for scalar FP operations, because I don't think we should go for auto-vectorization in Forth. If we add vector operations to Forth, we can think about vectors of 32-bit FP numbers, but that's still no reason to have several scalar on-stack FP types.
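The second point comes down to register width. A minimal C sketch, assuming a 128-bit vector register (as with SSE XMM or NEON Q registers) and a hypothetical helper lanes: such a register holds four 32-bit floats but only two 64-bit doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 128-bit vector register. */
enum { VECTOR_BITS = 128 };

/* Hypothetical: number of elements of the given size per register. */
static size_t lanes(size_t element_bytes) {
    assert(element_bytes > 0);
    return VECTOR_BITS / 8 / element_bytes;
}
```

The doubling of lanes is what vector hardware gains from 32-bit FP; scalar stack operations see no such gain.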

gnuarm

There are a number of smaller CPUs that implement 32-bit IEEE floating point in hardware, but not 64-bit. They work exclusively in 32 bits, without the extended-precision intermediate values used in x86 and similar CPUs. It seems 32-bit floating point is adequately addressed by the present standard to support these smaller devices. Or am I missing something?

AntonErtl

Sure, if you have a 32-bit-FP-only FPU, you (as the system implementor) will probably go for 32-bit floats. The question at hand was whether we should have multiple FP types for FPUs that support both 64-bit and 32-bit floats.

gnuarm

Which hardware systems do you know of that support higher-precision floats but can also do the math in 32-bit floats, rather than calculating in higher precision and saving the results in 32 bits? The latter, I believe, is already supported by the existing Forth standard.

I certainly am not familiar with all systems, but the ones I am familiar with that support higher precision FP calculations will perform those higher precision FP calculations just as fast as 32 bit calculations.

I'm not clear which systems we are trying to avoid excluding. Graphics chips perhaps?
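The distinction between doing the math in 32-bit floats and calculating in higher precision while saving 32-bit results can be made concrete in C. For a single addition, subtraction, multiplication, division or square root, both approaches give the same binary32 result, because binary64 has enough extra precision (53 significand bits against 24); over a chain of operations they can diverge. The hypothetical functions below accumulate repeated additions both ways, assuming FLT_EVAL_METHOD == 0 (float arithmetic really done in binary32, as on x86-64 with SSE).

```c
#include <assert.h>

/* Pure single-precision math: every intermediate is rounded to binary32. */
static float accumulate_single(float start, float step, int n) {
    assert(n >= 0);
    float acc = start;
    for (int i = 0; i < n; i++)
        acc += step;                     /* rounded to binary32 each time */
    return acc;
}

/* Compute wide, store narrow: intermediates stay in binary64 and only
   the final result is rounded to binary32 (as when storing with SF!). */
static float accumulate_double(float start, float step, int n) {
    assert(n >= 0);
    double acc = start;
    for (int i = 0; i < n; i++)
        acc += step;                     /* kept in binary64 */
    return (float)acc;                   /* one rounding at the end */
}
```

Starting from 1.0e8 and adding 1.0 a thousand times, the single-precision loop never moves, because each individual sum rounds back to 1.0e8; the wide accumulator reaches 100001000.0 exactly and keeps that value when narrowed. For one operation, such as 0.1f + 0.2f, both functions return the same result.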

StephenPelc

On current FP hardware that has both 64-bit and 32-bit operations, I can see no reason to implement the FP stack with 32-bit operations. The only area in which the 64/32-bit distinction becomes significant is memory access, for which we already have the SFx and DFx operations. Provision of 32-bit-only FP for embedded hardware is likely to be a transient phase of a few years until silicon improves. We are already seeing this in the transition from Cortex-M4F (universally 32-bit until now) to Cortex-M7 (mostly with both 64-bit and 32-bit FP).

I'm tired of writing ARM and Cortex FP packages.
