Digest #266 2024-06-29
Contributions
requestClarification - Behavior of `represent` when buffer length is zero
There was a recent discussion in comp.lang.forth about the behavior of `represent ( c-addr u -- n flag1 flag2 ) ( F: r -- )` when the parameter in position u is zero (links: 1, 2).
There are two positions regarding this case (u=0):
1. The behavior is specified:
   - nothing is written into the buffer;
   - the significand (in the specified representation) is rounded to 0 digits after the decimal point, i.e., to an integer digit, and n is adjusted correspondingly.
2. The behavior is unclear/unspecified.
Some systems write to the buffer (even though u is zero). Some throw an exception. Some behave according to (1).
Should we clarify the text description, or specify some other particular behavior (e.g., throw a specific exception), or declare an ambiguous condition?
If `represent` does not throw an exception, I would expect the following behavior:
t{ 0.1e pad 1 represent -> 0 0 -1 }t
t{ 0.1e pad 0 represent -> 1 0 -1 }t
t{ 0.1e 0 0 represent -> 1 0 -1 }t \ to ensure that nothing is written
This can be useful when a program calculates the field width for the significand and that width can be zero; the program then shows only the order of magnitude of the number.
Replies
So the intention was to give no guarantees about that.
Yes, but this intention (concerning "duplicated nodes") is not reflected in the normative parts.
Maybe we should put that intention explicitly in the standard text.
I prefer to have access to shadowed definitions. And for introspection purposes, it's better if a Forth system does not skip shadowed words in `traverse-wordlist`.
(From time to time, people even ask how to access a shadowed word in Forth; see an example on StackOverflow.)
Alternatively, if you want a guarantee, investigate existing practice, and, if it agrees with your preference, make a proposal for standardizing that.
From the formal point of view, we already have this guarantee: the text description says that xt is executed once for every word in the word list. And nothing in the normative text says that `traverse-wordlist` can skip shadowed words in a word list. This means that it cannot skip the shadowed words.
Thus, to allow `traverse-wordlist` to skip shadowed words, a proposal should be created to change the normative text description.
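One way to investigate existing practice on a given system is a small probe like the following minimal sketch. It assumes a system providing the SEARCH word set, `TRAVERSE-WORDLIST` from TOOLS EXT, and `COMPARE`; the names `demo-wid`, `count-foo` and `foos` are hypothetical. It compiles two definitions of `foo` into a fresh word list, the second shadowing the first, and counts how many words named `foo` the traversal visits.

```forth
\ Hypothetical probe: does TRAVERSE-WORDLIST visit shadowed words?
wordlist constant demo-wid

get-current  demo-wid set-current   \ compile the next definitions into DEMO-WID
: foo ( -- n ) 1 ;
: foo ( -- n ) 2 ;                   \ shadows the first FOO in DEMO-WID
set-current                          \ restore the previous compilation word list

\ Callback for TRAVERSE-WORDLIST: ( n nt -- n' flag ), a true flag continues.
: count-foo ( n nt -- n' true )
  name>string s" foo" compare 0= if 1+ then true ;

\ Count the words named "foo" in DEMO-WID.
: foos ( -- n ) 0 ['] count-foo demo-wid traverse-wordlist ;

foos .   \ 2 if shadowed words are visited, 1 if they are skipped
```

A system that visits every word in the word list, including shadowed ones, yields 2; one that skips shadowed words yields 1. On a system that stores names in upper case, the string passed to `COMPARE` would have to be `FOO` instead.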
"Execute xt once for every word in the wordlist wid", but does not say whether "every word" includes shadowed words or not.
I can't agree with this.
"Every word in the word list" means every word that was placed in the word list and was not subsequently removed from the word list. That is, shadowing does not change anything in this regard.
Hello @JimPetersen
the test passes on Gforth. It should pass on any standard system that provides the BLOCK word set.
Suggestions for improving the test:
- Please comment on what the test actually checks and why it might fail (what do we expect, and what could go wrong?).
- As the blocks are loaded both interpretively and from within a definition, it would be interesting to know the idea behind distinguishing these cases.
- More descriptive names could make the test cases easier to understand.
referenceImplementation - Possible Reference Implementation
Hello @JimPetersen
your contribution was discussed in the Forth Standards Committee interim meeting of 2024-02-16.
Thanks for the reference implementation of `thru`, which seems to work fine as long as `u1` is less than or equal to `u2`. If `u1` is greater than `u2`, it will load blocks in a huge sequence...
Your reference implementation might check for that and load no block at all, as is implied by the standard text "numbered u1 through u2 in sequence". The sequence is empty if `u1` is greater than `u2`.
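A minimal sketch of the suggested check, assuming the BLOCK word set and the CORE EXT word `U>`; it is an illustration, not the submitted reference implementation. The explicit test is needed because `?DO` skips the loop only when index and limit are equal, so for u1 > u2 + 1 the loop would otherwise wrap around through almost the whole range of unsigned numbers.

```forth
\ Sketch of THRU ( i*x u1 u2 -- j*x ) that loads nothing when u1 > u2.
: thru ( i*x u1 u2 -- j*x )
  2dup u> if 2drop exit then   \ empty sequence: load no block at all
  1+ swap ?do i load loop ;    \ load blocks u1 .. u2 in ascending order
```

With this definition, `5 3 thru` simply drops both numbers and loads nothing.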
Thanks also for the test case. The comments regarding the test case for contribution #280 hold for this test case as well.
Hello @albert,
we discussed your contribution #314 in the Forth Standards Committee interim meeting on 2024-02-16. The committee considers your contribution to be more of a comment on some other contribution, most likely on the discussion of Appendix F.
The committee kindly asks you to repeat your comment there.
This contribution will be retired (but can be re-opened later at any time if requested).
Hello Albert,
we also encourage you to provide appropriate string tests. The committee will review and consider them.
Hello @albert,
we discussed your contribution in the Forth Standards Committee interim meeting on 2024-02-16.
Thanks for your contribution. The committee appreciates your effort on simplifying string handling.
The committee kindly suggests that you
- elaborate on possible buffer overflow issues that can arise when using the string words you propose.
- use terminology as close as possible to the current standard's terminology in order to allow easier adoption.
- create a string package with a reference implementation loadable on standard systems in order to establish common practice for your string words.
Thanks again for your effort.
Address units
I will extend the discussion of larger address units to suggest adding b-addr and related stuff, or alternatively specifying that systems that implement these words are required to have 8-bit address units.
But in contrast to what you write, if we take the latter option, there is no need for b-addr.
addr vs. c-addr
Yes, as far as the standard is concerned, addr has the least alignment requirements. But it is also used in practice as a stand-in for any kind of address, including addresses with stricter alignment requirements (I am sure I am not alone in this usage of addr); so using c-addr makes the intent of not requiring alignment clearer to the reader.
u vs. x
Zero-extending is what you do with unsigned numbers, so if you use the result of `w@` as a number, the result of `w@` is an unsigned number. OTOH, if you apply `w>s` to the result of `w@`, the result is just treated as a funny representation of a signed number (not an n, but not a u either); if we don't know what it is, we use x, so using x would be ok here. Likewise, if we use, say, `lbe` on the result of `w@`, one can see the input of `lbe` as a funny representation of an unsigned or signed number, and the output as an unsigned number or a funny representation of a signed number. So yes, one can argue for using x everywhere except for the output of the sign-extending words.
However, I don't see that it makes a difference in what kinds of programs are considered conforming to this specification or how systems are implemented, so we should use the type that makes it easiest to understand what is going on. And in this respect using u as output type for `w@` and `w@ lbe` makes it very obvious that the result is zero-extended, no need to consult the prose of these words for determining that.
Another idea that I had some time ago was to have specialized types for the various intermediate results of the decomposition, e.g., bewn for a big-endian 16-bit signed number. This would allow specifying the proper sequences in the conversion through the type system, but would make the specification more complex. E.g., for the `w` words we would have:
w@ ( c-addr -- u|wn|bewn|lewn|bewu|lewu )
w! ( u|n|bewn|lewn|bewu|lewu c-addr -- )
wbe ( bewn -- wn | bewu -- u | n -- bewn | u -- bewu )
wle ( lewn -- wn | lewu -- u | n -- lewn | u -- lewu )
w>s ( wn -- n )
I think that this amount of detail (including specifying all these types) makes the specification harder rather than easier to understand.
Concerning whether the things that are processed with these words are numbers: they certainly cannot be xts in a portable program, because an xt may consume a full cell, which may be larger than 16 bits, 32 bits, and in theory even larger than 64 bits. Also, the xt is specific to the process it originates from, so it makes no sense to communicate it elsewhere. The same goes for addresses.
So that leaves us with integer numbers and bitmaps (including Forth flags). For bitmaps, x is more appropriate than u. Does this outweigh the benefit of making the zero extension obvious? If we add x to the variant with the specialized types above, things become even more complex.
2s-complement
2s-Complement Wrap-Around Integers have been standardized at the 2015 meeting.
Yes. But that applies to single-cell and double-cell signed integers in the Forth system.
If a binary interface uses another format for negative integers (for example, the least significant bit for sign), then the sequence:
w@ ( addr -- x ) wbe ( x -- x ) w>s ( x -- n )
does not return an implied signed integer number.
Why is this so from a formal point of view? Where do the data types not match?
I think the text description for `w>s ( x -- n )` should specify a format for the stack parameter in position x. From this description it should be clear that the least significant 16 bits of x must represent a signed integer in two's complement format, and that the remaining bits of x have no meaning.
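A minimal sketch of a definition that follows this description, assuming two's-complement wrap-around arithmetic (as standardized at the 2015 meeting); it is illustrative only and not taken from any proposal text. Only the least significant 16 bits of x influence the result:

```forth
\ Sketch of W>S ( x -- n ): interpret the least significant 16 bits of x as a
\ two's-complement number; the remaining bits of x are ignored.
: w>s ( x -- n )
  $ffff and      \ keep only the low 16 bits (zero-extended)
  $8000 xor      \ flip the 16-bit sign bit ...
  $8000 - ;      \ ... and subtract it: 0..$7fff stay, $8000..$ffff become -$8000..-1
```

For example, `$1ffff w>s .` prints `-1`, because the bits above bit 15 are masked off before the sign is extended.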
u vs. x
I don't see that it makes a difference in what kinds of programs are considered conforming to this specification or how systems are implemented,
"u" is a symbol for the "unsigned number" data type; the set of values of this data type is the integers in the range { 0 ... max-u } (max-u is defined in 3.2.6 Environmental queries). This data type defines not only a format (zero-extended), but also the interpretation for data objects as particular integer numbers. Namely, u defines a particular mapping from the set of data objects to the set of values.
If the mapping from the set of single-cell data objects to a set of values is unknown (or can vary) for a stack parameter, the standard can only use the x data type for this parameter.
For example, in the phrase `w@ ( x1 ) dup 1 rshift swap %1 and if negate then ( n )` the stack parameter in position x1 is not a member of u, since the u mapping from the data objects to the values does not hold for this parameter! In this example, if the parameter in position x1 is %11 (as a tuple of bits), it is not the number 3 but the number -1, and this is defined by the application, not by the standard.
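The phrase can be packaged as a small definition to make the point concrete; `lsb-sign>n` is a hypothetical name, and the format (bit 0 holds the sign, the remaining bits the magnitude) is an application's choice, not something the standard defines:

```forth
\ Hypothetical application format: bit 0 is the sign, the other bits the magnitude.
: lsb-sign>n ( x1 -- n )
  dup 1 rshift        \ magnitude: all bits except bit 0
  swap %1 and         \ isolate the sign bit
  if negate then ;    \ sign bit set: the number is negative
```

With this definition, `%11 lsb-sign>n .` prints `-1`, while `%11 u.` prints `3`: the same bit tuple maps to different values depending on the interpretation the application chooses.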
using u as output type for `w@` and `w@ lbe` makes it very obvious that the result is zero-extended, no need to consult the prose of these words for determining that.
The use of u also implies the specific mapping from the set of data objects (bit tuples) to the set of values for the stack parameter. And this mapping does not hold. Therefore, this use is invalid.
requestClarification - Behavior of `represent` when buffer length is zero
Correction: if no exception is thrown, it should be:
t{ 0.1e pad 0 represent -> 0 0 -1 }t
t{ 0.4e pad 0 represent -> 0 0 -1 }t
t{ 0.6e pad 0 represent -> 1 0 -1 }t
Concerning 0.5e, it's unclear; it is probably system-defined.