Digest #262 2024-06-12
Contributions
Is the following fragment standard compliant?
-1 pad c!
The glossary entry for c! says:
( char c-addr -- )
When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.
It seems the text description implies a stack diagram ( x c-addr -- ) or ( char|x c-addr -- ).
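A minimal sketch of what the fragment does in practice, assuming 8-bit characters and a system that accepts any cell value as the first argument of c!. Whether a standard program may rely on this is exactly the question raised above.

```forth
\ Assumption: characters are 8 bits wide and C! accepts any x.
-1 pad c!      \ only the low-order character bits of -1 are stored
pad c@ .       \ prints 255 under the 8-bit assumption
```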
Replies
See A.3.2.1 for how deeply the assumption of two's complement is rooted in Forth. Therefore, the operations + and - work interchangeably for unsigned and signed operands, and even for mixed ones. If your system is not two's complement, unsigned is restricted to +n.
IIRC, not using two's complement is no longer an option.
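The interchangeability can be checked with a short sketch. It assumes two's-complement cells with wrap-around arithmetic; max-u is a hypothetical helper, not a standard word.

```forth
\ Assumption: two's-complement cells with wrap-around arithmetic.
: max-u ( -- u )  0 invert ;      \ hypothetical helper: largest unsigned value

-1 max-u = .                      \ prints -1 (true): -1 and max-u are the same cell
5 -3 +   5 max-u 2 - +   = .      \ prints -1 (true): -3 and max-u - 2 add identically
10 3 -   10 max-u 2 - +  = .      \ prints -1 (true): subtracting 3 equals adding "-3"
```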
referenceImplementation - Possible Reference Implementation
I should read the specification more closely. This correction gives the correct output if search fails to find the string. Reduced some stack juggling with the dreaded PICK.
```
: SEARCH ( caddr1 u1 caddr2 u2 -- caddr3 u3 flag)
  BEGIN
    DUP
  WHILE
    2OVER 3 PICK OVER COMPARE
  WHILE
    1 /STRING
  REPEAT
    2NIP TRUE EXIT
  THEN
  2DROP FALSE ;
```
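For comparison, here is a small sketch of the behaviour the specification asks for, both on success and on failure. It calls the SEARCH provided by the system, not the definition above, and search-demo is a hypothetical name.

```forth
\ Uses the SEARCH provided by the system, not the definition above.
: search-demo ( -- )                      \ hypothetical name
  s" abcdef" s" cd" search . type cr      \ prints -1 cdef
  s" abcdef" s" xy" search . type cr ;    \ prints 0 abcdef
search-demo
```

On failure the original string caddr1 u1 is returned unchanged under the false flag, which is the case the correction addresses.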
As I see it, to solve the mentioned problem with the notion and definition of "data type", a more detailed formalization is required.
The main premise: every data object is formally associated with a set of data types.
Thus, we can talk about "typed data objects" (an abstraction), data objects that are associated with data types:
A typed data object is an ordered pair of a data object and a set of data types.
Consequently, every data type determines an abstract set of typed data objects. And it's possible to determine whether a typed data object is a member of a data type (namely, a member of the set that is determined by that data type).
So where are "values"? A value for a typed data object is determined by data types. And since a typed data object may belong to several data types (for example, to n and to flag), it may have a set of different values. But one data type determines only one value for one data object (that is allowed to be associated with this data type).
A data type identifies the set of permissible values for a data object.
So, a variant that seems more correct:
A data type identifies a set of data objects and a value for each data object from that set.
If anybody is interested, see my attempt at a more detailed formalization; feedback is welcome.
For the moment, let's take the Forth-94/2012 position that the result on integer overflow (i.e. where a mathematical integer addition produces a result outside the target range) is implementation-defined. Let's say I want to avoid that; even then I can construct cases for various combinations of n and u; to make things more concrete, let's assume that the range of n is -32768..32767 and for u it is 0..65535:
1 1 + \ gives 2; +n1 +n2 -- +n (where +n is both u and n)
33000 -1 + \ gives 32999; u1 n2 -- u
33000 -1000 + \ gives 32000; u1 n2 -- +n
1 -2 + \ gives -1; +n1 n2 -- n
-1 -1 + \ gives -2; n1 n2 -- n
20000 20000 + \ gives 40000; +n1 +n2 -- u
For the stack diagrams with +n, you could produce either one with n or with u instead of the +n, which means that in the first case you can have all 8 combinations.
These are all standard Forth programs where the mathematical integer result is in the target range, so no, the diagram is not limited to be equivalent to ( n1 n2 -- n3 | u1 u2 -- u3 ). My take is that n|u means a range of -32768..65535 for the example ranges above, just as +n (i.e., n&u) means 0..32767.
In 2015 the committee accepted 2s-Complement Wrap-Around Integers, which defines what happens in those cases where the result does not fit in the target range; with that it is up to the programmer in all cases how to interpret the arguments and the results of +; e.g., the -1 -1 + case for the ranges given above can also be interpreted as:
65535 65535 + \ gives 65534; u1 u2 -- u
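The two readings of the same addition can be checked without fixing the cell size. This is a sketch that assumes two's-complement wrap-around cells; max-u is a hypothetical helper that equals 65535 on a 16-bit system.

```forth
\ Assumption: two's-complement wrap-around cells.
: max-u ( -- u )  0 invert ;      \ hypothetical helper: 65535 with 16-bit cells

-1 -1 +   max-u max-u +   = .     \ prints -1 (true): one result, two readings
-1 -1 + .                         \ signed reading: prints -2
-1 -1 + u.                        \ unsigned reading: max-u - 1 (65534 with 16-bit cells)
```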
Moving a data object shall not affect its type. (2)
Granted, I also find (2) confusing. In my case, it's because it implies that objects somehow "know" their type.
I think a purpose of this statement is to guarantee a property that can be illustrated by the following examples.
Let's consider the word swap ( x1 x2 -- x3 x4 ). It moves (in some sense) the data objects, but this shall not affect their data types (NB: a data object may be a member of several data types, not only the data type x). That is, not only are the data objects in the stack parameters x3 and x2 equal, but their sets of data types are equal too. I.e., they are equal as typed data objects. Ditto for x4 and x1.
Ditto for the sequential operations ! ( x1 a-addr -- ) and @ ( a-addr -- x2 ) for the same address: the typed data objects in the stack parameters x2 and x1 are equal.
Ditto for the result of the move operation, etc. (when you write data objects into one location, then copy them into another location, and then read from that other location).
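A rough illustration of that property, with a flag as the typed data object. The variable v and the word flag-roundtrip are hypothetical names used only for this sketch.

```forth
variable v                        \ hypothetical scratch variable
: flag-roundtrip ( -- )           \ hypothetical name
  3 4 <                           \ produces a flag (true)
  v !  v @                        \ store it, then fetch it from the same address
  if ." still usable as a flag" else ." not a flag" then cr ;
flag-roundtrip                    \ prints: still usable as a flag

1 2 swap . .                      \ prints 1 2: swap moved the cells, the values are unchanged
```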
One idea why the operations + and - should be defined when one argument is in u and the other is in n: these operations on any integer arguments should be equal to a series of operations on 0 and 1.
Any number in u that is outside of the n range can be represented as the sum of several numbers in n.
For example, max-uint is the sum of the numbers (max-int, max-int, 1) (supposing two's complement).
Thus, an operation on numbers in n and u should be equivalent to several operations in n only, resulting in n (taking into account the overflow rule) (1).
Any number in n that is outside of the u range can be represented as the subtraction of two numbers in u.
For example, min-int
is the subtraction of unsigned( max-int
+ 1
) from 0
(assuming two's complement).
Thus, an operation on numbers in n and u should be equivalent to several operations in u only, resulting in u (taking into account the overflow rule) (2).
I am not sure whether (1) and (2) are both always true for one's complement and sign-magnitude representations, but they are true for two's complement representation.
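Both decompositions can be checked directly on a two's-complement system. In this sketch max-uint, max-int and min-int are hypothetical helper definitions named after the text, not standard words.

```forth
\ Assumption: two's-complement cells with wrap-around arithmetic.
: max-uint ( -- u )  0 invert ;            \ hypothetical helpers
: max-int  ( -- n )  max-uint 1 rshift ;
: min-int  ( -- n )  max-int invert ;

max-int max-int + 1 +   max-uint = .       \ prints -1 (true): max-uint = max-int + max-int + 1
0 max-int 1 + -         min-int  = .       \ prints -1 (true): min-int = 0 - (max-int + 1)
```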
It means that the result of + and -, when one argument is in n and the other is in u, always belongs to both the u and n data types, and in some cases it also belongs to the +n data type.
Also, if one (and not the other) of the arguments is in addr, the result (in the general case) is in u and in addr, but not in n.
It seems (1) and (2) are also true for * ( n1|u1 n2|u2 -- n3|u3 ).
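A quick check for the single-cell product, again assuming two's-complement wrap-around cells, with max-u as a hypothetical helper:

```forth
\ Assumption: two's-complement cells with wrap-around arithmetic.
: max-u ( -- u )  0 invert ;      \ hypothetical helper: largest unsigned value

-1 -1 *   max-u max-u *   = .     \ prints -1 (true): both products are the cell 1
-2 3 *    max-u 1 - 3 *   = .     \ prints -1 (true): -2 and max-u - 1 multiply identically
```

This covers only the single-cell result of *. The double-cell products of m* and um* differ between the signed and unsigned readings.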
Author:
- Anton Ertl
- Leon Wagner
Change Log
- 2024-06-06 replaced some n with +n; formatting changes (AE)
- 2023-09-14 Revision after discussion (AE)
- 2023-09-13 Initial proposal
Problem:
The stack comments for N>R and NR> don't make it clear that +n items are moved between the data and return stacks.
Solution:
The stack comments should more clearly indicate that +n data stack items are moved to or from the return stack.
Proposal:
In the definition of N>R, replace
( i * n +n -- ) ( R: -- j * x +n )
with
( x_n ... x_1 +n -- ) ( R: -- j*x +n )
In the definition of NR>, replace
( -- i * x +n ) ( R: j * x +n -- )
with
( -- x_n ... x_1 +n ) ( R: j*x +n -- )
Discussion
On the return stack, j*x +n because the data may be in a separate buffer and only the address and +n on the return stack. +n on the return stack because the original specified that, and changing that would be a substantial change.
On the data stack, x_n ... x_1 +n because that is the way we usually specify a numbered number of cells (even for +n=0). See, e.g., get-order.
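A usage sketch of the proposed data-stack diagrams with +n = 3. It assumes the system provides N>R and NR> from the Programming-Tools extensions; n>r-demo is a hypothetical name. Both words are used inside a definition because their interpretation semantics are undefined.

```forth
\ Assumption: the system provides N>R and NR>.
: n>r-demo ( -- )                 \ hypothetical name
  10 20 30 3 n>r                  \ ( x_3 x_2 x_1 +n -- ) moves three items and the count
  nr>                             \ ( -- x_3 x_2 x_1 +n ) brings them back in the same order
  . . . . cr ;                    \ prints 3 30 20 10
n>r-demo
```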