# Digest #262 2024-06-12

# Contributions

Is the following fragment standard compliant?

```
-1 pad c!
```

The glossary entry for `c!` says:

( char c-addr -- )

When character size is smaller than cell size, only the number of low-order bits corresponding to character size are transferred.

The text description seems to imply a stack diagram of *( x c-addr -- )* or *( char|x c-addr -- )*.
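One way to see what a particular system does with the fragment is to read the character back. This is only a sketch: it assumes two's complement (so `-1` has all bits set) and 8-bit characters, and it does not settle whether the fragment is standard.

```forth
\ Store a cell with all bits set through C!, then read it back with C@.
-1 PAD C!
PAD C@ .    \ prints 255 on a system with 8-bit characters (assumption)
```

If the system transfers only the low-order bits, as the text description says, the upper bits of `-1` are simply dropped.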

# Replies

See A.3.2.1 for how deeply the assumption of two's complement is rooted in Forth. As a result, the operations `+` and `-` work interchangeably for unsigned and signed operands, and even for mixed ones. If your system is not two's complement, unsigned is restricted to +n.

IIRC, not using two's complement is no longer an option.

## referenceImplementation - Possible Reference Implementation

I should read the specification more closely. This correction gives the correct output if search fails to find the string. Reduced some stack juggling with the dreaded PICK.

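```
: 2NIP ( x1 x2 x3 x4 -- x3 x4 ) 2SWAP 2DROP ;  \ not a standard word; defined here

: SEARCH ( caddr1 u1 caddr2 u2 -- caddr3 u3 flag)
  2SWAP 2DUP 2>R               \ S: pattern string   R: original string
  BEGIN
    2 PICK OVER U> 0=          \ at least as many chars left as the pattern has?
  WHILE
    2OVER 3 PICK OVER COMPARE  \ pattern vs. equally long prefix of the string
  WHILE                        \ nonzero: no match at this position
    1 /STRING
  REPEAT
    2NIP 2R> 2DROP TRUE EXIT   \ match: return the remaining string
  THEN
  2DROP 2DROP 2R> FALSE ;      \ no match: return the original string
```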
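The behaviour that the specification asks of `SEARCH` can be checked with a small sketch like the following. `SEARCH-DEMO` is a hypothetical name used only for this illustration, and the strings are arbitrary.

```forth
: SEARCH-DEMO ( -- )
  \ Found: c-addr3 u3 is the rest of the string, starting at the match.
  S" hello world" S" wor" SEARCH .  TYPE CR   \ prints -1 world
  \ Not found: the original string comes back unchanged.
  S" hello world" S" xyz" SEARCH .  TYPE CR ; \ prints 0 hello world
SEARCH-DEMO
```

The second case is the one the correction is about: on failure, c-addr3 u3 must be the original c-addr1 u1.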

As I see it, to solve the mentioned problem with the notion and definition of "data type", a more detailed formalization is required.

The main premise: every data object is *formally* associated with a set of data types.

Thus, we can talk about "typed data objects" (an abstraction), data objects that are associated with data types:

A *typed data object* is an ordered pair of a data object and a set of data types.

Consequently, every data type determines an *abstract* set of *typed data objects*. And it's possible to determine whether a *typed data object* is a member of a data type (namely, a member of the set that is determined by that data type).

So where are "values"? A value for a typed data object is determined by data types. And since a typed data object may belong to several data types (for example, to *n* and to *flag*), it may have a set of different values. But one data type determines only one value for one data object (that is allowed to be associated with this data type).

A data type identifies the set of permissible values for a data object.

So, a variant that seems more correct: **A data type identifies a set of data objects and a value for each data object from that set.**

If anybody is interested, see my attempt at a more detailed formalization; feedback is welcome.

For the moment, let's take the Forth-94/2012 position that the result on integer overflow (i.e. where a mathematical integer addition produces a result outside the target range) is implementation-defined. Let's say I want to avoid that; even then I can construct cases for various combinations of n and u; to make things more concrete, let's assume that the range of n is -32768..32767 and for u it is 0..65535:

```
1 1 + \ gives 2; +n1 +n2 -- +n (where +n is both u and n)
33000 -1 + \ gives 32999; u1 n2 -- u
33000 -1000 + \ gives 32000; u1 n2 -- +n
1 -2 + \ gives -1; +n1 n2 -- n
-1 -1 + \ gives -2; n1 n2 -- n
20000 20000 + \ gives 40000; +n1 +n2 -- u
```

For the stack diagrams with +n, you could produce either one with n or with u instead of the +n, which means that in the first case you can have all 8 combinations.

These are all standard Forth programs where the mathematical integer result is in the target range, so no, the diagram is not limited to be equivalent to `( n1 n2 -- n3 | u1 u2 -- u3 )`. My take is that n|u means a range of -32768..65535 for the example ranges above, just as +n (i.e., n&u) means 0..32767.

In 2015 the committee accepted 2s-Complement Wrap-Around Integers, which defines what happens in those cases where the result does not fit in the target range; with that it is up to the programmer in all cases how to interpret the arguments and the results of `+`; e.g., the `-1 -1 +` case for the ranges given above can also be interpreted as:

```
65535 65535 + \ gives 65534; u1 u2 -- u
```
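The two readings of the same result can be made visible by printing one cell both as signed and as unsigned. This is a small sketch assuming two's-complement wrap-around; the unsigned value depends on the cell width of the system.

```forth
\ One addition, two interpretations of the same bit pattern.
-1 -1 +  DUP .  U.    \ prints -2, then max-u minus 1 (65534 with 16-bit cells)
-1 0 INVERT = .       \ prints -1 (true): -1 and the largest u are the same cell
```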

Moving a data object shall not affect its type. (2)

Granted, I also find (2) confusing. In my case, it's because it implies that objects somehow "know" their type.

I think the purpose of this statement is to guarantee a property that the following examples illustrate.

Let's consider the word `swap ( x1 x2 -- x3 x4 )`. It moves (in some sense) the data objects, but this shall not affect their data types (NB: a data object may be a member of several data types, not only the data type *x*). That is, not only are the data objects in the stack parameters *x3* and *x2* equal, but their sets of data types are equal too. I.e., they are equal as *typed data objects*. Ditto for *x4* and *x1*.

Ditto for the sequential operations `! ( x1 a-addr -- )` and `@ ( a-addr -- x2 )` for the same address: the typed data objects in the stack parameters *x2* and *x1* are equal.

Ditto for the result of the `move` operation, etc. (when you write data objects into one location, then copy them into another location, and then read them from that other location).
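The store/fetch and `move` cases can be sketched as follows. The names `V`, `ROUND-TRIP`, `SRC` and `DST` are hypothetical and exist only for this illustration; the point is that a cell written as an *n* can still be used as that *n* after being moved.

```forth
VARIABLE V                              \ one cell of storage
: ROUND-TRIP ( x1 -- x2 )  V !  V @ ;   \ store, then fetch from the same address

-5 ROUND-TRIP .                         \ prints -5: still usable as an n

CREATE SRC 1 CELLS ALLOT
CREATE DST 1 CELLS ALLOT
-5 SRC !   SRC DST 1 CELLS MOVE         \ copy the cell to another location
DST @ .                                 \ prints -5 again
```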

One idea for why the operations `+` and `-` should be defined when one argument is in *u* and the other is in *n*: these operations on any integer arguments should be equivalent to a series of operations on `0` and `1`.

Any number in *u* that is outside of the *n* range can be represented as the sum of several numbers in *n*.

For example, `max-uint` is the sum of the numbers (`max-int`, `max-int`, `1`), supposing two's complement.

Thus, an operation on numbers in *n* and *u* should be equivalent to several operations in *n* only, resulting in *n* (taking into account the overflow rule) (1).

Any number in *n* that is outside of the *u* range can be represented as the subtraction of two numbers in *u*.

For example, `min-int` is the result of subtracting unsigned(`max-int` + `1`) from `0` (assuming two's complement).

Thus, an operation on numbers in *n* and *u* should be equivalent to several operations in *u* only, resulting in *u* (taking into account the overflow rule) (2).

I am not sure whether (1) and (2) are both always true for one's complement and sign-magnitude representations, but they are true for the two's complement representation.
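The two decompositions above can be checked directly on a two's-complement system with wrap-around arithmetic. In this sketch the constants are defined locally from bit patterns (the uppercase names are an assumption of the example, not words the standard provides), and the intermediate sums rely on the wrap-around rule.

```forth
0 INVERT                 CONSTANT MAX-UINT   \ all bits set
0 INVERT 1 RSHIFT        CONSTANT MAX-INT    \ all bits set except the top one
0 INVERT 1 RSHIFT INVERT CONSTANT MIN-INT    \ only the top bit set

MAX-INT MAX-INT + 1 +   MAX-UINT = .   \ prints -1 (true)
0  MAX-INT 1 +  -       MIN-INT  = .   \ prints -1 (true)
```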

This means that the result of `+` and `-`, when one argument is in *n* and the other in *u*, always belongs to **both** the *u* and *n* data types, and in some cases it also belongs to the *+n* data type.

Also, if one (and not the other) of the arguments is in *addr*, the result (in the general case) is in *u* and in *addr*, but not in *n*.

It seems that (1) and (2) are also true for `* ( n1|u1 n2|u2 -- n3|u3 )`.

## Author:

- Anton Ertl
- Leon Wagner

## Change Log

- 2024-06-06 replaced some `n` with `+n`; formatting changes (AE)
- 2023-09-14 Revision after discussion (AE)
- 2023-09-13 Initial proposal

## Problem:

The stack comments for N>R and NR> don't make it clear that *+n* items are moved between the data and return stacks.

## Solution:

The stack comments should more clearly indicate that *+n* data stack items are moved to or from the return stack.

## Proposal:

In the definition of `N>R`, replace

`( i * n +n -- ) ( R: -- j * x +n )`

with

`( x_n ... x_1 +n -- ) ( R: -- j*x +n )`

In the definition of `NR>`, replace

`( -- i * x +n ) ( R: j * x +n -- )`

with

`( -- x_n ... x_1 +n ) ( R: j*x +n -- )`

## Discussion

On the return stack, `j*x +n` because the data may be in a separate buffer, with only the address and `+n` on the return stack. `+n` on the return stack because the original specified that, and changing that would be a substantial change.

On the data stack, `x_n ... x_1 +n` because that is the way we usually specify a counted number of cells (even for `+n=0`). See, e.g., `get-order`.
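As a sketch of what the revised stack comments describe, the following moves three items to the return stack and back. `N>R-DEMO` is a hypothetical name; `N>R` and `NR>` are used inside a definition because their interpretation semantics are undefined.

```forth
: N>R-DEMO ( -- )
  10 20 30  3 N>R    \ x_3=10 x_2=20 x_1=30 and the count leave the data stack
  NR>                \ they come back in the same order: 10 20 30 3
  . . . . ;          \ prints 3 30 20 10 (top of stack first)
N>R-DEMO
```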