Proposal: Exclude zero from the data types that are identifiers

Formal

ruv [252] Exclude zero from the data types that are identifiers (Proposal, 2022-08-13 23:24:52)

Author

Ruv

Change Log

  • 2022-08-14 Initial version

Problem

In many cases it is assumed that a data object cannot be equal to the single-cell value zero, but the corresponding data type allows it.

For example

  • NAME>INTERPRET implies that an execution token cannot be zero.
  • FIND-NAME implies that a name token cannot be zero.
  • Usual practice in programs is to assume that an address, a file identifier, or a word list identifier is a nonzero value.
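The first two points can be illustrated with a minimal sketch. It assumes the system provides FIND-NAME ( c-addr u -- nt | 0 ); the word interpret-xt is a hypothetical name introduced only for this example. Zero is used as the "nothing here" result at both steps, which is unambiguous only if neither a name token nor an execution token can itself be zero.

```forth
\ Hypothetical helper: look up a name and return the xt for its
\ interpretation semantics, or 0 if there is none.
\ Assumes FIND-NAME ( c-addr u -- nt | 0 ) is available.
: interpret-xt ( c-addr u -- xt | 0 )
  find-name                      \ nt, or 0 if the name is not found
  dup if name>interpret then ;   \ xt, or 0 if no interpretation semantics
```

A caller can only test the result against zero; it cannot tell "not found" apart from a hypothetical valid nt or xt whose value is zero.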

Solution

Explicitly exclude zero from the address, execution token, name token, word list identifier, file identifier data types. When zero is a valid value on the underlying level, it can be reserved from use, or filtered out in the wrappers over the API routines of the underlying level.
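As a rough sketch of the wrapper idea, a system could bias the handles it gets from the underlying level so that zero never reaches the program. The words fd>fileid and fileid>fd are hypothetical names, and the sketch assumes that underlying handles are non-negative integers (as POSIX file descriptors are); it is one possible approach, not something the proposal prescribes.

```forth
\ Hypothetical mapping between underlying handles and fileids.
\ Assumption: an underlying handle fd is always >= 0.
\ Then fileid = fd + 1 is always >= 1, so it is never 0 (and never -1).
: fd>fileid ( fd -- fileid ) 1+ ;
: fileid>fd ( fileid -- fd ) 1- ;
```

A wrapper over the file words would apply fd>fileid to every handle it returns and fileid>fd to every fileid it receives.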

Also, fix the incorrect wording in the "subtype" definition, since the subset relation applies to sets, not to their members:

"A data type i is a subtype of type j if and only if the members of i are a subset of the members of j".

Also, add the missing data type relationships.

Proposal

Fix wording for "subtype"

In the section 3.1.1 Data-type relationships

Replace the phrase:

A data type i is a subtype of type j if and only if the members of i are a subset of the members of j.

With the phrase:

A data type i is a subtype of type j if and only if each member of i is a members of j.

Notation for difference between sets (relative complement)

In the section 3.1.1 Data-type relationships, add the following phrase at the end of the first paragraph:

The notation "i \ j" is used to denote "the data type that includes all those and only those members of i which are not members of j".

Exclude zero from the address and execution token data types

In the section 3.1.1 Data-type relationships

Replace:

a-addr ? c-addr ? addr ? u ;

With:

a-addr ? c-addr ? addr ? u \ {0} ;

Replace:

xt ? x;

With:

xt ? x \ {0};

Exclude zero from the name token data type

At the end of the section 15.3.1 Data types, add the following subsection:

15.3.1.1 Data-type relationships

Add the following to the end of the list of subtype relationships in the section 3.1.1 Data-type relationships:

nt ? x \ {0};

Exclude zero from the word list identifier data type

At the end of the section 16.3.1 Data types, add the following subsection:

16.3.1.1 Data-type relationships

Add the following to the end of the list of subtype relationships in the section 3.1.1 Data-type relationships:

wid ? x \ {0};

Exclude zero from the file identifier data type

At the end of the section 11.3.1 Data types, add the following subsection:

11.3.1.1 Data-type relationships

Add the following to the end of the list of subtype relationships in the section 3.1.1 Data-type relationships:

fam ? x;

fileid ? x \ {0};

ruv

Should the notation {0} be also described? It's a singleton.

Erratum: "is a members of" should be read as "is a member of".

In the data-type relationships, a question sign is shown due to a bug in the website engine.

ruv

Should the notation {0} be also described?

Since relative complement is used only for {0}, this notation can be introduced at once with relative complement:

The notation i \ {0} is used to denote "the data type that includes all those and only those members of i which are not the single-cell value zero".

ruv

According to FILE SOURCE-ID, fileid cannot be -1 either.

So the relationship for fileid should be: fileid => x \ {0,-1}
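The reason can be sketched in code: SOURCE-ID returns 0 for the user input device, -1 for a string being evaluated, and a fileid for a text file, so the three cases can be told apart only if a fileid is neither 0 nor -1. The word source-kind is a hypothetical name used just for this illustration.

```forth
\ Hypothetical helper: describe the current input source.
\ Relies on a fileid never being 0 or -1.
: source-kind ( -- c-addr u )
  source-id case
     0 of s" user input device" endof
    -1 of s" string" endof
    >r s" file" r>               \ default: any other value is a fileid
  endcase ;
```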

ruv

It probably makes sense to use simpler wording, such as:

Extend "3.1.1 Data-type relationships" by the following

instead of

Add the following to the end of the list of subtype relationships in the section 3.1.1 Data-type relationships

ruv

Should the notation {0} be also described?

Actually, this notation is already described in 2.2.1 Numeric notation. It should probably be extended to cover an explicit enumeration of items, as { number [, number ]* }, e.g. {0}, {-1,0,1}.

AntonErtl

In the 2022 meeting, there was a lot of discussion, especially about the validity of file-ids with the value 0. When asked, none of the participants could name a system that has a problem with disallowing 0 as an address, and none claimed that he had never used 0 as an impossible address.

This proposal satisfies the formality criteria and is therefore promoted to formal. Please promote it to CfV when you think that you are not going to change it anymore (proposals in CfV state must not be revised).

Formal

albert

As 0 and 1 are widely used in Unix-like systems for file-identifiers, I would like to exclude those from the proposal.

ruv

As 0 and 1 are widely used in Unix-like systems for file-identifiers, I would like to exclude those from the proposal.

The proposal does not mention 1, but -1.

Regarding 0: how does your Forth system behave if the OS returns 0 as a file-identifier for the input source (for example, if it's an stdin pipe)?

ruv

Concerning zero as a file identifier

In POSIX, 0 is a possible value for a file descriptor, and this file descriptor is the standard input (stdin) by definition. A program can close this file descriptor, after which it is reused and returned by the next file open operation. The file opened next then becomes the standard input of the process anyway. See also the StackOverflow question "Is it possible that linux file descriptor 0 1 2 not for stdin, stdout and stderr?".

If a Forth system provides raw file descriptors to a program, and the program closes the fileid 0, the next file opened with open-file, create-file, or included will be the standard input for the process (since its file descriptor is 0) and probably the user input device for the Forth system. So, interpretation of source-id will be consistent. But this program is not a standard program and consequences are outside of the standard anyway. So, a standard program is not able to get the fileid 0 from open-file or create-file words.

Concerning zero as an address

There is one case when zero is possible instead of an address: for a character string of zero length.

The character string is defined as a pair ( c-addr u ). If we exclude zero from the address data type, then ( 0 0 ) is not a valid character string. But nothing wrong can occur if the value ( 0 0 ) is passed to a program that expects a character string stack parameter. And this fact is often used in practice.

Therefore, the value ( 0 0 ) should belong to the character string data type.

So I would suggest also updating the character string data type as: ( c-addr u | 0 0 ). And add a note that for brevity in stack diagrams, when ( c-addr u ) denotes a character string, the value pair ( 0 0 ) is also allowed.
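A small sketch of why ( 0 0 ) is harmless in practice: a typical string-processing word never dereferences the address when the length is zero. The word count-spaces is a hypothetical example, not part of any standard word set.

```forth
\ Hypothetical example: count the space characters in a string.
\ With a zero length the ?DO loop body is skipped, so the address
\ is never fetched from, even if it is 0.
: count-spaces ( c-addr u -- n )
  0 rot rot                      \ n c-addr u
  over + swap                    \ n end-addr start-addr
  ?do i c@ bl = if 1+ then loop ;
```

For instance, `0 0 count-spaces` leaves 0 on the stack without touching memory, and `s" a b c" count-spaces` leaves 2.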
