Digest #265 2024-06-21

Contributions

[345] 2024-06-20 12:04:57 ruv wrote:

proposal - Allow the text interpreter to use `WORD` and the pictured numeric output

Author

Ruv

Change Log

  • 2024-06-20 Initial revision

Problem

In traditional implementations, the Forth text interpreter itself uses WORD, and thus clobbers the transient region containing the parsed lexeme. However, the standard does not allow the Forth text interpreter to clobber this region.

The same applies to the words <# and #> when, for example, the text interpreter displays the stack items after the input string is interpreted.

This problem was pointed out by Anton Ertl in contribution [315] "WORD and the text interpreter".

Solution

The proposed options are:

  1. Fix the systems to avoid clobbering the WORD buffer in the text interpreter.
  2. Change the standard to allow the text interpreter to clobber the WORD buffer when parsing.

Since WORD will become obsolete in any case, there is little point in fixing the systems. Therefore, I propose changing the standard so that the text interpreter can overwrite this buffer.

As for the pictured numeric output string buffer, it may be used when numbers are displayed, according to 3.2.1.3 Free-field number display: "Number display may use the pictured numeric output string buffer".

An additional benefit is that users will be allowed to implement their own limited, but standard-compliant, text interpreter that uses WORD and displays numbers (this approach is probably used in some old programs).
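For illustration, here is a minimal sketch of such a user-level interpreter. It assumes interpretation state only and handles unsigned single-cell numbers only; MY-INTERPRET is a hypothetical name, not a proposed word. It relies on WORD and FIND, as such old programs probably do.

```forth
\ Hypothetical minimal text interpreter: interpretation state only,
\ unsigned single-cell numbers only.
: my-interpret ( i*x "text" -- j*x )
  begin  bl word  dup c@  while        \ c-addr of the next lexeme; stop when empty
    find if  execute                   \ found: perform its execution semantics
    else                               \ not found: try to convert it to a number
      count  0 0 2swap  >number nip    \ ( ud u.unconverted )
      if  -13 throw  then              \ leftover characters: undefined word
      drop                             \ keep the low cell of ud
    then
  repeat  drop ;                       \ discard the empty counted string
```

For example, evaluating `my-interpret 1 2 +` leaves 3 on the stack. Each iteration calls WORD again, so the string returned in the previous iteration is no longer needed when the buffer is overwritten.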

Proposal

Add the following paragraph to section 3.3.3.6 Other transient regions, after the second paragraph (the one that ends with "could also corrupt the regions."):

The data space regions identified by WORD and #> may become invalid after any step in the Forth text interpreter loop.
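Under this wording, a program can rely on the string returned by WORD only within the same step of the text interpreter, so a parsed name that must survive longer would have to be copied into a program-owned buffer. A minimal sketch, assuming a 32-character buffer (the names /NAME-BUF, NAME-BUF, and SAVE-WORD are hypothetical):

```forth
\ Hypothetical program-owned buffer (not a system transient region).
32 constant /name-buf
create name-buf  /name-buf chars allot

: save-word ( "name" -- c-addr u )
  bl word count  /name-buf min     \ parse the next name; truncate to the buffer size
  dup >r  name-buf swap cmove      \ copy the characters out of the WORD buffer
  name-buf r> ;                    \ return the program-owned copy
```

The string left by `save-word foo` stays valid across later interpreter steps, until the next call to SAVE-WORD overwrites the buffer.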


[346] 2024-06-20 15:25:45 ruv wrote:

requestClarification - Eliminating ambiguous conditions for Tick

I'm interested in how we want to define Tick's behavior in edge cases, as part of the general trend of reducing the number of ambiguous conditions.

Undefined word

When a word is not found (regardless of STATE), Tick shall throw exception -13 ("undefined word"). This is obvious.
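If -13 were guaranteed, a program could portably probe whether a name is defined by catching the exception thrown by Tick. A sketch (TICK? is a hypothetical name):

```forth
\ Hypothetical helper: apply ' to the next name and catch any exception.
\ On success: xt 0. On failure: the throw code (under the rule above, -13).
: tick? ( "name" -- xt 0 | n )
  ['] ' catch ;
```

`tick? DUP` leaves the xt of DUP and 0. Under the proposed rule, `tick? no-such-word` would leave -13; at the moment this is an ambiguous condition, so the result depends on the system.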

Undefined interpretation semantics

When interpretation semantics for the word are not defined by the standard, Tick shall do one of the following:

  • return an xt for a system-defined behavior (so an ambiguous condition exists if this xt is executed);
  • throw exception -32 "invalid name argument";
  • throw exception -13 "undefined word" (I think this option is undesirable. Can we exclude it?).

This case covers words like if, >r, exit, begin, etc.

Interpretation semantics are defined, but execution semantics undefined

The main question is this: should we allow Tick to throw an exception if interpretation semantics for the word are defined, but execution semantics are not defined?

At the moment, this applies to five standard words: s", s\", to, is, action-of.

In this case, most systems return an xt that, when executed in interpretation state, performs the interpretation semantics for the word. The behavior in compilation state varies from system to system. So, in practice, an ambiguous condition exists only when this xt is executed in compilation state.

What do you think?

Replies

[r1238] 2024-06-18 07:17:32 AntonErtl replies:

proposal - Support for single line comments during `evaluate`

What about other parsing words, e.g., s", or user-defined parsing words?

If you want to support multi-line evaluate, wouldn't it be better to extend evaluate to present only the first line for parsing, and then, after a refill, the next line, and so on? Then \ should work automatically as intended.


[r1239] 2024-06-18 07:50:22 ruv replies:

proposal - Special memory access words

For the l-family and x-family words, a note should mention that such a word can be provided by the system only if the cell size is at least 32 or 64 bits, respectively.

I mean that it should be stated in some normative part: either in the glossary entry for the word, or somewhere in the section "Optional Special Memory Access word set".


[r1240] 2024-06-18 08:15:46 ruv replies:

proposal - Support for single line comments during `evaluate`

What about other parsing words, e.g., s", or user-defined parsing words?

It can be solved by adding: "When parsing from a text string using a space delimiter, control characters shall be treated the same as the space character" (i.e., the same as for a file).

So, parse-name will skip a line terminator in a string being evaluated too (it seems that most systems already behave this way).

s" will be able to parse multiple text lines in a string being evaluated (if " is not found in the current text line), but I don't see any problem with that.

If you want to support multi-line evaluate, wouldn't it be better to extend evaluate to present only the first line for parsing, and then, after a refill, the next line, and so on? Then \ should work automatically as intended.

  1. My idea is that a program should not depend on refilling, i.e., on whether the parse area contains a single text line or multiple text lines.
  2. Changing \ is almost portable, i.e., it can be implemented via a polyfill (a portable module).
  3. This approach is slightly more efficient, since we don't need to break the text into lines before feeding it to the text interpreter.
  4. Fewer internal states and a simpler implementation (compared to supporting refilling when evaluating a string).
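The polyfill mentioned in item 2 could look roughly like the following sketch. It assumes that line terminators inside an evaluated string are LF (10) or CR (13) characters, and it redefines \ to skip only up to and including the next line terminator in the parse area, rather than to the end of the parse area. For a line read from a file, the parse area contains no terminator, so the behavior there is unchanged.

```forth
\ Sketch of a portable replacement for \ (assumption: LF or CR ends a line).
: \ ( "ccc<eol>" -- )
  source  >in @  /string                 \ c-addr u: the unparsed rest of the parse area
  begin  dup  while                      \ stop at the end of the parse area ...
    over c@  dup 10 =  swap 13 =  or  0=
  while                                  \ ... or at a line terminator
    1 /string
  repeat  1 /string  then                \ a terminator was found: skip it as well
  nip  source nip  swap -  >in ! ;       \ set >IN just past the skipped text
immediate
```

With this definition, evaluating a string such as `1 \ comment`, a line feed, and then `2` leaves 1 and 2 on the stack.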

[r1241] 2024-06-18 13:50:08 GeraldWodni replies:

proposal - Special memory access words

Yes, thanks! I cannot believe we have not yet standardized this; let's check for system compliance and get it standardized.

There is a small typo: it says w is Wyde. I guess that should be Word or Wide?

One little bit of bike-shedding I want to point out is that words like w>s look like conversion words, but lbe does not. I would prefer >lbe, but this should be discussed face to face.


[r1242] 2024-06-18 17:32:06 AntonErtl replies:

proposal - Special memory access words

The sign-extending words w>s etc. are specified as ( x -- n ) because these can be considered as bitwise operations like lshift. Alternatively, they could be specified as ( u -- n ) because the result of the fetching words and the byte-order words are unsigned. But it makes no difference how they are specified.

The fetching words and byte-order words have unsigned results because they zero-extend (if anything) the data that they fetch or reorder. Specifying u here indicates that.

The addresses are specified as c-addr to point out that the addresses are not required to be aligned. This is less clear with addr.

About the wording of the sign-extension words: good point. I will change it to:

Sign-extend the low-order 8 bits in x to the full cell width.

Concerning the l and x words: I expect that these words will be optional. It should be obvious to the implementor of a 32-bit system that the x words don't make sense for their system, but just in case we could mention that in the Rationale. I don't see a point in putting this in the normative part, and it would make the text more verbose and less usable.

There is a section "Larger address units" that discusses the case of address units larger than 8 bits. I don't see how the proposal (or any other practically usable wordset) could work with w@ w! that read the low-order 16 bits of an address unit and still result in portable code. There is a reason why byte-addressed architectures have won.

The section "Larger address units" outlines an approach where w@ w! etc. use only 8 bits per address unit. I don't want to prescribe this approach (maybe there are other ways, although I don't see them), so w@ w! etc. are not specified in this way; the current specification is also much more readable for the vast majority of users (those on byte-addressed systems) than one that prescribes accesses to 8 bits per au. Addressing systems with larger aus in the rationale looks good enough to me; and maybe the implementors of those systems don't want to implement any of these words anyway; there's a reason why they went for a system that is not byte-addressed. But maybe I should be putting in some normative text in the proposal that prescribes 8 bits per address unit (maybe associated with a type b-addr).

Octet is a term from telecommunications that has no place in computing since 8-bit bytes won >50 years ago.

c>s sign-extends 8-bit bytes.

There is no need for b words, even on systems with larger address units than bytes. See the section "Larger address units".

However, with possible systems where c@ zero-extends bigger units than 8 bits, we need a c>u that zero-extends 8-bit units. I will add this to the proposal.

I will replace "bottom" with "least significant".

As for addressing larger address units, see the discussion above.

"Wyde" is not a typo, but a word I have from Bernd Paysan; just as "byte" is not a typo for "bite", and "nybble" is not a typo for "nibble" (but "nibble" is more common). "w" may originally have come from DEC/Motorola/Intel usage, which is based on the idea of a 16-bit word (as in the PDP-11, 6800, and 8080), but other architectures started with 32-bit words, so "word" would not only conflict with other usage of "word" in Forth, but also with other usage in various computer architectures, and with the general use in computer architecture (where "word" means the same as "cell" means in Forth).

'lbe' is a conversion word, but it converts both to and from big-endian order, so where should we place the >? l>be or lbe>? Either one is wrong for one of the two usages, so it was simply called lbe.
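Because converting from native to big-endian order and back is the same operation, such a word is its own inverse, which is why neither arrow direction fits. A minimal sketch for the 16-bit case, assuming 8-bit address units (one character per byte) and cells of at least 16 bits; ENDIAN-PROBE, LITTLE-ENDIAN?, and WSWAP are hypothetical helpers, and this WBE is only an illustration of the proposed word:

```forth
\ Detect the native byte order once (assumes 8-bit address units).
create endian-probe  1 ,
endian-probe c@ 1 = constant little-endian?   \ true if the low byte comes first

: wswap ( x1 -- x2 )     \ exchange the two low-order bytes; higher bits are dropped
  dup 8 rshift $ff and  swap $ff and 8 lshift  or ;

: wbe ( x1 -- x2 )       \ native <-> big-endian for the low-order 16 bits
  little-endian? if  wswap  else  $ffff and  then ;
```

Applying WBE twice returns the original 16-bit value on either kind of system.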


[r1243] 2024-06-18 21:19:30 ruv replies:

proposal - Special memory access words

'lbe' is a conversion word, but it converts to and from big-endian order, so how should we place the >?

I agree with a form like lbe. It can be read as a modifier, like bin ( fam1 -- fam2 ).


But maybe I should be putting in some normative text in the proposal that prescribes 8 bits per address unit (maybe associated with a type b-addr).

Introducing b-addr seems like a good idea! It will then be obvious that these words are for byte-addressed systems only (by design).

The addresses are specified as c-addr to point out that the addresses are not required to be aligned. This is less clear with addr.

I can't agree.
According to 3.1.1 Data-type relationships:

a-addr ⇒ c-addr ⇒ addr ⇒ u

Where "c-addr" is a symbol for the "character-aligned address" data type. See also 3.3.3.1 Address alignment.

In Forth-2012, the c-addr data type may differ from the addr data type; this means that the address returned by align here 1+ does not belong to c-addr in the general case (because in some plausible systems a character consists of four address units). So it's absolutely clear that a parameter of the addr data type is not required to be aligned at all.

In Forth-2019, c-addr is equal to addr, but "c-addr" is still defined as "character-aligned address".

Thus, a better way is to either introduce b-addr, or use addr.


The fetching words and byte-order words have unsigned results because they zero-extend (if anything) the data that they fetch or reorder. Specifying u here indicates that.

To indicate zero-extending, another data type should be introduced. By design, the data type u does not indicate zero-extending. This data type only indicates that the operation interprets a parameter of this type as an unsigned number, and the result will be incorrect if the parameter is interpreted by the user as, for example, a negative number.

So, for a word that is specified as wbe ( u1 -- u2 ), the stack parameter in the position u2 should always be interpreted by the user as a particular unsigned number (ditto for u1). But this is wrong in the general case for wbe, because after the byte order is changed (or before it is changed), the parameter is just a tuple of bits.

Also, it is possible that even in the native byte order a parameter is not interpreted as a number at all (that is, neither as a signed number nor as an unsigned number). For example, xt is formally not a number (and xt ⇒ x). And when you specify u, you exclude such a use case.


[r1244] 2024-06-19 08:30:04 ruv replies:

proposal - Special memory access words

after the byte order is changed (or before it is changed), the parameter is just a tuple of bits.

An improved version of the parameter data type specifications (via stack diagrams):

w@ ( addr -- x )
wbe ( u1 -- x2 | x1 -- u2 )
w>s ( x1 -- n2 )

But I'm still not happy enough with it.

This is because the following sequence:

w@ ( addr -- x ) wbe ( x -- u )  w>s ( x -- n )

is formally incorrect, because in the last step we actually convert a parameter of the u data type to a parameter of the n data type.

If we have:

w@ ( addr -- x )
wbe ( x1 -- x2 )
w>s ( x1 -- n2 )

The following sequence is correct (in terms of data types matching):

w@ ( addr -- x ) wbe ( x -- x )  w>s ( x -- n )

[r1245] 2024-06-19 11:57:08 PeterFalth replies:

proposal - Special memory access words

The following sequence is correct (in terms of data types matching):

w@ ( addr -- x ) wbe ( x -- x ) w>s ( x -- n )

Does this sequence make any sense? How can w>s know that the stack item is in big-endian order in this case and work properly?

I think it is a mistake to have separate words for sign extension.

In lxf/ntf I have <w@ for sign extension when retrieving a number. I think this is much simpler.

BR Peter


[r1246] 2024-06-19 13:12:22 ruv replies:

requestClarification - NDCS xt

When searching for a word with NDCS, what XT should be returned?

NDCS means "Non-Default Compilation Semantics". Immediate words (in the normative sense) are NDCS words, but an NDCS word need not be immediate (in the normative sense).

At the moment my position is as follows.

Meaning of xt

If a word is found by search-wordlist, the returned xt identifies the execution semantics for the word (standard or system-specific), and it is the same xt regardless of STATE.

Ambiguous conditions

An ambiguous condition exists if interpretation semantics for the word are not defined by the standard and xt is executed (by any means).

An ambiguous condition exists if execution semantics for the word are not defined by the standard and xt is executed in compilation state.

Meaning of the code (the top value)

When a word is found, the top value shall be -1 if the word is an ordinary word (i.e., it has default interpretation semantics and default compilation semantics), and it shall be 1 otherwise.

Thus, the top value is 1 if the found word is an NDCS word. Because of the ambiguous conditions mentioned above, this will not cause any unexpected effects concerning immediacy.
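A program can observe this top value directly. The following sketch (FOUND-CODE is a hypothetical name) returns the code that SEARCH-WORDLIST reports for a name in FORTH-WORDLIST:

```forth
\ Hypothetical helper: 0 if the name is not found in FORTH-WORDLIST,
\ otherwise the top value returned by SEARCH-WORDLIST (-1 or 1).
: found-code ( c-addr u -- 0 | -1 | 1 )
  forth-wordlist search-wordlist  dup if  nip  then ;
```

On typical systems, `s" DUP" found-code` gives -1. Under the position above, `s" TO" found-code` would give 1, whereas today the result depends on whether the system implements TO as an immediate word.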

Edge cases explanation

If interpretation semantics or execution semantics are not defined by the standard, xt identifies the system-specific execution semantics for the word with the following constraint: if this xt is executed in interpretation state, the interpretation semantics for the word (the standard behavior, or system-specific if not defined by the standard) shall be performed.

For words like to and s", for which interpretation semantics are defined and execution semantics are not defined, the returned xt, when executed in interpretation state, shall perform the corresponding interpretation semantics regardless of implementation details, and a program is not allowed to execute this xt in compilation state, because the behavior depends on implementation details.

Ignoring an existing word

It is probably conceivable for search-wordlist to return 0 for a word whose interpretation semantics are not defined by the standard, since a standard program cannot execute the returned xt anyway; examples of such words are if, >r, etc.

But a more reliable option is to return an xt that simply throws an exception when executed (or performs a more useful system-defined behavior, if any; see A.3.4.3.2 Interpretation semantics).


[r1247] 2024-06-19 13:18:13 ruv replies:

requestClarification - NDCS xt

One important thing is that search-wordlist provides information on whether the found word is an ordinary word (or is implemented as an ordinary word) or not.

The more recently standardized words find-name-in, name>compile, and name>interpret do not currently allow a program to obtain this information.

So, search-wordlist cannot be implemented via these words.


[r1248] 2024-06-19 13:31:34 ruv replies:

proposal - Special memory access words

w>s accepts its argument only in the native endianness of the Forth system.

w@ ( addr -- x ) wbe ( x -- x ) w>s ( x -- n )

In this sequence we know that the wyde in memory is in big-endian order. w@ reads this wyde as is, and wbe converts it from big-endian to the native endianness. Then w>s interprets the 16 least significant bits of its parameter as a signed 16-bit two's complement value and extends the sign to the full cell.
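The effect of this sequence can be sketched without depending on the native byte order by assembling the wyde from its bytes. This assumes 8-bit address units and two's complement cells of at least 16 bits; BE-W@ is a hypothetical helper (not part of the proposal) that yields what `w@ wbe` should yield for a big-endian wyde, and this W>S is only an illustration of the proposed word:

```forth
\ Fetch a big-endian wyde on any native byte order (hypothetical helper).
: be-w@ ( c-addr -- u )
  dup c@ 8 lshift  swap char+ c@  or ;

\ Sign-extend the low-order 16 bits of x to the full cell width.
: w>s ( x -- n )
  $ffff and  $8000 xor  $8000 - ;
```

For example, if the two bytes $FF $FE are stored at an address, `be-w@ w>s` applied to that address gives -2.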

Probably we should also specify two's complement in some normative parts.


[r1249] 2024-06-20 09:35:36 ruv replies:

proposal - New words: latest-name and latest-name-in

Author

Ruv

Change Log

  • 2023-10-22 Initial revision
  • 2023-10-23 Add testing, examples, a question to discuss, change the throw code description
  • 2023-10-27 Some rationales and explanations added, the throw code description changed back, better wording in some places
  • 2024-06-20 Fix some typos, make some wording and formatting better, add some examples and test cases, add motivation for LATEST-NAME-IN, change the status to "formal".

Problem

In some applications, mainly in libraries and extensions, the ability to obtain the most recently added definition is very useful and in demand.

To make such programs portable, we should introduce a standard method to obtain the most recently added word.

For example, if we are creating a library for decoration, tracing, support for OOP, simple DSLs (e.g., to describe Finite State Machines), etc., it is always useful to have an accessor for the most recent definition, instead of redefining a lot of words to define such an access method yourself, or juggling with the input buffer and search.

One simple example. If we want to have variables that are initialized to zero, we can use:

: var ( "name" -- )
  variable
  0  latest-name name>interpret execute  !   \ execute the new variable to get its address
;

A number of specific examples are provided in my post on ForthHub (those examples are not included here so as not to bloat the text).

Additionally, there has been much discussion regarding the standardization of such a method in recent decades. For example, Elizabeth D. Rather wrote on 2011-12-09 in comp.lang.forth:

AFAIK most if not all Forths have some method for knowing the latest definition, it's kinda necessary. The problem is, that they all do it differently (at different times, in different forms, etc.), which is why it hasn't been possible to standardize it.

Although it's a system necessity, I haven't found this of much value in application programming.

Elizabeth D. Rather

It's true: depending on the system, an internal method can return the most recent word regardless of the compilation word list or depending on it; a completed definition or a not yet completed one; an unnamed definition too, or only named definitions; etc. The value for application programming is shown above.

Some known internal methods: latest ( -- nt|0 ), last @ ( -- nt|0 ), latestxt ( -- xt|0 ), etc.

Thus, although almost every Forth system contains such a method, there is no portable way for programs to obtain the latest definition. But such a portable method is actually very useful, as shown in my examples.

Solution

Let's introduce the following words:

  • LATEST-NAME-IN ( wid -- nt|0 )
  • LATEST-NAME ( -- nt )

The first word returns the name token for the definition whose name was placed most recently into the given word list, or zero if this word list is empty.

The second word returns the name token for the definition whose name was placed most recently into the compilation word list, or throws an exception if there is no such definition.

These words do not expose or limit any internal mechanism of the compiler. They just provide information about word lists, like the words FIND-NAME-IN, FIND-NAME, and TRAVERSE-WORDLIST do. It's a kind of introspection/reflection.

These words are intended for programs. The system may use them, but is not required to do so. The system may continue to use its internal LAST, LATEST, or whatever it was using before.

It seems that the best place for these words is section 15.6.2 Programming-Tools extension words, where TRAVERSE-WORDLIST is also placed.

Rationale

Connection with word lists

By considering definitions only within the scope of a word list, we solve several problems, namely:

  1. A word list contains only completed definitions (see the accepted proposal #153 Traverse-wordlist does not find unnamed/unfinished definitions). This eliminates the question of whether the word identified by the returned nt is finished: it is always finished (completed).

  2. Nameless definitions are not considered since they are not placed into the compilation word list (regardless of whether the system creates a name token for them, or places them into an internal system-specific word list).

  3. An extension or library can create definitions in its internal word list for internal purposes, and this will not affect the compilation word list or other user-defined word lists. Thus, the user of such a library always gets the expected result from latest-name (regardless of what words the library creates on the fly for internal purposes). For example, when separate dictionary spaces are introduced, we will be able to implement something like local variables (or local definitions) in a portable way, and creating such a definition will not affect the value that latest-name returns.

Return values

As a matter of practice, almost all use cases for LATEST-NAME imply that the requested definition exists, and if it doesn't exist, only an error can be reported. So, having this word return 0 would only burden users with checking for zero, or with redefining this word as:

: latest-name ( -- nt ) latest-name dup 0= -80 and throw ;

If the user needs to handle the case where the compilation word list is empty, they can use the word latest-name-in as:

get-current latest-name-in dup if ( nt ) ... else ( 0 ) drop ... then

Implementation options

If the word list structure in a Forth system contains information about the most recently placed definition, the implementations of the proposed words are trivial.

In some plausible Forth systems, the word list structure doesn't contain any information about the definition that was placed into this word list most recently. Such systems might not provide the proposed words, or they could be changed to keep this information in the word list structure. It seems that in most systems the word list structure already contains this information.

Some checked systems:

  • SwiftForth, VFX, Gforth, minForth, ikForth, SP-Forth: a word list keeps information about the definition that was placed in it most recently;
  • lxf/ntf 2017: it seems that it doesn't keep this information.

If a system does not implement the optional Search-Order word set, it might not provide the word LATEST-NAME-IN.

Naming

The names of the new words, LATEST-NAME-IN and LATEST-NAME, are similar in form to FIND-NAME-IN and FIND-NAME. Their stack effects are also similar.

The difference is that find is a verb, but latest is an adjective (or sometimes a noun, see Wiktionary). Both have a history of use in naming words, as does "NAME".

In Forth-84 "NAME" in word names denoted NFA (name field address), and now it denotes a name token, which is the successor of NFA. In all standard words, e.g. FIND-NAME, NAME>STRING, NAME>COMPILE, etc. (except PARSE-NAME), "NAME" denotes a name token.

NB: the term "token" in "name token" does not mean a character sequence! It's used in a general sense, like "something serving as an expression of something else" (see Wiktionary).

Throw code description

If the throw code description stated that there is no latest name, it could be confusing, since a latest name probably always exists in some sense.

Therefore, it's better to say "the compilation word list is empty", since that is what actually happens.

Motivation for LATEST-NAME-IN

  1. It's a natural factor of LATEST-NAME. This factor can always be extracted from the implementation of LATEST-NAME, because the latter returns an nt from the compilation word list, so the system has to take the wid of the compilation word list and extract the most recent nt from that word list.
  2. It's very important to specify the behavior of this word to avoid different behavior in different systems, since in many systems this word will exist (will be implemented as a natural factor).
  3. In some cases a program needs to check if a word list is empty, or obtain the latest word from a particular word list (for example, to use this word as an entry point, like main, or as the default exported word from a module).
  4. Both words are optional. And if LATEST-NAME-IN is not provided, it can be implemented in a portable way via LATEST-NAME as:
    : latest-name-in ( wid -- nt|0 )
      get-current >r set-current         \ temporarily make wid the compilation word list
      ['] latest-name catch if 0 then    \ on an exception (e.g., -80), return 0
      r> set-current                     \ restore the original compilation word list
    ;
    

Things to discuss

Is it worth introducing the word LATEST-NAME-XT ( -- xt )?

If name>interpret never returns 0 (see my comment), this word can be implemented as:

: latest-name-xt ( -- xt ) latest-name name>interpret ;

The desired (and much discussed) pattern is:

defer bar

: foo ... ; latest-name-xt is bar

Sometimes the name "it" has been suggested for this word, but this name is too short and more likely to cause conflicts. Guido Draheim wrote in comp.lang.forth on 2003-03-16:

I think that everyone has been thinking of using IT for something really clever, it's a nice short word - and I'd say that we should leave it for application usage.

I want to support that argument also with real life experience in the telco world where there are a whole lot of abbreviations for various services, signals, connectors around. All too often now I see people making a SYNONYM at the file-start to get a second name for an ANS forth word that is needed in the implementation but coincides with a common term of the application.

This seems convincing to me.

Typical use

: STRUCT: ( "name" -- wid.current.old u.offset )
  GET-CURRENT  VOCABULARY
  ALSO  LATEST-NAME NAME>INTERPRET EXECUTE  DEFINITIONS
  0
;
  \ In the application's vocabulary
  : IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

  DEFER FOO

  : BAR ... ; IT IS FOO

Proposal

Add the following line to Table 9.1: THROW code assignments:

-80 the compilation word list is empty

Add the following sections to 15.6.2 Programming-Tools extension words:

15.6.2.2541 LATEST-NAME-IN

( wid -- nt|0 )
Remove the word list identifier wid from the stack. If the corresponding word list is empty, then return 0; otherwise, return the name token nt for the definition whose name was placed most recently into this word list.

15.6.2.2542 LATEST-NAME

( -- nt )
Return the name token nt for the definition whose name was placed most recently into the compilation word list, if such a definition exists. Otherwise, throw the exception code -80.

Reference implementation

In this implementation, we assume that wid is an address containing the nt of the definition whose name was most recently placed into the word list wid.

: LATEST-NAME-IN ( wid -- nt|0 ) @ ;

: LATEST-NAME ( -- nt )
  GET-CURRENT LATEST-NAME-IN  DUP IF EXIT THEN  -80 THROW
;
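To make the assumption behind this reference implementation concrete, here is a toy model of a word list. All names are hypothetical, and real systems use their own internal layouts: a wid is the address of a cell holding the newest node (0 if empty), and each node holds a link to the previous node followed by an xt.

```forth
\ Toy model (hypothetical names): wid -> newest node; node = link, xt.
: toy-wordlist ( -- wid )  align here  0 , ;
: toy-add ( xt wid -- )                 \ place a new node at the head of the list
  align here  over @ ,  rot ,  swap ! ;
: toy-latest-name-in ( wid -- nt|0 )  @ ;   \ same shape as LATEST-NAME-IN above
: toy-name>xt ( nt -- xt )  cell+ @ ;
```

With this layout, obtaining the most recent entry is a single fetch, which is why the implementations are trivial when the word list structure already records this information.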

Testing

: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

WORDLIST CONSTANT WL1

T{ : LN1 ; IT  ' LN1 =  -> TRUE }T
T{ GET-CURRENT LATEST-NAME-IN ' LN1 =  -> TRUE }T
T{ :NONAME [ IT ] LITERAL ; EXECUTE  ' LN1 =  -> TRUE }T
T{ : LN2 [ IT ] LITERAL ; LN2  ' LN1 =  -> TRUE }T
T{ WL1 LATEST-NAME-IN -> 0 }T
T{ GET-CURRENT WL1 SET-CURRENT ' LATEST-NAME CATCH SWAP SET-CURRENT -> -80 }T

[r1250] 2024-06-20 09:47:59 ruv replies:

proposal - New words: latest-name and latest-name-in

Author

Ruv

Change Log

  • 2023-10-22 Initial revision
  • 2023-10-23 Add testing, examples, a question to discuss, change the throw code description
  • 2023-10-27 Some rationales and explanations added, the throw code description changed back, better wording in some places
  • 2024-06-20 Fix some typos, make some wording and formatting better, add some examples and test cases, add motivation for LATEST-NAME-IN, change the status to "formal".
  • 2024-06-20 Add a test case to check that LATEST-NAME returns different value after the compilation word list is switched.

Problem

In some applications, mainly in libraries and extensions, the ability to obtain the most recently added definition is very useful and in demand.

To make such programs portable, we should introduce a standard method to obtain the most recently added word.

For example, if we are creating a library for decoration, tracing, support for OOP, simple DSLs (e.g., to describe Finite State Machines), etc., it is always useful to have an accessor for the most recent definition, instead of redefining a lot of words to define such an access method yourself, or juggling with the input buffer and search.

One simple example. If we want to have variables that are initialized to zero, we can use:

: var ( "name" -- )
  variable
  0  latest-name name>interpret execute  !   \ execute the new variable to get its address
;

A number of specific examples are provided in my post on ForthHub (those examples are not included here so as not to bloat the text).

Additionally, there has been much discussion regarding the standardization of such a method in recent decades. For example, Elizabeth D. Rather wrote on 2011-12-09 in comp.lang.forth:

AFAIK most if not all Forths have some method for knowing the latest definition, it's kinda necessary. The problem is, that they all do it differently (at different times, in different forms, etc.), which is why it hasn't been possible to standardize it.

Although it's a system necessity, I haven't found this of much value in application programming.

Elizabeth D. Rather

It's true: depending on the system, an internal method can return the most recent word regardless of the compilation word list or depending on it; a completed definition or a not yet completed one; an unnamed definition too, or only named definitions; etc. The value for application programming is shown above.

Some known internal methods: latest ( -- nt|0 ), last @ ( -- nt|0 ), latestxt ( -- xt|0 ), etc.

Thus, although almost every Forth system contains such a method, there is no portable way for programs to obtain the latest definition. But such a portable method is actually very useful, as shown in my examples.

Solution

Let's introduce the following words:

  • LATEST-NAME-IN ( wid -- nt|0 )
  • LATEST-NAME ( -- nt )

The first word returns the name token for the definition whose name was placed most recently into the given word list, or zero if this word list is empty.

The second word returns the name token for the definition whose name was placed most recently into the compilation word list, or throws an exception if there is no such definition.

These words do not expose or limit any internal mechanism of the compiler. They just provide information about word lists, like the words FIND-NAME-IN, FIND-NAME, and TRAVERSE-WORDLIST do. It's a kind of introspection/reflection.

These words are intended for programs. The system may use them, but is not required to do so. The system may continue to use its internal LAST, LATEST, or whatever it was using before.

It seems that the best place for these words is section 15.6.2 Programming-Tools extension words, where TRAVERSE-WORDLIST is also placed.

Rationale

Connection with word lists

By considering definitions only within the scope of a word list, we solve several problems, namely:

  1. A word list contains only completed definitions (see the accepted proposal #153 Traverse-wordlist does not find unnamed/unfinished definitions). This eliminates the question of whether the word identified by the returned nt is finished: it is always finished (completed).

  2. Nameless definitions are not considered since they are not placed into the compilation word list (regardless of whether the system creates a name token for them, or places them into an internal system-specific word list).

  3. An extension or library can create definitions in its own internal word list for internal purposes without affecting the compilation word list or other user-defined word lists. Thus, the user of such a library always gets the expected result from latest-name, regardless of which words the library creates on the fly for internal purposes. For example, once separate dictionary spaces are introduced, something like local variables (or local definitions) could be implemented in a portable way, and creating such a definition would not affect the value that latest-name returns.

Return values

In practice, almost all use cases for LATEST-NAME assume that the requested definition exists; if it does not, the only reasonable action is to report an error. Returning 0 from this word would only burden users with checking for zero, or with redefining the word as:

: latest-name ( -- nt ) latest-name dup 0= -80 and throw ;

If the user needs to handle the case where the compilation word list is empty, they can use the word latest-name-in as:

get-current latest-name-in dup if ( nt ) ... else ( 0 ) drop ... then

Implementation options

If the word list structure in a Forth system contains information about the most recently placed definition, implementing the proposed words is trivial.

In some plausible Forth systems, the word list structure doesn't contain any information about the definition that was placed into the word list most recently. Such systems might not provide the proposed words, or they could be changed to keep this information in the word list structure. It seems that in most systems the word list structure already contains this information.

Some checked systems:

  • SwiftForth, VFX, Gforth, minForth, ikForth, SP-Forth: a word list keeps information about the definition that was placed into it most recently;
  • lxf/ntf 2017: apparently, a word list doesn't keep this information.

If a system does not implement the optional Search-Order word set, it might not provide the word LATEST-NAME-IN.

Naming

The names LATEST-NAME-IN and LATEST-NAME are similar in form to FIND-NAME-IN and FIND-NAME. Their stack effects are similar too.

The difference is that "find" is a verb, while "latest" is an adjective (or sometimes a noun, see Wiktionary). Both have a long history of use in word names, as does "NAME".

In Forth-84 "NAME" in word names denoted NFA (name field address), and now it denotes a name token, which is the successor of NFA. In all standard words, e.g. FIND-NAME, NAME>STRING, NAME>COMPILE, etc. (except PARSE-NAME), "NAME" denotes a name token.

NB: the term "token" in "name token" does not mean a character sequence! It's used in a general sense, like "something serving as an expression of something else" (see Wiktionary).

Throw code description

If the throw code description stated that there is no latest name, it could be confusing, since in some sense a latest name probably always exists.

Therefore, it's better to say "the compilation word list is empty", which is what actually happens.

Motivation for LATEST-NAME-IN

  1. It's a natural factor of LATEST-NAME. This factor can always be extracted from an implementation of LATEST-NAME, because the latter returns an nt from the compilation word list: the system has to take the wid of the compilation word list and extract the most recent nt from that word list.
  2. It is very important to specify the behavior of this word to avoid divergent behavior across systems, since this word will exist in many systems anyway (implemented as a natural factor).
  3. In some cases a program needs to check whether a word list is empty, or to obtain the latest word from a particular word list (for example, to use that word as an entry point, like main, or as the default exported word of a module).
  4. Both words are optional. If LATEST-NAME-IN is not provided, it can be implemented portably via LATEST-NAME as:
    : latest-name-in ( wid -- nt|0 )
      get-current >r set-current
      ['] latest-name catch if 0 then
      r> set-current
    ;
    

Things to discuss

Is it worth introducing the word LATEST-NAME-XT ( -- xt )?

If name>interpret never returns 0 (see my comment), this word can be implemented as:

: latest-name-xt ( -- xt ) latest-name name>interpret ;

The desired (and much discussed) pattern is:

defer bar

: foo ... ; latest-name-xt is bar

Sometimes the name "it" has been suggested for this word, but this name is too short and more likely to cause conflicts. Guido Draheim wrote in comp.lang.forth on 2003-03-16:

I think that everyone has been thinking of using IT for something really clever, it's a nice short word - and I'd say that we should leave it for application usage.

I want to support that argument also with real life experience in the telco world where there are a whole lot of abbreviations for various services, signals, connectors around. All too often now I see people making a SYNONYM at the file-start to get a second name for an ANS forth word that is needed in the implemenation but coincides with a common term of the application.

This seems convincing to me.

Typical use

: STRUCT: ( "name" -- wid.current.old u.offset )
  GET-CURRENT  VOCABULARY
  ALSO  LATEST-NAME NAME>INTERPRET EXECUTE  DEFINITIONS
  0
;

\ In the application's vocabulary
: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

DEFER FOO

: BAR ... ; IT IS FOO

Proposal

Add the following line into the Table 9.1: THROW code assignments:

-80 the compilation word list is empty

Add the following sections into 15.6.2 Programming-Tools extension words:

15.6.2.2541 LATEST-NAME-IN

( wid -- nt|0 )
Remove the word list identifier wid from the stack. If the corresponding word list is empty, then return 0; otherwise, return the name token nt for the definition whose name was placed most recently into this word list.

15.6.2.2542 LATEST-NAME

( -- nt )
Return the name token nt for the definition whose name was placed most recently into the compilation word list, if such a definition exists. Otherwise, throw the exception code -80.

Reference implementation

In this implementation, we assume that wid is an address containing the nt of the definition whose name was most recently placed into the word list wid.

: LATEST-NAME-IN ( wid -- nt|0 ) @ ;

: LATEST-NAME ( -- nt )
  GET-CURRENT LATEST-NAME-IN  DUP IF EXIT THEN  -80 THROW
;

Testing

: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

WORDLIST CONSTANT WL1

T{ : LN1 ; IT  ' LN1 =  -> TRUE }T
T{ GET-CURRENT LATEST-NAME-IN NAME>INTERPRET ' LN1 =  -> TRUE }T
T{ :NONAME [ IT ] LITERAL ; EXECUTE  ' LN1 =  -> TRUE }T
T{ : LN2 [ IT ] LITERAL ; LN2  ' LN1 =  -> TRUE }T
T{ WL1 LATEST-NAME-IN -> 0 }T
GET-CURRENT WL1 SET-CURRENT ( wid.prev )
T{ ' LATEST-NAME CATCH -> -80 }T
T{ : LN3 ;  -> }T
SET-CURRENT
T{ IT ' LN2 = -> TRUE }T

[r1251] 2024-06-20 11:03:25 ruv replies:

proposal - New words: latest-name and latest-name-in

Author

Ruv

Change Log

  • 2023-10-22 Initial revision
  • 2023-10-23 Add testing, examples, a question to discuss, change the throw code description
  • 2023-10-27 Some rationales and explanations added, the throw code description changed back, better wording in some places
  • 2024-06-20 Fix some typos, make some wording and formatting better, add some examples and test cases, add motivation for LATEST-NAME-IN, change the status to "formal".
  • 2024-06-20 Add a test case to check that LATEST-NAME returns different value after the compilation word list is switched.
  • 2024-06-20 Simplify the normative text description, and add a rationale for this simplification.

Problem

In some applications, mainly in libraries and extensions, the ability to obtain the most recently added definition is very useful and in demand.

To make such programs portable, we should introduce a standard method to obtain the most recently added word.

For example, if we are creating a library for decoration, tracing, OOP support, simple DSLs (e.g., to describe Finite State Machines), etc., it is always useful to have access to the most recent definition, instead of redefining many words to build such an access method yourself, or juggling with the input buffer and searching.

One simple example: if we want variables that are initialized to zero, we can use:

: var ( "name" -- )
  variable
  0  latest-name name>interpret execute  !
;
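For comparison, here is a rough sketch of the "juggling with the input buffer and search" that a portable program has to do today without LATEST-NAME. It is an illustration under stated assumptions, not part of the proposal: it rewinds >IN to re-parse the same name and looks it up with ', so it only works if the name is on the same input line and the compilation word list is also in the search order (otherwise ' may fail or find a different word).

```forth
\ Sketch of a portable workaround without LATEST-NAME (illustration only).
\ Assumes: the name is in the current input line, and the compilation
\ word list is in the search order, so that ' can find the new word.
: var ( "name" -- )
  >in @ >r     \ remember the parse position just before the name
  variable     \ parse the name and create the variable
  r> >in !     \ rewind so that the same name is parsed again
  0 ' execute  \ parse the name again, find it, get the variable's address
  ! ;          \ store the initial value 0
```

Those two assumptions are exactly what the LATEST-NAME version avoids.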

A number of specific examples are provided in my post on ForthHub (they are not included here so as not to bloat the text).

Additionally, there has been much discussion regarding standardization of such a method over recent decades. For example, Elizabeth D. Rather wrote on 2011-12-09 in comp.lang.forth:

AFAIK most if not all Forths have some method for knowing the latest definition, it's kinda necessary. The problem is, that they all do it differently (at different times, in different forms, etc.), which is why it hasn't been possible to standardize it.

Although it's a system necessity, I haven't found this of much value in application programming.

Elizabeth D. Rather

It's true: depending on the system, an internal method may return the most recent word regardless of the compilation word list or depending on it; it may return a completed definition or a not yet completed one; it may include unnamed definitions or only named ones; etc. I have shown the value of such a method in application programming above.

Some known internal methods: latest ( -- nt|0 ), last @ ( -- nt|0 ), latestxt ( -- xt|0 ), etc.

Thus, although almost every Forth system has such a method, there is no portable way for programs to obtain the latest definition. Yet such a portable method is very useful, as my examples show.

Solution

Let's introduce the following words:

  • LATEST-NAME-IN ( wid -- nt|0 )
  • LATEST-NAME ( -- nt )

The first word returns the name token for the definition whose name was placed most recently into the given word list, or zero if this word list is empty.

The second word returns the name token for the definition whose name was placed most recently into the compilation word list, or throws an exception if there is no such definition.

These words do not expose or limit any internal mechanism of the compiler. They just provide information about word lists, like the words FIND-NAME-IN, FIND-NAME, and TRAVERSE-WORDLIST do. It's a kind of introspection/reflection.

These words are intended for programs. The system may use them, but is not required to. The system may continue to use its internal LAST, LATEST, or whatever it used before.

The best place for these words seems to be section 15.6.2 Programming-Tools extension words, where TRAVERSE-WORDLIST is also placed.

Rationale

Connection with word lists

By considering definitions only within the scope of a word list, we solve several problems, namely:

  1. A word list contains only completed definitions (see the accepted proposal #153 Traverse-wordlist does not find unnamed/unfinished definitions). This settles the question of whether the definition identified by the returned nt is finished: it always is (completed).

  2. Nameless definitions are not considered since they are not placed into the compilation word list (regardless of whether the system creates a name token for them, or places them into an internal system-specific word list).

  3. An extension or library can create definitions in its own internal word list for internal purposes without affecting the compilation word list or other user-defined word lists. Thus, the user of such a library always gets the expected result from latest-name, regardless of which words the library creates on the fly for internal purposes. For example, once separate dictionary spaces are introduced, something like local variables (or local definitions) could be implemented in a portable way, and creating such a definition would not affect the value that latest-name returns.

Return values

In practice, almost all use cases for LATEST-NAME assume that the requested definition exists; if it does not, the only reasonable action is to report an error. Returning 0 from this word would only burden users with checking for zero, or with redefining the word as:

: latest-name ( -- nt ) latest-name dup 0= -80 and throw ;

If the user needs to handle the case where the compilation word list is empty, they can use the word latest-name-in as:

get-current latest-name-in dup if ( nt ) ... else ( 0 ) drop ... then
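A caller can also keep LATEST-NAME's throwing behavior and treat only code -80 specially, using CATCH. Since no existing system is assumed to provide the proposed words yet, the sketch below uses hypothetical stand-ins (TOY-CURRENT, TOY-LATEST-NAME-IN, TOY-LATEST-NAME) in which a "word list" is just a cell holding its newest name token, as the reference implementation assumes; TRY-LATEST-NAME is likewise an illustrative name, not a proposed word.

```forth
\ Hypothetical stand-ins, for illustration only: a toy "word list" is a
\ cell holding its newest name token (0 when empty), as assumed by the
\ reference implementation; TOY-CURRENT plays the compilation word list.
variable toy-current
: toy-latest-name-in ( wid -- nt|0 ) @ ;
: toy-latest-name ( -- nt )
  toy-current @ toy-latest-name-in  dup if exit then  -80 throw ;

\ Return 0 for an empty word list, but re-throw any other exception.
: try-latest-name ( -- nt|0 )
  ['] toy-latest-name catch      ( nt 0 | -80 | code )
  dup -80 = if drop 0 exit then  \ the compilation word list is empty
  throw ;                        \ 0 THROW does nothing, so nt remains
```

Unlike the LATEST-NAME-IN form, this keeps any other error visible.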

Implementation options

If the word list structure in a Forth system contains information about the most recently placed definition, implementing the proposed words is trivial.

In some plausible Forth systems, the word list structure doesn't contain any information about the definition that was placed into the word list most recently. Such systems might not provide the proposed words, or they could be changed to keep this information in the word list structure. It seems that in most systems the word list structure already contains this information.

Some checked systems:

  • SwiftForth, VFX, Gforth, minForth, ikForth, SP-Forth: a word list keeps information about the definition that was placed into it most recently;
  • lxf/ntf 2017: apparently, a word list doesn't keep this information.
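As a rough illustration of what keeping this information in the word list structure can mean for a hashed word list, here is a toy model. It is a hypothetical sketch (all TOY- names and the layout are made up; no real system is being described): next to the bucket heads, the structure has one extra cell that is updated every time a name is placed, and with that cell LATEST-NAME-IN reduces to @, as in the reference implementation.

```forth
\ Hypothetical toy hashed word list (not any real system's layout).
\ Cell 0 of a wid: newest entry (0 if empty), the extra field in question.
\ Cells 1..#BUCKETS: bucket heads. Entry: bucket link | length | chars.
4 constant #buckets

: toy-wordlist ( "name" -- )
  create 0 ,  #buckets 0 do 0 , loop ;

: toy-hash ( c-addr u -- n )          \ sum of characters mod #BUCKETS
  0 swap 0 ?do  over i chars + c@ +  loop  nip #buckets mod ;

: toy-bucket ( c-addr u wid -- a-addr ) >r toy-hash 1+ cells r> + ;

: toy-place ( c-addr u wid -- entry )
  >r 2dup r@ toy-bucket            ( c-addr u bucket )  ( R: wid )
  align here >r                    ( R: wid entry )
  dup @ ,  r@ swap !               \ chain the new entry into its bucket
  dup ,  here over chars allot swap cmove   \ length, then characters
  r> dup r> ! ;                    \ also record it as the newest entry

\ With cell 0 maintained, the proposed word is trivial:
: toy-latest-name-in ( wid -- entry|0 ) @ ;
: toy-name>string ( entry -- c-addr u ) cell+ dup cell+ swap @ ;
```

In this toy, the extra information costs one cell per word list and one store per placed name.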

If a system does not implement the optional Search-Order word set, it might not provide the word LATEST-NAME-IN.

Naming

The names LATEST-NAME-IN and LATEST-NAME are similar in form to FIND-NAME-IN and FIND-NAME. Their stack effects are similar too.

The difference is that "find" is a verb, while "latest" is an adjective (or sometimes a noun, see Wiktionary). Both have a long history of use in word names, as does "NAME".

In Forth-84 "NAME" in word names denoted NFA (name field address), and now it denotes a name token, which is the successor of NFA. In all standard words, e.g. FIND-NAME, NAME>STRING, NAME>COMPILE, etc. (except PARSE-NAME), "NAME" denotes a name token.

NB: the term "token" in "name token" does not mean a character sequence! It's used in a general sense, like "something serving as an expression of something else" (see Wiktionary).

Normative text description

The proposed normative text description is based on:

  • 16.2: "compilation word list: The word list into which new definition names are placed",
  • 15.3.1: "A name token is a single-cell value that identifies a named word",
  • 3.4.3: "[Semantics] are largely specified by the stack notation in the glossary entries, which shows what values shall be consumed and produced. The prose in each glossary entry further specifies the definition's behavior" (there is no need to repeat in the text description what is already indicated in the stack diagrams). (emphasis added)

Throw code description

If the throw code description stated that there is no latest name, it could be confusing, since in some sense a latest name probably always exists.

Therefore, it's better to say "the compilation word list is empty", which is what actually happens.

Motivation for LATEST-NAME-IN

  1. It's a natural factor of LATEST-NAME. This factor can always be extracted from an implementation of LATEST-NAME, because the latter returns an nt from the compilation word list: the system has to take the wid of the compilation word list and extract the most recent nt from that word list.
  2. It is very important to specify the behavior of this word to avoid divergent behavior across systems, since this word will exist in many systems anyway (implemented as a natural factor).
  3. In some cases a program needs to check whether a word list is empty, or to obtain the latest word from a particular word list (for example, to use that word as an entry point, like main, or as the default exported word of a module).
  4. Both words are optional. If LATEST-NAME-IN is not provided, it can be implemented portably via LATEST-NAME as:
    : latest-name-in ( wid -- nt|0 )
      get-current >r set-current
      ['] latest-name catch if 0 then
      r> set-current
    ;
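Item 3 can be sketched as follows. Because the proposed words are not assumed to exist yet, the sketch uses a hypothetical toy word list (one cell holding the newest entry, each entry holding a link and an xt) in place of real word lists, with TOY-LATEST-NAME-IN standing in for LATEST-NAME-IN and TOY-NAME>INTERPRET for NAME>INTERPRET. WORDLIST-EMPTY? and RUN-MODULE are illustrative names, not proposed words.

```forth
\ Hypothetical toy word list, for illustration only: one cell holding the
\ newest entry (0 if empty); each entry is ( link xt ).
: toy-wordlist ( "name" -- ) create 0 , ;
: toy-place ( xt wid -- )
  align here >r  dup @ ,  r> swap !  , ;   \ link, then the xt
: toy-latest-name-in ( wid -- entry|0 ) @ ;
: toy-name>interpret ( entry -- xt ) cell+ @ ;

\ Item 3: check whether a word list is empty ...
: wordlist-empty? ( wid -- flag ) toy-latest-name-in 0= ;
\ ... or run the latest word of a module as its entry point.
: run-module ( i*x wid -- j*x )
  toy-latest-name-in ?dup if toy-name>interpret execute then ;
```

Running an empty module simply does nothing here; a real program might prefer to report an error instead.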
    

Things to discuss

Is it worth introducing the word LATEST-NAME-XT ( -- xt )?

If name>interpret never returns 0 (see my comment), this word can be implemented as:

: latest-name-xt ( -- xt ) latest-name name>interpret ;

The desired (and much discussed) pattern is:

defer bar

: foo ... ; latest-name-xt is bar

Sometimes the name "it" has been suggested for this word, but this name is too short and more likely to cause conflicts. Guido Draheim wrote in comp.lang.forth on 2003-03-16:

I think that everyone has been thinking of using IT for something really clever, it's a nice short word - and I'd say that we should leave it for application usage.

I want to support that argument also with real life experience in the telco world where there are a whole lot of abbreviations for various services, signals, connectors around. All too often now I see people making a SYNONYM at the file-start to get a second name for an ANS forth word that is needed in the implemenation but coincides with a common term of the application.

This seems convincing to me.

Typical use

: STRUCT: ( "name" -- wid.current.old u.offset )
  GET-CURRENT  VOCABULARY
  ALSO  LATEST-NAME NAME>INTERPRET EXECUTE  DEFINITIONS
  0
;

\ In the application's vocabulary
: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

DEFER FOO

: BAR ... ; IT IS FOO

Proposal

Add the following line into the Table 9.1: THROW code assignments:

-80 the compilation word list is empty

Add the following sections into 15.6.2 Programming-Tools extension words:

15.6.2.xxxx LATEST-NAME-IN TOOLS EXT

( wid -- nt|0 )
If the word list identified by wid is empty, then the returned value is 0; otherwise, the name token nt identifies the definition whose name was placed most recently into the word list wid.

15.6.2.xxxx LATEST-NAME TOOLS EXT

( -- nt )
If the compilation word list is not empty, the name token nt identifies the definition whose name was placed most recently into this word list. Otherwise, the exception code -80 is thrown.

Reference implementation

In this implementation, we assume that wid is an address containing the nt of the definition whose name was most recently placed into the word list wid.

: LATEST-NAME-IN ( wid -- nt|0 ) @ ;

: LATEST-NAME ( -- nt )
  GET-CURRENT LATEST-NAME-IN  DUP IF EXIT THEN  -80 THROW
;

Testing

: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

WORDLIST CONSTANT WL1

T{ : LN1 ; IT  ' LN1 =  -> TRUE }T
T{ GET-CURRENT LATEST-NAME-IN NAME>INTERPRET ' LN1 =  -> TRUE }T
T{ :NONAME [ IT ] LITERAL ; EXECUTE  ' LN1 =  -> TRUE }T
T{ : LN2 [ IT ] LITERAL ; LN2  ' LN1 =  -> TRUE }T
T{ WL1 LATEST-NAME-IN -> 0 }T
GET-CURRENT WL1 SET-CURRENT ( wid.prev )
T{ ' LATEST-NAME CATCH -> -80 }T
T{ : LN3 ;  -> }T
SET-CURRENT
T{ IT ' LN2 = -> TRUE }T