Proposal: Forth Standards, Backward Compatibility and modern `state of the art` words with their `historical traditional` counterparts.

Informal

This page is dedicated to discussing this specific proposal

EkkehardSkirl [424] Forth Standards, Backward Compatibility and modern `state of the art` words with their `historical traditional` counterparts. Proposal, 2026-02-12 09:39:50

Author:

Ekkehard Skirl

Change Log:

  • 2026-02-12 10-05 CET: Initial proposal
  • 2026-02-12 07-07 CET: Created as working draft

Problem:

For users who have not been involved for long, the standard may look a little confusing. In particular, it is hard to find out which words are modern state of the art words and which are their historical traditional counterparts. This results from the kind of backward compatibility the standard committee practises, which could be handled more clearly in the standard.

P.S. All this is the result of my current personal experience with and understanding of the standard and Forth systems. So maybe I am wrong and this is not a problem at all. But I think it is worth discussing and perhaps improving.

Solution:

Create backward compatibility word sets and move the historical traditional words there if modern state of the art counterparts exist.

The existing word sets would then deal with the state of the art words only.

If a Forth system claims to conform to Forth standard XY, the backward compatibility word sets for standard VW may be absent. But if it also claims to conform to Forth standard VW, it has to provide the matching backward compatibility word sets in addition.

If a Forth program conforms, for instance, to Forth standard XY and is to run on a less standard system, it could add the words of the matching backward compatibility word sets to that system and should then run.

To this end, the standard shall provide wrapper definitions for the historical traditional words, written as adapters on top of the modern state of the art words. These wrappers are offered as one possible solution and do not claim to be the best solution in every situation.

So my proposal is to have two backward compatibility word sets for the 1994 and the 2012 standards called STANDARD94 and STANDARD2012.

A further advantage is that it becomes clearer and easier to give words better, more descriptive names from standard to standard, while their historical traditional names remain available as aliases in the backward compatibility word sets.

This concept additionally makes it possible to have words that keep the same useful name but get a different state of the art definition. But maybe this creates more problems than benefits.
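The alias idea can be sketched with SYNONYM from the Forth 2012 TOOLS EXT word set. Both word names below are hypothetical and chosen only for illustration; neither appears in any standard.

```forth
\ Hypothetical modern word with a descriptive name (not in any standard).
: CELLS-ALLOT ( n -- ) CELLS ALLOT ;

\ Hypothetical historical name, kept as an alias in a compatibility word set.
\ SYNONYM takes the new name first, then the existing word.
SYNONYM NALLOT CELLS-ALLOT

\ Old code keeps working unchanged:
CREATE SCRATCH 4 NALLOT
```

A program written against the old name only needs the one SYNONYM line loaded before it.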

Typical use:

Many of the historical traditional words can be reconstructed using the more modern state of the art words to demonstrate flexibility and implementation possibilities. So here are some examples.

Comparison of Traditional and Modern Standard Words

| Category | Classic Word (Legacy) | Modern Word (Forth 2012) | Advantage of the Modern Variant |
|---|---|---|---|
| Parsing | WORD | PARSE-NAME | Returns addr u; no need to copy to the HERE buffer. |
| Dictionary | FIND | FIND-NAME (or SEARCH-WORDLIST) | Returns a name token (nt) instead of an xt with a flag; cleaner. |
| Introspection | >NAME and >BODY | NAME>XT and NAME>INTERPRET | Works with tokens instead of direct address offsets. |
| Header Data | ID. (Prints Name) | NAME>STRING (then TYPE) | Separates name finding from output. |
| Word Lists | CONTEXT and CURRENT | GET-ORDER and SET-CURRENT | Supports multiple word lists (namespaces) more cleanly. |
| Compilation | [COMPILE] | POSTPONE | POSTPONE is smarter and replaces almost all special cases. |
| Memory | ALLOT | ALIGN and ALLOCATE | Improved support for data alignment in RAM. |
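Two rows of the comparison can be sketched as small wrappers. This is only an illustration: ENDIF and SIGN-OF are example names picked here, and ID. is shown as one possible reconstruction, assuming a system that provides NAME>STRING from the Forth 2012 TOOLS EXT word set.

```forth
\ ENDIF as an alias for THEN, written with POSTPONE instead of [COMPILE].
: ENDIF ( C: orig -- ) POSTPONE THEN ; IMMEDIATE

\ ID. rebuilt from NAME>STRING and TYPE: name lookup and output stay separate.
: ID. ( nt -- ) NAME>STRING TYPE SPACE ;

\ Usage of ENDIF inside a definition (SIGN-OF is a made-up example word).
: SIGN-OF ( n -- -1|0|1 )
    DUP 0< IF DROP -1 ELSE 0> IF 1 ELSE 0 ENDIF ENDIF ;
```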

Dictionary Search: FIND vs. FIND-NAME

FIND is a traditional word that searches the dictionary directly. It could be replaced by a combination of more modern words, such as FIND-NAME (or SEARCH-WORDLIST) followed by a conversion from name token to execution token, to achieve the same functionality.

The old FIND is notorious for its "odd" return value (-1, 0, 1) and because it expects a counted string.

Code snippet for FIND using more modern words:

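: FIND ( c-addr -- c-addr 0 | xt 1 | xt -1 )
    DUP COUNT FIND-NAME        ( c-addr nt|0 )
    DUP 0= IF EXIT THEN        \ not found: leave c-addr 0
    NIP                        \ found: drop c-addr, keep nt
    \ NAME>INTERPRET (Forth 2012 TOOLS EXT) is used here in place of the non-standard NAME>XT
    DUP NAME>INTERPRET SWAP    ( xt nt )
    \ NAME-IMMEDIATE? ( nt -- flag ) is assumed to be a system-specific helper
    NAME-IMMEDIATE? IF 1 ELSE -1 THEN ;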
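On a system that lacks FIND-NAME, a lookup restricted to a single word list can be sketched with TRAVERSE-WORDLIST and NAME>STRING from the Forth 2012 TOOLS EXT word set. The names LOOKUP-NT and MATCH? are hypothetical. The sketch assumes a case-sensitive comparison is acceptable and does not walk the whole search order, so it is not a full replacement.

```forth
\ Callback for TRAVERSE-WORDLIST: compare one name with the wanted string.
\ Returns false to stop the traversal on a match, true to continue.
: MATCH? ( c-addr u 0 nt -- c-addr u 0 true | c-addr u nt false )
    >R DROP                          \ keep nt aside, drop the placeholder 0
    2DUP R@ NAME>STRING COMPARE 0=   ( c-addr u match? )
    IF R> FALSE ELSE R> DROP 0 TRUE THEN ;

\ Look up a name in one word list; returns its name token or 0.
: LOOKUP-NT ( c-addr u wid -- nt | 0 )
    >R 0 ['] MATCH? R> TRAVERSE-WORDLIST   ( c-addr u nt|0 )
    NIP NIP ;
```

A call such as `S" DUP" FORTH-WORDLIST LOOKUP-NT` would then yield a name token on systems that store the name in that spelling.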

Dictionary Manipulation: LATEST vs. GET-CURRENT

Previously, LATEST was a variable that could be easily read using @. However, this assumes there is only one word list. In modern systems with multiple word lists (e.g., for different modules or libraries), GET-CURRENT is a function that returns the ID of the current word list, which is more flexible and clearer.

Code snippet for LATEST with GET-CURRENT:

: LATEST ( -- addr )
    GET-CURRENT ; \ Returns the ID, which is functionally equivalent to LATEST

Name Strings: WORD vs. PARSE-NAME

WORD writes the result to a fixed buffer (HERE), which often leads to problems when reading the next word and overwriting the previous one. PARSE-NAME, on the other hand, directly returns the address and length of the word in the input buffer (TIB) without making a copy, which is more efficient and safer.

Code snippet for WORD with more modern words:

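: WORD ( char -- c-addr )
    \ Simplified sketch: unlike the standard WORD, leading delimiters are not skipped.
    PARSE                   ( c-addr u ) \ parse up to the delimiter
    255 MIN                 \ a count byte holds at most 255
    DUP HERE C!             \ store the length byte at HERE
    HERE CHAR+ SWAP MOVE    \ copy the characters behind the length byte
    HERE ;                  \ return the counted string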
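The difference in calling style can be shown with two tiny words that report the length of the next name in the input. Both use only standard words; NAME-LENGTH and NAME-LENGTH-OLD are example names chosen here.

```forth
\ Modern style: PARSE-NAME hands back the string in place, no copy is made.
: NAME-LENGTH ( "name" -- u ) PARSE-NAME NIP ;

\ Traditional style: WORD copies into a transient counted string first.
: NAME-LENGTH-OLD ( "name" -- u ) BL WORD COUNT NIP ;
```

For example, `NAME-LENGTH HELLO` and `NAME-LENGTH-OLD HELLO` both leave 5 on the stack, but only the second one overwrites the transient buffer.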

Proposal:

See Solution:.

TBD.

Reference implementation:

Not applicable or TBD.

Testing: (Optional)

Not applicable.

ruv

Create backward compatibility word sets and move the historical traditional words there if modern state of the art counterparts exist.

If you are considering creating a new section (a word set) in the document, please note that proposals to move a word between sections usually do not find sufficient support. Regarding obsolescent words, they are usually excluded from the document on the next iteration. I don't think we should create a new normative section (a word set) containing specifications for obsolescent words.

Dictionary Manipulation: LATEST vs. GET-CURRENT
Previously, LATEST was a variable that could be easily read using @.

Probably, you mean the word current.

In FIG-Forth, the word latest returns an NFA (Name Field Area, now name token), not an address of the NFA (as I recently mentioned). The word last traditionally returns an address that contains NFA, see Forth-83, the section "UNCONTROLLED REFERENCE WORDS".

EkkehardSkirl

What I'd like to propose for discussion is a more future-oriented approach to the standard (what I mean by "state of the art"), incorporating all the positive experiences and progressive insights gained, without clinging too tightly to the past.

However, to ensure that the past is also adequately addressed, the standard should provide a framework for integrating it when needed.

So, I envision the standard having two parts: the current standardizations and, secondly, the option to bridge the gap to the past.

And sometimes I sense a kind of fear, between the lines, that it will be seen as "bad" if a system doesn't conform to Standard 999 but only to Standard 666. But Standard 666 is also perfectly fine if it fulfills its purpose, even if sometimes with somewhat more "complex" methods, isn't it?

And programs that conform to standard 666 might be considered inferior to those that conform to standard 999. But isn't the primary goal of programming a program that fulfills its tasks, not necessarily adhering to the latest standard at all costs?

The last two paragraphs were more philosophical in nature.

I can only suggest something like this. Whether such a committee then considers such modern approaches is beyond my control. And I realize that this will likely be a very long-term undertaking that can't be implemented in the very next standard.

ruv

The last two paragraphs were more philosophical in nature.

Yes, and the ideas are quite general. Could you please identify and describe the first specific change in the document of the Standard? The smaller the change, the better.

EkkehardSkirl

And sometimes I sense a kind of fear, between the lines,

This feeling resonates with me in discussions when it's argued that system A does things this way and system B reacts in a certain way, or when a decision is attributed to how things are usually done. The focus on the past always carries so much weight.

Could you please identify and describe the first specific change

My first idea was to add a paragraph with a list as an overview and then, further down, describe all the resources and tokens the standard deals with at an abstract, implementation-independent level.

Then, from there, it would be possible to isolate the resources and tokens that an abstract Forth kernel must have in order to be a Forth system in the sense of the Forth standard. Of course, there may be other systems that handle things differently. But this would "only" show that the abstraction is not abstract enough, or that the standard does not consider this a topic it wants to handle.

This may be considered an implementation-independent description of what a Forth system is at an abstract level.

A first collection at draft level could look as follows. The version number only tracks the state of my own thinking. The '+' sign indicates that a token is part of a structure token.

I'm sure that this is not complete, because my knowledge is not always sufficiently in-depth. And some things are viewed from the perspective of someone interested in Forth who misses a kind of systematic overview as a "research anchor".

A draft of a Forth system abstraction (Version 0.4)

Forth System Components

  • Forth Stack system
    • Data-Stack token (formerly also called 'parameter stack')
      • Data-Stack-Entry token
    • Return-Stack token
      • Return-Stack-Entry token
    • Control-Stack token
      • Control-Stack-Entry token
    • Float-Stack token
      • Float-Stack-Entry token
    • Exception-Stack token
      • Exception-Stack-Entry token
    • Input-Source-Stack token
      • Input-Source-Stack-Entry token
  • Forth Word system
    • Word token
      • Word-Header token
        • +Word-Name token
        • +Word-Link token
        • +Word-State token (formerly also called 'flags')
          • +Word-Execution-State
          • +Word-Visibility-State
        • +Word-Behavior token (formerly called 'codefield')
      • Word-Data token
  • Forth Dictionary system
    • Dictionary token
      • Dictionary-Entry token
  • Forth Input-Source system
    • String-Literal Input-Source token
    • User-Input-Device Input-Source token
    • File-Input Input-Source token
      • +File-Descriptor token
      • +File-Position token
      • +File-Buffer token
        • +Buffer-Address token
        • +Buffer-Length token
        • +Buffer-Position token
    • Stream-Input Input-Source token
  • Forth String-Token-Type system
    • Dictionary-Word-String token
    • Numerical-String system
      • Integer-String system
        • Half-Integer-String token
        • Single-Integer-String token
        • Double-Integer-String token
        • Quad-Integer-String token
      • Float-String system
        • Half-Float-String token
        • Single-Float-String token
        • Double-Float-String token
        • Quad-Float-String token
  • Forth Numerical-Value system
    • Integer-Value system
      • Half-Integer token
      • Single-Integer token
      • Double-Integer token
      • Quad-Integer token
    • Float-Value system
      • Half-Float token
      • Single-Float token
      • Double-Float token
      • Quad-Float token

Forth system components are considered to have at least one member component.

A Forth system shall define all the needed components by mapping them to the underlying system in some way, either directly or indirectly by using other components.

Forth Token

A token is considered a collection of at least one token.

A Forth system shall define all the needed tokens.

Forth Basic Components

Forth basic components are themselves considered tokens.

  • Forth character token (CHAR)
    • Forth character-string token
      • Forth counted-string token
      • Forth zero-terminated-string token
      • Forth descriptor-string token
  • Forth datacell token (CELL)
  • Forth pointer token
  • Forth boolean token
    • Forth true token
    • Forth false token

A base consideration is that a Forth datacell token is a multiple of a Forth character token.

So a datacell token may be considered to represent a fixed count (the multiple) of character tokens.

A Forth pointer token holds the information on where to find a token and the information that token provides.

Forth Basic Components may be considered the building blocks of an Abstract Forth Kernel and of the levels of Forth systems built upon it.

All the needed Forth Basic Components shall be mapped to the underlying system in some way.
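The relation between datacell and character can be made concrete with CELLS and CHARS. This is a minimal sketch; CHARS/CELL, ONE-CELL and FILL-CELL are hypothetical names used only for illustration.

```forth
\ Number of characters that fit into one cell (the "multiple").
: CHARS/CELL ( -- n ) 1 CELLS 1 CHARS / ;

\ A cell-sized buffer that is addressed character by character.
CREATE ONE-CELL 1 CELLS ALLOT
: FILL-CELL ( char -- ) ONE-CELL CHARS/CELL ROT FILL ;
```

The actual value of CHARS/CELL depends on the system, which is why the abstraction only speaks of a fixed multiple.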

Forth Resources

To organize components and tokens, they may be considered as logical resource spaces. These spaces are not required to be contiguous, but of course they may be.

  • Dictionary-System Space: Dictionary Space Collection
    • Dictionary Space: Word-Header Token Collection
  • Data-System Space: Data Space Collection
    • Word-Data Space: Word-Data Token Collection
  • Character-String Space: String-Space Collection
    • Name-String Space: Word-Name Token Collection
    • Data-String Space: Character-String Collection
    • Transient-String Space: Character-String Collection
  • Behavior Space: Word-Behavior Token Collection
  • Stack Space: Stack Space Collection
    • Data Stack Space: Data-Stack-Entry Token Collection
    • Return Stack Space: Return-Stack-Entry Token Collection
    • Control Stack Space: Control-Stack-Entry Token Collection
    • Float Stack Space: Float-Stack-Entry Token Collection
    • Input-Source Space: Input-Source-Stack-Entry Token Collection
  • Input Space: Input-Source Collection
    • Text-String Space: Character-String Token Collection
    • User-Input Space: Terminal-Input-Buffer-String Token Collection
    • Text-File Space: Text-File-Buffer-String Token Collection
    • Text-Stream Space: Stream-Input-Buffer-String Token Collection

EkkehardSkirl

The second chapter I would introduce is an optional word set KERNEL that provides the missing words needed to describe a functioning Forth kernel in Forth words inside a dictionary.

These are, for instance, words to handle the stacks (push-xx, pop-xx, constants for resetting them, and so on), the missing interpret word and the words associated with it, including the group of recognizer words needed to define the interpret word, and others.

One goal, among possibly several, is to be able to describe a standard Forth system based on a Forth kernel, not as the optimal way to do it but as a showcase that meets everything the standard prescribes and requires.

The third step could be to reorder the word groups by making the more modern state of the art words mandatory and the older words optional. But this is a big step into the future to make the historical baggage smaller.

EkkehardSkirl

One unsystematic aspect, in my opinion, is that there are words with three, two, or one behavior (interpreting, compiling, run-time). On the other hand, there are groups of words that represent these behaviors in separate, distinct words. The standard should settle on one approach here.

However, I do think it makes sense to separate the interpreting behavior from the compiling and run-time behavior. This would certainly simplify the work of the text interpreter in its three planned states (interpreting, compiling as immediate conditional compilation, and postponing as unconditional compilation).
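How the two behaviors of a word are already kept apart at the name token level can be sketched with NAME>INTERPRET and NAME>COMPILE from the Forth 2012 TOOLS EXT word set. The words INTERPRET-NT, COMPILE-NT and PERFORM-NT are hypothetical names; this is an illustration of the dispatch, not a complete text interpreter.

```forth
\ Perform the interpretation semantics of a word, if it has any.
\ NAME>INTERPRET returns 0 when no interpretation semantics are defined.
: INTERPRET-NT ( i*x nt -- j*x )
    NAME>INTERPRET DUP 0= ABORT" no interpretation semantics" EXECUTE ;

\ Perform the compilation semantics of a word.
\ NAME>COMPILE returns ( x xt ); executing xt with x on the stack does the work.
: COMPILE-NT ( i*x nt -- j*x )
    NAME>COMPILE EXECUTE ;

\ Dispatch on STATE, as a text interpreter would for a found name.
: PERFORM-NT ( i*x nt -- j*x )
    STATE @ IF COMPILE-NT ELSE INTERPRET-NT THEN ;
```

A separate postponing state, as suggested above, would add a third branch to PERFORM-NT.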

EkkehardSkirl

Addendum:

According to the last thought, the Word Behavior token should have three sub-tokens:

  • +Word Behavior token (formerly called 'codefield')
    • +Word Interpreting Behavior token
    • +Word Conditional Compiling Behavior token (compiling)
    • +Word Unconditional Compiling Behavior token (postponing)