ALLOT

( n -- )

If n is greater than zero, reserve n address units of data space. If n is less than zero, release | n | address units of data space. If n is zero, leave the data-space pointer unchanged.

If the data-space pointer is aligned and n is a multiple of the size of a cell when ALLOT begins execution, it will remain aligned when ALLOT finishes execution.

If the data-space pointer is character aligned and n is a multiple of the size of a character when ALLOT begins execution, it will remain character aligned when ALLOT finishes execution.


Testing:

HERE 1 ALLOT
HERE
CONSTANT 2NDA
CONSTANT 1STA
T{ 1STA 2NDA U< -> <TRUE> }T    \ HERE MUST GROW WITH ALLOT
T{      1STA 1+ ->   2NDA }T    \ ... BY ONE ADDRESS UNIT
( MISSING TEST: NEGATIVE ALLOT )
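
The tests above leave negative and zero ALLOT uncovered. The following is a minimal sketch of what such checks might look like, not part of the standard's test suite. EXPECT-TRUE is a hypothetical helper introduced here so the sketch runs without the T{ }T harness, and the sketch assumes that nothing else moves the data-space pointer between the two HEREs of each check.

```forth
\ Sketch only, not part of the standard's test suite.
\ EXPECT-TRUE is a hypothetical helper; with the T{ }T harness the first
\ check would read:  T{ HERE 5 ALLOT -5 ALLOT HERE = -> <TRUE> }T
: EXPECT-TRUE ( flag -- )  0= ABORT" check failed" ;

\ Releasing what was just reserved should restore the data-space pointer.
\ Both HEREs are taken before anything is defined, so that no definition
\ can move the pointer in between.
HERE  5 ALLOT  -5 ALLOT  HERE  =  EXPECT-TRUE

\ n = 0 should leave the data-space pointer unchanged.
HERE  0 ALLOT  HERE  =  EXPECT-TRUE

\ An aligned pointer stays aligned after allotting whole cells.
ALIGN  2 CELLS ALLOT  HERE DUP ALIGNED =  EXPECT-TRUE
```

Unlike the 1STA/2NDA test, this sketch keeps both addresses on the stack instead of storing them in constants, because defining a word between the two HEREs may itself consume data space.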

Contributions

TG9541 | ALLOT in ROMable systems | Request for clarification | 2021-04-18 09:07:57

Problem statement

ALLOT assumes that the data space for mutable data and for compiled code is one and the same. This is a problem for ROMable systems (i.e. the dictionary is compiled to ROM).

Examples:

The compilation semantic of CREATE doesn't indicate whether "data space" should be mutable or not. The problem could potentially be solved by ALLOT (usage of which implies that data should be mutable).

Data Space is mutable

\ define a standard variable
VARIABLE x

\ create an array
CREATE array 10 ALLOT

Data Space is immutable:

\ the 10 first prime numbers
CREATE primes 2 , 3 , 5 , 7 , 11 , 13 , 17 , 19 , 23 , 29 ,

\ defining word for a table based interpolation using @inter
: INTER ( u[x y] u  "<spaces>name" -- ) CREATE DUP , 0 DO SWAP , , LOOP DOES> @inter ;

Discussion

The problem of defining the intent of memory allotment could potentially be solved by ALLOT (assuming that it indicates that data should be mutable). This is, however, not compliant with the standard definition of that word. In fact, the test defined in ALLOT (which is normative, I suppose) is based on the assumption that the dictionary is in RAM:

HERE 1 ALLOT
HERE
CONSTANT 2NDA
CONSTANT 1STA
T{ 1STA 2NDA U< -> <TRUE> }T    \ HERE MUST GROW WITH ALLOT
T{      1STA 1+ ->   2NDA }T    \ ... BY ONE ADDRESS UNIT
( MISSING TEST: NEGATIVE ALLOT ) 

It's also possible to use VARIABLE to indicate the intent of a data space allotment:

\  define a mutable array with 10 chars
VARIABLE carray 4 CELLS ALLOT

This idiomatic definition of mutable arrays appears not to be precluded by the tests in the standard:

T{ VARIABLE V1 ->     }T
T{    123 V1 ! ->     }T
T{        V1 @ -> 123 }T

FlashForth uses data space prefixes (i.e. ram, eeprom) to solve the problem. This, however, requires first amending any source code, including code from a package repository.

The solution in STM8 eForth is to use the (known) compilation target semantics of the Forth system (i.e. RAM or NVM for "Non Volatile Memory") to compile to ROM or to RAM (executable data space), ALLOT memory beneath the code or in a RAM block reserved for mutable data, and to compile a matching run-time word (either dovar or dovarptr).

To sum it up:

  • the tests given in ALLOT assume an implementation that's not ROMable
  • in ROMable systems "data space" is ambiguous with respect to use cases of CREATE
  • there is a potential to infer the intended properties of the data space from the idiomatic use which might be used to write portable code

It's of course also possible to define ROMable systems as out-of-scope for a Forth Standard (and maybe in the scope of a different standard?).

Clarification or guidance is highly appreciated.

StephenPelc

The most complete discussion of these topics is given by the documents written by Stephen Pelc (MPE) and Elizabeth Rather (Forth Inc) for the OTA project in the late 1990s. The documents are a result of a drive to source compatibility between the MPE and FI cross compilers. Most of the action took place on white boards in Waterloo and in pubs in Brussels. The documents are available at:

https://www.mpeforth.com/resource-links/downloads/

Go to the Cross Compilation section and download the documents XCapp5.PDF and XCtext5.PDF

The basis is that there are three memory types: CDATA, IDATA, UDATA (code, initialised RAM, uninitialised RAM) and that at least one SECTION of each type exists. The behaviour of CREATE and the memory access words may change according to the current section type.

The current MPE and FI cross compilers broadly follow this design. I have been reluctant to bring this to full standards level because

1) There are cross compiler designs for which the notation may not be appropriate
2) There are not that many serious cross compilers in use with a body of users beyond MPE and FI
3) Leon and I can usually resolve our differences over coffee and wine.

If people are interested, I can arrange a virtual meeting. Note that Forth-200x meetings are public, and the use of real names is strongly encouraged.

MitraArdron

Thanks for raising this issue. I ran into the same problem with webForth, with the intent that a dictionary can be created, then ROMmed, then, potentially after reboot, extended in RAM. The standard does appear to be ambiguous to me: it seems to implicitly assume that the dictionary is in RAM, but with RAM being so limited on many processors, the last thing I want to do is copy my dictionary from Flash to RAM.

Here is how I solved it - and I'm not claiming it's the best way. Note that my implementation started with eForth before being made Forth2012 compliant (passes test suite) and ROMmable.

  • vCREATE uses vALLOT, which explicitly creates in RAM.
  • CREATE uses ",", which creates wherever the dictionary is being built (typically initially in an area that will be flashed, and in RAM when booted from flash).
  • I keep a separate VP, which is like CP but is set during the pre-flash build to point at a separate (uninitialized) RAM area.

The code is in https://github.com/mitra42/webForth/blob/master/index.js if it's useful to look at.

Words like VARIABLE & BUFFER: use vCREATE, or CREATE followed by vALLOT, so their header is in ROM but their data space is in RAM. This means VARIABLEs aren't initialized, but eForth also has USER variables, which are initialized from flash to RAM.
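
The idea of a separate RAM allocation pointer can be sketched in standard Forth. This is a minimal illustration and not webForth's actual code: the names vALLOT and VP are taken from the description above, while POOL, /POOL, vHERE and vVARIABLE are made up here, and an ordinary data-space buffer stands in for the real RAM region. There are no bounds checks.

```forth
\ Hypothetical sketch, not webForth's actual code.
64 CELLS CONSTANT /POOL             \ assumed size of the RAM area
CREATE POOL  /POOL ALLOT            \ stand-in for a separate RAM region
VARIABLE VP   POOL VP !             \ RAM allocation pointer, like CP/HERE

: vHERE   ( -- addr )  VP @ ;       \ next free address in the pool
: vALLOT  ( n -- )     VP +! ;      \ reserve n address units in the pool

: vVARIABLE ( "name" -- )           \ header in the dictionary, cell in the pool
   vHERE ALIGNED VP !               \ keep the pool pointer cell-aligned
   CREATE  vHERE ,  1 CELLS vALLOT  \ compile only the address of the cell
   DOES> ( -- a-addr )  @ ;         \ at run time, return the pool address

vVARIABLE counter
42 counter !
counter @ .                         \ prints 42
```

Only the address of the cell is compiled into the dictionary, so the definition itself could live in ROM while the cell it points to stays writable. In a real ROMable system VP itself would of course also have to live in RAM.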

>BODY has to be clever enough to know where to look.

There are words useRam, which move pointers around but are intended to be used just once, when running in Ram, building a rommable image, before switching to Ram for further work.

I'm in the process of adding useEeprom to define words that write to the non-volatile EEPROM area, along with eePromSave which writes the eeprom (initially just on ESP8266).

AntonErtl

ROMmed systems are out-of-scope for the existing standards. There have been discussions about standardizing cross-compilation for a long time, and I presume that includes ROMmable systems, but nothing even close to consensus has been reached (my impression as an outsider is that this is not because of fundamental differences, but because the motivation is insufficient; it seems to suffice only for "I do it this way; let's standardize that").

So Forth-2012 is based on the idea that data space is mutable. There is a difference between uninitialized (BUFFER:, VARIABLE) and initialized (ALLOT, ",") data, but that's all there is in Forth-2012.

The way all the dictionary words work assumes that the dictionary is mutable.

  • In a classical cross-compiled system (not in the standard) there is first a cross-compilation phase where the dictionary is mutable, and then a run-time phase where parts of it are read-only.

  • In a compile-to-flash system (with byte granularity for writes) there are ways to write-once to the dictionary, or to tell the system that something should be in RAM.

  • There have also been discussions about declaring some memory regions read-only in order to allow optimizing read accesses to this memory.

  • There have also been discussions about what is preserved in an image (using SAVESYSTEM on some systems, or other non-standard mechanisms on others); the classical answer is that the dictionary is preserved and ALLOCATEd memory is not. I don't know if all systems preserve BUFFER: and VARIABLE contents, however.

I think that all these issues should be considered when proposing something for one of these issues (so ideally we can address the needs of several or all of them with one proposal), but I am not going to write that proposal.

JimPeterson

I don't mean to extend this discussion further than it needs to go, but I'm confused by the original issue. Maybe I misunderstand what is meant by "dictionary" (defined in 2.1 as "An extensible structure that contains definitions and associated data space."), but I don't see any details in the definitions of ALLOT, CREATE, HERE, or , that say anything about where compiled code or word lists ("dictionary"?) are stored relative to the data space. Therefore, I don't understand why the original complaint is "ALLOT assumes that the data space for mutable data and for compiled code is one and the same". My own system keeps the word list in a completely different section of memory than what is pointed to by HERE, and I could have also easily put the executable code at yet another location (though I chose not to do so) without seeing any deviation from the standard or violation of the tests.

Certainly, if you do HERE 1 ALLOT HERE, as shown in the test code, your stack would be left with ( x x+1 )? Subsequently setting constants 2NDA to x+1 and 1STA to x would allow for the given tests to succeed regardless of where compiled code goes?

While it is convenient to be able to make/compile the basic, ROM-ed portion of a system using ALLOT, etc., isn't this rather easily done just by adjusting what HERE points to before creating the ROM-able portion of the system and then switching it back before flashing the ROM (or having the boot process set it), when you're ready to make stuff in RAM? Either way, I see no assumption in the standard that claims compiled code must go anywhere near HERE.

What's more, to say that VARIABLE carray 4 CELLS ALLOT would "define a mutable array with 10 chars" seems to assume that the data space reserved for the cell that stores carray is somehow guaranteed to be adjacent to HERE, and therefore also adjacent to the cells reserved with the subsequent ALLOT. The definition of VARIABLE doesn't explicitly say that the one cell of data space reserved for storage actually came from the data-space pointer. Maybe that's the default/assumed behavior when the standard says that something "reserves one cell of data space", but that is quite ambiguous if it is the intent. My system will actually interject a small amount of code between the storage used for carray and the four cells allocated with ALLOT, such that it appears as though the intent of that line would fail. Is my system non-standard in that sense? I don't understand what portion of the standard I have violated in doing so.

It feels as though the standard may need to be far more explicit about what operations are guaranteed to not change the data-space pointer and what operations are not guaranteed to not change the data-space pointer. If I do 10 ALLOT <other_stuff> 10 ALLOT, under what conditions on <other_stuff> am I guaranteed that the two calls to ALLOT will provide me with contiguous regions? Just about any call to define new words (:, :NONAME, CONSTANT, DEFER, MARKER, VARIABLE, VALUE, possibly even S") feels like it could inject something on systems that intermingle code and data, but it feels like calling CREATE might not inject something except on systems that also intermingle word list entries. The standard explicitly states that space pointed to by a CREATE-d word is right where HERE pointed just after executing CREATE, but it says nothing about VARIABLE, VALUE, etc.

Sorry... there's a whole can of worms.

ruv

"If I do 10 ALLOT <other_stuff> 10 ALLOT, under what conditions on <other_stuff> am I guaranteed that the two calls to ALLOT will provide me with contiguous regions?"

3.3.3.2 Contiguous regions says: “Since an implementation is free to allocate data space for use by code, the above operators need not produce contiguous regions of data space if definitions are added to or removed from the dictionary between allocations”.

Also, the following words are allowed to allot data space: WORDLIST, REPLACES, INCLUDED (and then several other standard words that perform the function of INCLUDED).

Apart from , (comma), C, (c-comma), XC, (x-c-comma) and ALLOT, which are intended to allot data space, there are ALIGN, FALIGN, SFALIGN and DFALIGN, which also allot data space in order to align it.

FILE S" in interpretation state is not allowed to allot data space (i.e., to change data-space pointer).

CREATE is a defining word: it adds a new definition to the dictionary, and so it is allowed to use data space for its internal purposes.

Concerning variables, 3.3.3.3 explicitly says: “The region allocated for a variable may be non-contiguous with regions subsequently allocated with , (comma) or ALLOT”. Perhaps this section should be referenced in the glossary entry for VARIABLE.
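
The consequences of 3.3.3.2 and 3.3.3.3 can be illustrated with a short sketch. The names table, bad, part1 and filler are made up for this example; the comments restate what the quoted sections guarantee and what they leave open.

```forth
\ Contiguous: nothing is defined between CREATE and the allocations.
CREATE table   10 , 20 , 30 ,   2 CELLS ALLOT   \ five cells in a row
table 2 CELLS + @ .                             \ third cell: prints 30

\ Not guaranteed contiguous (3.3.3.3): the ALLOT may land elsewhere.
VARIABLE bad   3 CELLS ALLOT
\ bad 1 CELLS +   may or may not address the allotted cells

\ Not guaranteed contiguous (3.3.3.2): a definition sits in between.
CREATE part1  4 ALLOT
: filler ;     \ may consume data space on some systems
4 ALLOT        \ need not directly follow part1's four address units
```

A program that needs one contiguous region can therefore reserve it in a single run after CREATE, with no definitions in between.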

MitchBradley

Here's how I handle it in cforth. When you save the image that will go into ROM, everything that is currently in the dictionary becomes immutable. Then when you run that image, the dictionary pointer is set so it points to RAM and anything you incrementally compile thereafter is mutable. As a usage requirement, if you want to precompile something that will be mutable, do not use ALLOT or , (comma). BUFFER: usually does what you want.
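
As a sketch of that usage rule (not cforth's own code): initialized, read-only tables are built with CREATE and , (comma), while anything that must stay writable is declared with BUFFER: from the Core extension word set. The names primes, prime, samples and sample are made up for this example, and how BUFFER: obtains its memory is system-specific.

```forth
\ Becomes immutable once the image is saved (in the scheme described above):
CREATE primes  2 , 3 , 5 , 7 , 11 ,     \ initialized table, read-only later
: prime ( i -- n )  CELLS primes + @ ;  \ fetch the i-th entry, counting from 0

\ Mutable: BUFFER: reserves uninitialized space without ALLOT or , (comma).
10 CELLS BUFFER: samples
: sample ( i -- a-addr )  CELLS samples + ;  \ address of the i-th sample

123 0 sample !
0 sample @ .      \ prints 123
3 prime .         \ prints 7
```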

TG9541

@StephenPelc: thanks for the insightful reply! The documents on Cross Compilation are very useful! The following example from XCapp5 clearly shows the potential - and the different machine semantics - of a ROMable system:

: PRINTS ( n -- )
   CDATA   \ Select code section.
   CREATE ,   \ New definition with value n.
   IDATA   \ Restore default iData section.
   DOES> ( -- )   \ Target execution behavior.
         @ . ;   \ Fetch value and display it.

Building a self-contained ROMable Forth system is, of course, quite a bit less complicated than capturing the pitfalls of the mixed host-target semantics of an XC scenario. In any case, much of the problem of the data space is shared. Memory sections, i.e. managed partitions of memory such as CDATA, IDATA or UDATA, can serve as a reminder that embedded machines can harbor more complexity - it's certainly also a valuable concept.

@MitraArdron, thanks for sharing your approach. Indicating the target memory with vCREATE and vALLOT is akin to the memory types. The programmer can easily state the intent of a memory allocation. The flexibility of the XC approach above is very nice, though.

@AntonErtl: thanks for the analysis of the standard with respect to the question raised! Right now the machine architecture of a Forth VM is, I assume, in the "von Neumann family". A unified data space for code and data somehow implies that data is mutable but µCs certainly blur the distinction between "Harvard" and "von Neumann" architectures. Memory protection blurs the lines even more.

What I had in mind is implementing the standard Core word set for a self-contained µC based Forth system so that packages (e.g. the CRC-8 package) or other code that doesn't depend on an OS can be used without changing them first (instead of "clone and own"). The CRC-8 code is a good example of a "non mutable" use case for CREATE.

Provided that the "stage for target memory" can be set by the integrator before "WANTing" a package (e.g. XC style) then code that's "embedded friendly" should work. A coding guideline and package tags might be a solution.

@JimPeterson: I understand that the issue is not clear-cut - maybe that's due to my limited understanding of the "standard jargon" (please bear with me), or because of my focus on certain types of "embedded systems".

The example you wrote, 10 ALLOT <other_stuff> 10 ALLOT shows that there is no simple solution to mixing mutable and non-mutable code. I'm not suggesting that a magic solution is possible, something that works for everybody without breaking anything. What I'm looking for is some clarity with regard to the standard (alignment of assumptions about machine architectures).

@ruv: thanks for the references!

3.3.3.2 makes it clear that memory contiguousness takes precedence over implied mutability (I assume that most programmers will expect CREATE array 8 ALLOT to produce a mutable (but uninitialized) array).

3.3.3.3 is a clear warning that VARIABLE array 6 ALLOT is not the safe harbor that I had hoped it is. That makes "clearly non-standard" solutions like @MitraArdron's vCREATE and vALLOT or @StephenPelc's CDATA, IDATA and UDATA much more attractive.

My solution has been to use a global target defining word (RAM or NVM for ROM space) that controls the behavior of VARIABLE and ALLOT. This works pretty well, unless someone uses CREATE (which will produce a mutable array only in RAM mode - but that can maybe be fixed by generalizing the solution from VARIABLE).

@MitchBradley: thanks for sharing your approach! I also first thought of something as radical as what you describe (and Dr. Ting had proposed something similar when he presented the code I now work with). Using BUFFER: and a set of constants instead of VARIABLE is possible but it requires a special programming style which wouldn't be a good fit for "regular" targets.

MitraArdron

@TG: Harvard vs. von Neumann really doesn't capture modern µCs. In a threaded code system such as webForth, for example, it's irrelevant, as the C compiler takes care of where the code sits, and the dictionary looks like data. What is important is that, whether the data is mixed with code or not, there are three kinds of data: ROM/Flash etc., which survives reboot and can't be changed; RAM, which needs initializing and can be changed; and optionally EEPROM or similar, which can be changed and survives reboot, but is probably in (very) short supply.

As Anton says, "Rommable systems - are outside the scope of the standard" which in practice means that while it is possible to build a system that is both Forth2012 compliant and rommable, it requires extra non-standard implementation, and a high likelihood that any system level packages are not going to be interoperable.

@TG - the problem with a simplistic memory type specifier like CDATA IDATA is that sometimes this decision happens at the source level, but sometimes it has to be inside the word definition, i.e. VARIABLEs always go in RAM and CONSTANTs in ROM, so you'd have to define : VARIABLE ... UDATA CREATE ... ; and : CONSTANT ... CDATA CREATE ... ; so that specifying at the source code level is going to be overridden - or at least needs careful standardisation (which won't happen as long as the standards committee thinks it's outside scope).

@TG - am I missing something in your example CREATE array 8 ALLOT? I think contiguousness just means that I expect those 8 cells to be contiguous; I don't care where that is relative to the definition of array, only that executing array gets me a pointer to it.
