Digest #136 2021-03-13
The description "u is the contents of the character at c-addr1" seems to imply that the length is the size of one character and hence limited to 255. If a Forth implementation had a cell before the characters of the string, and hence a theoretical string size of max-int, would that still be a standard system?
I think the systems being referred to are those based on Ting's eForth; these were the only two conflicts I found where the same name is used with different behavior. On eForth, <# # #S #> work on single-precision numbers, and CHARS outputs a character a given number of times. When I was converting webforth from eForth to Forth 2012, both of those conflicts bit me as well.
Someone (and it seems that you, MitraArdron, feel the itch) could add comments to # and CHARS warning eForth users that eForth uses non-standard meanings for these words.
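For comparison, here is a minimal sketch of the standard behavior of these words, assuming a Forth 2012 system in decimal base. The names u>str and line-buf are made up for illustration.

```forth
\ Standard pictured numeric output works on a double-cell number,
\ so a single-cell unsigned value is first extended with a zero high cell.
: u>str ( u -- c-addr len )  0 <# #S #> ;

\ Standard CHARS is a size conversion ( n1 -- n2 ): it turns a count of
\ characters into address units and outputs nothing.
CREATE line-buf  80 CHARS ALLOT

123 u>str TYPE
```

Code written for eForth that passes a single-cell number to <# ... #>, or uses CHARS to emit characters, would need changes along these lines on a standard system.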
Concerning the question in the title, IMO BLOCK must not do that.
Concerning your example: If we assume that nothing between the second call to BUFFER and the call to BLOCK invalidates the buffer, it should produce "MASS STORAGE NOT READ".
However, 7.3.2 contains some caveats about buffer invalidation, including at REFILL and parsing. Given that there is parsing and refilling happening from the text interpreter between these calls, the buffer may be invalidated, and BLOCK may read the block from mass storage, resulting in "MASS STORAGE READ".
Actually, there is parsing between the first call to BUFFER and the UPDATE, so there is no guarantee that "MASS STORAGE READ" reaches mass storage.
The way to deal with that is to UPDATE inside PREPARE_BUFFER.
AFAIK these provisions are there for cooperative multi-tasking systems that may task-switch on REFILL (and apparently also on parsing), and where a different task might require a buffer, which might invalidate an existing buffer.
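A minimal sketch of what "UPDATE inside PREPARE_BUFFER" could look like. The original definition of PREPARE_BUFFER is not shown here, so the stack effect and body below are assumptions; the sketch also assumes 1024-character blocks and a system with block storage available.

```forth
\ Hypothetical PREPARE_BUFFER: copies a string into the buffer for block blk
\ and marks it as updated before control returns to the text interpreter.
: PREPARE_BUFFER ( c-addr u blk -- )
   BUFFER            \ assign a buffer to blk without reading mass storage
   DUP 1024 BLANK    \ clear the whole buffer to spaces
   SWAP MOVE         \ copy the u characters from c-addr into the buffer
   UPDATE ;          \ mark the buffer modified while it is still valid
```

Because BUFFER, the copy, and UPDATE all happen inside one definition, no parsing or REFILL can intervene between obtaining the buffer address and marking it as modified.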
I'm simultaneously intrigued and disappointed. My system is far too resource constrained (like, 1980s-style computing constrained) to properly implement those optimizations, and I was tickled that Forth actually could do code like [ A 5 CELLS + ] LITERAL, leveraging the full power of Forth during the process of defining a word, almost like template meta-programming in C++, only less obtuse. I'll have to stick with 0ENDCASE for my system, though I might look into examining the previous instruction when encoding a DROP.
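As a small illustration of the [ A 5 CELLS + ] LITERAL idiom, here is a sketch in standard Forth. The array A and the word A5 are hypothetical names chosen for the example.

```forth
\ A is assumed to be a 10-cell array; the name is only for illustration.
CREATE A  10 CELLS ALLOT

\ The address arithmetic runs once, while A5 is being compiled;
\ at run time A5 just pushes the precomputed address.
: A5 ( -- a-addr )  [ A 5 CELLS + ] LITERAL ;

42 A5 !
```

The computation between [ and ] costs nothing at run time, which is why it remains attractive even on a system too small for an optimizing compiler.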
I'll have to stick with 0ENDCASE for my system
ESAC is a better name. It's shorter, it's used in some other languages as the pair to CASE, and it has already been used with exactly this meaning in Forth publications.
CASE ... OF ... ENDOF ... 0 ENDCASE
Stretching Forth prefers CASE ... OF ... ELSE ... ESAC.
: ESAC POSTPONE FALSE POSTPONE ENDCASE ; IMMEDIATE
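A short usage sketch, assuming a standard system that provides FALSE (Core extensions). The word classify is a made-up example; its default branch consumes the selector itself, and ESAC supplies the dummy value that ENDCASE drops.

```forth
: ESAC  POSTPONE FALSE POSTPONE ENDCASE ; IMMEDIATE

\ Hypothetical example: maps 1 and 2 to fixed values, negates anything else.
: classify ( n1 -- n2 )
   CASE
      1 OF 10 ENDOF
      2 OF 20 ENDOF
      NEGATE          \ default: consumes the selector, leaves the result
   ESAC ;             \ compiles FALSE, then ENDCASE drops that FALSE
```

With a plain ENDCASE the default branch would have to keep the selector on top of the stack for ENDCASE to drop, which usually takes extra stack shuffling.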
the implementer of POSTPONE (me in this case) has to write a definition that checks if THEN and branch are immediate and behaves differently for the two cases.
If you implement immediacy via the corresponding flag then yes, you have to check it. Perhaps not in the definition of POSTPONE directly, but it has to be checked somewhere. And that was the initial point: to move the problem of choosing between COMPILE and [COMPILE] from the programmers to the system. This solution has some edge cases, but they are very rare in practice.
: ENDIF POSTPONE THEN ; IMMEDIATE
: AGAIN POSTPONE branch <RESOLVE ; IMMEDIATE
: foo .... ENDIF .... AGAIN ;
THEN to be compiled into ENDIF, while the second causes branch to be compiled into
I can suggest another point of view. In Forth we can (with some obvious reservations) replace the use of a word by its definition. If a word is immediate then its definition is placed in square brackets when it's used in compilation state (with reservation concerning control flow and setting compilation state).
Let's assume that the interpretation semantics of POSTPONE is to take the word in the argument and perform its compilation semantics.
Then your definition of
: foo .... ENDIF .... AGAIN ;
is equivalent to
: foo .... [ POSTPONE THEN ] .... [ POSTPONE branch <RESOLVE ] ;
That's equivalent to
: foo .... THEN .... branch [ <RESOLVE ] ;
(since performing compilation semantics is what the Forth text interpreter does when it encounters a word in compilation state)
It looks as if both words are just syntactically placed into foo. And what is compiled into foo depends on these words.
So we can think about it as if the argument of POSTPONE (or, better, its uniquely named synonym) is always syntactically placed instead of the containing word (e.g., into the definition of the target word), and is encountered by the Forth text interpreter. What is compiled depends on this word: if it's a regular word, it is compiled as is; if it's an immediate word, it is executed and can compile some other words.
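Both cases can be tried with standard words only. This is a sketch: my-abs, compile-dup and twice are hypothetical names, and DUP stands in for the system-specific branch as the non-immediate word.

```forth
\ Immediate argument: THEN is executed when ENDIF is used inside my-abs,
\ so it resolves the IF exactly as a literal THEN would.
: ENDIF  POSTPONE THEN ; IMMEDIATE
: my-abs ( n -- u )  DUP 0< IF NEGATE ENDIF ;

\ Regular argument: DUP is simply compiled into twice,
\ as if it had been written there directly.
: compile-dup  POSTPONE DUP ; IMMEDIATE
: twice ( x -- x x )  compile-dup ;
```

In both definitions the result is what you would get by writing the postponed word in place of the containing immediate word.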
Yes, in a standard program a counted string can be at most 255 chars long. On a system with a larger char size (e.g., on a word-addressed machine) the count (and therefore counted strings) can be larger, but a standard program cannot rely on that.
A Forth system with a COUNT equivalent to the following
: count dup cell+ swap @ ;
would be non-standard unless 1 cells gives 1 on this system.
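For contrast, a COUNT with the standard behavior reads a single character as the length. The sketch below uses the hypothetical names my-count and greeting.

```forth
\ Standard-conforming equivalent of COUNT: the length is one character,
\ stored at c-addr1, and the text starts one character later.
: my-count ( c-addr1 -- c-addr2 u )  DUP CHAR+ SWAP C@ ;

\ A counted string built by hand: length character, then the text.
CREATE greeting  5 C,  CHAR h C,  CHAR e C,  CHAR l C,  CHAR l C,  CHAR o C,

greeting my-count TYPE
```

Because the length is fetched with C@ rather than @, it can never exceed what a single character can hold.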
I recommend using the c-addr u representation (where only the addressable and available memory limits string length) instead of counted strings. I don't use counted strings.
So just to confirm my understanding: there is nothing non-standard about using, say, PARSE and getting a c-addr u with u > 255?