Digest #135 2021-03-07
I am the author of the block word set test in Gerry Jackson's forth2012-test-suite. An issue has been raised against one of the tests because it assumes that BLOCK does not overwrite the buffer from mass storage if the buffer has already been allocated by BUFFER. The author of the issue is Francois Laagel, who is coming from an implementation of FORTH-79. The FORTH-79 definition of BLOCK is:
Leave the address of the first byte in block n. If the block is not already in memory, it is transferred from mass storage into whichever memory buffer has been least recently accessed. If the block occupying that buffer has been UPDATEd (i.e. modified), it is rewritten onto mass storage before block n is read into the buffer. n is an unsigned number. If correct mass storage read or write [is] not possible, an error condition exists. Only data within the latest block referenced by BLOCK is valid by byte address, due to sharing of the block buffers.
(Sorry - I hit submit by accident. Continuing ...)
Francois has an implementation of FORTH79 which keeps track of whether an assigned buffer has been sourced from the mass storage or not. The BLOCK word uses this to detect a buffer that has been allocated by BUFFER, but not read. In such cases, it will do the transfer from mass storage.
As I read the FORTH79 (and FORTH83) standard, this seems like a compliant implementation since the standards do not say what should happen in BLOCK if a buffer is already assigned. However, ANS FORTH seems to me to have tightened up here and states:
If block u is already in a block buffer, a-addr is the address of that block buffer.
As I read the ANS FORTH standard, it does not say that the block buffer may be altered in any way if the block is already in a block buffer. In my head I am applying a meta rule that says "and nothing else changes" (without which all language standards would be 10 times as long and 100 times less intelligible).
Therefore, I would expect a definition of BLOCK which transfers from mass storage when the block is already in a block buffer to be non-compliant with ANS FORTH.
Sometimes, I find a small code example can illustrate the point much better. Consider the following:
    ( Prepare a buffer in memory with some contents )
    : PREPARE_BUFFER ( blk c-addr u -- )
       ROT BUFFER DUP 1024 BL FILL
       SWAP 1024 MIN CMOVE ;

    ( Test whether BLOCK forces a read from mass storage )
    : BLOCK_FORCED_READ? ( blk -- )
       EMPTY-BUFFERS                                           \ Establish known starting condition
       DUP S" MASS STORAGE READ" PREPARE_BUFFER UPDATE FLUSH   \ Prepare mass storage
       DUP S" MASS STORAGE NOT READ" PREPARE_BUFFER            \ Prepare buffer
       DUP BLOCK DROP                                          \ Does BLOCK read from mass storage? ...
       LIST ;

    20 BLOCK_FORCED_READ?
Is the LIST output required by the ANS Standard to contain "MASS STORAGE READ" or "MASS STORAGE NOT READ" or are both allowed?
Sure - as I said, precisely explained but essentially incomprehensible.
Yes, I agree there are multiple ways of implementing this, but this word explicitly replaces COMPILE in a way that is VERY hard to follow, with no examples. My example referred to cases where the code is implemented as a threaded dictionary (old-style). In that case I assert that you do still need to know whether a word is immediate when using POSTPONE (i.e. I want it to compile "branch" into foo, but to execute THEN), and that POSTPONE has to check this. You've used NAME>COMPILE to effectively do that checking, and NAME>COMPILE is, as far as I can tell, not relevant in systems that only use a single XT.
It's a pity if, as you suggest, the Standards Committee doesn't really want this standard to be clear and implementable; that might explain why a lot of implementations still use older standards or eForth :-(
I think it's the other way round: Because many people think in terms of immediate-flag systems, the standard appears hard to understand to them. Maybe we need another Bill Muench and another C.H.Ting to code up and produce educational material for a system that's more in line with standard concepts.
NAME>COMPILE is a standard word and is relevant to single-xt+immediate-flag systems, too, but yes, on such a system it will check the immediate flag.
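A small sketch may help show what POSTPONE is required to do from the user's side, whichever way the system tracks immediacy internally. The word names MY-THEN, 2DUP+, DEMO and NEG? are made up for this illustration; only POSTPONE, IMMEDIATE and the usual core words are standard.

```forth
\ THEN is immediate: POSTPONE THEN makes MY-THEN perform THEN's
\ compilation semantics (resolve the branch) when MY-THEN runs.
: MY-THEN  POSTPONE THEN ; IMMEDIATE

\ 2DUP and + are not immediate: POSTPONE makes 2DUP+ append their
\ execution semantics to whatever definition is being compiled.
: 2DUP+  POSTPONE 2DUP POSTPONE + ; IMMEDIATE

: DEMO ( a b -- a b a+b )  2DUP+ ;
: NEG? ( n -- flag )  0< IF TRUE ELSE FALSE MY-THEN ;

3 4 DEMO . . .    \ prints 7 4 3
-5 NEG? .         \ prints -1
```

On a single-xt system with an immediate flag, the same observable behaviour is obtained by testing the flag inside POSTPONE (or inside NAME>COMPILE); the user-visible result does not depend on which approach is taken.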
Could I suggest that the definition be clarified to at least comment that many earlier versions of Forth used the same words to work on single width numbers.
I think this is one of the few cases in Forth2012 where the same name is used with a significantly different stack effect from other common Forth versions.
CHARS is another example.
Interesting, this definition is inherited directly from the ANS Forth document (1994). The previous standard Forth '83 has the definition:
# +d1 -- +d2 79 "sharp"
The remainder of +d1 divided by the value of BASE is converted to an ASCII character and appended to the output string toward lower memory addresses. +d2 is the quotient and is maintained for further processing. Typically used between <# and #> .
While the Forth '79 standard has:
# ud1 -- ud2 158
Generate from an unsigned double-number d1, the next ASCII character which is placed in an output string. Result d2 is the quotient after division by BASE and is maintained for further processing. Used between <# and #>. "sharp"
Yes, it does have ud2 in the signature and refers to d2 in the definition; at least they cleared that up in the '83 standard.
Therefore any system using single-cell numbers for # is not and never has been standard. I see no need to refer to old non-standard systems.
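For readers coming from systems where # took a single cell, a minimal sketch of the standard usage follows: a single-cell unsigned value is first widened to a double by pushing a zero high cell. The names U>STRING and .2DIGITS are invented for this example.

```forth
\ Convert an unsigned single-cell number to a string in the current BASE.
\ The 0 supplies the high cell so that <# ... #> sees a double number.
: U>STRING ( u -- c-addr len )  0 <# #S #> ;

\ Print exactly two digits, zero padded, using # twice.
: .2DIGITS ( u -- )  0 <# # # #> TYPE ;

DECIMAL
1234 U>STRING TYPE   \ prints 1234
7 .2DIGITS           \ prints 07
```

Each # consumes and returns a double (ud1 -- ud2), which is why #> expects a double to discard at the end.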
I fail to see how CHARS can be considered another example as CHARS was introduced in the ANS Forth Standard document and this definition is inherited directly from that document.
I agree with Gerry: neither the test nor the description is unambiguous (only the implementation is, which is great to see, but it shouldn't be relied on). IMHO the spec should be clear about prepend vs. append.
: 0ENDCASE [ 0 ] LITERAL POSTPONE ENDCASE ; IMMEDIATE ?
It would normalize/reserve the idea of a 0ENDCASE word, and systems could implement it in a more efficient way than pushing and immediately dropping a 0.
: 0ENDCASE 0 POSTPONE LITERAL POSTPONE ENDCASE ; IMMEDIATE
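As a usage sketch, assuming the intended use is a default clause that has already consumed the selector, the definition above could be exercised like this. DESCRIBE is a hypothetical word made up for the illustration.

```forth
: 0ENDCASE  0 POSTPONE LITERAL POSTPONE ENDCASE ; IMMEDIATE

: DESCRIBE ( n -- )
   CASE
      1 OF ." one" ENDOF
      2 OF ." two" ENDOF
      ." other: " .      \ default clause consumes the selector with .
   0ENDCASE ;            \ compiles 0 so that ENDCASE has something to drop

1 DESCRIBE   \ prints one
5 DESCRIBE   \ prints other: 5
```

Without the compiled 0, the DROP performed by ENDCASE would remove an item belonging to the caller whenever the default clause runs.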
Actually, in a number of modern Forth implementations, 0 DROP generates no code, and
: foo case 0 endcase ;
generates code for a word that just returns. I just tested this with development Gforth, iForth, SwiftForth, and VFX, and only SwiftForth generates more code. Doing this optimization (and similar ones) is relatively simple and explained in