Digest #176 2022-03-09
13.3.3.2.d seems to contradict 13.3.3.2.g, which says "... including do-loops", given that the run-time semantics of DO and LOOP involve placing loop-sys on the return stack and removing it again. The way I read 13.3.3.2.d, the locals would not be accessible within the loop, while 13.3.3.2.g explicitly says that they are. Perhaps I should interpret this to mean that some words that manipulate the return stack (e.g., DO and LOOP) are special cases that still allow access to locals, but if that is the case, I believe the specification should state it explicitly.
The specification does not say explicitly whether the part of the data stack that was not loaded into locals remains accessible while the definition executes. I can think of at least one implementation of locals that simply records a "frame pointer" (e.g., with gforth's SP@) to the current data stack position, leaves the locals in place on the data stack with read/write access through direct addressing, and on exit from the word relocates any return values that were placed on top of them. The specification strongly implies that such an implementation, while "efficient", would not be compliant, but I think it would help if it stated explicitly, one way or the other, whether accessing (or even dropping) data stack items below those moved into locals is allowed.
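To make the frame-pointer implementation described above concrete, here is a minimal C sketch, assuming a data stack that grows upward in a plain array. DataStack, enter_frame, local_at, exit_frame and sum_diff are hypothetical names for illustration, not gforth internals. It shows both properties in question: the locals stay in place and are addressed relative to the frame pointer, and the caller's items below the frame remain reachable by ordinary indexing.

```c
#include <assert.h>
#include <string.h>

#define STACK_DEPTH 64

/* Hypothetical data stack: grows upward, sp indexes the next free cell. */
typedef struct {
    long cells[STACK_DEPTH];
    int sp;
} DataStack;

static void push(DataStack *s, long v) {
    assert(s->sp < STACK_DEPTH);
    s->cells[s->sp++] = v;
}

static long pop(DataStack *s) {
    assert(s->sp > 0);
    return s->cells[--s->sp];
}

/* On entry the parameters stay where they are; the frame pointer only
   records where the deepest one lives.  Local i is cells[fp + i]. */
static int enter_frame(const DataStack *s, int nlocals) {
    assert(s->sp >= nlocals);
    return s->sp - nlocals;
}

/* Read/write access to a local through direct addressing. */
static long *local_at(DataStack *s, int fp, int i) {
    return &s->cells[fp + i];
}

/* On exit, slide the nresults values on top of the stack down over the
   locals, so the caller sees only the results. */
static void exit_frame(DataStack *s, int fp, int nresults) {
    memmove(&s->cells[fp], &s->cells[s->sp - nresults],
            (size_t)nresults * sizeof s->cells[0]);
    s->sp = fp + nresults;
}

/* A word with two locals a and b:  ( a b -- a+b a-b ) */
static void sum_diff(DataStack *s) {
    int fp = enter_frame(s, 2);
    long a = *local_at(s, fp, 0);
    long b = *local_at(s, fp, 1);
    push(s, a + b);
    push(s, a - b);
    exit_frame(s, fp, 2);
    /* cells[0 .. fp-1], the caller's items below the locals, were
       reachable by plain indexing the whole time. */
}
```

With 99 7 3 on the stack, calling sum_diff leaves 99 10 4: the two results have been moved down over the locals, and the 99 below the frame was never touched, although nothing in the scheme would have prevented the word from reading or dropping it.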
As described in Section 3.1, c-addr refers to a character-aligned address. It may point to any character, among them the count character (a generalization of "count byte"; on byte-addressed machines it usually is a byte) of a counted string, or any content character of a string in any representation. Currently the stack effect notation does not tell you what c-addr points to; you have to read the prose.
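A small C sketch of the kinds of c-addr mentioned above, assuming a byte-addressed system where a character is one byte. The function count mirrors the standard word COUNT; the names c_addr, str_equals and greeting are illustrative. The same buffer yields a c-addr of the count character, a c-addr of the first content character, and a c-addr of an arbitrary character in the middle, and nothing in the type distinguishes them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: byte-addressed system, one character = one byte, so a
   c-addr is modelled as a pointer to unsigned char. */
typedef const unsigned char *c_addr;

/* Mirrors COUNT ( c-addr1 -- c-addr2 u ): c-addr1 addresses the count
   character of a counted string; c-addr2 u is the same string in the
   c-addr u representation (first content character plus length). */
static void count(c_addr c_addr1, c_addr *c_addr2, size_t *u) {
    assert(c_addr1 != NULL);
    *u = c_addr1[0];
    *c_addr2 = c_addr1 + 1;
}

/* True if the c-addr u string has exactly the contents of text. */
static int str_equals(c_addr s, size_t u, const char *text) {
    return strlen(text) == u && memcmp(s, text, u) == 0;
}

/* Counted string "hello": one count character, then five content characters. */
static const unsigned char greeting[] = { 5, 'h', 'e', 'l', 'l', 'o' };
```

For example, count(greeting, &s, &u) yields s == greeting + 1 and u == 5, while greeting + 4 with u == 2 is the string "lo", a perfectly good c-addr u pair that does not start right after any count character.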
There have been repeated discussions on whether we should put this knowledge into the stack effect notation by having a separate type name for strings represented as c-addr u. Your question suggests also having a separate type name for counted strings. In the past these discussions have always been decided in favour of not changing anything; my memory is that the discussion was more focussed on replacing "c-addr u" with a single name (maybe "str"), but the prose often refers to the c-addr and u parts separately, so having "str" would complicate that. There has also been the observation that the current stack effect notation has not led to confusion, but your request is counterevidence for that claim. What we might do to reduce the confusion is to have a name (maybe "cstr") for counted string addresses, and names (maybe "str-addr" and "str-u") for the components of c-addr u strings. While we are at it, we could also introduce names (maybe "buf-addr" and "buf-u") for buffers (memory areas for holding strings, typically used as destination addresses in words; e.g., MOVE might be described with the stack effect ( str-addr buf-addr str-u -- )). I leave this request open for future discussion of these questions by the committee.
Back to your questions:
c-addr does not always (or even usually) refer to a counted string, and therefore does not imply that there is a count byte anywhere. c-addr may address an isolated character somewhere, any character in any form of string, or the count character of a counted string; just because the latter case exists does not mean that c-addr always refers to a count character.
I don't know what similar reasoning you have in mind, but the specification of MOVE clearly states that it copies u consecutive address units, where u is given on the stack, not in a count character of either operand. Indeed, the source operand usually is not a counted string.
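To illustrate, a C sketch of MOVE's semantics, assuming one address unit is one byte; forth_move is a hypothetical name. The length comes only from the parameter, and the source here is the middle of an ordinary C string, so no count character is involved anywhere.

```c
#include <assert.h>
#include <string.h>

/* Mirrors MOVE ( addr1 addr2 u -- ): copy u address units from addr1 to
   addr2.  The count is a parameter (the stack item), never read from the
   memory at either address.  Assumes one address unit is one byte.
   memmove is used because MOVE must also work for overlapping regions. */
static void forth_move(const void *addr1, void *addr2, size_t u) {
    assert(u == 0 || (addr1 != NULL && addr2 != NULL));
    if (u > 0)
        memmove(addr2, addr1, u);
}
```

Calling forth_move(text + 6, buf, 5) with text = "hello world" copies "world" into buf; the source address points at the 'w', not at any count character.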
A fourth case is c@, which most clearly demonstrates that c-addr just refers to a character, irrespective of what the purpose of the character is and how the memory around it is used.
My two cents in addition to Anton's answer.
Forth 94 moved toward the consistent use of the "c-addr u" representation of strings on the stack.
It appears that what was gained is a potential for confusion: c-addr appears to be used to indicate the address of the count byte, as it is done here. But elsewhere, for example with s", it represents the address of the first character. It isn't obvious from the stack diagrams that those identical-looking c-addrs refer to different kinds of data.
1. The phrase «"c-addr u" representation of strings on the stack» doesn't refer to stack diagrams, but to the actual data stack, i.e., to how strings are represented on the data stack. Stack diagrams just reflect the data types of the stack items.
c-addr doesn't mean "counted"; it means "character-aligned". So:
c-addr means character-aligned address,
a-addr means aligned (cell-aligned) address,
addr means any address (including unaligned).
a-addr, c-addr and addr are data type symbols that refer to addresses, but they say nothing about the data stored at the address.
The corresponding data type relations (where "i ⇒ j" means that type i is a subtype of type j) are:
a-addr ⇒ c-addr ⇒ addr ⇒ u
To better understand the alignment problem, just imagine a plausible Forth system (one that Forth-2012 allows) in which a character takes four address units and a cell takes eight address units. In such a system the phrase "create x -1 , x 1+ c@" may throw an exception triggered by the CPU: x 1+ is one address unit past x, so c@ reads from an address that is not character-aligned. So the standard implicitly disallows a standard program from unconditionally performing such a phrase, and it does so via data types and stack diagrams that use a special notation (see 2.2.2 Stack notation).
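A minimal C sketch of that hypothetical system, modelling addresses as offsets in address units. CHAR_AU, CELL_AU, MEM_AU, au_addr, is_c_addr, is_a_addr and c_fetch are illustrative names, and the CPU trap is modelled as a false return rather than a real exception. It also checks the a-addr ⇒ c-addr relation from the list above for this layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout from the example: 1 character = 4 address units,
   1 cell = 8 address units; MEM_AU is the size of the simulated memory. */
enum { CHAR_AU = 4, CELL_AU = 8, MEM_AU = 32 };

/* Addresses are modelled as offsets in address units. */
typedef size_t au_addr;

static bool is_c_addr(au_addr a) { return a % CHAR_AU == 0; } /* character-aligned */
static bool is_a_addr(au_addr a) { return a % CELL_AU == 0; } /* cell-aligned */

/* Simulated memory: one slot per character. */
static unsigned long chars[MEM_AU / CHAR_AU];

/* Models C@ on a CPU that traps on misaligned character access:
   returns false (standing in for the exception) instead of reading. */
static bool c_fetch(au_addr a, unsigned long *out) {
    assert(a < MEM_AU);
    if (!is_c_addr(a))
        return false;
    *out = chars[a / CHAR_AU];
    return true;
}
```

With x at a cell-aligned address, c_fetch(x + 1, ...) fails while c_fetch(x + CHAR_AU, ...) succeeds. A program that steps to the next character has to use CHAR+ (or CHARS) rather than 1+.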