6.2.0855 C" c-quote CORE EXT

Interpretation:

Interpretation semantics for this word are undefined.

Compilation:

( "ccc<quote>" -- )

Parse ccc delimited by " (double-quote) and append the run-time semantics given below to the current definition.

Run-time:

( -- c-addr )

Return c-addr, a counted string consisting of the characters ccc. A program shall not alter the returned string.

See:

Rationale:

Typical use: : X ... C" ccc" ... ;

See: A.3.1.3.4 Counted strings.
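
The typical use above can be sketched as follows. This is a minimal illustration, not part of the standard text; GREETING and .GREETING are hypothetical names chosen for the example. It shows that the c-addr returned at run time addresses the count character, and that COUNT converts it to the c-addr u form.

```forth
\ Sketch only: GREETING and .GREETING are hypothetical names.
: GREETING ( -- c-addr )  C" hello" ;        \ c-addr addresses the count character
: .GREETING ( -- )  GREETING COUNT TYPE ;    \ COUNT ( c-addr1 -- c-addr2 u )

GREETING C@ .    \ displays 5, the length held in the count character
.GREETING        \ displays hello
```

Because a program shall not alter the returned string, the sketch only reads from it.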

Testing:

T{ : cq1 C" 123" ; -> }T
T{ : cq2 C" " ;    -> }T
T{ cq1 COUNT EVALUATE -> 123 }T
T{ cq2 COUNT EVALUATE ->     }T

Contributions

LSchmidt [226] c-addr used in stack diagrams (Request for clarification, 2022-03-06 21:06:16)

About the representation of counted strings on the stack, A.3.1.3.4 Counted strings says:

Forth 94 moved toward the consistent use of the "c-addr u" representation of strings on the stack.

While I like consistency, it appears that what was gained is a potential for confusion: c-addr appears to be used to indicate the address of the count byte, as it is done here. But elsewhere, for example with s", it represents the address of the first character. It isn't obvious from the stack diagrams that those identical-looking c-addr refer to different kinds of data.

When pointing to the leading count byte of a string, the "c" in "c-addr" makes sense. But when pointing to the first character, it doesn't really. Does it imply that there's a count byte at c-addr-1? Not necessarily, I think. Then what makes such a string of characters "counted"? Shouldn't, by similar reasoning, the source address of move also be "counted"? After all, there's also the count of items to move on the stack, similar to the c-addr u representation of a string when c-addr points to the first character.

As I see it, there are three cases where c-addr is used to depict stack effects:

  • c-addr indicating the address of a leading count byte
  • c-addr indicating the address of first character in a string with a leading count byte at c-addr-1
  • c-addr indicating the address of first character in a string without a leading count byte.

In my opinion, these three different cases should not share the same stack symbol.
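
For readers following the thread, the three cases listed above can be reproduced with a short sketch. The words CASE1, CASE2 and CASE3 are hypothetical names used only for illustration, and the comments describe what the standard guarantees in each case.

```forth
\ Hypothetical words for illustration only.
: CASE1 ( -- c-addr )    C" abc" ;         \ c-addr addresses the count character
: CASE2 ( -- c-addr u )  C" abc" COUNT ;   \ c-addr addresses the first character;
                                           \ the count character sits one char before it
: CASE3 ( -- c-addr u )  S" abc" ;         \ c-addr addresses the first character;
                                           \ no count character is guaranteed

CASE1 C@ .                    \ displays 3 (the count)
CASE2 DROP C@ EMIT            \ displays a
CASE2 DROP 1 CHARS - C@ .     \ displays 3 (the count character again)
CASE3 DROP C@ EMIT            \ displays a
```

In all three definitions the stack diagram shows the same symbol, c-addr; only the prose (here, the comments) says what the address points to.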

AntonErtl

As described in Section 3.1, c-addr refers to a character-aligned address. It may point to any character, among them the count character (generalization of "count byte", but on byte-addressed machines it usually is a byte) of a counted string, or any content character of a string in any representation. Currently the stack effect notation does not tell you what c-addr points to, you have to read the prose.

There have been repeated discussions on whether we should put this knowledge into the stack effect notation by having a separate type name for strings represented as c-addr u. Your question points to having a separate type name for counted strings as well. In the past these discussions have always been decided in favour of not changing anything; my memory is that the discussion was more focussed on replacing "c-addr u" with a single name (maybe "str"), but the prose often refers to the c-addr and u part, so having "str" would complicate that. There has also been the observation that the current stack effects have not led to confusion, but your request is counterevidence for that claim. What we might do to reduce the confusion is to have a name (maybe "cstr") for counted string addresses, and names (maybe "str-addr" and "str-u") for the components of c-addr u strings. While we are at it, we could also introduce names (maybe "buf-addr" and "buf-u") for buffers (memory areas for holding strings, typically used as destination addresses in words; e.g., MOVE might be described with the stack effect ( str-addr buf-addr str-u -- )). I leave this request open for future discussion of these questions by the committee.

Back to your questions:

c-addr does not always (nor usually) refer to a counted string, and therefore does not imply that there is a count byte anywhere. c-addr may address an isolated character somewhere, it may address any character in any form of string, and it may address the count character in a counted string; just because the latter case exists does not mean that c-addr always refers to the count character.

I don't know what similar reasoning you have in mind, but the specification of MOVE clearly states that it moves u characters, where u is given on the stack, not in a count character of any of the operands. Indeed, usually the source operand is not a counted string.
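
As a rough illustration of this point, the following sketch copies a string whose length comes only from the stack. BUF and COPY-DEMO are hypothetical names. Note one assumption made explicit in the code: in Forth-2012 the u taken by MOVE counts address units, so the sketch converts the character count with CHARS.

```forth
\ BUF and COPY-DEMO are hypothetical names for this sketch.
CREATE BUF  16 CHARS ALLOT         \ destination buffer, 16 characters

: COPY-DEMO ( -- )
   S" abc"          ( c-addr u )      \ source: not a counted string
   BUF SWAP         ( c-addr buf u )  \ MOVE expects ( addr1 addr2 u )
   CHARS MOVE ;                       \ u characters -> address units

COPY-DEMO  BUF 3 TYPE              \ displays abc
```

Neither operand carries a count character; the length travels separately on the stack.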

A fourth case is c@, which most clearly demonstrates that c-addr just refers to a character, irrespective of what the purpose of the character is and how the memory around it is used.

ruv

My two cents in addition to Anton's answer.

Forth 94 moved toward the consistent use of the "c-addr u" representation of strings on the stack.

it appears that what was gained is a potential of confusion: c-addr appears to be used to indicate address of count byte, as it is done here. But elsewhere, for example with s" it represents the address of the first character. It isn't obvious from the stack diagrams that those identically looking c-addr refer to different kind of data.

1. The phrase "c-addr u" representation of strings on the stack doesn't refer to stack diagrams, but to the actual data stack, i.e., how strings are represented on the data stack. Stack diagrams just reflect the data types of the stack items.

2. The c in c-addr doesn't mean "counted"; it means "character-aligned". So c-addr means character-aligned address, a-addr means aligned (cell-aligned) address, and addr means any address (including unaligned ones). c-addr, a-addr and addr are data type symbols that refer to addresses, but they say nothing about the data stored at the address.

The corresponding data type relations are: a-addr ⇒ c-addr ⇒ addr ⇒ u

See also: 3.1 Data types, 3.1.3.3 Addresses, 3.3.3.1 Address alignment.

To better understand the alignment problem, imagine a plausible Forth system (one that Forth-2012 allows) in which a character takes four address units and a cell takes eight address units. In such a system the phrase create x -1 , x 1+ c@ may throw an exception triggered by the CPU, since c@ reads from an address that is not character-aligned. So the standard implicitly forbids a standard program from performing such a phrase unconditionally, and it does so via data types and stack diagrams that use a special notation (see 2.2.2 Stack notation).
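
A portable variant of the phrase above might look like the sketch below. It is only an illustration under the assumption that the intent is to read the second character of the stored data; it uses CHAR+ so that the address stays character-aligned on any standard system. Two cells are allotted so that the second character lies inside X even if a cell is only one character wide.

```forth
CREATE X  -1 , -1 ,     \ two cells with all bits set; X is cell-aligned

X C@ .                  \ portable: an a-addr is also a valid c-addr
X CHAR+ C@ .            \ portable: CHAR+ ( c-addr1 -- c-addr2 ) keeps alignment
\ X 1+ C@ .             \ not portable: 1+ adds one address unit, so the
                        \ result is only an addr, not necessarily a c-addr
```

The displayed values depend on the character size of the system, so none are shown here; both fetches return the same value because every bit in the stored cells is set.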
