6.2.0620 ?DO question-do CORE EXT

Interpretation:

Interpretation semantics for this word are undefined.

Compilation:

( C: -- do-sys )

Put do-sys onto the control-flow stack. Append the run-time semantics given below to the current definition. The semantics are incomplete until resolved by a consumer of do-sys such as LOOP.

Run-time:

( n1 | u1 n2 | u2 -- ) ( R: -- loop-sys )

If n1 | u1 is equal to n2 | u2, continue execution at the location given by the consumer of do-sys. Otherwise set up loop control parameters with index n2 | u2 and limit n1 | u1 and continue executing immediately following ?DO. Anything already on the return stack becomes unavailable until the loop control parameters are discarded. An ambiguous condition exists if n1 | u1 and n2 | u2 are not both of the same type.

See:

Rationale:

Typical use:

   : X ... ?DO ... LOOP ... ;

Testing:

DECIMAL

: qd ?DO I LOOP ;
T{   789   789 qd -> }T
T{ -9876 -9876 qd -> }T
T{     5     0 qd -> 0 1 2 3 4 }T

: qd1 ?DO I 10 +LOOP ;
T{ 50 1 qd1 -> 1 11 21 31 41 }T
T{ 50 0 qd1 -> 0 10 20 30 40 }T

: qd2 ?DO I 3 > IF LEAVE ELSE I THEN LOOP ;
T{ 5 -1 qd2 -> -1 0 1 2 3 }T

: qd3 ?DO I 1 +LOOP ;
T{ 4  4 qd3 -> }T
T{ 4  1 qd3 ->  1 2 3 }T
T{ 2 -1 qd3 -> -1 0 1 }T

: qd4 ?DO I -1 +LOOP ;
T{  4 4 qd4 -> }T
T{  1 4 qd4 -> 4 3 2  1 }T
T{ -1 2 qd4 -> 2 1 0 -1 }T

: qd5 ?DO I -10 +LOOP ;
T{   1 50 qd5 -> 50 40 30 20 10   }T
T{   0 50 qd5 -> 50 40 30 20 10 0 }T
T{ -25 10 qd5 -> 10 0 -10 -20     }T

VARIABLE qditerations
VARIABLE qdincrement

: qd6 ( limit start increment -- )    qdincrement !
   0 qditerations !
   ?DO
     1 qditerations +!
     I
     qditerations @ 6 = IF LEAVE THEN
     qdincrement @
   +LOOP qditerations @
;

T{  4  4 -1 qd6 ->                   0  }T
T{  1  4 -1 qd6 ->  4  3  2  1       4  }T
T{  4  1 -1 qd6 ->  1  0 -1 -2 -3 -4 6  }T
T{  4  1  0 qd6 ->  1  1  1  1  1  1 6  }T
T{  0  0  0 qd6 ->                   0  }T
T{  1  4  0 qd6 ->  4  4  4  4  4  4 6  }T
T{  1  4  1 qd6 ->  4  5  6  7  8  9 6  }T
T{  4  1  1 qd6 ->  1  2  3          3  }T
T{  4  4  1 qd6 ->                   0  }T
T{  2 -1 -1 qd6 -> -1 -2 -3 -4 -5 -6 6  }T
T{ -1  2 -1 qd6 ->  2  1  0 -1       4  }T
T{  2 -1  0 qd6 -> -1 -1 -1 -1 -1 -1 6  }T
T{ -1  2  0 qd6 ->  2  2  2  2  2  2 6  }T
T{ -1  2  1 qd6 ->  2  3  4  5  6  7 6  }T
T{  2 -1  1 qd6 -> -1  0  1          3  }T

ContributeContributions

JimPetersonavatar of JimPeterson [370] Return Stack Notation Mildly InaccurateRequest for clarification2024-12-03 21:05:42

The run-time stack effect is declared as:

( n1 | u1 n2 | u2 -- ) ( R: -- loop-sys )

but in actuality, the return stack may or may not have loop-sys added (unless I significantly misunderstand the function of this word). Is this more appropriate?:

( n1 | u1 n2 | u2 -- ) ( R: -- loop-sys | )

ruvavatar of ruv

Is this more appropriate?: ( n1|u1 n2|u2 -- ) ( R: -- loop-sys | )

There is no need to indicate the case when loop-sys is not placed on the return stack, because this case is already specified by the fact that loop-sys may take zero or more cells on the return stack, see 3.1.5.2 System-execution types.

JimPetersonavatar of JimPeterson

I understand what you're saying, but in terms of conveying information to the reader, I think some indication that there may not be loop control parameters to UNLOOP, presented in the stack notation, may be of use. I know that the text below it says as much, but ?DO having the same stack notation as DO just feels wrong or misleading.

I know, from a machine's perspective, the notation is technically correct, but I feel like this documentation is being written for humans, and they often require (or at least benefit from) a little more hand-holding. The change I suggest is also technically correct but more informative.

AntonErtlavatar of AntonErtl

You are correct, but originally failed to get your point across to me, and apparently to ruv, who addressed the issue that a loop-sys may have 0 items on the return stack. But yes, if n1|u1 = n2|u2, no loop-sys is pushed by the run-time semantics of ?do.

ruvavatar of ruv

JimPeterson, I now see what you mean. My argument about size of loop-sys is irrelevant.

that there may not be loop control parameters to UNLOOP

This is impossible. UNLOOP can only be used in a loop body. And if ?DO Run-time semantics do not place loop-sys, the loop body does not gain control.

  program1 ( S: u.limit u.initial ) ( R: 0*x ) ?DO ( S: 0*x ) ( R: loop-sys ) program2 ( R: loop-sys ) LOOP ( R: 0*x )

By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Take a look:

program1 ( S: param-type.1 ) program2 ( S: param-type.2 ) program3

When we specify a stack diagram for program2, which is ( param-type.1 -- param-type.2 ), we indicate:

  • the input parameter type for program2, which should be provided by program1,
  • the output parameter type of program2, which is available for program3,
  • we can indicate a case when program3 does not gain control, but we have to indicate in prose who will gain control and what parameters will be passed.

For example:

: ?return-true ( -- ) ]] if true exit then [[ ; immediate
  • A correct stack diagram for ?return-true Run-time semantics:
    • ( x -- )
  • A more narrow stack diagram for ?return-true Run-time semantics:
    • ( 0 -- | x\0 -- true never )
    • But this diagram does not tell us where the output parameter true is available (if any).

See also my post about the never data type.


NB: ( n1 | u1 n2 | u2 -- ) is incorrect due to excessive spaces, it's an editorial issue.

ruvavatar of ruv

  • A correct stack diagram for ?return-true Run-time semantics:
    • ( x -- )

It could be unclear why this diagram is correct if the ?return-true Run-time semantics do not return control in some cases.

The answer is that the data type never is a subtype of any other data type, and it is a subtype of 0*x.

( x -- )( x -- 0*x )( x -- 0*x|never )( x -- | x -- never )

A stack diagram by itself does not guarantee that a word returns control. It only guarantees that if the word returns control, it returns a parameter of specified type.

AntonErtlavatar of AntonErtl

By default, the part "after" of a stack diagram indicates only parameters that are available for the next code fragment. Thus, the diagram ( R: -- loop-sys | ) would be incorrect, because you are trying to indicate in "after" both a parameter that is available and that is not available to the next code fragment.

Counterexample: THROW

ruvavatar of ruv

Anton, thank you for this example. Yes, I'm wrong that it "would be incorrect". Formally, it is still correct. Because specifying a wider data type than possible does not introduce contradictions. But it does introduce confusing. I think that data types should be specified as narrow as possible.

The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other. This holds for the word throw because ( i*x n ) is never returned to the caller.

But specifying two data types in the union so that the members of one are sometimes returned and the members of the other are never returned is useless and just confusing.

A better diagram for throw is ( k*x 0 -- k*x | k*x n1\0 -- i*x n1 never ). This diagram explicitly says that the output parameter of the type ( i*x n1 ) (where the value that corresponds to n1 is the same in the input and in the output parameter) is not available to the caller.

A better diagram for ?do Run-time semantics is ( n1 n2 -- ; R: -- loop-sys ; | u1 u2 -- ; R: -- loop-sys ; | x1 x1 -- never ; ). This diagram specifies:

  • an ambiguous condition exists if the input parameter has type neither ( n n ) nor ( u u ),
  • if the first input parameter and the second input parameter are identical, then the loop body may not gain control.

ruvavatar of ruv

By "a wider data type than possible" I mean a data type that has more members than another suitable data type. How best to phrase that?

ruvavatar of ruv

The stack diagram ( k*x n -- k*x | i*x n ) is justified only if neither type of ( i*x n ) and ( k*x ) is a subtype of the other.

Actually, this always holds.

( k*x n -- k*x | i*x n )( k*x n -- k*x | k*x n -- i*x n )

To prove that neither type from the union is a subtype of the other, it is enough to give two examples, one of which is a member of only ( k*x n -- k*x ) (in the union), and the other is a member of only ( k*x n -- i*x n ) (in the union).

Here are such examples:

  • the mapping ( 0 ↦ ) is a member of only ( k*x n -- k*x ) (in the union)
  • the mapping ( 1 ↦ 1 ) is a member of only ( k*x n -- i*x n ) (in the union)

There are also members that belongs to both. For example, the following mappings:

  • ( 0 0 ↦ 0 )
  • ( 1 1 ↦ 1 )
  • ( 123 0 0 ↦ 123 0 )

AntonErtlavatar of AntonErtl

It is obvious that the idea behind the stack diagram of throw is to specify what happens on the data stack in both cases, including the case where the control flow does not continue sequentially. And I think it's a good idea to specify the stack effect for that case, and it should also be done for ?do.

Whether the | syntax as used for throw is good enough or whether we should have separate stack diagrams for the two cases is something one might discuss. However, I have not seen confused questions about what the stack effect of throw means, and I would not expect confusion from a similar usage of | for the stack effect of ?do. In both cases the prose makes it clear enough which part of the stack effect diagram corresponds to which case. Actually for the ?do case it has taken 3 decades until someone asked about the lack of a stack-effect diagram for the case when index=limit.

Reply New Version