17.6.1.0245 /STRING slash-string STRING

( c-addr1 u1 n -- c-addr2 u2 )

Adjust the character string at c-addr1 by n characters. The resulting character string, specified by c-addr2 u2, begins at c-addr1 plus n characters and is u1 minus n characters long.

Rationale:

/STRING is used to remove or add characters relative to the current position in the character string. Positive values of n will exclude characters from the string while negative values of n will include characters to the left of the string.

S" ABC" 2 /STRING 2DUP TYPE \ outputs "C"
-1 /STRING TYPE \ outputs "BC"

Testing:

T{ s1  5 /STRING -> s1 SWAP 5 + SWAP 5 - }T
T{ s1 10 /STRING -4 /STRING -> s1 6 /STRING }T
T{ s1  0 /STRING -> s1 }T

Contributions

JimPeterson [203] Possible Reference Implementation (Suggested reference implementation) 2021-05-21 17:58:29

Would this be a sufficient reference implementation?:

: /STRING ( c-addr1 u1 n -- c-addr2 u2 )  DUP >R - SWAP R> CHARS + SWAP ;

Are there no checks for when u2 might go negative (or, really, wrap around, since it's considered unsigned)?

ruv

Are there no checks for when u2 might go negative?

Formally, u2 cannot be negative since it's an unsigned number (a number that is interpreted as an unsigned number). See data type symbols and their meaning.

In some use cases n > u1 could be a problem, but protecting a user who tries to shoot himself in the foot is outside the scope of the standard. OTOH, it could be a correct and expected intermediate result.
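To illustrate the "intermediate result" point: on a system where /STRING just performs the arithmetic (which is what the implementations discussed here do, but is an assumption in general), stepping past the end of a string and then stepping back by the same amount restores the original pair. The buffer name buf is made up for this sketch.

```forth
CREATE buf 10 CHARS ALLOT   \ hypothetical scratch buffer
buf 2                       \ a 2-character string ( c-addr 2 )
3 /STRING                   \ n > u1: the length wraps around here
-3 /STRING                  \ step back by the same amount
SWAP buf = .  2 = .         \ prints two true flags if /STRING wraps around
```

The pair left after the first /STRING is not a usable string, yet the final result is valid again.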

Would this be a sufficient reference implementation?

Yes, I think. Other possible variants:

: /string ( c-addr1 u1 n -- c-addr2 u2 ) tuck - >r chars + r> ;
: /string ( c-addr1 u1 n -- c-addr2 u2 ) tuck - -rot chars + swap ;

AntonErtl

Gforth, iForth, lxf, SwiftForth 3.11, and VFX 5.11 do not check for n<=u; so if n>u, the result is broken (I explored whether you could do something useful with strings with negative length, but did not come up with anything). I dimly remember that there are systems where /STRING behaves like OVER MIN /STRING, so the resulting string is always valid, but I don't remember which systems behave that way.
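The clamping behaviour mentioned above can be sketched as a separate word. The name /STRING-CLAMPED is hypothetical, and because MIN compares signed numbers, this sketch assumes u1 does not exceed the largest positive signed integer.

```forth
\ Hypothetical name; limits n so the result never runs past the end.
: /STRING-CLAMPED ( c-addr1 u1 n -- c-addr2 u2 )
  OVER MIN      \ n' = min(n, u1), signed comparison
  /STRING ;

S" ABC" 5 /STRING-CLAMPED NIP .   \ length is 0 instead of wrapping around
S" ABC" 2 /STRING-CLAMPED TYPE    \ behaves like plain /STRING: types C
```

Negative n passes through unchanged, so moving the start to the left still works as with plain /STRING.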

JimPeterson

I prefer your first implementation over the other two options. It seems like it would be the most efficient on many systems.


ruv [352] Unspecified ambiguous condition in /STRING (Request for clarification) 2024-07-28 17:17:53

The specification says that u.2 is u.1 minus n.

If n is greater than u.1, the result of the subtraction wraps around, and the resulting ( c-addr.2 u.2 ) cannot be interpreted as a character string. Is that correct?

I think, either behavior (or maybe meaning) should be specified, or an ambiguous condition should be explicitly declared for this case.

Thus, I see the following options:

  • change the result stack parameter data type to ( c-addr.2 n.2 ) and describe what these parameters mean if n.2 is negative;
    • then, the input stack parameter data type should be changed to ( c-addr.1 +n.1 n ), and /string will not be applicable to character strings longer than max-n;
  • specify that if n is greater than u.1, it is replaced by u.1;
  • declare that an ambiguous condition exists if n is greater than u.1;
  • specify that if n is greater than u.1, an exception is thrown.

It seems, the last option is preferable. What do you think?
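A minimal sketch of the exception-throwing option, written as a wrapper so it can be tried on an existing system. The name /STRING-CHECKED and the choice of throw code -24 ("invalid numeric argument") are assumptions, not part of the proposal; THROW and CATCH come from the optional Exception word set.

```forth
\ Hypothetical name; throws instead of letting the length wrap around.
: /STRING-CHECKED ( c-addr1 u1 n -- c-addr2 u2 )
  DUP 0> IF            \ only a positive n can overrun the string
    2DUP U< IF         \ is u1 < n ?
      -24 THROW        \ assumed code: invalid numeric argument
    THEN
  THEN
  /STRING ;

S" ABC" 3 /STRING-CHECKED NIP .                  \ n = u1 is fine: prints 0
S" ABC" 4 ' /STRING-CHECKED CATCH . 2DROP DROP   \ prints -24
```

After a caught THROW the three input cells are still on the stack with unspecified values, hence the final 2DROP DROP.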

AntonErtl

What is existing practice? Let's try:

create foo 10 chars allot
foo 2 3 /string cr . foo - .

I tested this on Gforth, iForth 5.1-mini, SwiftForth 4.0.0-RC89 and VFX 5.43, and the output of this test was invariably "-1 3", i.e., the result of wraparound (as a fan of applying correct typing, you can use u. instead of ., but the output will not be any more enlightening). I don't see a reason for any of your options. If any clarification is needed, it should be along the lines of

c-addr2 is c-addr1+n; u2 is u1-n

ruv

The rationale says: "/STRING is used to remove or add characters relative to the current position in the character string. Positive values of n will exclude characters from the string while negative values of n will include characters to the left of the string."

The specification says that the resulting character string has length "u1 minus n characters".

It's impossible to remove any character from a string whose length is zero. Therefore, if n is greater than u1, the operation cannot be interpreted as removing characters from the string, and the result cannot be interpreted as a character string, even though the specification says that it is a character string.

So, I think, the case when n is greater than u1 should be somehow described to eliminate this confusion.

For example:

Note: if n is greater than u1, u2 is the result of wraparound on underflow, and c-addr2 u2 does not represent a character string.
