17.6.2.2255 SUBSTITUTE STRING EXT

( c-addr₁ u₁ c-addr₂ u₂ -- c-addr₂ u₃ n )

Perform substitution on the string c-addr₁ u₁ placing the result at string c-addr₂ u₃, where u3 is the length of the resulting string. An error occurs if the resulting string will not fit into c-addr₂ u₂ or if c-addr₂ is the same as c-addr₁. The return value n is positive or 0 on success and indicates the number of substitutions made. A negative value for n indicates that an error occurred, leaving c-addr₂ u₃ undefined. Negative values of n are implementation defined except for values in table 9.1 THROW code assignments.

Substitution occurs left to right from the start of c-addr₁ in one pass and is non-recursive.

When text of a potential substitution name, surrounded by `%' (ASCII $25) delimiters is encountered by SUBSTITUTE, the following occurs:

If the name is null, a single delimiter character is passed to the output, i.e., %% is replaced by %. The current number of substitutions is not changed.
If the text is a valid substitution name acceptable to 17.6.2.2141 REPLACES, the leading and trailing delimiter characters and the enclosed substitution name are replaced by the substitution text. The current number of substitutions is incremented.
If the text is not a valid substitution name, the name with leading and trailing delimiters is passed unchanged to the output. The current number of substitutions is not changed.
Parsing of the input string resumes after the trailing delimiter.

If after processing any pairs of delimiters, the residue of the input string contains a single delimiter, the residue is passed unchanged to the output.

See:

17.6.2.2141 REPLACES, 17.6.2.2375 UNESCAPE, A.17.6.2.2255 SUBSTITUTE.

Rationale:

Many applications need to be able to perform text substitution, for example:

Your balance at <time> on <date> is <currencyvalue>.

Translation of a sentence or message from one language to another may result in changes to the displayed parameter order. The example, the Afrikaans translation of this sentence requires a different order:

Jou balans op <date> om <time> is <currencyvalue>.

The words SUBSTITUTE and REPLACES provide for this requirements by defining a text substitution facility. For example, we can provide an initial string in the form:

Your balance at %time% on %date% is %currencyvalue%.

The % is used as delimiters for the substitution name. The text "currencyvalue", "date" and "time" are text substitutions, where the replacement text is defined by REPLACES:

: date S" 10/Nov/2014" ;
: time S" 02:52" ;
date S" date" REPLACES
time S" time" REPLACES

The substitution name "date" is defined to be replaced with the string "10/Nov/2014" and "time" to be replaced with "02:52". Thus SUBSTITUTE would produce the string:

Your balance at 02:52 on 10/Nov/2014 is %currencyvalue%.

As the substitution name "currencyvalue" has not been defined, it is left unchanged in the resulting string.

The return value n is nonnegative on success and indicates the number of substitutions made. In the above example, this would be two. A negative value indicates that an error occurred. As substitution is not recursive, the return value could be used to provide a recursive substitution.

Implementation of SUBSTITUTE may be considered as being equivalent to a wordlist which is searched. If the substitution name is found, the word is executed, returning a substitution string. Such words can be deferred or multiple wordlists can be used. The implementation techniques required are similar to those used by ENVIRONMENT?. There is no provision for changing the delimiter character, although a system may provide system-specific extensions.

Implementation:

Assuming E.17.6.2.2141 REPLACES has been defined.

[UNDEFINED] bounds [IF]
   : bounds    \ addr len -- addr+len addr
     OVER + SWAP
   ;
[THEN]

[UNDEFINED] -rot [IF]
   : -rot    \ a b c -- c a b
     ROT ROT
   ;
[THEN]

CHAR % CONSTANT delim     \ Character used as the substitution name delimiter.
string-max BUFFER: Name \ Holds substitution name as a counted string.
VARIABLE DestLen           \ Maximum length of the destination buffer.
2VARIABLE Dest             \ Holds destination string current length and address.
VARIABLE SubstErr          \ Holds zero or an error code.

: addDest \ char --
\ Add the character to the destination string.
   Dest @ DestLen @ < IF
     Dest 2@ + C! 1 CHARS Dest +!
   ELSE
     DROP -1 SubstErr !
   THEN
;

: formName \ c-addr len -- c-addr' len'
\ Given a source string pointing at a leading delimiter, place the name string in the name buffer.
   1 /STRING 2DUP delim scan >R DROP \ find length of residue
   2DUP R> - DUP >R Name place        \ save name in buffer
   R> 1 CHARS + /STRING                 \ step over name and trailing %
;

: >dest \ c-addr len --
\ Add a string to the output string.
   bounds ?DO
     I C@ addDest
   1 CHARS +LOOP
;

: processName \ -- flag
\ Process the last substitution name. Return true if found, 0 if not found.
   Name COUNT findSubst DUP >R IF
     EXECUTE COUNT >dest
   ELSE
     delim addDest Name COUNT >dest delim addDest
   THEN
   R>
;

: SUBSTITUTE \ src slen dest dlen -- dest dlen' n
\ Expand the source string using substitutions.
\ Note that this version is simplistic, performs no error checking,
\ and requires a global buffer and global variables.
   Destlen ! 0 Dest 2! 0 -rot \ -- 0 src slen
   0 SubstErr !
   BEGIN
     DUP 0 >
   WHILE
     OVER C@ delim <> IF                \ character not %
       OVER C@ addDest 1 /STRING
     ELSE
       OVER 1 CHARS + C@ delim = IF    \ %% for one output %
         delim addDest 2 /STRING       \ add one % to output
       ELSE
         formName processName IF
           ROT 1+ -rot                    \ count substitutions
         THEN
       THEN
     THEN
   REPEAT
   2DROP Dest 2@ ROT SubstErr @ IF
     DROP SubstErr @
   THEN
;

Testing:

30 CHARS BUFFER: subbuff \ Destination buffer

\ Define a few string constants
: "hi" S" hi" ;
: "wld" S" wld" ;
: "hello" S" hello" ;
: "world" S" world" ;

\ Define a few test strings
: sub1 S" Start: %hi%,%wld%! :End" ;    \ Original string
: sub2 S" Start: hello,world! :End" ;   \ First target string
: sub3 S" Start: world,hello! :End" ;   \ Second target string

\ Define the hi and wld substitutions
T{ "hello" "hi" REPLACES -> }T \ Replace "%hi%" with "hello"
T{ "world" "wld" REPLACES -> }T \ Replace "%wld%" with "world"

\ "%hi%,%wld%" changed to "hello,world"
T{ sub1 subbuff 30 SUBSTITUTE ROT ROT sub2 COMPARE -> 2 0 }T

\ Change the hi and wld substitutions
T{ "world" "hi" REPLACES -> }T
T{ "hello" "wld" REPLACES -> }T

\ Now "%hi%,%wld%" should be changed to "world,hello"
T{ sub1 subbuff 30 SUBSTITUTE ROT ROT sub3 COMPARE -> 2 0 }T

\ Where the subsitution name is not defined
: sub4 S" aaa%bbb%ccc" ;
T{ sub4 subbuff 30 SUBSTITUTE ROT ROT sub4 COMPARE -> 0 0 }T

\ Finally the % character itself
: sub5 S" aaa%%bbb" ;
: sub6 S" aaa%bbb" ;
T{ sub5 subbuff 30 SUBSTITUTE ROT ROT sub6 COMPARE -> 0 0 }T

ContributeContributions

alextangent [30] Case sensitivityRequest for clarification2017-07-10 18:29:41

Is %test% considered the same string as or different from %TEST% ?

StephenPelc [r86] 2017-07-11 15:07:46

Alex - good catch. Since the rationale mentions using wordlists, it would be sensible for the case sensitivity of the key names to be that of the system dictionary. All versions of Proforth and VFX have used wordlists (case-insensitive) for text macros since the late 1980s.

I suspect that we also ought to deal with case sensitivity in terms of the 7-bit ASCII that the standards use. The MPE dictionary tools only convert case for the range A-Z and a-z.

Reply New Version