Digest #236 2023-11-13

Contributions

[313] 2023-11-12 13:12:55 albert wrote:

proposal - String store and fetch


Author:

Albert van der Horst

Change Log:

Problem:

With the advent of a proper notation for strings (recognizers), it is time to tackle the manipulation of strings. Strings deserve a prefix, like D, DF, SF and U, to set this word set apart from other words and greatly enhance readability. The foremost decision is how to choose the prefix; STR and $ come to mind. In this proposal I use <string>, because a discussion of how to choose the prefix would distract from the conclusion that we need such a prefix. In what follows I mostly use $ instead, because we are obliged to use a web interface instead of brilliant programmer's editors like emacs and gvim.

If we adopt this solution, the future looks bright, with words such as $^ $\ $/ $= $,

Solution:

We need words analogous to ! @ +! , DF! DF@ 2! 2@ . Define a string constant as an (address character-count) pair, indicated in stack diagrams as sc. Define a string variable as an a-addr with sufficient space for a string to be stored, indicated in stack diagrams as sv.

This requires <string>@, <string>!, <string>+! and <string>C+.

A key decision is that these words use a full cell to hold the count. The habit of storing the count in a character can largely be ignored. Fortunately, the brain-damaged words that do this (COUNT, PLACE) have names distinct from the words in this proposal.

This word set has been in use since 1984 on the Osborne, and has been used internally and externally in ciforth implementations since 2000.

Typical use: (Optional)

CREATE MYBUFFER 100 ALLOT
"ROME" MYBUFFER <string>!
MYBUFFER <string>@ TYPE \ ROME OK
BL MYBUFFER <string>C+
MYBUFFER <string>@ TYPE CHAR | EMIT \ ROME | OK
"ATHENE" MYBUFFER <string>+!

Proposal

Add the following paragraphs to the STRING chapter.

$! "string-store"

STACKEFFECT: sc addr ---

DESCRIPTION:

Store a string constant sc in the string variable at address addr.

 __________________________________________________________________

$+! "string-plus-store"

STACKEFFECT: sc addr ---

DESCRIPTION:

Append a string constant sc to the string variable at address addr.

 __________________________________________________________________

$@ "string-fetch"

STACKEFFECT: addr --- sc

DESCRIPTION:

Fetch the string constant sc from the string variable at address addr.

 __________________________________________________________________

$C+ "string-char-append"

STACKEFFECT: ch addr ---

DESCRIPTION:

Append a char ch to the string variable at address addr.

 __________________________________________________________________

Reference implementation:

If the proposal survives the first ridicule, I shall copy the reference implementation from PROJECT-FORTH-WORKS.
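Until that code is posted, here is a minimal sketch of the four words in standard Forth. It assumes the layout described above: a string variable is an aligned address whose first cell holds the character count, followed directly by the characters. It is not the ciforth code, and some systems already define $! $@ $+! with different semantics, so loading this redefines them.

```forth
\ Sketch only: a string variable is an aligned address holding a cell
\ count, immediately followed by the characters (assumed layout).
: $@ ( addr -- c-addr u )    \ fetch the string constant stored at addr
   DUP CELL+ SWAP @ ;
: $! ( c-addr u addr -- )    \ store string constant c-addr u at addr
   2DUP 2>R                  \ keep u and addr for the count store
   CELL+ SWAP CHARS MOVE     \ copy the characters behind the count cell
   2R> ! ;                   \ then store the count
: $+! ( c-addr u addr -- )   \ append c-addr u to the string at addr
   DUP >R $@ CHARS +         ( c-addr u c-addr-end )
   SWAP DUP R> +!            \ add u to the stored count
   CHARS MOVE ;              \ copy the new characters to the old end
: $C+ ( char addr -- )       \ append one character to the string at addr
   DUP >R $@ CHARS + C!      \ store char just past the current end
   1 R> +! ;                 \ and bump the count
```

None of these words checks for overflow; the caller must provide an aligned buffer of at least one cell plus the maximum number of characters.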

Testing: (Optional)

ALIGN HEX 4 , 41 C, 39 C, 45 C, 30 C, HERE 4 CELL+ - $@ TYPE | A9E0
ALIGN HEX 2 , 41 C, 39 C, HERE 2 CELL+ - $@ PAD $! PAD $@ TYPE | A9
ALIGN HEX 2 , 41 C, 39 C, HERE 2 CELL+ - $@ 2DUP PAD $! PAD $+! PAD $@ TYPE | A9A9
ALIGN HEX 2 , 41 C, 39 C, HERE 2 CELL+ - $@ PAD $! HERE 1 - C@ PAD $C+ PAD $@ TYPE | A99

The test results follow the vertical bar. If the proposal survives the first ridicule, I shall rewrite the tests for the test harness of Appendix F.

[314] 2023-11-12 13:26:28 albert wrote:

proposal - Appendix F doesn't cater for strings.

Author:

Albert van der Horst

Change Log:


Problem:

The test harness of Appendix F behaves as if only numbers exist. I'd like to test HERE HEX 41 C, 42 C, 2 TYPE and check that the output is "AB".

Solution:

I can't think of any. A half solution is accepting S" AB" or "AB" as an expected result in the test harness.
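A sketch of that half solution: instead of capturing what TYPE prints, the test checks the (c-addr u) pair that would have been passed to TYPE against an expected string. STRING-CHECK and STRING-ERRORS are hypothetical names, not part of Appendix F; the error report loosely imitates the harness.

```forth
\ Hypothetical helper: compare an actual string with an expected one.
VARIABLE STRING-ERRORS   0 STRING-ERRORS !

: STRING-CHECK ( c-addr1 u1 c-addr2 u2 -- )
   \ c-addr1 u1: actual string, c-addr2 u2: expected string
   2OVER 2OVER COMPARE IF
      ." INCORRECT STRING: expected " TYPE
      ."  got " TYPE CR
      1 STRING-ERRORS +!
   ELSE
      2DROP 2DROP
   THEN ;

\ The example above, checked without capturing the output of TYPE:
HERE HEX 41 C, 42 C, DECIMAL 2 S" AB" STRING-CHECK
```

This only covers words that leave a string on the stack; words that only print still cannot be tested this way.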

Typical use: (Optional)

Proposal:

I hope this is taken up by someone.

Reference implementation:

Testing: (Optional)

Replies

[r1125] 2023-10-27 20:22:13 ruv replies:

proposal - New words: latest-name and latest-name-in

Author

Ruv

Change Log

2023-10-22 Initial revision
2023-10-23 Add testing, examples, a question to discuss, change the throw code description
2023-10-27 Some rationales and explanations added, the throw code description changed back, better wording in some places

Problem

In some applications, mainly in libraries and extensions, the capability to obtain the most recently added definition is very useful and in demand.

For example, if we are creating a library for decoration, tracing, support for OOP, simple DSLs (e.g., to describe Finite State Machines), etc., it is always useful to have an accessor to the most recent definition, instead of redefining a lot of words to define such an access method yourself, or juggling with the input buffer and search.

However, many Forth systems have such internal methods to access the recently added word. Among them: latest ( -- nt|0 ), last @ ( -- nt|0 ), latestxt ( -- xt|0 ), etc.

Additionally, there have been many discussions regarding standardization of such a method in recent decades. For example, Elizabeth D. Rather wrote on 2011-12-09 in comp.lang.forth:

AFAIK most if not all Forths have some method for knowing the latest definition, it's kinda necessary. The problem is, that they all do it differently (at different times, in different forms, etc.), which is why it hasn't been possible to standardize it.

Although it's a system necessity, I haven't found this of much value in application programming.

Elizabeth D. Rather

It's true: depending on the system, an internal method can return the recent word regardless of the compilation word list, or depending on the compilation word list, a completed definition, or not yet completed definition, also unnamed definition, or only named definition, etc.

Thus, although almost every Forth system contains such a method, there is no portable way for programs to obtain the latest definition.

Solution

Let's introduce the following words:

LATEST-NAME-IN ( wid -- nt|0 )
LATEST-NAME ( -- nt )

The first word returns the name token for the definition whose name was placed most recently into the given word list, or zero if this word list is empty.

The second word returns the name token for the definition whose name was placed most recently into the compilation word list, or throws an exception if there is no such definition.

These words do not expose or limit any internal mechanism of the compiler. They just provide information about word lists, like the words FIND-NAME-IN, FIND-NAME, and TRAVERSE-WORDLIST do.

These words are intended for programs. The system may use them, but is not required to do so. The system may continue to use its internal LAST, LATEST, or whatever it was using before.

It seems the best place for these words is section 15.6.2 (Programming-Tools extension words), where TRAVERSE-WORDLIST is also placed.

Rationale

Connection with word lists

By considering definitions in the frame of a word list only, we solve several problems, namely:

A word list contains only completed definitions (see the accepted proposal #153 Traverse-wordlist does not find unnamed/unfinished definitions). This eliminates the question of whether the definition identified by the returned nt is finished: yes, it is always finished (completed).
Nameless definitions are not considered since they are not placed into the compilation word list (regardless of whether the system creates a name token for them, or places them into an internal system-specific word list).
An extension or library can create definitions in its internal word list for internal purposes, and this will not affect the compilation word list or other user-defined word lists. Thus, the user of such a library always gets the expected result from latest-name (regardless of what words the library creates for internal purposes on the fly).

Return values

As a matter of practice, almost all use cases for the word LATEST-NAME imply that the requested definition exists, and if it doesn't exist, only an error can be reported. So letting this word return 0 would only burden users with checking for zero, or with redefining this word as:

: latest-name ( -- nt ) latest-name dup 0= -80 and throw ;

If the user needs to handle the case where the compilation word list is empty, they can use the word latest-name-in as:

get-current latest-name-in dup if ( nt ) ... else ( 0 ) drop ... then

Implementation options

If the word list structure in a Forth system contains information about the most recently placed definition, the implementations of the proposed words are trivial.

In some plausible Forth systems, the word list structure doesn't contain any information about the definition that was placed into this word list most recently. Such systems might either not provide the proposed words, or be changed to keep this information in the word list structure. It seems that in most systems the word list structure already contains this information.

If a system does not implement the optional Search-Order word set, it might not provide the word LATEST-NAME-IN.

Naming

The names of the new words, LATEST-NAME-IN and LATEST-NAME, are similar in form to FIND-NAME-IN and FIND-NAME. Their stack effects are also similar.

The difference is that find is a verb, but latest is an adjective (or sometimes a noun, see Wiktionary). Both have historically been used in naming words, as has "NAME".

In Forth-84 "NAME" in word names denoted NFA (name field address), and now it denotes a name token, which is the successor of NFA. In all standard words, e.g. FIND-NAME, NAME>STRING, NAME>COMPILE, etc. (except PARSE-NAME), "NAME" denotes a name token.

NB: the term "token" in "name token" does not mean a character sequence! It's used in a general sense, like "something serving as an expression of something else" (see Wiktionary).

Throw code description

If the throw code description stated that there is no latest name, it could be confusing, since a latest name in some sense probably always exists.

Therefore, it's better to say "the compilation word list is empty", since that is what actually happens.

Things to discuss

Is it worth introducing the word LATEST-NAME-XT ( -- xt )?

If name>interpret never returns 0 (see my comment), this word can be implemented as:

: latest-name-xt ( -- xt ) latest-name name>interpret ;

The desired (and much discussed) pattern is:

defer bar

: foo ... ; latest-name-xt is bar

Sometimes the name "it" has been suggested for this word, but this name is too short and has more chance for conflicts. Guido Draheim wrote in comp.lang.forth on 2003-03-16:

I think that everyone has been thinking of using IT for something really clever, it's a nice short word - and I'd say that we should leave it for application usage.

I want to support that argument also with real life experience in the telco world where there are a whole lot of abbreviations for various services, signals, connectors around. All too often now I see people making a SYNONYM at the file-start to get a second name for an ANS forth word that is needed in the implemenation but coincides with a common term of the application.

This seems convincing to me.

Typical use

: STRUCT: ( "name" -- wid.current.old u.offset )
  GET-CURRENT  VOCABULARY
  ALSO  LATEST-NAME NAME>INTERPRET EXECUTE  DEFINITIONS
  0
;

  \ In the application's vocabulary
  : IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

  DEFER FOO

  : BAR ... ; IT IS FOO

Proposal

Add the following line to Table 9.1 (THROW code assignments):

-80 the compilation word list is empty

Add the following sections to 15.6.2 Programming-Tools extension words:

15.6.2.2541 LATEST-NAME-IN

( wid -- nt|0 )
Remove the word list identifier wid from the stack. If the corresponding word list is empty, then return 0; otherwise, return the name token nt for the definition whose name was placed most recently into this word list.

15.6.2.2542 LATEST-NAME

( -- nt )
Return the name token nt for the definition whose name was placed most recently into the compilation word list, if such a definition exists. Otherwise, throw exception code -80.

Reference implementation

In this implementation we assume that wid is an address that contains the nt of the definition whose name was most recently placed into the word list wid.

: LATEST-NAME-IN ( wid -- nt|0 ) @ ;

: LATEST-NAME ( -- nt )
  GET-CURRENT LATEST-NAME-IN  DUP IF EXIT THEN  -80 THROW
;

Testing

: IT ( -- xt ) LATEST-NAME NAME>INTERPRET ;

T{ : LN1 ; IT  ' LN1 =  -> TRUE }T
T{ :NONAME [ IT ] LITERAL ; EXECUTE  ' LN1 =  -> TRUE }T
T{ : LN2 [ IT ] LITERAL ; LN2  ' LN1 =  -> TRUE }T

[r1126] 2023-10-31 08:45:44 ruv replies:

proposal - New words: latest-name and latest-name-in

See also some additional details and usage examples in ForthHub discussion#153

[r1127] 2023-11-03 22:24:43 ruv replies:

example - Digests and Meta discussion

New Digest

New digests are cool.

Currently a digest email shows all contributions at the first, and then all replies.

I think it's better to have end-to-end chronological sorting without separation into these groups, since in a digest it does not matter whether an item is a first post or a reply, while the unchronological ordering between the groups is confusing.

[r1128] 2023-11-07 09:16:20 albert replies:

proposal - Obsolescence for SAVE-INPUT and RESTORE-INPUT

There is a technique to reuse input that doesn't disturb the current input stream. That is saving and restoring >IN. This may be restricted to the current input buffer, but that may cover a substantial part of the cases.
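A minimal sketch of that technique in standard Forth: the hypothetical word PARSE-NAME-TWICE remembers >IN, parses the next name, rewinds >IN and parses the same name again, without saving or restoring the input source specification. As noted, this only works within the current input buffer.

```forth
\ Hypothetical example: re-read input by saving and restoring >IN.
: PARSE-NAME-TWICE ( "name" -- c-addr1 u1 c-addr2 u2 )
   >IN @ >R       \ remember the current parse position
   PARSE-NAME     \ first pass consumes the name
   R> >IN !       \ rewind to the remembered position
   PARSE-NAME ;   \ second pass reads the same name again

PARSE-NAME-TWICE ROME TYPE SPACE TYPE   \ prints ROME ROME
```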

Clearly the original intent was for these words to serve as factors of INCLUDE, interrupting the current input stream. However, as the proposal points out, this is not going to be a useful system word, but rather a later burden. I would call this "meddling in matters that should be up to the implementer". Words of this kind should be exterminated from the standard. They are almost never needed, and hard to implement. So obsolete these words!

[r1129] 2023-11-07 09:31:56 albert replies:

proposal - Relax documentation requirements of Ambiguous Conditions

The preprogrammed answers make no sense in this case.

What I want to say is that ciforth doesn't conform to the standard's documentation requirements. The only documentation is that if you invoke an ambiguous condition, the operating system will crash the program. There is one exception: if you depend on documented ciforth behaviour that is not guaranteed by the standard, you have a ciforth dependency, and you are entitled to report defects against that behaviour.

Actually the requirement as it stands is too severe. The behaviour on actual ambiguous conditions is not necessarily under the control of the Forth implementor. The original wording required a massive retesting for each new release of MS-Windows, possibly for security releases as well. This proposal is actually a confirmation of existing practice.

[r1130] 2023-11-07 09:36:02 albert replies:

proposal - Exclude zero from the data types that are identifiers

As 0 and 1 are widely used in Unix-like systems as file identifiers, I would like to exclude those from the proposal.

[r1131] 2023-11-07 09:46:37 albert replies:

proposal - PLACE +PLACE

I'm vehemently opposed. The first character is used as the count. This is wrong: if a string is stored with a count, the count should be an int. With PLACE, the practice of the 1970s is continued; if a word like PLACE is standardised, it guarantees backwardness. I applaud that standardisation is moving away from words that require counted strings, towards an 'addr u' description. Let's continue.
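To make the objection concrete: the PLACE and +PLACE commonly found in existing systems keep the length in the first character of the destination. The sketch below is an assumed version of those common definitions (the proposal text itself is not quoted in this digest). Because the count lives in one character, the total length cannot exceed the largest character value (255 with 8-bit characters), which is the limitation behind the objection above.

```forth
\ Assumed common definitions: c-addr2 holds a counted string, i.e. one
\ character with the length, followed by the characters themselves.
: PLACE ( c-addr1 u c-addr2 -- )
   2DUP 2>R                  \ keep u and c-addr2 for the count store
   CHAR+ SWAP CHARS MOVE     \ copy the characters behind the count char
   2R> C! ;                  \ store the length in the first character
: +PLACE ( c-addr1 u c-addr2 -- )
   DUP >R COUNT CHARS +      ( c-addr1 u c-addr-end )
   SWAP DUP >R CHARS MOVE    \ append the new characters at the old end
   R> R@ C@ + R> C! ;        \ update the length character
\ No overflow check: the total length must fit in one character.
```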

[r1132] 2023-11-09 14:28:48 ruv replies:

proposal - Exclude zero from the data types that are identifiers

As 0 and 1 is widely used in Unix-like systems for file-identifiers , I would like to exclude those from the proposal.

The proposal does not mention 1, but -1.

Regarding 0: how does your Forth system behave if the OS returns 0 as a file identifier for the input source (for example, if it's a stdin pipe)?

[r1133] 2023-11-09 17:38:03 ruv replies:

proposal - Relax documentation requirements of Ambiguous Conditions

This proposal is just wording change, and it affects neither existing Forth systems nor existing programs.

After this change in the standard, new Forth systems (or new versions) are allowed to provide less documentation than before this change, but are not obligated to do it. Therefore, a system cannot avoid implementing this proposal, and cannot fail to implement this proposal.

Concerning programmers/users: if a user has ever used the system's documentation about ambiguous conditions, does that mean that they have used this proposal or not?

[r1134] 2023-11-09 18:47:36 ruv replies:

proposal - PLACE +PLACE

the underlying system might provide PLACE or +PLACE, but with a different behaviour, rendering the rest of the standard program invalid.

If PLACE and +PLACE are standardized, then a compliant system (if it decides to provide them) has to define them with the exact proposed behaviour.

Actually, this problem exists for many well-known words. If we add them all to the standard, it will bloat the standard too much (not to mention supporting bad practice). Therefore, this problem should be solved in another way.

A possible solution is to support URIs in include, require, etc. If the system provides an efficient implementation of a module, it maps the module's URI to a local file (or an internal entity); otherwise, it downloads and caches the implementation. Some fallback mechanism can also be supported (e.g., to provide a local implementation).

The specification for each module or library is maintained separately from the standard and specifications of other modules.

So a program can use:

require https://theforth.net/stdlib/cstring/place/?v=1.*

: foo ... place ... ;
: bar ... +place ... ;