Digest #100 2020-06-10


[136] 2020-06-09 16:36:22 ruv wrote:

proposal - Same name token for different words

Have a look at Execution tokens:

Different definitions may have the same execution token if the definitions are equivalent.

Should we add a similar clause for name tokens? E.g.

Different words may have the same name token if their names are identical, the interpretation semantics for them are equivalent, and the compilation semantics for them are equivalent.


[r365] 2020-05-30 04:18:16 MarcelHendrix replies:

requestClarification - Word set of S>D word

Didn't you forget about number formatting, interfacing with FM/MOD, SM/REM, REPOSITION-FILE and FILE-SIZE?

[r366] 2020-05-30 04:18:38 MarcelHendrix replies:

requestClarification - Word set of S>D word

Didn't you forget about number formatting, interfacing with FM/MOD, SM/REM, REPOSITION-FILE and FILE-SIZE?

[r367] 2020-05-30 09:36:27 AntonErtl replies:

requestClarification - Word set of S>D word

A quick (and possibly incomplete) search of words found the following words that take a d as input and do not require the double wordset to be present:

core: fm/mod sm/rem

float: d>f

Number conversion words and REPOSITION-FILE take an ud, so do not need S>D. Anyway, the words above seem to be good reasons for having s>d in core.
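
As a small illustration of the point about the core words, here is a sketch in standard Forth of a single-cell value being widened with S>D before division. The word names floored-div and symmetric-div and the operands are made up for this example.

```forth
\ FM/MOD and SM/REM ( d n -- rem quot ) take a double-cell dividend,
\ so a single-cell value has to be sign-extended with S>D first.
: floored-div   ( n1 n2 -- rem quot )  >R S>D R> FM/MOD ;
: symmetric-div ( n1 n2 -- rem quot )  >R S>D R> SM/REM ;

-7 2 floored-div   . .   \ prints -4 1   (quotient first, then remainder)
-7 2 symmetric-div . .   \ prints -3 -1
```

Without S>D in core, a program using only the core word set would need its own sign extension to call these two words with a single-cell dividend.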

But in any case, it's water down the river. There is very little point in changing the wordset of a word in a later release. There is probably also little harm in this case, because most (all?) standard systems include double by default, but that reinforces the point that there is little point.

[r368] 2020-05-30 12:11:07 StephenPelc replies:

proposal - Recognizer RfD rephrase 2020

I apologise for my delay in responding to Ulli's document.

Overall I think that it's a really good first step, and Ruv's comments are also good.

I do not want to go back to an older proposal in particular; I want a proposal that ordinary mortals can understand. I just want clarity.


I don't much care whether the recogniser triple is called a rid or a rit. The more neutral term seems to be rid, but rit is now in use, so let's keep to it. Either can be pronounced clearly in discussions.

In normal Forth usage the word that lays the implementation-dependent data would be called RECOGNIZER, (with a comma). It should return a rit. What's the point of having an identifier if we don't use it? If we use this terminology, then the obvious way to refer to return values is RIT-NUM, RIT-FLOAT and so on.


Do these ever get used outside the internals of other words? If not, a standard team has no business prescribing these. How many other implementations of a rit exist (in the wild) apart from the xt triple? Ruv's point about a use-case is well taken here.

The POSTPONE action xt is needed for two reasons:

  1. POSTPONE needs it
  2. Not all parsers are for literals, e.g. OOP parsers.

We cannot predict how recognisers will be used, so attempting to automate the POSTPONE actions is doomed to failure. OOP is not a hand-waving prediction: VFX's CIAO and ClassVFX packages both use recognisers.
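
To make the xt triple concrete, here is a minimal sketch of what a RECOGNIZER, word and a RIT-NUM built with it could look like. It is not taken from any of the proposals: the cell layout, the order of the three xts, the accessor names RIT>INT, RIT>COMP and RIT>POST, and the num-... actions are all assumptions made for illustration.

```forth
\ Assumption: a rit is the address of three consecutive cells holding
\ the interpret, compile and postpone xts, in that order.
: RECOGNIZER, ( xt-int xt-comp xt-post -- rit )
  ALIGN HERE >R  ROT , SWAP , ,  R> ;

\ Hypothetical accessors for the three actions.
: RIT>INT  ( rit -- xt )  @ ;
: RIT>COMP ( rit -- xt )  CELL+ @ ;
: RIT>POST ( rit -- xt )  2 CELLS + @ ;

\ Actions for a single-cell number token.
: num-int  ( x -- x ) ;                    \ interpreting: leave x on the stack
: num-comp ( x -- )  POSTPONE LITERAL ;    \ compiling: compile x as a literal
: num-post ( x -- )                        \ postponing: compile code that will
  POSTPONE LITERAL  ['] num-comp COMPILE, ; \ later compile x as a literal

' num-int ' num-comp ' num-post RECOGNIZER, CONSTANT RIT-NUM
```

With this layout the text interpreter would pick one accessor depending on what it is doing, e.g. `RIT-NUM RIT>COMP EXECUTE` while compiling. The postpone action has to be supplied explicitly here, which matches the argument above that it cannot be derived automatically for every kind of parser.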

STATE dependency

Having RECOGNISE be dependent on STATE is horrible.

[r369] 2020-06-02 03:31:49 ruv replies:

proposal - Recognizer RfD rephrase 2020

Regarding time-shifting action

I am not sure if changing the name now would be helpful, or if it's too late.

I think correcting the name (and the corresponding terminology) will be helpful, since we should not call something a "postpone action" if it isn't actually one (well, perhaps it is "postponing" in some sense, but not in the sense of the POSTPONE word, so it is confusing). Moreover, we still need to refer in discussions to both concepts: the full postpone action and the "time-shifting" action.

OTOH, it seems "time-shifting" is a sub-optimal term.

What should the corresponding Forth definition do? It should take the token from the stacks (the data stack, floating-point stack, or something else) and compile code that, when executed, will place the token on the stacks. In other words, it compiles the token as numbers (i.e. literally). So it is distinct from the "compile action" that performs the compilation semantics for the source lexeme.

I suggest the term reproducing. So we can have: interpreting action, reproducing action, compiling action, postponing action (if any).

to reproduce a token: to take the token and compile code that when executed will place the token on the stacks
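
The definition above can be sketched in standard Forth as follows. The names reproduce-x, reproduce-xd and reproduce-s are hypothetical; the sketch assumes the Double and String word sets are present for 2LITERAL and SLITERAL.

```forth
\ Each word takes a token from the data stack and compiles code that
\ will place the same token back on the stack when executed.
: reproduce-x  ( x -- )         POSTPONE LITERAL ;
: reproduce-xd ( xd -- )        POSTPONE 2LITERAL ;   \ Double word set
: reproduce-s  ( c-addr u -- )  POSTPONE SLITERAL ;   \ String word set

\ Usage: the token is consumed while the definition is being compiled,
\ and reappears when the definition runs.
: forty-two ( -- x )   [ 42 reproduce-x ] ;
: pair      ( -- xd )  [ 1 2 reproduce-xd ] ;
```

Note that none of these words looks at the source lexeme; they only deal with the token, which is what separates the reproducing action from the compiling action.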

[r370] 2020-06-02 04:23:49 ruv replies:

proposal - Recognizer RfD rephrase 2020

Solution for items 3 and 5.

I have also found a solution for the "RECOGNIZER" mess and the "triple" of xts.

I suggest the term token descriptor (or just descriptor).

token descriptor: an implementation-dependent data object (a set of information) that describes translating a token.

I also have one more idea to discuss: how to avoid providing all these (one, two, three, or even four) actions when you create a token descriptor.

So far so good.

Further development

I created the SpfDev team in ForthHub, and the private fep-recognizer repository, to design the specification for Recognizer (fep stands for Forth Enhancement Proposal, after PEP). I included some people whom I found in ForthHub and who made proposals here. Write to me here or in a separate issue if you are interested in being included.

I published a draft of term definitions and data types, and an issue for feedback. I created this draft because I see too many issues in the current proposals from the formal point of view, and no answers. See also news:rb43tl$elj$1@dont-email.me (copy). Now I'm looking forward to your thoughts.

Please let me know if you think it is worth making the fep-recognizer repository public (since Gerald Wodni said "normally proposals are developed in smaller groups and only presented to the public once they are pretty solid"). I hope the GitHub tools help us to better organize collaboration on this work.

[r371] 2020-06-02 13:08:01 StephenPelc replies:

proposal - Recognizer RfD rephrase 2020

Naming of return values

There's a proposal that we should standardise the values returned by RECOGNIZE (RECTYPE-xxx, rit-xxx ...). After a while debugging two new FP packages on the same host, and even loading one after the other, I believe that this proposal is doomed to fail.

If two float packs return the same value, they are impossible to distinguish and hence to debug. If we return rit-SSE64 and rit-NDPfloat then they can be separated. The source code becomes impenetrable without separate names.

We should also acknowledge that parsers may return one or more rits on success, e.g. R:SNUM and R:DNUM. There's no point saying "don't do that"; such systems exist in the wild. They are a natural consequence of what Ruvim calls compound recognisers.
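
A rough sketch of such a parser is shown below: one word that reports either a single-cell or a double-cell number through distinct, separately named rits. The names rec-num, RIT-SNUM, RIT-DNUM and RIT-FAIL are hypothetical, and the rits are bare marker addresses rather than real descriptors. The sketch is deliberately simplified: it handles no sign, and it accepts a lone "." as a double zero.

```forth
\ Hypothetical markers; CREATE gives each one a unique address.
CREATE RIT-FAIL  0 ,
CREATE RIT-SNUM  0 ,
CREATE RIT-DNUM  0 ,

: rec-num ( c-addr u -- x RIT-SNUM | xd RIT-DNUM | RIT-FAIL )
  DUP 0= IF  2DROP RIT-FAIL EXIT  THEN
  0 0 2SWAP >NUMBER                          ( ud c-addr' u' )
  DUP 0= IF  2DROP DROP RIT-SNUM EXIT  THEN  \ only digits: single cell
  1 = SWAP C@ [CHAR] . = AND                 \ exactly one trailing "." ?
  IF  RIT-DNUM EXIT  THEN                    \ yes: keep both cells
  2DROP RIT-FAIL ;                           \ anything else: not a number
```

Because the two success cases return different names, a caller (or someone debugging it) can tell from the returned rit alone which kind of token is on the stack.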

[r372] 2020-06-08 09:41:43 rrt replies:

requestClarification - Numeric overflow/underflow

For reference, the gforth source code that Anton refers to is the word (+loop), which you can find by searching for condbranch((+loop), at http://git.savannah.gnu.org/cgit/gforth.git/tree/prim

[r373] 2020-06-08 09:49:29 rrt replies:

requestClarification - Numeric overflow/underflow

A more helpful link direct to gforth's source for (+loop): http://git.savannah.gnu.org/cgit/gforth.git/tree/prim#n373

[r374] 2020-06-09 16:59:51 ruv replies:

proposal - Recognizer RfD rephrase 2020

Standardizing token descriptors

I think a set of the basic descriptors should be standardized. We need descriptors at least for the following tokens: xt, nt, x, xd, f, c-addr u (a string), and also an implementation-dependent token for the result of FIND (for backward compatibility), making seven descriptors in total.

Should the values be standardized (as in the case of throw codes), or the names? The values are required for binary interoperability (not our case at the moment); they also allow the number of names to be reduced. But in any case, it is better to have names instead of just magic numbers in source code.

I would suggest forming these names using the mnemonic prefix TD- (after "Token Descriptor"). E.g. TD-XT, TD-NT, TD-LIT, TD-2LIT, TD-FLIT, TD-SLIT, TD-WORD.

The names TD-LIT, TD-2LIT, TD-FLIT, TD-SLIT are named after the corresponding standard words LITERAL, 2LITERAL, FLITERAL, SLITERAL.

The other variants for them (TD-NUM, TD-2NUM, TD-FLOAT, TD-STRING) do not map well onto the corresponding words for compiling the tokens.

An FP package, as well as any other, may provide its own token descriptor, if it is reasonable.