Proposal: Common terminology for recognizers discourse and specifications

Informal


ruv, Proposal, 2020-09-07 13:56:43

Author

Ruv

Change Log

  • 2020-09-07 The document is published at forth-standard.org

Problem

Different proposals about recognizers use different terminology, and these terminologies conflict with each other and with the language of the Standard. There are many examples of this.

Also, wrong terminology produces names for words that have a confusing etymology. Ultimately, it discredits the Forth language in the wider community of programmers.

Solution

Let's use a common terminology that is correct, accurately defined, and compatible with the language of the Standard. I suggest the following one. The latest version is available on GitHub. Improvements are welcome.

Proposal

tuple: a logical union of several elements that keeps their order; when a tuple is placed into the data stack, the rightmost element in writing is the topmost on the stack, and floating-point numbers are placed into the floating-point stack.
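As a small illustration of the stack convention in this definition, the following sketch places two tuples on the stacks. It assumes a system that provides the Floating-Point word set with a separate floating-point stack; the particular values are arbitrary.

```forth
decimal

\ The tuple ( 10 20 30 ): the rightmost element in writing, 30, is topmost.
10 20 30
. . .        \ displays 30, then 20, then 10

\ The tuple ( 1 2e 3 ): the floating-point number goes to the
\ floating-point stack, the other elements go to the data stack.
1 2e 3       ( 1 3 ) ( F: 2e )
. .          \ displays 3, then 1
f.           \ displays the floating-point element
```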

lexeme: a syntactic unit of a program (of its source code); unless otherwise noted, it is a sequence of non-blank characters delimited by a blank.

to recognize a lexeme: to determine the interpretation semantics and the compilation semantics for the lexeme in the current dynamic context.

to interpret a lexeme: to perform the interpretation semantics for the lexeme in the current dynamic context.

to compile a lexeme: to perform the compilation semantics for the lexeme in the current dynamic context.

to translate a lexeme: to interpret the lexeme if interpreting, or to compile the lexeme if compiling.

dynamic context of a lexeme: information that is available at the time the lexeme is translated.

unqualified token: a tuple of arbitrary data objects that determines the interpretation semantics and the compilation semantics for a lexeme in its dynamic context.

token: unqualified token (a synonym, when it is clear from context).

to interpret a token: to perform the interpretation semantics that are determined by the token.

to compile a token: to perform the compilation semantics that are determined by the token.

to translate a token: to interpret the token if interpreting, or to compile the token if compiling.

token translator: a Forth definition that translates a token; also, depending on context, an execution token for this Forth definition.

resolver: a Forth definition that recognizes a lexeme, producing a tuple of a token and its token translator.
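To make the terms "token translator" and "resolver" more concrete, here is a minimal sketch in standard Forth. The names tt-lit and resolve-char, the 'c' lexeme form, and the convention of returning 0 on failure are assumptions made for this illustration only; the proposal itself does not prescribe them.

```forth
\ Hypothetical token translator for a single-cell number token:
\ interpreting leaves x on the stack, compiling compiles x as a literal.
: tt-lit ( x -- x | )
  state @ if postpone literal then ;

\ Hypothetical resolver for lexemes of the form 'c' (a quoted character).
\ On success it returns the token x and the xt of its token translator;
\ on failure it returns 0 (an assumed convention).
: resolve-char ( c-addr u -- x xt | 0 )
  3 = if
    dup c@ [char] ' = if
      dup 2 chars + c@ [char] ' = if
        char+ c@  ['] tt-lit  exit
      then
    then
  then
  drop 0 ;
```

For example, assuming S" is available in interpretation state (File word set), `s" 'A'" resolve-char` returns the character code of A together with the xt of tt-lit; executing that xt while interpreting simply leaves the code on the stack.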

token descriptor object: an implementation-dependent data object (a set of information) that describes how to interpret and how to compile a token.

token descriptor: a value that identifies a token descriptor object; also, less formally and depending on context, a Forth definition that just returns this value, or a token descriptor object itself.

fully qualified token: a tuple of a token and its token descriptor.
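Since a token descriptor object is implementation dependent, the following is only one possible sketch: a two-cell table that holds the xt for interpreting and the xt for compiling a token. The names td-lit, interpret-lit, compile-lit, translate-token and lit42 are hypothetical and not part of the proposal.

```forth
\ How to interpret and how to compile a single-cell number token.
: interpret-lit ( x -- x ) ;
: compile-lit   ( x -- ) postpone literal ;

\ A token descriptor object laid out as two cells; td-lit returns
\ its address, which serves as the token descriptor.
create td-lit  ' interpret-lit ,  ' compile-lit ,

\ Translate a fully qualified token ( i*x td ):
\ pick the second cell if compiling, the first if interpreting.
: translate-token ( i*x td -- j*x )
  state @ if cell+ then @ execute ;

\ Usage: ( 42 td-lit ) is a fully qualified token.
42 td-lit translate-token .            \ interpreting: displays 42

: lit42 42 td-lit translate-token ; immediate
: foo lit42 ;                          \ compiling: 42 is compiled into foo
foo .                                  \ displays 42
```

Keeping both actions in one object lets a single generic word translate any fully qualified token, at the cost of one indirection compared with a dedicated token translator.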

recognizer: a Forth definition that recognizes a lexeme, producing a fully qualified token.

simple recognizer: a recognizer that can produce only one token descriptor, always the same one.

compound recognizer: a recognizer that can produce different token descriptors.

perceptor: a recognizer that is currently used by the Forth text interpreter to translate a lexeme.

default perceptor: the perceptor before it was changed by a program.

Rationale

The need for some terms is obvious from the comparison of some proposals.

"Perceptor"

A less obvious term is perceptor. Why not just "current recognizer"? One argument is that "current" leads to longer names for the words. In general, we introduce new nouns to make things shorter. E.g., how should we name the word that sets (or selects) the recognizer that will be used by the Forth text interpreter?

We have: set-current-recognizer vs set-perceptor. The latter is shorter. The former also wrongly suggests a connection to the set-current word (whose name is suboptimal too).

"Token"

A token carries all the information required to interpret it and to compile it, provided you know what kind of token it is.

An "execution token" is a token, just as a "token of a single-cell number" is a token.

Examples:

  • Having an "execution token" on the stack, you know how to interpret it and how to compile it. So, it carries all the semantics required for translation. So, it's a token.
  • Having a "name token" on the stack, you know how to interpret it and how to compile it. So, it carries all the semantics required for translation. So, it's a token.
  • Having a "token of a single-cell number" on the stack, you know how to interpret it and how to compile it (since it represents the number itself). So, it carries all the semantics required for translation. So, it's a token.
  • Having a "token of a double-cell number" on the stack (it takes two cells), you know how to interpret it and how to compile it (since it represents the number itself). So, it carries all the semantics required for translation. So, it's a token.
  • Having a "string literal token" on the stack (it takes two cells), you know how to interpret it and how to compile it. So, it carries all the semantics required for translation. So, it's a token.

Discussion

If we want to allow a recognizer to have other effects beyond determining the semantics for a lexeme, then the definition should be changed accordingly.

The definition of "to translate" can be extended to other modes.

We can replace "token" with another appropriate English noun. But not with "rectype", that isn't an English word, and has inappropriate etymology.

We can replace "descriptor" with "type", but the former sounds better to me. The words "type" and "class" have a more abstract connotation than "descriptor". In any case, the corresponding object describes something. So "descriptor" looks like a good choice.

We can replace "perceptor" with another appropriate English noun, preferably a single word (see comparison of some variants).

ruv

Correction:

But not with "rectype", that isn't an English word, and has inappropriate etymology.

In fact, "rectype" is an alternative not to "token" but to "token descriptor" (or just "descriptor", when it is obvious from the context).
