Proposal: Case insensitivity

Informal

This proposal has been moved into this section. Its former address was: /standard/usage

This page is dedicated to discussing this specific proposal


AntonErtl, Case insensitivity, Proposal, 2019-09-06 18:27:48

Problem

This is an alternative proposal for the same problem as in the proposal "Case sensitivity".

Forth-2012 states:

Programs that use lower case for standard definition names or depend on the case-sensitivity properties of a system have an environmental dependency.

This differs from common practice:

It is common practice for programs to use lower case for standard definition names, and also not uncommon practice to use capitalized (i.e., mixed-case) names for some standard definition names.

It is common practice for systems to support case insensitivity for ASCII characters, either by default (Gforth, iForth, SwiftForth, VFX), or after invoking a special command (SP-Forth).

Solution

Standardize the common practice of systems.

Typical use

Create a 5 cells allot

Remarks

What about non-ASCII characters? They are treated case-sensitively.

The advantages of this approach are: This approach is common practice. The implementation is relatively simple (especially if you consider the complexity of locale-dependent case insensitivity in UTF-8). Forth source files work independently of the encoding and locale, i.e., the system does not need to know the encoding to know whether a word matches a dictionary entry (of course, the application itself may be locale-dependent). The main purpose

The disadvantage of this approach is that users might be confused by the difference in case sensitivity between ASCII and non-ASCII characters. E.g., "WIEN" would match "Wien", but "KÖLN" would not match "Köln".

Comparison with the Case sensitivity proposal

The present proposal covers the practice of using mixed-case names. It makes this part of the standard air-tight rather than being unnecessarily loose: while having special case-sensitivity rules for standard words and other rules for other words has been discussed, the common and simpler practice is to just implement case insensitivity for ASCII characters.

Proposal

In 3.3.1.2, delete

Programs that use lower case for standard definition names or depend on the case-sensitivity properties of a system have an environmental dependency.

In 3.4.2, replace

The case sensitivity (whether or not the upper-case letters match the lower-case letters) is implementation defined. A system may be either case sensitive, treating upper- and lower-case letters as different and not matching, or case insensitive, ignoring differences in case while searching.

The matching of upper- and lower-case letters with alphabetic characters in character set extensions such as accented international characters is implementation defined.

A system shall be capable of finding the definition names defined by this standard when they are spelled with upper-case letters.

with

ASCII characters are matched case-insensitively. All other characters are matched exactly (case sensitively).

Reference implementation

System-dependent
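
Although a real implementation depends on the system's dictionary layout, the matching rule itself can be sketched in standard Forth. The following is only an illustration, not the code of any existing system; the names ascii-lower and ascii-name= are hypothetical. It assumes names are compared byte by byte, so bytes outside A-Z (including all bytes of multi-byte UTF-8 characters) are compared exactly.

```forth
\ Hypothetical helper: fold ASCII A-Z to a-z, leave every other value unchanged.
: ascii-lower ( c -- c' )
  dup [char] A [char] Z 1+ within if [char] a [char] A - + then ;

\ Hypothetical helper: flag is true if both strings have the same length
\ and are equal when ASCII letters are compared case-insensitively.
: ascii-name= ( c-addr1 u1 c-addr2 u2 -- flag )
  rot over <> if drop 2drop false exit then   \ lengths differ: no match
  0 ?do                                       \ stack: c-addr1 c-addr2
    over i + c@ ascii-lower
    over i + c@ ascii-lower
    <> if 2drop false unloop exit then        \ first mismatch: no match
  loop
  2drop true ;
```

With these definitions, s" WIEN" s" Wien" ascii-name= gives true, while s" KÖLN" s" Köln" ascii-name= gives false, because the bytes encoding Ö and ö differ and lie outside the ASCII letter range.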

Testing

T{ 1 constant case-insensitive -> }T
T{ 2 Constant Case-INSENSITIVE -> }T
T{ case-insensitive -> 2 }T

Experience

Gforth has implemented this approach since its inception. Several other systems (SwiftForth, VFX, iForth) have also done so for as long as I have used them.

Many published programs use lower-case or mixed-case system words.

ruv

Testing

I think it is better to use SEARCH-WORDLIST in Testing, so that the test fails gracefully (if it fails) instead of stopping with a "not found" error.
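
A possible shape for such a test, as a sketch only: it assumes the T{ ... }T test harness and the Search-Order word set are loaded, and found? is a hypothetical helper, not a standard word.

```forth
\ Hypothetical helper: search wordlist wid for the name, return only a flag.
: found? ( c-addr u wid -- flag )
  search-wordlist dup if nip then 0<> ;

T{ 1 constant case-insensitive -> }T
T{ s" case-insensitive" get-current found? -> true }T
T{ s" Case-INSENSITIVE" get-current found? -> true }T
T{ s" CASE-INSENSITIVE" get-current found? -> true }T
```

On a case-sensitive system the second and third lookups report a failed test rather than aborting the whole test run.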

Effect on performance

Compared to my proposal (that is, to support lower case only), this solution affects performance more significantly.

In my test in SP-Forth, lowercase synonyms of the standard words add 23-24% to the time of SEARCH-WORDLIST (for the FORTH-WORDLIST wordlist), whereas case-insensitive search adds 51-71%.

Also, the lowercase approach affects only FORTH-WORDLIST, whereas the case-insensitive approach affects every wordlist. In some use cases this can be critical (for example, when a wordlist is used for case-sensitive file names).

OTOH, the synonyms approach perhaps requires more space.

Optionality

What do you think about having a standard API to change this case sensitivity?

I.e., if a standard Forth system supports case-insensitive search, should it also support this API to turn case sensitivity off (or on)?

Or maybe a separate word for case-sensitive search?

ruv

Clarification of "my proposal (that is to support lower case only)": I meant supporting lower case only (and not mixed case) in addition to upper case for the standard words.

Regarding the use of mixed case in my proposal: in practice, mixed case is used for only a few standard words. If a user wants to use mixed case for some words, they just have to create synonyms for these words in the desired case.
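
As a sketch of what creating such synonyms could look like, assuming the system provides SYNONYM from the Forth-2012 Programming-Tools extension word set (the choice of words here is only an example):

```forth
\ Make the capitalized spellings available next to the upper-case ones.
synonym Create   CREATE
synonym Constant CONSTANT
synonym Variable VARIABLE

5 Constant five   \ the capitalized spelling now finds CONSTANT
```

On a system that is already case-insensitive these definitions are redundant and may only produce redefinition warnings.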

BerndPaysan

The wording of the proposal should leave a way out for case-sensitive systems; they will not go away, and they likely comply with the other parts of the standard, which is a reason to keep them in.

At the moment, we have programs with environmental dependency on case insensitivity. This will go away when case insensitivity in one way or the other is added.

After acceptance, a system that still is case sensitive shall have an environmental restriction on programs using all-uppercase standard words; it's not non-standard, it's just restricting programs.

AntonErtl

Effect on Performance

I have measured this in <2002Nov22.175007@a0.complang.tuwien.ac.at>:

Gforth stores words in the original case, and uses a case-insensitive compare. I did some timings in Gforth on an Athlon:

1) searching for "execute" in a (case-sensitive) wordlist that contains only "execute": 2009 cycles

2) searching for "execute" in a (case-insensitive) wordlist that contains only "execute": 2042 cycles

3) searching for "execute" in forth-wordlist (case-insensitive): 2117 cycles.
