Digest #108 2020-08-09
I believe the author intended to use the word 'test' instead of the word 'teat'.
I led the effort to specify the input stream in a way that works across files, blocks, keyboard, and string input. Before the standard, I had used, for many years, a stream-oriented approach in my own systems, so I fully understand why that is appealing to you. But such an approach was just not feasible when considering the existing practice around >IN and the variables that are now hidden within SOURCE . As a result, the standard specifies an input model that is intended to be strictly line-oriented - the input buffer contains exactly one line. PARSE works entirely within a line - so "0 PARSE" will return the remainder of the line. To get the next line you must do an explicit REFILL. If you want a parsing operation to work across multiple lines, it must be explicitly programmed to do REFILL as necessary. It is possible that the text does not make this clear, or that it is just wrong, but that behavior was what I intended, and I believe that the committee understood that to be the case. Your approach of reading the entire file into one big buffer is certainly feasible - I do that myself in some scenarios - but to be compliant with the standard as I understand it, REFILL needs to look for the next line delimiter and arrange for the input buffer to contain just the one line.
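As a rough illustration of "explicitly programmed to do REFILL as necessary", here is a minimal sketch of a multi-line comment word for a line-oriented system. The names `(*` and `*)` are hypothetical and not part of the standard; the sketch assumes PARSE-NAME (Forth 2012) and COMPARE (String word set) are available, and that the closing `*)` is delimited by white space.

```forth
\ Hypothetical multi-line comment: skips text until a blank-delimited *)
: (*  ( "ccc *)" -- )
  begin
    parse-name dup 0= if          \ nothing left on this line?
      2drop refill 0= if exit then  \ fetch the next line; give up at end of input
    else
      s" *)" compare 0= if exit then  \ stop once the terminator is seen
    then
  again ; immediate
```

Each pass either consumes one name from the current line or, when the line is exhausted, asks REFILL for the next one, so the word never looks past the single line held in the input buffer.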
"Extend the execution semantics of 6.2.2125 REFILL with the following:
When the input source is a text file, attempt to read the next line from the text-input file. If successful, make the result the current input buffer, set >IN to zero, and return true. Otherwise return false."
It technically doesn't say you have to read only one line on the first read. But I see your point. Why is the standard dictating implementation instead of behavior?
But it also says this:
"When the input source is a string from EVALUATE, return false and perform no other action."
And I'm using EVALUATE to do the buffers. I use EVALUATE for everything... So... is this compliant or non-compliant?
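The quoted rule can be checked directly. This is only a small sketch; the word name `refill-in-evaluate` is made up for the illustration.

```forth
\ REFILL executed while the input source is an EVALUATE string
: refill-in-evaluate  ( -- flag )
  s" refill" evaluate ;

refill-in-evaluate .   \ a system following the quoted text displays 0
```

So any word that relies on REFILL to reach the next line gets false immediately when the whole file is fed through EVALUATE.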
" But such an approach was just not feasible when considering the existing practice around >IN and the variables that are now hidden within SOURCE"
The way I do it is: each buffer, including the terminal input buffer, has its own >IN offset. And then I use a current input buffer variable to determine what the current >IN and SOURCE are. This also allows for including one file within another file. When the included file finishes, the first file continues from where it left off. Basically the include does this: save the current input buffer to a local variable, load the file to a buffer, set the current input buffer to the new buffer, evaluate the buffer, put the current input buffer back the way it was.
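The "first file continues from where it left off" behaviour is the same thing a standard EVALUATE already does for nested strings: it saves the current input source specification, interprets the string, then restores the saved source. A tiny sketch, with the hypothetical names `inner` and `outer`:

```forth
\ Nested EVALUATE: the outer string resumes after the inner one finishes
: inner  ( -- n )  s" 1 2 +" evaluate ;
: outer  ( -- n )  s" inner 3 *" evaluate ;

outer .   \ displays 9
```

Here `outer` is interrupted in the middle of its string while `inner` is evaluated, and then carries on with `3 *` from its own saved >IN.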
The main reasons for doing an entire file in one go are:
- There is a lot of operating system overhead involved with doing a read from a file. Doing one big read is a lot more efficient and faster than doing a lot of smaller reads.
- It means I need a lot fewer words to manipulate files. I only need two, one to load the entire file to a buffer, and one to save a buffer to a file. (The saved file overwrites any existing file... so the whole buffer is the file after the save.)
- It really simplifies the parsing code. I just do an EVALUATE on the whole buffer. And parse for a set of delimiters instead of just a space.
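The two file words described above could be sketched with the standard File-Access and Memory-Allocation word sets as follows. The names `load-file` and `save-file` are assumptions, not the poster's actual words, and the sketch assumes the file size fits in a single cell (D>S is from the Double-Number word set).

```forth
\ Read a whole file into freshly allocated memory
: load-file  ( c-addr u -- buf len )
  r/o open-file throw >r          \ R: fileid
  r@ file-size throw d>s          \ len
  dup allocate throw              \ len buf
  swap 2dup r@ read-file throw    \ buf len actual
  nip                             \ buf actual
  r> close-file throw ;

\ Write a buffer out as the complete contents of a file
: save-file  ( buf len c-addr u -- )
  w/o create-file throw >r        \ CREATE-FILE recreates an existing file as empty
  r@ write-file throw
  r> close-file throw ;
```

The caller owns the buffer returned by `load-file` and is expected to FREE it when done.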
But I am wondering what specific standard behaviors will break by doing it this way? Won't all the words be parsed in exactly the same way? If not, is it possible to get the standard changed to allow this? And technically, the way it's worded now doesn't prevent someone implementing things this way...
I forgot to mention... include frees the buffer after the evaluate... kind of important to not leak memory :-)
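One way to make sure the buffer is released even when the evaluated text throws is to wrap EVALUATE in CATCH. This is a sketch under that assumption; `evaluate-and-free` and `demo` are hypothetical names, and the buffer is assumed to have come from ALLOCATE.

```forth
\ Evaluate an allocated buffer, then free it whether or not an error occurred
: evaluate-and-free  ( i*x buf len -- j*x )
  over >r                   \ remember the buffer address
  ['] evaluate catch        \ ior is 0 on success
  r> free throw             \ release the buffer first
  throw ;                   \ then re-raise any error from the evaluated text

\ Usage: copy some source text to the heap and run it
: demo  ( -- n )
  s" 1 2 +" dup allocate throw   \ src u buf
  swap 2dup 2>r move 2r>         \ buf u  (heap copy of the text)
  evaluate-and-free ;

demo .   \ displays 3
```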
"PARSE works entirely within a line - so "0 PARSE" will return the remainder of the line"
I just looked through PARSE and I couldn't find where it says it only works within a line. I checked my parse and it ends when the delimiter is found or it reaches the end of the buffer. Changing my parse to only go to the end of a line is an easy fix, I just add the set of line terminators to the list of delimiters for a parse. But, is this really the intent of the standard?
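The "easy fix" might look roughly like the sketch below: a PARSE-like word that stops at the delimiter, at a line terminator, or at the end of the parse area, whichever comes first. All names here are hypothetical, LF (10) and CR (13) are assumed to be the line terminators, and /STRING comes from the String word set.

```forth
: line-end?  ( char -- flag )  dup 10 = swap 13 = or ;
: stop?  ( char delim -- flag )  over = swap line-end? or ;

\ Like PARSE, but never runs past a line terminator in the buffer
: parse-in-line  ( delim "ccc<delim>" -- c-addr u )
  >r                               \ R: delim
  source >in @ /string             \ c-addr u   (the parse area)
  over swap                        \ start cur rem
  begin
    dup if over c@ r@ stop? 0= else false then
  while
    1 /string                      \ step over one ordinary character
  repeat
  r> drop                          \ start cur rem
  0<> 1 and >r                     \ R: 1 if a stop character was found, else 0
  over -                           \ start len
  dup r> + >in +! ;                \ consume the text plus the stop character
```

For example, `char ) parse-in-line hello) type` displays `hello`. Note that with a CR LF pair only the CR is consumed here, so a real implementation would need to decide how to treat the LF that follows.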
That means you can't use ( to do multi-line comments in a file. It also means you can't pass strings to EVALUATE with line terminators in them. It's also extra rules for special cases... Also no multi-line ." , C" , S" or .( in a file. I find multi-line ( useful.
If this is the intent of the standard, can PARSE and all the words using it be changed to say they only go to the end of the line if the terminator is not found? I would rather things go the other way though, where these words can operate over multiple lines if the buffer is from a file.
In other words, this is the definition of parse area in the standard: "parse area: The portion of the input buffer that has not yet been parsed, and is thus available to the system for subsequent processing by the text interpreter and other parsing operations."
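In terms of standard words, that definition corresponds to the part of SOURCE that starts at offset >IN. A short sketch, with the hypothetical names `parse-area` and `show-rest`:

```forth
\ The parse area: the unparsed tail of the input buffer
: parse-area  ( -- c-addr u )  source >in @ /string ;

\ Display whatever is still unparsed, then mark it all as consumed
: show-rest  ( "ccc" -- )
  parse-area type
  source nip >in ! ;

show-rest this text is displayed rather than interpreted
```

Nothing in this computation refers to lines; whether the tail can contain line terminators depends entirely on what the system put in the input buffer.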
I interpreted this as I could change the input buffer to be whatever I wanted, including a buffer I loaded from a file. The thing that is missing from the definition is that the parse area cannot contain line terminator characters, and what to do if they are found. In reality, you can leave this definition alone, and all the words that use parse alone, and just put something in PARSE that says what to do if the end of the parse area is reached without finding the delimiter. And change PARSE-NAME to include line terminators as additional white space delimiters when parsing from a buffer from a file or EVALUATE. And change \ to use line terminators as delimiters when parsing from a buffer from EVALUATE or a file. The wording of SOURCE does not need to change when parsing from a file. Neither does the wording of >IN need to change. They just return values relative to the current input buffer when loading from a file... which is what they already say.
So in reality, to have parsing from a buffer loaded from a file be compliant with the standard, all that needs to happen is a slight rewording of \ and PARSE-NAME. \ would have to parse to a line terminator or the end of the parse area, and PARSE-NAME would have to use white space as delimiters, which would include the space character and line terminator characters. This would also fix the ambiguous condition of what happens when someone passes a string to EVALUATE which contains line terminators.
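A sketch of the proposed behaviour for \ when the buffer holds several lines: skip to the next line terminator, or to the end of the parse area if there is none. The name `\eol` is hypothetical, LF (10) is assumed to be the line terminator, and the sketch assumes the text interpreter treats the remaining LF as white space, which the standard permits for control characters but does not require.

```forth
\ Comment to end of line inside a multi-line buffer
: \eol  ( "ccc<eol>" -- )
  source >in @ /string            \ c-addr u   (the parse area)
  begin dup while                 \ anything left?
    over c@ 10 <> while           \ not at a line terminator yet?
    1 /string
  repeat then                     \ c-addr' u'
  nip source nip swap -           \ offset of the LF, or of the end of the buffer
  >in ! ; immediate
```

On a single-line input buffer this behaves like the standard \, since it simply runs to the end of the parse area.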
PARSE needs to be reworded anyway because it does not specifically say the parse ends when the end of the parse area is reached... (I got confused by the wording.)