11.6.1.2090 READ-LINE FILE
Read the next line from the file specified by fileid into memory at the address c-addr. At most u1 characters are read. Up to two implementation-defined line-terminating characters may be read into memory at the end of the line, but are not included in the count u2. The line buffer provided by c-addr should be at least u1+2 characters long.
If the operation succeeded, flag is true and ior is zero. If a line terminator was received before u1 characters were read, then u2 is the number of characters, not including the line terminator, actually read (0 <= u2 <= u1). When u1 = u2 the line terminator has yet to be reached.
If the operation is initiated when the value returned by FILE-POSITION is equal to the value returned by FILE-SIZE for the file identified by fileid, flag is false, ior is zero, and u2 is zero. If ior is non-zero, an exception occurred during the operation and ior is the implementation-defined I/O result code.
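The three outcomes described above (a line was read, end of file, I/O error) can be told apart from flag and ior alone. The following is only a sketch; LINE-STATUS and its numeric codes are hypothetical names chosen for illustration and are not part of the standard.

```forth
\ Classify the results of one READ-LINE call (hypothetical helper).
\ status:  -1 = I/O error,  0 = end of file,  1 = a line was read
: LINE-STATUS ( u2 flag ior -- u2 status )
   IF  DROP -1 EXIT  THEN      \ ior <> 0: an exception occurred
   IF  1  ELSE  0  THEN ;      \ flag true: a (possibly empty) line was read
```

It would be applied directly to the results of READ-LINE, for example `PAD 80 fileid READ-LINE LINE-STATUS`, where fileid stands for an open file. A status of 1 with u2 = 0 is an empty line, not end of file.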
An ambiguous condition exists if the operation is initiated when the value returned by FILE-POSITION is greater than the value returned by FILE-SIZE for the file identified by fileid, or if the requested operation attempts to read portions of the file not written.
At the conclusion of the operation, FILE-POSITION returns the next file position after the last character read.
Standard Programs may not depend on the presence of any such terminator sequence in the buffer.
A typical line-oriented sequential file-processing algorithm might look like:
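A minimal sketch of such a loop, under a few assumptions: the names MAX-LINE, LINE-BUF and COUNT-LINES are hypothetical, the 256-character limit is arbitrary, and the per-line processing is reduced to counting. A line longer than MAX-LINE is delivered in several pieces and is therefore counted more than once.

```forth
\ Count the lines of a text file with READ-LINE (illustrative sketch).
256 CONSTANT MAX-LINE
CREATE LINE-BUF  MAX-LINE 2 + CHARS ALLOT    \ room for u1+2 characters

: COUNT-LINES ( c-addr u -- n )
   R/O OPEN-FILE THROW  >R                   \ keep fileid on the return stack
   0                                         ( n )
   BEGIN
      LINE-BUF MAX-LINE R@ READ-LINE THROW   ( n u2 flag )
   WHILE                                     ( n u2 )
      DROP 1+                                \ process LINE-BUF u2 here
   REPEAT                                    ( n u2 )
   DROP                                      \ u2 is zero at end of file
   R> CLOSE-FILE THROW ;
```

With a hypothetical file name, `S" example.txt" COUNT-LINES .` would print the number of lines. The loop relies on THROW to handle a non-zero ior and on flag to detect end of file, so it never inspects the terminator characters in the buffer.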
AntonErtl: Dealing with newlines (Comment, 2016-02-02 15:47:01)
Up to Gforth 0.4, we used the C approach to text files: let the C library translate between OS-dependent newlines in the file and one newline character (typically LF) in memory on input and on output. That approach turned out to cause problems when dealing with CRLF-containing files in combination with READ-FILE and REPOSITION-FILE (among other cases), because READ-FILE referred to the in-memory length, while REPOSITION-FILE referred to the in-file length.
So, in Gforth 0.5 we switched to opening all files as binary files (whether BIN is used in fam or not); READ-LINE recognizes all three kinds of newlines (LF, CR, and CRLF), and CR and WRITE-LINE output the standard newline of the platform (LF on Unix, CRLF on Windows). If the user reads text files with READ-FILE or writes them with WRITE-FILE, they have to worry about that themselves.
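To illustrate what recognizing all three kinds of newlines involves, here is a sketch that scans a block of raw file contents for the first LF, CR, or CRLF. It is not Gforth's actual implementation; LF-CHAR, CR-CHAR, TERMINATOR-LENGTH and SPLIT-LINE are hypothetical names. A CR in the last position of the block is reported as a one-character terminator, so a real implementation would still have to check whether an LF follows in the next block.

```forth
10 CONSTANT LF-CHAR
13 CONSTANT CR-CHAR

\ Length (0, 1 or 2) of the line terminator at the start of the string.
: TERMINATOR-LENGTH ( c-addr u -- n )
   DUP 0= IF  2DROP 0 EXIT  THEN
   OVER C@ LF-CHAR =  IF  2DROP 1 EXIT  THEN
   OVER C@ CR-CHAR <> IF  2DROP 0 EXIT  THEN
   \ first character is CR: it is CRLF if an LF follows
   1 > IF  CHAR+ C@ LF-CHAR = IF 2 ELSE 1 THEN  ELSE  DROP 1  THEN ;

\ u2 = characters before the first terminator, n = terminator length
\ (n = 0 if the string contains no terminator; then u2 = u).
: SPLIT-LINE ( c-addr u -- u2 n )
   DUP 0 ?DO                               ( c-addr u )
      2DUP I /STRING TERMINATOR-LENGTH     ( c-addr u n )
      ?DUP IF  NIP NIP  I SWAP  UNLOOP EXIT  THEN
   LOOP
   NIP 0 ;
```

The next line would then start u2 + n characters further on, which is also the amount by which the file position advances.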
The experience with this new (well, by now, 16-year-old) approach is positive; no problems have been reported, and the problems we had with the previous approach were solved.
This approach works so well because Forth has tended to avoid dealing with newlines as characters or strings: we have CR and WRITE-LINE for outputting a newline, and READ-LINE and ACCEPT for inputting lines. In all these places the actual value of the newline is abstracted away. The C approach, OTOH, is due to the fact that in the Unix roots of C a newline was visible as a single character, and they wanted to make programs written for that model run on OSs that have CRLF newlines.
, but no problems with that approach have been reported.
AntonErtl: Some clarifications (Comment, 2021-11-01 19:19:13)
The "0 <= u2 <= u1" is misleading. As becomes clear from the rest, 0 <= u2 < u1 is also guaranteed if the line terminator starts before u1 chars are received.
READ-LINE reads at most u1 characters that are not part of the line terminator. Line terminators of up to 2 chars can occur (i.e., CRLF). However, even with such a line terminator, it's enough to read u1+1 chars: if the line terminator does not start at the last char in the buffer, READ-LINE does not need to know if a line terminator follows right afterwards: It just returns u2=u1, no need to know about line terminators.
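The rule that u2 = u1 means "the terminator has not been reached yet" lets a program handle lines of any length with a fixed buffer. The sketch below adds up the pieces of one line; CHUNK, CHUNK-BUF and LINE-LENGTH are hypothetical names and the chunk size of 80 is arbitrary. It assumes the behaviour described above: a line of exactly CHUNK characters yields u2 = u1 first, and the following call returns u2 = 0 with a true flag.

```forth
80 CONSTANT CHUNK
CREATE CHUNK-BUF  CHUNK 2 + CHARS ALLOT

\ Total length of the next line, however long it is.
\ flag is false only if the file was already at its end.
: LINE-LENGTH ( fileid -- u flag )
   >R 0                                     ( total )
   BEGIN
      CHUNK-BUF CHUNK R@ READ-LINE THROW    ( total u2 flag )
   WHILE                                    ( total u2 )
      TUCK + SWAP                           ( total' u2 )
      CHUNK <> IF  R> DROP TRUE EXIT  THEN  \ u2 < u1: terminator reached
   REPEAT                                   ( total 0 )
   DROP  R> DROP
   DUP 0<> ;    \ an unterminated last line of n*CHUNK characters still counts
```

A program that needs the text as well as the length would copy each chunk out of CHUNK-BUF inside the loop before the next READ-LINE overwrites it.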
At least in Linux it is significantly faster to use input buffering than to always read u1+1 characters using a system call and reposition the file with another system call.
Acknowledgments: Discussions with ruv and dxforth resulted in this comment.