READ-LINE

( c-addr u1 fileid -- u2 flag ior )

Read the next line from the file specified by fileid into memory at the address c-addr. At most u1 characters are read. Up to two implementation-defined line-terminating characters may be read into memory at the end of the line, but are not included in the count u2. The line buffer provided by c-addr should be at least u1+2 characters long.

If the operation succeeded, flag is true and ior is zero. If a line terminator was received before u1 characters were read, then u2 is the number of characters, not including the line terminator, actually read (0 <= u2 <= u1). When u1 = u2 the line terminator has yet to be reached.

If the operation is initiated when the value returned by FILE-POSITION is equal to the value returned by FILE-SIZE for the file identified by fileid, flag is false, ior is zero, and u2 is zero. If ior is non-zero, an exception occurred during the operation and ior is the implementation-defined I/O result code.

An ambiguous condition exists if the operation is initiated when the value returned by FILE-POSITION is greater than the value returned by FILE-SIZE for the file identified by fileid, or if the requested operation attempts to read portions of the file not written.

At the conclusion of the operation, FILE-POSITION returns the next file position after the last character read.

See:

Rationale:

Implementations are allowed to store the line terminator in the memory buffer in order to allow the use of line reading functions provided by host operating systems, some of which store the terminator. Without this provision, a transient buffer might be needed. The two-character limitation is sufficient for the vast majority of existing operating systems. Implementations on host operating systems whose line terminator sequence is longer than two characters may have to take special action to prevent the storage of more than two terminator characters.

Standard Programs may not depend on the presence of any such terminator sequence in the buffer.

A typical line-oriented sequential file-processing algorithm might look like:

BEGIN                        ( )
    ... READ-LINE THROW      ( length not-eof-flag )
WHILE                        ( length )
    ...                      ( )
REPEAT DROP                  ( )
READ-LINE needs a separate end-of-file flag because empty (zero-length) lines are a routine occurrence, so a zero-length line cannot be used to signify end-of-file.

Testing:

200 CONSTANT bsize
CREATE buf bsize ALLOT
VARIABLE #chars

T{ fn1 R/O OPEN-FILE SWAP fid1 ! -> 0 }T
T{ fid1 @ FILE-POSITION -> 0. 0 }T
T{ buf 100 fid1 @ READ-LINE ROT DUP #chars ! ->
    <TRUE> 0 line1 SWAP DROP }T

T{ buf #chars @ line1 COMPARE -> 0 }T
T{ fid1 @ CLOSE-FILE -> 0 }T

ContributeContributions

AntonErtlavatar of AntonErtl Dealing with newlinesComment2016-02-02 15:47:01

Up to Gforth 0.4, we used the C approach to text files: let the C library translate between OS-dependent newlines in the file and one newline character (typically LF) in memory on input and on output. That approach turned out to cause problems when dealing with CRLF-containing files in combination with READ-FILE and REPOSITION-FILE (among other cases), because READ-FILE referred to the in-memory length, while REPOSITION-FILE referred to the in-file length.

So, in Gforth 0.5 we switched to opening all files as binary files (whether BIN is used in fam or not); READ-FILE recognizes all three kinds of newlines (LF, CR, and CRLF), and CR and WRITELINE output the standard newline of the platform (LF on Unix, CRLF on Windows). If the user reads text files with READ-FILE or writes them with WRITE-FILE, they have to worry about that themselves.

The experience with this new (well, by now,16-year old) approach is positive; no problems have been reported, and the problems we had with the previous approach were solved.

This approach works so well, because Forth has tended to avoid dealing with newlines as characters or strings: We have CR and WRITE-LINE for outputting a newline, and READ-LINE and ACCEPT for inputting lines. In all these places the actual value of the newline is abstracted away. The C approach, OTOH is due to the fact that in the Unix roots of C newline was visible as a single character, and they wanted to make programs written for that model run on OSs that have CRLF newlines.

, but no problems with that approach have been reported.

Reply