11 The optional File-Access word set

11.1 Introduction

These words provide access to mass storage in the form of "files" under the following assumptions:

  • files are provided by a host operating system;
  • file names are represented as character strings;
  • the format of file names is determined by the host operating system;
  • an open file is identified by a single-cell file identifier (fileid);
  • file-state information (e.g., position, size) is managed by the host operating system;
  • file contents are accessed as a sequence of characters;
  • file read operations return an actual transfer count, which can differ from the requested transfer count.
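The last assumption matters in practice: a reader has to use the count that READ-FILE returns rather than the count it asked for. A minimal sketch, assuming the standard words OPEN-FILE, READ-FILE and CLOSE-FILE from this word set and THROW from the Exception word set; the names /chunk, chunk and read-chunk are hypothetical:

```forth
64 constant /chunk                    \ hypothetical request size, in characters
create chunk /chunk chars allot       \ buffer that receives the data

: read-chunk ( c-addr u -- u2 )       \ c-addr u is a file name
  r/o open-file throw >r              \ keep the fileid on the return stack
  chunk /chunk r@ read-file throw     \ u2 = characters actually transferred
  r> close-file throw ;               \ close the file; u2 stays on the stack
```

For a file shorter than 64 characters, u2 is the number of characters in the file, not 64.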

11.2 Additional terms

file-access method:
A permissible means of accessing a file, such as "read/write" or "read only".
file position:
The character offset from the start of the file.
input file:
The file, containing a sequence of lines, that is the input source.

11.3 Additional usage requirements

11.3.1 Data types

Append table 11.1 to table 3.1.

Table 11.1: Data types

Symbol   Data type            Size on stack

fam      file access method   1 cell
fileid   file identifier      1 cell

11.3.1.1 File identifiers

File identifiers are implementation-dependent single-cell values that are passed to file operators to designate specific files. Opening a file assigns a file identifier, which remains valid until closed.

11.3.1.3 File access methods

File access methods are implementation-defined single-cell values.

11.3.1.4 File names

A character string containing the name of the file. The file name may include an implementation-dependent path name. The format of file names is implementation defined.

11.3.2 Blocks in files

Blocks may, but need not, reside in files. When they do:

  • Block numbers may be mapped to one or more files by implementation-defined means. An ambiguous condition exists if a requested block number is not currently mapped;
  • An UPDATEd block that came from a file shall be transferred back to the same file.

11.3.3 Input source

The File-Access word set creates another input source for the text interpreter. When the input source is a text file, BLK shall contain zero, SOURCE-ID shall contain the fileid of that text file, and the input buffer shall contain one line of the text file. During text interpretation from a text file, the value returned by FILE-POSITION for the fileid returned by SOURCE-ID is undefined. A standard program shall not call REPOSITION-FILE on the fileid returned by SOURCE-ID.

Input with INCLUDED, INCLUDE-FILE, LOAD and EVALUATE shall be nestable in any order to at least eight levels.

A program that uses more than eight levels of input-file nesting has an environmental dependency. See: 3.3.3.5 Input buffers, 9 The optional Exception word set.
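A program can tell a text-file input source from the other sources by inspecting SOURCE-ID. A minimal sketch, assuming the SOURCE-ID conventions of the Core extension word set (0 for the user input device, -1 for a string being EVALUATEd) and ignoring block input; the word name text-file-input? is hypothetical:

```forth
: text-file-input? ( -- flag )
  source-id dup 0<>                   \ not the user input device
  swap -1 <> and ;                    \ and not a string passed to EVALUATE
```

When the flag is true, SOURCE-ID is the fileid of the file being interpreted, which the program may inspect but shall not reposition.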

11.3.4 Other transient regions

The system provides transient buffers for S" and S\" strings. These buffers shall be no less than 80 characters in length, and there shall be at least two buffers. The system should be able to store two strings defined by sequential use of S" or S\". RAM-limited systems may have environmental restrictions on the number of buffers and their lifetimes.
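In practice this means that two strings entered one after the other at the interpreter can both still be used. A small sketch; the file names are placeholders and no file is opened:

```forth
s" report.txt" s" report.bak"   \ two interpreted strings, two transient buffers
2swap type space type cr        \ the first string is still intact when printed
```

A third interpreted S" before the first string is consumed may overwrite it on a system that provides only the minimum of two buffers.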

11.3.5 Parsing

When parsing from a text file using a space delimiter, control characters shall be treated the same as the space character.

Lines of at least 128 characters shall be supported. A program that requires lines of more than 128 characters has an environmental dependency.

A program may reposition the parse area within the input buffer by manipulating the contents of >IN. More extensive repositioning can be accomplished using SAVE-INPUT and RESTORE-INPUT.
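As an illustration of the first technique, a word can look ahead at the next name and then rewind >IN so that the text interpreter still sees it. A minimal sketch, assuming PARSE-NAME from the Core extension word set; peek-name and announce are hypothetical names:

```forth
: peek-name ( "name" -- c-addr u )    \ parse the next name without consuming it
  >in @ >r  parse-name  r> >in ! ;    \ restore >IN to its value before parsing

: announce ( "name" -- )              \ print the upcoming name; it is still interpreted afterwards
  peek-name ." next: " type cr ;
```

The string returned by peek-name points into the input buffer, so it is only valid until that buffer is refilled.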

See: 3.4.1 Parsing.

11.4 Additional documentation requirements

11.4.1 System documentation

11.4.1.1 Implementation-defined options

11.4.1.2 Ambiguous conditions

11.4.1.3 Other system documentation

  • no additional requirements.

11.4.2 Program documentation

11.4.2.1 Environmental dependencies

11.4.2.2 Other program documentation

  • no additional requirements.

11.5 Compliance and labeling

11.5.1 Forth-2012 systems

The phrase "Providing the File Access word set" shall be appended to the label of any Standard System that provides all of the File Access word set.

The phrase "Providing name(s) from the File Access Extensions word set" shall be appended to the label of any Standard System that provides portions of the File Access Extensions word set.

The phrase "Providing the File Access Extensions word set" shall be appended to the label of any Standard System that provides all of the File Access and File Access Extensions word sets.

11.5.2 Forth-2012 programs

The phrase "Requiring the File Access word set" shall be appended to the label of Standard Programs that require the system to provide the File Access word set.

The phrase "Requiring name(s) from the File Access Extensions word set" shall be appended to the label of Standard Programs that require the system to provide portions of the File Access Extensions word set.

The phrase "Requiring the File Access Extensions word set" shall be appended to the label of Standard Programs that require the system to provide all of the File Access and File Access Extensions word sets.

11.6 Glossary

11.6.1 File Access words

11.6.2 File-Access extension words


GeraldWodni Directory experimental proposal (Proposal) 2016-12-12 15:42:57

Directory proposal

To write cross-platform and cross-system libraries, it is essential to have a means of traversing a system's file structure. This proposal is based on the only widely adopted implementation known to the authors, the one in Gforth.

Authors: Ulrich Hoffmann & Gerald Wodni

Add a new type, wdirid (or the like), to section 3.x.

Words for traversal:

open-dir ( c-addr u -- wdirid wior )

Open the directory specified by c-addr u and return wdirid for further access to it.

read-dir ( c-addr u1 wdirid -- u2 flag wior )

Attempt to read the next entry from the directory specified by wdirid into the buffer of length u1 at address c-addr. If the attempt fails because there are no more entries, wior=0, flag=0, u2=0, and the buffer is unmodified. If the attempt fails for any other reason, return wior<>0. If the attempt succeeds, store the file name in the buffer at c-addr and return wior=0, flag=true, and u2 equal to the length of the file name. If the file name is longer than u1, store its first u1 characters in the buffer and indicate "name too long" with wior, flag=true, and u2=u1.

close-dir ( wdirid -- wior )

Close the directory specified by wdirid.

mkdir ( c-addr u -- wior )

Create the directory c-addr u and all its parents. Remark: renamed mkdir-parents to mkdir, removed the Unix-specific umask.

Words for paths:

Descriptions taken from the Node.js manual:

path-normalize ( c-addr1 u1 -- c-addr1 u2 )

Normalize a string path, taking care of '..' and '.' parts. When multiple slashes are found, they're replaced by a single one; when the path contains a trailing slash, it is preserved.

path-basename ( c-addr1 u1 -- c-addr2 u2 )

Return the last portion of a path. Similar to the Unix basename command.

path-dirname ( c-addr1 u1 -- c-addr2 u2 )

Return the directory name of a path. Similar to the Unix dirname command.

path-extname ( c-addr1 u1 -- c-addr2 u2 )

Return the extension of the path, from the last '.' to end of string in the last portion of the path. If there is no '.' in the last portion of the path or the first character of it is '.', then it returns an empty string.

path-absolute? ( c-addr1 u1 -- f )

Determines whether path is an absolute path. An absolute path will always resolve to the same location, regardless of the working directory.

path.join ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 )

Join all arguments together and normalize the resulting path. Arguments must be strings. Use implicit allocation?

mtrute 2016-12-12 18:07:49

opendir / readdir / closedir reminds me of the old DOS days. What exactly is the benefit of that transactional (?) model? Why not simply use traverse-dir? It follows the traverse-wordlist pattern, which is already known in the standard.

BerndPaysan 2016-12-13 01:15:24

Yes, the traverse-dir approach also has the advantage that you could pass the matching information, and either implement it on POSIX with fnmatch, or on DOS/Windows with the mask passed to FindFirstFile.

traverse-dir ( ix diraddr u1 patternaddr u2 xt -- kx ) with xt taking ( ix fileaddr u -- jx )

ruv 2016-12-13 10:57:57

One question for mkdir that automatically creates ancestors — should it remove the directories created on the previous steps if it fails on some step?

My old LAY-PATH implementation does not remove intermediate steps on fail.

In any case, the operation is not atomic. But this should be mentioned.

GeraldWodni New Version 2016-12-13 11:09:34

Directory proposal

To write cross-platform and cross-system libraries, it is essential to have a means of traversing a system's file structure. This proposal is based on the only widely adopted implementation known to the authors, the one in Gforth.

Authors: Ulrich Hoffmann & Gerald Wodni

Add a new type, wdirid (or the like), to section 3.x.

Words for traversal:

open-dir ( c-addr u -- dirid ior )

Open the directory specified by c-addr u and return dirid for further access to it.

read-dir ( c-addr u1 dirid -- u2 flag ior )

Attempt to read the next entry from the directory specified by dirid into the buffer of length u1 at address c-addr. If the attempt fails because there are no more entries, ior=0, flag=0, u2=0, and the buffer is unmodified. If the attempt fails for any other reason, return ior<>0. If the attempt succeeds, store the file name in the buffer at c-addr and return ior=0, flag=true, and u2 equal to the length of the file name. If the file name is longer than u1, store its first u1 characters in the buffer and indicate "name too long" with ior, flag=true, and u2=u1.

close-dir ( dirid -- ior )

Close the directory specified by dirid.

traverse-dir ( ix c-addr u xt -- kx ) with xt taking ( ix c-addr-filename u-filename -- jx )

A possible alternative or addition to the three words above, suggested by Bernd Paysan and Matthias Trute.
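One way to layer traverse-dir on open-dir, read-dir and close-dir, as a rough sketch only: it assumes those three words behave as described in this proposal (they are not standard words), uses THROW for error handling, and introduces a hypothetical fixed-size buffer entry-buf. Nothing belonging to the loop is left on the data stack while xt runs, so xt can reach the ix items below its arguments:

```forth
256 constant /entry                   \ assumed upper bound on a file name's length
create entry-buf /entry chars allot   \ one-shot buffer handed to xt

: traverse-dir ( i*x c-addr u xt -- j*x )
  >r open-dir throw r>                ( i*x dirid xt )
  swap >r >r                          ( i*x ) ( R: dirid xt )
  begin
    entry-buf /entry 2r@ drop         ( i*x c-addr u1 dirid )
    read-dir throw                    ( i*x u2 flag )
  while
    entry-buf swap r@ execute         \ xt: ( i*x c-addr u2 -- j*x )
  repeat
  drop                                \ discard the u2 of the final, empty read
  r> drop r> close-dir throw ;
```

With a helper such as `: .entry ( c-addr u -- ) type cr ;`, the phrase `s" ." ' .entry traverse-dir` would list the current directory. In this sketch an error from read-dir, including "name too long", is thrown without closing the directory; a fuller version would use CATCH to clean up.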

dir? ( c-addr u -- flag ior )

check if path is a directory

make-dir ( c-addr u -- ior )

Create the directory c-addr u and all its parents. Remark: renamed mkdir-parents to mkdir, removed the Unix-specific umask.

Words for paths:

Descriptions taken from the Node.js manual:

normalize-path ( c-addr1 u1 -- c-addr1 u2 )

Normalize a string path, taking care of '..' and '.' parts. When multiple slashes are found, they're replaced by a single one; when the path contains a trailing slash, it is preserved.

basename-path ( c-addr1 u1 -- c-addr2 u2 )

Return the last portion of a path. Similar to the Unix basename command.

dirname-path ( c-addr1 u1 -- c-addr2 u2 )

Return the directory name of a path. Similar to the Unix dirname command.

extname-path ( c-addr1 u1 -- c-addr2 u2 )

Return the extension of the path, from the last '.' to end of string in the last portion of the path. If there is no '.' in the last portion of the path or the first character of it is '.', then it returns an empty string.

absolute-path? ( c-addr1 u1 -- f )

Determines whether path is an absolute path. An absolute path will always resolve to the same location, regardless of the working directory.
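To make the intended results concrete, here is a rough sketch of three of these words. It is an illustration under stated assumptions rather than the proposed definition: it treats '/' as the only separator, does not strip trailing slashes as the Node.js functions do, assumes one character per address unit, and returns substrings of the input instead of allocating. The helper last-slash is hypothetical:

```forth
: last-slash ( c-addr u -- u' )       \ u' = length up to and including the last '/', else 0
  begin dup while
    2dup + 1- c@ [char] / = if nip exit then
    1-
  repeat nip ;

: basename-path ( c-addr1 u1 -- c-addr2 u2 )   \ text after the last '/'
  2dup last-slash /string ;

: dirname-path ( c-addr1 u1 -- c-addr1 u2 )    \ text before the last '/'
  over swap last-slash                \ c-addr1 u'
  dup 1 > if 1- then ;                \ drop the slash itself, but keep a lone "/"

: absolute-path? ( c-addr u -- flag )          \ true if the path starts with '/'
  if c@ [char] / = else drop false then ;
```

For example, `s" src/lib/io.fs" basename-path type` displays io.fs, and dirname-path on the same string yields src/lib. A path without any slash gives an empty directory name here, where the Unix dirname command would print a dot.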

join-path ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 )

Join all arguments together and normalize the resulting path. Arguments must be strings. Use implicit allocation?

filename-match ( c-addr1 u1 c-addr2 u2 -- f )

check whether both paths match after expanding any '.' and '..'

parent-dir ( c-addr1 u1 -- c-addr1 u2 )

move up one directory

parent-dir? ( c-addr1 u1 -- f )

check if there is a parent directory

: parent-dir? ( c-addr1 u1 -- f ) dup >r parent-dir nip r> <> ;

ruv 2016-12-13 13:23:49

Also, it seems to me that developing such proposals on GitHub (or perhaps in GitHub/ForthHub) is far more convenient than here. Version control offers too much not to use it.

GeraldWodni 2016-12-13 17:43:06

@mtrute, @BerndPaysan: traverse-dir would certainly be convenient if you want to iterate over a single directory. But with open/read/close one can easily compare multiple directories, and implementing traverse-dir on top of them would be trivial.

@ruv: I agree that version control would add quite some convenience. However, proposals are normally developed in smaller groups and only presented to the public once they are fairly solid. At the standards meeting the committee decided to allow experimental proposals, and Ulrich Hoffmann and I wanted to take it one step further and publish a very rough draft to collect some early feedback. Additionally, let me point out that forth-standard.org tries to centralize the committee's activities. That being said, I'll look into implementing some versioning features.

Thanks for pointing out the problems with a failing mkdir for parents!

BerndPaysan 2016-12-14 01:19:16

@GeraldWodni To compare two directories, you either need to sort both (the answers of read-dir can be in any order the OS chooses; usually, the file names are returned in the order they are stored in the directory, which is very implementation specific), or take the filenames of one one-by-one and do a FILE-STATUS check in the other, and repeat the other way round, so no, read-dir doesn't help much here.

The TRAVERSE-DIR documentation needs some more words, e.g. about the lifetime of the provided buffer for xt: This is a one-shot buffer, and the data there lives only for one call of xt. On the other hand, it is always as large as needed, unlike the READ-DIR output, which can indicate that the result didn't fit.

That alone makes TRAVERSE-DIR much easier to use and harder to implement.

AntonErtl 2016-12-31 20:36:48

Yes, TRAVERSE-DIR and JOIN-PATH need to specify where the output string is and how long it lives. Also, the buffer handling of READ-DIR makes it hard to use properly. It would be better to make an interface like

read-dir ( dir-id -- c-addr u ior )

where the resulting string lives until the next invocation of read-dir (for the same dir-id?), and c-addr u is 0 0 if there is no entry left.
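Such an interface can be layered on the buffer-based read-dir of the proposal. A rough sketch under stated assumptions: read-dir has the proposal's stack effect ( c-addr u1 dirid -- u2 flag ior ), the name read-dir-next and the buffer name-buf are hypothetical, and a single shared buffer is used, so the string lives until the next call for any dirid:

```forth
256 constant /name-buf                \ assumed upper bound on a file name's length
create name-buf /name-buf chars allot

: read-dir-next ( dirid -- c-addr u ior )
  >r name-buf /name-buf r> read-dir   ( u2 flag ior )
  dup if >r 2drop 0 0 r> exit then    \ error: return 0 0 ior
  drop                                ( u2 flag )
  if name-buf swap 0                  \ entry read: name-buf u2 0
  else drop 0 0 0 then ;              \ no entry left: 0 0 0
```

Giving each dirid its own buffer would answer the "for the same dir-id?" question the other way, at the cost of allocating in open-dir.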
