Proposal: [25] Directory experimental proposal

Considered

This proposal has been moved into this section. Its former address was: /standard/file

This page is dedicated to discussing this specific proposal


GeraldWodni: [25] Directory experimental proposal (Proposal, 2016-12-12 15:42:57)

Directory proposal

In order to write cross-platform and cross-system libraries it is essential to have a means to traverse a system's file structure. This proposal is based upon the only widely adopted implementation known to the authors, the one in Gforth.

Authors: Ulrich Hoffmann & Gerald Wodni

Add new Type wdirid (or the like) to section 3.x

Words for traversal:

open-dir ( c-addr u -- wdirid wior )

Open the directory specified by c-addr, u and return dir-id for further access to it.

read-dir ( c-addr u1 wdirid -- u2 flag wior )

Attempt to read the next entry from the directory specified by dir-id to the buffer of length u1 at address c-addr. If the attempt fails because there are no more entries, ior=0, flag=0, u2=0, and the buffer is unmodified. If the attempt to read the next entry fails for any other reason, return ior<>0. If the attempt succeeds, store the file name in the buffer at c-addr and return ior=0, flag=true and u2 equal to the size of the file name. If the length of the file name is greater than u1, store the first u1 characters of the file name in the buffer and indicate "name too long" with ior, flag=true, and u2=u1.

close-dir ( wdirid -- wior )

Close the directory specified by dir-id.
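A minimal usage sketch of the three words above, assuming they behave as described here (Gforth, on which this proposal is based, provides words with these names). The helper name list-dir, the buffer entry-buf and its size of 256 characters are assumptions made for illustration only and are not part of the proposal.

```forth
256 constant /entry               \ assumed buffer size, not part of the proposal
create entry-buf /entry allot     \ hypothetical buffer for one entry name

: list-dir ( c-addr u -- )        \ hypothetical helper: print every entry
  open-dir throw >r               \ keep the directory id on the return stack
  begin
    entry-buf /entry r@ read-dir throw   ( u2 flag )
  while                           \ flag is true while entries remain
    entry-buf swap type cr
  repeat
  drop                            \ discard the final u2 = 0
  r> close-dir throw ;

s" ." list-dir
```

Because every ior is passed to throw, a name longer than the buffer aborts the listing in this sketch; a real caller would probably want to inspect that ior instead.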

mkdir ( c-addr u – wior )

Create the directory c-addr u and all its parents. Remark: renamed mkdir-parents to mkdir, removed unix-specific umask.

Words for paths:

Descriptions taken from the Node.js manual:

path-normalize ( c-addr-1 u1 -- c-addr-1 u-2 )

Normalize a string path, taking care of '..' and '.' parts. When multiple slashes are found, they're replaced by a single one; when the path contains a trailing slash, it is preserved.

path-basename ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the last portion of a path. Similar to the Unix basename command.

path-dirname ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the directory name of a path. Similar to the Unix dirname command.

path-extname ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the extension of the path, from the last '.' to end of string in the last portion of the path. If there is no '.' in the last portion of the path or the first character of it is '.', then it returns an empty string.

path-absolute? ( c-addr-1 u1 -- f )

Determines whether path is an absolute path. An absolute path will always resolve to the same location, regardless of the working directory.

path.join ( c-addr-1 u1 c-addr2 u2 -- c-addr-3 u3 )

Join all arguments together and normalize the resulting path. Arguments must be strings. Use implicit allocation?

mtrute

opendir / readdir / closedir reminds me of the old DOS days. What exactly is the benefit of that transactional (?) model? Why not simply use traverse-dir? It follows the traverse-wordlist pattern, already known in the standard.

BerndPaysan

Yes, the traverse-dir approach also has the advantage that you could pass the matching information, and either implement it on POSIX with fnmatch, or on DOS/Windows with the mask passed to FindFirstFile.

traverse-dir ( ix diraddr u1 patternaddr u2 xt -- kx ) with xt taking ( ix fileaddr u -- jx )

ruv

One question for mkdir that automatically creates ancestors — should it remove the directories created on the previous steps if it fails on some step?

My old LAY-PATH implementation does not remove intermediate steps on fail.

In any case, the operation is not atomic. But this should be mentioned.

GeraldWodni: New Version: [25] Directory experimental proposal


Directory proposal

In order to write cross-platform and cross-system libraries it is essential to have a means to traverse a system's file structure. This proposal is based upon the only widely adopted implementation known to the authors, the one in Gforth.

Authors: Ulrich Hoffmann & Gerald Wodni

Add new Type wdirid (or the like) to section 3.x

Words for traversal:

open-dir ( c-addr u -- dirid ior )

Open the directory specified by c-addr, u and return dir-id for further access to it.

read-dir ( c-addr u1 dirid -- u2 flag ior )

Attempt to read the next entry from the directory specified by dir-id to the buffer of length u1 at address c-addr. If the attempt fails because there are no more entries, ior=0, flag=0, u2=0, and the buffer is unmodified. If the attempt to read the next entry fails for any other reason, return ior<>0. If the attempt succeeds, store the file name in the buffer at c-addr and return ior=0, flag=true and u2 equal to the size of the file name. If the length of the file name is greater than u1, store the first u1 characters of the file name in the buffer and indicate "name too long" with ior, flag=true, and u2=u1.

close-dir ( dirid -- ior )

Close the directory specified by dir-id.

traverse-dir ( ix c-addr u xt -- kx ) with xt taking ( ix c-addr-filename u-filename -- jx )

Possible alternative or addition to the three words above. Suggested by Bernd Paysan and Matthias Trute.
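One way traverse-dir could be layered on open-dir, read-dir and close-dir is sketched below. It assumes those three words behave as specified above; the buffer name-buf and its size are hypothetical, and the name passed to xt is only valid for the duration of that one call.

```forth
256 constant /name                \ assumed maximum name length
create name-buf /name allot       \ hypothetical one-shot buffer handed to xt

: traverse-dir ( ix c-addr u xt -- kx )
  >r open-dir throw r>            ( ix dirid xt )
  swap >r >r                      ( ix ) ( R: dirid xt )
  begin
    name-buf /name 2r@ drop read-dir throw   ( ix u2 flag )
  while
    name-buf swap r@ execute      \ xt: ( ix c-addr u -- jx )
  repeat
  drop                            \ discard the final u2 = 0
  r> drop r> close-dir throw ;

\ Usage: count the entries of the current directory
: count-entry ( n c-addr u -- n+1 ) 2drop 1+ ;
0 s" ." ' count-entry traverse-dir .
```

The sketch keeps dirid and xt on the return stack so that xt sees only ix and the name. It does not close the directory if xt throws, and it aborts on names longer than the buffer; both would need attention in a reference implementation.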

dir? ( c-addr u -- flag ior )

check if path is a directory

make-dir ( c-addr u -- ior )

Create the directory c-addr u and all its parents. Remark: renamed mkdir-parents to mkdir, removed unix-specific umask.

Words for paths:

Descriptions taken from the Node.js manual:

normalize-path ( c-addr-1 u1 -- c-addr-1 u-2 )

Normalize a string path, taking care of '..' and '.' parts. When multiple slashes are found, they're replaced by a single one; when the path contains a trailing slash, it is preserved.

basename-path ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the last portion of a path. Similar to the Unix basename command.

dirname-path ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the directory name of a path. Similar to the Unix dirname command.

extname-path ( c-addr-1 u1 -- c-addr-2 u-2 )

Return the extension of the path, from the last '.' to end of string in the last portion of the path. If there is no '.' in the last portion of the path or the first character of it is '.', then it returns an empty string.
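The following is a rough sketch of how basename-path and extname-path might be written in standard Forth, assuming '/' is the only separator and that one character occupies one address unit. It is an illustration of the descriptions above, not the reference implementation; in particular, a path with a trailing slash yields an empty basename here, unlike the Unix basename command.

```forth
: basename-path ( c-addr1 u1 -- c-addr2 u2 )
  2dup + >r                       \ R: address just past the end
  begin dup while                 \ scan backwards for a '/'
    2dup + 1- c@ [char] / = if
      + r> over - exit            \ the name starts right after the slash
    then
    1-
  repeat
  drop r> over - ;                \ no slash: the whole string is the name

: extname-path ( c-addr1 u1 -- c-addr2 u2 )
  basename-path
  2dup + >r                       \ R: address just past the end
  begin dup 1 > while             \ never test the first character
    2dup + 1- c@ [char] . = if
      + 1- r> over - exit         \ from the last '.' to the end
    then
    1-
  repeat
  2drop r> 0 ;                    \ no extension: empty string

s" /home/user/index.html" basename-path type cr
s" /home/user/index.html" extname-path type cr
```

Both words return a substring of the input, so no buffer has to be allocated and the result lives exactly as long as the input string.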

absolute-path? ( c-addr-1 u1 -- f )

Determines whether path is an absolute path. An absolute path will always resolve to the same location, regardless of the working directory.

join-path ( c-addr-1 u1 c-addr2 u2 -- c-addr-3 u3 )

Join all arguments together and normalize the resulting path. Arguments must be strings. Use implicit allocation?
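To make the allocation question concrete, here is a sketch of join-path that uses ALLOCATE for the result. It assumes one character per address unit, always inserts a single '/', and does not normalize the result; the helper append is hypothetical. With this design the caller owns the result and has to FREE it.

```forth
: append ( c-addr u dest -- dest' )   \ hypothetical helper: copy, return next free address
  2dup + >r swap move r> ;

: join-path ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 )
  2 pick over + 1+                ( a1 u1 a2 u2 u3 )
  dup allocate throw swap 2>r     ( a1 u1 a2 u2 ) ( R: a3 u3 )
  2swap 2r@ drop append           ( a2 u2 dest )
  [char] / over c! char+          \ separator between the two parts
  append drop
  2r> ;

\ Usage: the caller is responsible for freeing the result
s" /usr" s" lib" join-path 2dup type cr drop free throw
```

An alternative would be a transient buffer owned by the system, which avoids the FREE but requires the specification to state how long the result stays valid.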

filename-match ( c-addr1 u1 c-addr2 u2 -- f )

Check if both paths match after expanding any '.' and '..'.

parent-dir ( c-addr1 u1 -- c-addr1 u2 )

move up one directory

parent-dir? ( c-addr1 u1 -- f )

check if there is a parent directory

: parent-dir? ( c-addr1 u1 -- f ) >r r@ parent-dir nip r> <> ;

ruv

Also, it seems to me that developing such proposals on GitHub (or perhaps in GitHub/ForthHub) is far more convenient than here. Version control offers too much not to use it.

GeraldWodni

@mtrute, @BerndPaysan: traverse-dir would certainly be convenient if you want to iterate over a single directory. But with open/read/close one can easily compare multiple directories, and it would be trivial to implement traverse-dir on top of them.

@ruv: I agree with you that version control would add quite some convenience. However, proposals are normally developed in smaller groups and only presented to the public once they are pretty solid. At the standards meeting the committee decided to allow for experimental proposals, and Ulrich Hoffmann and I wanted to take it one step further and publish a very rough draft to collect some early feedback. Additionally, let me point out that forth-standard.org tries to centralize the committee's actions. That being said, I'll look into implementing some versioning features.

Thanks for pointing out the problems with a failing mkdir for parents!

BerndPaysan

@GeraldWodni To compare two directories, you either need to sort both (the answers of read-dir can be in any order the OS chooses; usually, the file names are returned in the order they are stored in the directory, which is very implementation specific), or take the filenames of one one-by-one and do a FILE-STATUS check in the other, and repeat the other way round, so no, read-dir doesn't help much here.

The TRAVERSE-DIR documentation needs some more words, e.g. about the lifetime of the provided buffer for xt: This is a one-shot buffer, and the data there lives only for one call of xt. On the other hand, it is always as large as needed, unlike the READ-DIR output, which can indicate that the result didn't fit.

That alone makes TRAVERSE-DIR much easier to use and harder to implement.

AntonErtl

Yes, TRAVERSE-DIR, and JOIN-PATH need to specify where the output string is and how long it lives. Also, the buffer handling of READ-DIR makes it hard to use properly. Better make an interface like

read-dir ( dir-id -- c-addr u ior )

where the resulting string lives until the next invocation of read-dir (for the same dir-id?), and c-addr u is 0 0 if there is no entry left.
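A sketch of how such an interface could be layered on the buffer-based read-dir of the proposal. The name next-entry is used instead of read-dir to avoid a clash, and the buffer last-entry with its size of 256 characters is an assumption. Because the sketch uses a single static buffer, the string lives until the next call for any dir-id, and longer names are not handled.

```forth
256 constant /entry               \ assumed buffer size
create last-entry /entry allot    \ hypothetical buffer shared by all dir-ids

: next-entry ( dirid -- c-addr u ior )
  >r last-entry /entry r> read-dir     ( u2 flag ior )
  >r if last-entry swap else drop 0 0 then r> ;

\ Usage: print all entries of a directory
: show-dir ( c-addr u -- )
  open-dir throw >r
  begin r@ next-entry throw dup while type cr repeat
  2drop r> close-dir throw ;

s" ." show-dir
```

The loop body shrinks to a single line because the caller no longer manages a buffer or a separate flag.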

StephenPelc

Should we not use the name CREATE-DIR rather than MAKE-DIR in sympathy with the file words?

GeraldWodni

As I have been asked where the current (unfinished) implementation lives: it is on GitHub at GeraldWodni/directories

ruv

What was the rationale for changing the names from

path-basename, path-dirname, etc

to

basename-path, dirname-path, etc

?

It seems to me, these words are similar to file-status, file-position, etc., so the former names are better.

normalize-path is OK since it's in the form "{verb}-{noun}" and it modifies the input string.

GeraldWodni

The committee asks the author to please work the comments into the proposal and update it. Also please provide a full reference implementation.
