The Standard ML Basis Library


The OS.Path structure

The OS.Path structure provides support for manipulating the syntax of file system paths independent of the underlying file system. It is purposely designed not to rely on any file system operations: none of the functions accesses the actual file system. There are two reasons for this: many systems support multiple file systems that may have different semantics, and applications may need to manipulate paths that do not exist in the underlying file system.

Before discussing the model and the semantics of the individual operations, we need to define some terms:

In addition to operations for canonicalizing paths and computing relative paths, the Path structure supports path manipulations relative to three different views of a path:

  1. A navigation-oriented view, where a path is broken down into its root and a non-empty list of arcs. A path is either absolute or relative. The root of a path specifies the volume to which the path is taken to be relative. For Unix, there is only the "" volume.
  2. A directory/file view, where a path is broken down into a directory specifier and a file name.
  3. A base/extension view, where a path is broken down into a base filename and an extension. We assume that the extension separator character is #".". This works for Windows, OS/2, and Unix; the Macintosh has no real notion of an extension.

Our main design principle is that the functions should behave in a natural fashion when applied to canonical paths. All functions, except concat, preserve canonical paths, i.e., if all arguments are canonical, then so is the result.

Note that although the model of path manipulation provided by the Path structure is operating system independent, the parsing of path strings is not. In particular, any given implementation of the Path structure has an implicit notion of what the arc separator character is. Thus, on a DOS system, Path will treat the string "\\d\\e" (the SML literal for \d\e) as an absolute path with two arcs, whereas on a Unix system it is a relative path with one arc.


Synopsis

signature OS_PATH
structure Path : OS_PATH

Interface

exception Path
exception InvalidArc
val parentArc : string
val currentArc : string
val validVolume : {isAbs : bool, vol : string} -> bool
val fromString : string -> {isAbs : bool, vol : string, arcs : string list}
val toString : {isAbs : bool, vol : string, arcs : string list} -> string
val getVolume : string -> string
val getParent : string -> string
val splitDirFile : string -> {dir : string, file : string}
val joinDirFile : {dir : string, file : string} -> string
val dir : string -> string
val file : string -> string
val splitBaseExt : string -> {base : string, ext : string option}
val joinBaseExt : {base : string, ext : string option} -> string
val base : string -> string
val ext : string -> string option
val mkCanonical : string -> string
val isCanonical : string -> bool
val mkAbsolute : (string * string) -> string
val mkRelative : (string * string) -> string
val isAbsolute : string -> bool
val isRelative : string -> bool
val isRoot : string -> bool
val concat : (string * string) -> string
val toUnixPath : string -> string
val fromUnixPath : string -> string

Description

exception Path
exception InvalidArc

parentArc
denotes the parent directory (e.g., ".." on DOS and UNIX).

currentArc
denotes the current directory (e.g., "." on DOS and UNIX).

validVolume {isAbs, vol}
returns true if vol is a valid volume name for an absolute path (when isAbs is true) or for a relative path (when isAbs is false). Under Unix, the only valid volume name is "". Under DOS, the valid volume names have the form "a:", "A:", "b:", "B:", etc. and, if isAbs = false, also "". Under MacOS, isAbs can be true if and only if vol is "".
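To make the DOS rule concrete, here is a minimal sketch of it as a predicate. The name validVolumeDos is hypothetical. It encodes only the rule stated above (a single letter followed by a colon, or "" for a relative path) and is not how any particular implementation defines validVolume.

```sml
(* Hypothetical illustration of the DOS volume rule described above;
   not the Basis implementation of validVolume. *)
fun validVolumeDos {isAbs : bool, vol : string} : bool =
  let
    (* A drive specifier is one letter followed by a colon, e.g. "a:" or "C:". *)
    val isDrive =
      size vol = 2
      andalso Char.isAlpha (String.sub (vol, 0))
      andalso String.sub (vol, 1) = #":"
  in
    (* The empty volume name is accepted only for relative paths. *)
    isDrive orelse (not isAbs andalso vol = "")
  end
```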

fromString s
returns the decomposition {isAbs, vol, arcs} of the path specified by s, where isAbs is true if the path is absolute, vol is the volume name, and arcs is the list of (possibly empty) arcs of the path. Under Unix, the volume name is always the empty string; under DOS, it can also have the form "A:", "C:", etc.

Here are some examples for UNIX paths:

          fromString ""           = {isAbs=false, vol="", arcs=[]}
          fromString "/"          = {isAbs=true,  vol="", arcs=[""]}
          fromString "//"         = {isAbs=true,  vol="", arcs=["", ""]}
          fromString "a"          = {isAbs=false, vol="", arcs=["a"]}
          fromString "/a"         = {isAbs=true,  vol="", arcs=["a"]}
          fromString "//a"        = {isAbs=true,  vol="", arcs=["","a"]}
          fromString "a/"         = {isAbs=false, vol="", arcs=["a", ""]}
          fromString "a//"        = {isAbs=false, vol="", arcs=["a", "", ""]}
          fromString "a/b"        = {isAbs=false, vol="", arcs=["a", "b"]}
          


toString {isAbs, vol, arcs}
makes a string out of a path represented as a list of arcs. isAbs specifies whether or not the path is absolute, and vol provides a corresponding volume. It returns "" when applied to {isAbs=false, vol="", arcs=[]}. The exception Path is raised if validVolume{isAbs, vol} is false, or if isAbs is false and arcs has an initial empty arc. The exception InvalidArc is raised if any component in arcs is not a valid representation of an arc.

toString o fromString is the identity. fromString o toString is also the identity, provided no exception is raised and none of the strings in arcs contains an embedded arc separator character. In addition, isRelative(toString {isAbs=false, vol, arcs}) evaluates to true when defined.
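The decomposition and its inverse can be sketched for Unix syntax alone, assuming "/" is the only arc separator and the volume is always "". The names fromStringUnix and toStringUnix are hypothetical. The sketch reuses the Basis exceptions OS.Path.Path and OS.Path.InvalidArc and is meant to reproduce the Unix examples and the round-trip property described above.

```sml
(* Hypothetical Unix-only sketch of fromString and toString: "/" is the
   only arc separator and the volume is always "". *)
fun fromStringUnix (s : string) : {isAbs : bool, vol : string, arcs : string list} =
  if s = "" then {isAbs = false, vol = "", arcs = []}
  else
    let
      (* String.fields keeps empty fields, so "a//" yields ["a", "", ""]. *)
      val fields = String.fields (fn c => c = #"/") s
    in
      if String.sub (s, 0) = #"/"
        then {isAbs = true, vol = "", arcs = tl fields}  (* drop the "" before the leading "/" *)
        else {isAbs = false, vol = "", arcs = fields}
    end

fun toStringUnix {isAbs : bool, vol : string, arcs : string list} : string =
  if vol <> "" then raise OS.Path.Path                 (* only "" is valid on Unix *)
  else if not isAbs andalso (case arcs of "" :: _ => true | _ => false)
    then raise OS.Path.Path                            (* relative path with an initial empty arc *)
  else if List.exists (fn a => Char.contains a #"/") arcs
    then raise OS.Path.InvalidArc                      (* an arc may not contain the separator *)
  else (if isAbs then "/" else "") ^ String.concatWith "/" arcs
```

String.fields is a convenient fit here because it keeps empty fields, which is exactly what produces the empty arcs in paths such as "a//" and "//a".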

getVolume s
returns the volume portion of the path.

getParent s
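returns a string denoting the parent directory of s. It holds that getParent s = s if and only if s is a root. If the last arc is empty or the parent arc, then getParent appends a parent arc to the path (a trailing empty arc is replaced, so getParent "a/" = "a/.."). If the last arc is the current arc, then it is replaced with the parent arc. Otherwise, the last arc is removed; if no arcs remain in a relative path, the result is the current arc (e.g., getParent "a" = "."). Note that if s is canonical, then the result of getParent will also be canonical.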

Here are some examples for UNIX paths:

            getParent "/"           = "/"
            getParent "a"           = "."
            getParent "a/"          = "a/.."
            getParent "a///"        = "a///.."
            getParent "a/b"         = "a"
            getParent "a/b/"        = "a/b/.."
            getParent ".."          = "../.."
            getParent "."           = ".."
            getParent ""            = ".."
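The getParent rules can be sketched as a case analysis on the arc list. This hypothetical Unix-only version assumes a Unix host and reuses OS.Path.fromString and OS.Path.toString for parsing and printing. The cases mirror the rules above and are intended to reproduce the examples, but this is an illustration, not the library's code.

```sml
(* Hypothetical sketch of getParent for Unix paths, working on arcs.
   Relies on OS.Path.fromString/toString behaving as described above. *)
fun getParent (s : string) : string =
  let
    val {isAbs, vol, arcs} = OS.Path.fromString s
    fun mk arcs' = OS.Path.toString {isAbs = isAbs, vol = vol, arcs = arcs'}
  in
    if isAbs andalso arcs = [""] then s          (* a root is its own parent *)
    else
      case rev arcs of
        [] => ".."                               (* the empty path *)
      | "" :: rest => mk (rev (".." :: rest))    (* trailing empty arc becomes ".." *)
      | ".." :: _ => mk (arcs @ [".."])          (* ".." cannot be cancelled: append another *)
      | "." :: rest => mk (rev (".." :: rest))   (* current arc becomes ".." *)
      | [_] => if isAbs then mk [""] else "."    (* single arc: root or current directory *)
      | _ :: rest => mk (rev rest)               (* otherwise drop the last arc *)
  end
```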
          


splitDirFile s
splits the string path s into its directory and file parts, where the file part is defined to be the last arc. The file part is "" if the last arc is "".

Here are some examples for UNIX paths:

          splitDirFile ""         = {dir = "", file = ""}
          splitDirFile "."        = {dir = "", file = "."}
          splitDirFile "b"        = {dir = "", file = "b"}
          splitDirFile "b/"       = {dir = "b", file = ""}
          splitDirFile "a/b"      = {dir = "a", file = "b"}
          splitDirFile "/a"       = {dir = "/", file = "a"}
          


joinDirFile {dir, file}
creates a whole path out of a directory and a file by extending the path dir with the arc file. If the string file does not correspond to an arc, the exception InvalidArc is raised.
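Here is a minimal Unix-only sketch of splitDirFile and joinDirFile. It assumes a Unix host and reuses OS.Path.fromString and OS.Path.toString. Both definitions are hypothetical illustrations. In particular, this joinDirFile drops a trailing empty arc of dir before appending file, which is one way to make joinDirFile o splitDirFile the identity on the examples above.

```sml
(* Hypothetical Unix-only sketches of splitDirFile and joinDirFile,
   reusing OS.Path.fromString and OS.Path.toString. *)
fun splitDirFile (s : string) : {dir : string, file : string} =
  let
    val {isAbs, vol, arcs} = OS.Path.fromString s
    (* The file part is the last arc; everything before it is the directory. *)
    val (dirArcs, file) =
      case rev arcs of
        [] => ([], "")
      | last :: rest => (rev rest, last)
    (* An absolute directory with no arcs left is the root, written as [""]. *)
    val dirArcs = if isAbs andalso null dirArcs then [""] else dirArcs
  in
    {dir = OS.Path.toString {isAbs = isAbs, vol = vol, arcs = dirArcs}, file = file}
  end

fun joinDirFile {dir : string, file : string} : string =
  if Char.contains file #"/" then raise OS.Path.InvalidArc  (* file must be a single arc *)
  else if dir = "" then file
  else
    let
      val {isAbs, vol, arcs} = OS.Path.fromString dir
      (* Drop a trailing empty arc, so "/" joined with "a" gives "/a", not "//a". *)
      val arcs = case rev arcs of "" :: rest => rev rest | _ => arcs
    in
      OS.Path.toString {isAbs = isAbs, vol = vol, arcs = arcs @ [file]}
    end
```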

dir s
file s
return the directory and file parts of a path, respectively. They are equivalent to #dir o splitDirFile and #file o splitDirFile, respectively, although they are probably more efficient.

splitBaseExt s
splits the path s into its base and extension parts. The extension is a non-empty sequence of characters following the right-most, non-initial, occurrence of "." in the last arc; NONE is returned if the extension is not defined. The base part is everything to the left of the extension except the final ".". Note that if there is no extension, a terminating "." is included with the base part.

Here are some examples for UNIX paths:

          splitBaseExt ""           = {base = "", ext = NONE}
          splitBaseExt ".login"     = {base = ".login", ext = NONE}
          splitBaseExt "/.login"    = {base = "/.login", ext = NONE}
          splitBaseExt "a"          = {base = "a", ext = NONE}
          splitBaseExt "a."         = {base = "a.", ext = NONE}
          splitBaseExt "a.b"        = {base = "a", ext = SOME "b"}
          splitBaseExt "a.b.c"      = {base = "a.b", ext = SOME "c"}
          splitBaseExt ".news/comp" = {base = ".news/comp", ext = NONE} 
          


joinBaseExt {base, ext}
returns an arc composed of the base name and the extension (if different from NONE). It is a left inverse of splitBaseExt, i.e., joinBaseExt o splitBaseExt is the identity. The opposite does not hold, since the extension may be empty, or may contain extension separators. Note that although splitBaseExt will never return the extension SOME "", joinBaseExt treats this as equivalent to NONE.
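The base/extension split needs no arc decomposition, because it only inspects the text after the last separator. The following hypothetical sketch assumes Unix syntax ("/" as the separator). It uses Substring.splitr first to isolate the last arc and then to find the right-most "." within it.

```sml
(* Hypothetical Unix-only sketches of splitBaseExt and joinBaseExt that
   work directly on the string; "/" is assumed to be the arc separator. *)
fun splitBaseExt (s : string) : {base : string, ext : string option} =
  let
    (* lastArc is everything after the final "/" (all of s if there is none). *)
    val (_, lastArc) = Substring.splitr (fn c => c <> #"/") (Substring.full s)
    (* ext is the text after the right-most "." of the last arc;
       upToDot ends with that "." or is empty if the arc has no ".". *)
    val (upToDot, ext) = Substring.splitr (fn c => c <> #".") lastArc
  in
    if Substring.isEmpty ext                (* the last arc is empty or ends with "." *)
       orelse Substring.size upToDot <= 1   (* no "." at all, or only an initial one *)
      then {base = s, ext = NONE}
      else {base = String.substring (s, 0, size s - Substring.size ext - 1),
            ext = SOME (Substring.string ext)}
  end

fun joinBaseExt {base : string, ext : string option} : string =
  case ext of
    NONE => base
  | SOME "" => base               (* SOME "" is treated like NONE *)
  | SOME e => base ^ "." ^ e
```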

base s
ext s
return the base and extension parts of a path, respectively. They are equivalent to #base o splitBaseExt and #ext o splitBaseExt, respectively, although they are probably more efficient.

mkCanonical s
returns the canonical path equivalent to s. Redundant occurrences of the parent arc, the current arc, and the empty arc are removed. The canonical path will never be the empty string; the empty path is converted to the current directory path ("." under Unix and DOS).

Note that the syntactic canonicalization provided by mkCanonical may not preserve file system meaning in the presence of symbolic links (cf. concat).

isCanonical s
returns true if s is a canonical path. It is equivalent to (s = mkCanonical s).
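Canonicalization can be sketched as a single left-to-right pass over the arcs. The pass drops empty and current arcs and lets each parent arc cancel the preceding ordinary arc. This is a hypothetical Unix-only sketch that assumes a Unix host and uses OS.Path.fromString and OS.Path.toString for parsing and printing. Treating the parent of the root as the root itself is an assumption, chosen to be consistent with the mkRelative example ("/", "/..") below.

```sml
(* Hypothetical Unix-only sketch of mkCanonical and isCanonical, reusing
   OS.Path.fromString and OS.Path.toString for parsing and printing. *)
fun mkCanonical (s : string) : string =
  let
    val {isAbs, vol, arcs} = OS.Path.fromString s
    (* Fold left to right, keeping the surviving arcs in reverse order. *)
    fun step (arc, acc) =
      if arc = "" orelse arc = "." then acc    (* empty and current arcs are redundant *)
      else if arc = ".." then
        (case acc of
           ".." :: _ => ".." :: acc            (* ".." cannot cancel another ".." *)
         | _ :: rest => rest                    (* "x/.." cancels out *)
         | [] => if isAbs then [] else [".."])  (* the parent of the root is the root *)
      else arc :: acc
    val arcs' = rev (List.foldl step [] arcs)
  in
    case (isAbs, arcs') of
      (false, []) => "."                                                      (* never the empty path *)
    | (true, []) => OS.Path.toString {isAbs = true, vol = vol, arcs = [""]}   (* the root *)
    | _ => OS.Path.toString {isAbs = isAbs, vol = vol, arcs = arcs'}
  end

fun isCanonical (s : string) : bool = (s = mkCanonical s)
```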

mkAbsolute (p, abs)
returns an absolute path that is equivalent to the path p relative to the absolute path abs. If p is already absolute, it is returned unchanged. Otherwise, the function returns the canonical concatenation of abs with p, i.e., mkCanonical (concat (abs, p)). Thus, if p and abs are canonical, the result will be canonical. If abs is not absolute, or if the two paths refer to different volumes, then the Path exception is raised.
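Because mkAbsolute is specified in terms of other operations, it can be sketched directly on top of the Basis functions OS.Path.isAbsolute, OS.Path.concat, and OS.Path.mkCanonical. The function below is a hypothetical local definition with the (p, abs) tuple signature documented here. The test paths assume a Unix host.

```sml
(* Hypothetical sketch of mkAbsolute with the (p, abs) signature documented
   here, composed from OS.Path.isAbsolute, OS.Path.concat and OS.Path.mkCanonical. *)
fun mkAbsolute (p : string, abs : string) : string =
  if not (OS.Path.isAbsolute abs) then raise OS.Path.Path   (* abs must be absolute *)
  else if OS.Path.isAbsolute p then p                       (* already absolute: unchanged *)
  else OS.Path.mkCanonical (OS.Path.concat (abs, p))        (* concat raises Path on a volume mismatch *)
```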

mkRelative (p, abs)
returns a relative path p' that, when taken relative to the canonical form of the absolute path abs, is equivalent to the path p. If p is relative, it is returned unchanged. If p is absolute, the procedure for computing the relative path is to first compute the canonical form abs' of abs. If p and abs' are equal, then the current arc is the result. Otherwise, the common prefix is stripped from p and abs' giving p'' and abs'', and p'' is appended to a path consisting of one parent arc for each arc in abs''. Note that if both paths are canonical, then the result will be canonical.

If abs is not absolute, or if p and abs are both absolute but have different roots, the Path exception is raised.

Here are some examples for UNIX paths:

          mkRelative ("a/b", "/c/d")              = "a/b"
          mkRelative ("/", "/a/b/c")              = "../../.."
          mkRelative ("/a/b/", "/a/c")            = "../b/"
          mkRelative ("/a/b",  "/a/c")            = "../b"
          mkRelative ("/a/b/", "/a/c/")           = "../b/"
          mkRelative ("/a/b",  "/a/c/")           = "../b"
          mkRelative ("/", "/")                   = "."
          mkRelative ("/", "/.")                  = "."
          mkRelative ("/", "/..")                 = "."
          mkRelative ("/a/b/../c", "/a/d")        = "../b/../c"
          mkRelative ("/a/b", "/c/d")             = "../../a/b"
          mkRelative ("/c/a/b", "/c/d")           = "../a/b"
          mkRelative ("/c/d/a/b", "/c/d")         = "a/b"
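The mkRelative procedure can be transcribed almost step by step. The sketch below is a hypothetical local definition with the (p, abs) signature documented here. It assumes a Unix host and builds on OS.Path.fromString, OS.Path.toString, OS.Path.mkCanonical, OS.Path.isAbsolute, and OS.Path.isRelative. The prose leaves one detail implicit: the root "/" decomposes to the single empty arc [""]. The sketch treats that as having no arcs, so that mkRelative ("/", "/a/b/c") yields "../../..". Where the result would begin with an empty arc, as for ("/a/b/", "/a/b"), the sketch simply lets OS.Path.toString raise Path, since the text above does not specify that case.

```sml
(* Hypothetical sketch of the mkRelative procedure described above, with the
   (p, abs) signature documented here. *)
fun mkRelative (p : string, abs : string) : string =
  if not (OS.Path.isAbsolute abs) then raise OS.Path.Path
  else if OS.Path.isRelative p then p                 (* relative paths are returned unchanged *)
  else
    let
      val abs' = OS.Path.mkCanonical abs
      val {vol = pVol, arcs = pArcs, ...} = OS.Path.fromString p
      val {vol = aVol, arcs = aArcs, ...} = OS.Path.fromString abs'
      (* A root decomposes to the single empty arc [""]; treat it as no arcs. *)
      fun dropRoot [""] = []
        | dropRoot arcs = arcs
      (* Strip the common prefix of the two arc lists. *)
      fun strip (x :: xs, y :: ys) = if x = y then strip (xs, ys) else (x :: xs, y :: ys)
        | strip pair = pair
    in
      if pVol <> aVol then raise OS.Path.Path          (* different roots *)
      else if p = abs' then OS.Path.currentArc
      else
        let
          val (p'', abs'') = strip (dropRoot pArcs, dropRoot aArcs)
        in
          (* One parent arc for each remaining arc of abs'', followed by p''. *)
          OS.Path.toString {isAbs = false, vol = "",
                            arcs = map (fn _ => OS.Path.parentArc) abs'' @ p''}
        end
    end
```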
          


isAbsolute s
isRelative s
return true if s is, respectively, absolute or relative.

isRoot s
returns true if s is a canonical specification of a root directory.

concat (s, t)
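returns the path consisting of s followed by t. It raises the exception Path if t is not a relative path or if s and t refer to different volumes. An implementation of concat, as it might appear inside the Path structure (so that fromString, toString, and Path refer to the structure's own definitions), would be:
fun concatArcs ([], al2) = al2
  | concatArcs ([""], al2) = al2
  | concatArcs (a :: al1, al2) = a :: concatArcs (al1, al2)
fun concat (p1, p2) = (case (fromString p1, fromString p2)
       of (_, {isAbs=true, ...}) => raise Path   (* the second path must be relative *)
        | ({isAbs, vol=v1, arcs=al1}, {vol=v2, arcs=al2, ...}) =>
            if ((v2 = "") orelse (v1 = v2))
              then toString{isAbs=isAbs, vol=v1, arcs=concatArcs(al1, al2)}
              else raise Path   (* the paths refer to different volumes *)
      (* end case *))
Here concatArcs is like List.@, except that a trailing empty arc in the first argument is dropped. Note that concat should not be confused with the concatenation of two strings.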

concat does not preserve canonical paths. For example, concat("a/b", "../c") returns "a/b/../c". The parent arc is not removed because "a/b/../c" and "a/c" may not be equivalent in the presence of symbolic links.

toUnixPath s
fromUnixPath s
convert between paths as represented in the underlying operating system and Unix-style paths.


Discussion

Syntactically, two paths can be checked for equality by applying string equality to canonical versions of the paths. Since volumes and individual arcs are just special classes of paths, an identical test for equality can be applied to these classes.
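The equality test described above amounts to a one-line helper. The name samePath is hypothetical, and it relies only on OS.Path.mkCanonical. Like mkCanonical, it is purely syntactic, so two paths it deems equal may still refer to different files in the presence of symbolic links.

```sml
(* Hypothetical helper: syntactic path equality via canonical forms.
   It never consults the file system. *)
fun samePath (a : string, b : string) : bool =
  OS.Path.mkCanonical a = OS.Path.mkCanonical b
```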

Question:

We need to add more information concerning MacOS, including adopting Apple's algorithm for determining relative/absolute paths on the Mac OS. In general, we should pull out and extend the examples as specific system notes concerning Unix, DOS and MacOS.

Rationale:

Because of some confusion arising between the abstract nature of arcs and their concrete representation as strings, we considered making arcs an abstract type. The advantages of this, however, seemed negligible, and the problems could be addressed almost as well by better documentation and by validating arcs in certain functions, e.g., toString and joinDirFile.

See Also

OS, OS.FileSys, OS.Process, OS.IO, Posix.FileSys


Last Modified October 6, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies