# Path library

This document explains why the `lib.path` library is designed the way it is.

The purpose of this library is to process [filesystem paths].
It does not read files from the filesystem.
It exists to support the native Nix [path value type] with extra functionality.

[filesystem paths]: https://en.m.wikipedia.org/wiki/Path_(computing)
[path value type]: https://nixos.org/manual/nix/stable/language/values.html#type-path

As an extension of the path value type, it inherits the same intended use cases and limitations:
- Only use paths to access files at evaluation time, such as the local project source.
- Paths cannot point to derivations, so they are unfit to represent dependencies.
- A path implicitly imports the referenced files into the Nix store when interpolated to a string.
  Therefore paths are not suitable to access files at build- or run-time, as you risk importing the path from the evaluation system instead.

Overall, this library works with two types of paths:
- Absolute paths are represented with the Nix [path value type].
  Nix automatically normalises these paths.
- Subpaths are represented with the [string value type] since path value types don't support relative paths.
  This library normalises these paths as safely as possible.
  Absolute paths in strings are not supported.

  A subpath refers to a specific file or directory within an absolute base directory.
  It is a stricter form of a relative path, notably [without support for `..` components][parents] since those could escape the base directory.

[string value type]: https://nixos.org/manual/nix/stable/language/values.html#type-string

This library is designed to be as safe and intuitive as possible, throwing errors when operations are attempted that would produce surprising results, and giving the expected result otherwise.
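
The subpath rules above, together with the design decisions below (a leading `./`, `./.` for the base directory, no trailing slashes, no `..`), can be modelled in a few lines. The following is a minimal Python sketch under those assumptions, not the Nix implementation of `lib.path`. The function name `normalise_subpath` is hypothetical, and treating the empty string as invalid is an assumption of this sketch.

```python
# A Python model of the subpath rules described in this document.
# This is NOT the Nix implementation of lib.path; `normalise_subpath`
# is a hypothetical name used only for illustration.


def normalise_subpath(subpath: str) -> str:
    """Normalise a subpath string, or raise ValueError if it is invalid."""
    if subpath == "":
        # Assumption for this sketch: the empty string is not a valid subpath.
        raise ValueError("subpath must not be empty")
    if subpath.startswith("/"):
        # Absolute paths in strings are not supported.
        raise ValueError(f"subpath must not be absolute: {subpath!r}")

    # Splitting on "/" and dropping empty and "." components removes
    # repeated slashes, "./" prefixes and trailing slashes.
    components = [c for c in subpath.split("/") if c not in ("", ".")]

    if ".." in components:
        # `..` could escape the base directory, so it is rejected outright.
        raise ValueError(f"subpath must not contain '..': {subpath!r}")

    if not components:
        # The base directory itself is always represented as "./.".
        return "./."
    # Results always carry a leading "./".
    return "./" + "/".join(components)


print(normalise_subpath("foo//bar/"))  # ./foo/bar
print(normalise_subpath("."))          # ./.
print(normalise_subpath("--version"))  # ./--version
```

Rejecting `..` and absolute strings with an error, rather than silently rewriting them, mirrors the goal of throwing errors instead of producing surprising results.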

This library is designed to work well as a dependency for the `lib.filesystem` and `lib.sources` library components.
Unlike these library components, `lib.path` does not read any paths from the filesystem.

This library makes only these assumptions about paths and no others:
- `dirOf path` returns the path to the parent directory of `path`, unless `path` is the filesystem root, in which case `path` is returned.
  - There can be multiple filesystem roots: `p == dirOf p` and `q == dirOf q` does not imply `p == q`.
  - While there's only a single filesystem root in stable Nix, the [lazy trees feature](https://github.com/NixOS/nix/pull/6530) introduces [additional filesystem roots](https://github.com/NixOS/nix/pull/6530#discussion_r1041442173).
- `path + ("/" + string)` returns the path to the `string` subdirectory in `path`.
  - If `string` contains no `/` characters, then `dirOf (path + ("/" + string)) == path`.
  - If `string` contains no `/` characters, then `baseNameOf (path + ("/" + string)) == string`.
- `path1 == path2` returns `true` only if `path1` points to the same filesystem path as `path2`.

Notably we do not make the assumption that we can turn paths into strings using `toString path`.

## Design decisions

Each subsection here contains a decision along with arguments and counter-arguments for (+) and against (-) that decision.

### Leading dots for relative paths
[leading-dots]: #leading-dots-for-relative-paths

Observing: Since subpaths are a form of relative paths, they can have a leading `./` to indicate that they are relative, though this is generally not necessary for tools.

Considering: Paths should be as explicit, consistent and unambiguous as possible.

Decision: Returned subpaths should always have a leading `./`.

<details>
<summary>Arguments</summary>

- (+) In shells, just running `foo` as a command wouldn't execute the file `foo`, whereas `./foo` would execute the file.
  In contrast, `foo/bar` does execute that file without the need for `./`.
  This can lead to confusion about when a `./` needs to be prefixed.
  If a `./` is always included, this becomes a non-issue.
  This effectively then means that paths don't overlap with command names.
- (+) Prepending with `./` makes the subpaths always valid as relative Nix path expressions.
- (+) Using paths in command line arguments could cause problems if not escaped properly, e.g. if a path was `--version`.
  This is not a problem with `./--version`.
  This effectively then means that paths don't overlap with GNU-style command line options.
- (-) `./` is not required to resolve relative paths; resolution always has an implicit `./` as prefix.
- (-) It's less noisy without the `./`, e.g. in error messages.
  - (+) But similarly, it could be confusing whether something was even a path.
    E.g. `foo` could be anything, but `./foo` is more clearly a path.
- (+) Makes it more uniform with absolute paths (those always start with `/`).
  - (-) That is not relevant for practical purposes.
- (+) `find` also outputs results with `./`.
  - (-) But only if you give it an argument of `.`.
    If you give it the argument `some-directory`, it won't prefix that.
- (-) `realpath --relative-to` doesn't prefix relative paths with `./`.
  - (+) There is no need to return the same result as `realpath`.

</details>

### Representation of the current directory
[curdir]: #representation-of-the-current-directory

Observing: The subpath that produces the base directory can be represented with `.` or `./` or `./.`.

Considering: Paths should be as consistent and unambiguous as possible.

Decision: It should be `./.`.

<details>
<summary>Arguments</summary>

- (+) `./` would be inconsistent with [the decision to not persist trailing slashes][trailing-slashes].
- (-) `.` is how `realpath` normalises paths.
- (+) `.` can be interpreted as a shell command (it's a builtin for sourcing files in `bash` and `zsh`).
- (+) `.` would be the only path without a `/`.
  It could not be used as a Nix path expression, since those require at least one `/` to be parsed as such.
- (-) `./.` is rather long.
  - (+) We don't require users to type this though, as it's only output by the library.
    As inputs all three variants are supported for subpaths (and we can't do anything about absolute paths).
- (-) `builtins.dirOf "foo" == "."`, so `.` would be consistent with that.
- (+) `./.` is consistent with the [decision to have leading `./`][leading-dots].
- (+) `./.` is a valid Nix path expression, although this property does not hold for every relative path or subpath.

</details>

### Subpath representation
[relrepr]: #subpath-representation

Observing: Subpaths such as `foo/bar` can be represented in various ways:
- string: `"foo/bar"`
- list with all the components: `[ "foo" "bar" ]`
- attribute set: `{ type = "relative-path"; components = [ "foo" "bar" ]; }`

Considering: Paths should be as safe to use as possible.
We should generate string outputs in the library and not encourage users to do that themselves.

Decision: Subpaths are represented as strings.

<details>
<summary>Arguments</summary>

- (+) It's simpler for the users of the library.
  One doesn't have to convert a path to a string before it can be used.
  - (+) Naively converting the list representation to a string with `concatStringsSep "/"` would break for `[]`, requiring library users to be more careful.
- (+) It discourages people from doing their own path processing and encourages them to use the library instead.
  With a list representation it would seem easy to just use `lib.lists.init` to get the parent directory, but then it breaks for `.`, which would be represented as `[ ]`.
- (+) `+` is convenient and doesn't work on lists and attribute sets.
  - (-) You shouldn't use `+` anyway; we export safer functions for path manipulation.

</details>

### Parent directory
[parents]: #parent-directory

Observing: Relative paths can have `..` components, which refer to the parent directory.

Considering: Paths should be as safe and unambiguous as possible.

Decision: `..` path components in string paths are not supported, neither as inputs nor as outputs.
Hence, string paths are called subpaths, rather than relative paths.

<details>
<summary>Arguments</summary>

- (+) If we wanted relative paths to behave according to the "physical" interpretation (as a directory tree with relations between nodes), it would require resolving symlinks, since e.g. `foo/..` would not be the same as `.` if `foo` is a symlink.
  - (-) The "logical" interpretation is also valid (treating paths as a sequence of names), and is used by some software.
    It is simpler, and not using symlinks at all is safer.
    - (+) Mixing both models can lead to surprises.
  - (+) We can't resolve symlinks without filesystem access.
  - (+) Nix also doesn't support reading symlinks at evaluation time.
  - (-) We could just not handle such cases, e.g. `equals "foo" "foo/bar/.." == false`.
    The paths are different; we don't need to check whether they point to the same thing.
    - (+) Assume we said `relativeTo /foo /bar == "../bar"`.
      If this is used like `/foo/../bar` in the end, and `foo` turns out to be a symlink to somewhere else, this won't be accurate.
      - (-) We could decide to not support such ambiguous operations, or mark them as such, e.g. the normal `relativeTo` will error on such a case, but there could be `extendedRelativeTo` supporting that.
- (-) `..` components are part of paths, so a path library should support them.
  - (+) If we can convincingly argue that all such use cases are better done e.g. with runtime tools, the library not supporting it can nudge people towards using those.
- (-) We could allow `..`, but only in the prefix.
  - (+) Then we'd have to throw an error for doing `append /some/path "../foo"`, making it non-composable.
  - (+) The same applies to returning paths with `..`: `relativeTo /foo /bar => "../bar"` would produce a non-composable path.
- (+) We argue that `..` is not needed at the Nix evaluation level, since we'd always start evaluation from the project root and don't go up from there.
  - (+) `..` is supported in Nix paths, turning them into absolute paths.
    - (-) This is ambiguous in the presence of symlinks.
- (+) If you need `..` for building or runtime, you can use build-/run-time tooling to create those (e.g. `realpath` with `--relative-to`), or use absolute paths instead.
  This also gives you the ability to correctly handle symlinks.

</details>

### Trailing slashes
[trailing-slashes]: #trailing-slashes

Observing: Subpaths can contain trailing slashes, like `foo/`, indicating that the path points to a directory and not a file.

Considering: Paths should be as consistent as possible: there should only be a single normalisation for the same path.

Decision: All functions remove trailing slashes in their results.

<details>
<summary>Arguments</summary>

- (+) It allows normalisations to be unique, in that there's only a single normalisation for the same path.
  If trailing slashes were preserved, both `foo/bar` and `foo/bar/` would be valid but different normalisations for the same path.
- Comparison to other frameworks to figure out the least surprising behavior:
  - (+) Nix itself doesn't support trailing slashes when parsing and doesn't preserve them when appending paths.
  - (-) [Rust's std::path](https://doc.rust-lang.org/std/path/index.html) does preserve them during [construction](https://doc.rust-lang.org/std/path/struct.Path.html#method.new).
    - (+) Doesn't preserve them when returning individual [components](https://doc.rust-lang.org/std/path/struct.Path.html#method.components).
    - (+) Doesn't preserve them when [canonicalizing](https://doc.rust-lang.org/std/path/struct.Path.html#method.canonicalize).
  - (+) [Python 3's pathlib](https://docs.python.org/3/library/pathlib.html#module-pathlib) doesn't preserve them during [construction](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath).
    - Notably it represents the individual components as a list internally.
  - (-) [Haskell's filepath](https://hackage.haskell.org/package/filepath-1.4.100.0) has [explicit support](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html#g:6) for handling trailing slashes.
    - (-) Does preserve them for [normalisation](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html#v:normalise).
  - (-) [NodeJS's Path library](https://nodejs.org/api/path.html) preserves trailing slashes for [normalisation](https://nodejs.org/api/path.html#pathnormalizepath).
    - (+) For [parsing a path](https://nodejs.org/api/path.html#pathparsepath) into its significant elements, trailing slashes are not preserved.
- (+) Nix's builtin function `dirOf` gives an unexpected result for paths with trailing slashes: `dirOf "foo/bar/" == "foo/bar"`.
  Inconsistently, `baseNameOf` works correctly though: `baseNameOf "foo/bar/" == "bar"`.
  - (-) We are writing a path library to improve handling of paths though, so we shouldn't use these functions and should discourage their use.
- (-) Unexpected result when normalising intermediate paths, like `subpath.normalise ("foo" + "/") + "bar" == "./foobar"`.
  - (+) This is not a practical use case though.
  - (+) Don't use `+` to append paths; this library has a `join` function for that.
    - (-) Users might use `+` out of habit though.
- (+) The `realpath` command also removes trailing slashes.
- (+) Even with a trailing slash, the path is the same; the slash only indicates that it's a directory.

</details>

### Prefer returning subpaths over components
[subpath-preference]: #prefer-returning-subpaths-over-components

Observing: Functions could return subpaths or lists of path component strings.

Considering: Subpaths are used as inputs for some functions.
Using them for outputs, too, makes the library more consistent and composable.

Decision: Subpaths should be preferred over lists of path component strings.

<details>
<summary>Arguments</summary>

- (+) It is consistent with functions accepting subpaths, making the library more composable.
- (-) It is less efficient when the components are needed, because after creating the normalised subpath string, it will have to be parsed into components again.
  - (+) If necessary, we can still make it faster by adding builtins to Nix.
  - (+) Alternatively if necessary, versions of these functions that return components could later still be introduced.
- (+) It makes the path library simpler because there are only two types (paths and subpaths).
  Only `lib.path.subpath.components` returns a list of components.
  And once we have a list of component strings, `lib.lists` and `lib.strings` can be used to operate on them.
  For completeness, `lib.path.subpath.join` allows converting the list of components back to a subpath.

</details>

## Other implementations and references

- [Rust](https://doc.rust-lang.org/std/path/struct.Path.html)
- [Python](https://docs.python.org/3/library/pathlib.html)
- [Haskell](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html)
- [Node.js](https://nodejs.org/api/path.html)
- [POSIX.1-2017](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html)
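
To make the comparison with the Python reference above concrete, this snippet shows how `pathlib.PurePosixPath` handles some of the cases discussed in the design decisions. It only demonstrates pathlib's own behaviour and makes no claim about how `lib.path` works. Per the pathlib documentation, pathlib collapses spurious slashes and single dots but keeps `..`, since collapsing it could change the meaning of a path in the presence of symlinks.

```python
# Compare Python's pathlib with the decisions in this document.
from pathlib import PurePosixPath

# Trailing slashes are dropped, matching the trailing-slash decision.
print(str(PurePosixPath("foo/bar/")))    # foo/bar
# A leading "./" is dropped, unlike lib.path, which always adds one.
print(str(PurePosixPath("./foo")))       # foo
# The current directory is ".", where lib.path uses "./.".
print(str(PurePosixPath("./")))          # .
# ".." is kept as-is: neither resolved nor rejected.
print(str(PurePosixPath("foo/../bar")))  # foo/../bar
# The individual components are exposed as a tuple.
print(PurePosixPath("foo/bar").parts)    # ('foo', 'bar')
```

pathlib agrees with this library on dropping trailing slashes and on not resolving `..` lexically, but it picks the opposite conventions for the leading `./` and the current directory, and it accepts `..` instead of rejecting it.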