lib.path: init README.md document

Adds initial work towards a `lib.path` library

Originally proposed in https://github.com/NixOS/nixpkgs/pull/200718, but has
since gone through some revisions

Co-Authored-By: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-Authored-By: Robert Hensing <robert@roberthensing.nl>

Changed files
+197
.github
lib
path
+1
.github/CODEOWNERS
···
/lib/cli.nix @edolstra @nbp @Profpatsch
/lib/debug.nix @edolstra @nbp @Profpatsch
/lib/asserts.nix @edolstra @nbp @Profpatsch
+
/lib/path.* @infinisil @fricklerhandwerk
# Nixpkgs Internals
/default.nix @nbp
+196
lib/path/README.md
···
+
# Path library
+
+
This document explains why the `lib.path` library is designed the way it is.
+
+
The purpose of this library is to process [filesystem paths]. It does not read files from the filesystem.
+
It exists to support the native Nix [path value type] with extra functionality.
+
+
[filesystem paths]: https://en.m.wikipedia.org/wiki/Path_(computing)
+
[path value type]: https://nixos.org/manual/nix/stable/language/values.html#type-path
+
+
As an extension of the path value type, it inherits the same intended use cases and limitations:
+
- Only use paths to access files at evaluation time, such as the local project source.
+
- Paths cannot point to derivations, so they are unfit to represent dependencies.
+
- A path implicitly imports the referenced files into the Nix store when interpolated to a string. Therefore paths are not suitable to access files at build- or run-time, as you risk importing the path from the evaluation system instead.
+
+
Overall, this library works with two types of paths:
+
- Absolute paths are represented with the Nix [path value type]. Nix automatically normalises these paths.
+
- Subpaths are represented with the [string value type] since path value types don't support relative paths. This library normalises these paths as safely as possible. Absolute paths in strings are not supported.
+
+
A subpath refers to a specific file or directory within an absolute base directory.
+
It is a stricter form of a relative path, notably [without support for `..` components][parents] since those could escape the base directory.
+
+
[string value type]: https://nixos.org/manual/nix/stable/language/values.html#type-string
+
+
This library is designed to be as safe and intuitive as possible, throwing errors when operations are attempted that would produce surprising results, and giving the expected result otherwise.
+
+
This library is designed to work well as a dependency for the `lib.filesystem` and `lib.sources` library components. Contrary to these library components, `lib.path` does not read any paths from the filesystem.
+
+
This library makes only these assumptions about paths and no others:
+
- `dirOf path` returns the path to the parent directory of `path`, unless `path` is the filesystem root, in which case `path` is returned.
+
- There can be multiple filesystem roots: `p == dirOf p` and `q == dirOf q` does not imply `p == q`.
+
- While there's only a single filesystem root in stable Nix, the [lazy trees feature](https://github.com/NixOS/nix/pull/6530) introduces [additional filesystem roots](https://github.com/NixOS/nix/pull/6530#discussion_r1041442173).
+
- `path + ("/" + string)` returns the path to the `string` subdirectory in `path`.
+
- If `string` contains no `/` characters, then `dirOf (path + ("/" + string)) == path`.
+
- If `string` contains no `/` characters, then `baseNameOf (path + ("/" + string)) == string`.
+
- `path1 == path2` returns `true` only if `path1` points to the same filesystem path as `path2`.
+
+
Notably we do not make the assumption that we can turn paths into strings using `toString path`.
+
+
## Design decisions
+
+
Each subsection here contains a decision along with arguments and counter-arguments for (+) and against (-) that decision.
+
+
### Leading dots for relative paths
+
[leading-dots]: #leading-dots-for-relative-paths
+
+
Observing: Since subpaths are a form of relative paths, they can have a leading `./` to indicate it being a relative path, this is generally not necessary for tools though.
+
+
Considering: Paths should be as explicit, consistent and unambiguous as possible.
+
+
Decision: Returned subpaths should always have a leading `./`.
+
+
<details>
+
<summary>Arguments</summary>
+
+
- (+) In shells, just running `foo` as a command wouldn't execute the file `foo`, whereas `./foo` would execute the file. In contrast, `foo/bar` does execute that file without the need for `./`. This can lead to confusion about when a `./` needs to be prefixed. If a `./` is always included, this becomes a non-issue. This effectively then means that paths don't overlap with command names.
+
- (+) Prepending with `./` makes the subpaths always valid as relative Nix path expressions.
+
- (+) Using paths in command line arguments could give problems if not escaped properly, e.g. if a path was `--version`. This is not a problem with `./--version`. This effectively then means that paths don't overlap with GNU-style command line options.
+
- (-) `./` is not required to resolve relative paths, resolution always has an implicit `./` as prefix.
+
- (-) It's less noisy without the `./`, e.g. in error messages.
+
- (+) But similarly, it could be confusing whether something was even a path.
+
e.g. `foo` could be anything, but `./foo` is more clearly a path.
+
- (+) Makes it more uniform with absolute paths (those always start with `/`).
+
- (-) That is not relevant for practical purposes.
+
- (+) `find` also outputs results with `./`.
+
- (-) But only if you give it an argument of `.`. If you give it the argument `some-directory`, it won't prefix that.
+
- (-) `realpath --relative-to` doesn't prefix relative paths with `./`.
+
- (+) There is no need to return the same result as `realpath`.
+
+
</details>
+
+
### Representation of the current directory
+
[curdir]: #representation-of-the-current-directory
+
+
Observing: The subpath that produces the base directory can be represented with `.` or `./` or `./.`.
+
+
Considering: Paths should be as consistent and unambiguous as possible.
+
+
Decision: It should be `./.`.
+
+
<details>
+
<summary>Arguments</summary>
+
+
- (+) `./` would be inconsistent with [the decision to not persist trailing slashes][trailing-slashes].
+
- (-) `.` is how `realpath` normalises paths.
+
- (+) `.` can be interpreted as a shell command (it's a builtin for sourcing files in `bash` and `zsh`).
+
- (+) `.` would be the only path without a `/`. It could not be used as a Nix path expression, since those require at least one `/` to be parsed as such.
+
- (-) `./.` is rather long.
+
- (-) We don't require users to type this though, as it's only output by the library.
+
As inputs all three variants are supported for subpaths (and we can't do anything about absolute paths)
+
- (-) `builtins.dirOf "foo" == "."`, so `.` would be consistent with that.
+
- (+) `./.` is consistent with the [decision to have leading `./`][leading-dots].
+
- (+) `./.` is a valid Nix path expression, although this property does not hold for every relative path or subpath.
+
+
</details>
+
+
### Subpath representation
+
[relrepr]: #subpath-representation
+
+
Observing: Subpaths such as `foo/bar` can be represented in various ways:
+
- string: `"foo/bar"`
+
- list with all the components: `[ "foo" "bar" ]`
+
- attribute set: `{ type = "relative-path"; components = [ "foo" "bar" ]; }`
+
+
Considering: Paths should be as safe to use as possible. We should generate string outputs in the library and not encourage users to do that themselves.
+
+
Decision: Paths are represented as strings.
+
+
<details>
+
<summary>Arguments</summary>
+
+
- (+) It's simpler for the users of the library. One doesn't have to convert a path a string before it can be used.
+
- (+) Naively converting the list representation to a string with `concatStringsSep "/"` would break for `[]`, requiring library users to be more careful.
+
- (+) It doesn't encourage people to do their own path processing and instead use the library.
+
With a list representation it would seem easy to just use `lib.lists.init` to get the parent directory, but then it breaks for `.`, which would be represented as `[ ]`.
+
- (+) `+` is convenient and doesn't work on lists and attribute sets.
+
- (-) Shouldn't use `+` anyways, we export safer functions for path manipulation.
+
+
</details>
+
+
### Parent directory
+
[parents]: #parent-directory
+
+
Observing: Relative paths can have `..` components, which refer to the parent directory.
+
+
Considering: Paths should be as safe and unambiguous as possible.
+
+
Decision: `..` path components in string paths are not supported, neither as inputs nor as outputs. Hence, string paths are called subpaths, rather than relative paths.
+
+
<details>
+
<summary>Arguments</summary>
+
+
- (+) If we wanted relative paths to behave according to the "physical" interpretation (as a directory tree with relations between nodes), it would require resolving symlinks, since e.g. `foo/..` would not be the same as `.` if `foo` is a symlink.
+
- (-) The "logical" interpretation is also valid (treating paths as a sequence of names), and is used by some software. It is simpler, and not using symlinks at all is safer.
+
- (+) Mixing both models can lead to surprises.
+
- (+) We can't resolve symlinks without filesystem access.
+
- (+) Nix also doesn't support reading symlinks at evaluation time.
+
- (-) We could just not handle such cases, e.g. `equals "foo" "foo/bar/.. == false`. The paths are different, we don't need to check whether the paths point to the same thing.
+
- (+) Assume we said `relativeTo /foo /bar == "../bar"`. If this is used like `/bar/../foo` in the end, and `bar` turns out to be a symlink to somewhere else, this won't be accurate.
+
- (-) We could decide to not support such ambiguous operations, or mark them as such, e.g. the normal `relativeTo` will error on such a case, but there could be `extendedRelativeTo` supporting that.
+
- (-) `..` are a part of paths, a path library should therefore support it.
+
- (+) If we can convincingly argue that all such use cases are better done e.g. with runtime tools, the library not supporting it can nudge people towards using those.
+
- (-) We could allow "..", but only in the prefix.
+
- (+) Then we'd have to throw an error for doing `append /some/path "../foo"`, making it non-composable.
+
- (+) The same is for returning paths with `..`: `relativeTo /foo /bar => "../bar"` would produce a non-composable path.
+
- (+) We argue that `..` is not needed at the Nix evaluation level, since we'd always start evaluation from the project root and don't go up from there.
+
- (+) `..` is supported in Nix paths, turning them into absolute paths.
+
- (-) This is ambiguous in the presence of symlinks.
+
- (+) If you need `..` for building or runtime, you can use build-/run-time tooling to create those (e.g. `realpath` with `--relative-to`), or use absolute paths instead.
+
This also gives you the ability to correctly handle symlinks.
+
+
</details>
+
+
### Trailing slashes
+
[trailing-slashes]: #trailing-slashes
+
+
Observing: Subpaths can contain trailing slashes, like `foo/`, indicating that the path points to a directory and not a file.
+
+
Considering: Paths should be as consistent as possible, there should only be a single normalisation for the same path.
+
+
Decision: All functions remove trailing slashes in their results.
+
+
<details>
+
<summary>Arguments</summary>
+
+
- (+) It allows normalisations to be unique, in that there's only a single normalisation for the same path. If trailing slashes were preserved, both `foo/bar` and `foo/bar/` would be valid but different normalisations for the same path.
+
- Comparison to other frameworks to figure out the least surprising behavior:
+
- (+) Nix itself doesn't support trailing slashes when parsing and doesn't preserve them when appending paths.
+
- (-) [Rust's std::path](https://doc.rust-lang.org/std/path/index.html) does preserve them during [construction](https://doc.rust-lang.org/std/path/struct.Path.html#method.new).
+
- (+) Doesn't preserve them when returning individual [components](https://doc.rust-lang.org/std/path/struct.Path.html#method.components).
+
- (+) Doesn't preserve them when [canonicalizing](https://doc.rust-lang.org/std/path/struct.Path.html#method.canonicalize).
+
- (+) [Python 3's pathlib](https://docs.python.org/3/library/pathlib.html#module-pathlib) doesn't preserve them during [construction](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath).
+
- Notably it represents the individual components as a list internally.
+
- (-) [Haskell's filepath](https://hackage.haskell.org/package/filepath-1.4.100.0) has [explicit support](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html#g:6) for handling trailing slashes.
+
- (-) Does preserve them for [normalisation](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html#v:normalise).
+
- (-) [NodeJS's Path library](https://nodejs.org/api/path.html) preserves trailing slashes for [normalisation](https://nodejs.org/api/path.html#pathnormalizepath).
+
- (+) For [parsing a path](https://nodejs.org/api/path.html#pathparsepath) into its significant elements, trailing slashes are not preserved.
+
- (+) Nix's builtin function `dirOf` gives an unexpected result for paths with trailing slashes: `dirOf "foo/bar/" == "foo/bar"`.
+
Inconsistently, `baseNameOf` works correctly though: `baseNameOf "foo/bar/" == "bar"`.
+
- (-) We are writing a path library to improve handling of paths though, so we shouldn't use these functions and discourage their use.
+
- (-) Unexpected result when normalising intermediate paths, like `relative.normalise ("foo" + "/") + "bar" == "foobar"`.
+
- (+) This is not a practical use case though.
+
- (+) Don't use `+` to append paths, this library has a `join` function for that.
+
- (-) Users might use `+` out of habit though.
+
- (+) The `realpath` command also removes trailing slashes.
+
- (+) Even with a trailing slash, the path is the same, it's only an indication that it's a directory.
+
+
</details>
+
+
## Other implementations and references
+
+
- [Rust](https://doc.rust-lang.org/std/path/struct.Path.html)
+
- [Python](https://docs.python.org/3/library/pathlib.html)
+
- [Haskell](https://hackage.haskell.org/package/filepath-1.4.100.0/docs/System-FilePath.html)
+
- [Nodejs](https://nodejs.org/api/path.html)
+
- [POSIX.1-2017](https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html)