{
  "id": "https://www.tunbury.org/2025/05/15/zfs-system-concept",
  "title": "ZFS System Concept",
  "link": "https://www.tunbury.org/2025/05/15/zfs-system-concept/",
  "updated": "2025-05-15T00:00:00",
  "published": "2025-05-15T00:00:00",
  "summary": "How would the distributed ZFS storage system look in practical terms? Each machine with a ZFS store would have an agent application installed. Centrally, there would be a tracker server, and users would interact with the system using a CLI tool. The elements would interact with each other using Cap’n Proto capability files.",
  "content": "<p>How would the distributed ZFS storage system look in practical terms? Each machine with a ZFS store would have an agent application installed. Centrally, there would be a tracker server, and users would interact with the system using a CLI tool. The elements would interact with each other using Cap’n Proto capability files.</p>\n\n<h1>Tracker</h1>\n\n<p>The tracker would generate capability files on first invocation, one per <em>location</em>, where the location could be as granular as a specific rack in a datacenter or a larger grouping, such as at the institution level. The purpose of the location grouping is to allow users to see where the data is held. As a prototype, the command could be something like:</p>\n\n<div><div><pre><code>tracker --capnp-listen-address tcp:1.2.3.4:1234 --locations datacenter-01,datacenter-02,datacenter-03\n</code></pre></div></div>\n\n<h1>Agent</h1>\n\n<p>Each machine would have the agent application. The agent would register with the tracker using the capability file generated by the tracker. The agent command line would be used to provide a list of zpools that are in scope for management. The zpools would be scanned to compile a list of available datasets, which would be passed to the tracker. Perhaps an invocation like this:</p>\n\n<div><div><pre><code>agent --connect datacenter-01.cap --name machine-01 --zpools tank-01,tank-02\n</code></pre></div></div>\n\n<h1>CLI</h1>\n\n<p>The CLI tool would display the system state by connecting to the tracker. Perhaps a command like <code>cli --connect user.cap show</code>, which would output a list of datasets and where they are:</p>\n\n<div><div><pre><code>dataset-01: datacenter-01\\machine-01\\tank-01 (online), datacenter-02\\machine-03\\tank-06 (online)\ndataset-02: datacenter-01\\machine-01\\tank-02 (online), datacenter-02\\machine-04\\tank-07 (offline)\n</code></pre></div></div>\n\n<p>Another common use case would be to fetch a dataset: <code>cli --connect user.cap download dataset-02</code>. This would set up a <code>zfs send | zfs receive</code> between the agent and the current machine.</p>\n\n<p>Potentially, all machines would run the agent, and rather than <code>download</code>, we would initiate a <code>copy</code> of a dataset to another location in the form <code>datacenter\\machine\\tank</code>.</p>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "openzfs",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}