mte/2025_03_30_box-diff.json at main · anil.recoil.org/thicket-eeg

Thicket data repository for the EEG
{
  "id": "https://www.tunbury.org/2025/03/30/box-diff",
  "title": "Box Diff Tool",
  "link": "https://www.tunbury.org/2025/03/30/box-diff/",
  "updated": "2025-03-30T00:00:00",
  "published": "2025-03-30T00:00:00",
  "summary": "Box has an unlimited storage model but has an upload limit of 1 TB per month. I have been uploading various data silos but would now like to verify that the data is all present. Box has an extensive API, but I only need the list items in folder call.",
  "content": "<p>Box has an unlimited storage model but has an upload limit of 1 TB per month. I have been uploading various data silos but would now like to verify that the data is all present. Box has an extensive <a href=\"https://developer.box.com/reference/\">API</a>, but I only need the <a href=\"https://developer.box.com/reference/get-folders-id-items/\">list items in folder</a> call.</p>\n\n<p>The list-items call assumes that you have a folder ID which you would like to query. The root of the tree is always ID 0. To check for the presence of file <code>foo</code> in a folder tree <code>a/b/c/foo</code>, we first call the API with folder ID 0. This returns a list of entries in that folder, for example:</p>\n\n<div><div><pre><code><span>{</span><span>\n  </span><span>\"entries\"</span><span>:</span><span> </span><span>[</span><span>\n    </span><span>{</span><span>\n      </span><span>\"id\"</span><span>:</span><span> </span><span>\"12345\"</span><span>,</span><span>\n      </span><span>\"type\"</span><span>:</span><span> </span><span>\"folder\"</span><span>,</span><span>\n      </span><span>\"name\"</span><span>:</span><span> </span><span>\"a\"</span><span>\n    </span><span>}</span><span>\n  </span><span>]</span><span>\n</span><span>}</span><span>\n</span></code></pre></div></div>\n\n<p>The API must now be called again with the new ID number to get the contents of folder <code>a</code>. This is repeated until we finally have the entries for folder <code>c</code>, which would contain the file itself. I have used a <code>Hashtbl</code> to cache the results of each call.</p>\n\n<div><div><pre><code><span>{</span><span>\n  </span><span>\"entries\"</span><span>:</span><span> </span><span>[</span><span>\n    </span><span>{</span><span>\n      </span><span>\"id\"</span><span>:</span><span> </span><span>\"78923434\"</span><span>,</span><span>\n      </span><span>\"type\"</span><span>:</span><span> </span><span>\"file\"</span><span>,</span><span>\n      </span><span>\"name\"</span><span>:</span><span> </span><span>\"foo\"</span><span>\n    </span><span>}</span><span>\n  </span><span>]</span><span>\n</span><span>}</span><span>\n</span></code></pre></div></div>\n\n<p>Each call defaults to returning at most 100 entries. This can be increased to a maximum of 1000 by passing <code>?limit=1000</code> to the GET request. For more results, Box offers two pagination systems: <code>offset</code> and <code>marker</code>. Offset allows you to pass a starting item number along with the call, but this is limited to 10,000 entries.</p>\n\n<blockquote>\n  <p>Queries with offset parameter value exceeding 10000 will be rejected with a 400 response.</p>\n</blockquote>\n\n<p>To deal with folders of any size, we should use the marker system. For this, we pass <code>?usemarker=true</code> to the first GET request, which causes the API to return <code>next_marker</code> and <code>prev_marker</code> as additional JSON properties where applicable. Subsequent calls would use <code>?usemarker=true&amp;marker=XXX</code>. The end is reached when the response contains no <code>next_marker</code>, meaning no more entries are available.</p>\n\n<p>The project can be found on GitHub in <a href=\"https://github.com/mtelvers/ocaml-box-diff\">mtelvers/ocaml-box-diff</a>.</p>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "OCaml,Box",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}
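
The lookup described in the post (walk the path one folder ID at a time, cache each listing, and follow `next_marker` until it disappears) can be sketched as below. This is an illustrative sketch in Python, not the OCaml code from mtelvers/ocaml-box-diff: `FAKE_BOX`, `fake_get`, `list_items`, `cached_items` and `exists` are hypothetical names, and the HTTP request is replaced by an in-memory stand-in so the example runs offline. A plain dict plays the role of the `Hashtbl` cache.

```python
# Hypothetical in-memory stand-in for Box: folder ID -> entries in that folder.
FAKE_BOX = {
    "0": [{"id": "12345", "type": "folder", "name": "a"}],
    "12345": [{"id": "222", "type": "folder", "name": "b"}],
    "222": [{"id": "333", "type": "folder", "name": "c"}],
    "333": [{"id": "78923434", "type": "file", "name": "foo"}],
}
CALLS = []  # one record per simulated request, to show the effect of caching


def fake_get(folder_id, params):
    """Simulate one marker-paginated list-items request (assumption: no network)."""
    CALLS.append((folder_id, dict(params)))
    entries = FAKE_BOX.get(folder_id, [])
    # In the real API the marker is an opaque token; here it is just an index.
    start = int(params.get("marker", 0))
    limit = int(params["limit"])
    page = {"entries": entries[start:start + limit]}
    if start + limit < len(entries):
        page["next_marker"] = str(start + limit)
    return page


def list_items(folder_id, get=fake_get, limit=1000):
    """Collect every entry in a folder by following next_marker."""
    entries = []
    params = {"usemarker": "true", "limit": limit}
    while True:
        page = get(folder_id, params)
        entries.extend(page["entries"])
        marker = page.get("next_marker")
        if not marker:  # missing (or empty) marker: treated as the last page
            return entries
        params = {"usemarker": "true", "limit": limit, "marker": marker}


_cache = {}  # folder ID -> full listing, analogous to the post's Hashtbl


def cached_items(folder_id, get=fake_get):
    """Return the listing for a folder, fetching it at most once."""
    if folder_id not in _cache:
        _cache[folder_id] = list_items(folder_id, get)
    return _cache[folder_id]


def exists(path, get=fake_get):
    """Check whether a path such as 'a/b/c/foo' exists, starting at root ID 0."""
    folder_id = "0"
    parts = [p for p in path.split("/") if p]
    for i, name in enumerate(parts):
        listing = cached_items(folder_id, get)
        match = next((e for e in listing if e["name"] == name), None)
        if match is None:
            return False
        if i == len(parts) - 1:
            return True
        if match["type"] != "folder":
            return False  # cannot descend into a file
        folder_id = match["id"]
    return False  # an empty path names nothing
```

Calling `exists("a/b/c/foo")` issues one simulated request per folder on the path; a second lookup under the same folders is served entirely from `_cache`. Against the real API, `fake_get` would be swapped for an authenticated GET on the list-items endpoint, which is left out here.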