Thicket data repository for the EEG
{
  "id": "https://www.tunbury.org/2025/06/24/opam2web",
  "title": "Improve the deployment time for opam2web",
  "link": "https://www.tunbury.org/2025/06/24/opam2web/",
  "updated": "2025-06-24T00:00:00",
  "published": "2025-06-24T00:00:00",
  "summary": "The opam2web image for opam.ocaml.org is huge, weighing in at more than 25 GB. The bulk of this data is opam archives, which are updated and copied into a stock caddy image.",
  "content": "<p>The opam2web image for <a href=\"https://opam.ocaml.org\">opam.ocaml.org</a> is huge, weighing in at more than 25 GB. The bulk of this data is opam archives, which are updated and copied into a stock caddy image.</p>\n\n<p>There are two archives: <code>ocaml/opam.ocaml.org-legacy</code>, which hasn’t changed for 5 years and holds the cache for opam 1.x, and <code>ocaml/opam:archive</code>, which is updated weekly.</p>\n\n<p>The current <code>Dockerfile</code> copies these files into a new layer each time opam2web builds.</p>\n\n<div><div><pre><code><span>FROM</span><span> </span><span>--platform=linux/amd64 ocaml/opam:archive</span><span> </span><span>as</span><span> </span><span>opam-archive</span>\n<span>FROM</span><span> </span><span>ocaml/opam.ocaml.org-legacy</span><span> </span><span>as</span><span> </span><span>opam-legacy</span>\n<span>FROM</span><span> </span><span>alpine:3.20</span><span> </span><span>as</span><span> </span><span>opam2web</span>\n...\n<span>COPY</span><span> --from=opam-legacy . 
/www</span>\n...\n<span>RUN </span><span>--mount</span><span>=</span><span>type</span><span>=</span><span>bind</span>,target<span>=</span>/cache,from<span>=</span>opam-archive rsync <span>-aH</span> /cache/cache/ /www/cache/\n...\n</code></pre></div></div>\n\n<p>And later, the entire <code>/www</code> structure is copied into a <code>caddy:2.8.4</code> image.</p>\n\n<div><div><pre><code><span>FROM</span><span> caddy:2.8.4</span>\n<span>WORKDIR</span><span> /srv</span>\n<span>COPY</span><span> --from=opam2web /www /usr/share/caddy</span>\n<span>COPY</span><span> Caddyfile /etc/caddy/Caddyfile</span>\n<span>ENTRYPOINT</span><span> [\"caddy\", \"run\", \"--config\", \"/etc/caddy/Caddyfile\", \"--adapter\", \"caddyfile\"]</span>\n</code></pre></div></div>\n\n<p>This method is considered “best practice” when creating Docker images, but in this case, it produces a very large image, which takes a long time to deploy.</p>\n\n<p>For Docker to use an existing layer, we need the final <code>FROM ...</code> to be the layer we want to use as the base. 
In the above snippet, the <code>caddy:2.8.4</code> layer will be the base layer and will be reused.</p>\n\n<p>The archive, <code>ocaml/opam:archive</code>, is created by this Dockerfile, which ultimately uses <code>alpine:latest</code>.</p>\n\n<div><div><pre><code><span>FROM</span><span> </span><span>ocaml/opam:archive</span><span> </span><span>AS</span><span> </span><span>opam-archive</span>\n<span>FROM</span><span> </span><span>ocurrent/opam-staging@sha256:f921cd51dda91f61a52a2c26a8a188f8618a2838e521d3e4afa3ca1da637903e</span><span> </span><span>AS</span><span> </span><span>archive</span>\n<span>WORKDIR</span><span> /home/opam/opam-repository</span>\n<span>RUN </span><span>--mount</span><span>=</span><span>type</span><span>=</span><span>bind</span>,target<span>=</span>/cache,from<span>=</span>opam-archive rsync <span>-aH</span> /cache/cache/ /home/opam/opam-repository/cache/\n<span>RUN </span>opam admin cache <span>--link</span><span>=</span>/home/opam/opam-repository/cache\n\n<span>FROM</span><span> alpine:latest</span>\n<span>COPY</span><span> --chown=0:0 --from=archive [ \"/home/opam/opam-repository/cache\", \"/cache\" ]</span>\n</code></pre></div></div>\n\n<p>In our opam2web build, we could use <code>FROM ocaml/opam:archive</code> and then <code>apk add caddy</code>, which would reuse the entire 15GB layer and add the few megabytes for <code>caddy</code>.</p>\n\n<p><code>ocaml/opam.ocaml.org-legacy</code> is another 8GB. This legacy data could be integrated by adding it to <code>ocaml/opam:archive</code> in a different directory to ensure compatibility with anyone else using this image. 
This is <a href=\"https://github.com/ocurrent/docker-base-images/pull/324\">PR#324</a></p>\n\n<div><div><pre><code> <span>let</span> <span>install_package_archive</span> <span>opam_image</span> <span>=</span>\n <span>let</span> <span>open</span> <span>Dockerfile</span> <span>in</span>\n<span>+</span> <span>from</span> <span>~</span><span>alias</span><span>:</span><span>\"opam-legacy\"</span> <span>\"ocaml/opam.ocaml.org-legacy\"</span> <span>@@</span>\n <span>from</span> <span>~</span><span>alias</span><span>:</span><span>\"opam-archive\"</span> <span>\"ocaml/opam:archive\"</span> <span>@@</span>\n <span>from</span> <span>~</span><span>alias</span><span>:</span><span>\"archive\"</span> <span>opam_image</span> <span>@@</span>\n <span>workdir</span> <span>\"/home/opam/opam-repository\"</span> <span>@@</span>\n <span>run</span> <span>~</span><span>mounts</span><span>:</span><span>[</span><span>mount_bind</span> <span>~</span><span>target</span><span>:</span><span>\"/cache\"</span> <span>~</span><span>from</span><span>:</span><span>\"opam-archive\"</span> <span>()</span><span>]</span> <span>\"rsync -aH /cache/cache/ /home/opam/opam-repository/cache/\"</span> <span>@@</span>\n <span>run</span> <span>\"opam admin cache --link=/home/opam/opam-repository/cache\"</span> <span>@@</span>\n <span>from</span> <span>\"alpine:latest\"</span> <span>@@</span>\n<span>+</span> <span>copy</span> <span>~</span><span>chown</span><span>:</span><span>\"0:0\"</span> <span>~</span><span>from</span><span>:</span><span>\"opam-legacy\"</span> <span>~</span><span>src</span><span>:</span><span>[</span><span>\"/\"</span><span>]</span> <span>~</span><span>dst</span><span>:</span><span>\"/legacy\"</span> <span>()</span> <span>@@</span>\n <span>copy</span> <span>~</span><span>chown</span><span>:</span><span>\"0:0\"</span> <span>~</span><span>from</span><span>:</span><span>\"archive\"</span> 
<span>~</span><span>src</span><span>:</span><span>[</span><span>\"/home/opam/opam-repository/cache\"</span><span>]</span> <span>~</span><span>dst</span><span>:</span><span>\"/cache\"</span> <span>()</span>\n</code></pre></div></div>\n\n<p>Finally, we need to update <a href=\"https://github.com/ocaml-opam/opam2web\">opam2web</a> to use <code>ocaml/opam:archive</code> as the base layer rather than <code>caddy:2.8.4</code>, resulting in the final part of the <code>Dockerfile</code> looking like this.</p>\n\n<div><div><pre><code><span>FROM</span><span> ocaml/opam:archive</span>\n<span>RUN </span>apk add <span>--update</span> git curl rsync libstdc++ rdfind caddy\n<span>COPY</span><span> --from=build-opam2web /opt/opam2web /usr/local</span>\n<span>COPY</span><span> --from=build-opam-doc /usr/bin/opam-dev /usr/local/bin/opam</span>\n<span>COPY</span><span> --from=build-opam-doc /opt/opam/doc /usr/local/share/opam2web/content/doc</span>\n<span>COPY</span><span> ext/key/opam-dev-team.pgp /www/opam-dev-pubkey.pgp</span>\n<span>ADD</span><span> bin/opam-web.sh /usr/local/bin</span>\n<span>ARG</span><span> DOMAIN=opam.ocaml.org</span>\n<span>ARG</span><span> OPAM_REPO_GIT_SHA=master</span>\n<span>ARG</span><span> BLOG_GIT_SHA=master</span>\n<span>RUN </span><span>echo</span> <span>${</span><span>OPAM_REPO_GIT_SHA</span><span>}</span> <span>&gt;&gt;</span> /www/opam_git_sha\n<span>RUN </span><span>echo</span> <span>${</span><span>BLOG_GIT_SHA</span><span>}</span> <span>&gt;&gt;</span> /www/blog_git_sha\n<span>RUN </span>/usr/local/bin/opam-web.sh <span>${</span><span>DOMAIN</span><span>}</span> <span>${</span><span>OPAM_REPO_GIT_SHA</span><span>}</span> <span>${</span><span>BLOG_GIT_SHA</span><span>}</span>\n<span>WORKDIR</span><span> /srv</span>\n<span>COPY</span><span> Caddyfile /etc/caddy/Caddyfile</span>\n<span>ENTRYPOINT</span><span> [\"caddy\", \"run\", \"--config\", \"/etc/caddy/Caddyfile\", \"--adapter\", \"caddyfile\"]</span>\n</code></pre></div></div>\n\n<p>I 
acknowledge that this final image now contains some unneeded packages such as <code>git</code>, <code>curl</code>, etc., but this seems a minor inconvenience.</p>\n\n<p>The <code>Caddyfile</code> can be adjusted to make everything still appear to be in the same place:</p>\n\n<div><div><pre><code>:80 {\n\tredir /install.sh https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh\n\tredir /install.ps1 https://raw.githubusercontent.com/ocaml/opam/master/shell/install.ps1\n\n\t@version_paths path /1.1/* /1.2.0/* /1.2.2/*\n\thandle @version_paths {\n\t\troot * /legacy\n\t\tfile_server\n\t}\n\n\thandle /cache/* {\n\t\troot * /\n\t\tfile_server\n\t}\n\n\thandle {\n\t\troot * /www\n\t\tfile_server\n\t}\n}\n</code></pre></div></div>\n\n<p>In this configuration, the Docker <em>push</em> is only 650MB rather than 25GB.</p>\n\n<p>The changes to opam2web are in <a href=\"https://github.com/ocaml-opam/opam2web/pull/245\">PR#245</a>.</p>\n\n<p>Test with some external URLs:</p>\n\n<ul>\n <li><a href=\"https://staging.opam.ocaml.org/index.tar.gz\">https://staging.opam.ocaml.org/index.tar.gz</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/archives/0install.2.18/0install-2.18.tbz\">https://staging.opam.ocaml.org/archives/0install.2.18/0install-2.18.tbz</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/cache/0install.2.18/0install-2.18.tbz\">https://staging.opam.ocaml.org/cache/0install.2.18/0install-2.18.tbz</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/1.2.2/archives/0install.2.12.3+opam.tar.gz\">https://staging.opam.ocaml.org/1.2.2/archives/0install.2.12.3+opam.tar.gz</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/1.2.0/archives/0install.2.12.1+opam.tar.gz\">https://staging.opam.ocaml.org/1.2.0/archives/0install.2.12.1+opam.tar.gz</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/1.1/archives/0install.2.10+opam.tar.gz\">https://staging.opam.ocaml.org/1.1/archives/0install.2.10+opam.tar.gz</a></li>\n <li><a 
href=\"https://staging.opam.ocaml.org/opam_git_sha\">https://staging.opam.ocaml.org/opam_git_sha</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/blog_git_sha\">https://staging.opam.ocaml.org/blog_git_sha</a></li>\n <li><a href=\"https://staging.opam.ocaml.org/opam-dev-pubkey.pgp\">https://staging.opam.ocaml.org/opam-dev-pubkey.pgp</a></li>\n</ul>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "opam",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}