Thicket data repository for the EEG
{
  "id": "https://www.tunbury.org/2025/05/13/ubuntu-apparmor",
  "title": "Ubuntu 24.04 runc issues with AppArmor",
  "link": "https://www.tunbury.org/2025/05/13/ubuntu-apparmor/",
  "updated": "2025-05-13T12:00:00",
  "published": "2025-05-13T12:00:00",
  "summary": "Patrick reported issues with OCaml-CI running tests on ocaml-ppx.",
  "content": "<p>Patrick reported issues with OCaml-CI running tests on <code>ocaml-ppx</code>.</p>\n\n<blockquote>\n <p>Fedora seems to be having some issues: https://ocaml.ci.dev/github/ocaml-ppx/ppxlib/commit/0d6886f5bcf22287a66511817e969965c888d2b7/variant/fedora-40-5.3_opam-2.3</p>\n <div><div><pre><code>sudo: PAM account management error: Authentication service cannot retrieve authentication info\nsudo: a password is required\n\"/usr/bin/env\" \"bash\" \"-c\" \"sudo dnf install -y findutils\" failed with exit status 1\n2025-05-12 08:55.09: Job failed: Failed: Build failed\n</code></pre></div> </div>\n</blockquote>\n\n<p>I took this problem at face value and replied that the issue was likely related to Fedora 40, which is EOL. I created <a href=\"https://github.com/ocurrent/ocaml-ci/pull/1011\">PR#1011</a> for OCaml-CI and deployed it. However, the problem didn’t go away: we were now testing Fedora 42, but jobs were still failing. I created a minimal obuilder job specification:</p>\n\n<div><div><pre><code>((from ocaml/opam:fedora-42-ocaml-4.14@sha256:475a852401de7d578efec2afce4384d87b505f5bc610dc56f6bde3b87ebb7664)\n(user (uid 1000) (gid 1000))\n(run (shell \"sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam\")))\n</code></pre></div></div>\n\n<p>Submitting the job to the cluster showed that it worked on every machine except <code>bremusa</code>.</p>\n\n<div><div><pre><code><span>$ </span>ocluster-client submit-obuilder <span>--connect</span> mtelvers.cap <span>--pool</span> linux-x86_64 <span>--local-file</span> fedora-42.spec\nTailing log:\nBuilding on bremusa.ocamllabs.io\n\n<span>(</span>from ocaml/opam:fedora-42-ocaml-4.14@sha256:475a852401de7d578efec2afce4384d87b505f5bc610dc56f6bde3b87ebb7664<span>)</span>\n2025-05-12 16:55.42 <span>---</span><span>></span> using <span>\"aefb7551cd0db7b5ebec7e244d5637aef02ab3f94c732650de7ad183465adaa0\"</span> from cache\n\n/: <span>(</span>user <span>(</span>uid 1000<span>)</span> <span>(</span>gid 1000<span>))</span>\n\n/: <span>(</span>run <span>(</span>shell <span>\"sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam\"</span><span>))</span>\n<span>sudo</span>: PAM account management error: Authentication service cannot retrieve authentication info\n<span>sudo</span>: a password is required\n<span>\"/usr/bin/env\"</span> <span>\"bash\"</span> <span>\"-c\"</span> <span>\"sudo ln -f /usr/bin/opam-2.3 /usr/bin/opam\"</span> failed with <span>exit </span>status 1\nFailed: Build failed.\n</code></pre></div></div>\n\n<p>Changing the image to <code>opam:debian-12-ocaml-4.14</code> worked, so the issue affects only Fedora images, and only on <code>bremusa</code>. I was able to reproduce the issue directly using <code>runc</code>.</p>\n\n<div><div><pre><code><span># runc run test</span>\n<span>sudo</span>: PAM account management error: Authentication service cannot retrieve authentication info\n<span>sudo</span>: a password is required\n</code></pre></div></div>\n\n<p>Running <code>ls -l /etc/shadow</code> in the container showed that <code>/etc/shadow</code> has permissions 000. If these are changed to <code>640</code>, <code>sudo</code> works correctly. Some distributions set <code>/etc/shadow</code> to 000 so that only processes with the <code>DAC_OVERRIDE</code> capability can access it.</p>\n\n<p>Having seen a permission issue caused by <code>runc</code> and <code>libseccomp</code> compatibility <a href=\"https://github.com/ocaml/infrastructure/issues/121\">before</a>, I went down a rabbit hole investigating that. Ultimately, I compiled <code>runc</code> without <code>libseccomp</code> support (<code>make MAKETAGS=\"\"</code>), but the issue persisted.</p>\n\n<p>All the machines in the <code>linux-x86_64</code> pool run Ubuntu 22.04 except <code>bremusa</code>, which runs Ubuntu 24.04. I configured a spare machine with Ubuntu 24.04 and tested it; the problem appeared on this machine as well.</p>\n\n<p>Had something changed in Ubuntu 24.04?</p>\n\n<p>I temporarily disabled AppArmor by adding <code>apparmor=0</code> to <code>GRUB_CMDLINE_LINUX</code> in <code>/etc/default/grub</code>, running <code>update-grub</code>, and rebooting. Disabling AppArmor entirely like this weakens the host’s security, so it isn’t recommended, but it did clear the issue.</p>\n\n<p>After re-enabling AppArmor, I disabled the <code>runc</code> profile by running:</p>\n\n<div><div><pre><code><span>ln</span> <span>-s</span> /etc/apparmor.d/runc /etc/apparmor.d/disable/\napparmor_parser <span>-R</span> /etc/apparmor.d/runc\n</code></pre></div></div>\n\n<p>This didn’t help; in fact, it made things worse, as <code>runc</code> could no longer run at all. I restored the profile and added <code>capability dac_override</code> to it, but this didn’t help either.</p>\n\n<p>Looking through the profiles with <code>grep shadow -r /etc/apparmor.d</code>, I noticed <code>unix-chkpwd</code>, which looked like a possible source of the problem. Disabling this profile resolved the issue.</p>\n\n<div><div><pre><code><span>ln</span> <span>-s</span> /etc/apparmor.d/unix-chkpwd /etc/apparmor.d/disable\napparmor_parser <span>-R</span> /etc/apparmor.d/unix-chkpwd\n</code></pre></div></div>\n\n<p>Armed with the answer, it’s easy to find other people with related issues:</p>\n<ul>\n <li>https://github.com/docker/build-push-action/issues/1302</li>\n <li>https://github.com/moby/moby/issues/48734</li>\n</ul>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "Ubuntu,runc,AppArmor",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}