Thicket data repository for the EEG
{
  "id": "https://lucasma8795.github.io/blog/2025/07/11/effects-scheduling-w02",
  "title": "Effects-based scheduling for the OCaml compiler - w02",
  "link": "https://lucasma8795.github.io/blog/2025/07/11/effects-scheduling-w02.html",
  "updated": "2025-07-11T08:00:00",
  "published": "2025-07-11T08:00:00",
  "summary": "Hours of refactoring and bug-fixing later, I was able to get the OCaml compiler to invoke itself in another process to compile a missing dependency, then resume the compilation process as usual.",
  "content": "<p>Hours of refactoring and bug-fixing later, I was able to get the OCaml compiler to invoke itself in another process to compile a missing dependency, then resume the compilation process as usual.</p>\n\n<p>More specifically, consider the two <code>.ml</code> files below (and their corresponding <code>.mli</code> interface files, omitted):</p>\n\n<div><div><pre><code><span>(* foo.ml *)</span>\n<span>let</span> <span>bar</span> <span>=</span> <span>42</span>\n\n<span>(* program.ml *)</span>\n<span>let</span> <span>()</span> <span>=</span> <span>Printf</span><span>.</span><span>printf</span> <span>\"%d\"</span> <span>Foo</span><span>.</span><span>bar</span>\n</code></pre></div></div>\n\n<p>If we invoke the compiler on <code>program.ml</code> without first compiling <code>foo.ml</code>, clearly it doesn\u2019t work: we are missing a dependency <code>foo.cmi</code>. 
However, we can catch the exception that the compiler would normally have raised, inside our effect handler:</p>\n\n<div><div><pre><code><span>effc</span> <span>=</span> <span>fun</span> <span>(</span><span>type</span> <span>c</span><span>)</span> <span>(</span><span>eff</span><span>:</span> <span>c</span> <span>Effect</span><span>.</span><span>t</span><span>)</span> <span>-&gt;</span>\n <span>match</span> <span>eff</span> <span>with</span>\n <span>(* filename -&gt; filename *)</span>\n <span>|</span> <span>Load_path</span><span>.</span><span>Find_path</span> <span>fn</span> <span>-&gt;</span>\n <span>Some</span> <span>(</span><span>fun</span> <span>(</span><span>k</span><span>:</span> <span>(</span><span>c</span><span>,</span> <span>_</span><span>)</span> <span>continuation</span><span>)</span> <span>-&gt;</span>\n <span>(* guard only the lookups, not the resumptions: otherwise a Not_found\n escaping the resumed computation would make us resume k a second time *)</span>\n <span>match</span> <span>find_path</span> <span>fn</span> <span>with</span>\n <span>|</span> <span>path</span> <span>-&gt;</span> <span>Effect</span><span>.</span><span>Deep</span><span>.</span><span>continue</span> <span>k</span> <span>path</span>\n <span>|</span> <span>exception</span> <span>Not_found</span> <span>-&gt;</span>\n <span>(* missing dependency, we need to compile it;\n the result imitates what find_path would normally return *)</span>\n <span>match</span> <span>compile_dependency</span> <span>fn</span> <span>with</span>\n <span>|</span> <span>path</span> <span>-&gt;</span> <span>Effect</span><span>.</span><span>Deep</span><span>.</span><span>continue</span> <span>k</span> <span>path</span>\n <span>(* source file not found, give up *)</span>\n <span>|</span> <span>exception</span> <span>Not_found</span> <span>-&gt;</span>\n <span>Effect</span><span>.</span><span>Deep</span><span>.</span><span>discontinue</span> <span>k</span> <span>Not_found</span>\n <span>)</span>\n \n <span>|</span> <span>...</span>\n</code></pre></div></div>\n\n<p>Invoking <code>./ocamlrun ./ocamlc -c program.ml -I ./stdlib</code>, we find a missing dependency, and <code>compile_dependency: filename -&gt; filename</code> generates the following (hopefully, portable?) 
command to compile our dependency <code>foo.ml</code> (we inherit the load path from the calling parent):</p>\n\n<div><div><pre><code>'runtime/ocamlrun' './ocamlc' '-c' 'foo.ml' '-I' './stdlib' '-I' ''\n</code></pre></div></div>\n\n<p>\u2026and we then resume compilation for <code>program.ml</code> with the <code>continue</code> primitive.</p>\n\n<p>Linking the object files together, we then get</p>\n\n<div><div><pre><code>\u279c ocamlrun ocamlc foo.cmo program.cmo -I stdlib -o program\n\u279c ocamlrun ./program\n42\n</code></pre></div></div>\n\n<p>as expected!</p>\n\n<p>Using the above, I was then able to trace through the Makefile and build <code>ocamlcommon.cma</code> and <code>ocamlbytecomp.cma</code>, first by building the required <code>.cmo</code> files (in no particular order, since missing <code>.cmi</code> dependencies are auto-discovered and compiled), then by linking the objects in dependency order (a restriction I hope to relax in the future <a href=\"https://lucasma8795.github.io/blog/2025/07/11/effects-scheduling-w02.html#fn:1\">1</a>). With this done, we are only two commands away from producing <code>ocamlc</code>, the OCaml <a href=\"https://ocaml.org/manual/5.3/comp.html\">bytecode compiler</a>:</p>\n\n<div><div><pre><code>ocamlrun ocamlc -c driver/main.ml &lt;compiler flags&gt; &lt;load path&gt;\nocamlrun ocamlc ocamlcommon.cma ocamlbytecomp.cma driver/main.cmo -o ocamlc &lt;compiler flags&gt; &lt;load path&gt;\n</code></pre></div></div>\n\n<p>An issue I can see coming: the <a href=\"https://ocaml.org/manual/5.2/api/compilerlibref/Load_path.html\">original</a> <code>Load_path</code> module assumes that the contents of the load path don\u2019t change during the lifetime of the compiler process, and for good reason: file system calls are much slower than reading from memory, so the compiler reads the directory listings once and caches the filenames in memory. 
However, newly compiled dependencies must appear in the load path state to avoid compiling any dependency twice, so that state now needs to be mutable and synchronized across compiler instances.</p>\n\n<p>For now I\u2019ve added file system checks to avoid overwriting existing <code>.cmi</code> and <code>.cmo</code> files (having to synchronize load path state across independent compiler <em>processes</em> sounds like a lot of pain), but synchronizing that state should be quite straightforward once I transition over to using <a href=\"https://ocaml.org/manual/5.1/parallelism.html\">domains</a>.</p>\n\n<p>The next step is to build the rest of the targets that <code>make install</code> requires; more to come on this\u2026</p>\n\n<div>\n <ol>\n <li>\n <p>Week 5 Lucas here: turns out this was not possible! The initialization order of modules is the order in which they are linked. This is a <a href=\"https://en.wikipedia.org/wiki/Total_order\">total order</a> of the modules that respects the dependency graph, but such an order is not unique, so in general the link order is not a function of the program text. Arbitrarily picking a valid total order doesn\u2019t work either: suppose some global state lives in <code>A</code>, and <code>B</code> and <code>C</code> (neither depending on the other) both read and modify that state; then the program\u2019s behaviour depends on whether <code>B</code> or <code>C</code> is linked first.\u00a0<a href=\"https://lucasma8795.github.io/blog/2025/07/11/effects-scheduling-w02.html#fnref:1\">&#8617;</a></p>\n </li>\n </ol>\n</div>",
  "content_type": "html",
  "author": {
    "name": "",
    "email": null,
    "uri": null
  },
  "categories": [
    "ocaml-effects-scheduling"
  ],
  "source": "https://lucasma8795.github.io/blog/feed/ocaml-effects-scheduling.xml"
}