{
  "id": "https://www.tunbury.org/2025/08/01/program-specification",
  "title": "OCaml Program Specification for Claude",
  "link": "https://www.tunbury.org/2025/08/01/program-specification/",
  "updated": "2025-08-01T00:00:00",
  "published": "2025-08-01T00:00:00",
  "summary": "I have a dataset that I would like to visualise using a static website hosted on GitHub Pages. The application that generates the dataset is still under development, which results in frequently changing data formats. Therefore, rather than writing a static website generator and needing to revise it continually, could I write a specification and have Claude create a new one each time there was a change?",
  "content": "<p>I have a dataset that I would like to visualise using a static website hosted on GitHub Pages. The application that generates the dataset is still under development, which results in frequently changing data formats. Therefore, rather than writing a static website generator and needing to revise it continually, could I write a specification and have Claude create a new one each time there was a change?</p>\n\n<p>Potentially, I could do this cumulatively by giving Claude the original specification and code and then the new specification, but my chosen approach is to see if Claude can create the application in one pass from the specification. I’ve also chosen to do this using Claude Sonnet’s web interface; obviously, the code I will request will be in OCaml.</p>\n\n<p>I wrote a detailed 500-word specification that included the file formats involved, example directory tree layouts, and what I thought was a clear definition of the output file structure.</p>\n\n<p>The resulting code wasn’t what I wanted: Claude had inlined huge swathes of HTML and was using <code>Printf.sprintf</code> extensively. Each file included the stylesheet as a <code>&lt;style&gt;...&lt;/style&gt;</code> block. However, the biggest problem was that Claude had chosen to write the JSON parser from scratch, and this code had numerous issues and wouldn’t even build. I directed Claude to use <code>yojson</code> rather than handcraft a parser.</p>\n\n<p>I intended but did not state in my specification that I wanted the code to generate HTML using <code>tyxml</code>. I updated my specification, requesting that the code be written using <code>tyxml</code>, <code>yojson</code>, and <code>timedesc</code> to handle the ISO date format. I also thought of some additional functionality around extracting data from a Git repo.</p>\n\n<p>Round 2 - Possibly a step backwards as Claude struggled to find the appropriate functions in the <code>timedesc</code> library to parse and sort dates. 
There were also some issues extracting data using <code>git</code>. I have to take responsibility here as I gave the example command as <code>git show --date=iso-strict ce03608b4ba656c052ef5e868cf34b9e86d02aac -C /path/to/repo</code>, but <code>git</code> requires the <code>-C /path/to/repo</code> to precede the <code>show</code> command. However, the fact that my example had overridden Claude’s <em>knowledge</em> was potentially interesting. Could I use this to seed facts I knew Claude would need?</p>\n\n<p>Claude still wasn’t creating a separate <code>stylesheet.css</code>.</p>\n\n<p>Round 3 - This time, I gave examples of how to use the <code>timedesc</code> library, i.e.</p>\n\n<blockquote>\n <p>To use the <code>timedesc</code> library, we can call <code>Timedesc.of_iso8601</code> to convert the Git ISO strict output to a Timedesc object and then compare it with <code>compare (Timedesc.to_timestamp_float_s b.date) (Timedesc.to_timestamp_float_s a.date)</code>.</p>\n</blockquote>\n\n<p>In addition to stating that all the styles should be shared in a common <code>stylesheet.css</code>, I gave a file tree of the expected output, including the <code>stylesheet.css</code>.</p>\n\n<p>Claude now correctly used the <code>timedesc</code> library and tried to write a stylesheet. However, Claude had hallucinated <code>css</code> and <code>css_rule</code> functions in <code>tyxml</code> to do this; neither exists. Furthermore, adding the link to the stylesheet was causing problems as <code>link</code> had multiple definitions in scope and needed to be explicitly referenced as <code>Tyxml.Html.link</code>. 
Claude’s style was to open everything at the beginning of the file:</p>\n\n<div><div><pre><code><span>open</span> <span>Yojson</span><span>.</span><span>Safe</span>\n<span>open</span> <span>Yojson</span><span>.</span><span>Safe</span><span>.</span><span>Util</span>\n<span>open</span> <span>Tyxml</span><span>.</span><span>Html</span>\n<span>open</span> <span>Printf</span> \n<span>open</span> <span>Unix</span> \n</code></pre></div></div>\n\n<p>The compiler picked <code>Unix.link</code> rather than <code>Tyxml.Html.link</code>:</p>\n\n<div><div><pre><code>File \"ci_generator.ml\", line 347, characters 18-33:\n347 | link ~rel:[ `Stylesheet ] ~href:\"/stylesheet.css\" ();\n ^^^^^^^^^^^^^^^\nError: The function applied to this argument has type\n ?follow:bool -> string -> unit\nThis argument cannot be applied with label ~rel\n</code></pre></div></div>\n\n<blockquote>\n <p>Stylistically, please can we only <code>open</code> things in functions where they are used: <code>let foo () = let open Tyxml.Html in ...</code>. This will avoid global opens at the top of the file and avoid any confusion where libraries have functions with the same name, e.g., <code>Unix.link</code> and <code>Tyxml.Html.link</code>.</p>\n</blockquote>\n\n<p>Furthermore, I had two JSON files in my input, each with the field <code>name</code>. Claude converted these into OCaml types; however, when referencing these later as function parameters, the compiler frequently picks the wrong one. This can be <em>fixed</em> by adding a specific type to the function parameter <code>let f (t:foo) = ...</code>. 
I’ve cheated here and renamed the field in one of the JSON files.</p>\n\n<div><div><pre><code><span>type</span> <span>foo</span> <span>=</span> <span>{</span>\n <span>name</span> <span>:</span> <span>string</span><span>;</span>\n <span>x</span> <span>:</span> <span>string</span><span>;</span>\n<span>}</span>\n\n<span>type</span> <span>bar</span> <span>=</span> <span>{</span>\n <span>name</span> <span>:</span> <span>string</span><span>;</span>\n <span>y</span> <span>:</span> <span>string</span><span>;</span>\n<span>}</span>\n</code></pre></div></div>\n\n<p>Claude chose to extract the data from the Git repo using <code>git show --pretty=format:'%H|%ai|%s'</code>; this ignores the <code>--date=iso-strict</code> directive. The correct format should be <code>%aI</code>. I updated my guidance on the use of <code>git show</code>.</p>\n\n<p>My specification now comes in at just under 1000 words. From that single specification document, Claude produces a valid OCaml program on the first try, which builds the static site as per my design. <code>wc -l</code> shows me there are 662 lines of code.</p>\n\n<p>It’s amusing to run it more than once to see the variations in styling!</p>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "opam",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}