My agentic slop goes here. Not intended for anyone else!

bushel

+9
stack/bushel/.gitignore
···
_build
.*.swp
**/.claude/settings.local.json
.photos-api
.karakeep-api
KARAKEEP.md
karakeep-src
.DS_Store
.openapi-key
+1
stack/bushel/.ocamlformat
···
profile=janestreet
+89
stack/bushel/README.md
···
# bushel - a reconstruction of livejournal for the 2020s

**WIP spec. If you don't know what this is, it's probably not ready for you
yet. It is extremely work in progress.**

Bushel is an implementation of the classic "webring" of old, when a smallish
group of people would collaborate to share content while maintaining their own
websites and online identities.

In a classic webring, each node generates its own styled HTML site and shares
data via RSS or Atom. Every node that consumes data has to parse the Atom
feeds, figure out each peer's encodings, and then turn those back into
datastructures for the local site.

Bushel instead provides a simpler distributed datamodel that is hopefully more
maintainable in the long term. Bushel uses a "resolvable Markdown" format to
directly share filesystem-based data structures across the webring. This
avoids having to roundtrip via Atom's XML, and allows direct sharing of typed,
versioned datastructures across nodes owned by different people. Examples of
such datastructures include blog posts, wiki entries, social media feeds,
academic papers, events and so on. Every datastructure is versioned with
migrations so that older feeds can always be upgraded to newer ones.
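The version-with-migrations idea can be sketched in OCaml. This is a
hypothetical illustration only: the type and field names (`post_v1`,
`post_v2`, `migrate`) are invented here and are not Bushel's actual schema.

```ocaml
(* Hypothetical sketch: a datastructure carried in two schema versions,
   with a total migration to the latest one. Names are illustrative. *)
type post_v1 = { title : string; body : string }
type post_v2 = { title : string; body : string; tags : string list }

type post = V1 of post_v1 | V2 of post_v2

(* Any older version can always be upgraded to the newest one, so a
   consuming node runs [migrate] on every entry it receives and peers
   serving older schemas never break newer readers. *)
let rec migrate = function
  | V1 { title; body } -> migrate (V2 { title; body; tags = [] })
  | V2 _ as latest -> latest
```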

## Motivation

Working with open source communities in the modern world is a tremendous
undertaking due to the sheer number of channels and lines of communication we
have to deal with for even simple interactions. A single feature might begin
life in an email thread on a mailing list, then migrate to several GitHub
issues, show up on Hacker News for some random criticism, be pushed to a git
repository or three, be released onto a package manager, get threads on a
Discourse forum, be published as a research paper, have some Jupyter notebooks
released as a tutorial, and then have a video of a conference talk released.

The above is a pretty common example of _one_ person's workflow, but open
source rarely happens solo these days. Instead, small groups of people
collaborate across these tasks, and this is where tracking the information
flow gets complex and manual across all the communication media.

## Goals

Bushel is intended to provide a framework for:

- **individual** open source contributors to quickly write about their work,
  and mirror it to some of these communication media
- **groups** of contributors to share content via bidirectional links
  (e.g. release announcements or blog posts about work done together),
  with the ability to work in private if they desire
- **organisations** to track feature flows among bigger groups of individuals
  working together across many projects
- **projects** to pull together the contributions of individuals and credit
  them appropriately, while providing a coherent feature flow to others.

## Design

The authoring workflow requires minimal work from users by:

- generating **content mirrored from online sources** and converting it
  into a markdown format ready for human editing
- providing **markdown shortcuts** to easily reference other sources
  and projects in the webring database (e.g. `@avsm` or `@avsm/bushel`)
- letting **humans quickly edit** those autogenerated files from their
  perspective (e.g. a project might write up a GitHub release differently
  from an individual who is proud of a particular feature they worked on)
- ensuring all HTML **endpoints are version controlled** like a wiki,
  so users can reference either the latest version of a blog post or a
  particular revision
- providing **bidirectional links**, meaning that if you reference someone
  else's content, their site is rebuilt to reflect your link.

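As a rough sketch of how the `@avsm` / `@avsm/bushel` shortcuts might be
picked out of a Markdown body before being resolved against the webring
database. This is a hypothetical helper, not Bushel's actual parser; the
function name and the accepted character set are assumptions.

```ocaml
(* Hypothetical sketch: scan a markdown body for @handle and
   @handle/project shortcuts. A real resolver would then look each
   candidate up in the webring database. *)
let is_ref_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' | '/' -> true
  | _ -> false

let extract_shortcuts body =
  let n = String.length body in
  let refs = ref [] in
  let i = ref 0 in
  while !i < n do
    (* Only treat '@' as a shortcut at the start or after a space/paren *)
    if body.[!i] = '@' && (!i = 0 || body.[!i - 1] = ' ' || body.[!i - 1] = '(')
    then begin
      let j = ref (!i + 1) in
      while !j < n && is_ref_char body.[!j] do incr j done;
      if !j > !i + 1 then refs := String.sub body !i (!j - !i) :: !refs;
      i := !j
    end
    else incr i
  done;
  List.rev !refs
```

For example, `extract_shortcuts "see @avsm and (@avsm/bushel)"` yields both
the contact reference and the project reference in document order.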
The serving workflow requires a custom server instead of just static pages,
since:

- nodes use GitHub OAuth to offer per-group privacy settings for content
  that shouldn't immediately be public. This allows, for example, draft
  blog posts to be easily shared among authors and then go public at the
  same time.
- some content can stay private to a group indefinitely, for example design
  discussions. Although open source is eventually released, most small-group
  design is easier when done in private without the whole Internet giving
  its opinions.
- content can initially be served via HTTPS, but eventually will have bridges
  to other mechanisms such as a CLI, and GraphQL endpoints of the datamodel
  for custom clients.

## Schema

TODO filesystem schema for authoring
+127
stack/bushel/bin/bushel_bibtex.ml
···
open Cmdliner
open Printf

(** TODO:claude Generate bibtex entry from paper data *)
let generate_bibtex_entry paper =
  let open Bushel.Paper in
  (* Use slug as the bibtex key/label *)
  let bibkey = slug paper in
  let bibtype = try bibtype paper with _ -> "misc" in
  let title = try title paper with _ -> "Untitled" in
  let authors =
    let auth_list = try authors paper with _ -> [] in
    String.concat " and " auth_list
  in
  let year = try year paper with _ -> 0 in
  (* Build the bibtex entry *)
  let buf = Buffer.create 1024 in
  Buffer.add_string buf (sprintf "@%s{%s,\n" bibtype bibkey);
  Buffer.add_string buf (sprintf "  title = {%s},\n" title);
  Buffer.add_string buf (sprintf "  author = {%s},\n" authors);
  Buffer.add_string buf (sprintf "  year = {%d}" year);
  (* Add optional fields *)
  (match String.lowercase_ascii bibtype with
   | "article" ->
     (try Buffer.add_string buf (sprintf ",\n  journal = {%s}" (journal paper))
      with _ -> ());
     (match volume paper with
      | Some v -> Buffer.add_string buf (sprintf ",\n  volume = {%s}" v)
      | None -> ());
     (match issue paper with
      | Some i -> Buffer.add_string buf (sprintf ",\n  number = {%s}" i)
      | None -> ());
     (match pages paper with
      | "" -> ()
      | p -> Buffer.add_string buf (sprintf ",\n  pages = {%s}" p))
   | "inproceedings" ->
     (try Buffer.add_string buf (sprintf ",\n  booktitle = {%s}" (booktitle paper))
      with _ -> ());
     (match pages paper with
      | "" -> ()
      | p -> Buffer.add_string buf (sprintf ",\n  pages = {%s}" p));
     (match publisher paper with
      | "" -> ()
      | p -> Buffer.add_string buf (sprintf ",\n  publisher = {%s}" p))
   | "techreport" ->
     (try Buffer.add_string buf (sprintf ",\n  institution = {%s}" (institution paper))
      with _ -> ());
     (match number paper with
      | Some n -> Buffer.add_string buf (sprintf ",\n  number = {%s}" n)
      | None -> ())
   | "book" ->
     (match publisher paper with
      | "" -> ()
      | p -> Buffer.add_string buf (sprintf ",\n  publisher = {%s}" p));
     (try Buffer.add_string buf (sprintf ",\n  isbn = {%s}" (isbn paper))
      with _ -> ())
   | _ -> ());
  (* Add DOI if available *)
  (match doi paper with
   | Some d -> Buffer.add_string buf (sprintf ",\n  doi = {%s}" d)
   | None -> ());
  (* Add URL if available *)
  (match url paper with
   | Some u -> Buffer.add_string buf (sprintf ",\n  url = {%s}" u)
   | None -> ());
  Buffer.add_string buf "\n}\n";
  Buffer.contents buf

(** TODO:claude Main function to export bibtex for all papers *)
let export_bibtex base_dir output_file latest_only =
  (* Load all papers *)
  let bushel = Bushel.load base_dir in
  let papers = Bushel.Entry.papers bushel in
  (* Filter to only latest versions if requested *)
  let papers =
    if latest_only then List.filter (fun p -> p.Bushel.Paper.latest) papers
    else papers
  in
  (* Sort papers by year (most recent first) *)
  let papers = List.sort Bushel.Paper.compare papers in
  (* Generate bibtex for each paper *)
  let bibtex_entries = List.map generate_bibtex_entry papers in
  let bibtex_content = String.concat "\n" bibtex_entries in
  (* Output to file or stdout *)
  match output_file with
  | None ->
    print_string bibtex_content;
    0
  | Some file ->
    let oc = open_out file in
    output_string oc bibtex_content;
    close_out oc;
    printf "Bibtex exported to %s (%d entries)\n" file (List.length papers);
    0

(** TODO:claude Command line arguments *)
let output_file_arg =
  let doc = "Output file for bibtex (defaults to stdout)" in
  Arg.(value & opt (some string) None & info ["o"; "output"] ~docv:"FILE" ~doc)

let latest_only_arg =
  let doc = "Export only the latest version of each paper" in
  Arg.(value & flag & info ["latest"] ~doc)

(** TODO:claude Command term *)
let term =
  Term.(const export_bibtex $ Bushel_common.base_dir $ output_file_arg $ latest_only_arg)

let cmd =
  let doc = "Export bibtex for all papers" in
  let info = Cmd.info "bibtex" ~doc in
  Cmd.v info term
+76
stack/bushel/bin/bushel_common.ml
···
open Cmdliner

(** TODO:claude Get default base directory from BUSHEL_DATA env variable or current directory *)
let get_default_base_dir () =
  match Sys.getenv_opt "BUSHEL_DATA" with
  | Some dir -> dir
  | None -> "."

(** TODO:claude Optional base directory term with BUSHEL_DATA env variable support *)
let base_dir =
  let doc = "Base directory containing Bushel data (defaults to BUSHEL_DATA env var or current directory)" in
  Arg.(value & opt dir (get_default_base_dir ()) & info ["d"; "dir"] ~docv:"DIR" ~doc)

(** TODO:claude Output directory as option *)
let output_dir ~default =
  let doc = "Output directory for generated files" in
  Arg.(value & opt string default & info ["o"; "output"] ~docv:"DIR" ~doc)

(** TODO:claude URL term with custom default *)
let url_term ~default ~doc =
  Arg.(value & opt string default & info ["u"; "url"] ~docv:"URL" ~doc)

(** TODO:claude API key file term *)
let api_key_file ~default =
  let doc = "File containing API key" in
  Arg.(value & opt string default & info ["k"; "key-file"] ~docv:"FILE" ~doc)

(** TODO:claude API key term *)
let api_key =
  let doc = "API key for authentication" in
  Arg.(value & opt (some string) None & info ["api-key"] ~docv:"KEY" ~doc)

(** TODO:claude Overwrite flag *)
let overwrite =
  let doc = "Overwrite existing files" in
  Arg.(value & flag & info ["overwrite"] ~doc)

(** TODO:claude Verbose flag *)
let verbose =
  let doc = "Enable verbose output" in
  Arg.(value & flag & info ["v"; "verbose"] ~doc)

(** TODO:claude File path term *)
let file_term ~default ~doc =
  Arg.(value & opt string default & info ["f"; "file"] ~docv:"FILE" ~doc)

(** TODO:claude Channel/handle term *)
let channel ~default =
  let doc = "Channel or handle name" in
  Arg.(value & opt string default & info ["c"; "channel"] ~docv:"CHANNEL" ~doc)

(** TODO:claude Optional handle term *)
let handle_opt =
  let doc = "Process specific handle" in
  Arg.(value & opt (some string) None & info ["h"; "handle"] ~docv:"HANDLE" ~doc)

(** TODO:claude Tag term for filtering *)
let tag =
  let doc = "Tag to filter or apply" in
  Arg.(value & opt (some string) None & info ["t"; "tag"] ~docv:"TAG" ~doc)

(** TODO:claude Limit term *)
let limit =
  let doc = "Limit number of items to process" in
  Arg.(value & opt (some int) None & info ["l"; "limit"] ~docv:"N" ~doc)

(** TODO:claude Setup logging with standard options *)
let setup_log style_renderer level =
  Fmt_tty.setup_std_outputs ?style_renderer ();
  Logs.set_level level;
  Logs.set_reporter (Logs_fmt.reporter ());
  ()

(** TODO:claude Common setup term combining logs setup *)
let setup_term =
  Term.(const setup_log $ Fmt_cli.style_renderer () $ Logs_cli.level ())
+290
stack/bushel/bin/bushel_doi.ml
···
module ZT = Zotero_translation
open Lwt.Infix
module J = Ezjsonm
open Cmdliner

(* Extract all DOIs from notes by scanning for doi.org URLs *)
let extract_dois_from_notes notes =
  let doi_url_pattern = Re.Perl.compile_pat "https?://(?:dx\\.)?doi\\.org/([^)\\s\"'>]+)" in
  let dois = ref [] in
  List.iter (fun note ->
    let body = Bushel.Note.body note in
    let matches = Re.all doi_url_pattern body in
    List.iter (fun group ->
      try
        let encoded_doi = Re.Group.get group 1 in
        let doi = Uri.pct_decode encoded_doi in
        if not (List.mem doi !dois) then dois := doi :: !dois
      with _ -> ()
    ) matches
  ) notes;
  !dois

(* Extract publisher URLs from notes (Elsevier, Nature, ACM, Sage, UPenn, Springer, Taylor & Francis) *)
let extract_publisher_urls_from_notes notes =
  (* Matches publisher URLs: linkinghub.elsevier.com, nature.com, journals.sagepub.com,
     garfield.library.upenn.edu, link.springer.com, tandfonline.com/doi, and
     dl.acm.org/doi/10.* URLs *)
  let publisher_pattern = Re.Perl.compile_pat "https?://(?:(?:www\\.)?(?:linkinghub\\.elsevier\\.com|nature\\.com|journals\\.sagepub\\.com|garfield\\.library\\.upenn\\.edu|link\\.springer\\.com)/[^)\\s\"'>]+|(?:dl\\.acm\\.org|(?:www\\.)?tandfonline\\.com)/doi(?:/pdf)?/10\\.[^)\\s\"'>]+)" in
  let urls = ref [] in
  List.iter (fun note ->
    let body = Bushel.Note.body note in
    let matches = Re.all publisher_pattern body in
    List.iter (fun group ->
      try
        let url = Re.Group.get group 0 in
        if not (List.mem url !urls) then urls := url :: !urls
      with _ -> ()
    ) matches
  ) notes;
  !urls

(* Resolve a single DOI via Zotero and convert to doi_entry *)
let resolve_doi zt ~verbose doi =
  Printf.printf "Resolving DOI: %s\n%!" doi;
  let doi_url = Printf.sprintf "https://doi.org/%s" doi in
  Lwt.catch
    (fun () ->
      ZT.json_of_doi zt ~slug:"temp" doi >>= fun json ->
      if verbose then
        Printf.printf "  Raw Zotero response:\n%s\n%!" (J.value_to_string json);
      try
        let keys = J.get_dict (json :> J.value) in
        let title = J.find json ["title"] |> J.get_string in
        let authors = J.find json ["author"] |> J.get_list J.get_string in
        let year = J.find json ["year"] |> J.get_string |> int_of_string in
        let bibtype = J.find json ["bibtype"] |> J.get_string in
        let publisher =
          try
            (* Try journal first, then booktitle, then publisher *)
            match List.assoc_opt "journal" keys with
            | Some j -> J.get_string j
            | None ->
              (match List.assoc_opt "booktitle" keys with
               | Some b -> J.get_string b
               | None ->
                 (match List.assoc_opt "publisher" keys with
                  | Some p -> J.get_string p
                  | None -> ""))
          with _ -> ""
        in
        let entry =
          Bushel.Doi_entry.create_resolved ~doi ~title ~authors ~year ~bibtype
            ~publisher ~source_urls:[doi_url] ()
        in
        Printf.printf "  ✓ Resolved: %s (%d)\n%!" title year;
        Lwt.return entry
      with e ->
        Printf.eprintf "  ✗ Failed to parse response for %s: %s\n%!" doi (Printexc.to_string e);
        Lwt.return (Bushel.Doi_entry.create_failed ~doi ~error:(Printexc.to_string e) ~source_urls:[doi_url] ()))
    (fun exn ->
      Printf.eprintf "  ✗ Failed to resolve %s: %s\n%!" doi (Printexc.to_string exn);
      Lwt.return (Bushel.Doi_entry.create_failed ~doi ~error:(Printexc.to_string exn) ~source_urls:[doi_url] ()))

(* Resolve a publisher URL via Zotero /web endpoint *)
let resolve_url zt ~verbose url =
  Printf.printf "Resolving URL: %s\n%!" url;
  Lwt.catch
    (fun () ->
      (* Use Zotero's resolve_url which calls the /web endpoint with the URL directly *)
      ZT.resolve_url zt url >>= function
      | Error (`Msg err) ->
        Printf.eprintf "  ✗ Failed to resolve URL: %s\n%!" err;
        Lwt.return (Bushel.Doi_entry.create_failed ~doi:url ~error:err ~source_urls:[url] ())
      | Ok json ->
        if verbose then
          Printf.printf "  Raw Zotero response:\n%s\n%!" (J.value_to_string json);
        try
          (* Extract metadata from the JSON response *)
          let json_list = match json with
            | `A lst -> lst
            | single -> [single]
          in
          match json_list with
          | [] ->
            Printf.eprintf "  ✗ Empty response\n%!";
            Lwt.return (Bushel.Doi_entry.create_failed ~doi:url ~error:"Empty response" ~source_urls:[url] ())
          | item :: _ ->
            (* Extract DOI if present, otherwise use URL *)
            let doi =
              try J.find item ["DOI"] |> J.get_string
              with _ -> (try J.find item ["doi"] |> J.get_string with _ -> url)
            in
            let title = try J.find item ["title"] |> J.get_string with _ -> "Unknown Title" in
            (* Extract authors from Zotero's "creators" field *)
            let authors =
              try
                J.find item ["creators"]
                |> J.get_list (fun creator_obj ->
                     try
                       let last_name = J.find creator_obj ["lastName"] |> J.get_string in
                       let first_name =
                         try J.find creator_obj ["firstName"] |> J.get_string with _ -> ""
                       in
                       if first_name = "" then last_name else first_name ^ " " ^ last_name
                     with _ -> "Unknown Author")
              with _ -> []
            in
            (* Extract year from Zotero's "date" field.
               Handles both ISO format "2025-07" and text format "November 28, 2023". *)
            let year =
              try
                let date_str = J.find item ["date"] |> J.get_string in
                (* First try splitting on '-' for ISO dates like "2025-07" or "2024-11-04" *)
                let parts = String.split_on_char '-' date_str in
                match parts with
                | year_str :: _ when String.length year_str = 4 ->
                  (try int_of_string year_str with _ -> 0)
                | _ ->
                  (* Try splitting on space and comma for dates like "November 28, 2023" *)
                  let space_parts = String.split_on_char ' ' date_str in
                  let year_candidate =
                    List.find_opt (fun s ->
                      let s = String.trim (String.map (fun c -> if c = ',' then ' ' else c) s) in
                      String.length s = 4
                      && String.for_all (function '0'..'9' -> true | _ -> false) s
                    ) space_parts
                  in
                  (match year_candidate with
                   | Some year_str -> int_of_string (String.trim year_str)
                   | None -> 0)
              with _ -> 0
            in
            (* Extract type/bibtype from Zotero's "itemType" field *)
            let bibtype = try J.find item ["itemType"] |> J.get_string with _ -> "article" in
            (* Extract publisher/journal from Zotero's "publicationTitle" field *)
            let publisher = try J.find item ["publicationTitle"] |> J.get_string with _ -> "" in
            (* Include both the original URL and the DOI URL in source_urls *)
            let doi_url = if doi = url then [] else [Printf.sprintf "https://doi.org/%s" doi] in
            let source_urls = url :: doi_url in
            let entry =
              Bushel.Doi_entry.create_resolved ~doi ~title ~authors ~year ~bibtype
                ~publisher ~source_urls ()
            in
            Printf.printf "  ✓ Resolved: %s (%d) [DOI: %s]\n%!" title year doi;
            Lwt.return entry
        with e ->
          Printf.eprintf "  ✗ Failed to parse response: %s\n%!" (Printexc.to_string e);
          Lwt.return (Bushel.Doi_entry.create_failed ~doi:url ~error:(Printexc.to_string e) ~source_urls:[url] ()))
    (fun exn ->
      Printf.eprintf "  ✗ Failed to resolve %s: %s\n%!" url (Printexc.to_string exn);
      Lwt.return (Bushel.Doi_entry.create_failed ~doi:url ~error:(Printexc.to_string exn) ~source_urls:[url] ()))

let run base_dir force verbose =
  Printf.printf "Loading bushel database...\n%!";
  let entries = Bushel.load base_dir in
  let notes = Bushel.Entry.notes entries in
  Printf.printf "Scanning %d notes for DOI URLs...\n%!" (List.length notes);
  let found_dois = extract_dois_from_notes notes in
  Printf.printf "Found %d unique DOIs\n%!" (List.length found_dois);
  Printf.printf "Scanning %d notes for publisher URLs...\n%!" (List.length notes);
  let found_urls = extract_publisher_urls_from_notes notes in
  Printf.printf "Found %d unique publisher URLs\n%!" (List.length found_urls);
  let data_dir = Bushel.Entry.data_dir entries in
  let doi_yml_path = Filename.concat data_dir "doi.yml" in
  Printf.printf "Loading existing DOI cache from %s...\n%!" doi_yml_path;
  let existing_entries = Bushel.Doi_entry.load doi_yml_path in
  Printf.printf "Loaded %d cached DOI entries\n%!" (List.length existing_entries);
  (* Filter DOIs that need resolution *)
  let dois_to_resolve =
    List.filter (fun doi ->
      match Bushel.Doi_entry.find_by_doi_including_ignored existing_entries doi with
      | Some _ when not force ->
        Printf.printf "Skipping DOI %s (already cached)\n%!" doi;
        false
      | Some _ ->
        (* Only reachable when --force is set *)
        Printf.printf "Re-resolving DOI %s (--force)\n%!" doi;
        true
      | None -> true
    ) found_dois
  in
  (* Filter URLs that need resolution *)
  let urls_to_resolve =
    List.filter (fun url ->
      match Bushel.Doi_entry.find_by_url_including_ignored existing_entries url with
      | Some _ when not force ->
        Printf.printf "Skipping URL %s (already cached)\n%!" url;
        false
      | Some _ ->
        (* Only reachable when --force is set *)
        Printf.printf "Re-resolving URL %s (--force)\n%!" url;
        true
      | None -> true
    ) found_urls
  in
  if List.length dois_to_resolve = 0 && List.length urls_to_resolve = 0 then begin
    Printf.printf "No DOIs or URLs to resolve!\n%!";
    0
  end else begin
    Printf.printf "Resolving %d DOI(s) and %d URL(s)...\n%!"
      (List.length dois_to_resolve) (List.length urls_to_resolve);
    let zt = ZT.v "http://svr-avsm2-eeg-ce:1969" in
    (* Resolve all DOIs *)
    let resolved_doi_entries_lwt = Lwt_list.map_s (resolve_doi zt ~verbose) dois_to_resolve in
    (* Resolve all publisher URLs *)
    let resolved_url_entries_lwt = Lwt_list.map_s (resolve_url zt ~verbose) urls_to_resolve in
    let new_doi_entries = Lwt_main.run resolved_doi_entries_lwt in
    let new_url_entries = Lwt_main.run resolved_url_entries_lwt in
    let new_entries = new_doi_entries @ new_url_entries in
    (* Merge with existing entries, combining source_urls for entries with the same DOI *)
    let all_entries =
      if force then
        (* Replace existing entries with new ones - match by DOI *)
        let is_updated entry =
          List.exists (fun new_e ->
            new_e.Bushel.Doi_entry.doi = entry.Bushel.Doi_entry.doi
          ) new_entries
        in
        let kept_existing = List.filter (fun e -> not (is_updated e)) existing_entries in
        kept_existing @ new_entries
      else begin
        (* Merge new entries with existing ones, combining source_urls *)
        let merged = ref existing_entries in
        List.iter (fun new_entry ->
          match Bushel.Doi_entry.find_by_doi_including_ignored !merged new_entry.Bushel.Doi_entry.doi with
          | Some existing_entry ->
            (* DOI already exists - merge the entries by combining source_urls
               and preserving the ignore flag *)
            let combined = Bushel.Doi_entry.merge_entries existing_entry new_entry in
            merged := combined ::
              List.filter (fun e -> e.Bushel.Doi_entry.doi <> new_entry.Bushel.Doi_entry.doi) !merged
          | None ->
            (* New DOI - add it *)
            merged := new_entry :: !merged
        ) new_entries;
        !merged
      end
    in
    (* Save updated cache *)
    Printf.printf "Saving %d total entries to %s...\n%!" (List.length all_entries) doi_yml_path;
    Bushel.Doi_entry.save doi_yml_path all_entries;
    Printf.printf "Done!\n%!";
    0
  end

let force_flag =
  let doc = "Force re-resolution of already cached DOIs" in
  Arg.(value & flag & info ["force"; "f"] ~doc)

let verbose_flag =
  let doc = "Show raw Zotero API responses for debugging" in
  Arg.(value & flag & info ["verbose"; "v"] ~doc)

let term =
  Term.(const run $ Bushel_common.base_dir $ force_flag $ verbose_flag)

let cmd =
  let doc = "Resolve DOIs found in notes via Zotero Translation Server" in
  let info = Cmd.info "doi-resolve" ~doc in
  Cmd.v info term
+182
stack/bushel/bin/bushel_faces.ml
···
open Cmdliner
open Lwt.Infix
open Printf

(* Type for person response *)
type person = {
  id : string;
  name : string;
  thumbnailPath : string option;
}

(* Parse a person from JSON *)
let parse_person json =
  let open Ezjsonm in
  let id = find json ["id"] |> get_string in
  let name = find json ["name"] |> get_string in
  let thumbnailPath =
    try Some (find json ["thumbnailPath"] |> get_string)
    with _ -> None
  in
  { id; name; thumbnailPath }

(* Parse a list of people from JSON response *)
let parse_people_response json =
  let open Ezjsonm in
  get_list parse_person json

(* Read API key from file *)
let read_api_key file =
  let ic = open_in file in
  let key = input_line ic in
  close_in ic;
  key

(* Search for a person by name *)
let search_person base_url api_key name =
  let open Cohttp_lwt_unix in
  let headers = Cohttp.Header.init_with "X-Api-Key" api_key in
  let encoded_name = Uri.pct_encode name in
  let url = Printf.sprintf "%s/api/search/person?name=%s" base_url encoded_name in
  Client.get ~headers (Uri.of_string url) >>= fun (resp, body) ->
  if resp.status = `OK then
    Cohttp_lwt.Body.to_string body >>= fun body_str ->
    let json = Ezjsonm.from_string body_str in
    Lwt.return (parse_people_response json)
  else
    let status_code = Cohttp.Code.code_of_status resp.status in
    Lwt.fail_with (Printf.sprintf "HTTP error: %d" status_code)

(* Download thumbnail for a person *)
let download_thumbnail base_url api_key person_id output_path =
  let open Cohttp_lwt_unix in
  let headers = Cohttp.Header.init_with "X-Api-Key" api_key in
  let url = Printf.sprintf "%s/api/people/%s/thumbnail" base_url person_id in
  Client.get ~headers (Uri.of_string url) >>= fun (resp, body) ->
  match resp.status with
  | `OK ->
    Cohttp_lwt.Body.to_string body >>= fun img_data ->
    (* Ensure output directory exists *)
    (try
       let dir = Filename.dirname output_path in
       if not (Sys.file_exists dir) then Unix.mkdir dir 0o755;
       Lwt.return_unit
     with _ -> Lwt.return_unit) >>= fun () ->
    Lwt_io.with_file ~mode:Lwt_io.output output_path
      (fun oc -> Lwt_io.write oc img_data) >>= fun () ->
    Lwt.return_ok output_path
  | _ ->
    let status_code = Cohttp.Code.code_of_status resp.status in
    Lwt.return_error (Printf.sprintf "HTTP error: %d" status_code)

(* Get face for a single contact *)
(* TODO:claude *)
let get_face_for_contact base_url api_key output_dir contact =
  let names = Bushel.Contact.names contact in
  let handle = Bushel.Contact.handle contact in
  let output_path = Filename.concat output_dir (handle ^ ".jpg") in
  (* Skip if file already exists *)
  if Sys.file_exists output_path then
    Lwt.return (`Skipped (sprintf "Thumbnail for '%s' already exists at %s" (List.hd names) output_path))
  else begin
    printf "Processing contact: %s (handle: %s)\n%!" (List.hd names) handle;
    (* Try each name in the list until we find a match *)
    let rec try_names = function
      | [] ->
        Lwt.return (`Error (sprintf "No person found with any name for contact '%s'" handle))
      | name :: rest_names ->
        printf "  Trying name: %s\n%!" name;
        search_person base_url api_key name >>= function
        | [] ->
          printf "  No results for '%s', trying next name...\n%!" name;
          try_names rest_names
        | person :: _ ->
          printf "  Found match for '%s'\n%!" name;
          download_thumbnail base_url api_key person.id output_path >>= function
          | Ok path ->
            Lwt.return (`Ok (sprintf "Saved thumbnail for '%s' to %s" name path))
          | Error err ->
            Lwt.return (`Error (sprintf "Error for '%s': %s" name err))
    in
    try_names names
  end

(* Process all contacts or a specific one *)
let process_contacts base_dir output_dir specific_handle api_key base_url =
  printf "Loading Bushel database from %s\n%!" base_dir;
  let db = Bushel.load base_dir in
  let contacts = Bushel.Entry.contacts db in
  printf "Found %d contacts\n%!" (List.length contacts);
  (* Ensure output directory exists *)
  if not (Sys.file_exists output_dir) then Unix.mkdir output_dir 0o755;
  (* Filter contacts based on specific_handle if provided *)
  let contacts_to_process =
    match specific_handle with
    | Some handle ->
      begin match Bushel.Contact.find_by_handle contacts handle with
      | Some contact -> [contact]
      | None ->
        eprintf "No contact found with handle '%s'\n%!" handle;
        []
      end
    | None -> contacts
  in
  (* Process each contact *)
  let results =
    Lwt_main.run begin
      Lwt_list.map_s
        (fun contact ->
          get_face_for_contact base_url api_key output_dir contact >>= fun result ->
          Lwt.return (Bushel.Contact.handle contact, result))
        contacts_to_process
    end
  in
  (* Print summary *)
  let ok_count = List.length (List.filter (fun (_, r) -> match r with `Ok _ -> true | _ -> false) results) in
  let error_count = List.length (List.filter (fun (_, r) -> match r with `Error _ -> true | _ -> false) results) in
  let skipped_count = List.length (List.filter (fun (_, r) -> match r with `Skipped _ -> true | _ -> false) results) in
  printf "\nSummary:\n";
  printf "  Successfully processed: %d\n" ok_count;
  printf "  Errors: %d\n" error_count;
  printf "  Skipped (already exist): %d\n" skipped_count;
  (* Print detailed results *)
  if error_count > 0 then begin
    printf "\nError details:\n";
    List.iter (fun (handle, result) ->
      match result with
      | `Error msg -> printf "  %s: %s\n" handle msg
      | _ -> ())
      results
  end;
  if ok_count > 0 || skipped_count > 0 then 0 else 1

(* Command line interface *)

(* Export the term for use in main bushel.ml *)
let term =
  Term.(
    const (fun base_dir output_dir handle api_key_file base_url ->
      try
        let api_key = read_api_key api_key_file in
        process_contacts base_dir output_dir handle api_key base_url
      with e ->
        eprintf "Error: %s\n%!" (Printexc.to_string e);
        1)
    $ Bushel_common.base_dir
    $ Bushel_common.output_dir ~default:"."
    $ Bushel_common.handle_opt
    $ Bushel_common.api_key_file ~default:".photos-api"
    $ Bushel_common.url_term ~default:"https://photos.recoil.org" ~doc:"Base URL of the Immich instance")

let cmd =
  let info = Cmd.info "faces" ~doc:"Retrieve face thumbnails for Bushel contacts from Immich" in
  Cmd.v info term

(* Main entry point removed - accessed through bushel_main.ml *)
+77
stack/bushel/bin/bushel_ideas.ml
···
open Cmdliner

(** TODO:claude List completed ideas as markdown bullet list *)
let list_ideas_md base_dir =
  let ideas_dir = Printf.sprintf "%s/ideas" base_dir in
  let contacts_dir = Printf.sprintf "%s/contacts" base_dir in
  if not (Sys.file_exists ideas_dir) then (
    Printf.eprintf "Ideas directory not found: %s\n" ideas_dir;
    1
  ) else (
    (* Load all contacts *)
    let contacts =
      if Sys.file_exists contacts_dir then
        Sys.readdir contacts_dir
        |> Array.to_list
        |> List.filter (String.ends_with ~suffix:".md")
        |> List.filter_map (fun contact_file ->
             let filepath = Filename.concat contacts_dir contact_file in
             try Some (Bushel.Contact.of_md filepath)
             with e ->
               Printf.eprintf "Error loading contact %s: %s\n" filepath (Printexc.to_string e);
               None)
      else []
    in
    let idea_files =
      Sys.readdir ideas_dir
      |> Array.to_list
      |> List.filter (String.ends_with ~suffix:".md")
    in
    let ideas =
      List.filter_map (fun idea_file ->
        let filepath = Filename.concat ideas_dir idea_file in
        try
          let idea = Bushel.Idea.of_md filepath in
          match Bushel.Idea.status idea with
          | Bushel.Idea.Completed -> Some idea
          | _ -> None
        with e ->
          Printf.eprintf "Error processing %s: %s\n" filepath (Printexc.to_string e);
          None
      ) idea_files
    in
    (* Sort by year descending *)
    let sorted_ideas =
      List.sort (fun a b -> compare (Bushel.Idea.year b) (Bushel.Idea.year a)) ideas
    in
    (* Output as markdown bullet list *)
    List.iter (fun idea ->
      let student_names =
        Bushel.Idea.students idea
        |> List.filter_map (fun handle ->
             match Bushel.Contact.find_by_handle contacts handle with
             | Some contact -> Some (Bushel.Contact.name contact)
             | None ->
               Printf.eprintf "Warning: contact not found for handle %s\n" handle;
               Some handle)
        |> String.concat ", "
      in
      let level_str = Bushel.Idea.level_to_string (Bushel.Idea.level idea) in
      Printf.printf "- %d: \"%s\", %s (%s)\n"
        (Bushel.Idea.year idea)
        (Bushel.Idea.title idea)
        student_names
        level_str
    ) sorted_ideas;
    0
  )

let term =
  Term.(const list_ideas_md $ Bushel_common.base_dir)

let cmd =
  let doc = "List completed ideas as markdown bullet list" in
  let info = Cmd.info "ideas-md" ~doc in
  Cmd.v info term
+202
stack/bushel/bin/bushel_info.ml
···
+
open Cmdliner
+
open Bushel
+
+
(** TODO:claude List all slugs with their types *)
+
let list_all_slugs entries =
+
let all = Entry.all_entries entries in
+
(* Sort by slug for consistent output *)
+
let sorted = List.sort (fun a b ->
+
String.compare (Entry.slug a) (Entry.slug b)
+
) all in
+
Fmt.pr "@[<v>";
+
Fmt.pr "%a@," (Fmt.styled `Bold Fmt.string) "Available entries:";
+
Fmt.pr "@,";
+
List.iter (fun entry ->
+
let slug = Entry.slug entry in
+
let type_str = Entry.to_type_string entry in
+
let title = Entry.title entry in
+
Fmt.pr " %a %a - %a@,"
+
(Fmt.styled `Cyan Fmt.string) slug
+
(Fmt.styled `Faint Fmt.string) (Printf.sprintf "(%s)" type_str)
+
Fmt.string title
+
) sorted;
+
Fmt.pr "@]@.";
+
0
+
+
(** TODO:claude Main info command implementation *)
+
let info_cmd () base_dir slug_opt =
+
let entries = load base_dir in
+
match slug_opt with
+
| None ->
+
list_all_slugs entries
+
| Some slug ->
+
(* Handle contact handles starting with @ *)
+
if String.starts_with ~prefix:"@" slug then begin
+
let handle = String.sub slug 1 (String.length slug - 1) in
+
match Contact.find_by_handle (Entry.contacts entries) handle with
+
| None ->
+
Fmt.epr "Error: No contact found with handle '@%s'@." handle;
+
1
+
| Some contact ->
+
Contact.pp Fmt.stdout contact;
+
(* Add thumbnail information for contact *)
+
(match Entry.contact_thumbnail_slug contact with
+
| Some thumb_slug ->
+
Fmt.pr "@.@.";
+
Fmt.pr "@[<v>%a: %s@," (Fmt.styled `Bold Fmt.string) "Thumbnail Slug" thumb_slug;
+
(* Look up the image in srcsetter *)
+
(match Entry.lookup_image entries thumb_slug with
+
| Some img ->
+
let thumbnail_url = Entry.smallest_webp_variant img in
+
Fmt.pr "%a: %s@," (Fmt.styled `Bold Fmt.string) "Thumbnail URL" thumbnail_url;
+
Fmt.pr "%a: %s@," (Fmt.styled `Bold Fmt.string) "Origin" (Srcsetter.origin img);
+
let (w, h) = Srcsetter.dims img in
+
Fmt.pr "%a: %dx%d@," (Fmt.styled `Bold Fmt.string) "Dimensions" w h;
+
let variants = Srcsetter.variants img in
+
if not (Srcsetter.MS.is_empty variants) then begin
+
Fmt.pr "%a:@," (Fmt.styled `Bold Fmt.string) "Variants";
+
Srcsetter.MS.iter (fun name (vw, vh) ->
+
Fmt.pr " %s: %dx%d@," name vw vh
+
) variants
+
end;
+
Fmt.pr "@]"
+
| None ->
+
Fmt.epr "Warning: Contact thumbnail image not in srcsetter: %s@." thumb_slug;
+
Fmt.pr "@]";
+
())
+
| None -> ());
+
(* Add Typesense JSON *)
+
let doc = Typesense.contact_to_document contact in
+
Fmt.pr "@.@.";
+
Fmt.pr "%a:@," (Fmt.styled `Bold Fmt.string) "Typesense Document";
+
Fmt.pr "%s@," (Ezjsonm.value_to_string ~minify:false doc);
+
(* Add backlinks information for contact *)
+
let backlinks = Bushel.Link_graph.get_backlinks_for_slug handle in
+
if backlinks <> [] then begin
+
Fmt.pr "@.@.";
+
Fmt.pr "%a (%d):@," (Fmt.styled `Bold Fmt.string) "Backlinks" (List.length backlinks);
+
List.iter (fun source_slug ->
+
match Entry.lookup entries source_slug with
+
| Some source_entry ->
+
let source_type = Entry.to_type_string source_entry in
+
let source_title = Entry.title source_entry in
+
Fmt.pr " %a %a - %a@,"
+
(Fmt.styled `Cyan Fmt.string) source_slug
+
(Fmt.styled `Faint Fmt.string) (Printf.sprintf "(%s)" source_type)
+
Fmt.string source_title
+
| None ->
+
Fmt.pr " %a %a@,"
+
(Fmt.styled `Cyan Fmt.string) source_slug
+
(Fmt.styled `Red Fmt.string) "(not found)"
+
) backlinks
+
end;
+
Fmt.pr "@.";
+
0
+
(* [begin]/[end] are required here: without them the [match] above
   would swallow the [else] branch and fail to parse *)
+
end else
+
(* Remove leading ':' if present, as slugs are stored without it *)
+
let normalized_slug =
+
if String.starts_with ~prefix:":" slug
+
then String.sub slug 1 (String.length slug - 1)
+
else slug
+
in
+
match Entry.lookup entries normalized_slug with
+
| None ->
+
Fmt.epr "Error: No entry found with slug '%s'@." slug;
+
1
+
| Some entry ->
+
(match entry with
+
| `Paper p -> Paper.pp Fmt.stdout p
+
| `Project p -> Project.pp Fmt.stdout p
+
| `Idea i -> Idea.pp Fmt.stdout i
+
| `Video v -> Video.pp Fmt.stdout v
+
| `Note n -> Note.pp Fmt.stdout n);
+
(* Add thumbnail information if available *)
+
(match Entry.thumbnail_slug entries entry with
+
| Some thumb_slug ->
+
Fmt.pr "@.@.";
+
Fmt.pr "@[<v>%a: %s@," (Fmt.styled `Bold Fmt.string) "Thumbnail Slug" thumb_slug;
+
(* Look up the image in srcsetter *)
+
(match Entry.lookup_image entries thumb_slug with
+
| Some img ->
+
let thumbnail_url = Entry.smallest_webp_variant img in
+
Fmt.pr "%a: %s@," (Fmt.styled `Bold Fmt.string) "Thumbnail URL" thumbnail_url;
+
Fmt.pr "%a: %s@," (Fmt.styled `Bold Fmt.string) "Origin" (Srcsetter.origin img);
+
let (w, h) = Srcsetter.dims img in
+
Fmt.pr "%a: %dx%d@," (Fmt.styled `Bold Fmt.string) "Dimensions" w h;
+
let variants = Srcsetter.variants img in
+
if not (Srcsetter.MS.is_empty variants) then begin
+
Fmt.pr "%a:@," (Fmt.styled `Bold Fmt.string) "Variants";
+
Srcsetter.MS.iter (fun name (vw, vh) ->
+
Fmt.pr " %s: %dx%d@," name vw vh
+
) variants
+
end;
+
Fmt.pr "@]"
+
| None ->
+
Fmt.epr "Warning: Thumbnail image not in srcsetter: %s@." thumb_slug;
+
Fmt.pr "@]";
+
())
+
| None -> ());
+
(* Add Typesense JSON *)
+
let doc = match entry with
+
| `Paper p -> Typesense.paper_to_document entries p
+
| `Project p -> Typesense.project_to_document entries p
+
| `Idea i -> Typesense.idea_to_document entries i
+
| `Video v -> Typesense.video_to_document entries v
+
| `Note n -> Typesense.note_to_document entries n
+
in
+
Fmt.pr "@.@.";
+
Fmt.pr "%a:@," (Fmt.styled `Bold Fmt.string) "Typesense Document";
+
Fmt.pr "%s@," (Ezjsonm.value_to_string ~minify:false doc);
+
(* Add backlinks information *)
+
let backlinks = Bushel.Link_graph.get_backlinks_for_slug normalized_slug in
+
if backlinks <> [] then begin
+
Fmt.pr "@.@.";
+
Fmt.pr "%a (%d):@," (Fmt.styled `Bold Fmt.string) "Backlinks" (List.length backlinks);
+
List.iter (fun source_slug ->
+
match Entry.lookup entries source_slug with
+
| Some source_entry ->
+
let source_type = Entry.to_type_string source_entry in
+
let source_title = Entry.title source_entry in
+
Fmt.pr " %a %a - %a@,"
+
(Fmt.styled `Cyan Fmt.string) source_slug
+
(Fmt.styled `Faint Fmt.string) (Printf.sprintf "(%s)" source_type)
+
Fmt.string source_title
+
| None ->
+
Fmt.pr " %a %a@,"
+
(Fmt.styled `Cyan Fmt.string) source_slug
+
(Fmt.styled `Red Fmt.string) "(not found)"
+
) backlinks
+
end;
+
(* Add references information for notes *)
+
(match entry with
+
| `Note n ->
+
let default_author = match Contact.find_by_handle (Entry.contacts entries) "avsm" with
+
| Some c -> c
+
| None -> failwith "Default author 'avsm' not found"
+
in
+
let references = Md.note_references entries default_author n in
+
if references <> [] then begin
+
Fmt.pr "@.@.";
+
Fmt.pr "%a (%d):@," (Fmt.styled `Bold Fmt.string) "References" (List.length references);
+
List.iter (fun (doi, citation, _is_paper) ->
+
Fmt.pr " %a: %s@,"
+
(Fmt.styled `Cyan Fmt.string) doi
+
citation
+
) references
+
end
+
| _ -> ());
+
Fmt.pr "@.";
+
0
+
+
(** TODO:claude Command line interface definition *)
+
let slug_arg =
+
let doc = "The slug of the entry to display (with or without a leading ':'), or a contact handle (prefixed with '@'). If omitted, lists all available slugs." in
+
Arg.(value & pos 0 (some string) None & info [] ~docv:"SLUG" ~doc)
+
+
let term =
+
Term.(const info_cmd $ Bushel_common.setup_term $ Bushel_common.base_dir $ slug_arg)
+
+
let cmd =
+
let doc = "Display all information for a given slug" in
+
let info = Cmd.info "info" ~doc in
+
Cmd.v info term
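The `':'`-stripping that `info_cmd` performs before lookup is a tiny, easily tested transformation; a sketch in isolation (the helper name `normalize_slug` is mine):

```ocaml
(* Drop an optional leading ':' from a slug before lookup, as info_cmd
   does; slugs are stored without the prefix. *)
let normalize_slug slug =
  if String.starts_with ~prefix:":" slug
  then String.sub slug 1 (String.length slug - 1)
  else slug

let () = assert (normalize_slug ":foo" = "foo")
```

`String.starts_with` requires OCaml 4.13 or later, which the surrounding code already assumes (it also uses `String.ends_with`).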
+549
stack/bushel/bin/bushel_links.ml
···
+
open Cmdliner
+
open Lwt.Infix
+
+
(* Helper function for logging with proper flushing *)
+
let log fmt = Fmt.kstr (fun s -> prerr_string s; flush stderr) fmt
+
let log_verbose verbose fmt =
+
if verbose then Fmt.kstr (fun s -> prerr_string s; flush stderr) fmt
+
else Fmt.kstr (fun _ -> ()) fmt
+
+
(* Initialize a new links.yml file or ensure it exists *)
+
let init_links_file links_file =
+
if Sys.file_exists links_file then
+
print_endline (Fmt.str "Links file %s already exists" links_file)
+
else begin
+
(* Create an empty links file *)
+
Bushel.Link.save_links_file links_file [];
+
print_endline (Fmt.str "Created empty links file: %s" links_file)
+
end;
+
0
+
+
(* Update links.yml from Karakeep *)
+
let update_from_karakeep base_url api_key_opt tag links_file download_assets =
+
match api_key_opt with
+
| None ->
+
prerr_endline "Error: API key is required.";
+
prerr_endline "Please provide one with --api-key or create a ~/.karakeep-api file.";
+
1
+
| Some api_key ->
+
let assets_dir = "data/assets" in
+
+
(* Run the Lwt program *)
+
Lwt_main.run (
+
print_endline (Fmt.str "Fetching links from %s with tag '%s'..." base_url tag);
+
+
(* Prepare tag filter *)
+
let filter_tags = if tag = "" then [] else [tag] in
+
+
(* Fetch bookmarks from Karakeep with error handling *)
+
Lwt.catch
+
(fun () ->
+
Karakeep.fetch_all_bookmarks ~api_key ~filter_tags base_url >>= fun bookmarks ->
+
+
print_endline (Fmt.str "Retrieved %d bookmarks from Karakeep" (List.length bookmarks));
+
+
(* Read existing links if file exists *)
+
let existing_links = Bushel.Link.load_links_file links_file in
+
+
(* Convert bookmarks to bushel links *)
+
let new_links = List.map (fun bookmark ->
+
Karakeep.to_bushel_link ~base_url bookmark
+
) bookmarks in
+
+
(* Merge with existing links - keep existing dates (karakeep dates may be unreliable) *)
+
let merged_links = Bushel.Link.merge_links existing_links new_links in
+
+
(* Save the updated links file *)
+
Bushel.Link.save_links_file links_file merged_links;
+
+
print_endline (Fmt.str "Updated %s with %d links" links_file (List.length merged_links));
+
+
(* Download assets if requested *)
+
if download_assets then begin
+
print_endline "Downloading assets for bookmarks...";
+
+
(* Ensure the assets directory exists *)
+
(try Unix.mkdir assets_dir 0o755 with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
+
+
(* Process each bookmark with assets *)
+
Lwt_list.iter_s (fun bookmark ->
+
(* Extract asset IDs from bookmark *)
+
let assets = bookmark.Karakeep.assets in
+
+
(* Skip if no assets *)
+
if assets = [] then
+
Lwt.return_unit
+
else
+
(* Process each asset *)
+
Lwt_list.iter_s (fun (asset_id, asset_type) ->
+
let asset_dir = Fmt.str "%s/%s" assets_dir asset_id in
+
let asset_file = Fmt.str "%s/asset.bin" asset_dir in
+
let meta_file = Fmt.str "%s/metadata.json" asset_dir in
+
+
(* Skip if the asset already exists *)
+
if Sys.file_exists asset_file then
+
Lwt.return_unit
+
else begin
+
(* Create the asset directory *)
+
(try Unix.mkdir asset_dir 0o755 with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
+
+
(* Download the asset *)
+
print_endline (Fmt.str "Downloading %s asset %s..." asset_type asset_id);
+
Karakeep.fetch_asset ~api_key base_url asset_id >>= fun data ->
+
+
(* Guess content type based on first bytes *)
+
let content_type =
+
if String.length data >= 4 && String.sub data 0 4 = "\x89PNG" then
+
"image/png"
+
else if String.length data >= 3 && String.sub data 0 3 = "\xFF\xD8\xFF" then
+
"image/jpeg"
+
else if String.length data >= 4 && String.sub data 0 4 = "%PDF" then
+
"application/pdf"
+
else
+
"application/octet-stream"
+
in
+
+
(* Write the asset data *)
+
Lwt_io.with_file ~mode:Lwt_io.Output asset_file (fun oc ->
+
Lwt_io.write oc data
+
) >>= fun () ->
+
+
(* Write metadata file *)
+
let metadata = Fmt.str "{\n \"contentType\": \"%s\",\n \"assetType\": \"%s\"\n}"
+
content_type asset_type in
+
Lwt_io.with_file ~mode:Lwt_io.Output meta_file (fun oc ->
+
Lwt_io.write oc metadata
+
)
+
end
+
) assets
+
) bookmarks >>= fun () ->
+
+
print_endline "Asset download completed.";
+
Lwt.return 0
+
end else
+
Lwt.return 0
+
)
+
(fun exn ->
+
prerr_endline (Fmt.str "Error fetching bookmarks: %s" (Printexc.to_string exn));
+
Lwt.return 1
+
)
+
)
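The magic-byte sniffing in the asset download path is self-contained enough to test on its own; a sketch of the same checks (PNG, JPEG, PDF, else opaque binary):

```ocaml
(* Guess a MIME type from a payload's leading magic bytes, matching the
   checks the asset downloader performs. *)
let guess_content_type data =
  if String.length data >= 4 && String.sub data 0 4 = "\x89PNG" then "image/png"
  else if String.length data >= 3 && String.sub data 0 3 = "\xFF\xD8\xFF" then "image/jpeg"
  else if String.length data >= 4 && String.sub data 0 4 = "%PDF" then "application/pdf"
  else "application/octet-stream"
```

Karakeep screenshots are typically PNG or JPEG, so these three signatures cover the common cases; anything unrecognised falls through to `application/octet-stream`.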
+
+
(* Extract outgoing links from Bushel entries *)
+
let update_from_bushel bushel_dir links_file include_domains exclude_domains =
+
(* Parse domain filters if provided *)
+
let include_domains_list = match include_domains with
+
| None -> []
+
| Some s -> String.split_on_char ',' s |> List.map String.trim
+
in
+
+
let exclude_domains_list = match exclude_domains with
+
| None -> []
+
| Some s -> String.split_on_char ',' s |> List.map String.trim
+
in
+
+
(* Show filter settings if any *)
+
if include_domains_list <> [] then
+
print_endline (Fmt.str "Including only domains: %s" (String.concat ", " include_domains_list));
+
+
if exclude_domains_list <> [] then
+
print_endline (Fmt.str "Excluding domains: %s" (String.concat ", " exclude_domains_list));
+
+
(* Load all entries from the bushel directory *)
+
let notes_dir = Filename.concat bushel_dir "data/notes" in
+
+
(* Make sure the notes directory exists *)
+
if not (Sys.file_exists notes_dir) then begin
+
prerr_endline (Fmt.str "Error: Notes directory %s does not exist" notes_dir);
+
exit 1
+
end;
+
+
(* Load all entries with fallback *)
+
print_endline (Fmt.str "Loading entries from %s..." bushel_dir);
+
+
let entries_data = Bushel.load bushel_dir in
+
let all_entries = Bushel.Entry.all_entries entries_data in
+
print_endline (Fmt.str "Loaded %d entries" (List.length all_entries));
+
+
(* Extract outgoing links from all entries *)
+
print_endline "Extracting outgoing links...";
+
let extracted_links = ref [] in
+
+
(* Process each entry *)
+
List.iter (fun entry ->
+
let entry_body = Bushel.Entry.body entry in
+
let entry_slug = Bushel.Entry.slug entry in
+
+
(* Skip empty bodies *)
+
if entry_body <> "" then begin
+
let links = Bushel.Entry.extract_external_links entry_body in
+
if links <> [] then begin
+
(* Add each link from this entry *)
+
List.iter (fun url ->
+
(* Try to extract domain from URL *)
+
let domain =
+
try
+
let uri = Uri.of_string url in
+
match Uri.host uri with
+
| Some host -> host
+
| None -> "unknown"
+
with _ -> "unknown"
+
in
+
+
(* Filter by domain if filters are specified *)
+
let include_by_domain =
+
if include_domains_list <> [] then
+
List.exists (fun filter ->
+
domain = filter || String.ends_with ~suffix:filter domain
+
) include_domains_list
+
else true
+
in
+
+
let exclude_by_domain =
+
List.exists (fun filter ->
+
domain = filter || String.ends_with ~suffix:filter domain
+
) exclude_domains_list
+
in
+
+
if include_by_domain && not exclude_by_domain then begin
+
let date = Bushel.Entry.date entry in
+
+
(* Extract tags from the entry *)
+
let entry_tags = Bushel.Tags.tags_of_ent entries_data entry in
+
let tag_strings = List.map Bushel.Tags.to_string entry_tags in
+
+
let link = {
+
Bushel.Link.url;
+
date;
+
description = "";
+
karakeep = None;
+
bushel = Some {
+
Bushel.Link.slugs = [entry_slug];
+
tags = tag_strings
+
};
+
} in
+
extracted_links := link :: !extracted_links
+
end
+
) links
+
end
+
end
+
) all_entries;
+
+
(* Load existing links *)
+
let existing_links = Bushel.Link.load_links_file links_file in
+
+
(* Merge with existing links - prefer bushel entry dates *)
+
let merged_links = Bushel.Link.merge_links ~prefer_new_date:true existing_links !extracted_links in
+
+
(* Save the updated links file *)
+
Bushel.Link.save_links_file links_file merged_links;
+
+
print_endline (Fmt.str "Added %d extracted links from Bushel to %s"
+
(List.length !extracted_links) links_file);
+
print_endline (Fmt.str "Total links in file: %d" (List.length merged_links));
+
0
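The include/exclude test applied to each extracted URL above is a plain equality-or-suffix match on the host; in isolation (the helper name `matches_filter` is mine):

```ocaml
(* Domain filter from update_from_bushel: a host passes when it equals
   a filter entry or ends with it. Note a bare suffix match also
   accepts hosts like "evilrecoil.org" for the filter "recoil.org". *)
let matches_filter filters domain =
  List.exists
    (fun f -> domain = f || String.ends_with ~suffix:f domain)
    filters

let () = assert (matches_filter ["recoil.org"] "hoard.recoil.org")
```

Prefixing the filter with a `.` (or checking the character before the matched suffix) would restrict matches to true subdomains, at the cost of needing a separate exact-match entry.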
+
+
(* Helper function to filter links that don't have karakeep data for a specific remote *)
+
let filter_links_without_karakeep base_url links =
+
List.filter (fun link ->
+
match link.Bushel.Link.karakeep with
+
| Some { remote_url; _ } when remote_url = base_url -> false
+
| _ -> true
+
) links
+
+
(* Helper function to apply limit to links if specified *)
+
let apply_limit_to_links limit links =
+
match limit with
+
| Some n when n > 0 ->
+
let rec take_n acc count = function
+
| [] -> List.rev acc
+
| _ when count = 0 -> List.rev acc
+
| x :: xs -> take_n (x :: acc) (count - 1) xs
+
in
+
let limited = take_n [] n links in
+
if List.length links > n then
+
log "Limited to first %d links (out of %d available)\n" n (List.length links);
+
limited
+
| _ -> links
+
+
(* Helper function to prepare tags for a link *)
+
let prepare_tags_for_link tag link =
+
let slug_tags =
+
match link.Bushel.Link.bushel with
+
| Some { slugs; _ } -> List.map (fun slug -> "bushel:" ^ slug) slugs
+
| None -> []
+
in
+
if tag = "" then slug_tags
+
else tag :: slug_tags
+
+
(* Helper function to create batches for parallel processing *)
+
let create_batches max_concurrent links =
+
let rec create_batches_aux links acc =
+
match links with
+
| [] -> List.rev acc
+
| _ ->
+
let batch, rest =
+
if List.length links <= max_concurrent then
+
links, []
+
else
+
let rec take n lst batch =
+
match lst with
+
| x :: tl when n > 0 -> take (n - 1) tl (x :: batch)
+
| _ -> List.rev batch, lst
+
in
+
take max_concurrent links []
+
in
+
create_batches_aux rest (batch :: acc)
+
in
+
create_batches_aux links []
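`create_batches` exists to feed `Lwt_list.map_p` fixed-size groups of uploads; an equivalent accumulator-based sketch, assuming a positive batch size:

```ocaml
(* Split a list into consecutive batches of at most [n] elements,
   behaving like create_batches above. [n] is assumed positive;
   n <= 0 would loop forever. *)
let batches n xs =
  let rec take k acc = function
    | [] -> List.rev acc, []
    | x :: tl when k > 0 -> take (k - 1) (x :: acc) tl
    | rest -> List.rev acc, rest
  in
  let rec go acc = function
    | [] -> List.rev acc
    | xs ->
      let batch, rest = take n [] xs in
      go (batch :: acc) rest
  in
  go [] xs

let () = assert (batches 2 [1; 2; 3; 4; 5] = [[1; 2]; [3; 4]; [5]])
```

Bounding concurrency this way (batch, join, sleep, repeat) is coarser than a semaphore such as `Lwt_pool`, but it keeps the rate-limiting delay between batches trivial to reason about.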
+
+
(* Helper function to upload a single link to Karakeep *)
+
let upload_single_link api_key base_url tag verbose updated_links link =
+
let url = link.Bushel.Link.url in
+
let title =
+
if link.Bushel.Link.description <> "" then
+
Some link.Bushel.Link.description
+
else None
+
in
+
let tags = prepare_tags_for_link tag link in
+
+
if verbose then begin
+
log " Uploading: %s\n" url;
+
if tags <> [] then
+
log " Tags: %s\n" (String.concat ", " tags);
+
if title <> None then
+
log " Title: %s\n" (Option.get title);
+
end else begin
+
log "Uploading: %s\n" url;
+
end;
+
+
(* Create the bookmark with tags *)
+
Lwt.catch
+
(fun () ->
+
Karakeep.create_bookmark
+
~api_key
+
~url
+
?title
+
~tags
+
base_url
+
>>= fun bookmark ->
+
+
(* Create updated link with karakeep data *)
+
let updated_link = {
+
link with
+
Bushel.Link.karakeep =
+
Some {
+
Bushel.Link.remote_url = base_url;
+
id = bookmark.id;
+
tags = bookmark.tags;
+
metadata = []; (* Will be populated on next sync *)
+
}
+
} in
+
updated_links := updated_link :: !updated_links;
+
+
if verbose then
+
log " ✓ Added to Karakeep with ID: %s\n" bookmark.id
+
else
+
log " - Added to Karakeep with ID: %s\n" bookmark.id;
+
Lwt.return 1 (* Success *)
+
)
+
(fun exn ->
+
if verbose then
+
log " ✗ Error uploading %s: %s\n" url (Printexc.to_string exn)
+
else
+
log " - Error uploading %s: %s\n" url (Printexc.to_string exn);
+
Lwt.return 0 (* Failure *)
+
)
+
+
(* Helper function to process a batch of links *)
+
let process_batch api_key base_url tag verbose updated_links batch_num total_batches batch =
+
log_verbose verbose "\nProcessing batch %d/%d (%d links)...\n"
+
(batch_num + 1) total_batches (List.length batch);
+
+
(* Process links in this batch concurrently *)
+
Lwt_list.map_p (upload_single_link api_key base_url tag verbose updated_links) batch
+
+
(* Helper function to update links file with new karakeep data *)
+
let update_links_file links_file original_links updated_links =
+
if !updated_links <> [] then begin
+
(* Replace the updated links in the original list *)
+
let final_links =
+
List.map (fun link ->
+
let url = link.Bushel.Link.url in
+
let updated = List.find_opt (fun ul -> ul.Bushel.Link.url = url) !updated_links in
+
match updated with
+
| Some ul -> ul
+
| None -> link
+
) original_links
+
in
+
+
(* Save the updated links file *)
+
Bushel.Link.save_links_file links_file final_links;
+
+
log "Updated %s with %d new karakeep_ids\n"
+
links_file (List.length !updated_links);
+
end
+
+
(* Upload links to Karakeep that don't already have karakeep data *)
+
let upload_to_karakeep base_url api_key_opt links_file tag max_concurrent delay_seconds limit verbose =
+
match api_key_opt with
+
| None ->
+
log "Error: API key is required.\n";
+
log "Please provide one with --api-key or create a ~/.karakeep-api file.\n";
+
1
+
| Some api_key ->
+
(* Load links from file *)
+
log_verbose verbose "Loading links from %s...\n" links_file;
+
let links = Bushel.Link.load_links_file links_file in
+
log_verbose verbose "Loaded %d total links\n" (List.length links);
+
+
(* Filter links that don't have karakeep data for this remote *)
+
log_verbose verbose "Filtering links that don't have karakeep data for %s...\n" base_url;
+
let filtered_links = filter_links_without_karakeep base_url links in
+
log_verbose verbose "Found %d links without karakeep data\n" (List.length filtered_links);
+
+
(* Apply limit if specified *)
+
let links_to_upload = apply_limit_to_links limit filtered_links in
+
+
if links_to_upload = [] then begin
+
log "No links to upload to %s (all links already have karakeep data)\n" base_url;
+
0
+
end else begin
+
log "Found %d links to upload to %s\n" (List.length links_to_upload) base_url;
+
+
(* Split links into batches for parallel processing *)
+
let batches = create_batches max_concurrent links_to_upload in
+
log_verbose verbose "Processing in %d batches of up to %d links each...\n"
+
(List.length batches) max_concurrent;
+
log_verbose verbose "Delay between batches: %.1f seconds\n" delay_seconds;
+
+
(* Process batches and accumulate updated links *)
+
let updated_links = ref [] in
+
+
let result = Lwt_main.run (
+
Lwt.catch
+
(fun () ->
+
Lwt_list.fold_left_s (fun (total_count, batch_num) batch ->
+
process_batch api_key base_url tag verbose updated_links
+
batch_num (List.length batches) batch >>= fun results ->
+
+
(* Count successes in this batch *)
+
let batch_successes = List.fold_left (+) 0 results in
+
let new_total = total_count + batch_successes in
+
+
log_verbose verbose " Batch %d complete: %d/%d successful (Total: %d/%d)\n"
+
(batch_num + 1) batch_successes (List.length batch) new_total (List.length links_to_upload);
+
+
(* Add a delay before processing the next batch *)
+
if batch_num + 1 < List.length batches then begin
+
log_verbose verbose " Waiting %.1f seconds before next batch...\n" delay_seconds;
+
Lwt_unix.sleep delay_seconds >>= fun () ->
+
Lwt.return (new_total, batch_num + 1)
+
end else
+
Lwt.return (new_total, batch_num + 1)
+
) (0, 0) batches >>= fun (final_count, _) ->
+
Lwt.return final_count
+
)
+
(fun exn ->
+
log "Error during upload operation: %s\n" (Printexc.to_string exn);
+
Lwt.return 0
+
)
+
) in
+
+
(* Update the links file with the new karakeep_ids *)
+
update_links_file links_file links updated_links;
+
+
log "Upload complete. %d/%d links uploaded successfully.\n"
+
result (List.length links_to_upload);
+
+
0
+
end
+
+
(* Common arguments *)
+
let links_file_arg =
+
let doc = "Links YAML file. Defaults to links.yml." in
+
Arg.(value & opt string "links.yml" & info ["file"; "f"] ~doc ~docv:"FILE")
+
+
let base_url_arg =
+
let doc = "Base URL of the Karakeep instance" in
+
let default = "https://hoard.recoil.org" in
+
Arg.(value & opt string default & info ["url"] ~doc ~docv:"URL")
+
+
let api_key_arg =
+
let doc = "API key for Karakeep authentication (ak1_<key_id>_<secret>)" in
+
let get_api_key () =
+
let home = try Sys.getenv "HOME" with Not_found -> "." in
+
let key_path = Filename.concat home ".karakeep-api" in
+
try
+
let ic = open_in key_path in
+
let key = input_line ic in
+
close_in ic;
+
Some (String.trim key)
+
with _ -> None
+
in
+
Arg.(value & opt (some string) (get_api_key ()) & info ["api-key"] ~doc ~docv:"API_KEY")
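`api_key_arg` reads `~/.karakeep-api` eagerly, at argument-construction time, so the file only serves as a default. The file-reading pattern in isolation (the helper name `read_key` is mine):

```ocaml
(* Read a one-line secret file, trimming the trailing newline, as
   api_key_arg does for ~/.karakeep-api. Returns None on any error. *)
let read_key path =
  try
    let ic = open_in path in
    let key = input_line ic in
    close_in ic;
    Some (String.trim key)
  with _ -> None
```

Like the original, this leaks the channel if `input_line` raises (e.g. on an empty file); wrapping the read in `Fun.protect ~finally:(fun () -> close_in_noerr ic)` would close it unconditionally.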
+
+
let tag_arg =
+
let doc = "Tag to filter or apply to bookmarks" in
+
Arg.(value & opt string "" & info ["tag"; "t"] ~doc ~docv:"TAG")
+
+
let download_assets_arg =
+
let doc = "Download assets (screenshots, etc.) from Karakeep" in
+
Arg.(value & flag & info ["download-assets"; "d"] ~doc)
+
+
let base_dir_arg =
+
let doc = "Base directory of the Bushel project" in
+
Arg.(value & opt string "." & info ["dir"; "d"] ~doc ~docv:"DIR")
+
+
let include_domains_arg =
+
let doc = "Only include links to these domains (comma-separated list)" in
+
Arg.(value & opt (some string) None & info ["include"] ~doc ~docv:"DOMAINS")
+
+
let exclude_domains_arg =
+
let doc = "Exclude links to these domains (comma-separated list)" in
+
Arg.(value & opt (some string) None & info ["exclude"] ~doc ~docv:"DOMAINS")
+
+
let concurrent_arg =
+
let doc = "Maximum number of concurrent uploads (default: 5)" in
+
Arg.(value & opt int 5 & info ["concurrent"; "c"] ~doc ~docv:"NUM")
+
+
let delay_arg =
+
let doc = "Delay in seconds between batches (default: 1.0)" in
+
Arg.(value & opt float 1.0 & info ["delay"] ~doc ~docv:"SECONDS")
+
+
let limit_arg =
+
let doc = "Limit number of links to upload (for testing)" in
+
Arg.(value & opt (some int) None & info ["limit"; "l"] ~doc ~docv:"NUM")
+
+
let verbose_arg =
+
let doc = "Show detailed progress information during upload" in
+
Arg.(value & flag & info ["verbose"; "v"] ~doc)
+
+
(* Command definitions *)
+
let init_cmd =
+
let doc = "Initialize a new links.yml file" in
+
let info = Cmd.info "init" ~doc in
+
Cmd.v info Term.(const init_links_file $ links_file_arg)
+
+
let karakeep_cmd =
+
let doc = "Update links.yml with links from Karakeep" in
+
let info = Cmd.info "karakeep" ~doc in
+
Cmd.v info Term.(const update_from_karakeep $ base_url_arg $ api_key_arg $ tag_arg $ links_file_arg $ download_assets_arg)
+
+
let bushel_cmd =
+
let doc = "Update links.yml with outgoing links from Bushel entries" in
+
let info = Cmd.info "bushel" ~doc in
+
Cmd.v info Term.(const update_from_bushel $ base_dir_arg $ links_file_arg $ include_domains_arg $ exclude_domains_arg)
+
+
let upload_cmd =
+
let doc = "Upload links without karakeep data to Karakeep" in
+
let info = Cmd.info "upload" ~doc in
+
Cmd.v info Term.(const upload_to_karakeep $ base_url_arg $ api_key_arg $ links_file_arg $ tag_arg $ concurrent_arg $ delay_arg $ limit_arg $ verbose_arg)
+
+
(* Export the term and cmd for use in main bushel.ml *)
+
let cmd =
+
let doc = "Manage links between Bushel and Karakeep" in
+
let info = Cmd.info "links" ~doc in
+
Cmd.group info [init_cmd; karakeep_cmd; bushel_cmd; upload_cmd]
+
+
(* No standalone entry point: this command group is wired in via bushel_main.ml *)
+115
stack/bushel/bin/bushel_main.ml
···
+
open Cmdliner
+
+
let version = "0.1.0"
+
+
(* Import actual command implementations from submodules *)
+
+
(* Faces command *)
+
let faces_cmd =
+
let doc = "Retrieve face thumbnails from Immich photo service" in
+
let info = Cmd.info "faces" ~version ~doc in
+
Cmd.v info Bushel_faces.term
+
+
(* Links command - uses group structure *)
+
let links_cmd = Bushel_links.cmd
+
+
(* Obsidian command *)
+
let obsidian_cmd =
+
let doc = "Convert Bushel entries to Obsidian format" in
+
let info = Cmd.info "obsidian" ~version ~doc in
+
Cmd.v info Bushel_obsidian.term
+
+
(* Paper command *)
+
let paper_cmd =
+
let doc = "Fetch paper metadata from DOI" in
+
let info = Cmd.info "paper" ~version ~doc in
+
Cmd.v info Bushel_paper.term
+
+
(* Paper classify command *)
+
let paper_classify_cmd = Bushel_paper_classify.cmd
+
+
(* Paper tex command *)
+
let paper_tex_cmd = Bushel_paper_tex.cmd
+
+
(* Thumbs command *)
+
let thumbs_cmd =
+
let doc = "Generate thumbnails from paper PDFs" in
+
let info = Cmd.info "thumbs" ~version ~doc in
+
Cmd.v info Bushel_thumbs.term
+
+
(* Video command *)
+
let video_cmd =
+
let doc = "Fetch videos from PeerTube instances" in
+
let info = Cmd.info "video" ~version ~doc in
+
Cmd.v info Bushel_video.term
+
+
(* Video thumbs command *)
+
let video_thumbs_cmd = Bushel_video_thumbs.cmd
+
+
(* Query command *)
+
let query_cmd =
+
let doc = "Query Bushel collections using multisearch" in
+
let info = Cmd.info "query" ~version ~doc in
+
Cmd.v info Bushel_search.term
+
+
(* Bibtex command *)
+
let bibtex_cmd =
+
let doc = "Export bibtex for all papers" in
+
let info = Cmd.info "bibtex" ~version ~doc in
+
Cmd.v info Bushel_bibtex.term
+
+
(* Ideas command *)
+
let ideas_cmd = Bushel_ideas.cmd
+
+
(* Info command *)
+
let info_cmd = Bushel_info.cmd
+
+
(* Missing command *)
+
let missing_cmd = Bushel_missing.cmd
+
+
(* Note DOI command *)
+
let note_doi_cmd = Bushel_note_doi.cmd
+
+
(* DOI resolve command *)
+
let doi_cmd = Bushel_doi.cmd
+
+
(* Main command *)
+
let bushel_cmd =
+
let doc = "Bushel content management toolkit" in
+
let sdocs = Manpage.s_common_options in
+
let man = [
+
`S Manpage.s_description;
+
`P "$(tname) is a unified command-line tool for managing various types of \
+
content in the Bushel system, including papers, videos, links, and more.";
+
`P "$(tname) provides unified access to all Bushel functionality through \
+
integrated subcommands.";
+
`S Manpage.s_commands;
+
`S Manpage.s_common_options;
+
`S "ENVIRONMENT";
+
`P "BUSHEL_CONFIG - Path to configuration file with default settings";
+
`S Manpage.s_authors;
+
`P "Anil Madhavapeddy";
+
`S Manpage.s_bugs;
+
`P "Report bugs at https://github.com/avsm/bushel/issues";
+
] in
+
let info = Cmd.info "bushel" ~version ~doc ~sdocs ~man in
+
Cmd.group info [
+
bibtex_cmd;
+
doi_cmd;
+
faces_cmd;
+
ideas_cmd;
+
info_cmd;
+
links_cmd;
+
missing_cmd;
+
note_doi_cmd;
+
obsidian_cmd;
+
paper_cmd;
+
paper_classify_cmd;
+
paper_tex_cmd;
+
query_cmd;
+
thumbs_cmd;
+
video_cmd;
+
video_thumbs_cmd;
+
]
+
+
let () = exit (Cmd.eval' bushel_cmd)
+185
stack/bushel/bin/bushel_missing.ml
···
+
open Cmdliner
+
open Bushel
+
+
(** Check if an entry has a thumbnail *)
+
let has_thumbnail entries entry =
+
match Entry.thumbnail_slug entries entry with
+
| Some _ -> true
+
| None -> false
+
+
(** Check if an entry has a synopsis or description *)
+
let has_synopsis = function
+
| `Paper p -> Paper.abstract p <> "" (* Papers have abstracts *)
+
| `Note n -> Note.synopsis n <> None (* Notes have optional synopsis *)
+
| `Idea _ -> true (* Ideas don't have synopsis field *)
+
| `Project _ -> true (* Projects don't have synopsis field *)
+
| `Video _ -> true (* Videos don't have synopsis field *)
+
+
(** Check if an entry has tags *)
+
let has_tags = function
+
| `Paper p -> Paper.tags p <> []
+
| `Note n -> Note.tags n <> []
+
| `Idea i -> i.Idea.tags <> [] (* Access record field directly *)
+
| `Project p -> Project.tags p <> []
+
| `Video v -> v.Video.tags <> [] (* Access record field directly *)
+
+
(** Entry with broken references *)
+
type entry_with_broken_refs = {
+
entry : Entry.entry;
+
broken_slugs : string list;
+
broken_contacts : string list;
+
}
+
+
(** Find entries missing thumbnails *)
+
let find_missing_thumbnails entries =
+
let all = Entry.all_entries entries in
+
List.filter (fun entry -> not (has_thumbnail entries entry)) all
+
+
(** Find entries missing synopsis *)
+
let find_missing_synopsis entries =
+
let all = Entry.all_entries entries in
+
List.filter (fun entry -> not (has_synopsis entry)) all

(** Find entries missing tags *)
let find_missing_tags entries =
  let all = Entry.all_entries entries in
  List.filter (fun entry -> not (has_tags entry)) all

(** Find entries with broken slugs or contact handles *)
let find_broken_references entries =
  let all = Entry.all_entries entries in
  List.filter_map
    (fun entry ->
      let body = Entry.body entry in
      let broken_slugs, broken_contacts = Md.validate_references entries body in
      if broken_slugs <> [] || broken_contacts <> [] then
        Some { entry; broken_slugs; broken_contacts }
      else None)
    all

(** Print a list of entries *)
let print_entries title entries_list =
  if entries_list <> [] then begin
    Fmt.pr "@.%a (%d):@," (Fmt.styled `Bold Fmt.string) title (List.length entries_list);
    List.iter
      (fun entry ->
        let slug = Entry.slug entry in
        let type_str = Entry.to_type_string entry in
        let title = Entry.title entry in
        Fmt.pr " %a %a - %a@,"
          (Fmt.styled `Cyan Fmt.string) slug
          (Fmt.styled `Faint Fmt.string) (Printf.sprintf "(%s)" type_str)
          Fmt.string title)
      entries_list
  end

(** Print entries with broken references *)
let print_broken_references title entries_with_broken_refs =
  if entries_with_broken_refs <> [] then begin
    Fmt.pr "@.%a (%d):@," (Fmt.styled `Bold Fmt.string) title
      (List.length entries_with_broken_refs);
    List.iter
      (fun { entry; broken_slugs; broken_contacts } ->
        let slug = Entry.slug entry in
        let type_str = Entry.to_type_string entry in
        let entry_title = Entry.title entry in
        Fmt.pr " %a %a - %a@,"
          (Fmt.styled `Cyan Fmt.string) slug
          (Fmt.styled `Faint Fmt.string) (Printf.sprintf "(%s)" type_str)
          Fmt.string entry_title;
        if broken_slugs <> [] then
          Fmt.pr " %a %a@,"
            (Fmt.styled `Red Fmt.string) "Broken slugs:"
            (Fmt.list ~sep:Fmt.comma Fmt.string) broken_slugs;
        if broken_contacts <> [] then
          Fmt.pr " %a %a@,"
            (Fmt.styled `Red Fmt.string) "Broken contacts:"
            (Fmt.list ~sep:Fmt.comma Fmt.string) broken_contacts)
      entries_with_broken_refs
  end

(** Main missing command implementation *)
let missing_cmd () base_dir check_thumbnails check_synopsis check_tags check_refs =
  let entries = load base_dir in
  let count = ref 0 in
  if check_thumbnails then begin
    let missing = find_missing_thumbnails entries in
    print_entries "Entries missing thumbnails" missing;
    count := !count + List.length missing
  end;
  if check_synopsis then begin
    let missing = find_missing_synopsis entries in
    print_entries "Entries missing synopsis" missing;
    count := !count + List.length missing
  end;
  if check_tags then begin
    let missing = find_missing_tags entries in
    print_entries "Entries missing tags" missing;
    count := !count + List.length missing
  end;
  if check_refs then begin
    let broken = find_broken_references entries in
    print_broken_references "Entries with broken references" broken;
    (* Count the total number of broken references, not just entries *)
    let broken_count =
      List.fold_left
        (fun acc { broken_slugs; broken_contacts; _ } ->
          acc + List.length broken_slugs + List.length broken_contacts)
        0 broken
    in
    count := !count + broken_count
  end;
  if !count = 0 then Fmt.pr "@.No missing metadata or broken references found.@."
  else Fmt.pr "@.Total issues found: %d@." !count;
  0

(** Command-line arguments *)
let thumbnails_flag =
  let doc = "Check for entries missing thumbnails" in
  Arg.(value & flag & info ["thumbnails"; "t"] ~doc)

let synopsis_flag =
  let doc = "Check for entries missing a synopsis" in
  Arg.(value & flag & info ["synopsis"; "s"] ~doc)

let tags_flag =
  let doc = "Check for entries missing tags" in
  Arg.(value & flag & info ["tags"; "g"] ~doc)

let refs_flag =
  let doc = "Check for broken slugs and contact handles" in
  Arg.(value & flag & info ["refs"; "r"] ~doc)

let term =
  Term.(
    const (fun setup base thumbnails synopsis tags refs ->
        (* If no flags are specified, run every check *)
        let check_all = not (thumbnails || synopsis || tags || refs) in
        missing_cmd setup base
          (check_all || thumbnails)
          (check_all || synopsis)
          (check_all || tags)
          (check_all || refs))
    $ Bushel_common.setup_term $ Bushel_common.base_dir $ thumbnails_flag
    $ synopsis_flag $ tags_flag $ refs_flag)

let cmd =
  let doc = "List entries with missing metadata or broken references" in
  let man =
    [ `S Manpage.s_description;
      `P "This command scans all entries and reports any that are missing \
          thumbnails, a synopsis, or tags, or that contain broken slugs or \
          contact handles.";
      `P "By default, all checks are performed. Use flags to select specific checks.";
      `S Manpage.s_options;
      `S Manpage.s_examples;
      `P "Check for all issues:";
      `Pre "  $(mname) $(tname)";
      `P "Check only for missing thumbnails:";
      `Pre "  $(mname) $(tname) --thumbnails";
      `P "Check for missing synopsis and tags:";
      `Pre "  $(mname) $(tname) --synopsis --tags";
      `P "Check only for broken references:";
      `Pre "  $(mname) $(tname) --refs";
    ]
  in
  let info = Cmd.info "missing" ~doc ~man in
  Cmd.v info term
+131
stack/bushel/bin/bushel_note_doi.ml
···
open Cmdliner
open Bushel

(** Generate a roguedoi identifier using Crockford base32 encoding *)
let generate_roguedoi () =
  Random.self_init ();
  (* Generate a 10-character roguedoi with a checksum, split every 5 chars *)
  let id = Crockford.generate ~length:10 ~split_every:5 ~checksum:true () in
  Printf.sprintf "10.59999/%s" id

(** Add a DOI to a note's frontmatter if it doesn't already have one *)
let add_doi_to_note note_path =
  let content = In_channel.with_open_bin note_path In_channel.input_all in
  (* Check whether the note already has a doi: field *)
  let has_doi =
    let re = Str.regexp "^doi:" in
    let lines = String.split_on_char '\n' content in
    List.exists (fun line -> Str.string_match re (String.trim line) 0) lines
  in
  if has_doi then begin
    Fmt.pr "%a: Note already has a DOI, skipping@."
      (Fmt.styled `Yellow Fmt.string) note_path;
    false
  end
  else begin
    let roguedoi = generate_roguedoi () in
    (* Parse the file to extract the frontmatter *)
    match String.split_on_char '\n' content with
    | "---" :: rest ->
      (* Find the end of the frontmatter *)
      let rec find_end_fm acc = function
        | [] -> None
        | "---" :: body_lines -> Some (List.rev acc, body_lines)
        | line :: lines -> find_end_fm (line :: acc) lines
      in
      (match find_end_fm [] rest with
       | Some (fm_lines, body_lines) ->
         (* Append the doi field to the frontmatter *)
         let new_fm = fm_lines @ [ Printf.sprintf "doi: %s" roguedoi ] in
         let new_content =
           String.concat "\n" (["---"] @ new_fm @ ["---"] @ body_lines)
         in
         Out_channel.with_open_bin note_path (fun oc ->
             Out_channel.output_string oc new_content);
         Fmt.pr "%a: Added DOI %a@."
           (Fmt.styled `Green Fmt.string) note_path
           (Fmt.styled `Cyan Fmt.string) roguedoi;
         true
       | None ->
         Fmt.epr "%a: Could not parse frontmatter@."
           (Fmt.styled `Red Fmt.string) note_path;
         false)
    | _ ->
      Fmt.epr "%a: No frontmatter found@."
        (Fmt.styled `Red Fmt.string) note_path;
      false
  end

(** Main command implementation *)
let note_doi_cmd () base_dir dry_run =
  let entries = load base_dir in
  let notes = Entry.notes entries in
  (* Select permanent notes without a DOI *)
  let perma_notes =
    List.filter (fun n -> Note.perma n && Option.is_none (Note.doi n)) notes
  in
  if perma_notes = [] then begin
    Fmt.pr "No permanent notes without a DOI found.@.";
    0
  end
  else begin
    Fmt.pr "@[<v>";
    Fmt.pr "%a: Found %d permanent notes without a DOI@.@."
      (Fmt.styled `Bold Fmt.string) "Info"
      (List.length perma_notes);
    let count = ref 0 in
    List.iter
      (fun note ->
        let slug = Note.slug note in
        let note_path = Printf.sprintf "%s/data/notes/%s.md" base_dir slug in
        Fmt.pr "Processing %a (%a)...@,"
          (Fmt.styled `Cyan Fmt.string) slug
          (Fmt.styled `Faint Fmt.string) (Note.title note);
        if not dry_run then begin
          if add_doi_to_note note_path then incr count
        end
        else begin
          let roguedoi = generate_roguedoi () in
          Fmt.pr " Would add DOI: %a@," (Fmt.styled `Cyan Fmt.string) roguedoi;
          incr count
        end)
      perma_notes;
    Fmt.pr "@.";
    if dry_run then
      Fmt.pr "%a: Would add a DOI to %d notes (dry run)@."
        (Fmt.styled `Bold Fmt.string) "Summary" !count
    else
      Fmt.pr "%a: Added a DOI to %d notes@."
        (Fmt.styled `Bold Fmt.string) "Summary" !count;
    Fmt.pr "@]@.";
    0
  end

(** Command-line interface definition *)
let dry_run_flag =
  let doc = "Show what would be done without making changes" in
  Arg.(value & flag & info ["n"; "dry-run"] ~doc)

let term =
  Term.(const note_doi_cmd $ Bushel_common.setup_term $ Bushel_common.base_dir $ dry_run_flag)

let cmd =
  let doc = "Generate and add DOI identifiers to permanent notes" in
  let man =
    [ `S Manpage.s_description;
      `P "This command generates roguedoi identifiers using Crockford base32 \
          encoding and adds them to the frontmatter of permanent notes (notes \
          with perma: true) that don't already have a DOI.";
      `P "Roguedoi format: 10.59999/xxxxx-xxxxx where x is a Crockford base32 \
          character.";
      `S Manpage.s_options;
    ]
  in
  let info = Cmd.info "note-doi" ~doc ~man in
  Cmd.v info term
+88
stack/bushel/bin/bushel_obsidian.ml
···
open Bushel

let obsidian_links =
  let inline c = function
    | Md.Obsidian_link l ->
      Cmarkit_renderer.Context.string c l;
      true
    | _ -> false
  in
  Cmarkit_renderer.make ~inline ()
;;

let obsidian_of_doc doc =
  let default = Cmarkit_commonmark.renderer () in
  let r = Cmarkit_renderer.compose default obsidian_links in
  Cmarkit_renderer.doc_to_string r doc
;;

let md_to_obsidian entries md =
  let open Cmarkit in
  Doc.of_string ~strict:false ~resolver:Md.with_bushel_links md
  |> Mapper.map_doc (Mapper.make ~inline:(Md.bushel_inline_mapper_to_obsidian entries) ())
  |> obsidian_of_doc
;;

let obsidian_output base output_dir =
  let e = load base in
  let all = Entry.all_entries e @ Entry.all_papers e in
  List.iter
    (fun ent ->
      let slug =
        match ent with
        | `Paper { Paper.latest; slug; ver; _ } when not latest ->
          Printf.sprintf "%s-%s" slug ver
        | _ -> Entry.slug ent
      in
      let fname = Filename.concat output_dir (slug ^ ".md") in
      let tags =
        Tags.tags_of_ent e ent
        |> List.filter_map (fun tag ->
          match tag with
          | `Slug _ -> None
          | `Set s -> Some (Printf.sprintf "\"#%s\"" s)
          | `Text s -> Some s
          | `Contact _ -> None
          | `Year y -> Some (Printf.sprintf "\"#y%d\"" y))
        |> List.map (fun s -> "- " ^ s)
        |> String.concat "\n"
      in
      let links =
        Tags.tags_of_ent e ent
        |> List.filter_map (fun tag ->
          match tag with
          | `Slug s when s <> slug -> Some (Printf.sprintf "- \"[[%s]]\"" s)
          | `Contact c -> Some (Printf.sprintf "- \"[[@%s]]\"" c)
          | _ -> None)
        |> String.concat "\n"
        |> function
        | "" -> ""
        | s -> "linklist:\n" ^ s ^ "\n"
      in
      let body = Entry.body ent |> md_to_obsidian e in
      let buf = Printf.sprintf "---\ntags:\n%s\n%s---\n\n%s" tags links body in
      Out_channel.with_open_bin fname (fun oc -> output_string oc buf))
    all;
  List.iter
    (fun contact ->
      let slug = Contact.handle contact in
      let fname = Filename.concat output_dir ("@" ^ slug ^ ".md") in
      let buf = String.concat "\n" (Contact.names contact) in
      Out_channel.with_open_bin fname (fun oc -> output_string oc buf))
    (Entry.contacts e)
;;

(* Export the term for use in the main bushel.ml *)
let term =
  Cmdliner.Term.(
    const (fun base_dir output_dir ->
        obsidian_output base_dir output_dir;
        0)
    $ Bushel_common.base_dir
    $ Bushel_common.output_dir ~default:"obsidian")

let cmd =
  let doc = "Generate Obsidian-compatible markdown files" in
  let info = Cmdliner.Cmd.info "obsidian" ~doc in
  Cmdliner.Cmd.v info term

(* Main entry point removed - accessed through bushel_main.ml *)
+74
stack/bushel/bin/bushel_paper.ml
···
module ZT = Zotero_translation
open Lwt.Infix
open Printf
module J = Ezjsonm
open Cmdliner

let _authors b j =
  let keys = J.get_dict j in
  let authors = J.get_list J.get_string (List.assoc "author" keys) in
  let a =
    List.fold_left
      (fun acc a ->
        match Bushel.Entry.lookup_by_name b a with
        | Some c -> `String ("@" ^ Bushel.Contact.handle c) :: acc
        | None -> failwith (sprintf "author %s not found" a))
      [] authors
  in
  J.update j ["author"] (Some (`A a))

let of_doi zt ~base_dir ~slug ~version doi =
  ZT.json_of_doi zt ~slug doi >>= fun j ->
  let papers_dir = Printf.sprintf "%s/papers/%s" base_dir slug in
  (* Ensure the papers directory exists *)
  (try Unix.mkdir papers_dir 0o755 with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
  (* Extract the abstract from the JSON data *)
  let abstract =
    try
      let keys = Ezjsonm.get_dict (j :> Ezjsonm.value) in
      match List.assoc_opt "abstract" keys with
      | Some abstract_json -> Some (Ezjsonm.get_string abstract_json)
      | None -> None
    with _ -> None
  in
  (* Remove the abstract from the frontmatter - it goes in the body *)
  let keys = Ezjsonm.get_dict (j :> Ezjsonm.value) in
  let filtered_keys = List.filter (fun (k, _) -> k <> "abstract") keys in
  let json_without_abstract = `O filtered_keys in
  (* Use the library function to generate YAML with the abstract in the body *)
  let content = Bushel.Paper.to_yaml ?abstract ~ver:version json_without_abstract in
  let filename = Printf.sprintf "%s.md" version in
  let filepath = Filename.concat papers_dir filename in
  let oc = open_out filepath in
  output_string oc content;
  close_out oc;
  Printf.printf "Created paper file: %s\n" filepath;
  Lwt.return ()

let slug_arg =
  let doc = "Slug for the entry." in
  Arg.(required & pos 0 (some string) None & info [] ~docv:"SLUG" ~doc)

let version_arg =
  let doc = "Version of the entry." in
  Arg.(required & pos 1 (some string) None & info [] ~docv:"VERSION" ~doc)

let doi_arg =
  let doc = "DOI of the entry." in
  Arg.(required & pos 2 (some string) None & info [] ~docv:"DOI" ~doc)

(* Export the term for use in the main bushel.ml *)
let term =
  Term.(
    const (fun base slug version doi ->
        let zt = ZT.v "http://svr-avsm2-eeg-ce:1969" in
        Lwt_main.run @@ of_doi zt ~base_dir:base ~slug ~version doi;
        0)
    $ Bushel_common.base_dir $ slug_arg $ version_arg $ doi_arg)

let cmd =
  let doc = "Generate a paper entry from a DOI" in
  let info = Cmd.info "paper" ~doc in
  Cmd.v info term

(* Main entry point removed - accessed through bushel_main.ml *)
+57
stack/bushel/bin/bushel_paper_classify.ml
···
open Cmdliner

(** TODO:claude Classify papers based on heuristics and update metadata *)
let classify_papers base_dir overwrite =
  let papers_dir = Printf.sprintf "%s/papers" base_dir in
  if not (Sys.file_exists papers_dir) then (
    Printf.eprintf "Papers directory not found: %s\n" papers_dir;
    1)
  else (
    let paper_dirs = Sys.readdir papers_dir |> Array.to_list in
    List.iter
      (fun paper_slug ->
        let paper_path = Filename.concat papers_dir paper_slug in
        if Sys.is_directory paper_path then (
          let versions =
            Sys.readdir paper_path |> Array.to_list
            |> List.filter (String.ends_with ~suffix:".md")
          in
          List.iter
            (fun version_file ->
              let filepath = Filename.concat paper_path version_file in
              let version = Filename.remove_extension version_file in
              try
                let paper = Bushel.Paper.of_md ~slug:paper_slug ~ver:version filepath in
                let predicted_class = Bushel.Paper.classification paper in
                let class_str = Bushel.Paper.string_of_classification predicted_class in
                Printf.printf "%s/%s: %s\n" paper_slug version class_str;
                (* Update the file if overwrite is enabled *)
                if overwrite then (
                  let json_data = Bushel.Paper.raw_json paper in
                  let keys = Ezjsonm.get_dict json_data in
                  let updated_keys =
                    ("classification", `String class_str)
                    :: List.filter (fun (k, _) -> k <> "classification") keys
                  in
                  let updated_json = `O updated_keys in
                  let abstract = Some (Bushel.Paper.abstract paper) in
                  let content = Bushel.Paper.to_yaml ?abstract ~ver:version updated_json in
                  let oc = open_out filepath in
                  output_string oc content;
                  close_out oc;
                  Printf.printf " Updated %s\n" filepath)
              with e ->
                Printf.eprintf "Error processing %s: %s\n" filepath (Printexc.to_string e))
            versions))
      paper_dirs;
    0)

let overwrite_flag =
  let doc = "Update paper files with classification metadata" in
  Arg.(value & flag & info ["overwrite"] ~doc)

let term = Term.(const classify_papers $ Bushel_common.base_dir $ overwrite_flag)

let cmd =
  let doc = "Classify papers as full/short/preprint" in
  let info = Cmd.info "paper-classify" ~doc in
  Cmd.v info term
+325
stack/bushel/bin/bushel_paper_tex.ml
···
open Printf
open Cmdliner

(** TODO:claude Format an author name for LaTeX with initials and full last name *)
let format_author_name author =
  (* Split the author name and convert to "F.M.~Lastname" format *)
  let parts = String.split_on_char ' ' author |> List.filter (fun s -> s <> "") in
  match List.rev parts with
  | [] -> ""
  | lastname :: rest_rev ->
    let firstname_parts = List.rev rest_rev in
    let initials =
      List.map
        (fun name -> if String.length name > 0 then String.sub name 0 1 ^ "." else "")
        firstname_parts
    in
    let initials_str = String.concat "" initials in
    if initials_str = "" then lastname else initials_str ^ "~" ^ lastname

(** TODO:claude Format an author name for LaTeX, underlining the target author *)
let format_author target_name author =
  let formatted = format_author_name author in
  (* Underline the author if their name contains the target name substring *)
  if Re.execp (Re.Perl.compile_pat ~opts:[`Caseless] target_name)
       (String.lowercase_ascii author)
  then sprintf "\\underline{%s}" formatted
  else formatted

(** TODO:claude Format an author list for LaTeX *)
let format_authors target_name authors =
  match authors with
  | [] -> ""
  | [ single ] -> format_author target_name single
  | _ -> String.concat ", " (List.map (format_author target_name) authors)

(** TODO:claude Escape special LaTeX characters *)
let escape_latex str =
  let replacements =
    [ ("&", "\\&"); ("%", "\\%"); ("$", "\\$"); ("#", "\\#"); ("_", "\\_");
      ("{", "\\{"); ("}", "\\}"); ("~", "\\textasciitilde{}");
      ("^", "\\textasciicircum{}");
    ]
  in
  List.fold_left
    (fun s (from, to_) -> Re.replace_string (Re.compile (Re.str from)) ~by:to_ s)
    str replacements

(** TODO:claude Clean a venue name by removing common prefixes and handling arXiv *)
let clean_venue_name venue =
  (* Special handling for arXiv to avoid redundancy like "arXiv (arXiv:ID)" *)
  let venue_lower = String.lowercase_ascii venue in
  if Re.execp (Re.Perl.compile_pat ~opts:[`Caseless] "arxiv") venue_lower then
    if String.contains venue ':' then
      (* If it is in arXiv:ID format, just return the ID part *)
      match String.split_on_char ':' venue with
      | _ :: id :: _ -> String.trim id
      | _ -> venue
    else venue
  else
    let prefixes =
      [ "in proceedings of the "; "proceedings of the "; "in proceedings of ";
        "proceedings of "; "in the "; "the ";
      ]
    in
    let rec remove_prefixes v = function
      | [] -> v
      | prefix :: rest ->
        if String.length v >= String.length prefix
           && String.sub (String.lowercase_ascii v) 0 (String.length prefix) = prefix
        then String.sub v (String.length prefix) (String.length v - String.length prefix)
        else remove_prefixes v rest
    in
    let cleaned = remove_prefixes venue prefixes in
    (* Capitalize the first letter *)
    if String.length cleaned > 0 then
      String.mapi (fun i c -> if i = 0 then Char.uppercase_ascii c else c) cleaned
    else cleaned

(** TODO:claude Format the venue for LaTeX, with volume/number details for full papers *)
let format_venue paper =
  let open Bushel.Paper in
  let classification = classification paper in
  match bibtype paper with
  | "article" ->
    let journal_name =
      try journal paper |> clean_venue_name |> escape_latex with _ -> "Journal"
    in
    if classification = Full then (
      let vol_info =
        let vol = volume paper in
        let num = issue paper in
        match (vol, num) with
        | Some v, Some n -> sprintf ", %s(%s)" v n
        | Some v, None -> sprintf ", vol. %s" v
        | None, Some n -> sprintf ", no. %s" n
        | None, None -> ""
      in
      sprintf "\\textit{%s%s}" journal_name vol_info)
    else sprintf "\\textit{%s}" journal_name
  | "inproceedings" ->
    let conf_name =
      try booktitle paper |> clean_venue_name |> escape_latex with _ -> "Conference"
    in
    sprintf "\\textit{%s}" conf_name
  | "techreport" ->
    let inst = try institution paper |> escape_latex with _ -> "Institution" in
    sprintf "\\textit{Technical Report, %s}" inst
  | "phdthesis" ->
    let school = try institution paper |> escape_latex with _ -> "University" in
    sprintf "\\textit{PhD thesis, %s}" school
  | "mastersthesis" ->
    let school = try institution paper |> escape_latex with _ -> "University" in
    sprintf "\\textit{Master's thesis, %s}" school
  | "book" ->
    let publisher_str = try Bushel.Paper.publisher paper |> escape_latex with _ -> "" in
    let edition_str =
      try
        let json = Bushel.Paper.raw_json paper in
        let keys = Ezjsonm.get_dict json in
        List.assoc "edition" keys |> Ezjsonm.get_string |> escape_latex
      with _ -> ""
    in
    let isbn_str = try Bushel.Paper.isbn paper |> escape_latex with _ -> "" in
    let venue_info =
      let base =
        match (publisher_str, edition_str) with
        | pub, ed when pub <> "" && ed <> "" -> sprintf "%s, %s edition" pub ed
        | pub, _ when pub <> "" -> pub
        | _, ed when ed <> "" -> sprintf "%s edition" ed
        | _, _ -> "Book"
      in
      if isbn_str <> "" then sprintf "%s, ISBN %s" base isbn_str else base
    in
    sprintf "\\textit{%s}" venue_info
  | "misc" ->
    (* Try to find meaningful venue info for misc entries *)
    let journal_str =
      try Bushel.Paper.journal paper |> clean_venue_name |> escape_latex with _ -> ""
    in
    let booktitle_str =
      try Bushel.Paper.booktitle paper |> clean_venue_name |> escape_latex with _ -> ""
    in
    let publisher_str = try Bushel.Paper.publisher paper |> escape_latex with _ -> "" in
    if journal_str <> "" then sprintf "\\textit{%s}" journal_str
    else if booktitle_str <> "" then sprintf "\\textit{%s}" booktitle_str
    else if publisher_str <> "" then sprintf "\\textit{%s}" publisher_str
    else sprintf "\\textit{Preprint}"
  | "abstract" ->
    (* Handle conference abstracts *)
    let conf_name =
      try Bushel.Paper.booktitle paper |> clean_venue_name |> escape_latex with _ -> ""
    in
    let journal_str =
      try Bushel.Paper.journal paper |> clean_venue_name |> escape_latex with _ -> ""
    in
    if conf_name <> "" then sprintf "\\textit{%s (Abstract)}" conf_name
    else if journal_str <> "" then sprintf "\\textit{%s (Abstract)}" journal_str
    else sprintf "\\textit{Conference Abstract}"
  | _ ->
    (* Fallback for other types, with special arXiv handling *)
    let journal_str = try Bushel.Paper.journal paper with _ -> "" in
    let publisher_str = try Bushel.Paper.publisher paper |> escape_latex with _ -> "" in
    (* Skip the venue for arXiv papers and let the note handle it *)
    if String.lowercase_ascii journal_str = "arxiv" then ""
    else if journal_str <> "" then
      sprintf "\\textit{%s}" (journal_str |> clean_venue_name |> escape_latex)
    else if publisher_str <> "" then sprintf "\\textit{%s}" publisher_str
    else sprintf "\\textit{Preprint}"

(** TODO:claude Generate a LaTeX PubItem for a paper *)
let generate_latex_entry target_name paper =
  let open Bushel.Paper in
  let slug_str = slug paper in
  let title_str = title paper |> escape_latex in
  let authors_str = format_authors target_name (authors paper) in
  let venue_str = format_venue paper in
  let year_str = year paper |> string_of_int in
  let month_str =
    let _, m, _ = date paper in
    sprintf "%02d" m
  in
  (* Check whether the paper's date is in the future *)
  let is_in_press =
    let paper_time = datetime paper in
    let now = Ptime_clock.now () in
    Ptime.compare paper_time now > 0
  in
  (* Add a DOI or PDF link if available, but not for in-press papers unless
     they have an explicit URL *)
  let title_with_link =
    if is_in_press then
      (* For in-press papers, only add a link if there is an explicit URL field *)
      match Bushel.Paper.url paper with
      | Some u -> sprintf "\\href{%s}{%s}" u title_str
      | None -> title_str
    else
      (* For published papers, use the DOI, the URL, or a default PDF link *)
      match Bushel.Paper.doi paper with
      | Some doi -> sprintf "\\href{https://doi.org/%s}{%s}" doi title_str
      | None ->
        let url =
          match Bushel.Paper.url paper with
          | Some u -> u
          | None -> sprintf "https://anil.recoil.org/papers/%s.pdf" slug_str
        in
        sprintf "\\href{%s}{%s}" url title_str
  in
  (* Mark future-dated papers as in press *)
  let in_press_str = if is_in_press then " \\textit{(in press)}" else "" in
  (* Add a note if present *)
  let note_str =
    match Bushel.Paper.note paper with
    | Some n -> sprintf " \\textit{(%s)}" (escape_latex n)
    | None -> ""
  in
  sprintf "\\BigGap\n\\PubItemLabeled{%s}\n{``%s,''\n%s,\n%s%s%s,\n\\DatestampYM{%s}{%s}.}\n"
    slug_str title_with_link authors_str venue_str in_press_str note_str year_str month_str

(** TODO:claude Generate the LaTeX output files for papers *)
let generate_tex base_dir output_dir target_name =
  try
    let papers = Bushel.load_papers base_dir in
    let latest_papers = List.filter (fun p -> p.Bushel.Paper.latest) papers in
    (* Extract the selected papers first *)
    let selected_papers = List.filter Bushel.Paper.selected latest_papers in
    (* Group the remaining papers by classification, excluding selected ones *)
    let non_selected_papers =
      List.filter (fun p -> not (Bushel.Paper.selected p)) latest_papers
    in
    let full_papers =
      List.filter (fun p -> Bushel.Paper.classification p = Bushel.Paper.Full)
        non_selected_papers
    in
    let short_papers =
      List.filter (fun p -> Bushel.Paper.classification p = Bushel.Paper.Short)
        non_selected_papers
    in
    let preprint_papers =
      List.filter (fun p -> Bushel.Paper.classification p = Bushel.Paper.Preprint)
        non_selected_papers
    in
    (* Sort each group by date, newest first *)
    let sorted_full = List.sort Bushel.Paper.compare full_papers in
    let sorted_short = List.sort Bushel.Paper.compare short_papers in
    let sorted_preprint = List.sort Bushel.Paper.compare preprint_papers in
    let sorted_selected = List.sort Bushel.Paper.compare selected_papers in
    (* Ensure the output directory exists *)
    (try Unix.mkdir output_dir 0o755 with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
    (* Write one .tex file per group *)
    let write_group name papers =
      let path = Filename.concat output_dir name in
      let oc = open_out path in
      List.iter
        (fun paper ->
          output_string oc (generate_latex_entry target_name paper);
          output_char oc '\n')
        papers;
      close_out oc;
      Printf.printf "Generated %s with %d entries\n" path (List.length papers)
    in
    write_group "papers_full.tex" sorted_full;
    write_group "papers_short.tex" sorted_short;
    write_group "papers_preprint.tex" sorted_preprint;
    write_group "papers_selected.tex" sorted_selected;
    (* Write paper_count.tex *)
    let total_count = List.length latest_papers in
    let oc_count = open_out (Filename.concat output_dir "paper_count.tex") in
    output_string oc_count (sprintf "\\setcounter{pubcounter}{%d}\n" total_count);
    close_out oc_count;
    Printf.printf "Generated %s/paper_count.tex with total count: %d\n" output_dir total_count;
    0
  with e ->
    Printf.eprintf "Error loading papers: %s\n" (Printexc.to_string e);
    1

let output_dir_arg =
  let doc = "Output directory for generated LaTeX files" in
  Arg.(value & opt string "." & info ["output"; "o"] ~docv:"DIR" ~doc)

let target_name_arg =
  let doc = "Name to underline in the author list (e.g. 'Madhavapeddy')" in
  Arg.(value & opt string "Madhavapeddy" & info ["target"; "t"] ~docv:"NAME" ~doc)

let term = Term.(const generate_tex $ Bushel_common.base_dir $ output_dir_arg $ target_name_arg)

let cmd =
  let doc = "Generate LaTeX publication entries" in
  let info = Cmd.info "paper-tex" ~doc in
  Cmd.v info term
+69
stack/bushel/bin/bushel_search.ml
···
open Cmdliner
open Lwt.Syntax

(** TODO:claude Bushel search command for integration with the main CLI *)

let endpoint =
  let doc = "Typesense server endpoint URL" in
  Arg.(value & opt string "" & info ["endpoint"; "e"] ~doc)

let api_key =
  let doc = "Typesense API key for authentication" in
  Arg.(value & opt string "" & info ["api-key"; "k"] ~doc)

let limit =
  let doc = "Maximum number of results to return" in
  Arg.(value & opt int 50 & info ["limit"; "l"] ~doc)

let offset =
  let doc = "Number of results to skip (for pagination)" in
  Arg.(value & opt int 0 & info ["offset"; "o"] ~doc)

let query_text =
  let doc = "Search query text" in
  Arg.(required & pos 0 (some string) None & info [] ~docv:"QUERY" ~doc)

(** TODO:claude Search function using multisearch *)
let search endpoint api_key query_text limit offset =
  let base_config = Bushel.Typesense.load_config_from_files () in
  let config =
    { Bushel.Typesense.endpoint = (if endpoint = "" then base_config.endpoint else endpoint);
      api_key = (if api_key = "" then base_config.api_key else api_key);
      openai_key = base_config.openai_key;
    }
  in
  if config.api_key = "" then (
    Printf.eprintf
      "Error: API key is required. Use --api-key, set the TYPESENSE_API_KEY \
       environment variable, or create a .typesense-key file.\n";
    exit 1);
  Printf.printf "Searching Typesense at %s\n" config.endpoint;
  Printf.printf "Query: \"%s\"\n" query_text;
  Printf.printf "Limit: %d, Offset: %d\n" limit offset;
  Printf.printf "\n";
  Lwt_main.run
    (Lwt.catch
       (fun () ->
         (* Fetch up to 50 hits per collection, then apply the user's limit and
            offset when paginating the combined results *)
         let* result = Bushel.Typesense.multisearch config query_text ~limit:50 () in
         match result with
         | Ok multisearch_resp ->
           let combined_response =
             Bushel.Typesense.combine_multisearch_results multisearch_resp ~limit ~offset ()
           in
           Printf.printf "Found %d results (%.2fms)\n\n" combined_response.total
             combined_response.query_time;
           List.iteri
             (fun i (hit : Bushel.Typesense.search_result) ->
               Printf.printf "%d. %s (score: %.2f)\n" (i + 1)
                 (Bushel.Typesense.pp_search_result_oneline hit)
                 hit.Bushel.Typesense.score)
             combined_response.hits;
           Lwt.return_unit
         | Error err ->
           Format.eprintf "Search error: %a\n" Bushel.Typesense.pp_error err;
           exit 1)
       (fun exn ->
         Printf.eprintf "Error: %s\n" (Printexc.to_string exn);
         exit 1));
  0

(** TODO:claude Command-line term *)
let term = Term.(const search $ endpoint $ api_key $ query_text $ limit $ offset)
+70
stack/bushel/bin/bushel_thumbs.ml
···
+
open Printf
+
open Cmdliner
+
+
(** TODO:claude
+
Helper module for ImageMagick operations *)
+
module Imagemagick = struct
+
  (* Generate thumbnail from PDF *)
  let generate_thumbnail ~pdf_path ~size ~output_path =
    let cmd =
      sprintf
        "magick -density 600 -quality 100 %s[0] -gravity North -crop 100%%x50%%+0+0 -resize %s %s"
        pdf_path size output_path
    in
    eprintf "Running: %s\n%!" cmd;
    Sys.command cmd
end

(** TODO:claude
    Process a single paper to generate its thumbnail *)
let process_paper base_dir output_dir paper =
  let slug = Bushel.Paper.slug paper in
  let pdf_path = sprintf "%s/static/papers/%s.pdf" base_dir slug in
  let thumbnail_path = sprintf "%s/%s.png" output_dir slug in
  (* Skip if the thumbnail already exists *)
  if Sys.file_exists thumbnail_path then
    printf "Thumbnail already exists for %s, skipping\n%!" slug
  else if Sys.file_exists pdf_path then (
    try
      let size = "2048x" in
      printf "Generating high-res thumbnail for %s (size: %s)\n%!" slug size;
      match Imagemagick.generate_thumbnail ~pdf_path ~size ~output_path:thumbnail_path with
      | 0 -> printf "Successfully generated thumbnail for %s\n%!" slug
      | n -> eprintf "Error generating thumbnail for %s (exit code: %d)\n%!" slug n
    with e -> eprintf "Error processing paper %s: %s\n%!" slug (Printexc.to_string e))
  else
    eprintf "PDF file not found for paper: %s\n%!" slug

(** TODO:claude
    Main function to process all papers in a directory *)
let process_papers base_dir output_dir =
  (* Create the output directory if it doesn't exist *)
  if not (Sys.file_exists output_dir) then (
    printf "Creating output directory: %s\n%!" output_dir;
    Unix.mkdir output_dir 0o755);

  (* Load Bushel entries and get the papers *)
  printf "Loading papers from %s\n%!" base_dir;
  let e = Bushel.load base_dir in
  let papers = Bushel.Entry.papers e in

  (* Process each paper *)
  printf "Found %d papers\n%!" (List.length papers);
  List.iter (process_paper base_dir output_dir) papers

(* Command line arguments are imported from Bushel_common *)

(* Export the term for use in the main bushel.ml *)
let term =
  Term.(
    const (fun base_dir output_dir -> process_papers base_dir output_dir; 0)
    $ Bushel_common.base_dir
    $ Bushel_common.output_dir ~default:".")

let cmd =
  let doc = "Generate thumbnails for paper PDFs" in
  let info = Cmd.info "thumbs" ~doc in
  Cmd.v info term

(* Main entry point removed - accessed through bushel_main.ml *)
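The ImageMagick invocation above renders only the first PDF page at high density, keeps the top half, and scales it to the target width. A minimal Python sketch of the same command construction (hypothetical helper; like the `sprintf` it mirrors, it does no shell quoting):

```python
def thumbnail_cmd(pdf_path: str, size: str, output_path: str) -> str:
    # [0] selects the first PDF page; -gravity North with -crop 100%x50%+0+0
    # keeps the top half; -resize scales to the target width (e.g. "2048x").
    return (
        f"magick -density 600 -quality 100 {pdf_path}[0] "
        f"-gravity North -crop 100%x50%+0+0 -resize {size} {output_path}"
    )

print(thumbnail_cmd("paper.pdf", "2048x", "paper.png"))
```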
+248 stack/bushel/bin/bushel_typesense.ml
open Cmdliner
open Lwt.Syntax

(** TODO:claude Bushel Typesense binary with upload and query functionality *)

let endpoint =
  let doc = "Typesense server endpoint URL" in
  Arg.(value & opt string "http://localhost:8108" & info ["endpoint"; "e"] ~doc)

let api_key =
  let doc = "Typesense API key for authentication" in
  Arg.(value & opt string "" & info ["api-key"; "k"] ~doc)

let openai_key =
  let doc = "OpenAI API key for embeddings" in
  Arg.(value & opt string "" & info ["openai-key"; "oa"] ~doc)

let data_dir =
  let doc = "Directory containing bushel data files" in
  Arg.(value & opt string "." & info ["data-dir"; "d"] ~doc)

(** TODO:claude Main upload function *)
let upload endpoint api_key openai_key data_dir =
  if api_key = "" then (
    Printf.eprintf "Error: API key is required. Use --api-key or set TYPESENSE_API_KEY environment variable.\n";
    exit 1);
  if openai_key = "" then (
    Printf.eprintf "Error: OpenAI API key is required for embeddings. Use --openai-key or set OPENAI_API_KEY environment variable.\n";
    exit 1);
  let config = Bushel.Typesense.{ endpoint; api_key; openai_key } in
  Printf.printf "Loading bushel data from %s\n" data_dir;
  let entries = Bushel.load data_dir in
  Printf.printf "Uploading bushel data to Typesense at %s\n" endpoint;
  Lwt_main.run
    (Lwt.catch
       (fun () -> Bushel.Typesense.upload_all config entries)
       (fun exn ->
         Printf.eprintf "Error: %s\n" (Printexc.to_string exn);
         exit 1))

(** TODO:claude Query function *)
let query endpoint api_key query_text collection limit offset =
  let base_config = Bushel.Typesense.load_config_from_files () in
  let config =
    { Bushel.Typesense.endpoint = (if endpoint = "" then base_config.endpoint else endpoint);
      api_key = (if api_key = "" then base_config.api_key else api_key);
      openai_key = base_config.openai_key }
  in
  if config.api_key = "" then (
    Printf.eprintf "Error: API key is required. Use --api-key or set TYPESENSE_API_KEY environment variable.\n";
    exit 1);
  Printf.printf "Searching Typesense at %s\n" config.endpoint;
  Printf.printf "Query: \"%s\"\n" query_text;
  if collection <> "" then Printf.printf "Collection: %s\n" collection;
  Printf.printf "Limit: %d, Offset: %d\n" limit offset;
  Printf.printf "\n";
  Lwt_main.run
    (Lwt.catch
       (fun () ->
         let search_fn =
           if collection = "" then
             Bushel.Typesense.search_all config query_text ~limit ~offset
           else
             Bushel.Typesense.search_collection config collection query_text ~limit ~offset
         in
         let* result = search_fn () in
         match result with
         | Ok response ->
           Printf.printf "Found %d results (%.2fms)\n\n" response.total response.query_time;
           List.iteri
             (fun i (hit : Bushel.Typesense.search_result) ->
               Printf.printf "%d. [%s] %s (score: %.2f)\n" (i + 1) hit.collection hit.title hit.score;
               if hit.content <> "" then Printf.printf "   %s\n" hit.content;
               if hit.highlights <> [] then (
                 Printf.printf "   Highlights:\n";
                 List.iter
                   (fun (field, snippets) ->
                     List.iter (fun snippet -> Printf.printf "     %s: %s\n" field snippet) snippets)
                   hit.highlights);
               Printf.printf "\n")
             response.hits;
           Lwt.return_unit
         | Error err ->
           Format.eprintf "Search error: %a\n" Bushel.Typesense.pp_error err;
           exit 1)
       (fun exn ->
         Printf.eprintf "Error: %s\n" (Printexc.to_string exn);
         exit 1))

(** TODO:claude List collections function *)
let list endpoint api_key =
  let base_config = Bushel.Typesense.load_config_from_files () in
  let config =
    { Bushel.Typesense.endpoint = (if endpoint = "" then base_config.endpoint else endpoint);
      api_key = (if api_key = "" then base_config.api_key else api_key);
      openai_key = base_config.openai_key }
  in
  if config.api_key = "" then (
    Printf.eprintf "Error: API key is required. Use --api-key or set TYPESENSE_API_KEY environment variable.\n";
    exit 1);
  Printf.printf "Listing collections at %s\n\n" config.endpoint;
  Lwt_main.run
    (Lwt.catch
       (fun () ->
         let* result = Bushel.Typesense.list_collections config in
         match result with
         | Ok collections ->
           Printf.printf "Collections:\n";
           List.iter (fun (name, count) -> Printf.printf "  %s (%d documents)\n" name count) collections;
           Lwt.return_unit
         | Error err ->
           Format.eprintf "List error: %a\n" Bushel.Typesense.pp_error err;
           exit 1)
       (fun exn ->
         Printf.eprintf "Error: %s\n" (Printexc.to_string exn);
         exit 1))

(** TODO:claude Command line arguments for query *)
let query_text =
  let doc = "Search query text" in
  Arg.(required & pos 0 (some string) None & info [] ~docv:"QUERY" ~doc)

let collection =
  let doc = "Specific collection to search (contacts, papers, projects, news, videos, notes, ideas)" in
  Arg.(value & opt string "" & info ["collection"; "c"] ~doc)

let limit =
  let doc = "Maximum number of results to return" in
  Arg.(value & opt int 10 & info ["limit"; "l"] ~doc)

let offset =
  let doc = "Number of results to skip (for pagination)" in
  Arg.(value & opt int 0 & info ["offset"; "o"] ~doc)

(** TODO:claude Query command *)
let query_cmd =
  let doc = "Search bushel collections in Typesense" in
  let man = [
    `S Manpage.s_description;
    `P "Search across all or specific bushel collections in Typesense.";
    `P "The API key can be provided via the --api-key flag or the TYPESENSE_API_KEY environment variable.";
    `P "If .typesense-url and .typesense-api files exist, they will be used for configuration.";
    `S Manpage.s_examples;
    `P "Search all collections:";
    `Pre "  bushel-typesense query \"machine learning\"";
    `P "Search a specific collection:";
    `Pre "  bushel-typesense query \"OCaml\" --collection papers";
    `P "Search with pagination:";
    `Pre "  bushel-typesense query \"AI\" --limit 5 --offset 10";
  ] in
  let info = Cmd.info "query" ~doc ~man in
  Cmd.v info Term.(const query $ endpoint $ api_key $ query_text $ collection $ limit $ offset)

(** TODO:claude List command *)
let list_cmd =
  let doc = "List all collections in Typesense" in
  let man = [
    `S Manpage.s_description;
    `P "List all available collections and their document counts.";
  ] in
  let info = Cmd.info "list" ~doc ~man in
  Cmd.v info Term.(const list $ endpoint $ api_key)

(** TODO:claude Upload command *)
let upload_cmd =
  let doc = "Upload bushel collections to Typesense search engine" in
  let man = [
    `S Manpage.s_description;
    `P "Upload all bushel object types (contacts, papers, projects, news, videos, notes, ideas) to a Typesense search engine instance.";
    `P "The API key can be provided via the --api-key flag or the TYPESENSE_API_KEY environment variable.";
    `S Manpage.s_examples;
    `P "Upload to a local Typesense instance:";
    `Pre "  bushel-typesense upload --api-key xyz123 --openai-key sk-abc... --data-dir /path/to/data";
    `P "Upload to a remote Typesense instance:";
    `Pre "  bushel-typesense upload --endpoint https://search.example.com --api-key xyz123 --openai-key sk-abc...";
  ] in
  let info = Cmd.info "upload" ~doc ~man in
  Cmd.v info Term.(const upload $ endpoint $ api_key $ openai_key $ data_dir)

(** TODO:claude Main command group *)
let main_cmd =
  let doc = "Bushel Typesense client" in
  let man = [
    `S Manpage.s_description;
    `P "Client for uploading to and querying Bushel collections in a Typesense search engine.";
    `S Manpage.s_commands;
    `S Manpage.s_common_options;
  ] in
  let info = Cmd.info "bushel-typesense" ~doc ~man in
  Cmd.group info [upload_cmd; query_cmd; list_cmd]

let () =
  (* Check for API keys in the environment if not provided on the command line *)
  let api_key_env = try Some (Sys.getenv "TYPESENSE_API_KEY") with Not_found -> None in
  let openai_key_env = try Some (Sys.getenv "OPENAI_API_KEY") with Not_found -> None in
  match api_key_env with
  | Some key when key <> "" ->
    (* Override the api_key argument default with the environment variable *)
    let api_key = Arg.(value & opt string key & info ["api-key"; "k"] ~doc:"Typesense API key") in
    let openai_key =
      match openai_key_env with
      | Some oa_key when oa_key <> "" ->
        Arg.(value & opt string oa_key & info ["openai-key"; "oa"] ~doc:"OpenAI API key")
      | _ -> openai_key
    in
    let upload_cmd =
      let doc = "Upload bushel collections to Typesense search engine" in
      let info = Cmd.info "upload" ~doc in
      Cmd.v info Term.(const upload $ endpoint $ api_key $ openai_key $ data_dir)
    in
    let query_cmd =
      let doc = "Search bushel collections in Typesense" in
      let info = Cmd.info "query" ~doc in
      Cmd.v info Term.(const query $ endpoint $ api_key $ query_text $ collection $ limit $ offset)
    in
    let list_cmd =
      let doc = "List all collections in Typesense" in
      let info = Cmd.info "list" ~doc in
      Cmd.v info Term.(const list $ endpoint $ api_key)
    in
    let main_cmd =
      let doc = "Bushel Typesense client" in
      let info = Cmd.info "bushel-typesense" ~doc in
      Cmd.group info [upload_cmd; query_cmd; list_cmd]
    in
    exit (Cmd.eval main_cmd)
  | _ -> exit (Cmd.eval main_cmd)
+138 stack/bushel/bin/bushel_video.ml
[@@@warning "-26-27-32"]

open Lwt.Infix
open Cmdliner

let setup_log style_renderer level =
  Fmt_tty.setup_std_outputs ?style_renderer ();
  Logs.set_level level;
  Logs.set_reporter (Logs_fmt.reporter ());
  ()

let process_videos output_dir overwrite base_url channel fetch_thumbs thumbs_dir =
  Peertube.fetch_all_channel_videos base_url channel >>= fun all_videos ->
  Logs.info (fun f -> f "Total videos: %d" (List.length all_videos));

  (* Create thumbnails directory if needed *)
  (if fetch_thumbs && not (Sys.file_exists thumbs_dir) then
     Unix.mkdir thumbs_dir 0o755);

  (* Process each video, fetching full details for complete descriptions *)
  Lwt_list.map_s
    (fun video ->
      (* Fetch complete video details to get full description *)
      Peertube.fetch_video_details base_url video.Peertube.uuid >>= fun full_video ->
      let (description, published_date, title, url, uuid, slug) =
        Peertube.to_bushel_video full_video
      in
      Logs.info (fun f -> f "Title: %s, URL: %s" title url);

      (* Download thumbnail if requested *)
      (if fetch_thumbs then
         let thumb_path = Filename.concat thumbs_dir (uuid ^ ".jpg") in
         Peertube.download_thumbnail base_url full_video thumb_path >>= fun result ->
         match result with
         | Ok () ->
           Logs.info (fun f -> f "Downloaded thumbnail for %s to %s" title thumb_path);
           Lwt.return_unit
         | Error (`Msg e) ->
           Logs.warn (fun f -> f "Failed to download thumbnail for %s: %s" title e);
           Lwt.return_unit
       else Lwt.return_unit)
      >>= fun () ->
      Lwt.return
        { Bushel.Video.description; published_date; title; url; uuid; slug;
          talk = false; paper = None; project = None; tags = full_video.tags })
    all_videos
  >>= fun vids ->

  (* Write video files *)
  Lwt_list.iter_s
    (fun video ->
      let file_path = Filename.concat output_dir (video.Bushel.Video.uuid ^ ".md") in
      let file_exists = Sys.file_exists file_path in
      if file_exists then
        try
          (* If the file exists, load it to preserve specific fields *)
          let existing_video = Bushel.Video.of_md file_path in
          (* Create a merged video with the preserved fields *)
          let merged_video =
            { video with
              tags = existing_video.tags;       (* Preserve existing tags *)
              paper = existing_video.paper;     (* Preserve paper field *)
              project = existing_video.project; (* Preserve project field *)
              talk = existing_video.talk;       (* Preserve talk field *)
            }
          in
          (* Write the merged video data *)
          if overwrite then
            match Bushel.Video.to_file output_dir merged_video with
            | Ok () ->
              Logs.info (fun f -> f "Updated video %s with preserved fields in %s"
                merged_video.Bushel.Video.title file_path);
              Lwt.return_unit
            | Error (`Msg e) ->
              Logs.err (fun f -> f "Failed to update video %s: %s"
                merged_video.Bushel.Video.title e);
              Lwt.return_unit
          else begin
            Logs.info (fun f -> f "Skipping existing video %s (use --overwrite to replace)"
              video.Bushel.Video.title);
            Lwt.return_unit
          end
        with _ ->
          (* If reading the existing file fails, proceed with the new data *)
          if overwrite then
            match Bushel.Video.to_file output_dir video with
            | Ok () ->
              Logs.info (fun f -> f "Wrote video %s to %s (existing file could not be read)"
                video.Bushel.Video.title file_path);
              Lwt.return_unit
            | Error (`Msg e) ->
              Logs.err (fun f -> f "Failed to write video %s: %s"
                video.Bushel.Video.title e);
              Lwt.return_unit
          else begin
            Logs.info (fun f -> f "Skipping existing video %s (use --overwrite to replace)"
              video.Bushel.Video.title);
            Lwt.return_unit
          end
      else
        (* If the file doesn't exist, just write the new data *)
        match Bushel.Video.to_file output_dir video with
        | Ok () ->
          Logs.info (fun f -> f "Wrote new video %s to %s"
            video.Bushel.Video.title file_path);
          Lwt.return_unit
        | Error (`Msg e) ->
          Logs.err (fun f -> f "Failed to write video %s: %s"
            video.Bushel.Video.title e);
          Lwt.return_unit)
    vids

(* Command line arguments are imported from Bushel_common *)

(* Export the term for use in the main bushel.ml *)
let term =
  let fetch_thumbs =
    let doc = "Download video thumbnails" in
    Arg.(value & flag & info ["fetch-thumbs"] ~doc)
  in
  let thumbs_dir =
    let doc = "Directory to save thumbnails (default: images/videos)" in
    Arg.(value & opt string "images/videos" & info ["thumbs-dir"] ~docv:"DIR" ~doc)
  in
  Term.(
    const (fun output_dir overwrite base_url channel fetch_thumbs thumbs_dir () ->
        Lwt_main.run (process_videos output_dir overwrite base_url channel fetch_thumbs thumbs_dir);
        0)
    $ Bushel_common.output_dir ~default:"."
    $ Bushel_common.overwrite
    $ Bushel_common.url_term ~default:"https://crank.recoil.org" ~doc:"PeerTube base URL"
    $ Bushel_common.channel ~default:"anil"
    $ fetch_thumbs
    $ thumbs_dir
    $ Bushel_common.setup_term)

let cmd =
  let doc = "Fetch and process videos from PeerTube" in
  let info = Cmd.info "video" ~doc in
  Cmd.v info term

(* Main entry point removed - accessed through bushel_main.ml *)
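The overwrite path above refreshes everything fetched from PeerTube while keeping the locally curated fields (`tags`, `paper`, `project`, `talk`) from the existing file. A minimal Python sketch of that merge policy, using dicts as hypothetical stand-ins for the video records:

```python
# Fields that are curated locally and must survive a re-fetch.
PRESERVED = ("tags", "paper", "project", "talk")

def merge_video(fetched: dict, existing: dict) -> dict:
    # Start from the freshly fetched metadata, then copy back curated fields.
    merged = dict(fetched)
    for field in PRESERVED:
        merged[field] = existing[field]
    return merged
```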
+81 stack/bushel/bin/bushel_video_thumbs.ml
[@@@warning "-26-27-32"]

open Lwt.Infix
open Cmdliner

let setup_log style_renderer level =
  Fmt_tty.setup_std_outputs ?style_renderer ();
  Logs.set_level level;
  Logs.set_reporter (Logs_fmt.reporter ());
  ()

let process_video_thumbs videos_dir thumbs_dir base_url =
  (* Ensure the thumbnail directory exists *)
  (if not (Sys.file_exists thumbs_dir) then
     Unix.mkdir thumbs_dir 0o755);

  (* Read all video markdown files *)
  let video_files =
    Sys.readdir videos_dir
    |> Array.to_list
    |> List.filter (fun f -> Filename.check_suffix f ".md")
    |> List.map (fun f -> Filename.concat videos_dir f)
  in

  Logs.info (fun f -> f "Found %d video files to process" (List.length video_files));

  (* Process each video file *)
  Lwt_list.iter_s
    (fun video_file ->
      try
        (* Load the existing video *)
        let video = Bushel.Video.of_md video_file in
        let uuid = video.Bushel.Video.uuid in
        Logs.info (fun f -> f "Processing video: %s (UUID: %s)" video.title uuid);

        (* Fetch video details from PeerTube to get thumbnail info *)
        Peertube.fetch_video_details base_url uuid >>= fun peertube_video ->

        (* Download the thumbnail *)
        let thumb_path = Filename.concat thumbs_dir (uuid ^ ".jpg") in
        Peertube.download_thumbnail base_url peertube_video thumb_path >>= fun result ->
        match result with
        | Ok () ->
          Logs.info (fun f -> f "Downloaded thumbnail for %s to %s" video.title thumb_path);
          (* Update the video file with the thumbnail_url field *)
          (match Peertube.thumbnail_url base_url peertube_video with
           | Some url ->
             Logs.info (fun f -> f "Thumbnail URL: %s" url);
             Lwt.return_unit
           | None ->
             Logs.warn (fun f -> f "No thumbnail URL for video %s" video.title);
             Lwt.return_unit)
        | Error (`Msg e) ->
          Logs.err (fun f -> f "Failed to download thumbnail for %s: %s" video.title e);
          Lwt.return_unit
      with exn ->
        Logs.err (fun f -> f "Error processing %s: %s" video_file (Printexc.to_string exn));
        Lwt.return_unit)
    video_files

let term =
  let videos_dir =
    let doc = "Directory containing video markdown files" in
    Arg.(value & opt string "data/videos" & info ["videos-dir"; "d"] ~docv:"DIR" ~doc)
  in
  let thumbs_dir =
    let doc = "Directory to save thumbnails" in
    Arg.(value & opt string "images/videos" & info ["thumbs-dir"; "t"] ~docv:"DIR" ~doc)
  in
  Term.(
    const (fun videos_dir thumbs_dir base_url () ->
        Lwt_main.run (process_video_thumbs videos_dir thumbs_dir base_url);
        0)
    $ videos_dir
    $ thumbs_dir
    $ Bushel_common.url_term ~default:"https://crank.recoil.org" ~doc:"PeerTube base URL"
    $ Bushel_common.setup_term)

let cmd =
  let doc = "Download thumbnails for existing videos and update metadata" in
  let info = Cmd.info "video-thumbs" ~doc in
  Cmd.v info term
+20 stack/bushel/bin/dune
(library
 (name bushel_common)
 (modules bushel_common)
 (libraries cmdliner fmt fmt.cli fmt.tty logs logs.cli logs.fmt))

(executable
 (name bushel_main)
 (public_name bushel)
 (package bushel)
 (modules bushel_main bushel_bibtex bushel_doi bushel_ideas bushel_info bushel_missing bushel_note_doi bushel_obsidian bushel_paper bushel_paper_classify bushel_paper_tex bushel_video bushel_video_thumbs bushel_thumbs bushel_faces bushel_links bushel_search)
 (flags (:standard -w -69))
 (libraries bushel bushel_common cmdliner cohttp-lwt-unix lwt.unix yaml ezjsonm zotero-translation peertube fmt fmt.cli fmt.tty logs logs.cli logs.fmt cmarkit karakeep uri unix ptime.clock.os crockford))

(executable
 (name bushel_typesense)
 (public_name bushel-typesense)
 (package bushel)
 (modules bushel_typesense)
 (flags (:standard -w -69))
 (libraries bushel bushel_common cmdliner lwt.unix))
+47 stack/bushel/bushel.opam
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "A webring but not as oldskool"
description: "This is all still a work in progress"
maintainer: ["anil@recoil.org"]
authors: ["Anil Madhavapeddy"]
license: "ISC"
homepage: "https://github.com/avsm/bushel"
bug-reports: "https://github.com/avsm/bushel/issues"
depends: [
  "dune" {>= "3.17"}
  "ocaml" {>= "5.2.0"}
  "uri"
  "cmarkit"
  "ezjsonm"
  "ptime"
  "jsont"
  "bytesrw"
  "jekyll-format"
  "yaml"
  "lwt"
  "cohttp-lwt-unix"
  "fmt"
  "peertube"
  "karakeep"
  "typesense-client"
  "cmdliner"
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
dev-repo: "git+https://github.com/avsm/bushel.git"
pin-depends: [
  [ "zotero-translation.dev" "git+https://github.com/avsm/zotero-translation.git" ]
]
+3 stack/bushel/bushel.opam.template
pin-depends: [
  [ "zotero-translation.dev" "git+https://github.com/avsm/zotero-translation.git" ]
]
+68 stack/bushel/dune-project
(lang dune 3.17)
(name bushel)

(source (github avsm/bushel))
(license ISC)
(authors "Anil Madhavapeddy")
(maintainers "anil@recoil.org")

(generate_opam_files true)

(package
 (name bushel)
 (synopsis "A webring but not as oldskool")
 (description "This is all still a work in progress")
 (depends
  (ocaml (>= "5.2.0"))
  uri
  cmarkit
  ezjsonm
  ptime
  jsont
  bytesrw
  jekyll-format
  yaml
  lwt
  cohttp-lwt-unix
  fmt
  peertube
  karakeep
  typesense-client
  cmdliner))

(package
 (name peertube)
 (synopsis "PeerTube API client")
 (description "Client for interacting with PeerTube instances")
 (depends
  (ocaml (>= "5.2.0"))
  ezjsonm
  lwt
  cohttp-lwt-unix
  ptime
  fmt))

(package
 (name karakeep)
 (synopsis "Karakeep API client for Bushel")
 (description "Karakeep API client to retrieve bookmarks from Karakeep instances")
 (depends
  (ocaml (>= "5.2.0"))
  ezjsonm
  lwt
  cohttp-lwt-unix
  ptime
  fmt))

(package
 (name typesense-client)
 (synopsis "Standalone Typesense client for OCaml")
 (description "A standalone Typesense client that can be compiled to JavaScript")
 (depends
  (ocaml (>= "5.2.0"))
  ezjsonm
  lwt
  cohttp-lwt-unix
  ptime
  fmt
  uri))
+35 stack/bushel/karakeep.opam
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "Karakeep API client for Bushel"
description:
  "Karakeep API client to retrieve bookmarks from Karakeep instances"
maintainer: ["anil@recoil.org"]
authors: ["Anil Madhavapeddy"]
license: "ISC"
homepage: "https://github.com/avsm/bushel"
bug-reports: "https://github.com/avsm/bushel/issues"
depends: [
  "dune" {>= "3.17"}
  "ocaml" {>= "5.2.0"}
  "ezjsonm"
  "lwt"
  "cohttp-lwt-unix"
  "ptime"
  "fmt"
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
dev-repo: "git+https://github.com/avsm/bushel.git"
+4 stack/bushel/karakeep/dune
(library
 (name karakeep)
 (public_name karakeep)
 (libraries bushel lwt cohttp cohttp-lwt-unix ezjsonm fmt ptime))
+568 stack/bushel/karakeep/karakeep.ml
+
(** Karakeep API client implementation *)
+
+
open Lwt.Infix
+
+
module J = Ezjsonm
+
+
(** Type representing a Karakeep bookmark *)
+
type bookmark = {
+
id: string;
+
title: string option;
+
url: string;
+
note: string option;
+
created_at: Ptime.t;
+
updated_at: Ptime.t option;
+
favourited: bool;
+
archived: bool;
+
tags: string list;
+
tagging_status: string option;
+
summary: string option;
+
content: (string * string) list;
+
assets: (string * string) list;
+
}
+
+
(** Type for Karakeep API response containing bookmarks *)
+
type bookmark_response = {
+
total: int;
+
data: bookmark list;
+
next_cursor: string option;
+
}
+
+
(** Parse a date string to Ptime.t, defaulting to epoch if invalid *)
+
let parse_date str =
+
match Ptime.of_rfc3339 str with
+
| Ok (date, _, _) -> date
+
| Error _ ->
+
Fmt.epr "Warning: could not parse date '%s'\n" str;
+
(* Default to epoch time *)
+
let span_opt = Ptime.Span.of_d_ps (0, 0L) in
+
match span_opt with
+
| None -> failwith "Internal error: couldn't create epoch time span"
+
| Some span ->
+
match Ptime.of_span span with
+
| Some t -> t
+
| None -> failwith "Internal error: couldn't create epoch time"
+
+
(** Extract a string field from JSON, returns None if not present or not a string *)
+
let get_string_opt json path =
+
try Some (J.find json path |> J.get_string)
+
with _ -> None
+
+
(** Extract a string list field from JSON, returns empty list if not present *)
+
let get_string_list json path =
+
try
+
let items_json = J.find json path in
+
J.get_list (fun tag -> J.find tag ["name"] |> J.get_string) items_json
+
with _ -> []
+
+
(** Extract a boolean field from JSON, with default value *)
+
let get_bool_def json path default =
+
try J.find json path |> J.get_bool
+
with _ -> default
+
+
(** Parse a single bookmark from Karakeep JSON *)
+
let parse_bookmark json =
+
(* Remove debug prints for production *)
+
(* Printf.eprintf "%s\n%!" (J.value_to_string json); *)
+
+
let id =
+
try J.find json ["id"] |> J.get_string
+
with e ->
+
prerr_endline (Fmt.str "Error parsing bookmark ID: %s" (Printexc.to_string e));
+
prerr_endline (Fmt.str "JSON: %s" (J.value_to_string json));
+
failwith "Unable to parse bookmark ID"
+
in
+
+
(* Title can be null *)
+
let title =
+
try Some (J.find json ["title"] |> J.get_string)
+
with _ -> None
+
in
+
(* Remove debug prints for production *)
+
(* Printf.eprintf "%s -> %s\n%!" id (match title with None -> "???" | Some v -> v); *)
+
(* Get URL - try all possible locations *)
+
let url =
+
try J.find json ["url"] |> J.get_string (* Direct url field *)
+
with _ -> try
+
J.find json ["content"; "url"] |> J.get_string (* Inside content.url *)
+
with _ -> try
+
J.find json ["content"; "sourceUrl"] |> J.get_string (* Inside content.sourceUrl *)
+
with _ ->
+
(* For assets/PDF type links *)
+
match J.find_opt json ["content"; "type"] with
+
| Some (`String "asset") ->
+
(* Extract URL from sourceUrl in content *)
+
(try J.find json ["content"; "sourceUrl"] |> J.get_string
+
with _ ->
+
(match J.find_opt json ["id"] with
+
| Some (`String id) -> "karakeep-asset://" ^ id
+
| _ -> failwith "No URL or asset ID found in bookmark"))
+
| _ ->
+
(* Debug output to understand what we're getting *)
+
prerr_endline (Fmt.str "Bookmark JSON structure: %s" (J.value_to_string json));
+
failwith "No URL found in bookmark"
+
in
+
+
let note = get_string_opt json ["note"] in
+
+
(* Parse dates *)
+
let created_at =
+
try J.find json ["createdAt"] |> J.get_string |> parse_date
+
with _ ->
+
try J.find json ["created_at"] |> J.get_string |> parse_date
+
with _ -> failwith "No creation date found"
+
in
+
+
let updated_at =
+
try Some (J.find json ["updatedAt"] |> J.get_string |> parse_date)
+
with _ ->
+
try Some (J.find json ["modifiedAt"] |> J.get_string |> parse_date)
+
with _ -> None
+
in
+
+
let favourited = get_bool_def json ["favourited"] false in
+
let archived = get_bool_def json ["archived"] false in
+
let tags = get_string_list json ["tags"] in
+
+
(* Extract additional metadata *)
+
let tagging_status = get_string_opt json ["taggingStatus"] in
+
let summary = get_string_opt json ["summary"] in
+
+
(* Extract content details *)
+
let content =
+
try
+
let content_json = J.find json ["content"] in
+
let rec extract_fields acc = function
+
| [] -> acc
+
| (k, v) :: rest ->
+
let value = match v with
+
| `String s -> s
+
| `Bool b -> string_of_bool b
+
| `Float f -> string_of_float f
+
| `Null -> "null"
+
| _ -> "complex_value" (* For objects and arrays *)
+
in
+
extract_fields ((k, value) :: acc) rest
+
in
+
match content_json with
+
| `O fields -> extract_fields [] fields
+
| _ -> []
+
with _ -> []
+
in
+
+
(* Extract assets *)
+
let assets =
+
try
+
let assets_json = J.find json ["assets"] in
+
J.get_list (fun asset_json ->
+
let id = J.find asset_json ["id"] |> J.get_string in
+
let asset_type =
+
try J.find asset_json ["assetType"] |> J.get_string
+
with _ -> "unknown"
+
in
+
(id, asset_type)
+
) assets_json
+
with _ -> []
+
in
+
+
{ id; title; url; note; created_at; updated_at; favourited; archived; tags;
+
tagging_status; summary; content; assets }
+
+
(** Parse a Karakeep bookmark response *)
+
let parse_bookmark_response json =
+
(* The response format is different based on endpoint, need to handle both structures *)
+
(* Print the whole JSON structure for debugging *)
+
prerr_endline (Fmt.str "Full response JSON: %s" (J.value_to_string json));
+
+
try
+
(* Standard list format with total count *)
+
let total = J.find json ["total"] |> J.get_int in
+
let bookmarks_json = J.find json ["data"] in
+
prerr_endline "Found bookmarks in data array";
+
let data = J.get_list parse_bookmark bookmarks_json in
+
+
(* Try to extract nextCursor if available *)
+
let next_cursor =
+
try Some (J.find json ["nextCursor"] |> J.get_string)
+
with _ -> None
+
in
+
+
{ total; data; next_cursor }
+
with e1 ->
+
prerr_endline (Fmt.str "First format parse error: %s" (Printexc.to_string e1));
+
try
+
(* Format with bookmarks array *)
+
let bookmarks_json = J.find json ["bookmarks"] in
+
prerr_endline "Found bookmarks in bookmarks array";
+
let data =
+
try J.get_list parse_bookmark bookmarks_json
+
with e ->
+
prerr_endline (Fmt.str "Error parsing bookmarks array: %s" (Printexc.to_string e));
+
prerr_endline (Fmt.str "First bookmark sample: %s"
+
(try J.value_to_string (List.hd (J.get_list (fun x -> x) bookmarks_json))
+
with _ -> "Could not extract sample"));
+
[]
+
in
+
+
(* Try to extract nextCursor if available *)
+
let next_cursor =
+
try Some (J.find json ["nextCursor"] |> J.get_string)
+
with _ -> None
+
in
+
+
{ total = List.length data; data; next_cursor }
+
with e2 ->
+
prerr_endline (Fmt.str "Second format parse error: %s" (Printexc.to_string e2));
+
try
+
(* Check if it's an error response *)
+
let error = J.find json ["error"] |> J.get_string in
+
let message =
+
try J.find json ["message"] |> J.get_string
+
with _ -> "Unknown error"
+
in
+
prerr_endline (Fmt.str "API Error: %s - %s" error message);
+
{ total = 0; data = []; next_cursor = None }
+
with _ ->
+
try
+
(* Alternate format without total (for endpoints like /tags/<id>/bookmarks) *)
+
prerr_endline "Trying alternate array format";
+
+
(* Debug the structure to identify the format *)
+
prerr_endline (Fmt.str "JSON structure keys: %s"
+
(match json with
+
| `O fields ->
+
String.concat ", " (List.map (fun (k, _) -> k) fields)
+
| _ -> "not an object"));
+
+
(* Check if it has a nextCursor but bookmarks are nested differently *)
+
if J.find_opt json ["nextCursor"] <> None then begin
+
prerr_endline "Found nextCursor, checking alternate structures";
+
+
(* Try different bookmark container paths *)
+
let bookmarks_json =
+
try Some (J.find json ["data"])
+
with _ -> None
+
in
+
+
match bookmarks_json with
+
| Some json_array ->
+
prerr_endline "Found bookmarks in data field";
+
begin try
+
let data = J.get_list parse_bookmark json_array in
+
let next_cursor =
+
try Some (J.find json ["nextCursor"] |> J.get_string)
+
with _ -> None
+
in
+
{ total = List.length data; data; next_cursor }
+
with e ->
+
prerr_endline (Fmt.str "Error parsing bookmarks from data: %s" (Printexc.to_string e));
+
{ total = 0; data = []; next_cursor = None }
+
end
+
| None ->
+
prerr_endline "No bookmarks found in alternate structure";
+
{ total = 0; data = []; next_cursor = None }
+
end
+
else begin
+
(* Check if it's an array at root level *)
+
match json with
+
| `A _ ->
+
let data =
+
try J.get_list parse_bookmark json
+
with e ->
+
prerr_endline (Fmt.str "Error parsing root array: %s" (Printexc.to_string e));
+
[]
+
in
+
{ total = List.length data; data; next_cursor = None }
+
| _ ->
+
prerr_endline "Not an array at root level";
+
{ total = 0; data = []; next_cursor = None }
+
end
+
with e3 ->
+
prerr_endline (Fmt.str "Third format parse error: %s" (Printexc.to_string e3));
+
{ total = 0; data = []; next_cursor = None }
+
+
(** Helper function to consume and return response body data *)
+
let consume_body body =
+
Cohttp_lwt.Body.to_string body >>= fun _ ->
+
Lwt.return_unit
+
+
(** Fetch bookmarks from a Karakeep instance with pagination support *)
+
let fetch_bookmarks ~api_key ?(limit=50) ?(offset=0) ?cursor ?(include_content=false) ?filter_tags base_url =
+
let open Cohttp_lwt_unix in
+
+
(* Base URL for bookmarks API *)
+
let url_base = Fmt.str "%s/api/v1/bookmarks?limit=%d&includeContent=%b"
+
base_url limit include_content in
+
+
(* Add pagination parameter - either cursor or offset *)
+
let url =
+
match cursor with
+
| Some cursor_value ->
+
url_base ^ "&cursor=" ^ cursor_value
+
| None ->
+
url_base ^ "&offset=" ^ string_of_int offset
+
in
+
+
(* Add tags filter if provided *)
+
let url = match filter_tags with
+
| Some tags when tags <> [] ->
+
(* URL encode each tag and join with commas *)
+
let encoded_tags =
+
List.map (fun tag ->
+
Uri.pct_encode ~component:`Query_key tag
+
) tags
+
in
+
let tags_param = String.concat "," encoded_tags in
+
prerr_endline (Fmt.str "Adding tags filter: %s" tags_param);
+
url ^ "&tags=" ^ tags_param
+
| _ -> url
+
in
+
+
(* Set up headers with API key *)
+
let headers = Cohttp.Header.init ()
+
|> fun h -> Cohttp.Header.add h "Authorization" ("Bearer " ^ api_key) in
+
+
prerr_endline (Fmt.str "Fetching bookmarks from: %s" url);
+
+
(* Make the request *)
+
Lwt.catch
+
(fun () ->
+
Client.get ~headers (Uri.of_string url) >>= fun (resp, body) ->
+
if resp.status = `OK then
+
Cohttp_lwt.Body.to_string body >>= fun body_str ->
+
prerr_endline (Fmt.str "Received %d bytes of response data" (String.length body_str));
+
+
Lwt.catch
+
(fun () ->
+
let json = J.from_string body_str in
+
Lwt.return (parse_bookmark_response json)
+
)
+
(fun e ->
+
prerr_endline (Fmt.str "JSON parsing error: %s" (Printexc.to_string e));
+
prerr_endline (Fmt.str "Response body (first 200 chars): %s"
+
(if String.length body_str > 200 then String.sub body_str 0 200 ^ "..." else body_str));
+
Lwt.fail e
+
)
+
else
+
let status_code = Cohttp.Code.code_of_status resp.status in
+
consume_body body >>= fun _ ->
+
prerr_endline (Fmt.str "HTTP error %d" status_code);
+
Lwt.fail_with (Fmt.str "HTTP error: %d" status_code)
+
)
+
(fun e ->
+
prerr_endline (Fmt.str "Network error: %s" (Printexc.to_string e));
+
Lwt.fail e
+
)
+
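The query assembled in `fetch_bookmarks` above can be sketched with the stdlib alone. `bookmarks_url` is a hypothetical helper, not part of the client, and it omits the percent-encoding that the real code applies to tags via `Uri.pct_encode`:

```ocaml
(* Stdlib-only sketch of the URL construction in fetch_bookmarks:
   cursor-based pagination takes precedence over offset, and a non-empty
   tag list appends a comma-joined &tags= parameter. *)
let bookmarks_url ?cursor ?(offset = 0) ?(tags = []) ~limit ~include_content base =
  let url =
    Printf.sprintf "%s/api/v1/bookmarks?limit=%d&includeContent=%b"
      base limit include_content
  in
  let url =
    match cursor with
    | Some c -> url ^ "&cursor=" ^ c
    | None -> url ^ "&offset=" ^ string_of_int offset
  in
  match tags with
  | [] -> url
  | ts -> url ^ "&tags=" ^ String.concat "," ts
```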
+
(** Fetch all bookmarks from a Karakeep instance using pagination *)
+
let fetch_all_bookmarks ~api_key ?(page_size=50) ?max_pages ?filter_tags ?(include_content=false) base_url =
+
let rec fetch_pages page_num cursor acc _total_count =
+
(* Use cursor if available, otherwise use offset-based pagination *)
+
(match cursor with
+
| Some cursor_str -> fetch_bookmarks ~api_key ~limit:page_size ~cursor:cursor_str ~include_content ?filter_tags base_url
+
| None -> fetch_bookmarks ~api_key ~limit:page_size ~offset:(page_num * page_size) ~include_content ?filter_tags base_url)
+
>>= fun response ->
+
+
let all_bookmarks = acc @ response.data in
+
+
(* Determine if we need to fetch more pages *)
+
let more_available =
+
match response.next_cursor with
+
| Some _ -> true (* We have a cursor, so there are more results *)
+
| None ->
+
(* Fall back to offset-based check *)
+
let fetched_count = (page_num * page_size) + List.length response.data in
+
fetched_count < response.total
+
in
+
+
let under_max_pages = match max_pages with
+
| None -> true
+
| Some max -> page_num + 1 < max
+
in
+
+
if more_available && under_max_pages then
+
fetch_pages (page_num + 1) response.next_cursor all_bookmarks response.total
+
else
+
Lwt.return all_bookmarks
+
in
+
fetch_pages 0 None [] 0
+
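The stopping rule in `fetch_all_bookmarks` is: a server-supplied cursor always implies another page; otherwise fall back to comparing the fetched count against the reported total. A minimal pure restatement (`more_pages` is a hypothetical name used only for this sketch):

```ocaml
(* Sketch of the pagination stopping rule above: a next_cursor means the
   server says more results exist; without one, count what we have fetched
   so far against the reported total. *)
let more_pages ~next_cursor ~page_num ~page_size ~fetched ~total =
  match next_cursor with
  | Some _ -> true
  | None -> (page_num * page_size) + fetched < total
```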
+
(** Fetch detailed information for a single bookmark by ID *)
+
let fetch_bookmark_details ~api_key base_url bookmark_id =
+
let open Cohttp_lwt_unix in
+
let url = Fmt.str "%s/api/v1/bookmarks/%s" base_url bookmark_id in
+
+
(* Set up headers with API key *)
+
let headers = Cohttp.Header.init_with "Authorization" ("Bearer " ^ api_key) in
+
+
Client.get ~headers (Uri.of_string url) >>= fun (resp, body) ->
+
if resp.status = `OK then
+
Cohttp_lwt.Body.to_string body >>= fun body_str ->
+
let json = J.from_string body_str in
+
Lwt.return (parse_bookmark json)
+
else
+
let status_code = Cohttp.Code.code_of_status resp.status in
+
consume_body body >>= fun () ->
+
Lwt.fail_with (Fmt.str "HTTP error: %d" status_code)
+
+
(** Get the asset URL for a given asset ID *)
+
let get_asset_url base_url asset_id =
+
Fmt.str "%s/api/assets/%s" base_url asset_id
+
+
(** Fetch an asset from the Karakeep server as a binary string *)
+
let fetch_asset ~api_key base_url asset_id =
+
let open Cohttp_lwt_unix in
+
+
let url = get_asset_url base_url asset_id in
+
+
(* Set up headers with API key *)
+
let headers = Cohttp.Header.init_with "Authorization" ("Bearer " ^ api_key) in
+
+
Client.get ~headers (Uri.of_string url) >>= fun (resp, body) ->
+
if resp.status = `OK then
+
Cohttp_lwt.Body.to_string body
+
else
+
let status_code = Cohttp.Code.code_of_status resp.status in
+
consume_body body >>= fun () ->
+
Lwt.fail_with (Fmt.str "Asset fetch error: %d" status_code)
+
+
(** Create a new bookmark in Karakeep with optional tags *)
+
let create_bookmark ~api_key ~url ?title ?note ?tags ?(favourited=false) ?(archived=false) base_url =
+
let open Cohttp_lwt_unix in
+
+
(* Prepare the bookmark request body *)
+
let body_obj = [
+
("type", `String "link");
+
("url", `String url);
+
("favourited", `Bool favourited);
+
("archived", `Bool archived);
+
] in
+
+
(* Add optional fields *)
+
let body_obj = match title with
+
| Some title_str -> ("title", `String title_str) :: body_obj
+
| None -> body_obj
+
in
+
+
let body_obj = match note with
+
| Some note_str -> ("note", `String note_str) :: body_obj
+
| None -> body_obj
+
in
+
+
(* Convert to JSON *)
+
let body_json = `O body_obj in
+
let body_str = J.to_string body_json in
+
+
(* Set up headers with API key *)
+
let headers =
+
Cohttp.Header.of_list
+
[ "Authorization", "Bearer " ^ api_key; "Content-Type", "application/json" ]
+
in
+
+
+
(* Create the bookmark *)
+
let url_endpoint = Fmt.str "%s/api/v1/bookmarks" base_url in
+
Client.post ~headers ~body:(Cohttp_lwt.Body.of_string body_str) (Uri.of_string url_endpoint) >>= fun (resp, body) ->
+
+
if resp.status = `Created || resp.status = `OK then
+
Cohttp_lwt.Body.to_string body >>= fun body_str ->
+
let json = J.from_string body_str in
+
let bookmark = parse_bookmark json in
+
+
(* If tags are provided, add them to the bookmark *)
+
(match tags with
+
| Some tag_list when tag_list <> [] ->
+
(* Prepare the tags request body *)
+
let tag_objects = List.map (fun tag_name ->
+
`O [("tagName", `String tag_name)]
+
) tag_list in
+
+
let tags_body = `O [("tags", `A tag_objects)] in
+
let tags_body_str = J.to_string tags_body in
+
+
(* Add tags to the bookmark *)
+
let tags_url = Fmt.str "%s/api/v1/bookmarks/%s/tags" base_url bookmark.id in
+
Client.post ~headers ~body:(Cohttp_lwt.Body.of_string tags_body_str) (Uri.of_string tags_url) >>= fun (resp, body) ->
+
+
(* Always consume the response body *)
+
consume_body body >>= fun () ->
+
+
if resp.status = `OK then
+
(* Fetch the bookmark again to get updated tags *)
+
fetch_bookmark_details ~api_key base_url bookmark.id
+
else
+
(* Return the bookmark without tags if tag addition failed *)
+
Lwt.return bookmark
+
| _ -> Lwt.return bookmark)
+
else
+
let status_code = Cohttp.Code.code_of_status resp.status in
+
Cohttp_lwt.Body.to_string body >>= fun error_body ->
+
Lwt.fail_with (Fmt.str "Failed to create bookmark. HTTP error: %d. Details: %s" status_code error_body)
+
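The tag-attachment step above posts a body of the shape `{"tags":[{"tagName": ...}]}`. A rough stdlib-only sketch of that payload, hand-rolled with `Printf` instead of Ezjsonm and with no JSON escaping (illustration only; `tags_payload` is a hypothetical name):

```ocaml
(* Illustrative shape of the tags request body; the real code builds this
   with Ezjsonm, which also handles string escaping. *)
let tags_payload tag_names =
  tag_names
  |> List.map (fun name -> Printf.sprintf "{\"tagName\":\"%s\"}" name)
  |> String.concat ","
  |> Printf.sprintf "{\"tags\":[%s]}"
```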
+
(** Convert a Karakeep bookmark to Bushel.Link.t compatible structure *)
+
let to_bushel_link ?base_url bookmark =
+
(* Try to find the best title from multiple possible sources *)
+
let description =
+
match bookmark.title with
+
| Some title when title <> "" -> title
+
| _ ->
+
(* Check if there's a title in the content *)
+
let content_title = List.assoc_opt "title" bookmark.content in
+
(match content_title with
+
| Some title when title <> "" && title <> "null" -> title
+
| _ -> bookmark.url)
+
in
+
let date = Ptime.to_date bookmark.created_at in
+
+
(* Build selective metadata - only include useful fields *)
+
let metadata =
+
(match bookmark.summary with Some s -> [("summary", s)] | None -> []) @
+
(* Extract key asset IDs *)
+
(List.filter_map (fun (id, asset_type) ->
+
match asset_type with
+
| "screenshot" | "bannerImage" -> Some (asset_type, id)
+
| _ -> None
+
) bookmark.assets) @
+
(* Extract only the favicon from content *)
+
(List.filter_map (fun (k, v) ->
+
if k = "favicon" && v <> "" && v <> "null" then Some ("favicon", v) else None
+
) bookmark.content)
+
in
+
+
(* Create karakeep data if base_url is provided *)
+
let karakeep =
+
match base_url with
+
| Some url ->
+
Some {
+
Bushel.Link.remote_url = url;
+
id = bookmark.id;
+
tags = bookmark.tags;
+
metadata = metadata;
+
}
+
| None -> None
+
in
+
+
(* Extract bushel slugs from tags *)
+
let bushel_slugs =
+
List.filter_map (fun tag ->
+
if String.starts_with ~prefix:"bushel:" tag then
+
Some (String.sub tag 7 (String.length tag - 7))
+
else
+
None
+
) bookmark.tags
+
in
+
+
(* Create bushel data if we have bushel-related information *)
+
let bushel =
+
if bushel_slugs = [] then None
+
else Some { Bushel.Link.slugs = bushel_slugs; tags = [] }
+
in
+
+
{ Bushel.Link.url = bookmark.url; date; description; karakeep; bushel }
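The `bushel:`-prefix slug extraction above can be exercised standalone. A sketch assuming OCaml >= 4.13 for `String.starts_with`:

```ocaml
(* Extract the slug part of every "bushel:"-prefixed tag; all other tags
   are dropped. "bushel:" is 7 characters, hence the offsets below. *)
let bushel_slugs tags =
  List.filter_map
    (fun tag ->
      if String.starts_with ~prefix:"bushel:" tag
      then Some (String.sub tag 7 (String.length tag - 7))
      else None)
    tags
```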
+123
stack/bushel/karakeep/karakeep.mli
···
+
(** Karakeep API client interface *)
+
+
(** Type representing a Karakeep bookmark *)
+
type bookmark = {
+
id: string;
+
title: string option;
+
url: string;
+
note: string option;
+
created_at: Ptime.t;
+
updated_at: Ptime.t option;
+
favourited: bool;
+
archived: bool;
+
tags: string list;
+
tagging_status: string option;
+
summary: string option;
+
content: (string * string) list;
+
assets: (string * string) list;
+
}
+
+
(** Type for Karakeep API response containing bookmarks *)
+
type bookmark_response = {
+
total: int;
+
data: bookmark list;
+
next_cursor: string option;
+
}
+
+
(** Parse a single bookmark from Karakeep JSON *)
+
val parse_bookmark : Ezjsonm.value -> bookmark
+
+
(** Parse a Karakeep bookmark response *)
+
val parse_bookmark_response : Ezjsonm.value -> bookmark_response
+
+
(** Fetch bookmarks from a Karakeep instance with pagination support
+
@param api_key API key for authentication
+
@param limit Number of bookmarks to fetch per page (default: 50)
+
@param offset Starting index for pagination (0-based) (default: 0)
+
@param cursor Optional pagination cursor for cursor-based pagination (overrides offset when provided)
+
@param include_content Whether to include full content (default: false)
+
@param filter_tags Optional list of tags to filter by
+
@param base_url Base URL of the Karakeep instance
+
@return A Lwt promise with the bookmark response *)
+
val fetch_bookmarks :
+
api_key:string ->
+
?limit:int ->
+
?offset:int ->
+
?cursor:string ->
+
?include_content:bool ->
+
?filter_tags:string list ->
+
string ->
+
bookmark_response Lwt.t
+
+
(** Fetch all bookmarks from a Karakeep instance using pagination
+
@param api_key API key for authentication
+
@param page_size Number of bookmarks to fetch per page (default: 50)
+
@param max_pages Maximum number of pages to fetch (None for all pages)
+
@param filter_tags Optional list of tags to filter by
+
@param include_content Whether to include full content (default: false)
+
@param base_url Base URL of the Karakeep instance
+
@return A Lwt promise with all bookmarks combined *)
+
val fetch_all_bookmarks :
+
api_key:string ->
+
?page_size:int ->
+
?max_pages:int ->
+
?filter_tags:string list ->
+
?include_content:bool ->
+
string ->
+
bookmark list Lwt.t
+
+
(** Fetch detailed information for a single bookmark by ID
+
@param api_key API key for authentication
+
@param base_url Base URL of the Karakeep instance
+
@param bookmark_id ID of the bookmark to fetch
+
@return A Lwt promise with the complete bookmark details *)
+
val fetch_bookmark_details :
+
api_key:string ->
+
string ->
+
string ->
+
bookmark Lwt.t
+
+
(** Convert a Karakeep bookmark to Bushel.Link.t compatible structure
+
@param base_url Optional base URL of the Karakeep instance (stored as the karakeep remote_url) *)
+
val to_bushel_link : ?base_url:string -> bookmark -> Bushel.Link.t
+
+
(** Fetch an asset from the Karakeep server as a binary string
+
@param api_key API key for authentication
+
@param base_url Base URL of the Karakeep instance
+
@param asset_id ID of the asset to fetch
+
@return A Lwt promise with the binary asset data *)
+
val fetch_asset :
+
api_key:string ->
+
string ->
+
string ->
+
string Lwt.t
+
+
(** Get the asset URL for a given asset ID
+
@param base_url Base URL of the Karakeep instance
+
@param asset_id ID of the asset
+
@return The full URL to the asset *)
+
val get_asset_url :
+
string ->
+
string ->
+
string
+
+
(** Create a new bookmark in Karakeep with optional tags
+
@param api_key API key for authentication
+
@param url The URL to bookmark
+
@param title Optional title for the bookmark
+
@param note Optional note to add to the bookmark
+
@param tags Optional list of tag names to add to the bookmark
+
@param favourited Whether the bookmark should be marked as favourite (default: false)
+
@param archived Whether the bookmark should be archived (default: false)
+
@param base_url Base URL of the Karakeep instance
+
@return A Lwt promise with the created bookmark *)
+
val create_bookmark :
+
api_key:string ->
+
url:string ->
+
?title:string ->
+
?note:string ->
+
?tags:string list ->
+
?favourited:bool ->
+
?archived:bool ->
+
string ->
+
bookmark Lwt.t
+79
stack/bushel/lib/bushel.ml
···
+
module Contact = Contact
+
module Idea = Idea
+
module Note = Note
+
module Paper = Paper
+
module Project = Project
+
module Video = Video
+
module Tags = Tags
+
module Link = Link
+
module Entry = Entry
+
module Util = Util
+
module Srcsetter = Srcsetter
+
module Md = Md
+
module Typesense = Typesense
+
module Link_graph = Link_graph
+
module Description = Description
+
module Doi_entry = Doi_entry
+
+
let map_md base subdir fn =
+
let dir = base ^ "/data/" ^ subdir in
+
Sys.readdir dir
+
|> Array.to_list
+
|> List.filter (fun f -> Filename.check_suffix f ".md")
+
|> List.map (fun e -> fn dir e)
+
;;
+
+
let map_category base c fn = map_md base c (fun dir e -> fn @@ Filename.concat dir e)
+
let dbg l = Printf.eprintf "loading %s\n%!" l
+
+
let load_contacts base = dbg "contacts"; map_category base "contacts" Contact.of_md
+
let load_projects base = dbg "projects"; map_category base "projects" Project.of_md
+
let load_notes base =
+
dbg "notes";
+
let notes_from_notes = map_category base "notes" Note.of_md in
+
let notes_from_news = map_category base "news" Note.of_md in
+
notes_from_notes @ notes_from_news
+
let load_ideas base = dbg "ideas"; map_category base "ideas" Idea.of_md
+
let load_videos base = dbg "videos"; map_category base "videos" Video.of_md
+
+
let load_images base =
+
Printf.eprintf "load images %s/images\n%!" base;
+
try
+
Srcsetter.list_of_json (Util.read_file (base ^ "/images/index.json")) |> Result.get_ok
+
with
+
| _ -> [] (* FIXME log *)
+
;;
+
+
let load_papers base =
+
Printf.eprintf "load papers %s/data/papers\n%!" base;
+
Sys.readdir (base ^ "/data/papers")
+
|> Array.to_list
+
|> List.filter (fun slug -> Sys.is_directory (base ^ "/data/papers/" ^ slug))
+
|> List.map (fun slug ->
+
Sys.readdir (base ^ "/data/papers/" ^ slug)
+
|> Array.to_list
+
|> List.filter (fun ver -> Filename.check_suffix ver ".md")
+
|> List.map (fun ver ->
+
let ver = Filename.chop_extension ver in
+
Paper.of_md ~slug ~ver (base ^ "/data/papers/" ^ slug ^ "/" ^ ver ^ ".md")))
+
|> List.flatten
+
|> Paper.tv
+
;;
+
+
let load base =
+
let images = load_images base in
+
let papers = load_papers base in
+
let contacts = load_contacts base in
+
let projects = load_projects base in
+
let notes = load_notes base in
+
let ideas = load_ideas base in
+
let videos = load_videos base in
+
let entries = Entry.v ~images ~papers ~notes ~projects ~ideas ~videos ~contacts ~data_dir:(base ^ "/data") in
+
(* Build link graph *)
+
Printf.eprintf "Building link_graph...\n%!";
+
let graph = Link_graph.build_link_graph entries in
+
Fmt.epr "%a@." Link_graph.pp_graph graph;
+
Link_graph.set_graph graph;
+
entries
+
;;
+
+27
stack/bushel/lib/bushel.mli
···
+
(** Bushel *)
+
+
module Contact = Contact
+
module Idea = Idea
+
module Note = Note
+
module Paper = Paper
+
module Project = Project
+
module Video = Video
+
module Tags = Tags
+
module Link = Link
+
module Entry = Entry
+
module Util = Util
+
module Md = Md
+
module Srcsetter = Srcsetter
+
module Typesense = Typesense
+
module Link_graph = Link_graph
+
module Description = Description
+
module Doi_entry = Doi_entry
+
+
val load_contacts : string -> Contact.ts
+
val load_projects : string -> Project.ts
+
val load_notes : string -> Note.ts
+
val load_ideas : string -> Idea.ts
+
val load_videos : string -> Video.ts
+
val load_images : string -> Srcsetter.ts
+
val load_papers : string -> Paper.ts
+
val load : string -> Entry.t
+172
stack/bushel/lib/contact.ml
···
+
type t =
+
{ names : string list
+
; handle : string
+
; email : string option
+
; icon : string option
+
; github : string option
+
; twitter : string option
+
; bluesky : string option
+
; mastodon : string option
+
; orcid : string option
+
; url : string option
+
; atom : string list option
+
}
+
+
type ts = t list
+
+
let v ?email ?github ?twitter ?bluesky ?mastodon ?orcid ?icon ?url ?atom handle names =
+
{ names; handle; email; github; twitter; bluesky; mastodon; orcid; url; icon; atom }
+
;;
+
+
let make names email icon github twitter bluesky mastodon orcid url atom =
+
v ?email ?github ?twitter ?bluesky ?mastodon ?orcid ?icon ?url ?atom "" names
+
;;
+
+
let names { names; _ } = names
+
let name { names; _ } = List.hd names
+
let handle { handle; _ } = handle
+
let email { email; _ } = email
+
let icon { icon; _ } = icon
+
let github { github; _ } = github
+
let twitter { twitter; _ } = twitter
+
let bluesky { bluesky; _ } = bluesky
+
let mastodon { mastodon; _ } = mastodon
+
let orcid { orcid; _ } = orcid
+
let url { url; _ } = url
+
let atom { atom; _ } = atom
+
+
let json_t =
+
let open Jsont in
+
let open Jsont.Object in
+
let mem_opt f v ~enc = mem f v ~dec_absent:None ~enc_omit:Option.is_none ~enc in
+
map ~kind:"Contact" make
+
|> mem "names" (list string) ~dec_absent:[] ~enc:names
+
|> mem_opt "email" (some string) ~enc:email
+
|> mem_opt "icon" (some string) ~enc:icon
+
|> mem_opt "github" (some string) ~enc:github
+
|> mem_opt "twitter" (some string) ~enc:twitter
+
|> mem_opt "bluesky" (some string) ~enc:bluesky
+
|> mem_opt "mastodon" (some string) ~enc:mastodon
+
|> mem_opt "orcid" (some string) ~enc:orcid
+
|> mem_opt "url" (some string) ~enc:url
+
|> mem_opt "atom" (some (list string)) ~enc:atom
+
|> finish
+
;;
+
+
let v = Jsont_bytesrw.decode_string (Jsont.list json_t)
+
let compare a b = String.compare a.handle b.handle
+
let find_by_handle ts h = List.find_opt (fun { handle; _ } -> handle = h) ts
+
+
let best_url c =
+
match c.url with
+
| Some v -> Some v
+
| None ->
+
(match c.github with
+
| Some v -> Some ("https://github.com/" ^ v)
+
| None ->
+
(match c.email with
+
| Some v -> Some ("mailto:" ^ v)
+
| None -> None))
+
;;
+
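The nested fallback in `best_url` flattens to a single match: an explicit `url` wins, then a GitHub profile URL, then a `mailto:` link. A standalone sketch of the same chain:

```ocaml
(* Fallback chain for a contact's best URL, as a single pattern match. *)
let best_url ~url ~github ~email =
  match url, github, email with
  | Some u, _, _ -> Some u
  | None, Some g, _ -> Some ("https://github.com/" ^ g)
  | None, None, Some e -> Some ("mailto:" ^ e)
  | None, None, None -> None
```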
+
let of_md fname =
+
(* TODO fix Jekyll_post to not error on no date *)
+
let fname' = "2000-01-01-" ^ Filename.basename fname in
+
let handle = Filename.basename fname |> Filename.chop_extension in
+
match Jekyll_post.of_string ~fname:fname' (Util.read_file fname) with
+
| Error (`Msg m) -> failwith ("contact_of_md: " ^ m)
+
| Ok jp ->
+
let fields = jp.Jekyll_post.fields |> Jekyll_format.fields_to_yaml in
+
let c = Jsont_bytesrw.decode_string json_t (Ezjsonm.value_to_string fields) in
+
(match c with
+
| Error e -> failwith e
+
| Ok c -> { c with handle })
+
;;
+
+
(* Given a name, turn it lowercase and return the concatenation of the
+
initials of all the words in the name and the full last name. *)
+
let handle_of_name name =
+
let name = String.lowercase_ascii name in
+
let words = String.split_on_char ' ' name |> List.filter (fun w -> w <> "") in
+
let initials = String.concat "" (List.map (fun w -> String.sub w 0 1) words) in
+
initials ^ List.hd (List.rev words)
+
;;
+
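Restated standalone, the handle scheme is: lowercase the name, take the initial of every word, then append the full last word. This sketch also drops empty words produced by doubled spaces:

```ocaml
(* "Grace Hopper" -> initials "gh" ^ last word "hopper" = "ghhopper" *)
let handle_of_name name =
  let words =
    String.split_on_char ' ' (String.lowercase_ascii name)
    |> List.filter (fun w -> w <> "")
  in
  let initials = String.concat "" (List.map (fun w -> String.sub w 0 1) words) in
  initials ^ List.hd (List.rev words)
```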
+
(* Fuzzy lookup for an author: case-insensitive match against every known
+
name; fails if no match is found or the name is ambiguous *)
+
let lookup_by_name ts a =
+
let a = String.lowercase_ascii a in
+
let rec aux acc = function
+
| [] -> acc
+
| t :: ts ->
+
if List.exists (fun n -> String.lowercase_ascii n = a) t.names
+
then aux (t :: acc) ts
+
else aux acc ts
+
in
+
match aux [] ts with
+
| [ a ] -> a
+
| [] -> raise (Failure ("contact.ml: author not found: " ^ a))
+
| _ -> raise (Failure ("ambiguous author: " ^ a))
+
;;
+
+
(* TODO:claude *)
+
let typesense_schema =
+
let open Ezjsonm in
+
dict [
+
("name", string "contacts");
+
("fields", list (fun d -> dict d) [
+
[("name", string "id"); ("type", string "string")];
+
[("name", string "handle"); ("type", string "string")];
+
[("name", string "name"); ("type", string "string")];
+
[("name", string "names"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "email"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "icon"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "github"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "twitter"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "bluesky"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "mastodon"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "orcid"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "url"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "atom"); ("type", string "string[]"); ("optional", bool true)];
+
]);
+
]
+
+
(** TODO:claude Pretty-print a contact with ANSI formatting *)
+
let pp ppf c =
+
let open Fmt in
+
pf ppf "@[<v>";
+
pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Contact";
+
pf ppf "%a: @@%a@," (styled `Bold string) "Handle" string (handle c);
+
pf ppf "%a: %a@," (styled `Bold string) "Name" string (name c);
+
let ns = names c in
+
if List.length ns > 1 then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Aliases" (list ~sep:comma string) (List.tl ns);
+
(match email c with
+
| Some e -> pf ppf "%a: %a@," (styled `Bold string) "Email" string e
+
| None -> ());
+
(match github c with
+
| Some g -> pf ppf "%a: https://github.com/%a@," (styled `Bold string) "GitHub" string g
+
| None -> ());
+
(match twitter c with
+
| Some t -> pf ppf "%a: https://twitter.com/%a@," (styled `Bold string) "Twitter" string t
+
| None -> ());
+
(match bluesky c with
+
| Some b -> pf ppf "%a: %a@," (styled `Bold string) "Bluesky" string b
+
| None -> ());
+
(match mastodon c with
+
| Some m -> pf ppf "%a: %a@," (styled `Bold string) "Mastodon" string m
+
| None -> ());
+
(match orcid c with
+
| Some o -> pf ppf "%a: https://orcid.org/%a@," (styled `Bold string) "ORCID" string o
+
| None -> ());
+
(match url c with
+
| Some u -> pf ppf "%a: %a@," (styled `Bold string) "URL" string u
+
| None -> ());
+
(match icon c with
+
| Some i -> pf ppf "%a: %a@," (styled `Bold string) "Icon" string i
+
| None -> ());
+
(match atom c with
+
| Some atoms when atoms <> [] ->
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Atom Feeds" (list ~sep:comma string) atoms
+
| _ -> ());
+
pf ppf "@]"
+25
stack/bushel/lib/contact.mli
···
+
type t
+
type ts = t list
+
+
val v : string -> (ts, string) result
+
val names : t -> string list
+
val name : t -> string
+
val handle : t -> string
+
val email : t -> string option
+
val icon : t -> string option
+
val github : t -> string option
+
val twitter : t -> string option
+
val bluesky : t -> string option
+
val mastodon : t -> string option
+
val orcid : t -> string option
+
val url : t -> string option
+
val atom : t -> string list option
+
val best_url : t -> string option
+
val find_by_handle : t list -> string -> t option
+
val handle_of_name : string -> string
+
val lookup_by_name : ts -> string -> t
+
val json_t : t Jsont.t
+
val compare : t -> t -> int
+
val of_md : string -> t
+
val typesense_schema : Ezjsonm.value
+
val pp : Format.formatter -> t -> unit
+72
stack/bushel/lib/description.ml
···
+
(** Generate descriptive text for bushel entries *)
+
+
(* Helper to format a date as "Month Year" *)
+
let format_date date =
+
let (year, month, _day) = date in
+
let month_name = match month with
+
| 1 -> "January" | 2 -> "February" | 3 -> "March" | 4 -> "April"
+
| 5 -> "May" | 6 -> "June" | 7 -> "July" | 8 -> "August"
+
| 9 -> "September" | 10 -> "October" | 11 -> "November" | 12 -> "December"
+
| _ -> ""
+
in
+
Printf.sprintf "%s %d" month_name year
+
+
(* Generate a descriptive sentence for a paper *)
+
let paper_description (p : Paper.t) ~date_str =
+
let venue = match String.lowercase_ascii (Paper.bibtype p) with
+
| "inproceedings" -> Paper.booktitle p
+
| "article" -> Paper.journal p
+
| "book" ->
+
let pub = Paper.publisher p in
+
if pub = "" then "Book" else "Book by " ^ pub
+
| "techreport" ->
+
(try "Technical report at " ^ Paper.institution p
+
with _ -> "Technical report")
+
| "misc" ->
+
let pub = Paper.publisher p in
+
if pub = "" then "Working paper" else "Working paper at " ^ pub
+
| _ -> "Publication"
+
in
+
Printf.sprintf "Paper in %s (%s)" venue date_str
+
+
(* Generate a descriptive sentence for a note *)
+
let note_description (n : Note.t) ~date_str ~lookup_fn =
+
match Note.slug_ent n with
+
| Some slug_ent ->
+
(match lookup_fn slug_ent with
+
| Some related_title ->
+
Printf.sprintf "Note about %s (%s)" related_title date_str
+
| None -> Printf.sprintf "Research note (%s)" date_str)
+
| None -> Printf.sprintf "Research note (%s)" date_str
+
+
(* Generate a descriptive sentence for an idea *)
+
let idea_description (i : Idea.t) ~date_str =
+
let status_str = String.lowercase_ascii (Idea.status_to_string (Idea.status i)) in
+
let level_str = Idea.level_to_string (Idea.level i) in
+
Printf.sprintf "Research idea (%s, %s level, %s)" status_str level_str date_str
+
+
(* Generate a descriptive sentence for a video *)
+
let video_description (v : Video.t) ~date_str ~lookup_fn =
+
let video_type = if Video.talk v then "Talk video" else "Video" in
+
let context = match Video.paper v with
+
| Some paper_slug ->
+
(match lookup_fn paper_slug with
+
| Some title -> Printf.sprintf " about %s" title
+
| None -> "")
+
| None ->
+
(match Video.project v with
+
| Some project_slug ->
+
(match lookup_fn project_slug with
+
| Some title -> Printf.sprintf " about %s" title
+
| None -> "")
+
| None -> "")
+
in
+
Printf.sprintf "%s%s (%s)" video_type context date_str
+
+
(* Generate a descriptive sentence for a project *)
+
let project_description (pr : Project.t) =
+
let end_str = match pr.Project.finish with
+
| Some year -> string_of_int year
+
| None -> "present"
+
in
+
Printf.sprintf "Project (%d–%s)" pr.Project.start end_str
+19
stack/bushel/lib/description.mli
···
+
(** Generate descriptive text for bushel entries *)
+
+
(** Format a date as "Month Year" *)
+
val format_date : int * int * int -> string
+
+
(** Generate a descriptive sentence for a paper with date string *)
+
val paper_description : Paper.t -> date_str:string -> string
+
+
(** Generate a descriptive sentence for a note with date string and lookup function *)
+
val note_description : Note.t -> date_str:string -> lookup_fn:(string -> string option) -> string
+
+
(** Generate a descriptive sentence for an idea with date string *)
+
val idea_description : Idea.t -> date_str:string -> string
+
+
(** Generate a descriptive sentence for a video with date string and lookup function *)
+
val video_description : Video.t -> date_str:string -> lookup_fn:(string -> string option) -> string
+
+
(** Generate a descriptive sentence for a project *)
+
val project_description : Project.t -> string
+147
stack/bushel/lib/doi_entry.ml
···
+
module J = Ezjsonm
+
+
type status =
+
| Resolved
+
| Failed of string
+
+
type t = {
+
doi: string;
+
title: string;
+
authors: string list;
+
year: int;
+
bibtype: string;
+
publisher: string;
+
resolved_at: string;
+
source_urls: string list;
+
status: status;
+
ignore: bool;
+
}
+
+
type ts = t list
+
+
(* Today's date as YYYY-MM-DD, extracted from an RFC 3339 timestamp;
+
shared by both constructors below *)
+
let today () =
+
let now = Ptime_clock.now () in
+
let rfc3339 = Ptime.to_rfc3339 ~space:false ~frac_s:0 now in
+
String.sub rfc3339 0 10
+
+
let create_resolved ~doi ~title ~authors ~year ~bibtype ~publisher ?(source_urls=[]) () =
+
{ doi; title; authors; year; bibtype; publisher; resolved_at = today (); source_urls; status = Resolved; ignore = false }
+
+
let create_failed ~doi ~error ?(source_urls=[]) () =
+
{ doi; title = ""; authors = []; year = 0; bibtype = ""; publisher = "";
+
resolved_at = today (); source_urls; status = Failed error; ignore = false }
+
+
let merge_entries old_entry new_entry =
+
(* Combine source_urls, removing duplicates *)
+
let combined_urls =
+
List.sort_uniq String.compare (old_entry.source_urls @ new_entry.source_urls)
+
in
+
(* Use new_entry's data but with combined URLs and preserve ignore flag from old entry *)
+
{ new_entry with source_urls = combined_urls; ignore = old_entry.ignore }
+
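The merge semantics above, modeled on a cut-down record (a sketch, not the real type): the newer entry wins field-by-field, except that `source_urls` are unioned and deduplicated and the old entry's ignore flag is preserved:

```ocaml
(* Cut-down model of Doi_entry.merge_entries: new data wins, URLs are
   unioned (sorted, deduped), and the old ignore flag sticks. *)
type entry = { doi : string; source_urls : string list; ignore_ : bool }

let merge old_e new_e =
  { new_e with
    source_urls = List.sort_uniq compare (old_e.source_urls @ new_e.source_urls);
    ignore_ = old_e.ignore_ }
```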
+
let to_yaml_value entry =
+
let status_field = match entry.status with
+
| Resolved -> []
+
| Failed err -> [("error", `String err)]
+
in
+
let source_urls_field = match entry.source_urls with
+
| [] -> []
+
| urls -> [("source_urls", `A (List.map (fun url -> `String url) urls))]
+
in
+
let ignore_field = if entry.ignore then [("ignore", `Bool true)] else [] in
+
let fields = [
+
("doi", `String entry.doi);
+
("resolved_at", `String entry.resolved_at);
+
] @ status_field @ source_urls_field @ ignore_field in
+
let fields = match entry.status with
+
| Resolved ->
+
fields @ [
+
("title", `String entry.title);
+
("authors", `A (List.map (fun a -> `String a) entry.authors));
+
("year", `Float (float_of_int entry.year));
+
("bibtype", `String entry.bibtype);
+
("publisher", `String entry.publisher);
+
]
+
| Failed _ -> fields
+
in
+
`O fields
+
+
let of_yaml_value v =
+
try
+
let doi = J.find v ["doi"] |> J.get_string in
+
let resolved_at = J.find v ["resolved_at"] |> J.get_string in
+
(* Support both old source_url (single) and new source_urls (list) for backwards compatibility *)
+
let source_urls =
+
try
+
J.find v ["source_urls"] |> J.get_list J.get_string
+
with _ ->
+
try
+
let single_url = J.find v ["source_url"] |> J.get_string in
+
[single_url]
+
with _ -> []
+
in
+
let ignore = try J.find v ["ignore"] |> J.get_bool with _ -> false in
+
let error = try Some (J.find v ["error"] |> J.get_string) with _ -> None in
+
match error with
+
| Some err ->
+
{ doi; title = ""; authors = []; year = 0; bibtype = ""; publisher = "";
+
resolved_at; source_urls; status = Failed err; ignore }
+
| None ->
+
let title = J.find v ["title"] |> J.get_string in
+
let authors = J.find v ["authors"] |> J.get_list J.get_string in
+
let year = J.find v ["year"] |> J.get_float |> int_of_float in
+
let bibtype = J.find v ["bibtype"] |> J.get_string in
+
let publisher = J.find v ["publisher"] |> J.get_string in
+
{ doi; title; authors; year; bibtype; publisher; resolved_at; source_urls; status = Resolved; ignore }
+
with e ->
+
Printf.eprintf "Failed to parse DOI entry: %s\n%!" (Printexc.to_string e);
+
failwith "Invalid DOI entry in YAML"
+
+
let load path =
+
if not (Sys.file_exists path) then
+
[]
+
else
+
try
+
let yaml_str = In_channel.with_open_text path In_channel.input_all in
+
match Yaml.of_string yaml_str with
+
| Ok (`A entries) -> List.map of_yaml_value entries
+
| Ok _ -> []
+
| Error (`Msg e) ->
+
Printf.eprintf "Failed to parse %s: %s\n%!" path e;
+
[]
+
with e ->
+
Printf.eprintf "Failed to load %s: %s\n%!" path (Printexc.to_string e);
+
[]
+
+
let save path entries =
+
let yaml_list = `A (List.map to_yaml_value entries) in
+
let yaml_str = Yaml.to_string_exn yaml_list in
+
Out_channel.with_open_text path (fun oc ->
+
Out_channel.output_string oc yaml_str
+
)
+
+
let to_map entries =
+
let map = Hashtbl.create (List.length entries) in
+
List.iter (fun entry -> Hashtbl.add map entry.doi entry) entries;
+
map
+
+
let find_by_doi entries doi =
+
List.find_opt (fun entry -> not entry.ignore && entry.doi = doi) entries
+
+
let find_by_url entries url =
+
List.find_opt (fun entry ->
+
not entry.ignore && List.mem url entry.source_urls
+
) entries
+
+
let find_by_doi_including_ignored entries doi =
+
List.find_opt (fun entry -> entry.doi = doi) entries
+
+
let find_by_url_including_ignored entries url =
+
List.find_opt (fun entry ->
+
List.mem url entry.source_urls
+
) entries
+51
stack/bushel/lib/doi_entry.mli
···
+
(** DOI entries resolved from external sources via Zotero Translation Server *)
+
+
type status =
+
| Resolved (** Successfully resolved from Zotero *)
+
| Failed of string (** Failed to resolve, with error message *)
+
+
type t = {
+
doi: string;
+
title: string;
+
authors: string list;
+
year: int;
+
bibtype: string; (** article, inproceedings, book, etc *)
+
publisher: string; (** journal/conference/publisher name *)
+
resolved_at: string; (** ISO date when resolved *)
+
source_urls: string list; (** All URLs that resolve to this DOI (publisher links, doi.org URLs, etc) *)
+
status: status;
+
ignore: bool; (** If true, skip this entry when looking up references *)
+
}
+
+
type ts = t list
+
+
(** Load DOI entries from YAML file *)
+
val load : string -> ts
+
+
(** Save DOI entries to YAML file *)
+
val save : string -> ts -> unit
+
+
(** Convert list to hashtable for fast lookup by DOI *)
+
val to_map : ts -> (string, t) Hashtbl.t
+
+
(** Find entry by DOI (excludes ignored entries) *)
+
val find_by_doi : ts -> string -> t option
+
+
(** Find entry by source URL (searches through all source_urls, excludes ignored entries) *)
+
val find_by_url : ts -> string -> t option
+
+
(** Find entry by DOI including ignored entries (for resolution checks) *)
+
val find_by_doi_including_ignored : ts -> string -> t option
+
+
(** Find entry by source URL including ignored entries (for resolution checks) *)
+
val find_by_url_including_ignored : ts -> string -> t option
+
+
(** Create a new resolved entry *)
+
val create_resolved : doi:string -> title:string -> authors:string list ->
+
year:int -> bibtype:string -> publisher:string -> ?source_urls:string list -> unit -> t
+
+
(** Create a new failed entry *)
+
val create_failed : doi:string -> error:string -> ?source_urls:string list -> unit -> t
+
+
(** Merge two entries with the same DOI, combining their source_urls *)
+
val merge_entries : t -> t -> t
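
`merge_entries` is specified to combine the `source_urls` of two entries for the same DOI. A standalone sketch of that contract, using a cut-down local record (not the real `Doi_entry.t`) so it runs with the stdlib alone:

```ocaml
(* Sketch of the merge_entries contract: same DOI, union of source_urls. *)
type entry = { doi : string; source_urls : string list }

let merge_entries a b =
  if a.doi <> b.doi then invalid_arg "merge_entries: DOI mismatch";
  (* Deduplicate the combined URL lists. *)
  { a with source_urls = List.sort_uniq String.compare (a.source_urls @ b.source_urls) }

let () =
  let a = { doi = "10.1/x"; source_urls = [ "https://doi.org/10.1/x" ] } in
  let b = { doi = "10.1/x"; source_urls = [ "https://pub.example/x"; "https://doi.org/10.1/x" ] } in
  assert (List.length (merge_entries a b).source_urls = 2)
```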
+19
stack/bushel/lib/dune
···
+
(library
+
(name bushel)
+
(public_name bushel)
+
(libraries
+
cmarkit
+
uri
+
jsont
+
jsont.bytesrw
+
ezjsonm
+
ptime
+
yaml.unix
+
jekyll-format
+
lwt
+
cohttp-lwt-unix
+
fmt
+
re
+
ptime.clock
+
ptime.clock.os
+
typesense-client))
+446
stack/bushel/lib/entry.ml
···
+
type entry =
+
[ `Paper of Paper.t
+
| `Project of Project.t
+
| `Idea of Idea.t
+
| `Video of Video.t
+
| `Note of Note.t
+
]
+
+
type slugs = (string, entry) Hashtbl.t
+
+
type t =
+
{ slugs : slugs
+
; papers : Paper.ts
+
; old_papers : Paper.ts
+
; notes : Note.ts
+
; projects : Project.ts
+
; ideas : Idea.ts
+
; videos : Video.ts
+
; contacts : Contact.ts
+
; images : Srcsetter.ts
+
; doi_entries : Doi_entry.ts
+
; data_dir : string
+
}
+
+
let contacts { contacts; _ } = contacts
+
let videos { videos; _ } = videos
+
let ideas { ideas; _ } = ideas
+
let papers { papers; _ } = papers
+
let notes { notes; _ } = notes
+
let projects { projects; _ } = projects
+
let images { images; _ } = images
+
let doi_entries { doi_entries; _ } = doi_entries
+
let data_dir { data_dir; _ } = data_dir
+
+
let v ~papers ~notes ~projects ~ideas ~videos ~contacts ~images ~data_dir =
+
let slugs : slugs = Hashtbl.create 42 in
+
let papers, old_papers = List.partition (fun p -> p.Paper.latest) papers in
+
List.iter (fun n -> Hashtbl.add slugs n.Note.slug (`Note n)) notes;
+
List.iter (fun p -> Hashtbl.add slugs p.Project.slug (`Project p)) projects;
+
List.iter (fun i -> Hashtbl.add slugs i.Idea.slug (`Idea i)) ideas;
+
List.iter (fun v -> Hashtbl.add slugs v.Video.slug (`Video v)) videos;
+
List.iter (fun p -> Hashtbl.add slugs p.Paper.slug (`Paper p)) papers;
+
(* Load DOI entries from doi.yml *)
+
let doi_yml_path = Filename.concat data_dir "doi.yml" in
+
let doi_entries = Doi_entry.load doi_yml_path in
+
{ slugs; papers; old_papers; notes; projects; ideas; videos; images; contacts; doi_entries; data_dir }
+
;;
+
+
let lookup { slugs; _ } slug = Hashtbl.find_opt slugs slug
+
let lookup_exn { slugs; _ } slug = Hashtbl.find slugs slug
+
+
let old_papers { old_papers; _ } = old_papers
+
+
let sidebar = function
+
| `Note { Note.sidebar = Some s; _ } -> Some s
+
| _ -> None
+
;;
+
+
let to_type_string = function
+
| `Paper _ -> "paper"
+
| `Note _ -> "note"
+
| `Project _ -> "project"
+
| `Idea _ -> "idea"
+
| `Video _ -> "video"
+
;;
+
+
let synopsis = function
+
| `Note n -> Note.synopsis n
+
| _ -> None
+
;;
+
+
let slug = function
+
| `Paper p -> p.Paper.slug
+
| `Note n -> n.Note.slug
+
| `Project p -> p.Project.slug
+
| `Idea i -> i.Idea.slug
+
| `Video v -> v.Video.slug
+
;;
+
+
let title = function
+
| `Paper p -> Paper.title p
+
| `Note n -> Note.title n
+
| `Project p -> Project.title p
+
| `Idea i -> Idea.title i
+
| `Video v -> Video.title v
+
;;
+
+
let body = function
+
| `Paper _ -> ""
+
| `Note n -> Note.body n
+
| `Project p -> Project.body p
+
| `Idea i -> Idea.body i
+
| `Video _ -> ""
+
;;
+
+
let site_url = function
+
| `Paper p -> "/papers/" ^ p.Paper.slug
+
| `Note n -> "/notes/" ^ n.Note.slug
+
| `Project p -> "/projects/" ^ p.Project.slug
+
| `Idea i -> "/ideas/" ^ i.Idea.slug
+
| `Video v -> "/videos/" ^ v.Video.slug
+
;;
+
+
(** Extract external URLs from markdown content *)
+
let extract_external_links md =
+
let open Cmarkit in
+
let urls = ref [] in
+
+
let is_external_url url =
+
(* XXX FIXME *)
+
let is_bushel_slug = String.starts_with ~prefix:":" in
+
let is_tag_slug = String.starts_with ~prefix:"##" in
+
if is_bushel_slug url || is_tag_slug url then false
+
else
+
try
+
let uri = Uri.of_string url in
+
match Uri.scheme uri with
+
| Some _ -> true (* any explicit scheme (http, https, or otherwise) is considered external *)
+
| None -> false (* Local references or relative paths *)
+
with _ -> false
+
in
+
+
let inline_mapper _ = function
+
| Inline.Link (lb, _) | Inline.Image (lb, _) ->
+
let ref = Inline.Link.reference lb in
+
(match ref with
+
| `Inline (ld, _) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) when is_external_url url ->
+
urls := url :: !urls;
+
Mapper.default
+
| _ -> Mapper.default)
+
| `Ref (_, _, l) ->
+
(* Resolve the referenced label definition and extract its URL; note this re-parses [md] once per reference link *)
+
let defs = Doc.defs (Doc.of_string ~strict:false md) in
+
(match Label.Map.find_opt (Label.key l) defs with
+
| Some (Link_definition.Def (ld, _)) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) when is_external_url url ->
+
urls := url :: !urls
+
| _ -> ())
+
| _ -> ());
+
Mapper.default)
+
| Inline.Autolink (autolink, _) ->
+
let url = Inline.Autolink.link autolink |> fst in
+
if not (Inline.Autolink.is_email autolink) && is_external_url url then
+
urls := url :: !urls;
+
Mapper.default
+
| _ -> Mapper.default
+
in
+
+
let mapper = Mapper.make ~inline:inline_mapper () in
+
let doc = Doc.of_string ~strict:false md in
+
let _ = Mapper.map_doc mapper doc in
+
List.sort_uniq String.compare !urls
+
+
let outgoing_links e = extract_external_links (body e)
+
+
let lookup_site_url t slug =
+
match lookup t slug with
+
| Some ent -> site_url ent
+
| None -> ""
+
+
let lookup_title t slug =
+
match lookup t slug with
+
| Some ent -> title ent
+
| None -> ""
+
+
+
let date (x : entry) =
+
match x with
+
| `Paper p -> Paper.date p
+
| `Note n -> Note.date n
+
| `Project p -> p.Project.start, 1, 1
+
| `Idea i -> i.Idea.year, i.Idea.month, 1
+
| `Video v -> Video.date v
+
;;
+
+
let datetime v = date v |> Ptime.of_date |> Option.get
+
+
let year x =
+
match date x with
+
| y, _, _ -> y
+
;;
+
+
let is_index_entry = function
+
| `Note { Note.index_page; _ } -> index_page
+
| _ -> false
+
;;
+
+
let notes_for_slug { notes; _ } slug =
+
List.filter (fun n -> match Note.slug_ent n with Some s -> s = slug | None -> false) notes
+
let all_entries { slugs; _ } = Hashtbl.fold (fun _ v acc -> v :: acc) slugs []
+
+
let all_papers { papers; old_papers; _ } =
+
List.map (fun x -> `Paper x) (papers @ old_papers)
+
;;
+
+
let compare a b =
+
let datetime v = Option.get (Ptime.of_date v) in
+
let da = datetime (date a) in
+
let db = datetime (date b) in
+
if da = db then compare (title a) (title b) else Ptime.compare da db
+
;;
+
+
let lookup_by_name {contacts;_} n =
+
match Contact.lookup_by_name contacts n with
+
| v -> Some v
+
| exception _ -> None
+
+
(** Extract the first image URL from markdown text *)
+
let extract_first_image md =
+
let open Cmarkit in
+
(* Don't use bushel link resolver to avoid circular dependency *)
+
let doc = Doc.of_string md in
+
let found_image = ref None in
+
+
let find_image_in_inline _mapper = function
+
| Inline.Image (img, _) ->
+
(match Inline.Link.reference img with
+
| `Inline (ld, _) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) when !found_image = None ->
+
found_image := Some url;
+
Mapper.default
+
| _ -> Mapper.default)
+
| _ -> Mapper.default)
+
| _ -> Mapper.default
+
in
+
+
let mapper = Mapper.make ~inline:find_image_in_inline () in
+
let _ = Mapper.map_doc mapper doc in
+
!found_image
+
;;
+
+
(** Extract the first video slug from markdown text by looking for bushel video links *)
+
let extract_first_video entries md =
+
let open Cmarkit in
+
let doc = Doc.of_string md in
+
let found_video = ref None in
+
+
let find_video_in_inline _mapper = function
+
| Inline.Link (link, _) ->
+
(match Inline.Link.reference link with
+
| `Inline (ld, _) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) when !found_video = None && String.starts_with ~prefix:":" url ->
+
(* Check if this is a video slug *)
+
let slug = String.sub url 1 (String.length url - 1) in
+
(match lookup entries slug with
+
| Some (`Video v) ->
+
found_video := Some (Video.uuid v);
+
Mapper.default
+
| _ -> Mapper.default)
+
| _ -> Mapper.default)
+
| _ -> Mapper.default)
+
| _ -> Mapper.default
+
in
+
+
let mapper = Mapper.make ~inline:find_video_in_inline () in
+
let _ = Mapper.map_doc mapper doc in
+
!found_video
+
;;
+
+
(** Look up an image in the srcsetter list by slug *)
+
let lookup_image { images; _ } slug =
+
List.find_opt (fun img -> Srcsetter.slug img = slug) images
+
+
(** Get the smallest webp variant from a srcsetter image *)
+
let smallest_webp_variant img =
+
let variants = Srcsetter.variants img in
+
let webp_variants =
+
Srcsetter.MS.bindings variants
+
|> List.filter (fun (name, _) -> String.ends_with ~suffix:".webp" name)
+
in
+
match webp_variants with
+
| [] ->
+
(* No webp variants - use the name field which is always webp *)
+
"/images/" ^ Srcsetter.name img
+
| variants ->
+
(* Find the variant with the smallest width *)
+
let smallest = List.fold_left (fun acc (name, (w, h)) ->
+
match acc with
+
| None -> Some (name, w, h)
+
| Some (_, min_w, _) when w < min_w -> Some (name, w, h)
+
| _ -> acc
+
) None variants in
+
match smallest with
+
| Some (name, _, _) -> "/images/" ^ name
+
| None -> "/images/" ^ Srcsetter.name img
+
+
(** Get thumbnail slug for a contact *)
+
let contact_thumbnail_slug contact =
+
(* Contact images use just the handle as slug *)
+
Some (Contact.handle contact)
+
+
(** Get thumbnail URL for a contact - resolved through srcsetter *)
+
let contact_thumbnail entries contact =
+
match contact_thumbnail_slug contact with
+
| None -> None
+
| Some thumb_slug ->
+
match lookup_image entries thumb_slug with
+
| Some img -> Some (smallest_webp_variant img)
+
| None -> None (* Image not in srcsetter - thumbnails are optional *)
+
+
(** Get thumbnail slug for an entry with fallbacks *)
+
let rec thumbnail_slug entries = function
+
| `Paper p ->
+
(* Slug is just the paper slug, directory is in the origin path *)
+
Some (Paper.slug p)
+
+
| `Video v ->
+
(* Videos use their UUID as the slug *)
+
Some (Video.uuid v)
+
+
| `Project p ->
+
(* Project images use "project-{slug}" format *)
+
Some (Printf.sprintf "project-%s" p.Project.slug)
+
+
| `Idea i ->
+
let is_active = match Idea.status i with
+
| Idea.Available | Idea.Discussion | Idea.Ongoing -> true
+
| Idea.Completed | Idea.Expired -> false
+
in
+
if is_active then
+
(* Use first supervisor's face image *)
+
let supervisors = Idea.supervisors i in
+
match supervisors with
+
| sup :: _ ->
+
let handle = if String.length sup > 0 && sup.[0] = '@'
+
then String.sub sup 1 (String.length sup - 1)
+
else sup
+
in
+
(match Contact.find_by_handle (contacts entries) handle with
+
| Some c ->
+
(* Contact images use just the handle as slug *)
+
Some (Contact.handle c)
+
| None ->
+
(* Fallback to project thumbnail *)
+
let project_slug = Idea.project i in
+
(match lookup entries project_slug with
+
| Some p -> thumbnail_slug entries p
+
| None -> None))
+
| [] ->
+
(* No supervisors, use project thumbnail *)
+
let project_slug = Idea.project i in
+
(match lookup entries project_slug with
+
| Some p -> thumbnail_slug entries p
+
| None -> None)
+
else
+
(* Use project thumbnail for completed/expired ideas *)
+
let project_slug = Idea.project i in
+
(match lookup entries project_slug with
+
| Some p -> thumbnail_slug entries p
+
| None -> None)
+
+
| `Note n ->
+
(* Use titleimage if set, otherwise extract first image from body, then try video, otherwise use slug_ent's thumbnail *)
+
(match Note.titleimage n with
+
| Some slug ->
+
(* Always treat titleimage as a bushel slug (without ':' prefix) *)
+
Some slug
+
| None ->
+
(* Extract first image from markdown body *)
+
match extract_first_image (Note.body n) with
+
| Some url when String.starts_with ~prefix:":" url ->
+
Some (String.sub url 1 (String.length url - 1))
+
| Some _ -> None
+
| None ->
+
(* Try extracting first video from markdown body *)
+
match extract_first_video entries (Note.body n) with
+
| Some video_uuid -> Some video_uuid
+
| None ->
+
(* Fallback to slug_ent's thumbnail if present *)
+
match Note.slug_ent n with
+
| Some slug_ent ->
+
(match lookup entries slug_ent with
+
| Some entry -> thumbnail_slug entries entry
+
| None -> None)
+
| None -> None)
+
+
(** Get thumbnail URL for an entry with fallbacks - resolved through srcsetter *)
+
let thumbnail entries entry =
+
match thumbnail_slug entries entry with
+
| None -> None
+
| Some thumb_slug ->
+
match lookup_image entries thumb_slug with
+
| Some img -> Some (smallest_webp_variant img)
+
| None ->
+
(* For projects, fallback to supervisor faces if project image doesn't exist *)
+
(match entry with
+
| `Project p ->
+
(* Find ideas for this project *)
+
let project_ideas = List.filter (fun idea ->
+
Idea.project idea = ":" ^ p.Project.slug
+
) (ideas entries) in
+
(* Collect all unique supervisors from these ideas *)
+
let all_supervisors =
+
List.fold_left (fun acc idea ->
+
List.fold_left (fun acc2 sup ->
+
if List.mem sup acc2 then acc2 else sup :: acc2
+
) acc (Idea.supervisors idea)
+
) [] project_ideas
+
in
+
(* Split into avsm and others, preferring others first *)
+
let (others, avsm) = List.partition (fun sup ->
+
let handle = if String.length sup > 0 && sup.[0] = '@'
+
then String.sub sup 1 (String.length sup - 1)
+
else sup
+
in
+
handle <> "avsm"
+
) all_supervisors in
+
(* Try supervisors in order: others first, then avsm *)
+
let ordered_supervisors = others @ avsm in
+
(* Try each supervisor's face image *)
+
let rec try_supervisors = function
+
| [] -> None
+
| sup :: rest ->
+
let handle = if String.length sup > 0 && sup.[0] = '@'
+
then String.sub sup 1 (String.length sup - 1)
+
else sup
+
in
+
(match Contact.find_by_handle (contacts entries) handle with
+
| Some c ->
+
(match lookup_image entries (Contact.handle c) with
+
| Some img -> Some (smallest_webp_variant img)
+
| None -> try_supervisors rest)
+
| None -> try_supervisors rest)
+
in
+
try_supervisors ordered_supervisors
+
| _ -> None)
+
+
(** Get thumbnail URL for a note with slug_ent *)
+
let thumbnail_note_with_ent entries note_item =
+
(* Use linked entry's thumbnail if slug_ent is set *)
+
match Note.slug_ent note_item with
+
| Some slug_ent ->
+
(match lookup entries slug_ent with (* slug-table keys carry no ':' prefix *)
+
| Some entry -> thumbnail entries entry
+
| None ->
+
(* Fallback to extracting first image from note body *)
+
extract_first_image (Note.body note_item))
+
| None ->
+
(* No slug_ent, extract from note body *)
+
extract_first_image (Note.body note_item)
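
`extract_external_links` above classifies a link destination three ways: `":slug"` targets the bushel slug table, `"##tag"` targets a tag page, and anything with an explicit URI scheme is external. A stdlib-only sketch of that classification, using a naive `':'` scan instead of the `Uri` library:

```ocaml
(* Link-destination classification sketch, mirroring is_external_url above.
   The scheme check is deliberately naive (first ':') to stay stdlib-only. *)
type link_kind = Bushel_slug | Tag | External | Relative

let classify url =
  if String.starts_with ~prefix:":" url then Bushel_slug
  else if String.starts_with ~prefix:"##" url then Tag
  else
    match String.index_opt url ':' with
    | Some _ -> External (* "scheme:..." is treated as external *)
    | None -> Relative

let () =
  assert (classify ":unikernels" = Bushel_slug);
  assert (classify "##ocaml" = Tag);
  assert (classify "https://example.org" = External);
  assert (classify "images/foo.png" = Relative)
```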
+79
stack/bushel/lib/entry.mli
···
+
type entry =
+
[ `Idea of Idea.t
+
| `Note of Note.t
+
| `Paper of Paper.t
+
| `Project of Project.t
+
| `Video of Video.t
+
]
+
+
type slugs = (string, entry) Hashtbl.t
+
type t
+
+
val contacts : t -> Contact.ts
+
val videos : t -> Video.ts
+
val ideas : t -> Idea.ts
+
val papers : t -> Paper.ts
+
val notes : t -> Note.ts
+
val projects : t -> Project.ts
+
val images : t -> Srcsetter.ts
+
val doi_entries : t -> Doi_entry.ts
+
val data_dir : t -> string
+
+
val v
+
: papers:Paper.t list
+
-> notes:Note.ts
+
-> projects:Project.ts
+
-> ideas:Idea.ts
+
-> videos:Video.ts
+
-> contacts:Contact.ts
+
-> images:Srcsetter.ts
+
-> data_dir:string
+
-> t
+
+
val lookup : t -> string -> entry option
+
val lookup_exn : t -> string -> entry
+
val lookup_site_url : t -> string -> string
+
val lookup_title : t -> string -> string
+
val lookup_by_name : t -> string -> Contact.t option
+
val old_papers : t -> Paper.ts
+
val sidebar : [> `Note of Note.t ] -> string option
+
val to_type_string : entry -> string
+
val slug : entry -> string
+
val title : entry -> string
+
val body : entry -> string
+
val extract_external_links : string -> string list
+
val outgoing_links : entry -> string list
+
+
(* FIXME move to view *)
+
val site_url : entry -> string
+
val date : entry -> Ptime.date
+
val datetime : entry -> Ptime.t
+
val year : entry -> int
+
val synopsis : entry -> string option
+
+
val is_index_entry : entry -> bool
+
val notes_for_slug : t -> string -> Note.t list
+
val all_entries : t -> entry list
+
val all_papers : t -> entry list
+
val compare : entry -> entry -> int
+
+
(** Look up an image in the srcsetter list by slug *)
+
val lookup_image : t -> string -> Srcsetter.t option
+
+
(** Get the smallest webp variant from a srcsetter image *)
+
val smallest_webp_variant : Srcsetter.t -> string
+
+
(** Get thumbnail slug for a contact *)
+
val contact_thumbnail_slug : Contact.t -> string option
+
+
(** Get thumbnail URL for a contact - resolved through srcsetter *)
+
val contact_thumbnail : t -> Contact.t -> string option
+
+
(** Get thumbnail slug for an entry with fallbacks *)
+
val thumbnail_slug : t -> entry -> string option
+
+
(** Get thumbnail URL for an entry with fallbacks - resolved through srcsetter *)
+
val thumbnail : t -> entry -> string option
+
+
(** Get thumbnail URL for a note with slug_ent *)
+
val thumbnail_note_with_ent : t -> Note.t -> string option
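
`smallest_webp_variant` filters variant names by the `.webp` suffix and folds to the binding with the smallest width. A stdlib-only sketch of that selection, with a plain association list standing in for `Srcsetter.MS.bindings`:

```ocaml
(* Smallest-webp selection sketch: filter by suffix, fold to min width. *)
let smallest_webp variants =
  variants
  |> List.filter (fun (name, _) -> Filename.check_suffix name ".webp")
  |> List.fold_left
       (fun acc (name, (w, _h)) ->
         match acc with
         | Some (_, min_w) when min_w <= w -> acc (* keep current minimum *)
         | _ -> Some (name, w))
       None
  |> Option.map fst

let () =
  assert (smallest_webp
            [ ("a-320.webp", (320, 200)); ("a-160.webp", (160, 100)); ("a.png", (80, 50)) ]
          = Some "a-160.webp")
```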
+223
stack/bushel/lib/idea.ml
···
+
type level =
+
| Any
+
| PartII
+
| MPhil
+
| PhD
+
| Postdoc
+
+
let level_of_yaml = function
+
| `String ("Any" | "any") -> Ok Any
+
| `String ("PartII" | "partii") -> Ok PartII
+
| `String ("MPhil" | "mphil") -> Ok MPhil
+
| `String ("PhD" | "phd") -> Ok PhD
+
| `String ("postdoc" | "Postdoc") -> Ok Postdoc
+
| _ -> Error (`Msg "level_of_yaml")
+
;;
+
+
let level_to_string = function
+
| Any -> "Any"
+
| PartII -> "PartII"
+
| MPhil -> "MPhil"
+
| PhD -> "PhD"
+
| Postdoc -> "postdoctoral"
+
;;
+
+
let level_to_tag = function
+
| Any -> "idea-beginner"
+
| PartII -> "idea-medium"
+
| MPhil -> "idea-hard"
+
| PhD -> "idea-phd"
+
| Postdoc -> "idea-postdoc"
+
;;
+
+
let level_to_yaml s = `String (level_to_string s)
+
+
type status =
+
| Available
+
| Discussion
+
| Ongoing
+
| Completed
+
| Expired
+
+
let status_of_yaml = function
+
| `String ("Available" | "available") -> Ok Available
+
| `String ("Discussion" | "discussion") -> Ok Discussion
+
| `String ("Ongoing" | "ongoing") -> Ok Ongoing
+
| `String ("Completed" | "completed") -> Ok Completed
+
| `String ("Expired" | "expired") -> Ok Expired
+
| _ -> Error (`Msg "status_of_yaml")
+
;;
+
+
let status_to_string = function
+
| Available -> "Available"
+
| Discussion -> "Discussion"
+
| Ongoing -> "Ongoing"
+
| Completed -> "Completed"
+
| Expired -> "Expired"
+
;;
+
+
let status_to_tag = function
+
| Available -> "idea-available"
+
| Discussion -> "idea-discuss"
+
| Ongoing -> "idea-ongoing"
+
| Completed -> "idea-done"
+
| Expired -> "idea-expired"
+
;;
+
+
let status_to_yaml s = `String (status_to_string s)
+
+
type t =
+
{ slug : string
+
; title : string
+
; level : level
+
; project : string
+
; status : status
+
; month: int
+
; year : int
+
; supervisors : string list
+
; students : string list
+
; reading : string
+
; body : string
+
; url : string option
+
; tags : string list
+
}
+
+
type ts = t list
+
+
let title i = i.title
+
let supervisors i = i.supervisors
+
let students i = i.students
+
let reading i = i.reading
+
let status i = i.status
+
let level i = i.level
+
let year i = i.year
+
let body i = i.body
+
let project i = i.project
+
+
let compare a b =
+
match compare a.status b.status with
+
| 0 ->
+
(match a.status with
+
| Completed -> compare b.year a.year
+
| _ ->
+
(match compare a.level b.level with
+
| 0 -> begin
+
match compare b.year a.year with
+
| 0 -> compare b.month a.month
+
| n -> n
+
end
+
| n -> n))
+
| n -> n
+
;;
+
+
let of_md fname =
+
match Jekyll_post.of_string ~fname:(Filename.basename fname) (Util.read_file fname) with
+
| Error _ -> failwith (Printf.sprintf "Idea.of_md: failed to parse %s" fname)
+
| Ok jp ->
+
let fields = jp.Jekyll_post.fields in
+
let y = Jekyll_format.fields_to_yaml fields in
+
let year, month, _ = jp.Jekyll_post.date |> Ptime.to_date in
+
let body = jp.Jekyll_post.body in
+
let string f = Yaml.Util.(find_exn f y |> Option.get |> to_string |> Result.get_ok) in
+
let string' f d =
+
try Yaml.Util.(find_exn f y |> Option.get |> to_string |> Result.get_ok) with
+
| _ -> d
+
in
+
let to_list = function
+
| `A l -> Ok l
+
| _ -> Error (`Msg "to_list")
+
in
+
let strings f =
+
try
+
Yaml.Util.(
+
find_exn f y
+
|> Option.get
+
|> to_list
+
|> Result.get_ok
+
|> List.map (fun x -> to_string x |> Result.get_ok))
+
with
+
| _exn -> []
+
in
+
let level =
+
Yaml.Util.(find_exn "level" y |> Option.get |> level_of_yaml |> Result.get_ok)
+
in
+
let status =
+
Yaml.Util.(find_exn "status" y |> Option.get |> status_of_yaml |> Result.get_ok)
+
in
+
let slug = jp.Jekyll_post.slug in
+
{ slug
+
; title = string "title"
+
; level
+
; project = string "project"
+
; status
+
; supervisors = strings "supervisors"
+
; students = strings "students"
+
; tags = strings "tags"
+
; reading = string' "reading" ""
+
; month
+
; year
+
; body
+
; url = None (* TODO *)
+
}
+
;;
+
+
let lookup ideas slug = List.find_opt (fun i -> i.slug = slug) ideas
+
+
(* TODO:claude *)
+
let typesense_schema =
+
let open Ezjsonm in
+
dict [
+
("name", string "ideas");
+
("fields", list (fun d -> dict d) [
+
[("name", string "id"); ("type", string "string")];
+
[("name", string "title"); ("type", string "string")];
+
[("name", string "description"); ("type", string "string")];
+
[("name", string "year"); ("type", string "int32")];
+
[("name", string "date"); ("type", string "string")];
+
[("name", string "date_timestamp"); ("type", string "int64")];
+
[("name", string "tags"); ("type", string "string[]"); ("facet", bool true)];
+
[("name", string "level"); ("type", string "string"); ("facet", bool true)];
+
[("name", string "status"); ("type", string "string"); ("facet", bool true)];
+
[("name", string "project"); ("type", string "string"); ("facet", bool true)];
+
[("name", string "supervisors"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "body"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "students"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "reading"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "url"); ("type", string "string"); ("optional", bool true)];
+
]);
+
("default_sorting_field", string "date_timestamp");
+
]
+
+
(** TODO:claude Pretty-print an idea with ANSI formatting *)
+
let pp ppf i =
+
let open Fmt in
+
pf ppf "@[<v>";
+
pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Idea";
+
pf ppf "%a: %a@," (styled `Bold string) "Slug" string i.slug;
+
pf ppf "%a: %a@," (styled `Bold string) "Title" string (title i);
+
pf ppf "%a: %a@," (styled `Bold string) "Level" string (level_to_string (level i));
+
pf ppf "%a: %a@," (styled `Bold string) "Status" string (status_to_string (status i));
+
pf ppf "%a: %a@," (styled `Bold string) "Project" string (project i);
+
pf ppf "%a: %04d-%02d@," (styled `Bold string) "Date" (year i) i.month;
+
let sups = supervisors i in
+
if sups <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Supervisors" (list ~sep:comma string) sups;
+
let studs = students i in
+
if studs <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Students" (list ~sep:comma string) studs;
+
(match i.url with
+
| Some url -> pf ppf "%a: %a@," (styled `Bold string) "URL" string url
+
| None -> ());
+
let t = i.tags in
+
if t <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Tags" (list ~sep:comma string) t;
+
let r = reading i in
+
if r <> "" then begin
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Reading";
+
pf ppf "%a@," string r;
+
end;
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Body";
+
pf ppf "%a@," string (body i);
+
pf ppf "@]"
+55
stack/bushel/lib/idea.mli
···
+
type level =
+
| Any
+
| PartII
+
| MPhil
+
| PhD
+
| Postdoc
+
+
type status =
+
| Available
+
| Discussion
+
| Ongoing
+
| Completed
+
| Expired
+
+
val level_of_yaml : Ezjsonm.value -> (level, [> `Msg of string ]) result
+
val level_to_string : level -> string
+
val level_to_tag : level -> string
+
val level_to_yaml : level -> Ezjsonm.value
+
val status_of_yaml : Ezjsonm.value -> (status, [> `Msg of string ]) result
+
val status_to_string : status -> string
+
val status_to_tag : status -> string
+
val status_to_yaml : status -> Ezjsonm.value
+
+
type t =
+
{ slug : string
+
; title : string
+
; level : level
+
; project : string
+
; status : status
+
; month : int
+
; year : int
+
; supervisors : string list
+
; students : string list
+
; reading : string
+
; body : string
+
; url : string option
+
; tags : string list
+
}
+
+
type ts = t list
+
+
val title : t -> string
+
val supervisors : t -> string list
+
val students : t -> string list
+
val reading : t -> string
+
val status : t -> status
+
val level : t -> level
+
val year : t -> int
+
val body : t -> string
+
val project : t -> string
+
val compare : t -> t -> int
+
val lookup : t list -> string -> t option
+
val of_md : string -> t
+
val typesense_schema : Ezjsonm.value
+
val pp : Format.formatter -> t -> unit
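
`Idea.compare` sorts by `status` first, relying on OCaml's structural `compare` respecting constructor declaration order, i.e. `Available < Discussion < Ongoing < Completed < Expired`. A minimal demonstration of that ordering property:

```ocaml
(* Constructor declaration order drives structural compare, which is what
   Idea.compare's status-first ordering depends on. *)
type status = Available | Discussion | Ongoing | Completed | Expired

let () =
  assert (compare Available Expired < 0);
  assert (List.sort compare [ Expired; Ongoing; Available ]
          = [ Available; Ongoing; Expired ])
```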
+296
stack/bushel/lib/link.ml
···
+
type karakeep_data = {
+
remote_url : string;
+
id : string;
+
tags : string list;
+
metadata : (string * string) list;
+
}
+
+
type bushel_data = {
+
slugs : string list;
+
tags : string list;
+
}
+
+
type t = {
+
url : string;
+
date : Ptime.date;
+
description : string;
+
karakeep : karakeep_data option;
+
bushel : bushel_data option;
+
}
+
+
type ts = t list
+
+
let url { url; _ } = url
+
let date { date; _ } = date
+
let description { description; _ } = description
+
let datetime v = Option.get @@ Ptime.of_date @@ date v
+
let compare a b = Ptime.compare (datetime b) (datetime a)
+
+
(* Convert YAML to Link.t *)
+
let t_of_yaml = function
+
| `O fields ->
+
let url =
+
match List.assoc_opt "url" fields with
+
| Some (`String v) -> v
+
| _ -> failwith "link: missing or invalid url"
+
in
+
let date =
+
match List.assoc_opt "date" fields with
+
| Some (`String v) -> begin
+
try
+
Scanf.sscanf v "%04d-%02d-%02d" (fun y m d -> (y, m, d))
+
with _ ->
+
(* Fall back to RFC3339 parsing for backward compatibility *)
+
v |> Ptime.of_rfc3339 |> Result.get_ok |> fun (a, _, _) -> Ptime.to_date a
+
end
+
| _ -> failwith "link: missing or invalid date"
+
in
+
let description =
+
match List.assoc_opt "description" fields with
+
| Some (`String v) -> v
+
| _ -> ""
+
in
+
let karakeep =
+
match List.assoc_opt "karakeep" fields with
+
| Some (`O k_fields) ->
+
let remote_url =
+
match List.assoc_opt "remote_url" k_fields with
+
| Some (`String v) -> v
+
| _ -> failwith "link: invalid karakeep.remote_url"
+
in
+
let id =
+
match List.assoc_opt "id" k_fields with
+
| Some (`String v) -> v
+
| _ -> failwith "link: invalid karakeep.id"
+
in
+
let tags =
+
match List.assoc_opt "tags" k_fields with
+
| Some (`A tag_list) ->
+
List.fold_left (fun acc tag ->
+
match tag with
+
| `String t -> t :: acc
+
| _ -> acc
+
) [] tag_list
+
|> List.rev
+
| _ -> []
+
in
+
let metadata =
+
match List.assoc_opt "metadata" k_fields with
+
| Some (`O meta_fields) ->
+
List.fold_left (fun acc (k, v) ->
+
match v with
+
| `String value -> (k, value) :: acc
+
| _ -> acc
+
) [] meta_fields
+
| _ -> []
+
in
+
Some { remote_url; id; tags; metadata }
+
| _ -> None
+
in
+
let bushel =
+
match List.assoc_opt "bushel" fields with
+
| Some (`O b_fields) ->
+
let slugs =
+
match List.assoc_opt "slugs" b_fields with
+
| Some (`A slug_list) ->
+
List.fold_left (fun acc slug ->
+
match slug with
+
| `String s -> s :: acc
+
| _ -> acc
+
) [] slug_list
+
|> List.rev
+
| _ -> []
+
in
+
let tags =
+
match List.assoc_opt "tags" b_fields with
+
| Some (`A tag_list) ->
+
List.fold_left (fun acc tag ->
+
match tag with
+
| `String t -> t :: acc
+
| _ -> acc
+
) [] tag_list
+
|> List.rev
+
| _ -> []
+
in
+
Some { slugs; tags }
+
| _ -> None
+
in
+
{ url; date; description; karakeep; bushel }
+
| _ -> failwith "invalid yaml"
+
+
(* Read file contents *)
+
let read_file file = In_channel.(with_open_bin file input_all)
+
+
(* Load links from a YAML file *)
+
let of_md fname =
+
match Yaml.of_string_exn (read_file fname) with
+
| `A links ->
+
List.map t_of_yaml links
+
| `O _ as single_link ->
+
[t_of_yaml single_link]
+
| _ -> failwith "link_of_md: expected array or object"
+
+
(* Convert Link.t to YAML *)
+
let to_yaml t =
+
let (year, month, day) = t.date in
+
let date_str = Printf.sprintf "%04d-%02d-%02d" year month day in
+
+
(* Create base fields *)
+
let base_fields = [
+
("url", `String t.url);
+
("date", `String date_str);
+
] @
+
(if t.description = "" then [] else [("description", `String t.description)])
+
in
+
+
(* Add karakeep data if present *)
+
let karakeep_fields =
+
match t.karakeep with
+
| Some { remote_url; id; tags; metadata } ->
+
let karakeep_obj = [
+
("remote_url", `String remote_url);
+
("id", `String id);
+
] in
+
let karakeep_obj =
+
if tags = [] then karakeep_obj
+
else ("tags", `A (List.map (fun t -> `String t) tags)) :: karakeep_obj
+
in
+
let karakeep_obj =
+
if metadata = [] then karakeep_obj
+
else ("metadata", `O (List.map (fun (k, v) -> (k, `String v)) metadata)) :: karakeep_obj
+
in
+
[("karakeep", `O karakeep_obj)]
+
| None -> []
+
in
+
+
(* Add bushel data if present *)
+
let bushel_fields =
+
match t.bushel with
+
| Some { slugs; tags } ->
+
let bushel_obj = [] in
+
let bushel_obj =
+
if slugs = [] then bushel_obj
+
else ("slugs", `A (List.map (fun s -> `String s) slugs)) :: bushel_obj
+
in
+
let bushel_obj =
+
if tags = [] then bushel_obj
+
else ("tags", `A (List.map (fun t -> `String t) tags)) :: bushel_obj
+
in
+
if bushel_obj = [] then [] else [("bushel", `O bushel_obj)]
+
| None -> []
+
in
+
+
`O (base_fields @ karakeep_fields @ bushel_fields)
+
+
(* Write a link to a file in the output directory *)
+
let to_file output_dir t =
+
let filename =
+
let (y, m, d) = t.date in
+
let hash = Digest.string t.url |> Digest.to_hex in
+
let short_hash = String.sub hash 0 8 in
+
Printf.sprintf "%04d-%02d-%02d-%s.md" y m d short_hash
+
in
+
let file_path = Fpath.v (Filename.concat output_dir filename) in
+
let yaml = to_yaml t in
+
let yaml_str = Yaml.to_string_exn yaml in
+
let content = "---\n" ^ yaml_str ^ "---\n" in
+
Bos.OS.File.write file_path content
+
+
(* Load links from a YAML file *)
+
let load_links_file path =
+
try
+
let yaml_str = In_channel.(with_open_bin path input_all) in
+
match Yaml.of_string_exn yaml_str with
+
| `A links -> List.map t_of_yaml links
+
| _ -> []
+
with _ -> []
+
+
(* Save links to a YAML file *)
+
let save_links_file path links =
+
try
+
let yaml = `A (List.map to_yaml links) in
+
let yaml_str = Yaml.to_string_exn ~len:4200000 yaml in
+
let oc = open_out path in
+
output_string oc yaml_str;
+
close_out oc
+
with e ->
+
Printf.eprintf "Error saving links file: %s\n%!" (Printexc.to_string e);
+
Printf.eprintf "Attempting to save with smaller length limit...\n%!";
+
let yaml = `A (List.map to_yaml links) in
+
let yaml_str = Yaml.to_string_exn ~len:800000 yaml in
+
let oc = open_out path in
+
output_string oc yaml_str;
+
close_out oc
+
+
(* Merge two lists of links, combining metadata from duplicates *)
+
let merge_links ?(prefer_new_date=false) existing new_links =
+
let links_by_url = Hashtbl.create (List.length existing) in
+
+
(* Add existing links to hashtable *)
+
List.iter (fun link ->
+
Hashtbl.replace links_by_url link.url link
+
) existing;
+
+
(* Merge new links with existing ones *)
+
List.iter (fun new_link ->
+
match Hashtbl.find_opt links_by_url new_link.url with
+
| None ->
+
(* New link not in existing links *)
+
Hashtbl.add links_by_url new_link.url new_link
+
| Some old_link ->
+
(* Merge link data, prefer newer data for fields *)
+
let title =
+
(* NB: despite the name, this binding carries the merged description field *)
+
if new_link.description <> "" then new_link.description
+
else old_link.description
+
in
+
+
(* Combine karakeep data (prefer new over old) *)
+
let karakeep =
+
match new_link.karakeep, old_link.karakeep with
+
| Some new_k, Some old_k when new_k.remote_url = old_k.remote_url ->
+
(* Same remote, merge the data *)
+
let merged_metadata =
+
let meta_tbl = Hashtbl.create (List.length old_k.metadata) in
+
List.iter (fun (k, v) -> Hashtbl.replace meta_tbl k v) old_k.metadata;
+
List.iter (fun (k, v) -> Hashtbl.replace meta_tbl k v) new_k.metadata;
+
Hashtbl.fold (fun k v acc -> (k, v) :: acc) meta_tbl []
+
in
+
let merged_tags = List.sort_uniq String.compare (old_k.tags @ new_k.tags) in
+
Some { new_k with metadata = merged_metadata; tags = merged_tags }
+
| Some new_k, _ -> Some new_k
+
| None, old_k -> old_k
+
in
+
+
(* Combine bushel data *)
+
let bushel =
+
match new_link.bushel, old_link.bushel with
+
| Some new_b, Some old_b ->
+
(* Merge slugs and tags *)
+
let merged_slugs = List.sort_uniq String.compare (old_b.slugs @ new_b.slugs) in
+
let merged_tags = List.sort_uniq String.compare (old_b.tags @ new_b.tags) in
+
Some { slugs = merged_slugs; tags = merged_tags }
+
| Some new_b, _ -> Some new_b
+
| None, old_b -> old_b
+
in
+
+
(* Combined link - prefer new date when requested (for bushel entries) *)
+
let date =
+
if prefer_new_date then new_link.date
+
else if compare new_link old_link > 0 then new_link.date
+
else old_link.date
+
in
+
let merged_link = {
+
url = new_link.url;
+
date;
+
description = title;
+
karakeep;
+
bushel
+
} in
+
Hashtbl.replace links_by_url new_link.url merged_link
+
) new_links;
+
+
(* Convert hashtable back to list and sort by date *)
+
Hashtbl.to_seq_values links_by_url
+
|> List.of_seq
+
|> List.sort compare
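A minimal sketch of the merge semantics above, with hypothetical field values:

```ocaml
(* hypothetical link values, for illustration only *)
let older =
  { url = "https://example.org/post"
  ; date = (2023, 1, 1)
  ; description = "An example post"
  ; karakeep = None
  ; bushel = Some { slugs = [ "a" ]; tags = [ "x" ] }
  }

let newer =
  { older with
    date = (2024, 2, 2)
  ; bushel = Some { slugs = [ "b" ]; tags = [ "x"; "y" ] }
  }

(* yields a single link: bushel slugs are merged to ["a"; "b"],
   tags to ["x"; "y"], and the date is (2024, 2, 2) because
   ~prefer_new_date:true short-circuits the date comparison *)
let merged = merge_links ~prefer_new_date:true [ older ] [ newer ]
```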
+34
stack/bushel/lib/link.mli
···
+
type karakeep_data = {
+
remote_url : string;
+
id : string;
+
tags : string list;
+
metadata : (string * string) list;
+
}
+
+
type bushel_data = {
+
slugs : string list;
+
tags : string list;
+
}
+
+
type t = {
+
url : string;
+
date : Ptime.date;
+
description : string;
+
karakeep : karakeep_data option;
+
bushel : bushel_data option;
+
}
+
+
type ts = t list
+
+
val compare : t -> t -> int
+
val url : t -> string
+
val date : t -> Ptime.date
+
val datetime : t -> Ptime.t
+
val description : t -> string
+
val of_md : string -> ts
+
val to_yaml : t -> Yaml.value
+
val t_of_yaml : Yaml.value -> t
+
val to_file : string -> t -> (unit, [> `Msg of string]) result
+
val load_links_file : string -> ts
+
val save_links_file : string -> ts -> unit
+
val merge_links : ?prefer_new_date:bool -> ts -> ts -> ts
+781
stack/bushel/lib/md.ml
···
+
(** Bushel mappers for our Markdown extensions and utilities
+
+
This module provides mappers to convert Bushel markdown extensions to different
+
output formats. There are two main mappers:
+
+
1. {!make_bushel_inline_mapper} - Full sidenote mode for the main website
+
- Converts Bushel links to interactive sidenotes
+
- Includes entry previews, contact info, footnotes
+
- Used for the main site HTML rendering
+
+
2. {!make_bushel_link_only_mapper} - Plain HTML mode for feeds and simple output
+
- Converts Bushel links to regular HTML <a> tags
+
- Automatically cleans up link text that contains Bushel slugs
+
- Used for Atom feeds, RSS, search indexing
+
- Images need .webp extension added (handled by calling code)
+
+
For plain text output (search, LLM), use {!markdown_to_plaintext}.
+
*)
+
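As a usage sketch, both mappers are wired through Cmarkit the same way {!scan_for_slugs} below does it; here [entries] is assumed to be an already-loaded {!Entry.t}:

```ocaml
(* parse with the Bushel label resolver so @handles / :slugs survive,
   then map the document with the sidenote-mode inline mapper *)
let render_with_sidenotes entries md =
  let open Cmarkit in
  let doc = Doc.of_string ~strict:false ~resolver:with_bushel_links md in
  let defs = Doc.defs doc in
  let mapper = Mapper.make ~inline:(make_bushel_inline_mapper defs entries) () in
  Mapper.map_doc mapper doc
```

Swapping in `make_bushel_link_only_mapper defs entries` on the `~inline` argument gives the plain-HTML mode instead.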
+
(* Sidenote data types - reuse existing Bushel types *)
+
type sidenote_data =
+
| Contact_note of Contact.t * string (* contact data + trigger text *)
+
| Paper_note of Paper.t * string
+
| Idea_note of Idea.t * string
+
| Note_note of Note.t * string
+
| Project_note of Project.t * string
+
| Video_note of Video.t * string
+
| Footnote_note of string * Cmarkit.Block.t * string
+
(* slug, block content, trigger text *)
+
+
type Cmarkit.Inline.t += Side_note of sidenote_data
+
+
let authorlink = Cmarkit.Meta.key ()
+
+
let make_authorlink label =
+
let meta = Cmarkit.Meta.tag authorlink (Cmarkit.Label.meta label) in
+
Cmarkit.Label.with_meta meta label
+
;;
+
+
let sluglink = Cmarkit.Meta.key ()
+
+
let make_sluglink label =
+
let meta = Cmarkit.Meta.tag sluglink (Cmarkit.Label.meta label) in
+
Cmarkit.Label.with_meta meta label
+
;;
+
+
let with_bushel_links = function
+
| `Def _ as ctx -> Cmarkit.Label.default_resolver ctx
+
| `Ref (_, _, (Some _ as def)) -> def
+
| `Ref (_, ref, None) ->
+
let txt = Cmarkit.Label.key ref in
+
(* guard against empty/short label keys before indexing *)
+
if txt = "" then None
+
else (match txt.[0] with
+
| '@' -> Some (make_authorlink ref)
+
| ':' -> Some (make_sluglink ref)
+
| '#' -> if String.length txt > 1 && txt.[1] = '#' then Some (make_sluglink ref) else None
+
| _ -> None)
+
;;
+
+
let strip_handle s =
+
(* guard against empty/short strings before indexing *)
+
if s = "" then s
+
else if s.[0] = '@' || s.[0] = ':'
+
then String.sub s 1 (String.length s - 1)
+
else if String.length s > 1 && s.[0] = '#' && s.[1] = '#'
+
then String.sub s 2 (String.length s - 2)
+
else s
+
;;
+
+
(* FIXME use Tags *)
+
let is_bushel_slug = String.starts_with ~prefix:":"
+
let is_tag_slug link =
+
String.starts_with ~prefix:"##" link &&
+
not (String.starts_with ~prefix:"###" link)
+
+
let is_type_filter_slug = String.starts_with ~prefix:"###"
+
let is_contact_slug = String.starts_with ~prefix:"@"
+
+
let text_of_inline lb =
+
let open Cmarkit in
+
Inline.to_plain_text ~break_on_soft:false lb
+
|> fun r -> String.concat "\n" (List.map (String.concat "") r)
+
;;
+
+
let link_target_is_bushel ?slugs lb =
+
let open Cmarkit in
+
let ref = Inline.Link.reference lb in
+
match ref with
+
| `Inline (ld, _) ->
+
let dest = Link_definition.dest ld in
+
(match dest with
+
| Some (url, _) when is_bushel_slug url ->
+
(match slugs with
+
| Some s -> Hashtbl.replace s url ()
+
| _ -> ());
+
Some (url, Inline.Link.text lb |> text_of_inline)
+
| Some (url, _) when is_tag_slug url ->
+
(* Return the tag URL unchanged - will be handled by renderer *)
+
Some (url, Inline.Link.text lb |> text_of_inline)
+
| Some (url, _) when is_contact_slug url ->
+
Some (url, Inline.Link.text lb |> text_of_inline)
+
| _ -> None)
+
| _ -> None
+
;;
+
+
let image_target_is_bushel lb =
+
let open Cmarkit in
+
let ref = Inline.Link.reference lb in
+
match ref with
+
| `Inline (ld, _) ->
+
let dest = Link_definition.dest ld in
+
(match dest with
+
| Some (url, _) when is_bushel_slug url ->
+
let alt = Link_definition.title ld in
+
let dir =
+
Inline.Link.text lb
+
|> Inline.to_plain_text ~break_on_soft:false
+
|> fun r -> String.concat "\n" (List.map (String.concat "") r)
+
in
+
Some (url, alt, dir)
+
| _ -> None)
+
| _ -> None
+
;;
+
+
let rewrite_bushel_link_reference entries slug title meta =
+
let open Cmarkit in
+
let s = strip_handle slug in
+
(* Check if it's a tag, contact, or entry *)
+
if is_tag_slug slug then
+
(* Tag link - keep the ## prefix in dest for renderer to detect *)
+
let txt = Inline.Text (title, meta) in
+
let ld = Link_definition.make ~dest:(slug, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta))
+
else if is_contact_slug slug then
+
(* Contact sidenote *)
+
match Contact.find_by_handle (Entry.contacts entries) s with
+
| Some c ->
+
let sidenote = Side_note (Contact_note (c, title)) in
+
Mapper.ret sidenote
+
| None ->
+
(* Contact not found, fallback to regular link *)
+
let txt = Inline.Text (title, meta) in
+
let ld = Link_definition.make ~dest:("", meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta))
+
else
+
(* Check entry type and generate appropriate sidenote *)
+
match Entry.lookup entries s with
+
| Some (`Paper p) ->
+
let sidenote = Side_note (Paper_note (p, title)) in
+
Mapper.ret sidenote
+
| Some (`Idea i) ->
+
let sidenote = Side_note (Idea_note (i, title)) in
+
Mapper.ret sidenote
+
| Some (`Note n) ->
+
let sidenote = Side_note (Note_note (n, title)) in
+
Mapper.ret sidenote
+
| Some (`Project p) ->
+
let sidenote = Side_note (Project_note (p, title)) in
+
Mapper.ret sidenote
+
| Some (`Video v) ->
+
let sidenote = Side_note (Video_note (v, title)) in
+
Mapper.ret sidenote
+
| None ->
+
(* Entry not found, use regular link *)
+
let dest = Entry.lookup_site_url entries s in
+
let txt = Inline.Text (title, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta))
+
;;
+
+
let rewrite_bushel_image_reference entries url title dir meta =
+
let open Cmarkit in
+
let dest =
+
match Entry.lookup entries (strip_handle url) with
+
| Some ent -> Entry.site_url ent (* entry found (e.g. a video); use its site URL *)
+
| None -> Printf.sprintf "/images/%s" (strip_handle url)
+
in
+
let txt = Inline.Text (dir, meta) in
+
let ld = Link_definition.make ?title ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
let ent_il = Inline.Image (ld, meta) in
+
Mapper.ret ent_il
+
;;
+
+
type Cmarkit.Inline.t += Obsidian_link of string
+
+
let rewrite_label_reference_to_obsidian lb meta =
+
let open Cmarkit in
+
match Inline.Link.referenced_label lb with
+
| None -> Mapper.default
+
| Some l ->
+
let m = Label.meta l in
+
(match Meta.find authorlink m with
+
| Some () ->
+
let slug = Label.key l in
+
let target = Printf.sprintf "[[%s]]" slug in
+
let txt = Obsidian_link target in
+
Mapper.ret txt
+
| None ->
+
(match Meta.find sluglink m with
+
| None -> Mapper.default
+
| Some () ->
+
let slug = Label.key l in
+
if is_bushel_slug slug
+
then (
+
let target = Printf.sprintf "[[%s]]" (strip_handle slug) in
+
let txt = Obsidian_link target in
+
Mapper.ret txt)
+
else if is_tag_slug slug
+
then (
+
let target = Printf.sprintf "#%s" (strip_handle slug) in
+
let txt = Inline.Text (target, meta) in
+
Mapper.ret txt)
+
else Mapper.default))
+
;;
+
+
let make_bushel_link_only_mapper _defs entries =
+
let open Cmarkit in
+
fun _m ->
+
function
+
| Inline.Link (lb, meta) ->
+
(* Convert Bushel link references to regular links (not sidenotes) *)
+
(match link_target_is_bushel lb with
+
| Some (url, title) ->
+
let s = strip_handle url in
+
let dest = Entry.lookup_site_url entries s in
+
(* If title is itself a Bushel slug, use the entry title instead *)
+
let link_text =
+
if is_bushel_slug title then
+
match Entry.lookup entries (strip_handle title) with
+
| Some ent -> Entry.title ent
+
| None -> title
+
else title
+
in
+
let txt = Inline.Text (link_text, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta))
+
| None ->
+
(match Inline.Link.referenced_label lb with
+
| Some l ->
+
let m = Label.meta l in
+
(* Check for authorlink (contact) first *)
+
(match Meta.find authorlink m with
+
| Some () ->
+
let slug = Label.key l in
+
let s = strip_handle slug in
+
(match Contact.find_by_handle (Entry.contacts entries) s with
+
| Some c ->
+
let name = Contact.name c in
+
(match Contact.best_url c with
+
| Some dest ->
+
let txt = Inline.Text (name, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta))
+
| None ->
+
(* No URL for contact, just use name as text *)
+
let txt = Inline.Text (name, meta) in
+
Mapper.ret txt)
+
| None ->
+
(* Contact not found, use title as fallback text *)
+
let title = Inline.Link.text lb |> text_of_inline in
+
let txt = Inline.Text (title, meta) in
+
Mapper.ret txt)
+
| None ->
+
(* Check for sluglink *)
+
(match Meta.find sluglink m with
+
| Some () ->
+
let slug = Label.key l in
+
if is_bushel_slug slug || is_tag_slug slug || is_contact_slug slug
+
then (
+
let s = strip_handle slug in
+
let dest = Entry.lookup_site_url entries s in
+
let title = Inline.Link.text lb |> text_of_inline in
+
(* If link text is itself a Bushel slug, use the entry title instead *)
+
let link_text =
+
let trimmed = String.trim title in
+
if is_bushel_slug trimmed then
+
match Entry.lookup entries (strip_handle trimmed) with
+
| Some ent -> Entry.title ent
+
| None -> title
+
else title
+
in
+
let txt = Inline.Text (link_text, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta)))
+
else Mapper.default
+
| None -> Mapper.default))
+
| None -> Mapper.default))
+
| _ -> Mapper.default
+
;;
+
+
let rewrite_footnote_reference ?footnote_map entries defs lb _meta =
+
let open Cmarkit in
+
match Inline.Link.referenced_label lb with
+
| None -> Mapper.default
+
| Some l ->
+
(match Inline.Link.reference_definition defs lb with
+
| Some (Block.Footnote.Def (fn, _)) ->
+
let label_key = Label.key l in
+
let slug, trigger_text =
+
match footnote_map with
+
| Some fm ->
+
(match Hashtbl.find_opt fm label_key with
+
| Some (slug, text) -> (slug, text)
+
| None ->
+
let num = Hashtbl.length fm + 1 in
+
let slug = Printf.sprintf "fn-%d" num in
+
let text = Printf.sprintf "[%d]" num in
+
Hashtbl.add fm label_key (slug, text);
+
(slug, text))
+
| None ->
+
(* No map provided, use label key as slug *)
+
let slug = Printf.sprintf "fn-%s" (String.sub label_key 1 (String.length label_key - 1)) in
+
let text = "[?]" in
+
(slug, text)
+
in
+
(* Process the block to convert Bushel link references to regular links (not sidenotes) *)
+
let block = Block.Footnote.block fn in
+
let link_mapper = Mapper.make ~inline:(make_bushel_link_only_mapper defs entries) () in
+
let processed_block =
+
match Mapper.map_block link_mapper block with
+
| Some b -> b
+
| None -> block
+
in
+
let sidenote = Side_note (Footnote_note (slug, processed_block, trigger_text)) in
+
Mapper.ret sidenote
+
| _ -> Mapper.default)
+
+
let rewrite_label_reference ?slugs entries lb meta =
+
let open Cmarkit in
+
match Inline.Link.referenced_label lb with
+
| None -> Mapper.default
+
| Some l ->
+
let m = Label.meta l in
+
(match Meta.find authorlink m with
+
| Some () ->
+
let slug = Label.key l in
+
(match Contact.find_by_handle (Entry.contacts entries) (strip_handle slug) with
+
| Some c ->
+
let trigger_text = Contact.name c in
+
let sidenote = Side_note (Contact_note (c, trigger_text)) in
+
Mapper.ret sidenote
+
| None ->
+
(* Contact not found, fallback to text *)
+
let txt = Inline.Text ("Unknown Person", meta) in
+
Mapper.ret txt)
+
| None ->
+
(match Meta.find sluglink m with
+
| None -> Mapper.default
+
| Some () ->
+
let slug = Label.key l in
+
if is_bushel_slug slug
+
then (
+
(match slugs with
+
| Some s -> Hashtbl.replace s slug ()
+
| _ -> ());
+
let s = strip_handle slug in
+
(* Check entry type and generate appropriate sidenote *)
+
match Entry.lookup entries s with
+
| Some (`Paper p) ->
+
let trigger_text = Entry.lookup_title entries s in
+
let sidenote = Side_note (Paper_note (p, trigger_text)) in
+
Mapper.ret sidenote
+
| Some (`Idea i) ->
+
let trigger_text = Entry.lookup_title entries s in
+
let sidenote = Side_note (Idea_note (i, trigger_text)) in
+
Mapper.ret sidenote
+
| Some (`Note n) ->
+
let trigger_text = Entry.lookup_title entries s in
+
let sidenote = Side_note (Note_note (n, trigger_text)) in
+
Mapper.ret sidenote
+
| Some (`Project p) ->
+
let trigger_text = Entry.lookup_title entries s in
+
let sidenote = Side_note (Project_note (p, trigger_text)) in
+
Mapper.ret sidenote
+
| Some (`Video v) ->
+
let trigger_text = Entry.lookup_title entries s in
+
let sidenote = Side_note (Video_note (v, trigger_text)) in
+
Mapper.ret sidenote
+
| None ->
+
(* Entry not found, use regular link *)
+
let target = Entry.lookup_title entries s in
+
let dest = Entry.lookup_site_url entries s in
+
let txt = Inline.Text (target, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
Mapper.ret (Inline.Link (ld, meta)))
+
else if is_tag_slug slug
+
then (
+
let sh = strip_handle slug in
+
(* Use # as dest to prevent navigation, JavaScript will intercept *)
+
let target, dest = sh, "#" in
+
let txt = Inline.Text (target, meta) in
+
let ld = Link_definition.make ~dest:(dest, meta) () in
+
let ll = `Inline (ld, meta) in
+
let ld = Inline.Link.make txt ll in
+
let ent_il = Inline.Link (ld, meta) in
+
Mapper.ret ent_il)
+
else Mapper.default))
+
;;
+
+
let bushel_inline_mapper_to_obsidian entries _m =
+
let open Cmarkit in
+
function
+
| Inline.Link (lb, meta) ->
+
(match link_target_is_bushel lb with
+
| None -> rewrite_label_reference_to_obsidian lb meta
+
| Some (url, title) -> rewrite_bushel_link_reference entries url title meta)
+
| Inline.Image (lb, meta) ->
+
(match image_target_is_bushel lb with
+
| None -> rewrite_label_reference_to_obsidian lb meta
+
| Some (url, alt, dir) -> rewrite_bushel_image_reference entries url alt dir meta)
+
| _ -> Mapper.default
+
;;
+
+
let make_bushel_inline_mapper ?slugs ?footnote_map defs entries =
+
let open Cmarkit in
+
fun _m ->
+
function
+
| Inline.Link (lb, meta) ->
+
(* First check if this is a footnote reference *)
+
(match Inline.Link.referenced_label lb with
+
| Some l when String.starts_with ~prefix:"^" (Label.key l) ->
+
(* This is a footnote reference *)
+
rewrite_footnote_reference ?footnote_map entries defs lb meta
+
| _ ->
+
(* Not a footnote, handle as bushel link *)
+
(match link_target_is_bushel ?slugs lb with
+
| None -> rewrite_label_reference ?slugs entries lb meta
+
| Some (url, title) -> rewrite_bushel_link_reference entries url title meta))
+
| Inline.Image (lb, meta) ->
+
(match image_target_is_bushel lb with
+
| None -> rewrite_label_reference entries lb meta
+
| Some (url, alt, dir) -> rewrite_bushel_image_reference entries url alt dir meta)
+
| _ -> Mapper.default
+
;;
+
+
let scan_for_slugs entries md =
+
let open Cmarkit in
+
let slugs = Hashtbl.create 7 in
+
let doc = Doc.of_string ~strict:false ~resolver:with_bushel_links md in
+
let defs = Doc.defs doc in
+
let _ =
+
Mapper.map_doc (Mapper.make ~inline:(make_bushel_inline_mapper ~slugs defs entries) ()) doc
+
in
+
Hashtbl.fold (fun k () a -> k :: a) slugs []
+
;;
+
+
(** Validation mapper that collects broken references *)
+
let make_validation_mapper entries broken_slugs broken_contacts =
+
let open Cmarkit in
+
fun _m ->
+
function
+
| Inline.Link (lb, _meta) ->
+
(* Check inline bushel links *)
+
(match link_target_is_bushel lb with
+
| Some (url, _title) ->
+
let s = strip_handle url in
+
if is_contact_slug url then
+
(* Validate contact handle *)
+
(match Contact.find_by_handle (Entry.contacts entries) s with
+
| None -> Hashtbl.replace broken_contacts url ()
+
| Some _ -> ())
+
else if is_bushel_slug url then
+
(* Validate entry slug *)
+
(match Entry.lookup entries s with
+
| None -> Hashtbl.replace broken_slugs url ()
+
| Some _ -> ())
+
else ();
+
Mapper.default
+
| None ->
+
(* Check referenced label links *)
+
(match Inline.Link.referenced_label lb with
+
| Some l ->
+
let m = Label.meta l in
+
(* Check for contact reference *)
+
(match Meta.find authorlink m with
+
| Some () ->
+
let slug = Label.key l in
+
let handle = strip_handle slug in
+
(match Contact.find_by_handle (Entry.contacts entries) handle with
+
| None -> Hashtbl.replace broken_contacts slug ()
+
| Some _ -> ());
+
Mapper.default
+
| None ->
+
(* Check for entry slug reference *)
+
(match Meta.find sluglink m with
+
| None -> Mapper.default
+
| Some () ->
+
let slug = Label.key l in
+
if is_bushel_slug slug then (
+
let s = strip_handle slug in
+
match Entry.lookup entries s with
+
| None -> Hashtbl.replace broken_slugs slug ()
+
| Some _ -> ()
+
);
+
Mapper.default))
+
| None -> Mapper.default))
+
| _ -> Mapper.default
+
;;
+
+
(** Validate all bushel references in markdown and return broken ones *)
+
let validate_references entries md =
+
let open Cmarkit in
+
let broken_slugs = Hashtbl.create 7 in
+
let broken_contacts = Hashtbl.create 7 in
+
let doc = Doc.of_string ~strict:false ~resolver:with_bushel_links md in
+
let mapper = Mapper.make ~inline:(make_validation_mapper entries broken_slugs broken_contacts) () in
+
let _ = Mapper.map_doc mapper doc in
+
let slugs = Hashtbl.fold (fun k () a -> k :: a) broken_slugs [] in
+
let contacts = Hashtbl.fold (fun k () a -> k :: a) broken_contacts [] in
+
(slugs, contacts)
+
;;
+
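A usage sketch of the validator, with hypothetical broken references ([entries] is assumed to be an already-loaded [Entry.t]):

```ocaml
(* ":missing-slug" and "@unknown-handle" are hypothetical refs that
   do not resolve against [entries] *)
let broken_slugs, broken_contacts =
  validate_references entries "See [:missing-slug] by [@unknown-handle]."
(* broken_slugs would then contain ":missing-slug" and
   broken_contacts "@unknown-handle" *)
```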
+
(** Extract the first image URL from markdown text *)
+
let extract_first_image md =
+
let open Cmarkit in
+
(* Don't use bushel link resolver to avoid circular dependency with Entry *)
+
let doc = Doc.of_string md in
+
let found_image = ref None in
+
+
let find_image_in_inline _mapper = function
+
| Inline.Image (img, _) ->
+
(match Inline.Link.reference img with
+
| `Inline (ld, _) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) when !found_image = None ->
+
found_image := Some url;
+
Mapper.default
+
| _ -> Mapper.default)
+
| _ -> Mapper.default)
+
| _ -> Mapper.default
+
in
+
+
let mapper = Mapper.make ~inline:find_image_in_inline () in
+
let _ = Mapper.map_doc mapper doc in
+
!found_image
+
;;
+
+
(** Convert markdown text to plain text, resolving bushel links to just their text *)
+
let markdown_to_plaintext _entries text =
+
let open Cmarkit in
+
(* Parse markdown with bushel link resolver *)
+
let doc = Doc.of_string ~resolver:with_bushel_links text in
+
+
(* Convert document blocks to plain text *)
+
let rec block_to_text = function
+
| Block.Blank_line _ -> ""
+
| Block.Thematic_break _ -> "\n---\n"
+
| Block.Paragraph (p, _) ->
+
let inline = Block.Paragraph.inline p in
+
Inline.to_plain_text ~break_on_soft:false inline
+
|> List.map (String.concat "") |> String.concat "\n"
+
| Block.Heading (h, _) ->
+
let inline = Block.Heading.inline h in
+
Inline.to_plain_text ~break_on_soft:false inline
+
|> List.map (String.concat "") |> String.concat "\n"
+
| Block.Block_quote (bq, _) ->
+
let blocks = Block.Block_quote.block bq in
+
block_to_text blocks
+
| Block.List (l, _) ->
+
let items = Block.List'.items l in
+
List.map (fun (item, _) ->
+
let blocks = Block.List_item.block item in
+
block_to_text blocks
+
) items |> String.concat "\n"
+
| Block.Code_block (cb, _) ->
+
let code = Block.Code_block.code cb in
+
String.concat "\n" (List.map Block_line.to_string code)
+
| Block.Html_block _ -> "" (* Skip HTML blocks for search *)
+
| Block.Link_reference_definition _ -> ""
+
| Block.Ext_footnote_definition _ -> ""
+
| Block.Blocks (blocks, _) ->
+
List.map block_to_text blocks |> String.concat "\n"
+
| _ -> ""
+
in
+
let blocks = Doc.block doc in
+
block_to_text blocks
+
;;
+
+
(** Extract all links from markdown text, including from images *)
+
let extract_all_links text =
+
let open Cmarkit in
+
let doc = Doc.of_string ~resolver:with_bushel_links text in
+
let links = ref [] in
+
+
let find_links_in_inline _mapper = function
+
| Inline.Link (lb, _) | Inline.Image (lb, _) ->
+
(* Check for inline link/image destination *)
+
(match Inline.Link.reference lb with
+
| `Inline (ld, _) ->
+
(match Link_definition.dest ld with
+
| Some (url, _) ->
+
links := url :: !links;
+
Mapper.default
+
| None -> Mapper.default)
+
| `Ref _ ->
+
(* For reference-style links/images, check if it has a referenced label *)
+
(match Inline.Link.referenced_label lb with
+
| Some l ->
+
let key = Label.key l in
+
(* Check if it's a bushel-style link *)
+
if String.length key > 0 && (key.[0] = ':' || key.[0] = '@' ||
+
(String.length key > 1 && key.[0] = '#' && key.[1] = '#')) then
+
links := key :: !links;
+
Mapper.default
+
| None -> Mapper.default))
+
| _ -> Mapper.default
+
in
+
+
let mapper = Mapper.make ~inline:find_links_in_inline () in
+
let _ = Mapper.map_doc mapper doc in
+
+
(* Deduplicate *)
+
let module StringSet = Set.Make(String) in
+
StringSet.elements (StringSet.of_list !links)
+
;;
+
+
(* Reference source type for CiTO annotations *)
+
type reference_source =
+
| Paper (* CitesAsSourceDocument *)
+
| Note (* CitesAsRelated *)
+
| External (* Cites *)
+
+
(* Extract references (papers/notes with DOIs) from a note *)
+
let note_references entries default_author note =
+
let refs = ref [] in
+
+
(* Helper to format author name: extract last name from full name *)
+
let format_author_last name =
+
let parts = String.split_on_char ' ' name in
+
List.nth parts (List.length parts - 1)
+
in
+
+
(* Helper to format a citation *)
+
let format_citation ~authors ~year ~title ~publisher =
+
let author_str = match authors with
+
| [] -> ""
+
| [author] -> format_author_last author ^ " "
+
| author :: _ -> (format_author_last author) ^ " et al "
+
in
+
let pub_str = match publisher with
+
| None | Some "" -> ""
+
| Some p -> p ^ ". "
+
in
+
Printf.sprintf "%s(%d). %s. %s" author_str year title pub_str
+
in
+
+
(* Check slug_ent if it exists *)
+
(match Note.slug_ent note with
+
| Some slug ->
+
(match Entry.lookup entries slug with
+
| Some (`Paper p) ->
+
(match Paper.doi p with
+
| Some doi ->
+
let authors = Paper.authors p in
+
let year = Paper.year p in
+
let title = Paper.title p in
+
let publisher = Some (Paper.publisher p) in
+
let citation = format_citation ~authors ~year ~title ~publisher in
+
refs := (doi, citation, Paper) :: !refs
+
| None -> ())
+
| Some (`Note n) ->
+
(match Note.doi n with
+
| Some doi ->
+
let authors = match Note.author n with
+
| Some a -> [a]
+
| None -> [Contact.name default_author]
+
in
+
let (year, _, _) = Note.date n in
+
let title = Note.title n in
+
let publisher = None in
+
let citation = format_citation ~authors ~year ~title ~publisher in
+
refs := (doi, citation, Note) :: !refs
+
| None -> ())
+
| _ -> ())
+
| None -> ());
+
+
(* Scan body for bushel references *)
+
let slugs = scan_for_slugs entries (Note.body note) in
+
List.iter (fun slug ->
+
(* Strip leading : or @ from slug before lookup *)
+
let normalized_slug = strip_handle slug in
+
match Entry.lookup entries normalized_slug with
+
| Some (`Paper p) ->
+
(match Paper.doi p with
+
| Some doi ->
+
let authors = Paper.authors p in
+
let year = Paper.year p in
+
let title = Paper.title p in
+
let publisher = Some (Paper.publisher p) in
+
let citation = format_citation ~authors ~year ~title ~publisher in
+
(* Check if doi already exists in refs *)
+
if not (List.exists (fun (d, _, _) -> d = doi) !refs) then
+
refs := (doi, citation, Paper) :: !refs
+
| None -> ())
+
| Some (`Note n) ->
+
(match Note.doi n with
+
| Some doi ->
+
let authors = match Note.author n with
+
| Some a -> [a]
+
| None -> [Contact.name default_author]
+
in
+
let (year, _, _) = Note.date n in
+
let title = Note.title n in
+
let publisher = None in
+
let citation = format_citation ~authors ~year ~title ~publisher in
+
(* Check if doi already exists in refs *)
+
if not (List.exists (fun (d, _, _) -> d = doi) !refs) then
+
refs := (doi, citation, Note) :: !refs
+
| None -> ())
+
| _ -> ()
+
) slugs;
+
+
(* Scan body for external DOI URLs and resolve from cache *)
+
let body = Note.body note in
+
let doi_url_pattern = Re.Perl.compile_pat "https?://(?:dx\\.)?doi\\.org/([^)\\s\"'>]+)" in
+
let matches = Re.all doi_url_pattern body in
+
let doi_entries = Entry.doi_entries entries in
+
List.iter (fun group ->
+
try
+
let encoded_doi = Re.Group.get group 1 in
+
(* URL decode the DOI *)
+
let doi = Uri.pct_decode encoded_doi in
+
(* Check if doi already exists in refs *)
+
if not (List.exists (fun (d, _, _) -> d = doi) !refs) then
+
(* Look up in DOI cache *)
+
match Doi_entry.find_by_doi doi_entries doi with
+
| Some doi_entry when doi_entry.status = Resolved ->
+
let citation = format_citation
+
~authors:doi_entry.authors
+
~year:doi_entry.year
+
~title:doi_entry.title
+
~publisher:(Some doi_entry.publisher)
+
in
+
refs := (doi, citation, External) :: !refs
+
| _ ->
+
(* Not found in cache, add minimal citation with just the DOI *)
+
refs := (doi, doi, External) :: !refs
+
with _ -> ()
+
) matches;
+
+
(* Scan body for publisher URLs (Elsevier, Nature, ACM, Sage, UPenn, Springer, Taylor & Francis) and resolve from cache *)
+
let publisher_pattern = Re.Perl.compile_pat "https?://(?:(?:www\\.)?(?:linkinghub\\.elsevier\\.com|nature\\.com|journals\\.sagepub\\.com|garfield\\.library\\.upenn\\.edu|link\\.springer\\.com)/[^)\\s\"'>]+|(?:dl\\.acm\\.org|(?:www\\.)?tandfonline\\.com)/doi(?:/pdf)?/10\\.[^)\\s\"'>]+)" in
+
let publisher_matches = Re.all publisher_pattern body in
+
List.iter (fun group ->
+
try
+
let url = Re.Group.get group 0 in
+
(* Look up in DOI cache by source URL *)
+
match Doi_entry.find_by_url doi_entries url with
+
| Some doi_entry when doi_entry.status = Resolved ->
+
let doi = doi_entry.doi in
+
(* Check if this DOI already exists in refs *)
+
if not (List.exists (fun (d, _, _) -> d = doi) !refs) then
+
let citation = format_citation
+
~authors:doi_entry.authors
+
~year:doi_entry.year
+
~title:doi_entry.title
+
~publisher:(Some doi_entry.publisher)
+
in
+
refs := (doi, citation, External) :: !refs
+
| _ ->
+
(* Not found in cache, skip it *)
+
()
+
with _ -> ()
+
) publisher_matches;
+
+
List.rev !refs
+
;;
+
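The shape of the result can be sketched as follows; values are hypothetical, and the trailing space in the citation string comes from the `format_citation` printf above:

```ocaml
(* what note_references might return for a note whose body cites one
   local paper slug and one DOI URL that is absent from the cache *)
let _example : (string * string * reference_source) list =
  [ ("10.5555/abc", "Smith et al (2021). Some Paper. ACM. ", Paper)
  ; ("10.1234/xyz", "10.1234/xyz", External)
  ]
```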
+73
stack/bushel/lib/md.mli
···
+
val make_bushel_inline_mapper
+
: ?slugs:(string, unit) Hashtbl.t
+
-> ?footnote_map:(string, string * string) Hashtbl.t
+
-> Cmarkit.Label.defs
+
-> Entry.t
+
-> 'a
+
-> Cmarkit.Inline.t
+
-> Cmarkit.Inline.t Cmarkit.Mapper.result
+
+
val make_bushel_link_only_mapper
+
: Cmarkit.Label.defs
+
-> Entry.t
+
-> 'a
+
-> Cmarkit.Inline.t
+
-> Cmarkit.Inline.t Cmarkit.Mapper.result
+
+
type Cmarkit.Inline.t += Obsidian_link of string
+
+
type sidenote_data =
+
| Contact_note of Contact.t * string
+
| Paper_note of Paper.t * string
+
| Idea_note of Idea.t * string
+
| Note_note of Note.t * string
+
| Project_note of Project.t * string
+
| Video_note of Video.t * string
+
| Footnote_note of string * Cmarkit.Block.t * string
+
+
type Cmarkit.Inline.t += Side_note of sidenote_data
+
+
val bushel_inline_mapper_to_obsidian
+
: Entry.t
+
-> 'a
+
-> Cmarkit.Inline.t
+
-> Cmarkit.Inline.t Cmarkit.Mapper.result
+
+
val with_bushel_links
+
: [< `Def of Cmarkit.Label.t option * Cmarkit.Label.t
+
| `Ref of 'a * Cmarkit.Label.t * Cmarkit.Label.t option
+
]
+
-> Cmarkit.Label.t option
+
+
val scan_for_slugs : Entry.t -> string -> string list
+
+
(** Validate all bushel references in markdown and return broken ones.
+
Returns (broken_slugs, broken_contacts) where each list contains
+
the full reference string (e.g., ":missing-slug", "@unknown-handle") *)
+
val validate_references : Entry.t -> string -> string list * string list
+
+
(** Extract the first image URL from markdown text *)
+
val extract_first_image : string -> string option
+
+
(** Convert markdown text to plain text, resolving bushel links to just their text *)
+
val markdown_to_plaintext : 'a -> string -> string
+
+
val is_bushel_slug : string -> bool
+
val is_tag_slug : string -> bool
+
val is_type_filter_slug : string -> bool
+
val is_contact_slug : string -> bool
+
val strip_handle : string -> string
+
+
(** Extract all links from markdown text, including from images (internal and external) *)
+
val extract_all_links : string -> string list
+
+
(** Type indicating the source of a reference for CiTO annotation *)
+
type reference_source =
+
| Paper (** CitesAsSourceDocument *)
+
| Note (** CitesAsRelated *)
+
| External (** Cites *)
+
+
(** Extract references (papers/notes with DOIs) from a note.
+
Returns a list of (DOI, citation_string, reference_source) tuples.
+
Citation format: "LastName (Year). Title. Publisher." ("LastName et al" when there are multiple authors); the DOI is returned separately in the tuple. *)
+
val note_references : Entry.t -> Contact.t -> Note.t -> (string * string * reference_source) list
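The citation format documented for `note_references` can be illustrated with a small formatter. A hedged sketch — `citation` and its labelled arguments are hypothetical names for illustration, not part of the bushel API:

```ocaml
(* Hypothetical formatter matching the documented citation shape:
   "Last, First (Year). Title. Publisher. https://doi.org/the/doi" *)
let citation ~author ~year ~title ~publisher ~doi =
  Printf.sprintf "%s (%d). %s. %s. https://doi.org/%s"
    author year title publisher doi

let () =
  assert (citation ~author:"Doe, Jane" ~year:2024 ~title:"Bushel"
            ~publisher:"ACM" ~doi:"10.1234/x"
          = "Doe, Jane (2024). Bushel. ACM. https://doi.org/10.1234/x")
```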
+230
stack/bushel/lib/note.ml
···
+
type t =
+
{ title : string
+
; date : Ptime.date
+
; slug : string
+
; body : string
+
; tags : string list
+
; draft : bool
+
; updated : Ptime.date option
+
; sidebar : string option
+
; index_page : bool
+
; perma : bool (* Permanent article that will receive a DOI *)
+
; doi : string option (* DOI identifier for permanent articles *)
+
; synopsis: string option
+
; titleimage: string option
+
; via : (string * string) option
+
; slug_ent : string option (* Optional reference to another entry *)
+
; source : string option (* Optional source for news-style notes *)
+
; url : string option (* Optional external URL for news-style notes *)
+
; author : string option (* Optional author for news-style notes *)
+
; category : string option (* Optional category for news-style notes *)
+
}
+
+
type ts = t list
+
+
let link { body; via; slug; _ } =
+
match body, via with
+
| "", Some (l, u) -> `Ext (l, u)
+
| "", None -> failwith (slug ^ ": external note missing via/via-url fields")
+
| _, _ -> `Local slug
+
;;
+
+
let origdate { date; _ } = Option.get @@ Ptime.of_date date
+
+
let date { date; updated; _ } =
+
match updated with
+
| None -> date
+
| Some v -> v
+
;;
+
+
let datetime v = Option.get @@ Ptime.of_date @@ date v
+
let compare a b = Ptime.compare (datetime b) (datetime a)
+
let slug { slug; _ } = slug
+
let body { body; _ } = body
+
let title { title; _ } = title
+
let tags { tags; _ } = tags
+
let sidebar { sidebar; _ } = sidebar
+
let synopsis { synopsis; _ } = synopsis
+
let draft { draft; _ } = draft
+
let perma { perma; _ } = perma
+
let doi { doi; _ } = doi
+
let titleimage { titleimage; _ } = titleimage
+
let slug_ent { slug_ent; _ } = slug_ent
+
let source { source; _ } = source
+
let url { url; _ } = url
+
let author { author; _ } = author
+
let category { category; _ } = category
+
let lookup slug notes = List.find (fun n -> n.slug = slug) notes
+
let read_file file = In_channel.(with_open_bin file input_all)
+
let words { body; _ } = Util.count_words body
+
+
+
let of_md fname =
+
(* TODO fix Jekyll_post to basename the fname all the time *)
+
match Jekyll_post.of_string ~fname:(Filename.basename fname) (read_file fname) with
+
| Error (`Msg m) -> failwith ("note_of_md: " ^ m)
+
| Ok jp ->
+
let fields = jp.Jekyll_post.fields in
+
let { Jekyll_post.title; date; slug; body; _ } = jp in
+
let date, _ = Ptime.to_date_time date in
+
let index_page =
+
match Jekyll_format.find "index_page" fields with
+
| Some (`Bool v) -> v
+
| _ -> false
+
in
+
let perma =
+
match Jekyll_format.find "perma" fields with
+
| Some (`Bool v) -> v
+
| _ -> false
+
in
+
let updated =
+
match Jekyll_format.find "updated" fields with
+
| Some (`String v) -> Some (Jekyll_format.parse_date_exn v |> Ptime.to_date)
+
| _ -> None
+
in
+
let draft =
+
match Jekyll_format.find "draft" fields with
+
| Some (`Bool v) -> v
+
| _ -> false
+
in
+
let titleimage =
+
match Jekyll_format.find "titleimage" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let synopsis =
+
match Jekyll_format.find "synopsis" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let sidebar =
+
try Some (read_file ("data/sidebar/" ^ Filename.basename fname)) with
+
| _ -> None
+
in
+
let tags =
+
match Jekyll_format.find "tags" fields with
+
| Some (`A l) ->
+
List.filter_map
+
(function
+
| `String s -> Some s
+
| _ -> None)
+
l
+
| _ -> []
+
in
+
let via =
+
match Jekyll_format.find "via" fields, Jekyll_format.find "via-url" fields with
+
| Some (`String a), Some (`String b) -> Some (a, b)
+
| None, Some (`String b) -> Some ("", b)
+
| _ -> None
+
in
+
let slug_ent =
+
match Jekyll_format.find "slug_ent" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let source =
+
match Jekyll_format.find "source" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let url =
+
match Jekyll_format.find "url" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let author =
+
match Jekyll_format.find "author" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let category =
+
match Jekyll_format.find "category" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
let doi =
+
match Jekyll_format.find "doi" fields with
+
| Some (`String v) -> Some v
+
| _ -> None
+
in
+
{ title; draft; date; slug; synopsis; titleimage; index_page; perma; doi; body; via; updated; tags; sidebar; slug_ent; source; url; author; category }
+
+
(* TODO:claude *)
+
let typesense_schema =
+
let open Ezjsonm in
+
dict [
+
("name", string "notes");
+
("fields", list (fun d -> dict d) [
+
[("name", string "id"); ("type", string "string")];
+
[("name", string "title"); ("type", string "string")];
+
[("name", string "content"); ("type", string "string")];
+
[("name", string "date"); ("type", string "string")];
+
[("name", string "date_timestamp"); ("type", string "int64")];
+
[("name", string "tags"); ("type", string "string[]"); ("facet", bool true)];
+
[("name", string "body"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "draft"); ("type", string "bool")];
+
[("name", string "synopsis"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "thumbnail_url"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "type"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "status"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "related_papers"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "related_projects"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "related_contacts"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "attachments"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "source"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "url"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "author"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "category"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "slug_ent"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "words"); ("type", string "int32"); ("optional", bool true)];
+
]);
+
("default_sorting_field", string "date_timestamp");
+
]
+
+
(** TODO:claude Pretty-print a note with ANSI formatting *)
+
let pp ppf n =
+
let open Fmt in
+
pf ppf "@[<v>";
+
pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Note";
+
pf ppf "%a: %a@," (styled `Bold string) "Slug" string (slug n);
+
pf ppf "%a: %a@," (styled `Bold string) "Title" string (title n);
+
let (year, month, day) = date n in
+
pf ppf "%a: %04d-%02d-%02d@," (styled `Bold string) "Date" year month day;
+
(match n.updated with
+
| Some (y, m, d) -> pf ppf "%a: %04d-%02d-%02d@," (styled `Bold string) "Updated" y m d
+
| None -> ());
+
pf ppf "%a: %b@," (styled `Bold string) "Draft" (draft n);
+
pf ppf "%a: %b@," (styled `Bold string) "Index Page" n.index_page;
+
pf ppf "%a: %b@," (styled `Bold string) "Perma" (perma n);
+
(match doi n with
+
| Some d -> pf ppf "%a: %a@," (styled `Bold string) "DOI" string d
+
| None -> ());
+
(match synopsis n with
+
| Some syn -> pf ppf "%a: %a@," (styled `Bold string) "Synopsis" string syn
+
| None -> ());
+
(match titleimage n with
+
| Some img -> pf ppf "%a: %a@," (styled `Bold string) "Title Image" string img
+
| None -> ());
+
(match n.via with
+
| Some (label, url) ->
+
if label <> "" then
+
pf ppf "%a: %a (%a)@," (styled `Bold string) "Via" string label string url
+
else
+
pf ppf "%a: %a@," (styled `Bold string) "Via" string url
+
| None -> ());
+
let t = tags n in
+
if t <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Tags" (list ~sep:comma string) t;
+
(match sidebar n with
+
| Some sb ->
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Sidebar";
+
pf ppf "%a@," string sb
+
| None -> ());
+
let bd = body n in
+
if bd <> "" then begin
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Body";
+
pf ppf "%a@," string bd;
+
end;
+
pf ppf "@]"
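`of_md` above repeats the same `Jekyll_format.find` + match shape for a dozen scalar fields; a pair of small helpers would shrink it considerably. A minimal sketch, using a plain assoc list as a stand-in for the parsed front matter (the `yaml`, `find_string`, and `find_bool` names are hypothetical, not part of bushel):

```ocaml
(* Stand-in for the YAML values Jekyll_format.find returns. *)
type yaml = Bool of bool | Str of string

let find fields k = List.assoc_opt k fields

(* Some v only when the key is present and holds a string. *)
let find_string fields k =
  match find fields k with Some (Str v) -> Some v | _ -> None

(* Missing or non-bool keys fall back to [default]. *)
let find_bool ?(default = false) fields k =
  match find fields k with Some (Bool v) -> v | _ -> default

let () =
  let fields = [ ("draft", Bool true); ("synopsis", Str "a note") ] in
  assert (find_bool fields "draft");
  assert (not (find_bool fields "perma"));
  assert (find_string fields "synopsis" = Some "a note");
  assert (find_string fields "doi" = None)
```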
+49
stack/bushel/lib/note.mli
···
+
type t =
+
{ title : string
+
; date : Ptime.date
+
; slug : string
+
; body : string
+
; tags : string list
+
; draft : bool
+
; updated : Ptime.date option
+
; sidebar : string option
+
; index_page : bool
+
; perma : bool
+
; doi : string option
+
; synopsis: string option
+
; titleimage: string option
+
; via : (string * string) option
+
; slug_ent : string option
+
; source : string option
+
; url : string option
+
; author : string option
+
; category : string option
+
}
+
+
type ts = t list
+
+
val link : t -> [> `Ext of string * string | `Local of string ]
+
val origdate : t -> Ptime.t
+
val date : t -> Ptime.date
+
val datetime : t -> Ptime.t
+
val compare : t -> t -> int
+
val slug : t -> string
+
val body : t -> string
+
val title : t -> string
+
val draft : t -> bool
+
val perma : t -> bool
+
val doi : t -> string option
+
val synopsis : t -> string option
+
val titleimage : t -> string option
+
val slug_ent : t -> string option
+
val source : t -> string option
+
val url : t -> string option
+
val author : t -> string option
+
val category : t -> string option
+
val tags : t -> string list
+
val sidebar : t -> string option
+
val lookup : string -> t list -> t
+
val words : t -> int
+
val of_md : string -> t
+
val typesense_schema : Ezjsonm.value
+
val pp : Format.formatter -> t -> unit
+373
stack/bushel/lib/paper.ml
···
+
module J = Ezjsonm
+
+
type paper = Ezjsonm.value
+
+
type t =
+
{ slug : string
+
; ver : string
+
; paper : paper
+
; abstract : string
+
; latest : bool
+
}
+
+
type ts = t list
+
+
let key y k = J.find y [ k ]
+
+
let slugs ts =
+
List.fold_left (fun acc t -> if List.mem t.slug acc then acc else t.slug :: acc) [] ts
+
;;
+
+
let slug { slug; _ } = slug
+
let title { paper; _ } : string = key paper "title" |> J.get_string
+
let authors { paper; _ } : string list = key paper "author" |> J.get_list J.get_string
+
+
let project_slugs { paper; _ } : string list =
+
try key paper "projects" |> J.get_list J.get_string with
+
| _ -> []
+
;;
+
+
let slides { paper; _ } : string list =
+
try key paper "slides" |> J.get_list J.get_string with
+
| _ -> []
+
;;
+
+
let bibtype { paper; _ } : string = key paper "bibtype" |> J.get_string
+
+
let journal { paper; _ } =
+
try key paper "journal" |> J.get_string with
+
| Not_found ->
+
failwith
+
(Printf.sprintf "no journal found for %s\n%!" (Ezjsonm.value_to_string paper))
+
;;
+
+
(** TODO:claude Helper to extract raw JSON *)
+
let raw_json { paper; _ } = paper
+
+
let doi { paper; _ } =
+
try Some (key paper "doi" |> J.get_string) with
+
| _ -> None
+
;;
+
+
let volume { paper; _ } =
+
try Some (key paper "volume" |> J.get_string) with
+
| _ -> None
+
;;
+
+
let video { paper; _ } =
+
try Some (key paper "video" |> J.get_string) with
+
| _ -> None
+
;;
+
+
let issue { paper; _ } =
+
try Some (key paper "number" |> J.get_string) with
+
| _ -> None
+
;;
+
+
let url { paper; _ } =
+
try Some (key paper "url" |> J.get_string) with
+
| _ -> None
+
;;
+
+
let pages { paper; _ } = try key paper "pages" |> J.get_string with _ -> ""
+
let abstract { abstract; _ } = abstract
+
+
let institution { paper; _ } =
+
try key paper "institution" |> J.get_string with
+
| Not_found ->
+
failwith
+
(Printf.sprintf "no institution found for %s\n%!" (Ezjsonm.value_to_string paper))
+
;;
+
+
let number { paper; _ } =
+
try Some (key paper "number" |> J.get_string) with
+
| Not_found -> None
+
;;
+
+
let editor { paper; _ } = key paper "editor" |> J.get_string
+
let isbn { paper; _ } = key paper "isbn" |> J.get_string
+
let bib { paper; _ } = key paper "bib" |> J.get_string
+
let year { paper; _ } = key paper "year" |> J.get_string |> int_of_string
+
+
let publisher { paper; _ } =
+
try key paper "publisher" |> J.get_string with
+
| Not_found -> ""
+
;;
+
+
let booktitle { paper; _ } =
+
let r = key paper "booktitle" |> J.get_string |> Bytes.of_string in
+
Bytes.set r 0 (Char.lowercase_ascii (Bytes.get r 0));
+
String.of_bytes r
+
;;
+
+
let date { paper; _ } =
+
let m =
+
try
+
match String.lowercase_ascii (key paper "month" |> J.get_string) with
+
| "jan" -> 1
+
| "feb" -> 2
+
| "mar" -> 3
+
| "apr" -> 4
+
| "may" -> 5
+
| "jun" -> 6
+
| "jul" -> 7
+
| "aug" -> 8
+
| "sep" -> 9
+
| "oct" -> 10
+
| "nov" -> 11
+
| "dec" -> 12
+
| _ -> 1
+
with
+
| Not_found -> 1
+
in
+
let y =
+
try key paper "year" |> J.get_string |> int_of_string with
+
| Not_found ->
+
failwith (Printf.sprintf "no year found for %s" (Ezjsonm.value_to_string paper))
+
in
+
y, m, 1
+
;;
+
+
let datetime p = Option.get @@ Ptime.of_date @@ date p
+
+
let compare p2 p1 =
+
let d1 =
+
Ptime.of_date
+
(try date p1 with
+
| _ -> 1977, 1, 1)
+
|> Option.get
+
in
+
let d2 =
+
Ptime.of_date
+
(try date p2 with
+
| _ -> 1977, 1, 1)
+
|> Option.get
+
in
+
Ptime.compare d1 d2
+
;;
+
+
let get_papers ~slug ts =
+
List.filter (fun p -> p.slug = slug && not p.latest) ts |> List.sort compare
+
;;
+
+
let read_file file = In_channel.(with_open_bin file input_all)
+
+
let of_md ~slug ~ver fname =
+
(* TODO fix Jekyll_post to not error on no date *)
+
let fname' = "2000-01-01-" ^ Filename.basename fname in
+
match Jekyll_post.of_string ~fname:fname' (read_file fname) with
+
| Error (`Msg m) -> failwith ("paper_of_md: " ^ m)
+
| Ok jp ->
+
let fields = jp.Jekyll_post.fields |> Jekyll_format.fields_to_yaml in
+
let { Jekyll_post.body; _ } = jp in
+
{ slug; ver; abstract = body; paper = fields; latest = false }
+
;;
+
+
let tv (l : t list) =
+
let h = Hashtbl.create 7 in
+
List.iter
+
(fun { slug; ver; _ } ->
+
match Hashtbl.find_opt h slug with
+
| None -> Hashtbl.add h slug [ ver ]
+
| Some l ->
+
let l = ver :: l in
+
let l = List.sort Stdlib.compare l in
+
Hashtbl.replace h slug l)
+
l;
+
List.map
+
(fun p ->
+
let latest = Hashtbl.find h p.slug |> List.rev |> List.hd in
+
let latest = p.ver = latest in
+
{ p with latest })
+
l
+
;;
+
+
let lookup ts slug = List.find_opt (fun t -> t.slug = slug && t.latest) ts
+
+
let tag_of_bibtype bt =
+
match String.lowercase_ascii bt with
+
| "article" -> "journal"
+
| "inproceedings" -> "conference"
+
| "techreport" -> "report"
+
| "misc" -> "preprint"
+
| "book" -> "book"
+
| x -> x
+
;;
+
+
let tags { paper; _ } =
+
let tags f =
+
try key paper f |> J.get_list J.get_string with
+
| _ -> []
+
in
+
let core = tags "tags" in
+
let extra = tags "keywords" in
+
let projects = tags "projects" in
+
let ty = [ key paper "bibtype" |> J.get_string |> tag_of_bibtype ] in
+
List.flatten [ core; extra; ty; projects ]
+
;;
+
+
let best_url p =
+
if Sys.file_exists (Printf.sprintf "static/papers/%s.pdf" (slug p))
+
then Some (Printf.sprintf "/papers/%s.pdf" (slug p))
+
else url p
+
;;
+
+
(** TODO:claude Classification types for papers *)
+
type classification = Full | Short | Preprint
+
+
let string_of_classification = function
+
| Full -> "full"
+
| Short -> "short"
+
| Preprint -> "preprint"
+
+
let classification_of_string = function
+
| "full" -> Full
+
| "short" -> Short
+
| "preprint" -> Preprint
+
| _ -> Full (* default to full if unknown *)
+
+
(** TODO:claude Get classification from paper metadata, with fallback to heuristic *)
+
let classification { paper; _ } =
+
try
+
key paper "classification" |> J.get_string |> classification_of_string
+
with _ ->
+
(* Fallback to heuristic classification based on venue/bibtype/title *)
+
let bibtype = try key paper "bibtype" |> J.get_string with _ -> "" in
+
let journal = try key paper "journal" |> J.get_string |> String.lowercase_ascii with _ -> "" in
+
let booktitle = try key paper "booktitle" |> J.get_string |> String.lowercase_ascii with _ -> "" in
+
let title_str = try key paper "title" |> J.get_string |> String.lowercase_ascii with _ -> "" in
+
+
(* Helper function to check if text contains any of the patterns *)
+
let contains_any text patterns =
+
List.exists (fun pattern ->
+
let regex = Re.Perl.compile_pat ~opts:[`Caseless] pattern in
+
Re.execp regex text
+
) patterns
+
in
+
+
(* Check for preprint indicators *)
+
let bibtype_lower = String.lowercase_ascii bibtype in
+
if contains_any journal ["arxiv"] || contains_any booktitle ["arxiv"] || bibtype_lower = "misc" || bibtype_lower = "techreport"
+
then Preprint
+
(* Check for workshop/short paper indicators including in title *)
+
else if contains_any journal ["workshop"; "wip"; "poster"; "demo"; "hotdep"; "short"] ||
+
contains_any booktitle ["workshop"; "wip"; "poster"; "demo"; "hotdep"; "short"] ||
+
contains_any title_str ["poster"]
+
then Short
+
(* Default to full paper (journal or conference) *)
+
else Full
+
+
(** TODO:claude Check if paper is marked as selected *)
+
let selected { paper; _ } =
+
try
+
let keys = J.get_dict paper in
+
match List.assoc_opt "selected" keys with
+
| Some (`Bool true) -> true
+
| Some (`String "true") -> true
+
| _ -> false
+
with _ -> false
+
+
(** TODO:claude Get note field from paper metadata *)
+
let note { paper; _ } =
+
try
+
let keys = J.get_dict paper in
+
match List.assoc_opt "note" keys with
+
| Some note_json -> Some (J.get_string note_json)
+
| None -> None
+
with _ -> None
+
+
(* TODO:claude *)
+
let to_yaml ?abstract ~ver:_ json_data =
+
(* Don't add version - it's inferred from filename *)
+
let frontmatter = Yaml.to_string_exn json_data in
+
match abstract with
+
| Some abs ->
+
(* Trim leading/trailing whitespace and normalize blank lines *)
+
let trimmed_abs = String.trim abs in
+
let normalized_abs =
+
(* Replace 3+ consecutive newlines with exactly 2 newlines *)
+
Re.replace_string (Re.compile (Re.seq [Re.char '\n'; Re.char '\n'; Re.rep1 (Re.char '\n')])) ~by:"\n\n" trimmed_abs
+
in
+
if normalized_abs = "" then
+
Printf.sprintf "---\n%s---\n" frontmatter
+
else
+
Printf.sprintf "---\n%s---\n\n%s\n" frontmatter normalized_abs
+
| None -> Printf.sprintf "---\n%s---\n" frontmatter
+
+
(* TODO:claude *)
+
let typesense_schema =
+
let open Ezjsonm in
+
dict [
+
("name", string "papers");
+
("fields", list (fun d -> dict d) [
+
[("name", string "id"); ("type", string "string")];
+
[("name", string "title"); ("type", string "string")];
+
[("name", string "authors"); ("type", string "string[]")];
+
[("name", string "abstract"); ("type", string "string")];
+
[("name", string "date"); ("type", string "string")];
+
[("name", string "date_timestamp"); ("type", string "int64")];
+
[("name", string "tags"); ("type", string "string[]"); ("facet", bool true)];
+
[("name", string "doi"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "arxiv_id"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "pdf_url"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "thumbnail_url"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "journal"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "related_projects"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "related_talks"); ("type", string "string[]"); ("optional", bool true)];
+
]);
+
("default_sorting_field", string "date_timestamp");
+
]
+
+
(** TODO:claude Pretty-print a paper with ANSI formatting *)
+
let pp ppf p =
+
let open Fmt in
+
pf ppf "@[<v>";
+
pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Paper";
+
pf ppf "%a: %a@," (styled `Bold string) "Slug" string (slug p);
+
pf ppf "%a: %a@," (styled `Bold string) "Version" string p.ver;
+
pf ppf "%a: %a@," (styled `Bold string) "Title" string (title p);
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Authors" (list ~sep:comma string) (authors p);
+
pf ppf "%a: %a@," (styled `Bold string) "Year" int (year p);
+
pf ppf "%a: %a@," (styled `Bold string) "Bibtype" string (bibtype p);
+
(match doi p with
+
| Some d -> pf ppf "%a: %a@," (styled `Bold string) "DOI" string d
+
| None -> ());
+
(match url p with
+
| Some u -> pf ppf "%a: %a@," (styled `Bold string) "URL" string u
+
| None -> ());
+
(match video p with
+
| Some v -> pf ppf "%a: %a@," (styled `Bold string) "Video" string v
+
| None -> ());
+
let projs = project_slugs p in
+
if projs <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Projects" (list ~sep:comma string) projs;
+
let sl = slides p in
+
if sl <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Slides" (list ~sep:comma string) sl;
+
(match bibtype p with
+
| "article" ->
+
pf ppf "%a: %a@," (styled `Bold string) "Journal" string (journal p);
+
(match volume p with
+
| Some vol -> pf ppf "%a: %a@," (styled `Bold string) "Volume" string vol
+
| None -> ());
+
(match issue p with
+
| Some iss -> pf ppf "%a: %a@," (styled `Bold string) "Issue" string iss
+
| None -> ());
+
let pgs = pages p in
+
if pgs <> "" then
+
pf ppf "%a: %a@," (styled `Bold string) "Pages" string pgs;
+
| "inproceedings" ->
+
pf ppf "%a: %a@," (styled `Bold string) "Booktitle" string (booktitle p);
+
let pgs = pages p in
+
if pgs <> "" then
+
pf ppf "%a: %a@," (styled `Bold string) "Pages" string pgs;
+
| "techreport" ->
+
pf ppf "%a: %a@," (styled `Bold string) "Institution" string (institution p);
+
(match number p with
+
| Some num -> pf ppf "%a: %a@," (styled `Bold string) "Number" string num
+
| None -> ());
+
| _ -> ());
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Abstract";
+
pf ppf "%a@," (styled `Faint string) (abstract p);
+
pf ppf "@]"
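The `tv` function above marks the latest version of each paper by sorting version strings with `Stdlib.compare`, which is lexicographic. A self-contained sketch of that selection, showing the caveat that multi-digit versions would need zero-padding or numeric parsing to sort as expected:

```ocaml
(* Minimal sketch of the latest-version selection used by [tv]:
   sort the version strings and take the last one. *)
let latest_version vers =
  List.sort Stdlib.compare vers |> List.rev |> List.hd

let () =
  assert (latest_version [ "v1"; "v3"; "v2" ] = "v3");
  (* lexicographic caveat: "v10" < "v2" under string comparison *)
  assert (latest_version [ "v10"; "v2" ] = "v2")
```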
+55
stack/bushel/lib/paper.mli
···
+
type paper
+
+
type t =
+
{ slug : string
+
; ver : string
+
; paper : paper
+
; abstract : string
+
; latest : bool
+
}
+
+
type ts = t list
+
+
val tv : t list -> ts
+
val slug : t -> string
+
val title : t -> string
+
val authors : t -> string list
+
val project_slugs : t -> string list
+
val slides : t -> string list
+
val bibtype : t -> string
+
val journal : t -> string
+
val raw_json : t -> Ezjsonm.value
+
val doi : t -> string option
+
val volume : t -> string option
+
val video : t -> string option
+
val issue : t -> string option
+
val url : t -> string option
+
val best_url : t -> string option
+
val pages : t -> string
+
val abstract : t -> string
+
val institution : t -> string
+
val number : t -> string option
+
val editor : t -> string
+
val isbn : t -> string
+
val bib : t -> string
+
val year : t -> int
+
val publisher : t -> string
+
val booktitle : t -> string
+
val tags : t -> string list
+
val date : t -> int * int * int
+
val datetime : t -> Ptime.t
+
val compare : t -> t -> int
+
val get_papers : slug:string -> ts -> ts
+
val slugs : ts -> string list
+
val lookup : ts -> string -> t option
+
val of_md : slug:string -> ver:string -> string -> t
+
val to_yaml : ?abstract:string -> ver:string -> Ezjsonm.value -> string
+
val typesense_schema : Ezjsonm.value
+
+
type classification = Full | Short | Preprint
+
val string_of_classification : classification -> string
+
val classification_of_string : string -> classification
+
val classification : t -> classification
+
val selected : t -> bool
+
val note : t -> string option
+
val pp : Format.formatter -> t -> unit
+100
stack/bushel/lib/project.ml
···
+
type t =
+
{ slug : string
+
; title : string
+
; start : int (* year *)
+
; finish : int option
+
; tags : string list
+
; ideas : string
+
; body : string
+
}
+
+
type ts = t list
+
+
let tags p = p.tags
+
+
let compare a b =
+
match compare a.start b.start with
+
| 0 -> compare b.finish a.finish
+
| n -> n
+
;;
+
+
let title { title; _ } = title
+
let body { body; _ } = body
+
let ideas { ideas; _ } = ideas
+
+
let of_md fname =
+
match Jekyll_post.of_string ~fname (Util.read_file fname) with
+
| Error (`Msg m) -> failwith ("Project.of_md: " ^ m)
+
| Ok jp ->
+
let fields = jp.Jekyll_post.fields in
+
let { Jekyll_post.title; date; slug; body; _ } = jp in
+
let (start, _, _), _ = Ptime.to_date_time date in
+
let finish =
+
match Jekyll_format.find "finish" fields with
+
| Some (`String date) ->
+
let date = Jekyll_format.parse_date_exn date in
+
let (finish, _, _), _ = Ptime.to_date_time date in
+
Some finish
+
| _ -> None
+
in
+
let ideas =
+
match Jekyll_format.find "ideas" fields with
+
| Some (`String e) -> e
+
| _ -> failwith ("no ideas key in " ^ fname)
+
in
+
let tags =
+
match Jekyll_format.find "tags" fields with
+
| Some (`A tags) -> List.map Yaml.Util.to_string_exn tags
+
| _ -> []
+
in
+
{ slug; title; start; finish; ideas; tags; body }
+
;;
+
+
let lookup projects slug = List.find_opt (fun p -> p.slug = slug) projects
+
+
(* TODO:claude *)
+
let typesense_schema =
+
let open Ezjsonm in
+
dict [
+
("name", string "projects");
+
("fields", list (fun d -> dict d) [
+
[("name", string "id"); ("type", string "string")];
+
[("name", string "title"); ("type", string "string")];
+
[("name", string "description"); ("type", string "string")];
+
[("name", string "start_year"); ("type", string "int32")];
+
[("name", string "finish_year"); ("type", string "int32"); ("optional", bool true)];
+
[("name", string "date"); ("type", string "string")];
+
[("name", string "date_timestamp"); ("type", string "int64")];
+
[("name", string "tags"); ("type", string "string[]"); ("facet", bool true)];
+
[("name", string "repository_url"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "homepage_url"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "languages"); ("type", string "string[]"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "license"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "status"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
+
[("name", string "related_papers"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "related_talks"); ("type", string "string[]"); ("optional", bool true)];
+
[("name", string "body"); ("type", string "string"); ("optional", bool true)];
+
[("name", string "ideas"); ("type", string "string"); ("optional", bool true)];
+
]);
+
("default_sorting_field", string "date_timestamp");
+
]
+
+
(** TODO:claude Pretty-print a project with ANSI formatting *)
+
let pp ppf p =
+
let open Fmt in
+
pf ppf "@[<v>";
+
pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Project";
+
pf ppf "%a: %a@," (styled `Bold string) "Slug" string p.slug;
+
pf ppf "%a: %a@," (styled `Bold string) "Title" string (title p);
+
pf ppf "%a: %d@," (styled `Bold string) "Start" p.start;
+
(match p.finish with
+
| Some year -> pf ppf "%a: %d@," (styled `Bold string) "Finish" year
+
| None -> pf ppf "%a: ongoing@," (styled `Bold string) "Finish");
+
let t = tags p in
+
if t <> [] then
+
pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Tags" (list ~sep:comma string) t;
+
pf ppf "%a: %a@," (styled `Bold string) "Ideas" string (ideas p);
+
pf ppf "@,";
+
pf ppf "%a:@," (styled `Bold string) "Body";
+
pf ppf "%a@," string (body p);
+
pf ppf "@]"
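`Project.compare` above implements a two-level ordering: ascending by start year, with ties broken by finish year descending (note the swapped operands). Since `Stdlib.compare` orders `None` below `Some _`, ongoing projects (`finish = None`) sort last within a start year. A stripped-down sketch over bare `(start, finish)` pairs:

```ocaml
(* Same ordering as Project.compare, on (start, finish option) pairs:
   start ascending, then finish descending via swapped operands. *)
let cmp (s1, f1) (s2, f2) =
  match compare s1 s2 with
  | 0 -> compare f2 f1
  | n -> n

let () =
  let l = [ (2020, None); (2018, Some 2020); (2020, Some 2024) ] in
  assert (List.sort cmp l
          = [ (2018, Some 2020); (2020, Some 2024); (2020, None) ])
```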
+21
stack/bushel/lib/project.mli
···
+
type t =
+
{ slug : string
+
; title : string
+
; start : int
+
; finish : int option
+
; tags : string list
+
; ideas : string
+
; body : string
+
}
+
+
type ts = t list
+
+
val title : t -> string
+
val body : t -> string
+
val ideas : t -> string
+
val lookup : t list -> string -> t option
+
val tags : t -> string list
+
val compare : t -> t -> int
+
val of_md : string -> t
+
val typesense_schema : Ezjsonm.value
+
val pp : Format.formatter -> t -> unit
+44
stack/bushel/lib/srcsetter.ml
···
+
module MS = Map.Make (String)
+
+
type t =
+
{ name : string
+
; slug : string
+
; origin : string
+
; dims : int * int
+
; variants : (int * int) MS.t
+
}
+
+
type ts = t list
+
+
let v name slug origin variants dims = { name; slug; origin; variants; dims }
+
let slug { slug; _ } = slug
+
let origin { origin; _ } = origin
+
let name { name; _ } = name
+
let dims { dims; _ } = dims
+
let variants { variants; _ } = variants
+
+
let dims_json_t =
+
let open Jsont in
+
let dec x y = x, y in
+
let enc (w, h) = function
+
| 0 -> w
+
| _ -> h
+
in
+
t2 ~dec ~enc uint16
+
;;
+
+
let json_t =
+
let open Jsont in
+
let open Jsont.Object in
+
map ~kind:"Entry" v
+
|> mem "name" string ~enc:name
+
|> mem "slug" string ~enc:slug
+
|> mem "origin" string ~enc:origin
+
|> mem "variants" (as_string_map dims_json_t) ~enc:variants
+
|> mem "dims" dims_json_t ~enc:dims
+
|> finish
+
;;
+
+
let list = Jsont.list json_t
+
let list_to_json es = Jsont_bytesrw.encode_string list ~format:Jsont.Indent es
+
let list_of_json = Jsont_bytesrw.decode_string list
+21
stack/bushel/lib/srcsetter.mli
···
+
module MS : Map.S with type key = string
+
+
type t =
+
{ name : string
+
; slug : string
+
; origin : string
+
; dims : int * int
+
; variants : (int * int) MS.t
+
}
+
+
type ts = t list
+
+
val origin : t -> string
+
val slug : t -> string
+
val name : t -> string
+
val dims : t -> int * int
+
val variants : t -> (int * int) MS.t
+
val list_to_json : t list -> (string, string) result
+
val list_of_json : string -> (t list, string) result
+
val json_t : t Jsont.t
+
val list : t list Jsont.t
+114
stack/bushel/lib/tags.ml
···
+
open Entry
+
+
type t =
+
[ `Slug of string (* :foo points to the specific slug foo *)
+
| `Contact of string (* @foo points to contact foo *)
+
| `Set of string (* #papers points to all Paper entries *)
+
| `Text of string (* foo points to a free text "foo" *)
+
| `Year of int (* a number strictly between 1900 and 2100 is interpreted as a year *)
+
]
+
+
let is_text = function
+
| `Text _ -> true
+
| _ -> false
+
;;
+
+
let is_slug = function
+
| `Slug _ -> true
+
| _ -> false
+
;;
+
+
let is_set = function
+
| `Set _ -> true
+
| _ -> false
+
;;
+
+
let is_year = function
+
| `Year _ -> true
+
| _ -> false
+
;;
+
+
let of_string s : t =
+
if String.length s < 2 then invalid_arg ("Tags.of_string: " ^ s);
+
match s.[0] with
+
| ':' ->
+
let slug = String.sub s 1 (String.length s - 1) in
+
`Slug slug
+
| '@' -> failwith "TODO add contacts to entries"
+
| '#' ->
+
let cl = String.sub s 1 (String.length s - 1) in
+
`Set cl
+
| _ ->
+
(try
+
let x = int_of_string s in
+
if x > 1900 && x < 2100 then `Year x else `Text s
+
with
+
| _ -> `Text s)
+
;;
+
+
let of_string_list l = List.map of_string l
+
+
let to_string = function
+
| `Slug t -> ":" ^ t
+
| `Contact c -> "@" ^ c
+
| `Set s -> "#" ^ s
+
| `Text t -> t
+
| `Year y -> string_of_int y
+
;;
+
+
let to_raw_string = function
+
| `Slug t -> t
+
| `Contact c -> c
+
| `Set s -> s
+
| `Text t -> t
+
| `Year y -> string_of_int y
+
;;
+
+
let pp ppf t = Fmt.string ppf (to_string t)
+
+
let tags_of_ent _entries ent : t list =
+
match ent with
+
| `Paper p -> of_string_list @@ Paper.tags p
+
| `Video v -> of_string_list v.Video.tags
+
| `Project p -> of_string_list @@ Project.tags p
+
| `Note n -> of_string_list @@ Note.tags n
+
| `Idea i -> of_string_list i.Idea.tags
+
;;
+
+
let mentions tags =
+
List.filter
+
(function
+
| `Contact _ | `Slug _ -> true
+
| _ -> false)
+
tags
+
;;
+
+
let mention_entries entries tags =
+
let lk t =
+
try Some (lookup_exn entries t)
+
with Not_found -> Printf.eprintf "mention_entries not found: %s\n%!" t; None
+
in
+
List.filter_map
+
(function
+
| `Slug t -> lk t
+
| _ -> None)
+
tags
+
;;
+
+
let count_tags ?h fn vs =
+
let h =
+
match h with
+
| Some h -> h
+
| None -> Hashtbl.create 42
+
in
+
List.iter
+
(fun ent ->
+
List.iter
+
(fun tag ->
+
match Hashtbl.find_opt h tag with
+
| Some num -> Hashtbl.replace h tag (num + 1)
+
| None -> Hashtbl.add h tag 1)
+
(fn ent))
+
vs;
+
h
+
;;
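The `of_string` classification above can be exercised standalone. A sketch that mirrors its dispatch (omitting the `'@'` contact case, which currently raises a TODO failure), and shows that the year bounds are exclusive — `"1900"` itself falls through to `` `Text ``:

```ocaml
(* Mirror of Tags.of_string dispatch: ':' -> slug, '#' -> set,
   a number strictly between 1900 and 2100 -> year, else free text. *)
let classify s =
  if String.length s < 2 then invalid_arg s;
  match s.[0] with
  | ':' -> `Slug (String.sub s 1 (String.length s - 1))
  | '#' -> `Set (String.sub s 1 (String.length s - 1))
  | _ ->
    (match int_of_string_opt s with
     | Some x when x > 1900 && x < 2100 -> `Year x
     | _ -> `Text s)

let () =
  assert (classify ":foo" = `Slug "foo");
  assert (classify "#papers" = `Set "papers");
  assert (classify "2024" = `Year 2024);
  assert (classify "1900" = `Text "1900");
  assert (classify "ocaml" = `Text "ocaml")
```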
+25
stack/bushel/lib/tags.mli
···
+
type t =
+
[ `Contact of string
+
| `Set of string
+
| `Slug of string
+
| `Text of string
+
| `Year of int
+
]
+
+
val is_text : t -> bool
+
val is_set : t -> bool
+
val is_slug : t -> bool
+
val is_year : t -> bool
+
val of_string : string -> t
+
val to_string : t -> string
+
val to_raw_string : t -> string
+
val pp : Format.formatter -> t -> unit
+
val mention_entries : Entry.t -> t list -> Entry.entry list
+
val tags_of_ent : Entry.t -> Entry.entry -> t list
+
val mentions : t list -> t list
+
+
val count_tags
+
: ?h:('a, int) Hashtbl.t
+
-> ('b -> 'a list)
+
-> 'b list
+
-> ('a, int) Hashtbl.t
+518
stack/bushel/lib/typesense.ml
···
+
open Lwt.Syntax
+
open Cohttp_lwt_unix
+
+
(** TODO:claude Typesense API client for Bushel *)
+
+
type config = {
+
endpoint : string;
+
api_key : string;
+
openai_key : string;
+
}
+
+
type error =
+
| Http_error of int * string
+
| Json_error of string
+
| Connection_error of string
+
+
let pp_error fmt = function
+
| Http_error (code, msg) -> Fmt.pf fmt "HTTP %d: %s" code msg
+
| Json_error msg -> Fmt.pf fmt "JSON error: %s" msg
+
| Connection_error msg -> Fmt.pf fmt "Connection error: %s" msg
+
+
(** TODO:claude Create authentication headers for Typesense API *)
+
let auth_headers api_key =
+
Cohttp.Header.of_list [
+
("X-TYPESENSE-API-KEY", api_key);
+
("Content-Type", "application/json");
+
]
+
+
(** TODO:claude Make HTTP request to Typesense API *)
+
let make_request ?(meth=`GET) ?(body="") config path =
+
let uri = Uri.of_string (config.endpoint ^ path) in
+
let headers = auth_headers config.api_key in
+
let body = if body = "" then `Empty else `String body in
+
Lwt.catch (fun () ->
+
let* resp, body = Client.call ~headers ~body meth uri in
+
let status = Cohttp.Code.code_of_status (Response.status resp) in
+
let* body_str = Cohttp_lwt.Body.to_string body in
+
if status >= 200 && status < 300 then
+
Lwt.return_ok body_str
+
else
+
Lwt.return_error (Http_error (status, body_str))
+
) (fun exn ->
+
Lwt.return_error (Connection_error (Printexc.to_string exn))
+
)
+
+
(** TODO:claude Create a collection with given schema *)
+
let create_collection config (schema : Ezjsonm.value) =
+
let body = Ezjsonm.value_to_string schema in
+
make_request ~meth:`POST ~body config "/collections"
+
+
(** TODO:claude Check if collection exists *)
+
let collection_exists config name =
+
let* result = make_request config ("/collections/" ^ name) in
+
match result with
+
| Ok _ -> Lwt.return true
+
| Error (Http_error (404, _)) -> Lwt.return false
+
| Error _ -> Lwt.return false
+
+
(** TODO:claude Delete a collection *)
+
let delete_collection config name =
+
make_request ~meth:`DELETE config ("/collections/" ^ name)
+
+
(** TODO:claude Upload documents to a collection in batch *)
+
let upload_documents config collection_name (documents : Ezjsonm.value list) =
+
let jsonl_lines = List.map (fun doc -> Ezjsonm.value_to_string doc) documents in
+
let body = String.concat "\n" jsonl_lines in
+
make_request ~meth:`POST ~body config
+
(Printf.sprintf "/collections/%s/documents/import?action=upsert" collection_name)
+
+
+
(** TODO:claude Convert Bushel objects to Typesense documents *)
+
+
(** TODO:claude Helper function to truncate long strings for embedding *)
+
let truncate_for_embedding ?(max_chars=20000) text =
+
if String.length text <= max_chars then text
+
else String.sub text 0 max_chars
+
+
(** TODO:claude Helper function to convert Ptime to Unix timestamp *)
+
let ptime_to_timestamp ptime =
+
let span = Ptime.to_span ptime in
+
let seconds = Ptime.Span.to_int_s span in
+
match seconds with
+
| Some s -> Int64.of_int s
+
| None -> 0L
+
+
(** TODO:claude Helper function to convert date tuple to Unix timestamp *)
+
let date_to_timestamp (year, month, day) =
+
match Ptime.of_date (year, month, day) with
+
| Some ptime -> ptime_to_timestamp ptime
+
| None -> 0L
+
+
(** Resolve author handles to full names in a list *)
+
let resolve_author_list contacts authors =
+
List.map (fun author ->
+
(* Strip '@' prefix if present *)
+
let handle =
+
if String.length author > 0 && author.[0] = '@' then
+
String.sub author 1 (String.length author - 1)
+
else
+
author
+
in
+
(* Try to look up as a contact handle *)
+
match Contact.find_by_handle contacts handle with
+
| Some contact -> Contact.name contact
+
| None -> author (* Keep original if not found *)
+
) authors
+
+
let contact_to_document (contact : Contact.t) =
+
let open Ezjsonm in
+
let safe_string_list_from_opt = function
+
| Some s -> [s]
+
| None -> []
+
in
+
dict [
+
("id", string (Contact.handle contact));
+
("handle", string (Contact.handle contact));
+
("name", string (Contact.name contact));
+
("names", list string (Contact.names contact));
+
("email", list string (safe_string_list_from_opt (Contact.email contact)));
+
("icon", list string (safe_string_list_from_opt (Contact.icon contact)));
+
("github", list string (safe_string_list_from_opt (Contact.github contact)));
+
("twitter", list string (safe_string_list_from_opt (Contact.twitter contact)));
+
("url", list string (safe_string_list_from_opt (Contact.url contact)));
+
]
+
+
let paper_to_document entries (paper : Paper.t) =
+
let date_tuple = Paper.date paper in
+
let contacts = Entry.contacts entries in
+
+
(* Helper to extract string arrays from JSON, handling both single strings and arrays *)
+
let extract_string_array_from_json json_field_name =
+
try
+
(* Access the raw JSON from the paper record *)
+
let paper_json = Paper.raw_json paper in
+
let value = Ezjsonm.get_dict paper_json |> List.assoc json_field_name in
+
match value with
+
| `String s -> [s]
+
| `A l -> List.filter_map (function `String s -> Some s | _ -> None) l
+
| _ -> []
+
with _ -> []
+
in
+
+
(* Resolve author handles to full names *)
+
let authors = resolve_author_list contacts (Paper.authors paper) in
+
+
(* Convert abstract markdown to plain text *)
+
let abstract = Md.markdown_to_plaintext entries (Paper.abstract paper) |> truncate_for_embedding in
+
+
(* Extract publication metadata *)
+
let bibtype = Paper.bibtype paper in
+
let metadata =
+
try
+
match bibtype with
+
| "article" -> Printf.sprintf "Journal: %s" (Paper.journal paper)
+
| "inproceedings" -> Printf.sprintf "Proceedings: %s" (Paper.journal paper)
+
| "misc" | "techreport" -> Printf.sprintf "Preprint: %s" (Paper.journal paper)
+
| _ -> Printf.sprintf "%s: %s" (String.capitalize_ascii bibtype) (Paper.journal paper)
+
with _ -> bibtype
+
in
+
+
(* Get bibtex from raw JSON *)
+
let bibtex =
+
try
+
let paper_json = Paper.raw_json paper in
+
Ezjsonm.get_dict paper_json
+
|> List.assoc "bibtex"
+
|> Ezjsonm.get_string
+
with _ -> ""
+
in
+
+
let thumbnail_url = Entry.thumbnail entries (`Paper paper) in
+
Ezjsonm.dict [
+
("id", Ezjsonm.string (Paper.slug paper));
+
("title", Ezjsonm.string (Paper.title paper));
+
("authors", Ezjsonm.list Ezjsonm.string authors);
+
("abstract", Ezjsonm.string abstract);
+
("metadata", Ezjsonm.string metadata);
+
("bibtex", Ezjsonm.string bibtex);
+
("date", Ezjsonm.string (let y, m, d = date_tuple in Printf.sprintf "%04d-%02d-%02d" y m d));
+
("date_timestamp", Ezjsonm.int64 (date_to_timestamp date_tuple));
+
("tags", Ezjsonm.list Ezjsonm.string (Paper.tags paper));
+
("doi", Ezjsonm.list Ezjsonm.string (extract_string_array_from_json "doi"));
+
("pdf_url", Ezjsonm.list Ezjsonm.string (extract_string_array_from_json "pdf_url"));
+
("journal", Ezjsonm.list Ezjsonm.string (extract_string_array_from_json "journal"));
+
("related_projects", Ezjsonm.list Ezjsonm.string (Paper.project_slugs paper));
+
("thumbnail_url", Ezjsonm.string (Option.value ~default:"" thumbnail_url));
+
]
+
+
let project_to_document entries (project : Project.t) =
+
let open Ezjsonm in
+
(* Use January 1st of start year as the date for sorting *)
+
let date_timestamp = date_to_timestamp (project.start, 1, 1) in
+
+
(* Convert body markdown to plain text *)
+
let description = Md.markdown_to_plaintext entries (Project.body project) |> truncate_for_embedding in
+
+
let thumbnail_url = Entry.thumbnail entries (`Project project) in
+
dict [
+
("id", string project.slug);
+
("title", string (Project.title project));
+
("description", string description);
+
("start", int project.start);
+
("finish", option int project.finish);
+
("start_year", int project.start);
+
("date", string (Printf.sprintf "%04d-01-01" project.start));
+
("date_timestamp", int64 date_timestamp);
+
("tags", list string (Project.tags project));
+
("thumbnail_url", string (Option.value ~default:"" thumbnail_url));
+
]
+
+
let video_to_document entries (video : Video.t) =
+
let open Ezjsonm in
+
let datetime = Video.datetime video in
+
let safe_string_list_from_opt = function
+
| Some s -> [s]
+
| None -> []
+
in
+
+
(* Convert body markdown to plain text *)
+
let description = Md.markdown_to_plaintext entries (Video.body video) |> truncate_for_embedding in
+
+
(* Resolve paper and project slugs to titles *)
+
let paper_title = match Video.paper video with
+
| Some slug ->
+
(match Entry.lookup entries slug with
+
| Some entry -> Some (Entry.title entry)
+
| None -> Some slug) (* Fallback to slug if not found *)
+
| None -> None
+
in
+
let project_title = match Video.project video with
+
| Some slug ->
+
(match Entry.lookup entries slug with
+
| Some entry -> Some (Entry.title entry)
+
| None -> Some slug) (* Fallback to slug if not found *)
+
| None -> None
+
in
+
+
let thumbnail_url = Entry.thumbnail entries (`Video video) in
+
dict [
+
("id", string (Video.slug video));
+
("title", string (Video.title video));
+
("description", string description);
+
("published_date", string (Ptime.to_rfc3339 datetime));
+
("date", string (Ptime.to_rfc3339 datetime));
+
("date_timestamp", int64 (ptime_to_timestamp datetime));
+
("url", string (Video.url video));
+
("uuid", string (Video.uuid video));
+
("is_talk", bool (Video.talk video));
+
("paper", list string (safe_string_list_from_opt paper_title));
+
("project", list string (safe_string_list_from_opt project_title));
+
("tags", list string video.tags);
+
("thumbnail_url", string (Option.value ~default:"" thumbnail_url));
+
]
+
+
let note_to_document entries (note : Note.t) =
+
let open Ezjsonm in
+
let datetime = Note.datetime note in
+
let safe_string_list_from_opt = function
+
| Some s -> [s]
+
| None -> []
+
in
+
+
(* Convert body markdown to plain text *)
+
let content = Md.markdown_to_plaintext entries (Note.body note) |> truncate_for_embedding in
+
+
let thumbnail_url = Entry.thumbnail entries (`Note note) in
+
let word_count = Note.words note in
+
dict [
+
("id", string (Note.slug note));
+
("title", string (Note.title note));
+
("date", string (Ptime.to_rfc3339 datetime));
+
("date_timestamp", int64 (ptime_to_timestamp datetime));
+
("content", string content);
+
("tags", list string (Note.tags note));
+
("draft", bool (Note.draft note));
+
("synopsis", list string (safe_string_list_from_opt (Note.synopsis note)));
+
("thumbnail_url", string (Option.value ~default:"" thumbnail_url));
+
("words", int word_count);
+
]
+
+
let idea_to_document entries (idea : Idea.t) =
+
let open Ezjsonm in
+
let contacts = Entry.contacts entries in
+
(* Use January 1st of the year as the date for sorting *)
+
let date_timestamp = date_to_timestamp (Idea.year idea, 1, 1) in
+
+
(* Convert body markdown to plain text *)
+
let description = Md.markdown_to_plaintext entries (Idea.body idea) |> truncate_for_embedding in
+
+
(* Resolve supervisor and student handles to full names *)
+
let supervisors = resolve_author_list contacts (Idea.supervisors idea) in
+
let students = resolve_author_list contacts (Idea.students idea) in
+
+
(* Resolve project slug to project title *)
+
let project_title =
+
match Entry.lookup entries (Idea.project idea) with
+
| Some entry -> Entry.title entry
+
| None -> Idea.project idea (* Fallback to slug if not found *)
+
in
+
+
let thumbnail_url = Entry.thumbnail entries (`Idea idea) in
+
dict [
+
("id", string idea.slug);
+
("title", string (Idea.title idea));
+
("description", string description);
+
("level", string (Idea.level_to_string (Idea.level idea)));
+
("project", string project_title);
+
("status", string (Idea.status_to_string (Idea.status idea)));
+
("year", int (Idea.year idea));
+
("date", string (Printf.sprintf "%04d-01-01" (Idea.year idea)));
+
("date_timestamp", int64 date_timestamp);
+
("supervisors", list string supervisors);
+
("students", list string students);
+
("tags", list string idea.tags);
+
("thumbnail_url", string (Option.value ~default:"" thumbnail_url));
+
]
+
+
(** TODO:claude Helper function to add embedding field to schema *)
+
let add_embedding_field_to_schema schema config embedding_from_fields =
+
let open Ezjsonm in
+
let fields = get_dict schema |> List.assoc "fields" |> get_list (fun f -> f) in
+
let embedding_field = dict [
+
("name", string "embedding");
+
("type", string "float[]");
+
("embed", dict [
+
("from", list string embedding_from_fields);
+
("model_config", dict [
+
("model_name", string "openai/text-embedding-3-small");
+
("api_key", string config.openai_key);
+
]);
+
]);
+
] in
+
let updated_fields = fields @ [embedding_field] in
+
let updated_schema =
+
List.map (fun (k, v) ->
+
if k = "fields" then (k, list (fun f -> f) updated_fields)
+
else (k, v)
+
) (get_dict schema)
+
in
+
dict updated_schema
+
+
(** TODO:claude Upload all bushel objects to their respective collections *)
+
let upload_all config entries =
+
let* () = Lwt_io.write Lwt_io.stdout "Uploading bushel data to Typesense\n" in
+
+
let contacts = Entry.contacts entries in
+
let papers = Entry.papers entries in
+
let projects = Entry.projects entries in
+
let notes = Entry.notes entries in
+
let videos = Entry.videos entries in
+
let ideas = Entry.ideas entries in
+
+
let collections = [
+
("contacts", add_embedding_field_to_schema Contact.typesense_schema config ["name"; "names"], (List.map contact_to_document contacts : Ezjsonm.value list));
+
("papers", add_embedding_field_to_schema Paper.typesense_schema config ["title"; "abstract"; "authors"], (List.map (paper_to_document entries) papers : Ezjsonm.value list));
+
("videos", add_embedding_field_to_schema Video.typesense_schema config ["title"; "description"], (List.map (video_to_document entries) videos : Ezjsonm.value list));
+
("projects", add_embedding_field_to_schema Project.typesense_schema config ["title"; "description"; "tags"], (List.map (project_to_document entries) projects : Ezjsonm.value list));
+
("notes", add_embedding_field_to_schema Note.typesense_schema config ["title"; "content"; "tags"], (List.map (note_to_document entries) notes : Ezjsonm.value list));
+
("ideas", add_embedding_field_to_schema Idea.typesense_schema config ["title"; "description"; "tags"], (List.map (idea_to_document entries) ideas : Ezjsonm.value list));
+
] in
+
+
let upload_collection ((name, schema, documents) : string * Ezjsonm.value * Ezjsonm.value list) =
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Processing collection: %s\n" name) in
+
let* exists = collection_exists config name in
+
let* () =
+
if exists then (
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Collection %s exists, deleting...\n" name) in
+
let* result = delete_collection config name in
+
match result with
+
| Ok _ -> Lwt_io.write Lwt_io.stdout (Fmt.str "Deleted collection %s\n" name)
+
| Error err ->
+
let err_str = Fmt.str "%a" pp_error err in
+
Lwt_io.write Lwt_io.stdout (Fmt.str "Failed to delete collection %s: %s\n" name err_str)
+
) else
+
Lwt.return_unit
+
in
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Creating collection %s with %d documents\n" name (List.length documents)) in
+
let* result = create_collection config schema in
+
match result with
+
| Ok _ ->
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Created collection %s\n" name) in
+
if documents = [] then
+
Lwt_io.write Lwt_io.stdout (Fmt.str "No documents to upload for %s\n" name)
+
else (
+
let* result = upload_documents config name documents in
+
match result with
+
| Ok response ->
+
(* Count successes and failures *)
+
let lines = String.split_on_char '\n' response in
+
let successes = List.fold_left (fun acc line ->
+
if String.contains line ':' && Str.string_match (Str.regexp ".*success.*true.*") line 0 then acc + 1 else acc) 0 lines in
+
let failures = List.fold_left (fun acc line ->
+
if String.contains line ':' && Str.string_match (Str.regexp ".*success.*false.*") line 0 then acc + 1 else acc) 0 lines in
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Upload results for %s: %d successful, %d failed out of %d total\n"
+
name successes failures (List.length documents)) in
+
if failures > 0 then
+
let* () = Lwt_io.write Lwt_io.stdout (Fmt.str "Failed documents in %s:\n" name) in
+
let failed_lines = List.filter (fun line -> Str.string_match (Str.regexp ".*success.*false.*") line 0) lines in
+
Lwt_list.iter_s (fun line -> Lwt_io.write Lwt_io.stdout (line ^ "\n")) failed_lines
+
else
+
Lwt.return_unit
+
| Error err ->
+
let err_str = Fmt.str "%a" pp_error err in
+
Lwt_io.write Lwt_io.stdout (Fmt.str "Failed to upload documents to %s: %s\n" name err_str)
+
)
+
| Error err ->
+
let err_str = Fmt.str "%a" pp_error err in
+
Lwt_io.write Lwt_io.stdout (Fmt.str "Failed to create collection %s: %s\n" name err_str)
+
in
+
+
Lwt_list.iter_s upload_collection collections
+
+
(** TODO:claude Re-export search types from Typesense_client *)
+
type search_result = Typesense_client.search_result = {
+
id: string;
+
title: string;
+
content: string;
+
score: float;
+
collection: string;
+
highlights: (string * string list) list;
+
document: Ezjsonm.value;
+
}
+
+
type search_response = Typesense_client.search_response = {
+
hits: search_result list;
+
total: int;
+
query_time: float;
+
}
+
+
(** TODO:claude Convert bushel config to client config *)
+
let to_client_config (config : config) =
+
Typesense_client.{ endpoint = config.endpoint; api_key = config.api_key }
+
+
(** TODO:claude Search a single collection *)
+
let search_collection (config : config) collection_name query ?(limit=10) ?(offset=0) () =
+
let client_config = to_client_config config in
+
let* result = Typesense_client.search_collection client_config collection_name query ~limit ~offset () in
+
match result with
+
| Ok response -> Lwt.return_ok response
+
| Error (Typesense_client.Http_error (code, msg)) -> Lwt.return_error (Http_error (code, msg))
+
| Error (Typesense_client.Json_error msg) -> Lwt.return_error (Json_error msg)
+
| Error (Typesense_client.Connection_error msg) -> Lwt.return_error (Connection_error msg)
+
+
(** TODO:claude Search across all collections - use client multisearch *)
+
let search_all (config : config) query ?(limit=10) ?(offset=0) () =
+
let client_config = to_client_config config in
+
let* result = Typesense_client.multisearch client_config query ~limit:50 () in
+
match result with
+
| Ok multisearch_resp ->
+
let combined_response = Typesense_client.combine_multisearch_results multisearch_resp ~limit ~offset () in
+
Lwt.return_ok combined_response
+
| Error (Typesense_client.Http_error (code, msg)) -> Lwt.return_error (Http_error (code, msg))
+
| Error (Typesense_client.Json_error msg) -> Lwt.return_error (Json_error msg)
+
| Error (Typesense_client.Connection_error msg) -> Lwt.return_error (Connection_error msg)
+
+
(** TODO:claude List all collections *)
+
let list_collections (config : config) =
+
let client_config = to_client_config config in
+
let* result = Typesense_client.list_collections client_config in
+
match result with
+
| Ok collections -> Lwt.return_ok collections
+
| Error (Typesense_client.Http_error (code, msg)) -> Lwt.return_error (Http_error (code, msg))
+
| Error (Typesense_client.Json_error msg) -> Lwt.return_error (Json_error msg)
+
| Error (Typesense_client.Connection_error msg) -> Lwt.return_error (Connection_error msg)
+
+
(** TODO:claude Re-export multisearch types from Typesense_client *)
+
type multisearch_response = Typesense_client.multisearch_response = {
+
results: search_response list;
+
}
+
+
(** TODO:claude Perform multisearch across all collections *)
+
let multisearch (config : config) query ?(limit=10) () =
+
let client_config = to_client_config config in
+
let* result = Typesense_client.multisearch client_config query ~limit () in
+
match result with
+
| Ok multisearch_resp -> Lwt.return_ok multisearch_resp
+
| Error (Typesense_client.Http_error (code, msg)) -> Lwt.return_error (Http_error (code, msg))
+
| Error (Typesense_client.Json_error msg) -> Lwt.return_error (Json_error msg)
+
| Error (Typesense_client.Connection_error msg) -> Lwt.return_error (Connection_error msg)
+
+
(** TODO:claude Combine multisearch results into single result set *)
+
let combine_multisearch_results (multisearch_resp : multisearch_response) ?(limit=10) ?(offset=0) () =
+
Typesense_client.combine_multisearch_results multisearch_resp ~limit ~offset ()
+
+
(** TODO:claude Load configuration from files *)
+
let load_config_from_files () =
+
let read_file_if_exists filename =
+
if Sys.file_exists filename then
+
let ic = open_in filename in
+
let content = really_input_string ic (in_channel_length ic) in
+
close_in ic;
+
Some (String.trim content)
+
else None
+
in
+
+
let endpoint = match read_file_if_exists ".typesense-url" with
+
| Some url -> url
+
| None -> "http://localhost:8108"
+
in
+
+
let api_key = match read_file_if_exists ".typesense-key" with
+
| Some key -> key
+
| None ->
+
try Sys.getenv "TYPESENSE_API_KEY"
+
with Not_found -> ""
+
in
+
+
let openai_key = match read_file_if_exists ".openrouter-api" with
+
| Some key -> key
+
| None ->
+
try Sys.getenv "OPENAI_API_KEY"
+
with Not_found -> ""
+
in
+
+
{ endpoint; api_key; openai_key }
+
+
(** TODO:claude Re-export pretty printer from Typesense_client *)
+
let pp_search_result_oneline = Typesense_client.pp_search_result_oneline
+123
stack/bushel/lib/typesense.mli
···
+
(** Typesense API client for Bushel
+
+
This module provides an OCaml client for the Typesense search engine API.
+
It handles collection management and document indexing for all Bushel object
+
types including contacts, papers, projects, videos, notes, and ideas.
+
+
Example usage:
+
{[
+
let config = { endpoint = "https://search.example.com"; api_key = "xyz123"; openai_key = "sk-..." } in
+
Lwt_main.run (Typesense.upload_all config entries)
+
]}
+
+
TODO:claude *)
+
+
(** Configuration for connecting to a Typesense server *)
+
type config = {
+
endpoint : string; (** Typesense server URL (e.g., "https://search.example.com") *)
+
api_key : string; (** API key for authentication *)
+
openai_key : string; (** OpenAI API key for embeddings *)
+
}
+
+
(** Possible errors that can occur during Typesense operations *)
+
type error =
+
| Http_error of int * string (** HTTP error with status code and message *)
+
| Json_error of string (** JSON parsing or encoding error *)
+
| Connection_error of string (** Network connection error *)
+
+
(** Pretty-printer for error types *)
+
val pp_error : Format.formatter -> error -> unit
+
+
(** Create a collection with the given schema.
+
The schema should follow Typesense's collection schema format.
+
TODO:claude *)
+
val create_collection : config -> Ezjsonm.value -> (string, error) result Lwt.t
+
+
(** Check if a collection exists by name.
+
Returns true if the collection exists, false otherwise.
+
TODO:claude *)
+
val collection_exists : config -> string -> bool Lwt.t
+
+
(** Delete a collection by name.
+
TODO:claude *)
+
val delete_collection : config -> string -> (string, error) result Lwt.t
+
+
(** Upload documents to a collection in batch using JSONL format.
+
More efficient than uploading documents one by one.
+
TODO:claude *)
+
val upload_documents : config -> string -> Ezjsonm.value list -> (string, error) result Lwt.t
+
+
(** Upload all bushel objects to Typesense.
+
This function will:
+
- Extract all bushel data types from the Entry.t
+
- Create or recreate collections for each type
+
- Upload all documents in batches
+
- Report progress to stdout
+
TODO:claude *)
+
val upload_all : config -> Entry.t -> unit Lwt.t
+
+
(** Search result structure containing document information and relevance score *)
+
type search_result = {
+
id: string; (** Document ID *)
+
title: string; (** Document title *)
+
content: string; (** Document content/description *)
+
score: float; (** Relevance score *)
+
collection: string; (** Collection name *)
+
highlights: (string * string list) list; (** Highlighted search terms by field *)
+
document: Ezjsonm.value; (** Raw document for flexible field access *)
+
}
+
+
(** Search response containing results and metadata *)
+
type search_response = {
+
hits: search_result list; (** List of matching documents *)
+
total: int; (** Total number of matches *)
+
query_time: float; (** Query execution time in milliseconds *)
+
}
+
+
(** Search a specific collection.
+
TODO:claude *)
+
val search_collection : config -> string -> string -> ?limit:int -> ?offset:int -> unit -> (search_response, error) result Lwt.t
+
+
(** Search across all bushel collections.
+
Results are sorted by relevance score and paginated.
+
TODO:claude *)
+
val search_all : config -> string -> ?limit:int -> ?offset:int -> unit -> (search_response, error) result Lwt.t
+
+
(** Multisearch response containing results from multiple collections *)
+
type multisearch_response = {
+
results: search_response list; (** Results from each collection *)
+
}
+
+
(** Perform multisearch across all collections using Typesense's multi_search endpoint.
+
More efficient than individual searches as it's done in a single request.
+
TODO:claude *)
+
val multisearch : config -> string -> ?limit:int -> unit -> (multisearch_response, error) result Lwt.t
+
+
(** Combine multisearch results into a single result set.
+
Results are sorted by relevance score and paginated.
+
TODO:claude *)
+
val combine_multisearch_results : multisearch_response -> ?limit:int -> ?offset:int -> unit -> search_response
+
+
(** List all collections with document counts.
+
Returns a list of (collection_name, document_count) pairs.
+
TODO:claude *)
+
val list_collections : config -> ((string * int) list, error) result Lwt.t
+
+
(** Load configuration from .typesense-url, .typesense-key, and .openrouter-api files.
+
Falls back to environment variables and defaults.
+
TODO:claude *)
+
val load_config_from_files : unit -> config
+
+
(** Pretty-print a search result in a one-line format with relevant information.
+
Shows different fields based on the collection type (papers, videos, etc.).
+
TODO:claude *)
+
val pp_search_result_oneline : search_result -> string
+
+
(** Convert Bushel objects to Typesense documents *)
+
+
val contact_to_document : Contact.t -> Ezjsonm.value
+
val paper_to_document : Entry.t -> Paper.t -> Ezjsonm.value
+
val project_to_document : Entry.t -> Project.t -> Ezjsonm.value
+
val video_to_document : Entry.t -> Video.t -> Ezjsonm.value
+
val note_to_document : Entry.t -> Note.t -> Ezjsonm.value
+
val idea_to_document : Entry.t -> Idea.t -> Ezjsonm.value
+80
stack/bushel/lib/util.ml
···
+
let first_hunk s =
+
let lines = String.split_on_char '\n' s in
+
let rec aux acc = function
+
| [] -> String.concat "\n" (List.rev acc)
+
| "" :: "" :: _ -> String.concat "\n" (List.rev acc)
+
| line :: rest -> aux (line :: acc) rest
+
in
+
aux [] lines
+
;;
+
+
let first_and_last_hunks s =
+
let lines = String.split_on_char '\n' s in
+
let rec aux acc = function
+
| [] -> String.concat "\n" (List.rev acc), ""
+
| "" :: "" :: rest ->
+
String.concat "\n" (List.rev acc), String.concat "\n" (List.rev rest)
+
| line :: rest -> aux (line :: acc) rest
+
in
+
aux [] lines
+
;;
+
+
(* Find all footnote definition lines in text *)
+
let find_footnote_lines s =
+
let lines = String.split_on_char '\n' s in
+
let is_footnote_def line =
+
String.length line > 3 &&
+
line.[0] = '[' &&
+
line.[1] = '^' &&
+
String.contains line ':' &&
+
let colon_pos = String.index line ':' in
+
colon_pos > 2 && line.[colon_pos - 1] = ']'
+
in
+
let is_continuation line =
+
String.length line > 0 && (line.[0] = ' ' || line.[0] = '\t')
+
in
+
let rec collect_footnotes acc in_footnote = function
+
| [] -> List.rev acc
+
| line :: rest ->
+
if is_footnote_def line then
+
collect_footnotes (line :: acc) true rest
+
else if in_footnote && is_continuation line then
+
collect_footnotes (line :: acc) true rest
+
else
+
collect_footnotes acc false rest
+
in
+
collect_footnotes [] false lines
+
;;
+
+
(* Augment first hunk with footnote definitions from last hunk *)
+
let first_hunk_with_footnotes s =
+
let first, last = first_and_last_hunks s in
+
let footnote_lines = find_footnote_lines last in
+
if footnote_lines = [] then first
+
else first ^ "\n\n" ^ String.concat "\n" footnote_lines
+
;;
+
+
let count_words (text : string) : int =
+
let len = String.length text in
+
let rec count_words_helper (index : int) (in_word : bool) (count : int) : int =
+
if index >= len
+
then if in_word then count + 1 else count
+
else (
+
let char = String.get text index in
+
let is_whitespace =
+
Char.equal char ' '
+
|| Char.equal char '\t'
+
|| Char.equal char '\n'
+
|| Char.equal char '\r'
+
in
+
if is_whitespace
+
then
+
if in_word
+
then count_words_helper (index + 1) false (count + 1)
+
else count_words_helper (index + 1) false count
+
else count_words_helper (index + 1) true count)
+
in
+
count_words_helper 0 false 0
+
;;
+
+
let read_file file = In_channel.(with_open_bin file input_all)
+166
stack/bushel/lib/video.ml
···
+
type t =
+
{ slug : string
+
; title : string
+
; published_date : Ptime.t
+
; uuid : string
+
; description : string
+
; url : string
+
; talk : bool
+
; paper : string option
+
; project : string option
+
; tags : string list
+
}
+
+
type ts = t list
+
+
let get_shadow fs k =
+
match List.assoc_opt k fs with
+
| Some v -> Some v
+
| None -> List.assoc_opt ("_" ^ k) fs
+
;;
+
+
let get_shadow_string fs k =
+
match get_shadow fs k with
+
| Some (`String v) -> v
+
| _ -> failwith "invalid yaml"
+
;;
+
+
let get_shadow_bool fs k =
+
match get_shadow fs k with
+
| Some (`Bool v) -> v
+
| _ -> failwith "invalid yaml"
+
;;
+
+
let compare a b = Ptime.compare b.published_date a.published_date
+
let url v = v.url
+
let body { description; _ } = description
+
let title { title; _ } = title
+
let uuid { uuid; _ } = uuid
+
let paper { paper; _ } = paper
+
let project { project; _ } = project
+
let slug { slug; _ } = slug
let date { published_date; _ } = published_date |> Ptime.to_date
let datetime { published_date; _ } = published_date
let talk { talk; _ } = talk

let t_of_yaml ~description = function
  | `O fields ->
    let slug = get_shadow_string fields "uuid" in
    let title = get_shadow_string fields "title" in
    let published_date =
      get_shadow_string fields "published_date"
      |> Ptime.of_rfc3339
      |> Result.get_ok
      |> fun (a, _, _) -> a
    in
    let uuid = get_shadow_string fields "uuid" in
    let url = get_shadow_string fields "url" in
    let talk =
      try get_shadow_bool fields "talk" with
      | _ -> false
    in
    let tags =
      match List.assoc_opt "tags" fields with
      | Some l -> Ezjsonm.get_list Ezjsonm.get_string l
      | _ -> []
    in
    let paper =
      try Some (get_shadow_string fields "paper") with
      | _ -> None
    in
    let project =
      try Some (get_shadow_string fields "project") with
      | _ -> None
    in
    { slug; title; tags; published_date; uuid; description; talk; paper; project; url }
  | _ -> failwith "invalid yaml"
;;

let to_yaml t =
  `O [
    ("title", `String t.title);
    ("description", `String t.description);
    ("url", `String t.url);
    ("uuid", `String t.uuid);
    ("slug", `String t.slug);
    ("published_date", `String (Ptime.to_rfc3339 t.published_date));
    ("talk", `Bool t.talk);
    ("tags", `A (List.map (fun t -> `String t) t.tags));
    ("paper", match t.paper with None -> `Null | Some p -> `String p);
    ("project", match t.project with None -> `Null | Some p -> `String p)
  ]

let to_file output_dir t =
  let file_path = Fpath.v (Filename.concat output_dir (t.uuid ^ ".md")) in
  let yaml = to_yaml t in
  let yaml_str = Yaml.to_string_exn yaml in
  let content = "---\n" ^ yaml_str ^ "---\n" in
  Bos.OS.File.write file_path content
;;

let of_md fname =
  (* TODO fix Jekyll_post to not error on no date *)
  let fname' = "2000-01-01-" ^ Filename.basename fname in
  match Jekyll_post.of_string ~fname:fname' (Util.read_file fname) with
  | Error (`Msg m) -> failwith ("paper_of_md: " ^ m)
  | Ok jp ->
    let fields = jp.Jekyll_post.fields |> Jekyll_format.fields_to_yaml in
    let { Jekyll_post.body; _ } = jp in
    t_of_yaml ~description:body fields
;;

(* TODO:claude *)
let typesense_schema =
  let open Ezjsonm in
  dict [
    ("name", string "videos");
    ("fields", list (fun d -> dict d) [
      [("name", string "id"); ("type", string "string")];
      [("name", string "title"); ("type", string "string")];
      [("name", string "description"); ("type", string "string")];
      [("name", string "published_date"); ("type", string "string")];
      [("name", string "date"); ("type", string "string")];
      [("name", string "date_timestamp"); ("type", string "int64")];
      [("name", string "tags"); ("type", string "string[]"); ("facet", bool true)];
      [("name", string "url"); ("type", string "string")];
      [("name", string "uuid"); ("type", string "string")];
      [("name", string "is_talk"); ("type", string "bool")];
      [("name", string "paper"); ("type", string "string[]"); ("optional", bool true)];
      [("name", string "project"); ("type", string "string[]"); ("optional", bool true)];
      [("name", string "video_url"); ("type", string "string"); ("optional", bool true)];
      [("name", string "embed_url"); ("type", string "string"); ("optional", bool true)];
      [("name", string "duration"); ("type", string "int32"); ("optional", bool true)];
      [("name", string "channel"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
      [("name", string "platform"); ("type", string "string"); ("facet", bool true); ("optional", bool true)];
      [("name", string "views"); ("type", string "int32"); ("optional", bool true)];
      [("name", string "related_papers"); ("type", string "string[]"); ("optional", bool true)];
      [("name", string "related_talks"); ("type", string "string[]"); ("optional", bool true)];
    ]);
    ("default_sorting_field", string "date_timestamp");
  ]

(** TODO:claude Pretty-print a video with ANSI formatting *)
let pp ppf v =
  let open Fmt in
  pf ppf "@[<v>";
  pf ppf "%a: %a@," (styled `Bold string) "Type" (styled `Cyan string) "Video";
  pf ppf "%a: %a@," (styled `Bold string) "Slug" string (slug v);
  pf ppf "%a: %a@," (styled `Bold string) "UUID" string (uuid v);
  pf ppf "%a: %a@," (styled `Bold string) "Title" string (title v);
  let (year, month, day) = date v in
  pf ppf "%a: %04d-%02d-%02d@," (styled `Bold string) "Date" year month day;
  pf ppf "%a: %a@," (styled `Bold string) "URL" string (url v);
  pf ppf "%a: %b@," (styled `Bold string) "Talk" (talk v);
  (match paper v with
   | Some p -> pf ppf "%a: %a@," (styled `Bold string) "Paper" string p
   | None -> ());
  (match project v with
   | Some p -> pf ppf "%a: %a@," (styled `Bold string) "Project" string p
   | None -> ());
  let t = v.tags in
  if t <> [] then
    pf ppf "%a: @[<h>%a@]@," (styled `Bold string) "Tags" (list ~sep:comma string) t;
  pf ppf "@,";
  pf ppf "%a:@," (styled `Bold string) "Description";
  pf ppf "%a@," string v.description;
  pf ppf "@]"
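
A minimal usage sketch of the module above, assuming the library is exposed as `Bushel.Video` (as the PeerTube client's `to_bushel_video` comment suggests); the input file and output directory are placeholder paths:

```ocaml
(* Sketch only: load a video record from a Markdown file with YAML
   front-matter, pretty-print it, then re-serialise it as out/<uuid>.md.
   "videos/some-video.md" and "out" are hypothetical paths. *)
let () =
  let v = Bushel.Video.of_md "videos/some-video.md" in
  Fmt.pr "%a@." Bushel.Video.pp v;
  match Bushel.Video.to_file "out" v with
  | Ok () -> ()
  | Error (`Msg m) -> prerr_endline ("write failed: " ^ m)
```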
+32
stack/bushel/lib/video.mli
···
type t =
  { slug : string
  ; title : string
  ; published_date : Ptime.t
  ; uuid : string
  ; description : string
  ; url : string
  ; talk : bool
  ; paper : string option
  ; project : string option
  ; tags : string list
  }

type ts = t list

val compare : t -> t -> int
val url : t -> string
val body : t -> string
val title : t -> string
val uuid : t -> string
val paper : t -> string option
val project : t -> string option
val slug : t -> string
val date : t -> Ptime.date
val datetime : t -> Ptime.t
val talk : t -> bool
val of_md : string -> t
val t_of_yaml : description:string -> Yaml.value -> t
val to_yaml : t -> Yaml.value
val to_file : string -> t -> (unit, [> `Msg of string]) result
val typesense_schema : Ezjsonm.value
val pp : Format.formatter -> t -> unit
+34
stack/bushel/peertube.opam
···
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "PeerTube API client"
description: "Client for interacting with PeerTube instances"
maintainer: ["anil@recoil.org"]
authors: ["Anil Madhavapeddy"]
license: "ISC"
homepage: "https://github.com/avsm/bushel"
bug-reports: "https://github.com/avsm/bushel/issues"
depends: [
  "dune" {>= "3.17"}
  "ocaml" {>= "5.2.0"}
  "ezjsonm"
  "lwt"
  "cohttp-lwt-unix"
  "ptime"
  "fmt"
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
dev-repo: "git+https://github.com/avsm/bushel.git"
+4
stack/bushel/peertube/dune
···
(library
 (name peertube)
 (public_name peertube)
 (libraries ezjsonm lwt cohttp-lwt-unix ptime fmt))
+191
stack/bushel/peertube/peertube.ml
···
(** PeerTube API client implementation
    TODO:claude *)

open Lwt.Infix

module J = Ezjsonm

(** Type representing a PeerTube video *)
type video = {
  id: int;
  uuid: string;
  name: string;
  description: string option;
  url: string;
  embed_path: string;
  published_at: Ptime.t;
  originally_published_at: Ptime.t option;
  thumbnail_path: string option;
  tags: string list;
}

(** Type for PeerTube API response containing videos *)
type video_response = {
  total: int;
  data: video list;
}

(** Parse a date string to Ptime.t, defaulting to epoch if invalid *)
let parse_date str =
  match Ptime.of_rfc3339 str with
  | Ok (date, _, _) -> date
  | Error _ ->
    Fmt.epr "Warning: could not parse date '%s'\n" str;
    (* Default to epoch time *)
    let span_opt = Ptime.Span.of_d_ps (0, 0L) in
    match span_opt with
    | None -> failwith "Internal error: couldn't create epoch time span"
    | Some span ->
      match Ptime.of_span span with
      | Some t -> t
      | None -> failwith "Internal error: couldn't create epoch time"

(** Extract a string field from JSON, returns None if not present or not a string *)
let get_string_opt json path =
  try Some (J.find json path |> J.get_string)
  with _ -> None

(** Extract a string list field from JSON, returns empty list if not present *)
let get_string_list json path =
  try
    let tags_json = J.find json path in
    J.get_list J.get_string tags_json
  with _ -> []

(** Parse a single video from PeerTube JSON *)
let parse_video json =
  let id = J.find json ["id"] |> J.get_int in
  let uuid = J.find json ["uuid"] |> J.get_string in
  let name = J.find json ["name"] |> J.get_string in
  let description = get_string_opt json ["description"] in
  let url = J.find json ["url"] |> J.get_string in
  let embed_path = J.find json ["embedPath"] |> J.get_string in

  (* Parse dates *)
  let published_at =
    J.find json ["publishedAt"] |> J.get_string |> parse_date
  in

  let originally_published_at =
    match get_string_opt json ["originallyPublishedAt"] with
    | Some date -> Some (parse_date date)
    | None -> None
  in

  let thumbnail_path = get_string_opt json ["thumbnailPath"] in
  let tags = get_string_list json ["tags"] in

  { id; uuid; name; description; url; embed_path;
    published_at; originally_published_at;
    thumbnail_path; tags }

(** Parse a PeerTube video response *)
let parse_video_response json =
  let total = J.find json ["total"] |> J.get_int in
  let videos_json = J.find json ["data"] in
  let data = J.get_list parse_video videos_json in
  { total; data }

(** Fetch videos from a PeerTube instance channel with pagination support
    @param count Number of videos to fetch per page
    @param start Starting index for pagination (0-based)
    @param base_url Base URL of the PeerTube instance
    @param channel Channel name to fetch videos from
    @return A Lwt promise with the video response
    TODO:claude *)
let fetch_channel_videos ?(count=20) ?(start=0) base_url channel =
  let open Cohttp_lwt_unix in
  let url = Printf.sprintf "%s/api/v1/video-channels/%s/videos?count=%d&start=%d"
    base_url channel count start in
  Client.get (Uri.of_string url) >>= fun (resp, body) ->
  if resp.status = `OK then
    Cohttp_lwt.Body.to_string body >>= fun body_str ->
    let json = J.from_string body_str in
    Lwt.return (parse_video_response json)
  else
    let status_code = Cohttp.Code.code_of_status resp.status in
    Lwt.fail_with (Fmt.str "HTTP error: %d" status_code)

(** Fetch all videos from a PeerTube instance channel using pagination
    @param page_size Number of videos to fetch per page
    @param max_pages Maximum number of pages to fetch (None for all pages)
    @param base_url Base URL of the PeerTube instance
    @param channel Channel name to fetch videos from
    @return A Lwt promise with all videos combined
    TODO:claude *)
let fetch_all_channel_videos ?(page_size=20) ?max_pages base_url channel =
  let rec fetch_pages start acc _total_count =
    fetch_channel_videos ~count:page_size ~start base_url channel >>= fun response ->
    let all_videos = acc @ response.data in

    (* Determine if we need to fetch more pages *)
    let fetched_count = start + List.length response.data in
    let more_available = fetched_count < response.total in
    let under_max_pages = match max_pages with
      | None -> true
      | Some max -> (start / page_size) + 1 < max
    in

    if more_available && under_max_pages then
      fetch_pages fetched_count all_videos response.total
    else
      Lwt.return all_videos
  in
  fetch_pages 0 [] 0

(** Fetch detailed information for a single video by UUID
    @param base_url Base URL of the PeerTube instance
    @param uuid UUID of the video to fetch
    @return A Lwt promise with the complete video details
    TODO:claude *)
let fetch_video_details base_url uuid =
  let open Cohttp_lwt_unix in
  let url = Printf.sprintf "%s/api/v1/videos/%s" base_url uuid in
  Client.get (Uri.of_string url) >>= fun (resp, body) ->
  if resp.status = `OK then
    Cohttp_lwt.Body.to_string body >>= fun body_str ->
    let json = J.from_string body_str in
    (* Parse the single video details *)
    Lwt.return (parse_video json)
  else
    let status_code = Cohttp.Code.code_of_status resp.status in
    Lwt.fail_with (Fmt.str "HTTP error: %d" status_code)

(** Convert a PeerTube video to Bushel.Video.t compatible structure *)
let to_bushel_video video =
  let description = Option.value ~default:"" video.description in
  let published_date = video.originally_published_at |> Option.value ~default:video.published_at in
  (description, published_date, video.name, video.url, video.uuid, string_of_int video.id)

(** Get the thumbnail URL for a video *)
let thumbnail_url base_url video =
  match video.thumbnail_path with
  | Some path -> Some (base_url ^ path)
  | None -> None

(** Download a thumbnail to a file
    @param base_url Base URL of the PeerTube instance
    @param video The video to download the thumbnail for
    @param output_path Path where to save the thumbnail
    @return A Lwt promise with unit on success *)
let download_thumbnail base_url video output_path =
  match thumbnail_url base_url video with
  | None ->
    Lwt.return (Error (`Msg (Printf.sprintf "No thumbnail available for video %s" video.uuid)))
  | Some url ->
    let open Cohttp_lwt_unix in
    Client.get (Uri.of_string url) >>= fun (resp, body) ->
    if resp.status = `OK then
      Cohttp_lwt.Body.to_string body >>= fun body_str ->
      Lwt.catch
        (fun () ->
          let oc = open_out_bin output_path in
          output_string oc body_str;
          close_out oc;
          Lwt.return (Ok ()))
        (fun exn ->
          Lwt.return (Error (`Msg (Printf.sprintf "Failed to write thumbnail: %s"
            (Printexc.to_string exn)))))
    else
      let status_code = Cohttp.Code.code_of_status resp.status in
      Lwt.return (Error (`Msg (Printf.sprintf "HTTP error downloading thumbnail: %d" status_code)))
+62
stack/bushel/peertube/peertube.mli
···
(** PeerTube API client interface
    TODO:claude *)

(** Type representing a PeerTube video *)
type video = {
  id: int;
  uuid: string;
  name: string;
  description: string option;
  url: string;
  embed_path: string;
  published_at: Ptime.t;
  originally_published_at: Ptime.t option;
  thumbnail_path: string option;
  tags: string list;
}

(** Type for PeerTube API response containing videos *)
type video_response = {
  total: int;
  data: video list;
}

(** Parse a single video from PeerTube JSON *)
val parse_video : Ezjsonm.value -> video

(** Parse a PeerTube video response *)
val parse_video_response : Ezjsonm.value -> video_response

(** Fetch videos from a PeerTube instance channel with pagination support
    @param count Number of videos to fetch per page (default: 20)
    @param start Starting index for pagination (0-based) (default: 0)
    @param base_url Base URL of the PeerTube instance
    @param channel Channel name to fetch videos from *)
val fetch_channel_videos : ?count:int -> ?start:int -> string -> string -> video_response Lwt.t

(** Fetch all videos from a PeerTube instance channel using pagination
    @param page_size Number of videos to fetch per page (default: 20)
    @param max_pages Maximum number of pages to fetch (None for all pages)
    @param base_url Base URL of the PeerTube instance
    @param channel Channel name to fetch videos from *)
val fetch_all_channel_videos : ?page_size:int -> ?max_pages:int -> string -> string -> video list Lwt.t

(** Fetch detailed information for a single video by UUID
    @param base_url Base URL of the PeerTube instance
    @param uuid UUID of the video to fetch *)
val fetch_video_details : string -> string -> video Lwt.t

(** Convert a PeerTube video to Bushel.Video.t compatible structure
    Returns (description, published_date, title, url, uuid, slug) *)
val to_bushel_video : video -> string * Ptime.t * string * string * string * string

(** Get the thumbnail URL for a video
    @param base_url Base URL of the PeerTube instance
    @param video The video to get the thumbnail URL for *)
val thumbnail_url : string -> video -> string option

(** Download a thumbnail to a file
    @param base_url Base URL of the PeerTube instance
    @param video The video to download the thumbnail for
    @param output_path Path where to save the thumbnail *)
val download_thumbnail : string -> video -> string -> (unit, [> `Msg of string]) result Lwt.t
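
A usage sketch for the client above; the instance URL and channel name are placeholders, not real endpoints:

```ocaml
(* Sketch only: fetch up to two pages of a channel's videos and print
   each title with its publication year. *)
open Lwt.Infix

let () =
  Lwt_main.run
    (Peertube.fetch_all_channel_videos ~page_size:50 ~max_pages:2
       "https://peertube.example.org" "somechannel"
     >|= List.iter (fun v ->
           let (y, _, _) = Ptime.to_date v.Peertube.published_at in
           Printf.printf "%d %s\n" y v.Peertube.name))
```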
+36
stack/bushel/typesense-client.opam
···
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "Standalone Typesense client for OCaml"
description:
  "A standalone Typesense client that can be compiled to JavaScript"
maintainer: ["anil@recoil.org"]
authors: ["Anil Madhavapeddy"]
license: "ISC"
homepage: "https://github.com/avsm/bushel"
bug-reports: "https://github.com/avsm/bushel/issues"
depends: [
  "dune" {>= "3.17"}
  "ocaml" {>= "5.2.0"}
  "ezjsonm"
  "lwt"
  "cohttp-lwt-unix"
  "ptime"
  "fmt"
  "uri"
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
dev-repo: "git+https://github.com/avsm/bushel.git"
+5
stack/bushel/typesense-client/dune
···
(library
 (public_name typesense-client)
 (name typesense_client)
 (libraries lwt cohttp-lwt-unix ezjsonm fmt uri ptime)
 (preprocess (pps lwt_ppx)))
+372
stack/bushel/typesense-client/typesense_client.ml
···
open Lwt.Syntax
open Cohttp_lwt_unix

(** TODO:claude Standalone Typesense client for OCaml *)

(** Configuration for Typesense client *)
type config = {
  endpoint : string;
  api_key : string;
}

(** Error types for Typesense operations *)
type error =
  | Http_error of int * string
  | Json_error of string
  | Connection_error of string

let pp_error fmt = function
  | Http_error (code, msg) -> Fmt.pf fmt "HTTP %d: %s" code msg
  | Json_error msg -> Fmt.pf fmt "JSON error: %s" msg
  | Connection_error msg -> Fmt.pf fmt "Connection error: %s" msg

(** TODO:claude Create authentication headers for Typesense API *)
let auth_headers api_key =
  Cohttp.Header.of_list [
    ("X-TYPESENSE-API-KEY", api_key);
    ("Content-Type", "application/json");
  ]

(** TODO:claude Make HTTP request to Typesense API *)
let make_request ?(meth=`GET) ?(body="") config path =
  let uri = Uri.of_string (config.endpoint ^ path) in
  let headers = auth_headers config.api_key in
  let body = if body = "" then `Empty else `String body in
  Lwt.catch (fun () ->
    let* resp, body = Client.call ~headers ~body meth uri in
    let status = Cohttp.Code.code_of_status (Response.status resp) in
    let* body_str = Cohttp_lwt.Body.to_string body in
    if status >= 200 && status < 300 then
      Lwt.return_ok body_str
    else
      Lwt.return_error (Http_error (status, body_str))
  ) (fun exn ->
    Lwt.return_error (Connection_error (Printexc.to_string exn))
  )

(** TODO:claude Search result types *)
type search_result = {
  id: string;
  title: string;
  content: string;
  score: float;
  collection: string;
  highlights: (string * string list) list;
  document: Ezjsonm.value; (* Store raw document for flexible field access *)
}

type search_response = {
  hits: search_result list;
  total: int;
  query_time: float;
}

(** TODO:claude Parse search result from JSON *)
let parse_search_result collection json =
  let open Ezjsonm in
  let document = get_dict json |> List.assoc "document" in
  let highlights = try get_dict json |> List.assoc "highlights" with _ -> `A [] in
  let score = try get_dict json |> List.assoc "text_match" |> get_float with _ -> 0.0 in

  let id = get_dict document |> List.assoc "id" |> get_string in
  let title = try get_dict document |> List.assoc "title" |> get_string with _ -> "" in
  let content = try
    match collection with
    | "papers" -> get_dict document |> List.assoc "abstract" |> get_string
    | "projects" -> get_dict document |> List.assoc "description" |> get_string
    | "news" -> get_dict document |> List.assoc "content" |> get_string
    | "videos" -> get_dict document |> List.assoc "description" |> get_string
    | "notes" -> get_dict document |> List.assoc "content" |> get_string
    | "ideas" -> get_dict document |> List.assoc "description" |> get_string
    | "contacts" -> get_dict document |> List.assoc "name" |> get_string
    | _ -> ""
  with _ -> "" in

  let parse_highlights highlights =
    try
      get_list (fun h ->
        let field = get_dict h |> List.assoc "field" |> get_string in
        let snippets = get_dict h |> List.assoc "snippets" |> get_list get_string in
        (field, snippets)
      ) highlights
    with _ -> []
  in

  { id; title; content; score; collection; highlights = parse_highlights highlights; document }

(** TODO:claude Parse search response from JSON *)
let parse_search_response collection json =
  let open Ezjsonm in
  let hits = get_dict json |> List.assoc "hits" |> get_list (parse_search_result collection) in
  let total = get_dict json |> List.assoc "found" |> get_int in
  let query_time = get_dict json |> List.assoc "search_time_ms" |> get_float in
  { hits; total; query_time }

(** TODO:claude Search a single collection *)
let search_collection config collection_name query ?(limit=10) ?(offset=0) () =
  let escaped_query = Uri.pct_encode query in
  let query_fields = match collection_name with
    | "papers" -> "title,abstract,authors"
    | "projects" -> "title,description"
    | "news" -> "title,content"
    | "videos" -> "title,description"
    | "notes" -> "title,content"
    | "ideas" -> "title,description"
    | "contacts" -> "name,names"
    | _ -> "title,content,description,abstract"
  in
  let path = Printf.sprintf "/collections/%s/documents/search?q=%s&query_by=%s&per_page=%d&page=%d&highlight_full_fields=%s"
    collection_name escaped_query query_fields limit ((offset / limit) + 1) query_fields in
  let* result = make_request config path in
  match result with
  | Ok response_str ->
    (try
      let json = Ezjsonm.from_string response_str in
      let search_response = parse_search_response collection_name json in
      Lwt.return_ok search_response
    with exn ->
      Lwt.return_error (Json_error (Printexc.to_string exn)))
  | Error err -> Lwt.return_error err

(** TODO:claude Helper function to drop n elements from list *)
let rec drop n lst =
  if n <= 0 then lst
  else match lst with
    | [] -> []
    | _ :: tl -> drop (n - 1) tl

(** TODO:claude Helper function to take n elements from list *)
let rec take n lst =
  if n <= 0 then []
  else match lst with
    | [] -> []
    | hd :: tl -> hd :: take (n - 1) tl

(** TODO:claude Multisearch result types *)
type multisearch_response = {
  results: search_response list;
}

(** TODO:claude Parse multisearch response from JSON *)
let parse_multisearch_response json =
  let open Ezjsonm in
  let results_json = get_dict json |> List.assoc "results" |> get_list (fun r -> r) in
  let results = List.mapi (fun i result_json ->
    let collection_name = match i with
      | 0 -> "contacts"
      | 1 -> "news"
      | 2 -> "notes"
      | 3 -> "papers"
      | 4 -> "projects"
      | 5 -> "ideas"
      | 6 -> "videos"
      | _ -> "unknown"
    in
    parse_search_response collection_name result_json
  ) results_json in
  { results }

(** TODO:claude Perform multisearch across all collections *)
let multisearch config query ?(limit=10) () =
  let collections = ["contacts"; "news"; "notes"; "papers"; "projects"; "ideas"; "videos"] in
  let query_by_collection = [
    ("contacts", "name,names,email,handle,github,twitter,url");
    ("news", "title,content,source,author,category,tags");
    ("notes", "title,content,tags,synopsis");
    ("papers", "title,authors,abstract,journal,tags");
    ("projects", "title,description,languages,license,status,tags");
    ("ideas", "title,description,level,status,project,supervisors,tags");
    ("videos", "title,description,channel,platform,tags");
  ] in

  let searches = List.map (fun collection ->
    let query_by = List.assoc collection query_by_collection in
    Ezjsonm.dict [
      ("collection", Ezjsonm.string collection);
      ("q", Ezjsonm.string query);
      ("query_by", Ezjsonm.string query_by);
      ("exclude_fields", Ezjsonm.string "embedding");
      ("per_page", Ezjsonm.int limit);
    ]
  ) collections in

  let body = Ezjsonm.dict [("searches", Ezjsonm.list (fun x -> x) searches)] |> Ezjsonm.value_to_string in
  let* result = make_request ~meth:`POST ~body config "/multi_search" in

  match result with
  | Ok response_str ->
    (try
      let json = Ezjsonm.from_string response_str in
      let multisearch_resp = parse_multisearch_response json in
      Lwt.return_ok multisearch_resp
    with exn ->
      Lwt.return_error (Json_error (Printexc.to_string exn)))
  | Error err -> Lwt.return_error err

(** TODO:claude Combine multisearch results into single result set *)
let combine_multisearch_results (multisearch_resp : multisearch_response) ?(limit=10) ?(offset=0) () =
  (* Collect all hits from all collections *)
  let all_hits = List.fold_left (fun acc response ->
    response.hits @ acc
  ) [] multisearch_resp.results in

  (* Sort by score descending *)
  let sorted_hits = List.sort (fun a b -> Float.compare b.score a.score) all_hits in

  (* Apply offset and limit *)
  let dropped_hits = drop offset sorted_hits in
  let final_hits = take limit dropped_hits in

  (* Calculate totals *)
  let total = List.fold_left (fun acc response -> acc + response.total) 0 multisearch_resp.results in
  let query_time = List.fold_left (fun acc response -> acc +. response.query_time) 0.0 multisearch_resp.results in

  { hits = final_hits; total; query_time }

(** TODO:claude List all collections *)
let list_collections config =
  let* result = make_request config "/collections" in
  match result with
  | Ok response_str ->
    (try
      let json = Ezjsonm.from_string response_str in
      let collections = Ezjsonm.get_list (fun c ->
        let name = Ezjsonm.get_dict c |> List.assoc "name" |> Ezjsonm.get_string in
        let num_docs = Ezjsonm.get_dict c |> List.assoc "num_documents" |> Ezjsonm.get_int in
        (name, num_docs)
      ) json in
      Lwt.return_ok collections
    with exn ->
      Lwt.return_error (Json_error (Printexc.to_string exn)))
  | Error err -> Lwt.return_error err

(** TODO:claude Pretty printer utilities *)

(** Extract field value from JSON document or return empty string if not found *)
let extract_field_string document field =
  try
    let open Ezjsonm in
    get_dict document |> List.assoc field |> get_string
  with _ -> ""

(** Extract field value from JSON document as string list or return empty list if not found *)
let extract_field_string_list document field =
  try
    let open Ezjsonm in
    get_dict document |> List.assoc field |> get_list get_string
  with _ -> []

(** Extract field value from JSON document as boolean or return false if not found *)
let extract_field_bool document field =
  try
    let open Ezjsonm in
    get_dict document |> List.assoc field |> get_bool
  with _ -> false

(** Format authors list for display *)
let format_authors authors =
  match authors with
  | [] -> ""
  | [single] -> single
  | _first :: rest when List.length rest <= 2 -> String.concat ", " authors
  | first :: _rest -> Printf.sprintf "%s et al." first

(** Format date for display *)
let format_date date_str =
  match date_str with
  | "" -> ""
  | d when String.length d >= 10 -> String.sub d 0 10 (* Take YYYY-MM-DD part *)
  | d -> d

(** Format tags for display *)
let format_tags tags =
  match tags with
  | [] -> ""
  | ts when List.length ts <= 3 -> String.concat ", " ts
  | ts -> Printf.sprintf "%s (+%d more)" (String.concat ", " (take 2 ts)) (List.length ts - 2)

(** TODO:claude One-line pretty printer for search results *)
let pp_search_result_oneline (result : search_result) =
  let document = result.document in

  match result.collection with
  | "papers" ->
    let authors = extract_field_string_list document "authors" in
    let date = extract_field_string document "date" in
    let journal = extract_field_string_list document "journal" in
    let journal_str = match journal with [] -> "" | j :: _ -> Printf.sprintf " (%s)" j in
    Printf.sprintf "📄 %s — %s%s %s"
      result.title
      (format_authors authors)
      journal_str
      (format_date date)

  | "videos" ->
    let date = extract_field_string document "published_date" in
    let uuid = extract_field_string document "uuid" in
    let is_talk = extract_field_bool document "is_talk" in
    let talk_indicator = if is_talk then "🎤" else "🎬" in
    let url = extract_field_string document "url" in
    let url_display = if url = "" then "" else Printf.sprintf " <%s>" url in
    Printf.sprintf "%s %s — %s [%s]%s"
      talk_indicator
      result.title
      (format_date date)
      (if uuid = "" then result.id else uuid)
      url_display

  | "projects" ->
    let start_year = extract_field_string document "start_year" in
    let tags = extract_field_string_list document "tags" in
    let tags_str = match tags with [] -> "" | ts -> Printf.sprintf " #%s" (format_tags ts) in
    Printf.sprintf "🚀 %s — %s%s"
      result.title
      (if start_year = "" then "" else Printf.sprintf "(%s) " start_year)
      tags_str

  | "news" ->
    let date = extract_field_string document "date" in
    let url = extract_field_string document "url" in
    let url_display = if url = "" then "" else Printf.sprintf " <%s>" url in
    Printf.sprintf "📰 %s — %s%s"
      result.title
      (format_date date)
      url_display

  | "notes" ->
    let date = extract_field_string document "date" in
    let tags = extract_field_string_list document "tags" in
    let tags_str = match tags with [] -> "" | ts -> Printf.sprintf " #%s" (format_tags ts) in
    Printf.sprintf "📝 %s — %s%s"
      result.title
      (format_date date)
      tags_str

  | "ideas" ->
    let project = extract_field_string document "project" in
    let level = extract_field_string document "level" in
    let status = extract_field_string document "status" in
    let year = extract_field_string document "year" in
    Printf.sprintf "💡 %s — %s%s%s %s"
      result.title
      (if project = "" then "" else Printf.sprintf "[%s] " project)
      (if level = "" then "" else Printf.sprintf "(%s) " level)
      (if status = "" then "" else Printf.sprintf "%s " status)
      year

  | "contacts" ->
    let names = extract_field_string_list document "names" in
    let handle = extract_field_string document "handle" in
    let email = extract_field_string document "email" in
    let github = extract_field_string document "github" in
    let name_str = match names with [] -> result.title | n :: _ -> n in
    let contact_info = [
      (if handle = "" then "" else Printf.sprintf "@%s" handle);
      (if email = "" then "" else email);
      (if github = "" then "" else Printf.sprintf "github:%s" github);
    ] |> List.filter (fun s -> s <> "") |> String.concat " " in
    Printf.sprintf "👤 %s — %s"
      name_str
      contact_info

  | _ -> Printf.sprintf "[%s] %s" result.collection result.title
+60
stack/bushel/typesense-client/typesense_client.mli
···
(** Standalone Typesense client for OCaml *)

(** Configuration for Typesense client *)
type config = {
  endpoint : string;
  api_key : string;
}

(** Error types for Typesense operations *)
type error =
  | Http_error of int * string
  | Json_error of string
  | Connection_error of string

val pp_error : Format.formatter -> error -> unit

(** Search result types *)
type search_result = {
  id: string;
  title: string;
  content: string;
  score: float;
  collection: string;
  highlights: (string * string list) list;
  document: Ezjsonm.value; (* Store raw document for flexible field access *)
}

type search_response = {
  hits: search_result list;
  total: int;
  query_time: float;
}

(** Multisearch result types *)
type multisearch_response = {
  results: search_response list;
}

(** Search a single collection *)
val search_collection : config -> string -> string -> ?limit:int -> ?offset:int -> unit -> (search_response, error) result Lwt.t

(** Perform multisearch across all collections *)
val multisearch : config -> string -> ?limit:int -> unit -> (multisearch_response, error) result Lwt.t

(** Combine multisearch results into single result set *)
val combine_multisearch_results : multisearch_response -> ?limit:int -> ?offset:int -> unit -> search_response

(** List all collections *)
val list_collections : config -> ((string * int) list, error) result Lwt.t

(** Pretty printer utilities *)
val extract_field_string : Ezjsonm.value -> string -> string
val extract_field_string_list : Ezjsonm.value -> string -> string list
val extract_field_bool : Ezjsonm.value -> string -> bool
val format_authors : string list -> string
val format_date : string -> string
val format_tags : string list -> string

(** One-line pretty printer for search results *)
val pp_search_result_oneline : search_result -> string
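
A sketch of driving the Typesense client end-to-end; the endpoint and API key below are placeholders and assume a locally running Typesense server:

```ocaml
(* Sketch only: multisearch across all collections, merge the hits by
   score, and print one line per result. *)
let () =
  let config =
    { Typesense_client.endpoint = "http://localhost:8108"; api_key = "CHANGEME" }
  in
  Lwt_main.run
    (Typesense_client.multisearch config "unikernel" ~limit:5 ()
     |> Lwt.map (function
          | Ok resp ->
            let merged =
              Typesense_client.combine_multisearch_results resp ~limit:10 ()
            in
            List.iter
              (fun hit ->
                print_endline (Typesense_client.pp_search_result_oneline hit))
              merged.Typesense_client.hits
          | Error e ->
            Fmt.epr "search failed: %a@." Typesense_client.pp_error e))
```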