experimental hashing with oxcaml

Oxsha - Fast SHA256 Hashing for OCaml

A high-performance SHA256 hashing library for OCaml with zero-copy C bindings using bigarrays.

Features

  • Zero-copy performance: Uses bigarrays for efficient data transfer to C
  • Hardware acceleration: Detects the CPU architecture at build time and enables SHA extensions (Intel SHA-NI, ARM Crypto)
  • Streaming API: Incremental hashing with init/update/finalize pattern
  • Multiple interfaces: Support for bigarrays, bytes, and strings
  • Memory-mapped files: sha256sum example uses Unix.map_file for true zero-copy file hashing
  • Minimal dependencies: Standalone library with no external dependencies
  • Well-documented: Comprehensive API documentation

Installation

opam install oxsha

Or build from source:

dune build
dune install

Quick Start

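(* Local helper, defined here rather than assumed from Oxsha:
   render a digest as lowercase hex *)
let hex_of_bytes b =
  let buf = Buffer.create (2 * Bytes.length b) in
  Bytes.iter (fun c -> Printf.bprintf buf "%02x" (Char.code c)) b;
  Buffer.contents buf

(* One-shot hashing *)
let () =
  let digest = Oxsha.hash_string "Hello, World!" in
  Printf.printf "SHA256: %s\n" (hex_of_bytes digest)

(* Streaming API for large data *)
let () =
  let ctx = Oxsha.create () in
  Oxsha.update_string ctx "Hello, ";
  Oxsha.update_string ctx "World!";
  let digest = Oxsha.final ctx in
  Printf.printf "SHA256: %s\n" (hex_of_bytes digest)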

API Overview

Low-Level API (Oxsha.Raw)

The Raw module provides direct access to the C implementation:

  • create : unit -> t - Create a new SHA256 context
  • update : t -> bigarray -> unit - Update with bigarray data (zero-copy)
  • update_bytes : t -> bytes -> unit - Update with bytes
  • update_string : t -> string -> unit - Update with string
  • final : t -> bytes - Finalize and get 32-byte digest
  • hash : bigarray -> bytes - One-shot hash for bigarrays
  • hash_bytes : bytes -> bytes - One-shot hash for bytes
  • hash_string : string -> bytes - One-shot hash for strings
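The signatures above refer to a bigarray type without spelling it out. The following is a minimal sketch of building such a buffer with the standard library only, assuming the type is a one-dimensional char bigarray in C layout (an assumption; lib/oxsha.mli is the authority). The names buffer and bigarray_of_string are hypothetical and not part of Oxsha. Note that this helper copies its input once, so it suits tests and small inputs rather than the zero-copy path.

```ocaml
(* Assumed buffer type; the README only calls it "bigarray". *)
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: copy a string into a freshly allocated bigarray. *)
let bigarray_of_string (s : string) : buffer =
  let ba =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set ba i c) s;
  ba

(* With the library installed, the one-shot entry point listed above
   would then be called as:
     let digest : bytes = Oxsha.hash (bigarray_of_string "abc") *)
```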

High-Level API

All Raw module functions are re-exported at the top level for convenience:

let ctx = Oxsha.create () in
Oxsha.update_string ctx "data";
Oxsha.final ctx

Performance Considerations

For maximum performance:

  1. Use the bigarray API directly when possible to avoid copying
  2. Use the streaming API for large files to avoid loading everything in memory
  3. Use Unix.map_file for hashing large files (see sha256sum example)
  4. The C implementation is optimized and allocation-free
  5. Hardware SHA extensions are enabled automatically at build time when the target architecture supports them
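Point 2 can be sketched as a small chunked reader that hands each block to an update function, so memory use stays bounded by the chunk size. This is an illustration built on the standard library only; iter_chunks and its default chunk size are hypothetical names and values, not part of Oxsha.

```ocaml
(* Read [ic] to end-of-file in chunks of at most [chunk_size] bytes and pass
   each chunk to [update]. The buffer is reused between calls, so [update]
   must consume the data immediately and not retain the bytes value. *)
let iter_chunks ?(chunk_size = 65536) (ic : in_channel)
    ~(update : bytes -> unit) =
  let buf = Bytes.create chunk_size in
  let rec loop () =
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      (* A read may return fewer bytes than requested; pass only those. *)
      update (if n = chunk_size then buf else Bytes.sub buf 0 n);
      loop ()
    end
  in
  loop ()

(* With the library installed, a file could then be hashed as:
     let hash_file path =
       let ic = open_in_bin path in
       Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
         let ctx = Oxsha.create () in
         iter_chunks ic ~update:(Oxsha.update_bytes ctx);
         Oxsha.final ctx) *)
```

Short reads are copied with Bytes.sub because update_bytes, as listed in the API overview, takes a whole bytes value rather than an offset and length.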

Hardware Acceleration

The library automatically detects your CPU architecture at build time and enables hardware SHA acceleration:

  • x86/x86_64: Uses Intel SHA Extensions (-msse4.1 -msha)
  • ARM64/AArch64: Uses ARM Crypto Extensions (-march=armv8-a+crypto)

This is handled transparently by a dune configurator script in lib/discover/.

Examples

Basic Usage

See examples/basic_usage.ml for complete examples:

dune exec examples/basic_usage.exe

SHA256sum Utility

A drop-in replacement for the sha256sum command that uses memory-mapped files for zero-copy hashing:

# Hash one or more files
dune exec examples/sha256sum.exe -- README.md lib/oxsha.mli

# Output format is identical to sha256sum
5663d62d52903366546603da52d18ccbf36ef7265653b641b980ec36891c7afe  README.md
b4cbb3d0d18b90cc63c0e3e8c95f4e933d1a361a5eae142e64caf17724a1447f  lib/oxsha.mli

The sha256sum example demonstrates true zero-copy file hashing by memory-mapping files directly into bigarrays.
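The memory-mapping step might look roughly like the sketch below. It is not taken from examples/sha256sum.ml; map_file_ro and sum_line are hypothetical names, and the code assumes the unix library is linked and that Oxsha.hash accepts a char bigarray in C layout.

```ocaml
(* Requires the unix library, for example (libraries unix) in dune. *)

(* Map [path] read-only as a one-dimensional char bigarray. The file
   contents are not copied into the OCaml heap, and the mapping stays
   valid after the descriptor is closed. *)
let map_file_ro path =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  Fun.protect
    ~finally:(fun () -> Unix.close fd)
    (fun () ->
      (* A dimension of -1 asks map_file to infer the length from the file. *)
      Unix.map_file fd Bigarray.char Bigarray.c_layout false [| -1 |]
      |> Bigarray.array1_of_genarray)

(* One output line in the format shown above:
   hex digest, two spaces, file name. *)
let sum_line ~hex_digest ~path = Printf.sprintf "%s  %s" hex_digest path

(* With the library installed, and given some hex encoder, usage would be:
     let digest = Oxsha.hash (map_file_ro "README.md") *)
```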

Building

# Clean build
opam exec -- dune clean

# Build library
opam exec -- dune build @check

# Build documentation
opam exec -- dune build @doc

# Build in release mode (warnings are not treated as errors)
opam exec -- dune build @check --profile=release

Project Structure

oxsha/
├── lib/
│   ├── oxsha.ml          # OCaml implementation
│   ├── oxsha.mli         # Public interface
│   ├── oxsha_stubs.c     # C FFI bindings
│   ├── sha256.c          # SHA256 C implementation
│   ├── sha256.h          # C header
│   ├── dune              # Build rules
│   └── discover/
│       ├── discover.ml   # Architecture detection for C flags
│       └── dune          # Configurator build rules
├── examples/
│   ├── basic_usage.ml    # API usage examples
│   ├── sha256sum.ml      # sha256sum utility with mmap
│   └── dune
├── dune-project          # Project metadata
└── README.md

C Implementation

The library uses Brad Conte's public domain SHA256 implementation. The C context is allocated on the C heap and wrapped in an OCaml custom block with proper finalization.

Build-Time Configuration

The build system uses dune-configurator to detect the CPU architecture and automatically add the appropriate compiler flags for hardware SHA acceleration. The configurator script (lib/discover/discover.ml) runs during the build and generates a c_flags.sexp file that dune includes in the C compilation flags.
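The core of such a configurator script is a mapping from architecture name to C flags. The sketch below is not the contents of lib/discover/discover.ml. The function name sha_flags_of_arch is hypothetical, and the architecture strings are assumed to be the values of the "architecture" variable printed by ocamlc -config. The flags themselves are the ones quoted in the Hardware Acceleration section.

```ocaml
(* Map an OCaml architecture name to the C flags quoted in this README.
   Unknown architectures get no extra flags, so the build falls back to
   whatever the portable C code provides. *)
let sha_flags_of_arch = function
  | "amd64" | "i386" -> [ "-msse4.1"; "-msha" ]
  | "arm64" -> [ "-march=armv8-a+crypto" ]
  | _ -> []

(* In a dune-configurator script this would typically be wired up as:
     module C = Configurator.V1
     let () =
       C.main ~name:"discover" (fun c ->
         let arch =
           Option.value (C.ocaml_config_var c "architecture") ~default:""
         in
         C.Flags.write_sexp "c_flags.sexp" (sha_flags_of_arch arch)) *)
```

Keeping the mapping as a pure function makes it easy to test without running the configurator.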

License

ISC License

Contributing

Contributions welcome! Please ensure all tests pass before submitting PRs.