experimental hashing with oxcaml

working sha256

+18
LICENSE.md
···
+
(*
+
* ISC License
+
*
+
* Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
+
*
+
* Permission to use, copy, modify, and distribute this software for any
+
* purpose with or without fee is hereby granted, provided that the above
+
* copyright notice and this permission notice appear in all copies.
+
*
+
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
*
+
*)
+163 -1
README.md
···
-
A sha256 experiment
+
# Oxsha - Fast SHA256 Hashing for OCaml
+
+
A high-performance SHA256 hashing library for OCaml with zero-copy C bindings using bigarrays.
+
+
## Features
+
+
- **Zero-copy performance**: Uses bigarrays for efficient data transfer to C
+
- **Hardware acceleration**: Automatically detects and uses CPU SHA extensions (Intel SHA-NI, ARM Crypto)
+
- **Streaming API**: Incremental hashing with init/update/finalize pattern
+
- **Multiple interfaces**: Support for bigarrays, bytes, and strings
+
- **Memory-mapped files**: sha256sum example uses `Unix.map_file` for true zero-copy file hashing
+
- **Minimal dependencies**: Standalone library with no external dependencies
+
- **Well-documented**: Comprehensive API documentation
+
+
## Installation
+
+
```bash
+
opam install oxsha
+
```
+
+
Or build from source:
+
+
```bash
+
dune build
+
dune install
+
```
+
+
## Quick Start
+
+
```ocaml
+
(* hex_of_bytes is a small helper you supply; Oxsha does not export it *)
+
+
(* One-shot hashing *)
+
let () =
+
  let digest = Oxsha.hash_string "Hello, World!" in
+
  Printf.printf "SHA256: %s\n" (hex_of_bytes digest)
+
+
(* Streaming API for large data *)
+
let () =
+
  let ctx = Oxsha.create () in
+
  Oxsha.update_string ctx "Hello, ";
+
  Oxsha.update_string ctx "World!";
+
  let digest = Oxsha.final ctx in
+
  Printf.printf "SHA256: %s\n" (hex_of_bytes digest)
+
```
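
The Quick Start calls `hex_of_bytes`, which is not part of the `Oxsha` interface. A minimal sketch of such a helper, using only the standard library, might look like this (the name and signature are simply what the Quick Start assumes):

```ocaml
(* Render a digest (or any bytes value) as lowercase hexadecimal.
   This helper is an illustration; Oxsha itself does not export it. *)
let hex_of_bytes (b : bytes) : string =
  let buf = Buffer.create (Bytes.length b * 2) in
  Bytes.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    b;
  Buffer.contents buf

let () =
  (* Bytes 0x00, 0xff, 'O' (0x4f), 'x' (0x78) *)
  print_endline (hex_of_bytes (Bytes.of_string "\x00\xffOx"))
```

Each input byte becomes exactly two hex characters, so a 32-byte SHA256 digest yields a 64-character string.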
+
+
## API Overview
+
+
### Low-Level API (`Oxsha.Raw`)
+
+
The Raw module provides direct access to the C implementation:
+
+
- `create : unit -> t` - Create a new SHA256 context
+
- `update : t -> bigarray -> unit` - Update with bigarray data (zero-copy)
+
- `update_bytes : t -> bytes -> unit` - Update with bytes (copied into a temporary bigarray)
+
- `update_string : t -> string -> unit` - Update with string (copied into a temporary bigarray)
+
- `final : t -> bytes` - Finalize and get 32-byte digest
+
- `hash : bigarray -> bytes` - One-shot hash for bigarrays
+
- `hash_bytes : bytes -> bytes` - One-shot hash for bytes
+
- `hash_string : string -> bytes` - One-shot hash for strings
+
+
### High-Level API
+
+
All Raw module functions are re-exported at the top level for convenience:
+
+
```ocaml
+
Oxsha.create ()
+
Oxsha.update_string ctx "data"
+
Oxsha.final ctx
+
```
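
The init/update/finalize pattern pairs naturally with reading input in fixed-size blocks. The sketch below is one possible way to drive an update function from a channel; `stream_channel` and its `chunk` parameter are hypothetical names, and `feed` stands in for something like `Oxsha.update_bytes ctx`.

```ocaml
(* Read [ic] in blocks of at most [chunk] bytes and pass each block to [feed].
   The buffer is reused between calls, so [feed] must consume the data before
   returning. With Oxsha one would presumably write:
     let ctx = Oxsha.create () in
     stream_channel (Oxsha.update_bytes ctx) ic;
     Oxsha.final ctx *)
let stream_channel ?(chunk = 65536) (feed : bytes -> unit) (ic : in_channel) : unit =
  if chunk <= 0 then invalid_arg "stream_channel: chunk must be positive";
  let buf = Bytes.create chunk in
  let rec loop () =
    match input ic buf 0 chunk with
    | 0 -> () (* end of file *)
    | n ->
      (* A short read gets its own exact-length copy *)
      feed (if n = chunk then buf else Bytes.sub buf 0 n);
      loop ()
  in
  loop ()

let () =
  (* Demo without Oxsha: count the bytes streamed from a temporary file *)
  let path = Filename.temp_file "oxsha_demo" ".bin" in
  Out_channel.with_open_bin path (fun oc ->
      output_string oc (String.make 100_000 'x'));
  let total = ref 0 in
  In_channel.with_open_bin path
    (stream_channel ~chunk:4096 (fun b -> total := !total + Bytes.length b));
  Sys.remove path;
  Printf.printf "streamed %d bytes\n" !total
```

Only one block is resident at a time, so memory use stays bounded by `chunk` regardless of file size.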
+
+
## Performance Considerations
+
+
For maximum performance:
+
1. Use the bigarray API directly when possible to avoid copying
+
2. Use the streaming API for large files to avoid loading everything in memory
+
3. Use `Unix.map_file` for hashing large files (see sha256sum example)
+
4. The C implementation is optimized and allocation-free
+
5. Hardware SHA extensions are automatically enabled when available
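
To illustrate the first point, here is a minimal sketch of slicing a bigarray without copying. `bigarray_of_string` and `iter_slices` are hypothetical helpers, not part of Oxsha; with the library, the callback would presumably be `Oxsha.update ctx`.

```ocaml
open Bigarray

type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Copy a string into a fresh bigarray: one copy, done up front. *)
let bigarray_of_string (s : string) : buf =
  Array1.init char c_layout (String.length s) (fun i -> s.[i])

(* Call [f] on consecutive slices of at most [chunk] bytes. Array1.sub
   returns a view that shares the parent's storage, so no bytes are copied. *)
let iter_slices ~chunk (f : buf -> unit) (ba : buf) : unit =
  if chunk <= 0 then invalid_arg "iter_slices: chunk must be positive";
  let len = Array1.dim ba in
  let rec go off =
    if off < len then begin
      let n = min chunk (len - off) in
      f (Array1.sub ba off n);
      go (off + n)
    end
  in
  go 0

let () =
  let ba = bigarray_of_string "hello, bigarray" in
  (* Print the length of each slice handed to the callback *)
  iter_slices ~chunk:4 (fun s -> Printf.printf "%d " (Array1.dim s)) ba;
  print_newline ()
```

Because the slices are views, the only copy in this sketch is the initial string-to-bigarray conversion; data that already lives in a bigarray (for example a memory-mapped file) needs no copy at all.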
+
+
### Hardware Acceleration
+
+
The library automatically detects your CPU architecture at build time and enables hardware SHA acceleration:
+
+
- **x86/x86_64**: Uses Intel SHA Extensions (`-msse4.1 -msha`)
+
- **ARM64/AArch64**: Uses ARM Crypto Extensions (`-march=armv8-a+crypto`)
+
+
This is handled transparently by a dune configurator script in `lib/discover/`.
+
+
## Examples
+
+
### Basic Usage
+
+
See `examples/basic_usage.ml` for complete examples:
+
+
```bash
+
dune exec examples/basic_usage.exe
+
```
+
+
### SHA256sum Utility
+
+
A drop-in replacement for the `sha256sum` command that uses memory-mapped files for zero-copy hashing:
+
+
```bash
+
# Hash one or more files
+
dune exec examples/sha256sum.exe -- README.md lib/oxsha.mli
+
+
# Output format is identical to sha256sum
+
5663d62d52903366546603da52d18ccbf36ef7265653b641b980ec36891c7afe README.md
+
b4cbb3d0d18b90cc63c0e3e8c95f4e933d1a361a5eae142e64caf17724a1447f lib/oxsha.mli
+
```
+
+
The sha256sum example demonstrates true zero-copy file hashing by memory-mapping files directly into bigarrays.
+
+
## Building
+
+
```bash
+
# Clean build
+
opam exec -- dune clean
+
+
# Build library
+
opam exec -- dune build @check
+
+
# Build documentation
+
opam exec -- dune build @doc
+
+
# Build ignoring warnings (release mode)
+
opam exec -- dune build @check --profile=release
+
```
+
+
## Project Structure
+
+
```
+
oxsha/
+
├── lib/
+
│ ├── oxsha.ml # OCaml implementation
+
│ ├── oxsha.mli # Public interface
+
│ ├── oxsha_stubs.c # C FFI bindings
+
│ ├── sha256.c # SHA256 C implementation
+
│ ├── sha256.h # C header
+
│ ├── dune # Build rules
+
│ └── discover/
+
│ ├── discover.ml # Architecture detection for C flags
+
│ └── dune # Configurator build rules
+
├── examples/
+
│ ├── basic_usage.ml # API usage examples
+
│ ├── sha256sum.ml # sha256sum utility with mmap
+
│ └── dune
+
├── dune-project # Project metadata
+
└── README.md
+
```
+
+
## C Implementation
+
+
The library uses Brad Conte's public domain SHA256 implementation. The C context (`SHA256_CTX`) is stored inline in an OCaml custom block, so the garbage collector manages its memory; a finalizer zeroes the context before the block is reclaimed.
+
+
### Build-Time Configuration
+
+
The build system uses `dune-configurator` to detect the CPU architecture and automatically add the appropriate compiler flags for hardware SHA acceleration. The configurator script (`lib/discover/discover.ml`) runs during the build and generates a `c_flags.sexp` file that dune includes in the C compilation flags.
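
The flag selection can be sketched as a pure function. The version below mirrors the match in `lib/discover/discover.ml` but leaves out the `Configurator` plumbing; `arch_flags` and `to_sexp` are illustrative names, and the s-expression rendering only approximates what ends up in `c_flags.sexp`.

```ocaml
(* Map an OCaml "architecture" config value to C compiler flags,
   following the cases in lib/discover/discover.ml. *)
let arch_flags (arch : string) : string list =
  let base = [ "-O3" ] in
  let extra =
    match arch with
    | "arm64" | "aarch64" -> [ "-march=armv8-a+crypto" ]
    | "amd64" | "i386" -> [ "-msse4.1"; "-msha" ]
    | _ -> [] (* unrecognised architecture: no SHA-specific flags *)
  in
  base @ extra

(* Approximate rendering as an s-expression list; illustrative only. *)
let to_sexp (flags : string list) : string =
  "(" ^ String.concat " " flags ^ ")"

let () =
  List.iter
    (fun arch -> Printf.printf "%-8s %s\n" arch (to_sexp (arch_flags arch)))
    [ "amd64"; "arm64"; "riscv" ]
```

Note that the choice depends only on the target architecture known at build time, not on the CPU features of the machine that eventually runs the binary.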
+
+
## License
+
+
ISC License
+
+
## Contributing
+
+
Contributions welcome! Please ensure all tests pass before submitting PRs.
-173
bench/bench_sha256.ml
···
-
open Sha256
-
-
(* Memory allocation tracking *)
-
let measure_allocations f =
-
let before = Gc.allocated_bytes () in
-
let result = f () in
-
let after = Gc.allocated_bytes () in
-
(result, after -. before)
-
-
(* Benchmark different scenarios *)
-
let bench_sizes () =
-
print_endline "Benchmarking various input sizes:";
-
print_endline "Size (B) | Iterations | Time (s) | Throughput (MB/s) | Allocations (B)";
-
print_endline "---------|------------|----------|-------------------|----------------";
-
-
let sizes = [
-
(16, 100000);
-
(64, 100000);
-
(256, 50000);
-
(1024, 20000);
-
(4096, 5000);
-
(16384, 1000);
-
(65536, 250);
-
(262144, 60);
-
(1048576, 15);
-
] in
-
-
List.iter (fun (size, iterations) ->
-
let data = String.make size 'x' in
-
-
(* Warmup *)
-
for _ = 1 to 10 do
-
ignore (hash_string data)
-
done;
-
-
(* Benchmark *)
-
let start = Unix.gettimeofday () in
-
let _, allocs = measure_allocations (fun () ->
-
for _ = 1 to iterations do
-
ignore (hash_string data)
-
done
-
) in
-
let elapsed = Unix.gettimeofday () -. start in
-
-
let throughput = (float_of_int (size * iterations)) /. elapsed /. 1_000_000.0 in
-
let allocs_per_op = allocs /. float_of_int iterations in
-
-
Printf.printf "%8d | %10d | %8.3f | %17.1f | %14.0f\n"
-
size iterations elapsed throughput allocs_per_op
-
) sizes
-
-
let bench_parallel_scaling () =
-
print_endline "\nParallel scaling benchmark:";
-
print_endline "Threads | Hashes | Time (s) | Hashes/sec | Speedup";
-
print_endline "--------|--------|----------|------------|--------";
-
-
let num_hashes = 10000 in
-
let data_size = 1024 in
-
let inputs = List.init num_hashes (fun i ->
-
Bytes.of_string (String.make data_size (Char.chr (65 + (i mod 26))))
-
) in
-
-
(* Sequential baseline *)
-
let start_seq = Unix.gettimeofday () in
-
let _ = List.map hash_bytes inputs in
-
let time_seq = Unix.gettimeofday () -. start_seq in
-
let hashes_per_sec_seq = float_of_int num_hashes /. time_seq in
-
-
Printf.printf "%7d | %6d | %8.3f | %10.0f | %7.2fx\n"
-
1 num_hashes time_seq hashes_per_sec_seq 1.0;
-
-
(* Parallel with different thread counts *)
-
let thread_counts = [2; 4; 8] in
-
List.iter (fun threads ->
-
(* Simulate parallel execution with multiple Parallel.fork_join2 calls *)
-
let par = Parallel.create () in
-
let chunk_size = num_hashes / threads in
-
-
let start_par = Unix.gettimeofday () in
-
-
(* Process in parallel chunks *)
-
let rec process_chunks remaining acc =
-
match remaining with
-
| [] -> acc
-
| chunk :: [] -> (List.map hash_bytes chunk) :: acc
-
| chunk1 :: chunk2 :: rest ->
-
let r1, r2 = Parallel.fork_join2 par
-
(fun _ -> List.map hash_bytes chunk1)
-
(fun _ -> List.map hash_bytes chunk2)
-
in
-
process_chunks rest (r2 :: r1 :: acc)
-
in
-
-
(* Split inputs into chunks *)
-
let rec split_into_chunks lst n acc =
-
if n <= 0 || lst = [] then List.rev acc
-
else
-
let rec take k lst acc =
-
if k = 0 || lst = [] then (List.rev acc, lst)
-
else match lst with
-
| h::t -> take (k-1) t (h::acc)
-
| [] -> (List.rev acc, [])
-
in
-
let (chunk, rest) = take chunk_size lst [] in
-
split_into_chunks rest (n-1) (chunk :: acc)
-
in
-
-
let chunks = split_into_chunks inputs threads [] in
-
let _ = process_chunks chunks [] in
-
-
let time_par = Unix.gettimeofday () -. start_par in
-
let hashes_per_sec_par = float_of_int num_hashes /. time_par in
-
let speedup = time_seq /. time_par in
-
-
Printf.printf "%7d | %6d | %8.3f | %10.0f | %7.2fx\n"
-
threads num_hashes time_par hashes_per_sec_par speedup
-
) thread_counts
-
-
let bench_zero_allocation () =
-
print_endline "\nZero-allocation verification:";
-
-
(* Create aligned buffer *)
-
let size = 1024 in
-
let buffer = Bigarray.Array1.create Bigarray.int8_unsigned Bigarray.c_layout size in
-
for i = 0 to size - 1 do
-
Bigarray.Array1.set buffer i (65 + (i mod 26))
-
done;
-
-
(* Measure allocations for direct oneshot call *)
-
Gc.full_major ();
-
let before = Gc.allocated_bytes () in
-
-
for _ = 1 to 1000 do
-
ignore (oneshot buffer (Int64.of_int size))
-
done;
-
-
let after = Gc.allocated_bytes () in
-
let allocs_per_hash = (after -. before) /. 1000.0 in
-
-
Printf.printf " Direct oneshot (bigarray): %.1f bytes/hash\n" allocs_per_hash;
-
-
(* Compare with string version *)
-
let str = String.make size 'x' in
-
Gc.full_major ();
-
let before_str = Gc.allocated_bytes () in
-
-
for _ = 1 to 1000 do
-
ignore (hash_string str)
-
done;
-
-
let after_str = Gc.allocated_bytes () in
-
let allocs_per_hash_str = (after_str -. before_str) /. 1000.0 in
-
-
Printf.printf " String wrapper: %.1f bytes/hash\n" allocs_per_hash_str;
-
-
if allocs_per_hash < 100.0 then
-
print_endline " ✓ Near-zero allocation achieved!"
-
else
-
print_endline " ⚠ Higher than expected allocations"
-
-
let () =
-
print_endline "SHA256 Performance Benchmark Suite";
-
print_endline "===================================\n";
-
-
(* Check CPU support *)
-
print_endline "System Information:";
-
Printf.printf " OCaml version: %s\n" Sys.ocaml_version;
-
Printf.printf " Word size: %d bits\n" Sys.word_size;
-
Printf.printf " OS: %s\n\n" Sys.os_type;
-
-
bench_sizes ();
-
bench_parallel_scaling ();
-
bench_zero_allocation ()
-4
bench/dune
···
-
(executable
-
(name bench_sha256)
-
(libraries sha256 unix)
-
(modes native))
+3
bin/dune
···
+
(executable
+
(name osha256sum)
+
(libraries oxsha unix))
+53
bin/osha256sum.ml
···
+
let hex_of_bytes bytes =
+
let buf = Buffer.create (Bytes.length bytes * 2) in
+
Bytes.iter
+
(fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
+
bytes;
+
Buffer.contents buf
+
+
let hash_file filename =
+
try
+
let fd = Unix.openfile filename [ Unix.O_RDONLY ] 0 in
+
(* Close the descriptor even if fstat or map_file raises *)
+
Fun.protect ~finally:(fun () -> Unix.close fd) @@ fun () ->
+
let file_size = (Unix.fstat fd).Unix.st_size in
+
+
if file_size = 0 then
+
(* Handle empty files *)
+
Ok (hex_of_bytes (Oxsha.hash_string ""))
+
else
+
let mapped =
+
Unix.map_file fd Bigarray.char Bigarray.c_layout false [| file_size |]
+
in
+
let ba = Bigarray.array1_of_genarray mapped in
+
+
Ok (hex_of_bytes (Oxsha.hash ba))
+
with e -> Error e
+
+
let () =
+
if Array.length Sys.argv < 2 then (
+
Printf.eprintf "Usage: %s FILE [FILE...]\n" Sys.argv.(0);
+
Printf.eprintf "Print SHA256 (256-bit) checksums.\n";
+
exit 1
+
);
+
+
let exit_code = ref 0 in
+
+
for i = 1 to Array.length Sys.argv - 1 do
+
let filename = Sys.argv.(i) in
+
match hash_file filename with
+
| Ok hash -> Printf.printf "%s %s\n" hash filename
+
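| Error (Unix.Unix_error (err, _, _)) ->
+
(* Unix.openfile, fstat and map_file report failures as Unix_error *)
+
Printf.eprintf "%s: %s: %s\n" Sys.argv.(0) filename
+
(Unix.error_message err);
+
exit_code := 1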
+
| Error e ->
+
Printf.eprintf "%s: %s: %s\n" Sys.argv.(0) filename
+
(Printexc.to_string e);
+
exit_code := 1
+
done;
+
+
exit !exit_code
+12 -9
dune-project
···
-
(lang dune 3.0)
+
(lang dune 3.20)
(name oxsha)
-
(version 0.1.0)
+
(version 0.1)
+
+
(generate_opam_files true)
+
+
(source (github avsm/oxsha))
+
(license ISC)
+
(authors "Anil Madhavapeddy")
+
(maintainers "Anil Madhavapeddy")
(package
(name oxsha)
-
(synopsis "Blazingly fast SHA256 using AMD SHA-NI instructions")
-
(description "Hardware-accelerated SHA256 implementation for OxCaml using AMD SHA-NI instructions with zero-allocation design")
-
(depends
-
ocaml
-
(dune (>= 3.0))
-
bigarray
-
parallel))
+
(synopsis "Fast SHA256 hashing library")
+
(description "OCaml bindings to a C SHA256 implementation using bigarrays for efficient, zero-copy hashing")
+
(depends (ocaml (>= 5.3))))
+20
lib/discover/discover.ml
···
+
(** Dune configurator to detect architecture and set C compiler flags for SHA256 *)
+
+
module C = Configurator.V1
+
+
let get_arch_flags c =
+
let arch = C.ocaml_config_var_exn c "architecture" in
+
let base_flags = ["-O3"] in
+
let arch_flags =
+
match arch with
+
| "arm64" | "aarch64" ->
+
["-march=armv8-a+crypto"]
+
| "amd64" | "i386" ->
+
["-msse4.1"; "-msha"]
+
| _ -> []
+
in
+
base_flags @ arch_flags
+
+
let () =
+
C.main ~name:"oxsha_discover"
+
(fun c -> C.Flags.write_sexp "c_flags.sexp" (get_arch_flags c))
+3
lib/discover/dune
···
+
(executable
+
(name discover)
+
(libraries dune-configurator))
+13 -5
lib/dune
···
+
(rule
+
(targets c_flags.sexp)
+
(deps discover/discover.exe)
+
(action
+
(run %{deps})))
+
(library
-
(name sha256)
+
(name oxsha)
(public_name oxsha)
-
(libraries bigarray parallel)
+
(modules oxsha)
(foreign_stubs
(language c)
-
(names sha256_stubs)
-
(flags :standard -msha -msse4.1 -O3 -march=native))
-
(modes native))
+
(names oxsha_stubs sha256)
+
(flags
+
(:standard
+
(:include c_flags.sexp))))
+
(c_library_flags :standard))
+56
lib/oxsha.ml
···
+
(** Fast SHA256 hashing library with zero-copy C bindings. *)
+
+
module Raw = struct
+
(** The SHA256 context type wrapping the C SHA256_CTX structure. *)
+
type t
+
+
(** External C functions *)
+
external create : unit -> t = "oxsha_create"
+
+
external update :
+
t ->
+
(char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
+
unit
+
= "oxsha_update"
+
+
external final : t -> bytes = "oxsha_final"
+
+
(** Convenience function: update with bytes *)
+
let update_bytes ctx data =
+
let len = Bytes.length data in
+
let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
+
for i = 0 to len - 1 do
+
Bigarray.Array1.unsafe_set ba i (Bytes.unsafe_get data i)
+
done;
+
update ctx ba
+
+
(** Convenience function: update with string *)
+
let update_string ctx data =
+
let len = String.length data in
+
let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
+
for i = 0 to len - 1 do
+
Bigarray.Array1.unsafe_set ba i (String.unsafe_get data i)
+
done;
+
update ctx ba
+
+
(** One-shot hash function for bigarrays *)
+
let hash data =
+
let ctx = create () in
+
update ctx data;
+
final ctx
+
+
(** One-shot hash function for bytes *)
+
let hash_bytes data =
+
let ctx = create () in
+
update_bytes ctx data;
+
final ctx
+
+
(** One-shot hash function for strings *)
+
let hash_string data =
+
let ctx = create () in
+
update_string ctx data;
+
final ctx
+
end
+
+
(** Re-export Raw module contents at top level *)
+
include Raw
+89
lib/oxsha.mli
···
+
(** Fast SHA256 hashing library with zero-copy C bindings.
+
+
This library provides OCaml bindings to a C SHA256 implementation
+
using bigarrays for efficient, zero-copy hashing. *)
+
+
(** {1 Raw C Bindings} *)
+
+
module Raw : sig
+
(** Low-level bindings to the C SHA256 implementation.
+
+
This module provides direct access to the C functions with minimal
+
overhead. All operations work with bigarrays for zero-copy performance. *)
+
+
(** The SHA256 context type. This is an abstract type wrapping the C
+
SHA256_CTX structure. *)
+
type t
+
+
(** [create ()] allocates and initializes a new SHA256 context.
+
+
@return A fresh context ready for hashing. *)
+
val create : unit -> t
+
+
(** [update ctx data] updates the hash state with new data.
+
+
This function processes the input data incrementally. It can be called
+
multiple times to hash data in chunks.
+
+
@param ctx The SHA256 context to update
+
@param data A bigarray containing the data to hash. Uses bigarrays for
+
zero-copy access from the C side. *)
+
val update :
+
t ->
+
(char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
+
unit
+
+
(** [update_bytes ctx data] updates the hash state with bytes data.
+
+
This is a convenience function that copies the bytes into a temporary
+
bigarray before hashing, so it is not zero-copy.
+
+
@param ctx The SHA256 context to update
+
@param data Bytes to hash *)
+
val update_bytes : t -> bytes -> unit
+
+
(** [update_string ctx data] updates the hash state with string data.
+
+
This is a convenience function for hashing strings. The string is copied
+
into a temporary bigarray before hashing.
+
+
@param ctx The SHA256 context to update
+
@param data String to hash *)
+
val update_string : t -> string -> unit
+
+
(** [final ctx] finalizes the hash computation and returns the digest.
+
+
After calling this function, the context should not be used again.
+
+
@param ctx The SHA256 context to finalize
+
@return A 32-byte digest as a bytes value *)
+
val final : t -> bytes
+
+
(** [hash data] is a convenience function that performs a complete hash
+
in one operation: create, update, and final.
+
+
@param data The bigarray data to hash
+
@return A 32-byte digest *)
+
val hash :
+
(char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
+
bytes
+
+
(** [hash_bytes data] hashes bytes data in one operation.
+
+
@param data The bytes to hash
+
@return A 32-byte digest *)
+
val hash_bytes : bytes -> bytes
+
+
(** [hash_string data] hashes string data in one operation.
+
+
@param data The string to hash
+
@return A 32-byte digest *)
+
val hash_string : string -> bytes
+
end
+
+
(** {1 High-Level Interface} *)
+
+
(** Re-export the Raw module as the main interface.
+
+
The Raw module provides the most efficient interface using bigarrays.
+
Higher-level abstractions can be added in the future if needed. *)
+
+
include module type of Raw
+92
lib/oxsha_stubs.c
···
+
/*
+
* OCaml bindings for SHA256 C implementation.
+
* Uses custom blocks and bigarrays for zero-copy performance.
+
*/
+
+
#include <string.h>
+
#include <caml/mlvalues.h>
+
#include <caml/memory.h>
+
#include <caml/alloc.h>
+
#include <caml/custom.h>
+
#include <caml/fail.h>
+
#include <caml/bigarray.h>
+
+
#include "sha256.h"
+
+
/* Custom block operations for SHA256_CTX */
+
+
static void oxsha_ctx_finalize(value v_ctx)
+
{
+
SHA256_CTX *ctx = (SHA256_CTX *)Data_custom_val(v_ctx);
+
/* Just clear the memory for security */
+
memset(ctx, 0, sizeof(SHA256_CTX));
+
}
+
+
static struct custom_operations oxsha_ctx_ops = {
+
"com.oxsha.sha256_ctx",
+
oxsha_ctx_finalize,
+
custom_compare_default,
+
custom_hash_default,
+
custom_serialize_default,
+
custom_deserialize_default,
+
custom_compare_ext_default,
+
custom_fixed_length_default
+
};
+
+
/* Allocate and wrap a SHA256_CTX in an OCaml custom block */
+
static value alloc_oxsha_ctx(void)
+
{
+
value v_ctx = caml_alloc_custom(&oxsha_ctx_ops, sizeof(SHA256_CTX), 0, 1);
+
return v_ctx;
+
}
+
+
/* Extract SHA256_CTX pointer from OCaml value */
+
#define Oxsha_ctx_val(v) ((SHA256_CTX *)Data_custom_val(v))
+
+
/* FFI Functions */
+
+
/* oxsha_create : unit -> t */
+
CAMLprim value oxsha_create(value unit)
+
{
+
CAMLparam1(unit);
+
CAMLlocal1(v_ctx);
+
+
v_ctx = alloc_oxsha_ctx();
+
SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);
+
sha256_init(ctx);
+
+
CAMLreturn(v_ctx);
+
}
+
+
/* oxsha_update : t -> bigarray -> unit */
+
CAMLprim value oxsha_update(value v_ctx, value v_data)
+
{
+
CAMLparam2(v_ctx, v_data);
+
+
SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);
+
+
/* Extract bigarray data pointer and length */
+
unsigned char *data = (unsigned char *)Caml_ba_data_val(v_data);
+
size_t len = Caml_ba_array_val(v_data)->dim[0];
+
+
sha256_update(ctx, data, len);
+
+
CAMLreturn(Val_unit);
+
}
+
+
/* oxsha_final : t -> bytes */
+
CAMLprim value oxsha_final(value v_ctx)
+
{
+
CAMLparam1(v_ctx);
+
CAMLlocal1(v_digest);
+
+
SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);
+
+
/* Allocate bytes for the 32-byte digest */
+
v_digest = caml_alloc_string(SHA256_BLOCK_SIZE);
+
unsigned char *digest = Bytes_val(v_digest);
+
+
sha256_final(ctx, digest);
+
+
CAMLreturn(v_digest);
+
}
+638
lib/sha256.c
···
+
/*********************************************************************
+
* Filename: sha256.c
+
* Author: Brad Conte (brad AT bradconte.com)
+
* Copyright:
+
* Disclaimer: This code is presented "as is" without any guarantees.
+
* Details: Implementation of the SHA-256 hashing algorithm.
+
SHA-256 is one of the three algorithms in the SHA2
+
specification. The others, SHA-384 and SHA-512, are not
+
offered in this implementation.
+
Algorithm specification can be found here:
+
* http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
+
This implementation uses little endian byte order.
+
*********************************************************************/
+
+
/*************************** HEADER FILES ***************************/
+
#include <stdlib.h>
+
#include <stdio.h>
+
#include <memory.h>
+
#include "sha256.h"
+
+
static const uint32_t K[] =
+
{
+
0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5,
+
0x3956C25B, 0x59F111F1, 0x923F82A4, 0xAB1C5ED5,
+
0xD807AA98, 0x12835B01, 0x243185BE, 0x550C7DC3,
+
0x72BE5D74, 0x80DEB1FE, 0x9BDC06A7, 0xC19BF174,
+
0xE49B69C1, 0xEFBE4786, 0x0FC19DC6, 0x240CA1CC,
+
0x2DE92C6F, 0x4A7484AA, 0x5CB0A9DC, 0x76F988DA,
+
0x983E5152, 0xA831C66D, 0xB00327C8, 0xBF597FC7,
+
0xC6E00BF3, 0xD5A79147, 0x06CA6351, 0x14292967,
+
0x27B70A85, 0x2E1B2138, 0x4D2C6DFC, 0x53380D13,
+
0x650A7354, 0x766A0ABB, 0x81C2C92E, 0x92722C85,
+
0xA2BFE8A1, 0xA81A664B, 0xC24B8B70, 0xC76C51A3,
+
0xD192E819, 0xD6990624, 0xF40E3585, 0x106AA070,
+
0x19A4C116, 0x1E376C08, 0x2748774C, 0x34B0BCB5,
+
0x391C0CB3, 0x4ED8AA4A, 0x5B9CCA4F, 0x682E6FF3,
+
0x748F82EE, 0x78A5636F, 0x84C87814, 0x8CC70208,
+
0x90BEFFFA, 0xA4506CEB, 0xBEF9A3F7, 0xC67178F2
+
};
+
+
#if defined(__arm__) || defined(__aarch32__) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM)
+
// ============== ARM64 begin =======================
+
// All ARM servers support SHA256 instructions
+
# if defined(__GNUC__)
+
# include <stdint.h>
+
# endif
+
# if defined(__ARM_NEON) || defined(_MSC_VER) || defined(__GNUC__)
+
# include <arm_neon.h>
+
# endif
+
/* GCC and LLVM Clang, but not Apple Clang */
+
# if defined(__GNUC__) && !defined(__apple_build_version__)
+
# if defined(__ARM_ACLE) || defined(__ARM_FEATURE_CRYPTO)
+
# include <arm_acle.h>
+
# endif
+
# endif
+
void sha256_process(uint32_t state[8], const uint8_t data[], uint32_t length)
+
{
+
uint32x4_t STATE0, STATE1, ABEF_SAVE, CDGH_SAVE;
+
uint32x4_t MSG0, MSG1, MSG2, MSG3;
+
uint32x4_t TMP0, TMP1, TMP2;
+
+
/* Load state */
+
STATE0 = vld1q_u32(&state[0]);
+
STATE1 = vld1q_u32(&state[4]);
+
+
while (length >= 64)
+
{
+
/* Save state */
+
ABEF_SAVE = STATE0;
+
CDGH_SAVE = STATE1;
+
+
/* Load message */
+
MSG0 = vld1q_u32((const uint32_t *)(data + 0));
+
MSG1 = vld1q_u32((const uint32_t *)(data + 16));
+
MSG2 = vld1q_u32((const uint32_t *)(data + 32));
+
MSG3 = vld1q_u32((const uint32_t *)(data + 48));
+
+
/* Reverse for little endian */
+
MSG0 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG0)));
+
MSG1 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG1)));
+
MSG2 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG2)));
+
MSG3 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG3)));
+
+
TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x00]));
+
+
/* Rounds 0-3 */
+
MSG0 = vsha256su0q_u32(MSG0, MSG1);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x04]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);
+
+
/* Rounds 4-7 */
+
MSG1 = vsha256su0q_u32(MSG1, MSG2);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x08]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);
+
+
/* Rounds 8-11 */
+
MSG2 = vsha256su0q_u32(MSG2, MSG3);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x0c]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);
+
+
/* Rounds 12-15 */
+
MSG3 = vsha256su0q_u32(MSG3, MSG0);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x10]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);
+
+
/* Rounds 16-19 */
+
MSG0 = vsha256su0q_u32(MSG0, MSG1);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x14]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);
+
+
/* Rounds 20-23 */
+
MSG1 = vsha256su0q_u32(MSG1, MSG2);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x18]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);
+
+
/* Rounds 24-27 */
+
MSG2 = vsha256su0q_u32(MSG2, MSG3);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x1c]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);
+
+
/* Rounds 28-31 */
+
MSG3 = vsha256su0q_u32(MSG3, MSG0);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x20]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);
+
+
/* Rounds 32-35 */
+
MSG0 = vsha256su0q_u32(MSG0, MSG1);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x24]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);
+
+
/* Rounds 36-39 */
+
MSG1 = vsha256su0q_u32(MSG1, MSG2);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x28]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);
+
+
/* Rounds 40-43 */
+
MSG2 = vsha256su0q_u32(MSG2, MSG3);
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x2c]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);
+
+
/* Rounds 44-47 */
+
MSG3 = vsha256su0q_u32(MSG3, MSG0);
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x30]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);
+
+
/* Rounds 48-51 */
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x34]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
+
/* Rounds 52-55 */
+
TMP2 = STATE0;
+
TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x38]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
+
/* Rounds 56-59 */
+
TMP2 = STATE0;
+
TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x3c]));
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
+
+
/* Rounds 60-63 */
+
TMP2 = STATE0;
+
STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
+
STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
+
+
/* Combine state */
+
STATE0 = vaddq_u32(STATE0, ABEF_SAVE);
+
STATE1 = vaddq_u32(STATE1, CDGH_SAVE);
+
+
data += 64;
+
length -= 64;
+
}
+
+
/* Save state */
+
vst1q_u32(&state[0], STATE0);
+
vst1q_u32(&state[4], STATE1);
+
}
+
+
// ============== ARM64 end =======================
+
#else
+
// ============== x86-64 begin =======================
+
/* Include the GCC super header */
+
#if defined(__GNUC__)
+
# include <stdint.h>
+
# include <x86intrin.h>
+
#endif
+
+
/* Microsoft supports Intel SHA ACLE extensions as of Visual Studio 2015 */
+
#if defined(_MSC_VER)
+
# include <immintrin.h>
+
# define WIN32_LEAN_AND_MEAN
+
# include <Windows.h>
+
#endif
+
#define ROTATE(x,y) (((x)>>(y)) | ((x)<<(32-(y))))
+
#define Sigma0(x) (ROTATE((x), 2) ^ ROTATE((x),13) ^ ROTATE((x),22))
+
#define Sigma1(x) (ROTATE((x), 6) ^ ROTATE((x),11) ^ ROTATE((x),25))
+
#define sigma0(x) (ROTATE((x), 7) ^ ROTATE((x),18) ^ ((x)>> 3))
+
#define sigma1(x) (ROTATE((x),17) ^ ROTATE((x),19) ^ ((x)>>10))
+
+
#define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
+
#define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
+
+
/* Avoid undefined behavior */
+
/* https://stackoverflow.com/q/29538935/608639 */
+
uint32_t B2U32(uint8_t val, uint8_t sh)
+
{
+
return ((uint32_t)val) << sh;
+
}
+
+
void sha256_process_c(uint32_t state[8], const uint8_t data[], size_t length)
+
{
+
uint32_t a, b, c, d, e, f, g, h, s0, s1, T1, T2;
+
uint32_t X[16], i;
+
+
size_t blocks = length / 64;
+
while (blocks--)
+
{
+
a = state[0];
+
b = state[1];
+
c = state[2];
+
d = state[3];
+
e = state[4];
+
f = state[5];
+
g = state[6];
+
h = state[7];
+
+
for (i = 0; i < 16; i++)
+
{
+
X[i] = B2U32(data[0], 24) | B2U32(data[1], 16) | B2U32(data[2], 8) | B2U32(data[3], 0);
+
data += 4;
+
+
T1 = h;
+
T1 += Sigma1(e);
+
T1 += Ch(e, f, g);
+
T1 += K[i];
+
T1 += X[i];
+
+
T2 = Sigma0(a);
+
T2 += Maj(a, b, c);
+
+
h = g;
+
g = f;
+
f = e;
+
e = d + T1;
+
d = c;
+
c = b;
+
b = a;
+
a = T1 + T2;
+
}
+
+
for (; i < 64; i++)
+
{
+
s0 = X[(i + 1) & 0x0f];
+
s0 = sigma0(s0);
+
s1 = X[(i + 14) & 0x0f];
+
s1 = sigma1(s1);
+
+
T1 = X[i & 0xf] += s0 + s1 + X[(i + 9) & 0xf];
+
T1 += h + Sigma1(e) + Ch(e, f, g) + K[i];
+
T2 = Sigma0(a) + Maj(a, b, c);
+
h = g;
+
g = f;
+
f = e;
+
e = d + T1;
+
d = c;
+
c = b;
+
b = a;
+
a = T1 + T2;
+
}
+
+
state[0] += a;
+
state[1] += b;
+
state[2] += c;
+
state[3] += d;
+
state[4] += e;
+
state[5] += f;
+
state[6] += g;
+
state[7] += h;
+
}
+
}
+
+
/* Process multiple blocks. The caller is responsible for setting the initial */
+
/* state, and the caller is responsible for padding the final block. */
+
void sha256_process_asm(uint32_t state[8], const uint8_t data[], size_t length)
+
{
+
__m128i STATE0, STATE1;
+
__m128i MSG, TMP;
+
__m128i MSG0, MSG1, MSG2, MSG3;
+
__m128i ABEF_SAVE, CDGH_SAVE;
+
const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);
+
+
/* Load initial values */
+
TMP = _mm_loadu_si128((const __m128i*) &state[0]);
+
STATE1 = _mm_loadu_si128((const __m128i*) &state[4]);
+
+
+
TMP = _mm_shuffle_epi32(TMP, 0xB1); /* CDAB */
+
STATE1 = _mm_shuffle_epi32(STATE1, 0x1B); /* EFGH */
+
STATE0 = _mm_alignr_epi8(TMP, STATE1, 8); /* ABEF */
+
STATE1 = _mm_blend_epi16(STATE1, TMP, 0xF0); /* CDGH */
+
+
while (length >= 64)
+
{
+
/* Save current state */
+
ABEF_SAVE = STATE0;
+
CDGH_SAVE = STATE1;
+
+
/* Rounds 0-3 */
+
MSG = _mm_loadu_si128((const __m128i*) (data+0));
+
MSG0 = _mm_shuffle_epi8(MSG, MASK);
+
MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0xE9B5DBA5B5C0FBCFULL, 0x71374491428A2F98ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
+
/* Rounds 4-7 */
+
MSG1 = _mm_loadu_si128((const __m128i*) (data+16));
+
MSG1 = _mm_shuffle_epi8(MSG1, MASK);
+
MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0xAB1C5ED5923F82A4ULL, 0x59F111F13956C25BULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1);
+
+
/* Rounds 8-11 */
+
MSG2 = _mm_loadu_si128((const __m128i*) (data+32));
+
MSG2 = _mm_shuffle_epi8(MSG2, MASK);
+
MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0x550C7DC3243185BEULL, 0x12835B01D807AA98ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2);
+
+
/* Rounds 12-15 */
+
MSG3 = _mm_loadu_si128((const __m128i*) (data+48));
+
MSG3 = _mm_shuffle_epi8(MSG3, MASK);
+
MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0xC19BF1749BDC06A7ULL, 0x80DEB1FE72BE5D74ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG3, MSG2, 4);
+
MSG0 = _mm_add_epi32(MSG0, TMP);
+
MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3);
+
+
/* Rounds 16-19 */
+
MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x240CA1CC0FC19DC6ULL, 0xEFBE4786E49B69C1ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG0, MSG3, 4);
+
MSG1 = _mm_add_epi32(MSG1, TMP);
+
MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0);
+
+
/* Rounds 20-23 */
+
MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x76F988DA5CB0A9DCULL, 0x4A7484AA2DE92C6FULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG1, MSG0, 4);
+
MSG2 = _mm_add_epi32(MSG2, TMP);
+
MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1);
+
+
/* Rounds 24-27 */
+
MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0xBF597FC7B00327C8ULL, 0xA831C66D983E5152ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG2, MSG1, 4);
+
MSG3 = _mm_add_epi32(MSG3, TMP);
+
MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2);
+
+
/* Rounds 28-31 */
+
MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0x1429296706CA6351ULL, 0xD5A79147C6E00BF3ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG3, MSG2, 4);
+
MSG0 = _mm_add_epi32(MSG0, TMP);
+
MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3);
+
+
/* Rounds 32-35 */
+
MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x53380D134D2C6DFCULL, 0x2E1B213827B70A85ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG0, MSG3, 4);
+
MSG1 = _mm_add_epi32(MSG1, TMP);
+
MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0);
+
+
/* Rounds 36-39 */
+
MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x92722C8581C2C92EULL, 0x766A0ABB650A7354ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG1, MSG0, 4);
+
MSG2 = _mm_add_epi32(MSG2, TMP);
+
MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1);
+
+
/* Rounds 40-43 */
+
MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0xC76C51A3C24B8B70ULL, 0xA81A664BA2BFE8A1ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG2, MSG1, 4);
+
MSG3 = _mm_add_epi32(MSG3, TMP);
+
MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2);
+
+
/* Rounds 44-47 */
+
MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0x106AA070F40E3585ULL, 0xD6990624D192E819ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG3, MSG2, 4);
+
MSG0 = _mm_add_epi32(MSG0, TMP);
+
MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3);
+
+
/* Rounds 48-51 */
+
MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x34B0BCB52748774CULL, 0x1E376C0819A4C116ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG0, MSG3, 4);
+
MSG1 = _mm_add_epi32(MSG1, TMP);
+
MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0);
+
+
/* Rounds 52-55 */
+
MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x682E6FF35B9CCA4FULL, 0x4ED8AA4A391C0CB3ULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG1, MSG0, 4);
+
MSG2 = _mm_add_epi32(MSG2, TMP);
+
MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
+
/* Rounds 56-59 */
+
MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0x8CC7020884C87814ULL, 0x78A5636F748F82EEULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
TMP = _mm_alignr_epi8(MSG2, MSG1, 4);
+
MSG3 = _mm_add_epi32(MSG3, TMP);
+
MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
+
/* Rounds 60-63 */
+
MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0xC67178F2BEF9A3F7ULL, 0xA4506CEB90BEFFFAULL));
+
STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
+
MSG = _mm_shuffle_epi32(MSG, 0x0E);
+
STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
+
+
/* Combine state */
+
STATE0 = _mm_add_epi32(STATE0, ABEF_SAVE);
+
STATE1 = _mm_add_epi32(STATE1, CDGH_SAVE);
+
+
data += 64;
+
length -= 64;
+
}
+
+
TMP = _mm_shuffle_epi32(STATE0, 0x1B); /* FEBA */
+
STATE1 = _mm_shuffle_epi32(STATE1, 0xB1); /* DCHG */
+
STATE0 = _mm_blend_epi16(TMP, STATE1, 0xF0); /* DCBA */
+
STATE1 = _mm_alignr_epi8(STATE1, TMP, 8); /* HGFE */
+
+
/* Save state */
+
_mm_storeu_si128((__m128i*) &state[0], STATE0);
+
_mm_storeu_si128((__m128i*) &state[4], STATE1);
+
}
+
+
#if defined(__clang__) || defined(__GNUC__) || defined(__INTEL_COMPILER)
+
+
#include <cpuid.h>
+
int supports_sha_ni(void)
+
{
+
unsigned int CPUInfo[4];
+
__cpuid(0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
+
if (CPUInfo[0] < 7)
+
return 0;
+
+
__cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
+
return CPUInfo[1] & (1 << 29); /* SHA */
+
}
+
+
#else /* defined(__clang__) || defined(__GNUC__) */
+
+
int supports_sha_ni(void)
+
{
+
int CPUInfo[4]; /* MSVC's __cpuid/__cpuidex take int[4] */
+
__cpuid(CPUInfo, 0);
+
if (CPUInfo[0] < 7)
+
return 0;
+
+
__cpuidex(CPUInfo, 7, 0);
+
return CPUInfo[1] & (1 << 29); /* Check SHA */
+
}
+
+
#endif /* defined(__clang__) || defined(__GNUC__) */
+
+
void sha256_process(uint32_t state[8], const uint8_t data[], size_t length) {
+
static int has_sha_ni = -1;
+
if(has_sha_ni == -1 ) {
+
has_sha_ni = supports_sha_ni();
+
}
+
+
if(has_sha_ni) {
+
sha256_process_asm(state, data, length);
+
//printf("In sha256_process_asm length %zu\n", length);
+
} else {
+
sha256_process_c(state, data, length);
+
//printf("In sha256_process_c length %zu\n", length);
+
}
+
}
+
// ============== x86-64 end =======================
+
#endif
+
+
void sha256_init(SHA256_CTX *ctx)
+
{
+
ctx->datalen = 0;
+
ctx->bitlen = 0;
+
ctx->state[0] = 0x6a09e667;
+
ctx->state[1] = 0xbb67ae85;
+
ctx->state[2] = 0x3c6ef372;
+
ctx->state[3] = 0xa54ff53a;
+
ctx->state[4] = 0x510e527f;
+
ctx->state[5] = 0x9b05688c;
+
ctx->state[6] = 0x1f83d9ab;
+
ctx->state[7] = 0x5be0cd19;
+
}
+
+
void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len)
+
{
+
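size_t i = 0;
+
+
/* Top up a partial block left over from a previous call. */
+
if (ctx->datalen > 0) {
+
while (i < len && ctx->datalen < 64)
+
ctx->data[ctx->datalen++] = data[i++];
+
if (ctx->datalen < 64)
+
return;
+
sha256_process(ctx->state, ctx->data, 64);
+
ctx->bitlen += 512;
+
ctx->datalen = 0;
+
}
+
+
/* Hash the remaining whole blocks straight from the caller's buffer. */
+
size_t rounded = 64*((len - i)/64);
+
if(rounded != 0) {
+
sha256_process(ctx->state, data + i, rounded);
+
ctx->bitlen += (unsigned long long)rounded*8;
+
i += rounded;
+
}
+
+
/* Buffer the tail (fewer than 64 bytes) for the next update or final. */
+
for (; i < len; ++i)
+
ctx->data[ctx->datalen++] = data[i];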
+
}
+
+
void sha256_final(SHA256_CTX *ctx, BYTE hash[])
+
{
+
WORD i;
+
+
i = ctx->datalen;
+
+
// Pad whatever data is left in the buffer.
+
if (ctx->datalen < 56) {
+
ctx->data[i++] = 0x80;
+
while (i < 56)
+
ctx->data[i++] = 0x00;
+
}
+
else {
+
ctx->data[i++] = 0x80;
+
while (i < 64)
+
ctx->data[i++] = 0x00;
+
sha256_process(ctx->state, ctx->data, 64);
+
memset(ctx->data, 0, 56);
+
}
+
+
// Append to the padding the total message's length in bits and transform.
+
ctx->bitlen += ctx->datalen * 8;
+
ctx->data[63] = ctx->bitlen;
+
ctx->data[62] = ctx->bitlen >> 8;
+
ctx->data[61] = ctx->bitlen >> 16;
+
ctx->data[60] = ctx->bitlen >> 24;
+
ctx->data[59] = ctx->bitlen >> 32;
+
ctx->data[58] = ctx->bitlen >> 40;
+
ctx->data[57] = ctx->bitlen >> 48;
+
ctx->data[56] = ctx->bitlen >> 56;
+
sha256_process(ctx->state, ctx->data, 64);
+
+
// Since this implementation uses little endian byte ordering and SHA uses big endian,
+
// reverse all the bytes when copying the final state to the output hash.
+
for (i = 0; i < 4; ++i) {
+
hash[i] = (ctx->state[0] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 4] = (ctx->state[1] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 8] = (ctx->state[2] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000ff;
+
hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000ff;
+
}
+
}
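The padding step in `sha256_final` above follows the usual SHA-256 rule: a `0x80` marker, zero fill, then the message length in bits as a big-endian 64-bit value in the last eight bytes of the final block. A minimal standalone sketch of that arithmetic follows. The names `demo_padded_len` and `demo_pad_tail` are hypothetical and not part of this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: total size in bytes of a `len`-byte message after
 * SHA-256 padding (0x80 marker, zero fill, 8-byte big-endian bit length). */
static size_t demo_padded_len(size_t len)
{
    return ((len + 8) / 64 + 1) * 64;
}

/* Hypothetical helper: build the padded tail of a message.
 * `tail` holds the last `rem` (< 64) unprocessed bytes and `total_len` is
 * the full message length in bytes. Writes 64 or 128 bytes into `out`
 * and returns how many were written. */
static size_t demo_pad_tail(const uint8_t *tail, size_t rem,
                            uint64_t total_len, uint8_t out[128])
{
    size_t n = (rem < 56) ? 64 : 128;   /* does the length field still fit? */
    uint64_t bits = total_len * 8;

    memset(out, 0, 128);
    if (rem > 0)
        memcpy(out, tail, rem);
    out[rem] = 0x80;                    /* mandatory marker bit */
    for (int i = 0; i < 8; i++)         /* big-endian bit length at the end */
        out[n - 1 - i] = (uint8_t)(bits >> (8 * i));
    return n;
}
```

A tail of 56 bytes or more leaves no room for the length field, which is why `sha256_final` processes one extra block in that case.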
+35
lib/sha256.h
···
+
/*********************************************************************
+
* Filename: sha256.h
+
* Author: Brad Conte (brad AT bradconte.com)
+
* Copyright:
+
* Disclaimer: This code is presented "as is" without any guarantees.
+
* Details: Defines the API for the corresponding SHA256 implementation.
+
*********************************************************************/
+
+
#ifndef SHA256_H
+
#define SHA256_H
+
+
/*************************** HEADER FILES ***************************/
+
#include <stddef.h>
+
#include <stdint.h>
+
+
/****************************** MACROS ******************************/
+
#define SHA256_BLOCK_SIZE 32 // SHA256 outputs a 32 byte digest
+
+
/**************************** DATA TYPES ****************************/
+
typedef uint8_t BYTE; // 8-bit byte
+
typedef uint32_t WORD; // 32-bit word (fixed width via <stdint.h>)
+
+
typedef struct {
+
BYTE data[64];
+
WORD datalen;
+
unsigned long long bitlen;
+
WORD state[8];
+
} SHA256_CTX;
+
+
/*********************** FUNCTION DECLARATIONS **********************/
+
void sha256_init(SHA256_CTX *ctx);
+
void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len);
+
void sha256_final(SHA256_CTX *ctx, BYTE hash[]);
+
+
#endif // SHA256_H
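The context declared in this header carries `data`, `datalen`, and `bitlen` between calls, so a streaming update has to keep them consistent however the input is split. The sketch below shows one way to do that bookkeeping. It is an illustration under assumptions, not the library's code: `demo_ctx`, `demo_init`, and `demo_update` are hypothetical, and the compression step is replaced by a block counter so the buffering is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SHA256_CTX: same buffering fields, but the
 * "compression" step only counts blocks. */
typedef struct {
    uint8_t  data[64];
    uint32_t datalen;   /* bytes currently buffered (0..63 between calls) */
    uint64_t bitlen;    /* bits already consumed as whole blocks */
    uint64_t blocks;    /* number of 64-byte blocks "compressed" */
} demo_ctx;

static void demo_init(demo_ctx *c)
{
    c->datalen = 0;
    c->bitlen = 0;
    c->blocks = 0;
}

static void demo_update(demo_ctx *c, const uint8_t *p, size_t len)
{
    size_t i = 0;

    /* 1. Finish any partial block left by an earlier call. */
    while (i < len && c->datalen > 0 && c->datalen < 64)
        c->data[c->datalen++] = p[i++];
    if (c->datalen == 64) {
        c->blocks++;
        c->bitlen += 512;
        c->datalen = 0;
    }

    /* 2. Consume whole blocks directly from the input. */
    size_t whole = (len - i) / 64;
    c->blocks += whole;
    c->bitlen += (uint64_t)whole * 512;
    i += whole * 64;

    /* 3. Keep the tail (fewer than 64 bytes) for the next call. */
    while (i < len)
        c->data[c->datalen++] = p[i++];
}
```

Feeding 100 bytes in one call or as 30 bytes followed by 70 should leave the context in the same state: one block consumed and 36 bytes buffered.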
-96
lib/sha256.ml
···
-
open Bigarray
-
-
type state = (int32, int32_elt, c_layout) Array1.t
-
type digest = (int, int8_unsigned_elt, c_layout) Array1.t
-
type buffer = (int, int8_unsigned_elt, c_layout) Array1.t
-
-
(* External C functions *)
-
external init : unit -> state = "oxcaml_sha256_init"
-
external process_block : state -> buffer -> unit = "oxcaml_sha256_process_block" [@@noalloc]
-
external finalize : state -> buffer -> int64 -> digest = "oxcaml_sha256_finalize"
-
external oneshot : buffer -> int64 -> digest = "oxcaml_sha256_oneshot"
-
-
(* High-level interface *)
-
-
let hash_bytes bytes =
-
let len = Bytes.length bytes in
-
let buffer = Array1.create int8_unsigned c_layout len in
-
for i = 0 to len - 1 do
-
Array1.set buffer i (Char.code (Bytes.get bytes i))
-
done;
-
oneshot buffer (Int64.of_int len)
-
-
let hash_string str =
-
let len = String.length str in
-
let buffer = Array1.create int8_unsigned c_layout len in
-
for i = 0 to len - 1 do
-
Array1.set buffer i (Char.code str.[i])
-
done;
-
oneshot buffer (Int64.of_int len)
-
-
(* Utilities *)
-
-
let digest_to_hex digest =
-
let hex_of_byte b =
-
Printf.sprintf "%02x" b
-
in
-
let buf = Buffer.create 64 in
-
for i = 0 to 31 do
-
Buffer.add_string buf (hex_of_byte (Array1.get digest i))
-
done;
-
Buffer.contents buf
-
-
let digest_to_bytes digest =
-
let bytes = Bytes.create 32 in
-
for i = 0 to 31 do
-
Bytes.set bytes i (Char.chr (Array1.get digest i))
-
done;
-
bytes
-
-
let digest_equal d1 d2 =
-
let rec compare i =
-
if i >= 32 then true
-
else if Array1.get d1 i <> Array1.get d2 i then false
-
else compare (i + 1)
-
in
-
compare 0
-
-
(* Zero-allocation variants using OxCaml features *)
-
-
module Fast = struct
-
(* Stack-allocated processing for temporary computations *)
-
let[@inline] [@zero_alloc assume] process_block_local state block =
-
process_block state block
-
-
(* Process multiple blocks efficiently *)
-
let[@zero_alloc assume] process_blocks state blocks num_blocks =
-
for i = 0 to num_blocks - 1 do
-
let offset = i * 64 in
-
let block = Array1.sub blocks offset 64 in
-
process_block state block
-
done
-
-
(* Parallel hashing for multiple inputs *)
-
let parallel_hash_many par inputs =
-
match inputs with
-
| [] -> []
-
| [x] -> [hash_bytes x]
-
| _ ->
-
let process_batch batch =
-
List.map hash_bytes batch
-
in
-
let mid = List.length inputs / 2 in
-
let rec split n lst =
-
if n = 0 then ([], lst)
-
else match lst with
-
| [] -> ([], [])
-
| h::t -> let (l1, l2) = split (n-1) t in (h::l1, l2)
-
in
-
let (left, right) = split mid inputs in
-
let left_results, right_results =
-
Parallel.fork_join2 par
-
(fun _ -> process_batch left)
-
(fun _ -> process_batch right)
-
in
-
left_results @ right_results
-
end
-47
lib/sha256.mli
···
-
(** SHA256 hardware-accelerated implementation using AMD SHA-NI instructions *)
-
-
open Bigarray
-
-
(** {1 Types} *)
-
-
(** SHA256 state (8 x int32) *)
-
type state = (int32, int32_elt, c_layout) Array1.t
-
-
(** SHA256 digest (32 bytes) *)
-
type digest = (int, int8_unsigned_elt, c_layout) Array1.t
-
-
(** Input data buffer *)
-
type buffer = (int, int8_unsigned_elt, c_layout) Array1.t
-
-
(** {1 Low-level interface} *)
-
-
(** Initialize a new SHA256 state *)
-
val init : unit -> state
-
-
(** Process a single 512-bit (64 byte) block. Buffer must be exactly 64 bytes. *)
-
val process_block : state -> buffer -> unit
-
-
(** Finalize the hash computation with padding and return digest *)
-
val finalize : state -> buffer -> int64 -> digest
-
-
(** {1 High-level interface} *)
-
-
(** Compute SHA256 hash in one shot (fastest for single use) *)
-
val oneshot : buffer -> int64 -> digest
-
-
(** Compute SHA256 hash from bytes *)
-
val hash_bytes : bytes -> digest
-
-
(** Compute SHA256 hash from string *)
-
val hash_string : string -> digest
-
-
(** {1 Utilities} *)
-
-
(** Convert digest to hexadecimal string *)
-
val digest_to_hex : digest -> string
-
-
(** Convert digest to bytes *)
-
val digest_to_bytes : digest -> bytes
-
-
(** Compare two digests for equality *)
-
val digest_equal : digest -> digest -> bool
-382
lib/sha256_stubs.c
···
-
#include <immintrin.h>
-
#include <stdint.h>
-
#include <string.h>
-
#include <caml/mlvalues.h>
-
#include <caml/memory.h>
-
#include <caml/alloc.h>
-
#include <caml/bigarray.h>
-
-
// Aligned storage for round constants
-
alignas(64) static const uint32_t K256[64] = {
-
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
-
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
-
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
-
0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
-
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
-
0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
-
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
-
0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
-
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
-
0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
-
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
-
0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
-
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
-
0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
-
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
-
0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
-
};
-
-
// Initial SHA256 state values
-
alignas(16) static const uint32_t H256_INIT[8] = {
-
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
-
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
-
};
-
-
// Byte swap for endianness
-
static const __m128i BSWAP_MASK = {0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL};
-
-
// Process a single 512-bit block using SHA-NI instructions
-
static void sha256_process_block_shani(uint32_t state[8], const uint8_t block[64]) {
-
__m128i msg0, msg1, msg2, msg3;
-
__m128i tmp;
-
__m128i state0, state1;
-
__m128i msg;
-
__m128i abef_save, cdgh_save;
-
-
// Load initial state
-
tmp = _mm_loadu_si128((const __m128i*)&state[0]);
-
state1 = _mm_loadu_si128((const __m128i*)&state[4]);
-
-
// Swap byte order for initial state
-
tmp = _mm_shuffle_epi32(tmp, 0xB1); // CDAB
-
state1 = _mm_shuffle_epi32(state1, 0x1B); // EFGH
-
state0 = _mm_alignr_epi8(tmp, state1, 8); // ABEF
-
state1 = _mm_blend_epi16(state1, tmp, 0xF0); // CDGH
-
-
// Save initial state
-
abef_save = state0;
-
cdgh_save = state1;
-
-
// Load message blocks with byte swap
-
msg0 = _mm_loadu_si128((const __m128i*)(block + 0));
-
msg1 = _mm_loadu_si128((const __m128i*)(block + 16));
-
msg2 = _mm_loadu_si128((const __m128i*)(block + 32));
-
msg3 = _mm_loadu_si128((const __m128i*)(block + 48));
-
-
msg0 = _mm_shuffle_epi8(msg0, BSWAP_MASK);
-
msg1 = _mm_shuffle_epi8(msg1, BSWAP_MASK);
-
msg2 = _mm_shuffle_epi8(msg2, BSWAP_MASK);
-
msg3 = _mm_shuffle_epi8(msg3, BSWAP_MASK);
-
-
// Rounds 0-3
-
msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[0]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
-
// Rounds 4-7
-
msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[4]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg0 = _mm_sha256msg1_epu32(msg0, msg1);
-
-
// Rounds 8-11
-
msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[8]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg1 = _mm_sha256msg1_epu32(msg1, msg2);
-
-
// Rounds 12-15
-
msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[12]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg3, msg2, 4);
-
msg0 = _mm_add_epi32(msg0, tmp);
-
msg0 = _mm_sha256msg2_epu32(msg0, msg3);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg2 = _mm_sha256msg1_epu32(msg2, msg3);
-
-
// Rounds 16-19
-
msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[16]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg0, msg3, 4);
-
msg1 = _mm_add_epi32(msg1, tmp);
-
msg1 = _mm_sha256msg2_epu32(msg1, msg0);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg3 = _mm_sha256msg1_epu32(msg3, msg0);
-
-
// Rounds 20-23
-
msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[20]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg1, msg0, 4);
-
msg2 = _mm_add_epi32(msg2, tmp);
-
msg2 = _mm_sha256msg2_epu32(msg2, msg1);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg0 = _mm_sha256msg1_epu32(msg0, msg1);
-
-
// Rounds 24-27
-
msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[24]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg2, msg1, 4);
-
msg3 = _mm_add_epi32(msg3, tmp);
-
msg3 = _mm_sha256msg2_epu32(msg3, msg2);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg1 = _mm_sha256msg1_epu32(msg1, msg2);
-
-
// Rounds 28-31
-
msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[28]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg3, msg2, 4);
-
msg0 = _mm_add_epi32(msg0, tmp);
-
msg0 = _mm_sha256msg2_epu32(msg0, msg3);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg2 = _mm_sha256msg1_epu32(msg2, msg3);
-
-
// Rounds 32-35
-
msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[32]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg0, msg3, 4);
-
msg1 = _mm_add_epi32(msg1, tmp);
-
msg1 = _mm_sha256msg2_epu32(msg1, msg0);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg3 = _mm_sha256msg1_epu32(msg3, msg0);
-
-
// Rounds 36-39
-
msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[36]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg1, msg0, 4);
-
msg2 = _mm_add_epi32(msg2, tmp);
-
msg2 = _mm_sha256msg2_epu32(msg2, msg1);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg0 = _mm_sha256msg1_epu32(msg0, msg1);
-
-
// Rounds 40-43
-
msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[40]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg2, msg1, 4);
-
msg3 = _mm_add_epi32(msg3, tmp);
-
msg3 = _mm_sha256msg2_epu32(msg3, msg2);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg1 = _mm_sha256msg1_epu32(msg1, msg2);
-
-
// Rounds 44-47
-
msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[44]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg3, msg2, 4);
-
msg0 = _mm_add_epi32(msg0, tmp);
-
msg0 = _mm_sha256msg2_epu32(msg0, msg3);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg2 = _mm_sha256msg1_epu32(msg2, msg3);
-
-
// Rounds 48-51
-
msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[48]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg0, msg3, 4);
-
msg1 = _mm_add_epi32(msg1, tmp);
-
msg1 = _mm_sha256msg2_epu32(msg1, msg0);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
msg3 = _mm_sha256msg1_epu32(msg3, msg0);
-
-
// Rounds 52-55
-
msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[52]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg1, msg0, 4);
-
msg2 = _mm_add_epi32(msg2, tmp);
-
msg2 = _mm_sha256msg2_epu32(msg2, msg1);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
-
// Rounds 56-59
-
msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[56]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
tmp = _mm_alignr_epi8(msg2, msg1, 4);
-
msg3 = _mm_add_epi32(msg3, tmp);
-
msg3 = _mm_sha256msg2_epu32(msg3, msg2);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
-
// Rounds 60-63
-
msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[60]));
-
state1 = _mm_sha256rnds2_epu32(state1, state0, msg);
-
msg = _mm_shuffle_epi32(msg, 0x0E);
-
state0 = _mm_sha256rnds2_epu32(state0, state1, msg);
-
-
// Add initial state
-
state0 = _mm_add_epi32(state0, abef_save);
-
state1 = _mm_add_epi32(state1, cdgh_save);
-
-
// Swap byte order back and store
-
tmp = _mm_shuffle_epi32(state0, 0x1B); // FEBA
-
state1 = _mm_shuffle_epi32(state1, 0xB1); // DCHG
-
state0 = _mm_blend_epi16(tmp, state1, 0xF0); // DCBA
-
state1 = _mm_alignr_epi8(state1, tmp, 8); // HGFE
-
-
_mm_storeu_si128((__m128i*)&state[0], state0);
-
_mm_storeu_si128((__m128i*)&state[4], state1);
-
}
-
-
// OCaml interface functions
-
-
// Initialize SHA256 state
-
value oxcaml_sha256_init(value unit) {
-
CAMLparam1(unit);
-
CAMLlocal1(state);
-
-
// Allocate bigarray for state (8 x int32)
-
long dims[1] = {8};
-
state = caml_ba_alloc_dims(CAML_BA_INT32 | CAML_BA_C_LAYOUT, 1, NULL, dims);
-
uint32_t* s = (uint32_t*)Caml_ba_data_val(state);
-
-
// Copy initial values
-
memcpy(s, H256_INIT, 32);
-
-
CAMLreturn(state);
-
}
-
-
// Process a single 512-bit block
-
value oxcaml_sha256_process_block(value state, value block) {
-
CAMLparam2(state, block);
-
-
uint32_t* s = (uint32_t*)Caml_ba_data_val(state);
-
uint8_t* b = (uint8_t*)Caml_ba_data_val(block);
-
-
sha256_process_block_shani(s, b);
-
-
CAMLreturn(Val_unit);
-
}
-
-
// Finalize hash with padding and return digest
-
value oxcaml_sha256_finalize(value state, value data, value len_v) {
-
CAMLparam3(state, data, len_v);
-
CAMLlocal1(result);
-
-
uint32_t* s = (uint32_t*)Caml_ba_data_val(state);
-
uint8_t* input = (uint8_t*)Caml_ba_data_val(data);
-
uint64_t len = Int64_val(len_v);
-
-
// Process full blocks
-
uint64_t full_blocks = len / 64;
-
for (uint64_t i = 0; i < full_blocks; i++) {
-
sha256_process_block_shani(s, input + i * 64);
-
}
-
-
// Handle final block with padding
-
uint8_t final_block[128] = {0}; // Max 2 blocks for padding
-
uint64_t remaining = len % 64;
-
-
// Copy remaining bytes
-
if (remaining > 0) {
-
memcpy(final_block, input + full_blocks * 64, remaining);
-
}
-
-
// Add padding
-
final_block[remaining] = 0x80;
-
-
// Add length in bits at the end
-
uint64_t bit_len = len * 8;
-
if (remaining >= 56) {
-
// Need two blocks
-
sha256_process_block_shani(s, final_block);
-
memset(final_block, 0, 64);
-
}
-
-
// Add bit length (big-endian)
-
final_block[56] = (bit_len >> 56) & 0xFF;
-
final_block[57] = (bit_len >> 48) & 0xFF;
-
final_block[58] = (bit_len >> 40) & 0xFF;
-
final_block[59] = (bit_len >> 32) & 0xFF;
-
final_block[60] = (bit_len >> 24) & 0xFF;
-
final_block[61] = (bit_len >> 16) & 0xFF;
-
final_block[62] = (bit_len >> 8) & 0xFF;
-
final_block[63] = bit_len & 0xFF;
-
-
sha256_process_block_shani(s, final_block);
-
-
// Create result bigarray (32 bytes)
-
long dims[1] = {32};
-
result = caml_ba_alloc_dims(CAML_BA_UINT8 | CAML_BA_C_LAYOUT, 1, NULL, dims);
-
uint8_t* res = (uint8_t*)Caml_ba_data_val(result);
-
-
// Convert to big-endian bytes
-
for (int i = 0; i < 8; i++) {
-
res[i*4 + 0] = (s[i] >> 24) & 0xFF;
-
res[i*4 + 1] = (s[i] >> 16) & 0xFF;
-
res[i*4 + 2] = (s[i] >> 8) & 0xFF;
-
res[i*4 + 3] = s[i] & 0xFF;
-
}
-
-
CAMLreturn(result);
-
}
-
-
// Fast one-shot SHA256
-
value oxcaml_sha256_oneshot(value data, value len_v) {
-
CAMLparam2(data, len_v);
-
CAMLlocal1(result);
-
-
uint8_t* input = (uint8_t*)Caml_ba_data_val(data);
-
uint64_t len = Int64_val(len_v);
-
-
// Local state
-
alignas(16) uint32_t state[8];
-
memcpy(state, H256_INIT, 32);
-
-
// Process full blocks
-
uint64_t full_blocks = len / 64;
-
for (uint64_t i = 0; i < full_blocks; i++) {
-
sha256_process_block_shani(state, input + i * 64);
-
}
-
-
// Handle final block with padding
-
alignas(64) uint8_t final_block[128] = {0};
-
uint64_t remaining = len % 64;
-
-
if (remaining > 0) {
-
memcpy(final_block, input + full_blocks * 64, remaining);
-
}
-
-
final_block[remaining] = 0x80;
-
-
uint64_t bit_len = len * 8;
-
if (remaining >= 56) {
-
sha256_process_block_shani(state, final_block);
-
memset(final_block, 0, 64);
-
}
-
-
// Add bit length (big-endian)
-
final_block[56] = (bit_len >> 56) & 0xFF;
-
final_block[57] = (bit_len >> 48) & 0xFF;
-
final_block[58] = (bit_len >> 40) & 0xFF;
-
final_block[59] = (bit_len >> 32) & 0xFF;
-
final_block[60] = (bit_len >> 24) & 0xFF;
-
final_block[61] = (bit_len >> 16) & 0xFF;
-
final_block[62] = (bit_len >> 8) & 0xFF;
-
final_block[63] = bit_len & 0xFF;
-
-
sha256_process_block_shani(state, final_block);
-
-
// Create result bigarray
-
long dims[1] = {32};
-
result = caml_ba_alloc_dims(CAML_BA_UINT8 | CAML_BA_C_LAYOUT, 1, NULL, dims);
-
uint8_t* res = (uint8_t*)Caml_ba_data_val(result);
-
-
// Convert to big-endian bytes
-
for (int i = 0; i < 8; i++) {
-
res[i*4 + 0] = (state[i] >> 24) & 0xFF;
-
res[i*4 + 1] = (state[i] >> 16) & 0xFF;
-
res[i*4 + 2] = (state[i] >> 8) & 0xFF;
-
res[i*4 + 3] = state[i] & 0xFF;
-
}
-
-
CAMLreturn(result);
-
}
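The removed stubs load round constants from the `K256` table, while the SHA-NI path in `lib/sha256.c` writes them inline as `_mm_set_epi64x(hi, lo)` pairs. The two forms appear to encode the same values: each 64-bit half packs two consecutive 32-bit constants, lower index in the lower bits. A small sketch that checks this for rounds 0-3, using the hypothetical helpers `demo_pack_lo` and `demo_pack_hi`:

```c
#include <assert.h>
#include <stdint.h>

/* First four SHA-256 round constants, as listed in the K256 table. */
static const uint32_t demo_k[4] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
};

/* Hypothetical helpers: the two 64-bit halves that _mm_set_epi64x(hi, lo)
 * needs so that the four 32-bit lanes hold k[0]..k[3] in order. */
static uint64_t demo_pack_lo(const uint32_t k[4])
{
    return ((uint64_t)k[1] << 32) | k[0];
}

static uint64_t demo_pack_hi(const uint32_t k[4])
{
    return ((uint64_t)k[3] << 32) | k[2];
}
```

The packed halves match the literals used for rounds 0-3 in `sha256_process_asm`, `0xE9B5DBA5B5C0FBCFULL` and `0x71374491428A2F98ULL`.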
+32
oxsha.opam
···
+
# This file is generated by dune, edit dune-project instead
+
opam-version: "2.0"
+
version: "0.1"
+
synopsis: "Fast SHA256 hashing library"
+
description:
+
"OCaml bindings to a C SHA256 implementation using bigarrays for efficient, zero-copy hashing"
+
maintainer: ["Anil Madhavapeddy"]
+
authors: ["Anil Madhavapeddy"]
+
license: "ISC"
+
homepage: "https://github.com/avsm/oxsha"
+
bug-reports: "https://github.com/avsm/oxsha/issues"
+
depends: [
+
"dune" {>= "3.20"}
+
"ocaml" {>= "5.3"}
+
"odoc" {with-doc}
+
]
+
build: [
+
["dune" "subst"] {dev}
+
[
+
"dune"
+
"build"
+
"-p"
+
name
+
"-j"
+
jobs
+
"@install"
+
"@runtest" {with-test}
+
"@doc" {with-doc}
+
]
+
]
+
dev-repo: "git+https://github.com/avsm/oxsha.git"
+
x-maintenance-intent: ["(latest)"]
+3 -4
test/dune
···
-
(executable
-
(name test_sha256)
-
(libraries sha256 unix)
-
(modes native))
+
(test
+
(name speed_test)
+
(libraries cryptokit oxsha unix))
+171
test/speed_test.ml
···
+
(* Speed test comparing system sha256sum with Cryptokit and oxsha implementations *)
+
+
let deadbeef_pattern = "\xde\xad\xbe\xef"
+
+
(* Convert bytes to hex string *)
+
let hex_of_bytes bytes =
+
let buf = Buffer.create (Bytes.length bytes * 2) in
+
Bytes.iter
+
(fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
+
bytes;
+
Buffer.contents buf
+
+
(* Create a file of [size] bytes filled with a repeating 0xdeadbeef pattern *)
+
let create_test_file filename size =
+
Printf.printf "Creating %s (%d bytes = %.2f GB)...\n%!"
+
filename size (float_of_int size /. (1024.0 *. 1024.0 *. 1024.0));
+
+
let oc = open_out_bin filename in
+
let chunk_size = 1024 * 1024 in (* 1 MB chunks *)
+
let chunk = Bytes.make chunk_size '\x00' in
+
+
(* Fill chunk with 0xdeadbeef pattern *)
+
for i = 0 to chunk_size - 1 do
+
Bytes.set chunk i deadbeef_pattern.[i mod 4]
+
done;
+
+
let chunks = size / chunk_size in
+
let remainder = size mod chunk_size in
+
+
for i = 0 to chunks - 1 do
+
output_bytes oc chunk;
+
if i mod 100 = 0 then (
+
Printf.printf "\rProgress: %.1f%%..."
+
(float_of_int i *. 100.0 /. float_of_int chunks);
+
flush stdout
+
)
+
done;
+
+
if remainder > 0 then
+
output oc chunk 0 remainder;
+
+
close_out oc;
+
Printf.printf "\rProgress: 100.0%%... Done!\n%!"
+
+
(* SHA-256 using Cryptokit *)
+
let sha256sum_cryptokit filename =
+
let hash = Cryptokit.Hash.sha256 () in
+
let digest =
+
In_channel.with_open_bin filename
+
(Cryptokit.hash_channel hash)
+
in
+
let hex_digest =
+
Cryptokit.transform_string
+
(Cryptokit.Hexa.encode ()) digest
+
in
+
hex_digest
+
+
(* SHA-256 using system command *)
+
let sha256sum_system filename =
+
let cmd = Printf.sprintf "sha256sum %s" (Filename.quote filename) in
+
let ic = Unix.open_process_in cmd in
+
let line = input_line ic in
+
let _ = Unix.close_process_in ic in
+
(* sha256sum outputs: "<hash> <filename>" *)
+
let hash = String.sub line 0 64 in
+
hash
+
+
(* SHA-256 using oxsha with Unix.map_file *)
+
let sha256sum_oxsha filename =
+
let fd = Unix.openfile filename [ Unix.O_RDONLY ] 0 in
+
let stats = Unix.fstat fd in
+
let file_size = stats.Unix.st_size in
+
+
if file_size = 0 then (
+
(* Handle empty files *)
+
Unix.close fd;
+
let digest = Oxsha.hash_string "" in
+
hex_of_bytes digest
+
) else (
+
let mapped =
+
Unix.map_file fd Bigarray.char Bigarray.c_layout false [| file_size |]
+
in
+
let ba = Bigarray.array1_of_genarray mapped in
+
Unix.close fd;
+
+
let digest = Oxsha.hash ba in
+
hex_of_bytes digest
+
)
+
+
(* Time a function execution *)
+
let time_function name f =
+
Printf.printf "\nRunning %s...\n%!" name;
+
let start = Unix.gettimeofday () in
+
let result = f () in
+
let elapsed = Unix.gettimeofday () -. start in
+
Printf.printf "%s completed in %.3f seconds\n%!" name elapsed;
+
(result, elapsed)
+
+
let () =
+
let test_file = "test_2gb.bin" in
+
let file_size = 2 * 1024 * 1024 * 1024 in (* 2 GB *)
+
+
Printf.printf "=== SHA-256 Speed Test ===\n\n";
+
+
(* Create test file if it doesn't exist *)
+
if not (Sys.file_exists test_file) then
+
create_test_file test_file file_size
+
else
+
Printf.printf "Test file %s already exists, using existing file.\n%!" test_file;
+
+
(* Test system sha256sum *)
+
let (hash_system, time_system) =
+
time_function "system sha256sum" (fun () -> sha256sum_system test_file) in
+
Printf.printf "Hash: %s\n" hash_system;
+
+
(* Test Cryptokit implementation *)
+
let (hash_cryptokit, time_cryptokit) =
+
time_function "Cryptokit sha256sum" (fun () -> sha256sum_cryptokit test_file) in
+
Printf.printf "Hash: %s\n" hash_cryptokit;
+
+
(* Test oxsha implementation *)
+
let (hash_oxsha, time_oxsha) =
+
time_function "oxsha (mmap)" (fun () -> sha256sum_oxsha test_file) in
+
Printf.printf "Hash: %s\n" hash_oxsha;
+
+
(* Compare results *)
+
Printf.printf "\n=== Results ===\n";
+
Printf.printf "System sha256sum: %.3f seconds (%.2f MiB/s)\n"
+
time_system
+
(float_of_int file_size /. time_system /. (1024.0 *. 1024.0));
+
Printf.printf "Cryptokit sha256sum: %.3f seconds (%.2f MiB/s)\n"
+
time_cryptokit
+
(float_of_int file_size /. time_cryptokit /. (1024.0 *. 1024.0));
+
Printf.printf "oxsha (mmap): %.3f seconds (%.2f MiB/s)\n"
+
time_oxsha
+
(float_of_int file_size /. time_oxsha /. (1024.0 *. 1024.0));
+
+
(* Find fastest *)
+
let times = [
+
("System sha256sum", time_system);
+
("Cryptokit", time_cryptokit);
+
("oxsha (mmap)", time_oxsha)
+
] in
+
let fastest_name, fastest_time =
+
List.fold_left (fun (n, t) (n', t') -> if t' < t then (n', t') else (n, t))
+
(List.hd times) (List.tl times)
+
in
+
Printf.printf "\nFastest: %s\n" fastest_name;
+
List.iter (fun (name, time) ->
+
if name <> fastest_name then
+
Printf.printf " %s is %.2fx faster than %s\n"
+
fastest_name (time /. fastest_time) name
+
) times;
+
+
(* Verify hashes match *)
+
let hash_system_lower = String.lowercase_ascii hash_system in
+
let hash_cryptokit_lower = String.lowercase_ascii hash_cryptokit in
+
let hash_oxsha_lower = String.lowercase_ascii hash_oxsha in
+
+
if hash_system_lower = hash_cryptokit_lower && hash_system_lower = hash_oxsha_lower then
+
Printf.printf "\n✓ All hashes match!\n"
+
else (
+
Printf.printf "\n✗ ERROR: Hashes do not match!\n";
+
Printf.printf " System: %s\n" hash_system;
+
Printf.printf " Cryptokit: %s\n" hash_cryptokit;
+
Printf.printf " oxsha: %s\n" hash_oxsha;
+
exit 1
+
);
+
+
Printf.printf "\nNote: Test file %s has been preserved for future runs.\n" test_file;
+
Printf.printf " Delete it manually if you want to recreate it.\n"
-124
test/test_sha256.ml
···
-
open Sha256
-
-
(* Test vectors from NIST *)
-
let test_vectors = [
-
("", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855");
-
("abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
-
("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
-
"248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1");
-
("The quick brown fox jumps over the lazy dog",
-
"d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592");
-
(String.make 1000000 'a',
-
"cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0");
-
]
-
-
let test_basic () =
-
print_endline "Testing basic SHA256 functionality...";
-
List.iter (fun (input, expected) ->
-
let digest = hash_string input in
-
let hex = digest_to_hex digest in
-
if hex = expected then
-
Printf.printf " ✓ Test passed for input length %d\n" (String.length input)
-
else begin
-
Printf.printf " ✗ Test FAILED for input: %S\n"
-
(if String.length input > 50 then
-
String.sub input 0 50 ^ "..."
-
else input);
-
Printf.printf " Expected: %s\n" expected;
-
Printf.printf " Got: %s\n" hex
-
end
-
) test_vectors
-
-
let benchmark () =
-
print_endline "\nBenchmarking SHA256 performance...";
-
-
(* Test different input sizes *)
-
let sizes = [64; 256; 1024; 4096; 16384; 65536; 1048576] in
-
-
List.iter (fun size ->
-
let data = String.make size 'x' in
-
let start = Unix.gettimeofday () in
-
let iterations = if size > 10000 then 1000 else 10000 in
-
-
for _ = 1 to iterations do
-
ignore (hash_string data)
-
done;
-
-
let elapsed = Unix.gettimeofday () -. start in
-
let throughput = (float_of_int (size * iterations)) /. elapsed /. 1_000_000.0 in
-
Printf.printf " Size: %7d bytes | Iterations: %6d | Time: %.3fs | Throughput: %.1f MB/s\n"
-
size iterations elapsed throughput
-
) sizes
-
-
let test_incremental () =
-
print_endline "\nTesting incremental hashing...";
-
-
(* Create test data *)
-
let data = "The quick brown fox jumps over the lazy dog" in
-
let expected = "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592" in
-
-
(* Hash using oneshot *)
-
let digest1 = hash_string data in
-
let hex1 = digest_to_hex digest1 in
-
-
(* Hash using incremental API *)
-
let state = init () in
-
let bytes = Bytes.of_string data in
-
let buffer = Bigarray.Array1.create Bigarray.int8_unsigned Bigarray.c_layout (String.length data) in
-
for i = 0 to String.length data - 1 do
-
Bigarray.Array1.set buffer i (Char.code data.[i])
-
done;
-
-
let digest2 = finalize state buffer (Int64.of_int (String.length data)) in
-
let hex2 = digest_to_hex digest2 in
-
-
if hex1 = expected && hex2 = expected then
-
print_endline " ✓ Incremental hashing works correctly"
-
else begin
-
print_endline " ✗ Incremental hashing FAILED";
-
Printf.printf " Expected: %s\n" expected;
-
Printf.printf " Oneshot: %s\n" hex1;
-
Printf.printf " Incremental: %s\n" hex2
-
end
-
-
let test_parallel () =
-
print_endline "\nTesting parallel hashing...";
-
-
(* Create test data *)
-
let num_hashes = 100 in
-
let inputs = List.init num_hashes (fun i ->
-
Printf.sprintf "Test string number %d with some padding to make it longer" i
-
|> Bytes.of_string
-
) in
-
-
(* Sequential hashing *)
-
let start_seq = Unix.gettimeofday () in
-
let results_seq = List.map hash_bytes inputs in
-
let time_seq = Unix.gettimeofday () -. start_seq in
-
-
(* Parallel hashing *)
-
let par = Parallel.create () in
-
let start_par = Unix.gettimeofday () in
-
let results_par = Fast.parallel_hash_many par inputs in
-
let time_par = Unix.gettimeofday () -. start_par in
-
-
(* Verify results match *)
-
let results_match =
-
List.for_all2 (fun d1 d2 -> digest_equal d1 d2) results_seq results_par
-
in
-
-
if results_match then begin
-
Printf.printf " ✓ Parallel hashing produces correct results\n";
-
Printf.printf " Sequential: %.3fs\n" time_seq;
-
Printf.printf " Parallel: %.3fs\n" time_par;
-
Printf.printf " Speedup: %.2fx\n" (time_seq /. time_par)
-
end else
-
print_endline " ✗ Parallel hashing produced different results!"
-
-
let () =
-
print_endline "SHA256 Hardware Accelerated Test Suite";
-
print_endline "======================================";
-
test_basic ();
-
test_incremental ();
-
test_parallel ();
-
benchmark ()