
Blazehash Performance Benchmarks

Measured against hashdeep 4.4 — Apple M4 Pro — April 2026

All numbers on this page are real measurements from actual hardware. Methodology, raw timing data, and reproduction instructions follow.


Abstract

We present a systematic performance comparison of blazehash 0.2.2 against hashdeep 4.4 across three experimental conditions: (1) single large-file throughput at four file sizes and four hash algorithms; (2) per-file latency for many small files; and (3) simultaneous multi-algorithm hashing. Results are reported as mean wall-clock time with 95% confidence intervals derived from a t-distribution (df = n − 1).

Key findings:

  • blazehash is 1.05–2.05× faster on large files at 256 MiB and above; at 64 MiB results range from parity on SHA-256 (0.97×) to 1.55× on SHA-1. SHA-1 shows the largest gain (2.05× at 1 GiB) due to ARM NEON hardware instructions. SHA-256 shows a more modest 1.14× at 1 GiB.
  • blazehash is 1.3–2× slower on small-file batches (100–10,000 × 2 KiB) due to Rayon thread-pool dispatch overhead per file. An adaptive 64 KiB threshold (parallel_threshold_bytes) falls back to sequential I/O for small files; run blazehash bench calibrate to tune it to your hardware.
  • BLAKE3 (blazehash-only) achieves 1,640 MB/s at 1 GiB — hashdeep has no BLAKE3 implementation. Compared to hashdeep's fastest algorithm (SHA-1 at 595 MB/s), BLAKE3 is 2.8× faster.
  • Correctness: 15/15 hash vectors match byte-for-byte across all tested sizes and algorithms.

Test Environment

Component Value
Machine Apple MacBook Pro, M4 Pro (14-core)
OS macOS 15.7.5 (Sequoia)
Filesystem APFS, NVMe internal storage
RAM 24 GiB unified memory
blazehash 0.2.2 (cargo build --release, Rust 1.88.0)
hashdeep 4.4 (Homebrew)
Rust toolchain rustc 1.88.0 (2025-06-23)
Python (bench harness) 3.13.x
Run date 2026-04-10

All test files were generated deterministically from a Lehmer LCG seeded at 42 and read from warm cache (one discarded warm-up run before each timed series).
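The generator constants are not spelled out above; as an illustration, a Lehmer (multiplicative congruential) generator seeded at 42 with the classic Park–Miller parameters would produce reproducible test data like this (the actual harness may use different constants):

```python
# Deterministic test data from a Lehmer (multiplicative congruential) generator.
# The Park-Miller constants below (m = 2^31 - 1, a = 48271) are an assumption;
# the benchmark only documents the seed (42), not the generator parameters.

def lehmer_bytes(n: int, seed: int = 42, a: int = 48271, m: int = 2**31 - 1) -> bytes:
    """Return n deterministic bytes from a Lehmer LCG stream."""
    state = seed
    out = bytearray()
    while len(out) < n:
        state = (a * state) % m
        out += state.to_bytes(4, "little")[:3]  # keep the low 3 bytes of each state
    return bytes(out[:n])

# Identical seed -> identical file contents, so digests reproduce across runs.
```

Writing the same byte stream on every run is what makes the digests comparable across tools and run dates.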


Methodology

  • n = 7 runs per condition (Experiments 1 and 3); n = 5 per condition (Experiment 2).
  • Confidence interval: 95% CI via t-distribution, t(0.025, df) × (sd / √n). Critical values: df=6 → t = 2.447; df=4 → t = 2.776.
  • Throughput: size_bytes / mean_seconds / 1,000,000, reported as decimal MB/s.
  • Speedup: hd_mean / bh_mean (>1 means blazehash is faster).
  • Tool invocations: blazehash --bare -c <algo> <file> vs hashdeep -c <algo> <file>.
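The formulas in the list above translate directly to code. A sketch (the helper names are mine, not the bench harness's):

```python
import math
import statistics

# Two-sided 95% t critical values quoted in the methodology
T_CRIT = {6: 2.447, 4: 2.776}  # keyed by df = n - 1

def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Mean and 95% CI half-width: t(0.025, df) * sd / sqrt(n)."""
    n = len(samples)
    sd = statistics.stdev(samples)  # sample standard deviation, df = n - 1
    return statistics.mean(samples), T_CRIT[n - 1] * sd / math.sqrt(n)

def throughput_mb_s(size_bytes: int, mean_seconds: float) -> float:
    """Decimal MB/s, as used throughout the tables."""
    return size_bytes / mean_seconds / 1_000_000

def speedup(hd_mean: float, bh_mean: float) -> float:
    """> 1 means blazehash is faster."""
    return hd_mean / bh_mean
```

Plugging in the 1 GiB SHA-256 means from Experiment 1: throughput_mb_s(2**30, 2.182) is about 492 MB/s and speedup(2.485, 2.182) is about 1.14, matching the table.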

Correctness Verification

15/15 hash vectors match byte-for-byte across 5 file sizes (0 B – 1 MiB) and 3 algorithms (MD5, SHA-1, SHA-256). Full digest table in docs/bench/results.json.
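Byte-for-byte agreement means the hex digests are compared directly. A minimal sketch with Python's hashlib, using the published empty-input vectors (the full 15-vector table lives in docs/bench/results.json):

```python
import hashlib

# Published test vectors for the empty input -- any divergence here would
# indicate a broken hash implementation, not a measurement issue.
EMPTY_VECTORS = {
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def verify_vectors(data: bytes = b"") -> bool:
    """Compare computed digests against the expected vectors, byte for byte."""
    for algo, expected in EMPTY_VECTORS.items():
        got = hashlib.new(algo, data).hexdigest()
        assert got == expected, f"{algo}: {got} != {expected}"
    return True
```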


Experiment 1 — Single Large File Throughput

SHA-256

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.145 ± 0.002 0.140 ± 0.003 0.97x
256 MiB 0.519 ± 0.002 0.545 ± 0.008 1.05x
512 MiB 1.022 ± 0.008 1.154 ± 0.070 1.13x
1 GiB 2.182 ± 0.148 2.485 ± 0.230 1.14x

SHA-256 speedup is modest but consistent at larger file sizes where startup overhead is amortised. The variance at 1 GiB is higher than at 512 MiB — both tools show OS scheduler jitter over multi-second runs.

MD5

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.111 ± 0.002 0.134 ± 0.013 1.21x
256 MiB 0.402 ± 0.042 0.525 ± 0.045 1.31x
512 MiB 0.750 ± 0.039 0.962 ± 0.065 1.28x
1 GiB 1.447 ± 0.032 2.135 ± 0.405 1.48x

SHA-1

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.075 ± 0.007 0.116 ± 0.006 1.55x
256 MiB 0.237 ± 0.009 0.442 ± 0.015 1.86x
512 MiB 0.455 ± 0.005 0.906 ± 0.042 1.99x
1 GiB 0.879 ± 0.006 1.803 ± 0.136 2.05x

SHA-1 advantage is driven by ARM NEON hardware instructions (sha1c, sha1p, sha1m, sha1h) on the M4 Pro. The Rust sha1 crate generates these automatically via LLVM; the Homebrew hashdeep binary is compiled without -march=native and does not use them. This 2× speedup is ARM-specific and will not reproduce on x86-64.

BLAKE3 (blazehash only — not in hashdeep 4.4)

Size blazehash (s) Throughput
64 MiB 0.053 ± 0.002 1,278 MB/s
256 MiB 0.155 ± 0.002 1,728 MB/s
512 MiB 0.302 ± 0.008 1,780 MB/s
1 GiB 0.655 ± 0.110 1,640 MB/s

Compared to hashdeep's fastest supported algorithm (SHA-1 at 595 MB/s), BLAKE3 is 2.8× faster on this hardware.

Charts

Figure 1 — Single-file throughput at 1 GiB

Figure 1. Throughput (MB/s) at 1 GiB. Error bars: 95% CI.

Figure 2 — Throughput scaling

Figure 2. Throughput vs file size. Shaded bands: 95% CI.

Figure 4 — Algorithm comparison

Figure 4. All algorithms at 1 GiB.


Experiment 2 — Many Small Files

Batches of 100–10,000 × 2 KiB files, SHA-256, n = 5 runs each.

File count blazehash (µs/file) hashdeep (µs/file) Speedup
100 268 137 0.51x
1,000 70 51 0.73x
5,000 49 39 0.78x
10,000 65 42 0.65x

blazehash is slower on small-file workloads. Rayon's per-file dispatch overhead (~20–40 µs) is large relative to the ~4 µs it takes to hash 2 KiB of data. hashdeep, written in C, uses a single-threaded loop with minimal per-file overhead.

The gap shrinks with file count (268 µs at 100 files → 49–65 µs at 5,000+) as the thread pool amortises across more work. The 10,000-file result shows higher variance due to OS file-descriptor pressure.
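The threshold fallback mentioned in the abstract reduces to a size check before dispatch. An illustrative Python sketch, not blazehash's actual Rust/Rayon code (the function name is hypothetical):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD_BYTES = 64 * 1024  # the 64 KiB default discussed above

def hash_files(paths: list[str]) -> dict[str, str]:
    """SHA-256 each file; dispatch only large files to the thread pool.
    Per-task dispatch costs tens of microseconds, which dwarfs the ~4 us
    needed to hash 2 KiB, hence the sequential fast path for small files."""
    results, large = {}, []
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        if len(data) < PARALLEL_THRESHOLD_BYTES:
            results[path] = hashlib.sha256(data).hexdigest()  # inline, no dispatch
        else:
            large.append((path, data))
    if large:
        with ThreadPoolExecutor() as pool:
            digests = pool.map(lambda pd: hashlib.sha256(pd[1]).hexdigest(), large)
            for (path, _), digest in zip(large, digests):
                results[path] = digest
    return results
```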

For small-file forensic triage (malware analysis, live evidence collection), hashdeep or a sequential tool remains faster. blazehash's advantage is correctness, audit mode, signing, fuzzy matching, and large-file throughput.

Figure 3 — Small files benchmark

Figure 3. Per-file latency and speedup ratios. Values below 1.0 mean hashdeep is faster.


Experiment 3 — Multi-Algorithm Hashing

256 MiB file, n = 7 runs.

Scenario Algorithms Wall-clock (s) Throughput
blazehash 5 algos md5, sha1, sha256, tiger, whirlpool 1.988 ± 0.015 135 MB/s
hashdeep 3 algos md5, sha1, sha256 (default) 0.960 ± 0.011 280 MB/s

Not a fair direct comparison — different algorithm sets. Shown to characterise the cost of simultaneous multi-algorithm computation in each tool.
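Multi-algorithm mode amounts to reading the data once and feeding every hash state from the same buffer. A sketch with Python's hashlib (tiger and whirlpool are not in hashlib's guaranteed algorithm set, so only the portable three appear here):

```python
import hashlib

def multi_hash(path: str, algos=("md5", "sha1", "sha256"),
               chunk_size: int = 1 << 20) -> dict[str, str]:
    """Read the file once; update every algorithm's state from each chunk."""
    states = {a: hashlib.new(a) for a in algos}
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for h in states.values():
                h.update(chunk)  # one pass over the data serves all algorithms
    return {a: h.hexdigest() for a, h in states.items()}
```

The single read keeps I/O constant; wall-clock time is then dominated by the combined CPU cost of the selected algorithms.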


Extrapolation to 1 TiB

Two scenarios are relevant for forensic work:

Cached (file fits in RAM) — measured at 1 GiB warm cache, CPU-bound:

Algorithm blazehash hashdeep blazehash 1 TiB est.
BLAKE3 ~1,640 MB/s N/A ~10 min
SHA-1 ~1,222 MB/s ~595 MB/s ~14 min
SHA-256 ~492 MB/s ~432 MB/s ~35 min

Disk-bound (forensic scale, > available RAM) — measured at 20 GiB with random data on this test machine (Apple M-series, NVMe internal SSD):

Algorithm blazehash measured 1 TiB est.
BLAKE3 ~331 MB/s ~53 min
SHA-1 ~289 MB/s ~60 min
SHA-256 ~228 MB/s ~77 min

At forensic scale (evidence images that exceed available RAM), throughput drops below the warm-cache figures: the limit is memory bandwidth and page-cache churn rather than raw CPU hashing speed. Even the disk-bound rates sit far below NVMe sequential read capacity (~4–7 GB/s), so in both scenarios the hashing pipeline, not the storage, is the bottleneck.
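The estimates follow from simple division. The check below assumes 10^12 bytes per terabyte, which reproduces the cached BLAKE3 figure (a binary TiB would add roughly 10% to each estimate):

```python
def est_minutes(mb_per_s: float, total_bytes: float = 1e12) -> float:
    """Wall-clock minutes to hash total_bytes at a sustained rate in MB/s."""
    return total_bytes / (mb_per_s * 1_000_000) / 60

# Cached BLAKE3 at ~1,640 MB/s works out to roughly ten minutes per terabyte.
```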


Capability Gap: EWF/E01

EWF image processing is outside hashdeep's scope. blazehash --verify-image handles E01/Ex01/L01/Lx01 via the bundled libewf backend, with embedded hash verification against the image's stored digests.


Limitations

  1. Single hardware platform (Apple M4 Pro). SHA-1 2× speedup is ARM-specific (NEON sha1c/sha1p/sha1m). x86-64 results will differ.
  2. Warm-cache measurements only. Cold-cache throughput is bounded by storage I/O speed, not CPU.
  3. Small files: hashdeep is faster. Rayon thread dispatch overhead dominates for sub-64 KiB files. blazehash mitigates this with an adaptive 64 KiB threshold (parallel_threshold_bytes): files below the threshold are hashed sequentially, matching hashdeep's overhead profile. Run blazehash bench calibrate to tune this to your hardware.
  4. hashdeep compiled without -march=native. Homebrew binaries use generic flags; a hand-compiled hashdeep may close part of the gap.
  5. n = 7 runs. Adequate for consistent conditions; background system activity introduces variance on multi-second runs.

Reproducing

cargo build --release
python3 docs/bench/run_benchmarks.py   # ~15 min
python3 docs/bench/generate_charts.py

Raw data: docs/bench/results.json (2026-04-10). Charts: docs/charts/ — matplotlib 3.x, 150 DPI, 95% CI bars.