
Blazehash Performance Benchmarks

Measured against hashdeep 4.4 — Apple M4 Pro — April 2026

All numbers on this page are real measurements from actual hardware. Methodology, raw timing data, and reproduction instructions follow.


Abstract

We present a systematic performance comparison of blazehash 0.2.2 against hashdeep 4.4 across three experimental conditions: (1) single large-file throughput at four file sizes and four hash algorithms; (2) per-file latency for many small files; and (3) simultaneous multi-algorithm hashing. Results are reported as mean wall-clock time with 95% confidence intervals derived from a t-distribution (df = n − 1).

Key findings:

  • blazehash is 1.05–2.05× faster on large files at 256 MiB and above; at 64 MiB results range from parity on SHA-256 (0.97×) to 1.55× on SHA-1. SHA-1 shows the largest gain (2.05× at 1 GiB) due to ARM NEON hardware instructions. SHA-256 shows a more modest 1.14× at 1 GiB.
  • blazehash is 1.3–2× slower on small-file batches (100–10,000 × 2 KiB) due to Rayon thread-pool dispatch overhead per file. An adaptive 64 KiB threshold (parallel_threshold_bytes) falls back to sequential I/O for small files; run blazehash bench calibrate to tune it to your hardware.
  • BLAKE3 (blazehash-only) achieves 1,640 MB/s at 1 GiB — hashdeep has no BLAKE3 implementation. Compared to hashdeep's fastest algorithm (SHA-1 at 595 MB/s), BLAKE3 is 2.8× faster.
  • Correctness: 15/15 hash vectors match byte-for-byte across all tested sizes and algorithms.

Test Environment

Component Value
Machine Apple MacBook Pro, M4 Pro (14-core)
OS macOS 15.7.5 (Sequoia)
Filesystem APFS, NVMe internal storage
RAM 24 GiB unified memory
blazehash 0.2.2 (cargo build --release, Rust 1.88.0)
hashdeep 4.4 (Homebrew)
Rust toolchain rustc 1.88.0 (2025-06-23)
Python (bench harness) 3.13.x
Run date 2026-04-10

All test files were generated deterministically from a Lehmer LCG seeded at 42 and read from warm cache (one discarded warm-up run before each timed series).
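The generator constants are not spelled out above; as an illustration, a Lehmer (multiplicative congruential) generator seeded at 42 with the classic Park–Miller parameters would produce reproducible test data like this (the actual harness may use different constants):

```python
# Deterministic test data from a Lehmer (multiplicative congruential) generator.
# The Park-Miller constants below (m = 2^31 - 1, a = 48271) are an assumption;
# the benchmark only documents the seed (42), not the generator parameters.

def lehmer_bytes(n: int, seed: int = 42, a: int = 48271, m: int = 2**31 - 1) -> bytes:
    """Return n deterministic bytes from a Lehmer LCG stream."""
    state = seed
    out = bytearray()
    while len(out) < n:
        state = (a * state) % m
        out += state.to_bytes(4, "little")[:3]  # keep the low 3 bytes of each state
    return bytes(out[:n])

# Identical seed -> identical file contents, so digests reproduce across runs.
```

Writing the same byte stream on every run is what makes the digests comparable across tools and run dates.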


Methodology

  • n = 7 runs per condition (Experiments 1 and 3); n = 5 per condition (Experiment 2).
  • Confidence interval: 95% CI via t-distribution, t(0.025, df) × (sd / √n). Critical values: df=6 → t = 2.447; df=4 → t = 2.776.
  • Throughput: size_bytes / mean_seconds / 1,000,000, reported as decimal MB/s.
  • Speedup: hd_mean / bh_mean (>1 means blazehash is faster).
  • Tool invocations: blazehash --bare -c <algo> <file> vs hashdeep -c <algo> <file>.
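The formulas in the list above translate directly to code. A sketch (the helper names are mine, not the bench harness's):

```python
import math
import statistics

# Two-sided 95% t critical values quoted in the methodology
T_CRIT = {6: 2.447, 4: 2.776}  # keyed by df = n - 1

def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Mean and 95% CI half-width: t(0.025, df) * sd / sqrt(n)."""
    n = len(samples)
    sd = statistics.stdev(samples)  # sample standard deviation, df = n - 1
    return statistics.mean(samples), T_CRIT[n - 1] * sd / math.sqrt(n)

def throughput_mb_s(size_bytes: int, mean_seconds: float) -> float:
    """Decimal MB/s, as used throughout the tables."""
    return size_bytes / mean_seconds / 1_000_000

def speedup(hd_mean: float, bh_mean: float) -> float:
    """> 1 means blazehash is faster."""
    return hd_mean / bh_mean
```

Plugging in the 1 GiB SHA-256 means from Experiment 1: throughput_mb_s(2**30, 2.182) is about 492 MB/s and speedup(2.485, 2.182) is about 1.14, matching the table.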

Correctness Verification

15/15 hash vectors match byte-for-byte across 5 file sizes (0 B – 1 MiB) and 3 algorithms (MD5, SHA-1, SHA-256). Full digest table in docs/bench/results.json.
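Byte-for-byte agreement means the hex digests are compared directly. A minimal sketch with Python's hashlib, using the published empty-input vectors (the full 15-vector table lives in docs/bench/results.json):

```python
import hashlib

# Published test vectors for the empty input -- any divergence here would
# indicate a broken hash implementation, not a measurement issue.
EMPTY_VECTORS = {
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def verify_vectors(data: bytes = b"") -> bool:
    """Compare computed digests against the expected vectors, byte for byte."""
    for algo, expected in EMPTY_VECTORS.items():
        got = hashlib.new(algo, data).hexdigest()
        assert got == expected, f"{algo}: {got} != {expected}"
    return True
```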


Experiment 1 — Single Large File Throughput

SHA-256

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.145 ± 0.002 0.140 ± 0.003 0.97x
256 MiB 0.519 ± 0.002 0.545 ± 0.008 1.05x
512 MiB 1.022 ± 0.008 1.154 ± 0.070 1.13x
1 GiB 2.182 ± 0.148 2.485 ± 0.230 1.14x

SHA-256 speedup is modest but consistent at larger file sizes where startup overhead is amortised. The variance at 1 GiB is higher than at 512 MiB — both tools show OS scheduler jitter over multi-second runs.

MD5

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.111 ± 0.002 0.134 ± 0.013 1.21x
256 MiB 0.402 ± 0.042 0.525 ± 0.045 1.31x
512 MiB 0.750 ± 0.039 0.962 ± 0.065 1.28x
1 GiB 1.447 ± 0.032 2.135 ± 0.405 1.48x

SHA-1

Size blazehash (s) hashdeep (s) Speedup
64 MiB 0.075 ± 0.007 0.116 ± 0.006 1.55x
256 MiB 0.237 ± 0.009 0.442 ± 0.015 1.86x
512 MiB 0.455 ± 0.005 0.906 ± 0.042 1.99x
1 GiB 0.879 ± 0.006 1.803 ± 0.136 2.05x

SHA-1 advantage is driven by ARM NEON hardware instructions (sha1c, sha1p, sha1m, sha1h) on the M4 Pro. The Rust sha1 crate generates these automatically via LLVM; the Homebrew hashdeep binary is compiled without -march=native and does not use them. This 2× speedup is ARM-specific and will not reproduce on x86-64.

BLAKE3 (blazehash only — not in hashdeep 4.4)

Size blazehash (s) Throughput
64 MiB 0.053 ± 0.002 1,278 MB/s
256 MiB 0.155 ± 0.002 1,728 MB/s
512 MiB 0.302 ± 0.008 1,780 MB/s
1 GiB 0.655 ± 0.110 1,640 MB/s

Compared to hashdeep's fastest supported algorithm (SHA-1 at 595 MB/s), BLAKE3 is 2.8× faster on this hardware.

Charts

Figure 1 — Single-file throughput at 1 GiB

Figure 1. Throughput (MB/s) at 1 GiB. Error bars: 95% CI.

Figure 2 — Throughput scaling

Figure 2. Throughput vs file size. Shaded bands: 95% CI.

Figure 4 — Algorithm comparison

Figure 4. All algorithms at 1 GiB.


Experiment 2 — Many Small Files

Batches of 100–10,000 × 2 KiB files, SHA-256, n = 5 runs each.

File count blazehash (µs/file) hashdeep (µs/file) Speedup
100 268 137 0.51x
1,000 70 51 0.73x
5,000 49 39 0.78x
10,000 65 42 0.65x

blazehash is slower on small-file workloads. Rayon's per-file dispatch overhead (~20–40 µs) is large relative to the ~4 µs it takes to hash 2 KiB of data. hashdeep, written in C, uses a single-threaded loop with minimal per-file overhead.

The gap shrinks with file count (268 µs at 100 files → 49–65 µs at 5,000+) as the thread pool amortises across more work. The 10,000-file result shows higher variance due to OS file-descriptor pressure.
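The threshold fallback mentioned in the abstract reduces to a size check before dispatch. An illustrative Python sketch, not blazehash's actual Rust/Rayon code (the function name is hypothetical):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD_BYTES = 64 * 1024  # the 64 KiB default discussed above

def hash_files(paths: list[str]) -> dict[str, str]:
    """SHA-256 each file; dispatch only large files to the thread pool.
    Per-task dispatch costs tens of microseconds, which dwarfs the ~4 us
    needed to hash 2 KiB, hence the sequential fast path for small files."""
    results, large = {}, []
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        if len(data) < PARALLEL_THRESHOLD_BYTES:
            results[path] = hashlib.sha256(data).hexdigest()  # inline, no dispatch
        else:
            large.append((path, data))
    if large:
        with ThreadPoolExecutor() as pool:
            digests = pool.map(lambda pd: hashlib.sha256(pd[1]).hexdigest(), large)
            for (path, _), digest in zip(large, digests):
                results[path] = digest
    return results
```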

For small-file forensic triage (malware analysis, live evidence collection), hashdeep or a sequential tool remains faster. blazehash's advantage is correctness, audit mode, signing, fuzzy matching, and large-file throughput.

Figure 3 — Small files benchmark

Figure 3. Per-file latency and speedup ratios. Values below 1.0 mean hashdeep is faster.


Experiment 3 — Multi-Algorithm Hashing

256 MiB file, n = 7 runs.

Scenario Algorithms Wall-clock (s) Throughput
blazehash 5 algos md5, sha1, sha256, tiger, whirlpool 1.988 ± 0.015 135 MB/s
hashdeep 3 algos md5, sha1, sha256 (default) 0.960 ± 0.011 280 MB/s

Not a fair direct comparison — different algorithm sets. Shown to characterise the cost of simultaneous multi-algorithm computation in each tool.
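Multi-algorithm mode amounts to reading the data once and feeding every hash state from the same buffer. A sketch with Python's hashlib (tiger and whirlpool are not in hashlib's guaranteed algorithm set, so only the portable three appear here):

```python
import hashlib

def multi_hash(path: str, algos=("md5", "sha1", "sha256"),
               chunk_size: int = 1 << 20) -> dict[str, str]:
    """Read the file once; update every algorithm's state from each chunk."""
    states = {a: hashlib.new(a) for a in algos}
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for h in states.values():
                h.update(chunk)  # one pass over the data serves all algorithms
    return {a: h.hexdigest() for a, h in states.items()}
```

The single read keeps I/O constant; wall-clock time is then dominated by the combined CPU cost of the selected algorithms.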


Extrapolation to 1 TiB

Two scenarios are relevant for forensic work:

Cached (file fits in RAM) — measured at 1 GiB warm cache, CPU-bound:

Algorithm blazehash hashdeep blazehash 1 TiB est.
BLAKE3 ~1,640 MB/s N/A ~10 min
SHA-1 ~1,222 MB/s ~595 MB/s ~14 min
SHA-256 ~492 MB/s ~432 MB/s ~35 min

Disk-bound (forensic scale, > available RAM) — measured at 20 GiB with random data on this test machine (Apple M-series, NVMe internal SSD):

Algorithm blazehash measured 1 TiB est.
BLAKE3 ~331 MB/s ~53 min
SHA-1 ~289 MB/s ~60 min
SHA-256 ~228 MB/s ~77 min

At forensic scale (evidence images that exceed available RAM), throughput drops below the warm-cache figures: the limit is memory bandwidth and page-cache churn rather than raw CPU hashing speed. Even the disk-bound rates sit far below NVMe sequential read capacity (~4–7 GB/s), so in both scenarios the hashing pipeline, not the storage, is the bottleneck.
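The estimates follow from simple division. The check below assumes 10^12 bytes per terabyte, which reproduces the cached BLAKE3 figure (a binary TiB would add roughly 10% to each estimate):

```python
def est_minutes(mb_per_s: float, total_bytes: float = 1e12) -> float:
    """Wall-clock minutes to hash total_bytes at a sustained rate in MB/s."""
    return total_bytes / (mb_per_s * 1_000_000) / 60

# Cached BLAKE3 at ~1,640 MB/s works out to roughly ten minutes per terabyte.
```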


Capability Gap: EWF/E01

EWF image processing is outside hashdeep's scope. blazehash --verify-image handles E01/Ex01/L01/Lx01 via the bundled libewf backend, with embedded hash verification against the image's stored digests.


Limitations

  1. Single hardware platform (Apple M4 Pro). SHA-1 2× speedup is ARM-specific (NEON sha1c/sha1p/sha1m). x86-64 results will differ.
  2. Warm-cache measurements only. Cold-cache throughput is bounded by storage I/O speed, not CPU.
  3. Small files: hashdeep is faster. Rayon thread dispatch overhead dominates for sub-64 KiB files. blazehash mitigates this with an adaptive 64 KiB threshold (parallel_threshold_bytes): files below the threshold are hashed sequentially, matching hashdeep's overhead profile. Run blazehash bench calibrate to tune this to your hardware.
  4. hashdeep compiled without -march=native. Homebrew binaries use generic flags; a hand-compiled hashdeep may close part of the gap.
  5. n = 7 runs. Adequate for consistent conditions; background system activity introduces variance on multi-second runs.

Reproducing

cargo build --release
python3 docs/bench/run_benchmarks.py   # ~15 min
python3 docs/bench/generate_charts.py

Raw data: docs/bench/results.json (2026-04-10). Charts: docs/charts/ — matplotlib 3.x, 150 DPI, 95% CI bars.