Differential Validation: Deleted-Record Carving
This document is the Doer-Checker evidence for sqlite_forensic's deleted-record
carving. It records how our carver's output was reconciled against an independent
reference tool so that correctness is not asserted only by tests we wrote against a
fixture we generated. The machine-checkable form of this evidence is
forensic/tests/oracle_differential.rs.
This document is the historical differential record; the current capability matrix lives in recovery-comparison.md. The page-level findings below remain accurate as the record of how each tool draws the freelist/allocated boundary, but several carver scope boundaries they describe are now closed. Read the per-scenario numbers and the "Summary" section below as the pre-fix snapshot, and defer to recovery-comparison.md for current numbers:
- carve_all_deleted_records added in-page free-block carving and dropped-table carving, so on the fixture it recovers the in-page remnant (rowid 237) and exactly matches undark, and it recovers the DC3 dropped-table rows. Where this doc says "our freelist-only carver recovers none" of those cases, that is the pre-fix state.
- It then added value-aware prior-version recovery: an UPDATE's freed old version (same rowid, different values) is recovered (tagged PriorVersion), not dropped. The differential test (oracle_differential.rs) now asserts agreement, plus a prior-version reconciliation, rather than the former exemptions.
The Summary's "consistent with / agree exactly" statements describe the freelist-page differential specifically and still hold for that scenario; they are not the whole-corpus capability claim. For that, see recovery-comparison.md.
Summary
- Conclusion: on the freelist-page deletion scenario our carver is designed for, its output is consistent with TWO independent reference carvers, undark (C) and fqlite (Java), with 100% content agreement on every overlapping row and no false positives. Where all three tools overlap on our fixture, they agree exactly.
- Two independent oracles, two corpora. undark and a headless source-instrumented tap of fqlite's recovery engine are both used as oracles; our deleted_places.db fixture and the third-party DC3 sqlite_dissect corpus are both used as input.
- Divergences are diagnosed at the page level, not papered over. Each tool draws the freelist-vs-allocated and trunk-vs-leaf boundaries slightly differently; every ours-vs-oracle difference is explained by which page a row lives on and which pages each tool scans. None is a defect in our freelist-carving path.
- We make no claim that our carver is "proven correct". The evidence supports only that its freelist-page recovery is consistent with two independent tools' recovery.
The two oracles
Oracle 1: undark (C)
| Field | Value |
|---|---|
| Tool | undark |
| Version | 0.7.1 (Paul L. Daniels) |
| Upstream | https://github.com/inflex/undark |
| Source tarball (master) | https://github.com/inflex/undark/archive/refs/heads/master.tar.gz |
| Source tarball sha256 | c0a9ee7ebd180727deef52fbafe0ef0e2b7c9b43c5604761bfeb86bc9306912a |
| Local binary | tools/undark (gitignored, not committed) |
| Test gate | UNDARK_BIN |
Oracle 2: fqlite (Java), via a headless source-instrumentation tap
fqlite was the originally-named oracle. Its command-line mode was removed in v2.0
(README: "With version 2.0, the support for the command line mode was cancelled"),
releases ship only ~440 MB JavaFX jpackage installers (no runnable CLI jar), it is
not on Maven Central, and its repo ships no test databases. So it cannot be used as
a packaged CLI oracle.
But fqlite IS usable as an oracle via source instrumentation — the CLI cancellation
was the only blocker, not the engine. fqlite's carving engine (fqlite.base.Job) is
plain Java that populates a result list the GUI merely reads. A small headless tap
(tools/fqlite/HeadlessTap.java) constructs Job, runs Job.run(path), and emits the
recovered DELETED records as CSV — never launching the JavaFX UI. The engine is not
cleanly decoupled from JavaFX in the current source (its logger's static init builds a
JavaFX TextArea, processDB() posts a Platform.runLater cleanup fence and calls
gui.add_table unguarded), so the tap (a) null-guards those add_table calls, (b) sets
GUI.baseDir, and (c) boots the JavaFX toolkit headlessly (no window). The full engine
API map, the JavaFX-coupling findings, and the minimal changes a clean
fqlite.base.MAIN revival would need are in tools/fqlite/ENGINE_NOTES.md.
| Field | Value |
|---|---|
| Tool | fqlite (recovery engine) |
| Version | 4.22 |
| Commit | 26922bd9e3cdc60c93b72dfb1fb2f5972a0af6a6 |
| Upstream | https://github.com/pawlaszczyk/fqlite |
| Driver | tools/fqlite/HeadlessTap.java + run-tap.sh (gitignored; recipe in tools/fqlite/README.md) |
| Test gate | FQLITE_TAP |
(sqlite_dissect was also evaluated as an oracle but its free-block carver produced
misaligned/garbled column boundaries on these fixtures — recovering corrupt title
values and surfacing live rows — so it was rejected as a yardstick. Its test databases,
authored by DC3, are still used as independent input; see below.)
undark build recipe (macOS / clang)
Upstream undark uses two GCC nested-function definitions and a function named ntohll
that collides with the macOS <sys/_endian.h> ntohll macro, so it does not compile
with clang out of the box. Two minimal, behavior-preserving patches make it build:
- Hoist the nested swap64/ntohll helpers out of decode_row to file scope.
- Rename undark's ntohll to u_ntohll to avoid the macOS macro collision.
curl -sL https://github.com/inflex/undark/archive/refs/heads/master.tar.gz | tar xz
cd undark-master
# patch 1+2 (see tools/undark.c.patched for the exact patched source)
make # produces ./undark
./undark -V # => undark version 0.7.1, by Paul L Daniels
The exact patched source is kept at tools/undark.c.patched (gitignored) for
reproducibility.
CLI invocation
undark dumps every record it can reconstruct (live + recovered-deleted) to stdout as CSV,
one record per line: rowid,id,col1,col2,…. The command used by the test is simply:
Deleted rows are identified by rowid: any recovered rowid that is not present in the
live b-tree (read via sqlite3) is a recovered-deleted record. (--freespace scans free
blocks within allocated pages; it returns nothing on these fixtures because the deleted
content there is on freed whole pages, not in allocated-page free blocks.)
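The rowid rule above can be sketched as a small filter. This is a minimal illustration, not the code in oracle_differential.rs: the function names and the sample CSV are hypothetical, and it assumes the rowid is always the first comma-separated field, so quoting inside later fields does not matter.

```rust
use std::collections::BTreeSet;

/// Parse the leading rowid from one undark CSV line ("rowid,id,col1,...").
/// Returns None for lines that do not start with an integer.
fn leading_rowid(line: &str) -> Option<i64> {
    line.split(',').next()?.trim().parse().ok()
}

/// Keep only recovered rowids that are absent from the live b-tree.
fn recovered_deleted(undark_csv: &str, live: &BTreeSet<i64>) -> BTreeSet<i64> {
    undark_csv
        .lines()
        .filter_map(leading_rowid)
        .filter(|rowid| !live.contains(rowid))
        .collect()
}

fn main() {
    // Hypothetical sample data, not real undark output.
    let live: BTreeSet<i64> = (1..=3).collect();
    let csv = "2,2,http://live.example/,Live\n7,7,http://gone.example/,Gone\n";
    let deleted = recovered_deleted(csv, &live);
    println!("{:?}", deleted); // prints {7}
}
```

In the real test the live set would come from reading the table through sqlite3, as described above; here it is hard-coded to keep the sketch self-contained.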
fqlite tap invocation
FQLITE_TAP=tools/fqlite/run-tap.sh
"$FQLITE_TAP" <database.db> # -> CSV: rowid,col1,col2,... (recovered DELETED rows)
fqlite often cannot recover a carved row's rowid (emits -1), so the fqlite comparison
is keyed by the row's text content (url), not rowid. Build recipe in
tools/fqlite/README.md; engine API map in tools/fqlite/ENGINE_NOTES.md.
Comparison projection
Each tool's output is reduced to the same identity per row: the url/title
(moz_places) or name/surname (DC3 users) text columns at record positions 1 and 2.
The undark comparison keys by rowid; the fqlite comparison keys by url (fqlite does not
always recover the rowid). Agreement is defined on this projection.
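A minimal sketch of this projection and the two keyings follows. The Record type, the function names, and the sample rows are hypothetical and only illustrate the idea; the real test may structure this differently.

```rust
use std::collections::BTreeMap;

/// The identity used for agreement: text columns at record positions 1 and 2.
type Projection = (String, String);

/// One recovered record (hypothetical shape). `rowid` is -1 when a tool
/// could not recover it, as fqlite sometimes reports.
struct Record {
    rowid: i64,
    fields: Vec<String>,
}

/// Project a record onto (position 1, position 2); None if either is missing.
fn project(r: &Record) -> Option<Projection> {
    Some((r.fields.get(1)?.clone(), r.fields.get(2)?.clone()))
}

/// Keying used for the undark comparison.
fn by_rowid(rows: &[Record]) -> BTreeMap<i64, Projection> {
    rows.iter().filter_map(|r| Some((r.rowid, project(r)?))).collect()
}

/// Keying used for the fqlite comparison (the url is the first projected column).
fn by_url(rows: &[Record]) -> BTreeMap<String, Projection> {
    rows.iter()
        .filter_map(|r| project(r).map(|p| (p.0.clone(), p)))
        .collect()
}

/// Count keys present in both maps whose projections differ.
fn mismatches<K: Ord>(a: &BTreeMap<K, Projection>, b: &BTreeMap<K, Projection>) -> usize {
    a.iter()
        .filter(|(key, left)| b.get(*key).map_or(false, |right| right != *left))
        .count()
}

fn main() {
    let rec = |rowid: i64, url: &str, title: &str| Record {
        rowid,
        fields: vec![rowid.to_string(), url.to_string(), title.to_string()],
    };
    // Hypothetical sample rows, not real tool output.
    let ours = vec![rec(300, "http://a.example/", "A"), rec(301, "http://b.example/", "B")];
    let undark = vec![rec(300, "http://a.example/", "A"), rec(301, "http://b.example/", "B")];
    let fqlite = vec![rec(-1, "http://a.example/", "A"), rec(-1, "http://b.example/", "B?")];
    println!("vs undark (by rowid): {}", mismatches(&by_rowid(&ours), &by_rowid(&undark)));
    println!("vs fqlite (by url):   {}", mismatches(&by_url(&ours), &by_url(&fqlite)));
}
```

Keying by url lets rows with rowid -1 still be matched; a mismatch then means the same url was recovered with a different title.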
Results
Corpus 1: our fixture (undark AND fqlite as oracles over our input)
forensic/tests/data/deleted_places.db — moz_places, 400 rows inserted, ids 201..=400
DELETEd without VACUUM under secure_delete=OFF; freed whole leaf pages onto the
freelist. Ground truth: 200 live (1..=200), 200 deleted (201..=400). Freelist =
trunk page 9 + leaf pages 10,11,12,13.
Three-way recovery over the deleted range (ids 201..=400):
| tool | recovers | which rows |
|---|---|---|
| our carver | 162 | 238..=400 (except 250) |
| undark | 163 | 237..=400 (except 250) |
| fqlite | 126 | 235, 237, and 277..=400 |
Agreement:
| comparison | result |
|---|---|
| content agreement (url + title) on every overlapping row | 100%, 0 mismatches (all three tools) |
| our false positives (rows we carve no oracle corroborates) | 0 |
| ours vs undark | ours = undark minus 1 row (237); recall 162/163 = 99.4% |
| ours vs fqlite | ours adds 238..=276 (except 250); fqlite adds 235, 237; all explained below |
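The counts and differences in the two tables can be re-derived with plain set arithmetic. The sketch below hard-codes the rowid ranges reported above (it does not run any carver), and the helper names are illustrative only.

```rust
use std::collections::BTreeSet;

/// Rowids lo..=hi with one rowid left out.
fn range_except(lo: u32, hi: u32, skip: u32) -> BTreeSet<u32> {
    (lo..=hi).filter(|r| *r != skip).collect()
}

fn ours_rows() -> BTreeSet<u32> {
    range_except(238, 400, 250)
}

fn undark_rows() -> BTreeSet<u32> {
    range_except(237, 400, 250)
}

fn fqlite_rows() -> BTreeSet<u32> {
    let mut rows: BTreeSet<u32> = (277..=400).collect();
    rows.insert(235);
    rows.insert(237);
    rows
}

/// Rows in `a` that are not in `b`, in ascending order.
fn diff(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> Vec<u32> {
    a.difference(b).copied().collect()
}

fn main() {
    let (ours, undark, fqlite) = (ours_rows(), undark_rows(), fqlite_rows());
    // counts: ours=162 undark=163 fqlite=126
    println!("counts: ours={} undark={} fqlite={}", ours.len(), undark.len(), fqlite.len());
    println!("undark only: {:?}", diff(&undark, &ours)); // [237]
    println!("fqlite only: {:?}", diff(&fqlite, &ours)); // [235, 237]
    println!("ours only vs fqlite: {} rows", diff(&ours, &fqlite).len()); // 38
    let recall = 100.0 * ours.intersection(&undark).count() as f64 / undark.len() as f64;
    println!("recall vs undark: {:.1}%", recall); // 99.4%
}
```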
Why the three tools draw the freelist boundary differently — page-level diagnosis:
- Rows 277..=400 live on freelist leaf pages 10–13. All three tools carve these. ✓
- Rows 238..=276 live on page 9, the freelist trunk page. Our carver and undark scan the trunk page body (below its small 8-byte trunk header + leaf-pointer array) and recover them. fqlite reads page 9 only as a trunk (next-pointer + leaf-pointer array) and does not carve record content from its body — so fqlite misses 238..=276. This is a genuine fqlite-specific behaviour, not a defect in either carver.
- Rows 235, 237 live on page 8, a still-allocated leaf page (in-page free blocks from rows deleted in place). undark (byte-by-byte) and fqlite (in-page free-block carver) reach them; our carver scans only freelist pages by design, so it skips them — the same safety property (never re-surface content from an allocated page) seen in the DC3 corpus.
- Rows 201..=234, 236, and 250 are recovered by no tool: their cells were overwritten by the freelist trunk header / leaf-pointer array when the pages were freed.
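The trunk-page layout behind the second bullet can be sketched as follows. Per the SQLite file format, a freelist trunk page starts with a 4-byte big-endian next-trunk pointer and a 4-byte leaf count, followed by that many 4-byte leaf page numbers; everything after that array is the "body" a carver may scan. The type and function names are hypothetical, and the 4096-byte page size is an assumption for the example, not a statement about the fixture.

```rust
/// Parsed header of a freelist trunk page.
struct Trunk {
    next_trunk: u32,   // 0 when this is the last trunk page
    leaves: Vec<u32>,  // freelist leaf page numbers
    body_start: usize, // first byte after the leaf-pointer array
}

/// Read a big-endian u32 at `off`, or None if the slice is too short.
fn be_u32(bytes: &[u8], off: usize) -> Option<u32> {
    let s = bytes.get(off..off + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse the trunk header; None if the leaf count cannot fit in the page.
fn parse_trunk(page: &[u8]) -> Option<Trunk> {
    let next_trunk = be_u32(page, 0)?;
    let count = be_u32(page, 4)? as usize;
    let body_start = 8usize.checked_add(count.checked_mul(4)?)?;
    if body_start > page.len() {
        return None;
    }
    let leaves = (0..count)
        .map(|i| be_u32(page, 8 + 4 * i))
        .collect::<Option<Vec<u32>>>()?;
    Some(Trunk { next_trunk, leaves, body_start })
}

fn main() {
    // Synthetic page shaped like the fixture's trunk page 9:
    // no next trunk, four leaf pointers (10..=13).
    let mut page = vec![0u8; 4096];
    page[4..8].copy_from_slice(&4u32.to_be_bytes());
    for (i, leaf) in (10u32..=13).enumerate() {
        page[8 + 4 * i..12 + 4 * i].copy_from_slice(&leaf.to_be_bytes());
    }
    let trunk = parse_trunk(&page).expect("well-formed trunk page");
    // prints: next=0 leaves=[10, 11, 12, 13] body starts at byte 24
    println!(
        "next={} leaves={:?} body starts at byte {}",
        trunk.next_trunk, trunk.leaves, trunk.body_start
    );
}
```

A tool that stops after reading the pointers, as fqlite is described as doing, never looks at the bytes from body_start onward. A tool that scans the body can still find records there, except those the header and pointer array overwrote.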
Both divergence sets are encoded as explicit, asserted exemptions in the test
(FIXTURE_IN_PAGE_DIVERGENCES / FQLITE_IN_PAGE_DIVERGENCES for the allocated-page rows;
FQLITE_TRUNK_PAGE_DIVERGENCES for the trunk-page rows). Each is asserted to be a real
disagreement, so a future carver change that closes a gap fails the test and forces the
exemption to be re-derived rather than silently passing.
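The "asserted exemption" pattern can be sketched like this. It is a simplified illustration with hypothetical names (the real constants are listed above; their types and the surrounding test code are not shown in this document): an exemption is valid only while the row is a real disagreement, and any disagreement without an exemption is an error.

```rust
use std::collections::BTreeSet;

/// Reconcile `ours` against `oracle` given a list of expected divergences.
/// Fails if an exemption is stale (the row is no longer a disagreement)
/// or if a difference exists that no exemption explains.
fn reconcile(
    ours: &BTreeSet<u32>,
    oracle: &BTreeSet<u32>,
    exempt: &[u32],
) -> Result<(), String> {
    for row in exempt {
        // Both tools have it, or neither does: nothing left to exempt.
        if ours.contains(row) == oracle.contains(row) {
            return Err(format!("stale exemption: row {} is no longer a disagreement", row));
        }
    }
    let exempt: BTreeSet<u32> = exempt.iter().copied().collect();
    for row in ours.symmetric_difference(oracle) {
        if !exempt.contains(row) {
            return Err(format!("unexplained divergence at row {}", row));
        }
    }
    Ok(())
}

fn main() {
    let undark: BTreeSet<u32> = (237..=400).filter(|r| *r != 250).collect();
    let mut ours: BTreeSet<u32> = (238..=400).filter(|r| *r != 250).collect();

    // Pre-fix state: row 237 is the one documented divergence.
    println!("{:?}", reconcile(&ours, &undark, &[237]));

    // A carver change closes the gap: the same exemption now fails loudly.
    ours.insert(237);
    println!("{:?}", reconcile(&ours, &undark, &[237]));
}
```

The first call returns Ok and the second returns the stale-exemption error, which is the "fails the test and forces the exemption to be re-derived" behaviour described above.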
Corpus 2: DC3 sqlite_dissect test corpus (independent input and independent oracle)
The Department of Defense Cyber Crime Center (DC3) sqlite_dissect test databases were
authored by neither us nor undark's author, so for these cases neither the input DB nor
the oracle is ours — the strongest Doer-Checker form. Provenance + hashes are in
tests-oracle-corpus/README.md and docs/corpus-catalog.md. The DBs with carvable
deleted records:
| DB | table cols | freelist_count | undark recovers | fqlite recovers | our carver recovers | agreement |
|---|---|---|---|---|---|---|
| corpus_01-01.db | 4 | 0 | 10 | 6 | 0 | documented gap |
| corpus_01-02.db | 4 | 0 | 10 | 6 | 0 | documented gap |
| corpus_03-02.db | 4 | 0 | 11 | 7 | 0 | documented gap |
| corpus_07-01.db | 4 | 0 | 19 | 7 | 0 | documented gap |
| corpus_0A-01.db | 6 | 1 | 20 | 20 | 0 | documented gap |
| corpus_0A-02.db | 6 | 1 | 10 | 19 | 0 | documented gap |
Both independent oracles (undark and fqlite) recover deleted rows from these in-page / dropped-table DBs; our freelist-only carver recovers none — the same documented scope boundary, now corroborated by two tools rather than one.
Divergence — our carver recovers 0 from every DC3 case (documented scope boundary).
This is the load-bearing independent finding. These DBs delete records without freeing
whole pages onto the freelist (freelist_count = 0 for the in-page cases) or drop a
table entirely (0A-01/0A-02 have no table in sqlite_master; the dropped table's
page went on the freelist). The deleted content therefore lives in free blocks inside
still-allocated b-tree pages or in dropped-table pages, neither of which our
freelist-page scan covers. undark, scanning byte-by-byte, recovers them.
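The freelist_count column can be read directly from the database header. In the SQLite file format, the 100-byte header stores the first freelist trunk page as a big-endian u32 at offset 32 and the total number of freelist pages at offset 36. The sketch below uses a hypothetical function name and a synthetic header rather than a real corpus file.

```rust
/// Return (first freelist trunk page, total freelist pages) from the start
/// of a SQLite database file, or None if the bytes are not a SQLite header.
fn freelist_info(header: &[u8]) -> Option<(u32, u32)> {
    if header.get(..16)? != &b"SQLite format 3\0"[..] {
        return None;
    }
    let be_u32 = |off: usize| {
        let s = header.get(off..off + 4)?;
        Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
    };
    Some((be_u32(32)?, be_u32(36)?))
}

fn main() {
    // Synthetic header shaped like the fixture: trunk page 9 plus four
    // leaf pages gives five freelist pages in total.
    let mut header = vec![0u8; 100];
    header[..16].copy_from_slice(b"SQLite format 3\0");
    header[32..36].copy_from_slice(&9u32.to_be_bytes());
    header[36..40].copy_from_slice(&5u32.to_be_bytes());
    println!("{:?}", freelist_info(&header)); // prints Some((9, 5))
}
```

A count of 0, as in the in-page DC3 cases, means a freelist-only scan has no pages to visit, which is why those databases yield nothing from that path regardless of how much deleted content sits inside allocated pages.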
We did not "fix" this by bolting on in-page free-block carving: that is a new capability (a feature), not a bug in the freelist path, and adding it under a validation task would exceed scope. It is recorded here honestly as the carver's current boundary and asserted explicitly in the test (each DC3 case asserts our carver recovers 0 here — if a future in-page carver lands, the assertion fires and forces a re-reconciliation against undark rather than passing silently). On the cases where undark and ours overlap, content agreement is required and holds (vacuously, since our set is empty); our carver produces no false positives on any DC3 DB.
What this validates, and what it does not
- Validates: the freelist-page carving path — the scenario our carver targets — is consistent with two independent tools' recovery (100% content agreement, no false positives; 99.4% recall vs undark, and full agreement vs fqlite outside the trunk-page rows fqlite structurally skips).
- Does not validate / out of scope: in-page free-block recovery and dropped-table recovery. Both undark and fqlite recover these; our carver does not — surfaced here as the documented divergence and the candidate next feature, not claimed as working.
- Epistemic stance: carved records remain confidence-graded observations ("consistent with a deleted row"); this validation likewise establishes consistency with two independent oracles, not proof of correctness.