Deleted-Record Recovery — Measured Capability¶
What sqlite_forensic::carve_all_deleted_records actually recovers, and how it
compares head-to-head against undark and fqlite — every tool scored
against the same independent third-party ground truth: the SQLite Forensic
Corpus (Nemetz, Schmitt & Freiling, DFRWS-EU 2018, CC0), whose authors shipped,
per database, an .xml answer key tagging every deleted row with its full
content.
Both matrices below are harness-computed, not hand-written: the single-tool
matrix by forensic/tests/nemetz_metrics.rs and the three-tool head-to-head by
forensic/tests/nemetz_tool_comparison.rs (run either with --nocapture to
regenerate its table). Corpus and oracle provenance are in
corpus-catalog.md and validation.md.
Executive summary¶
- Freeblock-aware reconstruction (task #56) closed most of the in-page recall
gap at the highest precision of any tool. On the in-page-deletion category
0C, our carver now recovers 63 of the 84 recoverable deleted rows (recall 0.750, up from 23 / 0.274), against fqlite 67 (0.798) and undark 14 (0.167). We do it at precision 0.955 (3 phantoms) versus fqlite's 0.807 (16 phantoms) — i.e. nearly fqlite's recall with roughly five times fewer false rows. - Our carver still never re-surfaces a live row. Across the in-scope corpus
it emits 0 live-re-reads — the structural 0-false-positive guarantee held
through the change. fqlite also holds 0; undark does not — on
0Dit re-reads 56 live rows as deleted (precision 0.091) and on0E27 (precision 0.333). - We Pareto-dominate fqlite on overflow (
0E) — higher recall (0.333 vs 0.222) AND higher precision (1.000 vs 0.500). On0Cand0Dfqlite keeps a recall edge (0.798 vs 0.750; 0.556 vs 0.306) while we keep the precision edge; neither tool dominates the other there. - The mechanism: when SQLite frees an in-page cell it overwrites the cell's
first four bytes (payload-length + rowid varints, the record
header_len, and the leading serial type) with the freeblock header.reconstruct_freeblock_recordsrebuilds each freed cell from its surviving serial-type tail plus a header template derived from a live cell on the same page; the destroyed rowid is surfaced as unknown (0), never invented. See "Freeblock reconstruction". - Two recall denominators are reported because they answer different questions — substrate-limited (carver capability) and end-to-end (examiner usefulness).
- These are honest measurements of each tool against this corpus, not a verdict that any tool is "best": stated plainly below.
The earlier headline ("163 of 163 recoverable, 0 false positives") was a fixture-only measurement against our own
deleted_places.db(whole-freed-page deletion — the one shape our carver handles well) and has been retracted; the numbers here, against independent ground truth that also exercises in-page deletion, supersede it.
Head-to-head — ours vs undark vs fqlite (computed)¶
Every tool's recovered rows are matched against the same answer key by a
format-stable (col1, col2) identity (the two integer/text columns at positions
1 and 2 — name/surname for the text tables, the two non-id integer columns for
the integer tables). These columns uniquely identify every deleted row in every
0C/0D/0E database and are byte-stable across tools; the floating-point columns are
excluded from the key because the three tools render reals at different
precision (ours 5 dp, undark 6 dp, fqlite 8 dp), which would penalise float
formatting rather than recovery. Two databases — 0C-06 and 0C-07 —
carry FLOAT values at positions 1 and 2, so no format-stable cross-tool key
exists for them; they are excluded from this table (our own single-tool matrix
still scores them, rounding reals symmetrically). Categories 0A/0B
(dropped/overwritten tables — no live-vs-deleted anchor) and 11 (anti-forensic
tampering — no deleted answer key) carry no clean row-level deleted set and are
out of scope for a recall table.
Ddel = rows deleted; Drec = of those, byte-present (recoverable substrate);
live = live rows wrongly recovered as deleted (must be 0); recall denominators
as defined under "How the matrix is computed".
| cat | tool | Ddel | Drec | TP | FP | FN | live | recall (substrate) | recall (e2e) | precision |
|---|---|---|---|---|---|---|---|---|---|---|
| 0C | ours | 84 | 84 | 63 | 3 | 21 | 0 | 0.750 | 0.750 | 0.955 |
| 0C | undark | 84 | 84 | 14 | 10 | 70 | 4 | 0.167 | 0.167 | 0.583 |
| 0C | fqlite | 84 | 84 | 67 | 16 | 17 | 0 | 0.798 | 0.798 | 0.807 |
| 0D | ours | 45 | 36 | 11 | 0 | 25 | 0 | 0.306 | 0.244 | 1.000 |
| 0D | undark | 45 | 36 | 1 | 10 | 35 | 56 | 0.028 | 0.022 | 0.091 |
| 0D | fqlite | 45 | 36 | 20 | 0 | 16 | 0 | 0.556 | 0.444 | 1.000 |
| 0E | ours | 12 | 9 | 3 | 0 | 6 | 0 | 0.333 | 0.250 | 1.000 |
| 0E | undark | 12 | 9 | 3 | 6 | 6 | 27 | 0.333 | 0.250 | 0.333 |
| 0E | fqlite | 12 | 9 | 2 | 2 | 7 | 0 | 0.222 | 0.167 | 0.500 |
(Totals exclude 0C-06/0C-07; 0C therefore sums 8 databases, 84 deleted rows.
Regenerate with cargo test -p sqlite-forensic --test nemetz_tool_comparison --
--nocapture, with UNDARK_BIN and FQLITE_TAP set.)
Honest read — who wins where, and why¶
- In-page deletion (
0C): ours and fqlite now lead together; undark trails. With freeblock reconstruction (task #56) our recall rose 0.274 → 0.750, essentially matching fqlite's 0.798 — but at precision 0.955 (3 phantoms) versus fqlite's 0.807 (16 phantoms). fqlite keeps a small recall edge; we keep a clear precision edge and re-read no live row. Neither tool Pareto-dominates the other on0C. - Deleted-then-overwritten (
0D): fqlite leads recall (0.556 vs ours 0.306); ours and fqlite hold perfect precision; undark fails on precision. undark re-surfaces 56 live rows as deleted here (precision 0.091) — it mis-parses these overwritten tables and re-reads the live cells. Our recall rose 0.056 → 0.306 via reconstruction, but the laterINSERToverwrites destroy more substrate than0C, so fqlite's more aggressive (lower-precision elsewhere) reconstruction recovers more here. Both keep 0 live-re-reads. - Overflow (
0E): ours Pareto-dominates fqlite — higher recall (0.333 vs 0.222) AND higher precision (1.000 vs 0.500). undark ties our recall (0.333) but re-reads 27 live rows (precision 0.333); fqlite emits phantoms (0.500). Our carver recovers the rows whose first overflow segment + header survived, with no false positive. (Freeblock reconstruction adds nothing on0E— these records spill to overflow, so the residue is not a simple in-page freeblock.)
The consistent picture: after task #56, our carver matches fqlite's in-page
recall while keeping the highest precision of all three tools on every category
and never re-reading a live row; it Pareto-dominates fqlite on overflow; fqlite
retains a recall edge on 0C/0D only by tolerating more phantoms; undark
trails on both recall and precision and over-reports live rows as deleted.
How the matrix is computed¶
For each database, the carver's output is matched against the answer key by full decoded-row content (schema column order; integers in decimal, reals at 5 decimal places, text verbatim, NULL as empty — the corpus's export format, applied symmetrically to both sides):
- TP — a carved row equal to an answer-key deleted row.
- FP — a carved row equal to neither a deleted nor a live row (a phantom).
- live-re-read — a carved row equal to a live row; counted separately, never folded into FP, so the two very different failure modes stay distinct.
- FN — an answer-key deleted (substrate-recoverable) row no carved row matched.
Recall is reported with two denominators:
- substrate-limited recall = TP /
|D_recoverable|— of the deleted rows whose bytes physically survive in the file (a corpus property computed independently of our carver, by byte-presence), how many did we recover? The carver-capability number. - end-to-end recall = TP /
|D_deleted|— of all rows the workload deleted (some destroyed by later overwrites), how many did we recover? The examiner-usefulness number.
F2 is F-beta with β = 2 (recall-weighted, since missing evidence costs an
examiner more than discarding a low-confidence phantom), over precision and
substrate-limited recall.
Our carver — per-database detail (computed)¶
The head-to-head above totals each tool per category. This section breaks our
carver down per database (from nemetz_metrics.rs), so a low category recall
can be traced to specific files. It differs from the head-to-head in two
deliberate ways, both of which slightly raise the count reported here versus the
head-to-head's ours row: it matches on the full decoded row (all columns,
reals at 5 dp) rather than the (col1,col2) projection, and it includes
0C-06/0C-07 (whose float key columns the cross-tool table must drop). So our
0C total here is 79 (over all ten 0C databases) versus 63 in the head-to-head
(eight databases) — the same carver, two compatible scopings.
Categories: 0C deleted records (in-page free block); 0D deleted then
overwritten; 0E deleted overflow records. Ddel = rows deleted; Drec = of
those, byte-present (recoverable substrate); live = live-re-reads (must be 0).
| DB | Ddel | Drec | TP | FP | FN | live | recall (substrate) | recall (e2e) | precision | F2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0C-01 | 7 | 7 | 6 | 0 | 1 | 0 | 0.857 | 0.857 | 1.000 | 0.882 |
| 0C-02 | 10 | 10 | 8 | 0 | 2 | 0 | 0.800 | 0.800 | 1.000 | 0.833 |
| 0C-03 | 7 | 7 | 6 | 1 | 1 | 0 | 0.857 | 0.857 | 0.857 | 0.857 |
| 0C-04 | 10 | 10 | 8 | 2 | 2 | 0 | 0.800 | 0.800 | 0.800 | 0.800 |
| 0C-05 | 10 | 10 | 10 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0C-06 | 7 | 7 | 7 | 0 | 0 | 0 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0C-07 | 10 | 10 | 9 | 0 | 1 | 0 | 0.900 | 0.900 | 1.000 | 0.918 |
| 0C-08 | 10 | 10 | 9 | 1 | 1 | 0 | 0.900 | 0.900 | 0.900 | 0.900 |
| 0C-09 | 10 | 10 | 5 | 0 | 5 | 0 | 0.500 | 0.500 | 1.000 | 0.556 |
| 0C-10 | 20 | 20 | 11 | 0 | 9 | 0 | 0.550 | 0.550 | 1.000 | 0.604 |
| 0C total | 101 | 101 | 79 | 4 | 22 | 0 | 0.782 | 0.782 | 0.952 | — |
| 0D-01 | 5 | 3 | 1 | 0 | 2 | 0 | 0.333 | 0.200 | 1.000 | 0.385 |
| 0D-02 | 5 | 2 | 1 | 0 | 1 | 0 | 0.500 | 0.200 | 1.000 | 0.556 |
| 0D-03 | 5 | 2 | 0 | 0 | 2 | 0 | 0.000 | 0.000 | 1.000 | 0.000 |
| 0D-04 | 5 | 5 | 1 | 0 | 4 | 0 | 0.200 | 0.200 | 1.000 | 0.238 |
| 0D-05 | 5 | 5 | 0 | 0 | 5 | 0 | 0.000 | 0.000 | 1.000 | 0.000 |
| 0D-06 | 10 | 9 | 1 | 0 | 8 | 0 | 0.111 | 0.100 | 1.000 | 0.135 |
| 0D-07 | 5 | 5 | 3 | 0 | 2 | 0 | 0.600 | 0.600 | 1.000 | 0.652 |
| 0D-08 | 5 | 5 | 4 | 0 | 1 | 0 | 0.800 | 0.800 | 1.000 | 0.833 |
| 0E-01 | 7 | 6 | 3 | 0 | 3 | 0 | 0.500 | 0.429 | 1.000 | 0.556 |
| 0E-02 | 5 | 3 | 0 | 0 | 3 | 0 | 0.000 | 0.000 | 1.000 | 0.000 |
(Dropped/overwritten-table categories 0A/0B carry no row-level deleted set
that anchors a live-vs-deleted distinction — the whole table is gone — so they are
reported as bounded dropped-table recovery, not in the recall matrix; their
correctness is measured by the DC3 differential below.)
Reading the numbers¶
- 0C precision 0.952 (79 TP / 83 carved): the only leaks are the phantom all-empty class (4 across 0C-03/04/08), never a live row. Freeblock reconstruction added 55 TP without adding a single live-re-read.
- 0C recall ≈ 78 % with every deleted row substrate-present: reconstruction now recovers the freeblock-head cells the forward parser missed; the residual FN are mostly 0C-09/0C-10 (whose freed cells have no freeblock chain — they sit in the unallocated gap with a destroyed prefix the template cannot anchor).
- 0D recall rose to ≈ 31 %: reconstruction recovers the freeblock residue,
but later
INSERTs overwrite freed slack (Drec < Ddel) and destroy more than in0C. - 0E (overflow): unchanged — these records spill to overflow pages, so the in-page freeblock template does not apply; the carver recovers the rows whose first overflow segment + header survived.
Freeblock reconstruction¶
This was the dominant FN class before task #56, and the fix that closed it.
When SQLite frees a cell from an allocated page, it converts the cell into a
freeblock (file-format §1.6): the first two bytes become the next-freeblock
pointer and the next two the freeblock size — overwriting the cell's first four
bytes, which span the payload-length varint, the rowid varint, the record
header_len varint, and the leading serial type(s). The record's surviving
serial-type tail and its whole value body remain intact after those four
bytes.
Database::reconstruct_freeblock_records rebuilds each freed cell from that
surviving tail plus a schema template derived from a live cell on the same
page — the table's column count, header length, and the serial types of the
leading columns that fall inside the clobbered prefix (e.g. a fixed-width id
column). It walks the page's freeblock chain (bounded, cycle-safe), reads the
surviving serials at the byte offset where the clobber ends, prepends the
template's leading serials, and decodes the body. The destroyed rowid is surfaced
as unknown (0) — never invented — and the record is graded LOW
(FREEBLOCK_RECONSTRUCT_CONFIDENCE), tagged RecoverySource::FreeblockReconstructed.
Precision is preserved by construction. A reconstructed candidate is emitted
only when every serial type is legal AND the whole record fits within the
freeblock's [offset, offset + size) bounds; the forensic layer additionally
drops any reconstruction whose decoded values match a live row (by value, since
the rowid is gone). The result: 0C recall 0.274 → 0.750 with 0 new phantoms and
0 live-re-reads — fqlite-level recall at higher precision (3 phantoms vs 16).
This is the published forensic technique (Nemetz et al. 2018; Pawlaszczyk & Hummert 2021), implemented from the SQLite file-format spec — not adapted from any GPL tool's source.
What the numbers do and do NOT claim¶
- They claim an honest, reproducible measurement of all three tools against
this independent corpus: after task #56, our carver matches fqlite's in-page
recall at the highest precision of the three and never re-reads a live row;
it Pareto-dominates fqlite on overflow (
0E); fqlite keeps a recall edge on0C/0Dby tolerating more phantoms; undark trails on both and over-reports live rows as deleted on the overwritten and overflow tables. - They do not claim our carver is "best" overall. fqlite still recovers more
raw rows on
0C(0.798 vs 0.750) and0D(0.556 vs 0.306); we trade those few rows for a structural 0-false-positive guarantee and the lowest phantom rate. - A low per-category recall (e.g.
0E) is a true statement about a capability boundary, not a harness artifact — the substrate partition proves the bytes are present and we still miss them.
Inter-tool concordance on our own fixture (agreement, not correctness)¶
Separate from the head-to-head above (which scores each tool against ground
truth), oracle_differential.rs reconciles our output against undark and fqlite
as oracles over our own deleted_places.db fixture — it answers "do we agree
with them?", not "how does each score?". On that fixture (whole-freed-page
deletion — the shape our carver handles), our output matches undark exactly
(163 rows) and matches-or-exceeds fqlite, and on the prior-version fixture we
match fqlite and exceed undark. This is genuine agreement on that deletion
shape, but it is inter-tool concordance, not ground truth — which is why the
Nemetz head-to-head (real answer keys) is the headline.
DC3 sqlite_dissect corpus — a no-false-positive regression set¶
The DC3 corpus carries no deleted-row ground truth: its expected_rows were
found (independently confirmed: freelist_count = 0, contiguous rowids,
expected == SELECT * on the readable DBs) to be the live table content, used
upstream only as a precision allow-list. We therefore keep it solely as a
no-false-positive / NoGenuineDeletion regression set (the carver must not
re-surface those intact live rows) plus a dropped-table recovery check on
0A-01/0A-02. No precision/recall is computed from it.
WAL-resident records¶
Out of scope for the carver: a row that exists only in an uncheckpointed -wal
overlay is live (not yet checkpointed), not deleted, so it is not a carving
target. sqlite-core surfaces the WAL-applied view via
Database::open_with_wal, and the auditor flags an active overlay as
WalUncheckpointedState. Recovering genuinely deleted content from within WAL
frames, with WAL-sequencing ground truth, is the subject of a separate
(NIST CFReDS-based) evaluation not yet landed.