AFF4 Implementation Notes
Format quirks and empirically verified behaviour, for contributors. Every byte-level claim here is reconciled against the AFF4 reference corpus and pyaff4 (see Corpus Validation).
1. AFF4 is a ZIP container with RDF/Turtle metadata
AFF4 stores forensic images as standard ZIP archives. The reader reads through
zip-forensic-core (our read-only forensic ZIP reader); the third-party zip crate
is a test-only fixture writer (see
docs/decisions/0005-read-via-zip-forensic-core-writer-is-test-only.md). The metadata
is an RDF/Turtle document named information.turtle inside the ZIP. Disk-image data
is stored as "bevy" segments:
{base}/{segment_idx:08x} ← chunk data for this bevy
{base}/{segment_idx:08x}.index ← chunk index for this bevy
{base} derives from the stream ARN with the aff4:// scheme stripped. Real
Evimetry / aff4-imager images URL-encode the IRI as aff4%3A%2F%2F{uuid}/…;
synthetic fixtures use the bare path. The reader detects which form the ZIP uses.
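The naming scheme above can be sketched as follows. This is a minimal illustration, not the reader's actual code: `BaseForm`, `segment_base`, and `bevy_names` are hypothetical names, and the sketch assumes that only the `aff4://` scheme prefix is percent-encoded, as in the pattern shown.

```rust
/// Which spelling of the stream base the ZIP member names use.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BaseForm {
    /// Synthetic fixtures: `aff4://` stripped, bare path.
    Bare,
    /// Evimetry / aff4-imager: the scheme is kept, percent-encoded.
    UrlEncoded,
}

/// Build the `{base}` prefix for a stream ARN such as `aff4://{uuid}/data`.
pub fn segment_base(arn: &str, form: BaseForm) -> String {
    // Drop the scheme if present; otherwise treat the input as already bare.
    let rest = arn.strip_prefix("aff4://").unwrap_or(arn);
    match form {
        BaseForm::Bare => rest.to_string(),
        // Assumption: only "://" and ":" in the scheme are encoded.
        BaseForm::UrlEncoded => format!("aff4%3A%2F%2F{rest}"),
    }
}

/// ZIP member names for one bevy: (data segment, index segment).
pub fn bevy_names(base: &str, segment_idx: u32) -> (String, String) {
    let data = format!("{base}/{segment_idx:08x}");
    let index = format!("{data}.index");
    (data, index)
}
```

A reader could probe the archive for both forms of the first bevy name and keep whichever exists.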
2. Bevy index: 12-byte (offset, length) entries
The .index file is a packed array of 12-byte little-endian entries, one per
chunk: (u64 byte_offset, u32 length) — the chunk's position and stored (possibly
compressed) size within the bevy segment.
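let base = chunk_in_seg * 12;
// from_le_bytes takes a fixed-size array, so convert each sub-slice first.
let offset = u64::from_le_bytes(index[base..base + 8].try_into().unwrap()) as usize;
let length = u32::from_le_bytes(index[base + 8..base + 12].try_into().unwrap()) as usize;
let (start, end) = (offset, offset + length);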
A zero-length entry marks a sparse (all-zero) chunk. A chunk whose stored
length equals aff4:chunkSize was written uncompressed — copy it verbatim,
regardless of the stream's compressionMethod.
This 12-byte layout is verified by reproducing Evimetry's stored aff4:hash
digests; a 4-byte cumulative-end interpretation reproduces none of them and mis-reads
every real image (it reads Base-Linear sector 0 as zeros instead of its MBR).
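The three cases an index entry can describe (sparse, stored verbatim, compressed) can be made explicit with a small classifier. This is a sketch under the layout stated above; `ChunkKind` and `classify` are hypothetical names, and the function returns `None` rather than panicking when the index is too short.

```rust
use std::convert::TryInto;

/// How a chunk's bytes should be produced, per its index entry.
#[derive(Debug, PartialEq)]
pub enum ChunkKind {
    /// Zero-length entry: the chunk is all zeros.
    Sparse,
    /// Stored length equals aff4:chunkSize: copy the bytes verbatim.
    Verbatim { offset: u64 },
    /// Anything else: run the stream's codec over `length` stored bytes.
    Compressed { offset: u64, length: u32 },
}

/// Decode entry `chunk_in_seg` of a bevy `.index`; `None` if out of range.
pub fn classify(index: &[u8], chunk_in_seg: usize, chunk_size: u32) -> Option<ChunkKind> {
    let base = chunk_in_seg.checked_mul(12)?;
    let entry = index.get(base..base.checked_add(12)?)?;
    // Each entry is (u64 byte_offset, u32 length), little-endian.
    let offset = u64::from_le_bytes(entry[..8].try_into().ok()?);
    let length = u32::from_le_bytes(entry[8..].try_into().ok()?);
    Some(match length {
        0 => ChunkKind::Sparse,
        n if n == chunk_size => ChunkKind::Verbatim { offset },
        n => ChunkKind::Compressed { offset, length: n },
    })
}
```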
3. Compression
aff4:compressionMethod selects the codec:
- aff4:NullCompressor (or absent) → raw bytes
- aff4:DeflateCompressor → zlib (RFC 1950, 2-byte header + Adler-32), decoded with flate2::read::ZlibDecoder, not raw DEFLATE
- <http://code.google.com/p/snappy/> → raw Snappy, decoded with snap::raw::Decoder
- <https://github.com/lz4/lz4> → LZ4 frame (aff4-imager), decoded with lz4_flex::frame
All five AFF4 Standard reference images use Snappy.
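The selection step can be sketched as a pure mapping from the property value to a codec tag, leaving the decoding itself to the crates named above. `Codec` and `codec_for` are hypothetical names; the sketch assumes the value arrives either as a prefixed name (`aff4:…`) or as an IRI with or without angle brackets, and it treats any unrecognised value as an error rather than guessing.

```rust
/// Codec tag chosen from aff4:compressionMethod (illustrative type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Codec {
    Null,
    Zlib,
    Snappy,
    Lz4Frame,
}

/// Map the property value to a codec; `None` means "unknown method".
/// An absent property (`method == None`) selects the null codec.
pub fn codec_for(method: Option<&str>) -> Option<Codec> {
    // Tolerate IRIs written as <...> by stripping the angle brackets.
    match method.map(|m| m.trim_matches(|c| c == '<' || c == '>')) {
        None | Some("aff4:NullCompressor") => Some(Codec::Null),
        Some("aff4:DeflateCompressor") => Some(Codec::Zlib),
        Some("http://code.google.com/p/snappy/") => Some(Codec::Snappy),
        Some("https://github.com/lz4/lz4") => Some(Codec::Lz4Frame),
        Some(_) => None,
    }
}
```

Keeping the mapping separate from the decoders makes the "unknown method" case testable without pulling in any compression crate.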
4. RDF/Turtle parsing is intentionally minimal
Rather than a full Turtle parser: normalize whitespace and ; to spaces, split on
" . " (the RDF node delimiter), find the relevant block, and extract the IRI
(<…>) and predicate-value pairs by token scanning. This handles aff4-cpp,
aff4-imager, and pyaff4 output. One real-world quirk it must absorb: pyaff4 writes a
trailing comma attached to a hash datatype (…"^^aff4:MD5,), so the datatype token
is trimmed of trailing non-alphanumerics.
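A rough sketch of the splitting and trimming steps, using only the standard library. `split_blocks` and `clean_datatype` are hypothetical names, and the sketch shares the approach's known limitation: a `;` or ` . ` inside a string literal would be mangled, which is assumed not to occur in the metadata these tools emit.

```rust
/// Split a Turtle document into per-node blocks.
pub fn split_blocks(turtle: &str) -> Vec<String> {
    // Turn ';' into a space, then collapse every whitespace run to one space.
    let flat: String = turtle
        .replace(';', " ")
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ");
    // Pad with a trailing space so the final " ." also matches " . ".
    format!("{flat} ")
        .split(" . ")
        .map(|b| b.trim().to_string())
        .filter(|b| !b.is_empty())
        .collect()
}

/// Trim trailing non-alphanumerics from a datatype token,
/// e.g. the stray comma pyaff4 attaches to a hash datatype.
pub fn clean_datatype(token: &str) -> &str {
    token.trim_end_matches(|c: char| !c.is_alphanumeric())
}
```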
5. Geometry validation is mandatory before opening
aff4:chunkSize and aff4:chunksInSegment feed division in read_chunk; a value
of 0 would divide-by-zero. Both are rejected at parse time with a BadFormat error.
This is a consequence of the reader's arithmetic, not stated in the spec.
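One way to enforce this is to make a zero value unrepresentable once parsing has succeeded. The sketch below is illustrative only: `Geometry`, `GeometryError`, and `locate` are hypothetical names, and the error type merely mirrors the BadFormat error mentioned above.

```rust
/// Illustrative error type mirroring the reader's BadFormat error.
#[derive(Debug, PartialEq)]
pub enum GeometryError {
    BadFormat(&'static str),
}

/// Stream geometry that is known to be safe to divide by.
#[derive(Debug, PartialEq)]
pub struct Geometry {
    pub chunk_size: u32,
    pub chunks_in_segment: u32,
}

impl Geometry {
    /// Reject zero values at parse time, before any read can divide by them.
    pub fn new(chunk_size: u32, chunks_in_segment: u32) -> Result<Self, GeometryError> {
        if chunk_size == 0 {
            return Err(GeometryError::BadFormat("aff4:chunkSize must be non-zero"));
        }
        if chunks_in_segment == 0 {
            return Err(GeometryError::BadFormat("aff4:chunksInSegment must be non-zero"));
        }
        Ok(Self { chunk_size, chunks_in_segment })
    }

    /// (segment index, chunk index within that segment) for a byte offset.
    pub fn locate(&self, offset: u64) -> (u64, u64) {
        let chunk = offset / u64::from(self.chunk_size);
        let per_seg = u64::from(self.chunks_in_segment);
        (chunk / per_seg, chunk % per_seg)
    }
}
```

Because `locate` is only reachable through a validated `Geometry`, neither division can see a zero divisor.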
6. Map streams and symbolic fills
Evimetry images use an aff4:Map as the top-level stream: a binary /map of
28-byte entries (map_offset, length, target_offset, target_id) plus an /idx
list of target URIs. A virtual address resolves to an ImageStream region or a
symbolic stream:
| Target | Fill |
|---|---|
| aff4:Zero | 0x00 |
| aff4:SymbolicStreamFF / aff4:SymbolicStream{XX} | constant byte 0xXX |
| aff4:UnknownData | tile UNKNOWN |
| aff4:UnreadableData | tile UNREADABLEDATA |
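Decoding the binary /map can be sketched as below. The field widths are an assumption: three little-endian u64 values followed by a u32 target_id, which is the split that adds up to the 28 bytes stated above. `MapEntry` and `parse_map` are hypothetical names.

```rust
use std::convert::TryInto;

/// One /map entry. Widths are assumed: 8 + 8 + 8 + 4 = 28 bytes.
#[derive(Debug, PartialEq)]
pub struct MapEntry {
    pub map_offset: u64,
    pub length: u64,
    pub target_offset: u64,
    /// Index into the /idx list of target URIs.
    pub target_id: u32,
}

const ENTRY_LEN: usize = 28;

/// Parse a whole /map segment; `None` if its size is not a multiple of 28.
pub fn parse_map(map: &[u8]) -> Option<Vec<MapEntry>> {
    if map.len() % ENTRY_LEN != 0 {
        return None;
    }
    map.chunks_exact(ENTRY_LEN)
        .map(|e| {
            Some(MapEntry {
                map_offset: u64::from_le_bytes(e[0..8].try_into().ok()?),
                length: u64::from_le_bytes(e[8..16].try_into().ok()?),
                target_offset: u64::from_le_bytes(e[16..24].try_into().ok()?),
                target_id: u32::from_le_bytes(e[24..28].try_into().ok()?),
            })
        })
        .collect()
}
```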
Tile fills follow pyaff4: byte(p) = seed[(p % 1_048_576) % seed.len()] with
p = target_offset + offset_within_region. The 1 MiB modulus introduces a seam at
each 1 MiB boundary. Base-Allocated fills unallocated regions with UnknownData;
this is the substance behind the corpus's "allocated" maps.
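The tile formula translates directly into code. In this sketch `tile_byte` and `fill_tile` are hypothetical names, and the seed is assumed to be non-empty (an empty seed would panic on the modulus).

```rust
/// Window size after which the tile pattern restarts (1 MiB).
const TILE_WINDOW: u64 = 1_048_576;

/// byte(p) = seed[(p % 1_048_576) % seed.len()]; `seed` must be non-empty.
pub fn tile_byte(seed: &[u8], p: u64) -> u8 {
    seed[((p % TILE_WINDOW) % seed.len() as u64) as usize]
}

/// Fill `out` for a region whose first byte sits at `target_offset`
/// in the symbolic stream.
pub fn fill_tile(seed: &[u8], target_offset: u64, out: &mut [u8]) {
    for (i, b) in out.iter_mut().enumerate() {
        *b = tile_byte(seed, target_offset + i as u64);
    }
}
```

With the 7-byte seed UNKNOWN, the seam is visible because 1_048_576 is not a multiple of 7: position 1_048_575 yields `N` (index 3), and position 1_048_576 restarts at `U` instead of continuing with `O`.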
7. AFF4-Logical (AFF4-L)
An AFF4-L container stores logical files as named ZIP segments described by
aff4:FileImage nodes (path, aff4:size, aff4:hash, timestamps). It has no
virtual disk and no bevy/chunk/map machinery — LogicalContainer reads each file's
content straight from its ZIP segment. Validated against pyaff4's dream.aff4.
8. Encryption: refuse by default, decrypt with a key
aff4:EncryptedStream (AES-XTS, password-wrapped keybag) is decrypted with a
password via LogicalContainer::open_encrypted — PBKDF2-HMAC-SHA256 → RFC 3394 key
unwrap → AES-128-XTS. The passwordless Aff4Reader::open still refuses with
Aff4Error::Encrypted (secure-by-default), never decoding ciphertext as plaintext,
and certificate/public-key keybags are refused as unsupported. See
docs/decisions/0002-encrypted-streams-secure-by-default-decryption.md.
9. Container-kind detection
aff4::container_kind(&Path) -> ContainerKind {Disk, Logical, Encrypted} classifies
a container from its information.turtle alone (no full reader open), so a consumer
can dispatch without reaching into the RDF or trying each opener in turn. See
docs/decisions/0004-container-kind-detection-probe.md.
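To illustrate the idea of a metadata-only probe, here is a deliberately naive sketch. It is not the crate's implementation: `classify_turtle` is a hypothetical function, the local `ContainerKind` only mirrors the variants named above, and the rule it applies (look for the aff4:EncryptedStream and aff4:FileImage types, with encryption taking precedence) is an assumption about how such a probe could work.

```rust
/// Local mirror of the three kinds named above (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContainerKind {
    Disk,
    Logical,
    Encrypted,
}

/// Hypothetical probe over the text of information.turtle.
/// Assumed precedence: Encrypted, then Logical, else Disk.
pub fn classify_turtle(turtle: &str) -> ContainerKind {
    if turtle.contains("aff4:EncryptedStream") {
        ContainerKind::Encrypted
    } else if turtle.contains("aff4:FileImage") {
        ContainerKind::Logical
    } else {
        ContainerKind::Disk
    }
}
```

A consumer would then `match` on the returned kind and call the appropriate opener, so a wrong-type container is never opened speculatively.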