forensic-catalog — Data Architecture

How raw forensic bytes become typed, MITRE-tagged, triage-prioritised records

Architecture diagram (v0.1.0), reconstructed as text. The diagram shows two paths: the STRUCTURED DECODE PATH (decode pipeline) and the FAST PATH.

Evidence / Input (raw bytes)

  Registry Key Value: HKLM · HKCU · NTUSER.DAT
    raw bytes from RegQueryValueEx / hive parser
  File / Filesystem Artifact: Prefetch · Amcache · EVTX · MFT · LNK · Hive
    raw bytes from disk image, live acquisition, or triage tool
  Event Log / Memory Region: EVTX record · process memory · shimcache entry
    raw bytes from WinAPI, live memory, or image

STRUCTURED DECODE PATH

ContainerSignature: format recognition
  header_magic: &[u8] · footer_magic: &[u8] · header_offset: usize · min_size: usize
  invariants: &[&str], structural rules that must hold (e.g. "regf magic at offset 0")

ArtifactDescriptor: THE SCHEMA (const-constructible, lives in static memory, zero heap cost)
  Location
    hive: HiveTarget
    key_path: &'static str
    value_name: Option<&str>
    file_path: Option<&str>
    os_scope: OsScope
    scope: DataScope
  Decode Spec (defines output shape)
    decoder: Decoder
    fields: &[FieldSchema]
    retention: Option<&str>
  Threat Context
    mitre_techniques: &[&str]
  Triage
    triage_priority: Critical / High / Medium / Low
  Provenance
    sources: &[&str], authoritative URLs

FieldSchema: defines each output column
  name: &'static str
  value_type: ValueType (Text · Integer · UnsignedInt · Timestamp · Bytes · Bool · List)
  description: &'static str
  is_uid_component: bool (participates in uid hash)

BinaryField: for the BinaryRecord decoder variant
  name: &'static str
  offset: usize
  field_type: BinaryFieldType (U16Le · U32Le · U64Le · FiletimeLe)

Decoder: transform engine
  selected by the ArtifactDescriptor.decoder variant · operates on raw bytes · no allocation in hot path
    Rot13Name             ROT13 on value name
    FiletimeAt{offset}    FILETIME → ISO 8601
    BinaryRecord(fields)  little-endian struct
    MruListEx             MRU order
    MultiSz               REG_MULTI_SZ

ArtifactRecord: decoded output (owned · heap-allocated · fully typed · ready for downstream consumption)
  uid: String, built from FieldSchema.is_uid_component fields
  fields: Vec<(&'static str, ArtifactValue)>, zipped from FieldSchema names + decoded values
  timestamp: Option<String>, ISO 8601 UTC when present
  mitre_techniques: Vec<&str>, carried from descriptor
  confidence: f32 · meaning: String

ArtifactValue (enum): one variant per field in the record
  Text(String) · Integer(i64) · UnsignedInt(u64) · Timestamp(String) ← ISO 8601 · Bytes(Vec<u8>) · List(Vec<ArtifactValue>)

ForensicCatalog: query API (lookup by id / filter)
  by_id(id) · filter(query) · by_mitre(technique) · for_triage()
  filter_by_keyword(kw) · decode(descriptor, name, bytes) → ArtifactRecord
  CATALOG: static ForensicCatalog, zero allocation at startup

Threat Context (carried): set on descriptor · present in record
  mitre_techniques[]: T1547.001 · T1218.011 ···
  triage_priority: Critical(3) · High(2) · Med(1) · Low(0)
  sources[]: authoritative URLs

FAST PATH

Indicator Tables: flat lookup · no decode
  is_suspicious_port(u16) · is_windows_lolbin(&str) · identify_application(&str) · all_lolrmm_paths()
  Input: primitive · Output: bool / &str
  No schema, no decoder · zero heap allocation
  → match / no-match: detection · triage rule · SIEM integration

Legend: Evidence / Input · Schema / Struct · Transform / Decoder · Output / Record · Query API · Threat Context · decode pipeline · fast path
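The diagram states that an ArtifactRecord's uid is built from the fields flagged is_uid_component, and that its fields are zipped from the FieldSchema names and the decoded values. The following is a minimal sketch of that idea, not the crate's implementation. The hash function (std's DefaultHasher), the uid format, the build_record helper, the SCHEMA table, and the trimmed-down types are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Subset of the ArtifactValue variants named in the diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum ArtifactValue {
    Text(String),
    UnsignedInt(u64),
}

/// Trimmed FieldSchema: only the members this sketch needs.
pub struct FieldSchema {
    pub name: &'static str,
    pub is_uid_component: bool,
}

#[derive(Debug)]
pub struct ArtifactRecord {
    pub uid: String,
    pub fields: Vec<(&'static str, ArtifactValue)>,
}

/// Hypothetical helper: zips schema names with decoded values and hashes
/// only the uid-component fields. Returns None on a length mismatch.
pub fn build_record(
    artifact_id: &str,
    schema: &'static [FieldSchema],
    values: Vec<ArtifactValue>,
) -> Option<ArtifactRecord> {
    if schema.len() != values.len() {
        return None;
    }
    // Assumption: the source does not say which hash is used.
    let mut hasher = DefaultHasher::new();
    artifact_id.hash(&mut hasher);
    for (field, value) in schema.iter().zip(&values) {
        if field.is_uid_component {
            field.name.hash(&mut hasher);
            match value {
                ArtifactValue::Text(s) => s.hash(&mut hasher),
                ArtifactValue::UnsignedInt(n) => n.hash(&mut hasher),
            }
        }
    }
    let uid = format!("{}:{:016x}", artifact_id, hasher.finish());
    let fields = schema.iter().map(|f| f.name).zip(values).collect();
    Some(ArtifactRecord { uid, fields })
}

/// Illustrative schema: the program name identifies the record,
/// the run count does not.
pub static SCHEMA: [FieldSchema; 2] = [
    FieldSchema { name: "program", is_uid_component: true },
    FieldSchema { name: "run_count", is_uid_component: false },
];
```

With this layout, two observations of the same program with different run counts share a uid, which is the deduplication behaviour the is_uid_component flag suggests. DefaultHasher output is not guaranteed stable across Rust releases, so a real catalog would presumably use a fixed, documented hash.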

ArtifactDescriptor is the schema
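
The diagram describes ArtifactDescriptor as const-constructible and living in static memory with zero heap cost. A minimal sketch of what that can look like in Rust follows. Field names are taken from the diagram; the id field, the enum variants of HiveTarget, OsScope and DataScope, the USERASSIST entry, and its MITRE tag and priority are assumptions for illustration, and the BinaryRecord decoder variant is left out for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HiveTarget { Hklm, Hkcu, NtUser } // assumed variants

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OsScope { Windows } // assumed variant

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataScope { User, System } // assumed variants

/// Numeric levels follow the diagram: Critical(3) High(2) Med(1) Low(0).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TriagePriority { Low = 0, Medium = 1, High = 2, Critical = 3 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueType { Text, Integer, UnsignedInt, Timestamp, Bytes, Bool, List }

/// BinaryRecord(fields) omitted here to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decoder { Rot13Name, FiletimeAt { offset: usize }, MruListEx, MultiSz }

#[derive(Debug)]
pub struct FieldSchema {
    pub name: &'static str,
    pub value_type: ValueType,
    pub description: &'static str,
    pub is_uid_component: bool,
}

#[derive(Debug)]
pub struct ArtifactDescriptor {
    pub id: &'static str, // assumed: implied by by_id(id) in the query API
    pub hive: HiveTarget,
    pub key_path: &'static str,
    pub value_name: Option<&'static str>,
    pub file_path: Option<&'static str>,
    pub os_scope: OsScope,
    pub scope: DataScope,
    pub decoder: Decoder,
    pub fields: &'static [FieldSchema],
    pub retention: Option<&'static str>,
    pub mitre_techniques: &'static [&'static str],
    pub triage_priority: TriagePriority,
    pub sources: &'static [&'static str],
}

/// Illustrative entry. Every member is a literal or a reference to static
/// data, so the whole value is built at compile time without the heap.
pub static USERASSIST: ArtifactDescriptor = ArtifactDescriptor {
    id: "userassist",
    hive: HiveTarget::NtUser,
    key_path: r"Software\Microsoft\Windows\CurrentVersion\Explorer\UserAssist",
    value_name: None,
    file_path: None,
    os_scope: OsScope::Windows,
    scope: DataScope::User,
    decoder: Decoder::Rot13Name,
    fields: &[FieldSchema {
        name: "program",
        value_type: ValueType::Text,
        description: "ROT13-decoded value name",
        is_uid_component: true,
    }],
    retention: None,
    mitre_techniques: &["T1204.002"], // illustrative tag, not from the source
    triage_priority: TriagePriority::Medium, // illustrative
    sources: &[], // authoritative URLs would go here
};
```

Because the descriptor holds only `&'static` slices and `Copy` enums, a catalog can be a plain static array of such values, which is consistent with the "zero allocation at startup" note on CATALOG.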

Decoder pipeline
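
The diagram lists the decoder variants and what each one does (ROT13 on the value name, FILETIME to ISO 8601, REG_MULTI_SZ, and so on). The sketch below shows how three of those transforms could be dispatched from a Decoder value. It is an illustration under assumptions: the function names rot13, filetime_to_iso8601, multi_sz and decode are hypothetical, the types are trimmed subsets, and the sketch allocates Strings for simplicity, whereas the diagram says the real hot path does not allocate.

```rust
/// Subset of the ArtifactValue variants in the diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum ArtifactValue { Text(String), Timestamp(String), List(Vec<ArtifactValue>) }

/// Subset of the Decoder variants in the diagram.
#[derive(Debug, Clone, Copy)]
pub enum Decoder { Rot13Name, FiletimeAt { offset: usize }, MultiSz }

/// ROT13 over ASCII letters; every other character passes through.
pub fn rot13(input: &str) -> String {
    input.chars().map(|c| match c {
        'a'..='m' | 'A'..='M' => (c as u8 + 13) as char,
        'n'..='z' | 'N'..='Z' => (c as u8 - 13) as char,
        _ => c,
    }).collect()
}

/// Days since 1970-01-01 to (year, month, day) in the proleptic Gregorian
/// calendar (the well-known civil-from-days algorithm).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Reads a little-endian FILETIME (100 ns ticks since 1601-01-01 UTC) at
/// `offset` and renders it as ISO 8601 UTC with second precision.
pub fn filetime_to_iso8601(bytes: &[u8], offset: usize) -> Option<String> {
    let end = offset.checked_add(8)?;
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes.get(offset..end)?);
    let ticks = u64::from_le_bytes(raw);
    // 11_644_473_600 seconds separate 1601-01-01 from 1970-01-01.
    let secs = (ticks / 10_000_000) as i64 - 11_644_473_600;
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    let sod = secs.rem_euclid(86_400);
    Some(format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
                 y, m, d, sod / 3600, sod % 3600 / 60, sod % 60))
}

/// Splits a REG_MULTI_SZ buffer (UTF-16LE, NUL-separated) into strings.
/// Empty segments, including the trailing terminator, are dropped.
pub fn multi_sz(bytes: &[u8]) -> Vec<String> {
    let units: Vec<u16> = bytes.chunks_exact(2)
        .map(|p| u16::from_le_bytes([p[0], p[1]]))
        .collect();
    units.split(|&u| u == 0)
        .filter(|s| !s.is_empty())
        .map(|s| String::from_utf16_lossy(s))
        .collect()
}

/// Hypothetical dispatch: the descriptor's decoder variant picks the transform.
pub fn decode(decoder: Decoder, name: &str, bytes: &[u8]) -> Option<ArtifactValue> {
    match decoder {
        Decoder::Rot13Name => Some(ArtifactValue::Text(rot13(name))),
        Decoder::FiletimeAt { offset } =>
            filetime_to_iso8601(bytes, offset).map(ArtifactValue::Timestamp),
        Decoder::MultiSz => Some(ArtifactValue::List(
            multi_sz(bytes).into_iter().map(ArtifactValue::Text).collect())),
    }
}
```

Modelling the decoder as a plain enum keeps the choice of transform in the static descriptor: adding an artifact that reuses an existing transform needs only a new descriptor entry, not new code.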

Two distinct data paths
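
Alongside the structured decode path, the diagram shows a fast path: indicator tables that take a primitive, return bool or &str, and involve no schema, decoder, or heap allocation. The sketch below illustrates that shape with the two function names that appear in the diagram. The table contents and the path-handling rule are assumptions for illustration; the crate's actual tables are not shown in the source.

```rust
/// Illustrative entries only (4444 is a common default listener port for
/// exploitation frameworks; 31337 is a long-standing backdoor port).
const SUSPICIOUS_PORTS: &[u16] = &[4444, 31337];

/// Illustrative entries only: a few well-known Windows living-off-the-land
/// binaries.
const WINDOWS_LOLBINS: &[&str] = &[
    "certutil.exe",
    "mshta.exe",
    "regsvr32.exe",
    "rundll32.exe",
];

/// Flat lookup: primitive in, bool out, no allocation.
pub fn is_suspicious_port(port: u16) -> bool {
    SUSPICIOUS_PORTS.contains(&port)
}

/// Flat lookup on a file name or path. Assumption: only the final path
/// component is compared, case-insensitively.
pub fn is_windows_lolbin(name: &str) -> bool {
    let file = name
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .unwrap_or(name);
    WINDOWS_LOLBINS.iter().any(|b| b.eq_ignore_ascii_case(file))
}
```

A caller such as a detection or triage rule can use these directly, for example `is_windows_lolbin(r"C:\Windows\System32\CertUtil.exe")`, without building an ArtifactDescriptor or producing an ArtifactRecord.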