Input conventions
The most dangerous failure for a timestamp tool is silently wrong output: a value decoded under the wrong input encoding produces a valid-looking date that is simply incorrect. This chapter maps the encoding ambiguities and how timeglyph handles them.
Format is not encoding
A reading needs two independent things:
- Format (semantics) — the epoch, unit, calendar, and leap policy (e.g. "FILETIME: 100 ns since 1601 UTC").
- Encoding (presentation) — how the raw input represents the value(s) the format consumes: radix, endianness, width, signedness, and field packing.
The same FILETIME instant can arrive as a decimal QWORD (132223104000000000),
as little-endian hex bytes (a registry .reg export), or as a big-endian dump.
The format is identical; the encoding differs. Conflating the two is what produces
silent errors.
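The split can be sketched in a few lines of Python. This is an illustration only, not timeglyph's code: the helper names (`filetime_to_datetime`, `from_decimal`, `from_hex_bytes`) are hypothetical. The format step is written once, and each encoding gets its own small decoder that produces the integer the format consumes.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks: int) -> datetime:
    """Format step: 100 ns ticks since 1601-01-01 UTC (sub-microsecond part dropped)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def from_decimal(text: str) -> int:
    """Encoding step: a decimal QWORD."""
    return int(text, 10)


def from_hex_bytes(text: str, byteorder: str) -> int:
    """Encoding step: 8 raw bytes as hex, with the byte order stated explicitly."""
    raw = bytes.fromhex(text)
    if len(raw) != 8:
        raise ValueError("a FILETIME is exactly 8 bytes")
    return int.from_bytes(raw, byteorder)


ticks = 132223104000000000
le_hex = ticks.to_bytes(8, "little").hex()  # byte order of a little-endian dump
be_hex = ticks.to_bytes(8, "big").hex()     # byte order of a big-endian dump

# Three encodings, one format, one instant (2020-01-01 00:00:00 UTC).
readings = [
    filetime_to_datetime(from_decimal("132223104000000000")),
    filetime_to_datetime(from_hex_bytes(le_hex, "little")),
    filetime_to_datetime(from_hex_bytes(be_hex, "big")),
]

# The same bytes under the wrong byte order give a different integer entirely.
mismatched = from_hex_bytes(le_hex, "big")
```

Keeping the two steps in separate functions means a wrong encoding guess has to be made explicitly rather than being baked into the format decoder.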
The axes of input ambiguity
| Axis | Example trap |
|---|---|
| Representation | decimal vs hex vs float vs string for the same value |
| Endianness | hex bytes read LE vs BE — different numbers entirely |
| Width & signedness | 0xFFFFFFFF read as signed -1, as a sentinel, or as unsigned Unix seconds landing in 2106 |
| Packing & field order | which 16-bit word is the date vs time; on-disk byte order vs a packed integer |
| Sub-second sub-encoding | OLE's "two ints separated by a dot" vs a real IEEE double |
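Two of these axes are easy to reproduce with Python's standard `struct` module. The sketch below is illustrative and assumes the values are 32-bit Unix seconds; the byte strings are made-up inputs, not output from timeglyph.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Width & signedness: the same four bytes, read as signed and as unsigned.
all_ones = bytes.fromhex("ffffffff")
as_signed = struct.unpack("<i", all_ones)[0]    # -1
as_unsigned = struct.unpack("<I", all_ones)[0]  # 4294967295
unsigned_instant = UNIX_EPOCH + timedelta(seconds=as_unsigned)  # 2106-02-07 06:28:15 UTC

# Endianness: the same four bytes, read little-endian and big-endian.
raw = bytes.fromhex("00e10b5e")
le_seconds = struct.unpack("<I", raw)[0]  # 1577836800
be_seconds = struct.unpack(">I", raw)[0]  # 14748510
le_instant = UNIX_EPOCH + timedelta(seconds=le_seconds)  # 2020-01-01 00:00:00 UTC
be_instant = UNIX_EPOCH + timedelta(seconds=be_seconds)  # a date in 1970
```

Every one of these readings is a well-formed value, which is why none of them can be rejected on plausibility alone.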
The FAT case (a worked example)
FAT/DOS packs a 16-bit date word and a 16-bit time word. The DOS packed
convention is date-word-then-time-word, but a FAT directory entry stores
time-word then date-word (each little-endian). The same four bytes therefore
mean two different instants depending on word order — feeding raw directory bytes
under the DOS order silently swaps date and time into a plausible wrong date.
timeglyph's hex path decodes both orders and labels each
(date|time vs time|date (directory order)), so the analyst chooses rather than
being silently misled.
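A minimal sketch of decoding both orders, using the standard FAT bit layout (date word: 7 bits year since 1980, 4 bits month, 5 bits day; time word: 5 bits hour, 6 bits minute, 5 bits two-second count). The function names are hypothetical and this is not timeglyph's implementation. It assumes both layouts store each word little-endian, and the four sample bytes were chosen so that both readings are valid dates.

```python
from datetime import datetime
from typing import Dict, Optional


def decode_words(date_word: int, time_word: int) -> Optional[datetime]:
    """Unpack FAT date/time bit fields; return None if the fields are not a valid date."""
    year = 1980 + (date_word >> 9)
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    hour = time_word >> 11
    minute = (time_word >> 5) & 0x3F
    second = (time_word & 0x1F) * 2  # stored in 2-second units
    try:
        # FAT times are local time with no zone, hence a naive datetime.
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None


def fat_readings(raw: bytes) -> Dict[str, Optional[datetime]]:
    """Decode 4 bytes under both word orders and label each reading."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    first = int.from_bytes(raw[0:2], "little")
    second = int.from_bytes(raw[2:4], "little")
    return {
        "date|time": decode_words(first, second),
        "time|date (directory order)": decode_words(second, first),
    }


readings = fat_readings(bytes.fromhex("ca602150"))
# "time|date (directory order)" -> 2020-01-01 12:06:20
# "date|time"                   -> 2028-06-10 10:01:02
```

Both results are valid calendar dates a few years apart, so nothing in the bytes themselves reveals which word order was intended.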
Things a naïve design gets wrong
These are real traps (several flagged in an adversarial design review):
- Width inferred from length. 00000001 may intentionally be a u32: leading zeros are evidence, not decoration. Never guess width from string length.
- UUID/GUID textual order ≠ raw byte order, and UUIDv1 rearranges its timestamp fields, so decoding the text as raw bytes gives a plausible wrong date.
- SYSTEMTIME is not a timestamp. It is a struct of calendar fields (LE words), and it may be local or UTC depending on the API that wrote it.
- Composite/split timestamps. Date and time live in separate columns; decoding one field alone yields plausible nonsense.
- Hex is a syntax, not an encoding model. --hex/hex says "these are bytes", not "this is how the format lays them out". The layout is separate.
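The UUIDv1 trap can be shown with Python's standard `uuid` module, whose `UUID.time` property reassembles the 60-bit timestamp (100 ns ticks since 1582-10-15) from its three fields. The sketch below builds a sample UUID from a known instant and compares the reassembled value with a naive read of the leading bytes. The helper name and the sample clock sequence and node values are arbitrary assumptions.

```python
import uuid
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_instant(u: uuid.UUID) -> datetime:
    """Reassemble the UUIDv1 timestamp fields and convert to a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# Build a sample v1 UUID for a known instant (clock sequence and node are arbitrary).
target = datetime(2020, 1, 1, tzinfo=timezone.utc)
ticks = (target - GREGORIAN_EPOCH) // timedelta(microseconds=1) * 10
sample = uuid.UUID(fields=(
    ticks & 0xFFFFFFFF,                 # time_low
    (ticks >> 32) & 0xFFFF,             # time_mid
    ((ticks >> 48) & 0x0FFF) | 0x1000,  # time_hi_and_version (version 1)
    0x80,                               # clock_seq_hi_and_reserved (RFC 4122 variant)
    0x00,                               # clock_seq_low
    0x000000000001,                     # node
))

correct = uuid1_instant(sample)

# Naive read: treat the first 8 bytes as one big-endian integer.
# The fields are stored low, mid, high, so this is not the timestamp.
naive_ticks = int.from_bytes(sample.bytes[:8], "big")
```

Here `naive_ticks` differs from `sample.time` because the low 32 bits of the timestamp come first in the byte layout and the version nibble sits inside the last field.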
How timeglyph stays safe
- Ranked candidates, never one verdict: an ambiguous input surfaces every plausible reading, scored, so no single (possibly wrong) answer looks authoritative. See Methodology.
- Explicit, labelled byte layouts: the hex path tries LE/BE widths and packed on-disk layouts, each labelled with its assumption.
- Sentinels are flagged: magic "unset/never" values do not masquerade as instants. See Sentinel values.
- Pipeline-safe exit codes: identify exits 2 ("review needed") when the top reading is a sentinel or tied for the top score, so a script cannot mistake an ambiguous result for a confident one. (0 = clear winner, 1 = error.)
- Every reading states its assumptions, including the timezone caveat for local-time formats and the leap-smear caveat for POSIX readings.
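A calling script can act on the exit codes along the following lines. This is a sketch under assumptions: only the three exit codes come from the description above. The command line is deliberately left to the caller because the exact invocation is not specified here, and treating any undocumented exit code as an error is a conservative choice made for this example.

```python
import subprocess
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    CLEAR = "clear winner"
    ERROR = "error"
    REVIEW = "review needed"


def classify_exit(code: int) -> Verdict:
    """Map an identify exit code to an action for the calling script."""
    if code == 0:
        return Verdict.CLEAR
    if code == 2:
        return Verdict.REVIEW
    # 1 is the documented error code; any other value is treated the same
    # way here (an assumption of this sketch, not documented behaviour).
    return Verdict.ERROR


def run_identify(argv: Sequence[str]) -> Verdict:
    """Run a caller-supplied identify command line and classify its exit code."""
    completed = subprocess.run(list(argv), capture_output=True, text=True)
    return classify_exit(completed.returncode)
```

A pipeline would then accept the top reading only on `Verdict.CLEAR` and route `Verdict.REVIEW` results to an analyst instead of storing them as facts.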
Decimal-packed: allowed, but never authoritative
Analysts often have a decimal pulled from a database column, not raw bytes, so
timeglyph does not forbid decimal input for packed formats. But because field
and byte order are not derivable from a bare scalar, such a reading is marked
ambiguous and ranked so it never looks authoritative — prefer the raw hex bytes
when you have them.