This document records the intended pre-1.0 production format. All on-disk numeric fields are little-endian unless stated otherwise.
Implementations must not serialize on-disk structures with native-endian conversions, memory transmutation, or raw struct layout. Every numeric field must be encoded and decoded with an explicit byte order. Language bindings should expose parsed APIs rather than asking callers to reinterpret raw lockbox bytes.
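The explicit-byte-order rule above can be sketched in Rust; the helper names here are illustrative, not part of any real API:

```rust
// Sketch: every on-disk numeric field goes through an explicit
// little-endian conversion; no native-endian struct transmutation.
fn put_u64_le(buf: &mut Vec<u8>, v: u64) {
    buf.extend_from_slice(&v.to_le_bytes());
}

fn get_u64_le(buf: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(buf[off..off + 8].try_into().unwrap())
}

fn main() {
    let mut buf = Vec::new();
    put_u64_le(&mut buf, 0x0102_0304_0506_0708);
    // The byte order on disk is fixed regardless of host endianness.
    assert_eq!(buf[0], 0x08);
    assert_eq!(get_u64_le(&buf, 0), 0x0102_0304_0506_0708);
}
```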
The CI workflow .github/workflows/endian-interop.yml verifies this contract
by transferring lockbox fixtures between Linux x64, Linux arm64, macOS arm64,
and an emulated big-endian s390x environment. The big-endian job reads a
little-endian-created fixture and emits a fixture that Linux x64 reads back.
The physical unit of the lockbox is a fixed-size encrypted page. Higher-level structures such as TOC nodes, file chunks, environment variables, key directories, free-space indexes, and commit roots are encoded as objects inside pages. Public APIs must not expose page management.
The fixed header is 96 bytes and is the only mutable fixed-location structure in the file.
offset size field
0 8 magic: "LBX2HDR\0"
8 2 version: 3
10 2 header flags
12 4 header length: 96
16 8 latest commit-root page offset, or 0
24 8 latest commit sequence
32 8 latest public key-directory offset, or 0
40 16 public lockbox UUID
56 8 reserved
64 32 SHA-256 checksum over bytes 0..64
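A minimal serialization sketch of the table above, with an assumed function name. The SHA-256 checksum over bytes 0..64 is left zeroed here to keep the example dependency-free; real writers must fill it in:

```rust
// Sketch of the 96-byte fixed header layout. Field order and offsets
// follow the table; the checksum bytes are a placeholder.
fn encode_fixed_header(commit_root_off: u64, commit_seq: u64,
                       key_dir_off: u64, uuid: [u8; 16]) -> [u8; 96] {
    let mut h = [0u8; 96];
    h[0..8].copy_from_slice(b"LBX2HDR\0");           // magic
    h[8..10].copy_from_slice(&3u16.to_le_bytes());   // version
    h[10..12].copy_from_slice(&0u16.to_le_bytes());  // header flags
    h[12..16].copy_from_slice(&96u32.to_le_bytes()); // header length
    h[16..24].copy_from_slice(&commit_root_off.to_le_bytes());
    h[24..32].copy_from_slice(&commit_seq.to_le_bytes());
    h[32..40].copy_from_slice(&key_dir_off.to_le_bytes());
    h[40..56].copy_from_slice(&uuid);
    // h[56..64] reserved; h[64..96] must hold SHA-256 over h[0..64].
    h
}

fn main() {
    let h = encode_fixed_header(4096, 7, 8192, [0u8; 16]);
    assert_eq!(&h[0..8], b"LBX2HDR\0");
    assert_eq!(u16::from_le_bytes(h[8..10].try_into().unwrap()), 3);
}
```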
The lockbox UUID is public metadata. It exists so tools can identify a lockbox even if the file is renamed or moved. It must not be derived from file names, content, recipients, or passwords.
The header checksum is a domain-separated SHA-256 digest. It detects torn or malformed header updates. It is not a security boundary. Security decisions must be based on authenticated pages and authenticated commit roots.
The key-directory pointer remains in the fixed header because users need the key directory before the content key is unlocked. The key directory stores only unlock metadata and must not contain private file metadata.
Every page has the same physical size. The default page size is 8 MiB. Implementations may read and write whole pages for native storage. Browser/WASM clients may fetch page ranges over HTTP using the TOC offsets, but decryption still authenticates a whole page.
offset size field
0 8 magic: "LBX2PAG\0"
8 2 page header version: 2
10 2 public flags
12 4 header length
16 8 page id
24 8 commit sequence that wrote this page
32 12 AEAD nonce
44 4 encrypted body length
48 16 reserved header extension
64 32 SHA-256 checksum over bytes 0..64
H m encrypted body (H = header length, m = encrypted body length)
H+m p zero padding to the fixed page size
The nonce is generated per page write and must be unique for the content key. It must not be derived only from record kind, object kind, or commit sequence.
Only the page header is public. Object kinds, object lengths, logical paths, symlink targets, environment variable names, permissions, compression selection, and file contents are inside the encrypted body.
Page public-header checksums are generated and verified at the page cache/page-codec boundary. Higher-level TOC, file, recovery, and extraction code must not bypass the page cache for normal page reads or writes.
Page AEAD associated data includes:
- lockbox format domain string
- fixed header version
- lockbox UUID
- page id
- commit sequence
- public flags
- encrypted body length
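The associated-data list above can be sketched as a fixed-order byte concatenation; the domain string and function name here are assumed placeholders:

```rust
// Sketch: page AEAD associated data, concatenating the listed public
// fields in a fixed order with explicit little-endian encoding.
fn page_aad(header_version: u16, uuid: &[u8; 16], page_id: u64,
            commit_seq: u64, public_flags: u16, body_len: u32) -> Vec<u8> {
    let mut aad = Vec::new();
    aad.extend_from_slice(b"lockbox-page-v2\0"); // assumed domain string
    aad.extend_from_slice(&header_version.to_le_bytes());
    aad.extend_from_slice(uuid);
    aad.extend_from_slice(&page_id.to_le_bytes());
    aad.extend_from_slice(&commit_seq.to_le_bytes());
    aad.extend_from_slice(&public_flags.to_le_bytes());
    aad.extend_from_slice(&body_len.to_le_bytes());
    aad
}

fn main() {
    let a = page_aad(3, &[0u8; 16], 1, 1, 0, 1024);
    let b = page_aad(3, &[0u8; 16], 2, 1, 0, 1024);
    // Different page ids bind to different associated data, so a page
    // moved to another slot fails authentication.
    assert_ne!(a, b);
}
```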
The decrypted page body is an object container.
offset size field
0 1 page body version: 1
1 1 compression algorithm
2 1 compression profile
3 1 reserved
4 8 uncompressed object-stream length
12 4 reserved
16 n compressed or uncompressed object stream
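Parsing the 16-byte body header above is straightforward; the struct and function names are illustrative:

```rust
// Sketch: decoding the page body container header described above.
struct BodyHeader {
    version: u8,
    algo: u8,    // compression algorithm
    profile: u8, // compression profile
    uncompressed_len: u64,
}

fn decode_body_header(b: &[u8]) -> Option<BodyHeader> {
    if b.len() < 16 {
        return None; // truncated body header is corrupt
    }
    Some(BodyHeader {
        version: b[0],
        algo: b[1],
        profile: b[2],
        // b[3] reserved
        uncompressed_len: u64::from_le_bytes(b[4..12].try_into().ok()?),
        // b[12..16] reserved; object stream starts at offset 16
    })
}

fn main() {
    let mut b = vec![1u8, 1, 0, 0]; // version 1, assumed algo id, profile 0
    b.extend_from_slice(&4096u64.to_le_bytes());
    b.extend_from_slice(&[0u8; 4]);
    let h = decode_body_header(&b).unwrap();
    assert_eq!((h.version, h.algo, h.profile, h.uncompressed_len), (1, 1, 0, 4096));
}
```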
Compression is chosen by the core per page body:
- default writes try zstd with the normal profile
- if compression is larger or not useful, the body is stored uncompressed
- compaction may use a higher-ratio internal zstd profile
- the chosen algorithm and profile are stored inside encrypted metadata
The public API should not expose many compression modes. Normal callers get the default policy. Maintenance commands such as compaction may choose the archival profile internally.
The production file-data layout is page-packed compressed extents, not one compressed object per physical page. A fixed 8 MiB physical page is a container. Its encrypted body may contain:
- many complete compressed small files
- many compressed chunks from one or more files
- one fragment of a large compressed chunk
- a mix of complete chunks and fragments, as long as the page body fits
Large files are compressed as independent bounded frames rather than one whole-file solid stream. A frame may span multiple physical pages, but it remains independently decompressible once its page fragments have been fetched and decrypted. TOC chunk entries identify the logical file offset, logical length, compressed length, compression algorithm, frame id, and ordered physical page fragments needed to reassemble that frame. Each physical fragment reference contains the page offset, fixed page length, encrypted object id, compressed frame offset, and fragment length.
This gives browser and web-service clients fast random access at frame granularity:
- fetch the TOC pages
- decrypt the TOC
- locate the compressed frame or frames for the requested file/range
- request only the physical pages containing those frame fragments
- decrypt the pages, reassemble the compressed frame bytes, and decompress
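The reassembly step in the list above can be sketched as follows, assuming fragments are listed in frame order; all names are hypothetical and `pages` stands in for already-decrypted page bodies:

```rust
use std::collections::HashMap;

// Sketch: one fragment reference, mirroring the TOC fragment fields.
struct Fragment {
    page_offset: u64,           // which physical page holds this fragment
    frag_offset_in_frame: u64,  // compressed frame offset
    body_off: usize,            // fragment start inside the decrypted body
    len: usize,                 // fragment length
}

// Reassemble the compressed frame bytes; the caller then decompresses.
fn reassemble_frame(pages: &HashMap<u64, Vec<u8>>, frags: &[Fragment]) -> Option<Vec<u8>> {
    let mut frame = Vec::new();
    for f in frags {
        // Fragments must arrive in frame order with no gaps.
        if f.frag_offset_in_frame as usize != frame.len() {
            return None;
        }
        let body = pages.get(&f.page_offset)?;
        frame.extend_from_slice(body.get(f.body_off..f.body_off + f.len)?);
    }
    Some(frame)
}

fn main() {
    let mut pages = HashMap::new();
    pages.insert(0u64, b"AAAABBBB".to_vec());
    pages.insert(8u64, b"CCCC".to_vec());
    let frags = [
        Fragment { page_offset: 0, frag_offset_in_frame: 0, body_off: 0, len: 8 },
        Fragment { page_offset: 8, frag_offset_in_frame: 8, body_off: 0, len: 4 },
    ];
    assert_eq!(reassemble_frame(&pages, &frags).unwrap(), b"AAAABBBBCCCC".to_vec());
}
```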
Recovery does not depend solely on the TOC. File fragment metadata is inside
encrypted page bodies and includes path, permissions, optional final file length,
logical frame offset, frame length, compression algorithm, frame id, compressed
frame length, compressed fragment offset, and fragment length. Streaming writes
may store 0 for the final file length because the final length is not known
when early frames are written; the TOC is authoritative when available, and
recovery can still infer a best-effort length from intact frames.
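The best-effort length inference mentioned above amounts to taking the furthest logical extent among intact frames; the struct and function names are assumptions:

```rust
// Sketch: per-frame recovery metadata, reduced to the fields needed here.
struct FrameMeta {
    logical_offset: u64,
    frame_length: u64,
    stored_final_len: u64, // 0 when written by a streaming writer
}

// Prefer a recorded final length; otherwise return a lower bound from
// the intact frames that were recovered.
fn best_effort_len(frames: &[FrameMeta]) -> u64 {
    if let Some(known) = frames.iter().map(|f| f.stored_final_len).find(|&l| l != 0) {
        return known;
    }
    frames.iter().map(|f| f.logical_offset + f.frame_length).max().unwrap_or(0)
}

fn main() {
    let frames = [
        FrameMeta { logical_offset: 0, frame_length: 4096, stored_final_len: 0 },
        FrameMeta { logical_offset: 4096, frame_length: 1000, stored_final_len: 0 },
    ];
    assert_eq!(best_effort_len(&frames), 5096);
}
```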
Paths remain private because both TOC entries and fragment metadata are inside encrypted page bodies. They are exposed only after the caller has unlocked the content key.
Encryption and decryption are owned by the page cache. Higher layers construct or consume decoded page objects and are otherwise oblivious to encryption. On read, the cache loads fixed encrypted bytes from storage, authenticates and decrypts the page once, then caches the decoded page. On write, callers submit decoded page objects; the cache encodes, compresses, encrypts, writes one fixed page to storage, and stores the decoded page in cache.
Raw page encode/decode helpers are format primitives. Production read/write paths should route through the cache boundary. Direct raw decoding is reserved for recovery scans and low-level format tests, where the caller starts from untrusted bytes rather than from an opened lockbox.
The object stream contains typed objects. Object headers are encrypted because they are part of the page body.
offset size field
0 1 object kind
1 1 object header version
2 2 object flags
4 8 object id
12 8 object payload length
20 n object payload
Object ids are stable references used by TOC entries and indexes. A logical file may reference one or more file-data objects. Multiple small logical files may be packed into one file-pack object.
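Walking the object stream uses the 20-byte header layout above; this decoder sketch uses illustrative names:

```rust
// Sketch: decoding one object header from the (already decrypted)
// object stream.
struct ObjHeader {
    kind: u8,
    version: u8,
    flags: u16,
    id: u64,
    payload_len: u64,
}

// Returns the header and the offset at which the payload begins.
fn decode_obj_header(b: &[u8]) -> Option<(ObjHeader, usize)> {
    if b.len() < 20 {
        return None; // truncated object header is corrupt
    }
    let h = ObjHeader {
        kind: b[0],
        version: b[1],
        flags: u16::from_le_bytes(b[2..4].try_into().ok()?),
        id: u64::from_le_bytes(b[4..12].try_into().ok()?),
        payload_len: u64::from_le_bytes(b[12..20].try_into().ok()?),
    };
    Some((h, 20))
}

fn main() {
    let mut b = vec![4u8, 1];                   // kind = file data, version 1
    b.extend_from_slice(&0u16.to_le_bytes());   // flags
    b.extend_from_slice(&42u64.to_le_bytes());  // object id
    b.extend_from_slice(&128u64.to_le_bytes()); // payload length
    let (h, payload_off) = decode_obj_header(&b).unwrap();
    assert_eq!((h.kind, h.version, h.flags), (4, 1, 0));
    assert_eq!((h.id, h.payload_len, payload_off), (42, 128, 20));
}
```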
Initial object kinds:
1 commit root
2 TOC leaf node
3 TOC internal node
4 file data
5 packed file data
6 symlink
7 env set
8 env delete
9 key directory
10 free-space index leaf
11 free-space index internal
The fixed header points to the latest commit-root page offset. The commit root is an encrypted object inside that page.
The commit root payload contains:
field
commit sequence
lockbox UUID
format parameter set id
TOC root object reference
free-space index root object reference
primary key-directory offset, or zero
key-directory mirror offset A, or zero
key-directory mirror offset B, or zero
key-directory generation
previous commit-root reference, or zero
commit creation timestamp, optional and coarse
commit flags
Opening a lockbox reads the header, decrypts the commit-root page, validates the commit root, then opens the referenced TOC and free-space indexes. If the header is corrupt or stale, recovery may scan pages for valid commit roots and choose the highest valid sequence.
Rollback attacks on a standalone copied file cannot be fully prevented without an external freshness anchor. Lockbox detects internal corruption; it cannot prove that an attacker has not replaced the entire file with an older valid copy.
An external freshness anchor is state outside the lockbox that records the latest known version, generation, hash, or signed timestamp. Examples include a server-side object generation number, transparency log entry, signed manifest, append-only audit log, or application database row. Lockbox can reject stale internal metadata within one file by choosing the highest authenticated generation; only an external anchor can detect replacement of the whole file with an older but internally valid lockbox.
The TOC is a live-only copy-on-write BTree. Tombstones are not stored in the current TOC. Deletes remove entries from the live TOC and return old object/page references to the free-space index once they are no longer referenced.
Leaf payloads contain sorted manifest entries. Internal payloads contain sorted child separators and child object references.
Decode rules are intentionally strict:
- leaf entries must be strictly sorted by logical path
- duplicate leaf paths are corrupt
- internal children must be strictly sorted by separator path
- duplicate separators are corrupt
- child references must resolve to valid TOC objects
- every stored path must pass logical path validation
- missing or corrupt child objects make the TOC corrupt
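The strict-sorting rules above are cheap to check during decode; a strictly increasing comparison also rules out duplicates. A minimal sketch with an assumed function name:

```rust
// Sketch: strict leaf validation. Paths must be strictly sorted by
// logical path; a duplicate or out-of-order pair is corrupt.
fn leaf_paths_valid(paths: &[&str]) -> bool {
    paths.windows(2).all(|w| w[0] < w[1])
}

fn main() {
    assert!(leaf_paths_valid(&["a/b", "a/c", "b"]));
    assert!(!leaf_paths_valid(&["a", "a"])); // duplicate path: corrupt
    assert!(!leaf_paths_valid(&["b", "a"])); // unsorted: corrupt
}
```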
Updating a file rewrites the touched TOC leaf and changed ancestors. Unchanged TOC pages remain referenced by the previous and current commit roots until compaction reclaims unreachable history.
Reusable physical pages and reusable free regions are tracked by a transactional free-space index committed with the same commit root as the TOC.
The index is maintained in two logical orders:
- by offset/page id, for coalescing adjacent free ranges
- by size, for best-fit allocation
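The offset-ordered view exists precisely so adjacent free ranges can be merged; a coalescing sketch, with names that are illustrative only:

```rust
// Sketch: coalesce (offset, length) free ranges in offset order.
// Assumes the input ranges do not overlap, as the decode rules require.
fn coalesce(mut ranges: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
    ranges.sort_by_key(|r| r.0);
    let mut out: Vec<(u64, u64)> = Vec::new();
    for (off, len) in ranges {
        match out.last_mut() {
            // The previous range ends exactly where this one starts: merge.
            Some(last) if last.0 + last.1 == off => last.1 += len,
            _ => out.push((off, len)),
        }
    }
    out
}

fn main() {
    let merged = coalesce(vec![(32, 8), (0, 8), (8, 8)]);
    assert_eq!(merged, vec![(0, 16), (32, 8)]);
}
```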
The free-space index is a performance and space-reuse structure, not the source of user-visible truth. If it is corrupt, tools may rebuild it by scanning valid pages and comparing reachable objects from the latest valid TOC and commit root.
The root object may be either a free index leaf or a free index internal
object. Leaf payloads contain sorted non-overlapping (offset, length) free
ranges. Internal payloads contain sorted child references:
offset size field
0 1 free-index version: 1
1 1 node kind: 0 = leaf, 1 = internal
2 2 reserved
4 4 entry count
8 n leaf ranges or internal children
Leaf entries are (free_offset, free_length). Internal entries are
(first_free_offset, child_page_offset). Children must be strictly sorted by
first_free_offset. Free-index pages are append-only during commit so the
published index never lists the page that stores the index itself.
The key directory is public unlock metadata referenced by the fixed header and mirrored in the commit root. It stores only slot ids, slot kinds, salts/ciphertexts, public recipient wrapping data, and encrypted content-key bytes. It must not store paths, file names, environment variable names, or file contents.
The key directory is intentionally readable before the content key is available. Its wrapped content-key values are authenticated by their wrapping algorithms; its outer structure is length-limited and protected by SHA-256 checksums so tools can reject malformed metadata early.
Every key-directory block has its own public recovery header:
offset size field
0 8 magic: "LBX2KEY\0"
8 2 key-directory version: 3
10 2 flags
12 4 header length: 128
16 8 total key-directory length
24 8 key-directory generation
32 16 lockbox UUID
48 4 copy index
52 4 reserved
56 32 SHA-256 checksum over key-slot payload
88 8 reserved
96 32 SHA-256 checksum over bytes 0..96
128 n key-slot payload
The lockbox writes three copies of the key directory for every key-directory
generation: a primary copy referenced by the fixed header, plus two mirror
copies referenced by the commit root. Recovery can also scan the raw lockbox for
LBX2KEY\0 blocks, validate their checksums, group them by lockbox UUID, and
use the highest generation that successfully unwraps the content key.
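The selection step of the scan above can be sketched as a filter plus max-by-generation; the types and the unwrap predicate are hypothetical stand-ins for real checksum and key-unwrap checks:

```rust
// Sketch: after a raw scan, keep only checksum-valid copies for this
// lockbox UUID that actually unwrap the content key, then pick the
// highest generation.
struct KeyDirCopy {
    uuid: [u8; 16],
    generation: u64,
}

fn pick_latest<'a>(
    copies: &'a [KeyDirCopy],
    uuid: &[u8; 16],
    unwraps: impl Fn(&KeyDirCopy) -> bool,
) -> Option<&'a KeyDirCopy> {
    copies
        .iter()
        .filter(|c| &c.uuid == uuid)  // group by lockbox UUID
        .filter(|c| unwraps(c))       // must unwrap the content key
        .max_by_key(|c| c.generation)
}

fn main() {
    let uuid = [7u8; 16];
    let copies = [
        KeyDirCopy { uuid, generation: 3 },
        KeyDirCopy { uuid, generation: 5 },
        KeyDirCopy { uuid: [0u8; 16], generation: 9 }, // different lockbox
    ];
    let best = pick_latest(&copies, &uuid, |_| true).unwrap();
    assert_eq!(best.generation, 5);
}
```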
The fixed header is therefore a fast path, not the only path. If the header is corrupt, password/public-key unlock can recover the lockbox UUID and content key from a scanned key-directory mirror, then use those values to authenticate and decrypt pages while scanning for the latest valid commit root.
Removing a password or recipient is not just a metadata delete. Because old COW history may contain old key directories or data pages, the CLI must treat key removal as a conservative maintenance operation:
- remove the key slot from the live key directory
- rewrite reachable encrypted pages as needed under the retained content key or a new content key
- compact unreachable old pages
- commit the new key directory and free-space index
The core uses a unified page cache for reads and dirty writes. Clean decoded
pages are held in a weighted LRU cache. Dirty pages stay in the cache and are
visible to reads from the same opened lockbox, but they are not written to the
backing store until commit() flushes them and publishes a new commit root.
There is no background writer.
Copy-on-write happens at commit time. This allows the same dirty page to absorb multiple logical mutations before the library allocates and writes replacement pages.
When a file, symlink, or environment variable is deleted or replaced, the commit path must redact the physical page that held the old encrypted object. If the old page also contains live objects, those live objects are relocated to a new page first; then the old physical page is overwritten with zeros and removed from the decoded-page cache. This is required because old COW pages may still contain decryptable ciphertext even after the live TOC no longer references them.
CacheLimit::Auto is page-aware:
- minimum native cache: the larger of eight pages or 64 MiB
- native target: about 15% of currently available/reclaimable memory
- native cap: 4 GiB by default
- WASM default: 64 MiB unless the embedder supplies an explicit limit
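The native sizing rules above reduce to simple arithmetic; this sketch assumes `available` is the reclaimable-memory figure the platform reports, and the function name is illustrative:

```rust
// Sketch of the CacheLimit::Auto native sizing rules.
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

fn auto_native_limit(page_size: u64, available: u64) -> u64 {
    let floor = (8 * page_size).max(64 * MIB); // minimum native cache
    let target = available * 15 / 100;         // ~15% of available memory
    target.clamp(floor, 4 * GIB)               // default native cap
}

fn main() {
    // Default 8 MiB pages; a small machine falls back to the 64 MiB floor.
    assert_eq!(auto_native_limit(8 * MIB, 100 * MIB), 64 * MIB);
    // A huge machine is capped at 4 GiB by default.
    assert_eq!(auto_native_limit(8 * MIB, 100 * GIB), 4 * GIB);
}
```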
The cache is a performance mechanism, not a correctness requirement. TOC traversal, recovery, and extraction must all work when the cache is disabled.
Recovery scans pages starting after the fixed header. It does not require a valid fixed header, TOC, or free-space index.
Recovery can:
- authenticate and decrypt intact pages independently
- locate valid commit roots
- rebuild a best-effort live view from the highest valid commit root
- salvage file objects whose metadata can still be associated with paths
- report intact, corrupt, and lost counts
Recovery is not an undelete guarantee. Once freed pages or free regions have been overwritten, the old objects are no longer recoverable.