
AES-GCM nonce reuse in WAL encryption on LSN rewind / snapshot restore #36

@hollanf


Breaks WAL confidentiality and enables forgery under a realistic operator workflow.

Summary

nodedb-wal/src/crypto.rs::lsn_to_nonce derives the 96-bit AES-256-GCM nonce solely from the 8-byte LSN. The LSN is written little-endian into bytes 0..8, and the remaining 4 bytes are hard-coded to zero. There is no per-writer epoch, no random prefix, and no value committed to the segment header that would distinguish nonces across WAL lifetimes.

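AES-GCM requires every (key, nonce) pair to be unique. Reusing a nonce under the same key catastrophically breaks both confidentiality and integrity. GCM encrypts in counter mode, so two messages under the same (key, nonce) share a keystream, and XORing their ciphertexts yields the XOR of their plaintexts. The repeated nonce also lets an attacker solve for the GHASH authentication key H and forge valid tags (Joux's well-documented "forbidden attack").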
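A minimal, std-only sketch of the confidentiality half of the problem. It uses a toy keystream (std's DefaultHasher over key, nonce and block counter) as a stand-in for AES-CTR. It is not AES and not secure. What it shares with GCM is the one property that matters here: the keystream depends only on (key, nonce, counter), so two ciphertexts under the same pair XOR to the XOR of their plaintexts without the key ever being needed. `toy_keystream`, `toy_encrypt` and `xor` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// TOY keystream, NOT AES: each 8-byte block is DefaultHasher over
/// (key, nonce, counter). Like GCM's AES-CTR keystream, it is a pure
/// function of (key, nonce, counter).
fn toy_keystream(key: &[u8; 32], nonce: &[u8; 12], len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(len + 8);
    // For a 96-bit nonce, GCM encrypts the payload starting at counter value 2.
    let mut counter: u32 = 2;
    while out.len() < len {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        nonce.hash(&mut h);
        counter.hash(&mut h);
        out.extend_from_slice(&h.finish().to_le_bytes());
        counter += 1;
    }
    out.truncate(len);
    out
}

/// Counter-mode style encryption: ciphertext = plaintext XOR keystream.
fn toy_encrypt(key: &[u8; 32], nonce: &[u8; 12], pt: &[u8]) -> Vec<u8> {
    toy_keystream(key, nonce, pt.len())
        .iter()
        .zip(pt)
        .map(|(k, p)| k ^ p)
        .collect()
}

/// Byte-wise XOR of two slices (truncated to the shorter one).
fn xor(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

fn main() {
    let key = [7u8; 32];
    // Nonce for LSN 1 in the lsn_to_nonce layout, used in both WAL lifetimes.
    let mut nonce = [0u8; 12];
    nonce[0] = 1;

    let pt_old = b"balance=100";
    let pt_new = b"balance=999";
    let ct_old = toy_encrypt(&key, &nonce, pt_old);
    let ct_new = toy_encrypt(&key, &nonce, pt_new);

    // The shared keystream cancels out: ct_old ^ ct_new == pt_old ^ pt_new.
    assert_eq!(xor(&ct_old, &ct_new), xor(pt_old, pt_new));
    println!("keystream cancelled without knowing the key");
}
```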

Current code

nodedb-wal/src/crypto.rs:186-195

/// Derive a 12-byte nonce from an LSN.
///
/// AES-256-GCM requires a 96-bit (12 byte) nonce. Since LSNs are monotonically
/// increasing and globally unique, they make ideal deterministic nonces.
/// We zero-pad the 8-byte LSN to 12 bytes.
fn lsn_to_nonce(lsn: u64) -> aes_gcm::Nonce<aes_gcm::aead::consts::U12> {
    let mut nonce_bytes = [0u8; 12];
    nonce_bytes[..8].copy_from_slice(&lsn.to_le_bytes());
    nonce_bytes.into()
}

Combined with the writer at nodedb-wal/src/writer.rs:207:

let lsn = self.next_lsn.fetch_add(1, Ordering::Relaxed);

where next_lsn is seeded by recovery::recover(), which scans the current WAL file from offset 0 and sets next_lsn = last_lsn + 1. There is no persisted monotonic counter independent of the WAL file's own contents.
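To make the interaction concrete, here is a std-only sketch that mirrors the quoted lsn_to_nonce (returning a plain `[u8; 12]` instead of the aes_gcm type) and models the writer's seeding as described above. `ToyWriter` is a hypothetical stand-in, not the real writer; it assumes recovery yields `last_lsn + 1`, or 1 when the WAL directory is empty.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Same layout as the quoted lsn_to_nonce: LSN little-endian in bytes 0..8,
/// bytes 8..12 always zero.
fn lsn_to_nonce(lsn: u64) -> [u8; 12] {
    let mut nonce_bytes = [0u8; 12];
    nonce_bytes[..8].copy_from_slice(&lsn.to_le_bytes());
    nonce_bytes
}

/// Hypothetical model of the writer: next_lsn is seeded only from what the
/// current WAL contains, with no counter persisted anywhere else.
struct ToyWriter {
    next_lsn: AtomicU64,
}

impl ToyWriter {
    fn recover(last_lsn_in_wal: Option<u64>) -> Self {
        let seed = last_lsn_in_wal.map_or(1, |last| last + 1);
        ToyWriter { next_lsn: AtomicU64::new(seed) }
    }

    /// Allocates the next LSN and returns the nonce it would be encrypted under.
    fn append(&self) -> [u8; 12] {
        let lsn = self.next_lsn.fetch_add(1, Ordering::Relaxed);
        lsn_to_nonce(lsn)
    }
}

fn main() {
    // First lifetime: fresh WAL, three records.
    let w1 = ToyWriter::recover(None);
    let before: Vec<[u8; 12]> = (0..3).map(|_| w1.append()).collect();

    // WAL directory wiped after a snapshot restore; writer restarts.
    let w2 = ToyWriter::recover(None);
    let after_restore: Vec<[u8; 12]> = (0..3).map(|_| w2.append()).collect();

    // With an unchanged key, LSNs 1, 2, 3 are encrypted under identical nonces.
    assert_eq!(before, after_restore);
    let reused = before.iter().filter(|n| after_restore.contains(*n)).count();
    println!("{} nonces reused", reused);
}
```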

The KeyRing at crypto.rs:108-113 tracks current + previous keys for rotation but does not carry a nonce prefix.

Why it's broken

Any operator workflow that re-issues already-used LSNs under the same key reuses nonces. Concrete paths:

  1. Snapshot restore + WAL truncation. Restore a snapshot at LSN X, delete the WAL directory (standard restore flow), and restart: recover() finds no file, so next_lsn = 1. New writes encrypt LSNs 1, 2, 3, … under the same key that previously encrypted those same LSNs, and the previous ciphertexts still exist in backups, off-site replicas, or tape. An attacker XORs ciphertexts with matching LSNs to recover pt_old ^ pt_new, which yields both plaintexts in full if either one is known.
  2. Operator clone / replay from backup into a new DB with the same key: both databases then issue the same LSN sequence, and therefore the same nonces.
  3. Segment truncation before compaction, when it causes next_lsn to be reset to an LSN that has already been used.
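All three paths reduce to one condition: under a single key, the LSN range written in a new WAL lifetime overlaps a range written in an earlier one, and every LSN in the overlap is a reused nonce. A small sketch of that check follows; `LsnRange` and `reused` are hypothetical names, not nodedb code. It also shows that even a hypothetical restore that resumed at X + 1 instead of 1 would still collide with every LSN the old WAL had already written past X.

```rust
/// Inclusive range of LSNs written under one key during one WAL lifetime.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LsnRange {
    first: u64,
    last: u64,
}

/// LSNs issued in both lifetimes. Since nonce = f(LSN) only, each of these
/// is a (key, nonce) pair used twice.
fn reused(old: LsnRange, new: LsnRange) -> Option<LsnRange> {
    let first = old.first.max(new.first);
    let last = old.last.min(new.last);
    if first <= last {
        Some(LsnRange { first, last })
    } else {
        None
    }
}

fn main() {
    // WAL contents before the restore (illustrative numbers).
    let old = LsnRange { first: 1, last: 10_000 };

    // Path 1: WAL wiped, writer restarts at LSN 1.
    println!("{:?}", reused(old, LsnRange { first: 1, last: 500 }));

    // Hypothetical resume at snapshot LSN X + 1 = 7_001: still overlaps.
    println!("{:?}", reused(old, LsnRange { first: 7_001, last: 12_000 }));
}
```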

Because the nonce has no random component whatsoever, this is a latent landmine even where current operational procedures happen to rotate the key on restore: any operator who misses the key-rotation step silently loses confidentiality.
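One possible direction, sketched under assumptions and not a vetted fix: use the 4 currently-zero bytes for a per-lifetime prefix, drawn from the OS CSPRNG whenever a writer starts a new WAL lifetime and recorded in the segment header so readers can rebuild each nonce. `nonce_with_prefix` is a hypothetical helper; the randomness source and the header field are left to the caller.

```rust
/// Hypothetical nonce layout: 4-byte per-lifetime prefix || 8-byte LSN (LE).
/// Within one lifetime, uniqueness still comes from the LSN; across
/// lifetimes, it comes from the prefix differing.
fn nonce_with_prefix(prefix: [u8; 4], lsn: u64) -> [u8; 12] {
    let mut n = [0u8; 12];
    n[..4].copy_from_slice(&prefix);
    n[4..].copy_from_slice(&lsn.to_le_bytes());
    n
}

fn main() {
    let lsn = 1;
    // Illustrative prefixes; a real writer would draw these randomly.
    let before_restore = nonce_with_prefix([0x11, 0x22, 0x33, 0x44], lsn);
    let after_restore = nonce_with_prefix([0xa0, 0xb1, 0xc2, 0xd3], lsn);
    // Same LSN, same key, but distinct nonces.
    assert_ne!(before_restore, after_restore);
    println!("{:?}", after_restore);
}
```

With a 32-bit random prefix, the chance that any two of n lifetimes draw the same prefix is roughly n^2 / 2^33, so this narrows the window rather than closing it; rotating the key on restore remains the stronger guarantee.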

Reproduction

# Enable encryption with a fixed key.
nodedb --wal-encrypt-key=<K> ...
# Write some records.
INSERT ... ; INSERT ... ; INSERT ...
# Stop the server, save a copy of the WAL segment ciphertext, then wipe the WAL dir.
cp -r "$DATA_DIR/wal" /tmp/wal-before-restore
rm -rf "$DATA_DIR/wal"
# Restart with same key.
nodedb --wal-encrypt-key=<K> ...
# Write DIFFERENT plaintext records.
INSERT ... ;
# Diff the two ciphertexts at matching LSNs: XORing them recovers pt_old ^ pt_new.
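The last step of the reproduction, sketched from the attacker's side: given two ciphertexts from the same LSN and the plaintext of one of them, the keystream falls out and the other plaintext follows. `recover_plaintext` is a hypothetical helper. It assumes the caller has already stripped any record framing and the trailing GCM tag (the WAL record layout is not described in this issue), and the demo uses a fixed toy keystream in place of AES.

```rust
/// Recovers the target plaintext from two ciphertext bodies produced under
/// the same (key, nonce) and one known plaintext:
/// keystream = ct_known ^ pt_known, pt_target = ct_target ^ keystream.
fn recover_plaintext(ct_known: &[u8], pt_known: &[u8], ct_target: &[u8]) -> Vec<u8> {
    ct_target
        .iter()
        .zip(ct_known.iter().zip(pt_known))
        .map(|(c, (k, p))| c ^ k ^ p)
        .collect()
}

/// Toy stand-in for counter-mode encryption with a fixed keystream (NOT AES).
fn toy_encrypt(keystream: &[u8], pt: &[u8]) -> Vec<u8> {
    pt.iter().zip(keystream).map(|(p, k)| p ^ k).collect()
}

fn main() {
    // Same keystream for both records, as with a reused (key, nonce) pair.
    let keystream = [0x3au8, 0x91, 0x5c, 0x07, 0xee, 0x42, 0x18, 0xd0];
    let ct_old = toy_encrypt(&keystream, b"user=bob");
    let ct_new = toy_encrypt(&keystream, b"pass=x9!");

    // Knowing the old plaintext is enough to decrypt the new record.
    let recovered = recover_plaintext(&ct_old, b"user=bob", &ct_new);
    assert_eq!(recovered, b"pass=x9!");
    println!("{}", String::from_utf8_lossy(&recovered));
}
```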

Notes

  • Found during a CPU/memory audit sweep of nodedb-wal/src/*.
  • No evidence this has been exploited in the wild; filing as a design-level crypto defect.
