Skip to content

Protect rootkey storage against breaking changes #45

@gmaclennan

Description

@gmaclennan

Problem

RootKeyStore (Android + iOS) holds the device's identity. A single
breaking change — an alias rename, a SharedPreferences key rename, an
algorithm tweak, an envelope-field rename — would silently make every
upgrading user's persisted rootkey unreadable, which is identity loss.
The cost is asymmetric: the change is one line, the consequence is
unrecoverable.

We currently rely on (a) careful review of RootKeyStore.kt /
RootKeyStore.swift PRs, and (b) the docstrings in the file. Neither
catches a developer who's confidently wrong.

This issue scopes the protections we should add.

What kinds of changes break previously-persisted rootkeys?

A non-exhaustive enumeration. Each of these is a one-line change that
silently breaks reads on existing installs:

  1. Rename WRAPPER_KEY_ALIAS (Android) → loadWrapperKey returns
    null → throws.
  2. Rename PREFS_NAME or PREFS_KEY (Android) →
    prefs.getString(PREFS_KEY, null) returns null → goes to
    first-install path → identity loss.
  3. Change KeyGenParameterSpec flags (Android, e.g. set
    setUserAuthenticationRequired(true) or change accessibility) →
    existing wrapper key may invalidate or the new key isn't compatible
    with old envelopes.
  4. Change kSecAttrService / kSecAttrAccount (iOS) →
    SecItemCopyMatching returns errSecItemNotFound → throws.
  5. Change kSecAttrAccessible (iOS) → existing item still
    readable on most paths but inconsistencies in restore semantics.
  6. Add/remove/rename envelope JSON fields (Android) →
    parseEnvelope throws or reads stale data.
  7. Bump ENVELOPE_VERSION constant without a migration path
    parseEnvelope throws "Unsupported rootkey envelope version".
  8. Change GCM_TAG_LENGTH_BITS / GCM_IV_LENGTH_BYTES
    existing ciphertext fails to decrypt.
  9. Change ROOTKEY_BYTE_LENGTH → length check throws.
  10. Refactor that introduces silent regeneration on read failure
    (e.g. catching RootKeyException and falling through to
    generate). This is the worst category — silently turns existing
    users into "first launch".
  11. Migration logic that deletes the legacy entry before verifying
    the new one
    → window of identity loss if the verify fails.

Protections, grouped by class

A. Test-based protections

A1. Fixture-based round-trip tests (highest value, medium cost)

Plant a known-good envelope (encrypted under a test-fixed wrapper
key) in test resources. Test that the current RootKeyStore decrypts
it back to the expected 16 bytes. Any change that affects the
read path — algorithm, IV, schema, parsing, length checks, defaults —
breaks this test loudly before the bad code ships.

Cost: requires injecting the wrapper-key source so tests can use a
fixed key (production keeps AndroidKeyStore / Keychain). Small
refactor (~30 LOC each platform) to introduce a WrapperKeyProvider
interface or constructor seam.

A2. Envelope schema snapshot test (medium value, low cost)

Unit test that asserts the JSON envelope produced by
generateAndPersist has exactly the expected fields with expected
types. Adding a field, renaming ivnonce, etc. fails the test.
Smaller scope than A1 but catches the schema-change subset cheaply.

A3. Cross-version migration tests (highest value, high cost)

Maintain historical envelope fixtures (V1, V2, ...) in test
resources. Every release runs "v1 envelope must still be readable by
current code" tests. When we eventually do introduce a v2 wrapper
key, the v2 release adds a "v1 envelope must migrate to v2 cleanly"
test that survives forever.

Cost: ongoing maintenance — each format change needs a fixture
preserved and a test added.

A4. Maestro end-to-end test (covered by #39 already)

Cold-launch app, read MapeoManager.deviceId, kill app, cold-launch,
assert deviceId matches. Catches the very worst case ("rootkey is
gone after restart") but doesn't isolate which layer broke.

B. Build-time / process protections

B1. CODEOWNERS pinning RootKeyStore + envelope format docs (low
cost, real value)

/.github/CODEOWNERS:

android/src/main/java/com/comapeo/core/RootKeyStore.kt @<reviewer>
ios/RootKeyStore.swift                                  @<reviewer>
docs/root-key-storage-and-migration-plan.md             @<reviewer>

Forces a designated reviewer on every change, regardless of who
opened the PR. Doesn't prevent breaking changes, but makes accidental
ones much less likely to land.

B2. PR template question (low cost, marginal value)

"Does this PR touch RootKeyStore? If yes, link the migration test
covering the change."

B3. File header DO-NOT-CHANGE banner (low cost, marginal value)

A loud comment block at the top of each RootKeyStore listing the
constants that affect persistence and what each one's renaming /
rotation means. Doesn't stop anyone, but makes "I didn't realise"
defenses harder.

B4. Lint rule (medium cost, low value — overkill)

A custom Detekt / SwiftLint rule that flags any change to the
constants. Probably not worth the maintenance.

C. Runtime protections

C1. Existing: parseEnvelope throws on missing fields, version
mismatch.
Already in place. No-op for this issue.

C2. Existing: read-back verification in generateAndPersist.
Already in place. No-op.

C3. Existing: no silent regeneration. Already in place via
RootKeyException from every read failure. The risk is future
refactors silently catching this; the linchpin is test coverage.

D. Production telemetry

D1. Crashlytics / Sentry alert on RootKeyException (high value,
ongoing operational cost)

Catch identity loss in the wild on the first user who hits it.
Threshold-based alerting on a baseline of zero.

D2. "First install" rate per release (high value, ongoing cost)

A sudden spike in RootKeyStore: generated for first install log
events after a new release means that release inadvertently broke
the read path for existing installs.

Both D1 and D2 require log/event infrastructure we don't have today;
this is a longer-horizon project.

Prioritised recommendation

Land in this order:

  1. B1 (CODEOWNERS) — half-day, catches ~80% of careless mistakes.
  2. A2 (envelope schema snapshot test) — half-day, low maintenance,
    catches schema regressions immediately.
  3. A1 (fixture round-trip tests) — 1–2 days including the
    WrapperKeyProvider refactor. Highest test-coverage value.
  4. B3 (file header banner) — 30 minutes, free signal.
  5. A3 (cross-version migration tests) — defer until we actually
    have a v2 format to test against. Add the test scaffolding when
    v2 is introduced; don't pre-build it for hypothetical futures.
  6. D1 + D2 (production telemetry) — track via separate
    observability epic (out of scope here).

Acceptance for the first-pass implementation

  • CODEOWNERS file added pinning the three rootkey-related paths.
  • Android: RootKeyStoreSchemaTest.kt (instrumented) verifying
    the generateAndPersist envelope contains exactly the v1
    fields with the expected types.
  • iOS: RootKeyStoreSchemaTests.swift doing the equivalent
    keychain-attribute check.
  • Android: RootKeyStore accepts an injectable wrapper-key
    source for tests; production wiring unchanged.
  • iOS: RootKeyStore accepts an injectable keychain accessor
    for tests; production wiring unchanged.
  • Android: fixture round-trip test plants a known envelope
    under a fixed wrapper key, asserts current code decrypts it
    back to the expected 16 bytes.
  • iOS: equivalent fixture round-trip test.
  • File header on each RootKeyStore lists the
    identity-affecting constants and the consequences of changing
    each.

A2 + B1 + B3 are the minimum-viable gate; A1 is the load-bearing
test. A3 and D* can come later or via separate issues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions