# Self-Serve MDR: Security review / threat model (#937)

Umbrella tracking issue for the security posture of the self-serve MDR feature shipped under issues #882, #883, and #884. Filed at PM request for visibility.

Each item below is a gap or concern that warrants its own analysis (and, in most cases, its own follow-up issue or PR). Pre-existing security findings caught during the 2026-05-26 dev re-deploy session are at the bottom.

## Authentication / identity

- [ ] **MFA for Cognito self-serve** — Cognito user pool currently does not enforce MFA. Acceptable for evaluators; revisit before any production-grade use.
- [ ] **Email-domain allowlist** — Any verifiable email can register today. Should self-serve restrict to specific domains, or rely on the abuse controls in #917?
- [ ] **Rate limiting + CAPTCHA + tenant TTL** — Already tracked in [#917](https://github.com/LIF-Initiative/lif-core/issues/917).
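If the allowlist route is chosen, one place to enforce it is a Cognito pre sign-up Lambda trigger, because Cognito rejects the registration when that trigger raises. The sketch below is not existing MDR code. It assumes a hypothetical `MDR_ALLOWED_EMAIL_DOMAINS` environment variable (comma-separated) and matches domains exactly, so subdomains would have to be listed explicitly.

```python
import os


def allowed_domains() -> set[str]:
    # Hypothetical config: MDR_ALLOWED_EMAIL_DOMAINS="example.org,partner.edu"
    raw = os.environ.get("MDR_ALLOWED_EMAIL_DOMAINS", "")
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def handler(event: dict, context: object) -> dict:
    """Sketch of a Cognito pre sign-up trigger that enforces a domain allowlist."""
    email = event["request"]["userAttributes"].get("email", "")
    domain = email.rpartition("@")[2].strip().lower()
    allowed = allowed_domains()
    # An unset allowlist keeps today's open-registration behavior; flip this
    # to fail closed if the allowlist becomes mandatory.
    if allowed and domain not in allowed:
        # Raising from a pre sign-up trigger makes Cognito reject the sign-up.
        raise ValueError("Registration is limited to approved email domains.")
    # Returning the event unchanged lets the normal sign-up flow continue.
    return event
```

An allowlist like this would complement, not replace, the abuse controls in #917.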

## Cryptographic material

- [ ] **HMAC secret sharing** — `mdr__auth__jwt_secret_key` currently signs HS256 JWTs *and* HMACs the workspace cookie *and* HMACs invite tokens. Rotating the key atomically invalidates all three (operationally noisy). Consider separating per concern. See `components/lif/mdr_auth/{workspace_cookie,invite_token}.py`.
- [ ] **Secret rotation runbook** — How is `MdrAuthJwtSecretKey` rotated, and what's the downstream blast radius? Not currently documented.
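One way to decouple the HMAC uses is a separate keyring per purpose. Each keyring has its own key IDs, so one concern can be rotated without touching the others. The purpose string is mixed into the MAC input, so a value minted as a workspace cookie cannot verify as an invite token. This is a hypothetical sketch: `PurposeKeyring` and the `kid.mac` token format are illustrative, not the current code in `components/lif/mdr_auth/`. Independent rotation also assumes each purpose's keys come from their own secret rather than from `mdr__auth__jwt_secret_key`.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class PurposeKeyring:
    """Hypothetical per-purpose HMAC keyring (not the current mdr_auth code).

    `active_kid` signs new values; every kid in `keys` is still accepted on
    verify, so a rotated-out key keeps working until it is deleted.
    """
    purpose: str
    keys: dict[str, bytes]
    active_kid: str

    def _mac(self, key: bytes, payload: bytes) -> str:
        # Bind the purpose into the MAC input so MACs for different
        # purposes never collide, even if the key bytes were shared.
        msg = self.purpose.encode() + b"\x00" + payload
        return hmac.new(key, msg, hashlib.sha256).hexdigest()

    def sign(self, payload: bytes) -> str:
        key = self.keys[self.active_kid]
        return f"{self.active_kid}.{self._mac(key, payload)}"

    def verify(self, payload: bytes, token: str) -> bool:
        kid, _, mac = token.partition(".")
        key = self.keys.get(kid)
        if key is None:
            return False
        expected = self._mac(key, payload)
        # Constant-time comparison of the presented and expected MACs.
        return hmac.compare_digest(mac.encode(), expected.encode())
```

Under this scheme, a rotation could add the new key, switch `active_kid`, wait out the longest token lifetime for that purpose (7 days for invites), and then delete the old key. A runbook would also need to cover the HS256 JWT signing key, which this sketch does not handle.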

## Invite tokens

- [ ] **Reusable until expiry** — Invite tokens are self-contained (no server-side store), so they are effectively reusable until the 7-day TTL. Single-use enforcement is explicitly deferred from v1. Decide if this is acceptable for the demo audience or needs to ship before broader rollout.
- [ ] **Inviter accountability** — Token carries the inviter's Cognito sub but no audit log on the server side; if an invite is misused, we can't trace it back without DB-level forensics.
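Single-use enforcement and inviter accountability could share one server-side record. Redeeming an invite consumes its ID and appends an audit entry that names the inviter and the invitee. The sketch assumes the invite payload carries a unique `jti`, the tenant, the inviter's Cognito sub, and an `exp` timestamp, and that the HMAC has already been verified. These claim names are hypothetical. The in-memory set and list stand in for database tables; in a real deployment the consume step would need to be atomic, for example an insert against a unique `jti` column.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class InviteClaims:
    # Hypothetical claim names; the real invite_token payload may differ.
    jti: str          # unique invite ID
    tenant: str       # tenant schema the invite grants access to
    inviter_sub: str  # Cognito sub of the inviting user
    exp: float        # expiry, Unix seconds


@dataclass
class InviteLedger:
    # In-memory stand-ins for a consumed-invites table and an audit table.
    consumed: set[str] = field(default_factory=set)
    audit: list[dict] = field(default_factory=list)

    def redeem(self, claims: InviteClaims, invitee_sub: str,
               now: Optional[float] = None) -> bool:
        """Consume an already signature-checked invite; return True if joined."""
        now = time.time() if now is None else now
        if now >= claims.exp:
            outcome = "expired"
        elif claims.jti in self.consumed:
            outcome = "reused"
        else:
            self.consumed.add(claims.jti)
            outcome = "joined"
        # Record every attempt, including rejected ones, for forensics.
        self.audit.append({
            "at": now,
            "jti": claims.jti,
            "tenant": claims.tenant,
            "inviter_sub": claims.inviter_sub,
            "invitee_sub": invitee_sub,
            "outcome": outcome,
        })
        return outcome == "joined"
```

The same audit records would also cover the missing workspace-join trail noted under Logging / audit / leakage.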

## Tenant isolation correctness

- [ ] **Cross-tenant query test** — Tenant routing depends entirely on `SET search_path` per request. We should have an integration test that asserts a request bearing tenant A's cookie cannot read data in tenant B's schema, including via SQL injection vectors and accidental superuser fallthrough.
- [ ] **`search_path` fallback** — When the resolved tenant schema doesn't exist (e.g., the [bug we hit on 2026-05-26](https://github.com/LIF-Initiative/lif-core/pull/PLACEHOLDER), where the Lambda silently failed to provision it), what's the documented fallback? Should it be "deny" rather than "silently use the next schema in search_path"?
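PostgreSQL silently ignores `search_path` entries that name a nonexistent schema, so a missing tenant schema does not raise an error on `SET`. A deny-by-default check therefore has to run before the statement is issued. Below is a minimal sketch. It assumes the caller has already loaded the set of provisioned tenant schemas (e.g. from `pg_namespace`) and that tenant schema names are lowercase identifiers. `search_path_sql` and `TenantSchemaError` are hypothetical names.

```python
import re

# Hypothetical naming rule: lowercase Postgres identifier, at most 63 chars.
_SCHEMA_NAME = re.compile(r"[a-z_][a-z0-9_]{0,62}")


class TenantSchemaError(Exception):
    """Raised instead of falling through to another schema."""


def search_path_sql(tenant_schema: str, provisioned: set[str]) -> str:
    # Reject anything that is not a plain identifier, so the quoted name
    # below can never carry SQL of its own.
    if not _SCHEMA_NAME.fullmatch(tenant_schema):
        raise TenantSchemaError(f"invalid tenant schema name: {tenant_schema!r}")
    # Deny when the schema was never provisioned rather than letting
    # Postgres skip it and resolve tables elsewhere.
    if tenant_schema not in provisioned:
        raise TenantSchemaError(f"tenant schema not provisioned: {tenant_schema!r}")
    # Only the tenant schema: no "$user" or public fallback.
    return f'SET search_path TO "{tenant_schema}"'
```

Postgres still searches `pg_catalog` implicitly, which is expected. On pooled connections, it may be worth issuing the statement as `SET LOCAL` inside the request's transaction. That way one request's path cannot carry over to the next request that reuses the connection. The cross-tenant integration test would then assert that both rejection paths actually deny.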

## IAM / least privilege

- [ ] **Post-confirmation Lambda IAM scope** — The Lambda's role is intended to grant only SSM read on one specific key plus Cognito `AdminAddUserToGroup`. Audit whether the deployed policy is exactly that or wider.
- [ ] **MDR API task role** — Does it have any privileges (e.g., S3, KMS) beyond what tenant routing needs?
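Both audits reduce to comparing the deployed policy documents against an expected action list. The helper below is a hypothetical sketch. It walks a policy document (for example, an inline role policy's `PolicyDocument` parsed into a dict; attached managed policies need the same check) and reports every `Allow` action outside the expected set, plus any statement on `Resource: "*"`. The expected actions `ssm:GetParameter` and `cognito-idp:AdminAddUserToGroup` are an assumption about how "SSM read for one specific key" is expressed. The helper does not interpret `NotAction`, `NotResource`, or conditions.

```python
from typing import Any, Iterable

# Assumed least-privilege grant for the post-confirmation Lambda.
EXPECTED_POST_CONFIRMATION_ACTIONS = {"ssm:GetParameter", "cognito-idp:AdminAddUserToGroup"}


def _as_list(value: Any) -> list:
    # IAM allows either a single value or a list for Statement/Action/Resource.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def unexpected_grants(policy: dict, expected: Iterable[str]) -> list[str]:
    """Sketch: list Allow actions outside `expected` and wildcard resources."""
    allowed = {a.lower() for a in expected}  # IAM action names are case-insensitive
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        for action in actions:
            # Wildcards such as "ssm:*" are not in the expected set, so they are flagged.
            if action.lower() not in allowed:
                findings.append(f"unexpected action: {action}")
        if "*" in _as_list(stmt.get("Resource")):
            findings.append(f"wildcard resource on: {', '.join(actions)}")
    return findings
```

For the MDR API task role, the same helper would apply once the actions that tenant routing actually needs are written down. Passing an empty expected set flags every grant for review.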

## Logging / audit / leakage

- [ ] **DATABASE_URL leaked in logs** — Tracked separately: #938. Pre-existing, not from #884, but lives in self-serve-adjacent code. Linked here for completeness.
- [ ] **No audit trail for workspace joins** — Who joined what tenant, when, by whose invite? No log today. Required for compliance / forensics if/when this feature graduates from "evaluator" to "real customer."
- [ ] **Cognito sub in client-readable strings** — Each user's auto-created group name is `eval-<sub>`. Subs aren't strictly secret, but they're stable identifiers; surfacing them in UI / cookies / URLs is a minor information-leak concern.
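Until #938 is fixed at the source, a logging filter that masks URL passwords would be a cheap backstop. This is a minimal sketch using only the standard `logging` module. `redact` and `RedactCredentialsFilter` are hypothetical names, and the pattern covers only `scheme://user:password@host` URLs. The filter is attached to a handler rather than a logger because logger-level filters do not run for records propagated up from child loggers.

```python
import logging
import re

# Matches "scheme://user:password@" and captures the parts to keep.
_URL_CREDENTIALS = re.compile(
    r"(?P<scheme>[a-z][a-z0-9+.-]*://)(?P<user>[^:/@\s]+):(?P<password>[^@\s]+)@",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    # Keep scheme and user for debugging; replace only the password.
    return _URL_CREDENTIALS.sub(r"\g<scheme>\g<user>:***@", text)


class RedactCredentialsFilter(logging.Filter):
    """Rewrites each record's final message with URL passwords masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so secrets passed as arguments are caught,
        # then clear args so the formatter does not re-interpolate them.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

The filter only masks what reaches a log line. Removing the URL from the log call itself remains the real fix tracked in #938.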

## Operational (CFN drift; demo prep finding)

- [ ] **Stack drift detection** — On 2026-05-26 we discovered the dev CFN stacks were 6 weeks stale: code in `main` had moved past what was deployed. Should we wire a drift detector that alerts when `dev-lif-mdr-cognito` or `dev-lif-mdr-api` falls more than N commits behind `main`? Same risk on demo.
- [ ] **Flyway migration application gating** — The 6-week drift hid the fact that V1.2/V1.3/V1.4 migrations had never run on dev. The ECS task expected the new schema state; the DB was on V1.1. Recovery required a SAM redeploy. Consider failing the MDR startup health check if `flyway_schema_history` doesn't match the expected highest version baked into the image.
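The health-check gate could compare the highest successfully applied version in `flyway_schema_history` (its `version` and `success` columns) with the version baked into the image. The sketch below takes the rows of `SELECT version, success FROM flyway_schema_history` as input, so it runs without a database. The expected version would come from a hypothetical build-time value such as `MDR_EXPECTED_SCHEMA_VERSION`. Rows with a NULL version (repeatable migrations) and failed rows are ignored.

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[Optional[str], bool]  # (version, success) from flyway_schema_history


def parse_version(version: str) -> tuple:
    # Flyway stores versions such as "1.4"; compare numerically so 1.10 > 1.4.
    return tuple(int(part) for part in version.split("."))


def applied_version(rows: Iterable[Row]) -> Optional[tuple]:
    # Highest successfully applied versioned migration, or None if there is none.
    versions = [parse_version(v) for v, ok in rows if v is not None and ok]
    return max(versions) if versions else None


def schema_is_current(rows: Iterable[Row], expected: str) -> bool:
    """Fail the health check when the DB is behind the image's expected version."""
    current = applied_version(rows)
    return current is not None and current >= parse_version(expected)
```

This variant passes when the database is ahead of the image, so an older task does not fail mid-rollout. The strict "doesn't match" check described above would compare with `==` instead.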

## Related issues

- #917 — Self-Serve MDR: Registration abuse + cost controls
- #936 — MDR frontend: Display currently-selected workspace
- #673 — Dependabot for dependencies
- (Linked from items above as they're created)
