
fix: handle audit_log pkey unique violation in log_changes (Ecto.ConstraintError) (OPS-4622)#80

Closed
palantir-valiot[bot] wants to merge 1 commit into main from palantir/OPS-4622-handle-audit-log-pkey-collision


@palantir-valiot


Summary

Add unique_constraint/3 on the audit log primary key (constraint name derived from the configured table, default audit_log_pkey) inside EctoTrail.Changelog.changeset/1. The internal log_changes/5 and log_changes_alone/6 (called by update_and_log/4, insert_and_log/4, etc.) perform a plain repo.insert/1 on the changelog row. When a duplicate pkey occurs (race, retry, or concurrent update_and_log for the same resource in the same instant), Postgres raises a unique violation on audit_logs_pkey. Without the constraint declaration, this surfaces as an unhandled Ecto.ConstraintError that kills the caller's Repo.transaction (visible in the stack as Elixir.ValiotApp.Repo.transaction/2 → EctoTrail.update_and_log/4 → log_changes/5).
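A minimal sketch of the shape of this change, not the actual diff. The module name Sketch.Changelog, the field list, and the hard-coded table name are assumptions for illustration; the real package derives the table from its :table_name config. The point is only that declaring the constraint by name lets Ecto convert the Postgres unique violation into an {:error, changeset} instead of raising.

```elixir
# Only needed when running this sketch as a standalone script.
Mix.install([{:ecto, "~> 3.10"}])

defmodule Sketch.Changelog do
  use Ecto.Schema
  import Ecto.Changeset

  # Assumption: hard-coded here; the package reads this from config.
  @table_name "audit_log"

  # Hypothetical fields, chosen only to make the schema compile.
  schema @table_name do
    field :actor_id, :string
    field :resource, :string
    field :resource_id, :string
    field :changeset, :map
  end

  def changeset(attrs) do
    %__MODULE__{}
    |> cast(attrs, [:actor_id, :resource, :resource_id, :changeset])
    # Registers "<table>_pkey" so a duplicate primary key becomes a
    # changeset error on :id rather than an Ecto.ConstraintError.
    |> unique_constraint(:id, name: "#{@table_name}_pkey")
  end
end
```

The constraint is only metadata on the changeset until an insert fails, so the happy path is unaffected.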

The call sites already treat {:error, reason} from the log insert as a soft failure: they call Logger.error and return {:ok, reason} (or the equivalent in log_bulk), deliberately avoiding tx_repo.rollback so the user's data mutation succeeds. By turning the pkey collision into a normal changeset error via unique_constraint/3, the existing resilience paths now cover this case as well.
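The soft-failure contract described above can be sketched roughly as follows. This is not the package's code: Sketch.SoftLog and the insert_fun argument (a stand-in for the changelog repo.insert/1 call) are hypothetical, and the log message text is made up.

```elixir
defmodule Sketch.SoftLog do
  require Logger

  # `insert_fun` stands in for the repo.insert/1 call on the changelog row.
  def log_changes(insert_fun, changes) when is_function(insert_fun, 1) do
    case insert_fun.(changes) do
      {:ok, changelog} ->
        {:ok, changelog}

      {:error, reason} ->
        # Log the failure but do not roll back, so the caller's
        # transaction (and the user's data mutation) still succeeds.
        Logger.error("Failed to store changes in audit log: " <> inspect(reason))
        {:ok, reason}
    end
  end
end
```

With the constraint declared, a pkey collision arrives in the {:error, reason} branch as a changeset, so it is logged and swallowed like any other log write failure.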

No behavior change for the happy path. A new integration test (tagged :db) reproduces the collision by pre-occupying a pkey value and asserts the user update_and_log still returns {:ok, updated_struct} with no ConstraintError.

Why

Linear: OPS-4622 — production Ecto.ConstraintError on audit_logs_pkey during EctoTrail.update_and_log (first-party package). Triage: NOTIFY+FIX, severity high, category code_bug. The root cause is missing unique_constraint handling for the pkey in the log_changes path.

Test plan

  • mix format --check-formatted
  • mix credo --strict (0 issues after adding @spec)
  • mix compile (warnings pre-existing and unrelated to this diff)
  • mix test --exclude db (all non-DB tests ran cleanly; the DB tests require Postgres, which is not present in this agent pod, and they are the ones exercising the fix and the new pkey-collision scenario)
  • Added regression test in test/unit/ecto_trail_test.exs that forces an audit_log pkey collision inside update_and_log and asserts the user update succeeds without raising Ecto.ConstraintError
  • Version bumped to 1.0.4 in mix.exs
  • Concise entry added to CHANGELOG.md
  • PR branch name is the required palantir/OPS-4622-handle-audit-log-pkey-collision
  • Self-review of diff: no debug prints, no empty commits, no scope creep, only the minimal targeted change + test + release bits

Files changed (from git diff --stat)

 CHANGELOG.md                           |  6 +++++
 lib/ecto_trail/changelog.ex            | 20 ++++++++++++++++
 lib/ecto_trail/ecto_trail.ex           |  4 +---
 mix.exs                                |  2 +-
 test/test_helper.exs                   | 30 ++++++++++++++++--------
 test/unit/ecto_trail_log_only_test.exs |  2 ++
 test/unit/ecto_trail_test.exs          | 42 ++++++++++++++++++++++++++++++++++

Assumptions

  • The pkey column is always named id (standard Ecto primary key) and the constraint name in Postgres is <table_name>_pkey. This is the default name Postgres assigns for a create table(:foo), i.e. one without primary_key: false plus an explicit add :id, ... primary_key: true. This matches the migration in the package and the error message in the incident (audit_logs_pkey).
  • Consumers who set a custom :table_name will get the matching <table_name>_pkey constraint name automatically.
  • The existing "swallow log write failures" contract in log_changes* is intentional and stable (it was added for the Ecto.Multi case in 1.0.3).
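The naming assumption above can be illustrated with a small helper. Sketch.ConstraintName is hypothetical, and reading the table from the :ecto_trail application environment is an assumption about where :table_name lives.

```elixir
defmodule Sketch.ConstraintName do
  @default_table "audit_log"

  # Postgres names the default primary key constraint "<table>_pkey".
  def pkey_constraint_name(table_name \\ configured_table()) do
    "#{table_name}_pkey"
  end

  # Assumption: the table name is configured under the :ecto_trail app.
  defp configured_table do
    Application.get_env(:ecto_trail, :table_name, @default_table)
  end
end
```

Under this sketch, a consumer whose table is named audit_logs would get audit_logs_pkey, the constraint named in the incident.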

Closes OPS-4622

…traintError)

- Move changelog_changeset to EctoTrail.Changelog.changeset/1
- Add unique_constraint(:id, name: <table>_pkey) derived from :table_name config
- Existing log error paths already swallow the {:error, changeset} and do not rollback caller tx
- Add :db-tagged test that forces pkey collision and asserts user operation succeeds
- Version 1.0.4 + CHANGELOG entry
- Guard test_helper DB setup so suite loads without Postgres (tests still run :db tag in CI)

Closes OPS-4622
@linear-code

linear-code Bot commented Jun 13, 2026


OPS-4622

@acrogenesis

Member

Closing as a duplicate of #24 — same bug: Ecto.ConstraintError on audit_logs_pkey in EctoTrail.log_changes/5. These were generated from a backlog of duplicate Linear issues created by a log-agent dedup gap (now fixed in palantir 38438d6; no new duplicates are being filed). Consolidating on #24.
