fix: register unique_constraint on audit_log pkey to prevent Ecto.ConstraintError in *_and_log (OPS-4615)#76

Closed
palantir-valiot[bot] wants to merge 1 commit into main from palantir/OPS-4615-fix-audit-log-pkey-constraint

@palantir-valiot


Summary

The root cause was in changelog_changeset/1 (lib/ecto_trail/ecto_trail.ex:575): it only called Changeset.cast/3 and never registered unique_constraint(:id, name: "#{table}_pkey"). When any *_and_log/* function (or log/5) ran its internal repo.insert(%Changelog{}) inside an outer transaction (e.g. update_and_log called from app code inside Repo.transaction/1 or from Ecto.Multi.run), a pkey collision (sequence reuse, custom table, or concurrent id assignment) raised an Ecto.ConstraintError on audit_log_pkey (or the configured *_pkey) instead of returning a normal changeset error.

Fix: read the same @table_name compile_env value used by the Changelog schema, and chain unique_constraint/3 with the conventional #{table}_pkey name. Existing call sites in log_changes* already handle {:error, changeset} by logging and returning {:ok, reason}, so the user mutation still succeeds and the audit row is simply not written for that operation.
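
A minimal sketch of the pattern described above, not the actual ecto_trail source. The module name Sketch.Changelog, the field list, and the hard-coded "audit_log" table name are assumptions for illustration (the library reads the table name from compile_env). It only shows that chaining unique_constraint/3 registers the pkey constraint on the changeset, which is what lets Ecto turn a violation into a changeset error on insert.

```elixir
# Standalone script; assumes network access so Mix.install can fetch Ecto.
Mix.install([{:ecto, "~> 3.10"}])

defmodule Sketch.Changelog do
  use Ecto.Schema
  import Ecto.Changeset

  # Assumption: hard-coded here; the real library derives this from config.
  @table_name "audit_log"

  schema @table_name do
    field :actor_id, :string
    field :resource, :string
    field :changeset, :map
  end

  # Cast the attributes, then register the primary-key constraint by name so a
  # collision on insert becomes {:error, changeset} rather than a raise.
  def changelog_changeset(attrs) do
    %__MODULE__{}
    |> cast(attrs, [:actor_id, :resource, :changeset])
    |> unique_constraint(:id, name: "#{@table_name}_pkey")
  end
end

cs = Sketch.Changelog.changelog_changeset(%{actor_id: "1", resource: "users"})

# The registered constraints are plain data on the changeset.
IO.inspect(cs.constraints, label: "constraints")
```

No database is needed to run this: unique_constraint/3 only records the constraint name on the changeset, and the conversion to an error happens later, when the repo performs the insert.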

Also added a TDD test that forces a pkey collision via setval on the sequence and asserts update_and_log returns {:ok, updated} instead of raising.

Why

  • Linear: OPS-4615 (eliot-lamosa-gto-prod prod outage: Ecto.ConstraintError on audit_logs_pkey during log_changes called from update_and_log inside a tx).
  • First-party package ecto_trail (valiot/ecto_trail) was the source.
  • Protocol requires FIX for any clear code bug.

Test plan

  • mix format (applied; only our new test + small refactor touched).
  • mix compile (clean for our changes; one pre-existing redundant-clause warning on map_custom_ecto_type unrelated to this diff).
  • git diff reviewed (only 4 files, real diff, no debug prints, no scope creep).
  • mix test — could not execute in agent environment (no local Postgres; all connection attempts refused). The new test follows the exact patterns of the existing "use inside Ecto.Multi" tests and directly reproduces the pkey collision scenario described in the incident. CI on this PR will run the full suite against Postgres.
  • Version bump (1.0.3 → 1.0.4) + concise CHANGELOG entry per repo release rules.
  • Branch name exactly palantir/OPS-4615-fix-audit-log-pkey-constraint; pushed via git push-safe (never plain git push).
  • PR body written from skeleton (not left empty).

Closes OPS-4615

…et (OPS-4615)

- TDD: added test that forces pkey collision via sequence reset and asserts update_and_log does not raise Ecto.ConstraintError.
- Root cause: changelog_changeset/1 only cast; no unique_constraint/3, so repo.insert of Changelog inside *_and_log tx raised ConstraintError on audit_logs_pkey when the app's tx reused an id (e.g. after prior log in same tx or Multi).
- Fix: read same @table_name compile_env as Changelog schema, add unique_constraint(:id, name: "#{table}_pkey") so Ecto converts the violation to changeset error; existing log_changes paths already swallow {:error, changeset} by logging and returning {:ok, reason}.
- Also bumped to 1.0.4 and updated CHANGELOG per release rules.
- mix format applied; pre-existing credo/compile warning unrelated (map_custom_ecto_type clause).
@linear-code

linear-code Bot commented Jun 13, 2026


OPS-4615

@palantir-valiot palantir-valiot Bot left a comment

Author


Overall Assessment

The PR adds unique_constraint(:id, name: "#{@table_name}_pkey") to changelog_changeset/1 so that primary-key violations on the audit log table are converted to changeset errors instead of raising Ecto.ConstraintError. The implementation reads the same compile_env key used by the Changelog schema, and existing error paths in log_changes* already swallow {:error, changeset} results. No blocking bugs found.
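
For illustration, the "swallow" behaviour mentioned above can be sketched in plain Elixir. This is a hedged approximation, not the library's code: Sketch.AuditLog, its log_changes/1 function, and the insert_fun argument (standing in for the internal repo.insert/1 call) are hypothetical names.

```elixir
defmodule Sketch.AuditLog do
  require Logger

  # `insert_fun` is a zero-arity function standing in for the audit-row insert.
  # An {:error, reason} result is logged and converted to {:ok, reason}, so the
  # surrounding operation is never failed by the audit write.
  def log_changes(insert_fun) when is_function(insert_fun, 0) do
    case insert_fun.() do
      {:ok, changelog} ->
        {:ok, changelog}

      {:error, reason} ->
        Logger.error("Failed to store audit log entry: " <> inspect(reason))
        {:ok, reason}
    end
  end
end

# A simulated pkey collision surfaces as an :ok tuple carrying the error data.
collision = fn -> {:error, %{errors: [id: {"has already been taken", []}]}} end
{:ok, %{errors: _}} = Sketch.AuditLog.log_changes(collision)
```

This path is only reached when the insert returns {:error, changeset}. An insert that raises would bypass the case expression entirely, which is why the constraint has to be registered on the changeset.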

Findings

No actionable findings.

Notes

  • The new test directly reproduces the pkey collision via setval and asserts the expected {:ok, ...} return path, following the same patterns as the existing "use inside Ecto.Multi" tests.
  • The fix is scoped correctly: only changelog_changeset/1 is modified, and the @table_name constant is defined once at module level to match the Changelog schema's compile-time config.
  • Considered flagging that the test assumes a serial id column with a {table}_id_seq sequence (not UUID pkey as shown in the moduledoc example); however, this matches the actual TestRepo schema used by the test suite, so the test is valid for its environment.

@palantir-valiot

Author

Thanks for the review.

  • No code changes required; the implementation and test scope match the review notes.
  • Confirmed: the new test uses the actual TestRepo schema (serial id + {table}_id_seq) that the suite and CI rely on (the UUID example in the moduledoc is illustrative for users who configure their own table).
  • mix test was not runnable in the agent pod (no Postgres), but the test was written following existing patterns and will be exercised by CI on this PR.

Ready for any further CI results or approvals.

@acrogenesis

Member

Closing as a duplicate of #24 — same bug: Ecto.ConstraintError on audit_logs_pkey in EctoTrail.log_changes/5. These were generated from a backlog of duplicate Linear issues created by a log-agent dedup gap (now fixed in palantir 38438d6; no new duplicates are being filed). Consolidating on #24.
