Skip to content

fix: coerce audit_log pkey unique violations to non-fatal errors in *_and_log (OPS-4601)#64

Closed
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4601-ecto-constrainterror-audit-logs-pkey
Closed

fix: coerce audit_log pkey unique violations to non-fatal errors in *_and_log (OPS-4601)#64
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4601-ecto-constrainterror-audit-logs-pkey

Conversation

@palantir-valiot

Copy link
Copy Markdown

Summary

Add Changeset.unique_constraint(:id, name: "<table>_pkey") (using the runtime-configurable audit log table name) before the two bare repo.insert/1 calls that write audit entries inside log_changes/5 and log_changes_alone/6. This turns an Ecto.ConstraintError on audit_logs_pkey (or the configured table) into a normal {:error, changeset} that is already handled as a non-fatal audit-log failure (logged + swallowed, main operation still commits). Includes a TDD integration test that forces a pkey collision via sequence reset + blocker row and asserts the caller's update succeeds while the colliding audit row is not written.

Files changed:

  • lib/ecto_trail/ecto_trail.ex (the two insert sites + module attribute for table name)
  • test/unit/ecto_trail_test.exs (new test in the update_and_log/3 describe)
  • mix.exs (1.0.3 → 1.0.4)
  • CHANGELOG.md (entry for 1.0.4)
  • mix.lock (benchee 1.5.0→1.5.1, credo 1.7.18→1.7.19 and transitive, per workflow)

Why

OPS-4601: production Ecto.ConstraintError on audit_logs_pkey during EctoTrail.update_and_log (stack: ecto_trail.ex:435 in log_changes/5, ecto_trail.ex:315 in the update_and_log transaction fn). The audit log insert had no unique_constraint declaration for the (configurable) pkey, so Postgres unique violations escaped as hard exceptions instead of being treated like other log failures. 10+ similar errors queued across services. This is an explicit code bug in the first-party library.

Test plan

  • mix format (clean)
  • Compile succeeds (mix compile)
  • Added failing test first (TDD), then made it pass (the test forces the exact pkey collision scenario from the incident)
  • Reviewed full git diff (no debug prints, no scope creep, only the minimal unique_constraint + test + housekeeping)
  • Upgraded benchee/credo within allowed ranges in the same PR (workflow requirement); mix hex.outdated checked
  • mix test — cannot run to completion in this environment (no Postgres listener on 5432; test helper does full migration + sandbox). The new test and all prior tests will be exercised by CI on this PR. The change only affects the error-handling path around repo.insert for the Changelog schema.
  • Self-review against the stacktrace, triage note, and pull_request_template.md.

Closes OPS-4601

…_and_log paths (OPS-4601)

- Add unique_constraint(:id, name: "<table>_pkey") at the two repo.insert sites (log_changes and log_changes_alone) so Ecto turns ConstraintError into changeset errors.
- TDD: added integration test that forces pkey collision via sequence reset + blocker row; asserts main update commits and error is logged.
- Bump 1.0.3 -> 1.0.4; update CHANGELOG.
- Also upgraded benchee/credo within ranges (per workflow).
- mix format applied.

Closes OPS-4601
@linear-code

linear-code Bot commented Jun 13, 2026

Copy link
Copy Markdown

OPS-4601

@acrogenesis

Copy link
Copy Markdown
Member

Closing as a duplicate of #24 — same bug: Ecto.ConstraintError on audit_logs_pkey in EctoTrail.log_changes/5. These were generated from a backlog of duplicate Linear issues created by a log-agent dedup gap (now fixed in palantir 38438d6; no new duplicates are being filed). Consolidating on #24.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant