Fix missing unique_constraint + ConstraintError handling for audit_log pkey (OPS-4582) #44
Closed
palantir-valiot[bot] wants to merge 1 commit into
- Add `unique_constraint` declarations for common pkey names on audit_log.
- Introduce `insert_changelog/2` that rescues `Ecto.ConstraintError` for pkey violations on audit tables and treats them as idempotent (no-op).
- Update both `log_changes` and `log_changes_alone` call sites.
- Add regression test that forces a pkey collision and asserts `update_and_log` still succeeds.
- Bump to 1.0.4 and document in CHANGELOG.

Closes OPS-4582
Member
Closing as a duplicate of #24, since all of these PRs fix the same bug:
Description
Root cause: inside `log_changes/5` (called by `update_and_log/4`, `insert_and_log/4`, etc.) and `log_changes_alone/6`, the Changelog row was inserted with a raw `repo.insert/1` that never called `unique_constraint/3` for the audit table's pkey, and there was no rescue of `Ecto.ConstraintError`. When a duplicate PK was generated (race, retry, sequence collision within the same second), Postgres raised on "audit_logs_pkey" (or "audit_log_pkey"), the exception propagated out of the `*_and_log` wrapper, and the caller's transaction (often inside a larger `Repo.transaction`) saw a hard `Ecto.ConstraintError` that triage classified as a "code bug".

Summary of changes
- `lib/ecto_trail/ecto_trail.ex`: declare `unique_constraint(:id, name: "audit_log_pkey")` and `unique_constraint(:id, name: "audit_logs_pkey")` in `changelog_changeset/1`; introduce small internal `insert_changelog/2` + `audit_pkey_constraint?/1` that perform the insert but rescue `Ecto.ConstraintError` for audit-related pkey constraint names and treat them as idempotent "already recorded" (log at error level for observability, return a benign sentinel, do not rollback the caller's tx); both `log_changes` and `log_changes_alone` now go through the wrapper.
- `test/unit/ecto_trail_test.exs`: new describe block "idempotent audit logging on pkey collision (OPS-4582)" with a test that forces a pkey collision via raw INSERT + `setval` on the audit_log sequence immediately before an `update_and_log`, then asserts the business update succeeds and no exception is raised to the caller.
- `mix.exs`: version 1.0.3 → 1.0.4 (per repo release rules).
- `CHANGELOG.md`: concise entry for 1.0.4 under Fixed.

No other files touched. No debug prints, no speculative generality, no new deps.
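The rescue-and-ignore behavior described for `insert_changelog/2` and `audit_pkey_constraint?/1` can be sketched roughly as follows. This is not the code from the PR, whose function bodies are not shown here; it is a dependency-free illustration under several assumptions. The module name `AuditInsertSketch`, the nested `ConstraintError` (a stand-in for `Ecto.ConstraintError`, which carries the same `:type` and `:constraint` fields), the `insert_fun` argument (standing in for `repo.insert/1`), and the `:already_recorded` sentinel are all hypothetical. Re-raising non-audit constraint errors is likewise an assumption of this sketch.

```elixir
defmodule AuditInsertSketch do
  require Logger

  # Hypothetical stand-in for Ecto.ConstraintError so the sketch runs without Ecto.
  defmodule ConstraintError do
    defexception [:type, :constraint, message: "constraint error"]
  end

  # The two pkey constraint names mentioned in the PR description.
  @audit_pkeys ["audit_log_pkey", "audit_logs_pkey"]

  # True only for unique violations on one of the audit table pkeys.
  def audit_pkey_constraint?(%ConstraintError{type: :unique, constraint: name}),
    do: name in @audit_pkeys

  def audit_pkey_constraint?(_other), do: false

  # Runs the insert; a duplicate audit pkey is logged and treated as a no-op.
  # Any other constraint error is re-raised unchanged (assumption of this sketch).
  def insert_changelog(insert_fun, changeset) do
    insert_fun.(changeset)
  rescue
    error in ConstraintError ->
      if audit_pkey_constraint?(error) do
        Logger.error("audit row already recorded (#{error.constraint})")
        {:ok, :already_recorded}
      else
        reraise error, __STACKTRACE__
      end
  end
end

# Usage: an insert that collides on the audit pkey no longer raises.
colliding_insert = fn _changeset ->
  raise AuditInsertSketch.ConstraintError, type: :unique, constraint: "audit_log_pkey"
end

AuditInsertSketch.insert_changelog(colliding_insert, %{})
# => {:ok, :already_recorded}
```

Because the wrapper returns a value instead of raising, the surrounding `*_and_log` call can carry on with the business result, which is the property the regression test asserts.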
Motivation / context
Matches the stack trace in OPS-4582 (jobs-lamosa-gto-prod):
The fix makes audit logging best-effort + idempotent for the exact failure mode observed (pkey unique violations), while preserving the existing error-logging path for other insert failures.
Type of change
How Has This Been Tested?
- `mix format`: clean (per Elixir/AGENTS.md requirement).
- `mix credo --strict`: clean (0 issues).
- `mix compile`: clean (after the rescue-clause edit).
- The new regression test forces an `Ecto.ConstraintError` on the pkey (by manipulating the serial sequence + pre-inserting a colliding row) and asserts `update_and_log` still returns `{:ok, updated}`. The test cannot run to completion in this ephemeral pod (no Postgres), but it will be exercised by CI (`.travis.yml` provisions Postgres 9.5 + runs `mix test`).
- `git diff` reviewed for scope, comments, and absence of secrets/debug before commit.
- `gh pr view` / branch existence verified before opening.

Test Configuration (CI will use):
Checklist:
- `mix format`, no added comments unless asked)
- `palantir/OPS-4582-add-unique-constraint-handling`
- `git push-safe` wrapper (never plain `git push`)

Closes OPS-4582