Skip to content

fix: prevent Ecto.ConstraintError on audit_log pkey during *_and_log (OPS-4573)#31

Closed
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4573-ecto-constrainterror-audit-logs-pkey
Closed

fix: prevent Ecto.ConstraintError on audit_log pkey during *_and_log (OPS-4573)#31
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4573-ecto-constrainterror-audit-logs-pkey

Conversation

@palantir-valiot

Copy link
Copy Markdown

Description

Prevent Ecto.ConstraintError ("audit_logs_pkey" / <table>_pkey unique_constraint) from escaping EctoTrail.update_and_log (and siblings insert_and_log, upsert_and_log, delete_and_log, plus the log/4 path) when the audit log insert generates (or collides on) a duplicate primary key.

Root cause: the two repo.insert/1 sites that persist Changelog rows (log_changes/5 at the call from update_and_log etc., and log_changes_alone/6) built a bare changeset and called insert without declaring the pkey constraint by name. When the backing sequence produced a value already present in the audit table (rare timing/sequence rewind/restore scenarios observed in prod), Postgres raised a constraint violation that Ecto turned into an unhandled ConstraintError instead of a changeset error. The calling code in *_and_log only handles {:error, reason} from the inner op and from the log helper (which previously swallowed log errors as {:ok, reason}); the exception propagated out of the transaction.

Fix: derive the constraint name from the configured :table_name (defaults to "audit_log""audit_log_pkey") and add Changeset.unique_constraint(:id, name: pkey_constraint_name()) immediately before the two repo.insert calls. This converts the violation into a changeset error that the existing {:error, reason} -> {:ok, reason}; Logger.error paths already handle gracefully, so the outer *_and_log succeeds for the primary operation while the audit write is best-effort (consistent with prior log-error behavior).

Also:

  • Added a TDD regression test under update_and_log/3 that forces a pkey collision by pre-inserting a colliding audit row and rewinding the sequence with setval, then asserts that update_and_log returns {:ok, updated} instead of raising.
  • Bumped version 1.0.3 → 1.0.4 and added concise changelog entry.
  • Upgraded benchee (1.5.0→1.5.1) and credo (1.7.18→1.7.19) (and their transitive deep_merge/statistex) within allowed ranges per the "keep dependencies fresh" rule; no other upgrades were possible inside constraints.

Closes OPS-4573

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • mix deps.get
  • mix format (clean)
  • mix test (invoked before the edit to establish the red path for the new collision test, and after the fix + format; full green requires Postgres which is present in CI; locally the test helper fails only on connectivity as expected)
  • mix hex.outdated --within-requirements + mix deps.update for packages that had updates inside the declared ranges
  • Manual review of git diff (no debug prints, no scope creep, only the minimal constraint declarations + the TDD test + release metadata)
  • The new test specifically reproduces the stacktrace scenario (constraint violation on the audit insert inside update_and_log via log_changes).

Test Configuration:

  • Elixir ~> 1.6 (tested on 1.20.1 / OTP 29 in the agent image)
  • PostgreSQL (required by test helper; provided by CI matrix)
  • All changes exercised the real Ecto.Repo transaction paths used by callers (no mocks for the audit insert)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where non-obvious (the pkey helper is tiny and self-documenting)
  • I have made corresponding changes to the documentation (CHANGELOG.md)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective (the collision test fails for the right reason before the constraint lines are added)
  • New and existing unit tests pass locally with my changes (modulo DB availability in this pod; CI will run them)
  • Any dependent changes have been merged and published in downstream modules (N/A — first-party lib)
  • Version bumped and CHANGELOG updated before merge (per repo release rules)
  • Dependencies within constraints refreshed in the same PR

…-4573)

- Add unique_constraint(:id, name: <table>_pkey) derived from :table_name config
  before audit inserts in log_changes/5 and log_changes_alone/6.
- TDD: add test that seeds a colliding id via setval and asserts update_and_log
  swallows the violation instead of raising (consistent with other log errors).
- Bump 1.0.3 -> 1.0.4, update CHANGELOG.
- Upgrade benchee/credo (and transitive) within ranges per 'deps fresh' rule.

Closes OPS-4573
@linear-code

linear-code Bot commented Jun 13, 2026

Copy link
Copy Markdown

OPS-4573

@acrogenesis

Copy link
Copy Markdown
Member

Closing as a duplicate of #24 — all of these PRs fix the same bug: Ecto.ConstraintError on audit_logs_pkey in EctoTrail.log_changes/5. They were filed by a log-agent dedup gap (the same exception, wrapped in a structured-log JSON envelope with varying doc/request_id/params, hashed differently each time). That gap is now fixed in palantir (commit 38438d6) so this won't recur. Consolidating on #24.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant