Skip to content

fix: absorb Ecto.ConstraintError on audit_logs_pkey / audit_log_pkey in log_changes/5 and *_and_log paths (OPS-4588)#49

Closed
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4588-fix-audit-log-pkey-constraint
Closed

fix: absorb Ecto.ConstraintError on audit_logs_pkey / audit_log_pkey in log_changes/5 and *_and_log paths (OPS-4588)#49
palantir-valiot[bot] wants to merge 1 commit into
mainfrom
palantir/OPS-4588-fix-audit-log-pkey-constraint

Conversation

@palantir-valiot

Copy link
Copy Markdown

Summary

Add unique_constraint(:id, name: <table>_pkey) declarations (covering audit_log_pkey, audit_logs_pkey, and the configured table name) to the Changelog audit-log changeset. Refactor the two insert paths (log_changes/5 and log_changes_alone/6) to go through a new insert_changelog_or_swallow/4 helper that rescues any error (including Ecto.ConstraintError) on the bare repo.insert, logs it, and returns {:ok, reason} instead of letting the exception escape the caller's transaction.

This makes update_and_log/4, insert_and_log/4, upsert_and_log/4, delete_and_log/4, and the standalone log/5 resilient to transient/duplicate pkey collisions on the audit table (the exact failure mode in the OPS-4588 stack trace from valiot_app calling update_and_log inside a transaction).

Also upgraded benchee/credo (and transitive deep_merge/statistex) within allowed ranges per baseline rules, bumped to 1.0.4, and added a concise CHANGELOG entry.

Files changed:

  • lib/ecto_trail/ecto_trail.ex (core fix + helper)
  • test/unit/ecto_trail_test.exs (TDD regression tests that force pkey collision via direct INSERT + setval and assert the business op succeeds while the colliding audit row is left alone)
  • mix.exs (1.0.4)
  • CHANGELOG.md
  • mix.lock (dep upgrades)

Why

Linear: OPS-4588 — Ecto.ConstraintError on audit_logs_pkey (or audit_log_pkey) during ecto_trail.log_changes/5 called from update_and_log inside a transaction in eliot-lamosa-gto-prod. The triage analysis classified this as a high-severity first-party code bug in valiot/ecto_trail.

The root cause was a bare repo.insert(changelog_changeset(...)) with no unique_constraint on the synthetic :id (the PK is DB-generated) and no rescue around the insert, so any concurrent or replayed pkey collision blew up the caller's transaction.

Test plan

  • mix format (clean; applied)
  • mix credo --strict (clean on the three source files)
  • mix compile (and mix compile --warnings-as-errors after fixing our own introduced compile-time Application.compile_env misuse by moving the table name to a module attribute)
  • Added two TDD tests under "audit log pkey constraint handling" that deliberately collide on the audit PK (high sequence value + direct INSERT) and assert:
    • update_and_log still succeeds for the business resource
    • The colliding audit row is left untouched
    • The caller's transaction is not rolled back
    • Same for the standalone log/5 path
  • Full mix test (and the DB-dependent new tests) will run in CI (the pod environment has no reachable Postgres; see test helper and prior CI config using Travis + external postgres). The tests are written to fail for the right reason before the fix.
  • Upgraded outdated-in-range deps (benchee, credo) in the same PR as required.
  • No UI/frontend impact → no screenshots.
  • PR body written from scratch (not the raw template); followed palantir/OPS-4588-... branch convention and git push-safe.

Closes OPS-4588

…nd_log paths

- Declare unique_constraint(:id, name: <table>_pkey) variants (audit_log_pkey, audit_logs_pkey, and the configured table name) on the Changelog changeset.
- Introduce insert_changelog_or_swallow/4 that rescues any error (including Ecto.ConstraintError) on the bare audit insert and logs+returns {:ok, reason} instead of letting the exception escape the transaction.
- Add regression tests that force a pkey collision via direct INSERT at a high sequence value and assert update_and_log/log swallow it (no ConstraintError) and the original business op succeeds.
- Bump to 1.0.4; add concise changelog entry.
- Upgrade benchee/credo (within ranges) per baseline rules.

Closes OPS-4588
@linear-code

linear-code Bot commented Jun 13, 2026

Copy link
Copy Markdown

OPS-4588

@acrogenesis

Copy link
Copy Markdown
Member

Closing as a duplicate of #24 — same bug: Ecto.ConstraintError on audit_logs_pkey in EctoTrail.log_changes/5. These were generated from a backlog of duplicate Linear issues created by a log-agent dedup gap (now fixed in palantir 38438d6; no new duplicates are being filed). Consolidating on #24.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant