fix(api): make public CRON job paths transactional with pg_cron#33

Open
knutties wants to merge 1 commit into main from claude/beautiful-dijkstra-ORdeA

knutties commented Jun 6, 2026

Summary

Fixes #32. The public CRON-job handlers in crates/api/src/handlers/jobs.rs wrote the jobs row, then called register_pg_cron / unregister_pg_cron against a fresh pool connection — after the row write had committed and with the pg_cron error logged-and-swallowed. The API returned 2xx even when the pg_cron op failed, so the persisted jobs row could end up out of sync with what pg_cron actually scheduled.

This applies the two-part fix the issue called for (tx scope and error propagation, shipped together) at all three affected sites.

Changes

crates/common/src/db/jobs.rs — added connection-scoped variants that run on the caller's connection/transaction:

  • register_pg_cron_conn(&mut PgConnection, …)
  • unregister_pg_cron_conn(&mut PgConnection, …)

The existing pool-based register_pg_cron / unregister_pg_cron now acquire a connection and delegate to these (public API preserved).
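
The delegation described above can be sketched with the standard library alone. This is a minimal model, not the code from the PR: `Pool` and `Conn` are hypothetical stand-ins for sqlx's `PgPool` and `PgConnection`, the function names are shortened, and the `kronos_{schema}_{job_id}` name format is assumed to match the one shown for the unregister path.

```rust
// Std-only model of the delegation pattern. `Pool` and `Conn` are hypothetical
// stand-ins for sqlx's PgPool and PgConnection, not the real types.
#[derive(Debug, Default)]
struct Conn {
    scheduled: Vec<String>, // names of jobs "registered" through this connection
}

#[derive(Debug, Default)]
struct Pool {
    conn: Conn, // a real pool would hand out one of many connections
}

impl Pool {
    fn acquire(&mut self) -> Result<&mut Conn, String> {
        Ok(&mut self.conn)
    }
}

/// Connection-scoped variant: runs on whatever connection the caller passes in,
/// so a caller holding a transaction can pass that instead.
fn register_conn(conn: &mut Conn, schema: &str, job_id: &str) -> Result<(), String> {
    conn.scheduled.push(format!("kronos_{}_{}", schema, job_id));
    Ok(())
}

/// Pool-based variant: signature unchanged for existing callers, now a thin wrapper.
fn register(pool: &mut Pool, schema: &str, job_id: &str) -> Result<(), String> {
    let conn = pool.acquire()?;
    register_conn(conn, schema, job_id)
}

fn main() {
    let mut pool = Pool::default();
    register(&mut pool, "public", "42").unwrap();
    println!("{:?}", pool.conn.scheduled);
}
```

Keeping the pool-based function as a wrapper means callers outside the handlers need no changes, while the handlers can opt into the connection-scoped form.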

crates/api/src/handlers/jobs.rs — reworked the three handlers so the row write and the pg_cron op share one transaction and commit (or roll back) atomically, with the error propagated via ? instead of swallowed:

  • create (CRON): switched from scoped_connection to scoped_transaction; register on the tx; commit() only after it succeeds.
  • update: moved unregister + register inside the retire_and_replace tx, before commit.
  • cancel: wrapped the get/cancel/unregister flow in a scoped_transaction.
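
The commit-or-roll-back behavior shared by the three handlers can be illustrated with an in-memory model. This is a sketch under stated assumptions: `Db`, `Tx`, and every function name below are hypothetical, no database is involved, and "rollback" is modeled as dropping staged changes without committing them.

```rust
// In-memory model of "row write and pg_cron op share one transaction".
// `Db`, `Tx`, and the function names are hypothetical; no real database is used.
#[derive(Debug, Default, Clone, PartialEq)]
struct Db {
    jobs: Vec<String>,      // committed job rows
    schedules: Vec<String>, // committed pg_cron entries
}

/// Staged changes; nothing reaches `Db` until `commit` is called.
struct Tx<'a> {
    db: &'a mut Db,
    staged: Db,
}

impl<'a> Tx<'a> {
    fn begin(db: &'a mut Db) -> Self {
        let staged = db.clone();
        Tx { db, staged }
    }

    fn commit(self) {
        *self.db = self.staged;
    }
    // Dropping a Tx without calling commit discards the staged changes.
}

fn insert_job(tx: &mut Tx, job_id: &str) -> Result<(), String> {
    tx.staged.jobs.push(job_id.to_string());
    Ok(())
}

fn register_schedule(tx: &mut Tx, job_id: &str, cron_healthy: bool) -> Result<(), String> {
    if !cron_healthy {
        return Err("pg_cron unavailable".to_string());
    }
    tx.staged.schedules.push(format!("kronos_public_{}", job_id));
    Ok(())
}

fn create_cron_job(db: &mut Db, job_id: &str, cron_healthy: bool) -> Result<(), String> {
    let mut tx = Tx::begin(db);
    insert_job(&mut tx, job_id)?;
    register_schedule(&mut tx, job_id, cron_healthy)?; // on error: early return, tx dropped
    tx.commit(); // reached only when both steps succeeded
    Ok(())
}

fn main() {
    let mut db = Db::default();
    let failed = create_cron_job(&mut db, "a", false);
    println!("{:?} -> {:?}", failed, db);
    let ok = create_cron_job(&mut db, "b", true);
    println!("{:?} -> {:?}", ok, db);
}
```

In this model a failed schedule registration leaves no job row behind, which is the property the handlers are meant to gain.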

Behavior change (intended)

Because sqlx::Error maps to AppError::Internal (500), a failed pg_cron op now rolls back the row write. POST / PATCH / DELETE on CRON jobs can now return a 5xx when pg_cron is unhealthy — and the client can retry against a consistent state. This is the deliberate API contract change described in the issue.
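
A rough sketch of how the `?` operator turns a database error into a 500 follows. Only the `Internal` to 500 mapping comes from the text above; `DbError` is a hypothetical stand-in for `sqlx::Error`, the shape of `AppError` is assumed, and the 200 success code is an arbitrary placeholder for "2xx".

```rust
// Hypothetical stand-in for sqlx::Error; the real type has many variants.
#[derive(Debug)]
struct DbError(String);

// Assumed shape of AppError; only the Internal -> 500 mapping is from the PR text.
#[derive(Debug)]
enum AppError {
    Internal(String),
}

impl From<DbError> for AppError {
    fn from(e: DbError) -> Self {
        AppError::Internal(e.0)
    }
}

impl AppError {
    fn status_code(&self) -> u16 {
        match self {
            AppError::Internal(_) => 500,
        }
    }
}

// Simulated pg_cron registration that fails when pg_cron is unhealthy.
fn register(cron_healthy: bool) -> Result<(), DbError> {
    if cron_healthy {
        Ok(())
    } else {
        Err(DbError("pg_cron unavailable".to_string()))
    }
}

// Handler body: `?` converts DbError into AppError through the From impl,
// instead of logging the error and carrying on.
fn handler(cron_healthy: bool) -> Result<u16, AppError> {
    register(cron_healthy)?;
    Ok(200) // placeholder for whichever 2xx code the real handler returns
}

fn response_status(result: Result<u16, AppError>) -> u16 {
    match result {
        Ok(code) => code,
        Err(e) => e.status_code(),
    }
}

fn main() {
    println!("healthy: {}", response_status(handler(true)));
    println!("unhealthy: {}", response_status(handler(false)));
}
```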

Out of scope

The optional follow-up (having the reaper sweep also catch cancelled CRON jobs and unschedule stale pg_cron entries) is left out, per the issue's framing.

Testing

cargo check and cargo clippy are clean on kronos-common and kronos-api (remaining clippy warnings are pre-existing patterns elsewhere in the codebase). The repo has no integration test suite covering these handler paths.

Closes #32

https://claude.ai/code/session_012cTjd515ScehwT6Dvasexj


Generated by Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved CRON job reliability by ensuring schedule registration and unregistration occur atomically within database transactions during job creation, updates, and cancellations. This prevents inconsistencies between jobs and their associated schedules.

The public CRON-job handlers wrote the jobs row, then called
register/unregister_pg_cron against a fresh pool connection after the
row write had committed, with the pg_cron error logged and swallowed.
The API returned 2xx even when the pg_cron op failed, leaving the
persisted jobs row out of sync with what pg_cron actually scheduled.

Add connection-scoped variants register_pg_cron_conn /
unregister_pg_cron_conn that run on the caller's connection or
transaction, and have the pool-based variants delegate to them.

Rework the three affected handler sites so the row write and the
pg_cron op share one transaction and commit (or roll back) atomically,
propagating pg_cron failures as AppError instead of swallowing them:

- create (CRON): switch from scoped_connection to scoped_transaction;
  register on the tx and commit only after it succeeds.
- update: move unregister + register inside the retire_and_replace tx.
- cancel: wrap the cancel + unregister in a transaction.

This is a deliberate API contract change: POST/PATCH/DELETE on CRON
jobs can now fail with 5xx when pg_cron is unhealthy, and the client
can retry against a consistent state.

Closes #32

https://claude.ai/code/session_012cTjd515ScehwT6Dvasexj

coderabbitai Bot commented Jun 6, 2026

Walkthrough

The PR adds transaction-scoped CRON registration and unregistration helpers to enable atomic pg_cron operations, and refactors the three public CRON job handler paths to use transactions, call these helpers, and propagate errors instead of logging and swallowing them.

Changes

CRON Job Atomicity: Transactional pg_cron Registration and Error Propagation

| Layer / File(s) | Summary |
|---|---|
| Connection-scoped CRON registration/unregistration helpers (crates/common/src/db/jobs.rs) | New register_pg_cron_conn and unregister_pg_cron_conn functions perform schedule operations on a provided PgConnection, enabling transactional atomicity. Existing register_pg_cron and unregister_pg_cron are refactored to acquire a connection and delegate to the new helpers. |
| Handler atomicity for create, update, and cancel CRON paths (crates/api/src/handlers/jobs.rs) | CRON job create, update, and cancel handlers now wrap pg_cron registration/unregistration operations within the same database transaction as the job row changes. Errors propagate via the ? operator instead of being logged and swallowed, allowing failed pg_cron operations to roll back the entire transaction and return an error response. |

Sequence Diagram(s)

sequenceDiagram
  participant Handler as Handler
  participant Tx as Scoped Transaction
  participant DB as Database
  participant PgCron as pg_cron
  Handler->>Tx: begin transaction
  Tx->>DB: create/retire job row
  Tx->>DB: register/unregister pg_cron schedule
  alt schedule operation succeeds
    Tx->>DB: commit transaction
    Handler-->>Handler: return 2xx
  else schedule operation fails
    Tx->>DB: rollback transaction
    Handler-->>Handler: return error response
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through transactions so fine,
Where pg_cron schedules and jobs now align,
No more errors lost to silent logs deep—
Each promise to run, the rabbit will keep! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(api): make public CRON job paths transactional with pg_cron' accurately summarizes the main change: refactoring CRON job handlers to use transactions with pg_cron operations, fixing issue #32. |
| Linked Issues check | ✅ Passed | The PR fully addresses all coding requirements from issue #32: adds connection-scoped register/unregister functions in db/jobs.rs, refactors create/update/cancel handlers to use transactions, and propagates pg_cron errors instead of swallowing them. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #32's objectives: connection-scoped functions, transactional handlers for CRON jobs, and error propagation. No unrelated refactoring or scope creep detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/common/src/db/jobs.rs`:
- Around line 315-329: The unregister_pg_cron_conn function can return an error
if cron.unschedule($1) is called for a non-existent job; update
unregister_pg_cron_conn to be idempotent by either 1) querying cron.job for a
row matching the generated cron_job_name (and ownership) and only calling
cron.unschedule when found, or 2) executing cron.unschedule($1) inside a match
that maps the specific “could not find valid entry for job” Postgres/sqlx error
to Ok(()) while propagating other errors; locate the cron_job_name construction
and the sqlx::query("SELECT cron.unschedule($1)") call to implement one of these
two fixes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 88c872e6-e237-42fd-906c-646ef13bfb6d

📥 Commits

Reviewing files that changed from the base of the PR and between c8ea6ce and f65e1c1.

📒 Files selected for processing (2)
  • crates/api/src/handlers/jobs.rs
  • crates/common/src/db/jobs.rs

Comment on lines +315 to 329
/// Unregister a CRON job from pg_cron on an existing connection or transaction.
pub async fn unregister_pg_cron_conn(
    conn: &mut PgConnection,
    schema_name: &str,
    job_id: &str,
) -> Result<(), sqlx::Error> {
    let cron_job_name = format!("kronos_{}_{}", schema_name, job_id);

    sqlx::query("SELECT cron.unschedule($1)")
        .bind(&cron_job_name)
        .execute(&mut *conn)
        .await?;

    Ok(())
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pg_cron unschedule function behavior when job name not found

💡 Result:

When you call the pg_cron function cron.unschedule(job_name text) and the specified job name cannot be found for the current user, the function raises a PostgreSQL error [1][2]. Specifically, the implementation of cron_unschedule_named in the pg_cron source code performs a lookup in the cron.job table filtering by both the job name and the current user's identity [2]. If no tuple is found matching these criteria, the function executes an ereport with the error message: "could not find valid entry for job '[job_name]'" [2].

This behavior means that:

- You will receive an error rather than a boolean false return value [1][2].
- If you are logged in as a different user than the one who created the job, the function will fail with this "not found" error even if the job name exists in the table, because it cannot locate a matching entry for your current user [3][1].
- To avoid this error, you must ensure the job name exists and is owned by the current database role executing the command [1][2].

Alternatively, you can use the cron.unschedule(jobid bigint) function if you have the job ID, as it may have different permission handling characteristics [3][1].


Make unregister_pg_cron_conn idempotent for missing pg_cron jobs

cron.unschedule($1) with the job name raises a PostgreSQL error (e.g., “could not find valid entry for job …”) when no matching entry exists for the current user, so unregister_pg_cron_conn (crates/common/src/db/jobs.rs, lines 315-329) can return Err(sqlx::Error) instead of succeeding. Adjust to either pre-check cron.job for that name/ownership (and only then call cron.unschedule), or catch/translate the specific “not found” error to Ok(()) (or use cron.unschedule(jobid bigint) if you can supply the job id).
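
The second option (translating the specific "not found" error to success) might look roughly like the following std-only model. It is a sketch, not a patch: `DbError` and `unschedule` are hypothetical stand-ins that simulate the pg_cron behavior described above, and matching on the message text is an assumption made for illustration.

```rust
// Hypothetical stand-in for a database error carrying the server's message.
#[derive(Debug, PartialEq)]
struct DbError {
    message: String,
}

/// Name format taken from the snippet under review.
fn cron_job_name(schema_name: &str, job_id: &str) -> String {
    format!("kronos_{}_{}", schema_name, job_id)
}

/// Simulated cron.unschedule(name): errors when no matching job exists.
fn unschedule(jobs: &mut Vec<String>, name: &str) -> Result<(), DbError> {
    match jobs.iter().position(|j| j == name) {
        Some(i) => {
            jobs.remove(i);
            Ok(())
        }
        None => Err(DbError {
            message: format!("could not find valid entry for job '{}'", name),
        }),
    }
}

/// Idempotent wrapper: a missing job counts as already unregistered,
/// while any other error is still propagated to the caller.
fn unregister_idempotent(
    jobs: &mut Vec<String>,
    schema_name: &str,
    job_id: &str,
) -> Result<(), DbError> {
    let name = cron_job_name(schema_name, job_id);
    match unschedule(jobs, &name) {
        Err(e) if e.message.contains("could not find valid entry for job") => Ok(()),
        other => other,
    }
}

fn main() {
    let mut jobs = vec![cron_job_name("public", "1")];
    println!("{:?}", unregister_idempotent(&mut jobs, "public", "1"));
    println!("{:?}", unregister_idempotent(&mut jobs, "public", "1"));
}
```

One caveat when applying this against a real database: PostgreSQL aborts the enclosing transaction when a statement errors, so catching the error inside a transaction would likely need a savepoint around the call, which may make the pre-check option the simpler fit here.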


Development

Successfully merging this pull request may close these issues.

Public CRON job paths swallow pg_cron registration errors and skip transactional atomicity

2 participants