fix(api): make public CRON job paths transactional with pg_cron#33
fix(api): make public CRON job paths transactional with pg_cron#33knutties wants to merge 1 commit into
Conversation
The public CRON-job handlers wrote the jobs row, then called register/unregister_pg_cron against a fresh pool connection after the row write had committed, with the pg_cron error logged and swallowed. The API returned 2xx even when the pg_cron op failed, leaving the persisted jobs row out of sync with what pg_cron actually scheduled. Add connection-scoped variants register_pg_cron_conn / unregister_pg_cron_conn that run on the caller's connection or transaction, and have the pool-based variants delegate to them. Rework the three affected handler sites so the row write and the pg_cron op share one transaction and commit (or roll back) atomically, propagating pg_cron failures as AppError instead of swallowing them: - create (CRON): switch from scoped_connection to scoped_transaction; register on the tx and commit only after it succeeds. - update: move unregister + register inside the retire_and_replace tx. - cancel: wrap the cancel + unregister in a transaction. This is a deliberate API contract change: POST/PATCH/DELETE on CRON jobs can now fail with 5xx when pg_cron is unhealthy, and the client can retry against a consistent state. Closes #32 https://claude.ai/code/session_012cTjd515ScehwT6Dvasexj
WalkthroughThe PR adds transaction-scoped CRON registration and unregistration helpers to enable atomic pg_cron operations, and refactors the three public CRON job handler paths to use transactions, call these helpers, and propagate errors instead of logging and swallowing them. ChangesCRON Job Atomicity: Transactional pg_cron Registration and Error Propagation
Sequence Diagram(s)sequenceDiagram
participant Handler as Handler
participant Tx as Scoped Transaction
participant DB as Database
participant PgCron as pg_cron
Handler->>Tx: begin transaction
Tx->>DB: create/retire job row
Tx->>DB: register/unregister pg_cron schedule
alt schedule operation succeeds
Tx->>DB: commit transaction
Handler-->>Handler: return 2xx
else schedule operation fails
Tx->>DB: rollback transaction
Handler-->>Handler: return error response
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/common/src/db/jobs.rs`:
- Around line 315-329: The unregister_pg_cron_conn function can return an error
if cron.unschedule($1) is called for a non-existent job; update
unregister_pg_cron_conn to be idempotent by either 1) querying cron.job for a
row matching the generated cron_job_name (and ownership) and only calling
cron.unschedule when found, or 2) executing cron.unschedule($1) inside a match
that maps the specific “could not find valid entry for job” Postgres/sqlx error
to Ok(()) while propagating other errors; locate the cron_job_name construction
and the sqlx::query("SELECT cron.unschedule($1)") call to implement one of these
two fixes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 88c872e6-e237-42fd-906c-646ef13bfb6d
📒 Files selected for processing (2)
crates/api/src/handlers/jobs.rscrates/common/src/db/jobs.rs
| /// Unregister a CRON job from pg_cron on an existing connection or transaction. | ||
| pub async fn unregister_pg_cron_conn( | ||
| conn: &mut PgConnection, | ||
| schema_name: &str, | ||
| job_id: &str, | ||
| ) -> Result<(), sqlx::Error> { | ||
| let cron_job_name = format!("kronos_{}_{}", schema_name, job_id); | ||
|
|
||
| sqlx::query("SELECT cron.unschedule($1)") | ||
| .bind(&cron_job_name) | ||
| .execute(pool) | ||
| .execute(&mut *conn) | ||
| .await?; | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
pg_cron unschedule function behavior when job name not found
💡 Result:
When you call the pg_cron function cron.unschedule(job_name text) and the specified job name cannot be found for the current user, the function raises a PostgreSQL error [1][2]. Specifically, the implementation of cron_unschedule_named in the pg_cron source code performs a lookup in the cron.job table filtering by both the job name and the current user's identity [2]. If no tuple is found matching these criteria, the function executes an ereport with the error message: "could not find valid entry for job '[job_name]'" [2]. This behavior means that: - You will receive an error rather than a boolean false return value [1][2]. - If you are logged in as a different user than the one who created the job, the function will fail with this "not found" error even if the job name exists in the table, because it cannot locate a matching entry for your current user [3][1]. - To avoid this error, you must ensure the job name exists and is owned by the current database role executing the command [1][2]. Alternatively, you can use the cron.unschedule(jobid bigint) function if you have the job ID, as it may have different permission handling characteristics [3][1].
Citations:
- 1: cron.unschedule(jobname) doesn't work citusdata/pg_cron#211
- 2: https://github.com/citusdata/pg_cron/blob/9d0576c64edd90fb3d8ac30763296a8106315638/src/job_metadata.c
- 3: API inconsistency:
cron.unschedule(jobid)can unschedule jobs of another user butcron.unschedule(jobname)cannot citusdata/pg_cron#320
Make unregister_pg_cron_conn idempotent for missing pg_cron jobs
cron.unschedule($1) with the job name raises a PostgreSQL error (e.g., “could not find valid entry for job …”) when no matching entry exists for the current user, so unregister_pg_cron_conn (crates/common/src/db/jobs.rs, lines 315-329) can return Err(sqlx::Error) instead of succeeding. Adjust to either pre-check cron.job for that name/ownership (and only then call cron.unschedule), or catch/translate the specific “not found” error to Ok(()) (or use cron.unschedule(jobid bigint) if you can supply the job id).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/common/src/db/jobs.rs` around lines 315 - 329, The
unregister_pg_cron_conn function can return an error if cron.unschedule($1) is
called for a non-existent job; update unregister_pg_cron_conn to be idempotent
by either 1) querying cron.job for a row matching the generated cron_job_name
(and ownership) and only calling cron.unschedule when found, or 2) executing
cron.unschedule($1) inside a match that maps the specific “could not find valid
entry for job” Postgres/sqlx error to Ok(()) while propagating other errors;
locate the cron_job_name construction and the sqlx::query("SELECT
cron.unschedule($1)") call to implement one of these two fixes.
Summary
Fixes #32. The public CRON-job handlers in
crates/api/src/handlers/jobs.rswrote thejobsrow, then calledregister_pg_cron/unregister_pg_cronagainst a fresh pool connection — after the row write had committed and with the pg_cron error logged-and-swallowed. The API returned2xxeven when the pg_cron op failed, so the persistedjobsrow could end up out of sync with what pg_cron actually scheduled.This applies the two-part fix the issue called for (tx scope and error propagation, shipped together) at all three affected sites.
Changes
crates/common/src/db/jobs.rs— added connection-scoped variants that run on the caller's connection/transaction:register_pg_cron_conn(&mut PgConnection, …)unregister_pg_cron_conn(&mut PgConnection, …)The existing pool-based
register_pg_cron/unregister_pg_cronnow acquire a connection and delegate to these (public API preserved).crates/api/src/handlers/jobs.rs— reworked the three handlers so the row write and the pg_cron op share one transaction and commit (or roll back) atomically, with the error propagated via?instead of swallowed:scoped_connection→scoped_transaction; register on the tx;commit()only after it succeeds.unregister+registerinside theretire_and_replacetx, before commit.scoped_transaction.Behavior change (intended)
Because
sqlx::Errormaps toAppError::Internal(500), a failed pg_cron op now rolls back the row write.POST/PATCH/DELETEon CRON jobs can now return a5xxwhen pg_cron is unhealthy — and the client can retry against a consistent state. This is the deliberate API contract change described in the issue.Out of scope
The optional follow-up (having the reaper sweep also catch cancelled CRON jobs and unschedule stale pg_cron entries) is left out, per the issue's framing.
Testing
cargo checkandcargo clippyare clean onkronos-commonandkronos-api(remaining clippy warnings are pre-existing patterns elsewhere in the codebase). The repo has no integration test suite covering these handler paths.Closes #32
https://claude.ai/code/session_012cTjd515ScehwT6Dvasexj
Generated by Claude Code
Summary by CodeRabbit