Skip to content

fix: drop stale DB connections before recurring task save#220

Merged
emyller merged 1 commit into
refactor/simplify-run-recurring-tasksfrom
fix/close-stale-connections-recurring-task
Apr 29, 2026
Merged

fix: drop stale DB connections before recurring task save#220
emyller merged 1 commit into
refactor/simplify-run-recurring-tasksfrom
fix/close-stale-connections-recurring-task

Conversation

@gagantrivedi

@gagantrivedi gagantrivedi commented Apr 29, 2026

Copy link
Copy Markdown
Member

Summary

A long-running recurring task can leave the main-thread DB connection idle past the server's session timeout. RDS terminates the session, and the follow-up task.save() / task_run.save() raises OperationalError: SSL connection has been closed unexpectedly. Fix calls close_old_connections() between _run_task() and the saves so Django opens a fresh connection if the prior one was killed.

Same root cause, same shape of fix as flagsmith#7219.

Fixes FLAGSMITH-API-5EM. and Flagsmith/flagsmith#7358

Test plan

  • Reproduced bug locally with a real Postgres + pg_terminate_backend; without the fix task.save() raises OperationalError, with the fix both saves succeed.
  • All recurring-task unit tests pass on [default]. 5 tests were switched to multi_database(transaction=True) to dodge the pytest-django atomic / close_old_connections collision documented in flagsmith#7219.

A long-running recurring task can leave the main-thread DB connection
idle past the server's session timeout. RDS terminates the session,
and the follow-up task.save() / task_run.save() then raises
OperationalError ("SSL connection has been closed unexpectedly").

Call django.db.close_old_connections() between _run_task() and the
post-run saves so Django opens a fresh connection if the prior one
was killed.

The 5 affected recurring-task tests are switched to transaction=True
because pytest-django's atomic wrapper otherwise collides with
close_old_connections' autocommit check.

Fixes FLAGSMITH-API-5EM.
@gagantrivedi gagantrivedi marked this pull request as ready for review April 29, 2026 11:12
@gagantrivedi gagantrivedi requested a review from a team as a code owner April 29, 2026 11:12
@gagantrivedi gagantrivedi requested review from emyller and removed request for a team April 29, 2026 11:12
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.29%. Comparing base (953e807) to head (d54dfb5).

Additional details and impacted files
@@                          Coverage Diff                           @@
##           refactor/simplify-run-recurring-tasks     #220   +/-   ##
======================================================================
  Coverage                                  97.28%   97.29%           
======================================================================
  Files                                        104      104           
  Lines                                       4464     4466    +2     
======================================================================
+ Hits                                        4343     4345    +2     
  Misses                                       121      121           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@emyller emyller merged commit 4defaca into refactor/simplify-run-recurring-tasks Apr 29, 2026
4 checks passed
@emyller emyller deleted the fix/close-stale-connections-recurring-task branch April 29, 2026 16:29
gagantrivedi added a commit that referenced this pull request Apr 30, 2026
* refactor: simplify run_recurring_tasks for single-task processing

The underlying SQL function `get_recurringtasks_to_process` is `LIMIT 1`,
so the loop, bulk_create, and bulk_update were always operating on at
most one row. Replace the bulk pipeline with straight-line single-task
handling and rename the manager method to `get_task_to_process`,
returning `RecurringTask | None`, so the Python contract matches the
SQL.

* fix: include task identifier in debug logs and satisfy mypy narrowing

Declare `task_run` as `RecurringTaskRun | None` up front and route the
`_run_task` return through an intermediate `run` so the `assert
isinstance` narrowing carries past the `if/else` join. Also include the
task identifier in the start/finish debug logs for easier tracing.

* refactor: return RecurringTaskRun | None from run_recurring_tasks

The function only ever processes one recurring task per call (the
underlying SQL is `LIMIT 1`), so a list was always 0 or 1 items long.
Tighten the signature to `RecurringTaskRun | None` so the contract
matches reality and mirrors `RecurringTaskManager.get_task_to_process()`.

The sole production caller (`threads.run_iteration`) discards the return
value and is unaffected. All recurring test sites are updated to use the
singular value: `len(...) == 1` becomes `is not None`, list indexing is
removed, and the empty-result case becomes `is None`. A
`test_run_recurring_tasks__no_tasks__does_nothing` test is added,
mirroring the existing standard-task equivalent, to cover the
empty-queue early return.

* refactor: rename run_recurring_tasks to run_recurring_task

The function processes one task per call (the SQL function it wraps,
`get_recurringtasks_to_process`, has been `LIMIT 1` since migration
0012). The plural name reads as if multiple tasks are processed per
call, which has caused review confusion. Rename to the singular form so
the call site contract is explicit; the per-iteration cadence and
round-robin selection across tasks are still provided by
`TaskRunner.run` and `ORDER BY last_picked_at NULLS FIRST` respectively.

* fix: drop stale DB connections before recurring task save (#220)

A long-running recurring task can leave the main-thread DB connection
idle past the server's session timeout. RDS terminates the session,
and the follow-up task.save() / task_run.save() then raises
OperationalError ("SSL connection has been closed unexpectedly").

Call django.db.close_old_connections() between _run_task() and the
post-run saves so Django opens a fresh connection if the prior one
was killed.

The 5 affected recurring-task tests are switched to transaction=True
because pytest-django's atomic wrapper otherwise collides with
close_old_connections' autocommit check.

Fixes FLAGSMITH-API-5EM.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants