Skip to content

fix: page sessionstorage cleanup to avoid OOM (#7830)#7831

Merged
JohnMcLear merged 4 commits into
developfrom
fix/7830-session-cleanup-paged
May 22, 2026
Merged

fix: page sessionstorage cleanup to avoid OOM (#7830)#7831
JohnMcLear merged 4 commits into
developfrom
fix/7830-session-cleanup-paged

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

  • Switch SessionStore._cleanup() from findKeys('sessionstorage:*', null) (loads everything) to findKeysPaged with a 500-key page size
  • Yield to the event loop between pages so the DB driver can release the previous page's buffered rows and request handlers can interleave
  • Bump ueberdb2 to ^6.1.0 for the new method (see feat: add paged findKeys (memory-bounded iteration) ueberDB#981)
  • Add regression test that seeds 50 sessionstorage rows, forces a 4-key page, and asserts cleanup walks the full keyspace

Why

Per the reporter on #7830, after 2.7.3 their decade-old MariaDB install OOMed within ~15 minutes. Heap snapshots pinned retention on _pool → _allConnections → … → _command._rows → keys — the unbounded findKeys result tied up mysql2 PoolConnection row buffers (and dominated JS heap with millions of session keys). Paging at the DB layer turns that into bounded N-row reads.

Cleanup termination is now driven by an empty page rather than page.length < CLEANUP_PAGE_SIZE, so the loop still iterates a stubbed/throttled paged source correctly (caught by the new test on the first run).

Blocked on

Test plan

  • pnpm --filter ep_etherpad-lite run ts-check clean
  • mocha tests/backend/specs/SessionStore.ts — 29 passing, including the new "pages across a large sessionstorage keyspace" case ("removed 25 expired/stale sessions out of 50" confirms cross-page iteration)
  • CI green once ueberdb2 6.1.0 publishes

Notes for the maintainer

  • findKeysPaged falls back to JS-side slicing for backends that don't natively support it (sqlite/rusty/redis/mongo/...). Correctness preserved; OOM-resistance for those backends is a follow-up.
  • The same unbounded findKeys pattern exists in PadManager.ts:77 and AuthorManager.ts:{361,369,484,495}. Out of scope for this fix but worth tracking — happy to file as a follow-up issue.

Closes #7830

🤖 Generated with Claude Code

SessionStore._cleanup() previously called `findKeys('sessionstorage:*',
null)`, materialising every session key into a single array. On decade-
old MariaDB installs with millions of sessions this OOMs the node
process within ~15 minutes — see #7830.

Switch to ueberdb2 6.1.0's findKeysPaged with a 500-key page size, and
yield to the event loop between pages so the DB driver can release each
page's buffered rows and request handlers can interleave.

The break is now driven by `page.length === 0` rather than `page.length
< CLEANUP_PAGE_SIZE` so a stubbed/throttled paged source still iterates
the full keyspace.

Adds a regression test that seeds 50 sessionstorage rows, monkey-patches
`DB.findKeysPaged` to use a 4-key page, runs cleanup, and asserts every
expired row is removed plus every valid row preserved across page
boundaries.

Closes #7830

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Fix session cleanup OOM via paged database iteration

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Switch SessionStore cleanup from unbounded findKeys() to paged findKeysPaged() with 500-key
  limit
• Yield to event loop between pages for DB driver buffer release and request interleaving
• Bump ueberdb2 dependency to ^6.1.0 for new paging method support
• Add regression test verifying cleanup iterates full keyspace across page boundaries
Diagram
flowchart LR
  A["SessionStore._cleanup()"] -->|"old: findKeys loads all"| B["OOM on large keyspaces"]
  A -->|"new: findKeysPaged 500-key pages"| C["Bounded memory usage"]
  C -->|"yield between pages"| D["DB buffers released"]
  D -->|"request handlers interleave"| E["Stable operation"]

Loading

File Changes

1. src/node/db/DB.ts ✨ Enhancement +2/-1

Export findKeysPaged method from DB wrapper

• Added findKeysPaged to the list of DB methods wrapped for export
• Added type guard to skip wrapping if method doesn't exist on DB instance

src/node/db/DB.ts


2. src/node/db/SessionStore.ts 🐞 Bug fix +44/-20

Implement paged iteration for session cleanup

• Added CLEANUP_PAGE_SIZE constant set to 500 keys per page
• Refactored _cleanup() to use findKeysPaged() with pagination loop instead of single
 findKeys() call
• Added scanned counter to track total keys examined across all pages
• Implemented cursor-based iteration with after parameter for page navigation
• Added defensive check to detect buggy backends returning cursor key
• Added setImmediate() yield between pages to release DB buffers and allow request interleaving
• Updated log message to report removed count vs scanned count instead of total keys

src/node/db/SessionStore.ts


3. src/tests/backend/specs/SessionStore.ts 🧪 Tests +52/-0

Add regression test for paged cleanup iteration

• Added regression test "pages across a large sessionstorage keyspace"
• Seeds 25 expired and 25 valid sessions with unique tags
• Monkey-patches DB.findKeysPaged to enforce 4-key page size for testing
• Verifies all expired sessions are removed and valid sessions preserved across pages
• Asserts minimum 12 paged calls to confirm cross-page iteration occurred

src/tests/backend/specs/SessionStore.ts


View more (1)
4. src/package.json Dependencies +1/-1

Update ueberdb2 to support findKeysPaged

• Bumped ueberdb2 dependency from ^6.0.3 to ^6.1.0

src/package.json


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 22, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Missing DB method hidden ✓ Resolved 🐞 Bug ☼ Reliability
Description
DB.init() now silently skips exporting findKeysPaged if it is missing, but SessionStore._cleanup()
calls DB.findKeysPaged unconditionally, which will crash later at runtime with a TypeError instead
of failing fast at startup.
Code

src/node/db/DB.ts[R50-53]

Evidence
DB.init() will now continue if a method is absent, which can leave exports.findKeysPaged unset.
Cleanup uses DB.findKeysPaged(...) without checking for existence, so a missing export becomes a
runtime crash path.

src/node/db/DB.ts[39-56]
src/node/db/SessionStore.ts[87-99]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/node/db/DB.ts` now does `if (typeof f !== 'function') continue;` when wiring DB methods. If `findKeysPaged` is missing for any reason (backend mismatch, dependency resolution issue), `DB.findKeysPaged` will be undefined, but `SessionStore._cleanup()` calls it unconditionally, causing a delayed runtime crash.
## Issue Context
`findKeysPaged` is now a hard requirement for cleanup. Skipping export hides misconfiguration and shifts failure from startup to the first cleanup run.
## Fix Focus Areas
- Require `findKeysPaged` to exist during `DB.init()` and throw an explicit error if missing.
- Consider keeping the guard for truly-optional methods, but not for methods that the app requires.
## Fix Focus Areas (code references)
- src/node/db/DB.ts[39-56]
- src/node/db/SessionStore.ts[87-99]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Cleanup can run unbounded ✓ Resolved 🐞 Bug ☼ Reliability
Description
SessionStore._cleanup() now stops only when findKeysPaged returns an empty page, so under sustained
creation of new sessionstorage keys it can run for an unbounded time and delay the next scheduled
cleanup indefinitely.
Code

src/node/db/SessionStore.ts[R91-99]

Evidence
The cleanup loop’s only normal termination is an empty page, and the cleanup scheduler waits for
_cleanup() to finish before scheduling the next run. SessionStore is used by express-session
middleware, so sessions can be created during cleanup and extend the keyspace being paged through.

src/node/db/SessionStore.ts[54-66]
src/node/db/SessionStore.ts[87-129]
src/node/hooks/express.ts[203-246]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`SessionStore._cleanup()` uses `while (true)` and terminates only when `findKeysPaged()` returns an empty page. If new session keys are continually being created while cleanup runs, the loop can keep discovering more keys and take unbounded time, which also delays the next cleanup run because scheduling happens only after completion.
## Issue Context
Session cleanup runs in-process alongside request handling, and the next cleanup run is scheduled only after the current run finishes.
## Fix Focus Areas
- Add an upper bound to each cleanup run (time budget and/or max keys/pages scanned).
- Once the budget is hit, log that cleanup yielded early and let the next scheduled run continue later.
### Suggested approach
- Track `const start = Date.now()` and break if `Date.now() - start > MAX_CLEANUP_RUNTIME_MS`.
- Or track `scanned` and break after `MAX_SCANNED_KEYS`.
## Fix Focus Areas (code references)
- src/node/db/SessionStore.ts[54-66]
- src/node/db/SessionStore.ts[87-129]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Silent cursor-stall break 🐞 Bug ◔ Observability
Description
If the backend returns a page whose first key is <= the previous cursor, _cleanup() breaks without
logging, silently skipping remaining keys and leaving expired/stale sessions behind.
Code

src/node/db/SessionStore.ts[R99-103]

Evidence
The code explicitly breaks on a non-advancing cursor and does not emit any log/metric, so a backend
pagination regression can cause incomplete cleanup with no signal besides leftover DB rows.

src/node/db/SessionStore.ts[93-103]
src/node/db/SessionStore.ts[124-133]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`SessionStore._cleanup()` has a defensive break when pagination does not advance (`page[0] <= after`). This prevents an infinite loop, but it currently fails silently, which makes partial cleanup hard to detect and debug.
## Issue Context
This is guarding against backend bugs (inclusive cursor or unsorted pages). Breaking is safer than looping forever, but there should be an operator-visible signal.
## Fix Focus Areas
- Log a warning/error when this condition triggers (include `after` and `page[0]`).
- Prefer filtering out keys `<= after` and continuing if any keys remain, only breaking if the filtered page is empty.
## Fix Focus Areas (code references)
- src/node/db/SessionStore.ts[93-129]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

4. Test cleanup not symmetric ✓ Resolved 🐞 Bug ☼ Reliability
Description
The new paging regression test only deletes validSids in finally; if cleanup regresses or throws
before removals, expiredSids can be left behind and accumulate across runs (even though they are
tagged).
Code

src/tests/backend/specs/SessionStore.ts[R359-362]

Evidence
The test creates expiredSids rows but the cleanup in finally iterates only validSids, so
failures can leave the expired rows present.

src/tests/backend/specs/SessionStore.ts[318-362]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new test seeds 25 expired + 25 valid sessionstorage rows. In the `finally` block it removes only `validSids`. If `_cleanup()` fails or the test fails before assertions, the expired rows may remain and accumulate across repeated test runs.
## Issue Context
The rows are tagged, so functional interference is limited, but leaving DB state behind can still slow tests over time.
## Fix Focus Areas
- In `finally`, delete both `expiredSids` and `validSids` (or delete by the shared tag prefix).
## Fix Focus Areas (code references)
- src/tests/backend/specs/SessionStore.ts[318-362]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread src/node/db/DB.ts
Four follow-ups raised by Qodo on the session cleanup paging fix:

- DB.ts: fail-fast at init() if any required wrapper method (incl.
  findKeysPaged) is missing, so a stale ueberdb2 pin surfaces at boot
  rather than crashing the first cleanup run an hour later.
- SessionStore: bound a single _cleanup() run to 10 minutes. Under
  sustained session creation the keyspace can grow faster than cleanup
  drains it; without a budget the next scheduled run would never fire.
  When the budget hits, log a warning and let the next run continue.
- SessionStore: log the defensive `page[0] <= after` cursor-stall break.
  Previously the loop exited silently, leaving expired rows behind with
  no operator-visible signal of the backend regression.
- Tests: the paged-cleanup regression test now removes both expiredSids
  AND validSids in finally, so a failed assertion doesn't leak rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Copy link
Copy Markdown
Member Author

Addressed Qodo's four findings in 63c2493:

  1. Missing DB method hiddenDB.init() now throws fast on any missing wrapper method (no more silent skip).
  2. Unbounded cleanup — single _cleanup() run capped at 10 minutes; warning logged on exhaustion, next scheduled run picks up.
  3. Silent cursor-stall break — defensive page[0] <= after guard now logs an error with both after and page[0].
  4. Test cleanup symmetry — finally block now removes both expiredSids and validSids.

29 SessionStore backend tests still green; ts-check clean. (Wider CI still blocked on the ueberdb2 6.1.0 npm publish.)

CHANGELOG.md picks up an entry under 3.1.0 Notable fixes describing the
OOM cause, the paged iteration, the 10-minute per-run budget, the
cursor-stall logging, and the fail-fast init guard.

settings.json.template's sessionCleanup comment adds the page-size,
budget, and pointer to #7830 so admins can reason about the new
behaviour from the template alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Copy link
Copy Markdown
Member Author

Permanent fix path is now ether/ueberDB#983 — migrates the ueberDB publish workflow from a stored NPM_TOKEN (the credential that just expired and broke #981's auto-publish) to npm OIDC trusted publishing. No expiring secret on our side, signed provenance attached. One-time setup on https://www.npmjs.com/package/ueberdb2/access required after merge — full recipe in the #983 description and at the top of the publish-npm job.

Once #983 is merged and the npm UI step is done, the next push to ueberDB main publishes ueberdb2@6.1.2 via OIDC. I'll then regenerate this PR's lockfile against ^6.1.0 and CI goes green for good. (#982 closed — that was the git-ref stopgap and is no longer needed.)

Now that ether/ueberDB#983 unblocked the publish workflow (OIDC trusted
publishing), ueberdb2 6.1.2 is live on npm and the `^6.1.0` pin in
src/package.json resolves cleanly. Resolves the ERR_PNPM_OUTDATED_LOCKFILE
that was blocking CI on this PR.

29 SessionStore backend tests still green against the published tarball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear merged commit d018003 into develop May 22, 2026
34 checks passed
@JohnMcLear JohnMcLear deleted the fix/7830-session-cleanup-paged branch May 22, 2026 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Excessive Memory Use by Session Cleanup Leading to OOM

1 participant