
fix(agents-server): preserve consumer_claims.lease_expires_at across heartbeats #4353

Open
kevin-dp wants to merge 1 commit into main from fix-lease-expires-at-heartbeat

@kevin-dp (Contributor)

Summary

Fixes #4341 — the per-wake heartbeat path nulls out consumer_claims.lease_expires_at, leaving every active claim row without an expiry after the first heartbeat (~10s after dispatch).

Root cause

packages/agents-server/src/entity-registry.ts, in materializeHeartbeatClaim:

.set({
  lastHeartbeatAt: heartbeatAt,
  leaseExpiresAt: input.leaseExpiresAt ?? null,  // ← unconditionally writes null if not provided
  updatedAt: heartbeatAt,
})

packages/agents-server/src/routing/internal-router.ts:606-609 — the only production caller — never passes leaseExpiresAt. So every heartbeat overwrites the lease with null. The lease set correctly by materializeActiveClaim (from the upstream lease_ttl_ms, e.g. claimed_at + 30s) survives at most until the first heartbeat.
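To make the failure mode concrete, here is a minimal in-memory sketch. It assumes a plain object can stand in for the consumer_claims row and that the update overwrites every key present in the .set() payload. ClaimRow, HeartbeatInput, buggyHeartbeatPatch and applyPatch are hypothetical names for illustration, not the registry's actual types or helpers.

```typescript
// Hypothetical in-memory stand-in for a consumer_claims row (not the real schema type).
interface ClaimRow {
  lastHeartbeatAt: Date | null;
  leaseExpiresAt: Date | null;
  updatedAt: Date;
}

// Hypothetical heartbeat input: leaseExpiresAt is optional, and the production
// caller in internal-router.ts never sets it.
interface HeartbeatInput {
  leaseExpiresAt?: Date | null;
}

// Mirrors the pre-fix .set() payload: `?? null` turns "not provided" into an explicit null.
function buggyHeartbeatPatch(input: HeartbeatInput, heartbeatAt: Date): Partial<ClaimRow> {
  return {
    lastHeartbeatAt: heartbeatAt,
    leaseExpiresAt: input.leaseExpiresAt ?? null,
    updatedAt: heartbeatAt,
  };
}

// Assumption: like UPDATE ... SET, every key present in the patch overwrites the row.
function applyPatch(row: ClaimRow, patch: Partial<ClaimRow>): ClaimRow {
  return { ...row, ...patch };
}

const claimedAtMs = new Date("2026-05-19T11:01:11.631Z").getTime();
let row: ClaimRow = {
  lastHeartbeatAt: null,
  leaseExpiresAt: new Date(claimedAtMs + 30_000), // lease from a 30s lease_ttl_ms
  updatedAt: new Date(claimedAtMs),
};

// First heartbeat ~10s after dispatch, with no lease supplied (as in production).
row = applyPatch(row, buggyHeartbeatPatch({}, new Date(claimedAtMs + 10_000)));
console.log(row.leaseExpiresAt); // null
```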

Observed live

Initial (at claim):  lease_expires_at = 2026-05-19T11:01:41.631Z   (claimed_at + 30s)
After 1 heartbeat:   lease_expires_at = null

Captured via the health endpoint claims.active[*] field — issue #4341 has the full trace.
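The observed timestamps can be sanity-checked with a little arithmetic. This sketch assumes lease_expires_at = claimed_at + lease_ttl_ms with lease_ttl_ms = 30000 (the "claimed_at + 30s" note in the trace), treats dispatch time as equal to claimed_at, and uses the ~10s heartbeat interval from the summary. The implied claimed_at is derived here, not read from the trace in #4341.

```typescript
// Assumption: lease_ttl_ms was 30_000, per "(claimed_at + 30s)" in the trace.
const leaseTtlMs = 30_000;
// Assumption: the first heartbeat fires ~10s after dispatch, and dispatch ≈ claimed_at.
const firstHeartbeatDelayMs = 10_000;

const observedLease = new Date("2026-05-19T11:01:41.631Z");

// lease_expires_at = claimed_at + lease_ttl_ms, so claimed_at = lease - ttl.
const impliedClaimedAt = new Date(observedLease.getTime() - leaseTtlMs);
const approxFirstHeartbeat = new Date(impliedClaimedAt.getTime() + firstHeartbeatDelayMs);
const remainingSeconds = (observedLease.getTime() - approxFirstHeartbeat.getTime()) / 1000;

console.log(impliedClaimedAt.toISOString());     // 2026-05-19T11:01:11.631Z
console.log(approxFirstHeartbeat.toISOString()); // 2026-05-19T11:01:21.631Z
console.log(remainingSeconds);                   // 20
```

In other words, under these assumptions the first heartbeat cleared a lease that still had about 20 seconds to run.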

The fix

Treat heartbeats as alive-pings only: update last_heartbeat_at and leave lease_expires_at alone unless the caller explicitly provides a new lease. The lease set by materializeActiveClaim from the upstream lease_ttl_ms stays authoritative.

 .set({
   lastHeartbeatAt: heartbeatAt,
-  leaseExpiresAt: input.leaseExpiresAt ?? null,
+  ...(input.leaseExpiresAt !== undefined
+    ? { leaseExpiresAt: input.leaseExpiresAt }
+    : {}),
   updatedAt: heartbeatAt,
 })
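Pulled out as a pure function, the fixed payload makes the three input cases easy to see. HeartbeatPatch and heartbeatPatch are hypothetical names for this sketch; the conditional spread is the expression from the diff.

```typescript
interface HeartbeatInput {
  leaseExpiresAt?: Date | null;
}

// Hypothetical shape of the object handed to .set().
interface HeartbeatPatch {
  lastHeartbeatAt: Date;
  leaseExpiresAt?: Date | null;
  updatedAt: Date;
}

// Mirrors the fixed .set() payload: leaseExpiresAt is included only when the
// caller supplied it (a Date or an explicit null); otherwise the key is absent
// and the UPDATE leaves the column untouched.
function heartbeatPatch(input: HeartbeatInput, heartbeatAt: Date): HeartbeatPatch {
  return {
    lastHeartbeatAt: heartbeatAt,
    ...(input.leaseExpiresAt !== undefined
      ? { leaseExpiresAt: input.leaseExpiresAt }
      : {}),
    updatedAt: heartbeatAt,
  };
}

const now = new Date("2026-05-19T11:01:21.631Z");
const extended = new Date("2026-05-19T11:01:51.631Z");

console.log("leaseExpiresAt" in heartbeatPatch({}, now)); // false
console.log(heartbeatPatch({ leaseExpiresAt: extended }, now).leaseExpiresAt === extended); // true
console.log(heartbeatPatch({ leaseExpiresAt: null }, now).leaseExpiresAt); // null
```

Because the check is !== undefined rather than a truthiness test, an explicit null is still written, so a caller can deliberately clear a lease. If the query builder is Drizzle (the .set({...}) shape suggests it, though the PR doesn't say), its update documentation states that undefined values in .set() are ignored, so a bare leaseExpiresAt: input.leaseExpiresAt would likely behave the same; the conditional spread states the intent without relying on that.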

Callers that genuinely want to extend the lease can still pass leaseExpiresAt explicitly. The single production caller (internal-router.ts:606-609) doesn't, and shouldn't — it has no TTL signal to base an extension on, and the upstream lease is the authoritative window.
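On the caller side, that rule amounts to sending leaseExpiresAt only when there is a TTL to base it on. A minimal sketch, where heartbeatInputFor and its optional leaseTtlMs parameter are hypothetical (internal-router.ts has no such TTL signal today):

```typescript
interface HeartbeatInput {
  leaseExpiresAt?: Date | null;
}

// Hypothetical caller-side helper: without a TTL, send no leaseExpiresAt at all,
// which (after this fix) leaves the lease from materializeActiveClaim in place.
function heartbeatInputFor(now: Date, leaseTtlMs?: number): HeartbeatInput {
  if (leaseTtlMs === undefined) {
    return {};
  }
  // With a known TTL, extend the lease relative to the heartbeat time.
  return { leaseExpiresAt: new Date(now.getTime() + leaseTtlMs) };
}

const now = new Date("2026-05-19T11:01:21.631Z");
console.log(heartbeatInputFor(now)); // {}
console.log(heartbeatInputFor(now, 30_000).leaseExpiresAt?.toISOString()); // 2026-05-19T11:01:51.631Z
```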

Tests

New integration test packages/agents-server/test/consumer-claim-registry.test.ts:

  1. Lease preserved: materialize an active claim with a lease, heartbeat without a lease, and assert the lease is still the original timestamp (not null).
  2. Lease extendable: materialize, heartbeat with an explicit leaseExpiresAt, and assert the new lease is written.

Both run against the integration postgres backend, in the style of tag-stream-outbox-registry.test.ts.

Existing unit tests pass unchanged.
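The real tests run against the integration Postgres backend. As a self-contained illustration of the same two assertions, here is a sketch against a hypothetical in-memory FakeClaimRegistry: its method names echo the registry, but the signatures are simplified assumptions, and it uses node:assert/strict rather than the project's test runner.

```typescript
import assert from "node:assert/strict";

// Hypothetical row shape; the real registry writes to Postgres.
interface ClaimRow {
  claimedAt: Date;
  lastHeartbeatAt: Date | null;
  leaseExpiresAt: Date | null;
  updatedAt: Date;
}

// Hypothetical in-memory registry with simplified signatures.
class FakeClaimRegistry {
  private rows = new Map<string, ClaimRow>();

  // Simplified: lease = claimedAt + leaseTtlMs, as materializeActiveClaim is described.
  materializeActiveClaim(id: string, claimedAt: Date, leaseTtlMs: number): void {
    this.rows.set(id, {
      claimedAt,
      lastHeartbeatAt: null,
      leaseExpiresAt: new Date(claimedAt.getTime() + leaseTtlMs),
      updatedAt: claimedAt,
    });
  }

  // Same patch semantics as the fixed .set() call.
  materializeHeartbeatClaim(
    id: string,
    input: { heartbeatAt: Date; leaseExpiresAt?: Date | null },
  ): void {
    const row = this.rows.get(id);
    if (!row) throw new Error(`no claim ${id}`);
    this.rows.set(id, {
      ...row,
      lastHeartbeatAt: input.heartbeatAt,
      ...(input.leaseExpiresAt !== undefined
        ? { leaseExpiresAt: input.leaseExpiresAt }
        : {}),
      updatedAt: input.heartbeatAt,
    });
  }

  get(id: string): ClaimRow | undefined {
    return this.rows.get(id);
  }
}

const claimedAt = new Date("2026-05-19T11:01:11.631Z");
const originalLease = new Date(claimedAt.getTime() + 30_000);

// 1. Lease preserved: a heartbeat without a lease keeps the original timestamp.
const preserved = new FakeClaimRegistry();
preserved.materializeActiveClaim("c1", claimedAt, 30_000);
preserved.materializeHeartbeatClaim("c1", { heartbeatAt: new Date(claimedAt.getTime() + 10_000) });
assert.equal(preserved.get("c1")?.leaseExpiresAt?.getTime(), originalLease.getTime());

// 2. Lease extendable: an explicit leaseExpiresAt is written.
const extended = new FakeClaimRegistry();
const newLease = new Date(claimedAt.getTime() + 60_000);
extended.materializeActiveClaim("c2", claimedAt, 30_000);
extended.materializeHeartbeatClaim("c2", {
  heartbeatAt: new Date(claimedAt.getTime() + 10_000),
  leaseExpiresAt: newLease,
});
assert.equal(extended.get("c2")?.leaseExpiresAt?.getTime(), newLease.getTime());
```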

Not addressed in this PR

  • Pre-existing claim rows with lease_expires_at: null — claims that already lost their lease under the unfixed code won't recover. They'd need a reaper or admin command to clean up. Not currently a problem because nothing reaps on lease today, but worth knowing if a reaper is added later.
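If a reaper is added later, rows whose lease was already nulled cannot be judged by lease_expires_at at all. One possible shape for its decision logic, as a hypothetical sketch: rows with a lease are judged by it, and rows without one fall back to last_heartbeat_at and an assumed staleness threshold (staleAfterMs), which nothing in this PR defines.

```typescript
// Hypothetical subset of a consumer_claims row relevant to reaping.
interface ClaimRow {
  lastHeartbeatAt: Date | null;
  leaseExpiresAt: Date | null;
}

type ReapDecision = "keep" | "reap-expired-lease" | "reap-stale-no-lease" | "keep-no-lease";

// Hypothetical reaper predicate: prefer the lease when present; for rows whose
// lease was nulled by the pre-fix heartbeat, fall back to heartbeat staleness.
function classifyClaim(row: ClaimRow, now: Date, staleAfterMs: number): ReapDecision {
  if (row.leaseExpiresAt !== null) {
    return row.leaseExpiresAt.getTime() <= now.getTime() ? "reap-expired-lease" : "keep";
  }
  if (row.lastHeartbeatAt !== null && now.getTime() - row.lastHeartbeatAt.getTime() > staleAfterMs) {
    return "reap-stale-no-lease";
  }
  // No lease and no evidence of staleness: nothing here justifies reaping.
  return "keep-no-lease";
}

const now = new Date("2026-05-19T12:00:00.000Z");
console.log(classifyClaim({ lastHeartbeatAt: new Date("2026-05-19T11:01:21.631Z"), leaseExpiresAt: null }, now, 60_000)); // reap-stale-no-lease
console.log(classifyClaim({ lastHeartbeatAt: new Date("2026-05-19T11:59:50.000Z"), leaseExpiresAt: null }, now, 60_000)); // keep-no-lease
```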

Base branch note

This PR targets fix-pull-wake (#4339), not main, because materializeHeartbeatClaim was introduced in #4308 which is part of the fix-pull-wake lineage but not yet in main. Merge order: this → fix-pull-wake → main. Independent of #4346 (the related #4340 fix); the two can land in either order.

🤖 Generated with Claude Code

Base automatically changed from fix-pull-wake to main on May 19, 2026 12:20
…heartbeats

The per-wake heartbeat caller in callback-forward does not pass
leaseExpiresAt to materializeHeartbeatClaim. The registry method was
unconditionally writing `input.leaseExpiresAt ?? null`, so the first
heartbeat (~10s after dispatch) was nulling the lease and leaving every
active claim row without an expiry for the rest of its lifetime.

Treat heartbeats as alive-pings only: update last_heartbeat_at and leave
lease_expires_at alone unless the caller explicitly provides a new lease.
The lease set by materializeActiveClaim from the upstream lease_ttl_ms
stays authoritative.

Adds an integration test that materializes an active claim with a lease,
heartbeats without one, and verifies the lease is preserved (and a
second test verifying the lease IS updated when explicitly extended).

Fixes #4341.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kevin-dp force-pushed the fix-lease-expires-at-heartbeat branch from d6d95e2 to 9526cba on May 19, 2026 12:32
codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.01%. Comparing base (ee3ef0f) to head (9526cba).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4353       +/-   ##
===========================================
+ Coverage   60.43%   74.01%   +13.57%     
===========================================
  Files         293       48      -245     
  Lines       28089     5591    -22498     
  Branches     7448     1719     -5729     
===========================================
- Hits        16976     4138    -12838     
+ Misses      11096     1440     -9656     
+ Partials       17       13        -4     
Flag Coverage Δ
electric-telemetry ?
elixir ?
packages/agents ?
packages/agents-mcp ?
packages/agents-runtime ?
packages/agents-server 74.01% <100.00%> (+0.08%) ⬆️
packages/agents-server-ui ?
packages/experimental ?
packages/react-hooks ?
packages/start ?
packages/typescript-client ?
packages/y-electric ?
typescript 74.01% <100.00%> (+13.79%) ⬆️
unit-tests 74.01% <100.00%> (+13.57%) ⬆️

Flags with carried forward coverage won't be shown.

Development

Successfully merging this pull request may close these issues.

Pull-wake: heartbeat path nulls out lease_expires_at on consumer_claims
