fix(agents-server): preserve consumer_claims.lease_expires_at across heartbeats#4353
Open
kevin-dp wants to merge 1 commit into
Open
fix(agents-server): preserve consumer_claims.lease_expires_at across heartbeats#4353kevin-dp wants to merge 1 commit into
kevin-dp wants to merge 1 commit into
Conversation
…heartbeats The per-wake heartbeat caller in callback-forward does not pass leaseExpiresAt to materializeHeartbeatClaim. The registry method was unconditionally writing `input.leaseExpiresAt ?? null`, so the first heartbeat (~10s after dispatch) was nulling the lease and leaving every active claim row without an expiry for the rest of its lifetime. Treat heartbeats as alive-pings only: update last_heartbeat_at and leave lease_expires_at alone unless the caller explicitly provides a new lease. The lease set by materializeActiveClaim from the upstream lease_ttl_ms stays authoritative. Adds an integration test that materializes an active claim with a lease, heartbeats without one, and verifies the lease is preserved (and a second test verifying the lease IS updated when explicitly extended). Fixes #4341. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d6d95e2 to
9526cba
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4353 +/- ##
===========================================
+ Coverage 60.43% 74.01% +13.57%
===========================================
Files 293 48 -245
Lines 28089 5591 -22498
Branches 7448 1719 -5729
===========================================
- Hits 16976 4138 -12838
+ Misses 11096 1440 -9656
+ Partials 17 13 -4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #4341 — the per-wake heartbeat path nulls out
consumer_claims.lease_expires_at, leaving every active claim row without an expiry after the first heartbeat (~10s after dispatch).Root cause
packages/agents-server/src/entity-registry.ts—materializeHeartbeatClaim:packages/agents-server/src/routing/internal-router.ts:606-609— the only production caller — never passesleaseExpiresAt. So every heartbeat overwrites the lease withnull. The lease set correctly bymaterializeActiveClaim(from the upstreamlease_ttl_ms, e.g. claimed_at + 30s) survives at most until the first heartbeat.Observed live
Captured via the health endpoint
claims.active[*]field — issue #4341 has the full trace.The fix
Treat heartbeats as alive-pings only: update
last_heartbeat_atand leavelease_expires_atalone unless the caller explicitly provides a new lease. The lease set bymaterializeActiveClaimfrom the upstreamlease_ttl_msstays authoritative..set({ lastHeartbeatAt: heartbeatAt, - leaseExpiresAt: input.leaseExpiresAt ?? null, + ...(input.leaseExpiresAt !== undefined + ? { leaseExpiresAt: input.leaseExpiresAt } + : {}), updatedAt: heartbeatAt, })Callers that genuinely want to extend the lease can still pass
leaseExpiresAtexplicitly. The single production caller (internal-router.ts:606-609) doesn't, and shouldn't — it has no TTL signal to base an extension on, and the upstream lease is the authoritative window.Tests
New integration test
packages/agents-server/test/consumer-claim-registry.test.ts:leaseExpiresAt, assert the new lease is written.Both run against the integration postgres backend, in the style of
tag-stream-outbox-registry.test.ts.Existing unit tests pass unchanged.
Not addressed in this PR
lease_expires_at: null— claims that already lost their lease under the unfixed code won't recover. They'd need a reaper or admin command to clean up. Not currently a problem because nothing reaps on lease today, but worth knowing if a reaper is added later.Base branch note
This PR targets
fix-pull-wake(#4339), notmain, becausematerializeHeartbeatClaimwas introduced in #4308 which is part of thefix-pull-wakelineage but not yet inmain. Merge order: this → fix-pull-wake → main. Independent of #4346 (the related #4340 fix); the two can land in either order.🤖 Generated with Claude Code