v0.4.0: Fix correctness bugs, add lossless reconnect, outbound tracking, and operational resilience #3
Merged: ajit-zer07 merged 1 commit into main (Mar 21, 2026)

Fix premature run finalization where Commitment messages synthesized
SESSION_STATE_RESOLVED without runtime confirmation, and decision.finalized
events auto-completed runs. Runs now only finalize via runtime authority
(session-snapshot or GetSession reconciliation).
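Here is a minimal TypeScript sketch of that finalization rule: a guard that only lets runtime-authoritative events complete a run. The `EventSource` labels, the `RunEvent` shape, and `shouldFinalizeRun` are hypothetical names for illustration, not the PR's actual code.

```typescript
// Hypothetical sketch: only runtime-authoritative sources may finalize a run.
// Source labels and the event shape are assumptions for illustration.
type EventSource =
  | "session-snapshot"
  | "get-session-reconciliation"
  | "commitment"
  | "decision.finalized";

interface RunEvent {
  source: EventSource;
  sessionState?: string; // e.g. "SESSION_STATE_RESOLVED"
}

// Sources that reflect the runtime's own view of the session.
const RUNTIME_AUTHORITATIVE: ReadonlySet<EventSource> = new Set<EventSource>([
  "session-snapshot",
  "get-session-reconciliation",
]);

function shouldFinalizeRun(event: RunEvent): boolean {
  // Commitment and decision.finalized events never finalize on their own,
  // even if they carry a resolved-looking state.
  return (
    RUNTIME_AUTHORITATIVE.has(event.source) &&
    event.sessionState === "SESSION_STATE_RESOLVED"
  );
}
```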
Phase 0 — Critical fixes:
- Read clientVersion from package.json instead of hardcoded '0.2.0'
- Remove synthetic session.state.changed from Commitment normalization
- Remove decision.finalized auto-finalize in stream consumer
- Add finalizingPromise to prevent race conditions in concurrent finalization
- findByIdOrThrow now throws NotFoundException instead of plain Error
- Replace console.error with NestJS Logger in bootstrap
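The `finalizingPromise` fix follows a common single-flight pattern: the first caller starts finalization and every concurrent caller awaits the same promise. Below is a minimal sketch. It assumes a standalone `RunFinalizer` class, which is hypothetical; in the PR the field lives in stream-consumer.service.ts.

```typescript
// Hypothetical single-flight wrapper around a finalization routine.
class RunFinalizer {
  private finalizingPromise: Promise<void> | null = null;
  private readonly doFinalize: () => Promise<void>;

  constructor(doFinalize: () => Promise<void>) {
    this.doFinalize = doFinalize;
  }

  finalize(): Promise<void> {
    if (this.finalizingPromise === null) {
      // First caller starts the work; concurrent callers receive the same promise.
      this.finalizingPromise = this.doFinalize().catch((err: unknown) => {
        // Clear on failure so a later attempt can retry.
        this.finalizingPromise = null;
        throw err;
      });
    }
    return this.finalizingPromise;
  }
}
```

A successful result stays cached, so repeat calls get the settled promise back without re-running the work. Clearing the promise on failure is a design choice that allows a retry.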
Phase 1 — Runtime integration:
- Implement POST /runs/:id/context endpoint
- Implement POST /runs/:id/projection/rebuild endpoint
- Persist runtime capabilities from Initialize response
- Add schema_version column to canonical events
- Fix webhook active column from integer to boolean
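The projection rebuild endpoint implies that the projection is a pure fold over persisted events. The sketch below illustrates that idea using hypothetical event types (`run.started`, `run.completed`, `run.failed`) and a much-reduced projection shape. Only `progress.reported` comes from this PR.

```typescript
// Hypothetical event and projection shapes; the real RunStateProjection has more fields.
interface PersistedEvent {
  seq: number;
  type: string;
}

interface RunProjection {
  status: "pending" | "running" | "completed" | "failed";
  lastSeq: number;
  eventCount: number;
  progressReports: number;
}

function emptyProjection(): RunProjection {
  return { status: "pending", lastSeq: 0, eventCount: 0, progressReports: 0 };
}

// Apply one event without mutating the previous projection.
function applyEvent(p: RunProjection, e: PersistedEvent): RunProjection {
  const next: RunProjection = { ...p, lastSeq: e.seq, eventCount: p.eventCount + 1 };
  switch (e.type) {
    case "run.started":
      next.status = "running";
      break;
    case "progress.reported":
      next.progressReports += 1;
      break;
    case "run.completed":
      next.status = "completed";
      break;
    case "run.failed":
      next.status = "failed";
      break;
  }
  return next;
}

// Rebuild: ignore any stored projection and replay every persisted event in seq order.
function rebuildProjection(events: PersistedEvent[]): RunProjection {
  return [...events].sort((a, b) => a.seq - b.seq).reduce(applyEvent, emptyProjection());
}
```

Because `applyEvent` is pure, the same reducer could serve both live updates and the rebuild path, which keeps the two from drifting apart.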
Phase 2 — Lossless reconnect & outbound tracking:
- Persist stream cursor after each event for lossless reconnect
- Recovery reads persisted cursor for accurate resume position
- Add run_outbound_messages table and repository
- Add GET /runs/:id/messages endpoint
- Add outboundMessages summary to RunStateProjection
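The resume rule from the summary, `max(persisted_cursor, last_event_seq)`, can be sketched as follows. `CursorStore`, `InMemoryCursorStore`, and `consumeFrom` are hypothetical stand-ins. The PR persists the cursor rather than keeping it in an in-memory `Map`.

```typescript
// Hypothetical cursor store interface.
interface CursorStore {
  load(runId: string): number | undefined;
  save(runId: string, seq: number): void;
}

class InMemoryCursorStore implements CursorStore {
  private readonly cursors = new Map<string, number>();
  load(runId: string): number | undefined {
    return this.cursors.get(runId);
  }
  save(runId: string, seq: number): void {
    this.cursors.set(runId, seq);
  }
}

// Resume from whichever is further along: the persisted cursor or the last stored event.
function resumePosition(persistedCursor: number | undefined, lastEventSeq: number): number {
  return Math.max(persistedCursor ?? 0, lastEventSeq);
}

// Process a replayed batch, skipping events at or before the resume position
// and persisting the cursor after each handled event.
function consumeFrom(
  runId: string,
  events: { seq: number }[],
  lastEventSeq: number,
  store: CursorStore,
  handle: (seq: number) => void,
): number {
  let cursor = resumePosition(store.load(runId), lastEventSeq);
  for (const e of events) {
    if (e.seq <= cursor) continue; // already processed before the disconnect
    handle(e.seq);
    cursor = e.seq;
    store.save(runId, cursor);
  }
  return cursor;
}
```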
Phase 3 — Signal & progress enrichment:
- Add signalType and severity fields to SendSignalDto
- Emit progress.reported for TaskUpdate, TaskComplete, TaskFail
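The sketch below illustrates the Phase 3 normalization step, in which each task message still yields its own event and also produces a `progress.reported` event. The message fields, event type names, and status strings are assumptions for illustration.

```typescript
// Hypothetical shapes; the real runtime message and event types live elsewhere.
type TaskMessageKind = "TaskUpdate" | "TaskComplete" | "TaskFail";

interface TaskMessage {
  kind: TaskMessageKind;
  taskId: string;
  percent?: number; // only meaningful for TaskUpdate
  error?: string; // only meaningful for TaskFail
}

interface NormalizedEvent {
  type: string;
  data: Record<string, unknown>;
}

// Per-kind event type and progress status (names are illustrative).
const TASK_EVENTS: Record<TaskMessageKind, { type: string; status: string }> = {
  TaskUpdate: { type: "task.updated", status: "in_progress" },
  TaskComplete: { type: "task.completed", status: "completed" },
  TaskFail: { type: "task.failed", status: "failed" },
};

function normalizeTaskMessage(msg: TaskMessage): NormalizedEvent[] {
  const mapping = TASK_EVENTS[msg.kind];
  // The message's own event is kept; progress.reported is emitted in addition.
  const own: NormalizedEvent = { type: mapping.type, data: { taskId: msg.taskId } };
  const progress: NormalizedEvent = {
    type: "progress.reported",
    data: {
      taskId: msg.taskId,
      status: mapping.status,
      percent: msg.percent ?? null,
      error: msg.error ?? null,
    },
  };
  return [own, progress];
}
```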
Phase 4 — Operational resilience:
- Recovery returns batch result summary {recovered, failed}
- Durable webhook outbox with delivery tracking table
- Distributed recovery locking via PostgreSQL advisory locks
- Add 6 new Prometheus counters for observability
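The batch recovery summary and the advisory locking fit together roughly as shown below. `RecoveryLock` is a hypothetical interface standing in for PostgreSQL's `pg_try_advisory_lock` / `pg_advisory_unlock`. Those functions take integer keys, so a real implementation would also need to map each run id to a lock key.

```typescript
// Hypothetical lock abstraction over PostgreSQL advisory locks.
interface RecoveryLock {
  tryAcquire(runId: string): Promise<boolean>;
  release(runId: string): Promise<void>;
}

interface RecoveryResult {
  recovered: number;
  failed: number;
}

async function recoverRuns(
  runIds: string[],
  lock: RecoveryLock,
  recoverOne: (runId: string) => Promise<void>,
): Promise<RecoveryResult> {
  const result: RecoveryResult = { recovered: 0, failed: 0 };
  for (const runId of runIds) {
    // Another instance holds the lock: skip instead of recovering the run twice.
    if (!(await lock.tryAcquire(runId))) continue;
    try {
      await recoverOne(runId);
      result.recovered += 1;
    } catch {
      result.failed += 1;
    } finally {
      await lock.release(runId);
    }
  }
  return result;
}
```

If a run's lock is already held, the sketch skips that run rather than counting it as a failure. This lets several instances run recovery at the same time without processing the same run twice.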
Phase 5 — New features & scalability:
- POST /runs/validate preflight endpoint
- Redis StreamHub strategy for horizontal scaling
- Streaming JSONL export via async generator
- Typed gRPC interfaces to reduce `any` casts
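Streaming JSONL export via an async generator can be sketched like this. The `PageLoader` callback and the keyset pagination on `seq` are assumptions about how events are read, not the PR's repository API.

```typescript
// Hypothetical event shape and page loader.
interface ExportEvent {
  seq: number;
  type: string;
}

type PageLoader = (afterSeq: number, limit: number) => Promise<ExportEvent[]>;

// Yield one JSON object per line, loading events page by page.
async function* exportRunJsonl(loadPage: PageLoader, pageSize = 500): AsyncGenerator<string> {
  let afterSeq = 0;
  while (true) {
    const page = await loadPage(afterSeq, pageSize);
    for (const event of page) {
      yield JSON.stringify(event) + "\n";
    }
    if (page.length < pageSize) return; // short page: no more events
    afterSeq = page[page.length - 1].seq;
  }
}
```

In a Node HTTP handler, an async iterable like this can be turned into a stream with `Readable.from()` from `node:stream`. Memory use then stays roughly bounded by the page size.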
Summary
- Commitment messages no longer synthesize `SESSION_STATE_RESOLVED`; runs only complete when the runtime reports resolution via `session-snapshot` or `GetSession` reconciliation
- Reconnect resumes from `max(persisted_cursor, last_event_seq)`, so no events are lost
- The `run_outbound_messages` table tracks all kickoff/signal/context messages with lifecycle status
- 6 new Prometheus counters
- New endpoints: `POST /runs/validate` (preflight), `POST /runs/:id/context`, `POST /runs/:id/projection/rebuild`, `GET /runs/:id/messages`, `GET /runs/:id/export/stream` (JSONL)
- Redis-backed `StreamHubStrategy` for multi-instance SSE fan-out

Changes by Phase
Phase 0: Critical Correctness Fixes
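- `clientVersion` read from package.json (`app-config.service.ts`, `run-executor.service.ts`)
- Synthetic `session.state.changed` removed from Commitment normalization (`event-normalizer.service.ts`)
- `decision.finalized` auto-finalize removed (`stream-consumer.service.ts`)
- `finalizingPromise` guards concurrent finalization (`stream-consumer.service.ts`)
- `findByIdOrThrow` → `NotFoundException` (`run.repository.ts`)
- `console.error` → NestJS `Logger` (`main.ts`)

Phase 1: Runtime Integration Fidelity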
- `POST /runs/:id/context`: send context updates to running sessions
- `POST /runs/:id/projection/rebuild`: rebuild the projection from persisted events
- Runtime capabilities from the Initialize response persisted (`runtime_sessions`)
- `schema_version` column on `run_events_canonical`
- `webhooks.active` changed from integer to boolean

Phase 2: Lossless Reconnect & Outbound Tracking
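- Stream cursor persisted after each event; recovery resumes from the persisted cursor
- `run_outbound_messages` table + repository + `GET /runs/:id/messages`
- `OutboundMessageSummary` in `RunStateProjection`

Phase 3: Signal & Progress Enrichment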
- `signalType` and `severity` fields on `SendSignalDto`
- `TaskUpdate`/`TaskComplete`/`TaskFail` → additional `progress.reported` events

Phase 4: Operational Resilience
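- Recovery returns a batch summary `{ recovered, failed }`
- Durable webhook outbox with a `webhook_deliveries` table
- Distributed recovery locking via PostgreSQL advisory locks (`pg_try_advisory_lock`)

Phase 5: New Features & Scalability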
- `POST /runs/validate`: preflight validation without creating a run
- `RedisStreamHubStrategy` for horizontal SSE scaling
- `GET /runs/:id/export/stream`: streaming JSONL export
- `src/runtime/grpc-types.ts`: typed gRPC interfaces

Migrations
- `0006_capabilities_and_stream.sql`
- `0007_canonical_schema_version.sql`
- `0008_webhook_active_boolean.sql`
- `0009_outbound_messages.sql`
- `0010_webhook_deliveries.sql`

New Files
- `src/storage/outbound-message.repository.ts`
- `src/webhooks/webhook-delivery.repository.ts`
- `src/events/redis-stream-hub.strategy.ts`
- `src/runtime/grpc-types.ts`

Test plan
- Type check (`tsc --noEmit`)
- Run `npm run drizzle:migrate` against a live database
- `POST /runs` with a decision-mode request → verify the run completes only via runtime session-snapshot
- `POST /runs/validate` with an unsupported mode → verify an error response
- `POST /runs/:id/context` during a running session → verify the context update flows through
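The unsupported-mode check from the test plan can be illustrated with a pure validator. The request shape and the `validateRunRequest` name are assumptions. So is the source of the supported-mode set; one possibility is the persisted runtime capabilities.

```typescript
// Hypothetical request shape and validator; the PR's actual DTO is not shown here.
interface RunRequest {
  mode: string;
  input?: unknown;
}

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Preflight: report problems without creating a run.
function validateRunRequest(req: RunRequest, supportedModes: ReadonlySet<string>): ValidationResult {
  const errors: string[] = [];
  if (!supportedModes.has(req.mode)) {
    errors.push(`unsupported mode: ${req.mode}`);
  }
  if (req.input === undefined) {
    errors.push("input is required");
  }
  return { valid: errors.length === 0, errors };
}
```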