Issue Analysis: Document Processing Failures on main
Overview
The main branch fails to process documents submitted via the A2A protocol (message/send). Tasks immediately transition to failed state without executing the agent, and the tasks/get response lacks the artifacts field. The feature/document-analyzer branch does not have these issues.
Commit Origin
Reference commit: 6aa857f — docs(example): update document‑analyzer README with file text field
All issues arose after commit 6aa857f. That commit (on the feature/document-analyzer branch) only touched a README. The breaking changes were introduced by two subsequent commits (plus a formatting follow-up) merged into main from separate branches that forked from the same parent (700d111):
| Commit | Date | Description | Issues Introduced |
| --- | --- | --- | --- |
| 1cc2a61 | Mar 6, 2026 | fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization | Issue 1 (trace context mismatch) and Issue 4 (unbounded buffer) |
| a6f2206 | Mar 7, 2026 | refactor(storage): harden memory layer, fix OOM risks, and optimize database indexes | Storage API changes (additional offset param, interface changes) |
| 16f1353 | Mar 8, 2026 | style: apply consistent formatting and add comprehensive docstrings | Docstring-only follow-up to 1cc2a61 |
The critical breaking commit is 1cc2a61. It changed the scheduler's _TaskOperation TypedDict from _current_span: Span to trace_id: str | None / span_id: str | None, but did not update the worker (bindu/server/workers/base.py) which still expects _current_span. This half-completed refactor causes every task to crash.
            700d111 (common ancestor)
           /       \
    6aa857f         1cc2a61  ← scheduler trace refactor (BROKE worker contract)
   (feature/        a6f2206  ← storage refactor
    document-       16f1353  ← formatting follow-up
    analyzer)          |
                    6d189cb (HEAD of main)
Issues
1. Trace Context Mismatch — Worker Crashes on Every Task (CRITICAL)
Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)
Impact: All tasks fail immediately. No documents are ever processed.
The scheduler and worker have an incompatible interface for passing OpenTelemetry trace context:
| Component | File | Sends/Expects |
| --- | --- | --- |
| Scheduler base type | bindu/server/scheduler/base.py (L67–76) | trace_id: str, span_id: str |
| InMemoryScheduler | bindu/server/scheduler/memory_scheduler.py (L68–72) | Sends trace_id/span_id strings |
| Worker | bindu/server/workers/base.py (L130) | Expects task_operation["_current_span"] (Span object) |
Commit 1cc2a61 changed _TaskOperation and InMemoryScheduler to use primitive trace_id/span_id strings, but did not update the worker (bindu/server/workers/base.py was not in the commit's changeset). The worker still calls use_span(task_operation["_current_span"]) which raises a KeyError on every task, caught by the broad except clause which marks the task as failed.
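The failure mode can be reproduced in isolation. The following is a minimal sketch; the TypedDict shape is a hypothetical reconstruction from the description above, not the project's actual code:

```python
from typing import Optional, TypedDict


# Hypothetical reconstruction of the post-1cc2a61 scheduler payload shape.
class TaskOperation(TypedDict):
    operation: str
    trace_id: Optional[str]
    span_id: Optional[str]


def worker_handle(task_operation: dict) -> str:
    """Sketch of the worker's failure path: the old key is gone."""
    try:
        span = task_operation["_current_span"]  # raises KeyError on main
        return "running"
    except Exception:
        # The broad except clause swallows the KeyError and fails the task.
        return "failed"


op: TaskOperation = {"operation": "run", "trace_id": "0af7", "span_id": "b7ad"}
print(worker_handle(op))  # → failed
```

Because the KeyError is indistinguishable from any other worker exception under the broad except, the task record shows a generic failure rather than the contract mismatch.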
Fix required: Either:
- (A) Update the worker in bindu/server/workers/base.py to reconstruct a span from the trace_id/span_id strings, or
- (B) Revert the scheduler to pass the live _current_span Span object (matching feature/document-analyzer). This involves:
  - bindu/server/scheduler/base.py: change the _TaskOperation fields from trace_id/span_id back to _current_span: Span
  - bindu/server/scheduler/memory_scheduler.py: remove the _get_trace_context() helper and pass get_current_span() directly
  - bindu/server/scheduler/redis_scheduler.py: apply the same change
2. Response Missing artifacts Field (CONSEQUENCE OF 1)
Impact: tasks/get returns a task with no artifacts.
This is not a separate bug — it is a direct consequence of Issue 1. The processing flow is:
message/send → task created (state: "submitted", no artifacts)
→ scheduled to worker
→ worker crashes on _current_span KeyError
→ task marked "failed" (no artifacts generated)
tasks/get → returns failed task without artifacts
Artifacts are only generated in ManifestWorker._handle_terminal_state() when state is "completed". Since the worker never reaches agent execution, no artifacts are ever created.
Fix required: Resolving Issue 1 will fix this — once tasks execute successfully, build_artifacts() will produce artifacts and update_task() will persist them.
3. Frontend Does Not Pass File Parts to Agent Messages (MODERATE)
Impact: Frontend file uploads are constructed but never reach the agent due to Issue 1. If Issue 1 is fixed, this path works correctly on main.
On main, frontend/src/lib/utils/agentMessageHandler.ts accepts a files parameter, builds FilePart objects with the A2A-required text field, and includes them in the message payload. The frontend/src/lib/server/endpoints/bindu/types.ts FilePart interface also requires text: string.
On feature/document-analyzer, the frontend file upload code is entirely removed — the files parameter is dropped and messages only contain TextPart. The FilePart type also drops the text field.
No fix required for backend processing — the curl-based API path works correctly for file uploads. The frontend code on main is structurally correct but untestable while Issue 1 exists.
4. InMemoryScheduler Uses Unbounded Buffer (MINOR)
Introduced by: 1cc2a61 (fix(scheduler): resolve anio buffer deadlock, cpu burn loop, and trace serialization)
File: bindu/server/scheduler/memory_scheduler.py (L53–55)
On main, the anyio memory object stream is created with math.inf buffer:
anyio.create_memory_object_stream[TaskOperation](math.inf)
On feature/document-analyzer, it uses the default (unbuffered):
anyio.create_memory_object_stream[TaskOperation]()
The math.inf buffer was added to prevent a deadlock where the API server hangs if no worker is immediately ready to receive. However, an unbounded buffer can silently accumulate tasks during failures without backpressure.
Fix required: Evaluate whether a bounded buffer (e.g., 100) is more appropriate than math.inf, or keep the default if the worker startup is guaranteed before task submission.
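The trade-off can be illustrated with a stdlib analogy (queue.Queue stands in for the anyio memory object stream; this is not the project's actual code):

```python
import queue

unbounded = queue.Queue()           # analogous to the math.inf buffer
bounded = queue.Queue(maxsize=2)    # analogous to a bounded stream

# Unbounded: producers never block, so a stalled worker lets tasks pile up
# silently with no backpressure signal.
for i in range(10_000):
    unbounded.put(f"task-{i}")
print(unbounded.qsize())  # → 10000

# Bounded: once the buffer is full, a non-blocking put fails immediately,
# surfacing the backlog instead of hiding it.
bounded.put("task-0")
bounded.put("task-1")
try:
    bounded.put("task-2", block=False)
except queue.Full:
    print("backpressure")  # → backpressure
```

With an unbuffered or small-buffered stream, the producer instead blocks until a worker receives, which is the deadlock the math.inf change was meant to avoid; a moderate bound trades a short wait for visibility into a stuck worker.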
Root Cause Chain
curl message/send (with file parts)
↓
Task submitted to storage (state: "submitted") ✅
↓
Task scheduled via InMemoryScheduler.run_task() ✅
sends: {operation: "run", params: ..., trace_id: "...", span_id: "..."}
↓
Worker._handle_task_operation() receives TaskOperation ✅
↓
Worker accesses task_operation["_current_span"] ❌ KeyError
↓
Exception caught → storage.update_task(state="failed") ← task never runs
↓
tasks/get returns: {state: "failed", NO artifacts}
Files Requiring Changes
| File | Change | Priority |
| --- | --- | --- |
| bindu/server/scheduler/base.py | Fix _TaskOperation type to match worker expectations | Critical |
| bindu/server/scheduler/memory_scheduler.py | Fix trace context passing to match _TaskOperation type | Critical |
| bindu/server/scheduler/redis_scheduler.py | Fix trace context passing to match _TaskOperation type | Critical |
| bindu/server/workers/base.py | Ensure _handle_task_operation matches the scheduler's task operation format | Critical |
Verification
After fixing, the following should work:
# 1. Send document
curl -X POST http://localhost:3773/ \
-H 'Content-Type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "test-001",
"method": "message/send",
"params": {
"message": {
"messageId": "msg-001",
"contextId": "ctx-001",
"taskId": "task-001",
"kind": "message",
"role": "user",
"parts": [
{"kind": "text", "text": "Analyze this document"},
{"kind": "file", "text": "paper.pdf", "file": {"name": "paper.pdf", "mimeType": "application/pdf", "bytes": "<base64>"}}
]
}
}
}'
# Expected: task in "submitted" state
# 2. Check task status (after processing)
curl -X POST http://localhost:3773/ \
-H 'Content-Type: application/json' \
-d '{
"jsonrpc": "2.0",
"id": "test-002",
"method": "tasks/get",
"params": {"taskId": "task-001"}
}'
# Expected: task in "completed" state WITH artifacts array
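The same checks can be scripted. A sketch using only the stdlib; the endpoint, IDs, and payload shape mirror the curl calls above, and a running server is assumed for the commented-out live call:

```python
import json
import urllib.request

ENDPOINT = "http://localhost:3773/"  # taken from the curl examples above


def build_request(method: str, params: dict, req_id: str) -> urllib.request.Request:
    """Construct a JSON-RPC 2.0 request matching the curl invocations."""
    payload = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


req = build_request("tasks/get", {"taskId": "task-001"}, "test-002")
print(json.loads(req.data)["method"])  # → tasks/get

# Against a live server (response field names are an assumption based on
# the expected output described above):
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
#       assert result["result"].get("artifacts")  # non-empty after the fix
```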