feat(observability): add opt-in OpenTelemetry spans, session logs, and metrics#753
feat(observability): add opt-in OpenTelemetry spans, session logs, and metrics#753j5ik2o wants to merge 31 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughOpenTelemetry のセッションログ機能を導入。スパン→NDJSON 変換器、SessionLogSpanProcessor による JSONL 出力、MonitorJsonMetricExporter、OTel 初期化への条件付き組み込み、ワークフロー/フェーズ/ステップの観測属性(サニタイズ・ワークフロースタック・タイムスタンプ等)拡張、ブートストラップ経由の有効化制御、および対応テストを追加して検証しています。 ChangesSession Log Exporter統合
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/logging/span-to-ndjson-mapper.ts`:
- Line 108: The code currently emits failureCategory using a type cast
(failureCategory as AgentFailureCategory) which does not validate runtime input;
add a runtime guard (e.g., implement and use an isAgentFailureCategory(value):
value is AgentFailureCategory predicate) and only include failureCategory in the
mapped object when that predicate returns true; update the span-to-ndjson-mapper
logic around the failureCategory spread to call
isAgentFailureCategory(failureCategory) instead of casting so only allowed
values are logged.
In `@src/infra/observability/sessionLogSpanProcessor.ts`:
- Line 29: Wrap the appendNdjsonLine calls in a try/catch so failures don't
propagate through trace processing: for the call using
appendNdjsonLine(this.shadowLogPath, startRecord) catch any error, log a clear
error message (including this.shadowLogPath and which record — e.g. startRecord)
via the module's logger (or console.error if none) and swallow the exception;
apply the same pattern to the other appendNdjsonLine invocations referenced (the
calls at the other two sites, e.g. passing endRecord and the error record) so
none of these writes can throw back into the trace flow.
- Around line 11-16: The SessionLogSpanProcessorOptions currently exposes
allowSensitiveData which should be resolved at the boundary; remove
allowSensitiveData from the SessionLogSpanProcessorOptions interface and change
the processor to accept a resolved sanitization artifact (e.g., a boolean flag
named isSanitizationEnabled or a sanitizer function) so the processor only
writes logs and does not re-resolve config; update callers (bootstrap/creator)
to compute allowSensitiveData once and pass the resolved value into the
processor constructor/function (refer to SessionLogSpanProcessorOptions and the
SessionLogSpanProcessor/constructor usage) so no global/env resolution occurs
inside the processor.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: b4d7973d-9e75-4a71-b360-c3b21a27f610
📒 Files selected for processing (14)
src/__tests__/otelFoundation.test.tssrc/__tests__/sessionLogSpanProcessor.test.tssrc/__tests__/span-to-ndjson-mapper.test.tssrc/__tests__/workflowExecution-session-loading.test.tssrc/__tests__/workflowSpans.test.tssrc/core/logging/span-to-ndjson-mapper.tssrc/core/workflow/engine/WorkflowEngine.tssrc/core/workflow/engine/WorkflowRunLoop.tssrc/core/workflow/observability/workflowSpans.tssrc/core/workflow/types.tssrc/features/tasks/execute/workflowExecution.tssrc/features/tasks/execute/workflowExecutionBootstrap.tssrc/infra/observability/otelFoundation.tssrc/infra/observability/sessionLogSpanProcessor.ts
[codex] Add phase and judge OpenTelemetry spans
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/workflow/engine/ParallelRunner.ts`:
- Around line 202-216: The code is resolving observability settings inside the
core runner; update the wiring so resolved flags are provided on
ParallelRunnerDeps and consumed by ParallelRunner instead of reading
engineOptions here. Add boolean observabilityEnabled (or similar) and
sanitizeObservabilityText on ParallelRunnerDeps at construction/wiring time,
remove direct reads of this.deps.engineOptions.observability?.enabled and
this.deps.engineOptions.sanitizeObservabilityText, and change the
runWithPhaseSpan invocation in ParallelRunner (the block calling executeAgent)
to use this.deps.observabilityEnabled and this.deps.sanitizeObservabilityText;
keep runWithPhaseSpan, ParallelRunner, and ParallelRunnerDeps identifiers to
locate the changes.
In `@src/core/workflow/engine/TeamLeaderRunner.ts`:
- Around line 102-112: The code in runTeamLeaderStep directly reads
this.deps.engineOptions (e.g., this.deps.engineOptions.observability?.enabled
and this.deps.engineOptions.sanitizeObservabilityText) which mixes option
resolution into runtime logic; instead, add resolved observability fields to
TeamLeaderRunnerDeps (for example observabilityEnabled: boolean and
sanitizeObservabilityText: boolean/string) at wiring time and update
runTeamLeaderStep to use those new deps values when calling runWithPhaseSpan
(replace this.deps.engineOptions.observability?.enabled === true with
this.deps.observabilityEnabled, and replace
this.deps.engineOptions.sanitizeObservabilityText with
this.deps.sanitizeObservabilityText), leaving runWithPhaseSpan and
getWorkflowName usage unchanged; ensure TeamLeaderRunnerDeps type and
constructors/factories are updated accordingly so execution code only consumes
resolved values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 24e97884-9e03-40d3-8fd5-bd2571fb304e
📒 Files selected for processing (10)
src/__tests__/workflowSpans.test.tssrc/core/workflow/engine/OptionsBuilder.tssrc/core/workflow/engine/ParallelRunner.tssrc/core/workflow/engine/StepExecutor.tssrc/core/workflow/engine/TeamLeaderRunner.tssrc/core/workflow/engine/WorkflowEngineSetup.tssrc/core/workflow/observability/workflowSpans.tssrc/core/workflow/phase-runner.tssrc/core/workflow/report-phase-runner.tssrc/core/workflow/status-judgment-phase.ts
…on-exporter feat(observability): add monitor.json metric exporter
|
ありがとうございます。 こちらでも少し確認していたのですが、 そのため、 結果として、後続 run の 今の段階で対処しておくと後が楽かなと思うので、一度確認いただけますでしょうか。 |
|
@nrslib ありがとうございます。確認します! |
5dbbc26 to
6fea84e
Compare
The workflow span set `takt.workflow.abort.reason` directly from the outcome without sanitization, while the canonical NDJSON session log runs the same reason through `sanitizeText`. Because the abort reason embeds the raw agent error/content (`Step "<name>" failed: <error|content>`), the shadow `.jsonl` leaked unredacted agent output in the default configuration (redaction on), breaking redaction parity. Thread `sanitizeText` through `WorkflowSpanParams` (wired from `options.sanitizeObservabilityText` like the step/phase spans) and apply it to the abort reason in `recordWorkflowOutcome`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`mapPhaseComplete` gated on `takt.phase.system_prompt` / `takt.phase.user_instruction` being present, but those fields are not part of `NdjsonPhaseComplete`. On the judge error path (when `judgeStatus` throws before `onStructuredPromptResolved` fires) the prompt parts are never resolved, so the span lacks those attributes and the mapper silently dropped the `phase_complete(status=error)` record — while the canonical session log emits it unconditionally from the judge catch block. That diagnostic record was lost from the shadow log. Gate `mapPhaseComplete` only on step/phase/phaseName/phaseExecutionId/ status. The previous parity test that enshrined the drop is rewritten to assert phase_start is still omitted (no resolved prompt parts) while phase_complete is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase spans defer their phase_start record to span onEnd because the system_prompt/user_instruction attributes are only populated then. But judge-stage child spans end *before* their parent phase span, so the processor wrote `phase_judge_stage` records ahead of the parent `phase_start`, inverting the canonical order (phase_start -> judge_stage(s) -> phase_complete) for every step that goes through Phase 3 status judgment. Buffer judge-stage records (keyed by runId + phaseExecutionId) while their parent phase span is open, and flush them right after the deferred phase_start at the phase span onEnd. Record content/timestamps are unchanged; only ordering is corrected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Only step spans carried `takt.workflow.current_name` / `takt.workflow.stack`; phase and judge spans set no stack attributes, so every shadow phase_start / phase_complete / phase_judge_stage record was emitted without the `workflow` and `stack` fields that the canonical session log always includes. For nested workflow_call runs this dropped the information identifying which (sub)workflow a phase belonged to. Add a `workflowStack` field to PhaseSpanParams / JudgeStageSpanParams, spread `workflowStackAttributes` in buildPhaseAttributes / buildJudgeStageAttributes, and thread `getCurrentWorkflowStack()` (sourced from the engine's active resume point, same as the step span) through every phase/judge call site: StepExecutor, report-phase-runner, status-judgment-phase, ParallelRunner, ArpeggioRunner, TeamLeaderRunner, and team-leader-part-runner, via the runner deps / PhaseRunnerContext. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The canonical step_start record includes providerOptions and providerOptionsSources whenever a step resolves them, but the step span only emitted provider/model name+source, so the shadow step_start dropped them. Span attributes cannot hold objects, so serialize both as JSON on the step span (separate from providerAttributes() to keep the JSON blob out of metric cardinality) and parse them back in mapStepStart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lisions Two defensive hardenings for the shadow exporters: - sanitizeSpanText returned the RAW text when no sanitizer was threaded to a span call site (fail-open). If any future call site forgets to pass sanitizeText while observability is enabled, raw prompts/responses would leak. Redact to a placeholder instead. - SessionLogSpanProcessor.register / MonitorJsonMetricExporter.register silently overwrote an existing registration on a runId collision, misrouting a live run's records and emitting a duplicate workflow_start. Keep the original registration and warn instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every workflow metric is keyed by takt.run.id, and the metric SDK caps each instrument at 2000 distinct attribute sets. A long-lived process with many overlapping runs eventually overflows: new runs' series merge into an attribute-less overflow bucket, so their run-scoped monitor.json filters to empty and is silently never written. Memory stays bounded (the cap is real), so this is not the unbounded leak it first looked like, but the empty-file outcome was silent. Detect the otel.metric.overflow marker and warn once so the degradation is observable. (Full per-run metric isolation would need a larger refactor.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A NUL byte slipped into the `pendingJudgeStages` doc comment when the judge-stage ordering buffer was added, which made git treat the source as binary and `file(1)` report it as data. Replace it with a space; the runtime buffer key already used a regular space, so behavior is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… phase Commit e3d85fc dropped the system_prompt/user_instruction gate from mapPhaseComplete to restore the judge error path, but applied it to ALL phases. The canonical log only emits phase_complete unconditionally for the judge phase (its catch fires onPhaseComplete even without resolved prompts). For execute/report, the canonical onPhaseComplete is reached only after prompts resolve (StepExecutor has no try/catch around the phase span; report guards on didEmitPhaseStart). So when an agent throws before prompt resolution, the relaxed mapper emitted a phase_complete the canonical log lacks — and an orphaned one, since mapPhaseStart still requires prompts so no phase_start precedes it. Keep the system_prompt/user_instruction gate for non-judge phases; exempt only the judge phase. Adds execute/report early-throw regression tests. Self-review found this as a regression introduced by the earlier fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guard The duplicate-runId guard added in 4d8de4f had a test for the session-log processor but not for the metric exporter. Assert that a colliding registration is ignored (the second monitor path is never written) and that its returned disposer is a no-op that leaves the original active. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
55a9bc5 to
9309a29
Compare
|
マージできる状態になりました。 |
概要
この PR は OpenTelemetry ベースの observability を opt-in で追加するため、次の 3 issue 分をまとめて対応します。既存の NDJSON session log / workflow 実行の挙動は維持しつつ、サイレントリリース可能な shadow 出力と metric 出力を追加します。
SessionLogExporterを追加し、既存 NDJSON session log と突き合わせ可能な{sessionId}-otel-session-shadow.jsonlを出力します。monitor.jsonexporter を追加します。Closes #701
Closes #702
Closes #703
変更内容
observability.enabledと各 exporter 設定が有効な場合のみ、shadow session log とmonitor.jsonを追加出力workflow_start/step_start/step_complete/phase_start/phase_complete/phase_judge_stage/workflow_complete/workflow_abortへ変換する mapper とSpanProcessorを追加monitor.jsonとして出力[redacted]にするdocs/configuration*.mdに追記レビュー対応での補強
取り込み済み PR
検証
lint: passtest: passe2e-mock: pass