
feat(observability): add opt-in OpenTelemetry spans, session logs, and metrics #753

Open
j5ik2o wants to merge 31 commits into main from codex/issue-701-session-log-exporter



@j5ik2o
Collaborator

@j5ik2o j5ik2o commented May 21, 2026

Summary

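This PR adds opt-in, OpenTelemetry-based observability and resolves the following three issues in one change. The existing NDJSON session log and workflow execution behavior stay as they are. On top of that, the PR adds shadow output and metric output that can ship as a silent release.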

Closes #701
Closes #702
Closes #703

Changes

  • Emit the additional shadow session log and monitor.json only when observability.enabled and the corresponding exporter settings are enabled
  • Add spans and metrics for workflows, steps, phases, and judge stages
  • Add a mapper and a SpanProcessor that convert spans into workflow_start / step_start / step_complete / phase_start / phase_complete / phase_judge_stage / workflow_complete / workflow_abort records
  • Instrument Phase 1 execute, Phase 2 report, Phase 3 judge, and judge stages on every execution path: normal, parallel, team-leader, and arpeggio
  • Register and filter the metric exporter per runId, and emit its output as an opt-in monitor.json
  • Sanitize observability text according to the existing redaction policy; when no sanitizer is available, fail closed and emit [redacted]
  • Document the related settings in docs/configuration*.md
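Here is a minimal sketch of the opt-in gating in the first bullet. The shapes are simplified assumptions: ObservabilityConfig, ExporterOptions, and buildExporterOptions are hypothetical names, not takt's actual types. The sketch shows that no exporter option exists unless both observability.enabled and that exporter's own flag are set.

```typescript
// Hypothetical, simplified shapes; takt's real config and OtelFoundationOptions differ.
interface ObservabilityConfig {
  enabled?: boolean;
  sessionLogExporter?: { enabled?: boolean };
  monitorJsonExporter?: { enabled?: boolean };
}

interface ExporterOptions {
  sessionLogExporter?: { shadowLogPath: string };
  monitorJsonExporter?: { monitorPath: string };
}

// Only exporters that are explicitly opted in get options. Everything else stays
// undefined, so a run with observability off behaves exactly as before.
function buildExporterOptions(
  config: ObservabilityConfig | undefined,
  paths: { shadowLogPath: string; monitorPath: string },
): ExporterOptions {
  if (!config || config.enabled !== true) return {}; // master switch off
  const options: ExporterOptions = {};
  if (config.sessionLogExporter?.enabled === true) {
    options.sessionLogExporter = { shadowLogPath: paths.shadowLogPath };
  }
  if (config.monitorJsonExporter?.enabled === true) {
    options.monitorJsonExporter = { monitorPath: paths.monitorPath };
  }
  return options;
}
```

Because the master switch is checked first, a per-exporter flag left on in config has no effect while observability itself is disabled.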

Hardening from review feedback

  • Resolve metric instruments lazily, so that importing before SDK initialization no longer pins them to NoOp
  • Isolate exporter registrations by runId, and keep the existing registration when runIds collide
  • Align the shadow session log with the existing NDJSON: nested workflow terminal records, phase complete, judge stage ordering, workflow stack, and provider options
  • Isolate per-run write failures in the monitor.json exporter, and surface cardinality overflow as a warning
  • Fill gaps in phase span coverage, and include parallel / team-leader part / arpeggio batch executions in the Phase 1 metrics
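A lazily resolved instrument illustrates the first hardening item. This sketch rests on assumptions: MeterLike, CounterLike, lazyCounter, installMeter, and the metric name are hypothetical stand-ins for the OpenTelemetry metrics API. They show why an instrument created at import time can stay bound to a no-op meter.

```typescript
// Hypothetical minimal meter interfaces, standing in for the OpenTelemetry metrics API.
interface CounterLike {
  add(value: number, attributes?: Record<string, string>): void;
}
interface MeterLike {
  createCounter(name: string): CounterLike;
}

// Resolves the real counter on first use instead of at module load time.
// An eagerly created counter would stay bound to whatever meter existed at import
// (a no-op one, if the SDK was not initialized yet).
function lazyCounter(getMeter: () => MeterLike, name: string): CounterLike {
  let resolved: CounterLike | undefined;
  return {
    add(value, attributes) {
      const counter = (resolved ??= getMeter().createCounter(name));
      counter.add(value, attributes);
    },
  };
}

// Simulated global meter: a no-op until "SDK initialization" swaps it.
const noopMeter: MeterLike = { createCounter: () => ({ add: () => {} }) };
let globalMeter: MeterLike = noopMeter;
function installMeter(meter: MeterLike): void {
  globalMeter = meter;
}

// Declared at import time, before the SDK is up.
const stepCounter = lazyCounter(() => globalMeter, "takt.step.count");
```

Deferring resolution to the first add() only helps if that first measurement happens after SDK initialization. That is the normal case for instruments recorded during a workflow run.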

Incorporated PRs

Verification

  • GitHub Actions lint: pass
  • GitHub Actions test: pass
  • GitHub Actions e2e-mock: pass
  • CodeRabbit: pass

@coderabbitai

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


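Introduces OpenTelemetry session logging. The PR adds a span-to-NDJSON mapper, JSONL output through SessionLogSpanProcessor, and MonitorJsonMetricExporter, and wires them conditionally into OTel initialization. It also extends the observability attributes for workflows, phases, and steps (sanitization, workflow stack, timestamps, and so on), controls enablement through the bootstrap, and adds tests that verify all of this.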

Changes

Session Log Exporter integration

Layer / File(s) Summary
SpanSnapshot and NDJSON mapping foundation
src/core/logging/span-to-ndjson-mapper.ts
Adds SpanSnapshot and the public functions mapSpanStartToNdjson/mapSpanEndToNdjson. Generates NDJSON records based on the step./workflow. name prefixes. Implements JSON parsing with per-element validation of the workflow stack, attribute-extraction utilities, and ISO 8601 timestamp conversion.
SessionLogSpanProcessor implementation
src/infra/observability/sessionLogSpanProcessor.ts
Adds SessionLogSpanProcessor. The constructor appends workflow_start first. onStart/onEnd snapshot each span, map it, and append the result to the file. Append failures are logged and the exception is swallowed, so the processor fails safe.
MonitorJsonMetricExporter implementation
src/infra/observability/monitorJsonMetricExporter.ts
Adds a PushMetricExporter implementation. It provides serialization and write logic that converts ResourceMetrics to JSON and writes monitor.json atomically, a shutdown guard, and basic forceFlush/shutdown implementations.
OtelFoundation span processor integration
src/infra/observability/otelFoundation.ts
Introduces OtelFoundationOptions and initializeOtelFoundation(config, options?). Changes the initialization flow so that createSpanProcessors(config, options) conditionally configures SessionLogSpanProcessor and createMetricReaders conditionally includes MonitorJsonMetricExporter, both injected into NodeSDK.
Extended observability attributes for workflows, steps, and phases
src/core/workflow/observability/workflowSpans.ts, src/core/workflow/types.ts
Adds metric instrumentation, abortReason/iterations on WorkflowSpanOutcome, and instruction/workflowStack/sanitizeText on StepSpanParams. Implements attribute sanitization, workflowStack attributes, extended attributes for the record* functions, and the new runWithPhaseSpan. Adds sanitizeObservabilityText to WorkflowConfig.
WorkflowEngine / WorkflowRunLoop integration
src/core/workflow/engine/WorkflowEngine.ts, src/core/workflow/engine/WorkflowRunLoop.ts
Passes getCurrentWorkflowStack to the run loop and adds workflowStack and sanitizeText to the runWithStepSpan calls so they propagate through the flow. Returns abortReason/iterations for both full runs and single iterations.
Observability wrapping for phase runners and runners
src/core/workflow/engine/..., src/core/workflow/report-phase-runner.ts, src/core/workflow/status-judgment-phase.ts
ParallelRunner/StepExecutor/TeamLeaderRunner/ReportPhaseRunner/StatusJudgmentPhase now wrap phase execution in runWithPhaseSpan. They receive injected getWorkflowName and observabilityEnabled/sanitizeObservabilityText hooks, so observability is captured per phase.
Conditional sessionLogExporter initialization in bootstrap
src/features/tasks/execute/workflowExecutionBootstrap.ts, src/features/tasks/execute/workflowExecution.ts
Adds sanitizeObservabilityText to WorkflowExecutionBootstrap. initializeOtelFoundation now receives the sessionLogExporter/monitorJsonExporter options as its second argument, but only when observability.enabled and the respective exporter are enabled.
Feature verification test suite
src/__tests__/otelFoundation.test.ts, src/__tests__/sessionLogSpanProcessor.test.ts, src/__tests__/span-to-ndjson-mapper.test.ts, src/__tests__/workflowExecution-session-loading.test.ts, src/__tests__/workflowSpans.test.ts, src/__tests__/monitorJsonMetricExporter.test.ts
Adds and updates tests for the following:
  • span processor attachment when sessionLogExporter is enabled, and the workflow_start content of the shadow JSONL
  • SessionLogSpanProcessor output order, and non-throwing behavior on append failure
  • the mapping spec (step/workflow start and completion records)
  • updated expected arguments for initializeOtelFoundation calls
  • metric instrumentation and monitor.json output
  • span attribute sanitization via sanitizeText
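The table says MonitorJsonMetricExporter writes monitor.json atomically but does not say how. A common technique, assumed here rather than taken from the PR, is to write to a temp file and then rename it over the target. writeJsonAtomically is a hypothetical helper.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Write a JSON document so readers never observe a half-written file:
// write a temp file in the same directory, then rename it over the target.
// Keeping the temp file in the same directory keeps rename() on one filesystem,
// where POSIX guarantees the replacement is atomic.
function writeJsonAtomically(targetPath: string, data: unknown): void {
  const tempPath = join(
    dirname(targetPath),
    `.${basename(targetPath)}.${process.pid}.${Date.now()}.tmp`,
  );
  writeFileSync(tempPath, JSON.stringify(data, null, 2) + "\n", "utf8");
  renameSync(tempPath, targetPath);
}

// Usage: two snapshots for the same run; the file always holds a complete one.
const scratchDir = mkdtempSync(join(tmpdir(), "takt-monitor-"));
const monitorPath = join(scratchDir, "monitor.json");
writeJsonAtomically(monitorPath, { runId: "run-1", exports: 1 });
writeJsonAtomically(monitorPath, { runId: "run-1", exports: 2 });
const snapshot = JSON.parse(readFileSync(monitorPath, "utf8")) as { runId: string; exports: number };
```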

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • nrslib/takt#745: Related, because it includes changes to session log NDJSON mapping, SessionLogSpanProcessor, and the extended step/phase observability attributes.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR achieves the goals of issue #701: an opt-in shadow SessionLogExporter implementation, NDJSON parity, and added tests.
Out of Scope Changes check ✅ Passed All changes relate to the shadow session log exporter implementation; no out-of-scope changes were detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly states the main features being added (OpenTelemetry spans, session logs, and metrics) and appropriately summarizes the changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@j5ik2o j5ik2o marked this pull request as ready for review May 21, 2026 04:49

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/logging/span-to-ndjson-mapper.ts`:
- Line 108: The code currently emits failureCategory using a type cast
(failureCategory as AgentFailureCategory) which does not validate runtime input;
add a runtime guard (e.g., implement and use an isAgentFailureCategory(value):
value is AgentFailureCategory predicate) and only include failureCategory in the
mapped object when that predicate returns true; update the span-to-ndjson-mapper
logic around the failureCategory spread to call
isAgentFailureCategory(failureCategory) instead of casting so only allowed
values are logged.
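Here is a sketch of the runtime guard the comment asks for. The values in AGENT_FAILURE_CATEGORIES are placeholders, because the real AgentFailureCategory union is not shown in this PR. failureCategoryField is a hypothetical helper for the conditional spread.

```typescript
// Placeholder values; the real AgentFailureCategory union in takt may differ.
const AGENT_FAILURE_CATEGORIES = ["timeout", "rate_limit", "provider_error", "unknown"] as const;
type AgentFailureCategory = (typeof AGENT_FAILURE_CATEGORIES)[number];

// Runtime type guard: narrows an untyped span attribute to the union
// only when the value is one of the allowed literals.
function isAgentFailureCategory(value: unknown): value is AgentFailureCategory {
  return (
    typeof value === "string" &&
    (AGENT_FAILURE_CATEGORIES as readonly string[]).includes(value)
  );
}

// Conditional spread: the key exists only when the guard passes,
// replacing `failureCategory as AgentFailureCategory`.
function failureCategoryField(raw: unknown): { failureCategory?: AgentFailureCategory } {
  return isAgentFailureCategory(raw) ? { failureCategory: raw } : {};
}
```

In the mapper, this would be used as `{ ...otherFields, ...failureCategoryField(attributes["..."]) }`, so an unexpected value is dropped instead of being logged under a false type.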

In `@src/infra/observability/sessionLogSpanProcessor.ts`:
- Line 29: Wrap the appendNdjsonLine calls in a try/catch so failures don't
propagate through trace processing: for the call using
appendNdjsonLine(this.shadowLogPath, startRecord) catch any error, log a clear
error message (including this.shadowLogPath and which record — e.g. startRecord)
via the module's logger (or console.error if none) and swallow the exception;
apply the same pattern to the other appendNdjsonLine invocations referenced (the
calls at the other two sites, e.g. passing endRecord and the error record) so
none of these writes can throw back into the trace flow.
- Around lines 11-16: The SessionLogSpanProcessorOptions currently exposes
allowSensitiveData which should be resolved at the boundary; remove
allowSensitiveData from the SessionLogSpanProcessorOptions interface and change
the processor to accept a resolved sanitization artifact (e.g., a boolean flag
named isSanitizationEnabled or a sanitizer function) so the processor only
writes logs and does not re-resolve config; update callers (bootstrap/creator)
to compute allowSensitiveData once and pass the resolved value into the
processor constructor/function (refer to SessionLogSpanProcessorOptions and the
SessionLogSpanProcessor/constructor usage) so no global/env resolution occurs
inside the processor.
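Both sessionLogSpanProcessor comments can be illustrated together. The processor receives a sanitizer that was resolved once at the boundary, and every append is wrapped so that an I/O error is logged and swallowed. ShadowLogOptions, resolveSanitizer, and ShadowLogWriter are hypothetical names. The real processor's constructor and appendNdjsonLine signature may differ.

```typescript
import { appendFileSync } from "node:fs";

type Sanitize = (text: string) => string;

// Hypothetical options: an already-resolved sanitizer replaces allowSensitiveData,
// so the processor never reads config or environment on its own.
interface ShadowLogOptions {
  shadowLogPath: string;
  sanitize: Sanitize;
  append?: (path: string, data: string) => void; // injectable for tests
}

// Boundary (bootstrap): resolve the policy once and pass the result down.
function resolveSanitizer(allowSensitiveData: boolean, redact: Sanitize): Sanitize {
  return allowSensitiveData ? (text: string) => text : redact;
}

class ShadowLogWriter {
  private readonly append: (path: string, data: string) => void;

  constructor(private readonly options: ShadowLogOptions) {
    this.append =
      options.append ?? ((path: string, data: string) => appendFileSync(path, data, "utf8"));
  }

  // Never throws into span processing: failures are logged with the path and
  // record type, then swallowed. Returns whether the line was written.
  write(type: string, text: string): boolean {
    const line = JSON.stringify({ type, text: this.options.sanitize(text) }) + "\n";
    try {
      this.append(this.options.shadowLogPath, line);
      return true;
    } catch (error) {
      console.error(
        `shadow log append failed (${type}) at ${this.options.shadowLogPath}: ${String(error)}`,
      );
      return false;
    }
  }
}
```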

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b4d7973d-9e75-4a71-b360-c3b21a27f610

📥 Commits

Reviewing files that changed from the base of the PR and between f8c2b4a and 2128323.

📒 Files selected for processing (14)
  • src/__tests__/otelFoundation.test.ts
  • src/__tests__/sessionLogSpanProcessor.test.ts
  • src/__tests__/span-to-ndjson-mapper.test.ts
  • src/__tests__/workflowExecution-session-loading.test.ts
  • src/__tests__/workflowSpans.test.ts
  • src/core/logging/span-to-ndjson-mapper.ts
  • src/core/workflow/engine/WorkflowEngine.ts
  • src/core/workflow/engine/WorkflowRunLoop.ts
  • src/core/workflow/observability/workflowSpans.ts
  • src/core/workflow/types.ts
  • src/features/tasks/execute/workflowExecution.ts
  • src/features/tasks/execute/workflowExecutionBootstrap.ts
  • src/infra/observability/otelFoundation.ts
  • src/infra/observability/sessionLogSpanProcessor.ts

Comment thread src/core/logging/span-to-ndjson-mapper.ts Outdated
Comment thread src/infra/observability/sessionLogSpanProcessor.ts
Comment thread src/infra/observability/sessionLogSpanProcessor.ts Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/workflow/engine/ParallelRunner.ts`:
- Around lines 202-216: The code is resolving observability settings inside the
core runner; update the wiring so resolved flags are provided on
ParallelRunnerDeps and consumed by ParallelRunner instead of reading
engineOptions here. Add boolean observabilityEnabled (or similar) and
sanitizeObservabilityText on ParallelRunnerDeps at construction/wiring time,
remove direct reads of this.deps.engineOptions.observability?.enabled and
this.deps.engineOptions.sanitizeObservabilityText, and change the
runWithPhaseSpan invocation in ParallelRunner (the block calling executeAgent)
to use this.deps.observabilityEnabled and this.deps.sanitizeObservabilityText;
keep runWithPhaseSpan, ParallelRunner, and ParallelRunnerDeps identifiers to
locate the changes.

In `@src/core/workflow/engine/TeamLeaderRunner.ts`:
- Around lines 102-112: The code in runTeamLeaderStep directly reads
this.deps.engineOptions (e.g., this.deps.engineOptions.observability?.enabled
and this.deps.engineOptions.sanitizeObservabilityText) which mixes option
resolution into runtime logic; instead, add resolved observability fields to
TeamLeaderRunnerDeps (for example observabilityEnabled: boolean and
sanitizeObservabilityText: boolean/string) at wiring time and update
runTeamLeaderStep to use those new deps values when calling runWithPhaseSpan
(replace this.deps.engineOptions.observability?.enabled === true with
this.deps.observabilityEnabled, and replace
this.deps.engineOptions.sanitizeObservabilityText with
this.deps.sanitizeObservabilityText), leaving runWithPhaseSpan and
getWorkflowName usage unchanged; ensure TeamLeaderRunnerDeps type and
constructors/factories are updated accordingly so execution code only consumes
resolved values.
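Both comments ask for the same shape, sketched below. A wiring function is the only place that reads engine options, and the runner consumes already-resolved fields. EngineOptions, buildParallelRunnerDeps, and ParallelRunnerSketch are hypothetical. observabilityEnabled and sanitizeObservabilityText follow the names suggested in the review. Two choices are assumptions: treating sanitizeObservabilityText as a function, and failing closed when it is missing. The second matches the fail-closed behavior the PR description mentions.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface EngineOptions {
  observability?: { enabled?: boolean };
  sanitizeObservabilityText?: (text: string) => string;
}

interface ParallelRunnerDeps {
  observabilityEnabled: boolean; // resolved at wiring time
  sanitizeObservabilityText: (text: string) => string; // always present here
}

// Wiring layer: the only code that reads engine options.
function buildParallelRunnerDeps(options: EngineOptions): ParallelRunnerDeps {
  return {
    observabilityEnabled: options.observability?.enabled === true,
    // No sanitizer configured: fail closed rather than pass raw text through.
    sanitizeObservabilityText: options.sanitizeObservabilityText ?? (() => "[redacted]"),
  };
}

class ParallelRunnerSketch {
  constructor(private readonly deps: ParallelRunnerDeps) {}

  // Execution code only consumes resolved values; no option lookups here.
  instructionAttribute(instruction: string): string | undefined {
    if (!this.deps.observabilityEnabled) return undefined;
    return this.deps.sanitizeObservabilityText(instruction);
  }
}
```

With this split, tests for the runner can pass plain deps objects without building full engine options, and TeamLeaderRunnerDeps can follow the same pattern.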

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 24e97884-9e03-40d3-8fd5-bd2571fb304e

📥 Commits

Reviewing files that changed from the base of the PR and between d3efed0 and e8d090d.

📒 Files selected for processing (10)
  • src/__tests__/workflowSpans.test.ts
  • src/core/workflow/engine/OptionsBuilder.ts
  • src/core/workflow/engine/ParallelRunner.ts
  • src/core/workflow/engine/StepExecutor.ts
  • src/core/workflow/engine/TeamLeaderRunner.ts
  • src/core/workflow/engine/WorkflowEngineSetup.ts
  • src/core/workflow/observability/workflowSpans.ts
  • src/core/workflow/phase-runner.ts
  • src/core/workflow/report-phase-runner.ts
  • src/core/workflow/status-judgment-phase.ts

Comment thread src/core/workflow/engine/ParallelRunner.ts
Comment thread src/core/workflow/engine/TeamLeaderRunner.ts
@nrslib
Owner

nrslib commented May 23, 2026

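Thank you.

I had been looking into this a bit on my side as well. initializeOtelFoundation() shares a process-global singleton SDK, while sessionLogExporter.shadowLogPath and monitorJsonExporter.monitorPath appear to be passed as per-run values.

Because of that, when takt run starts several runs concurrently with concurrency > 1, it looks possible that only the first run's exporters get registered with the SDK and the exporter options of later runs are never used.

As a result, later runs might not produce monitor.json / *-otel-session-shadow.jsonl, or their records might get mixed into another run's logs.

Addressing this now should make things easier later, so could you take a look?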
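One way to reconcile a process-global SDK with per-run output paths is a single long-lived processor that routes each record by run ID. Each run registers and later disposes its own sink. This sketch illustrates the idea and is not the PR's code: SpanLike, Sink, and RunRoutingProcessor are hypothetical, and only the takt.run.id attribute key comes from the PR.

```typescript
// Hypothetical shapes; only the takt.run.id attribute key comes from the PR.
type Sink = (line: string) => void;

interface SpanLike {
  name: string;
  attributes: Record<string, string | number | boolean | undefined>;
}

// One process-wide processor; each run registers its own sink by runId.
class RunRoutingProcessor {
  private readonly sinks = new Map<string, Sink>();

  // Returns a disposer. On a runId collision the original registration wins,
  // a warning is logged, and the returned disposer does nothing.
  register(runId: string, sink: Sink): () => void {
    if (this.sinks.has(runId)) {
      console.warn(`runId ${runId} is already registered; keeping the original`);
      return () => {};
    }
    this.sinks.set(runId, sink);
    return () => {
      if (this.sinks.get(runId) === sink) this.sinks.delete(runId);
    };
  }

  // Route by the span's run id; spans without one are not written anywhere.
  onEnd(span: SpanLike): void {
    const runId = span.attributes["takt.run.id"];
    if (typeof runId !== "string") return;
    this.sinks.get(runId)?.(JSON.stringify({ name: span.name, runId }));
  }
}
```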

@j5ik2o
Collaborator Author

j5ik2o commented May 24, 2026

@nrslib Thanks! I'll take a look.

@j5ik2o j5ik2o marked this pull request as draft May 24, 2026 04:06
@j5ik2o j5ik2o force-pushed the codex/issue-701-session-log-exporter branch from 5dbbc26 to 6fea84e Compare May 28, 2026 13:08
j5ik2o and others added 9 commits May 28, 2026 23:14
The workflow span set `takt.workflow.abort.reason` directly from the
outcome without sanitization, while the canonical NDJSON session log
runs the same reason through `sanitizeText`. Because the abort reason
embeds the raw agent error/content (`Step "<name>" failed: <error|content>`),
the shadow `.jsonl` leaked unredacted agent output in the default
configuration (redaction on), breaking redaction parity.

Thread `sanitizeText` through `WorkflowSpanParams` (wired from
`options.sanitizeObservabilityText` like the step/phase spans) and apply
it to the abort reason in `recordWorkflowOutcome`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
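Here is a sketch of the fix in this commit: the abort reason passes through the sanitizer before it becomes a span attribute. Only recordWorkflowOutcome, the abortReason/iterations fields of WorkflowSpanOutcome, and the takt.workflow.abort.reason key come from the PR. The SetAttribute callback, the parameter order, and the takt.workflow.iterations key are assumptions for illustration.

```typescript
type Sanitize = (text: string) => string;

// Fields named in the PR; other outcome fields are omitted.
interface WorkflowSpanOutcome {
  abortReason?: string;
  iterations?: number;
}

// Hypothetical stand-in for span.setAttribute.
type SetAttribute = (key: string, value: string | number) => void;

function recordWorkflowOutcome(
  setAttribute: SetAttribute,
  outcome: WorkflowSpanOutcome,
  sanitizeText: Sanitize,
): void {
  if (outcome.iterations !== undefined) {
    setAttribute("takt.workflow.iterations", outcome.iterations); // assumed key
  }
  if (outcome.abortReason !== undefined) {
    // The reason embeds raw agent output, so redact it exactly like the NDJSON log does.
    setAttribute("takt.workflow.abort.reason", sanitizeText(outcome.abortReason));
  }
}
```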
`mapPhaseComplete` gated on `takt.phase.system_prompt` /
`takt.phase.user_instruction` being present, but those fields are not
part of `NdjsonPhaseComplete`. On the judge error path (when
`judgeStatus` throws before `onStructuredPromptResolved` fires) the
prompt parts are never resolved, so the span lacks those attributes and
the mapper silently dropped the `phase_complete(status=error)` record —
while the canonical session log emits it unconditionally from the judge
catch block. That diagnostic record was lost from the shadow log.

Gate `mapPhaseComplete` only on step/phase/phaseName/phaseExecutionId/
status. The previous parity test that enshrined the drop is rewritten to
assert phase_start is still omitted (no resolved prompt parts) while
phase_complete is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
j5ik2o and others added 9 commits May 30, 2026 12:44
Phase spans defer their phase_start record to span onEnd because the
system_prompt/user_instruction attributes are only populated then. But
judge-stage child spans end *before* their parent phase span, so the
processor wrote `phase_judge_stage` records ahead of the parent
`phase_start`, inverting the canonical order
(phase_start -> judge_stage(s) -> phase_complete) for every step that
goes through Phase 3 status judgment.

Buffer judge-stage records (keyed by runId + phaseExecutionId) while
their parent phase span is open, and flush them right after the deferred
phase_start at the phase span onEnd. Record content/timestamps are
unchanged; only ordering is corrected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
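The buffering described in this commit can be sketched as follows. JudgeStageOrderer and its method names are hypothetical. Joining runId and phaseExecutionId with a space is an assumption, suggested by the later NUL-byte commit's remark that the runtime key uses a regular space. The sketch also assumes every judge stage ends while its parent phase span is still open.

```typescript
interface NdjsonRecord {
  type: string;
  [key: string]: unknown;
}

// Hypothetical helper: holds judge-stage records until their parent phase span ends.
class JudgeStageOrderer {
  private readonly pending = new Map<string, NdjsonRecord[]>();

  constructor(private readonly write: (record: NdjsonRecord) => void) {}

  // Assumed key format: runId and phaseExecutionId joined by a space.
  private key(runId: string, phaseExecutionId: string): string {
    return `${runId} ${phaseExecutionId}`;
  }

  // Judge-stage spans end before their parent phase span, so buffer instead of writing.
  onJudgeStageEnd(runId: string, phaseExecutionId: string, record: NdjsonRecord): void {
    const key = this.key(runId, phaseExecutionId);
    const buffered = this.pending.get(key) ?? [];
    buffered.push(record);
    this.pending.set(key, buffered);
  }

  // At phase end: the deferred phase_start (when prompts resolved), then the
  // buffered judge stages in arrival order, then phase_complete.
  onPhaseEnd(
    runId: string,
    phaseExecutionId: string,
    phaseStart: NdjsonRecord | undefined,
    phaseComplete: NdjsonRecord,
  ): void {
    const key = this.key(runId, phaseExecutionId);
    if (phaseStart) this.write(phaseStart);
    for (const stage of this.pending.get(key) ?? []) this.write(stage);
    this.pending.delete(key);
    this.write(phaseComplete);
  }
}
```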
Only step spans carried `takt.workflow.current_name` / `takt.workflow.stack`;
phase and judge spans set no stack attributes, so every shadow
phase_start / phase_complete / phase_judge_stage record was emitted
without the `workflow` and `stack` fields that the canonical session log
always includes. For nested workflow_call runs this dropped the
information identifying which (sub)workflow a phase belonged to.

Add a `workflowStack` field to PhaseSpanParams / JudgeStageSpanParams,
spread `workflowStackAttributes` in buildPhaseAttributes /
buildJudgeStageAttributes, and thread `getCurrentWorkflowStack()`
(sourced from the engine's active resume point, same as the step span)
through every phase/judge call site: StepExecutor, report-phase-runner,
status-judgment-phase, ParallelRunner, ArpeggioRunner, TeamLeaderRunner,
and team-leader-part-runner, via the runner deps / PhaseRunnerContext.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
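Here is a sketch of carrying the workflow stack on phase spans, using a simplified entry shape. The attribute keys takt.workflow.current_name and takt.workflow.stack come from the commit. WorkflowStackEntry, workflowStackAttributes, and parseWorkflowStack are illustrative. Because span attributes cannot hold objects, the stack travels as a JSON string, and the mapper validates each element when parsing it back.

```typescript
// Hypothetical, simplified stack entry; real entries may carry more fields.
interface WorkflowStackEntry {
  workflow: string;
  step?: string;
}

// Span side: the current workflow name plus the whole stack as a JSON string.
function workflowStackAttributes(stack: readonly WorkflowStackEntry[]): Record<string, string> {
  const current = stack[stack.length - 1];
  return {
    ...(current ? { "takt.workflow.current_name": current.workflow } : {}),
    "takt.workflow.stack": JSON.stringify(stack),
  };
}

function isWorkflowStackEntry(value: unknown): value is WorkflowStackEntry {
  if (typeof value !== "object" || value === null) return false;
  const entry = value as Record<string, unknown>;
  return (
    typeof entry["workflow"] === "string" &&
    (entry["step"] === undefined || typeof entry["step"] === "string")
  );
}

// Mapper side: parse defensively; any malformed element drops the whole field.
function parseWorkflowStack(raw: unknown): WorkflowStackEntry[] | undefined {
  if (typeof raw !== "string") return undefined;
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) && parsed.every(isWorkflowStackEntry) ? parsed : undefined;
  } catch {
    return undefined;
  }
}
```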
The canonical step_start record includes providerOptions and
providerOptionsSources whenever a step resolves them, but the step span
only emitted provider/model name+source, so the shadow step_start
dropped them. Span attributes cannot hold objects, so serialize both as
JSON on the step span (separate from providerAttributes() to keep the
JSON blob out of metric cardinality) and parse them back in mapStepStart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
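The serialize-then-parse round trip in this commit is sketched below with hypothetical helper names. The attribute keys takt.step.provider_options and takt.step.provider_options_sources are assumptions. In the OpenTelemetry attribute model, values must be primitives or arrays of primitives, which is why nested objects travel as JSON strings.

```typescript
type JsonObject = Record<string, unknown>;

// Hypothetical attribute keys; the real span attribute names may differ.
const PROVIDER_OPTIONS_KEY = "takt.step.provider_options";
const PROVIDER_OPTIONS_SOURCES_KEY = "takt.step.provider_options_sources";

// Step-span side: attribute values must be primitives, so serialize the objects.
// Kept separate from metric attributes so the JSON blob never affects cardinality.
function providerOptionsAttributes(
  providerOptions: JsonObject | undefined,
  providerOptionsSources: JsonObject | undefined,
): Record<string, string> {
  return {
    ...(providerOptions ? { [PROVIDER_OPTIONS_KEY]: JSON.stringify(providerOptions) } : {}),
    ...(providerOptionsSources
      ? { [PROVIDER_OPTIONS_SOURCES_KEY]: JSON.stringify(providerOptionsSources) }
      : {}),
  };
}

// Mapper side: parse back, accepting only a plain JSON object.
function parseJsonObject(raw: unknown): JsonObject | undefined {
  if (typeof raw !== "string") return undefined;
  try {
    const value: unknown = JSON.parse(raw);
    return typeof value === "object" && value !== null && !Array.isArray(value)
      ? (value as JsonObject)
      : undefined;
  } catch {
    return undefined;
  }
}
```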
…lisions

Two defensive hardenings for the shadow exporters:

- sanitizeSpanText returned the RAW text when no sanitizer was threaded
  to a span call site (fail-open). If any future call site forgets to
  pass sanitizeText while observability is enabled, raw prompts/responses
  would leak. Redact to a placeholder instead.

- SessionLogSpanProcessor.register / MonitorJsonMetricExporter.register
  silently overwrote an existing registration on a runId collision,
  misrouting a live run's records and emitting a duplicate workflow_start.
  Keep the original registration and warn instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
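The fail-closed change to sanitizeSpanText is sketched below with hypothetical types and parameter order. The redactKeys sanitizer and its pattern are illustrative only and are not takt's redaction policy.

```typescript
type Sanitize = (text: string) => string;

const REDACTED = "[redacted]";

// Fail closed: a call site that forgot to pass a sanitizer gets the placeholder,
// never the raw prompt or response text.
function sanitizeSpanText(text: string | undefined, sanitize: Sanitize | undefined): string | undefined {
  if (text === undefined) return undefined;
  return sanitize ? sanitize(text) : REDACTED;
}

// Illustrative sanitizer only; not takt's actual redaction policy.
const redactKeys: Sanitize = (text) => text.replace(/sk-[A-Za-z0-9]+/g, REDACTED);
```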
Every workflow metric is keyed by takt.run.id, and the metric SDK caps
each instrument at 2000 distinct attribute sets. A long-lived process
with many overlapping runs eventually overflows: new runs' series merge
into an attribute-less overflow bucket, so their run-scoped monitor.json
filters to empty and is silently never written.

Memory stays bounded (the cap is real), so this is not the unbounded
leak it first looked like, but the empty-file outcome was silent. Detect
the otel.metric.overflow marker and warn once so the degradation is
observable. (Full per-run metric isolation would need a larger refactor.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
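Detecting the overflow marker could look like the sketch below. DataPointLike is a simplified stand-in, not the SDK's data types. otel.metric.overflow is the attribute the OpenTelemetry specification defines for the overflow series, and the warn-once behavior mirrors the commit message. pointsForRun shows why run-scoped filtering finds nothing once a run's series lands in the overflow bucket.

```typescript
// Simplified stand-in for exported data points; not the OpenTelemetry SDK types.
interface DataPointLike {
  attributes: Record<string, unknown>;
}

const OVERFLOW_ATTRIBUTE = "otel.metric.overflow";

// Warn once per exporter when any point carries the overflow marker.
function createOverflowWarner(warn: (message: string) => void) {
  let warned = false;
  return (instrumentName: string, points: readonly DataPointLike[]): void => {
    if (warned) return;
    if (points.some((point) => point.attributes[OVERFLOW_ATTRIBUTE] === true)) {
      warned = true;
      warn(
        `metric cardinality limit reached for ${instrumentName}; ` +
          "some run-scoped monitor.json files may be empty",
      );
    }
  };
}

// Run-scoped filtering: overflow points carry no takt.run.id, so they match no run.
function pointsForRun<T extends DataPointLike>(points: readonly T[], runId: string): T[] {
  return points.filter((point) => point.attributes["takt.run.id"] === runId);
}
```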
A NUL byte slipped into the `pendingJudgeStages` doc comment when the
judge-stage ordering buffer was added, which made git treat the source
as binary and `file(1)` report it as data. Replace it with a space; the
runtime buffer key already used a regular space, so behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
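This class of problem can be caught before committing with hypothetical helpers. findNulBytes reports NUL offsets in file contents (for example, the Buffer returned by readFileSync), and replaceNulWithSpace applies the same repair as this commit. git's binary detection looks for NUL bytes near the start of a file, which is why a single stray NUL was enough.

```typescript
// Hypothetical helper: byte offsets of every NUL in the given contents.
function findNulBytes(content: Uint8Array): number[] {
  const offsets: number[] = [];
  content.forEach((byte, index) => {
    if (byte === 0) offsets.push(index);
  });
  return offsets;
}

// Same repair as the commit: replace each NUL character with a space.
function replaceNulWithSpace(text: string): string {
  return text.split("\u0000").join(" ");
}
```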
… phase

Commit e3d85fc dropped the system_prompt/user_instruction gate from
mapPhaseComplete to restore the judge error path, but applied it to ALL
phases. The canonical log only emits phase_complete unconditionally for
the judge phase (its catch fires onPhaseComplete even without resolved
prompts). For execute/report, the canonical onPhaseComplete is reached
only after prompts resolve (StepExecutor has no try/catch around the
phase span; report guards on didEmitPhaseStart). So when an agent throws
before prompt resolution, the relaxed mapper emitted a phase_complete the
canonical log lacks — and an orphaned one, since mapPhaseStart still
requires prompts so no phase_start precedes it.

Keep the system_prompt/user_instruction gate for non-judge phases; exempt
only the judge phase. Adds execute/report early-throw regression tests.

Self-review found this as a regression introduced by the earlier fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
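The gating rule after this commit can be expressed as a predicate. PhaseSpanSnapshot and shouldEmitPhaseComplete are hypothetical names, and the status values are assumed. The required fields and the judge-only exemption follow this commit message and the earlier mapPhaseComplete commit.

```typescript
type PhaseName = "execute" | "report" | "judge";

// Hypothetical flattened view of a phase span's attributes.
interface PhaseSpanSnapshot {
  step?: string;
  phase?: number;
  phaseName?: PhaseName;
  phaseExecutionId?: string;
  status?: string;
  systemPrompt?: string;
  userInstruction?: string;
}

function shouldEmitPhaseComplete(s: PhaseSpanSnapshot): boolean {
  // Identity and status are always required.
  const hasCoreFields =
    s.step !== undefined &&
    s.phase !== undefined &&
    s.phaseName !== undefined &&
    s.phaseExecutionId !== undefined &&
    s.status !== undefined;
  if (!hasCoreFields) return false;
  // Judge: the canonical catch block emits phase_complete even without resolved prompts.
  if (s.phaseName === "judge") return true;
  // Execute/report: the canonical log only reaches phase_complete after prompts resolve.
  return s.systemPrompt !== undefined && s.userInstruction !== undefined;
}
```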
…guard

The duplicate-runId guard added in 4d8de4f had a test for the session-log
processor but not for the metric exporter. Assert that a colliding
registration is ignored (the second monitor path is never written) and
that its returned disposer is a no-op that leaves the original active.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@j5ik2o j5ik2o force-pushed the codex/issue-701-session-log-exporter branch from 55a9bc5 to 9309a29 Compare May 31, 2026 16:28
@j5ik2o j5ik2o marked this pull request as ready for review June 1, 2026 01:03
@j5ik2o j5ik2o changed the title feat: add shadow session log exporter feat(observability): add opt-in OpenTelemetry spans, session logs, and metrics Jun 1, 2026
@j5ik2o
Collaborator Author

j5ik2o commented Jun 1, 2026

It's ready to merge now.


Labels

None yet

Projects

None yet

2 participants