feat: Adding Codex tracing and setup skill #14

Draft
duncankmckinnon wants to merge 4 commits into main from codex

duncankmckinnon (Contributor) commented Mar 19, 2026

Summary

This PR upgrades Codex tracing to use a local OTEL collector that captures Codex’s native telemetry and turns it into richer Arize/Phoenix trace trees.

Instead of relying only on the notify hook’s flat turn-level span, Codex now routes OTEL log events to a local collector, and the notify hook drains those events to build:

  • a parent Turn LLM span
  • TOOL child spans for tool calls
  • request child spans for Codex API/websocket activity

It also updates the installer, docs, and setup skill so new users can configure tracing out of the box with the current Codex config schema.

What changed

Collector architecture

  • Added a lightweight local OTEL collector at plugins/codex-tracing/scripts/collector.py
  • Added collector lifecycle helpers in plugins/codex-tracing/scripts/collector_ctl.sh
  • Collector now buffers Codex OTEL events by conversation id
  • Collector supports a drain flow for returning only new events since the last processed watermark
  • Collector now uses a threaded HTTP server so drain requests and incoming OTEL POSTs can run concurrently
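
The buffering and drain flow above can be sketched roughly as follows. This is a minimal illustration and not the contents of collector.py: the `EventBuffer` class, the `/drain/<conversation_id>` route, and the simplified POST body are all assumptions made for the example, and a real OTLP logs payload is more deeply nested than what is parsed here.

```python
import json
import threading
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EventBuffer:
    """Thread-safe per-conversation buffer with a drain watermark (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = defaultdict(list)    # conversation id -> all events seen
        self._watermark = defaultdict(int)  # conversation id -> count already drained

    def append(self, conversation_id, event):
        with self._lock:
            self._events[conversation_id].append(event)

    def drain(self, conversation_id):
        """Return only events added since the last drain, then advance the watermark."""
        with self._lock:
            events = self._events[conversation_id]
            start = self._watermark[conversation_id]
            self._watermark[conversation_id] = len(events)
            return events[start:]


BUFFER = EventBuffer()


class CollectorHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Assumed simplified body: {"conversation_id": "...", "event": {...}}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        BUFFER.append(payload["conversation_id"], payload["event"])
        self._send_json({"ok": True})

    def do_GET(self):
        # Assumed route: GET /drain/<conversation_id>
        if self.path.startswith("/drain/"):
            conversation_id = self.path[len("/drain/"):]
            self._send_json({"events": BUFFER.drain(conversation_id)})
        else:
            self._send_json({"error": "not found"}, status=404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real collector would likely log


def make_server(port=0):
    # ThreadingHTTPServer handles each request in its own thread, so a drain
    # request does not block incoming OTEL POSTs.
    return ThreadingHTTPServer(("127.0.0.1", port), CollectorHandler)
```

Keeping the watermark as a per-conversation index means a drain never has to delete events, and the lock keeps concurrent POST and drain threads from observing a half-updated buffer.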

Notify hook span building

  • Updated plugins/codex-tracing/hooks/notify.sh to:
    • normalize input-messages to the latest user message instead of storing full history arrays
    • drain collector events and build multi-span OTLP payloads
    • create TOOL child spans from codex.tool_decision + codex.tool_result
    • create INTERNAL/API request child spans from codex.websocket_request / codex.api_request
    • enrich the parent Turn span with model metadata, turn/thread ids, and trace duration
    • keep aggregate token totals on the parent Turn span only
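
As a rough sketch of the TOOL child span step, the pairing of codex.tool_decision with codex.tool_result could look like the following. The flattened event shape (`name`, `call_id`, `tool_name`, `decision`, `output`, `time_ns`) and the `codex.tool.*` attribute keys are assumptions for illustration; notify.sh may extract and name these differently.

```python
import secrets


def _attr(key, value):
    """Build one OTLP/JSON string attribute."""
    return {"key": key, "value": {"stringValue": str(value)}}


def build_tool_spans(events, trace_id, parent_span_id):
    """Pair decision and result events by call id into OTLP/JSON TOOL spans."""
    decisions = {}
    spans = []
    for event in events:
        name = event.get("name")
        call_id = event.get("call_id")
        if name == "codex.tool_decision":
            decisions[call_id] = event
        elif name == "codex.tool_result":
            # A result without a matching decision still yields a span.
            decision = decisions.pop(call_id, {})
            end_ns = event["time_ns"]
            start_ns = decision.get("time_ns", end_ns)
            spans.append({
                "traceId": trace_id,
                "spanId": secrets.token_hex(8),  # 16 hex characters
                "parentSpanId": parent_span_id,
                "name": event.get("tool_name", "tool"),
                "startTimeUnixNano": str(start_ns),
                "endTimeUnixNano": str(end_ns),
                "attributes": [
                    _attr("openinference.span.kind", "TOOL"),
                    _attr("tool.name", event.get("tool_name", "")),
                    # Hypothetical attribute keys for approval and output summary.
                    _attr("codex.tool.decision", decision.get("decision", "unknown")),
                    _attr("codex.tool.output_summary", str(event.get("output", ""))[:200]),
                ],
            })
    return spans
```

Events of other types, such as codex.api_request, are ignored here; under the same assumptions they would go through a similar function that emits the request child spans.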

Setup and docs

  • Updated README.md to document the current collector-based setup
  • Updated plugins/codex-tracing/skills/setup-codex-tracing/SKILL.md to match the new architecture and config shape
  • Updated plugins/codex-tracing/install.sh to write the collector OTLP config and start/auto-start the collector
  • Added .gitignore entries for __pycache__/ and *.pyc

Metadata now captured

Parent Turn span

  • latest user input
  • assistant output
  • session.id
  • codex.thread_id
  • codex.turn_id
  • model name
  • aggregate token totals
  • total trace latency via codex.trace.duration_ms
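
To make the parent Turn span fields concrete, here is a minimal sketch of how the attributes listed above might be assembled. The notify payload keys (`input-messages`, `last-assistant-message`, `thread-id`, `turn-id`), the choice of thread id as `session.id`, and the `llm.*` attribute names are assumptions for the example rather than a description of what notify.sh does.

```python
def latest_user_input(input_messages):
    """Reduce a message history to the most recent user message only."""
    if isinstance(input_messages, str):
        return input_messages
    if not input_messages:
        return ""
    last = input_messages[-1]
    return last if isinstance(last, str) else str(last)


def build_turn_attributes(payload, model, start_ms, end_ms, token_totals):
    """Build a flat attribute dict for the parent Turn span (hypothetical shape)."""
    thread_id = payload.get("thread-id", "")
    return {
        "openinference.span.kind": "LLM",
        "input.value": latest_user_input(payload.get("input-messages")),
        "output.value": payload.get("last-assistant-message", ""),
        "session.id": thread_id,  # assumption: one session per thread
        "codex.thread_id": thread_id,
        "codex.turn_id": payload.get("turn-id", ""),
        "llm.model_name": model,
        # Aggregate totals live on the parent span only.
        "llm.token_count.prompt": token_totals.get("input", 0),
        "llm.token_count.completion": token_totals.get("output", 0),
        "llm.token_count.total": token_totals.get("input", 0) + token_totals.get("output", 0),
        "codex.trace.duration_ms": end_ms - start_ms,
    }
```

Storing only the latest user message keeps the span input small even when the conversation history grows across turns.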

Child spans

  • TOOL spans:
    • tool name
    • approval status
    • tool output summary
  • Request spans:
    • model
    • status
    • attempt
    • duration
    • auth mode
    • connection reuse

Why

Recent Codex versions emit useful OTEL telemetry, but the previous setup and docs used outdated config syntax and produced only flat turn spans. This change makes Codex tracing more useful in Arize/Phoenix by surfacing tool calls and request metadata as child spans, while keeping setup simple for end users.

Notes

  • Token totals are intentionally attached only to the parent Turn span because Codex currently exposes usage at the completed-turn level, not per individual request.
  • The collector path now matches the current Codex config schema:

Testing

Validated locally by:

  • fixing Codex config parsing for OTEL exporter setup
  • confirming collector ingestion of native Codex OTEL events
  • confirming child TOOL spans and request spans appear in successful turns
  • confirming setup docs/skill align with the working collector-based flow
