Skip to content

feat(weave): GenAI agent data model — OTel GenAI ingest, ClickHouse schema, normalization, trajectory projection, daemon, and SDK instrumentations#6377

Closed
bcsherma wants to merge 37 commits intomasterfrom
ben/agent-data-model
Closed

feat(weave): GenAI agent data model — OTel GenAI ingest, ClickHouse schema, normalization, trajectory projection, daemon, and SDK instrumentations#6377
bcsherma wants to merge 37 commits intomasterfrom
ben/agent-data-model

Conversation

@bcsherma
Copy link
Copy Markdown
Member

@bcsherma bcsherma commented Mar 18, 2026

Summary

Spike to validate whether OTel GenAI semantic conventions are sufficient to capture the full structure of LLM agent execution and whether we can build a complete storage + rendering pipeline on top of them. The result is a working end-to-end system covering ingest, normalization, storage, trajectory projection, and out-of-process agent tracing.

Design docs in docs/design/ — see design_review_discussion.md for the overview and design_review.html for slides.

What's in this branch

ClickHouse schema (migrations/026_genai.up.sql)

  • genai_spans — wide normalized table with typed columns for all GenAI fields (messages, tokens, model, agent name, operation type, conversation ID, tool calls, etc.). ReplacingMergeTree, partitioned by month, ordered by (project_id, started_at, span_id).
  • genai_span_attributes — typed EAV for custom OTel attributes not in the column set.
  • genai_agents / genai_conversations — SummingMergeTree tables auto-populated by materialized views. O(1) agent and conversation list queries.
  • entity_annotations — generic annotation EAV for spans, agents, conversations.

Ingest + normalization (weave/trace_server/opentelemetry/)

  • OTLP protobuf ingest endpoint at /otel/v1/genai/traces
  • extract_genai_span() with vendor fallback chains: normalizes OpenAI Agents SDK, Google ADK, Traceloop/OpenInference, and standard OTel GenAI attributes into the same columns
  • Message normalization: all provider formats → Array(Tuple(role, content, tool_call_id, tool_name))

Trajectory projection (weave/trace_server/genai_chat_view.py)

  • Read-time span → chat-view projection (never persisted)
  • Depth-first walk branching on operation_name → produces user_message, agent_start, agent_message, tool_call, agent_handoff, context_compacted
  • Multi-turn conversation composition via conversation_id

Read APIs

  • /genai/spans/query, /genai/spans/trace, /genai/spans/active
  • /genai/traces/chat, /genai/conversations/chat
  • /genai/agents/query, /genai/agents/metrics
  • /genai/conversations/query
  • /genai/annotations/upsert, /genai/annotations/delete, /genai/annotations/query

SDK instrumentations (weave/otel/instrumentors/)

  • OpenAI Agents SDK — auto-discovers agent instructions, tools, handoffs; patches reasoning token capture and context compaction tracking
  • Google ADK — auto-discovers agent instructions, tools, sub-agents; patches Gemini media capture
  • Anthropic — Claude SDK instrumentation
  • setup_tracing() one-call OTel configuration with instrument() per framework
  • ConversationIdInjector, SystemPromptInjector, ToolDefinitionsInjector span processors
  • log_content() for media capture (images, audio)
  • LiveSpanProcessor for real-time span-start notifications

Daemon for IDE agents (weave/agent_hooks/)

  • Out-of-process tracing for Cursor, Claude Code, and other standalone agents
  • Relay (stdlib-only Python) — thin stdin→HTTP forwarder invoked by IDE hooks
  • Daemon — long-running HTTP server that builds OTel spans from hook events
  • SpanBuilder — translates normalized events into OTel span lifecycle
  • Produces same OTLP protobuf as SDK path — trace server can't distinguish the source

Format interop

  • ATIF → OTel adapter (reference implementation in examples/otel_genai/)

Design docs (docs/design/)

  • architecture.md — system overview
  • data_model.md — span patterns, normalized schema, vendor fallback chains
  • chat_view_algorithm.md — trajectory projection algorithm
  • instrumentation_guide.md — how to emit data (SDK + daemon)
  • format_interoperability.md — adapter architecture + ATIF reference
  • design_review.html — presentation slides
  • design_review_discussion.md — Notion-ready discussion doc

Status

This is a spike / draft — not intended for merge as-is. The purpose is to validate the approach and inform a design discussion about how this relates to Weave's existing call model.

@wandbot-3000
Copy link
Copy Markdown

wandbot-3000 bot commented Mar 18, 2026

@bcsherma bcsherma force-pushed the ben/agent-data-model branch from 17b81d9 to b98be30 Compare March 20, 2026 00:22
@bcsherma bcsherma force-pushed the ben/agent-data-model branch from a7f2440 to 9851c4c Compare March 20, 2026 19:33
@bcsherma bcsherma changed the title feat(weave): OTel-native GenAI agent data model with trajectory capture feat(weave): GenAI agent data model — OTel GenAI ingest, ClickHouse schema, normalization, trajectory projection, daemon, and SDK instrumentations Mar 24, 2026
@bcsherma bcsherma closed this Apr 12, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant