feat(weave): GenAI agent data model — OTel GenAI ingest, ClickHouse schema, normalization, trajectory projection, daemon, and SDK instrumentations by bcsherma · Pull Request #6377 · wandb/weave

bcsherma · 2026-03-18T06:02:53Z

Summary

Spike to validate whether OTel GenAI semantic conventions are sufficient to capture the full structure of LLM agent execution and whether we can build a complete storage + rendering pipeline on top of them. The result is a working end-to-end system covering ingest, normalization, storage, trajectory projection, and out-of-process agent tracing.

Design docs in docs/design/ — see design_review_discussion.md for the overview and design_review.html for slides.

What's in this branch

ClickHouse schema (`migrations/026_genai.up.sql`)

genai_spans — wide normalized table with typed columns for all GenAI fields (messages, tokens, model, agent name, operation type, conversation ID, tool calls, etc.). ReplacingMergeTree, partitioned by month, ordered by (project_id, started_at, span_id).
genai_span_attributes — typed EAV for custom OTel attributes not in the column set.
genai_agents / genai_conversations — SummingMergeTree tables auto-populated by materialized views. O(1) agent and conversation list queries.
entity_annotations — generic annotation EAV for spans, agents, conversations.
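
To make the ReplacingMergeTree choice for genai_spans concrete, here is a minimal pure-Python model of what a fully merged table looks like when no version column is used: among rows that share the sorting key, the most recently inserted row survives. The `output_tokens` column and the "re-insert a span once it finishes" scenario are assumptions for illustration, not taken from the migration.

```python
from typing import Any

# Sorting key from the description: (project_id, started_at, span_id).
SORT_KEY = ("project_id", "started_at", "span_id")


def replacing_merge(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Model a fully merged ReplacingMergeTree with no version column:
    among rows sharing a sorting key, the most recently inserted one survives."""
    latest: dict[tuple, dict[str, Any]] = {}
    for row in rows:  # rows are given in insertion order
        latest[tuple(row[k] for k in SORT_KEY)] = row
    # Merged parts are stored in sorting-key order.
    return [latest[key] for key in sorted(latest)]


rows = [
    {"project_id": "p1", "started_at": 100, "span_id": "a", "output_tokens": 0},
    {"project_id": "p1", "started_at": 90, "span_id": "b", "output_tokens": 7},
    # Hypothetical re-insert of span "a" once it has finished.
    {"project_id": "p1", "started_at": 100, "span_id": "a", "output_tokens": 42},
]
merged = replacing_merge(rows)
```

In ClickHouse itself this deduplication only happens during background merges, so reads that need exactly one row per span typically use `FINAL` or an `argMax`-style aggregation.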

Ingest + normalization (`weave/trace_server/opentelemetry/`)

OTLP protobuf ingest endpoint at /otel/v1/genai/traces
extract_genai_span() with vendor fallback chains: normalizes OpenAI Agents SDK, Google ADK, Traceloop/OpenInference, and standard OTel GenAI attributes into the same columns
Message normalization: all provider formats → Array(Tuple(role, content, tool_call_id, tool_name))
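
A rough sketch of the two ideas above, a vendor fallback chain and message normalization into a four-field tuple. This is not the code in `extract_genai_span()`; the helper names, the specific key order, and the handling of block-style content are assumptions. `gen_ai.request.model` is the standard OTel GenAI attribute and `llm.model_name` is the OpenInference one.

```python
import json
from typing import Any, NamedTuple, Optional

# Illustrative fallback chain: the standard OTel GenAI key first, then a vendor key.
MODEL_KEYS = ("gen_ai.request.model", "llm.model_name")


def first_present(attrs: dict[str, Any], keys: tuple[str, ...]) -> Optional[Any]:
    """Return the value of the first key that is set and non-empty."""
    for key in keys:
        value = attrs.get(key)
        if value not in (None, ""):
            return value
    return None


class Message(NamedTuple):
    role: str
    content: str
    tool_call_id: str
    tool_name: str


def normalize_message(raw: dict[str, Any]) -> Message:
    """Flatten one provider-style message dict into the four-field tuple."""
    content = raw.get("content") or ""
    if isinstance(content, list):
        # Block-style content: keep the text blocks, joined in order.
        content = "".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    elif not isinstance(content, str):
        # Anything else is serialized so the column stays a string.
        content = json.dumps(content)
    return Message(
        role=str(raw.get("role", "")),
        content=content,
        tool_call_id=str(raw.get("tool_call_id") or ""),
        tool_name=str(raw.get("name") or ""),
    )
```

Using empty strings rather than `None` for absent fields keeps every element the same shape, which is what an `Array(Tuple(...))` column of non-nullable strings would expect.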

Trajectory projection (`weave/trace_server/genai_chat_view.py`)

Read-time span → chat-view projection (never persisted)
Depth-first walk branching on operation_name → produces user_message, agent_start, agent_message, tool_call, agent_handoff, context_compacted
Multi-turn conversation composition via conversation_id
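
The depth-first walk can be sketched as follows. This is a simplified illustration, not the algorithm in `genai_chat_view.py`: the span dict keys (`span_id`, `parent_id`, `started_at`, `operation_name`) and the operation-to-event mapping are assumptions, and only three of the six event types are covered. The operation names `invoke_agent`, `chat`, and `execute_tool` are well-known values of `gen_ai.operation.name`.

```python
from collections import defaultdict
from typing import Any, Optional

# Assumed mapping from gen_ai.operation.name to chat-view event type.
EVENT_FOR_OPERATION = {
    "invoke_agent": "agent_start",
    "chat": "agent_message",
    "execute_tool": "tool_call",
}


def project_chat(spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Walk one trace depth-first in start-time order and emit chat events."""
    ids = {span["span_id"] for span in spans}
    children: dict[Optional[str], list[dict[str, Any]]] = defaultdict(list)
    for span in spans:
        parent = span.get("parent_id")
        # Spans whose parent is missing from the batch are treated as roots.
        children[parent if parent in ids else None].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["started_at"])

    events: list[dict[str, Any]] = []

    def visit(span: dict[str, Any]) -> None:
        kind = EVENT_FOR_OPERATION.get(span.get("operation_name"))
        if kind is not None:
            events.append({"type": kind, "span_id": span["span_id"]})
        # Recurse before moving to the next sibling (depth-first).
        for child in children.get(span["span_id"], []):
            visit(child)

    for root in children.get(None, []):
        visit(root)
    return events
```

Because the projection is computed at read time from stored spans, a mapping like this can change without rewriting any stored data.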

Read APIs

/genai/spans/query, /genai/spans/trace, /genai/spans/active
/genai/traces/chat, /genai/conversations/chat
/genai/agents/query, /genai/agents/metrics
/genai/conversations/query
/genai/annotations/upsert, /genai/annotations/delete, /genai/annotations/query

SDK instrumentations (`weave/otel/instrumentors/`)

OpenAI Agents SDK — auto-discovers agent instructions, tools, handoffs; patches reasoning token capture and context compaction tracking
Google ADK — auto-discovers agent instructions, tools, sub-agents; patches Gemini media capture
Anthropic — Claude SDK instrumentation
setup_tracing() one-call OTel configuration with instrument() per framework
ConversationIdInjector, SystemPromptInjector, ToolDefinitionsInjector span processors
log_content() for media capture (images, audio)
LiveSpanProcessor for real-time span-start notifications

Daemon for IDE agents (`weave/agent_hooks/`)

Out-of-process tracing for Cursor, Claude Code, and other standalone agents
Relay (stdlib-only Python) — thin stdin→HTTP forwarder invoked by IDE hooks
Daemon — long-running HTTP server that builds OTel spans from hook events
SpanBuilder — translates normalized events into OTel span lifecycle
Produces same OTLP protobuf as SDK path — trace server can't distinguish the source
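
A minimal sketch of what a stdlib-only stdin→HTTP relay could look like. The daemon URL, port, path, and JSON event shape are assumptions (the PR description does not specify them), and this is not the relay shipped in `weave/agent_hooks/`.

```python
import json
import sys
import urllib.request

# Hypothetical local daemon address; the real port and path are not given in the PR.
DAEMON_URL = "http://127.0.0.1:8765/events"


def build_request(raw: bytes, url: str = DAEMON_URL) -> urllib.request.Request:
    """Validate one hook event read from stdin and wrap it in a JSON POST."""
    event = json.loads(raw)  # raises ValueError on malformed input
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def main() -> int:
    """Forward one event from stdin to the daemon; always exit cleanly."""
    try:
        request = build_request(sys.stdin.buffer.read())
        with urllib.request.urlopen(request, timeout=2) as response:
            response.read()
    except (ValueError, OSError):
        # A malformed event or an unreachable daemon must not break the IDE hook.
        pass
    return 0
```

An IDE hook would run this as a script that ends with `sys.exit(main())`. Keeping the relay free of third-party imports keeps its startup cost low, and leaving all span construction to the long-running daemon keeps the work done per hook call small.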

Format interop

ATIF → OTel adapter (reference implementation in examples/otel_genai/)

Design docs (`docs/design/`)

architecture.md — system overview
data_model.md — span patterns, normalized schema, vendor fallback chains
chat_view_algorithm.md — trajectory projection algorithm
instrumentation_guide.md — how to emit data (SDK + daemon)
format_interoperability.md — adapter architecture + ATIF reference
design_review.html — presentation slides
design_review_discussion.md — Notion-ready discussion doc

Status

This is a spike / draft — not intended for merge as-is. The purpose is to validate the approach and inform a design discussion about how this relates to Weave's existing call model.


wandbot-3000 · 2026-03-18T06:04:42Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=fa4a72bab58ea593023d12402f2158598c423f2c

codecov · 2026-03-18T06:08:16Z

Codecov Report

❌ Patch coverage is 16.23437% with 4489 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| weave/agent_hooks/span_builder.py | 0.00% | 690 Missing ⚠️ |
| ...ve/trace_server/clickhouse_trace_server_batched.py | 10.22% | 641 Missing ⚠️ |
| weave/otel/instrumentors/openai_agents.py | 0.00% | 517 Missing ⚠️ |
| ...ave/trace_server/opentelemetry/genai_extraction.py | 11.36% | 343 Missing ⚠️ |
| weave/otel/instrumentors/claude_agent_sdk.py | 0.00% | 293 Missing ⚠️ |
| weave/trace_server/genai_chat_view.py | 12.38% | 198 Missing ⚠️ |
| weave/agent_hooks/daemon.py | 0.00% | 151 Missing ⚠️ |
| weave/otel/instrumentors/google_adk.py | 0.00% | 137 Missing ⚠️ |
| weave/agent_hooks/installer.py | 0.00% | 122 Missing ⚠️ |
| weave/agents/conversation.py | 0.00% | 118 Missing ⚠️ |
| ... and 30 more | | |


bcsherma added 6 commits March 17, 2026 15:34

adding test scripts for google, openai, anthropic agent sdks instrumented with otel

41b654f

adding apis and stuff

3c6e87a

added basic ui, fixed examples

ef620fc

fixing up examples

d031dc7

media support

d623d71

live updates

58f1a18

bcsherma added 12 commits March 17, 2026 23:09

cleanup example

d6b48e1

system prompts

6fa7bef

don't inline stuff

f5cce42

chat representation comes from api

3511d3e

agents queries

1c3dee4

conversations api

9a9f1af

reasoning + atif -> otel

490da58

daemon for cursor integration

afa09c4

fixed cursor integration

3f48cd4

capture cursor imgs

362485e

Merge remote-tracking branch 'origin/master' into ben/agent-data-model

5c1165b

docs

b98be30

bcsherma force-pushed the ben/agent-data-model branch from 17b81d9 to b98be30 · March 20, 2026 00:22

bcsherma added 4 commits March 19, 2026 17:52

cleanup

21d3b04

entity attribute fun

fc60e9d

sort n filter

b48a3ae

migration consolidation

9851c4c

bcsherma force-pushed the ben/agent-data-model branch from a7f2440 to 9851c4c · March 20, 2026 19:33

bcsherma added 3 commits March 23, 2026 17:50

claude code working pretty flippen well

5076a42

fixed a lot fo stuff

6253eb7

more docs, rough claude sdk

e5f404a

bcsherma changed the title ~~feat(weave): OTel-native GenAI agent data model with trajectory capture~~ feat(weave): GenAI agent data model — OTel GenAI ingest, ClickHouse schema, normalization, trajectory projection, daemon, and SDK instrumentations Mar 24, 2026

bcsherma added 12 commits March 24, 2026 16:58

docs

83bdb29

signulls

76dfa6f

docs

950f93e

serach

9a81349

structured ingest example

48d7848

moar atif examples

6420afc

agents restructure

9e8be4c

stats apis

5665d31

removing docs

8a7b1fd

load generation

5752884

separate llm call spans in structured ingest api

c47c9e5

fix score/annotation mixup

55e84ea

bcsherma closed this Apr 12, 2026

github-actions bot locked and limited conversation to collaborators Apr 12, 2026

bcsherma wants to merge 37 commits into master from ben/agent-data-model