feat: integrate Langfuse LLM observability via OTEL#3900

Open
crivetimihai wants to merge 20 commits into main from langfuse

@crivetimihai
Member

Summary

  • Add Langfuse as an LLM observability backend, receiving traces via standard OTLP/HTTP
  • New docker-compose.with-langfuse.yml overlay with 6 services (web, worker, PostgreSQL, ClickHouse, MinIO, Redis)
  • Makefile targets: make langfuse-up, langfuse-down, langfuse-status, langfuse-logs, langfuse-clean, langfuse-monitoring-up
  • Full documentation at docs/docs/manage/observability/langfuse.md
  • .env.example section with all Langfuse configuration variables

Design Decisions

  • Follows Phoenix overlay pattern (docker-compose.with-phoenix.yml precedent)
  • Separate infrastructure — Langfuse uses its own PostgreSQL, Redis, ClickHouse, MinIO to avoid coupling
  • OTLP/HTTP — Langfuse does not support gRPC; gateway override sets OTEL_EXPORTER_OTLP_PROTOCOL=http
  • Hyphenated service names — AWS S3 SDK rejects underscore hostnames (langfuse-minio not langfuse_minio)
  • No host port bindings for infrastructure — only Langfuse UI exposed on port 3100 (avoids conflicts with Grafana on 3000)
  • Auto-provisioned org, project, and API keys via LANGFUSE_INIT_* env vars — works out of the box
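The OTLP/HTTP decision above can be illustrated with a minimal sketch of the exporter settings the gateway would need. It assumes Langfuse's OTLP ingestion path is `/api/public/otel` and that authentication is HTTP Basic over `public_key:secret_key`; the helper name `langfuse_otlp_env`, the `langfuse-web` hostname, and the in-container port 3000 are illustrative assumptions, not values taken from this PR's overlay.

```python
import base64


def langfuse_otlp_env(public_key: str, secret_key: str, base_url: str) -> dict[str, str]:
    """Build OTEL exporter settings for a Langfuse OTLP/HTTP endpoint.

    Hypothetical helper: the actual overlay sets these values in compose/env files.
    """
    # Langfuse authenticates OTLP requests with HTTP Basic auth over "public:secret".
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        # Assumed ingestion path; the exporter appends /v1/traces for trace export.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{base_url.rstrip('/')}/api/public/otel",
        # Protocol value as named in this PR's design notes.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http",
        # Some exporters expect the space after "Basic" to be percent-encoded.
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


# Example with an assumed in-network URL for the Langfuse web service.
env = langfuse_otlp_env("pk-lf-contextforge", "sk-lf-contextforge", "http://langfuse-web:3000")
```

Deriving the header from the key pair keeps credentials in the environment rather than in code, which matches the env-driven configuration this PR describes.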

Test plan

  • docker compose config --quiet validates with Langfuse overlay
  • docker compose config --quiet --profile monitoring validates combined stack
  • All 6 Langfuse services start healthy (make langfuse-status)
  • Gateway OTEL exporter initializes and connects to Langfuse
  • Tool invocations generate traces visible in Langfuse API (/api/public/traces)
  • 13+ traces verified flowing from tool calls and health checks into Langfuse v3.162.0
  • make test-mcp-cli — 17/23 passed (5 failures pre-existing, unrelated to Langfuse)
  • No S3 upload errors after hostname fix
  • make langfuse-up/down/status/logs/clean targets all work
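The trace check in the test plan could be scripted roughly as follows. This is a sketch only: it assumes `/api/public/traces` accepts HTTP Basic auth with the project key pair and returns a JSON object whose `data` field is the list of traces. The function names and the `limit` query parameter are assumptions made for illustration.

```python
import base64
import json
import urllib.request


def traces_request(base_url: str, public_key: str, secret_key: str, limit: int = 50) -> urllib.request.Request:
    """Build an authenticated GET request for the Langfuse traces endpoint."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    url = f"{base_url.rstrip('/')}/api/public/traces?limit={limit}"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def count_traces(body: bytes) -> int:
    """Count traces in a response body, assuming a top-level "data" list."""
    payload = json.loads(body)
    return len(payload.get("data", []))


# Usage against the UI port exposed by the overlay (3100), left commented out
# because it needs a running stack:
#   req = traces_request("http://localhost:3100", "pk-lf-contextforge", "sk-lf-contextforge")
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(count_traces(resp.read()))
```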

@crivetimihai
Member Author

Implementation update for this branch.

This PR now covers the full Langfuse observability rollout across both the Python gateway path and the Rust MCP runtime path.

What changed:

  • Added Langfuse/OTEL integration to the gateway with fail-closed Langfuse auth validation, bounded and redacted payload capture, sanitized error export, and env-driven configuration rather than hardcoded credentials.
  • Completed trace-context propagation for user, session, auth, team, request, and correlation metadata, including team.name, Langfuse tags, and trace-name derivation.
  • Expanded Python-side span coverage across tool, prompt, resource, root, A2A, chat, and LLM proxy flows, including list operations and child spans.
  • Cleaned up the local Langfuse stack model so ContextForge remains externally configurable while the self-hosted Langfuse compose overlay keeps its own compose-local defaults and reset/cleanup flows.
  • Added native Rust observability support in tools_rust/mcp_runtime, including OTEL initialization, root/child span creation, parity metadata injection, sanitized payload capture, and error/status mapping.
  • Removed the observability-specific Python fallback for Rust tools/call, so traced MCP tool execution now stays on the Rust fast path.
  • Fixed the local Langfuse stack so make langfuse-up also starts the lightweight helper services needed for live MCP smoke traffic, which makes it usable for real trace validation.
  • Closed a number of parity and stability issues found during deep review: prompt argument normalization, auth-context propagation on Rust forwarded paths, Langfuse header validation, string-list redaction gaps, helper-row compatibility in batched auth resolution, and Rust span failure marking for resolve-time JSON-RPC errors.
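To make "bounded and redacted payload capture" concrete, here is a minimal sketch of one way such sanitization can work before a payload is attached to a span. It is not the gateway's implementation: the key list, the length bound, and the `sanitize` function are all hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical settings; the real gateway's key list and limits may differ.
SENSITIVE_KEYS = {"authorization", "password", "secret", "token", "api_key"}
MAX_STRING_LEN = 256
REDACTED = "[REDACTED]"


def sanitize(value: Any, key: str = "") -> Any:
    """Return a copy of value that is safe to export as span payload."""
    # Redact by key first, so lists of strings under a sensitive key are
    # masked as a whole instead of leaking element by element.
    if key.lower() in SENSITIVE_KEYS:
        return REDACTED
    if isinstance(value, dict):
        return {k: sanitize(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(v, key) for v in value]
    # Bound long strings so a single argument cannot bloat the trace.
    if isinstance(value, str) and len(value) > MAX_STRING_LEN:
        return value[:MAX_STRING_LEN] + "...[truncated]"
    return value


payload = {"tool": "echo", "headers": {"Authorization": "Bearer abc"}, "token": ["a", "b"]}
safe = sanitize(payload)
```

Checking the key before the value type is what closes the string-list case: a list stored under a sensitive key never reaches the per-element branch.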

Validation run on the current branch:

  • make flake8
  • make bandit
  • make interrogate
  • make pylint
  • cargo check --manifest-path tools_rust/mcp_runtime/Cargo.toml
  • make -C tools_rust/mcp_runtime fmt-check
  • make -C tools_rust/mcp_runtime check
  • make -C tools_rust/mcp_runtime check-all-targets
  • make -C tools_rust/mcp_runtime clippy
  • make -C tools_rust/mcp_runtime clippy-all
  • make -C tools_rust/mcp_runtime test
  • make -C tools_rust/mcp_runtime test-rmcp
  • make -C tools_rust/mcp_runtime doc-test
  • make -C tools_rust/mcp_runtime coverage
  • cargo test --release --manifest-path tools_rust/mcp_runtime/Cargo.toml
  • make doctest
  • make verify
  • make test
  • make testing-rebuild-rust-full
  • make test-mcp-cli
  • make test-mcp-rbac
  • make test-mcp-access-matrix
  • make test-mcp-session-isolation
  • make test-mcp-session-isolation-load MCP_ISOLATION_LOAD_RUN_TIME=30s
  • make test-ui-ci-smoke
  • make test-ui-headless PLAYWRIGHT_TEST_TARGET='tests/playwright/ -m smoke'
  • RUST_MCP_MODE=full RUST_MCP_LOG=info make langfuse-up
  • make test-mcp-cli against the Langfuse-backed Rust stack
  • LANGFUSE_PUBLIC_KEY=pk-lf-contextforge LANGFUSE_SECRET_KEY=sk-lf-contextforge uv run pytest tests/e2e/test_langfuse_traces.py -q
  • make benchmark-mcp-tools-300 MCP_BENCHMARK_HIGH_USERS=1000 MCP_BENCHMARK_HIGH_RUN_TIME=60s

Latest broad suite result on this branch:

  • 15544 passed, 540 skipped, 4 xfailed

Runtime outcome:

  • the rebuilt Rust full stack is healthy and serves the public MCP edge in rust-managed mode
  • live MCP, RBAC, session-isolation, UI smoke, and Langfuse ingestion all work on the current branch
  • Rust now emits native observability data instead of depending on the old observability-triggered Python tool-call fallback

Add Langfuse as an observability backend for ContextForge, receiving
traces via standard OTLP/HTTP. Follows the existing Phoenix overlay
pattern with a dedicated docker-compose overlay file.

New files:
- docker-compose.with-langfuse.yml: Complete overlay with 6 services
  (langfuse-web, langfuse-worker, langfuse-db, langfuse-clickhouse,
  langfuse-minio, langfuse-cache) using separate infrastructure to
  avoid coupling with ContextForge databases
- docs/docs/manage/observability/langfuse.md: Full integration guide

Modified files:
- Makefile: 7 new targets (langfuse-up/down/status/logs/clean,
  langfuse-monitoring-up/down)
- .env.example: Langfuse configuration section with all tunable vars
- docker-compose.yml: Document Langfuse overlay in profiles header
- docs/docs/manage/observability/.pages: Add to docs navigation
- docs/docs/manage/observability/observability.md: Add as recommended
  backend option

Key design decisions:
- OTLP/HTTP protocol (Langfuse does not support gRPC)
- Hyphenated service names (AWS S3 SDK rejects underscore hostnames)
- No host port bindings for infrastructure (only UI on port 3100)
- Auto-provisioned org/project/keys via LANGFUSE_INIT_* env vars
- Health check uses $(hostname) (Next.js binds to container hostname)
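The hyphenated-name decision can be checked mechanically. The sketch below validates service names against the usual hostname label rules (letters, digits, and hyphens, with no leading or trailing hyphen), which is the kind of check that rejects underscores. The function name is hypothetical, and this is not the validation the AWS SDK itself performs.

```python
import re

# One hostname label: 1-63 characters of letters, digits, or hyphens,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label in name is a valid hostname label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.rstrip(".").split("."))


services = ["langfuse-web", "langfuse-worker", "langfuse-db",
            "langfuse-clickhouse", "langfuse-minio", "langfuse-cache"]
invalid = [s for s in services if not is_valid_hostname(s)]
```

A check like this could run in CI over the service names in the overlay file so that an underscore name is caught before it reaches the S3 client.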

Tested with 13+ traces flowing from tool invocations and gateway
health checks into Langfuse v3.162.0.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai added the wxo (wxo integration) and release-fix (Critical bugfix required for the release) labels on Mar 29, 2026

Labels

release-fix: Critical bugfix required for the release
wxo: wxo integration

Projects

None yet

1 participant