
LiteLLM + AgentOps integration: Provider-specific tracking issue with Anthropic models #1079

@devin-ai-integration

Description

Summary

When using LiteLLM's success_callback = ["agentops"] feature, LLM events appear correctly in AgentOps dashboard traces for OpenAI models (e.g., GPT-4o) but do NOT appear for Anthropic models (e.g., Claude 3.5 Sonnet). This is a provider-specific tracking issue, not a universal dependency problem.
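
For reference, the configuration that triggers the issue is LiteLLM's documented callback hook, set before making completion calls:

import litellm

# LiteLLM's built-in hook: forward successful completion events to AgentOps
litellm.success_callback = ["agentops"]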

Root Cause

The issue is provider-specific: LiteLLM's callback system appears to conflict with AgentOps' existing Anthropic instrumentation. Observed behavior:

  • OpenAI models: LLM events appear correctly in AgentOps dashboard traces when using litellm.success_callback = ["agentops"]
  • Anthropic models: LLM events do NOT appear in AgentOps dashboard traces when using the same callback configuration
  • API calls succeed: Both providers complete API calls successfully, but only OpenAI events are tracked in AgentOps
  • Silent failure: No error messages indicate the tracking problem; events simply don't appear in dashboard traces

Issue Details

The problem manifests as missing LLM events in AgentOps dashboard traces for Anthropic models:

  • OpenAI behavior: When using litellm.completion(model="gpt-4o", ...) with success_callback = ["agentops"], LLM events appear in the AgentOps dashboard trace
  • Anthropic behavior: When using litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", ...) with the same callback, LLM events do NOT appear in the dashboard trace
  • API calls work: Both providers successfully complete API calls and return responses
  • Sessions created: AgentOps sessions are created for both providers, but only OpenAI sessions show LLM events
  • Silent failure: No error messages or warnings indicate that Anthropic events are not being tracked
  • Instrumentation conflict: Likely caused by conflicts between LiteLLM's callback system and AgentOps' existing Anthropic instrumentation

Steps to Reproduce

  1. Install AgentOps and LiteLLM:
pip install agentops litellm openai anthropic
  2. Set up environment variables:
export AGENTOPS_API_KEY="your_agentops_key"
export OPENAI_API_KEY="your_openai_key"
export ANTHROPIC_API_KEY="your_anthropic_key"
  3. Create and run the reproduction script:
#!/usr/bin/env python3
import litellm
import agentops

def test_provider(provider_name, model, message):
    print(f"\n=== Testing {provider_name} ===")
    
    # Initialize AgentOps without auto-starting a session, then open an explicit trace for this provider
    agentops.init(auto_start_session=False)
    tracer = agentops.start_trace(trace_name=f"{provider_name} Test", tags=[f"{provider_name.lower()}-test"])
    
    # Route LiteLLM's success events to AgentOps via the built-in callback
    litellm.success_callback = ["agentops"]
    
    try:
        response = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": message}],
            max_tokens=30
        )
        
        print(f"✅ {provider_name} API call successful")
        print(f"   Response: {response.choices[0].message.content}")
        
        agentops.end_trace(tracer, end_state="Success")
        print(f"   ^ Check AgentOps dashboard - {provider_name} events should {'appear' if provider_name == 'OpenAI' else 'be MISSING'}")
        
    except Exception as e:
        print(f"❌ {provider_name} Error: {e}")
        agentops.end_trace(tracer, end_state="Fail")

# Test OpenAI (should show LLM events in dashboard)
test_provider("OpenAI", "gpt-4o", "Say hello from OpenAI!")

# Test Anthropic (LLM events will be missing from dashboard)
test_provider("Anthropic", "anthropic/claude-3-5-sonnet-20240620", "Say hello from Anthropic!")
  4. Run the script:
python reproduce_script.py
  5. Check the AgentOps dashboard sessions:
    • OpenAI session: Should show LLM events in the trace
    • Anthropic session: Should NOT show LLM events in the trace (events missing)
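
Assuming both API calls succeed with valid keys, the script's own print statements yield output along these lines (the model responses shown are placeholders and will vary):

=== Testing OpenAI ===
✅ OpenAI API call successful
   Response: Hello from OpenAI!
   ^ Check AgentOps dashboard - OpenAI events should appear

=== Testing Anthropic ===
✅ Anthropic API call successful
   Response: Hello from Anthropic!
   ^ Check AgentOps dashboard - Anthropic events should be MISSING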

Expected Behavior

  • LLM calls made through LiteLLM with success_callback = ["agentops"] should be tracked and visible in the AgentOps dashboard for ALL supported providers
  • Both OpenAI and Anthropic models should show LLM events in AgentOps dashboard traces
  • The callback integration should work consistently across different LLM providers

Actual Behavior

  • OpenAI models: LLM events appear correctly in AgentOps dashboard traces ✅
  • Anthropic models: LLM events do NOT appear in AgentOps dashboard traces ❌
  • API calls succeed for both providers, but tracking behavior differs
  • No error messages indicate the tracking failure for Anthropic models
  • AgentOps sessions are created for both providers, but only OpenAI sessions contain LLM events

Environment

  • AgentOps version: 0.4.14
  • LiteLLM version: 1.72.6
  • Python version: 3.12
  • Testing confirmed with valid API keys for both providers

Potential Solutions

Option 1: Fix LiteLLM callback integration for Anthropic models (Recommended)

Investigate and fix the conflict between LiteLLM's callback system and AgentOps' existing Anthropic instrumentation (a diagnostic sketch follows the list):

  • Examine how LiteLLM's success_callback = ["agentops"] interacts with AgentOps' direct Anthropic instrumentation
  • Ensure callback events are properly forwarded to AgentOps for Anthropic models
  • Test that the fix doesn't break existing direct AgentOps instrumentation
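
A minimal way to start that investigation, using only public APIs and assuming LiteLLM's LITELLM_LOG debug switch, is to run both providers with debug logging on and watch whether the agentops callback actually fires for each:

import os
os.environ["LITELLM_LOG"] = "DEBUG"  # assumed LiteLLM debug switch; set before importing litellm

import agentops
import litellm

agentops.init(auto_start_session=False)
litellm.success_callback = ["agentops"]

for model in ["gpt-4o", "anthropic/claude-3-5-sonnet-20240620"]:
    trace = agentops.start_trace(trace_name=f"callback debug: {model}")
    litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=5,
    )
    agentops.end_trace(trace, end_state="Success")

# If the debug log shows the agentops callback firing for both models, the
# events are being dropped inside AgentOps; if it never fires for the
# Anthropic model, the gap is on the LiteLLM dispatch side.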

Option 2: Update AgentOps instrumentation priority

Modify AgentOps' instrumentation system to properly handle LiteLLM callback events (an illustrative deduplication sketch follows the list):

  • Ensure LiteLLM callback events take precedence over direct instrumentation when both are present
  • Add proper event deduplication to prevent conflicts between callback and direct instrumentation
  • Update instrumentation order to prioritize callback-based tracking when configured
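
None of the names below exist in AgentOps today; this is a purely illustrative sketch of the deduplication idea, keyed on a hypothetical (provider, request_id) pair:

from typing import Set, Tuple

class EventDeduplicator:
    """Hypothetical guard: drop an LLM event if the same (provider, request_id)
    pair was already recorded by another instrumentation path."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def should_record(self, provider: str, request_id: str) -> bool:
        key = (provider, request_id)
        if key in self._seen:
            return False  # already captured via callback or direct instrumentation
        self._seen.add(key)
        return True

# Usage: both the LiteLLM callback path and the direct Anthropic
# instrumentation would consult the same deduplicator before emitting.
dedup = EventDeduplicator()
assert dedup.should_record("anthropic", "req-123") is True
assert dedup.should_record("anthropic", "req-123") is False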

Option 3: Document the limitation and provide workaround

If the integration conflict cannot be easily resolved:

  • Document that LiteLLM callback integration doesn't work with Anthropic models
  • Recommend using AgentOps' direct Anthropic instrumentation instead of the LiteLLM callback for Anthropic models (see the sketch after this list)
  • Provide clear guidance on when to use callback vs. direct instrumentation
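
A minimal workaround sketch along those lines, assuming the official anthropic Python SDK and relying on the direct instrumentation that the notes below report as working for both providers:

import agentops
import anthropic

# Rely on AgentOps' direct Anthropic instrumentation; no LiteLLM callback involved
agentops.init()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=30,
    messages=[{"role": "user", "content": "Say hello from Anthropic!"}],
)
print(response.content[0].text)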

Impact

  • Users cannot reliably use LiteLLM's success_callback = ["agentops"] feature for Anthropic models
  • Anthropic LLM calls made through LiteLLM are not tracked in AgentOps, leading to incomplete observability
  • The tracking failure is silent: no error messages indicate that Anthropic events are missing
  • Users may not realize their Anthropic models are not being tracked until they check the dashboard
  • Mixed provider applications will have inconsistent tracking (OpenAI tracked, Anthropic not tracked)
  • This affects observability and monitoring for applications using multiple LLM providers through LiteLLM

Additional Notes

  • The issue is provider-specific: OpenAI models work correctly, Anthropic models do not
  • AgentOps' direct instrumentation works correctly for both providers when not using LiteLLM callback
  • The LiteLLM integration test in AgentOps is currently skipped with reason "TODO: instrumentation for callback handlers and external integrations"
  • This suggests the callback integration has known limitations that need to be addressed
  • The tracking failure is silent: no error messages or warnings indicate the problem
  • Both providers successfully complete API calls, but only OpenAI events appear in AgentOps dashboard traces
  • This likely indicates a conflict between LiteLLM's callback system and AgentOps' existing Anthropic instrumentation

Reproduction Script

A complete reproduction script is available that demonstrates the provider-specific tracking behavior difference. The script tests both OpenAI and Anthropic models with LiteLLM callback integration and generates AgentOps session URLs for dashboard verification.

Test Results

When running the reproduction script:

  • OpenAI test: API call succeeds, LLM events appear in AgentOps dashboard trace
  • Anthropic test: API call succeeds, but LLM events do NOT appear in AgentOps dashboard trace
  • Dashboard verification: Checking the session URLs confirms the tracking behavior difference

This demonstrates that the issue is provider-specific and affects dashboard trace visibility, not API call success.
