feat: Extract LLM calls to Node.js Claude service #202

@marccampbell

Summary

Replace custom Anthropic API calls in the Go worker with a dedicated Node.js service using the official Claude SDK. This enables access to SDK-exclusive features (prompt caching, better streaming, extended thinking) while preserving the existing Go infrastructure.

Background

The current Go worker (~19K LOC) handles:

  • LLM calls (~4K LOC in pkg/llm/) - Anthropic Claude + Groq
  • Helm SDK integration - Chart parsing, rendering, validation
  • PostgreSQL persistence - Workspaces, plans, files, chat history
  • Real-time updates - Centrifugo for streaming to frontend
  • Queue/listener infrastructure - Event-driven processing

The Anthropic Go SDK is community-maintained and lacks features available in the official Node.js/Python SDKs:

  • Prompt caching
  • Extended thinking (Claude 3.7)
  • Better streaming primitives
  • Faster feature parity with API releases

Proposed Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Go Worker                             │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Helm SDK    │  │  PostgreSQL  │  │  Centrifugo  │       │
│  │  Rendering   │  │  Persistence │  │  Real-time   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │              LLM Client (HTTP)                    │       │
│  │  • Calls Node service for Claude                  │       │
│  │  • Handles streaming via SSE/WebSocket            │       │
│  │  • Groq calls remain in Go (optional)             │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ HTTP/SSE
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Node.js Claude Service                    │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │           Official @anthropic-ai/sdk              │       │
│  │                                                   │       │
│  │  • Streaming responses (SSE to Go)               │       │
│  │  • Prompt caching support                         │       │
│  │  • Extended thinking (Claude 3.7)                │       │
│  │  • Tool use / function calling                    │       │
│  └──────────────────────────────────────────────────┘       │
│                                                              │
│  Endpoints:                                                  │
│  POST /v1/messages          - Standard completion           │
│  POST /v1/messages/stream   - Streaming completion          │
│  GET  /health               - Health check                  │
└─────────────────────────────────────────────────────────────┘

API Interface

Request (Go → Node)

interface ClaudeRequest {
  model: string;                    // "claude-3-7-sonnet-20250219"
  system?: string;                  // System prompt
  messages: Message[];              // Conversation history
  max_tokens: number;
  tools?: Tool[];                   // For tool use
  stream?: boolean;                 // Enable streaming
  
  // SDK-specific features
  prompt_caching?: {
    cache_system?: boolean;         // Cache system prompt
    cache_tools?: boolean;          // Cache tool definitions
  };
  thinking?: {                      // Extended thinking (3.7)
    enabled: boolean;
    budget_tokens?: number;
  };
}

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}
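Inside the Node service, this wire format has to be translated into the parameter object the official SDK's `messages.create()` expects. A minimal sketch of that mapping, assuming the field names defined above — note that `prompt_caching` is our own envelope, not an SDK field; the SDK expresses caching via `cache_control` markers on content blocks:

```typescript
// Hypothetical translation layer for the Node service. The wire-format
// field names (prompt_caching, cache_system) come from the interface
// above; the cache_control block shape is how the SDK expects caching
// to be expressed on a system prompt.
type ContentBlock = {
  type: string;
  text?: string;
  cache_control?: { type: "ephemeral" };
};

interface ClaudeRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string | ContentBlock[] }[];
  max_tokens: number;
  stream?: boolean;
  prompt_caching?: { cache_system?: boolean; cache_tools?: boolean };
}

// Build the params object to pass to anthropic.messages.create().
// When cache_system is set, the system prompt is emitted as a content
// block carrying cache_control, so prompt caching applies to it.
function toSdkParams(req: ClaudeRequest) {
  const system =
    req.system === undefined
      ? undefined
      : req.prompt_caching?.cache_system
        ? [{ type: "text", text: req.system, cache_control: { type: "ephemeral" as const } }]
        : req.system;
  return {
    model: req.model,
    max_tokens: req.max_tokens,
    messages: req.messages,
    ...(system !== undefined ? { system } : {}),
  };
}
```

This keeps the Go side ignorant of SDK details: the worker only ever speaks the wire format, and caching behavior is an opt-in flag rather than a content-block convention leaking across the service boundary.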

Response (Node → Go)

Non-streaming:

interface ClaudeResponse {
  id: string;
  content: ContentBlock[];
  model: string;
  stop_reason: string;
  usage: {
    input_tokens: number;
    output_tokens: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
}
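The cache-related usage fields are what let us verify the token-savings success metric below. An illustrative helper (not part of the proposed API), assuming `input_tokens` counts only non-cached input, which is how the API reports usage:

```typescript
// Illustrative only: summarize what fraction of input tokens were
// served from the prompt cache, given the usage block of a response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function cacheSavings(u: Usage): { totalInput: number; cachedFraction: number } {
  const cached = u.cache_read_input_tokens ?? 0;
  // Total input = billed input + cache reads + cache writes.
  const totalInput = u.input_tokens + cached + (u.cache_creation_input_tokens ?? 0);
  return { totalInput, cachedFraction: totalInput === 0 ? 0 : cached / totalInput };
}
```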

Streaming (SSE):

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}
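On the Node side, framing SDK stream events in this layout is a one-liner. A sketch, assuming the service relays each SDK event under its own event name:

```typescript
// Serialize one event in SSE wire format: an "event:" line, a "data:"
// line with the JSON payload, and a blank line terminating the frame.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

The Go consumer then splits the stream on blank lines and dispatches on the `event:` field, mirroring how the Anthropic API's own SSE stream is shaped.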

Migration Plan

Phase 1: Create Node Service (New Package)

  • Create claude-service/ directory
  • Set up Express/Fastify server
  • Implement /v1/messages endpoint (non-streaming)
  • Implement /v1/messages/stream endpoint (SSE)
  • Add health check endpoint
  • Docker container setup
  • Add to docker-compose for local dev
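The Phase 1 skeleton can start smaller than a framework, using only Node's built-in `http` module until the Express/Fastify choice is settled. A sketch with only `/health` fully implemented and the message endpoints stubbed:

```typescript
// Minimal Phase 1 skeleton using node:http. The real service would
// likely use Express or Fastify as noted above; routes other than
// /health are stubs here.
import { createServer } from "node:http";

// Pure routing function, kept separate from the server for testability.
function route(method: string | undefined, url: string | undefined): { status: number; body: string } {
  if (method === "GET" && url === "/health") {
    return { status: 200, body: JSON.stringify({ status: "ok" }) };
  }
  if (method === "POST" && (url === "/v1/messages" || url === "/v1/messages/stream")) {
    return { status: 501, body: JSON.stringify({ error: "not implemented yet" }) };
  }
  return { status: 404, body: JSON.stringify({ error: "not found" }) };
}

// Port 3100 matches the docker-compose mapping further down.
function startServer(port = 3100) {
  return createServer((req, res) => {
    const { status, body } = route(req.method, req.url);
    res.writeHead(status, { "content-type": "application/json" });
    res.end(body);
  }).listen(port);
}
```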

Phase 2: Go Client Abstraction

  • Create pkg/llm/claude/client.go - HTTP client for Node service
  • Implement streaming SSE consumer in Go
  • Add circuit breaker / retry logic
  • Environment config for service URL
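The retry behavior the Go client needs looks roughly like the following, sketched in TypeScript to stay consistent with the other examples (the real implementation would be Go). A circuit breaker would wrap this with a failure counter that short-circuits calls while the Node service is down:

```typescript
// Retry a call with exponential backoff: 3 attempts by default, with
// delays of base, 2*base, 4*base, ... between failures. Rethrows the
// last error if all attempts fail.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```

In production this should retry only transient failures (connection refused, 5xx, timeouts), not 4xx responses, and streaming calls need care: a stream that fails mid-response generally cannot be retried transparently.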

Phase 3: Migrate LLM Calls (One at a Time)

Files to migrate in pkg/llm/:

File                         Priority  Streaming  Notes
conversational.go            High      Yes        Main chat flow
execute-action.go            High      Yes        Core action execution
execute-plan.go              High      Yes        Plan execution
initial-plan.go              High      Yes        Plan creation
plan.go                      High      Yes        Plan updates
expand.go                    Medium    No         Prompt expansion
cleanup-converted-values.go  Medium    No         Values cleanup
convert-file.go              Medium    No         File conversion
summarize.go                 Low       No         Can stay Groq
intent.go                    Low       No         Uses Groq, can stay

Phase 4: Enable SDK Features

  • Implement prompt caching for system prompts
  • Add extended thinking support for complex operations
  • Optimize tool definitions caching
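Two Phase 4 sketches. For tool caching, marking the last tool definition with `cache_control` caches the whole tool list, since caching applies to the prefix up to the marked block. For extended thinking, the `{ type: "enabled", budget_tokens }` shape follows the public API, but the exact field names should be verified against the SDK version in use:

```typescript
// Mark the last tool definition so the entire tool list is cacheable.
type Tool = { name: string; [key: string]: unknown };

function cacheTools(tools: Tool[]): Tool[] {
  return tools.map((t, i) =>
    i === tools.length - 1 ? { ...t, cache_control: { type: "ephemeral" } } : t,
  );
}

// Build the extended-thinking portion of a request. max_tokens must
// exceed the thinking budget, since thinking tokens count against it.
function thinkingParams(budgetTokens: number) {
  return {
    max_tokens: budgetTokens + 4096,
    thinking: { type: "enabled" as const, budget_tokens: budgetTokens },
  };
}
```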

Phase 5: Cleanup

  • Remove direct Anthropic Go SDK dependency
  • Update deployment configs (Helm chart, Terraform)
  • Update documentation

Deployment Considerations

Local Development

# docker-compose.yml addition
services:
  claude-service:
    build: ./claude-service
    ports:
      - "3100:3100"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

Production

  • Deploy as a sidecar container in the same pod (lowest latency)
  • Or as a separate service if horizontal scaling is needed
  • Health checks required for orchestration

Open Questions

  1. Groq calls - Move to Node too, or keep in Go?
  2. Embedding calls - Currently using Voyage API, include in Node service?
  3. Protocol - HTTP/SSE vs gRPC vs WebSocket for streaming?
  4. Caching layer - Add Redis for prompt cache persistence across restarts?

Success Metrics

  • All Claude calls route through Node service
  • Streaming latency within 50ms of direct API calls
  • Prompt caching reduces token usage by 30%+ on repeated system prompts
  • No regressions in existing functionality
