feat: Extract LLM calls to Node.js Claude service #202

@marccampbell

Summary

Replace custom Anthropic API calls in the Go worker with a dedicated Node.js service using the official Claude SDK. This enables access to SDK-exclusive features (prompt caching, better streaming, extended thinking) while preserving the existing Go infrastructure.

Background

The current Go worker (~19K LOC) handles:

  • LLM calls (~4K LOC in pkg/llm/) - Anthropic Claude + Groq
  • Helm SDK integration - Chart parsing, rendering, validation
  • PostgreSQL persistence - Workspaces, plans, files, chat history
  • Real-time updates - Centrifugo for streaming to frontend
  • Queue/listener infrastructure - Event-driven processing

The Anthropic Go SDK is community-maintained and lacks features available in the official Node.js/Python SDKs:

  • Prompt caching
  • Extended thinking (Claude 3.7)
  • Better streaming primitives
  • Faster feature parity with API releases

Proposed Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Go Worker                             │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Helm SDK    │  │  PostgreSQL  │  │  Centrifugo  │       │
│  │  Rendering   │  │  Persistence │  │  Real-time   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │              LLM Client (HTTP)                    │       │
│  │  • Calls Node service for Claude                  │       │
│  │  • Handles streaming via SSE/WebSocket            │       │
│  │  • Groq calls remain in Go (optional)             │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ HTTP/SSE
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Node.js Claude Service                    │
│                                                              │
│  ┌──────────────────────────────────────────────────┐       │
│  │           Official @anthropic-ai/sdk              │       │
│  │                                                   │       │
│  │  • Streaming responses (SSE to Go)               │       │
│  │  • Prompt caching support                         │       │
│  │  • Extended thinking (Claude 3.7)                │       │
│  │  • Tool use / function calling                    │       │
│  └──────────────────────────────────────────────────┘       │
│                                                              │
│  Endpoints:                                                  │
│  POST /v1/messages          - Standard completion           │
│  POST /v1/messages/stream   - Streaming completion          │
│  GET  /health               - Health check                  │
└─────────────────────────────────────────────────────────────┘

API Interface

Request (Go → Node)

interface ClaudeRequest {
  model: string;                    // "claude-3-7-sonnet-20250219"
  system?: string;                  // System prompt
  messages: Message[];              // Conversation history
  max_tokens: number;
  tools?: Tool[];                   // For tool use
  stream?: boolean;                 // Enable streaming
  
  // SDK-specific features
  prompt_caching?: {
    cache_system?: boolean;         // Cache system prompt
    cache_tools?: boolean;          // Cache tool definitions
  };
  thinking?: {                      // Extended thinking (3.7)
    enabled: boolean;
    budget_tokens?: number;
  };
}

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}
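Inside the Node service, this wire format has to be translated into the parameter object the official SDK's `messages.create()` expects. A minimal sketch of that mapping, assuming the field names defined above — note that `prompt_caching` is our own envelope, not an SDK field; the SDK expresses caching via `cache_control` markers on content blocks:

```typescript
// Hypothetical translation layer for the Node service. The wire-format
// field names (prompt_caching, cache_system) come from the interface
// above; the cache_control block shape is how the SDK expects caching
// to be expressed on a system prompt.
type ContentBlock = {
  type: string;
  text?: string;
  cache_control?: { type: "ephemeral" };
};

interface ClaudeRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string | ContentBlock[] }[];
  max_tokens: number;
  stream?: boolean;
  prompt_caching?: { cache_system?: boolean; cache_tools?: boolean };
}

// Build the params object to pass to anthropic.messages.create().
// When cache_system is set, the system prompt is emitted as a content
// block carrying cache_control, so prompt caching applies to it.
function toSdkParams(req: ClaudeRequest) {
  const system =
    req.system === undefined
      ? undefined
      : req.prompt_caching?.cache_system
        ? [{ type: "text", text: req.system, cache_control: { type: "ephemeral" as const } }]
        : req.system;
  return {
    model: req.model,
    max_tokens: req.max_tokens,
    messages: req.messages,
    ...(system !== undefined ? { system } : {}),
  };
}
```

This keeps the Go side ignorant of SDK details: the worker only ever speaks the wire format, and caching behavior is an opt-in flag rather than a content-block convention leaking across the service boundary.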

Response (Node → Go)

Non-streaming:

interface ClaudeResponse {
  id: string;
  content: ContentBlock[];
  model: string;
  stop_reason: string;
  usage: {
    input_tokens: number;
    output_tokens: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
}
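The cache-related usage fields are what let us verify the token-savings success metric below. An illustrative helper (not part of the proposed API), assuming `input_tokens` counts only non-cached input, which is how the API reports usage:

```typescript
// Illustrative only: summarize what fraction of input tokens were
// served from the prompt cache, given the usage block of a response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function cacheSavings(u: Usage): { totalInput: number; cachedFraction: number } {
  const cached = u.cache_read_input_tokens ?? 0;
  // Total input = billed input + cache reads + cache writes.
  const totalInput = u.input_tokens + cached + (u.cache_creation_input_tokens ?? 0);
  return { totalInput, cachedFraction: totalInput === 0 ? 0 : cached / totalInput };
}
```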

Streaming (SSE):

event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}
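On the Node side, framing SDK stream events in this layout is a one-liner. A sketch, assuming the service relays each SDK event under its own event name:

```typescript
// Serialize one event in SSE wire format: an "event:" line, a "data:"
// line with the JSON payload, and a blank line terminating the frame.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

The Go consumer then splits the stream on blank lines and dispatches on the `event:` field, mirroring how the Anthropic API's own SSE stream is shaped.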

Migration Plan

Phase 1: Create Node Service (New Package)

  • Create claude-service/ directory
  • Set up Express/Fastify server
  • Implement /v1/messages endpoint (non-streaming)
  • Implement /v1/messages/stream endpoint (SSE)
  • Add health check endpoint
  • Docker container setup
  • Add to docker-compose for local dev
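The Phase 1 skeleton can start smaller than a framework, using only Node's built-in `http` module until the Express/Fastify choice is settled. A sketch with only `/health` fully implemented and the message endpoints stubbed:

```typescript
// Minimal Phase 1 skeleton using node:http. The real service would
// likely use Express or Fastify as noted above; routes other than
// /health are stubs here.
import { createServer } from "node:http";

// Pure routing function, kept separate from the server for testability.
function route(method: string | undefined, url: string | undefined): { status: number; body: string } {
  if (method === "GET" && url === "/health") {
    return { status: 200, body: JSON.stringify({ status: "ok" }) };
  }
  if (method === "POST" && (url === "/v1/messages" || url === "/v1/messages/stream")) {
    return { status: 501, body: JSON.stringify({ error: "not implemented yet" }) };
  }
  return { status: 404, body: JSON.stringify({ error: "not found" }) };
}

// Port 3100 matches the docker-compose mapping further down.
function startServer(port = 3100) {
  return createServer((req, res) => {
    const { status, body } = route(req.method, req.url);
    res.writeHead(status, { "content-type": "application/json" });
    res.end(body);
  }).listen(port);
}
```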

Phase 2: Go Client Abstraction

  • Create pkg/llm/claude/client.go - HTTP client for Node service
  • Implement streaming SSE consumer in Go
  • Add circuit breaker / retry logic
  • Environment config for service URL
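The retry behavior the Go client needs looks roughly like the following, sketched in TypeScript to stay consistent with the other examples (the real implementation would be Go). A circuit breaker would wrap this with a failure counter that short-circuits calls while the Node service is down:

```typescript
// Retry a call with exponential backoff: 3 attempts by default, with
// delays of base, 2*base, 4*base, ... between failures. Rethrows the
// last error if all attempts fail.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```

In production this should retry only transient failures (connection refused, 5xx, timeouts), not 4xx responses, and streaming calls need care: a stream that fails mid-response generally cannot be retried transparently.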

Phase 3: Migrate LLM Calls (One at a Time)

Files to migrate in pkg/llm/:

File                         Priority  Streaming  Notes
conversational.go            High      Yes        Main chat flow
execute-action.go            High      Yes        Core action execution
execute-plan.go              High      Yes        Plan execution
initial-plan.go              High      Yes        Plan creation
plan.go                      High      Yes        Plan updates
expand.go                    Medium    No         Prompt expansion
cleanup-converted-values.go  Medium    No         Values cleanup
convert-file.go              Medium    No         File conversion
summarize.go                 Low       No         Can stay Groq
intent.go                    Low       No         Uses Groq, can stay

Phase 4: Enable SDK Features

  • Implement prompt caching for system prompts
  • Add extended thinking support for complex operations
  • Optimize tool definitions caching
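Two Phase 4 sketches. For tool caching, marking the last tool definition with `cache_control` caches the whole tool list, since caching applies to the prefix up to the marked block. For extended thinking, the `{ type: "enabled", budget_tokens }` shape follows the public API, but the exact field names should be verified against the SDK version in use:

```typescript
// Mark the last tool definition so the entire tool list is cacheable.
type Tool = { name: string; [key: string]: unknown };

function cacheTools(tools: Tool[]): Tool[] {
  return tools.map((t, i) =>
    i === tools.length - 1 ? { ...t, cache_control: { type: "ephemeral" } } : t,
  );
}

// Build the extended-thinking portion of a request. max_tokens must
// exceed the thinking budget, since thinking tokens count against it.
function thinkingParams(budgetTokens: number) {
  return {
    max_tokens: budgetTokens + 4096,
    thinking: { type: "enabled" as const, budget_tokens: budgetTokens },
  };
}
```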

Phase 5: Cleanup

  • Remove direct Anthropic Go SDK dependency
  • Update deployment configs (Helm chart, Terraform)
  • Update documentation

Deployment Considerations

Local Development

# docker-compose.yml addition
services:
  claude-service:
    build: ./claude-service
    ports:
      - "3100:3100"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}

Production

  • Deploy as a sidecar container in the same pod (lowest latency)
  • Or as a separate service if horizontal scaling is needed
  • Health checks required for orchestration

Open Questions

  1. Groq calls - Move to Node too, or keep in Go?
  2. Embedding calls - Currently using Voyage API, include in Node service?
  3. Protocol - HTTP/SSE vs gRPC vs WebSocket for streaming?
  4. Caching layer - Add Redis for prompt cache persistence across restarts?

Success Metrics

  • All Claude calls route through Node service
  • Streaming latency within 50ms of direct API calls
  • Prompt caching reduces token usage by 30%+ on repeated system prompts
  • No regressions in existing functionality
