
🔧 Troubleshooting & FAQ Guide

Comprehensive guide to diagnosing and solving common issues with the Sentient Node SDK



✅ Quick Diagnosis Checklist

When encountering issues, run through this checklist first:

🔍 Environment Check

# 1. Check Node.js version (18+ required)
node --version

# 2. Verify SDK installation
npm list sentient-agent-framework

# 3. Check environment variables
echo $OPENAI_API_KEY | cut -c1-10    # Should show "sk-proj-" or "sk-"
echo $ANTHROPIC_API_KEY | cut -c1-7  # Should show "sk-ant-"

# 4. Test basic connectivity (a 401/405 without auth headers still proves the endpoint is reachable)
curl -I https://api.openai.com/v1/models
curl -I https://api.anthropic.com/v1/messages

🏥 Health Check

# If your server is running, check health endpoint
curl http://localhost:3000/health

# Expected response:
{
  "status": "healthy",
  "agent": "Your Agent Name",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "providers": 2
}
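
If you haven't wired up a health endpoint yet, here is a minimal Express sketch that produces the response shape above (the port and the provider count source are assumptions; substitute your own llmManager):

import express from 'express';

const app = express();

// Stand-in for e.g. llmManager.getProviders().length in a real server
const providerCount = 2;

app.get('/health', (_req, res) => {
  res.json({
    status: 'healthy',
    agent: 'Your Agent Name',
    timestamp: new Date().toISOString(),
    providers: providerCount
  });
});

app.listen(3000);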

📝 Log Analysis

# Check application logs for errors
tail -f logs/app.log | grep -E "(ERROR|WARN|FATAL)"

# Check for common error patterns
grep -E "(ECONNRESET|ETIMEDOUT|RateLimitError|ValidationError)" logs/app.log

🚨 Common Issues & Solutions

Issue 1: "LLM Provider Not Found" Error

Symptoms:

❌ Error: LLM Provider 'openai' not found or not healthy

Causes & Solutions:

  1. Missing API Key

    // ❌ Wrong: Empty or undefined API key
    const provider = new OpenAIProvider({
      apiKey: undefined  // or process.env.MISSING_KEY
    });
    
    // ✅ Correct: Validate API key exists
    const apiKey = process.env.OPENAI_API_KEY;
    if (!apiKey) {
      throw new Error('OPENAI_API_KEY environment variable is required');
    }
    const provider = new OpenAIProvider({ apiKey });
  2. Provider Not Registered

    // ❌ Wrong: Creating provider but not registering
    const provider = new OpenAIProvider({ apiKey });
    const manager = new ProductionLLMManager(config);
    
    // ✅ Correct: Register provider with manager
    await manager.registerProvider(provider);
    await manager.initialize();
  3. Provider Health Check Failing

    // Debug provider health
    const healthStatus = await provider.validate();
    console.log('Provider health:', healthStatus);
    
    if (!healthStatus.isHealthy) {
      console.error('Provider error:', healthStatus.error);
    }

Issue 2: "Stream Timeout" or Connection Reset

Symptoms:

❌ Error: Request timeout after 30000ms
❌ Error: socket hang up (ECONNRESET)

Solutions:

  1. Increase Timeout Values

    const provider = new OpenAIProvider({
      apiKey: process.env.OPENAI_API_KEY!,
      timeout: 120000,  // Increase to 2 minutes
      retries: {
        maxRetries: 5,
        baseDelay: 2000,  // Start with 2s delay
        maxDelay: 30000   // Max 30s delay
      }
    });
  2. Handle Streaming Errors Gracefully

    // Note: this must be an async *generator* (function*) because it yields chunks.
    // Caution: if the stream fails mid-response, a retry re-emits from the start.
    async function* handleStreamWithRetry(request: LLMRequest, maxAttempts = 3) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          const stream = llmManager.streamGenerate(request);
          
          for await (const chunk of stream) {
            yield chunk;
          }
          
          return; // Success, exit retry loop
          
        } catch (error) {
          console.warn(`Stream attempt ${attempt} failed:`, error.message);
          
          if (attempt === maxAttempts) {
            throw error; // Final attempt failed
          }
          
          // Wait before retry, backing off linearly
          await new Promise(resolve => setTimeout(resolve, attempt * 2000));
        }
      }
    }
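
Because the helper above is an async generator, you consume it with for await, exactly like a direct stream. A usage sketch (the chunk shape and the llmRequest value are assumptions):

    const responseStream = responseHandler.createTextStream('RESPONSE');
    
    for await (const chunk of handleStreamWithRetry(llmRequest)) {
      await responseStream.emitChunk(chunk.content); // assumes chunks carry a content field
    }
    
    await responseStream.complete();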

Issue 3: High Memory Usage / Memory Leaks

Symptoms:

# Memory usage keeps growing
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

Solutions:

  1. Implement Proper Cleanup

    import { ulid } from 'ulid'; // needed for the stream IDs below
    
    class MemoryOptimizedAgent extends LLMEnhancedAgent {
      private activeStreams = new Set<string>();
      
      async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
        const streamId = ulid();
        this.activeStreams.add(streamId);
        
        try {
          // Your agent logic here
          await this.processQuery(session, query, responseHandler);
        } finally {
          // Always cleanup
          this.activeStreams.delete(streamId);
          this.cleanupResources();
        }
      }
      
      private cleanupResources() {
        // Force garbage collection if available
        if (global.gc) {
          global.gc();
        }
        
        // Log memory usage
        const usage = process.memoryUsage();
        console.log('Memory usage:', {
          heapUsed: Math.round(usage.heapUsed / 1024 / 1024) + 'MB',
          heapTotal: Math.round(usage.heapTotal / 1024 / 1024) + 'MB',
          activeStreams: this.activeStreams.size
        });
      }
    }
  2. Set Memory Limits

    # Start with memory limit
    node --max-old-space-size=2048 dist/server.js
    
    # Or in package.json
    "scripts": {
      "start": "node --max-old-space-size=2048 dist/server.js"
    }
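
To catch slow leaks early, you can also log memory usage on an interval using nothing but the standard process API (the interval and threshold below are arbitrary starting points):

// Warn when heap usage crosses a soft threshold (tune for your deployment)
const HEAP_WARN_MB = 1024;

setInterval(() => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  const heapUsedMB = Math.round(heapUsed / 1024 / 1024);

  if (heapUsedMB > HEAP_WARN_MB) {
    console.warn(`High heap usage: ${heapUsedMB}MB of ${Math.round(heapTotal / 1024 / 1024)}MB`);
  }
}, 30_000);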

Issue 4: "Rate Limit Exceeded" Errors

Symptoms:

❌ Error: Rate limit exceeded. Please try again later. (429)

Solutions:

  1. Configure Rate Limiting & Backoff

    const provider = new OpenAIProvider({
      apiKey: process.env.OPENAI_API_KEY!,
      rateLimit: {
        requestsPerMinute: 30,    // Conservative limit
        tokensPerMinute: 40000,   // Stay under API limits
        backoff: {
          enabled: true,
          baseDelay: 5000,        // 5 second base delay
          maxDelay: 60000,        // Max 1 minute wait
          jitter: true            // Add randomization
        }
      }
    });
  2. Implement Circuit Breaker

    const llmManager = new ProductionLLMManager({
      failover: {
        enabled: true,
        circuitBreaker: {
          enabled: true,
          failureThreshold: 5,      // Open after 5 failures
          resetTimeout: 60000,      // Try again after 1 minute
          monitoringWindow: 300000  // 5 minute window
        }
      }
    });
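
If you need backoff outside the provider configuration, a generic exponential-backoff helper with full jitter is only a few lines. This is a standalone sketch, not an SDK API:

async function withBackoff<T>(
  fn: () => Promise<T>,
  { maxRetries = 5, baseDelay = 5000, maxDelay = 60000 } = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      // Full jitter: wait a random delay between 0 and the capped exponential value
      const cap = Math.min(maxDelay, baseDelay * 2 ** attempt);
      await new Promise(resolve => setTimeout(resolve, Math.random() * cap));
    }
  }
}

// Example: retry a one-shot generation with backoff
const response = await withBackoff(() => llmManager.generate(request));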

Issue 5: TypeScript Compilation Errors

Symptoms:

❌ Error TS2307: Cannot find module 'sentient-agent-framework'
❌ Error TS2339: Property 'llmManager' does not exist

Solutions:

  1. Verify TypeScript Configuration

    // tsconfig.json
    {
      "compilerOptions": {
        "target": "ES2020",
        "module": "commonjs",
        "lib": ["ES2020"],
        "moduleResolution": "node",
        "esModuleInterop": true,
        "allowSyntheticDefaultImports": true,
        "skipLibCheck": true,
        "strict": true
      }
    }
  2. Correct Import Syntax

    // ❌ Wrong: Incorrect imports
    import SentientAgent from 'sentient-agent-framework';
    
    // ✅ Correct: Named imports
    import { 
      AbstractAgent,
      ProductionLLMManager,
      OpenAIProvider,
      Session,
      Query,
      ResponseHandler
    } from 'sentient-agent-framework';
  3. Install Type Definitions

    # Ensure you have the latest version
    npm update sentient-agent-framework
    
    # Check installed types
    npm list @types/node

📚 Error Reference Guide

LLM Provider Errors

| Error Code | Error Message | Cause | Solution |
|---|---|---|---|
| `PROVIDER_NOT_FOUND` | Provider 'xyz' not found | Provider not registered | Register provider with manager |
| `PROVIDER_UNHEALTHY` | Provider failed health check | API key invalid or service down | Verify API key and service status |
| `RATE_LIMIT_EXCEEDED` | Rate limit exceeded (429) | Too many requests | Implement backoff and rate limiting |
| `INSUFFICIENT_QUOTA` | Quota exceeded | API usage limit reached | Check billing and upgrade plan |
| `CONTEXT_LENGTH_EXCEEDED` | Context too long | Input exceeds model limits | Reduce input size or use a different model |

Validation Errors

| Error Code | Error Message | Cause | Solution |
|---|---|---|---|
| `INVALID_REQUEST` | Request validation failed | Malformed request data | Check request schema |
| `MISSING_REQUIRED_FIELD` | Required field missing | Missing required parameters | Add missing fields |
| `INVALID_ULID` | Invalid ULID format | Malformed ID | Generate a proper ULID |
| `SCHEMA_VALIDATION_FAILED` | Zod validation error | Data doesn't match schema | Fix data structure |

Network Errors

| Error Code | Error Message | Cause | Solution |
|---|---|---|---|
| `ECONNRESET` | Connection reset | Network interruption | Implement retry logic |
| `ETIMEDOUT` | Request timeout | Slow network or overloaded service | Increase timeout values |
| `ENOTFOUND` | DNS lookup failed | Invalid hostname | Check network configuration |
| `ECONNREFUSED` | Connection refused | Service not available | Check service status |

🔍 Debugging Techniques

Enable Debug Logging

// Set up comprehensive logging
import { createLogger } from 'sentient-agent-framework/logging';

const logger = createLogger({
  level: process.env.LOG_LEVEL || 'debug',
  format: 'json',
  transports: [
    { type: 'console' },
    { type: 'file', filename: 'debug.log' }
  ]
});

// Add to your agent
class DebuggableAgent extends AbstractAgent {
  async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
    logger.debug('Agent request started', {
      sessionId: session.activity_id,
      queryId: query.id,
      prompt: query.prompt.substring(0, 100) + '...'
    });
    
    try {
      // Your logic here
      await this.processQuery(session, query, responseHandler);
      
      logger.info('Agent request completed successfully', {
        sessionId: session.activity_id,
        queryId: query.id
      });
      
    } catch (error) {
      logger.error('Agent request failed', {
        sessionId: session.activity_id,
        queryId: query.id,
        error: error.message,
        stack: error.stack
      });
      throw error;
    }
  }
}

Request Tracing

// Add request tracing
import { ulid } from 'ulid'; // used to generate trace IDs

class TrackedAgent extends AbstractAgent {
  async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
    const traceId = ulid();
    const startTime = Date.now();
    
    // Add trace context to all events
    const trackedResponseHandler = this.wrapResponseHandler(responseHandler, traceId);
    
    try {
      console.log(`[${traceId}] Request started`, {
        agent: this.name,
        sessionId: session.activity_id,
        timestamp: new Date().toISOString()
      });
      
      await super.assist(session, query, trackedResponseHandler);
      
      console.log(`[${traceId}] Request completed`, {
        duration: Date.now() - startTime
      });
      
    } catch (error) {
      console.error(`[${traceId}] Request failed`, {
        duration: Date.now() - startTime,
        error: error.message
      });
      throw error;
    }
  }
  
  private wrapResponseHandler(handler: ResponseHandler, traceId: string) {
    // Note: object spread copies only own enumerable properties; if the handler's
    // methods live on its prototype, delegate to the handler explicitly instead.
    return {
      ...handler,
      emitTextBlock: async (name: string, content: string) => {
        console.log(`[${traceId}] TextBlock: ${name}`);
        return handler.emitTextBlock(name, content);
      },
      emitJson: async (name: string, data: any) => {
        console.log(`[${traceId}] JSON: ${name}`, data);
        return handler.emitJson(name, data);
      }
    };
  }
}

Performance Monitoring

// Monitor performance metrics
class PerformanceMonitoredAgent extends AbstractAgent {
  private metrics = {
    requests: 0,
    totalTime: 0,
    errors: 0
  };
  
  async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
    const startTime = Date.now();
    this.metrics.requests++;
    
    try {
      await super.assist(session, query, responseHandler);
      
      const duration = Date.now() - startTime;
      this.metrics.totalTime += duration;
      
      // Log slow requests
      if (duration > 10000) { // 10 seconds
        console.warn('Slow request detected', {
          duration,
          sessionId: session.activity_id,
          prompt: query.prompt.substring(0, 50)
        });
      }
      
    } catch (error) {
      this.metrics.errors++;
      throw error;
    }
  }
  
  getPerformanceStats() {
    return {
      ...this.metrics,
      averageTime: this.metrics.totalTime / this.metrics.requests,
      errorRate: this.metrics.errors / this.metrics.requests
    };
  }
}
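
A quick way to surface these stats is a timer (assuming agent is your PerformanceMonitoredAgent instance):

// Log rolling performance stats once a minute
setInterval(() => {
  console.log('Agent stats:', agent.getPerformanceStats());
}, 60_000);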

🚀 Performance Troubleshooting

Slow Response Times

Diagnosis Steps:

  1. Measure Component Performance

    class BenchmarkedAgent extends AbstractAgent {
      async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
        const timings = {
          start: Date.now(),
          llmStart: 0,
          llmEnd: 0,
          responseStart: 0,
          responseEnd: 0
        };
        
        // Pre-processing
        await responseHandler.emitTextBlock('THINKING', 'Processing...');
        
        // LLM call
        timings.llmStart = Date.now();
        const response = await this.llmManager.generate({
          model: 'gpt-4-turbo',
          messages: [{ role: 'user', content: query.prompt }]
        });
        timings.llmEnd = Date.now();
        
        // Response streaming
        timings.responseStart = Date.now();
        const stream = responseHandler.createTextStream('RESPONSE');
        await stream.emitChunk(response.content);
        await stream.complete();
        timings.responseEnd = Date.now();
        
        // Log performance breakdown
        console.log('Performance breakdown:', {
          total: timings.responseEnd - timings.start,
          llm: timings.llmEnd - timings.llmStart,
          streaming: timings.responseEnd - timings.responseStart,
          overhead: (timings.responseEnd - timings.start) - 
                   (timings.llmEnd - timings.llmStart) - 
                   (timings.responseEnd - timings.responseStart)
        });
        
        await responseHandler.complete();
      }
    }
  2. Optimize LLM Calls

    // Use faster models for simple queries
    class OptimizedAgent extends AbstractAgent {
      private selectModel(prompt: string): string {
        const complexity = this.analyzeComplexity(prompt);
        
        if (complexity < 0.3) {
          return 'gpt-3.5-turbo';  // Faster for simple queries
        } else if (complexity < 0.7) {
          return 'claude-3-haiku';  // Balanced performance
        } else {
          return 'gpt-4-turbo';     // Best quality for complex queries
        }
      }
      
      private analyzeComplexity(prompt: string): number {
        const factors = [
          Math.min(prompt.length / 1000, 1),           // Length
          (prompt.match(/\?/g) || []).length * 0.1,    // Questions
          (prompt.match(/\b(analyze|compare|explain|create)\b/gi) || []).length * 0.2
        ];
        
        return Math.min(factors.reduce((a, b) => a + b, 0), 1);
      }
    }
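
A sketch of how selectModel plugs into the request path (the surrounding assist body is abbreviated):

    // Inside OptimizedAgent.assist: pick the model per query before calling the LLM
    const response = await this.llmManager.generate({
      model: this.selectModel(query.prompt),
      messages: [{ role: 'user', content: query.prompt }]
    });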

Memory Usage Issues

Monitoring & Optimization:

// Monitor memory usage
class MemoryAwareAgent extends AbstractAgent {
  private readonly MAX_MEMORY_MB = 500;
  
  async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
    this.checkMemoryUsage();
    
    try {
      await super.assist(session, query, responseHandler);
    } finally {
      this.cleanup();
    }
  }
  
  private checkMemoryUsage() {
    const usage = process.memoryUsage();
    const heapUsedMB = usage.heapUsed / 1024 / 1024;
    
    if (heapUsedMB > this.MAX_MEMORY_MB) {
      console.warn('High memory usage detected', {
        heapUsed: Math.round(heapUsedMB) + 'MB',
        heapTotal: Math.round(usage.heapTotal / 1024 / 1024) + 'MB'
      });
      
      // Force garbage collection if available
      if (global.gc) {
        global.gc();
      }
    }
  }
  
  private cleanup() {
    // Clear any large objects
    // Force garbage collection
    if (global.gc) {
      global.gc();
    }
  }
}

🤖 LLM Provider Issues

OpenAI-Specific Issues

Issue: Model Access Errors

❌ Error: The model 'gpt-4-turbo' does not exist or you do not have access to it

Solution:

// Check available models first
const availableModels = await provider.getAvailableModels();
console.log('Available models:', availableModels);

// Use fallback models
const modelHierarchy = ['gpt-4-turbo', 'gpt-4', 'gpt-3.5-turbo'];
const model = modelHierarchy.find(m => availableModels.includes(m)) || 'gpt-3.5-turbo';

Issue: Token Limit Exceeded

// Calculate token usage before request
function estimateTokens(text: string): number {
  // Rough estimation: ~4 chars per token
  return Math.ceil(text.length / 4);
}

function truncateToTokenLimit(text: string, maxTokens: number): string {
  const estimatedTokens = estimateTokens(text);
  
  if (estimatedTokens <= maxTokens) {
    return text;
  }
  
  const ratio = maxTokens / estimatedTokens;
  const targetLength = Math.floor(text.length * ratio * 0.9); // 10% buffer
  
  return text.substring(0, targetLength) + '...';
}

Anthropic-Specific Issues

Issue: Content Policy Violations

// Handle content filtering gracefully
class SafeAnthropicAgent extends AbstractAgent {
  async assist(session: Session, query: Query, responseHandler: ResponseHandler) {
    try {
      const response = await this.llmManager.generate({
        model: 'claude-3-5-sonnet',
        messages: [{ role: 'user', content: query.prompt }]
      });
      
      // Process successful response
      const stream = responseHandler.createTextStream('RESPONSE');
      await stream.emitChunk(response.content);
      await stream.complete();
      
    } catch (error) {
      if (error.message.includes('content_policy')) {
        await responseHandler.emitTextBlock(
          'NOTICE',
          'I apologize, but I cannot process this request due to content policy restrictions. Please rephrase your question.'
        );
      } else {
        throw error; // Re-throw other errors
      }
    }
    
    await responseHandler.complete();
  }
}

🚀 Deployment Issues

Docker Issues

Issue: Container Fails to Start

# Check container logs
docker logs sentient-agent-container

# Common causes:
# 1. Missing environment variables
# 2. Port conflicts
# 3. Memory limits too low

Solution:

# Ensure proper health checks
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Set appropriate memory limits
ENV NODE_OPTIONS="--max-old-space-size=2048"
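
Since missing environment variables are the most common cause, pass secrets explicitly at run time rather than baking them into the image (the image name and port match the other examples in this guide):

# Run the container with the required secrets and a sane memory cap
docker run -d \
  -p 3000:3000 \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  --memory=2g \
  my-agent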

Kubernetes Issues

Issue: Pods Keep Restarting

# Check pod status and events
kubectl describe pod <pod-name> -n sentient-prod
kubectl logs <pod-name> -n sentient-prod --previous

# Common causes:
# - Health check failures
# - Resource limits too low
# - Missing secrets/configmaps

Solution:

# Adjust resource limits and health checks
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "2Gi"      # Increase if needed
    cpu: "1000m"

livenessProbe:
  httpGet:
    path: /health            # assumes your app serves /health
    port: 3000               # match your container port
  initialDelaySeconds: 60    # Give more time to start
  periodSeconds: 30
  timeoutSeconds: 10
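
For the missing-secrets cause, inject API keys from a Kubernetes Secret rather than hardcoding them (the Secret and key names below are placeholders):

# Inject API keys from a Secret (names are placeholders)
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-api-keys
        key: openai-api-key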

❓ Frequently Asked Questions

General Questions

Q: Why choose the Node.js SDK over the Python version? A: The Node.js SDK offers:

  • ✅ Production-ready architecture with enterprise features
  • ✅ Superior performance (up to 4x faster response times than the Python SDK)
  • ✅ Complete TypeScript support with full IntelliSense
  • ✅ Built-in streaming, failover, and monitoring
  • ✅ Universal framework integration (Express, Fastify, Next.js)
  • By contrast, the Python version is explicitly marked as "not production-ready"

Q: What's the minimum Node.js version required? A: Node.js 18.0.0 or higher. We recommend using the latest LTS version for best performance and security.

Q: Can I use multiple LLM providers simultaneously? A: Yes! The SDK supports multiple providers with automatic failover:

const manager = new ProductionLLMManager(config);
await manager.registerProvider(openaiProvider);
await manager.registerProvider(anthropicProvider);
await manager.registerProvider(customProvider);

Q: How do I handle rate limiting across multiple providers? A: The LLM manager automatically handles rate limiting and provider selection:

const config = {
  loadBalancing: {
    strategy: SelectionStrategy.LEAST_LOADED,
    considerRateLimits: true
  },
  failover: {
    enabled: true,
    skipRateLimited: true
  }
};

Performance Questions

Q: How can I improve response times? A: Several strategies:

  1. Use faster models for simple queries (gpt-3.5-turbo vs gpt-4)
  2. Implement response caching for common queries (see the sketch after this list)
  3. Optimize your prompts to be more concise
  4. Use streaming responses for better perceived performance
  5. Configure connection pooling and keep-alive
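
A minimal in-memory cache for strategy 2 might look like the sketch below; keying on the raw prompt and a fixed TTL are assumptions you would tune (this is not a built-in SDK feature):

// Minimal TTL cache keyed on the prompt text
const cache = new Map<string, { content: string; expires: number }>();
const TTL_MS = 5 * 60 * 1000; // 5 minutes, arbitrary

async function generateCached(prompt: string): Promise<string> {
  const hit = cache.get(prompt);
  if (hit && hit.expires > Date.now()) {
    return hit.content; // cache hit: skip the LLM call entirely
  }

  const response = await llmManager.generate({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }]
  });

  cache.set(prompt, { content: response.content, expires: Date.now() + TTL_MS });
  return response.content;
}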

Q: What's the recommended server configuration for production? A: For production deployment:

  • Memory: 2GB minimum, 4GB recommended
  • CPU: 2 cores minimum, 4 cores recommended
  • Storage: SSD recommended for logs and temp files
  • Network: Stable internet with low latency to LLM providers

Q: How do I monitor performance in production? A: Use the built-in monitoring:

// Enable metrics collection
const metrics = llmManager.getMetrics();
console.log(`Success rate: ${metrics.successRate}%`);
console.log(`Avg response time: ${metrics.avgResponseTime}ms`);

// Set up Prometheus/Grafana monitoring
app.get('/metrics', (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.end(metricsCollector.getPrometheusMetrics());
});

Security Questions

Q: How should I store API keys securely? A: Never hardcode keys. Use secure storage:

// ✅ Good: Environment variables
const apiKey = process.env.OPENAI_API_KEY;

// ✅ Better: Secure vaults
const apiKey = await secretManager.getSecret('openai-api-key');

// ❌ Never: Hardcoded in source
const apiKey = 'sk-proj-abc123...'; // DON'T DO THIS

Q: How do I implement authentication for my agent endpoints? A: Use the built-in security middleware:

import { SecurityMiddleware } from 'sentient-agent-framework';

app.use(SecurityMiddleware.authenticate({
  methods: ['jwt', 'api-key'],
  jwt: { secret: process.env.JWT_SECRET },
  apiKey: { header: 'x-api-key' }
}));

Integration Questions

Q: Can I use this with my existing Express/Fastify/Next.js app? A: Absolutely! The SDK provides first-class integration:

// Express
app.use('/agent', sentientServer.handleRequest);

// Fastify
fastify.post('/agent', sentientServer.handleRequest);

// Next.js API Route
export default sentientServer.createNextHandler();

Q: How do I handle WebSocket connections for real-time features? A: The SDK supports WebSocket through the streaming system:

// The SSE streaming automatically works with WebSockets
const stream = responseHandler.createTextStream('RESPONSE');
// Chunks are automatically sent via WebSocket if client supports it

🆘 Getting Help

1. Check Documentation First

2. Community Support

3. Report Issues

  • GitHub Issues - Bug reports and feature requests
  • Include:
    • SDK version (npm list sentient-agent-framework)
    • Node.js version (node --version)
    • Operating system
    • Minimal reproduction code
    • Error messages and stack traces

4. Enterprise Support

For production deployments and enterprise features:

  • Email: support@sentient.ai
  • Priority Support: Available with enterprise plans
  • Professional Services: Custom integration assistance

5. Self-Help Debugging

Enable debug mode for detailed logging:

# Set debug environment
export DEBUG=sentient:*
export LOG_LEVEL=debug

# Run your application
npm start

Use the built-in diagnostic tools:

// Health check with detailed info
const health = await agent.performHealthCheck();
console.log('Detailed health:', health);

// Provider diagnostics
const providers = llmManager.getProviders();
for (const provider of providers) {
  console.log(`${provider.providerId}:`, await provider.validate());
}

// Performance metrics
const metrics = llmManager.getMetrics();
console.log('Performance metrics:', metrics);

🎯 Quick Reference Commands

# Development
npm run dev          # Start with hot reload
npm run build        # Build for production  
npm run test         # Run tests
npm run lint         # Check code quality

# Debugging
DEBUG=sentient:* npm start              # Enable debug logging
NODE_OPTIONS="--inspect" npm start      # Enable Node.js debugger
npm start -- --log-level debug          # Verbose logging

# Production
npm run build && npm start             # Build and run
docker build -t my-agent .            # Build Docker image
kubectl apply -f k8s/                 # Deploy to Kubernetes

# Health checks
curl http://localhost:3000/health      # Application health
curl http://localhost:3000/metrics     # Performance metrics
curl -I http://localhost:3000/agent    # Test agent endpoint

Remember: The Sentient Node SDK is production-ready, unlike the Python version. You're using enterprise-grade technology designed for scale and reliability! 🚀