Common questions about Lynkr, installation, configuration, and usage.
Lynkr is a self-hosted proxy server that enables Claude Code CLI and Cursor IDE to work with multiple LLM providers (Databricks, AWS Bedrock, OpenRouter, Ollama, Moonshot AI, etc.) instead of being locked to Anthropic's API.
Key benefits:
- 💰 60-80% cost savings through token optimization
- 🔓 Provider flexibility - Choose from 12+ providers
- 🔒 Privacy - Run 100% locally with Ollama or llama.cpp
- ✅ Zero code changes - Drop-in replacement for Anthropic backend
Yes! Lynkr is designed as a drop-in replacement for Anthropic's backend. Simply set ANTHROPIC_BASE_URL to point to your Lynkr server:
```bash
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy  # Required by the CLI, but ignored by Lynkr
claude "Your prompt here"
```
All Claude Code CLI features work through Lynkr.
Yes! Lynkr provides OpenAI-compatible endpoints that work with Cursor:
- Start Lynkr:
  ```bash
  lynkr start
  ```
- Configure Cursor Settings → Models:
  - API Key: `sk-lynkr` (any non-empty value)
  - Base URL: `http://localhost:8081/v1`
  - Model: Your provider's model (e.g., `claude-3.5-sonnet`)
All Cursor features work: chat (Cmd+L), inline edits (Cmd+K), and @Codebase search (with embeddings).
See Cursor Integration Guide for details.
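If Cursor won't connect, you can test the endpoint from a terminal first. This is a minimal sketch assuming the standard OpenAI-style `/v1/chat/completions` path behind the base URL above; the model name is an example and should be one your configured provider actually serves:

```bash
# Minimal connectivity check against Lynkr's OpenAI-compatible endpoint.
# The API key only needs to be non-empty; the model name is illustrative.
curl http://localhost:8081/v1/chat/completions \
  -H "Authorization: Bearer sk-lynkr" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-3.5-sonnet", "messages": [{"role": "user", "content": "ping"}]}'
```

A JSON completion back means Lynkr is reachable and Cursor's settings just need to match.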
Lynkr itself is 100% FREE and open source (Apache 2.0 license).
Costs depend on your provider:
- Ollama/llama.cpp: 100% FREE (runs on your hardware)
- OpenRouter: ~$5-10/month (100+ models)
- AWS Bedrock: ~$10-20/month (100+ models)
- Databricks: Enterprise pricing (contact Databricks)
- Azure/OpenAI: Standard provider pricing
With token optimization, Lynkr reduces provider costs by 60-80% through smart tool selection, prompt caching, and memory deduplication.
| Feature | Native Claude Code | Lynkr |
|---|---|---|
| Providers | Anthropic only | 12+ providers |
| Cost | Full Anthropic pricing | 60-80% cheaper |
| Local models | ❌ Cloud-only | ✅ Ollama, llama.cpp |
| Privacy | ☁️ Cloud | 🔒 Can run 100% locally |
| Token optimization | ❌ None | ✅ 6 optimization phases |
| MCP support | Limited | ✅ Full orchestration |
| Enterprise features | Limited | ✅ Circuit breakers, metrics, K8s-ready |
| Cost transparency | Hidden | ✅ Full tracking |
| License | Proprietary | ✅ Apache 2.0 (open source) |
Option 1: NPM (Recommended)
```bash
npm install -g lynkr
lynkr start
```
Option 2: Homebrew (macOS)
```bash
brew tap vishalveerareddy123/lynkr
brew install lynkr
lynkr start
```
Option 3: Git Clone
```bash
git clone https://github.com/Fast-Editor/Lynkr.git
cd Lynkr && npm install && npm start
```
See Installation Guide for all methods.
Depends on your priorities:
For Privacy (100% Local, FREE):
- ✅ Ollama - Easy setup, 100% private
- ✅ llama.cpp - Maximum performance, GGUF models
- Setup: 5-15 minutes
- Cost: $0 (runs on your hardware)
For Simplicity (Easiest Cloud):
- ✅ OpenRouter - One key for 100+ models
- Setup: 2 minutes
- Cost: ~$5-10/month
For AWS Ecosystem:
- ✅ AWS Bedrock - 100+ models, Claude + alternatives
- Setup: 5 minutes
- Cost: ~$10-20/month
For Affordable Cloud + Reasoning:
- ✅ Moonshot AI - Kimi K2, thinking models
- Setup: 2 minutes
- Cost: ~$5-10/month
For Enterprise:
- ✅ Databricks - Claude 4.5, enterprise SLA
- Setup: 10 minutes
- Cost: Enterprise pricing
See Provider Configuration Guide for detailed comparison.
Yes! Lynkr supports tier-based routing:
```bash
# Set all 4 TIER_* env vars to enable tier-based routing
export TIER_SIMPLE=ollama:llama3.2
export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
export TIER_COMPLEX=azure-openai:gpt-4o
export TIER_REASONING=azure-openai:gpt-4o
export FALLBACK_ENABLED=true
export FALLBACK_PROVIDER=databricks
```
How it works:
- Each request is scored for complexity (0-100) and mapped to a tier
- SIMPLE (0-25): Ollama (free, local, fast) or Moonshot (affordable cloud)
- MEDIUM (26-50): OpenRouter or mid-range cloud model
- COMPLEX (51-75): Capable cloud models
- REASONING (76-100): Best available models
- Provider failures: Automatic transparent fallback
Cost savings: 65-100% for requests routed to local/cheap models.
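The scoring itself happens inside Lynkr, but the score-to-tier mapping documented above can be illustrated with a small shell sketch (illustrative only, not Lynkr's actual code):

```bash
# Maps a complexity score (0-100) to the tier names described above.
tier_for_score() {
  local score=$1
  if   [ "$score" -le 25 ]; then echo "SIMPLE"
  elif [ "$score" -le 50 ]; then echo "MEDIUM"
  elif [ "$score" -le 75 ]; then echo "COMPLEX"
  else                           echo "REASONING"
  fi
}

tier_for_score 42   # -> MEDIUM, so the request goes to the TIER_MEDIUM provider
```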
MODEL_PROVIDER sets a single static provider for all requests. When you set MODEL_PROVIDER=ollama, every request goes to Ollama regardless of complexity.
With TIER_* vars configured: MODEL_PROVIDER is not used for routing — the tier system picks the provider per-request. However, MODEL_PROVIDER is still read for startup checks (e.g. waiting for Ollama) and as a fallback default in edge cases. Keep it set to your most-used provider.
Without TIER_* vars: MODEL_PROVIDER is the only thing that controls where requests go.
They are two separate routing modes:
| Scenario | What happens |
|---|---|
| `MODEL_PROVIDER` only | Static routing — all requests go to that provider |
| All 4 `TIER_*` set | Tier routing — `TIER_*` overrides `MODEL_PROVIDER` for routing |
| Only 1-3 `TIER_*` set | Tier routing disabled — falls back to `MODEL_PROVIDER` |
| Both set | `TIER_*` takes priority for routing; `MODEL_PROVIDER` is kept as a config default |
Example: If you have MODEL_PROVIDER=ollama and TIER_COMPLEX=databricks:claude-sonnet, complex requests go to Databricks even though MODEL_PROVIDER says ollama.
All 4 must be set (TIER_SIMPLE, TIER_MEDIUM, TIER_COMPLEX, TIER_REASONING) for tier routing to activate. If any are missing, tier routing is disabled entirely and MODEL_PROVIDER is used for all requests.
This is intentional — partial tier config could lead to unexpected gaps where some complexity levels have no provider assigned.
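For example (the variable names are the ones documented above; the provider:model values are placeholders you would replace with your own):

```bash
# Tier routing ON: all four tiers are defined
export TIER_SIMPLE=ollama:llama3.2
export TIER_MEDIUM=openrouter:openai/gpt-4o-mini
export TIER_COMPLEX=databricks:claude-sonnet
export TIER_REASONING=databricks:claude-sonnet

# Tier routing OFF: one tier is now missing, so every request
# goes to MODEL_PROVIDER instead
unset TIER_REASONING
```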
The fallback provider is a safety net for when the tier-selected provider fails (timeout, connection refused, rate limit). If FALLBACK_ENABLED=true and the primary provider for a request fails, Lynkr retries the request against FALLBACK_PROVIDER transparently.
- Only triggers when tier routing is active
- Cannot be a local provider (ollama, llamacpp, lmstudio) — use cloud providers
- Defaults to `databricks`
- If you don't have cloud credentials, set `FALLBACK_ENABLED=false`
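Two common configurations, sketched with the variables above:

```bash
# Cloud setup: retry failed tier requests against Databricks
export FALLBACK_ENABLED=true
export FALLBACK_PROVIDER=databricks

# Local-only setup with no cloud credentials: turn fallback off
export FALLBACK_ENABLED=false
```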
Yes! Ollama works for both chat AND embeddings (100% local, FREE):
Chat setup:
```bash
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=llama3.1:8b  # or qwen2.5-coder, mistral, etc.
lynkr start
```
Embeddings setup (for @Codebase):
```bash
ollama pull nomic-embed-text
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```
Recommended models:
- Chat: `llama3.1:8b` - Good balance, tool calling supported
- Chat: `qwen2.5:14b` - Better reasoning (7b struggles with tools)
- Embeddings: `nomic-embed-text` (137M) - Best all-around
100% local, 100% private, 100% FREE! 🔒
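Before pointing Claude Code CLI or Cursor at it, it's worth confirming the models are actually present locally (standard Ollama CLI commands):

```bash
# Pull the chat and embedding models, then confirm they're installed
ollama pull llama3.1:8b
ollama pull nomic-embed-text
ollama list
```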
@Codebase semantic search requires embeddings. Choose ONE option:
Option 1: Ollama (100% Local, FREE) 🔒
```bash
ollama pull nomic-embed-text
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```
Option 2: llama.cpp (100% Local, FREE) 🔒
```bash
./llama-server -m nomic-embed-text.gguf --port 8080 --embedding
export LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
```
Option 3: OpenRouter (Cloud, ~$0.01-0.10/month)
```bash
export OPENROUTER_API_KEY=sk-or-v1-your-key
# Works automatically if you're already using OpenRouter for chat!
```
Option 4: OpenAI (Cloud, ~$0.01-0.10/month)
```bash
export OPENAI_API_KEY=sk-your-key
```
After configuring, restart Lynkr. @Codebase will then work in Cursor!
See Embeddings Guide for details.
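If @Codebase still isn't indexing with the Ollama option, you can check the embedding model directly against Ollama's own API (default port 11434) before involving Lynkr at all:

```bash
# Ask Ollama for an embedding directly; a JSON vector in the response means
# the model is installed and serving correctly.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```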
| Provider | Latency | Cost | Tool Support | Best For |
|---|---|---|---|---|
| Ollama | 100-500ms | FREE | Good | Local, privacy, offline |
| llama.cpp | 50-300ms | FREE | Good | Performance, GPU |
| OpenRouter | 500ms-2s | ~$5-10/mo | Excellent | Flexibility, 100+ models |
| Databricks/Azure | 500ms-2s | $$$ | Excellent | Enterprise, Claude 4.5 |
| AWS Bedrock | 500ms-2s | ~$10-20/mo | Excellent* | AWS, 100+ models |
| Moonshot AI | 500ms-2s | $ | Good | Affordable, thinking models |
| OpenAI | 500ms-2s | $$ | Excellent | GPT-4o, o1, o3 |
* Tool calling only supported by Claude models on Bedrock
Only Claude models support tool calling on Bedrock.
✅ Supported (with tools):
- `anthropic.claude-3-5-sonnet-20241022-v2:0`
- `anthropic.claude-3-opus-20240229-v1:0`
- `us.anthropic.claude-sonnet-4-5-20250929-v1:0`
❌ Not supported (no tools):
- Amazon Titan models
- Meta Llama models
- Mistral models
- Cohere models
- AI21 models
Other models work via Converse API but won't use Read/Write/Bash tools.
See BEDROCK_MODELS.md for complete model catalog.
Lynkr includes 6 token optimization phases that reduce costs by 60-80%:
- Smart Tool Selection (50-70% reduction)
  - Filters tools based on request type
  - Only sends relevant tools to the model
  - Example: a chat query doesn't need git tools
- Prompt Caching (30-45% reduction)
  - Caches repeated prompts
  - Reuses system prompts
  - Reduces redundant token usage
- Memory Deduplication (20-30% reduction)
  - Removes duplicate memories
  - Compresses conversation history
  - Eliminates redundant context
- Tool Response Truncation (15-25% reduction)
  - Truncates long tool outputs
  - Keeps only relevant portions
  - Reduces tool result tokens
- Dynamic System Prompts (10-20% reduction)
  - Adapts prompts to request type
  - Shorter prompts for simple queries
  - Longer prompts only when needed
- Conversation Compression (15-25% reduction)
  - Summarizes old messages
  - Keeps recent context full
  - Compresses historical turns
At 100k requests/month, this translates to $6,400-9,600/month savings ($77k-115k/year).
See Token Optimization Guide for details.
Lynkr includes a Titans-inspired long-term memory system that remembers important context across conversations:
Key features:
- 🧠 Surprise-Based Updates - Only stores novel, important information
- 🔍 Semantic Search - Full-text search with Porter stemmer
- 📊 Multi-Signal Retrieval - Ranks by recency, importance, relevance
- ⚡ Automatic Integration - Near-zero latency overhead (<50ms retrieval)
- 🛠️ Management Tools - `memory_search`, `memory_add`, `memory_forget`
What gets remembered:
- ✅ User preferences ("I prefer Python")
- ✅ Important decisions ("Decided to use React")
- ✅ Project facts ("This app uses PostgreSQL")
- ✅ New entities (first mention of files, functions)
- ❌ Greetings, confirmations, repeated info
Configuration:
```bash
export MEMORY_ENABLED=true              # Enable/disable
export MEMORY_RETRIEVAL_LIMIT=5         # Memories per request
export MEMORY_SURPRISE_THRESHOLD=0.3    # Min score to store
```
See Memory System Guide for details.
Lynkr supports two tool execution modes:
Server Mode (Default)
```bash
export TOOL_EXECUTION_MODE=server
```
- Tools run on the machine running Lynkr
- Good for: Standalone proxy, shared team server
- File operations access server filesystem
Client Mode (Passthrough)
```bash
export TOOL_EXECUTION_MODE=client
```
- Tools run on the Claude Code CLI side (your local machine)
- Good for: Local development, accessing local files
- Full integration with local environment
Yes! Lynkr includes full MCP orchestration:
- 🔍 Automatic Discovery - Scans `~/.claude/mcp` for manifests
- 🚀 JSON-RPC 2.0 Client - Communicates with MCP servers
- 🛠️ Dynamic Tool Registration - Exposes MCP tools in proxy
- 🔒 Docker Sandbox - Optional container isolation
Configuration:
```bash
export MCP_MANIFEST_DIRS=~/.claude/mcp
export MCP_SANDBOX_ENABLED=true
```
MCP tools integrate seamlessly with Claude Code CLI and Cursor.
Yes! Lynkr includes 14 production-hardening features:
- Reliability: Circuit breakers, exponential backoff, load shedding
- Observability: Prometheus metrics, structured logging, health checks
- Security: Input validation, policy enforcement, sandboxing
- Performance: Prompt caching, token optimization, connection pooling
- Deployment: Kubernetes-ready health checks, graceful shutdown, Docker support
See Production Hardening Guide for details.
docker-compose (Recommended):
```bash
git clone https://github.com/Fast-Editor/Lynkr.git
cd Lynkr
cp .env.example .env
# Edit .env with your credentials
docker-compose up -d
```
Standalone Docker:
```bash
docker build -t lynkr .
docker run -d -p 8081:8081 \
  -e MODEL_PROVIDER=databricks \
  -e DATABRICKS_API_KEY=your-key \
  lynkr
```
See Docker Deployment Guide for advanced options (GPU, K8s, volumes).
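Once the container is up, a quick check confirms the proxy is reachable. This assumes the docker-compose setup above and uses the `/health/ready` endpoint referenced elsewhere in this FAQ:

```bash
# Confirm the container is running and the proxy answers on the mapped port
docker-compose ps
curl http://localhost:8081/health/ready
```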
Lynkr collects comprehensive metrics in Prometheus format:
Request Metrics:
- Request rate (requests/sec)
- Latency percentiles (p50, p95, p99)
- Error rate and types
- Status code distribution
Token Metrics:
- Token usage per request
- Token cost per request
- Cumulative token usage
- Cache hit rate
System Metrics:
- Memory usage
- CPU usage
- Active connections
- Circuit breaker state
Access metrics:
```bash
curl http://localhost:8081/metrics
# Returns Prometheus-format metrics
```
See Production Guide for metrics configuration.
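Since the output is plain Prometheus text, you can filter it while setting up dashboards. Exact metric names depend on your Lynkr version, so listing what is actually exported is a good first step:

```bash
# List the metric names and descriptions Lynkr exports, without sample values
curl -s http://localhost:8081/metrics | grep -E '^# (HELP|TYPE)'
```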
- Missing credentials:
  ```bash
  echo $MODEL_PROVIDER
  echo $DATABRICKS_API_KEY  # or other provider key
  ```
- Port already in use:
  ```bash
  lsof -i :8081
  kill -9 <PID>
  # Or use a different port: export PORT=8082
  ```
- Missing dependencies:
  ```bash
  npm install
  # Or: npm install -g lynkr --force
  ```
See Troubleshooting Guide for more issues.
This is normal:
- Ollama/llama.cpp: Model loading (1-5 seconds)
- Cloud providers: Cold start (2-5 seconds)
- Subsequent requests are fast
Solutions:
- Keep Ollama running:
  ```bash
  ollama serve  # Keep running in background
  ```
- Warm up after startup:
  ```bash
  curl http://localhost:8081/health/ready?deep=true
  ```
```bash
export LOG_LEVEL=debug
lynkr start
# Check logs for detailed request/response info
```
Scenario: 100,000 requests/month, average 50k input tokens, 2k output tokens
| Provider | Without Lynkr | With Lynkr (60% savings) | Monthly Savings |
|---|---|---|---|
| Claude Sonnet 4.5 | $16,000 | $6,400 | $9,600 |
| GPT-4o | $12,000 | $4,800 | $7,200 |
| Ollama (Local) | API costs | $0 | $12,000+ |
ROI: $77k-115k/year in savings.
Token optimization breakdown:
- Smart tool selection: 50-70% reduction
- Prompt caching: 30-45% reduction
- Memory deduplication: 20-30% reduction
- Tool truncation: 15-25% reduction
100% FREE Setup:
```bash
# Chat: Ollama (local, free)
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=llama3.1:8b

# Embeddings: Ollama (local, free)
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```
Total cost: $0/month 🔒
- 100% private (all data stays on your machine)
- Works offline
- Full Claude Code CLI + Cursor support
Hardware requirements:
- 8GB+ RAM for 7-8B models
- 16GB+ RAM for 14B models
- Optional: GPU for faster inference
Yes! Lynkr includes multiple security features:
- Input Validation: Zero-dependency schema validation
- Policy Enforcement: Git, test, web fetch policies
- Sandboxing: Optional Docker isolation for MCP tools
- Authentication: API key support (provider-level)
- Rate Limiting: Load shedding during overload
- Logging: Structured logs with request ID correlation
Best practices:
- Run behind a reverse proxy (nginx, Caddy; see the sketch after this list)
- Use HTTPS for external access
- Rotate API keys regularly
- Enable policy restrictions
- Monitor metrics and logs
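For the reverse-proxy recommendation above, Caddy's built-in subcommand is a low-effort option. This is a sketch, not part of Lynkr itself; the hostname is a placeholder, and Caddy obtains the TLS certificate automatically:

```bash
# Terminate HTTPS in front of Lynkr and forward traffic to the local port
caddy reverse-proxy --from lynkr.example.com --to localhost:8081
```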
Yes! Use local providers:
Option 1: Ollama
```bash
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=llama3.1:8b
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
```
Option 2: llama.cpp
```bash
export MODEL_PROVIDER=llamacpp
export LLAMACPP_ENDPOINT=http://localhost:8080
export LLAMACPP_EMBEDDINGS_ENDPOINT=http://localhost:8080/embeddings
```
Result:
- ✅ Zero internet required
- ✅ 100% private (all data stays local)
- ✅ Works in air-gapped environments
- ✅ Full Claude Code CLI + Cursor support
Local data (on machine running Lynkr):
- SQLite databases: `data/` directory
  - `memories.db` - Long-term memories
  - `sessions.db` - Conversation history
  - `workspace-index.db` - Workspace metadata
- Configuration: `.env` file
- Logs: stdout (or log file if configured)
Provider data:
- Cloud providers: Sent to provider (Databricks, Bedrock, OpenRouter, etc.)
- Local providers: Stays on your machine (Ollama, llama.cpp)
Privacy recommendation: Use Ollama or llama.cpp for 100% local, private operation.
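To see exactly what has been persisted on a given machine, list the data directory. This assumes Lynkr is run from its install directory so that `data/` sits alongside it:

```bash
# Inspect the local state Lynkr keeps; removing these files wipes it
ls -lh data/
# expected files: memories.db  sessions.db  workspace-index.db
```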
- Troubleshooting Guide - Common issues and solutions
- GitHub Discussions - Community Q&A
- GitHub Issues - Report bugs
- Documentation - Complete guides
- Check GitHub Issues for existing reports
- If new, create an issue with:
- Lynkr version
- Provider being used
- Full error message
- Steps to reproduce
- Debug logs (with `LOG_LEVEL=debug`)
See Contributing Guide for:
- Code contributions
- Documentation improvements
- Bug reports
- Feature requests
Apache 2.0 - Free and open source.
You can:
- ✅ Use commercially
- ✅ Modify the code
- ✅ Distribute
- ✅ Sublicense
- ✅ Use privately
No restrictions for:
- Personal use
- Commercial use
- Internal company use
- Redistribution
See LICENSE file for details.