Run local AI models directly in VS Code — no cloud, no API keys, no manual server setup.
The extension scans your machine for models, starts the inference server automatically, and exposes a built-in MCP endpoint so any AI client (Claude Desktop, Cursor, Continue.dev) can use your local model too.
| Feature | Description |
|---|---|
| 🔍 Auto model discovery | Scans common folders for GGUF, SafeTensors, PyTorch, and Ollama models |
| ⚡ Auto server start | Starts llama-server or ollama serve automatically — no terminal needed |
| 🌐 MCP server | Built-in MCP endpoint at http://127.0.0.1:3333/mcp for any MCP client |
| 💬 Chat panel | Full streaming chat with your local model |
| 🔧 Code tools | Explain, refactor, and ask about selected code |
| 📁 Add any folder | Browse to any folder to add more models on the fly |
| 🔌 Format support | GGUF · SafeTensors · PyTorch · Ollama |
| Format | Runtime needed | Where to get models |
|---|---|---|
| GGUF | llama.cpp (`llama-server` in PATH) | HuggingFace — search any model + "GGUF" |
| SafeTensors / PyTorch | Python + `transformers` (auto-installed) | HuggingFace — any standard model |
| Ollama | Ollama | `ollama pull llama3` |
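Not sure which runtimes you already have? A minimal sketch to check your PATH for the binaries named in the table (it assumes the standard binary names `llama-server`, `ollama`, and `python3`):

```shell
# Check which of the supported runtimes are already on PATH.
# Assumes the standard binary names: llama-server, ollama, python3.
for bin in llama-server ollama python3; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: found"
  else
    echo "$bin: not installed"
  fi
done
```

Any format whose runtime shows `found` is ready to use immediately; the others need the setup steps below.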
The extension looks for models in these locations automatically:

```
~/models                ~/Models                ~/Downloads
~/Documents             ~/.cache/huggingface/hub
~/.cache/lm-studio/models
~/llama.cpp/models      ~/.local/share/nomic.ai/gpt4all
```
Plus any custom paths you add via the "+ Add folder..." option or settings.
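As a rough approximation of what the scanner does (the real logic lives in `localModelScanner.ts`; the file patterns here are an assumption — the actual scanner also understands HuggingFace cache layouts and Ollama manifests):

```shell
# Sketch of the extension's scan: look for common model file types in the
# default folders listed above. Missing folders are skipped silently.
for dir in ~/models ~/Models ~/Downloads ~/Documents \
           ~/.cache/huggingface/hub ~/.cache/lm-studio/models \
           ~/llama.cpp/models ~/.local/share/nomic.ai/gpt4all; do
  [ -d "$dir" ] && find "$dir" -maxdepth 3 \
      \( -name '*.gguf' -o -name '*.safetensors' -o -name '*.bin' \) 2>/dev/null
done
echo "scan complete"
```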
Pick one of the following (or both):
Option A — GGUF models (fastest)

```shell
# Build llama.cpp (or download a release binary)
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target llama-server -j$(nproc)
sudo cp build/bin/llama-server /usr/local/bin/

# Download any GGUF model to ~/models/
# e.g. from https://huggingface.co/models?library=gguf
```

Option B — Ollama models (easiest)
```shell
# Install from https://ollama.com, then:
ollama pull llama3   # or any model
```

From GitHub Releases (recommended)
- Go to github.com/hit1001/vscode-local-llm-/releases/latest
- Download `local-llm-connect-2.0.0.vsix`
- Install it:

```shell
code --install-extension local-llm-connect-2.0.0.vsix
```

Or install via VS Code UI:

- Open VS Code → Extensions panel (`Ctrl+Shift+X`)
- Click `...` (top-right) → Install from VSIX...
- Select the downloaded `.vsix` file
From VS Code Marketplace (coming soon)
- Search for `Local LLM Connect` in the Extensions panel
From source
```shell
git clone https://github.com/hit1001/vscode-local-llm-
cd vscode-local-llm-
npm install
npm run compile
# Press F5 in VS Code to run, or package with vsce
```

- Open VS Code — look for `⊙ Select Model` in the bottom-right status bar
- Press `Ctrl+Shift+M` to scan and pick a model
- The server starts automatically
- Press `Ctrl+Shift+L` to open the chat panel
The extension starts an MCP server automatically at `http://127.0.0.1:3333/mcp`.
Run `Ctrl+Shift+P` → `Local LLM: Show MCP Connection Info` for a full guide. Quick configs:
Edit `~/.config/claude/claude_desktop_config.json` (other MCP clients such as Cursor take the same `url`/`transport` pair):

```json
{
  "mcpServers": {
    "local-llm": {
      "url": "http://127.0.0.1:3333/mcp",
      "transport": "http"
    }
  }
}
```

Or call the endpoint directly:

```shell
curl -X POST http://127.0.0.1:3333/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {
      "name": "chat",
      "arguments": {
        "messages": [{"role": "user", "content": "Hello! What can you do?"}]
      }
    }
  }'
```

| Tool | Description |
|---|---|
| `chat` | Full conversation with message history |
| `explain_code` | Explain a code snippet |
| `refactor_code` | Suggest code improvements |
| `complete` | Complete any text prompt |
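To see these tools along with their exact input schemas, you can POST a standard MCP `tools/list` request (plain JSON-RPC, no arguments) to the same endpoint, reusing the curl pattern above — the server's response, not this README, is the authoritative source for each tool's argument names:

```json
{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }
```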
| Command | Shortcut | Description |
|---|---|---|
| Local LLM: Select Model | `Ctrl+Shift+M` | Scan machine and pick a model |
| Local LLM: Open Chat | `Ctrl+Shift+L` | Open the chat panel |
| Local LLM: Ask About Selection | `Ctrl+Shift+A` | Ask about selected code/text |
| Local LLM: Explain Selected Code | Right-click menu | Explain selected code |
| Local LLM: Refactor Selected Code | Right-click menu | Refactor selected code |
| Local LLM: Show MCP Connection Info | Command Palette | Get MCP endpoint and connection configs |
| Local LLM: Stop Server | Command Palette | Stop the running inference server |
| Setting | Default | Description |
|---|---|---|
| `localLLM.mcpPort` | `3333` | MCP server port |
| `localLLM.extraModelPaths` | `[]` | Extra directories to scan for models |
| `localLLM.temperature` | `0.7` | Generation temperature (0–2) |
| `localLLM.maxTokens` | `2048` | Max tokens per response |
| `localLLM.systemPrompt` | (coding assistant) | System prompt for all requests |
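For example, in your VS Code `settings.json` (the values below are just the defaults from the table, plus a placeholder extra model path — substitute your own):

```json
{
  "localLLM.mcpPort": 3333,
  "localLLM.extraModelPaths": ["/path/to/your/models"],
  "localLLM.temperature": 0.7,
  "localLLM.maxTokens": 2048
}
```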
```
VS Code Extension
├── localModelScanner.ts — scans filesystem for GGUF / HuggingFace / Ollama models
├── serverManager.ts     — starts llama-server, ollama serve, or Python/transformers server
├── mcpServer.ts         — MCP HTTP server (port 3333) for external AI clients
├── llmClient.ts         — HTTP client for Ollama/OpenAI-compatible APIs (streaming)
├── chatPanel.ts         — WebView chat UI
└── extension.ts         — commands, status bar, activation
```

```
External MCP clients ──→ MCP server (port 3333) ──→ local model
VS Code chat panel   ──→ LLM client ──→ local model
                              ↑
               llama-server / ollama / python
```
```shell
git clone https://github.com/YOUR_USERNAME/local-llm-connect
cd local-llm-connect
npm install
npm run compile   # or: npm run watch
# Press F5 in VS Code to launch the Extension Development Host
```

To package:

```shell
npm install -g @vscode/vsce
vsce package   # → local-llm-connect-2.0.0.vsix
```

Pull requests are welcome. For major changes, open an issue first to discuss.
- Fork the repo
- Create a feature branch (`git checkout -b feature/my-feature`)
- Commit your changes
- Push and open a Pull Request
MIT — see LICENSE