LM Studio exposes both a native v1 REST API and an OpenAI-compatible HTTP server, so you can run local or LAN-hosted models without changing your existing API integration. VT Code supports both the native LM Studio v1 API (introduced in LM Studio 0.4.0) and the OpenAI-compatible endpoints, which means streaming, tool calling, structured outputs, and stateful chats work seamlessly while keeping inference on your own hardware.
- LM Studio 0.4.0+ installed on your machine (download)
- Enable the Developer HTTP server via the LM Studio desktop app or CLI (docs)
- At least one model downloaded inside LM Studio (e.g., Meta Llama 3.1 8B or 3 8B, Qwen2.5 7B, Gemma 2 2B/9B IT, or Phi-3.1 Mini 4K)
1. Install LM Studio: Follow the platform-specific installer from lmstudio.ai/download.

2. Download a model: In the app, open the "Models" tab and pull one of the supported open models (Meta Llama 3.1 8B, Meta Llama 3 8B, Qwen2.5 7B, Gemma 2 2B/9B IT, Phi-3.1 Mini 4K, or any other compatible model hosted in the LM Studio catalog). Alternatively, use the CLI:

   ```bash
   lms get deepseek-r1          # Download by keyword
   lms get <hugging-face-url>   # Download by URL
   ```
3. Start the Developer server:
   - GUI: From the "Developer" panel, enable the server and confirm the port (defaults to 1234).
   - CLI: Run `lms server start` to launch the server. Append `--host 0.0.0.0` to expose it to other machines on your network.
4. Verify the server: Send a quick health check:

   ```bash
   # Native v1 API
   curl http://localhost:1234/api/v1/models

   # OpenAI-compatible API (still supported)
   curl http://localhost:1234/v1/models
   ```
The response lists every model LM Studio currently exposes through the API.
Two optional environment variables control how VT Code reaches the server:

- `LMSTUDIO_BASE_URL` (optional): Override the API endpoint (defaults to `http://localhost:1234/v1` for OpenAI-compatible endpoints, or `http://localhost:1234/api/v1` for the native v1 API). Useful when the server runs on another port or host.
- `LMSTUDIO_API_KEY` (optional): Set when you enable authentication in the LM Studio server (introduced in 0.4.0). Leave unset for local testing without authentication.
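For instance, when the server runs on another machine on your LAN, you might export the overrides before launching VT Code; the host, port, and token below are placeholders, not defaults:

```bash
# Point VT Code at an LM Studio server elsewhere on the network (placeholder host/port)
export LMSTUDIO_BASE_URL="http://192.168.1.50:1234/v1"

# Only needed if authentication is enabled on the server
export LMSTUDIO_API_KEY="replace-with-your-token"

vtcode --provider lmstudio --model lmstudio-community/meta-llama-3.1-8b-instruct
```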
Configure `vtcode.toml` in your workspace to point at LM Studio:

```toml
[agent]
provider = "lmstudio"  # LM Studio provider
default_model = "lmstudio-community/meta-llama-3.1-8b-instruct"

[tools]
default_policy = "prompt"

[tools.policies]
read_file = "allow"
write_file = "prompt"
run_pty_cmd = "prompt"
```

You can also override the provider and model via CLI:

```bash
vtcode --provider lmstudio --model lmstudio-community/qwen2.5-7b-instruct
```

LM Studio 0.4.0+ provides multiple API surfaces:
The native LM Studio v1 API is the recommended surface for new integrations, offering enhanced features:

- `POST /api/v1/chat`: Chat with a model (supports streaming, stateful chats, MCP)
- `GET /api/v1/models`: List available models
- `POST /api/v1/models/load`: Load a model into memory
- `POST /api/v1/models/unload`: Unload a model from memory
- `POST /api/v1/models/download`: Download a model
- `GET /api/v1/models/download/status`: Check download status
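As a rough sketch, a native chat request with curl might look like the following; the payload assumes an OpenAI-style `model`/`messages` shape, so check the LM Studio API reference for the authoritative field names:

```bash
# Sketch only: request fields assume an OpenAI-style payload; verify against the LM Studio docs
curl http://localhost:1234/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/meta-llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Summarize what VT Code does in one sentence."}],
    "stream": false
  }'
```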
The OpenAI-compatible endpoints are maintained for backward compatibility:

- `POST /v1/chat/completions`: Standard OpenAI chat completions
- `POST /v1/responses`: Stateful interactions with `previous_response_id`, custom tools, and MCP support
- `POST /v1/embeddings`: Generate embeddings
- `GET /v1/models`: List models
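For example, a standard chat completion against the local server looks exactly like a request to any OpenAI-compatible backend; the model ID is whatever `GET /v1/models` reports:

```bash
# OpenAI-style chat completion served by the local LM Studio instance
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/meta-llama-3.1-8b-instruct",
    "messages": [
      {"role": "system", "content": "You are a concise coding assistant."},
      {"role": "user", "content": "Explain what a TOML table is."}
    ],
    "stream": true
  }'
```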
LM Studio 0.4.1 added an Anthropic-compatible endpoint:

- `POST /v1/messages`: Anthropic Messages API compatibility
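A minimal sketch of an Anthropic-style request, assuming the server accepts the standard Messages body (`model`, `max_tokens`, `messages`) without the usual Anthropic auth headers:

```bash
# Sketch: standard Anthropic Messages payload; auth headers are typically unnecessary locally
curl http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/meta-llama-3.1-8b-instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello from the Anthropic-compatible endpoint"}]
  }'
```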
VT Code currently uses the OpenAI-compatible endpoints for maximum compatibility. Future versions may migrate to the native v1 API for enhanced features.
The /model picker now lists LM Studio's default catalog so you can select a model
without typing IDs manually. Choose "Custom LM Studio model" to enter any other model
ID exposed by the LM Studio server.
When you sideload a GGUF or add a local GGML/ONNX pipeline through LM Studio, make sure
it appears under the server's GET /v1/models response. Once listed, VT Code can target
it by passing the exact model ID via CLI or configuration.
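For example, you can confirm the exact ID and then pass it straight through; the `jq` filter assumes the usual OpenAI-style `data[].id` response shape, and the model name below is a placeholder:

```bash
# List the model IDs the server currently exposes
curl -s http://localhost:1234/v1/models | jq -r '.data[].id'

# Target a sideloaded model by its exact ID (placeholder name)
vtcode --provider lmstudio --model my-org/my-sideloaded-gguf
```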
LM Studio's API stack supports multiple inference endpoints with varying capabilities:
| Feature | `/api/v1/chat` | `/v1/responses` | `/v1/chat/completions` | `/v1/messages` |
|---|---|---|---|---|
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Stateful chat | ✓ | ✓ | ✗ | ✗ |
| Remote MCPs | ✓ | ✓ | ✗ | ✗ |
| LM Studio MCPs | ✓ | ✓ | ✗ | ✗ |
| Custom tools | ✗ | ✓ | ✓ | ✓ |
| Assistant messages | ✗ | ✓ | ✓ | ✓ |
| Model load events | ✓ | ✗ | ✗ | ✗ |
| Prompt processing events | ✓ | ✗ | ✗ | ✗ |
| Context length control | ✓ | ✗ | ✗ | ✗ |
VT Code forwards tool definitions, function-calling metadata, and JSON schema expectations so models can call tools or produce structured output. Streaming is enabled by default, and you will see incremental tokens in the TUI just as you would with remote OpenAI deployments.
Because the provider shares the OpenAI surface area, features such as `parallel_tool_calls`, reasoning effort flags, and JSON Schema validation behave consistently, subject to the capabilities of the model you are running locally.
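As an illustration, the hypothetical `get_weather` tool below is declared in the OpenAI function-calling format; whether the model actually emits a tool call depends on the model you have loaded:

```bash
# Hypothetical tool definition sent through the OpenAI-compatible endpoint
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```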
- Stateful Chats: Use `previous_response_id` to maintain conversation context across requests (see the sketch after this list)
- MCP via API: Access Model Context Protocol tools through the API
- Authentication: Configure API tokens for secure access
- Model Management: Load, unload, and download models programmatically
- Idle TTL: Set time-to-live for models loaded via API (auto-evict after inactivity)
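The stateful-chat flow can be sketched as two `/v1/responses` calls; the `input` and `previous_response_id` fields assume the OpenAI Responses request shape, so adjust them to whatever your LM Studio version documents:

```bash
# Sketch: OpenAI Responses-style request shape assumed
FIRST_ID=$(curl -s http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "lmstudio-community/meta-llama-3.1-8b-instruct",
       "input": "Pick a random city and remember it."}' | jq -r '.id')

# Follow-up request that reuses the server-side conversation state
curl http://localhost:1234/v1/responses \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"lmstudio-community/meta-llama-3.1-8b-instruct\",
       \"input\": \"Which city did you pick?\",
       \"previous_response_id\": \"$FIRST_ID\"}"
```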
- Connection refused: Ensure the LM Studio server is running and that `LMSTUDIO_BASE_URL` points to the correct host/port. Default is `http://localhost:1234`.
- Model not found: Confirm the model appears in the LM Studio catalog and that the server exposes it via `GET /api/v1/models` or `GET /v1/models`.
- 401 Unauthorized: Provide the configured API key through `LMSTUDIO_API_KEY` if authentication is enabled (LM Studio 0.4.0+).
- Slow responses: Local inference speed depends on your hardware and the model size. Consider using smaller models (Gemma 2 2B, Qwen2.5 7B) for faster iteration.
- Tool payload errors: Check the LM Studio server logs to ensure your runtime supports the tools and structured outputs you are invoking.
- Server not starting: Run `lms server start` from the command line to see detailed error messages.
- Model download fails: Use `lms get <model>` to download models directly via CLI.
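When it is unclear which of these applies, a quick diagnostic pass with the commands already covered above usually narrows it down:

```bash
# Is the server reachable at the expected address?
curl -s http://localhost:1234/v1/models || echo "server unreachable"

# Restart the server from the CLI so startup errors are printed to the terminal
lms server start

# Re-download the model if an earlier pull failed (placeholder keyword)
lms get qwen2.5-7b-instruct
```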