AbstractCore is a unified Python interface for cloud, gateway, and local LLM providers. The default install is lightweight; add only the extras your application needs.
- Python 3.9+
pip
Extras compose. For example, abstractcore[remote,media,tools] installs hosted
API SDKs plus document/media handling and built-in tools in one command.
# Core: local HTTP servers and gateways that need no SDK
# Includes Ollama, LM Studio, OpenRouter, Portkey, and OpenAI-compatible /v1 endpoints
pip install abstractcore
# Hosted API SDKs (OpenAI + Anthropic). OpenRouter/Portkey still work from core.
pip install "abstractcore[remote]"
# Individual provider SDKs / local runtimes
pip install "abstractcore[openai]" # OpenAI SDK
pip install "abstractcore[anthropic]" # Anthropic SDK
pip install "abstractcore[huggingface]" # Transformers / torch (heavy)
pip install "abstractcore[apple]" # Apple Silicon local LLM stack (alias of mlx; heavy)
pip install "abstractcore[gpu]" # GPU local LLM stack (alias of vllm; heavy)
pip install "abstractcore[mlx]" # Explicit MLX provider extra
pip install "abstractcore[vllm]" # Explicit vLLM provider extra
# Optional features
pip install "abstractcore[tools]" # built-in tools (web/file/command helpers)
pip install "abstractcore[media]" # images, PDFs, Office docs
pip install "abstractcore[compression]" # glyph visual-text compression (Pillow renderer)
pip install "abstractcore[embeddings]" # EmbeddingManager + local embedding models
pip install "abstractcore[tokens]" # precise token counting (tiktoken)
pip install "abstractcore[server]" # OpenAI-compatible HTTP gateway
# Combine extras (zsh: keep quotes)
pip install "abstractcore[remote,media,tools]"
# Turnkey local-runtime installs
pip install "abstractcore[all-apple]" # Apple Silicon: remote SDKs + HF/GGUF + MLX + features + server
pip install "abstractcore[all-gpu]" # NVIDIA GPU: remote SDKs + HF/GGUF + vLLM + features + serverapple/gpu are hardware-profile aliases for the local LLM engine stack.
Capability extras such as voice, audio, vision, and music install the
lightweight plugin paths used for remote-capable routing. all-apple/all-gpu
are larger aggregate profiles for a full local-development environment,
including local plugin engines where supported.
Local OpenAI-compatible servers (Ollama, LMStudio, vLLM, llama.cpp, LocalAI, etc.) work with the core install; you just point AbstractCore at the server base URL. See Prerequisites for provider setup.
Optional capability plugins (deterministic multimodal outputs):
pip install "abstractcore[voice]" # enables llm.voice / llm.audio via remote-light abstractvoice
pip install "abstractcore[vision]" # enables llm.vision via abstractvision (generative vision)For abstractvoice 0.10.17+, the base AbstractCore plugin path can install on
Python 3.9 and does not install OmniVoice, torch, or torchaudio. Python 3.10+ is
recommended. Local voice engines and voice-clone backends are part of explicit
local aggregate profiles such as abstractcore[all-apple] and
abstractcore[all-gpu].
See: Capabilities and Server.
For generative vision, AbstractCore does not hardcode a local model default.
Configure an AbstractVision/OpenAI-compatible image default and omit model, or
route explicitly with model="diffusers/default",
model="diffusers/<huggingface-repo>", model="sdcpp/default", or
model="openai-compatible/<model>" on /v1/images/generations /
/v1/images/edits. Local Diffusers runs cache-only unless you opt in to
downloads.
AbstractCore uses a provider ID plus a model name:
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
# llm = create_llm("anthropic", model="claude-haiku-4-5")
# llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
# llm = create_llm("lmstudio", model="qwen/qwen3-4b-2507")
# llm = create_llm("openai-compatible", model="default", base_url="http://localhost:1234/v1")Tip: you can omit model=..., but it’s usually better to pass an explicit model to avoid surprises when defaults change.
Open-source-first: start with local providers (Ollama, LMStudio, MLX, HuggingFace), then add cloud or gateway providers as needed.
Gateway providers (OpenRouter, Portkey) examples:
from abstractcore import create_llm
llm_openrouter = create_llm("openrouter", model="openai/gpt-4o-mini")
llm_portkey = create_llm("portkey", model="gpt-5-mini", api_key="PORTKEY_API_KEY", config_id="pcfg_...")Note: gateway providers only forward optional generation params (e.g. temperature, top_p, max_output_tokens) when you explicitly set them.
OpenAI example (requires pip install "abstractcore[openai]"):
from abstractcore import create_llm
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What is the capital of France?")
print(resp.content)Use a session to keep conversation state (system prompt + message history) across turns:
from abstractcore import BasicSession, create_llm
llm = create_llm("openai", model="gpt-4o-mini")
session = BasicSession(provider=llm, system_prompt="You are a helpful assistant.")
print(session.generate("Hello!").content)
print(session.generate("Now continue.").content)For prompt-cache-aware long chats (reuse stable prefixes like system/tools/files), use CachedSession:
- See Prompt Caching.
Many modern models can optionally emit a reasoning/thinking trace (sometimes in a separate channel, sometimes inline). AbstractCore exposes a single unified control:
from abstractcore import create_llm
llm = create_llm("lmstudio", model="qwen3.5-27b@q4_k_m", base_url="http://localhost:1234/v1")
# Disable thinking (tries to suppress any reasoning trace)
resp = llm.generate("Compute 17*23 - 19*11. Reply with the integer only.", thinking="none")
print(resp.content)
# Enable thinking (levels are best-effort; not all backends support budgets)
resp = llm.generate("Solve a hard logic puzzle.", thinking="high")
print(resp.content)
print(resp.metadata.get("reasoning")) # when the backend exposes itNotes:
- For Qwen3 / Qwen3.5 on LM Studio, AbstractCore uses LM Studio’s model template variables (
enable_thinking/enableThinking) and a Qwen template “hard switch” forthinking="none"(empty<think></think>), rather than injecting “Reasoning effort …” text into the system prompt. - For Qwen3 / Qwen3.5 GGUF via HuggingFaceProvider (llama-cpp-python), there is no template-kwargs knob exposed by llama-cpp-python today, so
thinking="none"also uses the Qwen hard-switch marker. If GGUF loading fails due to huge advertised context windows, AbstractCore will retry with smallern_ctxvalues (best-effort); you can also passmax_tokens=...when constructingHuggingFaceProvider()to explicitly control llama.cppn_ctx. - For Ollama, enabling thinking may consume a lot of output tokens in the thinking channel; consider using a larger
max_output_tokenswhenthinkingis enabled.
For server usage (OpenAI-compatible HTTP), see Server and Generation Parameters.
from abstractcore import create_llm
llm = create_llm("ollama", model="qwen3:4b-instruct-2507-q4_K_M")
for chunk in llm.generate("Write a short poem about distributed systems.", stream=True):
print(chunk.content or "", end="", flush=True)AbstractCore supports native tool calling (when the provider supports it) and prompted tool syntax (when it doesn’t).
By default, tool execution is pass-through (execute_tools=False): you get tool calls in resp.tool_calls, and your host/runtime decides how to execute them.
In the AbstractFramework ecosystem, AbstractRuntime is the recommended runtime for executing tool calls durably (policy, retries, persistence). See Architecture and Tool Calling.
from abstractcore import create_llm, tool
@tool
def get_weather(city: str) -> str:
return f"{city}: 22°C and sunny"
llm = create_llm("openai", model="gpt-4o-mini")
resp = llm.generate("What's the weather in Paris? Use the tool.", tools=[get_weather])
print(resp.content)
print(resp.tool_calls)See Tool Calling and Tool Syntax Rewriting (tool_call_tags, server agent_format).
Note:
- If you pass both
tools=[...]andresponse_model=...togenerate(), AbstractCore uses a 2-pass hybrid flow (tool-capable call, then structured-output call). Streaming is not supported in this hybrid mode.
If you want a ready-made toolset for agentic scripts, install:
pip install "abstractcore[tools]"Then import from abstractcore.tools.common_tools:
skim_websearchvsweb_search: compact/filtered links vs full resultsskim_urlvsfetch_url: fast URL triage (small output) vs full fetch + parsing for text-first types (HTML/JSON/text)
See Tool Calling for a recommended workflow and the full built-in tool list.
Pass a Pydantic model via response_model=... to get a typed result back (instead of parsing JSON yourself):
from pydantic import BaseModel
from abstractcore import create_llm
class Answer(BaseModel):
title: str
bullets: list[str]
llm = create_llm("openai", model="gpt-4o-mini")
answer = llm.generate("Summarize HTTP/3 in 3 bullets.", response_model=Answer)
print(answer.bullets)See Structured Output for strategy details and limitations.
Images and document extraction require pip install "abstractcore[media]" (Pillow + PDF/Office deps).
from abstractcore import create_llm
llm = create_llm("anthropic", model="claude-haiku-4-5")
resp = llm.generate("Describe the image.", media=["./image.png"])
print(resp.content)Audio and video attachments are also supported, but they are policy-driven (no silent semantic changes):
- audio:
audio_policy(native_only|speech_to_text|auto|caption) - video:
video_policy(native_only|frames_caption|auto)
Speech-to-text fallback (audio_policy="speech_to_text") typically requires installing abstractvoice (capability plugin).
What you need (quick checklist):
- Images:
abstractcore[media]+ either a vision-capable model (VLM/VL) or configured vision fallback (abstractcore --set-vision-provider PROVIDER MODEL). - Video:
ffmpeg/ffprobeonPATH+ either a vision-capable model or configured vision fallback (for frame sampling). Native video input is model/provider dependent. - Audio: either an audio-capable model or speech-to-text fallback via
abstractvoice+audio_policy="auto"/"speech_to_text".
Defaults can be configured via the config CLI (abstractcore --config, abstractcore --status). See Centralized Config.
If your main model is text-only, you can configure vision fallback (two-stage captioning) so images are automatically described and injected as short observations. See Media Handling, Vision Capabilities, and Centralized Config.
For long documents, AbstractCore can optionally apply Glyph visual-text compression. Install pip install "abstractcore[compression]" (and pip install "abstractcore[media]" for PDFs) and see Glyph Visual-Text Compression.
import asyncio
from abstractcore import create_llm
async def main():
llm = create_llm("openai", model="gpt-4o-mini")
resp = await llm.agenerate("Give me 3 bullet points about HTTP caching.")
print(resp.content)
asyncio.run(main())# Configure defaults and API keys
abstractcore --config
abstractcore --status
# Interactive chat
abstractcore-chat --provider openai --model gpt-4o-mini- Prerequisites — provider setup (keys, base URLs, hardware notes)
- FAQ — common questions and setup gotchas
- Examples — end-to-end patterns and recipes
- API (Python) — public API map and common patterns
- API Reference — complete function/class listing
- Troubleshooting — common errors and fixes
- Server — OpenAI-compatible HTTP gateway
- Endpoint — single-model OpenAI-compatible endpoint (one provider/model per worker)