AbstractCore Server

Transform AbstractCore into an OpenAI-compatible API server. One server, all models, any client.

If you want a dedicated single-model /v1 server (one provider/model per worker), see Endpoint.

Interactive API docs (start here)

Visit while the server is running:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
Lightweight endpoint index: http://localhost:8000/docs-lite

Swagger UI keeps its standard Authorize button when server auth is enabled. When ABSTRACTCORE_AUTH_TOKEN is set, AbstractCore wraps that authorize flow and validates the entered bearer token through /acore/auth/validate before Swagger stores it for "Try it out" requests. Invalid tokens stay unauthorized and render an auth error inside the modal.

The docs and OpenAPI schema are public by default so the UI can load before authentication, but API operations remain protected. Set ABSTRACTCORE_SERVER_PROTECT_DOCS=1 if you also want /docs, /docs-lite, /redoc, and /openapi.json behind server auth.

When server auth is disabled, the server bearer scheme is omitted from the docs, so Swagger does not render a misleading server-token authorize flow.

The OpenAPI schema includes executable examples for every request body. JSON examples intentionally show optional aliases as null when sending both fields would be ambiguous; the server drops nulls before routing. For local/custom OpenAI-compatible endpoints, set base_url only when you intentionally want to route away from the provider's default API host.

Quick Start

Install and Run (2 minutes)

# Install
pip install "abstractcore[server]"

# Configure server auth and provider keys
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"
export OPENAI_API_KEY="sk-..."

# Start server
python -m abstractcore.server.app

# Or with uvicorn directly
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

# Test
curl http://localhost:8000/health
# Response: {"status":"healthy"}

First Request

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Or with Python:

import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # matches the OPENAI_API_KEY exported in Quick Start
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)

Configuration

You can configure the server through environment variables or through AbstractCore's centralized config. Environment variables always take precedence over config-persisted values.

# Persisted local/server config
abstractcore --set-server-auth-token acore-server-secret
abstractcore --set-api-key openai sk-...
abstractcore --set-api-key anthropic sk-ant-...
abstractcore --set-api-key openrouter sk-or-...
abstractcore --set-api-key portkey pk_...

# Optional hardening/defaults
abstractcore --set-server-base-url-allowlist "https://example.com/v1"
abstractcore --set-server-url-fetch-allowlist "https://files.example.com"
abstractcore --set-server-media-root /srv/abstractcore-media
abstractcore --set-server-host 127.0.0.1
abstractcore --set-server-port 8000
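
The precedence rule above (environment first, then persisted config) can be pictured as a two-step lookup. This is only a sketch: `resolve_setting` and the `persisted` dictionary are hypothetical names rather than AbstractCore's implementation, and treating an empty environment value as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    persisted: Mapping[str, str],
    persisted_key: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the environment value if set, else the persisted value, else None."""
    env = os.environ if env is None else env
    value = env.get(env_name)
    if value:  # assumption: an empty string counts as "not set"
        return value
    return persisted.get(persisted_key)


# Hypothetical persisted values, as if written by the --set-server-* commands above.
persisted = {"server_port": "8000", "server_host": "127.0.0.1"}

# PORT is exported, so it wins; HOST is not, so the persisted value is used.
fake_env = {"PORT": "9000"}
port = resolve_setting("PORT", persisted, "server_port", env=fake_env)
host = resolve_setting("HOST", persisted, "server_host", env=fake_env)
print(port, host)
```

Passing `env` explicitly keeps the lookup easy to test without touching the real process environment.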

Environment Variables

# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
export PORTKEY_API_KEY="pk_..."         # optional (Portkey)
export PORTKEY_CONFIG="pcfg_..."        # required for Portkey routing

# Server auth token. Authenticated clients can use all server-configured providers.
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"

# Optional: also protect /docs, /docs-lite, /redoc, and /openapi.json.
export ABSTRACTCORE_SERVER_PROTECT_DOCS=1

# Local providers
export OLLAMA_BASE_URL="http://localhost:11434"          # (or legacy: OLLAMA_HOST)
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export VLLM_BASE_URL="http://localhost:8000/v1"          # vLLM's default port is also 8000; move one of the two if both run on the same host
export OPENAI_BASE_URL="http://localhost:1234/v1"
export OPENAI_API_KEY="your-endpoint-key"                # optional, if the endpoint requires auth

# Server bind (only used by `python -m abstractcore.server.app`)
export HOST="0.0.0.0"
export PORT="8000"

# Debug mode
export ABSTRACTCORE_DEBUG=true

# Dangerous (multi-tenant hazard): allow unload_after for providers that can unload shared server state (e.g. Ollama).
# Left commented out on purpose; uncomment only if you accept that risk.
# export ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1

# Server security controls (recommended)
#
# - Request-level base_url overrides are loopback-only by default.
#   URL entries match scheme + exact host + default/explicit port + path-segment prefix.
#   Bare entries match hostname globs, e.g. "*.example.com".
export ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST="https://api.openai.com,https://example.com/v1"
#
# - Remote URL fetches for attachments are blocked for private/loopback/link-local targets by default (SSRF protection).
#   To allow specific hosts/prefixes, use the same structured allowlist syntax:
export ABSTRACTCORE_SERVER_URL_FETCH_ALLOWLIST="https://www.berkshirehathaway.com"
#
# - Local file paths in HTTP requests are disabled by default (including @/path/to/file in message strings).
#   To allow local file paths safely, restrict them under a single directory:
export ABSTRACTCORE_SERVER_MEDIA_ROOT="/srv/abstractcore-media"
#
# - Unsafe escape hatch: allow arbitrary local file paths from HTTP requests (not recommended).
#   Left commented out on purpose; uncomment only if you accept that risk.
# export ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1
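
To make the allowlist syntax described in the comments above concrete, here is a minimal sketch of how such matching could work: URL entries compare scheme, exact host, effective port, and path-segment prefix, while bare entries are treated as hostname globs. The function names are hypothetical and the server's real matcher may differ in details (for example in how it normalizes hosts or trailing slashes).

```python
from fnmatch import fnmatch
from urllib.parse import SplitResult, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _effective_port(parts: SplitResult):
    """Explicit port if present, otherwise the scheme's default port."""
    return parts.port or DEFAULT_PORTS.get(parts.scheme)


def _segments(path: str):
    """Split a path into non-empty segments so /v1 does not match /v1beta."""
    return [segment for segment in path.split("/") if segment]


def entry_allows(entry: str, url: str) -> bool:
    target = urlsplit(url)
    host = (target.hostname or "").lower()
    if "://" not in entry:
        # Bare entry: hostname glob such as "*.example.com".
        return fnmatch(host, entry.lower())
    rule = urlsplit(entry)
    if rule.scheme != target.scheme:
        return False
    if (rule.hostname or "").lower() != host:
        return False
    if _effective_port(rule) != _effective_port(target):
        return False
    rule_segments = _segments(rule.path)
    return _segments(target.path)[: len(rule_segments)] == rule_segments


def is_allowed(url: str, allowlist: str) -> bool:
    """allowlist is a comma-separated string, like the environment variable."""
    entries = [entry.strip() for entry in allowlist.split(",") if entry.strip()]
    return any(entry_allows(entry, url) for entry in entries)


ALLOWLIST = "https://api.openai.com,https://example.com/v1,*.example.org"
print(is_allowed("https://example.com/v1/chat/completions", ALLOWLIST))
print(is_allowed("https://example.com/v1beta", ALLOWLIST))
```

Comparing whole path segments rather than raw string prefixes is what keeps `https://example.com/v1` from also admitting `https://example.com/v1beta`.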

Startup Options

# Using AbstractCore's built-in CLI
python -m abstractcore.server.app --help                    # View all options
python -m abstractcore.server.app --debug                   # Debug mode
python -m abstractcore.server.app --host 127.0.0.1 --port 8080  # Custom host/port
python -m abstractcore.server.app --debug --port 8001       # Debug on custom port

# Using uvicorn directly
uvicorn abstractcore.server.app:app --reload                # Development with auto-reload
uvicorn abstractcore.server.app:app --workers 4             # Production with multiple workers
uvicorn abstractcore.server.app:app --port 3000             # Custom port

API Endpoints

Endpoint Map

All API operations except GET /health use the same server auth policy: send Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN when ABSTRACTCORE_AUTH_TOKEN is configured. Provider-key overrides use X-AbstractCore-Provider-API-Key. Provider keys in request bodies remain disabled; select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.

Group	Method	Endpoint	Purpose	Main parameters
Health	GET	`/health`	Liveness/version probe; never requires auth	none
Configuration	GET	`/v1/config/capability-defaults`	List explicit input/output/embedding/rerank route defaults	none
Configuration	PUT	`/v1/config/capability-defaults/{kind}/{modality}`	Set one capability route default	path `kind`, `modality`; body `provider`, `model`, `base_url`, `options`
Configuration	DELETE	`/v1/config/capability-defaults/{kind}/{modality}`	Clear one capability route default	path `kind`, `modality`
Discovery	GET	`/v1/models`	List models and filter by provider/capabilities	`provider`, `input_type`, `output_type`, `base_url`, `api_key`
Discovery	GET	`/providers`	Provider status/capabilities	`include_models`
Discovery	GET	`/v1/vision/providers/`	AbstractVision provider catalog for image/video generation models	optional `task`, `provider`, `include_models`, `base_url`, `api_key`
Discovery	GET	`/v1/audio/voices`	AbstractVoice voice/profile catalog for TTS	optional `provider`, `model`, `providers_only`, `base_url`, `api_key`
Discovery	GET	`/v1/audio/speech/models`	AbstractVoice TTS model/provider catalog	optional `provider`, `base_url`, `api_key`
Discovery	GET	`/v1/audio/speech/providers`	AbstractVoice TTS provider catalog	optional `base_url`
Discovery	GET	`/v1/audio/transcriptions/models`	AbstractVoice STT model/provider catalog	optional `provider`, `base_url`, `api_key`
Discovery	GET	`/v1/audio/transcriptions/providers`	AbstractVoice STT provider catalog	optional `base_url`
Discovery	GET	`/v1/voice/clone/providers`	AbstractVoice voice clone provider catalog	optional `base_url`
Chat	POST	`/v1/chat/completions`	OpenAI-compatible chat, streaming, tools, media	`model`, `messages`, `stream`, `tools`, `tool_choice`, `temperature`, `max_tokens`, `base_url`, `agent_format`, `thinking`
Chat	POST	`/{provider}/v1/chat/completions`	Provider-scoped chat route where body model is unprefixed	path `provider`, body `model`, `messages`, chat parameters
Responses	POST	`/v1/responses`	OpenAI Responses API (`object:"response"`) + legacy chat fallback	`model`, `input` or `messages`, `stream`, generation parameters, `base_url`, `agent_format`, `thinking`, `prompt_cache_key`, `prompt_cache_binding`
Embeddings	POST	`/v1/embeddings`	OpenAI-compatible embedding vectors	`model`, `input`, `dimensions`, `encoding_format`, `user`, `base_url`
Images	POST	`/v1/images/generations`	Text-to-image generation	`prompt`, optional `model`, `provider`, `base_url`, `width`, `height`, `size`, `n`, `steps`, `guidance_scale`, `seed`, `quality`, `extra`
Images	POST	`/{provider}/v1/images/generations`	Provider-scoped text-to-image route where body model is unprefixed	path `provider`, body `model`, optional `base_url`, image generation parameters
Images	POST	`/v1/images/edits`	Image edit/inpaint via multipart form	`prompt`, `image`, optional `mask`, `model`, `provider`, `base_url`, `size`, `steps`, `guidance_scale`, `seed`, `extra_json`
Images	POST	`/{provider}/v1/images/edits`	Provider-scoped image edit route where body model is unprefixed	path `provider`, optional `base_url`, image edit form fields
Videos	POST	`/v1/videos/generations`	Text-to-video generation	`prompt`, optional `model`, `provider`, `base_url`, `width`, `height`, `fps`, `num_frames`, `steps`, `guidance_scale`, `extra`
Videos	POST	`/{provider}/v1/videos/generations`	Provider-scoped text-to-video route where body model is unprefixed	path `provider`, body `model`, optional `base_url`, video generation parameters
Videos	POST	`/v1/videos/edits`	Image-to-video via multipart form	`prompt`, `image`, optional `model`, `provider`, `base_url`, `width`, `height`, `fps`, `num_frames`, `extra_json`
Videos	POST	`/{provider}/v1/videos/edits`	Provider-scoped image-to-video route where body model is unprefixed	path `provider`, optional `base_url`, image-to-video form fields
Vision Jobs	POST	`/v1/vision/jobs/images/generations`	Async image generation with polling	same body as `/v1/images/generations`
Vision Jobs	POST	`/v1/vision/jobs/images/edits`	Async image edit with polling	same form fields as `/v1/images/edits`
Vision Jobs	POST	`/v1/vision/jobs/videos/generations`	Async text-to-video with polling and progress events	same body as `/v1/videos/generations`
Vision Jobs	POST	`/v1/vision/jobs/videos/edits`	Async image-to-video with polling and progress events	same form fields as `/v1/videos/edits`
Vision Jobs	GET	`/v1/vision/jobs/{job_id}`	Poll/consume async job state	path `job_id`, query `consume`
Vision Models	GET	`/v1/vision/models`	Available AbstractVision model catalog	optional `task`, `provider`, `base_url`, `api_key`
Audio	POST	`/v1/audio/transcriptions`	Speech-to-text multipart endpoint	`file`, optional `provider`, `model`, `language`, `prompt`, `response_format`, `temperature`, `format`, `base_url`
Audio	POST	`/{provider}/v1/audio/transcriptions`	Provider-scoped speech-to-text route where body model is unprefixed	path `provider`, optional `base_url`, STT form fields
Audio	POST	`/v1/audio/speech`	Text-to-speech endpoint	`input`/`text`, optional `provider`, `model`, `voice`, `response_format`/`format`, `speed`, `instructions`, `profile`, `quality_preset`, `quality`, `base_url`
Audio	POST	`/{provider}/v1/audio/speech`	Provider-scoped text-to-speech route where body model is unprefixed	path `provider`, optional `base_url`, TTS body fields
Audio	POST	`/v1/voice/clone`	AbstractVoice-compatible voice-clone/custom-voice extension	`file`, optional `provider`, `model`, `tts_model`, `cloning_engine`, `base_url`, `name`, `reference_text`, `validate`
Audio	POST	`/{provider}/v1/voice/clone`	Provider-scoped voice-clone route where body model is unprefixed	path `provider`, optional `base_url`, voice-clone form fields
Audio	POST	`/v1/audio/translations`	Reserved OpenAI-compatible translation route	`file`, `model`; returns `501` in this version
Audio	POST	`/v1/audio/music`	Extension endpoint for text-to-music plugins	`prompt`/`input`/`text`, optional `provider`, `model`, `lyrics`, `duration_s`, `seed`, `num_inference_steps`, `guidance_scale`, `format`; requires a music capability plugin
Audio	POST	`/{provider}/v1/audio/music`	Backend-scoped text-to-music route	path `provider`, music body fields
Runtime	POST	`/acore/models/load`	Load and keep warm a task-specific model runtime	optional `task` (`text_generation` default, `image_generation`, `video_generation`, `text_to_video`, `image_to_video`, `tts`, `stt`), `provider`, `model`, `options`, `pin`, `base_url`, `timeout_s`
Runtime	GET	`/acore/models/loaded`	List task-aware loaded runtimes	optional `task`, `provider`, `model`
Runtime	POST	`/acore/models/unload`	Unload a task-specific runtime	`runtime_id` or `provider` + `model`, optional `task`, `base_url`, `options`
Prompt Cache	GET	`/acore/prompt_cache/stats`	Cache stats on a loaded gateway runtime or upstream AbstractEndpoint	`provider` + `model` or `base_url`; provider key header if required
Prompt Cache	GET	`/acore/prompt_cache/capabilities`	Cache capability discovery on a loaded gateway runtime or upstream AbstractEndpoint	`provider` + `model` or `base_url`; provider key header if required
Prompt Cache	POST	`/acore/prompt_cache/set`	Select/create a cache key locally or upstream	`provider` + `model` or `base_url`, `key`, `make_default`, `ttl_s`
Prompt Cache	POST	`/acore/prompt_cache/update`	Prepare prompt/messages/tools locally or upstream	`provider` + `model` or `base_url`, `key`, `prompt` or `messages`, `system_prompt`, `tools`, optional `thinking`, `ttl_s`
Prompt Cache	POST	`/acore/prompt_cache/fork`	Fork one cache key to another locally or upstream	`provider` + `model` or `base_url`, `from_key`, `to_key`, `make_default`, `ttl_s`
Prompt Cache	POST	`/acore/prompt_cache/clear`	Clear local or upstream cache state	`provider` + `model` or `base_url`, optional `key`
Prompt Cache	POST	`/acore/prompt_cache/prepare_modules`	Prepare reusable module/tool context locally or upstream	`provider` + `model` or `base_url`, `namespace`, `modules`, `make_default`, `ttl_s`, `version`
Memory Blocs	POST	`/acore/blocs/upsert_text`	Persist extracted text into the gateway-local bloc store or an upstream AbstractEndpoint bloc store	optional `base_url`, `path`, `content`, optional bloc metadata
Memory Blocs	GET	`/acore/blocs`	List gateway-local or upstream bloc records	optional `base_url`, `sha256`, `bloc_id`
Memory Blocs	GET	`/acore/blocs/record`	Inspect a gateway-local or upstream bloc record	optional `base_url`, `sha256` or `bloc_id`
Memory Blocs	POST	`/acore/blocs/delete`	Delete one bloc with optional live KV safety checks	optional `base_url`, `sha256` or `bloc_id`, `delete_kv`, `clear_loaded`, `force`, `dry_run`
Memory Blocs	GET	`/acore/blocs/kv/manifest`	Inspect a gateway-local or upstream bloc KV manifest	`provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`
Memory Blocs	GET	`/acore/blocs/kv/list`	List manifest-backed bloc KV artifacts	optional `base_url`, `provider`, `model`, `sha256`, `bloc_id`
Memory Blocs	POST	`/acore/blocs/kv/ensure`	Compile or validate a local or upstream provider-backed bloc KV artifact	`provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`, `force_rebuild`, `debug`
Memory Blocs	POST	`/acore/blocs/kv/load`	Load or fork a local or upstream provider-backed bloc KV artifact into a cache key	`provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`, `stable_cache_key`, `key`, `make_default`, `force_rebuild`, `debug`
Memory Blocs	POST	`/acore/blocs/kv/delete`	Delete one bloc KV artifact with live-binding safety	`provider` + `model` or `base_url` when checking live state, `sha256` or `bloc_id`, optional `artifact_path`, `clear_loaded`, `force`, `dry_run`, `debug`
Memory Blocs	POST	`/acore/blocs/kv/prune`	Delete matching bloc KV artifacts by filter	optional `provider`, `model`, `base_url`, `sha256`, `bloc_id`, `clear_loaded`, `force`, `dry_run`, `debug`
Capabilities	GET	`/v1/capabilities`	Inspect optional capability plugin availability and backend metadata	none
Capabilities	GET	`/v1/capabilities/{capability}/providers`	List normalized providers for one capability plugin	path `capability`, optional `task`
Capabilities	GET	`/v1/capabilities/{capability}/models`	List normalized models for one capability plugin	path `capability`, optional `task`, `provider`
Audio	GET	`/v1/audio/music/providers`	List music capability providers	optional `task`
Audio	GET	`/v1/audio/music/models`	List music capability models	optional `task`, `provider`

Capability Routing Defaults

/v1/config/capability-defaults exposes the execution host's explicit route defaults for input, output, embedding, and rerank capabilities. Gateway uses this route as its control-plane source when ABSTRACTCORE_SERVER_BASE_URL points at a remote Core server.

These defaults are configuration only; they do not load a model. Use the runtime residency routes under /acore/models/* to inspect or change provider-loaded state.

Shared Request Conventions

model usually uses provider/model format, for example openai/gpt-4o-mini, anthropic/claude-haiku-4-5, ollama/qwen3:4b, lmstudio/qwen/qwen3-vl-4b, or openai-compatible/my-model.
base_url is an AbstractCore extension for routing a provider to a specific OpenAI-compatible endpoint. Loopback URLs are allowed by default; non-loopback URLs require ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
Media routes also accept an optional provider routing hint. This is mainly useful when you omit model, use a provider-scoped route, or pair a custom base_url with the default local/plugin path.
X-AbstractCore-Provider-API-Key overrides only the requested upstream provider for that request. It does not replace the AbstractCore server token.
Provider keys in request bodies remain disabled; use X-AbstractCore-Provider-API-Key for per-request upstream overrides. Select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.
Remote URL media fetches are SSRF-protected by default. Local file paths are disabled unless ABSTRACTCORE_SERVER_MEDIA_ROOT or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 is configured.
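
The media-root restriction mentioned above amounts to path confinement: a requested path is only accepted if it still lies under the configured directory after resolution. The sketch below illustrates that idea with `pathlib`; `resolve_under_root` is a hypothetical helper, not the server's code, and the server may apply additional checks.

```python
from pathlib import Path


def resolve_under_root(media_root: str, requested: str) -> Path:
    """Resolve `requested` against `media_root` and reject anything that escapes it."""
    root = Path(media_root).resolve()
    # Joining with an absolute path discards the root, so "/etc/passwd" is caught below too.
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{requested!r} is outside the media root") from None
    return candidate


MEDIA_ROOT = "/srv/abstractcore-media"
safe = resolve_under_root(MEDIA_ROOT, "images/cat.png")
print(safe)

try:
    resolve_under_root(MEDIA_ROOT, "../secrets.txt")
except PermissionError as exc:
    print("rejected:", exc)
```

Resolving before comparing matters because `..` segments and symlinks can otherwise step outside the root while the raw string still looks harmless.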

Chat Completions

Endpoint: POST /v1/chat/completions

Standard OpenAI-compatible endpoint. Works with all providers.

Server auth:

If ABSTRACTCORE_AUTH_TOKEN is configured, every non-health endpoint requires Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN. Authenticated clients can use all provider keys/endpoints configured on the server.
If ABSTRACTCORE_AUTH_TOKEN is not configured, either set ABSTRACTCORE_SERVER_ALLOW_UNAUTHENTICATED=1 for intentional local/dev use, or provide an upstream provider key explicitly via X-AbstractCore-Provider-API-Key.
Health checks (GET /health) are always unauthenticated.

Request:

{
  "model": "provider/model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Key Parameters:

model (required): Prefer "provider/model-name" (e.g., "openai/gpt-4o-mini"). If you pass a bare model name (no /), the server tries to auto-detect a provider on a best-effort basis.
messages (required): Array of message objects
stream (optional): Enable streaming responses
tools (optional): Tools for function calling
agent_format (optional, AbstractCore extension): Tool-call syntax output format for agentic clients ("auto"|"openai"|"codex"|"qwen3"|"llama3"|"gemma"|"xml"|"passthrough"). When omitted, the server auto-detects from user-agent + model heuristics.
api_key (deprecated/disabled, AbstractCore extension): Provider API keys are not accepted in request bodies. Configure provider keys on the server or use X-AbstractCore-Provider-API-Key for a per-request provider override. Select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.
base_url (optional, AbstractCore extension): Override the provider endpoint (include /v1 for OpenAI-compatible servers like LM Studio / vLLM / OpenRouter)
unload_after (optional, AbstractCore extension): If true, calls llm.unload_model(model) after the request completes. Disabled for ollama/* unless ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1.
prompt_cache_key (optional, AbstractCore extension): Best-effort prompt caching key (semantics depend on provider/backend). See docs/prompt-caching.md.
prompt_cache_binding (optional, AbstractCore extension): Exact durable bloc binding returned by /acore/blocs/kv/load. When supplied, the server verifies the cache key before generation or streaming; stale/missing bindings return 409.
prompt_cache_retention (optional, AbstractCore extension): Prompt cache retention policy (OpenAI: "in_memory" or "24h"; ignored by other providers). See docs/prompt-caching.md.
thinking (optional, AbstractCore extension): Unified thinking/reasoning control (null|"auto"|"on"|"off"|"none" or "low"|"medium"|"high"|"xhigh" when supported). Note: "none" is treated as an alias for "off".
temperature, max_tokens, top_p: Standard LLM parameters

Thinking (AbstractCore extension)

The server forwards thinking to the underlying provider using AbstractCore’s unified thinking mapping (see Generation Parameters).

Example (route to LM Studio + Qwen3.5, disable thinking):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "lmstudio/qwen3.5-27b@q4_k_m",
    "base_url": "http://localhost:1234/v1",
    "messages": [{"role": "user", "content": "Compute 17*23 - 19*11. Reply with the integer only."}],
    "thinking": "none",
    "max_tokens": 64
  }'

Notes:

For Qwen3 / Qwen3.5 on LM Studio, thinking="none" maps to LM Studio’s template variables (enable_thinking / enableThinking) plus a Qwen template “hard switch” fallback (empty <think></think>) when needed. This avoids injecting “reasoning effort” instructions into the system prompt.
Not every backend supports per-effort budgets for low|medium|high; when unavailable, levels degrade to “thinking enabled”.
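
The two rules stated above ("none" is an alias for "off", and effort levels degrade to plain "thinking enabled" when a backend has no per-effort budgets) can be sketched as a small normalizer. `normalize_thinking` and its `supports_effort` flag are hypothetical; the real mapping lives inside AbstractCore and is provider-specific.

```python
from typing import Optional

SWITCHES = ("auto", "on", "off")
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")


def normalize_thinking(value: Optional[str], supports_effort: bool = True) -> Optional[str]:
    """Map a request-level `thinking` value to the value a backend would act on."""
    if value is None:
        return None  # leave the provider default untouched
    if value == "none":
        return "off"  # documented alias
    if value in SWITCHES:
        return value
    if value in EFFORT_LEVELS:
        # Without per-effort budgets, any level simply means "thinking enabled".
        return value if supports_effort else "on"
    raise ValueError(f"unsupported thinking value: {value!r}")


print(normalize_thinking("none"))
print(normalize_thinking("high", supports_effort=False))
```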

Example with streaming:

import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

stream = client.chat.completions.create(
    model="ollama/qwen3-coder:30b",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
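
Clients that do not use the OpenAI SDK can read the stream by hand. OpenAI-style streaming sends server-sent events in which each event is a `data: {json}` line and the stream ends with `data: [DONE]`. The sketch below assumes the AbstractCore server follows that framing, which is implied by its OpenAI compatibility but not verified here; `iter_content` is a hypothetical helper that works on any iterable of decoded lines.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Canned lines standing in for a real HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once"}}]}',
    'data: {"choices":[{"delta":{"content":" upon"}}]}',
    "data: [DONE]",
    'data: {"choices":[{"delta":{"content":"ignored"}}]}',
]
story = "".join(iter_content(sample))
print(story)  # Once upon
```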

Provider `base_url` override (AbstractCore extension)

Route a provider to a specific OpenAI-compatible endpoint, such as a local LM Studio server or an allowlisted remote host:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "lmstudio/qwen/qwen3-4b-2507",
    "base_url": "http://localhost:1234/v1",
    "messages": [{"role": "user", "content": "Hello from a remote LM Studio endpoint"}]
  }'

Security notes:

Request-level base_url overrides are loopback-only by default. To allow additional origins or host globs, set ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST. URL entries are parsed and matched on scheme, exact host, effective port, and path-segment prefix.
If the server has an environment provider key set (e.g. OPENAI_API_KEY) and you route to a non-loopback base_url, the request is refused unless the provider key was supplied explicitly with X-AbstractCore-Provider-API-Key, or with Authorization when server auth is disabled.
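
The second security note in this section describes a guard against sending a server-held provider key to an arbitrary host. A rough sketch of that decision follows. Both helpers are hypothetical, the allowlist check is deliberately left out, and treating the literal name `localhost` as loopback without DNS resolution is an assumption.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True for localhost, 127.0.0.0/8, and ::1 targets."""
    host = urlsplit(url).hostname or ""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname other than localhost


def is_refused(base_url: str, server_has_env_key: bool, explicit_provider_key: Optional[str]) -> bool:
    """Refuse when a server-held key would otherwise travel to a non-loopback base_url."""
    if is_loopback_url(base_url):
        return False
    if explicit_provider_key:
        return False  # the caller brought their own key for this upstream
    return server_has_env_key


print(is_refused("https://example.com/v1", True, None))
print(is_refused("http://localhost:1234/v1", True, None))
```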

Provider Authentication

Do not put provider keys in request bodies. Those fields are disabled because they leak through logs, shell history, browser history, and reverse proxies. For discovery/model catalog endpoints, an api_key query parameter exists for tooling/Swagger UI convenience, but headers remain preferred.

# Preferred: configure provider keys on the server and authenticate to AbstractCore.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

When ABSTRACTCORE_AUTH_TOKEN is not configured, either set ABSTRACTCORE_SERVER_ALLOW_UNAUTHENTICATED=1 for intentional local/dev use, or provide an upstream provider key explicitly via X-AbstractCore-Provider-API-Key. Once server auth is enabled, Authorization is reserved for the AbstractCore server auth token and is never forwarded upstream.

To override a single upstream provider while still using the server auth token, send the provider key in X-AbstractCore-Provider-API-Key. The override applies only to the requested provider:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "X-AbstractCore-Provider-API-Key: $ANTHROPIC_API_KEY" \
  -d '{
    "model": "anthropic/claude-haiku-4-5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
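
The same header layout can be assembled in Python with only the standard library. This is a sketch under the conventions described above: the server token goes in Authorization, the optional upstream key goes in X-AbstractCore-Provider-API-Key, and no key ever appears in the JSON body. `build_headers` and `build_chat_request` are hypothetical helper names.

```python
import json
from typing import Dict, Optional
from urllib.request import Request


def build_headers(server_token: Optional[str], provider_key: Optional[str] = None) -> Dict[str, str]:
    """Server token in Authorization; per-request upstream key in its own header."""
    headers = {"Content-Type": "application/json"}
    if server_token:
        headers["Authorization"] = f"Bearer {server_token}"
    if provider_key:
        headers["X-AbstractCore-Provider-API-Key"] = provider_key
    return headers


def build_chat_request(base: str, model: str, prompt: str, headers: Dict[str, str]) -> Request:
    """Build (but do not send) a chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return Request(
        f"{base.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


headers = build_headers("acore-server-secret", provider_key="sk-ant-example")
request = build_chat_request("http://localhost:8000", "anthropic/claude-haiku-4-5", "Hello!", headers)
print(request.get_method(), request.full_url)
```

With the server running, `urllib.request.urlopen(request)` would send it; the placeholder secrets here would of course need to be replaced with real values.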

Provider-Specific Chat Route

Endpoint: POST /{provider}/v1/chat/completions

This route is useful for clients that already route by base URL path and expect the body model to be provider-local. It is equivalent to using POST /v1/chat/completions with model="{provider}/{model}".

Parameters:

Path provider (required): provider route prefix such as openai, anthropic, ollama, openrouter, portkey, lmstudio, vllm, or openai-compatible.
Body model (required): provider-local model id, without the provider prefix.
Body messages, stream, tools, tool_choice, agent_format, thinking, base_url, and other chat parameters behave like /v1/chat/completions.

Example:

curl -X POST http://localhost:8000/openai/v1/chat/completions \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
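
The equivalence between the two routes can be expressed as a small transformation of the request. The sketch below assumes the provider is everything before the first slash, which matches ids such as lmstudio/qwen/qwen3-vl-4b used elsewhere in this document; `to_provider_scoped` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def to_provider_scoped(payload: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Turn a /v1/chat/completions body into (provider-scoped path, body)."""
    provider, sep, local_model = payload["model"].partition("/")
    if not sep or not provider or not local_model:
        raise ValueError("expected a 'provider/model' id")
    scoped = dict(payload, model=local_model)  # copy; the input is left unchanged
    return f"/{provider}/v1/chat/completions", scoped


path, body = to_provider_scoped(
    {"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}
)
print(path, body["model"])
```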

Media generation endpoints (optional)

AbstractCore Server can optionally expose OpenAI-compatible image/video generation and audio endpoints.

Important notes:

These endpoints favor interoperability: they return b64_json or raw bytes inline and do not promise durable storage of generated artifacts.
If the required plugin/backend is not available, the server returns 501 with actionable messaging.

Capability catalogs

Thin clients can preflight the configured media surface without importing abstractvision or abstractvoice directly:

Endpoint	Purpose	Notes
`GET /v1/vision/providers/`	Lists provider image/video catalog entries through the selected AbstractVision backend.	Optional `task`, `provider`, `include_models`, `base_url`, `api_key`. Set `include_models=true` to include full provider model catalogs (slower).
`GET /v1/audio/voices`	Lists TTS profiles/voices, active profile, active model, and bounded catalog data through AbstractVoice.	Optional `provider`, `model`, `providers_only`, `base_url`, `api_key`.
`GET /v1/audio/speech/models`	TTS model id projection with provider/model route strings.	Includes `models_by_provider` and `provider_models` for clients that route via `provider/model`.
`GET /v1/audio/speech/providers`	TTS provider projection.	Useful for clients that pick `/{provider}/v1/audio/speech` first and then choose a model.
`GET /v1/audio/transcriptions/models`	STT model id projection with provider/model route strings.	Includes `models_by_provider` and `provider_models`.
`GET /v1/audio/transcriptions/providers`	STT provider projection.	Mirrors speech provider discovery for `/v1/audio/transcriptions`.
`GET /v1/voice/clone/providers`	Voice cloning provider projection.	Uses AbstractVoice clone provider availability.

These routes instantiate only the selected capability backend needed for deep catalog discovery. Shallow plugin availability remains available through the library llm.capabilities.status() call. Server-held provider keys remain behind server auth; per-request upstream key overrides must use X-AbstractCore-Provider-API-Key. For tooling/Swagger UI convenience, these catalog routes also accept an api_key query parameter (redacted from server logs).
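
When a tool does pass api_key as a query parameter, it is worth redacting it on the client side as well before the URL reaches any log. The sketch below builds a catalog URL from the documented query parameters and masks the key for logging. `catalog_url` and `redact_api_key` are hypothetical helpers, and the `"true"` spelling for include_models is an assumption about how the server parses booleans.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def catalog_url(base: str, path: str, **params: str) -> str:
    """Join base, path, and any query parameters that are not None."""
    query = urlencode({name: value for name, value in params.items() if value is not None})
    return f"{base.rstrip('/')}{path}" + (f"?{query}" if query else "")


def redact_api_key(url: str, placeholder: str = "REDACTED") -> str:
    """Replace the value of any api_key query parameter before logging."""
    parts = urlsplit(url)
    pairs = [
        (name, placeholder if name == "api_key" else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = catalog_url(
    "http://localhost:8000",
    "/v1/vision/providers/",
    include_models="true",
    api_key="sk-secret",
)
safe_url = redact_api_key(url)
print(safe_url)
```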

Images (generate/edit)

Endpoints:

POST /v1/images/generations
POST /{provider}/v1/images/generations
POST /v1/images/edits
POST /{provider}/v1/images/edits

Remote OpenAI-compatible image proxying is included in abstractcore[server] and is enabled by setting OPENAI_BASE_URL. The synchronous image routes use the same internal generate(..., output="image") dispatcher as the Python API, then serialize the result back to the OpenAI-compatible b64_json response shape.

Install for remote image proxying:

pip install "abstractcore[server]"

Install local image backends only when you want the server to load Diffusers, MLX-Gen, or stable-diffusion.cpp models itself:

pip install "abstractcore[server,vision]"

Use provider/model-style image ids:

Omit model only when this server has a configured AbstractVision/OpenAI-compatible image default, for example via OPENAI_BASE_URL plus an optional default model id.
Provider-scoped routes such as /openai-compatible/v1/images/generations and /diffusers/v1/images/generations accept an unprefixed body model and internally route it as provider/model, matching /{provider}/v1/chat/completions.
diffusers/default selects the configured local Diffusers default: ABSTRACTCORE_VISION_MODEL_ID / ABSTRACTVISION_DIFFUSERS_MODEL_ID / ABSTRACTVISION_MODEL_ID.
diffusers/<huggingface-repo> selects an explicit local Diffusers model.
mlx-gen/default selects the configured local MLX-Gen model; use AbstractVision's q4 AbstractFramework presets by default and q8 variants when quality is paramount.
mlx-gen/<exact-huggingface-repo> selects an explicit cached MLX-Gen model such as mlx-gen/AbstractFramework/flux.2-klein-4b-4bit or mlx-gen/AbstractFramework/qwen-image-edit-2511-4bit. Official MLX-Gen runtime snapshots such as mlx-gen/briaai/FIBO and mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers are selected the same way. Legacy mflux prefixes remain accepted as compatibility aliases, but the model id itself must be the exact published repo id.
sdcpp/default selects the configured stable-diffusion.cpp model.
openai-compatible/<model> routes to the configured OpenAI-compatible image endpoint.
openai/gpt-image-1 or provider-scoped /openai/v1/images/generations routes to OpenAI's Images API and uses OPENAI_API_KEY when an explicit AbstractVision upstream base URL is not configured.

Local Diffusers generation is cache-only by default; set ABSTRACTCORE_VISION_ALLOW_DOWNLOAD=1 or ABSTRACTVISION_DIFFUSERS_ALLOW_DOWNLOAD=1 only when runtime downloads are intentional.

POST /v1/images/generations JSON parameters:

Field	Required	Notes
`prompt`	yes	Text prompt to render.
`model`	no	Omit for the server's configured AbstractVision default. If present, use provider/model routing: `diffusers/default`, `diffusers/<huggingface-repo>`, `mlx-gen/default`, `mlx-gen/<exact-huggingface-repo>`, `sdcpp/default`, `openai-compatible/<model>`, or `openai/gpt-image-1`. Provider-scoped routes accept the same model without the prefix.
`provider`	no	Optional routing hint when you want the configured default model/backend for a specific provider, or when pairing a request with `base_url`.
`width`, `height`	no	Requested output dimensions in pixels. These are the natural fields for local engines and remain accepted for remote routes.
`size`	no	OpenAI-style size such as `1024x1024`. The server normalizes `size` with `width`/`height` so OpenAI-style and local-engine clients can use the same route.
`n`	no	Number of images; clamped to `1..10`.
`response_format`	no	Server response format. `b64_json` is the supported response shape.
`negative_prompt`	no	Local/backend-specific negative prompt. Strict OpenAI-compatible upstreams do not receive this top-level field; use `extra` only when your custom upstream supports it.
`seed`	no	Local deterministic seed. Strict OpenAI-compatible upstreams do not receive this top-level field; use `extra.seed` only when your custom upstream supports it.
`steps`	no	Local denoising/inference step count. Strict OpenAI-compatible upstreams do not receive this top-level field; use `extra.steps` only when your custom upstream supports it.
`guidance_scale`	no	Local classifier-free guidance scale. Strict OpenAI-compatible upstreams do not receive this top-level field; use `extra.guidance_scale` only when your custom upstream supports it.
`quality`, `style`, `user`, `background`, `output_format`, `output_compression`, `moderation`	no	Named OpenAI-compatible passthrough fields for upstream image endpoints.
`base_url`	no	OpenAI-compatible endpoint override. Prefer this with `openai-compatible/...`; if set with `openai/...`, the request is sent to that URL instead of api.openai.com. Loopback is allowed by default; non-loopback requires allowlist.
`extra`	no	JSON object for backend-specific passthrough fields. Prefer this over arbitrary top-level keys so the schema stays explicit.
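The `size`, `width`/`height`, and `n` rules above can be illustrated with a small client-side sketch. This is not the server's implementation: `parse_size`, `normalize_dimensions`, and `clamp_n` are hypothetical helpers, and letting explicit `width`/`height` take precedence over `size` is an assumption, since the table does not say which one wins when both are sent.

```python
from typing import Optional, Tuple


def parse_size(size: str) -> Tuple[int, int]:
    """Parse an OpenAI-style 'WIDTHxHEIGHT' string such as '1024x1024'."""
    width_text, sep, height_text = size.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    return int(width_text), int(height_text)


def normalize_dimensions(
    size: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Merge size with width/height into one (width, height) pair."""
    if size:
        size_width, size_height = parse_size(size)
        # ASSUMPTION: explicit width/height override the values parsed from size.
        width = width if width is not None else size_width
        height = height if height is not None else size_height
    return width, height


def clamp_n(n: int) -> int:
    """Clamp the requested image count to the documented 1..10 range."""
    return max(1, min(10, n))


print(normalize_dimensions(size="1024x1024"))
print(clamp_n(25))
```

Sending either `size` or `width`/`height`, but not both, sidesteps the precedence question entirely.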

POST /v1/images/edits multipart parameters:

Field	Required	Notes
`prompt`	yes	Edit/inpaint instruction.
`image`	yes	Source image file.
`mask`	no	Optional mask image for inpainting/edit-capable backends.
`model`	no	Same provider/model routing as generation; omit for the server default. Provider-scoped routes accept the same model without the prefix.
`provider`	no	Optional routing hint when you want the configured default backend for a specific provider, or when pairing a request with `base_url`.
`size`	no	OpenAI-style edit output size such as `1024x1024`; multipart edit compatibility keeps this field.
`response_format`	no	Server response shape; `b64_json` is supported.
`negative_prompt`, `seed`, `steps`, `guidance_scale`	no	Local/backend-specific fields. Strict OpenAI-compatible upstreams do not receive them as top-level fields; use `extra_json` only when your custom upstream supports them.
`base_url`	no	OpenAI-compatible endpoint override. Loopback is allowed by default; non-loopback requires allowlist.
`extra_json`	no	JSON object string with backend/upstream-specific parameters.

Async image jobs are available when a request can take long enough that polling is preferable:

POST /v1/vision/jobs/images/generations uses the same JSON body as /v1/images/generations and returns {"job_id": "..."}.
POST /v1/vision/jobs/images/edits uses the same multipart fields as /v1/images/edits and returns {"job_id": "..."}.
GET /v1/vision/jobs/{job_id} returns the job state: queued, running, succeeded, or failed. Add ?consume=true to remove a completed job from the in-memory job store after reading it.
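A polling client for these job routes might look like the sketch below. It uses only the standard library. The name of the JSON field that carries the job state is not given above, so reading it from `status` is an assumption, and `fetch_job` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}


def fetch_job(base: str, token: str, job_id: str) -> dict:
    """GET /v1/vision/jobs/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base}/v1/vision/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    max_polls: int = 300,
) -> dict:
    """Poll until the job reaches a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        job = fetch()
        # ASSUMPTION: the job state is reported under a "status" key.
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```

With a running server this would be called as `wait_for_job(lambda: fetch_job(BASE, TOKEN, job_id))`. Passing the fetch step in as a callable keeps the loop easy to exercise without a network.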

Videos (text-to-video/image-to-video)

Endpoints:

POST /v1/videos/generations
POST /{provider}/v1/videos/generations
POST /v1/videos/edits
POST /{provider}/v1/videos/edits
POST /v1/vision/jobs/videos/generations
POST /v1/vision/jobs/videos/edits

The synchronous video routes use the same internal generate(..., output={"modality": "video"}) dispatcher as the Python API and return {"data":[{"b64_json":"..."}]} with MP4 bytes encoded in base64. Async video jobs are the preferred path for longer local runs; polling GET /v1/vision/jobs/{job_id} includes progress.last_event when the selected backend reports richer progress events.
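The response shape described above, {"data":[{"b64_json":"..."}]}, can be turned back into a playable file with a few lines of Python. A minimal sketch; `decode_first_result` and `save_video` are hypothetical helper names, and only the first entry in `data` is read.

```python
import base64
from pathlib import Path


def decode_first_result(payload: dict) -> bytes:
    """Return the raw bytes stored in data[0].b64_json."""
    return base64.b64decode(payload["data"][0]["b64_json"])


def save_video(payload: dict, destination: str) -> Path:
    """Write the decoded MP4 bytes to destination and return the path."""
    path = Path(destination)
    path.write_bytes(decode_first_result(payload))
    return path
```

For example, `save_video(response_json, "/tmp/acore-video.mp4")` after parsing the body of a synchronous /v1/videos/generations call.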

Use exact provider/model ids. For MLX-Gen, select the published model repo id, for example:

mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers for text-to-video or image-to-video.
mlx-gen/AbstractFramework/qwen-image-2512-4bit for text-to-image.

AbstractCore does not expose a quantization override. Q4/Q8 choices are part of the model id that AbstractVision/MLX-Gen loads.

POST /v1/videos/generations JSON parameters:

Field	Required	Notes
`prompt`	yes	Text prompt to render as video.
`model`	no	Provider/model id such as `mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers` or `openai-compatible/<model>`. Provider-scoped routes accept the same model without the prefix.
`provider`	no	Optional routing hint, e.g. `mlx-gen` or `openai-compatible`.
`width`, `height`, `size`	no	Requested output dimensions. `size` accepts `WIDTHxHEIGHT`.
`fps`, `num_frames` / `frames`	no	Video frame rate and frame count.
`response_format`	no	`b64_json` is the supported response shape.
`negative_prompt`, `seed`, `steps`, `guidance_scale`	no	Backend-specific generation controls.
`extra.max_sequence_length`	no	Useful for MLX-Gen Wan-style video runs.

POST /v1/videos/edits multipart parameters mirror generation and add required image=@first-frame.png. This route is the image-to-video path; the alias /v1/videos/from-image is accepted for clients that prefer a literal name.

Examples:

# Remote OpenAI-compatible image endpoint.
BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

curl -sS -X POST "$BASE/v1/images/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai-compatible/gpt-image-1","prompt":"A clean product photo of a red ceramic mug on a white table.","n":1,"width":1024,"height":1024,"response_format":"b64_json","quality":"low"}' \
  > /tmp/acore-image.json

python - <<'PY'
import base64
import json
from pathlib import Path

data = json.loads(Path("/tmp/acore-image.json").read_text())
Path("/tmp/acore-image.png").write_bytes(base64.b64decode(data["data"][0]["b64_json"]))
PY

# Image edit using the generated image.
curl -sS -X POST "$BASE/v1/images/edits" \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=openai-compatible/gpt-image-1" \
  -F "prompt=Make the mug blue while keeping the white table." \
  -F "image=@/tmp/acore-image.png;type=image/png" \
  -F "size=1024x1024" \
  -F "response_format=b64_json" \
  -F 'extra_json={"quality":"low"}' \
  > /tmp/acore-edit.json

python - <<'PY'
import base64
import json
from pathlib import Path

data = json.loads(Path("/tmp/acore-edit.json").read_text())
Path("/tmp/acore-edit.png").write_bytes(base64.b64decode(data["data"][0]["b64_json"]))
PY

# Configured server image default
curl -sS -X POST "$BASE/v1/images/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a red fox in snow","width":512,"height":512,"response_format":"b64_json"}'

# Text-to-video, asynchronous job with progress polling.
curl -sS -X POST "$BASE/v1/vision/jobs/videos/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider":"mlx-gen","model":"Wan-AI/Wan2.2-TI2V-5B-Diffusers","prompt":"A slow camera move through a luminous data center.","width":1280,"height":704,"fps":24,"num_frames":121,"steps":50,"guidance_scale":5.0,"extra":{"max_sequence_length":256}}'

# Image-to-video, synchronous multipart route.
curl -sS -X POST "$BASE/v1/videos/edits" \
  -H "Authorization: Bearer $TOKEN" \
  -F "provider=mlx-gen" \
  -F "model=Wan-AI/Wan2.2-TI2V-5B-Diffusers" \
  -F "prompt=Slow camera push-in." \
  -F "image=@./first-frame.png;type=image/png" \
  -F "width=1280" \
  -F "height=704" \
  -F "fps=24" \
  -F "num_frames=121" \
  -F "steps=50" \
  -F 'extra_json={"max_sequence_length":256}'

Local vision model helper endpoint:

Endpoint	Purpose	Notes
`GET /v1/vision/models`	List available AbstractVision provider models.	Includes remote providers when their API key/base URL is configured and local models when they are present in known caches.

Audio (STT/TTS)

Endpoints:

POST /v1/audio/transcriptions (multipart; file=...)
POST /{provider}/v1/audio/transcriptions (multipart; provider-scoped STT)
POST /v1/audio/speech (json; input=..., optional voice, optional format)
POST /{provider}/v1/audio/speech (json; provider-scoped TTS)
POST /v1/voice/clone (multipart; extension route for AbstractVoice-compatible voice cloning)
POST /{provider}/v1/voice/clone (multipart; provider-scoped voice cloning)
POST /v1/audio/translations (multipart; reserved for compatibility, returns 501)
POST /v1/audio/music (json; extension endpoint, requires a music capability plugin)
POST /{provider}/v1/audio/music (json; provider/backend-scoped music route)

Local plugin fallback is enabled when model is omitted. OpenAI SDK-style clients that require a non-empty model string can use abstractvoice/default.

Remote provider routing is enabled when model is supplied in provider/model format:

openai/gpt-4o-mini-transcribe, openai/whisper-1
openai/gpt-4o-mini-tts, openai/tts-1
openrouter/... for OpenRouter STT/TTS models
portkey/... for Portkey-routed OpenAI-compatible audio models
openai-compatible/... for endpoints that implement OpenAI-compatible audio routes

Provider-scoped audio routes mirror chat routing. For example, POST /openai/v1/audio/transcriptions with model=gpt-4o-mini-transcribe is equivalent to POST /v1/audio/transcriptions with model=openai/gpt-4o-mini-transcribe; the same applies to /openai-compatible/v1/audio/speech and other supported provider prefixes.
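The equivalence between provider-scoped and unscoped routes can be written down as a pair of small conversions. This is a client-side illustration only; `to_unscoped` and `to_provider_scoped` are hypothetical helpers, and the sketch assumes the provider is the first path segment and the first segment of the model id.

```python
from typing import Tuple


def to_unscoped(path: str, model: str) -> Tuple[str, str]:
    """Map ('/openai/v1/...', 'model') to ('/v1/...', 'openai/model')."""
    provider, sep, rest = path.lstrip("/").partition("/")
    if not sep or not rest.startswith("v1/"):
        raise ValueError(f"not a provider-scoped route: {path!r}")
    return f"/{rest}", f"{provider}/{model}"


def to_provider_scoped(path: str, model: str) -> Tuple[str, str]:
    """Map ('/v1/...', 'openai/model') to ('/openai/v1/...', 'model')."""
    provider, sep, bare_model = model.partition("/")
    if not sep:
        raise ValueError(f"expected provider/model, got {model!r}")
    return f"/{provider}{path}", bare_model


print(to_unscoped("/openai/v1/audio/transcriptions", "gpt-4o-mini-transcribe"))
```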

For openai-compatible/..., request-level base_url can point to a local AbstractVoice/OpenAI-compatible audio server. Loopback URLs are allowed by default; non-loopback URLs require ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
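To make the loopback rule concrete, here is a rough sketch of the kind of check a client could run before sending `base_url`. It is not the server's implementation: the format of ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST and the server's matching rules are not described here, so treating the allowlist as a collection of plain hostnames is purely an assumption, and both function names are hypothetical.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit


def is_loopback_host(host: str) -> bool:
    """True for 'localhost' and for loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal, so not a loopback address


def base_url_allowed(base_url: str, allowlist: Iterable[str] = ()) -> bool:
    """Allow loopback URLs always; allow other hosts only when allowlisted."""
    host = urlsplit(base_url).hostname or ""
    # ASSUMPTION: allowlist entries are bare hostnames compared by equality.
    return is_loopback_host(host) or host in set(allowlist)


print(base_url_allowed("http://127.0.0.1:5000/v1"))
print(base_url_allowed("https://audio.example.com/v1"))
```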

If model is omitted, the endpoint delegates to local capability plugins (typically abstractvoice) and returns 501 when no suitable plugin is installed. Those local/plugin paths use the same internal generate(..., output=...) dispatcher as the Python API; provider/model remote routes keep their OpenAI-compatible HTTP wire behavior.

Install for remote audio:

pip install "abstractcore[server,remote]"

Install for plugin-backed routing:

pip install "abstractcore[server]"
pip install "abstractcore[voice]"
pip install "abstractcore[music]"

Notes:

abstractvoice 0.10.17+ can install the base plugin path on Python 3.9 without OmniVoice, torch, or torchaudio. Python 3.10+ is recommended. Use explicit local aggregate profiles such as abstractcore[all-apple] or abstractcore[all-gpu] when you want local voice engines; AEC requires Python 3.11+.
/v1/audio/transcriptions requires python-multipart for form parsing (included in the server extra).
Uploaded audio is limited by ABSTRACTCORE_SERVER_AUDIO_MAX_BYTES (default: 25 MB).

POST /v1/audio/transcriptions multipart parameters:

Field	Required	Notes
`file`	yes	Audio file to transcribe, commonly `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, or `webm`.
`model`	no	Provider/model id for remote STT (`openai/gpt-4o-mini-transcribe`, `openai/whisper-1`, `openrouter/...`, `portkey/...`, `openai-compatible/...`). Omit for local `abstractvoice` plugin fallback; `abstractvoice/default` is accepted for clients that require a model string.
`provider`	no	Optional routing hint when omitting `model`, using a provider-scoped route, or pairing the request with `base_url`.
`language`	no	Input language code such as `en` or `fr`.
`prompt`	no	Provider transcription prompt/context.
`response_format`	no	Provider response format such as `json`, `text`, `srt`, or `vtt`.
`temperature`	no	Provider sampling temperature where supported.
`format`	no	Audio format override for providers that need it, notably OpenRouter base64 audio input.
`base_url`	no	Endpoint override for local/gateway routing. Prefer this with `openai-compatible/...`; if set with `openai/...`, the request is sent to that URL instead of api.openai.com. Loopback is allowed by default; non-loopback requires allowlist.

POST /v1/audio/speech JSON parameters:

Field	Required	Notes
`input` or `text`	yes	Text to synthesize. `text` is the AbstractCore-compatible alias.
`model`	no	Provider/model id for remote TTS (`openai/gpt-4o-mini-tts`, `openai/tts-1`, `openrouter/...`, `portkey/...`, `openai-compatible/...`). Omit for local plugin fallback; `abstractvoice/default` is accepted.
`voice`	no	Provider/backend voice name; remote OpenAI-compatible routing defaults to `alloy`. OpenAI TTS voices include `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `nova`, `onyx`, `sage`, `shimmer`, `verse`, `marin`, and `cedar`; the Swagger example uses `coral`.
`response_format` or `format`	no	Audio output format. Remote providers commonly support `mp3`, `wav`, `opus`, `aac`, `flac`, or `pcm`; local plugin fallback defaults to `wav`.
`speed`	no	Speech speed multiplier when supported.
`instructions`	no	Provider-specific style/instruction text for expressive TTS.
`provider`	no	Optional routing hint when omitting `model`, using a provider-scoped route, or pairing the request with `base_url`.
`profile`	no	AbstractVoice profile hint for compatible local/plugin backends.
`quality_preset`	no	AbstractVoice/local-backend quality preset when supported.
`quality`	no	OpenAI-compatible quality selector or backend-specific quality hint.
`base_url`	no	Endpoint override for local/gateway routing. Prefer this with `openai-compatible/...`; if set with `openai/...`, the request is sent to that URL instead of api.openai.com. Loopback is allowed by default; non-loopback requires allowlist.

Swagger UI can execute /v1/audio/speech. AbstractCore serves a small custom Swagger wrapper that converts authenticated binary audio POST responses into browser blob: URLs before Swagger renders the player. The example uses response_format="wav" because WAV has explicit duration metadata and is the most reliable inline preview format. If a browser still cannot play the inline preview, use the response download or a curl --output command; the endpoint returns normal audio/* bytes and includes a filename in Content-Disposition.

POST /v1/voice/clone and POST /{provider}/v1/voice/clone multipart parameters:

Field	Required	Notes
`file`	yes	Reference voice audio file.
`model`	no	Provider/model id for remote clone routing. Use `openai-compatible/default` for an AbstractVoice-compatible server, or `openai/default` where OpenAI custom voice creation is available. Omit for local AbstractVoice clone fallback.
`provider`	no	Optional routing hint when omitting `model`, using a provider-scoped route, or pairing the request with `base_url`.
`tts_model`	no	Optional TTS model to associate with the clone for compatible local/plugin backends.
`cloning_engine`	no	Optional clone backend/engine selector for compatible local/plugin backends.
`name`	no	Friendly cloned voice name.
`reference_text`	no	Transcript of the reference audio when available.
`validate`	no	Ask compatible clone servers to validate/smoke-test the clone before returning.
`base_url`	no	OpenAI-compatible endpoint override for `openai-compatible/...`. Loopback is allowed by default; non-loopback requires allowlist.
`clone_path`	no	Provider-specific clone path. Defaults to `/voice/clone` for OpenAI-compatible servers and `/audio/voices` for OpenAI.
`file_field`	no	Provider-specific multipart file field. Defaults to `file`; OpenAI uses `audio_sample`.
`consent`	no	Provider-specific consent id when custom voice creation requires it.

The returned voice_id / id can be used as the voice value in /v1/audio/speech when the selected backend supports custom voices.
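Chaining a clone into a speech request could look like the following sketch. `speech_request_for_clone` is a hypothetical helper. It only assumes what is stated above: the clone response carries `voice_id` or `id`, and /v1/audio/speech accepts `input`, `voice`, and `response_format`.

```python
def speech_request_for_clone(clone_response: dict, text: str, audio_format: str = "wav") -> dict:
    """Build a /v1/audio/speech JSON body that uses a freshly cloned voice."""
    voice = clone_response.get("voice_id") or clone_response.get("id")
    if not voice:
        raise KeyError("clone response has neither 'voice_id' nor 'id'")
    return {"input": text, "voice": voice, "response_format": audio_format}


print(speech_request_for_clone({"voice_id": "my_voice"}, "Hello!"))
```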

POST /v1/audio/music and POST /{provider}/v1/audio/music JSON parameters:

Field	Required	Notes
`prompt` or `input` or `text`	yes	Music generation prompt.
`provider`	no	Music backend selector, for example `acemusic`, `acestep`, `stable-audio`, `stable-audio-3`, or `diffusers`. The provider-scoped path can also select a backend, e.g. `/acemusic/v1/audio/music` or `/diffusers/v1/audio/music`.
`model`	no	Music model id for the selected backend, for example `acemusic/ace-step-api` for remote ACE Music or a Hugging Face repo id for local AbstractMusic backends.
`lyrics`	no	Optional lyrics for vocal music backends.
`duration_s`	no	Requested output duration in seconds.
`seed`	no	Deterministic seed when supported.
`num_inference_steps`	no	Diffusion/sampling step count when supported.
`guidance_scale`	no	Guidance scale when supported.
`instrumental`	no	Request instrumental output when supported.
`enhance_prompt` / `structure_prompt` / `auto_lyrics`	no	Prompt/lyrics planning controls for compatible music backends.
`text_planner_mode`	no	Host/plugin text-planning mode such as `auto`, `on`, or `off`.
`response_format` or `format`	no	Server contract supports `wav`, `mp3`, and `flac`; backend support can be narrower.
extra top-level fields	no	Best-effort passthrough to the installed music capability plugin.

With abstractmusic>=0.1.12, the base install includes the remote ACE Music backend. Configure ACEMUSIC_API_KEY in the server environment, optionally set ACEMUSIC_BASE_URL, and use provider="acemusic" or the /acemusic/v1/audio/music path. Local ACE-Step/Diffusers routes remain opt-in AbstractMusic extras.

Examples:

BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

# Local/plugin TTS through AbstractCore's unified output dispatcher.
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{"input":"Hello from the updated AbstractCore server.","voice":"coral","response_format":"wav"}' \
  --output /tmp/acore-speech.wav

# Local/plugin STT through AbstractCore's unified output dispatcher.
curl -sS -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@/tmp/acore-speech.wav;type=audio/wav" \
  -F "language=en"

# Remote speech-to-text (STT)
curl -sS -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@speech.wav" \
  -F "model=openai/gpt-4o-mini-transcribe" \
  -F "language=en"

# Remote text-to-speech (TTS)
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o-mini-tts","input":"Hello!","voice":"coral","response_format":"wav"}' \
  --output hello.wav

# Local abstractvoice TTS through the OpenAI-compatible endpoint
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"abstractvoice/default","input":"Hello!","voice":"alloy","format":"wav"}' \
  --output hello-local.wav

# Remote ACE Music through AbstractMusic.
# Start the server with ACEMUSIC_API_KEY set in its environment.
curl -sS -X POST "$BASE/v1/audio/music" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A short calm piano loop.","provider":"acemusic","duration_s":8,"format":"mp3"}' \
  --output music.mp3

# Remote/local OpenAI-compatible voice clone endpoint
curl -sS -X POST "$BASE/v1/voice/clone" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@reference.wav" \
  -F "model=openai-compatible/default" \
  -F "base_url=http://127.0.0.1:5000/v1" \
  -F "name=my_voice" \
  -F "reference_text=Hello from my reference recording." \
  -F "validate=true"

If you want to “ask a model about an audio file”, prefer one of:

Run STT first (/v1/audio/transcriptions) then send the transcript to POST /v1/chat/completions, or
Configure the server’s default audio strategy (config.audio.strategy) to enable STT fallback for audio attachments, then attach audio in chat requests.
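The first option (transcribe, then chat) amounts to placing the transcript inside an ordinary chat request. A minimal sketch, assuming the transcription was requested in JSON form and exposes the transcript under a `text` key, as OpenAI-style transcription responses do; `chat_request_from_transcript` is a hypothetical helper.

```python
def chat_request_from_transcript(transcription: dict, question: str, model: str) -> dict:
    """Build a /v1/chat/completions body that asks a question about a transcript."""
    # ASSUMPTION: the STT response is JSON with the transcript under "text".
    transcript = transcription["text"]
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"{question}\n\nTranscript:\n{transcript}"}
        ],
    }


body = chat_request_from_transcript(
    {"text": "Hello from the updated AbstractCore server."},
    "What is the speaker announcing?",
    "openai/gpt-4o-mini",
)
print(body["messages"][0]["content"])
```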

Multimodal Requests (Images, Documents, Files)

AbstractCore server supports file attachments through the OpenAI-compatible multimodal message format, plus AbstractCore's @filename syntax.

Security note (HTTP server): local file paths are disabled by default (including @/path/to/file and {"url": "/path/to/file"}). Use http(s) URLs or data: base64, or enable local paths via ABSTRACTCORE_SERVER_MEDIA_ROOT (safe) / ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 (unsafe).

Image analysis example using a local generated image:

BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

python - <<'PY'
import base64
from pathlib import Path

Path("/tmp/acore-image.b64").write_text(base64.b64encode(Path("/tmp/acore-image.png").read_bytes()).decode("ascii"))
PY

jq -n --rawfile img /tmp/acore-image.b64 '{
  model: "openai/gpt-4o-mini",
  messages: [{
    role: "user",
    content: [
      {type: "text", text: "Describe this image in one concise sentence."},
      {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}}
    ]
  }],
  max_tokens: 80,
  temperature: 0
}' > /tmp/acore-vision-chat.json

curl -sS -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @/tmp/acore-vision-chat.json \
  | jq -r '.choices[0].message.content'

Supported File Types

Images: PNG, JPEG, GIF, WEBP, BMP, TIFF
Documents: PDF, DOCX, XLSX, PPTX
Data/Text: CSV, TSV, TXT, MD, JSON, XML
Size Limits: 10MB per file, 32MB total per request
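A client can check attachments against these limits before uploading. The sketch below is illustrative: `check_attachment_sizes` is a hypothetical helper, and interpreting "MB" as 1024 * 1024 bytes is an assumption, since the exact byte thresholds are not stated.

```python
from typing import Iterable, List

# ASSUMPTION: "10MB" and "32MB" are treated as MiB; the server may count differently.
MAX_FILE_BYTES = 10 * 1024 * 1024
MAX_REQUEST_BYTES = 32 * 1024 * 1024


def check_attachment_sizes(sizes: Iterable[int]) -> List[str]:
    """Return human-readable problems; an empty list means the sizes fit."""
    problems = []
    total = 0
    for index, size in enumerate(sizes):
        total += size
        if size > MAX_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes (limit {MAX_FILE_BYTES})")
    if total > MAX_REQUEST_BYTES:
        problems.append(f"request total is {total} bytes (limit {MAX_REQUEST_BYTES})")
    return problems


print(check_attachment_sizes([2_000_000, 11 * 1024 * 1024]))
```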

Method 1: @filename Syntax (AbstractCore Extension)

Simple syntax that works with all providers (requires local paths enabled via ABSTRACTCORE_SERVER_MEDIA_ROOT or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "What is in this document? @/path/to/report.pdf"}
    ]
  }'

Method 2: OpenAI Vision API Format (Image URLs)

Standard OpenAI format for images:

{
  "model": "anthropic/claude-haiku-4-5",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Base64 Images:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
  }
}

Method 3: OpenAI File Format (Forward-Compatible)

AbstractCore supports OpenAI's planned file format with a simplified structure (consistent with image_url):

File URL Format (Recommended - Same Pattern as image_url):

{
  "model": "ollama/qwen3:4b",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Analyze this document"},
        {
          "type": "file",
          "file_url": {
            "url": "https://example.com/documents/report.pdf"
          }
        }
      ]
    }
  ]
}

Local File Path:

{
  "type": "file",
  "file_url": {
    "url": "/Users/username/documents/data.csv"
  }
}

Note: local file paths require ABSTRACTCORE_SERVER_MEDIA_ROOT (safe) or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 (unsafe) on the server.

Base64 Data URL:

{
  "type": "file",
  "file_url": {
    "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago8PAovVHlwZS..."
  }
}

Filename Extraction:

URLs/Paths: Extracted automatically (/path/file.pdf → file.pdf)
Base64: Generated from MIME type (data:application/pdf;base64,... → document.pdf)
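The two filename rules can be mirrored on the client, for example to predict how an attachment will be labelled. This sketch is not the server's code: `attachment_filename` is a hypothetical helper, and using the base name `document` for every MIME type is an assumption extrapolated from the single PDF example above.

```python
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def attachment_filename(url: str) -> str:
    """Derive a display filename from an http(s) URL, a local path, or a data: URL."""
    if url.startswith("data:"):
        # "data:application/pdf;base64,...." -> "application/pdf"
        mime_type = url[len("data:"):].split(",", 1)[0].split(";", 1)[0]
        extension = mimetypes.guess_extension(mime_type) or ""
        # ASSUMPTION: every data URL gets the base name "document".
        return f"document{extension}"
    # URLs and paths: keep only the last path segment, ignoring any query string.
    return PurePosixPath(urlsplit(url).path).name


print(attachment_filename("/path/file.pdf"))
print(attachment_filename("data:application/pdf;base64,JVBERi0x"))
```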

Mixed Content Example

Combine text, images, and documents in a single request:

{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Compare this chart with the data in the spreadsheet"},
        {
          "type": "image_url",
          "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANS..."}
        },
        {
          "type": "file",
          "file_url": {
            "url": "https://example.com/data/sales_data.xlsx"
          }
        }
      ]
    }
  ]
}

Python Client Examples

Using OpenAI Client:

import os
from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

# Method 1: @filename syntax
response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Summarize @document.pdf"}]
)

# Method 2: File URL (HTTP/HTTPS)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the key findings?"},
            {
                "type": "file",
                "file_url": {
                    "url": "https://example.com/documents/report.pdf"
                }
            }
        ]
    }]
)

# Method 3: Local file path
response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this local document"},
            {
                "type": "file",
                "file_url": {
                    "url": "/Users/username/documents/report.pdf"
                }
            }
        ]
    }]
)

# Method 4: Base64 data URL
with open("report.pdf", "rb") as f:
    file_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="lmstudio/qwen/qwen3-next-80b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the key findings?"},
            {
                "type": "file",
                "file_url": {
                    "url": f"data:application/pdf;base64,{file_data}"
                }
            }
        ]
    }]
)

Universal Provider Support:

# Same syntax works across all providers
providers_models = [
    "openai/gpt-4o",
    "anthropic/claude-haiku-4-5",
    "ollama/qwen2.5vl:7b",
    "lmstudio/qwen/qwen2.5-vl-7b"
]

for model in providers_models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Analyze @data.csv and @chart.png"}]
    )
    print(f"{model}: {response.choices[0].message.content[:100]}...")

OpenAI Responses API

Endpoint: POST /v1/responses

AbstractCore implements an OpenAI-compatible Responses-style API, including input_file support.

Why Use /v1/responses?

OpenAI Compatible: Accepts OpenAI Responses API requests and returns an OpenAI Responses payload (object: "response")
Native File Support: input_file type designed specifically for document attachments
Cleaner API: Explicit separation between text (input_text) and files (input_file)
Backward Compatible: Existing messages format still works alongside new input format
Optional Streaming: "stream": true streams OpenAI Responses events (OpenAI format) or chat-completions chunks (legacy format)

Request Format

OpenAI Responses API Format (Recommended):

{
  "model": "openai/gpt-4o",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Analyze this document"},
        {"type": "input_file", "file_url": "https://example.com/report.pdf"}
      ]
    }
  ],
  "tools": [
    {"type": "web_search", "external_web_access": true}
  ],
  "tool_choice": "auto",
  "stream": false,
  "max_output_tokens": 2000,
  "temperature": 0.7
}

Key parameters:

Field	Required	Notes
`model`	yes	Provider/model id. Bare model ids may be auto-detected, but provider/model is preferred.
`input`	yes, unless `messages` is used	OpenAI Responses input. Supports a string, or an array of input items such as `{"type":"message","role":"user","content":"..."}` and `{"type":"function_call_output","call_id":"...","output":"..."}`. Message content can be a string or an array of `input_text` / `input_file` / `input_image` items.
`messages`	yes, unless `input` is used	Backward-compatible chat-completions request shape.
`instructions`	no	System-level instructions prepended to `input` (best-effort).
`stream`	no	When `true`, returns server-sent events.
`tools`	no	Responses-style tools. AbstractCore does not execute tools server-side; tools are only transported to the model prompt. `web_search*` tools are normalized into function tools for local-model prompting and host-side execution. Unsupported built-in tool types return a 400 error.
`tool_choice`	no	Tool selection control; normalized where needed (best-effort).
`max_output_tokens` / `max_tokens`, `temperature`, `top_p`, `stop`, `seed`, `frequency_penalty`, `presence_penalty`	no	Standard generation controls, forwarded where supported.
`base_url`, `agent_format`, `thinking`, `prompt_cache_key`, `prompt_cache_retention`, `timeout_s`, `unload_after`	no	AbstractCore text-inference extensions with the same behavior as `/v1/chat/completions` for shared fields.

Legacy Format (Still Supported):

{
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Tell me a story"}
  ],
  "stream": false
}

Automatic Format Detection

The server automatically detects which format you're using:

OpenAI Format: Presence of input field → converts to internal format
Legacy Format: Presence of messages field → processes directly
Error: Missing both fields → returns 400 error with clear message
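The detection rules above reduce to a short branch. The sketch below is an illustration, not the server's code: `detect_request_format` is a hypothetical name, and checking `input` before `messages` when both are present is an assumption taken from the order of the list.

```python
def detect_request_format(body: dict) -> str:
    """Classify a /v1/responses request body as 'openai' or 'legacy'."""
    # ASSUMPTION: "input" wins when both fields are present.
    if "input" in body:
        return "openai"
    if "messages" in body:
        return "legacy"
    raise ValueError("request must include either 'input' or 'messages'")


print(detect_request_format({"model": "openai/gpt-4o", "input": "What is Python?"}))
print(detect_request_format({"model": "openai/gpt-4", "messages": []}))
```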

Examples

Simple Text Request:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio/qwen/qwen3-next-80b",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "What is Python?"}
        ]
      }
    ]
  }'

File Analysis:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Analyze the letter and summarize key points"},
          {"type": "input_file", "file_url": "https://www.berkshirehathaway.com/letters/2024ltr.pdf"}
        ]
      }
    ],
    "thinking": "off",
    "prompt_cache_key": "tenantA:doc-review"
  }'

Multiple Files:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-haiku-4-5",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Compare these documents"},
          {"type": "input_file", "file_url": "https://example.com/report1.pdf"},
          {"type": "input_file", "file_url": "https://example.com/report2.pdf"},
          {"type": "input_file", "file_url": "https://example.com/chart.png"}
        ]
      }
    ],
    "max_tokens": 2000
  }'

Streaming Response:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Summarize this document"},
          {"type": "input_file", "file_url": "https://example.com/document.pdf"}
        ]
      }
    ],
    "stream": true
  }' --no-buffer
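On the client side, a streamed response arrives as server-sent events whose `data:` lines carry JSON. A minimal parsing sketch follows; `iter_sse_json` is a hypothetical helper, it handles only single-line `data:` fields, and stopping at a `[DONE]` sentinel is an assumption borrowed from the chat-completions streaming convention.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each 'data:' line in a server-sent event stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and "event:" fields
        data = line[len("data:"):].strip()
        # ASSUMPTION: legacy-format streams end with a "[DONE]" sentinel.
        if data == "[DONE]":
            break
        yield json.loads(data)


sample = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Hi"}',
    "",
]
for event in iter_sse_json(sample):
    print(event["delta"])
```

With the requests library, the lines from `response.iter_lines(decode_unicode=True)` on a `stream=True` call can be passed in directly.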

Supported Media Types

All file types are supported via URL, local path (when local paths are enabled on the server), or base64:

Documents: PDF, DOCX, XLSX, PPTX
Data Files: CSV, TSV, JSON, XML
Text Files: TXT, MD
Images: PNG, JPEG, GIF, WEBP, BMP, TIFF
Size Limits: 10MB per file, 32MB total per request

Source Options:

// HTTP/HTTPS URL
{"type": "input_file", "file_url": "https://example.com/report.pdf"}

// Local file path
{"type": "input_file", "file_url": "/path/to/document.xlsx"}

// Base64 data URL
{"type": "input_file", "file_url": "data:application/pdf;base64,JVBERi0x..."}

Python Client Example

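import os

import requests

# Direct request to the /v1/responses endpoint.
response = requests.post(
    "http://localhost:8000/v1/responses",
    # Send the server token when ABSTRACTCORE_AUTH_TOKEN is enabled.
    headers={"Authorization": f"Bearer {os.environ['ABSTRACTCORE_AUTH_TOKEN']}"},
    json={
        "model": "openai/gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Analyze this document"},
                    {"type": "input_file", "file_url": "https://example.com/report.pdf"}
                ]
            }
        ]
    },
    timeout=120,
)
response.raise_for_status()

result = response.json()
# Requests that use "input" return an OpenAI Responses object, so collect the
# output_text parts from result["output"] instead of reading result["choices"].
for item in result.get("output", []):
    for part in item.get("content") or []:
        if part.get("type") == "output_text":
            print(part["text"])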

Embeddings

Endpoint: POST /v1/embeddings

Generate embedding vectors for semantic search, RAG, and similarity analysis.

Request:

{
  "input": "Text to embed",
  "model": "huggingface/sentence-transformers/all-MiniLM-L6-v2"
}

Supported Providers:

HuggingFace: Local models with ONNX acceleration
Ollama: ollama/granite-embedding:278m, etc.
LMStudio: Any loaded embedding model
OpenAI: openai/text-embedding-3-small, openai/text-embedding-3-large
OpenRouter: openrouter/openai/text-embedding-3-small, etc.
Portkey: portkey/... with your Portkey routing configuration
OpenAI-compatible: openai-compatible/... against configured/local /v1/embeddings endpoints

Anthropic does not expose a native embeddings API. Use OpenAI, OpenRouter, Portkey, an OpenAI-compatible endpoint, or a local embedding provider.

For endpoint-backed providers such as LM Studio, vLLM, and generic OpenAI-compatible servers, the embedding route does not require the embedding model to appear in a chat model catalogue before the request is sent. This supports embedding-only endpoints whose /models response is incomplete or chat-only.

OpenAI-compatible request fields are forwarded where supported:

dimensions
encoding_format
user
base_url (AbstractCore extension; loopback by default, allowlist required for non-loopback)

Parameters:

Field	Required	Notes
`input`	yes	String or array of strings. Arrays return one vector per input item.
`model`	yes	Provider/model id such as `openai/text-embedding-3-small`, `openrouter/openai/text-embedding-3-small`, `portkey/...`, `openai-compatible/...`, `ollama/...`, `lmstudio/...`, or `huggingface/...`.
`encoding_format`	no	`float` by default; `base64` is accepted where supported by the provider/backend.
`dimensions`	no	Requested output dimensions for providers that support native dimension reduction; local backends may truncate when appropriate.
`user`	no	End-user identifier forwarded to providers that support abuse monitoring.
`base_url`	no	OpenAI-compatible endpoint override with the same allowlist policy as chat.
`api_key`	no	Deprecated/disabled in the body. Use `X-AbstractCore-Provider-API-Key` for provider overrides.
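With `encoding_format` set to `base64`, OpenAI-style APIs return each vector as a base64 string of packed little-endian float32 values. Assuming the selected backend follows that convention (worth verifying against your provider), a client could decode a vector as sketched below. `decode_embedding` is a hypothetical helper name.

```python
import base64
import struct


def decode_embedding(b64_vector: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values into floats."""
    raw = base64.b64decode(b64_vector)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with values that float32 represents exactly.
packed = base64.b64encode(struct.pack("<3f", 1.0, 0.5, -2.0)).decode("ascii")
print(decode_embedding(packed))  # [1.0, 0.5, -2.0]
```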

Batch Embedding:

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["text 1", "text 2", "text 3"],
    "model": "ollama/granite-embedding:278m"
  }'
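Batch results are usually consumed by comparing vectors. The sketch below assumes the OpenAI-style response shape (`data[i].embedding`), which this page does not show explicitly. The sample payload and helper names are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(response: dict, query_index: int = 0) -> list[tuple[int, float]]:
    """Rank the other inputs by similarity to the input at query_index."""
    vectors = [item["embedding"] for item in response["data"]]
    query = vectors[query_index]
    scores = [
        (i, cosine_similarity(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-item response with tiny 2-dimensional vectors.
sample = {
    "data": [
        {"index": 0, "embedding": [1.0, 0.0]},
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 2, "embedding": [3.0, 0.0]},
    ]
}
ranking = rank_by_similarity(sample)
print(ranking)  # input 2 ranks first: it points in the same direction as input 0
```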

Model Discovery

Endpoint: GET /v1/models

List all available models from configured providers.

Query Parameters:

provider: Filter by provider (e.g., ollama, openai, anthropic, lmstudio, openai-compatible).
input_type: Filter by input capability: text, image, audio, or video.
output_type: Filter by output capability: text or embeddings.
base_url: Optional upstream base URL override for providers that support OpenAI-compatible discovery. Loopback is allowed by default; non-loopback requires ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
api_key: Optional upstream provider API key override for discovery. Requires provider=... so the override target is unambiguous. Prefer X-AbstractCore-Provider-API-Key.

Examples:

# All models
curl http://localhost:8000/v1/models

# Ollama models only
curl "http://localhost:8000/v1/models?provider=ollama"

# Embedding models only
curl "http://localhost:8000/v1/models?output_type=embeddings"

# Vision-capable input models
curl "http://localhost:8000/v1/models?input_type=image"

# Ollama embeddings (quote the URL so the shell does not interpret & or ?)
curl "http://localhost:8000/v1/models?provider=ollama&output_type=embeddings"
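Characters such as `?` and `&` have special meaning in most shells, so building the query string in code is often less error-prone than typing it by hand. A minimal sketch using only the standard library follows; `models_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def models_url(base: str, **filters) -> str:
    """Build a /v1/models URL, skipping filters that are empty or None."""
    params = {key: value for key, value in filters.items() if value}
    query = urlencode(params)
    return f"{base.rstrip('/')}/v1/models" + (f"?{query}" if query else "")


print(models_url("http://localhost:8000"))
# http://localhost:8000/v1/models
print(models_url("http://localhost:8000", provider="ollama", output_type="embeddings"))
# http://localhost:8000/v1/models?provider=ollama&output_type=embeddings
```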

Provider Status

Endpoint: GET /providers

List all available providers and their status.

Query Parameters:

include_models (optional, default false): Include model lists for each provider. This is slower because it may query provider registries/endpoints.

Response:

{
  "providers": [
    {
      "name": "ollama",
      "type": "llm",
      "model_count": 15,
      "status": "available"
    }
  ]
}
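A client can use this response to decide which providers to route to. The sketch below filters on the `status` field shown above. The second provider entry and its `unavailable` status value are assumptions added for illustration.

```python
import json


def available_providers(payload: dict) -> list[str]:
    """Return the names of providers whose status is 'available'."""
    return [
        provider["name"]
        for provider in payload.get("providers", [])
        if provider.get("status") == "available"
    ]


# Sample modeled on the response above; the second entry is hypothetical.
body = json.loads("""
{
  "providers": [
    {"name": "ollama", "type": "llm", "model_count": 15, "status": "available"},
    {"name": "lmstudio", "type": "llm", "model_count": 0, "status": "unavailable"}
  ]
}
""")
print(available_providers(body))  # ['ollama']
```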

Health Check

Endpoint: GET /health

Server health check for monitoring.

Response: includes status, server version, and enabled feature flags.

Runtime Control Plane

If you want the gateway itself to keep a local model warm, use:

POST /acore/models/load
GET /acore/models/loaded
POST /acore/models/unload

/acore/models/load creates or reuses a task-specific runtime. When task is omitted, the route keeps the existing text-generation behavior: the runtime is keyed by provider, model, optional base_url, and the explicit provider-key override when one is supplied. Later /v1/chat/completions calls that target the same provider/model automatically reuse that warm runtime instead of creating a fresh provider instance per request.

For text-generation runtimes, Core reports provider-owned loaded-model truth separately from gateway client cache state. A configured default model, model catalog row, reachable server, or cached Core client is not proof that the provider has a model loaded. Providers that can verify residency expose it through get_model_residency(...); providers that cannot verify it report provider_residency_verified=false, provider_resident=null, and loaded=false. When a provider exposes a native load/warm hook, /acore/models/load calls it and then verifies the result through the same residency contract.

For non-text tasks, the same route delegates to capability-owned load/list/unload: image_generation, video_generation, text_to_video, and image_to_video reuse the server's AbstractVision backend cache, while tts and stt delegate through the shared AbstractVoice capability core when the selected plugin exposes residency hooks. Remote OpenAI-compatible image/video/audio providers are reported as configured rather than locally loaded unless the upstream exposes a real loaded-state signal.

loaded_new is an event signal for the load call, not a synonym for loaded. For capability-backed tasks it is true only when the backend reports or clearly implies that this request transitioned the model from not loaded to loaded. Already-loaded models should return loaded_new=false.
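The distinction between `loaded`, `loaded_new`, and residency verification can be made concrete with a small client-side helper. This is only a sketch: it assumes the fields named above appear as top-level keys of the load response, which this page does not confirm, and `describe_load` is a hypothetical name.

```python
def describe_load(result: dict) -> str:
    """Summarize a load result using the residency fields described above."""
    if not result.get("provider_residency_verified", False):
        # The provider cannot prove residency, so loaded is reported as false.
        return "unverified: the provider cannot confirm what is loaded"
    if not result.get("loaded", False):
        return "not loaded"
    if result.get("loaded_new", False):
        return "loaded by this request"
    return "already loaded before this request"


print(describe_load({"provider_residency_verified": True, "loaded": True, "loaded_new": False}))
# already loaded before this request
```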

Prompt Cache Control Plane

Prompt-cache routes support two modes:

direct gateway mode:
- target a previously loaded runtime with provider + model
proxy mode:
- target an upstream AbstractEndpoint with base_url

In proxy mode, the gateway normalizes base_url, enforces the same base URL allowlist rules as other request-level routing, and forwards provider auth only from X-AbstractCore-Provider-API-Key or from Authorization when server auth is disabled.

Common fields:

Field	Location	Required	Notes
`runtime_id`	query or JSON body	no	Stable selector returned by `/acore/models/load`. Use this when multiple warm runtimes share the same `provider` + `model`.
`provider` + `model`	query or JSON body	yes, unless `base_url` is provided	Select a loaded gateway-local runtime.
`base_url`	query or JSON body	yes, unless `runtime_id` or `provider` + `model` is provided	Upstream AbstractEndpoint URL. It may include `/v1`; the proxy strips that suffix for control-plane calls. In local mode it can also disambiguate a warm runtime that was loaded with a base URL.
`X-AbstractCore-Provider-API-Key`	header	no	Upstream endpoint token when required.
`api_key`	query/body	no	Deprecated/disabled; do not use.
`ttl_s`	JSON body	no	Optional upstream cache TTL in seconds, where supported.

Operations:

Endpoint	Method	Parameters	Result
`/acore/prompt_cache/capabilities`	GET	`provider` + `model` or `base_url`	Cache features on the selected local or upstream runtime.
`/acore/prompt_cache/stats`	GET	`provider` + `model` or `base_url`	Cache stats on the selected local or upstream runtime.
`/acore/prompt_cache/set`	POST	`provider` + `model` or `base_url`, `key`, `make_default`, `ttl_s`	Select/create a cache key locally or upstream.
`/acore/prompt_cache/update`	POST	`provider` + `model` or `base_url`, `key`, `prompt` or `messages`, `system_prompt`, `tools`, optional `thinking`, `add_generation_prompt`, `ttl_s`	Prepare prompt/messages/tools into a local or upstream cache key.
`/acore/prompt_cache/fork`	POST	`provider` + `model` or `base_url`, `from_key`, `to_key`, `make_default`, `ttl_s`	Fork an existing local or upstream key.
`/acore/prompt_cache/clear`	POST	`provider` + `model` or `base_url`, optional `key`	Clear a local or upstream key, or default/all cache state depending on backend support.
`/acore/prompt_cache/prepare_modules`	POST	`provider` + `model` or `base_url`, `namespace`, `modules`, `make_default`, `ttl_s`, `version`	Prepare reusable module/tool context locally or upstream.

Example:

curl -X POST http://localhost:8000/acore/prompt_cache/update \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "http://127.0.0.1:8001/v1",
    "key": "project-default",
    "messages": [{"role": "system", "content": "You are concise."}],
    "thinking": "off",
    "ttl_s": 3600
  }'

thinking on /acore/prompt_cache/update is applied before the provider appends the cached fragment. This keeps cache-prefilled prompt state aligned with later /v1/chat/completions or /v1/responses calls when reasoning control changes prompt serialization.
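The selector rules from the Common fields table (a `runtime_id`, a `provider` + `model` pair, or a `base_url`) can be checked on the client before the request is sent. The following sketch builds an update body under those rules. `cache_update_body` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Optional


def cache_update_body(
    key: str,
    messages: list[dict],
    *,
    runtime_id: Optional[str] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    ttl_s: Optional[int] = None,
) -> dict:
    """Build a JSON body for /acore/prompt_cache/update with a valid selector."""
    if bool(provider) != bool(model):
        raise ValueError("provider and model must be given together")
    if not (runtime_id or provider or base_url):
        raise ValueError("pass runtime_id, provider + model, or base_url")
    body = {"key": key, "messages": messages}
    optional = {
        "runtime_id": runtime_id,
        "provider": provider,
        "model": model,
        "base_url": base_url,
        "ttl_s": ttl_s,
    }
    # Drop unset fields so the request only carries the chosen selector.
    body.update({name: value for name, value in optional.items() if value is not None})
    return body


proxy_body = cache_update_body(
    "project-default",
    [{"role": "system", "content": "You are concise."}],
    base_url="http://127.0.0.1:8001/v1",
    ttl_s=3600,
)
print(proxy_body)
```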

Memory Blocs Control Plane

Memory-bloc routes also support two modes:

direct gateway mode:
- POST /acore/models/load
- local POST /acore/blocs/upsert_text
- local POST /acore/blocs/kv/ensure
- local POST /acore/blocs/kv/load
- then normal /v1/chat/completions
proxy mode:
- the same /acore/blocs/* routes with base_url targeting an upstream AbstractEndpoint

That distinction matters:

gateway-local bloc records live in the gateway bloc store
gateway-local loaded cache keys live on the selected loaded runtime
proxy-mode loaded cache keys live on the upstream AbstractEndpoint

Operations:

Endpoint	Method	Parameters	Result
`/acore/blocs/upsert_text`	POST	optional `base_url`, `path`, `content`, optional bloc metadata	Persist extracted text into the local bloc store or upstream bloc store.
`/acore/blocs/record`	GET	optional `base_url`, `sha256` or `bloc_id`	Inspect a local or upstream bloc record.
`/acore/blocs/kv/manifest`	GET	`runtime_id` or `provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`	Inspect the local or upstream KV manifest for the selected model.
`/acore/blocs/kv/ensure`	POST	`runtime_id` or `provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`, `force_rebuild`, `debug`	Compile or validate the durable provider/model bloc KV artifact locally or upstream.
`/acore/blocs/kv/load`	POST	`runtime_id` or `provider` + `model` or `base_url`, `sha256` or `bloc_id`, optional `artifact_path`, `stable_cache_key`, `key`, `make_default`, `force_rebuild`, `debug`	Load or fork the local or upstream artifact into a prompt-cache key and return `prompt_cache_binding`.

Typical direct gateway flow:

POST /acore/models/load
POST /acore/blocs/upsert_text
POST /acore/blocs/kv/ensure
POST /acore/blocs/kv/load
call /v1/chat/completions with returned artifact.prompt_cache_binding when exact binding is required

Typical remote flow:

POST /acore/blocs/upsert_text
POST /acore/blocs/kv/ensure
POST /acore/blocs/kv/load
call /v1/chat/completions with returned artifact.prompt_cache_binding when exact binding is required

Example:

curl -X POST http://localhost:8000/acore/blocs/kv/load \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "http://127.0.0.1:8001/v1",
    "sha256": "abababababababababababababababababababababababababababababababab",
    "stable_cache_key": "stable:orbit",
    "key": "work:orbit",
    "make_default": false,
    "debug": true
  }'

The load response includes:

artifact.key: the worker-local runtime cache key
artifact.binding_id: the opaque exact-artifact identity
artifact.prompt_cache_binding: object to pass to chat as prompt_cache_binding
artifact.debug: verbose proof fields when debug=true
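The last step of both flows passes `artifact.prompt_cache_binding` from the load response into the chat request. A minimal sketch of that hand-off is shown below. The sample load response, the contents of the binding object, and the model id are placeholders, since the binding is opaque to clients.

```python
def chat_body_with_binding(load_response: dict, model: str, messages: list[dict]) -> dict:
    """Attach artifact.prompt_cache_binding from a load response to a chat body."""
    binding = load_response.get("artifact", {}).get("prompt_cache_binding")
    if binding is None:
        raise ValueError("load response has no artifact.prompt_cache_binding")
    return {"model": model, "messages": messages, "prompt_cache_binding": binding}


# Hypothetical load response; real binding objects come from /acore/blocs/kv/load.
load_response = {
    "artifact": {
        "key": "work:orbit",
        "binding_id": "example-binding-id",
        "prompt_cache_binding": {"binding_id": "example-binding-id"},
    }
}
chat_body = chat_body_with_binding(
    load_response,
    model="mlx/example-model",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize the loaded document."}],
)
print(chat_body["prompt_cache_binding"])
```

Passing the binding object through unchanged, rather than rebuilding it from `artifact.key`, keeps the client independent of how the server identifies the exact artifact.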

Supported local artifact backends share this route shape: MLX, HuggingFace transformers, and HuggingFace GGUF exact-renderer paths. Remote providers and unsupported GGUF chat formats remain best-effort prompt_cache_key paths.

Agentic CLI integration

AbstractCore Server is OpenAI-compatible. Most OpenAI-compatible CLIs/SDKs can be pointed at it by setting:

OPENAI_BASE_URL="http://localhost:8000/v1" (or an equivalent flag)
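OPENAI_API_KEY="unused" (many clients require a non-empty key even for local servers; when server auth is enabled, set it to your ABSTRACTCORE_AUTH_TOKEN value, because OpenAI-compatible clients send this key as the bearer token)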

Tool calling interoperability

The server does not execute tools (it always returns tool calls; your host/runtime executes them).
It can emit tool calls either as structured tool_calls (OpenAI/Codex style) or as tagged content for clients that parse tool calls from assistant text.
Control the output format with agent_format (request body, AbstractCore extension), or rely on auto-detection (user-agent + model heuristics).

Supported agent_format values: auto, openai, codex, qwen3, llama3, gemma, xml, passthrough.

Codex CLI (example)

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"

codex --model "ollama/qwen3-coder:30b" "Write a factorial function"

Forcing a format (curl)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3:4b-instruct-2507-q4_K_M",
    "messages": [{"role": "user", "content": "Use the tool."}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather by city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ],
    "agent_format": "llama3"
  }'
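A client that sets `agent_format` programmatically may want to reject typos before the request leaves the process. The sketch below validates against the values listed above; `with_agent_format` is a hypothetical helper, not an AbstractCore API.

```python
SUPPORTED_AGENT_FORMATS = {
    "auto", "openai", "codex", "qwen3", "llama3", "gemma", "xml", "passthrough",
}


def with_agent_format(body: dict, agent_format: str = "auto") -> dict:
    """Return a copy of a chat request body with a validated agent_format."""
    if agent_format not in SUPPORTED_AGENT_FORMATS:
        raise ValueError(f"unsupported agent_format: {agent_format!r}")
    return {**body, "agent_format": agent_format}


request_body = with_agent_format(
    {
        "model": "ollama/qwen3:4b-instruct-2507-q4_K_M",
        "messages": [{"role": "user", "content": "Use the tool."}],
    },
    "llama3",
)
print(request_body["agent_format"])  # llama3
```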

Deployment

Docker

Release images are published to GitHub Container Registry after the matching PyPI release succeeds:

ghcr.io/lpalbou/abstractcore-server:<version>

The image is built from PyPI, not from the repository checkout, and installs:

abstractcore[server,remote,media,tokens,compression]==<version>

It includes remote chat/responses, remote embeddings, remote STT/TTS routing, remote OpenAI-compatible image proxying, server dependencies, media parsing, token counting, and compression helpers. It intentionally does not include AbstractCore local LLM runtimes (vllm, mlx, huggingface), local embedding dependencies (sentence-transformers), or optional capability plugin entry points. Remote image/audio OpenAI-compatible endpoint routes still work without those plugins. Build a custom image with abstractcore[server,remote,media,tokens,compression,voice,vision] when you want plugin-backed media catalogs or plugin default routes; these capability extras stay remote-light. Add explicit local aggregate profiles such as abstractcore[all-apple] or abstractcore[all-gpu] only when you want local native inference engines.

Run:

docker pull ghcr.io/lpalbou/abstractcore-server:2.13.12

For local development, keep secrets in an uncommitted .env file:

ABSTRACTCORE_AUTH_TOKEN=replace-with-a-server-token
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
ANTHROPIC_API_KEY=sk-ant-...
PORTKEY_API_KEY=pk_...
PORTKEY_CONFIG=pcfg_...
OPENAI_BASE_URL=http://host.docker.internal:1234/v1

Then run the image with that environment file:

docker run --rm --name abstractcore-server \
  -p 127.0.0.1:8000:8000 \
  --env-file .env \
  ghcr.io/lpalbou/abstractcore-server:2.13.12

ABSTRACTCORE_AUTH_TOKEN is the AbstractCore server auth token. Clients send it as Authorization: Bearer <token>. At /docs, use Swagger UI's normal Authorize button when server auth is enabled; AbstractCore validates that bearer token before Swagger marks it authorized. Provider keys such as OPENAI_API_KEY, OPENROUTER_API_KEY, ANTHROPIC_API_KEY, and PORTKEY_API_KEY stay inside the server container.

Set ABSTRACTCORE_SERVER_PROTECT_DOCS=1 if /docs, /docs-lite, /redoc, and /openapi.json should require the same server token.

For local OpenAI-compatible endpoints such as LM Studio or Ollama's /v1 server, point the container at a URL reachable from Docker:

docker run --rm --name abstractcore-server \
  -p 127.0.0.1:8000:8000 \
  -e ABSTRACTCORE_AUTH_TOKEN="$ABSTRACTCORE_AUTH_TOKEN" \
  -e OPENAI_BASE_URL="http://host.docker.internal:1234/v1" \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  ghcr.io/lpalbou/abstractcore-server:2.13.12

Docker Compose

version: '3.8'

services:
  abstractcore:
    image: ghcr.io/lpalbou/abstractcore-server:2.13.12
    ports:
      - "8000:8000"
    environment:
      - ABSTRACTCORE_AUTH_TOKEN=${ABSTRACTCORE_AUTH_TOKEN}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - PORTKEY_API_KEY=${PORTKEY_API_KEY}
      - PORTKEY_CONFIG=${PORTKEY_CONFIG}
      - OPENAI_BASE_URL=${OPENAI_BASE_URL}
    restart: unless-stopped

Production with Gunicorn

pip install gunicorn

gunicorn abstractcore.server.app:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000

Debug and Monitoring

Enable Debug Mode

Debug mode provides comprehensive logging and detailed error reporting for troubleshooting API issues.

# Method 1: Using command line flag (recommended)
python -m abstractcore.server.app --debug

# Method 2: Using environment variable
export ABSTRACTCORE_DEBUG=true
python -m abstractcore.server.app

# Method 3: With uvicorn directly
export ABSTRACTCORE_DEBUG=true
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

Debug Features

Enhanced Error Reporting:

Before: Uninformative "422 Unprocessable Entity" messages
After: Detailed field validation errors with request body capture

Example Debug Output:

🔴 Request Validation Error (422) | method=POST | error_count=2 | errors=[
  {"field": "body -> model", "message": "Field required", "type": "missing"},
  {"field": "body -> messages", "message": "Field required", "type": "missing"}
] | client=127.0.0.1

📋 Request Body (Validation Error) | body={"invalid": "data"}

Request/Response Tracking:

Full HTTP request details (method, URL, headers, client IP)
Response status codes and processing times
Structured JSON logging for machine processing

Log Files:

logs/abstractcore_TIMESTAMP.log - Structured events
logs/YYYYMMDD-payloads.jsonl - Full request bodies
logs/verbatim_TIMESTAMP.jsonl - Complete I/O

Useful Commands:

# Find errors
grep '"level": "error"' logs/abstractcore_*.log

# Track token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
  awk '{sum+=$1} END {print "Total:", sum}'

# Monitor specific model
grep '"model": "qwen3-coder:30b"' logs/verbatim_*.jsonl
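The token-usage pipeline above can also be written in Python when `jq` is not installed. This sketch assumes the same record shape that the `jq` filter reads (`metadata.tokens.input` and `metadata.tokens.output`). The sample lines are made up.

```python
import json
from typing import Iterable


def total_tokens(lines: Iterable[str]) -> int:
    """Sum metadata.tokens.input + metadata.tokens.output over JSONL lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines between records
        tokens = json.loads(line).get("metadata", {}).get("tokens", {})
        total += (tokens.get("input") or 0) + (tokens.get("output") or 0)
    return total


sample = [
    '{"metadata": {"tokens": {"input": 12, "output": 30}}}',
    '{"metadata": {"tokens": {"input": 5, "output": 7}}}',
]
print("Total:", total_tokens(sample))  # Total: 54
```

To run it over real logs, pass an open file handle for each `logs/verbatim_*.jsonl` file instead of the sample list.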

Common Patterns

Multi-Provider Fallback

import requests

providers = [
    "ollama/qwen3-coder:30b",
    "openai/gpt-4o-mini",
    "anthropic/claude-haiku-4-5"
]

def generate_with_fallback(prompt):
    """Try each model in order and return the first successful response."""
    for model in providers:
        try:
            response = requests.post(
                "http://localhost:8000/v1/chat/completions",
                json={"model": model, "messages": [{"role": "user", "content": prompt}]},
                timeout=30
            )
        except requests.RequestException:
            # Network error or timeout: try the next model
            continue
        if response.status_code == 200:
            return response.json()
    raise RuntimeError("All providers failed")

Local Model Gateway

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b

# Use via AbstractCore server
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3-coder:30b",
    "messages": [{"role": "user", "content": "Write a Python function"}]
  }'

Troubleshooting

Server Won't Start

# Check port availability
lsof -i :8000

# Use different port
uvicorn abstractcore.server.app:app --port 3000

No Models Available

# Check providers
curl http://localhost:8000/providers

# Check API keys
echo $OPENAI_API_KEY

# Start Ollama
ollama serve
ollama list

Authentication Errors

# Set API keys
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Restart server after setting keys

Why AbstractCore Server?

Universal: One API for all providers
OpenAI Compatible: Drop-in replacement
Simple: Clean, focused endpoints
Fast: Lightweight, high-performance
Debuggable: Comprehensive logging
CLI Ready: Codex, Gemini CLI, Crush support
Production Ready: Docker, multi-worker, health checks
