AbstractCore Server

Transform AbstractCore into an OpenAI-compatible API server. One server, all models, any client.

If you want a dedicated single-model /v1 server (one provider/model per worker), see Endpoint.

Interactive API docs (start here)

Visit while the server is running:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
  • Lightweight endpoint index: http://localhost:8000/docs-lite

Swagger UI keeps its standard Authorize button when server auth is enabled. When ABSTRACTCORE_AUTH_TOKEN is set, AbstractCore wraps that authorize flow and validates the entered bearer token through /acore/auth/validate before Swagger stores it for Try it out requests. Invalid tokens stay unauthorized and render an auth error inside the modal. The docs and OpenAPI schema are public by default so the UI can load before authentication, but API operations remain protected. Set ABSTRACTCORE_SERVER_PROTECT_DOCS=1 if you also want /docs, /docs-lite, /redoc, and /openapi.json behind server auth. When server auth is disabled, the server bearer scheme is omitted from the docs, so Swagger does not render a misleading server-token authorize flow.

The OpenAPI schema includes executable examples for every request body. JSON examples intentionally show optional aliases as null when sending both fields would be ambiguous; the server drops nulls before routing. For local/custom OpenAI-compatible endpoints, set base_url only when you intentionally want to route away from the provider's default API host.

Quick Start

Install and Run (2 minutes)

# Install
pip install "abstractcore[server]"

# Configure server auth and provider keys
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"
export OPENAI_API_KEY="sk-..."

# Start server
python -m abstractcore.server.app

# Or with uvicorn directly
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

# Test
curl http://localhost:8000/health
# Response: {"status":"healthy"}
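Scripts that launch the server and then need to wait for it can poll the health probe from Python. This is a minimal sketch, assuming the default http://localhost:8000 address and the {"status":"healthy"} body shown above. The names fetch_health and wait_until_healthy are illustrative helpers, not part of AbstractCore.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET {base_url}/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
        return json.load(resp)


def wait_until_healthy(base_url="http://localhost:8000", timeout_s=30.0,
                       interval_s=0.5, fetch=fetch_health):
    """Poll the health probe until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(base_url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and urllib errors;
            # ValueError covers a body that is not valid JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The fetch argument exists only so the loop can be exercised without a running server; in normal use, call wait_until_healthy() with no arguments.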

First Request

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Or with Python:

import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)

Configuration

You can configure the server through environment variables or through AbstractCore's centralized config. Environment variables always take precedence over config-persisted values.

# Persisted local/server config
abstractcore --set-server-auth-token acore-server-secret
abstractcore --set-api-key openai sk-...
abstractcore --set-api-key anthropic sk-ant-...
abstractcore --set-api-key openrouter sk-or-...
abstractcore --set-api-key portkey pk_...

# Optional hardening/defaults
abstractcore --set-server-base-url-allowlist "https://example.com/v1"
abstractcore --set-server-url-fetch-allowlist "https://files.example.com"
abstractcore --set-server-media-root /srv/abstractcore-media
abstractcore --set-server-host 127.0.0.1
abstractcore --set-server-port 8000
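The precedence rule (environment first, persisted config second) can be pictured with a small lookup. This is a sketch of the rule only, not AbstractCore's implementation: the persisted dictionary and its key name are hypothetical stand-ins for the centralized config, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve_setting(env_name, persisted, key, env=None):
    """Return the environment value if set, else the persisted config value."""
    env = os.environ if env is None else env
    value = env.get(env_name)
    if value:  # assumption: an empty variable counts as unset
        return value
    return persisted.get(key)


# Hypothetical persisted config, as if written by --set-server-auth-token.
persisted = {"server_auth_token": "from-config"}

print(resolve_setting("ABSTRACTCORE_AUTH_TOKEN", persisted, "server_auth_token",
                      env={"ABSTRACTCORE_AUTH_TOKEN": "from-env"}))
print(resolve_setting("ABSTRACTCORE_AUTH_TOKEN", persisted, "server_auth_token", env={}))
```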

Environment Variables

# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
export PORTKEY_API_KEY="pk_..."         # optional (Portkey)
export PORTKEY_CONFIG="pcfg_..."        # required for Portkey routing

# Server auth token. Authenticated clients can use all server-configured providers.
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"

# Optional: also protect /docs, /docs-lite, /redoc, and /openapi.json.
export ABSTRACTCORE_SERVER_PROTECT_DOCS=1

# Local providers
export OLLAMA_BASE_URL="http://localhost:11434"          # (or legacy: OLLAMA_HOST)
export LMSTUDIO_BASE_URL="http://localhost:1234/v1"
export VLLM_BASE_URL="http://localhost:8000/v1"          # vLLM also defaults to port 8000; move one of the two servers if they share a host
export OPENAI_BASE_URL="http://localhost:1234/v1"
export OPENAI_API_KEY="your-endpoint-key"                # optional, if the endpoint requires auth

# Server bind (only used by `python -m abstractcore.server.app`)
export HOST="0.0.0.0"
export PORT="8000"

# Debug mode
export ABSTRACTCORE_DEBUG=true

# Dangerous (multi-tenant hazard): allow unload_after for providers that can unload shared server state (e.g. Ollama)
export ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1

# Server security controls (recommended)
#
# - Request-level base_url overrides are loopback-only by default.
#   URL entries match scheme + exact host + default/explicit port + path-segment prefix.
#   Bare entries match hostname globs, e.g. "*.example.com".
export ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST="https://api.openai.com,https://example.com/v1"
#
# - Remote URL fetches for attachments are blocked for private/loopback/link-local targets by default (SSRF protection).
#   To allow specific hosts/prefixes, use the same structured allowlist syntax:
export ABSTRACTCORE_SERVER_URL_FETCH_ALLOWLIST="https://www.berkshirehathaway.com"
#
# - Local file paths in HTTP requests are disabled by default (including @/path/to/file in message strings).
#   To allow local file paths safely, restrict them under a single directory:
export ABSTRACTCORE_SERVER_MEDIA_ROOT="/srv/abstractcore-media"
#
# - Unsafe escape hatch: allow arbitrary local file paths from HTTP requests (not recommended)
export ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1
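To make the SSRF default concrete, the check below classifies an IP address the way the comment describes (private, loopback, or link-local targets are blocked). It is a sketch using Python's standard ipaddress module, not the server's code. The helper name is_blocked_target is illustrative, and a real fetcher would also have to resolve hostnames and check every resolved address.

```python
from ipaddress import ip_address


def is_blocked_target(ip_text):
    """Return True for private, loopback, or link-local addresses."""
    ip = ip_address(ip_text)
    return ip.is_private or ip.is_loopback or ip.is_link_local


for candidate in ("127.0.0.1", "10.0.0.5", "169.254.169.254", "::1", "8.8.8.8"):
    print(candidate, "blocked" if is_blocked_target(candidate) else "allowed")
```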

Startup Options

# Using AbstractCore's built-in CLI
python -m abstractcore.server.app --help                    # View all options
python -m abstractcore.server.app --debug                   # Debug mode
python -m abstractcore.server.app --host 127.0.0.1 --port 8080  # Custom host/port
python -m abstractcore.server.app --debug --port 8001       # Debug on custom port

# Using uvicorn directly
uvicorn abstractcore.server.app:app --reload                # Development with auto-reload
uvicorn abstractcore.server.app:app --workers 4             # Production with multiple workers
uvicorn abstractcore.server.app:app --port 3000             # Custom port

API Endpoints

Endpoint Map

All API operations except GET /health use the same server auth policy: send Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN when ABSTRACTCORE_AUTH_TOKEN is configured. Provider-key overrides use X-AbstractCore-Provider-API-Key. Provider keys in request bodies remain disabled; select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.

| Group | Method | Endpoint | Purpose | Main parameters |
| --- | --- | --- | --- | --- |
| Health | GET | /health | Liveness/version probe; never requires auth | none |
| Configuration | GET | /v1/config/capability-defaults | List explicit input/output/embedding/rerank route defaults | none |
| Configuration | PUT | /v1/config/capability-defaults/{kind}/{modality} | Set one capability route default | path kind, modality; body provider, model, base_url, options |
| Configuration | DELETE | /v1/config/capability-defaults/{kind}/{modality} | Clear one capability route default | path kind, modality |
| Discovery | GET | /v1/models | List models and filter by provider/capabilities | provider, input_type, output_type, base_url, api_key |
| Discovery | GET | /providers | Provider status/capabilities | include_models |
| Discovery | GET | /v1/vision/providers/ | AbstractVision provider catalog for image/video generation models | optional task, provider, include_models, base_url, api_key |
| Discovery | GET | /v1/audio/voices | AbstractVoice voice/profile catalog for TTS | optional provider, model, providers_only, base_url, api_key |
| Discovery | GET | /v1/audio/speech/models | AbstractVoice TTS model/provider catalog | optional provider, base_url, api_key |
| Discovery | GET | /v1/audio/speech/providers | AbstractVoice TTS provider catalog | optional base_url |
| Discovery | GET | /v1/audio/transcriptions/models | AbstractVoice STT model/provider catalog | optional provider, base_url, api_key |
| Discovery | GET | /v1/audio/transcriptions/providers | AbstractVoice STT provider catalog | optional base_url |
| Discovery | GET | /v1/voice/clone/providers | AbstractVoice voice clone provider catalog | optional base_url |
| Chat | POST | /v1/chat/completions | OpenAI-compatible chat, streaming, tools, media | model, messages, stream, tools, tool_choice, temperature, max_tokens, base_url, agent_format, thinking |
| Chat | POST | /{provider}/v1/chat/completions | Provider-scoped chat route where body model is unprefixed | path provider, body model, messages, chat parameters |
| Responses | POST | /v1/responses | OpenAI Responses API (object:"response") + legacy chat fallback | model, input or messages, stream, generation parameters, base_url, agent_format, thinking, prompt_cache_key, prompt_cache_binding |
| Embeddings | POST | /v1/embeddings | OpenAI-compatible embedding vectors | model, input, dimensions, encoding_format, user, base_url |
| Images | POST | /v1/images/generations | Text-to-image generation | prompt, optional model, provider, base_url, width, height, size, n, steps, guidance_scale, seed, quality, extra |
| Images | POST | /{provider}/v1/images/generations | Provider-scoped text-to-image route where body model is unprefixed | path provider, body model, optional base_url, image generation parameters |
| Images | POST | /v1/images/edits | Image edit/inpaint via multipart form | prompt, image, optional mask, model, provider, base_url, size, steps, guidance_scale, seed, extra_json |
| Images | POST | /{provider}/v1/images/edits | Provider-scoped image edit route where body model is unprefixed | path provider, optional base_url, image edit form fields |
| Videos | POST | /v1/videos/generations | Text-to-video generation | prompt, optional model, provider, base_url, width, height, fps, num_frames, steps, guidance_scale, extra |
| Videos | POST | /{provider}/v1/videos/generations | Provider-scoped text-to-video route where body model is unprefixed | path provider, body model, optional base_url, video generation parameters |
| Videos | POST | /v1/videos/edits | Image-to-video via multipart form | prompt, image, optional model, provider, base_url, width, height, fps, num_frames, extra_json |
| Videos | POST | /{provider}/v1/videos/edits | Provider-scoped image-to-video route where body model is unprefixed | path provider, optional base_url, image-to-video form fields |
| Vision Jobs | POST | /v1/vision/jobs/images/generations | Async image generation with polling | same body as /v1/images/generations |
| Vision Jobs | POST | /v1/vision/jobs/images/edits | Async image edit with polling | same form fields as /v1/images/edits |
| Vision Jobs | POST | /v1/vision/jobs/videos/generations | Async text-to-video with polling and progress events | same body as /v1/videos/generations |
| Vision Jobs | POST | /v1/vision/jobs/videos/edits | Async image-to-video with polling and progress events | same form fields as /v1/videos/edits |
| Vision Jobs | GET | /v1/vision/jobs/{job_id} | Poll/consume async job state | path job_id, query consume |
| Vision Models | GET | /v1/vision/models | Available AbstractVision model catalog | optional task, provider, base_url, api_key |
| Audio | POST | /v1/audio/transcriptions | Speech-to-text multipart endpoint | file, optional provider, model, language, prompt, response_format, temperature, format, base_url |
| Audio | POST | /{provider}/v1/audio/transcriptions | Provider-scoped speech-to-text route where body model is unprefixed | path provider, optional base_url, STT form fields |
| Audio | POST | /v1/audio/speech | Text-to-speech endpoint | input/text, optional provider, model, voice, response_format/format, speed, instructions, profile, quality_preset, quality, base_url |
| Audio | POST | /{provider}/v1/audio/speech | Provider-scoped text-to-speech route where body model is unprefixed | path provider, optional base_url, TTS body fields |
| Audio | POST | /v1/voice/clone | AbstractVoice-compatible voice-clone/custom-voice extension | file, optional provider, model, tts_model, cloning_engine, base_url, name, reference_text, validate |
| Audio | POST | /{provider}/v1/voice/clone | Provider-scoped voice-clone route where body model is unprefixed | path provider, optional base_url, voice-clone form fields |
| Audio | POST | /v1/audio/translations | Reserved OpenAI-compatible translation route | file, model; returns 501 in this version |
| Audio | POST | /v1/audio/music | Extension endpoint for text-to-music plugins | prompt/input/text, optional provider, model, lyrics, duration_s, seed, num_inference_steps, guidance_scale, format; requires a music capability plugin |
| Audio | POST | /{provider}/v1/audio/music | Backend-scoped text-to-music route | path provider, music body fields |
| Runtime | POST | /acore/models/load | Load and keep warm a task-specific model runtime | optional task (text_generation default, image_generation, video_generation, text_to_video, image_to_video, tts, stt), provider, model, options, pin, base_url, timeout_s |
| Runtime | GET | /acore/models/loaded | List task-aware loaded runtimes | optional task, provider, model |
| Runtime | POST | /acore/models/unload | Unload a task-specific runtime | runtime_id or provider + model, optional task, base_url, options |
| Prompt Cache | GET | /acore/prompt_cache/stats | Cache stats on a loaded gateway runtime or upstream AbstractEndpoint | provider + model or base_url; provider key header if required |
| Prompt Cache | GET | /acore/prompt_cache/capabilities | Cache capability discovery on a loaded gateway runtime or upstream AbstractEndpoint | provider + model or base_url; provider key header if required |
| Prompt Cache | POST | /acore/prompt_cache/set | Select/create a cache key locally or upstream | provider + model or base_url, key, make_default, ttl_s |
| Prompt Cache | POST | /acore/prompt_cache/update | Prepare prompt/messages/tools locally or upstream | provider + model or base_url, key, prompt or messages, system_prompt, tools, optional thinking, ttl_s |
| Prompt Cache | POST | /acore/prompt_cache/fork | Fork one cache key to another locally or upstream | provider + model or base_url, from_key, to_key, make_default, ttl_s |
| Prompt Cache | POST | /acore/prompt_cache/clear | Clear local or upstream cache state | provider + model or base_url, optional key |
| Prompt Cache | POST | /acore/prompt_cache/prepare_modules | Prepare reusable module/tool context locally or upstream | provider + model or base_url, namespace, modules, make_default, ttl_s, version |
| Memory Blocs | POST | /acore/blocs/upsert_text | Persist extracted text into the gateway-local bloc store or an upstream AbstractEndpoint bloc store | optional base_url, path, content, optional bloc metadata |
| Memory Blocs | GET | /acore/blocs | List gateway-local or upstream bloc records | optional base_url, sha256, bloc_id |
| Memory Blocs | GET | /acore/blocs/record | Inspect a gateway-local or upstream bloc record | optional base_url, sha256 or bloc_id |
| Memory Blocs | POST | /acore/blocs/delete | Delete one bloc with optional live KV safety checks | optional base_url, sha256 or bloc_id, delete_kv, clear_loaded, force, dry_run |
| Memory Blocs | GET | /acore/blocs/kv/manifest | Inspect a gateway-local or upstream bloc KV manifest | provider + model or base_url, sha256 or bloc_id, optional artifact_path |
| Memory Blocs | GET | /acore/blocs/kv/list | List manifest-backed bloc KV artifacts | optional base_url, provider, model, sha256, bloc_id |
| Memory Blocs | POST | /acore/blocs/kv/ensure | Compile or validate a local or upstream provider-backed bloc KV artifact | provider + model or base_url, sha256 or bloc_id, optional artifact_path, force_rebuild, debug |
| Memory Blocs | POST | /acore/blocs/kv/load | Load or fork a local or upstream provider-backed bloc KV artifact into a cache key | provider + model or base_url, sha256 or bloc_id, optional artifact_path, stable_cache_key, key, make_default, force_rebuild, debug |
| Memory Blocs | POST | /acore/blocs/kv/delete | Delete one bloc KV artifact with live-binding safety | provider + model or base_url when checking live state, sha256 or bloc_id, optional artifact_path, clear_loaded, force, dry_run, debug |
| Memory Blocs | POST | /acore/blocs/kv/prune | Delete matching bloc KV artifacts by filter | optional provider, model, base_url, sha256, bloc_id, clear_loaded, force, dry_run, debug |
| Capabilities | GET | /v1/capabilities | Inspect optional capability plugin availability and backend metadata | none |
| Capabilities | GET | /v1/capabilities/{capability}/providers | List normalized providers for one capability plugin | path capability, optional task |
| Capabilities | GET | /v1/capabilities/{capability}/models | List normalized models for one capability plugin | path capability, optional task, provider |
| Audio | GET | /v1/audio/music/providers | List music capability providers | optional task |
| Audio | GET | /v1/audio/music/models | List music capability models | optional task, provider |

Capability Routing Defaults

/v1/config/capability-defaults exposes the execution host's explicit route defaults for input, output, embedding, and rerank capabilities. Gateway uses this route as its control-plane source when ABSTRACTCORE_SERVER_BASE_URL points at a remote Core server.

These defaults are configuration only; they do not load a model. Use the runtime residency routes under /acore/models/* to inspect or change provider-loaded state.

Shared Request Conventions

  • model usually uses provider/model format, for example openai/gpt-4o-mini, anthropic/claude-haiku-4-5, ollama/qwen3:4b, lmstudio/qwen/qwen3-vl-4b, or openai-compatible/my-model.
  • base_url is an AbstractCore extension for routing a provider to a specific OpenAI-compatible endpoint. Loopback URLs are allowed by default; non-loopback URLs require ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
  • Media routes also accept an optional provider routing hint. This is mainly useful when you omit model, use a provider-scoped route, or pair a custom base_url with the default local/plugin path.
  • X-AbstractCore-Provider-API-Key overrides only the requested upstream provider for that request. It does not replace the AbstractCore server token.
  • Provider keys in request bodies remain disabled; use X-AbstractCore-Provider-API-Key for per-request upstream overrides. Select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.
  • Remote URL media fetches are SSRF-protected by default. Local file paths are disabled unless ABSTRACTCORE_SERVER_MEDIA_ROOT or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 is configured.
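Restricting local file paths to a single directory, as ABSTRACTCORE_SERVER_MEDIA_ROOT does, comes down to a containment check after path normalization. The sketch below shows one way to express that check with pathlib. It is an illustration of the idea, not the server's implementation, and is_within_media_root is a hypothetical helper name.

```python
from pathlib import Path


def is_within_media_root(candidate, media_root):
    """Return True if candidate resolves to a path strictly inside media_root."""
    root = Path(media_root).resolve()
    # Relative candidates are joined to the root; absolute ones replace it.
    target = (root / candidate).resolve()
    return root in target.parents


root = "/srv/abstractcore-media"
print(is_within_media_root("images/cat.png", root))
print(is_within_media_root("../secrets.txt", root))
print(is_within_media_root("/srv/abstractcore-media-evil/x.png", root))
```

Comparing resolved path components, rather than string prefixes, is what keeps a sibling directory such as /srv/abstractcore-media-evil from passing the check.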

Chat Completions

Endpoint: POST /v1/chat/completions

Standard OpenAI-compatible endpoint. Works with all providers.

Server auth:

  • If ABSTRACTCORE_AUTH_TOKEN is configured, every non-health endpoint requires Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN. Authenticated clients can use all provider keys/endpoints configured on the server.
  • If ABSTRACTCORE_AUTH_TOKEN is not configured, either set ABSTRACTCORE_SERVER_ALLOW_UNAUTHENTICATED=1 for intentional local/dev use, or provide an upstream provider key explicitly via X-AbstractCore-Provider-API-Key.
  • Health checks (GET /health) are always unauthenticated.

Request:

{
  "model": "provider/model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Key Parameters:

  • model (required): Prefer "provider/model-name" (e.g., "openai/gpt-4o-mini"). If you pass a bare model name (no /), the server makes a best-effort attempt to auto-detect the provider.
  • messages (required): Array of message objects
  • stream (optional): Enable streaming responses
  • tools (optional): Tools for function calling
  • agent_format (optional, AbstractCore extension): Tool-call syntax output format for agentic clients ("auto"|"openai"|"codex"|"qwen3"|"llama3"|"gemma"|"xml"|"passthrough"). When omitted, the server auto-detects from user-agent + model heuristics.
  • api_key (deprecated/disabled, AbstractCore extension): Provider API keys are not accepted in request bodies. Configure provider keys on the server or use X-AbstractCore-Provider-API-Key for a per-request provider override. Select discovery endpoints accept an api_key query parameter for tooling/Swagger UI convenience.
  • base_url (optional, AbstractCore extension): Override the provider endpoint (include /v1 for OpenAI-compatible servers like LM Studio / vLLM / OpenRouter)
  • unload_after (optional, AbstractCore extension): If true, calls llm.unload_model(model) after the request completes. Disabled for ollama/* unless ABSTRACTCORE_ALLOW_UNSAFE_UNLOAD_AFTER=1.
  • prompt_cache_key (optional, AbstractCore extension): Best-effort prompt caching key (semantics depend on provider/backend). See docs/prompt-caching.md.
  • prompt_cache_binding (optional, AbstractCore extension): Exact durable bloc binding returned by /acore/blocs/kv/load. When supplied, the server verifies the cache key before generation or streaming; stale/missing bindings return 409.
  • prompt_cache_retention (optional, AbstractCore extension): Prompt cache retention policy (OpenAI: "in_memory" or "24h"; ignored by other providers). See docs/prompt-caching.md.
  • thinking (optional, AbstractCore extension): Unified thinking/reasoning control (null|"auto"|"on"|"off"|"none" or "low"|"medium"|"high"|"xhigh" when supported). Note: "none" is treated as an alias for "off".
  • temperature, max_tokens, top_p: Standard LLM parameters

Thinking (AbstractCore extension)

The server forwards thinking to the underlying provider using AbstractCore’s unified thinking mapping (see Generation Parameters).

Example (route to LM Studio + Qwen3.5, disable thinking):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "lmstudio/qwen3.5-27b@q4_k_m",
    "base_url": "http://localhost:1234/v1",
    "messages": [{"role": "user", "content": "Compute 17*23 - 19*11. Reply with the integer only."}],
    "thinking": "none",
    "max_tokens": 64
  }'

Notes:

  • For Qwen3 / Qwen3.5 on LM Studio, thinking="none" maps to LM Studio’s template variables (enable_thinking / enableThinking) plus a Qwen template “hard switch” fallback (empty <think></think>) when needed. This avoids injecting “reasoning effort” instructions into the system prompt.
  • Not every backend supports per-effort budgets for low|medium|high; when unavailable, levels degrade to “thinking enabled”.

Example with streaming:

import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

stream = client.chat.completions.create(
    model="ollama/qwen3-coder:30b",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Provider base_url override (AbstractCore extension)

Route a provider to a specific endpoint (useful for remote OpenAI-compatible servers):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "lmstudio/qwen/qwen3-4b-2507",
    "base_url": "http://localhost:1234/v1",
    "messages": [{"role": "user", "content": "Hello from a remote LM Studio endpoint"}]
  }'

Security notes:

  • Request-level base_url overrides are loopback-only by default. To allow additional origins or host globs, set ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST. URL entries are parsed and matched on scheme, exact host, effective port, and path-segment prefix.
  • If the server has an environment provider key set (e.g. OPENAI_API_KEY) and you route to a non-loopback base_url, the request is refused unless the provider key was supplied explicitly with X-AbstractCore-Provider-API-Key, or with Authorization when server auth is disabled.
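The allowlist rule from the security notes (scheme, exact host, effective port, and path-segment prefix for URL entries; hostname globs for bare entries; loopback always allowed) can be sketched with the standard library. This is an illustration of the documented matching rule under stated assumptions, not the server's actual code: the function names are hypothetical, and the comma-separated entry format follows the ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST example.

```python
from fnmatch import fnmatchcase
from ipaddress import ip_address
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(parts):
    """Return (scheme, host, effective port) for a parsed URL."""
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, (parts.hostname or "").lower(), port


def _segments(path):
    return [s for s in path.split("/") if s]


def is_loopback(url):
    host = (urlsplit(url).hostname or "").lower()
    if host == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname other than localhost, not an IP literal


def entry_matches(entry, url):
    target = urlsplit(url)
    if "://" not in entry:
        # Bare entry: hostname glob such as "*.example.com".
        return fnmatchcase((target.hostname or "").lower(), entry.lower())
    allowed = urlsplit(entry)
    if _origin(allowed) != _origin(target):
        return False
    prefix = _segments(allowed.path)
    # Whole path segments must match, so "/v1" does not admit "/v10".
    return _segments(target.path)[:len(prefix)] == prefix


def base_url_allowed(url, allowlist=""):
    entries = [e.strip() for e in allowlist.split(",") if e.strip()]
    return is_loopback(url) or any(entry_matches(e, url) for e in entries)
```

Matching on whole path segments rather than raw string prefixes is the detail that keeps https://example.com/v10 from slipping past an entry for https://example.com/v1.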

Provider Authentication

Do not put provider keys in request bodies. Those fields are disabled because they leak through logs, shell history, browser history, and reverse proxies. For discovery/model catalog endpoints, an api_key query parameter exists for tooling/Swagger UI convenience, but headers remain preferred.

# Preferred: configure provider keys on the server and authenticate to AbstractCore.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

When ABSTRACTCORE_AUTH_TOKEN is not configured, either set ABSTRACTCORE_SERVER_ALLOW_UNAUTHENTICATED=1 for intentional local/dev use, or provide an upstream provider key explicitly via X-AbstractCore-Provider-API-Key. Once server auth is enabled, Authorization is reserved for the AbstractCore server auth token and is never forwarded upstream.

To override a single upstream provider while still using the server auth token, send the provider key in X-AbstractCore-Provider-API-Key. The override applies only to the requested provider:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "X-AbstractCore-Provider-API-Key: $ANTHROPIC_API_KEY" \
  -d '{
    "model": "anthropic/claude-haiku-4-5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
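For clients that build requests by hand, the two credentials map to two separate headers. The sketch below constructs the same request as the curl example with the standard library, without sending it. The helper build_chat_request is illustrative, and the token values are placeholders.

```python
import json
import urllib.request


def build_chat_request(server, server_token, body, provider_key=None):
    """Build a POST /v1/chat/completions request without sending it."""
    headers = {
        "Content-Type": "application/json",
        # Server auth token: authenticates the caller to AbstractCore.
        "Authorization": f"Bearer {server_token}",
    }
    if provider_key:
        # Per-request upstream key: applies only to the requested provider.
        headers["X-AbstractCore-Provider-API-Key"] = provider_key
    return urllib.request.Request(
        f"{server.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8000",
    "acore-server-secret",
    {"model": "anthropic/claude-haiku-4-5",
     "messages": [{"role": "user", "content": "Hello!"}]},
    provider_key="sk-ant-...",
)
# urllib.request.urlopen(req) would send it to a running server.
```

With the OpenAI Python client, the same header can be supplied through the client's default_headers argument.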

Provider-Specific Chat Route

Endpoint: POST /{provider}/v1/chat/completions

This route is useful for clients that already route by base URL path and expect the body model to be provider-local. It is equivalent to using POST /v1/chat/completions with model="{provider}/{model}".

Parameters:

  • Path provider (required): provider route prefix such as openai, anthropic, ollama, openrouter, portkey, lmstudio, vllm, or openai-compatible.
  • Body model (required): provider-local model id, without the provider prefix.
  • Body messages, stream, tools, tool_choice, agent_format, thinking, base_url, and other chat parameters behave like /v1/chat/completions.

Example:

curl -X POST http://localhost:8000/openai/v1/chat/completions \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
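The equivalence between the two routes can be written as a small mapping from a provider/model string to a provider-scoped URL plus an unprefixed body model. This is a sketch for illustration; to_provider_scoped is a hypothetical helper, and it splits on the first slash only, so provider-local ids that contain slashes survive intact.

```python
def to_provider_scoped(server, model, path="chat/completions"):
    """Map 'provider/model' to (provider-scoped URL, provider-local model id)."""
    provider, sep, local_model = model.partition("/")
    if not sep or not provider or not local_model:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return f"{server.rstrip('/')}/{provider}/v1/{path}", local_model


url, body_model = to_provider_scoped("http://localhost:8000", "openai/gpt-4o-mini")
print(url)         # http://localhost:8000/openai/v1/chat/completions
print(body_model)  # gpt-4o-mini
```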

Media generation endpoints (optional)

AbstractCore Server can optionally expose OpenAI-compatible image/video generation and audio endpoints.

Important notes:

  • These endpoints are interoperability-first: they return b64_json or raw bytes and do not provide an artifact-first durability contract.
  • If the required plugin/backend is not available, the server returns 501 with an actionable error message.

Capability catalogs

Thin clients can preflight the configured media surface without importing abstractvision or abstractvoice directly:

| Endpoint | Purpose | Notes |
| --- | --- | --- |
| GET /v1/vision/providers/ | Lists provider image/video catalog entries through the selected AbstractVision backend. | Optional task, provider, include_models, base_url, api_key. Set include_models=true to include full provider model catalogs (slower). |
| GET /v1/audio/voices | Lists TTS profiles/voices, active profile, active model, and bounded catalog data through AbstractVoice. | Optional provider, model, providers_only, base_url, api_key. |
| GET /v1/audio/speech/models | TTS model id projection with provider/model route strings. | Includes models_by_provider and provider_models for clients that route via provider/model. |
| GET /v1/audio/speech/providers | TTS provider projection. | Useful for clients that pick /{provider}/v1/audio/speech first and then choose a model. |
| GET /v1/audio/transcriptions/models | STT model id projection with provider/model route strings. | Includes models_by_provider and provider_models. |
| GET /v1/audio/transcriptions/providers | STT provider projection. | Mirrors speech provider discovery for /v1/audio/transcriptions. |
| GET /v1/voice/clone/providers | Voice cloning provider projection. | Uses AbstractVoice clone provider availability. |

These routes instantiate only the selected capability backend needed for deep catalog discovery. Shallow plugin availability remains available through the library llm.capabilities.status() call. Server-held provider keys remain behind server auth; per-request upstream key overrides must use X-AbstractCore-Provider-API-Key. For tooling/Swagger UI convenience, these catalog routes also accept an api_key query parameter (redacted from server logs).

Images (generate/edit)

Endpoints:

  • POST /v1/images/generations
  • POST /{provider}/v1/images/generations
  • POST /v1/images/edits
  • POST /{provider}/v1/images/edits

Remote OpenAI-compatible image proxying is included in abstractcore[server] and is enabled by setting OPENAI_BASE_URL. The synchronous image routes use the same internal generate(..., output="image") dispatcher as the Python API, then serialize the result back to the OpenAI-compatible b64_json response shape.

Install for remote image proxying:

pip install "abstractcore[server]"

Install local image backends only when you want the server to load Diffusers, MLX-Gen, or stable-diffusion.cpp models itself:

pip install "abstractcore[server,vision]"

Use provider/model-style image ids:

  • Omit model only when this server has a configured AbstractVision/OpenAI-compatible image default, for example via OPENAI_BASE_URL plus an optional default model id.
  • Provider-scoped routes such as /openai-compatible/v1/images/generations and /diffusers/v1/images/generations accept an unprefixed body model and internally route it as provider/model, matching /{provider}/v1/chat/completions.
  • diffusers/default selects the configured local Diffusers default: ABSTRACTCORE_VISION_MODEL_ID / ABSTRACTVISION_DIFFUSERS_MODEL_ID / ABSTRACTVISION_MODEL_ID.
  • diffusers/<huggingface-repo> selects an explicit local Diffusers model.
  • mlx-gen/default selects the configured local MLX-Gen model; use AbstractVision's q4 AbstractFramework presets by default and q8 variants when quality is paramount.
  • mlx-gen/<exact-huggingface-repo> selects an explicit cached MLX-Gen model such as mlx-gen/AbstractFramework/flux.2-klein-4b-4bit or mlx-gen/AbstractFramework/qwen-image-edit-2511-4bit. Official MLX-Gen runtime snapshots such as mlx-gen/briaai/FIBO and mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers are selected the same way. Legacy mflux prefixes remain accepted as compatibility aliases, but the model id itself must be the exact published repo id.
  • sdcpp/default selects the configured stable-diffusion.cpp model.
  • openai-compatible/<model> routes to the configured OpenAI-compatible image endpoint.
  • openai/gpt-image-1 or provider-scoped /openai/v1/images/generations routes to OpenAI's Images API and uses OPENAI_API_KEY when an explicit AbstractVision upstream base URL is not configured.

Local Diffusers generation is cache-only by default; set ABSTRACTCORE_VISION_ALLOW_DOWNLOAD=1 or ABSTRACTVISION_DIFFUSERS_ALLOW_DOWNLOAD=1 only when runtime downloads are intentional.

POST /v1/images/generations JSON parameters:

| Field | Required | Notes |
| --- | --- | --- |
| prompt | yes | Text prompt to render. |
| model | no | Omit for the server's configured AbstractVision default. If present, use provider/model routing: diffusers/default, diffusers/<huggingface-repo>, mlx-gen/default, mlx-gen/<exact-huggingface-repo>, sdcpp/default, openai-compatible/<model>, or openai/gpt-image-1. Provider-scoped routes accept the same model without the prefix. |
| provider | no | Optional routing hint when you want the configured default model/backend for a specific provider, or when pairing a request with base_url. |
| width, height | no | Requested output dimensions in pixels. These are the natural fields for local engines and remain accepted for remote routes. |
| size | no | OpenAI-style size such as 1024x1024. The server normalizes size with width/height so OpenAI-style and local-engine clients can use the same route. |
| n | no | Number of images; clamped to 1..10. |
| response_format | no | Server response format. b64_json is the supported response shape. |
| negative_prompt | no | Local/backend-specific negative prompt. Strict OpenAI-compatible upstreams do not receive this top-level field; use extra only when your custom upstream supports it. |
| seed | no | Local deterministic seed. Strict OpenAI-compatible upstreams do not receive this top-level field; use extra.seed only when your custom upstream supports it. |
| steps | no | Local denoising/inference step count. Strict OpenAI-compatible upstreams do not receive this top-level field; use extra.steps only when your custom upstream supports it. |
| guidance_scale | no | Local classifier-free guidance scale. Strict OpenAI-compatible upstreams do not receive this top-level field; use extra.guidance_scale only when your custom upstream supports it. |
| quality, style, user, background, output_format, output_compression, moderation | no | Named OpenAI-compatible passthrough fields for upstream image endpoints. |
| base_url | no | OpenAI-compatible endpoint override. Prefer this with openai-compatible/...; if set with openai/..., the request is sent to that URL instead of api.openai.com. Loopback is allowed by default; non-loopback requires allowlist. |
| extra | no | JSON object for backend-specific passthrough fields. Prefer this over arbitrary top-level keys so the schema stays explicit. |
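The size, width/height, and n rules in the table can be mirrored on the client before a request is sent. The following is a minimal sketch, not the server's implementation: the helper names `resolve_dimensions` and `clamp_n` are hypothetical, and the precedence rule (explicit width/height win over size) is an assumption, since the table only says the two forms are normalized together.

```python
from typing import Optional, Tuple


def resolve_dimensions(
    size: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Merge an OpenAI-style "WIDTHxHEIGHT" string with explicit width/height.

    Assumption: explicit width/height take precedence over size.
    """
    if size:
        w_text, _, h_text = size.lower().partition("x")
        size_w, size_h = int(w_text), int(h_text)  # raises ValueError on bad input
        width = width if width is not None else size_w
        height = height if height is not None else size_h
    return width, height


def clamp_n(n: int) -> int:
    """Clamp the requested image count to the documented 1..10 range."""
    return max(1, min(10, n))
```

Sending only one of the two forms (size, or width plus height) avoids depending on how the server resolves a conflict between them.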

POST /v1/images/edits multipart parameters:

| Field | Required | Notes |
| --- | --- | --- |
| prompt | yes | Edit/inpaint instruction. |
| image | yes | Source image file. |
| mask | no | Optional mask image for inpainting/edit-capable backends. |
| model | no | Same provider/model routing as generation; omit for the server default. Provider-scoped routes accept the same model without the prefix. |
| provider | no | Optional routing hint when you want the configured default backend for a specific provider, or when pairing a request with base_url. |
| size | no | OpenAI-style edit output size such as 1024x1024; multipart edit compatibility keeps this field. |
| response_format | no | Server response shape; b64_json is supported. |
| negative_prompt, seed, steps, guidance_scale | no | Local/backend-specific fields. Strict OpenAI-compatible upstreams do not receive them as top-level fields; use extra_json only when your custom upstream supports them. |
| base_url | no | OpenAI-compatible endpoint override. Loopback is allowed by default; non-loopback requires allowlist. |
| extra_json | no | JSON object string with backend/upstream-specific parameters. |

Async image jobs are available when a request can take long enough that polling is preferable:

  • POST /v1/vision/jobs/images/generations uses the same JSON body as /v1/images/generations and returns {"job_id": "..."}.
  • POST /v1/vision/jobs/images/edits uses the same multipart fields as /v1/images/edits and returns {"job_id": "..."}.
  • GET /v1/vision/jobs/{job_id} returns queued, running, succeeded, or failed. Add ?consume=true to remove a completed job from the in-memory job store after reading it.
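The submit-then-poll flow above can be sketched as a small helper. This is an illustrative sketch under stated assumptions: the function name `wait_for_job` is hypothetical, and the job state is assumed to arrive under a `status` key, which the endpoint description does not name explicitly. The HTTP call is injected as a callable so the loop can be exercised without a running server.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state or the budget runs out.

    Assumption: the job JSON carries its state in a "status" field whose
    value is one of queued, running, succeeded, or failed.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") in ("succeeded", "failed"):
            return job
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```

In practice `fetch_status` would wrap a GET on /v1/vision/jobs/{job_id} with the server bearer token, for example through `requests.get(...).json()`.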

Videos (text-to-video/image-to-video)

Endpoints:

  • POST /v1/videos/generations
  • POST /{provider}/v1/videos/generations
  • POST /v1/videos/edits
  • POST /{provider}/v1/videos/edits
  • POST /v1/vision/jobs/videos/generations
  • POST /v1/vision/jobs/videos/edits

The synchronous video routes use the same internal generate(..., output={"modality": "video"}) dispatcher as the Python API and return {"data":[{"b64_json":"..."}]} with MP4 bytes encoded in base64. Async video jobs are the preferred path for longer local runs; polling GET /v1/vision/jobs/{job_id} includes progress.last_event when the selected backend reports richer progress events.

Use exact provider/model ids. For MLX-Gen, select the published model repo id, for example:

  • mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers for text-to-video or image-to-video.
  • mlx-gen/AbstractFramework/qwen-image-2512-4bit for text-to-image.

Core does not expose a quantization override. Q4/Q8 choices are part of the model id that AbstractVision/MLX-Gen loads.

POST /v1/videos/generations JSON parameters:

| Field | Required | Notes |
| --- | --- | --- |
| prompt | yes | Text prompt to render as video. |
| model | no | Provider/model id such as mlx-gen/Wan-AI/Wan2.2-TI2V-5B-Diffusers or openai-compatible/<model>. Provider-scoped routes accept the same model without the prefix. |
| provider | no | Optional routing hint, e.g. mlx-gen or openai-compatible. |
| width, height, size | no | Requested output dimensions. size accepts WIDTHxHEIGHT. |
| fps, num_frames / frames | no | Video frame rate and frame count. |
| response_format | no | b64_json is the supported response shape. |
| negative_prompt, seed, steps, guidance_scale | no | Backend-specific generation controls. |
| extra.max_sequence_length | no | Useful for MLX-Gen Wan-style video runs. |

POST /v1/videos/edits multipart parameters mirror generation and add required image=@first-frame.png. This route is the image-to-video path; the alias /v1/videos/from-image is accepted for clients that prefer a literal name.

Examples:

# Remote OpenAI-compatible image endpoint.
BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

curl -sS -X POST "$BASE/v1/images/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai-compatible/gpt-image-1","prompt":"A clean product photo of a red ceramic mug on a white table.","n":1,"width":1024,"height":1024,"response_format":"b64_json","quality":"low"}' \
  > /tmp/acore-image.json

python - <<'PY'
import base64
import json
from pathlib import Path

data = json.loads(Path("/tmp/acore-image.json").read_text())
Path("/tmp/acore-image.png").write_bytes(base64.b64decode(data["data"][0]["b64_json"]))
PY

# Image edit using the generated image.
curl -sS -X POST "$BASE/v1/images/edits" \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=openai-compatible/gpt-image-1" \
  -F "prompt=Make the mug blue while keeping the white table." \
  -F "image=@/tmp/acore-image.png;type=image/png" \
  -F "size=1024x1024" \
  -F "response_format=b64_json" \
  -F 'extra_json={"quality":"low"}' \
  > /tmp/acore-edit.json

python - <<'PY'
import base64
import json
from pathlib import Path

data = json.loads(Path("/tmp/acore-edit.json").read_text())
Path("/tmp/acore-edit.png").write_bytes(base64.b64decode(data["data"][0]["b64_json"]))
PY

# Configured server image default
curl -sS -X POST "$BASE/v1/images/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a red fox in snow","width":512,"height":512,"response_format":"b64_json"}'

# Text-to-video, asynchronous job with progress polling.
curl -sS -X POST "$BASE/v1/vision/jobs/videos/generations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider":"mlx-gen","model":"Wan-AI/Wan2.2-TI2V-5B-Diffusers","prompt":"A slow camera move through a luminous data center.","width":1280,"height":704,"fps":24,"num_frames":121,"steps":50,"guidance_scale":5.0,"extra":{"max_sequence_length":256}}'

# Image-to-video, synchronous multipart route.
curl -sS -X POST "$BASE/v1/videos/edits" \
  -H "Authorization: Bearer $TOKEN" \
  -F "provider=mlx-gen" \
  -F "model=Wan-AI/Wan2.2-TI2V-5B-Diffusers" \
  -F "prompt=Slow camera push-in." \
  -F "image=@./first-frame.png;type=image/png" \
  -F "width=1280" \
  -F "height=704" \
  -F "fps=24" \
  -F "num_frames=121" \
  -F "steps=50" \
  -F 'extra_json={"max_sequence_length":256}'

Local vision model helper endpoint:

| Endpoint | Purpose | Notes |
| --- | --- | --- |
| GET /v1/vision/models | List available AbstractVision provider models. | Includes remote providers when their API key/base URL is configured and local models when they are present in known caches. |

Audio (STT/TTS)

Endpoints:

  • POST /v1/audio/transcriptions (multipart; file=...)
  • POST /{provider}/v1/audio/transcriptions (multipart; provider-scoped STT)
  • POST /v1/audio/speech (json; input=..., optional voice, optional format)
  • POST /{provider}/v1/audio/speech (json; provider-scoped TTS)
  • POST /v1/voice/clone (multipart; extension route for AbstractVoice-compatible voice cloning)
  • POST /{provider}/v1/voice/clone (multipart; provider-scoped voice cloning)
  • POST /v1/audio/translations (multipart; reserved for compatibility, returns 501)
  • POST /v1/audio/music (json; extension endpoint, requires a music capability plugin)
  • POST /{provider}/v1/audio/music (json; provider/backend-scoped music route)

Local plugin fallback is enabled when model is omitted. OpenAI SDK-style clients that require a non-empty model string can use abstractvoice/default.

Remote provider routing is enabled when model is supplied in provider/model format:

  • openai/gpt-4o-mini-transcribe, openai/whisper-1
  • openai/gpt-4o-mini-tts, openai/tts-1
  • openrouter/... for OpenRouter STT/TTS models
  • portkey/... for Portkey-routed OpenAI-compatible audio models
  • openai-compatible/... for endpoints that implement OpenAI-compatible audio routes

Provider-scoped audio routes mirror chat routing. For example, POST /openai/v1/audio/transcriptions with model=gpt-4o-mini-transcribe is equivalent to POST /v1/audio/transcriptions with model=openai/gpt-4o-mini-transcribe; the same applies to /openai-compatible/v1/audio/speech and other supported provider prefixes.
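The equivalence between provider-scoped and unified routes can be written down as a small mapping. This is a sketch for illustration only; `to_unified_route` is a hypothetical helper, not part of AbstractCore, and it simply moves the provider segment of the path into the model id.

```python
from typing import Tuple


def to_unified_route(path: str, model: str) -> Tuple[str, str]:
    """Rewrite a provider-scoped path plus bare model into the unified form.

    "/openai/v1/audio/transcriptions" with "gpt-4o-mini-transcribe" becomes
    "/v1/audio/transcriptions" with "openai/gpt-4o-mini-transcribe".
    """
    provider, sep, rest = path.lstrip("/").partition("/v1/")
    if not sep or not provider:
        # Already a unified route such as /v1/audio/speech: leave it unchanged.
        return path, model
    return "/v1/" + rest, f"{provider}/{model}"
```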

For openai-compatible/..., request-level base_url can point to a local AbstractVoice/OpenAI-compatible audio server. Loopback URLs are allowed by default; non-loopback URLs require ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
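A client can check up front whether a base_url is likely to need an allowlist entry. The sketch below is a client-side approximation with a hypothetical helper name, `is_loopback_url`; the server's own loopback test may differ, for example in how it treats DNS names that resolve to loopback addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True when the URL host is localhost or a loopback IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name: treated as non-loopback here, without resolving it.
        return False
```

URLs for which this returns False would need a matching entry in ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST on the server.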

If model is omitted, the endpoint delegates to local capability plugins (typically abstractvoice) and returns 501 when no suitable plugin is installed. Those local/plugin paths use the same internal generate(..., output=...) dispatcher as the Python API; provider/model remote routes keep their OpenAI-compatible HTTP wire behavior.

Install for remote audio:

pip install "abstractcore[server,remote]"

Install for plugin-backed routing:

pip install "abstractcore[server]"
pip install "abstractcore[voice]"
pip install "abstractcore[music]"

Notes:

  • abstractvoice 0.10.17+ can install the base plugin path on Python 3.9 without OmniVoice, torch, or torchaudio. Python 3.10+ is recommended. Use explicit local aggregate profiles such as abstractcore[all-apple] or abstractcore[all-gpu] when you want local voice engines; AEC requires Python 3.11+.
  • /v1/audio/transcriptions requires python-multipart for form parsing (included in the server extra).
  • Uploaded audio is limited by ABSTRACTCORE_SERVER_AUDIO_MAX_BYTES (default: 25 MB).

POST /v1/audio/transcriptions multipart parameters:

| Field | Required | Notes |
| --- | --- | --- |
| file | yes | Audio file to transcribe, commonly mp3, mp4, mpeg, mpga, m4a, wav, or webm. |
| model | no | Provider/model id for remote STT (openai/gpt-4o-mini-transcribe, openai/whisper-1, openrouter/..., portkey/..., openai-compatible/...). Omit for local abstractvoice plugin fallback; abstractvoice/default is accepted for clients that require a model string. |
| provider | no | Optional routing hint when omitting model, using a provider-scoped route, or pairing the request with base_url. |
| language | no | Input language code such as en or fr. |
| prompt | no | Provider transcription prompt/context. |
| response_format | no | Provider response format such as json, text, srt, or vtt. |
| temperature | no | Provider sampling temperature where supported. |
| format | no | Audio format override for providers that need it, notably OpenRouter base64 audio input. |
| base_url | no | Endpoint override for local/gateway routing. Prefer this with openai-compatible/...; if set with openai/..., the request is sent to that URL instead of api.openai.com. Loopback is allowed by default; non-loopback requires allowlist. |

POST /v1/audio/speech JSON parameters:

| Field | Required | Notes |
| --- | --- | --- |
| input or text | yes | Text to synthesize. text is the AbstractCore-compatible alias. |
| model | no | Provider/model id for remote TTS (openai/gpt-4o-mini-tts, openai/tts-1, openrouter/..., portkey/..., openai-compatible/...). Omit for local plugin fallback; abstractvoice/default is accepted. |
| voice | no | Provider/backend voice name; remote OpenAI-compatible routing defaults to alloy. OpenAI TTS voices include alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse, marin, and cedar; the Swagger example uses coral. |
| response_format or format | no | Audio output format. Remote providers commonly support mp3, wav, opus, aac, flac, or pcm; local plugin fallback defaults to wav. |
| speed | no | Speech speed multiplier when supported. |
| instructions | no | Provider-specific style/instruction text for expressive TTS. |
| provider | no | Optional routing hint when omitting model, using a provider-scoped route, or pairing the request with base_url. |
| profile | no | AbstractVoice profile hint for compatible local/plugin backends. |
| quality_preset | no | AbstractVoice/local-backend quality preset when supported. |
| quality | no | OpenAI-compatible quality selector or backend-specific quality hint. |
| base_url | no | Endpoint override for local/gateway routing. Prefer this with openai-compatible/...; if set with openai/..., the request is sent to that URL instead of api.openai.com. Loopback is allowed by default, non-loopback requires allowlist. |

Swagger UI can execute /v1/audio/speech. AbstractCore serves a small custom Swagger wrapper that converts authenticated binary audio POST responses into browser blob: URLs before Swagger renders the player. The example uses response_format="wav" because WAV has explicit duration metadata and is the most reliable inline preview format. If a browser still cannot play the inline preview, use the response download or a curl --output command; the endpoint returns normal audio/* bytes and includes a filename in Content-Disposition.

POST /v1/voice/clone and POST /{provider}/v1/voice/clone multipart parameters:

| Field | Required | Notes |
| --- | --- | --- |
| file | yes | Reference voice audio file. |
| model | no | Provider/model id for remote clone routing. Use openai-compatible/default for an AbstractVoice-compatible server, or openai/default where OpenAI custom voice creation is available. Omit for local AbstractVoice clone fallback. |
| provider | no | Optional routing hint when omitting model, using a provider-scoped route, or pairing the request with base_url. |
| tts_model | no | Optional TTS model to associate with the clone for compatible local/plugin backends. |
| cloning_engine | no | Optional clone backend/engine selector for compatible local/plugin backends. |
| name | no | Friendly cloned voice name. |
| reference_text | no | Transcript of the reference audio when available. |
| validate | no | Ask compatible clone servers to validate/smoke-test the clone before returning. |
| base_url | no | OpenAI-compatible endpoint override for openai-compatible/...; loopback is allowed by default, non-loopback requires allowlist. |
| clone_path | no | Provider-specific clone path. Defaults to /voice/clone for OpenAI-compatible servers and /audio/voices for OpenAI. |
| file_field | no | Provider-specific multipart file field. Defaults to file; OpenAI uses audio_sample. |
| consent | no | Provider-specific consent id when custom voice creation requires it. |

The returned voice_id / id can be used as the voice value in /v1/audio/speech when the selected backend supports custom voices.
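Chaining a clone response into a speech request might look like the sketch below. The helper name `speech_request_for_clone` is hypothetical, and preferring `voice_id` over `id` when both are present is an assumption; only the two field names and the `voice` target come from the description above.

```python
def speech_request_for_clone(clone_response: dict, text: str, response_format: str = "wav") -> dict:
    """Build a /v1/audio/speech JSON body that uses a freshly cloned voice.

    Assumption: voice_id is preferred when the response carries both keys.
    """
    voice = clone_response.get("voice_id") or clone_response.get("id")
    if not voice:
        raise ValueError("clone response has neither voice_id nor id")
    return {"input": text, "voice": voice, "response_format": response_format}
```

Add a `model` field to the returned body when the clone lives on a remote provider rather than the local plugin fallback.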

POST /v1/audio/music and POST /{provider}/v1/audio/music JSON parameters:

| Field | Required | Notes |
| --- | --- | --- |
| prompt or input or text | yes | Music generation prompt. |
| provider | no | Music backend selector, for example acemusic, acestep, stable-audio, stable-audio-3, or diffusers. The provider-scoped path can also select a backend, e.g. /acemusic/v1/audio/music or /diffusers/v1/audio/music. |
| model | no | Music model id for the selected backend, for example acemusic/ace-step-api for remote ACE Music or a Hugging Face repo id for local AbstractMusic backends. |
| lyrics | no | Optional lyrics for vocal music backends. |
| duration_s | no | Requested output duration in seconds. |
| seed | no | Deterministic seed when supported. |
| num_inference_steps | no | Diffusion/sampling step count when supported. |
| guidance_scale | no | Guidance scale when supported. |
| instrumental | no | Request instrumental output when supported. |
| enhance_prompt / structure_prompt / auto_lyrics | no | Prompt/lyrics planning controls for compatible music backends. |
| text_planner_mode | no | Host/plugin text-planning mode such as auto, on, or off. |
| response_format or format | no | Server contract supports wav, mp3, and flac; backend support can be narrower. |
| extra top-level fields | no | Best-effort passthrough to the installed music capability plugin. |

With abstractmusic>=0.1.12, the base install includes the remote ACE Music backend. Configure ACEMUSIC_API_KEY in the server environment, optionally set ACEMUSIC_BASE_URL, and use provider="acemusic" or the /acemusic/v1/audio/music path. Local ACE-Step/Diffusers routes remain opt-in AbstractMusic extras.

Examples:

BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

# Local/plugin TTS through AbstractCore's unified output dispatcher.
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{"input":"Hello from the updated AbstractCore server.","voice":"coral","response_format":"wav"}' \
  --output /tmp/acore-speech.wav

# Local/plugin STT through AbstractCore's unified output dispatcher.
curl -sS -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@/tmp/acore-speech.wav;type=audio/wav" \
  -F "language=en"

# Remote speech-to-text (STT)
curl -sS -X POST "$BASE/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@speech.wav" \
  -F "model=openai/gpt-4o-mini-transcribe" \
  -F "language=en"

# Remote text-to-speech (TTS)
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o-mini-tts","input":"Hello!","voice":"coral","response_format":"wav"}' \
  --output hello.wav

# Local abstractvoice TTS through the OpenAI-compatible endpoint
curl -sS -X POST "$BASE/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"abstractvoice/default","input":"Hello!","voice":"alloy","format":"wav"}' \
  --output hello.wav

# Remote ACE Music through AbstractMusic.
# Start the server with ACEMUSIC_API_KEY set in its environment.
curl -sS -X POST "$BASE/v1/audio/music" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A short calm piano loop.","provider":"acemusic","duration_s":8,"format":"mp3"}' \
  --output music.mp3

# Remote/local OpenAI-compatible voice clone endpoint
curl -sS -X POST "$BASE/v1/voice/clone" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@reference.wav" \
  -F "model=openai-compatible/default" \
  -F "base_url=http://127.0.0.1:5000/v1" \
  -F "name=my_voice" \
  -F "reference_text=Hello from my reference recording." \
  -F "validate=true"

If you want to “ask a model about an audio file”, prefer one of:

  • Run STT first (/v1/audio/transcriptions) then send the transcript to POST /v1/chat/completions, or
  • Configure the server’s default audio strategy (config.audio.strategy) to enable STT fallback for audio attachments, then attach audio in chat requests.

Multimodal Requests (Images, Documents, Files)

The AbstractCore server accepts file attachments in the OpenAI-compatible multimodal message format, and also supports AbstractCore's @filename shorthand.

Security note (HTTP server): local file paths are disabled by default (including @/path/to/file and {"url": "/path/to/file"}). Use http(s) URLs or data: base64, or enable local paths via ABSTRACTCORE_SERVER_MEDIA_ROOT (safe) / ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 (unsafe).

Image analysis example using a local generated image:

BASE=http://127.0.0.1:8000
TOKEN=replace-with-server-token

python - <<'PY'
import base64
from pathlib import Path

Path("/tmp/acore-image.b64").write_text(base64.b64encode(Path("/tmp/acore-image.png").read_bytes()).decode("ascii"))
PY

jq -n --rawfile img /tmp/acore-image.b64 '{
  model: "openai/gpt-4o-mini",
  messages: [{
    role: "user",
    content: [
      {type: "text", text: "Describe this image in one concise sentence."},
      {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}}
    ]
  }],
  max_tokens: 80,
  temperature: 0
}' > /tmp/acore-vision-chat.json

curl -sS -X POST "$BASE/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @/tmp/acore-vision-chat.json \
  | jq -r '.choices[0].message.content'

Supported File Types

  • Images: PNG, JPEG, GIF, WEBP, BMP, TIFF
  • Documents: PDF, DOCX, XLSX, PPTX
  • Data/Text: CSV, TSV, TXT, MD, JSON, XML
  • Size Limits: 10MB per file, 32MB total per request
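A client can check attachments against these limits before uploading. The sketch below is a hypothetical pre-flight helper, not server code; it assumes binary megabytes (1 MB = 1024 * 1024 bytes), which the limits above do not specify, and the server remains the authority on what it accepts.

```python
MB = 1024 * 1024  # assumption: binary megabytes
PER_FILE_LIMIT = 10 * MB
PER_REQUEST_LIMIT = 32 * MB


def check_attachment_sizes(sizes_in_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for index, size in enumerate(sizes_in_bytes):
        if size > PER_FILE_LIMIT:
            problems.append(f"file {index} is {size} bytes, over the per-file limit")
    total = sum(sizes_in_bytes)
    if total > PER_REQUEST_LIMIT:
        problems.append(f"total is {total} bytes, over the per-request limit")
    return problems
```

For local files, the sizes can be gathered with `os.path.getsize` before building the request.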

Method 1: @filename Syntax (AbstractCore Extension)

Simple syntax that works with all providers (requires local paths enabled via ABSTRACTCORE_SERVER_MEDIA_ROOT or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "What is in this document? @/path/to/report.pdf"}
    ]
  }'

Method 2: OpenAI Vision API Format (Image URLs)

Standard OpenAI format for images:

{
  "model": "anthropic/claude-haiku-4-5",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Base64 Images:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."
  }
}

Method 3: OpenAI File Format (Forward-Compatible)

AbstractCore supports OpenAI's planned file format with a simplified structure that follows the same pattern as image_url:

File URL Format (Recommended - Same Pattern as image_url):

{
  "model": "ollama/qwen3:4b",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Analyze this document"},
        {
          "type": "file",
          "file_url": {
            "url": "https://example.com/documents/report.pdf"
          }
        }
      ]
    }
  ]
}

Local File Path:

{
  "type": "file",
  "file_url": {
    "url": "/Users/username/documents/data.csv"
  }
}

Note: local file paths require ABSTRACTCORE_SERVER_MEDIA_ROOT (safe) or ABSTRACTCORE_SERVER_ALLOW_LOCAL_FILES=1 (unsafe) on the server.

Base64 Data URL:

{
  "type": "file",
  "file_url": {
    "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago8PAovVHlwZS..."
  }
}

Filename Extraction:

  • URLs/Paths: Extracted automatically (/path/file.pdf → file.pdf)
  • Base64: Generated from MIME type (data:application/pdf;base64,... → document.pdf)
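The two extraction rules can be approximated as follows. This is a sketch under assumptions rather than the server's code: `filename_for` is a hypothetical helper, the MIME-to-extension table is a small illustrative subset, and using the `document` stem for every data URL is extrapolated from the single PDF example above.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative subset only; the server's real mapping is not documented here.
_EXTENSIONS = {
    "application/pdf": ".pdf",
    "text/csv": ".csv",
    "image/png": ".png",
    "image/jpeg": ".jpg",
}


def filename_for(url: str) -> str:
    """Derive a display filename from a URL, a local path, or a data URL."""
    if url.startswith("data:"):
        # data:<mime>;base64,<payload> -> keep only the MIME type.
        mime = url[len("data:"):].split(";", 1)[0].split(",", 1)[0]
        return "document" + _EXTENSIONS.get(mime, "")
    # URLs and plain paths: take the last path segment, ignoring any query.
    return posixpath.basename(urlsplit(url).path)
```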

Mixed Content Example

Combine text, images, and documents in a single request:

{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Compare this chart with the data in the spreadsheet"},
        {
          "type": "image_url",
          "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANS..."}
        },
        {
          "type": "file",
          "file_url": {
            "url": "https://example.com/data/sales_data.xlsx"
          }
        }
      ]
    }
  ]
}

Python Client Examples

Using OpenAI Client:

import os
from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:8000/v1", api_key=os.environ["ABSTRACTCORE_AUTH_TOKEN"])

# Method 1: @filename syntax
response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Summarize @document.pdf"}]
)

# Method 2: File URL (HTTP/HTTPS)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the key findings?"},
            {
                "type": "file",
                "file_url": {
                    "url": "https://example.com/documents/report.pdf"
                }
            }
        ]
    }]
)

# Method 3: Local file path
response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this local document"},
            {
                "type": "file",
                "file_url": {
                    "url": "/Users/username/documents/report.pdf"
                }
            }
        ]
    }]
)

# Method 4: Base64 data URL
with open("report.pdf", "rb") as f:
    file_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="lmstudio/qwen/qwen3-next-80b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the key findings?"},
            {
                "type": "file",
                "file_url": {
                    "url": f"data:application/pdf;base64,{file_data}"
                }
            }
        ]
    }]
)

Universal Provider Support:

# Same syntax works across all providers
providers_models = [
    "openai/gpt-4o",
    "anthropic/claude-haiku-4-5",
    "ollama/qwen2.5vl:7b",
    "lmstudio/qwen/qwen2.5-vl-7b"
]

for model in providers_models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Analyze @data.csv and @chart.png"}]
    )
    print(f"{model}: {response.choices[0].message.content[:100]}...")

OpenAI Responses API

Endpoint: POST /v1/responses

AbstractCore implements an OpenAI-compatible Responses-style API, including input_file support.

Why Use /v1/responses?

  • OpenAI Compatible: Accepts OpenAI Responses API requests and returns an OpenAI Responses payload ("object": "response")
  • Native File Support: input_file type designed specifically for document attachments
  • Cleaner API: Explicit separation between text (input_text) and files (input_file)
  • Backward Compatible: Existing messages format still works alongside new input format
  • Optional Streaming: "stream": true streams OpenAI Responses events (OpenAI format) or chat-completions chunks (legacy format)

Request Format

OpenAI Responses API Format (Recommended):

{
  "model": "gpt-4o",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Analyze this document"},
        {"type": "input_file", "file_url": "https://example.com/report.pdf"}
      ]
    }
  ],
  "tools": [
    {"type": "web_search", "external_web_access": true}
  ],
  "tool_choice": "auto",
  "stream": false,
  "max_output_tokens": 2000,
  "temperature": 0.7
}

Key parameters:

| Field | Required | Notes |
| --- | --- | --- |
| model | yes | Provider/model id. Bare model ids may be auto-detected, but provider/model is preferred. |
| input | yes, unless messages is used | OpenAI Responses input. Supports a string, or an array of input items such as {"type":"message","role":"user","content":"..."} and {"type":"function_call_output","call_id":"...","output":"..."}. Message content can be a string or an array of input_text / input_file / input_image items. |
| messages | yes, unless input is used | Backward-compatible chat-completions request shape. |
| instructions | no | System-level instructions prepended ahead of input (best-effort). |
| stream | no | When true, returns server-sent events. |
| tools | no | Responses-style tools. AbstractCore does not execute tools server-side; tools are only transported to the model prompt. web_search* tools are normalized into function tools for local-model prompting and host-side execution. Unsupported built-in tool types return a 400 error. |
| tool_choice | no | Tool selection control; normalized where needed (best-effort). |
| max_output_tokens / max_tokens, temperature, top_p, stop, seed, frequency_penalty, presence_penalty | no | Standard generation controls, forwarded where supported. |
| base_url, agent_format, thinking, prompt_cache_key, prompt_cache_retention, timeout_s, unload_after | no | AbstractCore text-inference extensions with the same behavior as /v1/chat/completions for shared fields. |

Legacy Format (Still Supported):

{
  "model": "openai/gpt-4",
  "messages": [
    {"role": "user", "content": "Tell me a story"}
  ],
  "stream": false
}

Automatic Format Detection

The server automatically detects which format you're using:

  • OpenAI Format: Presence of input field → converts to internal format
  • Legacy Format: Presence of messages field → processes directly
  • Error: Missing both fields → returns 400 error with clear message
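The detection rules above amount to a short decision function. The sketch below is illustrative, not the server's implementation: `detect_request_format` is a hypothetical name, checking `input` before `messages` follows the order of the list, and treating a null value as absent is an assumption.

```python
def detect_request_format(body: dict) -> str:
    """Classify a /v1/responses request body as "openai" or "legacy".

    Assumption: a field set to null counts as missing.
    """
    if body.get("input") is not None:
        return "openai"
    if body.get("messages") is not None:
        return "legacy"
    # The server answers this case with a 400 error.
    raise ValueError("request must include either 'input' or 'messages'")
```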

Examples

Simple Text Request:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio/qwen/qwen3-next-80b",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "What is Python?"}
        ]
      }
    ]
  }'

File Analysis:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Analyze the letter and summarize key points"},
          {"type": "input_file", "file_url": "https://www.berkshirehathaway.com/letters/2024ltr.pdf"}
        ]
      }
    ],
    "thinking": "off",
    "prompt_cache_key": "tenantA:doc-review"
  }'

Multiple Files:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-haiku-4-5",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Compare these documents"},
          {"type": "input_file", "file_url": "https://example.com/report1.pdf"},
          {"type": "input_file", "file_url": "https://example.com/report2.pdf"},
          {"type": "input_file", "file_url": "https://example.com/chart.png"}
        ]
      }
    ],
    "max_tokens": 2000
  }'

Streaming Response:

curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "Summarize this document"},
          {"type": "input_file", "file_url": "https://example.com/document.pdf"}
        ]
      }
    ],
    "stream": true
  }' --no-buffer
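On the client side, a streamed response can be reassembled from its server-sent events. The sketch below assumes the stream carries standard OpenAI Responses events, where text arrives in `response.output_text.delta` events with a `delta` string, and that a `[DONE]` sentinel may end the stream; both are assumptions about what this server emits. The helper names `iter_sse_events` and `collect_text` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each "data:" line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and "event:" fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        yield json.loads(payload)


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the text deltas from Responses-style events."""
    return "".join(
        event.get("delta", "")
        for event in events
        if event.get("type") == "response.output_text.delta"
    )
```

With the `requests` library, the `lines` argument could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`.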

Supported Media Types

All file types are supported via URL, local path, or base64:

  • Documents: PDF, DOCX, XLSX, PPTX
  • Data Files: CSV, TSV, JSON, XML
  • Text Files: TXT, MD
  • Images: PNG, JPEG, GIF, WEBP, BMP, TIFF
  • Size Limits: 10MB per file, 32MB total per request

Source Options:

// HTTP/HTTPS URL
{"type": "input_file", "file_url": "https://example.com/report.pdf"}

// Local file path
{"type": "input_file", "file_url": "/path/to/document.xlsx"}

// Base64 data URL
{"type": "input_file", "file_url": "data:application/pdf;base64,JVBERi0x..."}

Python Client Example

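import os

import requests

# Direct request to the /v1/responses endpoint.
response = requests.post(
    "http://localhost:8000/v1/responses",
    # Send the server token when ABSTRACTCORE_AUTH_TOKEN is enabled.
    headers={"Authorization": f"Bearer {os.environ['ABSTRACTCORE_AUTH_TOKEN']}"},
    json={
        "model": "gpt-4o",
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Analyze this document"},
                    {"type": "input_file", "file_url": "https://example.com/report.pdf"}
                ]
            }
        ]
    },
    timeout=120,
)
response.raise_for_status()
result = response.json()

# Requests that use "input" return an OpenAI Responses object, not a
# chat-completions payload. Assuming the standard Responses shape, the text
# sits in output[].content[] items of type "output_text".
for item in result.get("output", []):
    if item.get("type") != "message":
        continue
    for part in item.get("content", []):
        if part.get("type") == "output_text":
            print(part["text"])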

Embeddings

Endpoint: POST /v1/embeddings

Generate embedding vectors for semantic search, RAG, and similarity analysis.

Request:

{
  "input": "Text to embed",
  "model": "huggingface/sentence-transformers/all-MiniLM-L6-v2"
}

Supported Providers:

  • HuggingFace: Local models with ONNX acceleration
  • Ollama: ollama/granite-embedding:278m, etc.
  • LMStudio: Any loaded embedding model
  • OpenAI: openai/text-embedding-3-small, openai/text-embedding-3-large
  • OpenRouter: openrouter/openai/text-embedding-3-small, etc.
  • Portkey: portkey/... with your Portkey routing configuration
  • OpenAI-compatible: openai-compatible/... against configured/local /v1/embeddings endpoints

Anthropic does not expose a native embeddings API. Use OpenAI, OpenRouter, Portkey, an OpenAI-compatible endpoint, or a local embedding provider.

For endpoint-backed providers such as LM Studio, vLLM, and generic OpenAI-compatible servers, the embedding route does not require the embedding model to appear in a chat model catalogue before the request is sent. This supports embedding-only endpoints whose /models response is incomplete or chat-only.

OpenAI-compatible request fields are forwarded where supported:

  • dimensions
  • encoding_format
  • user
  • base_url (AbstractCore extension; loopback by default, allowlist required for non-loopback)
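Because non-loopback base_url values need an allowlist entry, a client can check up front whether a URL is loopback. This is a minimal client-side sketch; the function name is_loopback_url is hypothetical, and the server's own check may be stricter or differ in detail.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True when the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name other than localhost; treat it as non-loopback.
        return False


print(is_loopback_url("http://127.0.0.1:8001/v1"))  # True
print(is_loopback_url("https://example.com/v1"))    # False
```

A False result suggests the request will only succeed if the server operator has added the host to ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.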

Parameters:

| Field | Required | Notes |
| --- | --- | --- |
| input | yes | String or array of strings. Arrays return one vector per input item. |
| model | yes | Provider/model id such as openai/text-embedding-3-small, openrouter/openai/text-embedding-3-small, portkey/..., openai-compatible/..., ollama/..., lmstudio/..., or huggingface/.... |
| encoding_format | no | float by default; base64 is accepted where supported by the provider/backend. |
| dimensions | no | Requested output dimensions for providers that support native dimension reduction; local backends may truncate when appropriate. |
| user | no | End-user identifier forwarded to providers that support abuse monitoring. |
| base_url | no | OpenAI-compatible endpoint override with the same allowlist policy as chat. |
| api_key | no | Deprecated/disabled in the body. Use X-AbstractCore-Provider-API-Key for provider overrides. |

Batch Embedding:

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["text 1", "text 2", "text 3"],
    "model": "ollama/granite-embedding:278m"
  }'
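Once a batch comes back, the vectors are usually compared with cosine similarity. The sketch below assumes the reply follows the OpenAI embeddings shape (a data array whose rows carry index and embedding), which this section implies but does not show. The sample reply is hand-written, not real model output.

```python
import math


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embeddings ordered by their 'index' field."""
    rows = sorted(response["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-written stand-in for a /v1/embeddings reply (assumed OpenAI shape).
sample = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
        {"index": 2, "embedding": [2.0, 0.0]},
    ]
}
vectors = extract_vectors(sample)
print(cosine_similarity(vectors[0], vectors[2]))  # 1.0
print(cosine_similarity(vectors[0], vectors[1]))  # 0.0
```

Sorting by index keeps each vector aligned with its position in the input array even if rows arrive out of order.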

Model Discovery

Endpoint: GET /v1/models

List all available models from configured providers.

Query Parameters:

  • provider: Filter by provider (e.g., ollama, openai, anthropic, lmstudio, openai-compatible).
  • input_type: Filter by input capability: text, image, audio, or video.
  • output_type: Filter by output capability: text or embeddings.
  • base_url: Optional upstream base URL override for providers that support OpenAI-compatible discovery. Loopback is allowed by default; non-loopback requires ABSTRACTCORE_SERVER_BASE_URL_ALLOWLIST.
  • api_key: Optional upstream provider API key override for discovery. Requires provider=... so the override target is unambiguous. Prefer X-AbstractCore-Provider-API-Key.

Examples:

# All models
curl http://localhost:8000/v1/models

# Ollama models only (quote the URL so the shell does not interpret ? or &)
curl "http://localhost:8000/v1/models?provider=ollama"

# Embedding models only
curl "http://localhost:8000/v1/models?output_type=embeddings"

# Vision-capable input models
curl "http://localhost:8000/v1/models?input_type=image"

# Ollama embeddings
curl "http://localhost:8000/v1/models?provider=ollama&output_type=embeddings"
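When the filters come from code rather than a shell, building the query string with the standard library avoids quoting mistakes and percent-encodes values such as base_url. A minimal sketch; the helper name models_url is hypothetical.

```python
from urllib.parse import urlencode


def models_url(server: str = "http://localhost:8000", **filters) -> str:
    """Build a /v1/models URL, skipping filters whose value is None."""
    query = urlencode({name: value for name, value in filters.items() if value is not None})
    return f"{server}/v1/models" + (f"?{query}" if query else "")


print(models_url(provider="ollama", output_type="embeddings"))
# http://localhost:8000/v1/models?provider=ollama&output_type=embeddings
```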

Provider Status

Endpoint: GET /providers

List all available providers and their status.

Query Parameters:

  • include_models (optional, default false): Include model lists for each provider. This is slower because it may query provider registries/endpoints.

Response:

{
  "providers": [
    {
      "name": "ollama",
      "type": "llm",
      "model_count": 15,
      "status": "available"
    }
  ]
}
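A client can use this response to decide which providers to route to. The sketch below works on the sample shape shown above. The second entry and its "error" status are hypothetical, added only to show filtering; the set of status values the server emits is not listed here.

```python
def available_providers(payload: dict) -> list[str]:
    """Names of providers whose status is exactly "available"."""
    return [
        provider["name"]
        for provider in payload.get("providers", [])
        if provider.get("status") == "available"
    ]


sample = {
    "providers": [
        {"name": "ollama", "type": "llm", "model_count": 15, "status": "available"},
        # Hypothetical entry: the status value "error" is an assumption.
        {"name": "lmstudio", "type": "llm", "model_count": 0, "status": "error"},
    ]
}
print(available_providers(sample))  # ['ollama']
```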

Health Check

Endpoint: GET /health

Server health check for monitoring.

Response: includes status, server version, and enabled feature flags.
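For scripts that start the server and then send requests, a small wait loop around /health can help. This sketch treats HTTP 200 as healthy and does not inspect the response body, because the exact field names are not listed here. Both function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10, delay_s: float = 1.0) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or timed out while the server is starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


def http_probe(url: str = "http://localhost:8000/health") -> bool:
    """True when GET /health answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status == 200
```

Call wait_until_healthy(http_probe) before the first API request. Passing the probe as a callable keeps the retry logic testable without a running server.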


Runtime Control Plane

If you want the gateway itself to keep a local model warm, use:

  • POST /acore/models/load
  • GET /acore/models/loaded
  • POST /acore/models/unload

/acore/models/load creates or reuses a task-specific runtime. When task is omitted, the route keeps the existing text behavior: the runtime is keyed by provider, model, optional base_url, and the explicit provider-key override when one is supplied. Later /v1/chat/completions calls that target the same provider/model automatically reuse that warm runtime instead of creating a fresh provider instance per request.

For text-generation runtimes, Core reports provider-owned loaded-model truth separately from gateway client cache state. A configured default model, model catalog row, reachable server, or cached Core client is not proof that the provider has a model loaded. Providers that can verify residency expose it through get_model_residency(...); providers that cannot verify it report provider_residency_verified=false, provider_resident=null, and loaded=false. When a provider exposes a native load/warm hook, /acore/models/load calls it and then verifies the result through the same residency contract.

For non-text tasks, the same route delegates to capability-owned load/list/unload: image_generation, video_generation, text_to_video, and image_to_video reuse the server's AbstractVision backend cache, while tts and stt delegate through the shared AbstractVoice capability core when the selected plugin exposes residency hooks. Remote OpenAI-compatible image/video/audio providers are reported as configured rather than locally loaded unless the upstream exposes a real loaded-state signal.

loaded_new is an event signal for the load call, not a synonym for loaded. For capability-backed tasks it is true only when the backend reports or clearly implies that this request transitioned the model from not loaded to loaded. Already-loaded models should return loaded_new=false.
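The distinction between verified residency, loaded, and loaded_new is easy to blur in client code. The sketch below reads the four field names used above from a flat dictionary. That flat layout is an assumption, since the real response may nest them, and the helper name residency_summary is hypothetical.

```python
def residency_summary(entry: dict) -> str:
    """Describe a runtime entry using the residency fields named above."""
    if not entry.get("provider_residency_verified"):
        # provider_resident is null and loaded is false in this case.
        return "unverified: the provider cannot confirm what is loaded"
    if entry.get("provider_resident") and entry.get("loaded"):
        if entry.get("loaded_new"):
            return "loaded by this call"
        return "already loaded"
    return "verified: not loaded"


print(residency_summary({"provider_residency_verified": False, "provider_resident": None, "loaded": False}))
# unverified: the provider cannot confirm what is loaded
print(residency_summary({"provider_residency_verified": True, "provider_resident": True, "loaded": True, "loaded_new": False}))
# already loaded
```

Treating an unverified entry as "unknown" rather than "not loaded" follows the rule above that a cached client or reachable server is not proof of residency.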

Prompt Cache Control Plane

Prompt-cache routes support two modes:

  • direct gateway mode:
    • target a previously loaded runtime with provider + model
  • proxy mode:
    • target an upstream AbstractEndpoint with base_url

In proxy mode, the gateway normalizes base_url, enforces the same base URL allowlist rules as other request-level routing, and forwards provider auth only from X-AbstractCore-Provider-API-Key or from Authorization when server auth is disabled.

Common fields:

| Field | Location | Required | Notes |
| --- | --- | --- | --- |
| runtime_id | query or JSON body | no | Stable selector returned by /acore/models/load. Use this when multiple warm runtimes share the same provider + model. |
| provider + model | query or JSON body | yes, unless base_url is provided | Select a loaded gateway-local runtime. |
| base_url | query or JSON body | yes, unless runtime_id or provider + model is provided | Upstream AbstractEndpoint URL. It may include /v1; the proxy strips that suffix for control-plane calls. In local mode it can also disambiguate a warm runtime that was loaded with a base URL. |
| X-AbstractCore-Provider-API-Key | header | no | Upstream endpoint token when required. |
| api_key | query/body | no | Deprecated/disabled; do not use. |
| ttl_s | JSON body | no | Optional upstream cache TTL in seconds, where supported. |
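The selector rules above (runtime_id, or provider + model, or base_url) can be checked client-side before a request is sent. The sketch below also shows one way to strip a trailing /v1, as the proxy is described to do. Both helper names are hypothetical, and the server's own normalization may handle more cases.

```python
from typing import Optional


def control_plane_base(base_url: str) -> str:
    """Drop trailing slashes and a trailing /v1 from an endpoint URL."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        trimmed = trimmed[: -len("/v1")]
    return trimmed


def prompt_cache_selector(
    runtime_id: Optional[str] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    base_url: Optional[str] = None,
) -> dict:
    """Return the selector fields to send, or raise if none were given."""
    if bool(provider) != bool(model):
        raise ValueError("provider and model must be supplied together")
    candidates = {"runtime_id": runtime_id, "provider": provider, "model": model, "base_url": base_url}
    selector = {name: value for name, value in candidates.items() if value}
    if not selector:
        raise ValueError("supply runtime_id, provider + model, or base_url")
    return selector


print(control_plane_base("http://127.0.0.1:8001/v1"))  # http://127.0.0.1:8001
print(prompt_cache_selector(base_url="http://127.0.0.1:8001/v1"))
# {'base_url': 'http://127.0.0.1:8001/v1'}
```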

Operations:

| Endpoint | Method | Parameters | Result |
| --- | --- | --- | --- |
| /acore/prompt_cache/capabilities | GET | provider + model or base_url | Cache features on the selected local or upstream runtime. |
| /acore/prompt_cache/stats | GET | provider + model or base_url | Cache stats on the selected local or upstream runtime. |
| /acore/prompt_cache/set | POST | provider + model or base_url, key, make_default, ttl_s | Select/create a cache key locally or upstream. |
| /acore/prompt_cache/update | POST | provider + model or base_url, key, prompt or messages, system_prompt, tools, optional thinking, add_generation_prompt, ttl_s | Prepare prompt/messages/tools into a local or upstream cache key. |
| /acore/prompt_cache/fork | POST | provider + model or base_url, from_key, to_key, make_default, ttl_s | Fork an existing local or upstream key. |
| /acore/prompt_cache/clear | POST | provider + model or base_url, optional key | Clear a local or upstream key, or default/all cache state depending on backend support. |
| /acore/prompt_cache/prepare_modules | POST | provider + model or base_url, namespace, modules, make_default, ttl_s, version | Prepare reusable module/tool context locally or upstream. |

Example:

curl -X POST http://localhost:8000/acore/prompt_cache/update \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "http://127.0.0.1:8001/v1",
    "key": "project-default",
    "messages": [{"role": "system", "content": "You are concise."}],
    "thinking": "off",
    "ttl_s": 3600
  }'

thinking on /acore/prompt_cache/update is applied before the provider appends the cached fragment. This keeps cache-prefilled prompt state aligned with later /v1/chat/completions or /v1/responses calls when reasoning control changes prompt serialization.

Memory Blocs Control Plane

Memory-bloc routes also support two modes:

  • direct gateway mode:
    • POST /acore/models/load
    • local POST /acore/blocs/upsert_text
    • local POST /acore/blocs/kv/ensure
    • local POST /acore/blocs/kv/load
    • then normal /v1/chat/completions
  • proxy mode:
    • the same /acore/blocs/* routes with base_url targeting an upstream AbstractEndpoint

That distinction matters:

  • gateway-local bloc records live in the gateway bloc store
  • gateway-local loaded cache keys live on the selected loaded runtime
  • proxy-mode loaded cache keys live on the upstream AbstractEndpoint

Operations:

| Endpoint | Method | Parameters | Result |
| --- | --- | --- | --- |
| /acore/blocs/upsert_text | POST | optional base_url, path, content, optional bloc metadata | Persist extracted text into the local bloc store or upstream bloc store. |
| /acore/blocs/record | GET | optional base_url, sha256 or bloc_id | Inspect a local or upstream bloc record. |
| /acore/blocs/kv/manifest | GET | runtime_id or provider + model or base_url, sha256 or bloc_id, optional artifact_path | Inspect the local or upstream KV manifest for the selected model. |
| /acore/blocs/kv/ensure | POST | runtime_id or provider + model or base_url, sha256 or bloc_id, optional artifact_path, force_rebuild, debug | Compile or validate the durable provider/model bloc KV artifact locally or upstream. |
| /acore/blocs/kv/load | POST | runtime_id or provider + model or base_url, sha256 or bloc_id, optional artifact_path, stable_cache_key, key, make_default, force_rebuild, debug | Load or fork the local or upstream artifact into a prompt-cache key and return prompt_cache_binding. |

Typical direct gateway flow:

  1. POST /acore/models/load
  2. POST /acore/blocs/upsert_text
  3. POST /acore/blocs/kv/ensure
  4. POST /acore/blocs/kv/load
  5. call /v1/chat/completions with returned artifact.prompt_cache_binding when exact binding is required

Typical remote (proxy mode) flow, with base_url set on each /acore/blocs/* call:

  1. POST /acore/blocs/upsert_text
  2. POST /acore/blocs/kv/ensure
  3. POST /acore/blocs/kv/load
  4. call /v1/chat/completions with returned artifact.prompt_cache_binding when exact binding is required

Example:

curl -X POST http://localhost:8000/acore/blocs/kv/load \
  -H "Authorization: Bearer $ABSTRACTCORE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "http://127.0.0.1:8001/v1",
    "sha256": "abababababababababababababababababababababababababababababababab",
    "stable_cache_key": "stable:orbit",
    "key": "work:orbit",
    "make_default": false,
    "debug": true
  }'

The load response includes:

  • artifact.key: the worker-local runtime cache key
  • artifact.binding_id: the opaque exact-artifact identity
  • artifact.prompt_cache_binding: object to pass to chat as prompt_cache_binding
  • artifact.debug: verbose proof fields when debug=true
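The last step of both flows passes the returned binding to chat. A minimal sketch of that hand-off follows; the helper name is hypothetical, and the sample load response contains placeholder values only, since the binding is opaque and its real contents are not shown here.

```python
def chat_body_with_binding(load_response: dict, model: str, messages: list) -> dict:
    """Build a /v1/chat/completions body that carries the returned binding unchanged."""
    binding = load_response["artifact"]["prompt_cache_binding"]
    return {"model": model, "messages": messages, "prompt_cache_binding": binding}


# Placeholder load response: field names follow the list above, values are made up.
load_response = {
    "artifact": {
        "key": "work:orbit",
        "binding_id": "opaque-binding-id",
        "prompt_cache_binding": {"placeholder": "opaque"},
    }
}
body = chat_body_with_binding(
    load_response,
    model="<provider>/<model>",  # substitute the model the artifact was built for
    messages=[{"role": "user", "content": "Summarize the bloc."}],
)
print(sorted(body))  # ['messages', 'model', 'prompt_cache_binding']
```

Passing the object through untouched matters because the binding is described as an exact-artifact identity; rebuilding it by hand would defeat that guarantee.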

Supported local artifact backends share this route shape: MLX, HuggingFace transformers, and HuggingFace GGUF exact-renderer paths. Remote providers and unsupported GGUF chat formats remain best-effort prompt_cache_key paths.


Agentic CLI integration

AbstractCore Server is OpenAI-compatible. Most OpenAI-compatible CLIs/SDKs can be pointed at it by setting:

  • OPENAI_BASE_URL="http://localhost:8000/v1" (or an equivalent flag)
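  • OPENAI_API_KEY="unused" (many clients require a non-empty key even for local servers; when server auth is enabled, set it to your ABSTRACTCORE_AUTH_TOKEN value so the client sends it as the bearer token)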

Tool calling interoperability

  • The server never executes tools: it only returns the tool calls the model emits, and your host/runtime executes them.
  • It can emit tool calls either as structured tool_calls (OpenAI/Codex style) or as tagged content for clients that parse tool calls from assistant text.
  • Control the output format with agent_format (request body, AbstractCore extension), or rely on auto-detection (user-agent + model heuristics).

Supported agent_format values: auto, openai, codex, qwen3, llama3, gemma, xml, passthrough.
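Since execution is left to the host, the client needs a small dispatch step. The sketch below handles the structured tool_calls shape used by OpenAI-style responses (function.name plus a JSON string in function.arguments). Tagged-content formats such as llama3 or xml would need a text parser instead. The get_weather implementation and the sample message are illustrative only.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical local tool; a real one would call a weather service.
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute structured tool_calls and return role="tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"] or "{}")
        output = TOOLS[function["name"]](**arguments)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": str(output)})
    return results


# Hand-written assistant message in the structured tool_calls shape.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        }
    ],
}
print(run_tool_calls(message))
# [{'role': 'tool', 'tool_call_id': 'call_1', 'content': 'Sunny in Paris'}]
```

The returned messages are appended to the conversation, after the assistant message, and sent back in the next /v1/chat/completions request.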

Codex CLI (example)

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="unused"

codex --model "ollama/qwen3-coder:30b" "Write a factorial function"

Forcing a format (curl)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3:4b-instruct-2507-q4_K_M",
    "messages": [{"role": "user", "content": "Use the tool."}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather by city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ],
    "agent_format": "llama3"
  }'

Deployment

Docker

Release images are published to GitHub Container Registry after the matching PyPI release succeeds:

ghcr.io/lpalbou/abstractcore-server:<version>

The image is built from PyPI, not from the repository checkout, and installs:

abstractcore[server,remote,media,tokens,compression]==<version>

It includes remote chat/responses, remote embeddings, remote STT/TTS routing, remote OpenAI-compatible image proxying, server dependencies, media parsing, token counting, and compression helpers. It intentionally does not include AbstractCore local LLM runtimes (vllm, mlx, huggingface), local embedding dependencies (sentence-transformers), or optional capability plugin entry points. Remote image/audio OpenAI-compatible endpoint routes still work without those plugins. Build a custom image with abstractcore[server,remote,media,tokens,compression,voice,vision] when you want plugin-backed media catalogs or plugin default routes; these capability extras stay remote-light. Add explicit local aggregate profiles such as abstractcore[all-apple] or abstractcore[all-gpu] only when you want local native inference engines.

Run:

docker pull ghcr.io/lpalbou/abstractcore-server:2.13.12

For local development, keep secrets in an uncommitted .env file:

ABSTRACTCORE_AUTH_TOKEN=replace-with-a-server-token
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-...
ANTHROPIC_API_KEY=sk-ant-...
PORTKEY_API_KEY=pk_...
PORTKEY_CONFIG=pcfg_...
OPENAI_BASE_URL=http://host.docker.internal:1234/v1

Then run the image with that environment file:

docker run --rm --name abstractcore-server \
  -p 127.0.0.1:8000:8000 \
  --env-file .env \
  ghcr.io/lpalbou/abstractcore-server:2.13.12

ABSTRACTCORE_AUTH_TOKEN is the AbstractCore server auth token. Clients send it as Authorization: Bearer <token>. At /docs, use Swagger UI's normal Authorize button when server auth is enabled; AbstractCore validates that bearer token before Swagger marks it authorized. Provider keys such as OPENAI_API_KEY, OPENROUTER_API_KEY, ANTHROPIC_API_KEY, and PORTKEY_API_KEY stay inside the server container.

Set ABSTRACTCORE_SERVER_PROTECT_DOCS=1 if /docs, /docs-lite, /redoc, and /openapi.json should require the same server token.

For local OpenAI-compatible endpoints such as LM Studio or Ollama's /v1 server, point the container at a URL reachable from Docker:

docker run --rm --name abstractcore-server \
  -p 127.0.0.1:8000:8000 \
  -e ABSTRACTCORE_AUTH_TOKEN="$ABSTRACTCORE_AUTH_TOKEN" \
  -e OPENAI_BASE_URL="http://host.docker.internal:1234/v1" \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  ghcr.io/lpalbou/abstractcore-server:2.13.12

Docker Compose

version: '3.8'

services:
  abstractcore:
    image: ghcr.io/lpalbou/abstractcore-server:2.13.12
    ports:
      - "8000:8000"
    environment:
      - ABSTRACTCORE_AUTH_TOKEN=${ABSTRACTCORE_AUTH_TOKEN}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - PORTKEY_API_KEY=${PORTKEY_API_KEY}
      - PORTKEY_CONFIG=${PORTKEY_CONFIG}
      - OPENAI_BASE_URL=${OPENAI_BASE_URL}
    restart: unless-stopped

Production with Gunicorn

pip install gunicorn

gunicorn abstractcore.server.app:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000

Debug and Monitoring

Enable Debug Mode

Debug mode provides comprehensive logging and detailed error reporting for troubleshooting API issues.

# Method 1: Using command line flag (recommended)
python -m abstractcore.server.app --debug

# Method 2: Using environment variable
export ABSTRACTCORE_DEBUG=true
python -m abstractcore.server.app

# Method 3: With uvicorn directly
export ABSTRACTCORE_DEBUG=true
uvicorn abstractcore.server.app:app --host 0.0.0.0 --port 8000

Debug Features

Enhanced Error Reporting:

  • Before: Uninformative "422 Unprocessable Entity" messages
  • After: Detailed field validation errors with request body capture

Example Debug Output:

🔴 Request Validation Error (422) | method=POST | error_count=2 | errors=[
  {"field": "body -> model", "message": "Field required", "type": "missing"},
  {"field": "body -> messages", "message": "Field required", "type": "missing"}
] | client=127.0.0.1

📋 Request Body (Validation Error) | body={"invalid": "data"}

Request/Response Tracking:

  • Full HTTP request details (method, URL, headers, client IP)
  • Response status codes and processing times
  • Structured JSON logging for machine processing

Log Files:

  • logs/abstractcore_TIMESTAMP.log - Structured events
  • logs/YYYYMMDD-payloads.jsonl - Full request bodies
  • logs/verbatim_TIMESTAMP.jsonl - Complete I/O

Useful Commands:

# Find errors
grep '"level": "error"' logs/abstractcore_*.log

# Track token usage
cat logs/verbatim_*.jsonl | jq '.metadata.tokens | .input + .output' | \
  awk '{sum+=$1} END {print "Total:", sum}'

# Monitor specific model
grep '"model": "qwen3-coder:30b"' logs/verbatim_*.jsonl
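The token-usage pipeline can also be written in Python when jq is not installed. This sketch assumes the same record layout the jq filter relies on (metadata.tokens.input and metadata.tokens.output). The sample lines are hand-written, not real log output.

```python
import json
from typing import Iterable


def total_tokens(lines: Iterable[str]) -> int:
    """Sum metadata.tokens.input + metadata.tokens.output over JSONL records."""
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tokens = json.loads(line).get("metadata", {}).get("tokens", {})
        total += (tokens.get("input") or 0) + (tokens.get("output") or 0)
    return total


# Hand-written sample records in the assumed layout.
sample = [
    '{"metadata": {"tokens": {"input": 12, "output": 30}}}',
    '{"metadata": {"tokens": {"input": 5, "output": 8}}}',
]
print("Total:", total_tokens(sample))  # Total: 55
```

To run it over real logs, pass an open file object for each logs/verbatim_*.jsonl file and add up the results.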

Common Patterns

Multi-Provider Fallback

import requests

# Ordered by preference: local model first, hosted models as fallbacks
providers = [
    "ollama/qwen3-coder:30b",
    "openai/gpt-4o-mini",
    "anthropic/claude-haiku-4-5"
]

def generate_with_fallback(prompt):
    """Try each model in order and return the first successful response."""
    for model in providers:
        try:
            response = requests.post(
                "http://localhost:8000/v1/chat/completions",
                json={"model": model, "messages": [{"role": "user", "content": prompt}]},
                timeout=30
            )
            if response.status_code == 200:
                return response.json()
        except requests.RequestException:
            # Network error or timeout: move on to the next model
            continue
    raise RuntimeError("All providers failed")

Local Model Gateway

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b

# Use via AbstractCore server
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3-coder:30b",
    "messages": [{"role": "user", "content": "Write a Python function"}]
  }'

Troubleshooting

Server Won't Start

# Check port availability
lsof -i :8000

# Use different port
uvicorn abstractcore.server.app:app --port 3000

No Models Available

# Check providers
curl http://localhost:8000/providers

# Check API keys
echo $OPENAI_API_KEY

# Start Ollama
ollama serve
ollama list

Authentication Errors

# Set API keys
export ABSTRACTCORE_AUTH_TOKEN="acore-server-secret"
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Restart server after setting keys

Why AbstractCore Server?

  • Universal: One API for all providers
  • OpenAI Compatible: Drop-in replacement
  • Simple: Clean, focused endpoints
  • Fast: Lightweight, high-performance
  • Debuggable: Comprehensive logging
  • CLI Ready: Codex, Gemini CLI, Crush support
  • Production Ready: Docker, multi-worker, health checks


AbstractCore Server - One server, all models, any client.