Get your first AI response in under 5 minutes.
```bash
pip install aicortex-core
```

```python
from aicortex import chat

response = chat("What is quantum computing?")
print(response)
```

That's it. No API key. No config file. No server setup required.
AI Cortex automatically selects a model and routes your request to an available server from its bundled registry of community-hosted cloud endpoints.
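How that selection works internally is not described here. As a rough mental model only, the sketch below picks the first registry entry that hosts the requested model and passes a health check. The registry layout, `pick_server`, and the `is_up` callback are all hypothetical and are not part of the AI Cortex API.

```python
from typing import Callable, Optional

# Hypothetical registry entries; the real bundled registry may look different.
REGISTRY = [
    {"url": "http://a.example:11434", "models": ["llama3.2:3b"]},
    {"url": "http://b.example:11434", "models": ["mistral:7b", "llama3.2:3b"]},
]


def pick_server(
    registry: list[dict],
    model: str,
    is_up: Callable[[str], bool],
) -> Optional[str]:
    """Return the URL of the first server that hosts `model` and is reachable."""
    for entry in registry:
        if model in entry["models"] and is_up(entry["url"]):
            return entry["url"]
    return None  # no reachable server hosts this model


if __name__ == "__main__":
    # Pretend the first server is down so the choice falls through to the second.
    down = {"http://a.example:11434"}
    print(pick_server(REGISTRY, "llama3.2:3b", lambda url: url not in down))
```

Passing the health check in as a callback keeps the selection logic testable without any network access.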
AI Cortex comes pre-loaded with metadata for hundreds of models across five families. You can specify exactly which one you want:
```python
from aicortex import chat, families, models

# See what model families are available
print(families())
# → ['llama', 'mistral', 'gemma', 'deepseek', 'qwen']

# List all models in a family
print(models("mistral"))
# → ['mistral:7b', 'mistral:instruct', ...]

# Chat with a specific model
response = chat(
    "Explain transformer architecture in plain English.",
    model="mistral:7b",
    temperature=0.6,
    max_tokens=300,
)
print(response)
```

Streaming gives you token-by-token output as the model generates, which suits chatbots and interactive UIs:
```python
from aicortex import chat

stream = chat("Write a short poem about the ocean.", stream=True)

for event in stream:
    if event.type == "start":
        print("🟢 Generating...\n")
    elif event.type == "token":
        print(event.content, end="", flush=True)
    elif event.type == "end":
        print("\n\n✅ Done!")
```

💡 Tip: Use `stream.text()` to get the full concatenated response after iterating: `full_text = stream.text()`
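To try the event loop without network access, a stand-in stream can be handy. The sketch below assumes only what the loop above shows: events expose `type` (`"start"`, `"token"`, `"end"`) and `content`. `FakeEvent`, `fake_stream`, and `collect_text` are hypothetical helpers, not AI Cortex names, and the real event objects may carry more fields.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a stream event with `type` and `content`."""
    type: str
    content: str = ""


def fake_stream(words: list[str]) -> Iterator[FakeEvent]:
    """Yield start, token, and end events in the order the loop above expects."""
    yield FakeEvent("start")
    for word in words:
        yield FakeEvent("token", word)
    yield FakeEvent("end")


def collect_text(events: Iterable[FakeEvent]) -> str:
    """Concatenate the content of every token event, ignoring start and end."""
    return "".join(e.content for e in events if e.type == "token")


if __name__ == "__main__":
    print(collect_text(fake_stream(["The ", "ocean ", "hums."])))
```

A helper like `collect_text` is one way to unit-test UI code that consumes a stream before pointing it at a live model.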
```python
from aicortex import (
    get_llm_params,
    get_model_info,
    get_server_info,
    list_model_servers,
)

# Full metadata for a model (size, family, quantization, etc.)
info = get_model_info("llama3.2:3b")
print(info)

# See all servers hosting a specific model (cloud and local)
servers = list_model_servers("llama3.2:3b")
for s in servers:
    print(f" {s['url']} - {s['location']['city']}, {s['location']['country']}")

# Get connection params for use with LangChain's OllamaLLM
params = get_llm_params("llama3.2:3b")
print(params)  # {'model': 'llama3.2:3b', 'base_url': 'http://...'}
```

Turn AI Cortex into an OpenAI-compatible REST API with one call:
```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "aicortex-core[server]"
```

```python
from aicortex.tools import run_server

run_server(host="127.0.0.1", port=8000, default_model="llama3.2:3b")
```

Then use it with curl, the `openai` Python SDK, or any OpenAI-compatible tool:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

| Task | How |
|---|---|
| Simple chat | `chat("your prompt")` |
| Specific model | `chat("...", model="mistral:7b")` |
| Streaming | `chat("...", stream=True)` |
| List models | `models()` / `models("llama")` |
| Model info | `get_model_info("model:tag")` |
| Server mode | `run_server(port=8000)` |
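
For server mode, the request that the curl example sends can also be built with Python's standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper, not part of AI Cortex, and it assumes the server is listening on `localhost:8000` as in the `run_server` call earlier.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8000", "llama3.2:3b", "Hello!")
    print(req.get_method(), req.full_url)
    # To actually send it, the server must be running:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.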
- 📖 Basic Usage: all parameters, error handling, and advanced patterns
- 🔀 Streaming: deep dive into `StreamEvent` types and real-time patterns
- 🤖 Model Management: how the model registry works
- 🖥️ Server Mode: full OpenAI-compatible proxy docs
- 🔧 Tools: update the model database with live endpoint scanning