Calibrating Autonomy and Predictability in Agent Systems
An experiment in building software where the LLM is the core, not an add-on.
This repo is the companion implementation. The paper covers the conceptual framework: the fundamental inversion, the determinism model, the design spectrum, and when an agent-native app (ANA) fits (or doesn't).
Classical apps are schema-first: you define data models, build features, then maybe add AI. The schema dictates what's possible.
This project inverts that. It's semantic-first: the LLM's understanding of language is the core. Structured data becomes an output of understanding, not an input required from users.
I'm using a todo/personal assistant as a familiar example to explore what this means in practice.
# Clone and enter
cd agent-native-app
# Install dependencies
uv sync
# Copy example env and configure
cp .env.example .env
# Edit .env with your API key (get one at https://openrouter.ai/keys)
# Run
uv run python main.py

Talk naturally:
You: Add a task to review the quarterly report
Assistant: Created a task: "review the quarterly report"
You: I prefer to do deep work in the morning
Assistant: I've added that to your Global Context. I'll factor in your
preference for morning deep work when making suggestions.
You: What should I focus on today?
Assistant: Based on your preference for morning deep work,
I'd suggest tackling the quarterly report review first...
The assistant has 7 primitive tools and builds everything else through reasoning:
| Tool | Purpose |
|---|---|
| `create_item` | Store anything (task, note, idea, reminder) |
| `update_item` | Modify content or properties |
| `delete_item` | Remove an item |
| `query_items` | Find by meaning or properties |
| `append_context` | Add knowledge to Global Context |
| `replace_context` | Update a line in Global Context |
| `delete_context` | Remove a line from Global Context |
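These primitives compose rather than multiply: higher-level behaviors come from the LLM chaining them. As a rough sketch, a request like "mark the quarterly report task as done" might decompose into two calls. The function names match `tools.py`; the argument names and return shapes here are assumptions for illustration, not the repo's exact signatures.

```python
# A sketch only: the LLM, not hard-coded app logic, decides this sequence.
# Argument names and return shapes are assumptions for illustration.
from agent_native_app.tools import query_items, update_item

# 1. Resolve the natural-language reference to a stored item.
matches = query_items(query="quarterly report", filters={"type": "task", "status": "active"})

# 2. Flip the matched item's status.
update_item(item_id=matches[0]["id"], properties={"status": "done"})
```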
┌───────────────────────────────────────────┐
│                    CLI                    │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────▼─────────────────────┐
│                 LLM Agent                 │
│          (OpenRouter, any model)          │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────▼─────────────────────┐
│             7 Primitive Tools             │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────▼─────────────────────┐
│              Store (Protocol)             │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────▼─────────────────────┐
│                  ChromaDB                 │
│   (items + global_context collections)    │
└───────────────────────────────────────────┘
I'm testing whether a vector database can serve as the single persistence layer.
Traditional approach:
- SQLite for structured data
- Vector DB for embeddings
- Two systems to sync
This experiment:
- ChromaDB stores everything
- Every item is semantically searchable by default
- "Find tasks similar to this" just works
Property Embedding for Semantic Search:
A key insight: ChromaDB's semantic search only operates on document content, not metadata. Dates like due_date: "2026-01-13" stored in metadata are invisible to queries like "what's due on 2026-01-13?"
The solution: automatically embed properties into the document content before storing, with dates converted to human-readable format. The agent never sees this; properties are stripped on retrieval.
# What gets stored (for semantic search)
Review quarterly report
---ANA_PROPS---
type: task
status: active
due date: Tuesday January 13 2026
# What the agent sees (clean API)
content: "Review quarterly report"
properties: {type: task, status: active, due_date: 2026-01-13}
Now "what's due Tuesday?" has real semantic similarity to find.
Tradeoffs I'm accepting:
- Flat metadata (no nested objects)
- Less battle-tested for CRUD
- Embedding cost for every item
What I hope to gain:
- Semantic search on items and their properties for free
- Simpler architecture
- Natural language queries everywhere
The hedge: the persistence layer sits behind an abstract Store protocol. If ChromaDB doesn't work out, it can be swapped for SQLite without touching the tools or agent code.
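A rough shape of that seam, assuming Python's `typing.Protocol`. The real interface lives in store.py; the method signatures here are sketched from the tool list, not copied from the code.

```python
from typing import Any, Protocol

class Store(Protocol):
    """Persistence seam the primitives call into; ChromaStore is one implementation.

    Method names and signatures are an assumption sketched from the tool table above.
    """

    def create_item(self, content: str, properties: dict[str, Any]) -> str: ...
    def update_item(self, item_id: str, content: str | None = None,
                    properties: dict[str, Any] | None = None) -> None: ...
    def delete_item(self, item_id: str) -> None: ...
    def query_items(self, query: str,
                    filters: dict[str, Any] | None = None) -> list[dict[str, Any]]: ...
```

Because the tools depend only on this protocol, a hypothetical SQLiteStore could be dropped in behind the same methods.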
agent_native_app/
├── config.py             # Configuration from .env
├── logging_config.py     # Central logging setup
├── store.py              # Store protocol + ChromaStore (with property embedding)
├── tools.py              # 7 primitives + OpenAI-compatible schemas
├── agent.py              # OpenRouter agent with tool calling
├── cli.py                # Interactive REPL
└── prompts/
    └── system.md         # "How to think" prompt
scripts/
├── db_describe.py            # Inspect ChromaDB collections
└── migrate_embed_props.py    # Migration script for property embedding
tests/
└── test_store.py         # Store module tests (33 tests)
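To run the store tests (assuming pytest is available as a dev dependency of the project):

```
uv run pytest tests/
```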
Configuration is managed via a .env file (copy it from .env.example); a filled-in example follows the table:
| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key (get one at https://openrouter.ai/keys) |
| `OPENROUTER_MODEL` | Model to use (e.g., `anthropic/claude-sonnet-4`) |
| `LOG_LEVEL_APP` | Log level for app code (DEBUG, INFO, WARNING, ERROR) |
| `LOG_LEVEL_DEPS` | Log level for dependencies (default: INFO) |
| `LOG_TO_CONSOLE` | Whether to log to stderr (true/false) |
| `LOG_FILE_PATH` | Path to log file (e.g., `logs/app.log`) |
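A filled-in .env might look like this (all values are illustrative placeholders; the model string mirrors the example above):

```
OPENROUTER_API_KEY=your-openrouter-key
OPENROUTER_MODEL=anthropic/claude-sonnet-4
LOG_LEVEL_APP=INFO
LOG_LEVEL_DEPS=INFO
LOG_TO_CONSOLE=true
LOG_FILE_PATH=logs/app.log
```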
Data: Stored in .data/ directory (ChromaDB persistent storage) with two collections:
- `items`: Tasks, notes, reminders, ideas
- `global_context`: Always-present knowledge that shapes agent reasoning
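A sketch of how those two collections might be opened with the chromadb client, matching the cosine config shown by db_describe below. The actual setup lives in store.py and may differ.

```python
import chromadb

# Persistent on-disk client rooted at .data/, as described above.
client = chromadb.PersistentClient(path=".data")

# Two collections, both using cosine distance (matches the db_describe output below).
items = client.get_or_create_collection("items", metadata={"hnsw:space": "cosine"})
global_context = client.get_or_create_collection("global_context", metadata={"hnsw:space": "cosine"})
```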
Inspect the database:
uv run python scripts/db_describe.py # Overview
uv run python scripts/db_describe.py -s 3   # With 3 sample items per collection

This outputs something like:
❯ uv run python scripts/db_describe.py
Database: .data
Collections: 2
============================================================
Collection: global_context
----------------------------------------
Config: {'hnsw:space': 'cosine'}
Items: 1
Metadata fields:
- created_at: str
- item_type: str
- updated_at: str
item_type values: global_context
Collection: items
----------------------------------------
Config: {'hnsw:space': 'cosine'}
Items: 6
Metadata fields:
- context: str
- created_at: str
- due_date: str
- priority: int
- project: str
- status: str
- type: str
- updated_at: str
type values: task
status values: active, in-progress
============================================================
Chroma CLI: https://docs.trychroma.com/docs/cli/install

The system prompt teaches the assistant how to think, not what to do:
- Items are flexible containers, not rigid task records
- Properties emerge from context (type, status, priority, project...)
- Global Context captures user patterns and preferences (always present, not retrieved)
- Explain reasoning, but don't be verbose
- Ask clarifying questions rather than guess
- Be an advisor, not an autocrat
- Global Context Design: Always-present knowledge layer
- Architecture Diagrams: Visual overview of ChromaDB and tool flow
The agent uses the OpenAI function calling format to communicate tools to the LLM.
Each tool is defined as a JSON schema in agent_native_app/tools.py:
TOOL_SCHEMAS = [
{
"type": "function",
"function": {
"name": "create_item",
"description": "Create a new item...",
"parameters": {
"type": "object",
"properties": {
"content": {"type": "string", "description": "..."},
"properties": {"type": "object", "description": "..."}
},
"required": ["content"]
}
}
},
# ... more tools (7 total)
]

A separate TOOLS dictionary maps names to Python functions for execution:
TOOLS = {
"create_item": create_item,
"update_item": update_item,
# ...
}

In agent_native_app/agent.py, tools are sent on every API call:
response = self._client.chat.completions.create(
model=self._model,
messages=messages,
tools=TOOL_SCHEMAS, # All 7 tool schemas
tool_choice="auto" # LLM decides when to use them
)

            User Input
                 │
                 ▼
┌─────────────────────────────────┐
│            API Call             │
│  - System prompt                │
│  - Message history              │
│  - Tool schemas                 │◄──────────────┐
└────────────────┬────────────────┘               │
                 │                                │
                 ▼                                │
           LLM Response                           │
                 │                                │
        ┌────────┴────────┐                       │
        │                 │                       │
 Has tool_calls?    No tool_calls                 │
        │                 │                       │
        ▼                 ▼                       │
  Execute tools     Return content                │
        │              (done)                     │
        ▼                                         │
  Add results as                                  │
 "tool" messages ─────────────────────────────────┘
When the LLM wants to use a tool, it returns a tool_calls array. The agent:
- Parses each tool call (name + JSON arguments)
- Looks up the function in `TOOLS`
- Executes it: `TOOLS[name](**arguments)`
- Appends the result as a `"role": "tool"` message
- Loops back to the LLM with updated history
The loop continues until the LLM responds with content and no tool calls.
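Condensed into code, the loop has roughly this shape. It mirrors the steps above and the standard OpenAI SDK, but it is a simplified sketch: the function name, error handling, and structure in agent.py will differ.

```python
import json

from agent_native_app.tools import TOOL_SCHEMAS, TOOLS  # schemas + name->function map


def run_turn(client, model: str, messages: list, user_input: str) -> str:
    """Sketch of the tool-calling loop described above (not agent.py verbatim)."""
    messages.append({"role": "user", "content": user_input})

    while True:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=TOOL_SCHEMAS,    # all 7 tool schemas, on every call
            tool_choice="auto",    # the LLM decides when to use them
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:            # plain content -> final answer
            return message.content

        for call in message.tool_calls:       # execute each requested tool
            args = json.loads(call.function.arguments)
            result = TOOLS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
```

Over a single exchange, the accumulated message history ends up looking like this: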
messages = [
{"role": "system", "content": "..."}, # How to think
{"role": "user", "content": "Add a task..."}, # User input
{"role": "assistant", "tool_calls": [...]}, # LLM requests tool
{"role": "tool", "tool_call_id": "...", # Tool result
"content": "{\"id\": \"abc123\", ...}"},
{"role": "assistant", "content": "Created..."} # Final response
]

This is the standard OpenAI tool calling protocol, which works with any compatible API (OpenRouter, OpenAI, Anthropic via adapters, etc.).
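Because OpenRouter exposes an OpenAI-compatible endpoint, the client can be the stock openai SDK pointed at OpenRouter's base URL; only the base URL and API key change. The snippet below is illustrative, not copied from agent.py.

```python
from openai import OpenAI

# Standard SDK, OpenRouter endpoint; the key comes from OPENROUTER_API_KEY in .env.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY from .env>",
)
```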
MIT
