A fully containerised, multi-service simulation that demonstrates how easy it is to build your own AI agent using the Strands Agents framework. The agent manages a water-gallon delivery operation: it receives requests from autonomous clients, decides how to fulfil them, tracks inventory, and records everything to a vector knowledge base — all in real time, all observable through a live dashboard.
This project was built as a hands-on showcase. The core message: you don't need a complex platform to run your own agent. A handful of Python files, a @tool decorator, and a model of your choice are all it takes. The rest of the stack (database, knowledge base, monitoring) is ordinary software that you already know how to build.
Agents are often presented as magical black boxes sitting behind a managed platform. This project cuts through that and shows the mechanics directly:
- What does a tool actually look like in code?
- How does the agent decide what to call?
- How do you wire an agent into an HTTP service?
- How do you give an agent memory across calls?
- How do you swap the underlying model without rewriting your agent?
The simulation is intentionally concrete. Gallons are delivered, inventory is depleted, clients die if they run dry for too long. You can watch it happen in a browser. Every agent decision is logged. The knowledge base accumulates real behavioral history. It is small enough to read in an afternoon and real enough to learn from.
Five clients start with a small stock of water gallons. Each tick they consume some (with one random "burst" day per week). When stock drops below a threshold they call the agent to request a delivery. The agent checks inventory, fulfils or rejects, and logs the event to a vector knowledge base. A background monthly loop summarises operations — total deliveries, gallons, top client, anomalies — and records that summary to the knowledge base. A dashboard shows everything live.
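The per-client behaviour described above can be sketched as a single tick function. This is an illustrative sketch rather than the project's client.py: the `ClientState` and `tick` names are hypothetical, the burst day is assumed to be a fixed weekday index, and the default values mirror the documented configuration defaults.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    stock: float = 5.0   # mirrors INITIAL_STOCK
    dry_days: int = 0    # consecutive days spent at zero stock
    alive: bool = True


def tick(state, day, burst_day, daily=1.0, burst=5.0,
         reorder_threshold=2.0, death_threshold_days=3):
    """Advance one simulated day; return True if a delivery should be requested."""
    if not state.alive:
        return False
    # One day per simulated week uses the burst consumption (assumed rule).
    use = burst if day % 7 == burst_day else daily
    state.stock = max(0.0, state.stock - use)
    # Track how long the client has been dry.
    if state.stock == 0.0:
        state.dry_days += 1
    else:
        state.dry_days = 0
    if state.dry_days >= death_threshold_days:
        state.alive = False
        return False
    # Request a delivery only when stock is strictly below the threshold.
    return state.stock < reorder_threshold
```

In this sketch a client that never receives a delivery starts requesting on its first dry day and dies once `dry_days` reaches the death threshold.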
tick every N seconds
│
┌────────────▼────────────┐
│ gallon-clients │ 5 concurrent async clients
│ (consume, request, die)│ PostgreSQL → gallon_clients DB
└────────────┬────────────┘
│ POST /invocations
┌────────────▼────────────┐
│ gallon-agent │ Strands agent + tools
│ (deliver, monthly loop)│ connects to inference provider
└────────────┬────────────┘
│ HTTP
┌────────────▼────────────┐
│ gallon-service │ FastAPI data layer
│ inventory + knowledge │ PostgreSQL → gallon_service DB
│ base │ LanceDB (vector store)
└─────────────────────────┘
┌─────────────────────────┐
│ gallon-monitor │ dashboard at :8888
│ reads all three above │
└─────────────────────────┘
workshop/
├── docker-compose.yml # one command to run everything
├── .env # all runtime config in one place
├── init-scripts/
│ └── init-db.sql # creates gallon_service + gallon_clients DBs
│
├── gallon-agent/ # the agent service
│ └── gallon_agent/
│ ├── agent.py # model construction + agent factory
│ ├── tools.py # @tool definitions (the agent's hands)
│ ├── server.py # FastAPI: /invocations + /ping + monthly loop
│ ├── service_client.py # HTTP client for gallon-service
│ └── config.py # pydantic-settings
│
├── gallon-service/ # data layer (no agent logic)
│ └── gallon_service/
│ ├── server.py # inventory, deliveries, KB endpoints
│ ├── db.py # asyncpg queries
│ ├── vector_store.py # LanceDB + fastembed
│ └── config.py
│
├── gallon-clients/ # simulation clients
│ └── gallon_clients/
│ ├── runner.py # spawns N concurrent clients
│ ├── client.py # GallonClient: consume, request, die
│ ├── agent_gateway.py # calls /invocations on the agent
│ └── db.py # client state in PostgreSQL
│
└── gallon-monitor/ # live dashboard
└── gallon_monitor/
├── server.py # aggregates state from all services
└── dashboard.html # single-page dashboard (vanilla JS)
- Docker + Docker Compose (or Colima on Mac)
- An inference provider — see Inference providers below
git clone <repo>
cd workshop
cp .env.example .env # then edit .env (see below)
docker compose up -d --build
Then open http://localhost:8888
That is it. Clients start running immediately. The agent processes deliveries in the background and runs a monthly summary every 30 ticks. The dashboard refreshes every two seconds.
To restart the simulation clients:
docker compose restart gallon-clients
The agent supports three inference providers, switched entirely through environment variables, with no code changes.
Best for development on Apple Silicon. Rapid-MLX runs models natively via MLX.
# install and start the model (run on the host, not in Docker)
pip install rapid-mlx
rapid-mlx serve qwen3-0.6b-8bit --port 8000
.env:
INFERENCE_PROVIDER=local
MODEL_ID=qwen3-0.6b-8bit
INFERENCE_PORT=8000
Works with any OpenAI-compatible local server (Ollama, LM Studio, vLLM, etc.). For Ollama, set MODEL_ID=smollm2:135m and ensure it is pulled first.
Note on small models: models under ~4B parameters often struggle with reliable tool calling. The simulation works around this by making deterministic decisions (restock, OOS probability) in Python and only asking the LLM to call a single tool per invocation.
INFERENCE_PROVIDER=openai
MODEL_ID=gpt-4o-mini
OPENAI_API_KEY=sk-...
INFERENCE_PROVIDER=bedrock
MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
Cross-region inference profile IDs (the us. / apac. / eu. prefix) are required for most Bedrock models. Find yours in the Bedrock console under Cross-region inference.
All config lives in .env at the workspace root. Every value has a sensible default so you only need to set what you want to change.
| Variable | Default | Description |
|---|---|---|
| INFERENCE_PROVIDER | local | local / openai / bedrock |
| MODEL_ID | qwen3-0.6b-8bit | Model identifier for the chosen provider |
| INFERENCE_PORT | 8000 | Port of the local inference server |
| OPENAI_API_KEY | (none) | Required when INFERENCE_PROVIDER=openai |
| AWS_REGION | us-east-1 | Required when INFERENCE_PROVIDER=bedrock |
| AWS_ACCESS_KEY_ID | (none) | Required when INFERENCE_PROVIDER=bedrock |
| AWS_SECRET_ACCESS_KEY | (none) | Required when INFERENCE_PROVIDER=bedrock |
| NUM_CLIENTS | 5 | Number of simulation clients |
| SIM_DAYS | 0 | Ticks per client before stopping (0 = indefinite) |
| TICK_SECONDS | 5.0 | Real seconds per simulation day |
| SIM_TICK_SECONDS | 1.0 | Real seconds per simulation day (agent side) |
| INITIAL_STOCK | 5.0 | Starting gallons per client |
| DAILY_CONSUMPTION | 1.0 | Gallons consumed on a normal day |
| BURST_CONSUMPTION | 5.0 | Gallons consumed on burst day |
| REORDER_THRESHOLD | 2.0 | Stock level that triggers a delivery request |
| DELIVERY_AMOUNT_MIN | 3.0 | Minimum gallons requested per delivery (randomised per request) |
| DELIVERY_AMOUNT_MAX | 10.0 | Maximum gallons requested per delivery |
| DEATH_THRESHOLD_DAYS | 3 | Consecutive zero-stock days before a client dies |
| OOS_PROBABILITY | 0.2 | Probability agent is unavailable per check_inventory call |
| RESTOCK_THRESHOLD | 20.0 | Inventory level that triggers automatic restock |
| MAX_CAPACITY | 100.0 | Warehouse max capacity (gallons) |
| EMBED_MODEL | BAAI/bge-small-en-v1.5 | fastembed model for the knowledge base |
| POSTGRES_PASSWORD | postgres | Shared PostgreSQL password |
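To make the "sensible defaults" idea concrete, here is a minimal sketch of reading a few of these variables with fallbacks. The project itself uses pydantic-settings in config.py; this standard-library stand-in, including the `SimSettings` and `load_settings` names, is an assumption made so the example runs without extra dependencies.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SimSettings:
    inference_provider: str
    model_id: str
    num_clients: int
    tick_seconds: float
    reorder_threshold: float


def load_settings(env=None):
    """Read a few variables, falling back to the defaults documented in the table."""
    env = os.environ if env is None else env
    return SimSettings(
        inference_provider=env.get("INFERENCE_PROVIDER", "local"),
        model_id=env.get("MODEL_ID", "qwen3-0.6b-8bit"),
        num_clients=int(env.get("NUM_CLIENTS", "5")),
        tick_seconds=float(env.get("TICK_SECONDS", "5.0")),
        reorder_threshold=float(env.get("REORDER_THRESHOLD", "2.0")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests, for example `load_settings({"NUM_CLIENTS": "8"})`.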
The agent exposes two HTTP endpoints following the Bedrock AgentCore contract:
- GET /ping: health check, returns {"status": "Healthy"}
- POST /invocations: all client interactions, prompt in body
Clients send plain text prompts:
| Client action | Prompt | Agent response |
|---|---|---|
| Check warehouse | check_inventory CLIENT_ID:uuid | {"available_gallons": 80.0, "is_out_of_stock": false, "unavailability_reason": null} |
| Request delivery | deliver CLIENT_ID:uuid CLIENT_NAME:Alice GALLONS:5.0 | {"delivered": true, "gallons_delivered": 5.0, "status": "delivered"} |
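The plain-text prompt format in the table can be produced and read back with a few lines of string handling. This is a hedged sketch: `build_deliver_prompt` and `parse_prompt` are hypothetical helper names, not functions from the repository, and the parser assumes a non-empty prompt whose field values contain no spaces.

```python
def build_deliver_prompt(client_id: str, client_name: str, gallons: float) -> str:
    """Format a delivery request in the KEY:value style shown in the table."""
    return f"deliver CLIENT_ID:{client_id} CLIENT_NAME:{client_name} GALLONS:{gallons}"


def parse_prompt(prompt: str):
    """Split a prompt into its action word and a dict of KEY:value fields.

    Assumes the prompt is non-empty and that values contain no whitespace.
    """
    action, *parts = prompt.split()
    fields = {}
    for part in parts:
        # partition splits on the first colon only, so values may contain colons
        key, _, value = part.partition(":")
        fields[key] = value
    return action, fields
```

For example, `parse_prompt(build_deliver_prompt("abc-123", "Alice", 5.0))` yields the action `"deliver"` and the three fields as strings.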
Tools are plain Python functions decorated with @tool. The Strands framework generates a JSON schema from the type annotations and docstring, passes it to the model, and calls the function when the model requests it.
from strands import tool


@tool
def deliver_gallons(client_id: str, client_name: str, gallons_requested: float, sim_day: int) -> dict:
    """
    Execute a delivery request: deduct gallons from inventory and log it.
    Returns gallons_delivered and status.
    """
    # ... implementation calls gallon-service HTTP endpoints
    # auto-restocks if inventory drops below threshold after delivery

The full tool set:
| Tool | Purpose |
|---|---|
| deliver_gallons | Fulfil a delivery: check live inventory, deduct, log to DB and KB, auto-restock if needed |
| record_monthly_summary | Persist the monthly operational summary to the knowledge base |
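To demystify the schema-generation step, the following sketch derives a JSON-schema-like description from a function's signature and docstring using only the standard library. It illustrates the general idea and is not how Strands implements @tool; `sketch_tool_schema` and the output layout are assumptions, and the `deliver_gallons` stub here is deliberately undecorated.

```python
import inspect

# Minimal mapping from Python annotations to JSON schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", dict: "object"}


def sketch_tool_schema(func):
    """Build a tool description from type annotations and the docstring."""
    sig = inspect.signature(func)
    properties = {
        name: {"type": _JSON_TYPES.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    # Parameters without a default value are treated as required.
    required = [
        name for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def deliver_gallons(client_id: str, client_name: str, gallons_requested: float, sim_day: int) -> dict:
    """Execute a delivery request: deduct gallons from inventory and log it."""
    return {}
```

Calling `sketch_tool_schema(deliver_gallons)` shows why accurate annotations and a clear docstring matter: they are the only description of the tool the model ever sees.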
Rather than one general-purpose agent, the simulation uses two single-tool agents. This makes the model's job trivial: read the values in the prompt, call the one tool, done. It works reliably even with very small models.
from strands import Agent


def build_delivery_agent() -> Agent:
    # one agent, one tool: handles delivery requests only
    return Agent(model=_make_model(), system_prompt=_delivery_prompt(), tools=[deliver_gallons])


def build_monthly_agent() -> Agent:
    # one agent, one tool: records the pre-computed monthly summary
    return Agent(model=_make_model(), system_prompt=_monthly_prompt(), tools=[record_monthly_summary])

Not everything should be delegated to the model. The simulation handles several things in Python and passes pre-computed values to the agent as facts:
- Restock: triggered automatically inside deliver_gallons when inventory drops below threshold. No LLM needed.
- Agent unavailability: at check_inventory time, random.random() < oos_probability decides whether the agent is unavailable, and a random human-readable reason (holiday, force majeure, etc.) is picked. A dice roll, not a judgment call.
- Monthly stats: all arithmetic (top client, failure rate, anomaly detection) is computed in Python. The LLM only calls record_monthly_summary with the pre-computed values.
The LLM's job is to call the right tool with the right arguments — that is what it is actually good at.
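As an illustration of keeping arithmetic out of the model, the monthly statistics can be computed with ordinary Python before the agent is invoked. This is a sketch under assumptions: the `monthly_stats` name, the input record shape, and the "rejected" status value are hypothetical, and anomaly detection is omitted.

```python
from collections import defaultdict


def monthly_stats(deliveries):
    """Summarise a month of delivery records.

    Each record is assumed to be a dict with client_name,
    gallons_delivered, and status keys.
    """
    total = len(deliveries)
    failed = sum(1 for d in deliveries if d["status"] != "delivered")
    per_client = defaultdict(float)
    for d in deliveries:
        per_client[d["client_name"]] += d["gallons_delivered"]
    # default covers the empty-month case
    top_name, top_gallons = max(
        per_client.items(), key=lambda kv: kv[1], default=(None, 0.0)
    )
    return {
        "total_deliveries": total,
        "failed_deliveries": failed,
        "total_gallons_delivered": sum(per_client.values()),
        "failure_rate": failed / total if total else 0.0,
        "top_client_name": top_name,
        "top_client_gallons": top_gallons,
    }
```

The resulting dict can then be formatted into the prompt, so the model only has to copy values into the record_monthly_summary call.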
Strands agents are synchronous. The simulation runs them in asyncio.to_thread so the FastAPI event loop stays unblocked, and serialises concurrent calls with asyncio.Lock so only one agent runs at a time.
Inference provider is selected at startup from the INFERENCE_PROVIDER env var. The _make_model() factory returns the appropriate Strands model object — OpenAIModel for local and OpenAI, BedrockModel for Bedrock. Everything above that layer (agents, tools, server) is provider-agnostic.
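The selection logic can be sketched as a function that maps environment variables to a model class name and its settings. To stay runnable without Strands installed, this sketch returns a description instead of constructing OpenAIModel or BedrockModel; the `describe_model` name, the keyword names, and the local base URL are assumptions, not the constructors' actual signatures.

```python
import os


def describe_model(env=None):
    """Return (model_class_name, settings) for the selected provider."""
    env = os.environ if env is None else env
    provider = env.get("INFERENCE_PROVIDER", "local")
    model_id = env.get("MODEL_ID", "qwen3-0.6b-8bit")
    if provider == "local":
        port = env.get("INFERENCE_PORT", "8000")
        # Assumed URL: an OpenAI-compatible server on the Docker host.
        base_url = f"http://host.docker.internal:{port}/v1"
        return "OpenAIModel", {"model_id": model_id, "base_url": base_url}
    if provider == "openai":
        return "OpenAIModel", {"model_id": model_id, "api_key": env["OPENAI_API_KEY"]}
    if provider == "bedrock":
        region = env.get("AWS_REGION", "us-east-1")
        return "BedrockModel", {"model_id": model_id, "region_name": region}
    raise ValueError(f"unknown INFERENCE_PROVIDER: {provider!r}")
```

Because everything provider-specific is confined to this one function, agents, tools, and the server never need to know which backend is in use.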
Every delivery and every monthly agent decision is written to LanceDB, a local vector database. Text is embedded with fastembed using BAAI/bge-small-en-v1.5 (384 dimensions, runs on CPU, no GPU needed).
Two collections:
client_behaviors — one record per delivery attempt
"Alice on day 42 requested 5.0g, received 5.0g (delivered)"
metadata: client_id, client_name, sim_day, gallons_requested, gallons_delivered, status
agent_decisions — one record per monthly summary
"Month 3: 47 deliveries, 218.5g delivered, 12.0g restocked, 6% failure rate.
Top client: Alice (64.0g). Anomalies: No anomalies detected"
metadata: month_number, total_gallons_delivered, total_gallons_restocked,
total_deliveries, failed_deliveries, top_client_name, top_client_gallons,
anomaly_summary
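A client_behaviors entry like the one shown above pairs a sentence to embed with structured metadata. The following sketch builds such a record; the `behavior_record` name and the returned dict layout are assumptions for illustration, and the embedding and LanceDB write are left out.

```python
def behavior_record(client_id, client_name, sim_day,
                    gallons_requested, gallons_delivered, status):
    """Build the text and metadata for one delivery attempt."""
    text = (
        f"{client_name} on day {sim_day} requested {gallons_requested}g, "
        f"received {gallons_delivered}g ({status})"
    )
    metadata = {
        "client_id": client_id,
        "client_name": client_name,
        "sim_day": sim_day,
        "gallons_requested": gallons_requested,
        "gallons_delivered": gallons_delivered,
        "status": status,
    }
    return {"text": text, "metadata": metadata}
```

Keeping the numbers in metadata as well as in the sentence means records can be filtered exactly (by client or day) and also found by semantic search over the text.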
The knowledge base is browsable in the dashboard (bottom two panels) and queryable via:
# quote the URLs so the shell (zsh in particular) does not treat ? as a glob character
curl -s 'http://localhost:8081/kb/client_behaviors?limit=10'
curl -s 'http://localhost:8081/kb/agent_decisions?limit=10'
# semantic search
curl -s -X POST http://localhost:8081/kb/search \
-H 'Content-Type: application/json' \
-d '{"collection": "client_behaviors", "query_text": "out of stock", "limit": 5}'

gallon-service endpoints:

| Method | Path | Description |
|---|---|---|
| GET | /inventories | Current inventory |
| POST | /inventories/restocks | Refill to max capacity |
| POST | /deliveries | Log a delivery (atomically deducts inventory) |
| GET | /deliveries | Delivery history (optional ?client_id= filter) |
| GET | /kb/{collection} | Recent KB entries (?limit=50) |
| POST | /kb/ingest | Add a record to the KB |
| POST | /kb/search | Semantic search |
| POST | /kb/embed | Embed a string (returns vector) |
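The semantics behind POST /deliveries (atomic deduction) and POST /inventories/restocks (refill to max capacity) can be illustrated with an in-memory stand-in. The real service persists to PostgreSQL; this `Inventory` class is hypothetical, and the all-or-nothing rejection rule is an assumption based on the agent "fulfilling or rejecting" requests.

```python
import threading


class Inventory:
    """In-memory illustration of deduct-and-restock behaviour."""

    def __init__(self, gallons=100.0, max_capacity=100.0, restock_threshold=20.0):
        self.gallons = gallons
        self.max_capacity = max_capacity
        self.restock_threshold = restock_threshold
        self._lock = threading.Lock()

    def deliver(self, requested: float) -> float:
        """Deduct atomically; return gallons delivered (0.0 if rejected)."""
        with self._lock:
            if requested > self.gallons:
                return 0.0  # assumed rule: reject rather than partially fill
            self.gallons -= requested
            return requested

    def restock_if_low(self) -> bool:
        """Refill to max capacity when stock is below the restock threshold."""
        with self._lock:
            if self.gallons < self.restock_threshold:
                self.gallons = self.max_capacity
                return True
            return False
```

Holding the lock across the check and the update is what makes the deduction atomic here; in the actual service that guarantee would come from the database rather than a Python lock.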
gallon-agent endpoints:

| Method | Path | Description |
|---|---|---|
| GET | /ping | Health check |
| POST | /invocations | Agent entry point |
| GET | /conversations | Last 200 interactions (for the dashboard) |
Open http://localhost:8888. Refreshes every 2 seconds.
- Header — agent health badge + last-updated time
- Summary stats — alive clients, dead clients, total requests, gallons delivered
- Inventory — live gauge with capacity bar
- Clients table — per-client stock, status, request/fulfil counts
- Delivery feed — last 20 deliveries, newest first
- Agent conversation log — all agent interactions including monthly summaries, rendered as a chat
- Knowledge base — last 30 entries from each KB collection, with metadata
All scrollable panels are resizable (drag the bottom edge).
The gallon delivery domain is a placeholder. The pattern is the point.
- Define your tools in tools.py: plain Python functions with @tool and type annotations. The framework handles schema generation and invocation.
- Write a focused system prompt: tell the agent what it is and what single action it should take. One tool per agent is more reliable than a Swiss-army prompt.
- Wire it into FastAPI in server.py: call asyncio.to_thread(agent, prompt) from your handler. Clear agent.messages after each call if you want stateless invocations.
- Choose your model in agent.py via the provider factory. Switch with an env var, not a code change.
- Separate data access into its own service (gallon-service) so the agent never touches the database directly. This makes the agent testable, replaceable, and deployable independently.
| Layer | Technology |
|---|---|
| Agent framework | Strands Agents |
| Local inference | Rapid-MLX (Apple Silicon) |
| Cloud inference | Amazon Bedrock, OpenAI |
| HTTP server | FastAPI + Uvicorn |
| HTTP client | httpx |
| Relational DB | PostgreSQL 16 via asyncpg |
| Vector DB | LanceDB |
| Embeddings | fastembed (BAAI/bge-small-en-v1.5) |
| Config | pydantic-settings |
| Packaging | uv + hatchling |
| Containers | Docker Compose |
If you are using Colima instead of Docker Desktop:
# first-time setup — allocate enough disk and mount your repo volume
colima start --disk 100 --mount /Volumes/Repository:w --mount $HOME:w
# start Rapid-MLX on the host (accessible from containers via host.docker.internal)
rapid-mlx serve qwen3-0.6b-8bit --port 8000
# then start the stack
docker compose up -d --build
If Colima was previously stopped and the runtime directory is missing:
mkdir -p ~/tmp/colima
colima start --disk 100 --mount /Volumes/Repository:w --mount $HOME:w