Gallon Delivery Simulation

A fully containerised, multi-service simulation that demonstrates how easy it is to build your own AI agent using the Strands Agents framework. The agent manages a water-gallon delivery operation: it receives requests from autonomous clients, decides how to fulfil them, tracks inventory, and records everything to a vector knowledge base — all in real time, all observable through a live dashboard.

This project was built as a hands-on showcase. The core message: you don't need a complex platform to run your own agent. A handful of Python files, a @tool decorator, and a model of your choice are all it takes. The rest of the stack (database, knowledge base, monitoring) is ordinary software that you already know how to build.

Background

Agents are often presented as magical black boxes sitting behind a managed platform. This project cuts through that and shows the mechanics directly:

What does a tool actually look like in code?
How does the agent decide what to call?
How do you wire an agent into an HTTP service?
How do you give an agent memory across calls?
How do you swap the underlying model without rewriting your agent?

The simulation is intentionally concrete. Gallons are delivered, inventory is depleted, clients die if they run dry for too long. You can watch it happen in a browser. Every agent decision is logged. The knowledge base accumulates real behavioral history. It is small enough to read in an afternoon and real enough to learn from.

What the simulation does

Five clients start with a small stock of water gallons. Each tick (one simulated day) they consume DAILY_CONSUMPTION gallons, or BURST_CONSUMPTION gallons on one random "burst" day per week. When stock drops below REORDER_THRESHOLD they call the agent to request a delivery. The agent checks inventory, fulfils or rejects the request, and logs the event to a vector knowledge base. A background monthly loop summarises operations (total deliveries, gallons, top client, anomalies) and records that summary to the knowledge base. A dashboard shows everything live.
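The per-client loop can be sketched as a small state machine. `ClientSketch` is hypothetical, with defaults taken from the configuration reference; in particular, choosing the burst day at the start of each seven-day week and resetting the zero-stock counter whenever stock is positive are assumptions about how client.py behaves, not confirmed details.

```python
import random
from dataclasses import dataclass

# Defaults mirror the configuration reference; names below are hypothetical.
DAILY_CONSUMPTION = 1.0
BURST_CONSUMPTION = 5.0
REORDER_THRESHOLD = 2.0
DEATH_THRESHOLD_DAYS = 3


@dataclass
class ClientSketch:
    name: str
    stock: float = 5.0          # INITIAL_STOCK
    zero_stock_days: int = 0
    alive: bool = True
    burst_weekday: int = 0      # day of the current week that is the burst day

    def tick(self, day: int, rng: random.Random) -> bool:
        """Advance one simulated day. Returns True if a delivery should be requested."""
        if not self.alive:
            return False
        if day % 7 == 0:
            # Assumption: a fresh burst day is drawn at the start of every week.
            self.burst_weekday = rng.randrange(7)
        used = BURST_CONSUMPTION if day % 7 == self.burst_weekday else DAILY_CONSUMPTION
        self.stock = max(0.0, self.stock - used)
        if self.stock == 0.0:
            self.zero_stock_days += 1
            if self.zero_stock_days >= DEATH_THRESHOLD_DAYS:
                self.alive = False  # ran dry for too many consecutive days
                return False
        else:
            self.zero_stock_days = 0
        return self.stock < REORDER_THRESHOLD

    def receive(self, gallons: float) -> None:
        """Add a fulfilled delivery to the client's stock."""
        self.stock += gallons
```

Passing the random source in keeps the simulation reproducible: a seeded `random.Random(42)` replays the same burst days on every run.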

              tick every N seconds
                      │
         ┌────────────▼────────────┐
         │      gallon-clients     │  5 concurrent async clients
         │  (consume, request, die)│  PostgreSQL → gallon_clients DB
         └────────────┬────────────┘
                      │  POST /invocations
         ┌────────────▼────────────┐
         │      gallon-agent       │  Strands agent + tools
         │  (deliver, monthly loop)│  connects to inference provider
         └────────────┬────────────┘
                      │  HTTP
         ┌────────────▼────────────┐
         │     gallon-service      │  FastAPI data layer
         │  inventory + knowledge  │  PostgreSQL → gallon_service DB
         │         base            │  LanceDB (vector store)
         └─────────────────────────┘

         ┌─────────────────────────┐
         │     gallon-monitor      │  dashboard at :8888
         │  reads all three above  │
         └─────────────────────────┘

Project structure

workshop/
├── docker-compose.yml          # one command to run everything
├── .env                        # all runtime config in one place
├── init-scripts/
│   └── init-db.sql             # creates gallon_service + gallon_clients DBs
│
├── gallon-agent/               # the agent service
│   └── gallon_agent/
│       ├── agent.py            # model construction + agent factory
│       ├── tools.py            # @tool definitions (the agent's hands)
│       ├── server.py           # FastAPI: /invocations + /ping + monthly loop
│       ├── service_client.py   # HTTP client for gallon-service
│       └── config.py           # pydantic-settings
│
├── gallon-service/             # data layer (no agent logic)
│   └── gallon_service/
│       ├── server.py           # inventory, deliveries, KB endpoints
│       ├── db.py               # asyncpg queries
│       ├── vector_store.py     # LanceDB + fastembed
│       └── config.py
│
├── gallon-clients/             # simulation clients
│   └── gallon_clients/
│       ├── runner.py           # spawns N concurrent clients
│       ├── client.py           # GallonClient: consume, request, die
│       ├── agent_gateway.py    # calls /invocations on the agent
│       └── db.py               # client state in PostgreSQL
│
└── gallon-monitor/             # live dashboard
    └── gallon_monitor/
        ├── server.py           # aggregates state from all services
        └── dashboard.html      # single-page dashboard (vanilla JS)

Quickstart

Prerequisites

Docker + Docker Compose (or Colima on Mac)
An inference provider — see Inference providers below

1. Clone and configure

git clone <repo>
cd workshop
cp .env.example .env   # then edit .env — see below

2. Start everything

docker compose up -d --build

3. Open the dashboard

http://localhost:8888

That is it. Clients start running immediately. The agent processes deliveries in the background and runs a monthly summary every 30 ticks. The dashboard refreshes every two seconds.

4. Restart clients (after they all die)

docker compose restart gallon-clients

Inference providers

The agent supports three inference providers, switched entirely through environment variables — no code changes.

Local (default) — Rapid-MLX or any OpenAI-compatible server

Best for development on Apple Silicon. Rapid-MLX runs models natively via MLX.

# install and start the model (run on the host, not in Docker)
pip install rapid-mlx
rapid-mlx serve qwen3-0.6b-8bit --port 8000

.env:

INFERENCE_PROVIDER=local
MODEL_ID=qwen3-0.6b-8bit
INFERENCE_PORT=8000

Works with any OpenAI-compatible local server (Ollama, LM Studio, vLLM, etc.). For Ollama, set MODEL_ID=smollm2:135m, pull that model first, and set INFERENCE_PORT=11434 (Ollama's default port).

Note on small models: models under ~4B parameters often struggle with reliable tool calling. The simulation works around this by making deterministic decisions (restock, OOS probability) in Python and only asking the LLM to call a single tool per invocation.

OpenAI

INFERENCE_PROVIDER=openai
MODEL_ID=gpt-4o-mini
OPENAI_API_KEY=sk-...

Amazon Bedrock

INFERENCE_PROVIDER=bedrock
MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Cross-region inference profile IDs (the us. / apac. / eu. prefix) are required for most Bedrock models. Find yours in the Bedrock console under Cross-region inference.

Configuration reference

All config lives in .env at the workspace root. Every value has a sensible default so you only need to set what you want to change.

Variable	Default	Description
`INFERENCE_PROVIDER`	`local`	`local` / `openai` / `bedrock`
`MODEL_ID`	`qwen3-0.6b-8bit`	Model identifier for the chosen provider
`INFERENCE_PORT`	`8000`	Port of the local inference server
`OPENAI_API_KEY`	—	Required when `INFERENCE_PROVIDER=openai`
`AWS_REGION`	`us-east-1`	Required when `INFERENCE_PROVIDER=bedrock`
`AWS_ACCESS_KEY_ID`	—	Required when `INFERENCE_PROVIDER=bedrock`
`AWS_SECRET_ACCESS_KEY`	—	Required when `INFERENCE_PROVIDER=bedrock`
`NUM_CLIENTS`	`5`	Number of simulation clients
`SIM_DAYS`	`0`	Ticks per client before stopping (0 = indefinite)
`TICK_SECONDS`	`5.0`	Real seconds per simulation day (client side)
`SIM_TICK_SECONDS`	`1.0`	Real seconds per simulation day (agent side)
`INITIAL_STOCK`	`5.0`	Starting gallons per client
`DAILY_CONSUMPTION`	`1.0`	Gallons consumed on a normal day
`BURST_CONSUMPTION`	`5.0`	Gallons consumed on burst day
`REORDER_THRESHOLD`	`2.0`	Stock level that triggers a delivery request
`DELIVERY_AMOUNT_MIN`	`3.0`	Minimum gallons requested per delivery (randomised per request)
`DELIVERY_AMOUNT_MAX`	`10.0`	Maximum gallons requested per delivery
`DEATH_THRESHOLD_DAYS`	`3`	Consecutive zero-stock days before a client dies
`OOS_PROBABILITY`	`0.2`	Probability that the agent reports itself unavailable on each check_inventory call
`RESTOCK_THRESHOLD`	`20.0`	Inventory level that triggers automatic restock
`MAX_CAPACITY`	`100.0`	Warehouse max capacity (gallons)
`EMBED_MODEL`	`BAAI/bge-small-en-v1.5`	fastembed model for the knowledge base
`POSTGRES_PASSWORD`	`postgres`	Shared PostgreSQL password
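The "default unless overridden" behaviour can be sketched with the standard library alone. The project itself uses pydantic-settings for this; the `SimConfig` class below is a hypothetical stand-in covering a subset of the variables, reading each field from the upper-cased environment variable of the same name.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class SimConfig:
    # Hypothetical subset of the variables above; defaults copied from the table.
    num_clients: int = 5
    tick_seconds: float = 5.0
    initial_stock: float = 5.0
    reorder_threshold: float = 2.0
    death_threshold_days: int = 3
    oos_probability: float = 0.2
    restock_threshold: float = 20.0
    max_capacity: float = 100.0

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SimConfig":
        """Start from the defaults and override any field whose upper-cased name is set."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if raw is not None:
                overrides[f.name] = f.type(raw)  # f.type is int or float here
        return cls(**overrides)
```

For example, `SimConfig.from_env({"NUM_CLIENTS": "8"})` yields 8 clients with every other value left at its default.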

How the agent works

The agent contract

The agent exposes two HTTP endpoints following the Bedrock AgentCore contract:

GET /ping — health check, returns {"status": "Healthy"}
POST /invocations — all client interactions, prompt in body

Clients send plain text prompts:

Client action	Prompt	Agent response
Check warehouse	`check_inventory CLIENT_ID:uuid`	`{"available_gallons": 80.0, "is_out_of_stock": false, "unavailability_reason": null}`
Request delivery	`deliver CLIENT_ID:uuid CLIENT_NAME:Alice GALLONS:5.0`	`{"delivered": true, "gallons_delivered": 5.0, "status": "delivered"}`
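Because the prompts use a flat `action KEY:value` layout, extracting the fields takes only a few lines of Python. The helper below is a hypothetical sketch, not code from server.py; it assumes values never contain spaces (a name like "Mary Ann" would need a different encoding) and converts only GALLONS to a number.

```python
def parse_prompt(prompt: str) -> dict:
    """Split 'action KEY:value KEY:value ...' into an action plus typed fields."""
    action, *pairs = prompt.split()
    fields: dict = {"action": action}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed field {pair!r}")
        # Only GALLONS is numeric; IDs and names stay as strings.
        fields[key.lower()] = float(value) if key == "GALLONS" else value
    return fields
```

`parse_prompt("deliver CLIENT_ID:abc CLIENT_NAME:Alice GALLONS:5.0")` returns the action `deliver` plus `client_id`, `client_name`, and `gallons` keys.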

Tools

Tools are plain Python functions decorated with @tool. The Strands framework automatically generates the JSON schema from the type annotations and docstring, passes them to the model, and calls the function when the model requests it.

from strands import tool

@tool
def deliver_gallons(client_id: str, client_name: str, gallons_requested: float, sim_day: int) -> dict:
    """
    Execute a delivery request: deduct gallons from inventory and log it.
    Returns gallons_delivered and status.
    """
    # ... implementation calls gallon-service HTTP endpoints
    # auto-restocks if inventory drops below threshold after delivery

The full tool set:

Tool	Purpose
`deliver_gallons`	Fulfil a delivery: check live inventory, deduct, log to DB and KB, auto-restock if needed
`record_monthly_summary`	Persist the monthly operational summary to the knowledge base

Two focused agents

Rather than one general-purpose agent, the simulation uses two single-tool agents. This makes the model's job trivial: read the values in the prompt, call the one tool, done. It works reliably even with very small models.

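from strands import Agent
from .tools import deliver_gallons, record_monthly_summary

# _make_model(), _delivery_prompt() and _monthly_prompt() are module-level helpers (not shown)

def build_delivery_agent() -> Agent:
    return Agent(model=_make_model(), system_prompt=_delivery_prompt(), tools=[deliver_gallons])

def build_monthly_agent() -> Agent:
    return Agent(model=_make_model(), system_prompt=_monthly_prompt(), tools=[record_monthly_summary])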

Deterministic vs LLM decisions

Not everything should be delegated to the model. The simulation handles several things in Python and passes pre-computed values to the agent as facts:

Restock: triggered automatically inside deliver_gallons when inventory drops below threshold. No LLM needed.
Agent unavailability: at check_inventory time, random.random() < oos_probability decides whether the agent is unavailable; if it is, a random human-readable reason (holiday, force majeure, etc.) is picked. A dice roll, not a judgment call.
Monthly stats: all arithmetic (top client, failure rate, anomaly detection) is computed in Python. The LLM only calls record_monthly_summary with the pre-computed values.
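Both non-LLM delivery decisions fit in a few lines of plain Python. The sketch below is hypothetical: the reason strings are placeholders, the function names are not the project's, and rejecting an over-sized delivery with an exception is an assumed policy. The random source is passed in so tests can make the dice roll deterministic.

```python
import random
from typing import Tuple

# Placeholder reasons; the simulation's actual wording may differ.
UNAVAILABILITY_REASONS = ["public holiday", "force majeure", "vehicle breakdown"]


def check_availability(oos_probability: float, rng: random.Random) -> dict:
    """Dice roll: the agent is unavailable with probability oos_probability."""
    if rng.random() < oos_probability:
        return {"is_out_of_stock": True, "unavailability_reason": rng.choice(UNAVAILABILITY_REASONS)}
    return {"is_out_of_stock": False, "unavailability_reason": None}


def apply_delivery(inventory: float, gallons: float,
                   restock_threshold: float = 20.0,
                   max_capacity: float = 100.0) -> Tuple[float, float]:
    """Deduct a delivery, then refill to capacity if below threshold.

    Returns (new_inventory, gallons_restocked).
    """
    if gallons > inventory:
        # Assumed policy: reject rather than partially fulfil.
        raise ValueError("insufficient inventory")
    inventory -= gallons
    restocked = 0.0
    if inventory < restock_threshold:
        restocked = max_capacity - inventory
        inventory = max_capacity
    return inventory, restocked
```

With the defaults, delivering 5 gallons from 22 leaves 17, which is below 20, so the warehouse is refilled by 83 gallons back to 100.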

The LLM's job is to call the right tool with the right arguments — that is what it is actually good at.

Concurrency

Calling a Strands agent with agent(prompt) blocks until the model finishes. The simulation runs these calls in asyncio.to_thread so the FastAPI event loop stays unblocked, and serialises concurrent calls with an asyncio.Lock so only one agent runs at a time.
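The pattern can be demonstrated with a stand-in for the agent. `SlowAgent` below is hypothetical; it only mimics the two properties that matter here: calling it blocks, and it keeps a `messages` list. The lock is created inside the running event loop, which avoids loop-binding issues on older Python versions.

```python
import asyncio
import time
from typing import List, Tuple


class SlowAgent:
    """Stand-in for a Strands Agent: calling it blocks the calling thread."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.active = 0        # calls currently in progress
        self.max_active = 0    # highest concurrency ever observed

    def __call__(self, prompt: str) -> str:
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        self.messages.append(prompt)
        time.sleep(0.05)       # simulate a blocking model call
        self.active -= 1
        return f"handled: {prompt}"


async def invoke(agent: SlowAgent, lock: asyncio.Lock, prompt: str) -> str:
    # One agent call at a time; the blocking work runs in a worker thread
    # so the event loop can keep answering other requests (e.g. /ping).
    async with lock:
        try:
            return await asyncio.to_thread(agent, prompt)
        finally:
            agent.messages.clear()  # stateless invocations: drop history


async def main(n: int = 3) -> Tuple[List[str], int]:
    agent = SlowAgent()
    lock = asyncio.Lock()
    results = await asyncio.gather(*(invoke(agent, lock, f"p{i}") for i in range(n)))
    return list(results), agent.max_active
```

Running `asyncio.run(main())` fires three requests at once, yet the agent never sees more than one call in flight.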

Model switching

The inference provider is selected at startup from the INFERENCE_PROVIDER env var. The _make_model() factory returns the appropriate Strands model object: OpenAIModel for local and OpenAI, BedrockModel for Bedrock. Everything above that layer (agents, tools, server) is provider-agnostic.
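A sketch of such a factory, split into a pure function that maps environment variables to constructor arguments and a thin wrapper that builds the model. It assumes a local server reachable at `host.docker.internal` with an OpenAI-style `/v1` endpoint and a placeholder API key; the Strands imports are lazy so only the chosen provider's dependencies need to be installed. This is an illustration, not the project's `_make_model()`.

```python
import os
from typing import Dict, Optional, Tuple


def resolve_model_config(env: Dict[str, str]) -> Tuple[str, dict]:
    """Map environment variables to (model class name, constructor kwargs)."""
    provider = env.get("INFERENCE_PROVIDER", "local")
    model_id = env.get("MODEL_ID", "qwen3-0.6b-8bit")
    if provider == "bedrock":
        return "BedrockModel", {"model_id": model_id,
                                "region_name": env.get("AWS_REGION", "us-east-1")}
    if provider == "openai":
        return "OpenAIModel", {"model_id": model_id,
                               "client_args": {"api_key": env["OPENAI_API_KEY"]}}
    if provider == "local":
        port = env.get("INFERENCE_PORT", "8000")
        # Assumption: the server runs on the Docker host and speaks the OpenAI API under /v1.
        base_url = f"http://host.docker.internal:{port}/v1"
        return "OpenAIModel", {"model_id": model_id,
                               "client_args": {"api_key": "not-needed", "base_url": base_url}}
    raise ValueError(f"unknown INFERENCE_PROVIDER: {provider!r}")


def make_model(env: Optional[Dict[str, str]] = None):
    """Build the Strands model object for the configured provider."""
    name, kwargs = resolve_model_config(dict(os.environ) if env is None else env)
    if name == "BedrockModel":
        from strands.models import BedrockModel  # lazy: only needed for Bedrock
        return BedrockModel(**kwargs)
    from strands.models.openai import OpenAIModel  # lazy: local and OpenAI
    return OpenAIModel(**kwargs)
```

Because `resolve_model_config` takes a plain dict, the provider switch can be unit-tested without credentials or a running model server.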

Knowledge base

Every delivery and every monthly summary is written to LanceDB, a local vector database. Text is embedded with fastembed using BAAI/bge-small-en-v1.5 (384 dimensions, runs on CPU, no GPU needed).

Two collections:

client_behaviors — one record per delivery attempt

"Alice on day 42 requested 5.0g, received 5.0g (delivered)"
metadata: client_id, client_name, sim_day, gallons_requested, gallons_delivered, status

agent_decisions — one record per monthly summary

"Month 3: 47 deliveries, 218.5g delivered, 12.0g restocked, 6% failure rate.
 Top client: Alice (64.0g). Anomalies: No anomalies detected"
metadata: month_number, total_gallons_delivered, total_gallons_restocked,
          total_deliveries, failed_deliveries, top_client_name, top_client_gallons,
          anomaly_summary
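Both record formats, and the arithmetic behind the monthly one, can be sketched in plain Python. This is a hypothetical illustration: it assumes `total_deliveries` counts every attempt (failed ones included) and that any status other than `delivered` is a failure, and it takes the anomaly text as an input rather than guessing how anomalies are detected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def behavior_text(name: str, day: int, requested: float, delivered: float, status: str) -> str:
    """Text of one client_behaviors record."""
    return f"{name} on day {day} requested {requested:.1f}g, received {delivered:.1f}g ({status})"


def monthly_summary(month: int, deliveries: List[dict], restocked: float,
                    anomalies: str = "No anomalies detected") -> Tuple[str, dict]:
    """Compute the monthly stats in Python and return (text, metadata) for agent_decisions."""
    total = len(deliveries)
    failed = sum(1 for d in deliveries if d["status"] != "delivered")
    gallons = sum(d["gallons_delivered"] for d in deliveries)
    per_client: Dict[str, float] = defaultdict(float)
    for d in deliveries:
        per_client[d["client_name"]] += d["gallons_delivered"]
    top_name, top_gallons = max(per_client.items(), key=lambda kv: kv[1], default=("none", 0.0))
    failure_rate = 100 * failed / total if total else 0.0
    text = (f"Month {month}: {total} deliveries, {gallons:.1f}g delivered, "
            f"{restocked:.1f}g restocked, {failure_rate:.0f}% failure rate. "
            f"Top client: {top_name} ({top_gallons:.1f}g). Anomalies: {anomalies}")
    metadata = {
        "month_number": month,
        "total_gallons_delivered": gallons,
        "total_gallons_restocked": restocked,
        "total_deliveries": total,
        "failed_deliveries": failed,
        "top_client_name": top_name,
        "top_client_gallons": top_gallons,
        "anomaly_summary": anomalies,
    }
    return text, metadata
```

The LLM never has to do this arithmetic; it only receives the finished values and passes them to record_monthly_summary.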

The knowledge base is browsable in the dashboard (bottom two panels) and queryable via:

curl -s 'http://localhost:8081/kb/client_behaviors?limit=10'
curl -s 'http://localhost:8081/kb/agent_decisions?limit=10'

# semantic search
curl -s -X POST http://localhost:8081/kb/search \
  -H 'Content-Type: application/json' \
  -d '{"collection": "client_behaviors", "query_text": "out of stock", "limit": 5}'
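The same semantic search from Python, using only the standard library (the project's own services use httpx). The request body mirrors the curl example; the shape of the JSON response is not documented here, so `search` simply returns whatever the service sends back.

```python
import json
import urllib.request

SERVICE_URL = "http://localhost:8081"  # gallon-service, as in the curl examples


def build_search_request(query_text: str, collection: str = "client_behaviors",
                         limit: int = 5) -> urllib.request.Request:
    """Build the POST /kb/search request without sending it."""
    body = json.dumps({"collection": collection, "query_text": query_text, "limit": limit}).encode()
    return urllib.request.Request(
        f"{SERVICE_URL}/kb/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query_text: str, **kwargs) -> object:
    """Send the search and decode the JSON response (requires the stack to be running)."""
    with urllib.request.urlopen(build_search_request(query_text, **kwargs), timeout=10) as resp:
        return json.load(resp)
```

With the stack up, `search("out of stock")` is equivalent to the curl call above.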

Service API reference

gallon-service `:8081`

Method	Path	Description
`GET`	`/inventories`	Current inventory
`POST`	`/inventories/restocks`	Refill to max capacity
`POST`	`/deliveries`	Log a delivery (atomically deducts inventory)
`GET`	`/deliveries`	Delivery history (optional `?client_id=` filter)
`GET`	`/kb/{collection}`	Recent KB entries (`?limit=50`)
`POST`	`/kb/ingest`	Add a record to the KB
`POST`	`/kb/search`	Semantic search
`POST`	`/kb/embed`	Embed a string (returns vector)

gallon-agent `:8080`

Method	Path	Description
`GET`	`/ping`	Health check
`POST`	`/invocations`	Agent entry point
`GET`	`/conversations`	Last 200 interactions (for the dashboard)

Dashboard

Open http://localhost:8888. Refreshes every 2 seconds.

Header — agent health badge + last-updated time
Summary stats — alive clients, dead clients, total requests, gallons delivered
Inventory — live gauge with capacity bar
Clients table — per-client stock, status, request/fulfil counts
Delivery feed — last 20 deliveries, newest first
Agent conversation log — all agent interactions including monthly summaries, rendered as a chat
Knowledge base — last 30 entries from each KB collection, with metadata

All scrollable panels are resizable (drag the bottom edge).

Adapting this for your own agent

The gallon delivery domain is a placeholder. The pattern is the point.

Define your tools in tools.py — plain Python functions with @tool and type annotations. The framework handles schema generation and invocation.
Write a focused system prompt — tell the agent what it is and what single action it should take. One tool per agent is more reliable than a Swiss-army prompt.
Wire it into FastAPI in server.py — call asyncio.to_thread(agent, prompt) from your handler. Clear agent.messages after each call if you want stateless invocations.
Choose your model in agent.py via the provider factory. Switch with an env var, not a code change.
Separate data access into its own service (gallon-service) so the agent never touches the database directly. This makes the agent testable, replaceable, and deployable independently.

Tech stack

Layer	Technology
Agent framework	Strands Agents
Local inference	Rapid-MLX (Apple Silicon)
Cloud inference	Amazon Bedrock, OpenAI
HTTP server	FastAPI + Uvicorn
HTTP client	httpx
Relational DB	PostgreSQL 16 via asyncpg
Vector DB	LanceDB
Embeddings	fastembed (`BAAI/bge-small-en-v1.5`)
Config	pydantic-settings
Packaging	uv + hatchling
Containers	Docker Compose

Running on Apple Silicon (Colima)

If you are using Colima instead of Docker Desktop:

# first-time setup — allocate enough disk and mount your repo volume
colima start --disk 100 --mount /Volumes/Repository:w --mount $HOME:w

# start Rapid-MLX on the host (accessible from containers via host.docker.internal)
rapid-mlx serve qwen3-0.6b-8bit --port 8000

# then start the stack
docker compose up -d --build

If Colima was previously stopped and the runtime directory is missing:

mkdir -p ~/tmp/colima
colima start --disk 100 --mount /Volumes/Repository:w --mount $HOME:w
