agent-estimate

Know before you build.

PERT estimates for AI-agent tasks — how long, which model's reliable enough, and the human-equivalent cost. In one command.

Website · Compare · PyPI

Why

AI agents can write the code — but how long will the task actually take? Manual estimation is slow and biased toward optimism; no estimate means scope creep and missed deadlines. The gap between "agents can do it" and "we know when it'll be done" is where projects break down.

agent-estimate closes that gap in one command: a three-point PERT timeline calibrated on real agent runs, plus a human-speed comparison so you see the compression before you spend the compute. It sizes the task, picks a tier, routes it to a model, and flags when the work runs past that model's reliability horizon — calibrated forecasts in seconds, not meetings.

Multi-model matters because the models aren't interchangeable. Opus 4.7, GPT-5.5, and Gemini 3.1 Pro have different reliability horizons (METR p80) and different costs per turn. A job that is safe at 40 minutes for one model is a coin flip for another. agent-estimate models the whole fleet, not a single agent, so the number reflects who actually runs the work.

Quick Start

First estimate: 30 seconds to install. Every one after: instant.

With your agent (recommended)

Paste this into your Claude Code or Codex session:

Install the agent-estimate plugin (https://github.com/kiloloop/agent-estimate) and
estimate this task for me: "Implement OAuth 2.0 flow (Google + GitHub)". Tell me the
expected time, the human-speed equivalent, and the compression ratio.

Your agent installs the tool, runs the estimate, and reads back the numbers. Nothing to memorize — describe the task in plain English and let the agent translate to flags.

For a whole backlog:

Estimate every open issue in this repo with agent-estimate, group them into parallel
waves, and tell me the total wall-clock time for a 3-agent fleet versus doing them
sequentially myself.

Manual

pip install agent-estimate
agent-estimate estimate "your task description here"

No config required — sensible defaults for a 3-agent fleet (Claude, Codex, Gemini). Point it at a file or GitHub issues when you're ready:

agent-estimate estimate --file tasks.txt
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14

How It Works

agent-estimate produces three-point PERT estimates calibrated for agents, not humans:

Tier classification — auto-sizes tasks XS→XL from complexity signals
PERT math — optimistic / most-likely / pessimistic, weighted to an expected value
Human comparison — a per-task-type multiplier, so you see the compression
METR thresholds — warns when an estimate exceeds a model's p80 reliability horizon
Wave planning — schedules independent tasks in parallel across the fleet
Review overhead — models review cycles as additive cost (standard, complex, 3-round)
Modifiers — --spec-clarity, --warm-context, --agent-fit tune the estimate
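
The PERT step in the list above can be sketched with the classic three-point formula, E = (O + 4M + P) / 6. This is a minimal sketch that assumes the standard weighting; the README does not state the exact weights agent-estimate uses, and the tool may add review overhead or modifiers on top. The function names `pert_expected` and `pert_stddev` are hypothetical, not part of the CLI.

```python
def pert_expected(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Classic three-point PERT mean: (O + 4M + P) / 6, in the input's units."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


def pert_stddev(optimistic: float, pessimistic: float) -> float:
    """Conventional PERT spread: one sixth of the optimistic-to-pessimistic range."""
    return (pessimistic - optimistic) / 6


if __name__ == "__main__":
    # A 12 / 23 / 40 minute task: (12 + 92 + 40) / 6
    print(pert_expected(12, 23, 40))  # 24.0
```

The most-likely value carries four times the weight of either extreme, so a long pessimistic tail raises the expected value without dominating it.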

Task types

Type	Flag	Covers
Coding	(default)	Feature work, fixes, refactors
Research	`--type research`	Audits, investigations, analysis
Documentation	`--type documentation`	API docs, guides, changelogs
Brainstorm	`--type brainstorm`	Ideation, spikes, design exploration
Config/SRE	`--type config`	Deploys, infra, CI/CD
Frontend/UI	`--type frontend`	Content patches vs. component builds
App dev	`--type app_dev`	App shells, desktop/mobile builds

METR thresholds (defaults)

Model	p80 threshold
Opus 4.7	90 min
GPT-5.5	90 min
GPT-5.4	60 min
Gemini 3.1 Pro	45 min
Sonnet 4.6	30 min
Haiku 4.5	15 min

opus_4_x is a forward-compatible alias that resolves to the current Opus threshold. Legacy keys (opus_4_6, GPT-5/5.2/5.3, Gemini 3 Pro, Sonnet) stay supported. Estimates are calibrated against Claude Code (Opus 4.7, high thinking) and Codex (GPT-5.4/5.5, extra-high) — shift with --spec-clarity and --warm-context for other setups.
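
A threshold check against the table above might look like the following. This is a sketch under assumptions: the dictionary keys are display names chosen for illustration (not the tool's config keys), `exceeds_p80` is a hypothetical helper, and the README does not say which duration the tool compares against the threshold. The 45-minute fallback mirrors the `metr_fallback_threshold` value shown under Configuration.

```python
# Default p80 thresholds in minutes, copied from the table above.
# Keys are illustrative display names, not the tool's real config keys.
P80_MINUTES = {
    "Opus 4.7": 90,
    "GPT-5.5": 90,
    "GPT-5.4": 60,
    "Gemini 3.1 Pro": 45,
    "Sonnet 4.6": 30,
    "Haiku 4.5": 15,
}


def exceeds_p80(duration_minutes: float, model: str, fallback: float = 45.0) -> bool:
    """Return True when the duration runs past the model's p80 threshold.

    Unknown models fall back to a single default threshold (an assumption).
    """
    threshold = P80_MINUTES.get(model, fallback)
    return duration_minutes > threshold


if __name__ == "__main__":
    # The same 40-minute job is inside one model's horizon and outside another's.
    print(exceeds_p80(40, "Gemini 3.1 Pro"))  # False
    print(exceeds_p80(40, "Sonnet 4.6"))      # True
```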

Examples

Real estimates from production use — including the misses.

The tool, estimating its own docs. We sized this v0.7.0 skill-and-README refresh at ~30 minutes. It took 28.

An honest over-estimate. We pre-registered a UI mockup build at ~95 minutes with no prior app-dev data. Two agents did it in parallel in 12 and 25 minutes — a 4–8x over-estimate. agent-estimate now ships an app_dev prior shaped by that result. The miss stays in the README because calibration means showing where you were wrong.
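
The 4–8x range above is plain division of the estimate by each observed runtime. A small sketch, with `overestimate_ratio` as a hypothetical helper name:

```python
def overestimate_ratio(estimated_minutes: float, actual_minutes: float) -> float:
    """How many times larger the estimate was than the observed runtime."""
    if actual_minutes <= 0:
        raise ValueError("actual_minutes must be positive")
    return estimated_minutes / actual_minutes


if __name__ == "__main__":
    # One ~95-minute estimate, two parallel agents finishing in 12 and 25 minutes.
    ratios = [round(overestimate_ratio(95, actual), 1) for actual in (12, 25)]
    print(ratios)  # [7.9, 3.8]
```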

Two tasks, one model — what the tool prints, including the METR reliability flag:

$ agent-estimate estimate "Implement auth" "Add tests" --model opus

Task             Tier   PERT (O/M/P)    Expected   Human-eq
───────────────────────────────────────────────────────────
Implement auth   M      25/50/90m       57.8m      160m
Add tests        S      12/23/40m       24.0m       75m

Timeline ──────────────────────────────
  best 37m   ·   expected 81.8m   ·   worst 130m
  human-equivalent: 235m  →  2.87× compression

  ⚠ METR warning: "Implement auth" exceeds Opus p80

~82 minutes expected versus ~4 hours by hand — plus a flag that the auth task runs past Opus's p80 reliability horizon, so you split it or add a checkpoint before dispatching.
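
The timeline figures in that output can be reproduced from the per-task rows. The sketch below assumes the single-model timeline is the simple sum of the rows, which matches the printed numbers; the tuple layout and variable names are illustrative only.

```python
# (name, optimistic, pessimistic, expected, human_equivalent), all in minutes,
# copied from the two rows of the table above.
tasks = [
    ("Implement auth", 25, 90, 57.8, 160),
    ("Add tests", 12, 40, 24.0, 75),
]

best = sum(t[1] for t in tasks)                  # optimistic values added up
worst = sum(t[2] for t in tasks)                 # pessimistic values added up
expected = round(sum(t[3] for t in tasks), 1)    # expected values added up
human = sum(t[4] for t in tasks)                 # human-equivalent minutes
compression = round(human / expected, 2)         # human time / agent time

if __name__ == "__main__":
    print(best, expected, worst)   # 37 81.8 130
    print(human, compression)      # 235 2.87
```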

Three tasks, three agents, in parallel:

$ agent-estimate estimate --file tasks.txt

Metric	Value
Wave 0	All 3 tasks in parallel (Claude + Codex + Gemini)
Expected case	131m
Human-speed equivalent	709.5m
Compression ratio	5.42x
Estimated cost	$4.84

~2 hours wall-clock versus ~12 hours sequential. You see the compression before you commit the compute. More in examples/ — coding S/M, research, documentation, multi-agent.
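
Why parallel waves compress wall-clock time can be shown with a toy scheduler. This is a sketch, not the tool's planner: it assumes every task is independent, ignores inter-wave overhead and capability matching, and uses made-up task names and durations. `plan_waves` and `wall_clock` are hypothetical names.

```python
def plan_waves(durations: dict, fleet_size: int) -> list:
    """Chunk tasks, longest first, into waves of at most fleet_size tasks."""
    if fleet_size < 1:
        raise ValueError("fleet_size must be at least 1")
    ordered = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [ordered[i:i + fleet_size] for i in range(0, len(ordered), fleet_size)]


def wall_clock(waves: list) -> float:
    """Each wave lasts as long as its slowest task; waves run back to back."""
    return sum(max(minutes for _, minutes in wave) for wave in waves)


if __name__ == "__main__":
    # Hypothetical expected durations in minutes (not taken from the README).
    durations = {"task-a": 60.0, "task-b": 45.0, "task-c": 30.0, "task-d": 20.0}
    waves = plan_waves(durations, fleet_size=3)
    print(len(waves))                # 2
    print(wall_clock(waves))         # 80.0
    print(sum(durations.values()))   # 155.0
```

With three agents, the four tasks finish in 80 minutes instead of 155, because only the slowest task in each wave counts toward the clock.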

Integrations

Claude Code plugin

/plugin marketplace add kiloloop/agent-estimate
/plugin install agent-estimate@agent-estimate-marketplace

/estimate Add a login page with OAuth
/estimate --file spec.md
/estimate --issues 1,2,3 --repo myorg/myrepo
/validate-estimate observation.yaml
/calibrate

GitHub Action

- uses: kiloloop/agent-estimate@v0
  with:
    issues: '11,12,14'

Full workflow example

name: Estimate
on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  estimate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kiloloop/agent-estimate@v0
        with:
          issues: '11,12,14'
          output-mode: summary+pr-comment

Action inputs and outputs

Input	Required	Default	Description
`issues`	yes	—	GitHub issue numbers (comma-separated)
`repo`	no	current repo	GitHub repo (owner/name)
`format`	no	`markdown`	Output format: `markdown` or `json`
`output-mode`	no	`summary`	`summary`, `pr-comment`, `step-output`, `summary+pr-comment`
`config`	no	—	Path to agent config YAML
`title`	no	`Agent Estimate Report`	Report title
`review-mode`	no	`standard`	Review tier: `none`, `standard`, `complex`, `3-round`
`spec-clarity`	no	`1.0`	Spec clarity modifier (0.3–1.3)
`warm-context`	no	`1.0`	Warm context modifier (0.3–1.15)
`agent-fit`	no	`1.0`	Agent fit modifier (0.9–1.2)
`task-type`	no	—	Category: `coding`, `brainstorm`, `research`, `config`, `documentation`, `frontend`, `app_dev`
`python-version`	no	`3.12`	Python version to use
`version`	no	latest	`agent-estimate` version to install
`token`	no	`${{ github.token }}`	GitHub token

Output	Description
`report`	Full estimation report content
`expected-minutes`	Expected minutes (when `format: json`)
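
The modifier ranges in the inputs table can be checked before a run. The sketch below only validates ranges and fills the documented default of 1.0; it does not assume how agent-estimate combines the modifiers, which the README does not specify. `validate_modifiers` is a hypothetical helper.

```python
# Allowed ranges copied from the inputs table above.
MODIFIER_RANGES = {
    "spec-clarity": (0.3, 1.3),
    "warm-context": (0.3, 1.15),
    "agent-fit": (0.9, 1.2),
}


def validate_modifiers(modifiers: dict) -> dict:
    """Return all three modifiers, defaulting to 1.0, or raise on bad input."""
    merged = {name: 1.0 for name in MODIFIER_RANGES}
    for name, value in modifiers.items():
        if name not in MODIFIER_RANGES:
            raise KeyError(f"unknown modifier: {name}")
        low, high = MODIFIER_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        merged[name] = value
    return merged


if __name__ == "__main__":
    print(validate_modifiers({"spec-clarity": 0.8}))
    # {'spec-clarity': 0.8, 'warm-context': 1.0, 'agent-fit': 1.0}
```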

Skill layout

Skills follow the oacp-skills convention:

skills/estimate/
  skill.yaml            # machine-readable metadata
  README.md             # human-readable docs
  shared/INTENT.md      # shared intent across runtimes
  claude/SKILL.md       # Claude Code skill definition
  codex/SKILL.md        # Codex skill definition

Both runtime slices cover the same CLI (estimate, validate, calibrate), phrased for their respective ecosystems.

Configuration

Agent fleet

Pass a config to model your own fleet:

agents:
  - name: Claude
    capabilities: [planning, implementation, review]
    parallelism: 2
    cost_per_turn: 0.12
    model_tier: frontier
  - name: Codex
    capabilities: [implementation, debugging, testing]
    parallelism: 3
    cost_per_turn: 0.08
    model_tier: production
settings:
  friction_multiplier: 1.15
  inter_wave_overhead: 0.25
  review_overhead: 0.2
  metr_fallback_threshold: 45.0

agent-estimate estimate "Ship packaging flow" --config ./my_agents.yaml
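
To reason about a fleet config before passing it in, the agent entries can be mirrored as plain Python objects. This is a sketch: the field names come from the YAML above, but the `Agent` class, its validation rules, and the `agents_with` helper are illustrative assumptions, not the tool's internal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """One entry of the `agents` list, with the same field names as the YAML."""
    name: str
    capabilities: tuple
    parallelism: int
    cost_per_turn: float
    model_tier: str

    def __post_init__(self):
        # Sanity checks chosen for this sketch; the tool's own rules may differ.
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        if self.cost_per_turn < 0:
            raise ValueError("cost_per_turn must not be negative")


def agents_with(fleet: list, capability: str) -> list:
    """Names of the agents that list the given capability."""
    return [agent.name for agent in fleet if capability in agent.capabilities]


fleet = [
    Agent("Claude", ("planning", "implementation", "review"), 2, 0.12, "frontier"),
    Agent("Codex", ("implementation", "debugging", "testing"), 3, 0.08, "production"),
]

if __name__ == "__main__":
    print(agents_with(fleet, "implementation"))        # ['Claude', 'Codex']
    print(agents_with(fleet, "review"))                # ['Claude']
    print(sum(agent.parallelism for agent in fleet))   # 5
```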

Output formats

agent-estimate estimate "Refactor auth pipeline" --format json   # machine-readable
agent-estimate estimate --repo myorg/myrepo --issues 11,12,14    # from GitHub issues
agent-estimate estimate --file tasks.txt                          # from file

Calibration

Validate estimates against observed outcomes and build a calibration database:

agent-estimate validate observation.yaml --db ~/.agent-estimate/calibration.db

Project

Website — landing page, live demo, and the estimate comparison view.
OACP — coordinate the agents you just estimated. Open Agent Coordination Protocol for multi-agent async workflows.
oacp-skills — the skill bundle agent-estimate's /estimate ships in.
kiloloop — the rest of the ecosystem.

Contributing

See CONTRIBUTING.md for the full workflow.

pip install -e '.[dev]'
ruff check .
pytest -q

Community

License

Apache License 2.0
