Core code lives in `src/optimize_anything/`:
- `cli.py` for `optimize-anything` subcommands (`optimize`, `generate-evaluator`, `intake`, `explain`, `budget`, `score`, `analyze`)
- `evaluators.py` for command/HTTP evaluator adapters
- `llm_judge.py` for the LLM-as-judge evaluator and dimension analysis
- `intake.py` for intake schema normalization (`evaluation_pattern`, `execution_mode`, `quality_dimensions`, constraints)
- `evaluator_generator.py` for evaluator scaffolding from seed/objective/intake
- `spec_loader.py` for TOML spec file loading
- `result_contract.py` for the canonical optimize summary output used by the CLI
Tests are in `tests/` with shared fixtures in `tests/conftest.py`. Supporting material is in `docs/` (protocol, smoke gates, remediation, release/handoff), `examples/` (seed/evaluator samples), `commands/` (slash command docs), `skills/` (packaged skills), plus root guides `install.md` and `evaluator-cookbook.md`.
- `uv sync` — install runtime and dev dependencies.
- `uv run pytest` — run the full test suite.
- `uv run pytest tests/test_cli.py` — run one test module.
- `uv run pytest -k "explain"` — run tests by name pattern.
- `uv run optimize-anything --help` — inspect CLI usage.
- `uv run python scripts/check.py` — unified gate: pytest + smoke + score_check.
- `uv run python scripts/check.py --skip-smoke` — unified gate without smoke (offline).
- `uv run python scripts/smoke_harness.py --budget 1` — run CLI smoke harness.
- `uv run python scripts/score_check.py` — score regression check.
- `uv run python scripts/score_check.py --update` — update baselines after improvement.
- `uv run python scripts/live_integration.py --phase green --artifact FILE --model openai/gpt-4o-mini --budget 15 --objective "..." --evaluator-command bash evaluators/eval.sh` — GREEN phase optimization.
Target Python is >=3.10. Follow existing style:
- 4-space indentation, explicit type hints, concise module/function docstrings.
- `snake_case` for modules, functions, and variables; `CapWords` for classes.
- Keep interfaces separated by layer: CLI behavior in `cli.py`, evaluator plumbing in `evaluators.py`.
- Preserve the evaluator JSON contract: input includes `candidate`; output must include a numeric `score`.
- Keep runtime mode and strategy distinct in docs and code: `execution_mode` (command/http) controls transport/runtime; `evaluation_pattern` (verification/judge/simulation/composite) describes the scoring approach.
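Only the `candidate` input field and the numeric `score` output field are fixed by the contract above; how the payload reaches a command-mode evaluator is not spelled out here. The minimal sketch below assumes JSON on stdin and JSON on stdout, and its scoring rule is purely illustrative:

```python
#!/usr/bin/env python3
"""Minimal command-mode evaluator sketch (stdin/stdout JSON transport is an assumption)."""
import json
import sys


def main() -> None:
    payload = json.load(sys.stdin)      # assumed transport: JSON payload on stdin
    candidate = payload["candidate"]    # contract: input includes `candidate`

    # Hypothetical scoring rule, for illustration only.
    score = 1.0 if "expected phrase" in candidate else 0.0

    json.dump({"score": score}, sys.stdout)  # contract: output includes a numeric `score`


if __name__ == "__main__":
    main()
```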
Use pytest for tests.
- Name files `tests/test_<unit>.py` and functions `test_<behavior>`.
- Prefer focused unit tests for each command path and error mode.
- Reuse fixtures for temp evaluator scripts and mock external HTTP calls where possible.
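As one hedged sketch of those conventions (the module, test, and fixture names here are hypothetical, and the throwaway evaluator stands in for the shared fixtures in `tests/conftest.py`):

```python
# tests/test_my_unit.py — hypothetical name following the tests/test_<unit>.py pattern
import json
import subprocess
import sys

import pytest


@pytest.fixture
def temp_evaluator(tmp_path):
    """Throwaway evaluator script; real tests should reuse the shared fixtures."""
    script = tmp_path / "eval.py"
    script.write_text("import json, sys\njson.dump({'score': 1.0}, sys.stdout)\n")
    return script


def test_command_evaluator_emits_numeric_score(temp_evaluator):
    """One focused behavior: the evaluator output carries a numeric score."""
    result = subprocess.run(
        [sys.executable, str(temp_evaluator)],
        capture_output=True,
        text=True,
        check=True,
    )
    assert isinstance(json.loads(result.stdout)["score"], float)
```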
Recent history uses conventional prefixes: `feat:`, `fix:`, `docs:`, `chore:` (optionally scoped). Keep commits small and single-purpose.
For PRs, include:
- a short problem/solution summary,
- linked issue (if any),
- test evidence (command + result),
- sample CLI output when behavior changes.
When running RED-GREEN-OBSERVER cycles:
- Always pass `--model` explicitly to `live_integration.py`.
- Place `--evaluator-command` as the LAST flag.
- Run RED validation after every GREEN improvement.
- Accept artifacts only when `cross_provider_delta < 0.2`.
- Update baselines with `score_check.py --update` after accepting.
- Commit with a descriptive message that includes the score deltas.
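If you drive the GREEN phase from a script rather than the shell, a sketch like the following keeps `--model` explicit and `--evaluator-command` last. The wrapper itself is hypothetical, the artifact path and objective are placeholders, and the flags simply mirror the `live_integration.py` invocation shown above:

```python
# Hypothetical wrapper around the GREEN phase command; not part of the repository.
import subprocess

cmd = [
    "uv", "run", "python", "scripts/live_integration.py",
    "--phase", "green",
    "--artifact", "skills/SKILL.md",      # placeholder artifact path
    "--model", "openai/gpt-4o-mini",      # always pass --model explicitly
    "--budget", "15",
    "--objective", "improve the skill",   # placeholder objective
    # --evaluator-command goes last, per the checklist above.
    "--evaluator-command", "bash", "evaluators/eval.sh",
]
subprocess.run(cmd, check=True)
```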
- `src/optimize_anything/` — package source (all Python modules)
- `scripts/` — operational scripts (gates, smoke, live integration)
- `skills/` — optimizable skill artifacts (SKILL.md files)
- `evaluators/` — production evaluator scripts
- `examples/` — sample evaluators and seeds
- `tests/` — pytest test suite
- `docs/` — planning and handoff documents (gitignored from commits)
- `integration_runs/` — optimization run artifacts (gitignored)