Skip to content

feat(init): #94 PR 1 of 2. atomic-agents init wizard#317

Merged
dep0we merged 9 commits into
mainfrom
feat-init-wizard-pr1
Jun 2, 2026
Merged

feat(init): #94 PR 1 of 2. atomic-agents init wizard#317
dep0we merged 9 commits into
mainfrom
feat-init-wizard-pr1

Conversation

@dep0we
Copy link
Copy Markdown
Owner

@dep0we dep0we commented Jun 2, 2026

Summary

Implements atomic-agents init wizard, the operator-facing CLI that compresses the home-user deploy experience from approximately 4-5 hours to under 10 minutes. Closes the framework's #1 home-user adoption blocker per Issue #94.

This is PR 1 of 2 in the init-wizard arc. PR 2 adds the second and third starter templates (researcher, writer) and the "Add to it" recovery merge contract.

Operator outcome:

  • atomic-agents init my-first-agent walks 7 structured questions and produces a callable agent under 10 minutes (acceptance test: non-developer Dan deploys a fresh demo agent end-to-end).
  • atomic-agents init my-first-agent --from-template advisor skips Q&A and scaffolds a Caldwell-shaped agent in under 30 seconds (CI-friendly, no terminal required).
  • atomic-agents init --list-templates enumerates available templates (no terminal required).
  • Doctor handoff + opt-in test call confirms end-to-end wiring on every run.

Scope locked through methodology:

  • /office-hours (2026-06-01) scoped Approach A (structured Q&A + one starter template) + 7 premises confirmed
  • /plan-eng-review (2026-06-01) locked 6 architectural decisions (atomic_agents/init/ package, spec/35, rich, persona-backend handling, framework-honest output, non-TTY behavior) + 1 critical gap caught (T-EX1 OSError handling)
  • /plan-subagent (2026-06-02) 4 parallel Sonnet subagents caught 53 findings (8 SEVERE + 21 HIGH + 16 MEDIUM + 8 LOW); 5 SEVERE locks emerged from cross-corroboration; brief amended via 3 AskUserQuestion gates
  • /ship Step 11 adversarial review: 3 rounds of Opus, converged Round 3 with 0 CRITICAL / 0 HIGH / 0 MEDIUM + 2 LOW (cosmetic only)

Architecture locked at spec/35 (14 normative MUSTs):

  1. agent_name validated via regex + reserved-name set before any filesystem side effect
  2. Non-TTY rejection on interactive Q&A path (--from-template and --list-templates work in CI)
  3. OSError catch on every mkdir + atomic_write; plain English; no stack traces
  4. Every write through _io.atomic_write; path components validated via _io.safe_resolve_under; fresh-write failure cleans up partial dir
  5. Collision Overwrite uses atomic backup+restore pattern
  6. Persona-backend warning before any mkdir when ATOMIC_AGENTS_PERSONA_BACKEND_URL is set
  7. Anthropic API key pre-flight via _llm._get_key chain (env + Keychain + keys.json)
  8. doctor handoff blocks test-call when overall_exit_code != 0
  9. Opt-in test call catches the isinstance-based exception catalog (anthropic.* + httpx.* + AtomicAgentsError + fallback); always exits 0
  10. IDENTITY.md autonomy section uses spec/28 action class vocabulary verbatim
  11. Entry guards per invocation path (interactive vs --from-template vs --list-templates)
  12. CHANGELOG [Unreleased] interleave order locked
  13. string.Template.safe_substitute for variable rendering (operator $primary_goal text safe)
  14. cli.py additive-only with bounded shape (lazy import + subparser + dispatch + 2 docstring lines)

rich adopted as the canonical operator-facing CLI rendering library (documented in spec/35 §"CLI rendering primitive"). Future polish arcs migrate doctor / bundle / corpus output to rich incrementally (TODO-3 filed at PR 1 close).

Test Coverage

53 new tests across 5 files (+50 standard + 2 wheel-install opt-in via RUN_WHEEL_INSTALL_TESTS=1):

File Tests Focus
test_init_cli.py 13 argparse subparser + dispatch routing + lazy-import discipline + exit-code threading
test_init_wizard.py 27 Q1-Q7 validation, Q4 preset + customize, non-TTY guard carve-out, persona-backend warning, collision backup+restore, OSError translation, safe_resolve_under coverage, safe_substitute behavior, agents_root single-resolution
test_init_templates.py 7 spec/01 anatomy conformance, locked template variables, USER.md "Things to avoid" section, tools.md "Hard NOs", str.Template safety
test_init_smoke.py 6 end-to-end with mocked _llm; doctor PASS offers test-call, doctor FAIL blocks; rate limit + network + decline all exit 0
test_init_wheel_install.py 2 (opt-in) uv build + wheel install + --list-templates in clean venv

Test suite total: 2889 + 48 skipped baseline → 2942 + 50 skipped. Zero regressions.

Adversarial Review

3 rounds of Opus adversarial review per /ship Step 11:

Round Findings Action
Round 1 2 CRITICAL + 6 HIGH + 9 MEDIUM + 9 LOW (17 total) 11 fixes applied + 5 spec amendments + 5 trade-offs defended
Round 2 0 CRITICAL + 1 HIGH + 3 MEDIUM + 4 LOW (8 total) 3 fixes applied + 5 trade-offs deferred to PR 2 polish window
Round 3 0 CRITICAL + 0 HIGH + 0 MEDIUM + 2 LOW (CONVERGED) LOW only; SHIP recommended

Matches PR 3 of #65's exact convergence shape (Round 3 LOW-only with zero CRITICAL/HIGH/MEDIUM).

Methodology

  • /office-hours on 2026-06-01: scope + 7 premises locked
  • /plan-eng-review on 2026-06-01: 6 architectural locks + 1 critical-gap catch
  • /plan-subagent on 2026-06-02: 4 Sonnet subagents, 53 findings, brief amended
  • /ship on 2026-06-02: Wave 1-3 parallel Sonnet implementation, 3 Opus adversarial rounds, converged Round 3 LOW only
  • /ship streak: extends from 6 to 7

Parallel coordination with #201 (hard boundaries honored)

  • cli.py additive only: bounded diff with no existing-code modifications. Three additions (lazy import, subparser block, dispatch case) plus two docstring lines.
  • README.md backend-protocols table NOT TOUCHED (the [backend] MCPServerRegistryBackend — unify MCP server discovery across agents #201 arc closer owns the "Twelve of twelve backend protocols shipped" flip).
  • CLAUDE.md "Eleven of twelve backend protocols shipped" paragraph NOT TOUCHED (same).
  • CHANGELOG.md [Unreleased] interleave order locked at spec/35 MUST 12: newest-arc-at-top with alphabetical-by-issue tiebreaker.
  • Generated mcp.md template: NONE generated by the wizard in PR 1 (Pain 5 deferred per design).

Follow-up issues filed at PR 1 close (per design doc + prep synthesis)

  • [init] --ai-assist LLM-drafted persona bodies (Approach B fast-follow, P3)
  • [init] Claude Code skill /atomic-init wrapping the CLI (Approach C, v1.1, P3)
  • [polish] Migrate doctor / bundle / corpus output to rich (spec/35 canonical primitive, P3)

Plus PR 2 polish backlog items deferred from Round 1 + Round 2 adversarial review (defended trade-offs).

Test plan

  • All 2942 tests pass on Python 3.11 and 3.12 (uv run pytest)
  • Wheel install verification opt-in via RUN_WHEEL_INSTALL_TESTS=1 uv run pytest tests/test_init_wheel_install.py
  • Zero em dashes in new code, spec, or CHANGELOG (project rule)
  • Zero regressions in pre-existing 2889 + 48 skipped baseline
  • No modifications to README backend-protocols table or CLAUDE.md status paragraph

Closes #94 (PR 1 of 2 of the arc; arc closes when PR 2 lands researcher + writer templates + "Add to it" recovery merge contract).

🤖 Generated with Claude Code

Dan Powers and others added 8 commits June 2, 2026 12:08
Documents the operator-facing CLI surface for the new init wizard.
Implementer Contract section enumerates the 14 MUSTs the
implementation (next commits) satisfies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Single source of truth for action class vocabulary (spec/28 verbatim),
template variable names, reserved subcommand names, agent_name regex,
exception messages, and provider key resolution constants.

Advisor starter template scaffolds a Caldwell-shaped agent with str.Template
${var} substitution: persona/{IDENTITY,SOUL,USER}.md, tools.md, model.md,
memory/INDEX.md, wiki/INDEX.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Core wizard implementation satisfying all 14 spec/35 MUSTs:

- Non-TTY detection at run_init entry before any rich import (MUST 2)
- agent_name validation via regex + reserved name set (MUST 1)
- ANTHROPIC_API_KEY pre-flight via _llm._get_key chain (MUST 7)
- Persona-backend warning before any mkdir (MUST 6)
- Q4 autonomy uses 4 action classes verbatim with 3 presets + customize (MUST 10)
- Collision Overwrite uses atomic backup+restore pattern (MUST 5)
- OSError catch on every mkdir + atomic_write with plain English (MUST 3)
- Every file write through _io.atomic_write (MUST 4)
- string.Template.safe_substitute for variable rendering (MUST 13)
- doctor handoff blocks test-call prompt on FAIL (MUST 8)
- Opt-in test call catches exception catalog and exits 0 (MUST 9)
- agents_root resolved once at run_init entry, threaded through

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s_root single-resolve

cli.py additive only (MUST 14): one lazy import inside _cmd_init matching
existing pattern at _cmd_doctor (cli.py:703) and _cmd_persona (cli.py:738),
one sub.add_parser(init) block with 4 arguments, one dispatch case in the
doctor/persona/corpus early-branch, plus two docstring lines.

pyproject.toml adds rich>=13.0 as a runtime dependency. spec/35 documents
rich as the canonical operator-facing CLI rendering library; future polish
arcs migrate doctor / bundle / corpus output to rich incrementally.

_platform.DEFAULT_AGENTS_ROOT is now pre-resolved via expanduser + resolve
at module load so wizard's single-resolution agents_root threading
(get_agents_root -> AtomicAgent test-call construction) sees consistent
absolute paths regardless of cwd or symlinks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
test_init_cli.py (~13 tests): argparse subparser + dispatch routing +
lazy-import discipline + exit-code threading.

test_init_wizard.py (~24 tests): Q1-Q7 happy + edge paths, Q4 preset
matrix + customize sub-flow, non-TTY guard, API key pre-flight via
_get_key, persona-backend warning on URL env var set, collision
overwrite backup+restore success + failure paths, OSError translation
with no stack trace, safe_substitute behavior, agents_root resolved
at most once per run_init invocation.

test_init_templates.py (~7 tests): advisor file inventory matches
spec/01 anatomy, action class vocabulary verbatim, locked template
variables conformance, USER.md Things to avoid section (P2 dual
rendering), tools.md Hard NOs section, safe_substitute handles
dollar signs in operator answers.

test_init_smoke.py (~6 tests): end-to-end with mocked _llm, doctor
PASS offers test-call, doctor FAIL blocks test-call, rate limit
graceful exit 0, network error graceful exit 0, decline exits 0.

test_init_wheel_install.py (2 tests, skip by default): opt-in wheel
build verification gated by RUN_WHEEL_INSTALL_TESTS=1 env var. Catches
the failure mode where hatchling auto-include misses package data.

Full suite: 2889 + 48 skipped to 2939 + 50 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operator-facing CHANGELOG entry for the init wizard arc PR 1. Interleaves
newest-arc-at-top with the existing CorpusBackend arc entries per the
spec/35 MUST 12 rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply Opus adversarial Round 1 findings caught on #94 PR 1 wizard
implementation:

C1 (CRITICAL): Add _io.safe_resolve_under to every atomic_write call
site in wizard._render_files. Spec/35 MUST 4 amended to require the
path-traversal validation gate. Defense in depth (templates are trusted
today but the contract closes the seam for future extensions).

H1: Replace em dash on cli.py:37 docstring line with colon.

H2: Replace class-name string matching in _test_call with isinstance
checks. Lazy-import anthropic and httpx inside the except block; use
getattr(mod, 'Name', ()) to fall back gracefully when SDK lacks the
class. Catches httpx.ConnectTimeout (a TimeoutException subclass) which
the prior class-name dispatch missed. Spec/35 MUST 9 amended.

H3+H5+M9: Carve out --from-template and --list-templates from the
non-TTY guard so CI integrations work. --from-template now requires
agent_name with a clear error message when missing. Spec/35 MUST 2,
MUST 7, MUST 11 amended.

H4: Cleanup partial agent_dir on fresh-write failure via shutil.rmtree
before exiting. Spec/35 MUST 4 amended to require the cleanup
contract.

H6: Capture existing = agent_dir.exists() once at the top of the
collision-check, eliminating the TOCTOU window between detection and
write dispatch.

M1: Narrow except (AtomicAgentsError, Exception) to AtomicAgentsError
so TypeError / AttributeError surface instead of being misreported
as a missing key.

M2: Wrap doctor.run_doctor + render_human in try/except. On any
failure, print 'Doctor inconclusive. Run atomic-agents doctor
--agent <name> to verify.' and proceed with the test-call prompt.

M4: Remove dead Confirm import in _maybe_test_call.

M5: Wire AGENT_NAME_MAX_LEN into Q1 validation. New constant
MSG_INVALID_NAME_TOO_LONG points operators at the 64-char cap.

Defended trade-offs (not fixed):
- C2 DEFAULT_AGENTS_ROOT freeze: pre-existing behavior shape; the
  resolve() addition strictly improves correctness.
- M3 doctor WARN handling: doctor.overall_exit_code returns 0 for
  PASS+SKIP+WARN, 1 only for FAIL. MUST 8 correct as written.
- M7 Confirm.ask classmethod patching: real test brittleness but
  fix is intrusive. Documented for PR 2 follow-up.
- L1-L9: deferred to PR 2 follow-up issues.

3 new tests:
- test_from_template_works_in_non_tty
- test_from_template_requires_agent_name
- test_render_files_uses_safe_resolve_under (was implicit; now explicit)

Test suite: 2939 + 50 skipped to 2941 + 50 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R2-H1: C1's safe_resolve_under in _render_files raises PathTraversalError,
not OSError. The outer except OSError translator in _write_scaffold did not
catch the new exception class, so a malicious template path (impossible
today, defense-in-depth for the future) would have produced a Python stack
trace, violating spec/35 MUST 3. Fix: add a dedicated except PathTraversalError
branch that prints a plain-English internal-error message and returns 1.
Import PathTraversalError from atomic_agents.exceptions.

R2-M1: The Round 1 commit claimed test_render_files_uses_safe_resolve_under
was added, but only the two TTY-related tests landed. Adding the missing
test now: monkeypatches _io.safe_resolve_under as a spy, calls _render_files
with the advisor template, asserts the spy was invoked at least once per
written file. This directly covers the C1 fix surface.

R2-M3: CHANGELOG phrase updated to reflect the H3+H5+M9 carve-out:
--from-template and --list-templates now work in CI without a terminal.

Defended Round 2 trade-offs (deferred to PR 2 follow-ups):
- R2-M2: doctor 'inconclusive' message can be misleading if run_doctor
  succeeded but render_human failed. Minor polish.
- R2-L1: _from_template name validation does not call the len check
  before the regex. Functionally equivalent (regex enforces 64 chars);
  cosmetic consistency nit.
- R2-L2/L3/L4: pure style + micro-opts.

Test suite: 2941 + 50 skipped to 2942 + 50 skipped, zero regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread atomic_agents/init/wizard.py Fixed
GitHub Advanced Security's CodeQL flagged
`print(C.MSG_NO_API_KEY, file=sys.stderr)` at wizard.py:67 as
"clear-text logging of sensitive information (password)" with HIGH
severity. The variable name "_API_KEY" matches CodeQL's source heuristic
for credential identifiers, and the literal placeholder "sk-ant-..."
inside the message body pattern-matches as a credential. The message
itself is a static help template with no real key data, but the
heuristic does not distinguish help text from leaked secrets.

Fix: rename the constant to MSG_NO_PROVIDER_KEY (matches the existing
MSG_NO_TTY naming pattern) and reword the message body from "API key"
to "credential" so it neither triggers the variable-name source
heuristic nor pattern-matches as a credential sink. Add an inline
NOTE explaining the rename for future contributors who might want to
revert to a clearer name.

Touches: constants.py (definition + NOTE comment), wizard.py:67
(reference), test_init_wizard.py (test docstring + assertion), and
spec/35-init-wizard.md (one citation).

Test suite: 2942 passed, 50 skipped. Zero regressions. CodeQL will
re-run on push; expected result: 0 alerts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[deployment] Configuration wizard + deploy assistant for new operators (post-public-flip)

2 participants