Rewrite generate-theories skill UX after dogfooding #62

Merged
charliemcgrady merged 5 commits into main from matt/05.19.26/generate-theories-skill
May 21, 2026
MLatzke (Member) commented May 19, 2026

Updated the generate-theories skill UX. The skill now choreographs the full request end-to-end via question shaping, mode selection, stage checkpoints, and handoff.

What changed

  • Two-half structure (reference/flow), separated by a "How to run it" banner and orientation paragraph.
  • Question shaping before submission: four-dimension rubric (Intervention, Outcome, optional Mechanism, Scope) with an explicit audit step and two routes:
      • Proceed silently when all required dimensions are present.
      • Poll for what's missing otherwise, using plain-English questions with no internal dimension labels leaked.
  • Escape hatch in the poll: phrases like "just send it" or "let Theorizer figure it out" skip shaping and submit as given. Supports advanced users who want to under-specify deliberately and have Theorizer infer the rest from papers. Surfaced in the poll example so users discover it.
  • Refuse path for non-research questions ("name my startup," "summarize this paper") explains Theorizer's domain instead of forcing the rubric onto a question with no causal structure.
  • Mode selection (automatic vs piecemeal) via AskUserQuestion, with Continue / Edit / Stop checkpoints between piecemeal stages and decision-oriented per-stage summaries.
  • New evals/evals.json — test prompts spanning the silent / poll / refuse routes for future iteration.
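
For reviewers who want a concrete picture of the routing, here is a minimal sketch of the audit-then-route logic described above. It is not the skill's implementation (the skill is prose in SKILL.md). The `Audit` container, the `choose_route` helper, and the `user_opted_out` flag are hypothetical names, and treating Scope as required is an assumption read off the "optional Mechanism" wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    SILENT = "silent"  # all required dimensions present: submit without polling
    POLL = "poll"      # ask plain-English questions for what's missing
    REFUSE = "refuse"  # not a research question: explain Theorizer's domain


@dataclass
class Audit:
    # Hypothetical container; field names mirror the rubric in this PR.
    intervention: Optional[str] = None
    outcome: Optional[str] = None
    mechanism: Optional[str] = None  # optional dimension, never polled for
    scope: Optional[str] = None
    is_research_question: bool = True


# Assumption: every dimension except Mechanism is required.
REQUIRED = ("intervention", "outcome", "scope")


def missing_dimensions(audit: Audit) -> list[str]:
    """Return the required dimensions the question does not yet specify."""
    return [name for name in REQUIRED if not getattr(audit, name)]


def choose_route(audit: Audit, user_opted_out: bool = False) -> Route:
    """Pick a route; the escape hatch is modeled here as a plain submit."""
    if not audit.is_research_question:
        return Route.REFUSE
    if user_opted_out or not missing_dimensions(audit):
        return Route.SILENT
    return Route.POLL
```

In this sketch a question with only an intervention routes to `POLL` with `["outcome", "scope"]` missing, and the same question with `user_opted_out=True` is submitted as given. The dimension names stay inside the helper and would not appear in user-facing text.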

The skill was previously a thin wrapper around the Theorizer A2A CLI.
It now choreographs the full request end-to-end.

- Two-half structure (reference / flow) separated by a "How to run it" banner and orientation paragraph.
- Question shaping: four-dimension rubric (Intervention, Outcome, optional Mechanism, Scope) with explicit audit step and two routes — proceed silently when all required dimensions are present, or poll for what's missing.
- Escape hatch in Route 2: "just send it," "let Theorizer figure it out," etc. bypass shaping for advanced users who want to under-specify deliberately and let Theorizer infer from papers. Surfaced in the poll example.
- Refuse path for non-research questions explains Theorizer's domain instead of forcing the rubric.
- Mode selection (automatic vs piecemeal) via AskUserQuestion, with Continue / Edit / Stop checkpoints between piecemeal stages.
- Pipeline section refreshed against the live agent card: adds build-extraction-schema, tightens stage descriptions, reflects form-theory / evaluate-novelty accepting either a prior task_id or user-supplied data.
- Internal scaffolding stays internal — explicit rule that audit results, dimension names, and route numbers don't appear in user-facing chat.
- Diverse, well-formed examples drawn from outside AI (metabolic medicine, education, ecology) to avoid biasing generations toward chatbot scopes.
- Fix Theorizer GitHub URL → asta-theorizer-internal.
- Add evals/evals.json with three baseline prompts spanning the silent / poll / refuse routes.
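
The piecemeal mode with Continue / Edit / Stop checkpoints can be pictured as a simple loop. This is a rough sketch under assumptions, not the skill's code: `run_piecemeal`, the `ask` callback, and the `(name, stage)` pairs are hypothetical, and the stage functions stand in for whatever the Theorizer pipeline actually runs.

```python
from typing import Callable, Optional

# Hypothetical stage type: takes the previous stage's output, returns its own.
Stage = Callable[[object], object]
# Hypothetical checkpoint prompt: returns ("continue" | "edit" | "stop", edited_data).
Ask = Callable[[str, object], tuple[str, Optional[object]]]


def run_piecemeal(
    stages: list[tuple[str, Stage]], initial: object, ask: Ask
) -> tuple[list[str], object]:
    """Run stages in order, pausing between stages for a user decision."""
    data = initial
    completed: list[str] = []
    for index, (name, stage) in enumerate(stages):
        data = stage(data)
        completed.append(name)
        if index == len(stages) - 1:
            break  # checkpoints sit between stages, so none after the last
        choice, edited = ask(name, data)
        if choice == "stop":
            break
        if choice == "edit":
            data = edited  # the next stage consumes the user's edited output
    return completed, data
```

Automatic mode corresponds to an `ask` callback that always answers `("continue", None)`, which is one reason to keep the checkpoint prompt as a parameter rather than hard-coding it.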

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MLatzke requested a review from charliemcgrady May 19, 2026 23:46
MLatzke and others added 4 commits May 19, 2026 16:48
make build-plugins, picking up the generate-theories SKILL.md rewrite
and the new evals/evals.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Validator requires a space-separated string, not a YAML list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Will live elsewhere per review feedback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
"just send it" → "skip that" in the rule list. Poll example rewritten in
the agent's voice ("I can send your query verbatim...") rather than
echoing a quoted user phrase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

charliemcgrady merged commit c35e498 into main May 21, 2026
6 checks passed
2 participants