diff --git a/.claude/skills/autosteer/SKILL.md b/.claude/skills/autosteer/SKILL.md index 5471bf31..6feabfaa 100644 --- a/.claude/skills/autosteer/SKILL.md +++ b/.claude/skills/autosteer/SKILL.md @@ -1,107 +1,49 @@ --- name: autosteer -description: Run the full development pipeline (plan, implement, validate, review, branch, commit, PR, YouTrack update) autonomously without pausing for user confirmation between phases. Use when you want hands-off task completion. Stops only on errors. +description: Run the full development pipeline autonomously without pausing between phases. Stops only on quality-gate failures. argument-hint: "[ticket-id-or-task-description]" --- # Autosteer -Run the full development pipeline end-to-end without pausing for user -confirmation between phases. The agent proceeds autonomously through every -stage, stopping **only** when a phase fails (tests, lint, coverage). +Suspends the **pacing rule** from `CLAUDE.md`. Proceed through all phases +without confirmation, stopping **only** on quality-gate failures. ## Activation -When this skill is invoked, the **pacing rule** from `CLAUDE.md` is -suspended for the remainder of the current task. Concretely: - -- Do **not** pause between phases to ask for confirmation. -- Do **not** present a plan and wait for a go-ahead — just execute. -- Do **not** pause before pushing or creating the PR. -- **Do** stop immediately if a quality gate fails (`make check`, - `make test`, `make test-cov-check`) and attempt to fix it. If you - cannot fix it after two attempts, stop and report the failure. +- Do **not** pause between phases or present plans for approval. +- **Do** stop on quality-gate failure (`make check`, `make test`, + `make test-cov-check`). Attempt fix twice, then stop and report. ## Steps ### 1. Resolve the ticket -If the argument looks like a ticket ID (`DBA-XXX`, a bare number, or a -YouTrack URL), fetch it with `get_issue` and move it to **Develop**. 
- -If the argument is a free-text task description without a ticket ID, -create a YouTrack issue automatically: - -- Derive **Summary** (imperative, one-line) and **Description** - (2-4 sentences) from the argument text. -- Type: infer Bug / Task / Feature; default to Task. -- Create via `create_issue` in the **DBA** project. -- Move to **Develop**. - -If no argument is provided, ask the user once for a task description or -ticket ID, then proceed autonomously from that point. - -### 2. Plan - -Read the ticket (or task description) and relevant source files. -Outline the approach internally — do not present it to the user. -Proceed immediately to implementation. - -### 3. Implement - -Write the code changes and tests. Run `make test` to verify. -If tests fail, fix and re-run (up to two retries). - -### 4. Validate - -Run quality gates sequentially: - -1. `make check` — auto-fix ruff issues if needed, re-run. -2. `make test-cov-check` — if coverage is below threshold, write - additional tests and re-run. - -If a gate still fails after two fix attempts, stop and report. - -### 5. Review - -Run `local-code-review` and `review-architecture` skills **in parallel** -(both use `context: fork`). Inspect findings: - -- **Critical / High severity** — fix them, then re-run validate (step 4). -- **Medium / Low** — note them but proceed. - -### 6. Branch and commit - -- Use the `create-branch` skill to create a feature branch. -- Stage specific files and commit following **Commit Messages** in - `CLAUDE.md`. - -### 7. PR +If argument is a ticket ID (`DBA-XXX`, number, or URL): fetch with +`get_issue`, move to **Develop**. -- Push with `-u` flag. -- Create the PR via `gh pr create` using the template from the - `create-pr` skill. Skip the pre-push confirmation pause. +If free-text: create via `create_issue` in **DBA** project (derive +summary, description, type; default Task). Move to **Develop**. -### 8. Update YouTrack +If no argument: ask user once, then proceed autonomously. 
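The argument-resolution rules above can be sketched as a small helper. The glob patterns and the YouTrack URL shape are illustrative assumptions, not part of the skill:

```bash
# Classify the autosteer argument per the rules above (sketch).
# The DBA prefix and the issue-URL shape are assumptions for illustration.
classify_arg() {
  case "$1" in
    DBA-[0-9]*)         echo "ticket:$1" ;;        # explicit ticket ID
    [0-9]*)             echo "ticket:DBA-$1" ;;    # bare number -> expand
    http*://*/issue/*)  echo "ticket:${1##*/}" ;;  # YouTrack issue URL
    "")                 echo "ask-user" ;;         # no argument given
    *)                  echo "create-issue" ;;     # free-text description
  esac
}
```

For example, `classify_arg "DBA-123"` prints `ticket:DBA-123`, while any unmatched text falls through to issue creation.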
-- Move the ticket to **Review** state. -- Add a comment with the PR URL and the Claude Code session cost - (from `/cost`). +### 2. Execute phases -### 9. Report +Follow **After Completing Work** in `CLAUDE.md` (phases 1-7), with: +- Phase 2 (implement): up to two retries on test failure. +- Phase 3 (validate): run quality gates per **Quality Gates** in `CLAUDE.md`. + Two fix attempts, then stop. +- Phase 4 (review): fix Critical/High findings, note Medium/Low. +- Phase 5-7: use `create-branch`, `create-pr` skills. Skip pre-push pause. -Output a final summary: +### 3. Report -- Ticket ID and link -- Branch name -- PR URL -- Any review findings that were not auto-fixed +Output: ticket ID/link, branch name, PR URL, unfixed review findings. ## Guardrails -- Never commit or push to `main` — always branch first. -- Never skip quality gates — they are blocking even in autosteer mode. -- Never create a YouTrack issue without at least a task description - (from argument or one user prompt). -- If `gh` CLI or YouTrack MCP is unavailable, stop and inform the user. -- Stage specific files — never use `git add -A` or `git add .`. +- Never commit or push to `main`. +- Never skip quality gates. +- Never create a ticket without at least a task description. +- Stage specific files -- never `git add -A` or `git add .`. +- If `gh` or YouTrack MCP unavailable, stop and inform user. diff --git a/.claude/skills/check-coverage/SKILL.md b/.claude/skills/check-coverage/SKILL.md index 5e1efecd..017f1797 100644 --- a/.claude/skills/check-coverage/SKILL.md +++ b/.claude/skills/check-coverage/SKILL.md @@ -1,86 +1,48 @@ --- name: check-coverage -description: Run test coverage measurement, analyze results, and fix gaps when coverage falls below the 80% threshold. Use after implementing features or fixing bugs, when `make test-cov-check` fails, when reviewing module test adequacy, or before creating a PR that touches `src/databao_cli/`. 
+description: Run test coverage measurement, analyze results, and fix gaps when coverage falls below the 80% threshold. --- # Check Coverage -Run test coverage measurement, analyze results, and fix gaps when coverage -falls below the 80% threshold. - ## Steps -### 1. Run coverage measurement - -```bash -make test-cov-check -``` - -If all tests pass and coverage is ≥80%, you are done. Otherwise continue. - -### 2. If tests fail (non-zero exit from pytest itself) +### 1. Run `make test-cov-check` -Before looking at coverage, fix the test failures: +If tests pass and coverage >= 80%, done. -- **Existing tests broke after your code change**: The production code is - likely wrong. Fix the code in `src/databao_cli/`, not the tests. The - existing tests encode expected behavior — do not weaken them to make - them pass. -- **Newly written tests fail**: The test itself has a bug. Fix the test. -- **Ambiguous case**: Read the test name and docstring to understand its - intent. If the test asserts correct prior behavior that your change - intentionally alters, update the test and document the behavioral change - in the commit message. +### 2. If tests fail -### 3. If coverage is below 80% +- **Existing tests broke**: fix production code, not tests. +- **New tests fail**: fix the test. +- **Ambiguous**: read test intent. If behavior intentionally changed, update + test and document in commit message. -Examine the "Missing" column in the terminal report to identify uncovered -lines. Prioritize: +### 3. If coverage below 80% -1. New code you just wrote — this should always have tests. -2. Critical paths (error handling, validation, CLI command logic). -3. Utility functions with clear input/output contracts. +Check "Missing" column. Prioritize: +1. New code you just wrote (must have tests). +2. Critical paths (error handling, validation, CLI logic). +3. Utility functions with clear contracts. -Do NOT: -- Add empty or trivial tests solely to raise the coverage number. 
-- Add `# pragma: no cover` to bypass the threshold without justification.
-- Test third-party library behavior or Streamlit UI internals.
+Do NOT: add trivial tests just to raise numbers, add `# pragma: no cover`
+without justification, or test third-party/Streamlit internals.

-### 4. Write targeted tests
+### 4. Write tests

-Add tests in the corresponding `tests/test_<module>.py` file. Follow the
-existing pattern:
-- Use `project_layout` fixture from `conftest.py` when the test needs a
-  project directory.
-- One behavior per test function.
-- Test file mirrors source module structure.
+Add to `tests/test_<module>.py`. Use `project_layout` fixture when needed.
+One behavior per test function.

-### 5. Re-run and verify
+### 5. Re-run `make test-cov-check`

-```bash
-make test-cov-check
-```
+Repeat until threshold met.

-Repeat steps 2–5 until the threshold is met and all tests pass.
+### 6. HTML report (optional)

-### 6. Generate HTML report (optional)
-
-```bash
-make test-cov
-```
-
-Open `htmlcov/index.html` to visually inspect coverage.
+`make test-cov` -- generates `htmlcov/index.html` for inspection.

## Failure handling

-- If `pytest-cov` is not installed, run `uv sync --dev` first.
-- If coverage is below 80% and you cannot reasonably cover the uncovered
-  lines (e.g., platform-specific code, external service calls), add a
-  brief `# pragma: no cover` comment with a reason, and note it in the
-  commit message.
-
-## What this skill does NOT do
-
-- It does not run e2e tests or measure their coverage.
-- It does not modify the 80% threshold — that is set in `pyproject.toml`.
-- It does not auto-generate tests — it guides you to write meaningful ones.
+- Missing `pytest-cov`: run `uv sync --dev`.
+- Uncoverable lines (platform-specific, external calls): add
+  `# pragma: no cover` with reason, note in commit message. 
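The "Missing" column triage in step 3 can be scripted; this sketch assumes coverage.py's default `term-missing` table layout (header row, separator, one row per file, TOTAL at the end):

```bash
# Print "file: missing-ranges" for files with uncovered lines, reading a
# coverage.py term-missing report from stdin (sketch; TOTAL row excluded).
missing_lines() {
  awk 'NR > 2 && $1 != "TOTAL" && $3 > 0 {
    out = $1 ":"
    for (i = 5; i <= NF; i++) out = out " " $i   # columns after Cover = Missing
    print out
  }'
}
```

Piping `make test-cov-check` output through it gives a short worklist of files to target first.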
diff --git a/.claude/skills/check-pr-comments/SKILL.md b/.claude/skills/check-pr-comments/SKILL.md index d344de02..f61876f9 100644 --- a/.claude/skills/check-pr-comments/SKILL.md +++ b/.claude/skills/check-pr-comments/SKILL.md @@ -1,47 +1,24 @@ --- name: check-pr-comments -description: Check GitHub pull request review comments using the GitHub CLI. Use when the user wants to fetch unresolved PR review threads, implement requested changes on the PR branch, validate the fix locally, reply in-thread, and resolve addressed threads. +description: Fetch unresolved PR review threads, triage them, implement fixes, validate, reply in-thread, and resolve. compatibility: gh must be installed and authenticated. --- # Address PR Comments -Address GitHub pull request review comments with `gh`, make the requested -changes on the PR branch, validate them locally, and close the loop on GitHub. - ## Steps -### 1. Identify the PR and verify GitHub access - -Start with: - -```bash -gh auth status -``` - -If the user supplied a PR number or URL, use it. Otherwise try the PR for the -current branch: +### 1. Identify the PR -```bash -gh pr view --json number,title,url,headRefName,baseRefName -``` - -If no PR can be identified, ask the user for the PR number or URL before doing -anything else. +Run `gh auth status`. Use the user-supplied PR number/URL, or detect from +current branch: `gh pr view --json number,title,url,headRefName,baseRefName`. +If no PR found, ask the user. ### 2. Fetch unresolved review threads -Prefer GraphQL over `gh pr view --comments`. Timeline comments do not preserve -thread resolution state, file paths, or line mappings well enough for this -task. 
+Get repo owner/name: `gh repo view --json owner,name -q '.owner.login + " " + .name'`

-First, obtain the repository owner and name:
-
-```bash
-gh repo view --json owner,name -q '.owner.login + " " + .name'
-```
-
-Then fetch the review threads using those values:
+Fetch threads via GraphQL:

```bash
gh api graphql -f query="
@@ -51,22 +28,10 @@ gh api graphql -f query="
      reviewThreads(first:100) {
        pageInfo { hasNextPage endCursor }
        nodes {
-          id
-          isResolved
-          isOutdated
+          id isResolved isOutdated
          comments(first:20) {
            pageInfo { hasNextPage endCursor }
-            nodes {
-              id
-              databaseId
-              author { login }
-              body
-              path
-              line
-              originalLine
-              createdAt
-              url
-            }
+            nodes { id author{login} body path line url }
          }
        }
      }
@@ -75,121 +40,51 @@ gh api graphql -f query="
 }" -F owner=<owner> -F repo=<repo> -F number=<number>
```

-Replace `<owner>` and `<repo>` with the values from `gh repo view` above, and
-`<number>` with the pull request number.
-
-If `hasNextPage` is `true` for either `reviewThreads` or `comments`, paginate
-by re-running the query with an `after: "<endCursor>"` argument on the
-corresponding connection until all pages are fetched.
-
-Focus first on threads where `isResolved` is `false`. Treat `isOutdated: true`
-as lower priority unless the underlying issue still exists in the current code.
+If `hasNextPage` is true, re-run the query adding `after: "<endCursor>"`
+to the corresponding connection (e.g., `reviewThreads(first:100, after:"<endCursor>")`).
+Focus on `isResolved: false` threads.

-### 3. Build a Markdown triage table and stop for approval
+### 3. Triage table -- stop for approval

-Before making changes, summarize the unresolved threads in a Markdown table so
-the user can see the plan at a glance. 
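A quick way to scan the response from the query above is a jq filter. The JSON shape follows that query; availability of `jq` is assumed:

```bash
# Summarize unresolved review threads as "path: first comment body" lines,
# reading the GraphQL response JSON from stdin.
unresolved_threads() {
  jq -r '.data.repository.pullRequest.reviewThreads.nodes[]
         | select(.isResolved | not)
         | "\(.comments.nodes[0].path // "PR-level"): \(.comments.nodes[0].body)"'
}
```

Threads not tied to a file fall back to the `PR-level` label used in the triage table.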
+Present a Markdown table before any edits:

-Use one row per unresolved thread with these columns:
-
-- `Comment`: a short paraphrase of the reviewer request
-- `File`: the referenced path, or `PR-level` if the thread is not tied to a file
-- `Triage`: one of `Implement`, `Reply only`, or `Blocked`
-
-Template:
-
-```md
| Comment | File | Triage |
| --- | --- | --- |
-| Validate empty input before calling the parser | `src/foo.py` | Implement |
-| Why not keep the old flag name for compatibility? | `src/cli.py` | Reply only |
-| This conflicts with another approved requirement | `src/config.py` | Blocked |
-```
+| <short paraphrase of the request> | `path` or `PR-level` | Implement / Reply only / Blocked |

-Then decide which bucket each thread belongs to:
+Wait for **explicit** user approval before proceeding.

-- **Code change needed**: implement the requested fix.
-- **Explanation needed**: no code change is appropriate; prepare a concise,
-  respectful reply with evidence.
-- **Blocked or conflicting**: the reviewer request conflicts with the PR scope,
-  another comment, or product intent. Surface that to the user before making a
-  broad or risky change.
+### 4. Apply fixes

-Read the referenced files and nearby code before editing. Do not treat a single
-comment in isolation if it touches behavior shared across the module.
+Address one thread at a time, minimal changes. Fix root cause once when
+multiple threads share it.

-The triage table should come before any file edits or GitHub replies.
+### 5. Validate

-After showing the table, stop and wait for explicit user approval. Do not:
+Run smallest meaningful validation for the changed area. Do not claim
+"fixed" without running validation or stating it is unverified.

-- edit files
-- run validation for a proposed fix
-- reply on GitHub
-- resolve any review thread
-
-Approval must be explicit. A vague acknowledgment is not enough.
-
-### 4. Apply the fixes on the PR branch
-
-Only proceed to this step after the user approves the triage table. 
-
-Address one thread at a time and keep the change set minimal. When multiple
-threads point to the same root cause, fix the root cause once and reply to each
-affected thread explicitly.
-
-Guardrails:
-
-- Do not push unrelated cleanups while addressing review comments.
-- Do not discard reviewer requests without explaining why.
-- Do not mark a thread resolved until the underlying issue is actually handled.
-
-### 5. Validate before replying
-
-Only run this after the user has approved implementation.
-
-Run the smallest meaningful validation for the changed area, based on the repo's
-actual tooling.
-
-Do not reply with "fixed" unless you have either run the relevant validation or
-state clearly that the change is unverified.
-
-### 6. Reply in-thread and resolve when appropriate
-
-Only run this after the user has approved proceeding beyond the triage table.
-
-If the user asked you to actually address the PR comments on GitHub, post a
-thread reply after the code change and validation.
-
-Reply to a thread:
+### 6. Reply and resolve

+Reply in-thread:
```bash
-gh api graphql -f query="
-  mutation(\$threadId:ID!, \$body:String!) {
-    addPullRequestReviewThreadReply(
-      input:{pullRequestReviewThreadId:\$threadId, body:\$body}
-    ) {
-      comment { url }
-    }
-  }" -F threadId=<thread-id> -f body='Addressed in <commit>. Validation: <command>.'
+gh api graphql -f query="mutation(\$threadId:ID!, \$body:String!) {
+  addPullRequestReviewThreadReply(input:{pullRequestReviewThreadId:\$threadId, body:\$body})
+  { comment { url } }
+}" -F threadId=<thread-id> -f body='Addressed in <commit>.'
```

-Resolve the thread only after the response is posted and the issue is actually
-addressed:
-
+Resolve only after reply and real fix:
```bash
-gh api graphql -f query="
-  mutation(\$threadId:ID!) {
-    resolveReviewThread(input:{threadId:\$threadId}) {
-      thread { id isResolved }
-    }
-  }" -F threadId=<thread-id>
+gh api graphql -f query="mutation(\$threadId:ID!) 
{
+  resolveReviewThread(input:{threadId:\$threadId}) { thread { id isResolved } }
+}" -F threadId=<thread-id>
```

-If the right action is explanation rather than code, reply with the reasoning
-and leave the thread open unless the user explicitly wants you to resolve it.
+For explanation-only replies, leave thread open unless user says to resolve.

-## What this skill does NOT do
+## Guardrails

-- It does not replace local code review before implementation.
-- It does not assume every comment should lead to a code change.
-- It does not resolve threads silently without a reply and a real fix.
+- No edits, replies, or resolves before user approves the triage table.
+- No unrelated cleanups while addressing comments.
+- No silent thread resolution without a reply and real fix.
diff --git a/.claude/skills/create-pr/SKILL.md b/.claude/skills/create-pr/SKILL.md
index 1897f2c0..78a4d2e2 100644
--- a/.claude/skills/create-pr/SKILL.md
+++ b/.claude/skills/create-pr/SKILL.md
@@ -1,67 +1,42 @@
---
name: create-pr
-description: Stage, commit, push, and open a GitHub PR following project conventions. Use when code is ready to ship — after tests pass, code review, and architecture review are done.
+description: Stage, commit, push, and open a GitHub PR following project conventions. Use when code is ready to ship.
compatibility: gh must be installed and authenticated.
---

# Create PR

-Stage changes, commit, push, and open a GitHub pull request following
-project conventions.
-
## Steps

### 1. Verify preconditions

-- Confirm you are NOT on `main`. If on `main`, run the `create-branch`
-  skill to create a feature branch before continuing.
-- Check `git status` for changes to commit. If nothing to commit, inform
-  the user and stop.
-### 2. Run quality gates
-
-These are **blocking** — do not commit until both pass.
-
-1. **`make check`** (ruff + mypy)
-   - Run `make check`. 
- - If ruff fails, auto-fix with `uv run ruff check --fix src/databao_cli && uv run ruff format src/databao_cli`, then re-run `make check`. - - If mypy fails after ruff is clean, stop and fix the type errors before continuing. -2. **`make test`** (pytest) - - Run `make test`. - - If tests fail, stop and fix the failures before continuing. -3. **`make test-cov-check`** (coverage threshold) - - Run `make test-cov-check`. - - Warn the user if coverage is below threshold, but do **not** block the PR. +- Must NOT be on `main`. If so, run `create-branch` skill first. +- Check `git status` for changes. If clean, inform user and stop. -### 3. Determine the ticket ID - -- Extract from the current branch name if it contains `DBA-`. -- Otherwise ask the user for the ticket ID. - -### 4. Stage and commit +### 2. Run quality gates -- Stage relevant files (prefer explicit paths over `git add -A`). -- Commit following the **Commit Messages** section in `CLAUDE.md`. +Run all three gates from **Quality Gates** in `CLAUDE.md`. Do not commit +until they pass. -### 5. Pause for confirmation +### 3. Stage and commit -Present the user with: +- Extract ticket ID from branch name (`DBA-`) or ask user. +- Stage specific files (never `git add -A`). +- Commit per **Commit Messages** in `CLAUDE.md`. -- The branch name and commit(s) that will be pushed. -- A draft PR description following the template below. +### 4. Pause for confirmation -**Wait for explicit user confirmation before proceeding.** +Show branch, commit(s), and draft PR description. Wait for explicit approval. -> **Autosteer exception**: if autosteer mode is active, skip this pause -> and proceed directly to pushing and creating the PR. +> **Autosteer exception**: skip this pause. -### 6. Push and create PR +### 5. Push and create PR -- Push with `-u` flag: `git push -u origin ` -- Create the PR via `gh pr create` using the template: +Push with `-u` flag. 
Create PR via `gh pr create` using this template: ``` ## Summary -<1-3 sentence overview of why this change exists> +<1-3 sentence overview> ## Changes @@ -69,26 +44,21 @@ Present the user with:
Files -- `path/to/file1` -- `path/to/file2` +- `path/to/file`
-### -... - ## Test Plan -- [ ] +- [ ] ``` -### 7. Report +### 6. Report -Output the PR URL so the user can review it. +Output the PR URL. ## Guardrails - Never push to `main`. -- Never push without explicit user confirmation. -- Never skip the commit prefix when a ticket is known (see `CLAUDE.md`). -- Never use `git add -A` or `git add .` — stage specific files. -- If `gh` CLI is not available, show the push command and PR URL for - manual creation. +- Never push without user confirmation (except autosteer). +- Never skip commit prefix when ticket is known. +- Never use `git add -A` or `git add .`. +- If `gh` unavailable, show manual push/PR instructions. diff --git a/.claude/skills/eval-skills/SKILL.md b/.claude/skills/eval-skills/SKILL.md index a04548cb..07491b60 100644 --- a/.claude/skills/eval-skills/SKILL.md +++ b/.claude/skills/eval-skills/SKILL.md @@ -1,107 +1,57 @@ --- name: eval-skills -description: Run structured agent-in-the-loop evaluations on skills to measure quality and track improvements. Use after modifying a SKILL.md, after changing `CLAUDE.md` or guidance docs that skills depend on, or periodically to benchmark skill quality and catch regressions. +description: Run structured evaluations on skills to measure quality and track improvements. argument-hint: "[skill-name ...] (e.g. local-code-review review-architecture)" --- # Eval Skills -Run structured evaluations on agent skills to measure quality and track -improvements across iterations. - ## Steps -### 1. Determine which skills to evaluate - -This skill accepts an optional list of skill names via `$ARGUMENTS`. +### 1. Determine skills to evaluate -- If skill names are provided (e.g. `/eval-skills local-code-review review-architecture`), - evaluate only those skills. -- If no arguments are provided, ask the user which skills they want to - evaluate. List all skills that have `evals/evals.json` files and let - the user pick. Accept "all" to evaluate every skill with evals. 
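The "list all skills that have evals" step above can be sketched against this repo's layout (the helper name is made up):

```bash
# Print the names of skills that ship an eval suite under
# .claude/skills/<skill-name>/evals/evals.json.
list_skills_with_evals() {
  root="${1:-.claude/skills}"
  for f in "$root"/*/evals/evals.json; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    basename "$(dirname "$(dirname "$f")")"
  done
}
```

The output is the candidate list to present when no `$ARGUMENTS` are given.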
+If names provided via `$ARGUMENTS`, evaluate those. Otherwise list skills
+with `evals/evals.json` files and ask user to pick (accept "all").

-Each skill's test cases live in `.claude/skills/<skill-name>/evals/evals.json`.
-
-### 2. Create an iteration directory
+### 2. Create iteration directory

```bash
mkdir -p .claude/evals-workspace/iteration-<N>
```

-Use the next sequential number. Check existing directories to determine N.
+Use next sequential number.

### 3. Run eval cases

-For each test case in `evals.json`, run it twice:
-
-**With skill** — spawn a subagent with the skill loaded:
-- Provide the skill path: `.claude/skills/<skill-name>`
-- Provide the test prompt from `evals.json`
-- Save outputs to `.claude/evals-workspace/iteration-<N>/<skill>-<case>/with_skill/outputs/`
+For each test case in `evals.json`, run twice:

-**Without skill** — spawn a subagent without the skill:
-- Use the same prompt
-- Save outputs to `.claude/evals-workspace/iteration-<N>/<skill>-<case>/without_skill/outputs/`
+- **With skill**: subagent with skill loaded, save to `iteration-<N>/<skill>-<case>/with_skill/outputs/`
+- **Without skill**: subagent without skill, save to `iteration-<N>/<skill>-<case>/without_skill/outputs/`

-Each run should start with clean context (no prior state).
+Each run starts with clean context.

-### 4. Grade outputs
-
-For each run, evaluate every assertion from `evals.json` against the actual
-output. Record results in `grading.json`:
+### 4. Grade

+Evaluate assertions against output. Save `grading.json`:
```json
{
-  "assertion_results": [
-    {
-      "text": "Agent runs make setup",
-      "passed": true,
-      "evidence": "Agent executed `make setup` as first command"
-    }
-  ],
-  "summary": { "passed": 3, "failed": 1, "total": 4, "pass_rate": 0.75 }
+  "assertion_results": [{"text": "...", "passed": true, "evidence": "..."}],
+  "summary": {"passed": 3, "failed": 1, "total": 4, "pass_rate": 0.75}
}
```

-Require concrete evidence for every PASS — don't give benefit of the doubt.
+Require concrete evidence for every PASS.

-### 5. Aggregate results
+### 5. 
Aggregate

-Compute summary statistics and save to
-`.claude/evals-workspace/iteration-<N>/benchmark.json`:
+Save `iteration-<N>/benchmark.json` with mean pass rates (with/without skill) and delta.

-```json
-{
-  "run_summary": {
-    "with_skill": { "pass_rate": { "mean": 0.83 } },
-    "without_skill": { "pass_rate": { "mean": 0.33 } },
-    "delta": { "pass_rate": 0.50 }
-  }
-}
-```
-
-### 6. Present results for human review
+### 6. Present results

-Show a summary table:
-- Per-eval pass rates (with vs without skill)
-- Overall delta
-- Any assertions that always pass (candidates for removal)
-- Any assertions that always fail (candidates for revision)
-
-Record human feedback in
-`.claude/evals-workspace/iteration-<N>/feedback.json`.
+Show per-eval pass rates, overall delta, always-pass candidates (remove?),
+always-fail candidates (revise?). Save feedback to `feedback.json`.

## Iteration loop

-After review:
-1. Update the SKILL.md based on failed assertions and feedback.
-2. Run a new iteration (increment N).
-3. Compare benchmark.json across iterations to track improvement.
-4. Stop when feedback is consistently empty or pass rates plateau.
-
-## What this skill does NOT do
-
-- Automatically fix skills — it produces evaluation data for human decision-making.
-- Run Tier 1/2 validation — use `make lint-skills` or `make smoke-skills` for that.
-- Modify evals.json — test cases should be updated deliberately, not during eval runs.
+Update SKILL.md based on findings, run new iteration, compare benchmarks,
+stop when pass rates plateau.
diff --git a/.claude/skills/local-code-review/SKILL.md b/.claude/skills/local-code-review/SKILL.md
index 696f6efd..d083e144 100644
--- a/.claude/skills/local-code-review/SKILL.md
+++ b/.claude/skills/local-code-review/SKILL.md
@@ -1,6 +1,6 @@
---
name: local-code-review
-description: Review local code changes in Databao repositories before a commit or PR. 
Use when the user wants a review of staged or unstaged diffs, local branches, or pre-merge changes. Focus on correctness, regressions, missing tests, API/CLI behavior changes, executor or tooling changes, dependency or plugin-loading risks, and user-visible behavior changes.
+description: Review local code changes for correctness, regressions, missing tests, and Databao-specific risks.
argument-hint: "[scope: staged | branch | files:<paths>]"
context: fork
agent: reviewer
@@ -8,150 +8,63 @@ agent: reviewer

# Local Code Review

-You are reviewing code changes for the Databao CLI project.
-You have NO prior context about why these changes were made — review
-purely on merit.
+You are reviewing code changes for Databao CLI with NO prior context.

## Scope

-Review scope: $ARGUMENTS
+Review scope: $ARGUMENTS (default: `branch`)

-If no scope was provided, default to `branch`.
-
-Accepted scopes:
-
-- `staged` — review only staged changes (`git diff --cached`)
-- `branch` — review the branch diff against main (default)
-- `files:<paths>` — review specific files or directories (e.g. `files:src/databao_cli/mcp/`)
-
-## Review Goal
-
-Find the highest-signal problems in the changes under review:
-
-- correctness bugs
-- regressions in user-visible behavior
-- broken or inconsistent integration behavior
-- dependency, packaging, or plugin-loading risks
-- missing or misaligned tests
-- docs or help-text drift for user-visible changes
-- formatting or linting issues
-
-Keep summaries brief.
+- `staged` -- `git diff --cached`
+- `branch` -- diff against main
+- `files:<paths>` -- specific files/directories

## Steps

-### 1. Scope Discovery
-
-Start by identifying what changed:
-
-1. Run `git status --short`.
-2. Inspect the relevant changes based on the scope you were given:
-   - **branch**: diff from the merge base with the main branch
-   - **staged**: `git diff --cached --stat` and `git diff --cached`
-   - **files**: read the specified files and their recent git history
-3. 
Read the actual diffs for changed files before reading large surrounding files. - -Prefer `rg`, `git diff`, and targeted file reads over broad scans. - -### Databao Review Priorities - -Pay extra attention to these repository-specific areas: - -- CLI, API, or UI behavior -- agent, executor, or model-provider wiring -- MCP, plugin, or integration boundaries -- configuration, build, or initialization flows -- datasource, schema, or context-building logic -- dependency, packaging, extras, and lockfile changes -- test coverage for changed behavior - -If a change touches one of those areas, review both the changed code and related tests. - -Use these targeted checks when the diff touches the corresponding area: - -- CLI, API, or UI behavior: - check defaults, help text, argument handling, request or response contracts, and user-visible output -- agent, executor, or integration wiring: - check provider and model defaults, executor names, tool contracts, and consistency across callers -- plugin, datasource, or build flows: - check configuration prompts, validation paths, plugin-loading expectations, and schema or context drift -- packaging and dependencies: - check extras wiring, entrypoints, transitive dependency impact, lockfile drift, and docs/help drift -- tests: - verify that changed behavior has corresponding assertions or call out the gap explicitly. - Make sure the tests cover some real logical path and are not only trivial assertions. - -### 2. Review Workflow - -1. Establish the review scope from git. -2. Read the diff carefully. -3. Read the surrounding implementation for changed logic. -4. Check related tests, identify where tests should have changed but did not. -5. Evaluate if new tests should be added to cover added functionality. 
- -Good validation options: - -- the full test suite when it is practical for the repo and review scope -- targeted test runs for modified areas when they give a faster or more relevant signal -- non-mutating lint checks -- non-mutating formatter checks -- type checks -- lockfile or dependency metadata validation when package definitions changed - -Before running validation, inspect the repo's local tooling configuration and use commands that actually exist there. - -Examples, when configured in the current repo: - -- `uv run pytest ` -- `uv run ruff check ` -- `uv run ruff format --check ` -- `uv run mypy ` -- `uv lock --check` - -Default to running the full test suite when it is practical and likely to add useful confidence. Use targeted tests instead when the diff is narrow and a focused run is the better fit. - -Avoid mutating validation in review mode: - -- do not run formatter or linter commands with `--fix` -- do not run formatting commands without a check-only mode when one exists -- do not run wrapper commands like `make check` or `pre-commit` unless you have verified they are non-mutating +### 1. Discover changes -### 3. Findings +Run `git status --short`, then inspect the relevant diff. Read diffs before +reading large surrounding files. -A good finding should: +### 2. Review -- identify a concrete bug, regression, or meaningful risk -- explain why it matters in real behavior -- point to the exact file and line -- mention the missing validation if that is part of the risk +1. Read the diff carefully. +2. Read surrounding implementation for changed logic. +3. Check related tests -- identify where tests should have changed but did not. -Avoid weak findings like stylistic opinions, speculative architecture preferences, or advice not grounded in the diff. 
+#### Databao-specific priorities

-#### Output Format

-Return findings first, ordered by severity.

+Pay extra attention when changes touch:
+- CLI/API/UI behavior (defaults, help text, argument handling, output)
+- Agent/executor/model-provider wiring (provider defaults, tool contracts)
+- MCP/plugin/integration boundaries
+- Config/build/init flows, datasource/schema/context logic
+- Packaging/deps/lockfile (extras, entrypoints, transitive impact)
+- Test coverage for changed behavior

-For each finding:

+#### Validation

+Run non-mutating checks only:
+- `uv run pytest <paths>` or full suite if practical
+- `uv run ruff check <paths>`, `uv run ruff format --check <paths>`
+- `uv run mypy <paths>`, `uv lock --check`

-- short severity label
-- concise title
-- why it is a problem
-- file and line reference
-- brief remediation direction when obvious

+Never run `--fix` or mutating formatters in review mode.

-Then include:
+### 3. Report findings

-- open questions or assumptions
-- a short note on testing gaps or validation performed

+Order by severity. Each finding:
+- Severity label + concise title
+- Why it matters
+- File and line reference
+- Remediation direction (brief)

-If there are no findings, say that clearly and still mention any residual risk or untested surface.
+Then: open questions, testing gaps, validation performed.

-Format the results using Markdown.
+No findings? Say so, mention residual risk or untested surface.

 ## Guardrails

-- Include short code snippets to illustrate suggested fixes, but keep them conceptual — avoid pasting full rewrites or verbose replacement blocks.
-- Do not bury findings under a long summary.
+- Short code snippets for fixes, not full rewrites.
+- Do not bury findings under summaries.
 - Do not claim tests passed unless you ran them.
-- Do not over-index on style when behavior risks exist.
-- Prefer explicit evidence from the diff and nearby code.
+- Prefer evidence from the diff over style opinions.
diff --git a/.claude/skills/make-yt-issue/SKILL.md b/.claude/skills/make-yt-issue/SKILL.md index 00ab8335..253ec646 100644 --- a/.claude/skills/make-yt-issue/SKILL.md +++ b/.claude/skills/make-yt-issue/SKILL.md @@ -1,93 +1,49 @@ --- name: make-yt-issue -description: Ensure a YouTrack issue exists before starting work. Use at the start of any task when the user has not provided a ticket ID, when you need to verify a ticket exists, when the user asks to create a ticket, or before starting untracked work. +description: Ensure a YouTrack issue exists before starting work. Validates existing tickets or creates new ones. compatibility: YouTrack MCP must be configured and available. --- # Ensure YouTrack Issue -Ensure a YouTrack issue exists for the current work. Depending on context, this -skill resolves an existing ticket, or drafts and creates a new one after user -approval. - ## Steps ### 1. Determine intent -Read the conversation to decide which path to take: - -| Signal | Path | -|--------|------| -| User provides a ticket ID (`DBA-123`, `123`, or a YouTrack URL) | Go to **step 2** (validate) | -| User explicitly asks to create a ticket (e.g., "create a ticket for …") | Go to **step 3** (draft) | -| No ticket mentioned | Ask: *"Do you have a YouTrack ticket for this work?"* — then route based on answer | - -Accept `DBA-XXX`, a bare number (expand to `DBA-XXX`), or a full YouTrack URL. - -### 2. Validate the ticket - -Use the `get_issue` MCP tool to fetch the issue. - -- **Found** — display the issue summary and confirm with the user that it - matches the intended work. If confirmed, proceed to step 5. -- **Not found / error** — inform the user the ticket could not be loaded and - offer to create a new one (continue to step 3). - -### 3. Draft the issue from context - -Generate a proposed issue using details from the conversation: - -- **Summary**: concise one-line title in imperative mood. 
-- **Description**: what the work involves and why it matters (2-4 sentences).
-- **Type**: Bug, Task, or Feature — infer from context, default to Task.
-
-Present the draft clearly:
+| Signal | Action |
+|---|---|
+| User provides ticket ID (`DBA-123`, `123`, or URL) | Validate (step 2) |
+| User asks to create a ticket | Draft (step 3) |
+| No ticket mentioned | Ask: "Do you have a YouTrack ticket?" |

-```
-Summary: <summary>
-Type: <type>
+### 2. Validate

-Description:
-<description>
-```
+Fetch with `get_issue`. If found, confirm with user it matches the work.
+If not found, offer to create (step 3).

-Ask the user to **approve, edit, or reject** the draft.
-If the user rejects and does not want a ticket, respect that and stop.
+### 3. Draft

-> **Autosteer exception**: if autosteer mode is active, skip the approval
-> prompt and create the issue immediately using the drafted values.
+From conversation context, propose:
+- **Summary**: imperative one-line title
+- **Description**: 2-4 sentences
+- **Type**: Bug / Task / Feature (default: Task)

-### 4. Create the issue
+Ask user to approve, edit, or reject.

-After the user approves (or edits and then approves):
+> **Autosteer exception**: create immediately without approval.

-- Use the `create_issue` MCP tool targeting the **DBA** project.
-- Report the created issue ID (e.g., `DBA-456`).
+### 4. Create

-### 5. Transition to Develop
+Use `create_issue` in **DBA** project. Report created issue ID.

-Move the issue to **Develop** state using `update_issue` (set the `State`
-custom field) so the board reflects active work.
+### 5. Move to Develop

-Report the final state: issue ID and current status.
+Set `State` to **Develop** via `update_issue`.

 ## Guardrails

-- Never create an issue without explicit user approval of the summary,
-  description, and type.
-- Never skip the validation step when the user provides an existing ticket ID —
-  always confirm the ticket matches the intended work.
-- If the YouTrack MCP server is unavailable, inform the user and refer them to - `DEVELOPMENT.md` for setup instructions. -- Default to the **DBA** project unless the user specifies otherwise. -- Accept flexible input: `DBA-123`, `123`, or a YouTrack URL should all resolve - correctly. -- If the user declines to create a ticket, respect that and do not push back. - -## What this skill does NOT do - -- It does not manage state transitions beyond the initial move to Develop — - ongoing state management (Develop → Review) is handled by the workflow in - CLAUDE.md. -- It does not assign the issue or set priority — use `update_issue` or - `change_issue_assignee` MCP tools directly for that. +- Never create without user approval of summary/description/type (except in autosteer mode). +- Always validate when user provides an existing ID. +- If YouTrack MCP unavailable, refer to `DEVELOPMENT.md`. +- Default to **DBA** project. Accept `DBA-XXX`, bare numbers, or URLs. +- Respect user declining to create a ticket. diff --git a/.claude/skills/review-architecture/SKILL.md b/.claude/skills/review-architecture/SKILL.md index d5d975f0..fa4cd123 100755 --- a/.claude/skills/review-architecture/SKILL.md +++ b/.claude/skills/review-architecture/SKILL.md @@ -1,6 +1,6 @@ --- name: review-architecture -description: Review architecture quality, maintainability, and developer experience before or after significant changes. Use when introducing a new CLI command or MCP tool, refactoring core module boundaries, diagnosing repeated dev friction, or preparing a PR with broad structural impact. +description: Review architecture quality, maintainability, and developer experience. argument-hint: "[scope: branch | module: | full]" context: fork agent: reviewer @@ -8,82 +8,52 @@ agent: reviewer # Review Architecture -You are reviewing the architecture of the Databao CLI project. -You have NO prior context about why these changes were made — review -purely on merit. 
+You are reviewing Databao CLI architecture with NO prior context.

 ## Scope

-Review scope: $ARGUMENTS
+Review scope: $ARGUMENTS (default: `branch`)

-If no scope was provided, default to `branch`.
+- `branch` -- architecture of code changed on current branch
+- `module:<path>` -- specific module
+- `full` -- full project architecture

-Accepted scopes:
-
-- `branch` — review architecture of code changed on the current branch (default)
-- `module:<path>` — review a specific module (e.g. `module:src/databao_cli/mcp/`)
-- `full` — review the full project architecture
-
-## Primary sources of truth
-
-Review in this order:
+## Sources of truth

 1. `docs/architecture.md`
 2. `docs/python-coding-guidelines.md`
 3. `docs/testing-strategy.md`
 4. `CLAUDE.md`
-5. `README.md` (CLI usage and user-facing workflows)
-
-## Review goals
-
-- Confirm boundaries are clear and responsibilities are separated.
-- Confirm extension paths are low-friction (new command, MCP tool, UI page).
-- Confirm docs reflect real behavior and command paths.
-- Identify highest-impact improvements with minimal disruption.
+5. `README.md`

 ## Architecture checklist

-- Are modules aligned with single responsibility?
-- Are CLI concerns separated from business logic?
-- Is the Click command structure clean and discoverable?
-- Does `workflows/` stay free of business logic (delegates to `features/`)?
-- Are `features/` functions free of Click dependency (pure business operations)?
-- Is `shared/` limited to cross-feature utilities with no business logic of its own?
-- Are MCP tools properly isolated in their own module?
-- Are UI components reusable and page-specific logic separated?
-- Are errors actionable and surfaced at the right layer?
+- Modules aligned with single responsibility?
+- CLI concerns separated from business logic?
+- Click command structure clean and discoverable?
+- `workflows/` delegates to `features/`, no business logic?
+- `features/` free of Click dependency?
+- `shared/` limited to cross-feature utilities? +- MCP tools properly isolated? +- Errors actionable and surfaced at right layer? ## Dev UX checklist -- Can a new contributor find the right entrypoints quickly? -- Are `uv` / `make` commands obvious and consistent across docs/code? -- Is local verification clear (`make check`, `make test`)? -- Are defaults safe when environment/dependencies are missing? -- Do naming and file layout reduce cognitive load? -- Are common workflows discoverable in README + docs? +- New contributors find entrypoints quickly? +- `uv`/`make` commands obvious and consistent? +- Defaults safe when deps missing? +- Naming and layout reduce cognitive load? ## Output format -Provide a concise report with these sections: - -1. **Current State**: 3-6 bullets on what is working well. -2. **Risks / Gaps**: prioritized issues (High/Med/Low) with evidence. -3. **Recommendations**: concrete changes, ordered by impact vs effort. -4. **Doc Sync Needed**: exact files that should be updated. -5. **Validation Plan**: minimal commands to verify proposed changes. - -## Recommendation style - -- Prefer small, composable changes over sweeping rewrites. -- Tie every recommendation to a specific pain point. -- Include expected developer benefit (speed, clarity, reliability). -- Flag trade-offs explicitly. -- Separate immediate actions from longer-term improvements. +1. **Current State**: 3-6 bullets on what works well +2. **Risks/Gaps**: prioritized (High/Med/Low) with evidence +3. **Recommendations**: concrete, ordered by impact vs effort +4. **Doc Sync**: files needing updates +5. **Validation Plan**: commands to verify changes ## Guardrails -- Do not invent architecture that conflicts with current code unless proposing it - explicitly as a future direction. -- Do not request broad rewrites without clear ROI. -- Keep proposals compatible with existing `uv` workflow and Click-based CLI. 
-- Keep feedback actionable for the next pull request, not just aspirational. +- Small composable changes over rewrites. Tie recommendations to pain points. +- Keep proposals compatible with `uv` workflow and Click CLI. +- Actionable for next PR, not aspirational. diff --git a/.claude/skills/update-pr/SKILL.md b/.claude/skills/update-pr/SKILL.md index 72d2a727..f2bba09c 100644 --- a/.claude/skills/update-pr/SKILL.md +++ b/.claude/skills/update-pr/SKILL.md @@ -1,53 +1,40 @@ --- name: update-pr -description: Stage, commit, and push follow-up changes to an existing feature branch or PR. Use for quick iterations — after addressing review feedback, fixing a bug on the branch, or adding incremental changes that don't need a new PR. +description: Stage, commit, and push follow-up changes to an existing feature branch or PR. Use for quick iterations. compatibility: gh must be installed and authenticated. --- # Update PR -Stage changes, commit, and push to the current feature branch. Designed for -fast iterations on an existing branch or open PR. +Stage, commit, and push to the current feature branch for fast iterations. ## Steps ### 1. Verify preconditions -- Confirm you are NOT on `main`. If on `main`, run the `create-branch` - skill to create a feature branch before continuing. -- Check `git status` for changes to commit. If nothing to commit, inform - the user and stop. +- Must NOT be on `main`. If so, run `create-branch` skill first. +- Check `git status`. If clean, inform user and stop. ### 2. Run quality gates -These are **blocking** — do not commit until both pass. - -1. **`make check`** (ruff + mypy) - - Run `make check`. - - If ruff fails, auto-fix with `uv run ruff check --fix src/databao_cli && uv run ruff format src/databao_cli`, then re-run `make check`. - - If mypy fails after ruff is clean, stop and fix the type errors before continuing. -2. **`make test`** (pytest) - - Run `make test`. - - If tests fail, stop and fix the failures before continuing. 
+Run gates from **Quality Gates** in `CLAUDE.md`. Do not commit until they pass.

 ### 3. Stage and commit

-- Stage relevant files (prefer explicit paths over `git add -A`).
-- Extract the ticket ID from the branch name if it contains `DBA-`.
-- Commit following the **Commit Messages** section in `CLAUDE.md`.
+- Stage specific files (never `git add -A`).
+- Extract ticket ID from branch name if it contains `DBA-`.
+- Commit per **Commit Messages** in `CLAUDE.md`.

 ### 4. Push

-- Push to the tracked remote branch: `git push`.
-- If no upstream is set, push with `-u`: `git push -u origin <branch>`.
+Push to tracked remote. If no upstream: `git push -u origin <branch>`.

 ### 5. Report

-Confirm the push succeeded and show the commit hash. If a PR exists for
-the branch, show the PR URL.
+Confirm push, show commit hash. Show PR URL if one exists for the branch.

 ## Guardrails

 - Never push to `main`.
-- Never use `git add -A` or `git add .` — stage specific files.
-- Never skip the commit prefix when a ticket is known (see `CLAUDE.md`).
+- Never use `git add -A` or `git add .`.
+- Never skip commit prefix when ticket is known.
diff --git a/.claude/skills/write-tests/SKILL.md b/.claude/skills/write-tests/SKILL.md
index 058eafbc..9ac1fa16 100644
--- a/.claude/skills/write-tests/SKILL.md
+++ b/.claude/skills/write-tests/SKILL.md
@@ -1,71 +1,46 @@
 ---
 name: write-tests
-description: Write or update unit tests for changed code, following project conventions and ensuring coverage meets the 80% threshold. Use after implementing a feature, command, or MCP tool, after fixing a bug (regression test), when asked to cover existing code, or when `make test-cov-check` reports missing coverage.
+description: Write or update unit tests for changed code, following project conventions and ensuring coverage meets the 80% threshold.
 ---

 # Write Tests

-Write or update unit tests for new or changed code in `src/databao_cli/`,
-following project conventions and ensuring the coverage threshold is met.
-
 ## Steps

 ### 1. Identify what to test

-Read the source module(s) you need to cover. Focus on:
-
-- Public functions and CLI command handlers.
-- Branches: error paths, edge cases, validation logic.
-- New or changed behavior — not unchanged code.
+Read the source module(s). Focus on:
+- Public functions and CLI command handlers
+- Branches: error paths, edge cases, validation
+- New or changed behavior

-Do NOT test:
+Skip: third-party internals, Streamlit UI, trivial pass-throughs.

-- Third-party library internals or Streamlit UI (`src/databao_cli/ui/`).
-- Trivial pass-through wrappers with no logic.
+### 2. Locate or create test file

-### 2. Locate or create the test file
+Tests in `tests/` mirror source modules:

-Test files live in `tests/` and mirror source modules:
-
-| Source module | Test file |
-|----------------------------------------|--------------------------------|
-| `src/databao_cli/commands/init.py` | `tests/test_init.py` |
-| `src/databao_cli/commands/build.py` | `tests/test_build.py` |
-| `src/databao_cli/mcp/tools/<tool>.py` | `tests/test_mcp.py` |
+| Source | Test file |
+|---|---|
+| `src/databao_cli/commands/init.py` | `tests/test_init.py` |
+| `src/databao_cli/commands/build.py` | `tests/test_build.py` |
+| `src/databao_cli/mcp/tools/<tool>.py` | `tests/test_mcp.py` |
 | `src/databao_cli/commands/datasource/add.py` | `tests/test_add_datasource.py` |

-If the test file already exists, add tests to it. Create a new file only when
-no matching test file exists.
-
-### 3. Follow project test conventions
+Add to existing file when possible.

-- **Framework**: `pytest` — no `unittest.TestCase` subclasses.
-- **Fixtures**: use the `project_layout` fixture from `conftest.py` when the
-  test needs an initialized project directory. Use `tmp_path` for isolated
-  filesystem operations.
-- **CLI testing**: use `click.testing.CliRunner` to invoke CLI commands. Import
-  the top-level `cli` group from `databao_cli.__main__`.
Check `result.exit_code`
-  and `result.output` / `result.stderr`.
-- **One behavior per test**: each test function asserts one logical behavior.
-  Name it `test_<unit>_<behavior>` (e.g., `test_init_fails_when_project_exists`).
-- **Type hints**: add return type `-> None` to all test functions.
-- **Mocking**: mock external I/O (network, subprocess, Docker) but do NOT mock
-  internal project modules — test real behavior. Use `unittest.mock.patch` or
-  pytest's `monkeypatch` fixture.
-- **Helpers**: if you need shared test utilities, place them in `tests/utils/`.
+### 3. Conventions

-### 4. Write the tests
+- `pytest` only, no `unittest.TestCase`.
+- Use `project_layout` fixture for project dirs, `tmp_path` for filesystem.
+- CLI: `click.testing.CliRunner`, import `cli` from `databao_cli.__main__`.
+- One behavior per test: `test_<unit>_<behavior>`.
+- Return type `-> None` on all test functions.
+- Mock external I/O, not internal modules.

-For each behavior to cover:
-
-1. **Arrange** — set up inputs, fixtures, and any mocks.
-2. **Act** — call the function or invoke the CLI command.
-3. **Assert** — verify the expected outcome (return value, side effects, output,
-   exit code, raised exception).
-
-Keep assertions specific and actionable. Prefer checking exact values over
-truthiness. Include error message context in assertions:
+### 4. Write tests (Arrange/Act/Assert)

+Specific assertions over truthiness. Include context:
```python
assert result.exit_code == 0, f"Expected success but got: {result.output}"
```
@@ -74,29 +49,11 @@ assert result.exit_code == 0, f"Expected success but got: {result.output}"

```bash
uv run pytest tests/test_<module>.py -v
-```
-
-Fix any failures. Then run the full suite with coverage:
-
-```bash
make test-cov-check
```

-If coverage is still below 80%, identify remaining uncovered lines from the
-report and add more tests. Repeat until the threshold is met.
-
-### 6.
Lint and type-check - -```bash -make check -``` - -Fix any ruff or mypy errors in the new test code before considering the -tests complete. +Repeat until 80% threshold met. -## What this skill does NOT do +### 6. Lint -- It does not write e2e tests (those use `pexpect` and `testcontainers`). -- It does not modify the coverage threshold — that is set in `pyproject.toml`. -- It does not weaken existing tests to make them pass. If existing tests break - after a code change, fix the production code first (see `check-coverage` skill). +Run `make check`. Fix ruff/mypy errors in test code. diff --git a/CLAUDE.md b/CLAUDE.md index d94c3951..dee9da03 100755 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -6,14 +6,24 @@ Claude Code entrypoint for agent instructions in this repository. - Prefer concise updates with clear file/command references. - YouTrack MCP must be configured (see DEVELOPMENT.md). Use get_issue / update_issue tools. -- **Pacing rule**: pause for user confirmation between phases (plan → - implement → validate → commit → PR). Present what you intend to do or - what you just did, then wait for a go-ahead before moving to the next - phase. Small, safe actions within a phase (running tests, reading files) +- **Pacing rule**: pause for user confirmation between phases (plan -> + implement -> validate -> commit -> PR). Small, safe actions within a phase do not require a pause. - - **Exception — autosteer mode**: when the `autosteer` skill is active, - skip all inter-phase confirmations and proceed autonomously. Stop only - on quality-gate failures. See `.claude/skills/autosteer/SKILL.md`. + - **Exception -- autosteer mode**: skip all inter-phase confirmations. + Stop only on quality-gate failures. + +## Output Efficiency + +- No sycophantic openers ("Sure!", "Great question!", "Absolutely!") -- lead with the answer. +- No hollow closings ("Let me know if you need anything!", "I hope this helps!"). +- No restating the user's prompt -- execute immediately. 
+- No unsolicited suggestions or scope creep -- answer exact scope only. +- No redundant file reads -- read each file once per session unless changed. +- No unnecessary disclaimers unless there is a genuine safety risk. +- When corrected, apply the correction as ground truth -- no "You're absolutely right" preamble. +- Code first, explanation after -- only if non-obvious. No inline prose in code blocks. +- Simplest working solution. No abstractions for single-use operations. +- ASCII-only output: plain hyphens, straight quotes. Copy-paste safe. ## References @@ -53,90 +63,66 @@ Key directories: ## Build, Lint, Test Commands -- Environment setup: `make setup` (installs deps, pre-commit hooks, verifies toolchain) -- Pre-commit (ruff + mypy): `make check` or `uv run pre-commit run --all-files` -- Ruff lint: `uv run ruff check src/databao_cli` -- Ruff format: `uv run ruff format src/databao_cli` -- Mypy: `uv run mypy src/databao_cli` -- Unit tests: `make test` or `uv run pytest tests/ -v` -- Smoke test: `uv run databao --help` -- Single test file: `uv run pytest tests/test_foo.py -v` +- Setup: `make setup` +- Lint + type-check: `make check` +- Unit tests: `make test` - Single test: `uv run pytest tests/test_foo.py::test_bar -v` - Coverage check: `make test-cov-check` (fails if below 80%) -- Coverage report: `make test-cov` (terminal + HTML in `htmlcov/`) -- Skill validation: `make lint-skills` (static checks on agent guidance) -- Skill smoke tests: `make smoke-skills` (functional verification of skill commands) +- Coverage report: `make test-cov` (HTML in `htmlcov/`) +- Skill validation: `make lint-skills` + +### Quality Gates + +These three gates are **blocking** before any commit. Skills reference +this section instead of repeating the commands. + +1. `make check` -- runs `pre-commit run --all-files` (ruff, mypy, uv lock, agent guidance validation). Auto-fix ruff with `uv run ruff check --fix . && uv run ruff format .`, then re-run. +2. `make test` -- pytest. 
Fix failures before continuing. +3. `make test-cov-check` -- coverage threshold. Write tests if below 80%. ## Coding Guidelines -Style and formatting are enforced by ruff and mypy — only non-linter-enforceable -rules are listed here. Use `ruff check --fix` and `ruff format` to auto-fix -style issues; do not manually fix formatting. +Style enforced by ruff + mypy. Use `ruff check --fix` and `ruff format` to auto-fix. -- Add type hints for public APIs and non-trivial helpers; strict mypy is enabled. -- Validate config/args early and raise specific exceptions with actionable - messages. -- Use `logging` for runtime behavior; use `print` only for tiny utilities. -- CLI framework is Click — follow Click patterns for new commands. +- Type hints on public APIs; strict mypy enabled. +- Validate config/args early; raise specific exceptions. +- Use `logging` for runtime; `print` only for tiny utilities. +- CLI framework is Click. ## Change Management -- Prefer minimal, focused edits over broad rewrites. -- Do not silently alter behavior; document intentional changes. +- Minimal, focused edits over broad rewrites. - Update tests when changing commands, protocols, or behavior. - Update `README.md` if command examples or workflows change. -- Run `make test-cov-check` after changing code in `src/databao_cli/` to - verify coverage meets the threshold in `[tool.coverage.report] fail_under` - (`pyproject.toml`). If existing tests break, fix the production code — - do not weaken tests. If newly written tests are wrong, fix the tests. -- When modifying agent guidance files (skills, coding-guidelines), - run `make lint-skills` to validate consistency. - The pre-commit hook runs this automatically on commit. +- Run `make test-cov-check` after changing `src/databao_cli/`. If existing + tests break, fix production code -- do not weaken tests. +- Run `make lint-skills` after modifying agent guidance files. 
## YouTrack Ticket Workflow -- Before starting work, use the `make-yt-issue` skill to verify or create a - YouTrack ticket. It handles asking for the ID, validating it exists, and - creating one if needed. -- If the YouTrack MCP server is unavailable, refer the user to `DEVELOPMENT.md` - for setup instructions. -- If a ticket is provided, read it with the `get_issue` tool to understand the - full scope before writing any code. -- When starting work on a ticket, move it to **Develop** state using - `update_issue` (set `State` field). -- After creating a PR, move the ticket to **Review** state and add a - comment with the PR URL and the Claude Code session cost (from `/cost`). +- Use `make-yt-issue` skill to verify or create a ticket before starting work. +- Read ticket with `get_issue` to understand scope before coding. +- Move to **Develop** on start (`update_issue`, set `State`). +- Move to **Review** after PR, comment with PR URL and session cost (`/cost`). +- If YouTrack MCP unavailable, refer user to `DEVELOPMENT.md`. + ## Commit Messages - Format: `[DBA-XXX] ` (max 72 chars) -- Use imperative mood: "Add feature", not "Added feature" or "Adds feature" -- Lowercase after the prefix: `[DBA-123] fix auth timeout` -- No trailing period -- If a body is needed, add a blank line after the summary: - - Explain *why*, not *what* (the diff shows what) - - Wrap at 72 characters -- If no ticket exists, omit the prefix — don't invent one +- Imperative mood, lowercase after prefix: `[DBA-123] fix auth timeout` +- No trailing period. Body explains *why*, not *what*. Wrap at 72 chars. +- No ticket? Omit the prefix. ## After Completing Work -Each numbered step below is a **phase**. Present the outcome of each -phase and wait for user confirmation before starting the next one. -**Exception**: in autosteer mode (`/autosteer`), run all phases -sequentially without pausing — stop only on quality-gate failures. - -1. 
**Plan** — outline the approach and list files you intend to change. -2. **Implement** — write the code changes to satisfy the ticket - requirements, including tests. Run `make test` to verify they pass. -3. **Validate** — run `make check` then `make test-cov-check`. Fix any - failures before proceeding. -4. **Review** — run `local-code-review` and `review-architecture` skills. - Both run in forked sub-agent context (no prior conversation state) - and can run **in parallel**. -5. **Branch & commit** — use the `create-branch` skill, then stage and - commit following **Commit Messages** conventions. -6. **PR** — use the `create-pr` skill (pushes and opens the PR). -7. **Update YouTrack** — move the ticket to **Review** state and add - a comment with the PR URL and the Claude Code session cost (run - `/cost` to obtain it). +Each step is a **phase** -- pause for confirmation between phases (except in autosteer mode). + +1. **Plan** -- outline approach and files to change. +2. **Implement** -- code + tests. Run `make test`. +3. **Validate** -- run quality gates (see above). Fix failures. +4. **Review** -- run `local-code-review` and `review-architecture` skills in parallel (forked context). +5. **Branch & commit** -- `create-branch` skill, then commit per **Commit Messages**. +6. **PR** -- `create-pr` skill. +7. **Update YouTrack** -- move to **Review**, comment with PR URL + session cost (`/cost`). Never commit directly to `main`. 
diff --git a/e2e-tests/src/project_utils.py b/e2e-tests/src/project_utils.py index 7b14c9b1..f0b6d48b 100644 --- a/e2e-tests/src/project_utils.py +++ b/e2e-tests/src/project_utils.py @@ -1,3 +1,4 @@ +import os from pathlib import Path import allure @@ -37,7 +38,8 @@ def execute_init(project_dir: Path, db: PostgresDB | MysqlDB | SnowflakeDB | Big def execute_build(project_dir: Path): log_file_path = project_dir / "cli.log" with open(log_file_path, "w") as logfile: - child = pexpect.spawn("uv run databao build", cwd=project_dir, encoding="utf-8", timeout=900, logfile=logfile) + env = {**os.environ, "NO_COLOR": "1", "TERM": "dumb"} + child = pexpect.spawn("uv run databao build", cwd=project_dir, encoding="utf-8", timeout=900, logfile=logfile, env=env) try: with allure.step("Checking for Ollama model"): # Wait for Ollama download/installation with extended timeout