SpecCritic

SpecCritic evaluates software specifications as formal contracts, identifying defects before implementation begins. It behaves like a hostile contract lawyer—not a collaborator—treating vague language, unverifiable requirements, and missing failure modes as bugs.

$ speccritic check SPEC.md --verbose
INFO: Loading spec: SPEC.md
INFO: Calling LLM: anthropic:claude-sonnet-4-20250514
INFO: Rendering output (format: json)

Verdict: INVALID  Score: 37/100  Critical: 2  Warn: 3  Info: 1

Why

A specification is invalid when two competent engineers could implement it differently and both believe they followed it. SpecCritic enforces this contract before a single line of code is written.

Intended workflow:

1. Write or update SPEC.md
2. Run speccritic check SPEC.md
3. Fix issues (revise the spec or answer clarification questions)
4. Repeat until the verdict is acceptable
5. Only then begin implementation

Installation

go install github.com/dshills/speccritic/cmd/speccritic@latest

Or build from source:

git clone https://github.com/dshills/speccritic.git
cd speccritic
go build -ldflags "-X main.version=$(git describe --tags --always)" -o speccritic ./cmd/speccritic/

Quick Start

Run a fast deterministic preflight check without model credentials:

speccritic check SPEC.md --preflight-mode only

Use this first when iterating on a spec. It catches obvious placeholders, vague language, weak requirements, missing sections, undefined acronyms, and unmeasurable criteria locally before making an LLM call.

Set your model and API key when you are ready for a full review:

export SPECCRITIC_LLM_PROVIDER=anthropic
export SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

Run a check:

speccritic check SPEC.md

Output as Markdown:

speccritic check SPEC.md --format md

Fail in CI if the spec is invalid:

speccritic check SPEC.md --fail-on INVALID

Gate CI before any LLM call:

speccritic check SPEC.md --preflight-mode gate --fail-on INVALID

Force parallel section chunking for a large spec:

speccritic check SPEC.md --chunking on --chunk-concurrency 4

Reuse a previous JSON result while iterating on a changed spec:

speccritic check SPEC.md --format json --out previous.json
cp SPEC.md SPEC.previous.md
# edit SPEC.md
speccritic check SPEC.md --incremental-from previous.json --incremental-base SPEC.previous.md

Generate advisory completion patches for common profile gaps:

speccritic check SPEC.md \
  --preflight-mode only \
  --completion-suggestions \
  --patch-out completion.patch

This works with preflight-only runs because completion can use deterministic preflight findings as its source findings.

Web UI

SpecCritic also includes a local Go web UI for reviewing specs in the browser. It uses the same review pipeline as the CLI, then renders the uploaded spec with line numbers, summary metrics, finding annotations, provider/model metadata, and modal issue details.

The web UI is intended for local review sessions. It does not replace the CLI and it does not change CLI behavior. Large uploaded specs use the same automatic chunked review path as the CLI and still render as one merged result.

Set the same provider configuration used by the CLI:

export SPECCRITIC_LLM_PROVIDER=anthropic
export SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

Run the web UI:

make run-web

Then open:

http://127.0.0.1:8080

Large Gemini reviews can take several minutes because chunk calls are run serially in the web UI to avoid provider timeouts. The web server default request timeout is 10 minutes; override it with --request-timeout if needed.

From the browser:

1. Choose a Markdown or text spec file. Manual text entry is intentionally not supported.
2. Select a profile and severity threshold.
3. Optionally upload a previous JSON result for convergence tracking and/or incremental rerun, and a previous spec file when using incremental rerun.
4. Optionally set convergence mode to track finding status across review iterations.
5. Optionally enable completion suggestions to generate draft/advisory patch text for profile-specific missing structure.
6. Optionally enable strict mode or disable the default preflight pass.
7. Click Check spec.

The left pane lets you choose the provider and model before the review starts. It defaults to the configured environment values when present, otherwise it uses the normal SpecCritic defaults. When the provider changes, the web UI queries that provider's models API using the matching local API key and refreshes the model dropdown; if the query fails, you can still type a model manually.

The Check spec button is disabled until a file is selected and remains disabled while a check is running. During review, the page shows a running indicator and elapsed timer. When the check completes, findings are shown beside the annotated spec. Deterministic findings are labeled Preflight. Incremental, convergence, and completion metadata are shown in the summary when available. Completion patches are labeled draft/advisory, and clicking any finding opens its detail in a modal so the annotated document stays in place.

Use a different address or port with WEB_ADDR:

make run-web WEB_ADDR=127.0.0.1:8081

Build, install, or run the web binary directly:

make build-web
make install-web
go run ./cmd/speccritic-web --addr 127.0.0.1:8080

make build-all builds both bin/speccritic and bin/speccritic-web.

For live local development with Air, this repository includes .air.toml configured for the web server:

air

The Air config builds ./cmd/speccritic-web into ./tmp/speccritic-web and runs it on 127.0.0.1:8090.

Configuration

Model Selection

Set SPECCRITIC_LLM_PROVIDER and SPECCRITIC_LLM_MODEL, or pass --llm-provider and --llm-model for a single CLI run. If unset, SpecCritic defaults to SPECCRITIC_LLM_PROVIDER=anthropic and SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514 with a warning to stderr. If a provider is set without a model, SpecCritic uses the default model for that provider. Preflight-only checks do not require model configuration.

Current builds read the split provider/model variables. If you have old shell or CI snippets that set SPECCRITIC_MODEL=provider:model, replace them with the two variables above.

Provider	API Key Env Var	Model Value Example
`anthropic`	`ANTHROPIC_API_KEY`	`claude-sonnet-4-20250514`
`openai`	`OPENAI_API_KEY`	`gpt-4o`
`gemini`	`GEMINI_API_KEY`	`gemini-2.0-flash`

export SPECCRITIC_LLM_PROVIDER=openai
export SPECCRITIC_LLM_MODEL=gpt-4o
export OPENAI_API_KEY=sk-...

Preflight

Preflight is a deterministic local pass that runs before the LLM by default. It is designed to reduce review latency, token usage, and repeated model round trips by catching high-signal defects immediately.

Preflight findings use the same issue schema as LLM findings and participate in the same final scoring and verdict calculation. When the LLM confirms a preflight finding, SpecCritic deduplicates the result and tags the LLM issue with preflight-confirmed and preflight-rule:<ID> instead of showing the same defect twice.

Modes:

Mode	Behavior
`warn`	Include preflight findings in the final report and continue to the LLM. This is the default.
`gate`	Skip the LLM when blocking preflight findings exist. A finding is blocking when its rule marks it blocking or the finding itself is flagged blocking; all CRITICAL preflight findings are blocking by default.
`only`	Run only deterministic preflight checks. No model credentials are required.
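
The three modes reduce to one decision: does the LLM review run after preflight? The following is a minimal Python sketch of that decision, not SpecCritic's implementation. The `Finding` class and the function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for a preflight finding."""
    severity: str           # "CRITICAL", "WARN", or "INFO"
    blocking: bool = False  # set when the rule or finding marks it blocking


def is_blocking(finding: Finding) -> bool:
    # CRITICAL preflight findings are blocking by default; other findings
    # block only when explicitly marked.
    return finding.blocking or finding.severity == "CRITICAL"


def should_call_llm(mode: str, findings: list[Finding]) -> bool:
    """Decide whether the LLM review runs after preflight."""
    if mode == "only":
        return False  # deterministic checks only, no credentials needed
    if mode == "gate":
        return not any(is_blocking(f) for f in findings)
    if mode == "warn":
        return True   # findings are reported and the review continues
    raise ValueError(f"unknown preflight mode: {mode!r}")
```

For example, `should_call_llm("gate", [Finding("WARN")])` is true, while a single CRITICAL finding in `gate` mode stops the run before any provider call.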

Recommended workflow:

speccritic check SPEC.md --preflight-mode only
# fix deterministic findings
speccritic check SPEC.md --preflight-mode gate --fail-on INVALID
# when preflight is clean enough, run the full review
speccritic check SPEC.md

Suppress a known deterministic false positive with --preflight-ignore:

speccritic check SPEC.md --preflight-ignore PREFLIGHT-ACRONYM-001

Useful preflight behavior:

- Redaction still runs before any prompt is built.
- --preflight-mode only does not require SPECCRITIC_LLM_PROVIDER, SPECCRITIC_LLM_MODEL, or provider API keys.
- --preflight-mode gate is useful in CI when obvious blocking defects should prevent any provider call.
- --preflight-profile defaults to --profile; override it only when deterministic checks need a different profile than the LLM review.

Chunked Review

Chunked review is an execution strategy for large specs. It splits the redacted spec by Markdown sections, reviews chunks with bounded parallel LLM calls, validates each chunk against the same schema and evidence rules, optionally runs one cross-section synthesis pass, and merges everything back into one normal report.

Small specs still use the existing single-call path by default.

The final output remains a normal SpecCritic report. Chunk internals are not rendered as separate reports, but chunk-related tags may appear on findings:

Tag	Meaning
`chunked-review`	Finding came from chunked review rather than the single-call path.
`chunk:<CHUNK-ID>`	Source chunk that emitted the finding.
`cross-section`	Chunk reviewer believed the finding depends on another section.
`synthesis`	Finding came from the cross-section synthesis pass.

Modes:

Mode	Behavior
`auto`	Use chunking when the spec has at least `--chunk-min-lines` lines or the estimated prompt is at least `--chunk-token-threshold` tokens. This is the default.
`on`	Force chunking whenever an LLM review is needed.
`off`	Always use the original single-call LLM path.

Examples:

# Force chunking for a large spec and allow four concurrent chunk calls.
speccritic check SPEC.md --chunking on --chunk-concurrency 4

# Disable chunking while debugging prompt behavior.
speccritic check SPEC.md --chunking off --debug

# Tune for a rate-limited provider.
speccritic check SPEC.md --chunk-concurrency 1 --chunk-lines 140

Chunking usually reduces wall-clock latency for large specs, but it may increase the total number of provider calls. Provider rate limits, low concurrency, and cross-section synthesis can reduce the speedup. Cross-section defects are still hard: chunk prompts receive a table of contents and summaries, and synthesis can catch contradictions across sections, but no chunking strategy is a substitute for a well-structured spec.

Implementation details:

- Chunking happens after spec loading, redaction, and preflight.
- auto mode uses a deterministic local estimate of one token per four UTF-8 bytes. This is a rough heuristic; code-heavy specs and non-English specs may need a lower --chunk-token-threshold or forced --chunking on.
- Chunk reviews cite only their primary line range; overlap lines are context only.
- Every chunk response must include meta.chunk_summary; summaries are used for synthesis and are not shown as user-facing output.
- Chunk calls run with bounded concurrency.
- If one chunk fails permanently after the built-in repair attempt, the check fails with a model-output or provider error rather than returning partial results.
- Synthesis runs when chunked review has findings or when the spec is at least --synthesis-line-threshold lines. A no-finding chunked review below that threshold skips synthesis.
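
The auto-mode heuristic and the synthesis rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, rounding the token estimate down is an assumption, and the defaults mirror the documented flag defaults (120 lines, 4000 tokens, 240 lines).

```python
def estimate_tokens(text: str) -> int:
    # One token per four UTF-8 bytes. Rounding down is an assumption
    # of this sketch.
    return len(text.encode("utf-8")) // 4


def should_chunk(mode: str, spec_text: str, prompt_text: str,
                 min_lines: int = 120, token_threshold: int = 4000) -> bool:
    """Decide whether an LLM review uses the chunked path."""
    if mode == "on":
        return True
    if mode == "off":
        return False
    line_count = len(spec_text.splitlines())
    return (line_count >= min_lines
            or estimate_tokens(prompt_text) >= token_threshold)


def should_run_synthesis(finding_count: int, line_count: int,
                         synthesis_line_threshold: int = 240) -> bool:
    # Synthesis runs when chunked review produced findings, or when the
    # spec is long enough even without findings.
    return finding_count > 0 or line_count >= synthesis_line_threshold
```

Because the estimate counts bytes, a spec written mostly in multi-byte characters reaches the token threshold with fewer characters than an ASCII spec: `estimate_tokens("é" * 4)` is 2, while `estimate_tokens("abcd")` is 1.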

Incremental Rerun

Incremental rerun is an execution strategy for iterative edits. It takes a previous SpecCritic JSON report, compares the current spec with the previous spec text, reuses eligible findings from unchanged sections, and reviews only changed ranges when reuse is safe.

The default behavior is conservative. If incremental safety cannot be proven in auto mode, SpecCritic falls back to a normal full review. Use --incremental-mode on only when you want unsafe incremental conditions to fail instead of falling back.

Typical CLI workflow:

# 1. Save a full JSON result for the current spec.
speccritic check SPEC.md --format json --out previous.json

# 2. Preserve the exact spec text that produced previous.json.
cp SPEC.md SPEC.previous.md

# 3. Edit SPEC.md, then rerun against only changed sections when safe.
speccritic check SPEC.md \
  --incremental-from previous.json \
  --incremental-base SPEC.previous.md \
  --incremental-report

Important behavior:

- --incremental-from must point to valid SpecCritic JSON output.
- --incremental-base is required when the current spec content differs from the previous report hash. The JSON report contains findings and metadata, not the old spec text needed for section diffing.
- If the current spec hash matches the previous report hash, SpecCritic can reuse eligible findings without a base file.
- Preflight still runs against the full current spec before any incremental LLM call.
- Reused findings are tagged incremental-reused; new findings from changed ranges are tagged incremental-review.
- --incremental-report adds optional meta.incremental details to JSON output. Markdown output keeps the normal human-readable report shape.
- The web UI exposes the same workflow with optional Previous JSON result, Previous spec file, and Mode controls. Uploaded previous results are used only for the current request.
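
To make the fallback behavior concrete, here is a rough Python sketch of a changed-line ratio check against --incremental-max-change-ratio. The exact ratio SpecCritic computes is not documented here, so the definition below (changed lines divided by the longer of the two line counts) is an assumption, and both function names are hypothetical.

```python
import difflib


def changed_line_ratio(previous: str, current: str) -> float:
    """Share of lines without an unchanged counterpart (assumed definition)."""
    prev_lines = previous.splitlines()
    cur_lines = current.splitlines()
    longest = max(len(prev_lines), len(cur_lines))
    if longest == 0:
        return 0.0
    matcher = difflib.SequenceMatcher(a=prev_lines, b=cur_lines, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return (longest - unchanged) / longest


def plan_review(mode: str, ratio: float, max_change_ratio: float = 0.35) -> str:
    """Return "incremental", "full", or "fail" for a given incremental mode."""
    if mode == "off":
        return "full"
    if ratio <= max_change_ratio:
        return "incremental"
    # Too much changed: auto falls back to a full review, on fails instead.
    return "fail" if mode == "on" else "full"
```

With one edited line in a ten-line spec the ratio is 0.1, so both `auto` and `on` proceed incrementally. At a ratio of 0.5, `auto` returns "full" and `on` returns "fail".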

Convergence Tracking

Convergence tracking compares the current review result with a previous SpecCritic JSON report and reports progress across iterations. It classifies active findings as new or still_open, and historical findings as resolved, dropped, or untracked.

Typical CLI workflow:

# 1. Save a baseline JSON result.
speccritic check SPEC.md --format json --out review-1.json

# 2. Edit SPEC.md, then compare the new result with the baseline.
speccritic check SPEC.md \
  --convergence-from review-1.json \
  --format md

Use --convergence-mode auto for normal iteration. It keeps the current review successful even when comparison is partial or unavailable, including strict compatibility mismatches. Use --convergence-mode on when a missing, invalid, or strictly incompatible previous report should fail with exit code 3. Use --convergence-mode off to ignore a configured convergence baseline.

Important behavior:

- Convergence runs after the current review is complete.
- Current score, verdict, patches, and --fail-on behavior are based only on current findings.
- Resolved historical findings do not affect the current score or verdict.
- dropped means a historical finding no longer participates because the current threshold filters it out or its prior content is no longer applicable.
- Preflight-only runs cannot prove prior LLM findings are resolved, so those findings are marked untracked unless they match current preflight findings.
- Convergence matching is local and does not send previous report contents to a provider.
- When --convergence-report is enabled, JSON output includes meta.convergence and Markdown output includes a human-readable convergence summary.
- The web UI can use the uploaded previous JSON result for convergence tracking and/or incremental rerun; incremental rerun also needs the previous spec file when the spec changed. Uploaded previous results are not stored server-side.
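
The status labels can be illustrated with a small local classifier. This Python sketch is deliberately simplified and rests on assumptions: how SpecCritic matches findings across runs is not described here, so the `fingerprint` key (category plus quoted evidence) is hypothetical. It also omits the dropped status and, in a preflight-only run, treats every unmatched historical finding as untracked.

```python
def fingerprint(finding: dict) -> tuple:
    # Hypothetical matching key: category plus the quoted evidence text.
    quotes = tuple(e.get("quote", "") for e in finding.get("evidence", []))
    return (finding.get("category"), quotes)


def classify(previous: list[dict], current: list[dict],
             preflight_only: bool = False) -> dict[str, list[str]]:
    """Group finding IDs by convergence status."""
    prev_keys = {fingerprint(f) for f in previous}
    cur_keys = {fingerprint(f) for f in current}
    result = {"new": [], "still_open": [], "resolved": [], "untracked": []}

    # Active findings: seen before (still_open) or first seen now (new).
    for finding in current:
        status = "still_open" if fingerprint(finding) in prev_keys else "new"
        result[status].append(finding["id"])

    # Historical findings that no longer appear in the current result.
    for finding in previous:
        if fingerprint(finding) in cur_keys:
            continue
        # A preflight-only run cannot prove the prior finding is fixed.
        status = "untracked" if preflight_only else "resolved"
        result[status].append(finding["id"])
    return result
```

Matching on content rather than on `id` matters in this sketch because sequential IDs such as ISSUE-0001 may be assigned to a different finding on the next run.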

Completion Suggestions

Completion suggestions are an optional advisory layer that turns current findings into draft patch text for common missing profile structure. They are never applied automatically, never reduce or suppress findings, and never affect score, verdict, or --fail-on behavior.

Completion patch construction is deterministic for the same spec text, findings, questions, profile, and completion options. Completion runs after preflight, chunk merge, incremental merge, and convergence processing. It uses validated current findings/questions and the same redacted review inputs used by the preflight or LLM reviewer, but exact-match patch construction targets the unredacted current spec text so --patch-out remains applicable.

Suggested patches are emitted only when SpecCritic can find a safe, exact edit location. When multiple suggestions target overlapping text, they are ordered by target line, supported severity order, source ID, template section order, and text; the first suggestion in that order is emitted and later overlapping suggestions are skipped. Patches are also skipped when the before or after text contains redaction markers or secret-looking values. Missing behavior that requires user judgment is represented with OPEN DECISION placeholders instead of invented requirements.
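
The overlap rule is a deterministic sort followed by a greedy pass. The Python sketch below shows the idea only. The `Suggestion` fields are hypothetical, the severity order CRITICAL, WARN, INFO is an assumption, and the template section order tie-breaker is left out for brevity.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"CRITICAL": 0, "WARN": 1, "INFO": 2}  # assumed order


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical completion suggestion targeting an inclusive line range."""
    line_start: int
    line_end: int
    severity: str
    source_id: str
    text: str


def select_non_overlapping(suggestions):
    """Return (emitted, skipped) after applying the first-wins overlap rule."""
    ordered = sorted(
        suggestions,
        key=lambda s: (s.line_start, SEVERITY_RANK[s.severity], s.source_id, s.text),
    )
    emitted, skipped = [], []
    for candidate in ordered:
        overlaps = any(
            candidate.line_start <= kept.line_end
            and kept.line_start <= candidate.line_end
            for kept in emitted
        )
        (skipped if overlaps else emitted).append(candidate)
    return emitted, skipped
```

Sorting before selecting is what keeps the result stable for the same inputs: the outcome does not depend on the order in which suggestions were produced.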

Supported built-in templates:

Template	Intended use
`profile`	Use the selected `--profile` template. This is the default; the supported review profiles each have a matching completion template.
`general`	General-purpose spec structure and testability gaps.
`backend-api`	Authentication, authorization, error responses, rate limits, and idempotency gaps.
`regulated-system`	Audit trail, access control, retention, deletion, and compliance evidence gaps.
`event-driven`	Event schema, delivery guarantees, ordering, retry, and dead-letter behavior gaps.

Examples:

# Add advisory completion patches to normal review output.
speccritic check SPEC.md --completion-suggestions --format md

# Require safe completion patches for blocking missing-section findings.
speccritic check SPEC.md --completion-mode on

# Generate at most three backend API completion patches.
speccritic check SPEC.md \
  --profile backend-api \
  --completion-suggestions \
  --completion-max-patches 3 \
  --patch-out completion.patch

Completion mode behavior:

Mode	Behavior
`auto`	Emit suggestions only when `--completion-suggestions` is true; suggestions must be tied to current findings/questions and have safe patch locations.
`on`	Enable completion even if `--completion-suggestions` is omitted. Require safe generation for blocking missing-section findings; if required patches cannot be generated, exit with code `3` as a patch requirement error.
`off`	Disable completion output even when `--completion-suggestions` is set.

When completion metadata is present, JSON output includes meta.completion with the enabled status, effective mode, template, generated_patches, skipped_suggestions, and open_decisions. Markdown and web output label completion text as draft/advisory.

A blocking missing-section finding is a current finding with blocking: true, category UNSPECIFIED_CONSTRAINT, and the missing-section tag.

In auto mode, completion output is produced only when --completion-suggestions or SPECCRITIC_COMPLETION_SUGGESTIONS=true is set. --completion-max-patches=0 is valid in all modes; in on mode it causes exit code 3 when any blocking missing-section finding requires a patch. Hitting the patch limit for a required blocking finding also counts as a failure to generate the required patch and exits with code 3. --completion-mode takes precedence over --completion-suggestions: off disables completion, on enables completion, and auto follows the boolean flag.
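
The precedence between --completion-mode and --completion-suggestions fits in one small function. This Python sketch uses hypothetical function names and models only the rules stated above. `required` stands for the number of blocking missing-section findings that need a patch, and `generated` for how many of those patches were produced within the --completion-max-patches limit.

```python
from typing import Optional


def completion_enabled(mode: str, suggestions_flag: bool) -> bool:
    """Resolve whether completion output is produced for a run."""
    if mode == "off":
        return False             # off wins even when the flag is set
    if mode == "on":
        return True              # on wins even when the flag is omitted
    if mode == "auto":
        return suggestions_flag  # auto follows the boolean flag
    raise ValueError(f"unknown completion mode: {mode!r}")


def completion_failure_code(mode: str, required: int, generated: int) -> Optional[int]:
    """Return 3 when on mode could not generate every required patch."""
    if mode == "on" and generated < required:
        return 3
    return None                  # completion does not change the exit code
```

With `--completion-mode on --completion-max-patches=0` and one blocking missing-section finding, `completion_failure_code("on", required=1, generated=0)` returns 3.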

Flags

speccritic check <spec-file> [flags]

Flag	Default	Description
`--format`	`json`	Output format: `json` or `md`
`--out`	(stdout)	Write output to file
`--profile`	`general`	Evaluation profile (see Profiles)
`--context`	(none)	Context file paths; can be repeated
`--strict`	`false`	Treat all unstated behavior as ambiguous
`--fail-on`	(none)	Exit 2 if verdict meets or exceeds the threshold; valid values are case-sensitive `VALID_WITH_GAPS` or `INVALID`
`--severity-threshold`	`info`	Minimum severity to include in output: `info`, `warn`, `critical`
`--patch-out`	(none)	Write suggested patches to file
`--llm-provider`	env/default	LLM provider override: `anthropic`, `openai`, or `gemini`
`--llm-model`	env/provider default	LLM model override
`--temperature`	`0.2`	LLM temperature (0.0–2.0)
`--max-tokens`	`4096`	Maximum response tokens
`--offline`	`false`	Exit 3 if LLM provider/model env vars are not set (CI enforcement)
`--verbose`	`false`	Print processing steps to stderr
`--debug`	`false`	Dump full prompt to stderr (use only in trusted environments)
`--preflight`	`true`	Run deterministic checks before LLM review
`--preflight-mode`	`warn`	Preflight mode: `warn`, `gate`, or `only`
`--preflight-profile`	same as `--profile`	Override the preflight rule profile
`--preflight-ignore`	(none)	Suppress a preflight rule ID; can be repeated
`--chunking`	`auto`	Chunking mode: `auto`, `on`, or `off`
`--chunk-lines`	`180`	Target maximum source lines per chunk before overlap
`--chunk-overlap`	`20`	Neighboring lines included before and after each chunk for context
`--chunk-min-lines`	`120`	Minimum line count before `auto` may use chunking
`--chunk-token-threshold`	`4000`	Estimated prompt-token count before `auto` may use chunking
`--chunk-concurrency`	`3`	Maximum concurrent chunk LLM calls
`--synthesis-line-threshold`	`240`	Minimum total line count before a no-finding chunked review may run synthesis
`--incremental-from`	(none)	Previous SpecCritic JSON report used as the incremental baseline
`--incremental-base`	(none)	Previous spec text used for section diffing when the current spec changed
`--incremental-mode`	`auto`	Incremental mode: `auto`, `on`, or `off`
`--incremental-max-change-ratio`	`0.35`	Maximum changed-line ratio allowed before fallback or failure
`--incremental-max-remap-failure-ratio`	`0.25`	Maximum prior-finding remap failure ratio allowed before fallback or failure
`--incremental-context-lines`	`20`	Neighboring unchanged lines included around changed sections
`--incremental-strict-reuse`	`true`	Reuse prior findings only when evidence remaps exactly or by unchanged line hash
`--incremental-report`	`false`	Include optional incremental metadata in JSON output
`--convergence-from`	(none)	Previous SpecCritic JSON report used as the convergence baseline
`--convergence-mode`	`auto`	Convergence mode: `auto`, `on`, or `off`
`--convergence-strict`	`false`	Require strict profile, strict-mode, threshold, and redaction compatibility; in `on` mode, mismatches exit 3
`--convergence-report`	`true`	Include optional convergence metadata when convergence is requested
`--completion-suggestions`	`false`	Generate profile-specific advisory completion patches after review
`--completion-mode`	`auto`	Completion mode: `auto`, `on`, or `off`
`--completion-template`	`profile`	Template set to use: `profile`, `general`, `backend-api`, `regulated-system`, or `event-driven`
`--completion-max-patches`	`8`	Maximum completion patches to emit
`--completion-open-decisions`	`true`	Include `OPEN DECISION` placeholders for missing behavior that requires judgment

Chunking, incremental, convergence, and completion environment defaults are also supported when the matching flag is not provided:

Env Var	Matching Flag
`SPECCRITIC_CHUNKING`	`--chunking`
`SPECCRITIC_CHUNK_LINES`	`--chunk-lines`
`SPECCRITIC_CHUNK_OVERLAP`	`--chunk-overlap`
`SPECCRITIC_CHUNK_MIN_LINES`	`--chunk-min-lines`
`SPECCRITIC_CHUNK_TOKEN_THRESHOLD`	`--chunk-token-threshold`
`SPECCRITIC_CHUNK_CONCURRENCY`	`--chunk-concurrency`
`SPECCRITIC_SYNTHESIS_LINE_THRESHOLD`	`--synthesis-line-threshold`
`SPECCRITIC_INCREMENTAL_FROM`	`--incremental-from`
`SPECCRITIC_INCREMENTAL_BASE`	`--incremental-base`
`SPECCRITIC_INCREMENTAL_MODE`	`--incremental-mode`
`SPECCRITIC_INCREMENTAL_MAX_CHANGE_RATIO`	`--incremental-max-change-ratio`
`SPECCRITIC_INCREMENTAL_MAX_REMAP_FAILURE_RATIO`	`--incremental-max-remap-failure-ratio`
`SPECCRITIC_INCREMENTAL_CONTEXT_LINES`	`--incremental-context-lines`
`SPECCRITIC_INCREMENTAL_STRICT_REUSE`	`--incremental-strict-reuse`
`SPECCRITIC_INCREMENTAL_REPORT`	`--incremental-report`
`SPECCRITIC_CONVERGENCE_FROM`	`--convergence-from`
`SPECCRITIC_CONVERGENCE_MODE`	`--convergence-mode`
`SPECCRITIC_CONVERGENCE_STRICT`	`--convergence-strict`
`SPECCRITIC_CONVERGENCE_REPORT`	`--convergence-report`
`SPECCRITIC_COMPLETION_SUGGESTIONS`	`--completion-suggestions`
`SPECCRITIC_COMPLETION_MODE`	`--completion-mode`
`SPECCRITIC_COMPLETION_TEMPLATE`	`--completion-template`
`SPECCRITIC_COMPLETION_MAX_PATCHES`	`--completion-max-patches`
`SPECCRITIC_COMPLETION_OPEN_DECISIONS`	`--completion-open-decisions`

Validation rules:

- --chunk-lines must be greater than 0.
- --chunk-overlap must be >= 0 and less than --chunk-lines.
- --chunk-min-lines must be >= 0.
- --chunk-token-threshold must be greater than 0.
- --chunk-concurrency must be between 1 and 16.
- --synthesis-line-threshold must be >= 0.
- --incremental-mode must be auto, on, or off.
- --incremental-mode on requires --incremental-from.
- --incremental-max-change-ratio must be > 0 and <= 1.
- --incremental-max-remap-failure-ratio must be >= 0 and <= 1.
- --incremental-context-lines must be >= 0.
- --convergence-mode must be auto, on, or off.
- --convergence-mode on requires --convergence-from.
- --completion-mode must be auto, on, or off.
- --completion-mode on enables completion output and does not require --completion-suggestions.
- --completion-template must be profile, general, backend-api, regulated-system, or event-driven.
- --completion-max-patches must be >= 0.

Profiles

Profiles tune the evaluation for different specification types.

`general` (default)

Applies to any software specification. Flags vague phrases (fast, quickly, as needed, TBD) and enforces that all failure modes and interfaces are defined.

`backend-api`

Requires sections for Authentication, Error Codes, and Rate Limiting. Every endpoint must define request/response schemas. All error codes must be enumerated. Rate limits must be expressed as numeric values with time windows.

speccritic check SPEC.md --profile backend-api

`regulated-system`

For specifications subject to compliance requirements. Requires sections for Audit Trail, Data Retention, and Access Control. Data retention periods must be concrete durations (e.g., "7 years", not "a reasonable period"). Every state transition must be enumerable and auditable.

speccritic check SPEC.md --profile regulated-system

`event-driven`

For event-driven architectures. Requires sections for Event Schema, Delivery Guarantees, and Consumer Failure. Every event type must state delivery semantics (at-least-once vs. exactly-once). Consumer failure modes and retry policies must be specified.

speccritic check SPEC.md --profile event-driven

Defect Categories

Category	Description
`NON_TESTABLE_REQUIREMENT`	Requirement cannot be verified by a test
`AMBIGUOUS_BEHAVIOR`	Two engineers could implement differently
`CONTRADICTION`	Two statements cannot both be true
`MISSING_FAILURE_MODE`	What happens when X fails is not stated
`UNDEFINED_INTERFACE`	A referenced interface has no specification
`MISSING_INVARIANT`	A property that must always hold is not stated
`SCOPE_LEAK`	Spec describes implementation, not behavior
`ORDERING_UNDEFINED`	Sequence of operations is ambiguous
`TERMINOLOGY_INCONSISTENT`	Same concept named differently
`UNSPECIFIED_CONSTRAINT`	Implicit constraint not made explicit
`ASSUMPTION_REQUIRED`	Must assume something unstated to implement

Verdicts and Scoring

Verdicts

Verdict	Meaning
`VALID`	No issues found; spec is consistent and testable
`VALID_WITH_GAPS`	Has WARN or INFO issues; implementation is possible but risky
`INVALID`	Has at least one CRITICAL issue or CRITICAL question; spec cannot be safely implemented

Score

The score starts at 100 and is reduced for each finding:

Severity	Deduction
CRITICAL	−20
WARN	−7
INFO	−2

Score is clamped at 0. Both score and verdict are computed before --severity-threshold filtering.
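
The scoring and verdict rules above can be written out as a short Python sketch. The function names are hypothetical; the deductions, the clamp at 0, and the verdict conditions come from the tables in this section.

```python
DEDUCTIONS = {"CRITICAL": 20, "WARN": 7, "INFO": 2}


def score(critical: int, warn: int, info: int) -> int:
    """Start at 100, subtract per finding, and clamp at 0."""
    total = (100
             - DEDUCTIONS["CRITICAL"] * critical
             - DEDUCTIONS["WARN"] * warn
             - DEDUCTIONS["INFO"] * info)
    return max(0, total)


def verdict(critical_issues: int, warn: int, info: int,
            critical_questions: int = 0) -> str:
    """Map finding counts to a verdict."""
    if critical_issues > 0 or critical_questions > 0:
        return "INVALID"
    if warn > 0 or info > 0:
        return "VALID_WITH_GAPS"
    return "VALID"
```

For example, one CRITICAL, two WARN, and three INFO findings give `score(1, 2, 3) == 60` (100 − 20 − 14 − 6), and six CRITICAL findings clamp to 0 rather than going negative.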

Output Format

JSON (default)

{
  "tool": "speccritic",
  "version": "0.1.0",
  "input": {
    "spec_file": "SPEC.md",
    "spec_hash": "sha256:a3f1...",
    "context_files": [],
    "profile": "general",
    "strict": false,
    "severity_threshold": "info"
  },
  "summary": {
    "verdict": "INVALID",
    "score": 37,
    "critical_count": 2,
    "warn_count": 3,
    "info_count": 1
  },
  "issues": [
    {
      "id": "ISSUE-0001",
      "severity": "CRITICAL",
      "category": "NON_TESTABLE_REQUIREMENT",
      "title": "Performance requirement is not measurable",
      "description": "The spec requires the system to be 'fast' without defining metrics.",
      "evidence": [
        {
          "path": "SPEC.md",
          "line_start": 12,
          "line_end": 12,
          "quote": "The system must respond fast."
        }
      ],
      "impact": "No acceptance test can be written.",
      "recommendation": "Define a concrete latency target, e.g. P99 ≤ 200ms under 500 concurrent users.",
      "blocking": true,
      "tags": []
    }
  ],
  "questions": [...],
  "patches": [...],
  "meta": {
    "model": "anthropic:claude-sonnet-4-20250514",
    "temperature": 0.2,
    "completion": {
      "enabled": true,
      "mode": "auto",
      "template": "backend-api",
      "generated_patches": 2,
      "skipped_suggestions": 1,
      "open_decisions": 3
    }
  }
}

Note: summary counts always reflect all issues regardless of --severity-threshold. The issues array is filtered. The input.severity_threshold field records which filter was applied.
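
A consumer of the JSON report might look like the following Python sketch. It reads only fields shown in the example above (`summary`, `issues`, `evidence`); the function names are hypothetical, and the `questions`, `patches`, and `meta` sections are ignored because their full shape is not shown here.

```python
import json


def load_report(path: str) -> dict:
    """Read a report previously written with --format json --out <path>."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(report: dict) -> list[str]:
    """Return one headline plus one line per piece of evidence."""
    summary = report["summary"]
    lines = [f"{summary['verdict']} score={summary['score']}"]
    for issue in report.get("issues", []):
        for ev in issue.get("evidence", []):
            lines.append(
                f"{issue['severity']} {issue['id']} "
                f"{ev['path']}:{ev['line_start']}-{ev['line_end']} {issue['title']}"
            )
    return lines


def filtered_out_count(report: dict) -> int:
    """Issues counted in the summary but hidden by --severity-threshold."""
    summary = report["summary"]
    total = (summary["critical_count"]
             + summary["warn_count"]
             + summary["info_count"])
    return total - len(report.get("issues", []))
```

A typical call would be `print("\n".join(summarize(load_report("spec-review.json"))))`. A non-zero `filtered_out_count` means the threshold hid issues that still counted toward the score and verdict.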

Patches

When the LLM suggests corrections, they are included in the patches array and optionally written to --patch-out in diff-match-patch format:

speccritic check SPEC.md --patch-out spec.patch

Patches are advisory—they are minimal textual corrections, never wholesale rewrites. Completion patches are also advisory and are labeled separately in Markdown, web output, and patch comments when written with --patch-out.

Exit Codes

Code	Meaning
`0`	Success; verdict below `--fail-on` threshold (or no threshold set)
`2`	Verdict meets or exceeds `--fail-on` threshold
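`3`	Input error: invalid flags, file not found, or LLM provider/model env vars unset with `--offline`; also returned when `--convergence-mode on` cannot use the previous report and when `--completion-mode on` cannot generate a required patch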
`4`	Provider error: failed to create LLM provider (bad format, missing API key)
`5`	Model output invalid: LLM response failed schema validation after one retry
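
A wrapper script can route on these codes. The Python sketch below is one possible approach, not an official integration: the function names are hypothetical, the short descriptions paraphrase the table above, and the verdict ordering VALID < VALID_WITH_GAPS < INVALID is inferred from "meets or exceeds".

```python
import subprocess
from typing import Optional

EXIT_MEANINGS = {
    0: "success",
    2: "verdict meets or exceeds --fail-on threshold",
    3: "input error",
    4: "provider error",
    5: "model output invalid",
}

VERDICT_RANK = {"VALID": 0, "VALID_WITH_GAPS": 1, "INVALID": 2}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def fails_threshold(verdict: str, fail_on: Optional[str]) -> bool:
    """True when a verdict would trigger exit code 2 for the given --fail-on."""
    if fail_on is None:
        return False
    return VERDICT_RANK[verdict] >= VERDICT_RANK[fail_on]


def run_check(spec_path: str, *flags: str) -> tuple[int, str]:
    """Run the CLI (assumed to be on PATH) and describe its exit code."""
    proc = subprocess.run(
        ["speccritic", "check", spec_path, *flags],
        capture_output=True, text=True,
    )
    return proc.returncode, describe_exit(proc.returncode)
```

Treating codes 3, 4, and 5 differently from 2 is useful in CI: code 2 means the spec needs work, while the others point at configuration or provider problems.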

Context Files

Use --context to provide grounding documents that inform the evaluation without adding requirements:

speccritic check SPEC.md \
  --context glossary.md \
  --context architecture-overview.md \
  --context compliance-notes.md

Context is used for reference only—it is never used to infer requirements. Each file is redacted independently before being sent to the LLM.

Strict Mode

In strict mode, all silence is treated as ambiguity:

speccritic check SPEC.md --strict

Any behavior not explicitly stated is flagged. Any assumption required to implement is filed as CRITICAL and tagged assumption. Use this for specifications that must be complete before implementation starts and where no ambiguity is acceptable.

Security and Privacy

- Redaction is always applied before the LLM call. The following patterns are replaced with [REDACTED] (line structure is preserved for accurate evidence citations):
  - PEM key blocks
  - AWS access key IDs (AKIA...)
  - API secret keys (sk-...)
  - JWT tokens
  - Bearer tokens (≥ 20 characters)
  - Inline password assignments
- No telemetry. Nothing is logged or transmitted beyond the LLM call.
- --debug dumps the full redacted prompt to stderr. Do not use in environments where stderr is captured in shared logs.
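
To show how redaction can preserve line structure, here is a rough Python sketch. The regular expressions are illustrative guesses at the listed pattern families, not SpecCritic's actual patterns, and `redact` is a hypothetical name. The key point is that a multi-line PEM block is replaced line by line, so evidence line numbers stay valid.

```python
import re

REDACTED = "[REDACTED]"

# Multi-line PEM blocks are handled separately to keep the line count stable.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL
)

# (pattern, replacement) pairs; all patterns are assumptions of this sketch.
INLINE_RULES = [
    (re.compile(r"(password\s*[:=]\s*)\S+", re.IGNORECASE), r"\g<1>" + REDACTED),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), REDACTED),
    (re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/-]{20,}=*"), r"\g<1>" + REDACTED),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"), REDACTED),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), REDACTED),
]


def redact(text: str) -> str:
    """Replace secret-looking values without changing the number of lines."""
    def mask_block(match: re.Match) -> str:
        # One marker per original line of the block.
        return "\n".join(REDACTED for _ in match.group(0).split("\n"))

    text = PEM_BLOCK.sub(mask_block, text)
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Since every replacement stays on its original line, a finding that cites line 12 of the redacted text still points at line 12 of the original spec.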

CI Integration

# GitHub Actions example
- name: Check specification
  env:
    SPECCRITIC_LLM_PROVIDER: anthropic
    SPECCRITIC_LLM_MODEL: claude-sonnet-4-20250514
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    speccritic check SPEC.md \
      --offline \
      --fail-on INVALID \
      --severity-threshold warn \
      --out spec-review.json

The --offline flag ensures the run fails immediately (exit 3) if SPECCRITIC_LLM_PROVIDER and SPECCRITIC_LLM_MODEL are not set, preventing accidental use of the default model in CI.

For a credentials-free deterministic gate:

- name: Preflight specification
  run: |
    speccritic check SPEC.md \
      --preflight-mode only \
      --fail-on INVALID

Development

# Run all tests
make test

# Run a specific test
go test ./cmd/speccritic/... -run TestRunCheck_BadSpec_INVALID -v

# Build CLI and web binaries
make build-all

# Run the local web UI
make run-web

# Code review (staged changes)
prism review staged

Agentic Integration

See WORKFLOW.md for a detailed guide on integrating SpecCritic into an agentic coding system (Claude Code, Cursor, or any LLM-based agent), including:

- The canonical spec → plan → implement gate order
- How to parse JSON output and route on verdict
- Handling questions (user decisions) vs. issues (agent-fixable)
- CLAUDE.md snippet, pre-commit hook, and GitHub Actions CI job
- Anti-patterns and a full example agent session

Claude Code Skill

A ready-to-install Claude Code skill lives in examples/claude-code-skill/. It teaches Claude Code when to invoke speccritic, how to parse .speccritic-review.json, and how to route CRITICAL issues (fix in place) vs. CRITICAL questions (ask the user). Install with:

mkdir -p ~/.claude/skills/speccritic
cp examples/claude-code-skill/SKILL.md ~/.claude/skills/speccritic/SKILL.md

See the skill README for project-level install, prerequisites, and customization.

License

MIT — see LICENSE
