SpecCritic evaluates software specifications as formal contracts, identifying defects before implementation begins. It behaves like a hostile contract lawyer—not a collaborator—treating vague language, unverifiable requirements, and missing failure modes as bugs.
$ speccritic check SPEC.md --verbose
INFO: Loading spec: SPEC.md
INFO: Calling LLM: anthropic:claude-sonnet-4-20250514
INFO: Rendering output (format: json)
Verdict: INVALID Score: 60/100 Critical: 2 Warn: 3 Info: 1
A specification is invalid when two competent engineers could implement it differently and both believe they followed it. SpecCritic enforces this contract before a single line of code is written.
Intended workflow:
- Write or update `SPEC.md`
- Run `speccritic check SPEC.md`
- Fix issues (revise the spec or answer clarification questions)
- Repeat until the verdict is acceptable
- Only then begin implementation
go install github.com/dshills/speccritic/cmd/speccritic@latest

Or build from source:
git clone https://github.com/dshills/speccritic.git
cd speccritic
go build -ldflags "-X main.version=$(git describe --tags --always)" -o speccritic ./cmd/speccritic/

Run a fast deterministic preflight check without model credentials:
speccritic check SPEC.md --preflight-mode only

Use this first when iterating on a spec. It catches obvious placeholders, vague language, weak requirements, missing sections, undefined acronyms, and unmeasurable criteria locally before making an LLM call.
Set your model and API key when you are ready for a full review:
export SPECCRITIC_LLM_PROVIDER=anthropic
export SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

Run a check:
speccritic check SPEC.md

Output as Markdown:

speccritic check SPEC.md --format md

Fail in CI if the spec is invalid:

speccritic check SPEC.md --fail-on INVALID

Gate CI before any LLM call:

speccritic check SPEC.md --preflight-mode gate --fail-on INVALID

Force parallel section chunking for a large spec:

speccritic check SPEC.md --chunking on --chunk-concurrency 4

Reuse a previous JSON result while iterating on a changed spec:
speccritic check SPEC.md --format json --out previous.json
cp SPEC.md SPEC.previous.md
# edit SPEC.md
speccritic check SPEC.md --incremental-from previous.json --incremental-base SPEC.previous.md

Generate advisory completion patches for common profile gaps:
speccritic check SPEC.md \
--preflight-mode only \
--completion-suggestions \
--patch-out completion.patch

This works with preflight-only runs because completion can use deterministic preflight findings as its source findings.
SpecCritic also includes a local Go web UI for reviewing specs in the browser. It uses the same review pipeline as the CLI, then renders the uploaded spec with line numbers, summary metrics, finding annotations, provider/model metadata, and modal issue details.
The web UI is intended for local review sessions. It does not replace the CLI and it does not change CLI behavior. Large uploaded specs use the same automatic chunked review path as the CLI and still render as one merged result.
Set the same provider configuration used by the CLI:
export SPECCRITIC_LLM_PROVIDER=anthropic
export SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

Run the web UI:
make run-web

Then open:
http://127.0.0.1:8080
Large Gemini reviews can take several minutes because chunk calls are run serially in the web UI to avoid provider timeouts. The web server default request timeout is 10 minutes; override it with --request-timeout if needed.
From the browser:
- Choose a Markdown or text spec file. Manual text entry is intentionally not supported.
- Select a profile and severity threshold.
- Optionally upload a previous JSON result for convergence tracking and/or incremental rerun, and a previous spec file when using incremental rerun.
- Optionally set convergence mode to track finding status across review iterations.
- Optionally enable completion suggestions to generate draft/advisory patch text for profile-specific missing structure.
- Optionally enable strict mode or disable the default preflight pass.
- Click Check spec.
The left pane lets you choose the provider and model before the review starts. It defaults to the configured environment values when present, otherwise it uses the normal SpecCritic defaults. When the provider changes, the web UI queries that provider's models API using the matching local API key and refreshes the model dropdown; if the query fails, you can still type a model manually.
The Check spec button is disabled until a file is selected and remains disabled while a check is running. During review, the page shows a running indicator and elapsed timer. When the check completes, findings are shown beside the annotated spec. Deterministic findings are labeled Preflight. Incremental, convergence, and completion metadata are shown in the summary when available. Completion patches are labeled draft/advisory, and clicking any finding opens its detail in a modal so the annotated document stays in place.
Use a different address or port with WEB_ADDR:
make run-web WEB_ADDR=127.0.0.1:8081

Build, install, or run the web binary directly:
make build-web
make install-web
go run ./cmd/speccritic-web --addr 127.0.0.1:8080

`make build-all` builds both `bin/speccritic` and `bin/speccritic-web`.
For live local development with Air, this repository includes .air.toml configured for the web server:
air

The Air config builds `./cmd/speccritic-web` into `./tmp/speccritic-web` and runs it on 127.0.0.1:8090.
Set SPECCRITIC_LLM_PROVIDER and SPECCRITIC_LLM_MODEL, or pass --llm-provider and --llm-model for a single CLI run. If unset, SpecCritic defaults to SPECCRITIC_LLM_PROVIDER=anthropic and SPECCRITIC_LLM_MODEL=claude-sonnet-4-20250514 with a warning to stderr. If a provider is set without a model, SpecCritic uses the default model for that provider. Preflight-only checks do not require model configuration.
Current builds read the split provider/model variables. If you have old shell or CI snippets that set SPECCRITIC_MODEL=provider:model, replace them with the two variables above.
| Provider | API Key Env Var | Model Value Example |
|---|---|---|
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| `openai` | `OPENAI_API_KEY` | `gpt-4o` |
| `gemini` | `GEMINI_API_KEY` | `gemini-2.0-flash` |
export SPECCRITIC_LLM_PROVIDER=openai
export SPECCRITIC_LLM_MODEL=gpt-4o
export OPENAI_API_KEY=sk-...

Preflight is a deterministic local pass that runs before the LLM by default. It is designed to reduce review latency, token usage, and repeated model round trips by catching high-signal defects immediately.
Preflight findings use the same issue schema as LLM findings and participate in the same final scoring and verdict calculation. When the LLM confirms a preflight finding, SpecCritic deduplicates the result and tags the LLM issue with preflight-confirmed and preflight-rule:<ID> instead of showing the same defect twice.
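The dedup step can be pictured with a toy matcher. Matching on `(category, line)` is a simplification for illustration only; the real matcher and finding shape live in the speccritic source, and `dedupe_preflight` is a hypothetical name.

```python
def dedupe_preflight(preflight: list[dict], llm: list[dict]) -> list[dict]:
    """When an LLM finding confirms a preflight finding, keep one copy and
    tag it preflight-confirmed / preflight-rule:<ID> instead of duplicating."""
    pending = {(f["category"], f["line"]): f["rule_id"] for f in preflight}
    merged = []
    for finding in llm:
        rule_id = pending.pop((finding["category"], finding["line"]), None)
        if rule_id is not None:
            finding = dict(finding, tags=finding.get("tags", [])
                           + ["preflight-confirmed", f"preflight-rule:{rule_id}"])
        merged.append(finding)
    # Unconfirmed preflight findings still appear in the final report.
    merged += [f for f in preflight if (f["category"], f["line"]) in pending]
    return merged
```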
Modes:
| Mode | Behavior |
|---|---|
| `warn` | Include preflight findings in the final report and continue to the LLM. This is the default. |
| `gate` | Skip the LLM when blocking preflight findings exist. A finding is blocking when its rule or finding marks it blocking, and all CRITICAL preflight findings are blocking by default. |
| `only` | Run only deterministic preflight checks. No model credentials are required. |
Recommended workflow:
speccritic check SPEC.md --preflight-mode only
# fix deterministic findings
speccritic check SPEC.md --preflight-mode gate --fail-on INVALID
# when preflight is clean enough, run the full review
speccritic check SPEC.md

Suppress a known deterministic false positive with `--preflight-ignore`:
speccritic check SPEC.md --preflight-ignore PREFLIGHT-ACRONYM-001

Useful preflight behavior:
- Redaction still runs before any prompt is built.
- `--preflight-mode only` does not require `SPECCRITIC_LLM_PROVIDER`, `SPECCRITIC_LLM_MODEL`, or provider API keys.
- `--preflight-mode gate` is useful in CI when obvious blocking defects should prevent any provider call.
- `--preflight-profile` defaults to `--profile`; override it only when deterministic checks need a different profile than the LLM review.
Chunked review is an execution strategy for large specs. It splits the redacted spec by Markdown sections, reviews chunks with bounded parallel LLM calls, validates each chunk against the same schema and evidence rules, optionally runs one cross-section synthesis pass, and merges everything back into one normal report.
Small specs still use the existing single-call path by default.
The final output remains a normal SpecCritic report. Chunk internals are not rendered as separate reports, but chunk-related tags may appear on findings:
| Tag | Meaning |
|---|---|
| `chunked-review` | Finding came from chunked review rather than the single-call path. |
| `chunk:<CHUNK-ID>` | Source chunk that emitted the finding. |
| `cross-section` | Chunk reviewer believed the finding depends on another section. |
| `synthesis` | Finding came from the cross-section synthesis pass. |
Modes:
| Mode | Behavior |
|---|---|
| `auto` | Use chunking when the spec has at least `--chunk-min-lines` lines or the estimated prompt is at least `--chunk-token-threshold` tokens. This is the default. |
| `on` | Force chunking whenever an LLM review is needed. |
| `off` | Always use the original single-call LLM path. |
Examples:
# Force chunking for a large spec and allow four concurrent chunk calls.
speccritic check SPEC.md --chunking on --chunk-concurrency 4
# Disable chunking while debugging prompt behavior.
speccritic check SPEC.md --chunking off --debug
# Tune for a rate-limited provider.
speccritic check SPEC.md --chunk-concurrency 1 --chunk-lines 140

Chunking usually reduces wall-clock latency for large specs, but it may increase the total number of provider calls. Provider rate limits, low concurrency, and cross-section synthesis can reduce the speedup. Cross-section defects are still hard: chunk prompts receive a table of contents and summaries, and synthesis can catch contradictions across sections, but no chunking strategy is a substitute for a well-structured spec.
Implementation details:
- Chunking happens after spec loading, redaction, and preflight.
- `auto` mode uses a deterministic local estimate of one token per four UTF-8 bytes. This is a rough heuristic; code-heavy specs and non-English specs may need a lower `--chunk-token-threshold` or forced `--chunking on`.
- Chunk reviews cite only their primary line range; overlap lines are context only.
- Every chunk response must include `meta.chunk_summary`; summaries are used for synthesis and are not shown as user-facing output.
- Chunk calls run with bounded concurrency.
- If one chunk fails permanently after the built-in repair attempt, the check fails with a model-output or provider error rather than returning partial results.
- Synthesis runs when chunked review has findings or when the spec is at least `--synthesis-line-threshold` lines. A no-finding chunked review below that threshold skips synthesis.
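The `auto` trigger is simple enough to state as code. This is a sketch of the documented estimate (one token per four UTF-8 bytes) and the documented thresholds; integer flooring and the exact line-counting rule are assumptions, and `auto_should_chunk` is a hypothetical name.

```python
def estimated_tokens(text: str) -> int:
    """Documented local heuristic: one token per four UTF-8 bytes."""
    return len(text.encode("utf-8")) // 4

def auto_should_chunk(text: str, chunk_min_lines: int = 120,
                      chunk_token_threshold: int = 4000) -> bool:
    """Chunk when the spec reaches --chunk-min-lines lines or the estimated
    prompt reaches --chunk-token-threshold tokens (documented defaults)."""
    lines = text.count("\n") + 1
    return lines >= chunk_min_lines or estimated_tokens(text) >= chunk_token_threshold
```

This also shows why code-heavy specs may need tuning: dense ASCII inflates the byte count relative to real tokenizer output.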
Incremental rerun is an execution strategy for iterative edits. It takes a previous SpecCritic JSON report, compares the current spec with the previous spec text, reuses eligible findings from unchanged sections, and reviews only changed ranges when reuse is safe.
The default behavior is conservative. If incremental safety cannot be proven in auto mode, SpecCritic falls back to a normal full review. Use --incremental-mode on only when you want unsafe incremental conditions to fail instead of falling back.
Typical CLI workflow:
# 1. Save a full JSON result for the current spec.
speccritic check SPEC.md --format json --out previous.json
# 2. Preserve the exact spec text that produced previous.json.
cp SPEC.md SPEC.previous.md
# 3. Edit SPEC.md, then rerun against only changed sections when safe.
speccritic check SPEC.md \
--incremental-from previous.json \
--incremental-base SPEC.previous.md \
--incremental-report

Important behavior:
- `--incremental-from` must point to valid SpecCritic JSON output.
- `--incremental-base` is required when the current spec content differs from the previous report hash. The JSON report contains findings and metadata, not the old spec text needed for section diffing.
- If the current spec hash matches the previous report hash, SpecCritic can reuse eligible findings without a base file.
- Preflight still runs against the full current spec before any incremental LLM call.
- Reused findings are tagged `incremental-reused`; new findings from changed ranges are tagged `incremental-review`.
- `--incremental-report` adds optional `meta.incremental` details to JSON output. Markdown output keeps the normal human-readable report shape.
- The web UI exposes the same workflow with optional Previous JSON result, Previous spec file, and Mode controls. Uploaded previous results are used only for the current request.
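The two documented safety ratios behave like simple gates. Below is an illustrative model using the default thresholds; the actual auto-mode fallback logic considers more conditions than these two, and `incremental_is_safe` is a hypothetical name.

```python
def incremental_is_safe(changed_lines: int, total_lines: int,
                        remap_failures: int, prior_findings: int,
                        max_change_ratio: float = 0.35,
                        max_remap_failure_ratio: float = 0.25) -> bool:
    """Fall back to a full review when too much of the spec changed or too
    many prior findings failed to remap onto the new text."""
    if total_lines == 0:
        return False  # edge handling here is an assumption
    if changed_lines / total_lines > max_change_ratio:
        return False
    if prior_findings and remap_failures / prior_findings > max_remap_failure_ratio:
        return False
    return True
```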
Convergence tracking compares the current review result with a previous SpecCritic JSON report and reports progress across iterations. It classifies active findings as new or still_open, and historical findings as resolved, dropped, or untracked.
Typical CLI workflow:
# 1. Save a baseline JSON result.
speccritic check SPEC.md --format json --out review-1.json
# 2. Edit SPEC.md, then compare the new result with the baseline.
speccritic check SPEC.md \
--convergence-from review-1.json \
--format md

Use `--convergence-mode auto` for normal iteration. It keeps the current review successful even when comparison is partial or unavailable, including strict compatibility mismatches. Use `--convergence-mode on` when a missing, invalid, or strictly incompatible previous report should fail with exit code 3. Use `--convergence-mode off` to ignore a configured convergence baseline.
Important behavior:
- Convergence runs after the current review is complete.
- Current score, verdict, patches, and `--fail-on` behavior are based only on current findings.
- Resolved historical findings do not affect the current score or verdict.
- `dropped` means a historical finding no longer participates because the current threshold filters it out or its prior content is no longer applicable.
- Preflight-only runs cannot prove prior LLM findings are resolved, so those findings are marked `untracked` unless they match current preflight findings.
- Convergence matching is local and does not send previous report contents to a provider.
- When `--convergence-report` is enabled, JSON output includes `meta.convergence` and Markdown output includes a human-readable convergence summary.
- The web UI can use the uploaded previous JSON result for convergence tracking and/or incremental rerun; incremental rerun also needs the previous spec file when the spec changed. Uploaded previous results are not stored server-side.
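The status buckets can be illustrated with a set-based toy. Real matching is content-based rather than ID-based, and `untracked` (the preflight-only case) is omitted for brevity; `classify_convergence` is a hypothetical name.

```python
def classify_convergence(current_ids, previous_ids, dropped_ids=()):
    """new / still_open describe current findings; resolved / dropped
    describe historical ones."""
    current, previous, dropped = set(current_ids), set(previous_ids), set(dropped_ids)
    return {
        "new": sorted(current - previous),
        "still_open": sorted(current & previous),
        "resolved": sorted(previous - current - dropped),
        "dropped": sorted(previous & dropped),
    }
```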
Completion suggestions are an optional advisory layer that turns current findings into draft patch text for common missing profile structure. They are never applied automatically, never reduce or suppress findings, and never affect score, verdict, or --fail-on behavior.
Completion patch construction is deterministic for the same spec text, findings, questions, profile, and completion options. Completion runs after preflight, chunk merge, incremental merge, and convergence processing. It uses validated current findings/questions and the same redacted review inputs used by the preflight or LLM reviewer, but exact-match patch construction targets the unredacted current spec text so `--patch-out` remains applicable.

Suggested patches are emitted only when SpecCritic can find a safe, exact edit location. When multiple suggestions target overlapping text, the first suggestion by target line, supported severity order, source ID, template section order, and text is emitted; later overlapping suggestions are skipped. Patches are also skipped when the before or after text contains redaction markers or secret-looking values. Missing behavior that requires user judgment is represented with OPEN DECISION placeholders instead of invented requirements.
Supported built-in templates:
| Template | Intended use |
|---|---|
profile |
Use the selected --profile template. This is the default; the supported review profiles each have a matching completion template. |
general |
General-purpose spec structure and testability gaps. |
backend-api |
Authentication, authorization, error responses, rate limits, and idempotency gaps. |
regulated-system |
Audit trail, access control, retention, deletion, and compliance evidence gaps. |
event-driven |
Event schema, delivery guarantees, ordering, retry, and dead-letter behavior gaps. |
Examples:
# Add advisory completion patches to normal review output.
speccritic check SPEC.md --completion-suggestions --format md
# Require safe completion patches for blocking missing-section findings.
speccritic check SPEC.md --completion-mode on
# Generate at most three backend API completion patches.
speccritic check SPEC.md \
--profile backend-api \
--completion-suggestions \
--completion-max-patches 3 \
--patch-out completion.patch

Completion mode behavior:
| Mode | Behavior |
|---|---|
| `auto` | Emit suggestions only when `--completion-suggestions` is true; suggestions must be tied to current findings/questions and have safe patch locations. |
| `on` | Enable completion even if `--completion-suggestions` is omitted. Require safe generation for blocking missing-section findings; if required patches cannot be generated, exit with code 3 as a patch requirement error. |
| `off` | Disable completion output even when `--completion-suggestions` is set. |
When completion metadata is present, JSON output includes meta.completion with the enabled status, effective mode, template, generated_patches, skipped_suggestions, and open_decisions. Markdown and web output label completion text as draft/advisory.
A blocking missing-section finding is a current finding with blocking: true, category UNSPECIFIED_CONSTRAINT, and the missing-section tag.
In auto mode, completion output is produced only when --completion-suggestions or SPECCRITIC_COMPLETION_SUGGESTIONS=true is set. --completion-max-patches=0 is valid in all modes; in on mode it causes exit code 3 when any blocking missing-section finding requires a patch. Hitting the patch limit for a required blocking finding also counts as a failure to generate the required patch and exits with code 3.
--completion-mode takes precedence over --completion-suggestions: off disables completion, on enables completion, and auto follows the boolean flag.
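That precedence is easy to misread, so here it is as a truth table in code form. This is an illustrative helper, not the tool's implementation; `completion_enabled` is a hypothetical name.

```python
def completion_enabled(mode: str, suggestions_flag: bool) -> bool:
    """--completion-mode wins: off disables, on enables, auto follows the
    --completion-suggestions boolean."""
    if mode == "off":
        return False
    if mode == "on":
        return True
    return suggestions_flag  # auto
```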
speccritic check <spec-file> [flags]
| Flag | Default | Description |
|---|---|---|
| `--format` | `json` | Output format: `json` or `md` |
| `--out` | (stdout) | Write output to file |
| `--profile` | `general` | Evaluation profile (see Profiles) |
| `--context` | (none) | Context file paths; can be repeated |
| `--strict` | `false` | Treat all unstated behavior as ambiguous |
| `--fail-on` | (none) | Exit 2 if verdict meets or exceeds the threshold; valid values are case-sensitive `VALID_WITH_GAPS` or `INVALID` |
| `--severity-threshold` | `info` | Minimum severity to include in output: `info`, `warn`, `critical` |
| `--patch-out` | (none) | Write suggested patches to file |
| `--llm-provider` | env/default | LLM provider override: `anthropic`, `openai`, or `gemini` |
| `--llm-model` | env/provider default | LLM model override |
| `--temperature` | `0.2` | LLM temperature (0.0–2.0) |
| `--max-tokens` | `4096` | Maximum response tokens |
| `--offline` | `false` | Exit 3 if LLM provider/model env vars are not set (CI enforcement) |
| `--verbose` | `false` | Print processing steps to stderr |
| `--debug` | `false` | Dump full prompt to stderr (use only in trusted environments) |
| `--preflight` | `true` | Run deterministic checks before LLM review |
| `--preflight-mode` | `warn` | Preflight mode: `warn`, `gate`, or `only` |
| `--preflight-profile` | same as `--profile` | Override the preflight rule profile |
| `--preflight-ignore` | (none) | Suppress a preflight rule ID; can be repeated |
| `--chunking` | `auto` | Chunking mode: `auto`, `on`, or `off` |
| `--chunk-lines` | `180` | Target maximum source lines per chunk before overlap |
| `--chunk-overlap` | `20` | Neighboring lines included before and after each chunk for context |
| `--chunk-min-lines` | `120` | Minimum line count before `auto` may use chunking |
| `--chunk-token-threshold` | `4000` | Estimated prompt-token count before `auto` may use chunking |
| `--chunk-concurrency` | `3` | Maximum concurrent chunk LLM calls |
| `--synthesis-line-threshold` | `240` | Minimum total line count before a no-finding chunked review may run synthesis |
| `--incremental-from` | (none) | Previous SpecCritic JSON report used as the incremental baseline |
| `--incremental-base` | (none) | Previous spec text used for section diffing when the current spec changed |
| `--incremental-mode` | `auto` | Incremental mode: `auto`, `on`, or `off` |
| `--incremental-max-change-ratio` | `0.35` | Maximum changed-line ratio allowed before fallback or failure |
| `--incremental-max-remap-failure-ratio` | `0.25` | Maximum prior-finding remap failure ratio allowed before fallback or failure |
| `--incremental-context-lines` | `20` | Neighboring unchanged lines included around changed sections |
| `--incremental-strict-reuse` | `true` | Reuse prior findings only when evidence remaps exactly or by unchanged line hash |
| `--incremental-report` | `false` | Include optional incremental metadata in JSON output |
| `--convergence-from` | (none) | Previous SpecCritic JSON report used as the convergence baseline |
| `--convergence-mode` | `auto` | Convergence mode: `auto`, `on`, or `off` |
| `--convergence-strict` | `false` | Require strict profile, strict-mode, threshold, and redaction compatibility; in `on` mode, mismatches exit 3 |
| `--convergence-report` | `true` | Include optional convergence metadata when convergence is requested |
| `--completion-suggestions` | `false` | Generate profile-specific advisory completion patches after review |
| `--completion-mode` | `auto` | Completion mode: `auto`, `on`, or `off` |
| `--completion-template` | `profile` | Template set to use: `profile`, `general`, `backend-api`, `regulated-system`, or `event-driven` |
| `--completion-max-patches` | `8` | Maximum completion patches to emit |
| `--completion-open-decisions` | `true` | Include OPEN DECISION placeholders for missing behavior that requires judgment |
Chunking, incremental, convergence, and completion environment defaults are also supported when the matching flag is not provided:
| Env Var | Matching Flag |
|---|---|
| `SPECCRITIC_CHUNKING` | `--chunking` |
| `SPECCRITIC_CHUNK_LINES` | `--chunk-lines` |
| `SPECCRITIC_CHUNK_OVERLAP` | `--chunk-overlap` |
| `SPECCRITIC_CHUNK_MIN_LINES` | `--chunk-min-lines` |
| `SPECCRITIC_CHUNK_TOKEN_THRESHOLD` | `--chunk-token-threshold` |
| `SPECCRITIC_CHUNK_CONCURRENCY` | `--chunk-concurrency` |
| `SPECCRITIC_SYNTHESIS_LINE_THRESHOLD` | `--synthesis-line-threshold` |
| `SPECCRITIC_INCREMENTAL_FROM` | `--incremental-from` |
| `SPECCRITIC_INCREMENTAL_BASE` | `--incremental-base` |
| `SPECCRITIC_INCREMENTAL_MODE` | `--incremental-mode` |
| `SPECCRITIC_INCREMENTAL_MAX_CHANGE_RATIO` | `--incremental-max-change-ratio` |
| `SPECCRITIC_INCREMENTAL_MAX_REMAP_FAILURE_RATIO` | `--incremental-max-remap-failure-ratio` |
| `SPECCRITIC_INCREMENTAL_CONTEXT_LINES` | `--incremental-context-lines` |
| `SPECCRITIC_INCREMENTAL_STRICT_REUSE` | `--incremental-strict-reuse` |
| `SPECCRITIC_INCREMENTAL_REPORT` | `--incremental-report` |
| `SPECCRITIC_CONVERGENCE_FROM` | `--convergence-from` |
| `SPECCRITIC_CONVERGENCE_MODE` | `--convergence-mode` |
| `SPECCRITIC_CONVERGENCE_STRICT` | `--convergence-strict` |
| `SPECCRITIC_CONVERGENCE_REPORT` | `--convergence-report` |
| `SPECCRITIC_COMPLETION_SUGGESTIONS` | `--completion-suggestions` |
| `SPECCRITIC_COMPLETION_MODE` | `--completion-mode` |
| `SPECCRITIC_COMPLETION_TEMPLATE` | `--completion-template` |
| `SPECCRITIC_COMPLETION_MAX_PATCHES` | `--completion-max-patches` |
| `SPECCRITIC_COMPLETION_OPEN_DECISIONS` | `--completion-open-decisions` |
Validation rules:
- `--chunk-lines` must be greater than 0.
- `--chunk-overlap` must be >= 0 and less than `--chunk-lines`.
- `--chunk-min-lines` must be >= 0.
- `--chunk-token-threshold` must be greater than 0.
- `--chunk-concurrency` must be between 1 and 16.
- `--synthesis-line-threshold` must be >= 0.
- `--incremental-mode` must be `auto`, `on`, or `off`.
- `--incremental-mode on` requires `--incremental-from`.
- `--incremental-max-change-ratio` must be > 0 and <= 1.
- `--incremental-max-remap-failure-ratio` must be >= 0 and <= 1.
- `--incremental-context-lines` must be >= 0.
- `--convergence-mode` must be `auto`, `on`, or `off`.
- `--convergence-mode on` requires `--convergence-from`.
- `--completion-mode` must be `auto`, `on`, or `off`.
- `--completion-mode on` enables completion output and does not require `--completion-suggestions`.
- `--completion-template` must be `profile`, `general`, `backend-api`, `regulated-system`, or `event-driven`.
- `--completion-max-patches` must be >= 0.
Profiles tune the evaluation for different specification types.
Applies to any software specification. Flags vague phrases (fast, quickly, as needed, TBD) and enforces that all failure modes and interfaces are defined.
Requires sections for Authentication, Error Codes, and Rate Limiting. Every endpoint must define request/response schemas. All error codes must be enumerated. Rate limits must be expressed as numeric values with time windows.
speccritic check SPEC.md --profile backend-api

For specifications subject to compliance requirements. Requires sections for Audit Trail, Data Retention, and Access Control. Data retention periods must be concrete durations (e.g., "7 years", not "a reasonable period"). Every state transition must be enumerable and auditable.
speccritic check SPEC.md --profile regulated-system

For event-driven architectures. Requires sections for Event Schema, Delivery Guarantees, and Consumer Failure. Every event type must state delivery semantics (at-least-once vs. exactly-once). Consumer failure modes and retry policies must be specified.
speccritic check SPEC.md --profile event-driven

| Category | Description |
|---|---|
| `NON_TESTABLE_REQUIREMENT` | Requirement cannot be verified by a test |
| `AMBIGUOUS_BEHAVIOR` | Two engineers could implement differently |
| `CONTRADICTION` | Two statements cannot both be true |
| `MISSING_FAILURE_MODE` | What happens when X fails is not stated |
| `UNDEFINED_INTERFACE` | A referenced interface has no specification |
| `MISSING_INVARIANT` | A property that must always hold is not stated |
| `SCOPE_LEAK` | Spec describes implementation, not behavior |
| `ORDERING_UNDEFINED` | Sequence of operations is ambiguous |
| `TERMINOLOGY_INCONSISTENT` | Same concept named differently |
| `UNSPECIFIED_CONSTRAINT` | Implicit constraint not made explicit |
| `ASSUMPTION_REQUIRED` | Must assume something unstated to implement |
| Verdict | Meaning |
|---|---|
| `VALID` | No issues found; spec is consistent and testable |
| `VALID_WITH_GAPS` | Has WARN or INFO issues; implementation is possible but risky |
| `INVALID` | Has at least one CRITICAL issue or CRITICAL question; spec cannot be safely implemented |
The score starts at 100, with a deduction per finding:
| Severity | Deduction |
|---|---|
| CRITICAL | −20 |
| WARN | −7 |
| INFO | −2 |
Score is clamped at 0. Both score and verdict are computed before --severity-threshold filtering.
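The deduction rule is deterministic, so it can be stated directly as code. This is an illustrative Python model of the documented arithmetic and verdict table, not the tool's actual Go implementation; `score_and_verdict` is a hypothetical name.

```python
def score_and_verdict(critical: int, warn: int, info: int,
                      critical_questions: int = 0) -> tuple[int, str]:
    """Documented deductions: CRITICAL -20, WARN -7, INFO -2, clamped at 0.
    Any CRITICAL issue or question makes the verdict INVALID; WARN or INFO
    alone give VALID_WITH_GAPS."""
    score = max(0, 100 - 20 * critical - 7 * warn - 2 * info)
    if critical > 0 or critical_questions > 0:
        verdict = "INVALID"
    elif warn > 0 or info > 0:
        verdict = "VALID_WITH_GAPS"
    else:
        verdict = "VALID"
    return score, verdict
```

For example, a single WARN finding yields 93 and VALID_WITH_GAPS.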
{
"tool": "speccritic",
"version": "0.1.0",
"input": {
"spec_file": "SPEC.md",
"spec_hash": "sha256:a3f1...",
"context_files": [],
"profile": "general",
"strict": false,
"severity_threshold": "info"
},
"summary": {
"verdict": "INVALID",
"score": 60,
"critical_count": 2,
"warn_count": 3,
"info_count": 1
},
"issues": [
{
"id": "ISSUE-0001",
"severity": "CRITICAL",
"category": "NON_TESTABLE_REQUIREMENT",
"title": "Performance requirement is not measurable",
"description": "The spec requires the system to be 'fast' without defining metrics.",
"evidence": [
{
"path": "SPEC.md",
"line_start": 12,
"line_end": 12,
"quote": "The system must respond fast."
}
],
"impact": "No acceptance test can be written.",
"recommendation": "Define a concrete latency target, e.g. P99 ≤ 200ms under 500 concurrent users.",
"blocking": true,
"tags": []
}
],
"questions": [...],
"patches": [...],
"meta": {
"model": "anthropic:claude-sonnet-4-20250514",
"temperature": 0.2,
"completion": {
"enabled": true,
"mode": "auto",
"template": "backend-api",
"generated_patches": 2,
"skipped_suggestions": 1,
"open_decisions": 3
}
}
}

Note:
`summary` counts always reflect all issues regardless of `--severity-threshold`. The `issues` array is filtered. The `input.severity_threshold` field records which filter was applied.
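A downstream script can rely on the summary counts being unfiltered while the issues array respects the threshold. Here is a minimal reader, assuming the field names shown in the example report; `summarize_report` is a hypothetical name.

```python
import json

def summarize_report(path: str) -> tuple[str, int, list[str]]:
    """Return (verdict, score, blocking issue IDs) from a SpecCritic JSON
    report. summary counts cover all findings; issues is threshold-filtered."""
    with open(path) as fh:
        report = json.load(fh)
    summary = report["summary"]
    blocking = [i["id"] for i in report.get("issues", []) if i.get("blocking")]
    return summary["verdict"], summary["score"], blocking
```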
When the LLM suggests corrections, they are included in the patches array and optionally written to --patch-out in diff-match-patch format:
speccritic check SPEC.md --patch-out spec.patch

Patches are advisory: they are minimal textual corrections, never wholesale rewrites. Completion patches are also advisory and are labeled separately in Markdown, web output, and patch comments when written with `--patch-out`.
| Code | Meaning |
|---|---|
| `0` | Success; verdict below `--fail-on` threshold (or no threshold set) |
| `2` | Verdict meets or exceeds `--fail-on` threshold |
| `3` | Input error: invalid flags, file not found, or LLM provider/model env vars unset with `--offline` |
| `4` | Provider error: failed to create LLM provider (bad format, missing API key) |
| `5` | Model output invalid: LLM response failed schema validation after one retry |
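Because the codes separate review outcomes (0 and 2) from tool failures (3, 4, 5), a CI wrapper can route them differently: a rejected spec is actionable feedback, while a tool error should not be reported as a spec problem. A sketch, with `ci_outcome` as a hypothetical helper name; the mapping itself follows the table above.

```python
EXIT_MEANINGS = {
    0: "success: verdict below --fail-on threshold (or no threshold set)",
    2: "verdict met or exceeded --fail-on threshold",
    3: "input error (flags, files, or --offline enforcement)",
    4: "provider error (bad provider format or missing API key)",
    5: "model output failed schema validation after one retry",
}

def ci_outcome(code: int) -> str:
    """Distinguish 'the spec failed review' from 'the tool could not run'."""
    if code == 0:
        return "pass"
    if code == 2:
        return "spec-rejected"
    return "tool-error"
```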
Use --context to provide grounding documents that inform the evaluation without adding requirements:
speccritic check SPEC.md \
--context glossary.md \
--context architecture-overview.md \
--context compliance-notes.md

Context is used for reference only; it is never used to infer requirements. Each file is redacted independently before being sent to the LLM.
In strict mode, all silence is treated as ambiguity:
speccritic check SPEC.md --strict

Any behavior not explicitly stated is flagged. Any assumption required to implement is filed as CRITICAL and tagged `assumption`. Use this for specifications that must be complete before any ambiguity is acceptable.
- Redaction is always applied before the LLM call. The following patterns are replaced with `[REDACTED]` (line structure is preserved for accurate evidence citations):
  - PEM key blocks
  - AWS access key IDs (`AKIA...`)
  - API secret keys (`sk-...`)
  - JWT tokens
  - Bearer tokens (≥ 20 characters)
  - Inline password assignments
- No telemetry. Nothing is logged or transmitted beyond the LLM call.
- `--debug` dumps the full redacted prompt to stderr. Do not use it in environments where stderr is captured in shared logs.
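The redaction pass can be approximated with in-line substitutions. These regexes are illustrative only; the shipped rules are stricter and also cover PEM blocks, JWTs, and password assignments, and `redact_line` is a hypothetical name.

```python
import re

# Rough stand-ins for the documented patterns (illustrative, not the real rules).
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9\-]{8,}"),  # API secret key
    re.compile(r"Bearer\s+\S{20,}"),      # bearer token of 20+ characters
]

def redact_line(line: str) -> str:
    """Replace secret-looking spans with [REDACTED]. Substituting within the
    line (rather than dropping it) preserves line structure, so evidence
    line numbers stay accurate."""
    for pat in PATTERNS:
        line = pat.sub("[REDACTED]", line)
    return line
```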
# GitHub Actions example
- name: Check specification
env:
SPECCRITIC_LLM_PROVIDER: anthropic
SPECCRITIC_LLM_MODEL: claude-sonnet-4-20250514
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
speccritic check SPEC.md \
--offline \
--fail-on INVALID \
--severity-threshold warn \
--out spec-review.json

The `--offline` flag ensures the run fails immediately (exit 3) if `SPECCRITIC_LLM_PROVIDER` and `SPECCRITIC_LLM_MODEL` are not set, preventing accidental use of the default model in CI.
For a credentials-free deterministic gate:
- name: Preflight specification
run: |
speccritic check SPEC.md \
--preflight-mode only \
--fail-on INVALID

# Run all tests
make test
# Run a specific test
go test ./cmd/speccritic/... -run TestRunCheck_BadSpec_INVALID -v
# Build CLI and web binaries
make build-all
# Run the local web UI
make run-web
# Code review (staged changes)
prism review staged

See WORKFLOW.md for a detailed guide on integrating SpecCritic into an agentic coding system (Claude Code, Cursor, or any LLM-based agent), including:
- The canonical spec → plan → implement gate order
- How to parse JSON output and route on verdict
- Handling questions (user decisions) vs. issues (agent-fixable)
- `CLAUDE.md` snippet, pre-commit hook, and GitHub Actions CI job
- Anti-patterns and a full example agent session
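The questions-vs-issues split can be sketched as a routing function for an agent loop. Field names follow the JSON report example earlier in this README; whether questions carry a severity field is an assumption, and `route_review` is a hypothetical name.

```python
def route_review(report: dict) -> tuple[str, list]:
    """CRITICAL questions need a user decision; CRITICAL issues are spec
    defects an agent can fix in place; otherwise proceed to implementation."""
    ask = [q for q in report.get("questions", [])
           if q.get("severity") == "CRITICAL"]
    if ask:
        return "ask-user", ask
    fix = [i for i in report.get("issues", [])
           if i.get("severity") == "CRITICAL"]
    if fix:
        return "fix-spec", fix
    return "proceed", []
```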
A ready-to-install Claude Code skill lives in examples/claude-code-skill/. It teaches Claude Code when to invoke speccritic, how to parse .speccritic-review.json, and how to route CRITICAL issues (fix in place) vs. CRITICAL questions (ask the user). Install with:
mkdir -p ~/.claude/skills/speccritic
cp examples/claude-code-skill/SKILL.md ~/.claude/skills/speccritic/SKILL.md

See the skill README for project-level install, prerequisites, and customization.
MIT — see LICENSE
