feat: semantic prompt quality gates (#20)

Wreos · lozhkovoi · web-flow · commit da9097e1ae71 · 2026-02-21T13:38:17.000-08:00
* feat: standardize agent/skill outputs and add validation runner

* feat: add semantic prompt quality gates

* fix: install ripgrep for semantic quality checks

---------

Co-authored-by: Aleksandr Lozhkovoi <aleksandr.lozhkovoi@enpal.de>
diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.3",
+  "version": "1.10.4",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": {
     "name": "Aleksandr Lozhkovoi",
diff --git a/.github/workflows/semantic-quality.yml b/.github/workflows/semantic-quality.yml
@@ -0,0 +1,26 @@
+name: Semantic Prompt Quality
+
+on:
+  pull_request:
+  push:
+    branches: [main]
+
+jobs:
+  semantic-quality:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Ensure ripgrep
+        run: |
+          if ! command -v rg >/dev/null 2>&1; then
+            sudo apt-get update
+            sudo apt-get install -y ripgrep
+          fi
+
+      - name: Validate agent/skill structure
+        run: bash scripts/validate_agents_skills.sh
+
+      - name: Validate prompt semantics
+        run: bash scripts/validate_prompt_semantics.sh
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,6 +17,11 @@
 - Added reference example repository for project structure and tests:
   - https://github.com/Wreos/flutter-cursor-plugin-example
 - Added pre-release enable guide (`docs/pre-release-enable-plugin.md`) with repository install and manual workspace settings options.
+- Added semantic prompt quality gates:
+  - `scripts/validate_prompt_semantics.sh`
+  - `.github/workflows/semantic-quality.yml`
+  - `docs/semantic-quality-gates.md`
+- Updated scaffold architecture guidance to prefer existing project state-management convention before selecting a pattern.
 
 ## 1.10.0
 
diff --git a/README.md b/README.md
@@ -80,6 +80,7 @@ Reference project layout:
 - **Reference Flutter app layout**: https://github.com/Wreos/flutter-cursor-plugin-example
 - **Prompt guardrails**: `docs/prompt-execution-guardrails.md`.
 - **Validation matrix**: `docs/validation-matrix.md`.
+- **Semantic quality gates**: `docs/semantic-quality-gates.md`.
 - **Agents**
   - `flutter-app-builder` (general Flutter implementation)
   - `flutter-code-reviewer`
diff --git a/agents/flutter-code-reviewer.md b/agents/flutter-code-reviewer.md
@@ -28,4 +28,5 @@ Dedicated agent for code review and conventions.
 1. Findings first, ordered by severity.
 2. File references for each finding.
 3. Security findings included explicitly.
-4. Residual risks/testing gaps summary.
+4. Validation evidence (commands/scans/checks performed).
+5. Residual risks/testing gaps summary.
diff --git a/agents/flutter-test-writer.md b/agents/flutter-test-writer.md
@@ -30,5 +30,5 @@ Main router for Flutter test tasks.
 
 1. Test type selected (widget/bloc/integration) and reason.
 2. Files changed and template used.
-3. Test commands run and pass/fail result.
+3. Validation commands run and pass/fail result.
 4. Remaining coverage gaps.
diff --git a/docs/semantic-quality-gates.md b/docs/semantic-quality-gates.md
@@ -0,0 +1,21 @@
+# Semantic Prompt Quality Gates
+
+This repository validates prompt quality on two levels:
+
+1. Structural checks (`scripts/validate_agents_skills.sh`):
+   - required sections exist in agents and skills.
+2. Semantic checks (`scripts/validate_prompt_semantics.sh`):
+   - canonical commands include guardrails and validation references.
+   - skills include workflow, output contract, and guardrails/scope limits.
+   - agents require validation evidence in output expectations.
+   - active Flutter rules remain project-first for state management.
+   - no blanket prohibition of Riverpod/Bloc/GetX in active rules.
+
+Run locally:
+
+```bash
+bash scripts/validate_agents_skills.sh
+bash scripts/validate_prompt_semantics.sh
+```
+
+CI runs both checks in `.github/workflows/semantic-quality.yml`.
diff --git a/plugin.json b/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.3",
+  "version": "1.10.4",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": "Aleksandr Lozhkovoi",
   "license": "MIT",
diff --git a/scripts/validate_prompt_semantics.sh b/scripts/validate_prompt_semantics.sh
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+repo_root="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+cd "${repo_root}"
+
+pass_count=0
+fail_count=0
+
+check() {
+  local name="$1"
+  local cmd="$2"
+  if eval "${cmd}" >/dev/null 2>&1; then
+    echo "PASS | ${name}"
+    pass_count=$((pass_count + 1))
+  else
+    echo "FAIL | ${name}"
+    fail_count=$((fail_count + 1))
+  fi
+}
+
+canonical_commands=(
+  "commands/implement-flutter-feature.md"
+  "commands/implement-figma-screen.md"
+  "commands/generate-flutter-tests.md"
+  "commands/review-flutter-code.md"
+  "commands/security-review.md"
+  "commands/update-flutter-dependencies.md"
+  "commands/resolve-flutter-build-error.md"
+  "commands/prepare-mobile-release.md"
+  "commands/integrate-firebase.md"
+  "commands/migrate-flutter-code.md"
+  "commands/scaffold-flutter-feature.md"
+  "commands/setup-mobile-github-pipeline.md"
+  "commands/sync-official-flutter-ai-rules.md"
+  "commands/write-widget-test.md"
+  "commands/write-bloc-test.md"
+  "commands/write-e2e-test.md"
+)
+
+for cmd_file in "${canonical_commands[@]}"; do
+  base="$(basename "${cmd_file}")"
+  check "C-${base} has guardrails section" "rg -q '^Preconditions and guardrails:' '${cmd_file}'"
+  check "C-${base} references prompt guardrails doc" "rg -q 'prompt-execution-guardrails\\.md' '${cmd_file}'"
+  check "C-${base} references validation matrix" "rg -q 'validation-matrix\\.md' '${cmd_file}'"
+done
+
+for skill in skills/*/SKILL.md; do
+  base="$(basename "$(dirname "${skill}")")"
+  check "S-${base} has workflow" "rg -q '^## Workflow' '${skill}'"
+  check "S-${base} has output contract" "rg -q '^## (Required output|Output format)' '${skill}'"
+  check "S-${base} has guardrails or scope limits" "rg -q '^## (Guardrails|Scope guardrails|Quality defaults)' '${skill}'"
+done
+
+for agent in agents/*.md; do
+  base="$(basename "${agent}")"
+  check "A-${base} has output expectations" "rg -q '^## Output expectations' '${agent}'"
+  check "A-${base} expects validation evidence" "rg -qi '(validation|commands and results|evidence)' '${agent}'"
+done
+
+check "Active Flutter rules are project-first for state management" "rg -q '\\* \\*\\*Project First:\\*\\* Follow the existing project architecture and state-management choice\\.' rules/flutter-official-ai-rules.mdc"
+check "Active Flutter rules do not prohibit Riverpod/Bloc/GetX outright" "! rg -q 'Prohibited:\\*\\* NO Riverpod, Bloc, GetX unless explicitly requested\\.' rules/flutter-official-ai-rules.mdc"
+check "Scaffold architecture skill enforces project-first state-management selection" "rg -q 'existing project state-management convention' skills/scaffold-flutter-architecture/SKILL.md"
+
+total=$((pass_count + fail_count))
+echo
+echo "SUMMARY | total=${total} passed=${pass_count} failed=${fail_count}"
+
+if [ "${fail_count}" -gt 0 ]; then
+  exit 1
+fi
diff --git a/skills/debug-flutter-issues/SKILL.md b/skills/debug-flutter-issues/SKILL.md
@@ -19,6 +19,12 @@ Use for compiler/build/runtime failures.
 4. Apply smallest deterministic fix.
 5. Validate with rerun and impacted tests.
 
+## Guardrails
+
+- Do not propose a fix without a reproducible command or clear log evidence.
+- Keep fixes minimal and limited to the failing layer unless a cross-layer root cause is proven.
+- Call out unknowns explicitly instead of guessing when logs are incomplete.
+
 ## Output format
 
 - Root cause.
diff --git a/skills/review-flutter-code/SKILL.md b/skills/review-flutter-code/SKILL.md
@@ -27,6 +27,12 @@ Use for PR/diff/code review requests.
 - MASVS-PLATFORM: Android/iOS platform integration risks and exported surfaces.
 - MASVS-CODE: unsafe code patterns and hardcoded secrets.
 
+## Guardrails
+
+- Do not provide a deep review without explicit target scope (PR diff, range, or file list).
+- Tie each finding to concrete code evidence and expected behavioral impact.
+- Keep findings prioritized by severity and user risk, not by style preference.
+
 ## Output format
 
 - Findings first, ordered by severity.
diff --git a/skills/scaffold-flutter-architecture/SKILL.md b/skills/scaffold-flutter-architecture/SKILL.md
@@ -9,7 +9,7 @@ Use for new feature/module boilerplate generation.
 
 ## Workflow
 
-1. Confirm target feature name and selected pattern (BLoC, Riverpod, or clean layered default).
+1. Confirm target feature name and existing project state-management convention first (BLoC, Riverpod, Cubit, GetX, ValueNotifier, or clean layered default if none).
 2. Create folder structure with presentation/domain/data boundaries.
 3. Add state-management entry points (cubit/bloc/provider/notifier).
 4. Add repository and data source contracts.
diff --git a/skills/security-audit/SKILL.md b/skills/security-audit/SKILL.md
@@ -39,6 +39,12 @@ Every code review must include this security pass.
 - MASVS-CODE
 - MASVS-RESILIENCE (where applicable)
 
+## Guardrails
+
+- Do not claim scanner coverage that was not actually executed.
+- Keep findings actionable by filtering out non-exploitable or low-signal noise.
+- If scope is missing, stop and request it before running a full security assessment.
+
 ## Output format
 
 - Findings first (highest severity first).
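
Note on the `check` helper in `scripts/validate_prompt_semantics.sh`: the sketch below shows how its PASS/FAIL counting could behave on a single file, including the negated form used for the "do not prohibit" rule. It is an illustration, not the script from this commit. It assumes `grep` in place of `rg` so it runs without ripgrep, and the sample `SKILL.md` is a hypothetical file created in a temp directory.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the check() helper from scripts/validate_prompt_semantics.sh.
# Assumption: grep replaces rg so the demo runs without ripgrep installed.
set -euo pipefail

pass_count=0
fail_count=0

check() {
  local name="$1"
  local cmd="$2"
  # eval runs the command string; its exit status decides PASS or FAIL.
  if eval "${cmd}" >/dev/null 2>&1; then
    echo "PASS | ${name}"
    pass_count=$((pass_count + 1))
  else
    echo "FAIL | ${name}"
    fail_count=$((fail_count + 1))
  fi
}

# Hypothetical sample skill file with a workflow and output contract but no guardrails.
demo_dir="$(mktemp -d)"
sample="${demo_dir}/SKILL.md"
printf '%s\n' '## Workflow' '' '1. Do the thing.' '' '## Output format' > "${sample}"

check "sample has workflow" "grep -q '^## Workflow' '${sample}'"
check "sample has output contract" "grep -Eq '^## (Required output|Output format)' '${sample}'"
check "sample has guardrails" "grep -Eq '^## (Guardrails|Scope guardrails|Quality defaults)' '${sample}'"
# A leading "!" inverts the result: the check passes when the pattern is absent.
check "sample has no blanket prohibition" "! grep -q 'Prohibited:' '${sample}'"

rm -rf "${demo_dir}"
echo "SUMMARY | total=$((pass_count + fail_count)) passed=${pass_count} failed=${fail_count}"
```

Because every command runs through `eval` with output discarded, a failing check reports only its name. When a gate fails in CI, rerunning the quoted command by hand is the quickest way to see why.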
