feat: improve semantic response quality with truthfulness guardrails (#22)

Wreos · web-flow · commit 6d5442b34cfb · 2026-02-22T06:48:37.000+01:00
diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.5",
+  "version": "1.10.6",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": {
     "name": "Aleksandr Lozhkovoi",
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -29,6 +29,10 @@
   - command: `commands/setup-flutter-environment.md`
   - skill: `skills/setup-flutter-environment/SKILL.md`
 - Simplified command prompts by removing repeated guardrails boilerplate from canonical command files.
+- Strengthened semantic output quality across agents/skills:
+  - explicit truthfulness policy (`planned/not executed` wording when no command evidence exists)
+  - required missing-inputs/assumptions notes for partial context
+  - required next steps and confidence/residual risk coverage in output contracts
 
 ## 1.10.0
 
diff --git a/agents/flutter-app-builder.md b/agents/flutter-app-builder.md
@@ -31,9 +31,17 @@ Primary agent for Flutter feature development.
 - Add/update tests proportionally to behavior changes.
 - Prefer incremental, reviewable changes over large rewrites.
 
+## Semantic quality defaults
+
+- Never claim commands were executed if no command output is available.
+- If context is missing, explicitly list the missing inputs before proposing deep changes.
+- Separate confirmed facts from assumptions.
+- End with 1-3 concrete next steps for the user.
+
 ## Output expectations
 
 1. Selected route/skill and reason.
 2. Scope and files touched.
 3. Validation commands and results.
 4. Risks or follow-up steps.
+5. Missing inputs or assumptions (if any).
diff --git a/agents/flutter-code-reviewer.md b/agents/flutter-code-reviewer.md
@@ -23,10 +23,17 @@ Dedicated agent for code review and conventions.
 - Test gaps and brittle assertions.
 - Accessibility and localization risks.
 
+## Semantic quality defaults
+
+- If review scope is missing, request it before deep findings.
+- Mark each finding as confirmed from evidence vs inferred from limited context.
+- Never imply security scans were executed unless command output is available.
+
 ## Output expectations
 
 1. Findings first, ordered by severity.
 2. File references for each finding.
 3. Security findings included explicitly.
 4. Validation evidence (commands/scans/checks performed).
 5. Residual risks/testing gaps summary.
+6. Confidence/assumption note when evidence is partial.
diff --git a/agents/flutter-mobile-release-manager.md b/agents/flutter-mobile-release-manager.md
@@ -19,9 +19,16 @@ Dedicated agent for mobile app publishing readiness.
 - iOS App Store-ready archive and signing checks.
 - Versioning, release notes, privacy declarations, and submission gating.
 
+## Semantic quality defaults
+
+- Do not mark a platform "ready" without explicit build/check evidence.
+- If evidence is missing, return `BLOCKED` and list exact data needed.
+- Keep blockers actionable and ordered by release impact.
+
 ## Output expectations
 
 1. Android readiness status.
 2. iOS readiness status.
 3. Validation evidence (commands/artifacts/checklists).
 4. Blocking issues before submission.
+5. Next unblock steps.
diff --git a/agents/flutter-test-writer.md b/agents/flutter-test-writer.md
@@ -26,9 +26,16 @@ Main router for Flutter test tasks.
 - For Patrol E2E tests, cover critical user journeys only (slow lane), keep unit/widget tests as fast lane.
 - Run only impacted tests before finishing.
 
+## Semantic quality defaults
+
+- Do not present pseudo-code as "implemented tests" unless files/patches were actually created.
+- If repository context is missing, provide a minimal test scaffold and explicitly mark assumptions.
+- Always include remaining coverage gaps, not only happy path suggestions.
+
 ## Output expectations
 
 1. Test type selected (widget/bloc/integration) and reason.
 2. Files changed and template used.
 3. Validation commands run and pass/fail result.
 4. Remaining coverage gaps.
+5. Next test step for the user.
diff --git a/plugin.json b/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.5",
+  "version": "1.10.6",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": "Aleksandr Lozhkovoi",
   "license": "MIT",
diff --git a/rules/flutter-plugin-policy-priority.mdc b/rules/flutter-plugin-policy-priority.mdc
@@ -15,6 +15,12 @@ This file is the high-priority policy layer for this plugin.
 - Conflict rule: if official guidance conflicts with project policy, project policy wins.
 - Do not patch synced official files to enforce project policy.
 
+## Truthfulness policy
+
+- Never state or imply that actions are completed without command output or concrete file diff evidence.
+- In planning/simulation mode, use explicit wording: `planned`, `expected`, `not executed`.
+- If evidence is missing, return status as `PENDING` or `BLOCKED` instead of `DONE`.
+
 ## Architecture and state-management policy
 
 - Project First: follow the existing project architecture and state-management choice.
diff --git a/skills/build-flutter-features/SKILL.md b/skills/build-flutter-features/SKILL.md
@@ -31,13 +31,15 @@ Use this skill for non-test Flutter development tasks.
 
 - Restrict changes to the requested feature/module unless explicitly expanded.
 - Do not mix unrelated refactors with feature delivery.
+- Do not claim implementation is complete unless concrete file changes or command outputs are provided.
 
 ## Required output
 
 1. Goal + scope summary.
 2. Files changed by layer (presentation/domain/data).
 3. Validation commands run and results.
 4. Residual risks or follow-up TODOs.
+5. Missing inputs/assumptions (if context is incomplete).
 
 ## Required references
 
diff --git a/skills/debug-flutter-issues/SKILL.md b/skills/debug-flutter-issues/SKILL.md
@@ -24,6 +24,7 @@ Use for compiler/build/runtime failures.
 - Do not propose a fix without a reproducible command or clear log evidence.
 - Keep fixes minimal and limited to the failing layer unless a cross-layer root cause is proven.
 - Call out unknowns explicitly instead of guessing when logs are incomplete.
+- Include one preventive follow-up even when the fix is minimal.
 
 ## Output format
 
diff --git a/skills/integrate-firebase/SKILL.md b/skills/integrate-firebase/SKILL.md
@@ -28,6 +28,8 @@ Use this skill for end-to-end Firebase integration in Flutter apps.
 - Keep service wrappers injectable and testable.
 - Add error handling and fallback behavior for remote dependencies.
 - Validate behavior in both debug and release-capable builds.
+- Do not claim Android/iOS integration is complete without naming changed config files.
+- In simulation/planning mode, never use `integrated/completed`; use `planned/not executed`.
 
 ## Required output
 
diff --git a/skills/migrate-flutter-code/SKILL.md b/skills/migrate-flutter-code/SKILL.md
@@ -21,10 +21,12 @@ Use for framework/API/state-management migrations.
 - Do not mix unrelated refactors with migration work.
 - Keep intermediate states buildable when possible.
 - Prefer codemod-like repetitive edits over ad hoc changes.
+- Attach validation status to each migration batch.
 
 ## Required output
 
 1. Migration target and acceptance criteria.
 2. Batch-by-batch changes summary.
 3. Validation commands/results per batch.
 4. Breaking changes and rollback notes.
+5. Next batch recommendation.
diff --git a/skills/release-mobile-apps/SKILL.md b/skills/release-mobile-apps/SKILL.md
@@ -32,13 +32,15 @@ Use this skill for Android/iOS store publishing preparation.
 - Do not mark release ready without artifact build evidence.
 - Keep Android/iOS signing and versioning checks explicit.
 - Flag missing compliance metadata as blockers, not warnings.
+- When evidence is missing, return `BLOCKED` instead of speculative readiness.
 
 ## Required output
 
 1. Android readiness status (+ artifact path if built).
 2. iOS readiness status (+ artifact/archive status).
 3. Validation commands run and outcomes.
 4. Blocking gaps before submission.
+5. Immediate next actions to unblock release.
 
 ## Required references
 
diff --git a/skills/review-flutter-code/SKILL.md b/skills/review-flutter-code/SKILL.md
@@ -32,12 +32,15 @@ Use for PR/diff/code review requests.
 - Do not provide a deep review without explicit target scope (PR diff, range, or file list).
 - Tie each finding to concrete code evidence and expected behavioral impact.
 - Keep findings prioritized by severity and user risk, not by style preference.
+- Distinguish confirmed findings from inferred risks when evidence is partial.
+- Do not claim scans/commands were run without output evidence.
 
 ## Output format
 
 - Findings first, ordered by severity.
 - File references for each finding.
 - Brief residual risk/testing gap summary.
+- Confidence/assumption note when applicable.
 
 ## Required references
 
diff --git a/skills/setup-flutter-environment/SKILL.md b/skills/setup-flutter-environment/SKILL.md
@@ -31,6 +31,7 @@ Use this skill when a project needs a clean, reproducible Flutter setup before i
 - Do not claim setup is complete while `flutter doctor` still has unresolved blockers for requested target platforms.
 - Keep setup changes minimal and reversible; avoid unrelated dependency upgrades.
 - If a required platform is out of scope (for example iOS on a non-iOS task), report it explicitly instead of forcing changes.
+- Do not say `done/completed` without command evidence.
 
 ## Required output
 
diff --git a/skills/sync-official-flutter-ai-rules/SKILL.md b/skills/sync-official-flutter-ai-rules/SKILL.md
@@ -31,6 +31,8 @@ Use this workflow to keep plugin guidance aligned with upstream Flutter AI rules
 - Do not enforce plugin policy by patching official content after sync.
 - Use `rules/flutter-plugin-policy-priority.mdc` for higher-priority policy and conflict resolution.
 - Prefer `4k` unless there is a clear reason to switch to `10k` or `1k`.
+- Do not claim sync completed unless command output is available.
+- In simulation/planning mode, status must be `PENDING` and include `not executed` note.
 
 ## Required output
 
diff --git a/skills/update-flutter-dependencies/SKILL.md b/skills/update-flutter-dependencies/SKILL.md
@@ -39,6 +39,7 @@ Use this skill for SDK and package upgrades that must stay stable and reviewable
 - If failures cascade, split into two PRs:
   - Flutter SDK upgrade
   - package upgrade and fixes
+- Always include before/after version snapshot and explicit rollback trigger.
 
 ## Required output
 
@@ -47,3 +48,4 @@ Use this skill for SDK and package upgrades that must stay stable and reviewable
 3. Validation commands run and their result.
 4. Files changed for compatibility fixes.
 5. Rollback instructions.
+6. Known remaining risks after upgrade.
diff --git a/skills/write-flutter-tests/SKILL.md b/skills/write-flutter-tests/SKILL.md
@@ -46,10 +46,12 @@ Use this skill as the single entry point for Flutter test work.
 - Prefer deterministic tests over time-dependent assertions.
 - Keep test setup local unless shared helpers already exist.
 - Avoid broad snapshot/golden assertions unless explicitly requested.
+- Do not present sample test snippets as completed repository changes without file-level confirmation.
 
 ## Required output
 
 1. Test type selected and why.
 2. Files created/updated.
 3. Test commands run and results.
 4. Flakiness risks or missing coverage notes.
+5. Next test to add (single highest-value gap).
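
Note: the truthfulness policy added in `rules/flutter-plugin-policy-priority.mdc` amounts to a small decision rule. The following is a minimal sketch of that rule, not code from this commit. The names `Evidence` and `report_status` are hypothetical. The choice of `PENDING` for planning mode and `BLOCKED` for missing evidence is an assumption: the policy allows either status whenever evidence is absent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """Hypothetical container for what an agent can actually show."""
    command_outputs: List[str] = field(default_factory=list)
    file_diffs: List[str] = field(default_factory=list)

    def present(self) -> bool:
        # Evidence counts only if at least one entry is non-empty.
        return any(s.strip() for s in self.command_outputs + self.file_diffs)


def report_status(evidence: Evidence, planning_mode: bool = False) -> str:
    """Return a status string that never claims more than the evidence supports."""
    if planning_mode:
        # Planning/simulation mode: explicit "planned / not executed" wording.
        return "PENDING (planned, not executed)"
    if not evidence.present():
        # Assumption: missing evidence outside planning mode maps to BLOCKED.
        return "BLOCKED (missing command output or file diff evidence)"
    return "DONE"


if __name__ == "__main__":
    print(report_status(Evidence()))
    print(report_status(Evidence(), planning_mode=True))
    print(report_status(Evidence(command_outputs=["flutter test: All tests passed!"])))
```

In this sketch, `DONE` is reachable only when some command output or file diff is attached, which mirrors the "never state or imply that actions are completed" rule.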
