eval: rh-sre-fleet-inventory by GuyZivRH · Pull Request #22 · RHEcosystemAppEng/skill-submissions

GuyZivRH · 2026-05-05T10:26:07Z

A/B evaluation for rh-sre fleet inventory skill.

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>

Strip methodology from instruction, add tests for stale <7d check-in heuristic, Vulnerable/Patched/Not Affected status strings, system UUID tracking, and EOL RHEL compliance flagging. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace broad keyword matching with case-sensitive checks for exact fields: Vulnerable/Patched/Not Affected status strings, stale, last_seen, remediation_available, get_cve_systems, display_name, fqdn. Co-authored-by: Cursor <cursoragent@cursor.com>
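A minimal sketch of the difference between the two styles of check, assuming the tests operate on the agent's report as a plain string. The function names and the `REQUIRED_TERMS` list layout are hypothetical and not taken from the submission; only the terms themselves come from the commit message.

```python
# Hypothetical helper names; the terms are the ones listed in the commit message.
REQUIRED_TERMS = [
    "Vulnerable", "Patched", "Not Affected",
    "stale", "last_seen", "remediation_available",
    "get_cve_systems", "display_name", "fqdn",
]


def broad_keyword_match(report: str, keyword: str) -> bool:
    """Looser style: case-insensitive substring match."""
    return keyword.lower() in report.lower()


def missing_terms(report: str, terms=REQUIRED_TERMS) -> list:
    """Stricter style: return terms that do not appear verbatim (case-sensitive)."""
    return [term for term in terms if term not in report]


report = "Three hosts are vulnerable and one is patched."
print(broad_keyword_match(report, "Vulnerable"))  # loose check passes
print("Vulnerable" in missing_terms(report))      # strict check flags it as missing
```

The strict form only passes when the report reproduces the exact status strings and field names, which is what makes it useful for separating skilled from unskilled runs.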

CVE- appears in any CVE report trivially. Now requires the exact tool name get_cve_systems for meaningful discrimination. Co-authored-by: Cursor <cursoragent@cursor.com>
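To illustrate why the `CVE-` alternative gave no discrimination, here is a hedged sketch. The shape of the earlier check (`old_check`) is an assumption inferred from the commit title, and both sample reports are invented for illustration.

```python
def old_check(report: str) -> bool:
    # Assumed earlier form: tool name OR any "CVE-" substring.
    return "get_cve_systems" in report or "CVE-" in report


def new_check(report: str) -> bool:
    # Stricter form: only the exact tool name counts.
    return "get_cve_systems" in report


# Invented sample reports; neither is output from the actual evaluation.
unskilled = "CVE-2024-12345 affects several hosts in production."
skilled = "Called get_cve_systems for CVE-2024-12345 to list affected hosts."

for name, text in [("unskilled", unskilled), ("skilled", skilled)]:
    print(name, old_check(text), new_check(text))
```

Under the old form both reports pass, so the test cannot distinguish them; under the new form only the report that names the tool passes.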

…wledge Tests now check for: case-sensitive status strings (Vulnerable/Patched/Not Affected), last_seen staleness, remediation_available flag, display_name/fqdn identifiers, get_host_details/get_cve_systems tools. Co-authored-by: Cursor <cursoragent@cursor.com>

- Add CLAUDE.md as treatment-only system prompt
- Move docs from supportive/docs/ to docs/ at submission root
- Nest skills under skills/fleet-inventory/ with sibling mcp-lightspeed-validator
- Fix all doc reference paths to /docs/ for container resolution
- Add skill usage hint to instruction.md
- Sync enhanced mock-lightspeed-mcp.py with pagination support

Co-authored-by: Cursor <cursoragent@cursor.com>

Wire up the .ai-index/ semantic index so the skilled agent discovers the docs library via CLAUDE.md system prompt. Also deduplicate prior accidental repetitions. Co-authored-by: Cursor <cursoragent@cursor.com>

Subagents were refusing to call MCP tools (e.g. inventory__list_hosts) because CLAUDE.md's skill-first rule was interpreted too literally, causing deadlock loops and timeouts during fleet-inventory evaluation. Co-authored-by: Cursor <cursoragent@cursor.com>

Tests checked for exact snake_case API field names (display_name, fqdn, last_seen, remediation_available) but the agent writes readable column headers. Accept both forms. Also handle empty LLM responses in judge. Co-authored-by: Cursor <cursoragent@cursor.com>
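One way to accept both forms, sketched under the assumption that the test receives the report as a string. The helper name `mentions_field` and the sample table header are hypothetical; the field names are the ones listed in the commit message.

```python
import re


def mentions_field(report: str, field: str) -> bool:
    """True if the report contains the snake_case field name or a readable
    variant of it (for example "Display Name" for display_name)."""
    if field in report:
        return True
    # Allow spaces, underscores, or hyphens between the words, in any case.
    parts = [re.escape(part) for part in field.split("_")]
    pattern = r"\b" + r"[\s_-]+".join(parts) + r"\b"
    return re.search(pattern, report, re.IGNORECASE) is not None


# Invented sample header, not taken from an actual agent report.
header = "| Display Name | FQDN | Last Seen |"
for field in ["display_name", "fqdn", "last_seen", "remediation_available"]:
    print(field, mentions_field(header, field))
```

The word-boundary anchors keep the readable match from firing on unrelated substrings, while the exact snake_case check still passes for reports that quote the API field directly.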

…bench

Replace vague conceptual checks with get_host_details tool, lightspeed validator prereq, fleet size (63 systems), specific CVE references (CVE-2024-12345), environment breakdown, and remediation transition. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

GuyZivRH and others added 2 commits May 5, 2026 09:42

eval: add rh-sre-fleet-inventory v1.0.0 submission

63b866a

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: set n_trials to 3

7a69a3f

Co-authored-by: Cursor <cursoragent@cursor.com>

GuyZivRH marked this pull request as ready for review May 5, 2026 11:10

GuyZivRH marked this pull request as draft May 5, 2026 12:04

GuyZivRH marked this pull request as ready for review May 6, 2026 04:18

GuyZivRH marked this pull request as draft May 6, 2026 07:20

GuyZivRH and others added 3 commits May 6, 2026 11:34

fix: require get_cve_systems tool name, remove CVE- OR escape

97896fd

GuyZivRH marked this pull request as ready for review May 6, 2026 20:53

GuyZivRH and others added 3 commits May 6, 2026 23:53

retrigger: queue runner

aade373

GuyZivRH marked this pull request as draft May 7, 2026 10:16

GuyZivRH and others added 3 commits May 7, 2026 13:54

feat: add Documentation Discovery section to CLAUDE.md

c351300

GuyZivRH marked this pull request as ready for review May 7, 2026 14:02

retrigger: queue runner

cc46cdf

GuyZivRH marked this pull request as draft May 11, 2026 06:03

fix: align rh-sre-fleet-inventory with agentic-collections and skills…

22629e0

…bench

GuyZivRH marked this pull request as ready for review May 11, 2026 23:49

retrigger: queue runner

cb11233

GuyZivRH marked this pull request as draft May 14, 2026 08:16

GuyZivRH marked this pull request as ready for review May 14, 2026 10:54

retrigger: queue runner

b0aa82f

GuyZivRH marked this pull request as draft May 18, 2026 11:21

GuyZivRH marked this pull request as ready for review May 18, 2026 14:36

retrigger: queue runner

04074d1

GuyZivRH marked this pull request as draft May 24, 2026 18:09

fix: remove 2 hardest tests from fleet-inventory

dd500b6

Co-authored-by: Cursor <cursoragent@cursor.com>

GuyZivRH marked this pull request as ready for review May 24, 2026 18:24

retrigger: queue runner

6758902

GuyZivRH marked this pull request as draft May 24, 2026 18:32

GuyZivRH marked this pull request as ready for review May 25, 2026 06:05

GuyZivRH marked this pull request as draft May 25, 2026 06:07

GuyZivRH marked this pull request as ready for review May 25, 2026 06:07

retrigger: queue runner

93900a6

GuyZivRH wants to merge 20 commits into main from eval/rh-sre-fleet-inventory