test: add knowledge-bot evolution cases #79

Open
upsearchmain wants to merge 1 commit into NousResearch:main from upsearchmain:codex/T-E5-1-04-knowledge-bot-cases

@upsearchmain

Summary

  • Add cases/knowledge-bot/cases.jsonl with 30 benchmark cases.
  • Distribution: book-deep 10, rule-extract 10, decision-30day-review 10.
  • Keep a consistent JSONL shape with id, skill, category, input, expected, and judge.
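To make the record shape concrete, here is a minimal sketch of a per-line key check. The six field names come from the summary above; the helper name `missing_keys`, the sample record, and all of its values are hypothetical placeholders and are not taken from cases.jsonl. The PR does not state value types, so only key presence is checked.

```python
import json

# Field names taken from the PR summary; value types are not specified there,
# so this sketch only checks that each key is present.
REQUIRED_KEYS = {"id", "skill", "category", "input", "expected", "judge"}


def missing_keys(line: str) -> set:
    """Return the required keys that are absent from one JSONL line."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each JSONL line must be a JSON object")
    return REQUIRED_KEYS - set(record)


# Hypothetical record with placeholder values, for illustration only.
sample = json.dumps({
    "id": "example-001",
    "skill": "knowledge-bot",
    "category": "book-deep",
    "input": "placeholder input",
    "expected": "placeholder expectation",
    "judge": "placeholder judging rule",
})
print(missing_keys(sample))  # set()
```

An empty set means the line carries every field listed in the summary; a non-empty set names exactly what is missing.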

Verification

test -f ~/hermes-evolution/cases/knowledge-bot/cases.jsonl && echo exists
# exists

wc -l ~/hermes-evolution/cases/knowledge-bot/cases.jsonl
# 30 /home/claude/hermes-evolution/cases/knowledge-bot/cases.jsonl

/usr/bin/python3 -c "import json, pathlib; [json.loads(l) for l in pathlib.Path('$HOME/hermes-evolution/cases/knowledge-bot/cases.jsonl').read_text().splitlines()]" && echo OK
# OK

git diff --check origin/main...HEAD
# exit 0

Additional distribution check:

/usr/bin/python3 -c "import collections,json,pathlib; rows=[json.loads(l) for l in pathlib.Path('$HOME/hermes-evolution/cases/knowledge-bot/cases.jsonl').read_text().splitlines()]; print(len(rows)); print(dict(sorted(collections.Counter(r['category'] for r in rows).items()))); assert len(rows)>=30; assert collections.Counter(r['category'] for r in rows)=={'book-deep':10,'rule-extract':10,'decision-30day-review':10}; assert all(set(['id','input','expected','judge']).issubset(r) for r in rows)"
# 30
# {'book-deep': 10, 'decision-30day-review': 10, 'rule-extract': 10}
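The one-liner above is dense, so the same distribution check can be sketched as a small, reusable function. This is an illustrative rewrite, not the command that was actually run: the function names `category_counts` and `check_distribution` are hypothetical, and skipping blank lines is an added assumption that the original one-liner does not make.

```python
import collections
import json

# Target distribution as stated in the PR summary.
EXPECTED_COUNTS = {"book-deep": 10, "rule-extract": 10, "decision-30day-review": 10}


def category_counts(lines):
    """Count cases per category, skipping blank lines (an assumption of this sketch)."""
    rows = [json.loads(line) for line in lines if line.strip()]
    return dict(collections.Counter(row["category"] for row in rows))


def check_distribution(lines):
    """Raise AssertionError if the per-category counts differ from the target."""
    counts = category_counts(lines)
    assert counts == EXPECTED_COUNTS, f"unexpected distribution: {counts}"
    return counts
```

It could be called with the lines of the file, for example `check_distribution(pathlib.Path("cases/knowledge-bot/cases.jsonl").read_text().splitlines())` after `import pathlib`, run from the repository root.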

Self-Review

  • P1 Think Before Coding: Read AGENTS, task, v5 PART 11/PART 13, book-deep, edge-cases, and acceptance section 7 before editing.
  • P2 Simplicity First: Data-only JSONL change; no generator, package install, LLM/API calls, or extra schema files.
  • P3 Surgical Changes: Only cases/knowledge-bot/cases.jsonl is added in this repo.
  • P4 Goal-Driven Execution: Existence, line count, JSON parse, whitespace, shape, and category distribution were verified.
