
Release - New agent demo #16

Merged: Ajayvardhanreddy merged 2 commits into main from dev on May 25, 2026


@Ajayvardhanreddy
Owner

No description provided.

Ajayvardhanreddy and others added 2 commits May 24, 2026 20:42
… eval cases

Tools (demos/engineering_agent/tools.py):
  code_search       — find symbols/patterns across a codebase (by keyword, returns file+line+snippet)
  file_read         — read a specific file, optional line range
  pr_review         — fetch PR issues by severity/category (security, perf, style, correctness)
  dependency_check  — outdated versions + CVE lookup per repo/package manager
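
As a rough illustration of the code_search behaviour described above (keyword in, file+line+snippet out), here is a minimal sketch. It is not the code in demos/engineering_agent/tools.py: the `SearchHit` type, the `MOCK_FILES` contents, the `src/api/login.py` path, and the function signature are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    """One match: where it was found and the matching line (hypothetical type)."""
    file: str
    line: int
    snippet: str


# Hypothetical in-memory codebase; the file contents are illustrative only.
MOCK_FILES: dict[str, list[str]] = {
    "src/auth/service.py": [
        "def authenticate_user(username, password):",
        "    return check_credentials(username, password)",
    ],
    "src/api/login.py": [
        "from src.auth.service import authenticate_user",
        "user = authenticate_user(name, pw)",
    ],
}


def code_search(keyword: str, files: dict[str, list[str]] | None = None) -> list[SearchHit]:
    """Return every line containing `keyword`, ordered by file path then line number."""
    source = MOCK_FILES if files is None else files
    hits: list[SearchHit] = []
    for path in sorted(source):
        for number, text in enumerate(source[path], start=1):
            if keyword in text:
                hits.append(SearchHit(path, number, text.strip()))
    return hits
```

Calling `code_search("authenticate_user")` on this mock returns three hits, one for the definition and two for the import and call site. A plain substring match keeps the sketch short; a real tool might use regular expressions or an index instead.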

Mock data covers realistic scenarios:
  authenticate_user defined in src/auth/service.py, used across 3 files
  SQL injection patterns in src/db/queries.py (raw f-strings)
  PR-42: critical timing-attack + missing rate limit
  PR-99: docs-only, clean
  PR-17: DB refactor, 2 warnings
  backend/pip: 4 outdated, cryptography CRITICAL CVE + requests HIGH CVE
  frontend/npm: lodash HIGH + axios MEDIUM CVEs
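
One possible shape for the dependency mock data and the CVE lookup is sketched below. Only the package names and severities come from the list above; the `MOCK_CVES` layout, the `SEVERITY_RANK` table, and the `dependency_check` signature are assumptions, and versions and CVE identifiers are deliberately left out because the commit message does not state them.

```python
from __future__ import annotations

# Assumed ordering of severities, lowest to highest.
SEVERITY_RANK = {"MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

# Hypothetical layout: (repo, package manager) -> {package: CVE severity}.
MOCK_CVES: dict[tuple[str, str], dict[str, str]] = {
    ("backend", "pip"): {"cryptography": "CRITICAL", "requests": "HIGH"},
    ("frontend", "npm"): {"lodash": "HIGH", "axios": "MEDIUM"},
}


def dependency_check(repo: str, manager: str, min_severity: str = "MEDIUM") -> list[tuple[str, str]]:
    """Return (package, severity) pairs at or above `min_severity`, worst first."""
    floor = SEVERITY_RANK[min_severity]
    packages = MOCK_CVES.get((repo, manager), {})
    findings = [(pkg, sev) for pkg, sev in packages.items() if SEVERITY_RANK[sev] >= floor]
    # sorted() is stable, so packages with equal severity keep insertion order.
    return sorted(findings, key=lambda item: SEVERITY_RANK[item[1]], reverse=True)
```

Sorting worst-first means the critical cryptography finding is the first item an agent sees for backend/pip, which is what a check like eng_009 would care about.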

5 scenarios: find_function, review_pr, dependency_audit, inspect_file, security_investigation

10 eval cases (evals/dataset/engineering_cases.py):
  eng_001–003: code_search (definition, usage, SQL injection)
  eng_004–005: file_read (existing file, not-found graceful handling)
  eng_006–008: pr_review (critical PR, clean PR, warning-only PR)
  eng_009: dependency_check (must surface critical CVE)
  eng_010: multi-step code_search + file_read
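
To make the eval cases above concrete, here is a sketch of how a case such as eng_009 or eng_010 might be represented and checked. The `EvalCase` fields and the `passes` helper are hypothetical and are not taken from evals/dataset/engineering_cases.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval case: which tools must run and what the answer must mention."""
    case_id: str
    expected_tools: tuple[str, ...]
    must_mention: tuple[str, ...] = ()


def passes(case: EvalCase, tools_called: list[str], answer: str) -> bool:
    """True if the expected tools were called in order and every required term appears."""
    # `tool in remaining` advances the iterator, so this is an ordered-subsequence check.
    remaining = iter(tools_called)
    tools_ok = all(tool in remaining for tool in case.expected_tools)
    text = answer.lower()
    return tools_ok and all(term.lower() in text for term in case.must_mention)


# Assumed encodings of two of the cases listed above.
ENG_009 = EvalCase("eng_009", ("dependency_check",), ("critical",))
ENG_010 = EvalCase("eng_010", ("code_search", "file_read"))
```

Under these assumptions, `passes(ENG_010, ["code_search", "file_read"], "...")` is true, while the same tools in reverse order fail, which captures the multi-step requirement.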

Makefile: make eval-engineering, make eval-all, make demo-eng now active.
134 tests pass.
…-agent

feat(phase-5): engineering assistant agent — 4 tools, 5 scenarios, 10 eval cases
@Ajayvardhanreddy Ajayvardhanreddy merged commit cdd0f6c into main May 25, 2026
2 checks passed