Skip to content

klavis MCP#312

Merged
benjibc merged 2 commits intomainfrom
update_klavis_example
Nov 4, 2025
Merged

klavis MCP#312
benjibc merged 2 commits intomainfrom
update_klavis_example

Conversation

@benjibc
Copy link
Contributor

@benjibc benjibc commented Nov 4, 2025

Note

Refactors the MCP pytest to async and dataset-driven with LLM-based JSON-schema grading, adds a Gmail inbox dataset, and tidies the MCP config.

  • Tests:
    • Refactor tests/pytest/test_pytest_klavis_mcp.py to async and dataset-driven (input_dataset), replacing inline messages.
    • Use AsyncOpenAI (Fireworks endpoint) to grade final output vs ground_truth via JSON Schema (ResponseFormat) and set EvaluateResult from the parsed score.
  • Data:
    • Add tests/pytest/datasets/gmail_inbox.jsonl with Gmail prompt and ground_truth.
  • Config:
    • Tidy tests/pytest/mcp_configurations/klavis_strata_mcp.json structure for klavis-strata MCP server.

Written by Cursor Bugbot for commit e2d8427. This will update automatically on new commits. Configure here.

cursor[bot]

This comment was marked as outdated.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@benjibc benjibc merged commit 2480071 into main Nov 4, 2025
9 checks passed
@benjibc benjibc deleted the update_klavis_example branch November 4, 2025 03:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant