Fix at least ten type errors by benjibc · Pull Request #150 · eval-protocol/python-sdk

benjibc · 2025-09-01T14:52:42Z

Description

This pull request resolves at least 10 type errors identified across various files in the codebase. The primary motivations are to improve robustness and maintainability and to enforce stricter adherence to type hints.

Key changes include:

ep.rollout Asynchronous Signature Update: The ep.rollout function in eval_protocol/mcp_env.py has been updated to be an async function and now directly returns List[EvaluationRow]. All call sites have been adjusted to await its execution and handle the direct list return.
LangGraphRolloutProcessor Awaitable Handling: Modified eval_protocol/pytest/default_langchain_rollout_processor.py to correctly handle callables that may or may not return awaitable objects, preventing "object is not awaitable" errors.
SimulationServerBase Typing Enhancements: Addressed type issues in eval_protocol/mcp/simulation_server.py by explicitly typing the set_logging_level parameter, adding an optional create_environment_with_seed hook, and clarifying AnyUrl usage.
EvaluationPipeline Type Refinements: Improved typing in eval_protocol/execution/pipeline.py by asserting self.model_client before use and providing more precise type hints for asyncio.gather results and list appends.
Benchmark Test Data Structure Alignment: Updated various benchmark test files (eval_protocol/benchmarks/test_tau_bench_airline.py, test_tau_bench_retail.py, eval_protocol/mcp_servers/tau2/tests/test_tau2_e2e.py, tests/pytest/test_tau_bench_airline.py) to correctly instantiate Task, ToolCall, and ToolMessage objects with all required and optional fields, such as requestor, env_assertions, persona, description, ticket, and initial_state.
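
The awaitable handling described for LangGraphRolloutProcessor can be illustrated with a small sketch. This is not the code from the PR: maybe_await, sync_invoke, and async_invoke are hypothetical names chosen for illustration, and the real processor may structure this differently. The idea is to await a callable's result only when it is actually awaitable, which avoids "object is not awaitable" errors when a synchronous callable is passed in.

```python
import asyncio
import inspect
from typing import Awaitable, List, TypeVar, Union, cast

T = TypeVar("T")


async def maybe_await(value: Union[T, Awaitable[T]]) -> T:
    """Hypothetical helper: await the value only if it is awaitable."""
    if inspect.isawaitable(value):
        return await value
    # Plain (non-awaitable) result: return it unchanged.
    return cast(T, value)


def sync_invoke(x: int) -> int:
    # Stand-in for a synchronous callable.
    return x + 1


async def async_invoke(x: int) -> int:
    # Stand-in for an asynchronous callable.
    await asyncio.sleep(0)
    return x + 1


async def run_both() -> List[int]:
    # Both kinds of callable go through the same code path.
    return [
        await maybe_await(sync_invoke(1)),
        await maybe_await(async_invoke(1)),
    ]


results = asyncio.run(run_both())
```

With a helper like this, the call site does not need to know in advance whether the callable is synchronous or asynchronous.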

These changes collectively reduce the number of reported type errors, making the codebase more reliable and easier to reason about.
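
As a rough illustration of the pipeline-style fixes mentioned above (asserting an optional client before use and annotating asyncio.gather results), consider the sketch below. The ModelClient and Pipeline classes are hypothetical stand-ins, not the actual classes in eval_protocol/execution/pipeline.py; only the typing pattern is the point.

```python
import asyncio
from typing import List, Optional


class ModelClient:
    """Hypothetical client; the real one lives elsewhere in the SDK."""

    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0)
        return prompt.upper()


class Pipeline:
    """Hypothetical pipeline showing the narrowing pattern only."""

    def __init__(self, model_client: Optional[ModelClient] = None) -> None:
        self.model_client = model_client

    async def run(self, prompts: List[str]) -> List[str]:
        # Narrow Optional[ModelClient] to ModelClient so the type checker
        # accepts attribute access below.
        assert self.model_client is not None, "model_client must be set"
        client = self.model_client

        # Annotate the gathered results and the accumulator explicitly.
        gathered: List[str] = await asyncio.gather(
            *(client.generate(p) for p in prompts)
        )
        outputs: List[str] = []
        outputs.extend(gathered)
        return outputs


pipeline_outputs = asyncio.run(Pipeline(ModelClient()).run(["a", "b"]))
```

The assert serves both as a runtime guard and as a narrowing hint for checkers such as mypy or pyright. Binding the narrowed value to a local variable keeps the narrowing valid inside the generator expression.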

Fixes # (issue)
Implements # (issue)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update
Refactoring/Code cleanup
Build/CI/CD related changes
Other (please describe):

How Has This Been Tested?

The changes were developed based on static analysis of type checker outputs and manual code inspection.

Test A
Test B

Test Configuration:

Firmware version:
Hardware:
Toolchain:
SDK:

To verify these changes, please run the project's type checker (e.g., make pre-commit if configured, or mypy/pyright directly) in your local environment.

Checklist:

My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
I have checked my code and corrected any misspellings

Screenshots (if applicable)

Additional context

Due to limitations in the development environment, I was unable to run the type checker locally to confirm the exact reduction in error count. However, the changes directly address the reported type errors based on their descriptions and code context. Running the project's type checker after merging should reflect the intended error reduction.

Co-authored-by: bchen <bchen@fireworks.ai>

cursor · 2025-09-01T14:52:43Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

Update async rollout, add requestor, and improve type handling

0c0abf2

Co-authored-by: bchen <bchen@fireworks.ai>

reformat

11fb547

benjibc marked this pull request as ready for review September 1, 2025 22:44

benjibc merged commit dcf7b0e into main Sep 1, 2025
12 of 14 checks passed

benjibc deleted the cursor/fix-at-least-ten-type-errors-9836 branch September 1, 2025 22:52

benjibc merged 2 commits into main from cursor/fix-at-least-ten-type-errors-9836
