Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
from eval_protocol.models import EvaluationRow, Message
from {module} import {func} as _ep_test


def evaluate(messages: List[Dict[str, Any]], ground_truth: Optional[Union[str, List[Dict[str, Any]]]] = None, tools=None, **kwargs):
```
Package test code in uploaded evaluator

The generated TS-mode snippet imports the evaluation test from the user's module (`from {module} import {func}`), but `create_evaluation` uploads only this single file as `python_code_to_evaluate`. When the evaluator runs on Fireworks, the referenced module is not present in that environment, so every uploaded evaluator fails immediately with `ModuleNotFoundError` unless the user's entire project is already installed remotely. The upload command needs to embed the test source (e.g., via `inspect.getsource`) or package the module alongside the snippet.
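One way to make the snippet self-contained is to inline the test's source with `inspect.getsource` before uploading, as suggested above. A minimal sketch of that idea (`embed_test_source` is a hypothetical helper, not part of this PR; the `_ep_test` alias mirrors the name the generated snippet expects):

```python
import inspect
import textwrap


def embed_test_source(func) -> str:
    """Return the function's full source plus an alias, so the uploaded
    snippet no longer needs `from {module} import {func}` at runtime."""
    source = textwrap.dedent(inspect.getsource(func))
    return f"{source}\n_ep_test = {func.__name__}\n"


# Example: a local evaluation test we want to inline rather than import.
def sample_test(row):
    return row

snippet = embed_test_source(sample_test)
# `snippet` now contains the definition of sample_test plus the alias line.
```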
```python
self._model_base_url = model_base_url
if os.getenv("EP_REMOTE_ROLLOUT_PROCESSOR_BASE_URL"):
    self._remote_base_url = os.getenv("EP_REMOTE_ROLLOUT_PROCESSOR_BASE_URL")
self._model_base_url = model_base_url
```
Do we need the option to override this via an env var as well? I'd guess not, since it's going to be fixed to something like https://api.fireworks.ai/inference/v1/chat/completions.
Oh, it seems we do need the override option (e.g., overriding to https://tracing.fireworks.ai/project_id/xxxxxx).
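One possible precedence for that override, sketched below: an environment variable beats the explicit constructor argument, which beats the fixed default. The variable name `EP_MODEL_BASE_URL` and the helper are assumptions for illustration, not the PR's actual identifiers:

```python
import os

# Assumed default; the PR pins the model URL to something like this.
DEFAULT_MODEL_BASE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def resolve_model_base_url(explicit=None):
    """Env var override > explicit argument > fixed default."""
    # EP_MODEL_BASE_URL is a hypothetical variable name for illustration.
    return os.getenv("EP_MODEL_BASE_URL") or explicit or DEFAULT_MODEL_BASE_URL


assert resolve_model_base_url() == DEFAULT_MODEL_BASE_URL

# Overriding, e.g. to route requests through the tracing proxy:
os.environ["EP_MODEL_BASE_URL"] = "https://tracing.fireworks.ai/project_id/xxxxxx"
assert resolve_model_base_url("ignored").startswith("https://tracing.fireworks.ai/")
```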
```python
    return code, file_name, qualname


def _generate_ts_mode_code(test: DiscoveredTest) -> tuple[str, str]:
```
```python
try:
    result = create_evaluation(
        evaluator_id=evaluator_id,
        python_code_to_evaluate=code,
```
For the uploaded code here, should we just upload the user's full code base with a selected pyargs entry point? I can create a new backend endpoint for it.
Screen.Recording.2025-09-30.at.5.04.16.PM.mov
Screen.Recording.2025-09-30.at.6.04.39.PM.mov
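The "upload the full code base with a selected entry point" idea could look roughly like this on the client side. A sketch only: `package_project`, the `ENTRYPOINT` manifest name, and the zip layout are all assumptions here, and the backend endpoint does not exist yet:

```python
import io
import tempfile
import zipfile
from pathlib import Path


def package_project(root: str, entry: str) -> bytes:
    """Zip every .py file under `root`, recording `entry` (e.g.
    "my_eval:evaluate") in a hypothetical ENTRYPOINT manifest so the
    backend knows which function to run."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(root).rglob("*.py")):
            zf.writestr(path.relative_to(root).as_posix(), path.read_text())
        zf.writestr("ENTRYPOINT", entry)
    return buf.getvalue()


# Demo on a throwaway project directory.
project = tempfile.mkdtemp()
(Path(project) / "my_eval.py").write_text(
    "def evaluate(messages, **kwargs):\n    return 1.0\n"
)
archive = package_project(project, "my_eval:evaluate")
names = zipfile.ZipFile(io.BytesIO(archive)).namelist()
```

Uploading an archive like this would sidestep the `ModuleNotFoundError` problem entirely, since the evaluator's imports resolve against the packaged tree instead of the remote environment.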