fix: preserve API-provided verifier SHA in bundle creation #62
Open
shihan-fleet wants to merge 65 commits into main
When verifier code contains multiple functions (e.g., a main verifier function and helper functions), the helper functions were not accessible from the main function due to namespace isolation. The exec() call created the functions in local_namespace, but the main function's __globals__ pointed to exec_globals, which did not contain the helpers. Calling a helper from the main function therefore raised a NameError, which was silently caught, so the verifier returned 0.0.
Fix: Merge local_namespace into exec_globals after exec() so all defined functions are accessible when the verifier is called.
…mespace fix: allow verifier helper functions to be called from main verifier
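The namespace issue described in this commit can be reproduced with plain `exec()`. The following is a minimal sketch, not the SDK's actual loader: `VERIFIER_SOURCE`, `load_verifier`, and the `merge_namespaces` flag are hypothetical names used only for illustration, while `exec_globals` and `local_namespace` follow the commit message.

```python
# Hypothetical verifier source with one helper and one main function.
VERIFIER_SOURCE = '''
def helper(score):
    return score * 2

def verify(score):
    return helper(score) + 1
'''


def load_verifier(source, merge_namespaces):
    """Exec the source and return its `verify` function (illustrative only)."""
    exec_globals = {}     # becomes the __globals__ of every function defined
    local_namespace = {}  # receives the names the source defines
    exec(source, exec_globals, local_namespace)
    if merge_namespaces:
        # The fix from the commit message: make helpers visible via __globals__.
        exec_globals.update(local_namespace)
    return local_namespace["verify"]


# Without the merge, `verify` cannot see `helper`.
isolated = load_verifier(VERIFIER_SOURCE, merge_namespaces=False)
try:
    isolated(1)
    isolated_error = None
except NameError as exc:
    isolated_error = exc

# With the merge, the helper resolves through the function's globals.
merged = load_verifier(VERIFIER_SOURCE, merge_namespaces=True)
merged_result = merged(1)
```

Because `exec()` with separate globals and locals stores new definitions in the locals mapping only, the merge step is what lets name lookup inside `verify` find `helper`.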
InstanceRequest changes:
- Add: profile_id, async_provision, instance_mode, ssh_public_keys, snapshot_interval_minutes, version (deprecated)
- Fix: region default changed from 'us-west-1' to None (server decides)
- Fix: created_from default changed from None to 'api'
TaskRequest changes:
- Add: verifier_func, project_key, data_id, data_version, writer_metadata
- Add: model_config with extra='ignore' and populate_by_name=True
- Add: alias='env_id' for environment_id field
- Remove: metadata (doesn't exist in orchestrator TaskRequest, only in TaskResponse)
…API" This reverts commit 9a0af14.
add metadata to tasks in SDK
bump version
…odels
Add factual_answer field to support research/factual tasks:
- Task model: stores expected answer for verification
- TaskRequest: accept factual_answer when creating tasks
- TaskResponse: return factual_answer from API
Part of: https://linear.app/fleet-ai/issue/ENG-843/import-script-needs-to-support-output-json-schemas
Co-authored-by: Cursor <cursoragent@cursor.com>
feat: add factual_answer field to Task and API models
Add task_modality field to Task and TaskResponse models to support copying task modality (computer_use, tool_use, browser) when importing tasks via the SDK.
Changes:
- Add task_modality to TaskResponse model (API response)
- Add task_modality to Task model (SDK model)
- Pass task_modality from TaskResponse to Task in load_tasks
Co-authored-by: Cursor <cursoragent@cursor.com>
Addresses Bugbot comment: load_task_from_json wasn't extracting task_modality from JSON data, causing tasks loaded from JSON files to have task_modality=None even when the JSON contains this field. Co-authored-by: Cursor <cursoragent@cursor.com>
Add task_modality field to async Task model, TaskResponse model, and update load_task_from_json and load_tasks to preserve task_modality. Co-authored-by: Cursor <cursoragent@cursor.com>
- Change ScenarioResponse.id from str to int
- Change task_scenario_id from Optional[str] to Optional[int] in Task and TaskResponse models
- Bump version to 0.2.112
Co-authored-by: Cursor <cursoragent@cursor.com>
fix: use int for scenario IDs to match database schema
The API returns `environment_id` but load_task_from_json was only looking for `env_id` or `env_key`. Now it checks all three field names. Bump version to 0.2.113. Co-authored-by: Cursor <cursoragent@cursor.com>
fix: handle environment_id in load_task_from_json
Previously, import_single_task would catch all exceptions and return None, making it impossible to debug import failures. Now it raises the exception so callers can handle or report the actual error. Bump version to 0.2.114. Co-authored-by: Cursor <cursoragent@cursor.com>
…dling fix: propagate errors from import_single_task instead of swallowing
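The difference between swallowing and propagating import errors can be sketched as follows. This is an illustration under assumptions, not the SDK's code: `parse_task`, `import_single_task_swallowing`, and `import_all` are hypothetical, and only the name `import_single_task` comes from the commit message.

```python
def parse_task(raw):
    """Hypothetical stand-in for the real import work."""
    if "key" not in raw:
        raise ValueError("task is missing required field 'key'")
    return raw["key"]


def import_single_task_swallowing(raw):
    """Old behavior: every failure collapses into None."""
    try:
        return parse_task(raw)
    except Exception:
        return None


def import_single_task(raw):
    """New behavior: the exception reaches the caller unchanged."""
    return parse_task(raw)


def import_all(raws):
    """A caller can now record the actual error message per task."""
    imported, failures = [], {}
    for index, raw in enumerate(raws):
        try:
            imported.append(import_single_task(raw))
        except ValueError as exc:
            failures[index] = str(exc)
    return imported, failures
```

With the swallowing variant, a caller only sees `None` and cannot tell a missing field from a network failure; with propagation, the caller decides whether to abort or collect errors.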
This field was missing from the SDK, causing the lifecycle status to be lost when copying tasks. The API returns this field but the SDK wasn't capturing it.
Changes:
- Add task_lifecycle_status field to Task model (sync and async)
- Map task_lifecycle_status in load_task_from_json (sync and async)
- Bump version to 0.2.115
Co-authored-by: Cursor <cursoragent@cursor.com>
The API returns 'environment_id', so just use that directly instead of a fallback chain of env_id/env_key/environment_id. Co-authored-by: Cursor <cursoragent@cursor.com>
The database uses env_key, so the SDK model should match. Added alias="environment_id" so the API response still maps correctly.
Updated all references:
- Task.env_id -> Task.env_key
- TaskInfo.env_id -> TaskInfo.env_key
- Updated docstrings and examples
Co-authored-by: Cursor <cursoragent@cursor.com>
The API expects env_id (or environment_id), so we map env_key to env_id in import_single_task before sending. This keeps the SDK using env_key internally (matching DB) while maintaining API compatibility. No API changes needed - this is SDK-only. Co-authored-by: Cursor <cursoragent@cursor.com>
The task_lifecycle_status field was added to the Task model but was missing from:
- TaskResponse model (sync and async): needed to parse the API response
- load_tasks method: needed to pass the field to the Task constructor
This completes the task_lifecycle_status support in the SDK.
Co-authored-by: Cursor <cursoragent@cursor.com>
The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored fallback for env_key in load_task_from_json to support JSON files that use env_key field. Co-authored-by: Cursor <cursoragent@cursor.com>
The field was renamed to env_key but there was already a property with the same name, causing infinite recursion. Renamed the property to get_env_key() method. Also restored env_id fallback in load_task_from_json for backward compatibility with existing JSON files. Co-authored-by: Cursor <cursoragent@cursor.com>
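The recursion described in these two commits can be shown in plain Python. This is a simplified sketch, assuming the property body read `self.env_key`; the classes `ClashingTask` and `FixedTask` are hypothetical, and the real SDK models are Pydantic classes whose details may differ.

```python
class ClashingTask:
    """A property that shares its name with the value it tries to read."""

    def __init__(self, version):
        self.version = version

    @property
    def env_key(self):
        # `self.env_key` resolves to this same property, so it calls itself.
        return f"{self.env_key}:{self.version}"


class FixedTask:
    """The raw field keeps the name; the computed value moves to a method."""

    def __init__(self, env_key, version=None):
        self.env_key = env_key
        self.version = version

    def get_env_key(self):
        # Computed key with the version suffix, as described for make().
        if self.version:
            return f"{self.env_key}:{self.version}"
        return self.env_key


try:
    ClashingTask("v1").env_key
    clash_recursed = False
except RecursionError:
    clash_recursed = True
```

Renaming the computed accessor removes the name clash, which is why later call sites such as `make()` need to call `get_env_key()` rather than read the raw field.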
The make() method was using self.env_key (raw field) instead of self.get_env_key() (computed method with version). This would cause environments to be created without the version suffix. Co-authored-by: Cursor <cursoragent@cursor.com>
The API returns env_id but TaskInfo was renamed to use env_key. Added alias="env_id" so Pydantic accepts both field names during deserialization of API responses. Co-authored-by: Cursor <cursoragent@cursor.com>
When export_tasks serializes tasks, it outputs env_key. The loading function needs to check for env_key first (canonical name), then fallback to environment_id (API) and env_id (legacy). Co-authored-by: Cursor <cursoragent@cursor.com>
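The lookup order described here (canonical `env_key`, then the API's `environment_id`, then legacy `env_id`) might look like the sketch below. The helper name `resolve_env_key` is hypothetical; the commit only states the order in which `load_task_from_json` checks the fields.

```python
def resolve_env_key(data):
    """Return the first environment key found, in priority order."""
    # Canonical name first, then the API field, then the legacy field.
    for field in ("env_key", "environment_id", "env_id"):
        value = data.get(field)
        if value is not None:
            return value
    return None
```

For example, `resolve_env_key({"env_id": "legacy", "env_key": "canonical"})` picks the `env_key` value, so files written by `export_tasks` round-trip even if they also carry an older field.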
- TaskResponse: rename environment_id -> env_key (alias="environment_id")
- TaskRequest: rename environment_id -> env_key (alias="environment_id")
- Add ConfigDict(populate_by_name=True) for alias support
- Add Task.env_spec property for env_key:version string
- Use task.env_spec in Task.make() and make_for_task()
- Clean up load_tasks to use task_response.env_key directly
- Remove scattered inline env_key:version string building
Co-authored-by: Cursor <cursoragent@cursor.com>
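A single `env_spec` property replacing the inline string building could be sketched as below. This uses a plain dataclass rather than the SDK's Pydantic model, and the rule that an `env_key` already containing ":" is left unchanged is an assumption, not something the commit states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical, reduced stand-in for the SDK Task model."""

    env_key: str
    version: Optional[str] = None

    @property
    def env_spec(self) -> str:
        # Assumption: append the version only when one is set and the key
        # does not already carry a ':version' suffix.
        if self.version and ":" not in self.env_key:
            return f"{self.env_key}:{self.version}"
        return self.env_key
```

Centralizing the format in one property means `Task.make()` and `make_for_task()` cannot drift apart in how they build the string.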
- data_spec: renamed from data_key (data_key kept as alias)
- has_verifier: whether task has verifier_func or verifier
- is_research_based: whether task has a factual_answer
- is_action_based: inverse of is_research_based
Co-authored-by: Cursor <cursoragent@cursor.com>
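The three derived flags listed above could be expressed as read-only properties. This is a sketch under assumptions: `TaskSketch` is a hypothetical dataclass, and treating "has" as "is not None" is a guess about the intended semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskSketch:
    """Hypothetical subset of Task fields, for illustration only."""

    verifier_func: Optional[str] = None
    verifier: Optional[object] = None
    factual_answer: Optional[str] = None

    @property
    def has_verifier(self) -> bool:
        # Assumption: either representation of a verifier counts.
        return self.verifier_func is not None or self.verifier is not None

    @property
    def is_research_based(self) -> bool:
        return self.factual_answer is not None

    @property
    def is_action_based(self) -> bool:
        # Defined as the inverse, so the two flags can never both be true.
        return not self.is_research_based
```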
TaskInfo has alias="env_id" on env_key field but was missing model_config = ConfigDict(populate_by_name=True). Without this, creating TaskInfo(env_key="...") would fail since only the alias name was accepted. Co-authored-by: Cursor <cursoragent@cursor.com>
feat: Add task_lifecycle_status field to Task model
The PUT /v1/tasks/{task_key} endpoint can return environment_id: null,
which caused a Pydantic validation error since env_key was required.
This made update_task crash instead of returning a TaskResponse.
- TaskResponse.env_key: str -> Optional[str]
- Task.env_key: str -> Optional[str]
- Task.env_spec now returns None when env_key is absent
Co-authored-by: Cursor <cursoragent@cursor.com>
When a task has env_key=None, make_for_task would pass None to make(), causing a TypeError when evaluating `":" in env_key`. It now raises a clear ValueError, matching the guard in Task.make(). Co-authored-by: Cursor <cursoragent@cursor.com>
fix: make TaskResponse.env_key optional to handle null API responses
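The guard for a missing `env_key` can be sketched as follows. The bodies of `make` and `make_for_task` here are hypothetical stand-ins; only the function names, the `":" in env_key` check, and the TypeError/ValueError behavior come from the commit messages.

```python
from types import SimpleNamespace


def make(env_key):
    """Hypothetical stand-in: splits an optional ':version' suffix."""
    if ":" in env_key:  # raises TypeError when env_key is None
        name, version = env_key.split(":", 1)
        return name, version
    return env_key, None


def make_for_task(task):
    """Fail early with a readable message instead of a TypeError."""
    if task.env_key is None:
        raise ValueError(
            f"Task {task.key} has no env_key; cannot create an environment"
        )
    return make(task.env_key)


# Illustrative task objects; the real SDK uses its own Task model.
task_without_env = SimpleNamespace(key="t1", env_key=None)
task_with_env = SimpleNamespace(key="t2", env_key="fira:v1")
```

Checking for `None` before calling `make()` turns an opaque "argument of type 'NoneType' is not iterable" into an error that names the task.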
…v-key revert: restore env_key as required in TaskResponse and Task
Cursor Bugbot has reviewed your changes and found 1 potential issue.
            raise OSError(f"Cannot create bundle for {self.key}: {e}")

        if self._bundle_sha is None:
            self._bundle_sha = _get_bundle_sha(self._bundle_data)
Empty SHA now bypasses hash generation
Medium Severity
_get_or_create_bundle now recomputes the hash only when _bundle_sha is None, but API-loaded verifiers pass missing hashes as "". That preserves an empty sha256 instead of deriving one from bundle_data, so later check/execute calls can run with an invalid SHA.
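One way to address this report is to treat an empty string the same as a missing hash. The sketch below is an assumption-laden illustration: `resolve_bundle_sha` is a hypothetical helper, and `_get_bundle_sha` is assumed here to be the SHA-256 hex digest of the bundle bytes, which the PR does not show.

```python
import hashlib
from typing import Optional


def _get_bundle_sha(bundle_data: bytes) -> str:
    # Assumption: the bundle SHA is the SHA-256 hex digest of the bundle bytes.
    return hashlib.sha256(bundle_data).hexdigest()


def resolve_bundle_sha(api_sha: Optional[str], bundle_data: bytes) -> str:
    """Keep an API-provided SHA, but derive one when it is None or empty."""
    # `not api_sha` is true for both None and "", unlike `api_sha is None`.
    if not api_sha:
        return _get_bundle_sha(bundle_data)
    return api_sha
```

An alternative with the same effect would be to normalize `""` to `None` where API-loaded verifiers are constructed, so the existing `is None` check stays valid.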
No description provided.