Merged
Design for resume-from-failure functionality that works across local and Argo executors, leveraging native capabilities like memoization.
- Add abstract _calculate_attempt_number method to BaseExecutor
- Remove deprecated step_attempt_number property with TODO
- Extract attempt calculation into dedicated method in GenericPipelineExecutor
- Update _execute_node to use new calculation logic
- Implement concrete method in BaseJobExecutor (reads from env var)
- Keep abstract method in BasePipelineExecutor for custom implementations
- Add comprehensive tests for attempt number calculation scenarios
- Delete complex retry design document in favor of simpler approach
- Verify all changes work with example pipelines and full test suite
This refactoring improves code organization and testability, removes technical debt, and maintains backward compatibility.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
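The split described in this commit could look roughly like the sketch below. The class and method names come from the commit message; the method signature, the environment variable name `RUNNABLE_STEP_ATTEMPT`, and the "count previous attempts plus one" rule are assumptions for illustration, not the project's actual code.

```python
import os
from abc import ABC, abstractmethod

# Hypothetical name: the commit only says the job executor "reads from env var".
ATTEMPT_ENV_VAR = "RUNNABLE_STEP_ATTEMPT"


class BaseExecutor(ABC):
    @abstractmethod
    def _calculate_attempt_number(self, previous_attempts: list) -> int:
        """Return the 1-based attempt number for the step about to run."""


class GenericPipelineExecutor(BaseExecutor):
    def _calculate_attempt_number(self, previous_attempts: list) -> int:
        # Assumed rule: one more than the attempts already recorded for the step.
        return len(previous_attempts) + 1


class BaseJobExecutor(BaseExecutor):
    def _calculate_attempt_number(self, previous_attempts: list) -> int:
        # Assumed rule: the environment supplies the attempt, defaulting to 1.
        return int(os.environ.get(ATTEMPT_ENV_VAR, "1"))
```

Keeping the method abstract on the base class means a custom executor has to state its own rule explicitly instead of inheriting one that may not fit its environment.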
- Add _get_parameters_for_retry method for original parameters
- Ignore new parameter files and environment variables during retry
- Show informative console messages about parameter usage
- Support both retry and normal execution modes
- Add comprehensive test coverage for parameter loading scenarios
🤖 Generated with Claude Code
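A minimal sketch of the parameter selection described above, assuming parameters are plain dictionaries. The function name `select_parameters`, its arguments, and the "environment overrides file" precedence for normal runs are hypothetical; only the retry behaviour (original parameters win, new sources are ignored) comes from the commit message.

```python
def select_parameters(is_retry: bool, original: dict, from_file: dict, from_env: dict) -> dict:
    """Pick the parameters a run should start with."""
    if is_retry:
        # Retry: reuse the original run's parameters and ignore any new sources.
        print("Retry run: using parameters from the original run; "
              "new parameter files and environment variables are ignored")
        return dict(original)
    # Normal run (assumed precedence): environment values override file values.
    merged = dict(from_file)
    merged.update(from_env)
    return merged
```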
- Add _should_skip_step_in_retry method to check step execution status
- Modify execute_from_graph to skip successful steps during retry
- Skip steps if they were previously successful in original run
- Execute steps that failed or were never executed
- Support both retry and normal execution modes
- Add comprehensive test coverage for skip logic and integration
🤖 Generated with Claude Code
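The skip rule above can be sketched as a small predicate. The status strings and the dictionary of per-step statuses are assumptions standing in for whatever the run log actually stores.

```python
SUCCESS = "SUCCESS"  # assumed status label


def should_skip_step_in_retry(is_retry: bool, original_statuses: dict, step_name: str) -> bool:
    """Skip only steps that succeeded in the original run, and only during a retry."""
    if not is_retry:
        return False  # normal runs execute every step
    # Failed steps and steps that never ran (missing key) are executed.
    return original_statuses.get(step_name) == SUCCESS
```

For example, with original statuses `{"extract": "SUCCESS", "train": "FAIL"}`, a retry would skip `extract` and execute `train` plus any downstream step that never ran.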
- Modify _set_up_run_log to handle retry runs properly
- Call retry validation and return early for retry runs
- Reuse existing run log instead of creating new one for retry
- Maintain normal run log creation logic unchanged
- Prevent RunLogExistsError for legitimate retry scenarios
- Add comprehensive integration tests for retry pipeline setup
🤖 Generated with Claude Code
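The early return for retries might look like the following sketch, which uses an in-memory dictionary as a stand-in run log store. `RunLogExistsError` is named in the commit; `RunLogNotFoundError`, the function signature, and the run log shape are hypothetical.

```python
class RunLogExistsError(Exception):
    """A normal run tried to create a run log that already exists."""


class RunLogNotFoundError(Exception):
    """Hypothetical: a retry was requested for a run that has no run log."""


def set_up_run_log(store: dict, run_id: str, is_retry: bool) -> dict:
    if is_retry:
        # Retry validation, then return early with the existing run log.
        if run_id not in store:
            raise RunLogNotFoundError(run_id)
        return store[run_id]
    # Normal run: creation logic is unchanged and still rejects duplicates.
    if run_id in store:
        raise RunLogExistsError(run_id)
    store[run_id] = {"run_id": run_id, "steps": {}}
    return store[run_id]
```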
- Fix critical bug where attempt numbers were always 1 during retry
- During retry, reuse existing step logs to preserve original attempts
- For failed steps: get existing step log with previous attempts
- For never executed steps: create new step log as normal
- This allows _calculate_attempt_number to correctly increment attempts
- Add comprehensive tests for step log reuse behavior during retry
Root cause: execute_from_graph was always calling create_step_log, which created brand new step logs, wiping out the original attempts from the failed run and causing _calculate_attempt_number to always return 1.
🤖 Generated with Claude Code
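The fix can be illustrated with a get-or-create helper. The run log layout (a `steps` dictionary whose entries hold an `attempts` list) and the helper name are assumptions made for this sketch.

```python
def get_or_create_step_log(run_log: dict, step_name: str, is_retry: bool) -> dict:
    """Reuse the step log from the failed run on retry; otherwise start fresh."""
    steps = run_log["steps"]
    if is_retry and step_name in steps:
        # Previously executed step: keep its recorded attempts.
        return steps[step_name]
    # Normal run, or a step that never executed: create a new step log.
    steps[step_name] = {"name": step_name, "attempts": []}
    return steps[step_name]


def next_attempt_number(step_log: dict) -> int:
    # Assumed rule: attempts are counted from the step log.
    return len(step_log["attempts"]) + 1
```

With the old behaviour (always creating a new step log) the attempts list is emptied before counting, which is how every retry ended up as attempt 1.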
Investigation findings:
- Runtime error: nil pointer dereference in Argo workflows
- Root cause: run_id parameter changed from {{workflow.uid}} to
"PLEASE_SET_RUN_ID"
- Memoization system uses {{workflow.parameters.run_id}} as key for
all templates
- Placeholder string causes nil pointer dereference in Argo runtime
- Design document specifies run_id should default to workflow.uid
when not provided
- Memoization is required for retry functionality, cannot be removed
Next: Fix run_id parameter to use {{workflow.uid}} instead of
placeholder
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
The run_id parameter was using placeholder "PLEASE_SET_RUN_ID" which
caused runtime nil pointer dereference when Argo tried to resolve
memoization keys {{workflow.parameters.run_id}}.
Changes:
- Restore run_id default value from "PLEASE_SET_RUN_ID" to
"{{workflow.uid}}" as specified in design document
- This provides a valid workflow identifier for memoization keys
- Maintains retry functionality while preventing runtime panics
- All existing tests continue to pass
Verified:
- argo lint passes
- All retry tests pass (14/14)
- Generated workflow validates correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Generated and validated YAML from multiple example pipelines:
- examples/01-tasks/python_tasks.py
- examples/02-sequential/traversal.py
- examples/06-parallel/parallel.py
All generated YAMLs include ConfigMap cache configuration in memoize blocks and pass Argo workflow linting validation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Updated the retry design documentation to include the ConfigMap cache configuration in the memoize block. This reflects the implementation of persistent caching across workflow resubmissions using Argo's ConfigMap cache feature.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Performed comprehensive integration testing across multiple pipeline types:
- Tested python_tasks, passing_parameters_python, and catalog_python examples
- Verified all generate valid Argo YAML with cache configuration
- Confirmed cache name consistency across templates (runnable-xxxxxx pattern)
- Validated backward compatibility (old-style memoize without cache still works)
- All tests passed with argo lint validation
Results documented in integration-test-results.txt
Task 7 complete: ConfigMap memoization implementation is production-ready
Previously, RUNNABLE_RETRY_RUN_ID was incorrectly set to
{{workflow.parameters.run_id}} (current run ID) instead of
{{workflow.parameters.retry_run_id}} (original run ID for retries).
This caused is_retry to always return True, making the system look for
run logs even on first runs where none exist.
Changes:
- Fix _add_retry_env_vars to use {{workflow.parameters.retry_run_id}}
- Now retry_run_id is empty string for normal runs (is_retry = False)
- And contains original run ID for actual retries (is_retry = True)
- All 23 Argo tests continue to pass
Resolves issue where first-time pipeline runs were incorrectly flagged
as retries and attempted to load non-existent run logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
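A small sketch of why the wrong template value broke first runs, assuming `is_retry` is derived from whether `RUNNABLE_RETRY_RUN_ID` is non-empty. The environment variable name is from the commit message; the helper below is a hypothetical stand-in for the real property.

```python
import os
from typing import Mapping, Optional

RETRY_ENV_VAR = "RUNNABLE_RETRY_RUN_ID"


def is_retry(environ: Optional[Mapping[str, str]] = None) -> bool:
    """A run counts as a retry only when the original run ID is set and non-empty."""
    env = os.environ if environ is None else environ
    return bool(env.get(RETRY_ENV_VAR, ""))


# Before the fix: the variable carried the current run ID, which is never empty.
print(is_retry({RETRY_ENV_VAR: "current-run-id"}))
# After the fix: normal runs receive an empty retry_run_id.
print(is_retry({RETRY_ENV_VAR: ""}))
```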
- Add configmap_cache_name parameter to ArgoExecutor config
- Allow users to specify custom ConfigMap names for memoization
- Default to random generation (runnable-xxxxxx) when not specified
- Update documentation with new configuration option
- Backward compatible with existing configurations
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
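One way the default name could be produced, assuming the six-character suffix in the `runnable-xxxxxx` pattern is random hex from the standard `secrets` module. The helper name and the choice of `secrets.token_hex` are assumptions; only the option name `configmap_cache_name` and the pattern come from the commits.

```python
import secrets
from typing import Optional


def resolve_cache_name(configmap_cache_name: Optional[str] = None) -> str:
    """Use the configured ConfigMap name, or fall back to a random one."""
    if configmap_cache_name:
        return configmap_cache_name
    # token_hex(3) yields 6 hexadecimal characters, matching runnable-xxxxxx.
    return f"runnable-{secrets.token_hex(3)}"
```

A fixed name lets separate submissions share one cache, while the random default gives each generated workflow its own ConfigMap.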
- Move secrets import from method level to module level
- Follow Python best practices for import organization
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Remove hardcoded run_id parameter from workflow arguments
- Users must provide run_id at workflow submission time
- Enables cache reuse: same run_id = cache hit, new run_id = fresh execution
- Resolves 'invalid cache key: {{workflow.uid}}' error
- Gives users full control over memoization behavior
Usage:
# Cache reuse: argo submit argo-pipeline.yaml -p run_id=stable-run
# Fresh run: argo submit argo-pipeline.yaml -p run_id=new-run-123
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
- Add run_id parameter declaration without default value
- Makes run_id required at workflow submission time
- Workflow now declares all expected parameters correctly
- Argo lint validation passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
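As a rough illustration, the generated workflow arguments could be modelled as below: `run_id` is declared with no value so it must be supplied at submission, while `retry_run_id` is assumed here to default to an empty string. Both helper functions are hypothetical and only mimic the "required parameter" check that Argo performs at submission.

```python
def workflow_arguments() -> dict:
    """Parameter declarations as they might appear under spec.arguments."""
    return {
        "parameters": [
            {"name": "run_id"},                     # no default: required
            {"name": "retry_run_id", "value": ""},  # assumed default for normal runs
        ]
    }


def missing_parameters(arguments: dict, supplied: dict) -> list:
    """Names of declared parameters with no default and no supplied value."""
    return [
        p["name"]
        for p in arguments["parameters"]
        if "value" not in p and p["name"] not in supplied
    ]
```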
- Add key field to ConfigMapCache model with default 'cache' value
- Resolves issue where memoization worked within workflows but didn't persist
- ConfigMap entries now properly stored and retrieved across workflow runs
- Fixes root cause: empty configMap.key prevented cache persistence
Issue analysis showed:
- Memoization was working (Hit: true/false status confirmed)
- ConfigMap key field was empty, breaking persistent storage
- Cache worked within single workflow but not across runs
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
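A sketch of the model change using a standard-library dataclass as a stand-in for the project's own model class. The `ConfigMapCache` name and the `'cache'` default come from the commit; the `memoize_block` helper and the example memoization key are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class ConfigMapCache:
    name: str
    key: str = "cache"  # previously empty, which prevented persistence across runs


def memoize_block(cache: ConfigMapCache, memo_key: str) -> dict:
    """Build a memoize section that points at a ConfigMap-backed cache."""
    return {"key": memo_key, "cache": {"configMap": asdict(cache)}}


block = memoize_block(ConfigMapCache(name="runnable-abc123"), "stable-run-step1")
print(block["cache"]["configMap"])
```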
Add complete retry documentation with cross-environment debugging:
- Production failure → local debugging workflow
- Technical deep-dive on surgical retry mechanisms
- Working examples from examples/09-retry/ directory
- Clear distinction between failure handling and retry
Also fix broken links in visualization.md and clarify failure handling.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
CronWorkflow support:
- Add CronSchedule model with schedules and timezone fields
- Add cron_schedule config option to ArgoExecutor
- Generate CronWorkflow instead of Workflow when schedule is configured
- Add documentation in argo.md and example config
Retry CLI command:
- Add `runnable retry <run_id>` CLI command
- Add retry_pipeline() entrypoint that loads run log and re-executes
- Document CLI retry in retry-recovery.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
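The Workflow versus CronWorkflow switch might be sketched as follows. `CronSchedule` and its `schedules` and `timezone` fields are named in the commit; the dataclass stand-in, the `UTC` default, the `render_manifest` helper, and the omission of metadata are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CronSchedule:
    schedules: List[str] = field(default_factory=list)
    timezone: str = "UTC"  # assumed default


def render_manifest(workflow_spec: dict, cron: Optional[CronSchedule] = None) -> dict:
    """Wrap the workflow spec in a CronWorkflow only when a schedule is configured."""
    if cron is None or not cron.schedules:
        return {"apiVersion": "argoproj.io/v1alpha1", "kind": "Workflow", "spec": workflow_spec}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "spec": {
            "schedules": list(cron.schedules),
            "timezone": cron.timezone,
            "workflowSpec": workflow_spec,  # the unchanged workflow spec is nested here
        },
    }
```

Nesting the existing spec under `workflowSpec` means the rest of the generated pipeline does not need to change when a schedule is added.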
Python tasks are plain functions - any IDE debugger works without special configuration. Added callout box explaining this advantage over frameworks that require special debugging setup.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added examples/07-map/dynamic_map.py demonstrating how to generate the list of items to iterate over at runtime from a previous step's return value, rather than from a static parameters file.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
No description provided.