fix: create parent directory before moving workflow clone #1520

Open
guzalv wants to merge 3 commits into ambient-code:main from guzalv:fix/workflow-clone-missing-mkdir-v2

@guzalv guzalv commented May 7, 2026

Summary

Custom workflows loaded via "Load custom" (git URL + branch, no subpath) silently fail — the session starts with an empty workflow directory. None of the workflow's commands, system prompt, or CLAUDE.md are available to the agent.

Root causes

1. Missing parent directory for workflow clone

The init container (hydrate.sh) and the runtime clone (workflow.py) both attempt to move the cloned workflow to /workspace/workflows/<name>, but the parent directory /workspace/workflows/ is never created beforehand. The mkdir -p call exists in the subpath code path but is missing from the non-subpath path (the common case, where the entire repo is the workflow). The move fails silently (hydrate.sh runs under set +e; workflow.py catches the exception), and the runner later creates an empty directory as the CWD.
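
A minimal Python model of this failure mode, as it would play out for the mv in hydrate.sh. Everything here is illustrative rather than taken from the project: hydrate_without_mkdir is a hypothetical name, os.rename stands in for mv, and the cleanup and the later directory creation are modeled on the description in this PR.

```python
import os
import shutil
import tempfile
from pathlib import Path


def hydrate_without_mkdir(workspace: Path, name: str) -> Path:
    """Hypothetical model of the pre-fix, no-subpath path (not project code)."""
    # Stand-in for the git clone: a temp directory holding one workflow file.
    temp_dir = Path(tempfile.mkdtemp(prefix="workflow-clone-"))
    (temp_dir / "CLAUDE.md").write_text("workflow instructions\n")

    workflow_final = workspace / "workflows" / name
    try:
        # os.rename stands in for `mv`. It raises FileNotFoundError here
        # because the parent directory workspace/workflows/ does not exist.
        os.rename(temp_dir, workflow_final)
    except OSError:
        pass  # error swallowed, mirroring `set +e` or a broad except
    finally:
        # Cleanup discards the clone even though the move never happened.
        shutil.rmtree(temp_dir, ignore_errors=True)

    # Later, the runner creates its working directory if it is missing.
    workflow_final.mkdir(parents=True, exist_ok=True)
    return workflow_final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        final = hydrate_without_mkdir(Path(ws), "demo")
        # The directory exists but contains none of the cloned files.
        print(final.is_dir(), sorted(p.name for p in final.iterdir()))
```

The end state is a directory that exists and is empty, which matches the symptom reported in the summary.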

2. Repo and workflow cloning coupled to S3 availability

hydrate.sh calls exit 0 when S3 credentials are missing, which skips everything after line 98 — including all git credential setup, repo cloning, and workflow cloning. These operations are independent of S3 state persistence and should always run. This means any deployment without S3 configured (or with temporarily unavailable S3) gets no repos and no workflows.
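
The control-flow difference can be sketched in Python, under stated assumptions: hydrate.sh is a shell script, so this is only a model, the function name hydrate_steps and the early_exit flag are hypothetical, and the ordering of the steps is illustrative. The step names are the ones listed in this description.

```python
from typing import List

S3_STEPS = ["rclone setup", "state download", "repo state restore"]
GIT_STEPS = ["git credential setup", "repo cloning", "workflow cloning"]


def hydrate_steps(s3_configured: bool, early_exit: bool) -> List[str]:
    """Return the steps that would run (hypothetical model of hydrate.sh).

    early_exit=True models the old `exit 0` when S3 is unconfigured;
    early_exit=False models the conditional block proposed in this PR.
    """
    steps: List[str] = []
    if s3_configured:
        steps += S3_STEPS
    elif early_exit:
        return steps  # old behaviour: nothing after this point runs
    steps += GIT_STEPS  # independent of S3, so always reached after the fix
    return steps


if __name__ == "__main__":
    print(hydrate_steps(s3_configured=False, early_exit=True))
    print(hydrate_steps(s3_configured=False, early_exit=False))
```

With S3 unconfigured, the early-exit variant runs nothing, while the conditional variant still runs the three git steps.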

Who is affected

  • Missing mkdir: All users loading a custom workflow without a subpath (the common case). The subpath code path works correctly, which is likely why this wasn't caught.
  • S3 coupling: Any deployment where S3 credentials are not configured or temporarily unavailable. The platform otherwise functions fine without S3 (sessions start, agent works), but repos and workflows silently don't load.

Changes

hydrate.sh

  • Add mkdir -p "$(dirname "$WORKFLOW_FINAL")" before mv in the no-subpath and subpath-fallback branches
  • Replace the exit 0 when S3 is unconfigured with a conditional block: S3-specific operations (rclone setup, state download, repo state restore) are skipped, but git credential setup, repo cloning, and workflow cloning always run

workflow.py

  • Add workflow_final.parent.mkdir(parents=True, exist_ok=True) before shutil.move() in the non-subpath and subpath-fallback branches of clone_workflow_at_runtime()
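
A minimal sketch of the mkdir-before-move pattern named in the bullet above. The helper name place_workflow is hypothetical and the surrounding logic of clone_workflow_at_runtime() is not reproduced; only the two calls quoted in this description are used.

```python
import shutil
import tempfile
from pathlib import Path


def place_workflow(temp_dir: Path, workflow_final: Path) -> Path:
    """Move a cloned workflow into place, creating the parent first.

    Hypothetical helper for illustration; not the project's function.
    """
    # Create e.g. /workspace/workflows/ if missing; a no-op when it exists.
    workflow_final.parent.mkdir(parents=True, exist_ok=True)
    # Assumes workflow_final does not exist yet. If it were an existing
    # directory, shutil.move would place temp_dir inside it instead.
    shutil.move(str(temp_dir), str(workflow_final))
    return workflow_final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        clone = Path(tempfile.mkdtemp(dir=ws))
        (clone / "CLAUDE.md").write_text("workflow instructions\n")
        final = place_workflow(clone, Path(ws) / "workflows" / "demo")
        print(sorted(p.name for p in final.iterdir()))
```

Because exist_ok=True makes the mkdir idempotent, the same call is safe in every branch, including the subpath-fallback one.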

Test plan

  • Load a custom workflow (URL + branch, no subpath) on a fresh session — verify /workspace/workflows/<name>/ is populated
  • Load a custom workflow on a deployment without S3 configured — verify workflow still clones
  • Load a custom workflow with a valid subpath — verify extraction still works (unchanged code path)
  • Load a custom workflow with a nonexistent subpath — verify fallback to entire repo works
  • Full deployment with S3 configured — verify state hydration still works as before
  • make kind-up LOCAL_IMAGES=true with local builds of ambient-runner and state-sync

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced workflow cloning to properly handle edge cases where workflow subpaths are missing, ensuring required directories are created before filesystem operations.
    • Improved workflow repository preparation to conditionally manage operations based on storage availability, preventing unnecessary script termination.

When loading a custom workflow without a subpath, the /workspace/workflows/
parent directory was never created before the mv/shutil.move call. This
caused the move to fail silently (hydrate.sh has set +e, workflow.py
catches exceptions), and the finally block then deleted the successfully
cloned temp directory. The runner later created an empty directory as the
CWD, making it appear the workflow loaded but was empty.

The subpath code paths already had the mkdir -p / parent.mkdir() calls;
this adds them to the non-subpath paths in both hydrate.sh and workflow.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

netlify Bot commented May 7, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 66f6f7e |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/69fc97be301a7500075f1450 |

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

Adjusts runtime workflow placement and hydration behavior: both the ambient-runner endpoint and the state-sync script now ensure the destination parent directory exists before placing a cloned or extracted workflow. The Python endpoint also switches to moving the temporary clone into the final path for certain missing-subpath cases, and the shell hydration now only attempts S3 restore when S3_PATH is set.

Changes

Workflow Destination & Hydration

| Layer / File(s) | Summary |
| --- | --- |
| Directory Precondition: components/runners/ambient-runner/ambient_runner/endpoints/workflow.py, components/runners/state-sync/hydrate.sh | Parent directory of WORKFLOW_FINAL / workflow_final is created (mkdir -p / Path(...).parent.mkdir(parents=True, exist_ok=True)) before moving or extracting workflows. |
| Core Placement Behavior: components/runners/ambient-runner/ambient_runner/endpoints/workflow.py | When requested subpath is missing or not provided, the endpoint now ensures the destination parent exists and uses shutil.move(temp_dir, workflow_final) (re-homing the cloned repo) for the no-subpath / missing-subpath paths. |
| Hydration Control Flow: components/runners/state-sync/hydrate.sh | S3 repo restore (rclone/git state restore) is now guarded by checking S3_PATH; if S3 is not configured the script logs/skips hydration instead of exiting early. |
| Extraction / Move Variants: components/runners/state-sync/hydrate.sh | When ACTIVE_WORKFLOW_PATH is set (extract subpath) or unset (use whole repo), the script creates the parent dir of WORKFLOW_FINAL before copying/extracting or moving the repository into /workspace/workflows/${WORKFLOW_NAME}. |

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Security And Secret Handling | ❌ Error | Path traversal vulnerability in subpath handling. User path input used without validation in path operations, allowing ../ escape sequences. | Validate subpath input to reject .. and absolute paths. Add validation before using subpath in temp_dir/subpath concatenation. Reject if path contains `..` or starts with `/`. |
✅ Passed checks (7 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (fix: prefix) and accurately describes the main change: ensuring parent directories exist before moving workflow clones. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Performance And Algorithmic Complexity | ✅ Passed | No performance regressions detected. Changes add defensive mkdir calls before filesystem moves. No O(n²) algorithms, N+1 patterns, unbounded loops, or expensive operations inside loops. |
| Kubernetes Resource Safety | ✅ Passed | PR modifies only application-level code (Python workflow endpoint and bash init script), not Kubernetes resource definitions. No K8s manifests in scope. |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/runners/state-sync/hydrate.sh (1)

339-345: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing mkdir -p before mv in the subpath-not-found fallback.

The subpath-found branch (line 334) and the no-subpath branch (line 348) both call mkdir -p "$(dirname "$WORKFLOW_FINAL")" before moving/copying, but the subpath-not-found fallback at line 343 does not. Since set +e is active here, the mv will silently fail if /workspace/workflows/ doesn't exist yet.

🐛 Proposed fix
             echo "  Using entire repo instead"
+            mkdir -p "$(dirname "$WORKFLOW_FINAL")"
             mv "$WORKFLOW_TEMP" "$WORKFLOW_FINAL"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/runners/state-sync/hydrate.sh` around lines 339 - 345, The
fallback branch that handles a missing subpath moves the entire cloned repo with
mv "$WORKFLOW_TEMP" "$WORKFLOW_FINAL" but does not ensure the destination parent
exists, so mv can silently fail under set +e; add a mkdir -p "$(dirname
"$WORKFLOW_FINAL")" immediately before the mv in that branch (the same pattern
used in the subpath-found and no-subpath branches) to guarantee the parent
directory for WORKFLOW_FINAL exists before moving WORKFLOW_TEMP into place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2420c208-5c06-4cce-90e1-6813a4e8e7fc

📥 Commits

Reviewing files that changed from the base of the PR and between 7149354 and 11b2528.

📒 Files selected for processing (2)
  • components/runners/ambient-runner/ambient_runner/endpoints/workflow.py
  • components/runners/state-sync/hydrate.sh

guzalv and others added 2 commits May 7, 2026 14:45
The subpath-fallback branch (when a subpath is specified but not found in
the cloned repo) also lacked mkdir -p before mv, same root cause.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The init container previously called exit 0 when S3 credentials were
missing, which skipped all git cloning of repos and workflows. Repo and
workflow cloning are independent of S3 state persistence and should
always run.

Replace the early exit with a conditional block around S3-specific
operations (rclone setup, state download, repo state restore), while
leaving git credential setup, repo cloning, and workflow cloning
unconditional.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/runners/state-sync/hydrate.sh (1)

469-471: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing chown/chmod for /workspace/workflows after cloning.

/workspace/workflows (and its contents) is created under set +e by the new mkdir -p calls, but only /workspace/repos is covered in the final permissions block. The existing comment at lines 87–92 explains exactly why repos need chmod 777 (init container = root, runner = uid 1001, no shared fsGroup). The same applies here: if the runner writes anything into the workflow directory at runtime (e.g. __pycache__, PID files, generated config), the write will fail with a permission-denied error.

🔧 Proposed fix
 # Set permissions on repos after restore (repos may have been cloned or restored)
 chown -R 1001:0 /workspace/repos 2>/dev/null || true
 chmod -R 777 /workspace/repos 2>/dev/null || true
+
+# Set permissions on workflows (same reasoning as repos — init container runs as root)
+chown -R 1001:0 /workspace/workflows 2>/dev/null || true
+chmod -R 777 /workspace/workflows 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/runners/state-sync/hydrate.sh` around lines 469 - 471, The
permissions block currently only adjusts /workspace/repos; add the same chown -R
1001:0 and chmod -R 777 commands for /workspace/workflows so the runner (uid
1001) can write runtime files there; update the final permissions section that
contains chown -R 1001:0 /workspace/repos and chmod -R 777 /workspace/repos to
also perform chown -R 1001:0 /workspace/workflows 2>/dev/null || true and chmod
-R 777 /workspace/workflows 2>/dev/null || true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4ca458f6-93a3-44c7-9122-3d36742b7c8d

📥 Commits

Reviewing files that changed from the base of the PR and between 11b2528 and 66f6f7e.

📒 Files selected for processing (1)
  • components/runners/state-sync/hydrate.sh
