Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 34 additions & 8 deletions crates/tower-runtime/src/local.rs
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,31 @@ pub struct LocalApp {
execute_handle: Option<JoinHandle<Result<(), Error>>>,
}

// Sent when execute_local_app returns Err before producing a real exit code
// (uv not found, venv spawn failed, PYTHONPATH construction failed, etc).
// Distinct from the cancellation sentinel -1.
pub(crate) const SETUP_FAILURE_EXIT_CODE: i32 = -2;
Comment on lines +52 to +55
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

-1 is still overloaded.

These comments make -1 part of the cancellation contract, but wait_for_process() in this file also returns -1 for child.wait() I/O failures at Lines 578-581. That means callers still can't reliably distinguish cancellation from a runtime wait failure. If these negative codes are now semantic, reserve -1 for cancellation only and give wait errors their own sentinel or a typed status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tower-runtime/src/local.rs` around lines 52 - 55, The comment points
out that -1 is overloaded between cancellation and child.wait() I/O failures;
change the contract so SETUP_FAILURE_EXIT_CODE remains -2 and -1 is reserved
exclusively for cancellation, and introduce a new sentinel or typed status for
wait failures (for example WAIT_FAILURE_EXIT_CODE) used by wait_for_process()
instead of returning -1 on child.wait() I/O errors; update wait_for_process(),
its callers, and any documentation/comments that reference the old -1 behavior
(identify uses in wait_for_process(), child.wait() error handling, and callers
of execute_local_app()) so callers can reliably distinguish cancellation vs.
wait failures.


// Drop guard that ensures the oneshot waiter always receives a code, so `?`
// paths in execute_local_app cannot leave status() stuck on WaiterClosed.
struct StatusReporter {
tx: Option<oneshot::Sender<i32>>,
}

impl StatusReporter {
fn send(&mut self, code: i32) {
if let Some(tx) = self.tx.take() {
let _ = tx.send(code);
}
}
}

impl Drop for StatusReporter {
fn drop(&mut self) {
self.send(SETUP_FAILURE_EXIT_CODE);
}
}

// Helper function to check if a file is executable
async fn is_executable(path: &PathBuf) -> bool {
let metadata = match fs::metadata(path).await {
Expand Down Expand Up @@ -104,6 +129,7 @@ async fn execute_local_app(
sx: oneshot::Sender<i32>,
cancel_token: CancellationToken,
) -> Result<(), Error> {
let mut reporter = StatusReporter { tx: Some(sx) };
let ctx = opts.ctx.clone();
let package = opts.package;
let environment = opts.environment;
Expand Down Expand Up @@ -159,7 +185,7 @@ async fn execute_local_app(
if cancel_token.is_cancelled() {
// if there's a waiter, we want them to know that the process was cancelled so we have
// to return something on the relevant channel.
let _ = sx.send(-1);
reporter.send(-1);
return Err(Error::Cancelled);
}

Expand All @@ -176,7 +202,7 @@ async fn execute_local_app(
)
.await?;

let _ = sx.send(wait_for_process(ctx.clone(), &cancel_token, child).await);
reporter.send(wait_for_process(ctx.clone(), &cancel_token, child).await);
} else {
// we put Uv in to protected mode when there's no caching configured/enabled.
let protected_mode = opts.cache_dir.is_none();
Expand All @@ -198,7 +224,7 @@ async fn execute_local_app(
// ensure everything is in place.
if cancel_token.is_cancelled() {
// again tell any waiters that we cancelled.
let _ = sx.send(-1);
reporter.send(-1);
return Err(Error::Cancelled);
}

Expand Down Expand Up @@ -226,15 +252,15 @@ async fn execute_local_app(

if res != 0 {
// If the venv process failed, we want to return an error.
let _ = sx.send(res);
reporter.send(res);
return Err(Error::VirtualEnvCreationFailed);
}

// Check once more if the process was cancelled before we do a uv sync. The sync itself,
// once started, will take a while and we have logic for checking for cancellation.
if cancel_token.is_cancelled() {
// again tell any waiters that we cancelled.
let _ = sx.send(-1);
reporter.send(-1);
return Err(Error::Cancelled);
}

Expand Down Expand Up @@ -279,7 +305,7 @@ async fn execute_local_app(

if res != 0 {
// If the sync process failed, we want to return an error.
let _ = sx.send(res);
reporter.send(res);
return Err(Error::DependencyInstallationFailed);
}
}
Expand All @@ -289,7 +315,7 @@ async fn execute_local_app(
if cancel_token.is_cancelled() {
// if there's a waiter, we want them to know that the process was cancelled so we have
// to return something on the relevant channel.
let _ = sx.send(-1);
reporter.send(-1);
return Err(Error::Cancelled);
}

Expand All @@ -312,7 +338,7 @@ async fn execute_local_app(
BufReader::new(stderr),
));

let _ = sx.send(wait_for_process(ctx.clone(), &cancel_token, child).await);
reporter.send(wait_for_process(ctx.clone(), &cancel_token, child).await);
}

// Everything was properly executed I suppose.
Expand Down
66 changes: 65 additions & 1 deletion crates/tower-runtime/tests/local_test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@ use std::path::PathBuf;
use tower_runtime::{local::LocalApp, App, StartOptions, Status};

use config::Towerfile;
use tower_package::{Package, PackageSpec};
use tmpdir::TmpDir;
use tower_package::{Manifest, Package, PackageSpec};
use tower_telemetry::{self, debug};

use tokio::sync::mpsc::unbounded_channel;
Expand Down Expand Up @@ -329,6 +330,69 @@ async fn test_running_app_with_secret() {
assert!(status == Status::Exited, "App should be running");
}

#[cfg(unix)]
#[tokio::test]
async fn test_setup_failure_reports_status_instead_of_waiter_closed() {
// Regression: execute_local_app used to bubble up `?` errors (e.g. a failing
// std::env::join_paths) without ever signalling the oneshot waiter, so
// LocalApp::status() returned Err(WaiterClosed) and callers exited 1
// silently. A setup failure must now surface as a terminal Status.
let tmp = TmpDir::new("test-silent-setup")
.await
.expect("Failed to create tmp dir");
let unpacked_path: PathBuf = tmp.as_ref().to_path_buf();

// On Unix, std::env::join_paths rejects paths containing ':'. Feeding one
// into manifest.import_paths causes the PYTHONPATH construction inside
// execute_local_app to fail via `?` before any subprocess is spawned.
let package = Package {
tmp_dir: None,
package_file_path: None,
unpacked_path: Some(unpacked_path),
manifest: Manifest {
version: Some(1),
invoke: "main.py".to_string(),
parameters: vec![],
schedule: None,
import_paths: vec!["bad:path".to_string()],
app_dir_name: "app".to_string(),
modules_dir_name: "modules".to_string(),
checksum: "".to_string(),
},
};

let (sender, _receiver) = unbounded_channel();
let opts = StartOptions {
ctx: tower_telemetry::Context::new("runner-id".to_string()),
package,
output_sender: sender,
cwd: None,
environment: "local".to_string(),
secrets: HashMap::new(),
parameters: HashMap::new(),
env_vars: HashMap::new(),
cache_dir: Some(config::default_cache_dir()),
};

let app = LocalApp::start(opts).await.expect("Failed to start app");

for _ in 0..20 {
let status = app.status().await.expect("status should not be WaiterClosed");
if status.is_terminal() {
match status {
Status::Crashed { code } => {
assert!(code < 0, "expected negative sentinel, got {}", code);
return;
Comment on lines +383 to +385
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the exact setup-failure code.

code < 0 is too loose for this regression: it would also pass if setup failures accidentally started reporting the cancellation sentinel (-1). This test should pin the specific setup-failure code introduced by the PR, even if that means exposing the constant to integration tests or duplicating -2 here.

Suggested assertion
-                    assert!(code < 0, "expected negative sentinel, got {}", code);
+                    assert_eq!(code, -2, "expected setup failure sentinel, got {}", code);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Status::Crashed { code } => {
assert!(code < 0, "expected negative sentinel, got {}", code);
return;
Status::Crashed { code } => {
assert_eq!(code, -2, "expected setup failure sentinel, got {}", code);
return;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tower-runtime/tests/local_test.rs` around lines 383 - 385, Replace the
loose check that Status::Crashed { code } asserts code < 0 with an exact
equality assertion for the setup-failure sentinel introduced by the PR (i.e.,
assert_eq!(code, -2) or reference the exported constant if available), so update
the assertion in the Status::Crashed match arm to pin the specific setup-failure
code rather than any negative value; if the constant (e.g.,
SETUP_FAILURE_SENTINEL) is not exported to tests, duplicate the -2 literal in
the test to make the expectation explicit.

}
other => panic!("expected Crashed, got {:?}", other),
}
}
tokio::time::sleep(std::time::Duration::from_millis(25)).await;
}

panic!("status never reached a terminal state");
}

#[tokio::test]
async fn test_abort_on_dependency_installation_failure() {
debug!("Running 05-broken-dependencies");
Expand Down
Loading