Race condition with multiple pty file descriptors sent concurrently over DBus #24

@HastD

I have a program with a client/daemon model. The daemon takes requests over DBus (using the zbus crate); for each request it runs a command in a pty and replies with the pty's file descriptor, which the client then reads to get the command output.

This works under typical circumstances, but when the daemon receives a large number of concurrent requests, some clients either receive no output on the pty or hang indefinitely, which suggests a race condition. I opened an issue with zbus (z-galaxy/zbus#1766), but the maintainer thinks a zbus bug is unlikely because zbus only passes the pty's FD over DBus.

I have a minimal reproduction here: https://codeberg.org/HastD/zbus-bug-demo
The two binary targets, daemon.rs and client.rs, use the zbus interface/proxy defined in lib.rs. If the client program runs successfully, it should simply echo back the first argument passed to it.

The test in tests/test.rs runs the daemon in a background process and then spawns 100 client processes and checks whether they produce the expected output. However, when I run cargo test, the test usually fails at a nondeterministic index with most of the clients succeeding but a few having empty output.
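For readers who don't want to open the repository, the check has roughly the following shape. This is only a sketch using the standard library, not the actual contents of tests/test.rs: the helper name `run_clients` is hypothetical, it does not start the daemon, and `echo` stands in for the client binary so the snippet runs on its own.

```rust
use std::process::{Command, Stdio};

/// Spawns `count` copies of `program` concurrently, each with a distinct
/// argument, and returns the indices whose stdout did not echo that argument.
/// (Hypothetical helper; the real test targets the `client` binary.)
fn run_clients(program: &str, count: usize) -> std::io::Result<Vec<usize>> {
    // Start every child before waiting on any of them, so the requests overlap.
    let mut children = Vec::with_capacity(count);
    for i in 0..count {
        let child = Command::new(program)
            .arg(format!("message {i}"))
            .stdin(Stdio::null())
            .stdout(Stdio::piped())
            .stderr(Stdio::null())
            .spawn()?;
        children.push(child);
    }

    // Collect each child's output and record the ones that do not match.
    let mut failed = Vec::new();
    for (i, child) in children.into_iter().enumerate() {
        let output = child.wait_with_output()?;
        let text = String::from_utf8_lossy(&output.stdout);
        // trim_end() drops the trailing "\n" (or "\r\n" when the data came through a pty).
        if text.trim_end() != format!("message {i}") {
            failed.push(i);
        }
    }
    Ok(failed)
}

fn main() -> std::io::Result<()> {
    let failed = run_clients("echo", 100)?;
    println!("{} of 100 clients produced unexpected output", failed.len());
    Ok(())
}
```

With the real client binary substituted for `echo` and the daemon running, the returned list is where the empty-output clients would show up.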

I'm unsure whether the bug is in zbus or pty-process, but I can't reproduce the bug without sending the pty file descriptors over DBus. Also, strangely, the bug does not reproduce if the daemon uses the blocking API of pty-process, which suggests the issue has something to do with the difference between the blocking and async APIs of pty-process.

Example code

For ease of reference, here's the code from the above example demonstrating the issue:

lib.rs

use pty_process::{Command, Pty, open as open_pty};
use zbus::zvariant::OwnedFd;

pub struct Foo;

#[zbus::interface(
    name = "example.foo.Foo",
    proxy(default_service = "example.foo.Foo", default_path = "/example/foo/Foo")
)]
impl Foo {
    async fn hello(&self, msg: &str) -> zbus::fdo::Result<OwnedFd> {
        let pty = spawn_command_in_pty(msg).map_err(|err| zbus::Error::Failure(err.to_string()))?;
        Ok(OwnedFd::from(std::os::fd::OwnedFd::from(pty)))
    }
}

fn spawn_command_in_pty(msg: &str) -> pty_process::Result<Pty> {
    let (pty, pts) = open_pty()?;
    pty.resize(pty_process::Size::new(24, 80))?;
    Command::new("echo").arg(msg).spawn(pts)?;
    Ok(pty)
}

bin/daemon.rs

#[tokio::main]
async fn main() -> zbus::Result<()> {
    let connection = zbus::connection::Builder::session()?
        .name("example.foo.Foo")?
        .serve_at("/example/foo/Foo", zbus_bug_demo::Foo)?
        .build()
        .await?;
    tokio::time::sleep(std::time::Duration::from_secs(5)).await;
    connection.graceful_shutdown().await;
    Ok(())
}

bin/client.rs

#[tokio::main]
async fn main() -> zbus::fdo::Result<()> {
    let msg = std::env::args().nth(1).unwrap_or_default();
    let conn = zbus::Connection::session().await?;
    let proxy = zbus_bug_demo::FooProxy::new(&conn).await?;
    let pty_fd = proxy.hello(&msg).await?;
    let mut pty = unsafe { pty_process::Pty::from_fd(pty_fd.into()).unwrap() };
    let _ = tokio::io::copy(&mut pty, &mut tokio::io::stdout()).await;
    Ok(())
}
