fix(bwrap): default to deny-by-default filesystem (mirror seatbelt) #482

Open: caarlos0 wants to merge 3 commits into microsoft:main from caarlos0:bwrap-deny-default

Conversation


@caarlos0 caarlos0 commented Jun 3, 2026

📖 Description

Change the Bubblewrap backend's default filesystem posture from "host root mounted read-only" to deny-by-default, matching the macOS Seatbelt backend's (deny default) baseline.

Why

bwrap_command::build_args used to emit:

args.extend(["--ro-bind".into(), "/".into(), "/".into()]);

That bind-mounted the entire host root read-only into every sandbox, so the caller's $HOME/.aws/credentials, $HOME/.ssh/id_*, browser cookies, etc. were readable inside the sandbox by default. The Seatbelt backend on macOS starts from (deny default) and only allows narrow system paths (SYSTEM_READ_ALLOW in src/backends/seatbelt/common/src/profile_builder.rs), so the two backends had a meaningful asymmetry in the confidentiality guarantees they offered. This PR closes that gap.

What changes

New baseline (BASELINE_RO_BIND_PATHS) — mirrors seatbelt's SYSTEM_READ_ALLOW:

  • Top-level dirs: /bin, /sbin, /lib, /lib32, /lib64, /libx32 (symlinks under /usr on merged-usr distros; bwrap follows source-side symlinks so both real-dir and symlinked distros work).
  • /usr subpaths: /usr/bin, /usr/sbin, /usr/lib, /usr/lib32, /usr/lib64, /usr/libexec, /usr/share — deliberately not /usr wholesale, so /usr/local is not implicitly exposed.
  • /etc — whole, like seatbelt's /private/etc. Files with restrictive perms (/etc/shadow, /etc/sudoers, /etc/ssh/ssh_host_*_key) stay unreadable to a non-root caller because user-namespace UID mapping does not bypass kernel DAC.
  • DNS stub-resolver dirs: /run/systemd/resolve, /run/NetworkManager, /run/resolvconf — needed when /etc/resolv.conf is a symlink. Narrow subpaths so /run/user/<uid> (D-Bus session, keyring, ssh-agent sockets) stays hidden.

All of these are emitted via --ro-bind-try, so missing paths are silently skipped (e.g. /lib32 on x86_64-only systems, /run/systemd/resolve on hosts without systemd-resolved).
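For readers who want to see the shape of the baseline, here is a minimal sketch transcribed from the list above. The path order, the `baseline_args` helper name, and the use of `String` arguments are assumptions made for illustration; the real constant and `build_args` in bwrap_command.rs may differ.

```rust
/// Baseline read-only paths, transcribed from the PR description.
/// Assumption: the real constant may list these in a different order.
const BASELINE_RO_BIND_PATHS: &[&str] = &[
    // Top-level executable and library dirs (symlinks on merged-usr distros).
    "/bin", "/sbin", "/lib", "/lib32", "/lib64", "/libx32",
    // /usr subpaths, deliberately not /usr itself, so /usr/local stays hidden.
    "/usr/bin", "/usr/sbin", "/usr/lib", "/usr/lib32", "/usr/lib64",
    "/usr/libexec", "/usr/share",
    // System configuration, bound whole.
    "/etc",
    // DNS stub-resolver dirs, needed when /etc/resolv.conf is a symlink.
    "/run/systemd/resolve", "/run/NetworkManager", "/run/resolvconf",
];

/// Hypothetical helper: builds only the baseline portion of the bwrap argv.
fn baseline_args() -> Vec<String> {
    let mut args = Vec::new();
    for path in BASELINE_RO_BIND_PATHS {
        // --ro-bind-try skips sources that do not exist on the host.
        args.extend([
            "--ro-bind-try".to_string(),
            path.to_string(),
            path.to_string(),
        ]);
    }
    args
}

fn main() {
    println!("{}", baseline_args().join(" "));
}
```

Each baseline entry becomes one `--ro-bind-try SRC DEST` triple with identical source and destination, and neither `/` nor `/usr` appears as a mount source.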

What disappears from the sandbox by default

$HOME, /root, /home/*, /opt, /srv, /mnt, /media, /var, /sys, /usr/local, /run/user/<uid>, /run/dbus. Callers who legitimately need any of these must list them under readonlyPaths or readwritePaths.

What's preserved

  • readwritePaths / readonlyPaths / deniedPaths semantics — unchanged.
  • --unshare-* flags, network policy handling, proxy env-var injection, working-dir, env clearing — unchanged.
  • Standard --dev /dev / --proc /proc / --tmpfs /tmp overlay — unchanged.

Drive-by build fix

The second commit (fix(nanvix): compile as build-dep from non-Linux/Windows hosts) adds empty/zero fallbacks for REQUIRED_BINARIES and NANVIXD_BINARY so nanvix_common compiles on macOS hosts when pulled in as a [build-dependency] of lxc / wxc during cross-compile. Zero runtime impact on supported platforms — the consuming build scripts already gate the surrounding logic behind cfg(target_os = "linux"/"windows") and feature = "microvm". Separated out so it can be reviewed (or split into its own PR) independently.
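A rough sketch of the fallback pattern, for reviewers who have not looked at nanvix_common. The Linux and Windows values below are placeholders invented for illustration (the real binary names are not shown in this PR text); only the empty fallbacks reflect what the commit describes.

```rust
// Placeholder values: the real Linux/Windows entries live in nanvix_common
// and are NOT reproduced here.
#[cfg(target_os = "linux")]
pub const REQUIRED_BINARIES: &[&str] = &["nanvixd"];
#[cfg(target_os = "windows")]
pub const REQUIRED_BINARIES: &[&str] = &["nanvixd.exe"];
/// Fallback for non-Windows/Linux hosts: iterating it is a no-op.
#[cfg(not(any(target_os = "windows", target_os = "linux")))]
pub const REQUIRED_BINARIES: &[&str] = &[];

#[cfg(target_os = "linux")]
pub const NANVIXD_BINARY: &str = "nanvixd";
#[cfg(target_os = "windows")]
pub const NANVIXD_BINARY: &str = "nanvixd.exe";
/// Fallback for non-Windows/Linux hosts.
#[cfg(not(any(target_os = "windows", target_os = "linux")))]
pub const NANVIXD_BINARY: &str = "";

fn main() {
    // On macOS/BSD hosts this loop body never runs.
    for bin in REQUIRED_BINARIES {
        println!("would check for {bin}");
    }
    println!("NANVIXD_BINARY = {NANVIXD_BINARY:?}");
}
```

Because exactly one `cfg` branch is active per host, the constants always have a definition, which is what lets the crate compile as a build dependency on macOS.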

Breaking change for users

This is a behavior change. Configs that implicitly relied on $HOME (or /opt, /var, /usr/local, …) being readable will start failing. The migration is to list the directory in readonlyPaths:

{
  "filesystem": {
    "readonlyPaths": ["/home/alice/project", "/usr/local"]
  }
}

Documented in the updated "How It Works → Deny-by-default filesystem" and "Limitations" sections of docs/bwrap-support/bubblewrap-backend.md.
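To illustrate how an opt-in like the config above would translate into bwrap arguments, here is a hedged sketch. `FilesystemPolicy` and `build_fs_args` are hypothetical names, and emitting `--bind` for read-write paths is an assumption about the backend (it is the standard bwrap flag for a writable bind mount). The ordering, baseline first and policy mounts after, follows the `baseline_mounts_precede_policy_mounts` contract.

```rust
/// Hypothetical stand-in for the parsed "filesystem" config section.
struct FilesystemPolicy {
    readonly_paths: Vec<String>,
    readwrite_paths: Vec<String>,
}

/// Hypothetical helper: baseline mounts first, then policy mounts.
/// bwrap applies mounts in argument order, so later entries can shadow
/// earlier ones.
fn build_fs_args(baseline: &[&str], policy: &FilesystemPolicy) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    for p in baseline {
        args.extend(["--ro-bind-try".to_string(), p.to_string(), p.to_string()]);
    }
    for p in &policy.readonly_paths {
        // A bare --ro-bind (no -try) marks a caller-requested mount.
        args.extend(["--ro-bind".to_string(), p.clone(), p.clone()]);
    }
    for p in &policy.readwrite_paths {
        // Assumption: read-write paths map to bwrap's --bind.
        args.extend(["--bind".to_string(), p.clone(), p.clone()]);
    }
    args
}

fn main() {
    let policy = FilesystemPolicy {
        readonly_paths: vec!["/home/alice/project".into(), "/usr/local".into()],
        readwrite_paths: vec![],
    };
    // A shortened baseline keeps the output readable.
    let args = build_fs_args(&["/usr/bin", "/etc"], &policy);
    println!("{}", args.join(" "));
}
```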

🔗 References

No tracking issue — this came out of a direct comparison between the seatbelt and bwrap baselines while reviewing the two unprivileged backends.

🔍 Validation

Unit tests (cargo test -p bwrap_common from src/): 21/21 pass. Six of these are new and cover the new contract:

  • baseline_does_not_bind_mount_host_root — regression test for the old --ro-bind / / default.
  • baseline_emits_required_ro_bind_try_paths: /bin, /sbin, /lib, /lib64, /usr/bin, /usr/lib, /usr/share, /etc all emitted.
  • baseline_does_not_expose_usr_local — no --ro-bind /usr /usr and no explicit /usr/local entry.
  • baseline_excludes_confidential_paths — no /home, /root, /opt, /srv, /var, /sys, /run/user, /run/dbus bind-mounts.
  • baseline_includes_dns_stub_resolver_dirs — all three DNS dirs emitted via --ro-bind-try.
  • baseline_mounts_precede_policy_mounts — policy mounts can still shadow baseline.

Plus updated filesystem_policy_produces_correct_mounts to match the new contract (a bare --ro-bind /data /data is now unambiguously the policy mount).
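The assertions in these tests boil down to scanning the argv for `FLAG SRC DEST` triples. A minimal sketch of that check follows; `has_mount` is a hypothetical helper (the real tests may inline the `windows(3)` scan), and the argv fragments are hand-written stand-ins rather than real `build_args` output.

```rust
/// Hypothetical helper: true if `args` contains `flag path path` as three
/// consecutive elements.
fn has_mount(args: &[&str], flag: &str, path: &str) -> bool {
    args.windows(3)
        .any(|w| w[0] == flag && w[1] == path && w[2] == path)
}

fn main() {
    // Stand-in for the old default: whole host root bound read-only.
    let old_default = ["--ro-bind", "/", "/", "--dev", "/dev"];
    // Stand-in for the new default plus one readonlyPaths entry (/data).
    let new_default = [
        "--ro-bind-try", "/usr/bin", "/usr/bin",
        "--ro-bind-try", "/etc", "/etc",
        "--ro-bind", "/data", "/data",
    ];

    // Regression check: the host root must no longer be bound.
    assert!(has_mount(&old_default, "--ro-bind", "/"));
    assert!(!has_mount(&new_default, "--ro-bind", "/"));

    // Baseline entries use --ro-bind-try, so a bare --ro-bind identifies
    // the policy mount.
    assert!(has_mount(&new_default, "--ro-bind-try", "/etc"));
    assert!(has_mount(&new_default, "--ro-bind", "/data"));
    assert!(!has_mount(&new_default, "--ro-bind-try", "/usr/local"));
}
```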

Lint / format: cargo clippy -p bwrap_common --all-targets -- -D warnings clean, cargo fmt --all -- --check clean.

Linux VM verification: cross-compiled lxc-exec for aarch64-unknown-linux-gnu and ran a 6-config smoke suite on a Linux VM (see src/target/vm-test-bundle/ locally; it is gitignored). The suite plants TOP_SECRET=hunter2 in /home/SENTINEL_DO_NOT_LEAK.txt on the host and verifies the secret does not appear in sandbox output without an explicit readonlyPaths: ["/home"], then verifies the opt-in does expose it. It also covers /opt, /var, /sys, /root, and /usr/local being hidden, DNS resolution working with network allowed, and /etc/shadow staying unreadable via DAC. (Will paste the run output as a PR comment once the VM run is complete.)

✅ Checklist

📋 Issue Type

  • Bug fix
  • Feature
  • Task

caarlos0 and others added 2 commits June 3, 2026 09:39
The Bubblewrap backend used to bind-mount the entire host root read-only
into every sandbox (`--ro-bind / /`), so the caller's $HOME, /root,
/opt, /var, /sys, /run/user/<uid>, and everything else readable by the
calling uid was visible inside the sandbox by default. The macOS Seatbelt
backend, by contrast, starts from `(deny default)` and only allows a
narrow system baseline -- bwrap now matches that posture.

The new baseline (`BASELINE_RO_BIND_PATHS`) mirrors seatbelt's
`SYSTEM_READ_ALLOW` allowlist: top-level executable/library dirs
(/bin, /sbin, /lib*), the /usr subpaths that seatbelt allows (without
/usr/local), /etc, and the DNS stub-resolver directories under /run
(/run/systemd/resolve, /run/NetworkManager, /run/resolvconf) so
/etc/resolv.conf symlinks still resolve when network is allowed.
$HOME, /opt, /usr/local, /var, /sys, and /run/user/<uid> are no
longer visible until the caller opts in via `readonlyPaths` /
`readwritePaths`.

Paths are emitted via `--ro-bind-try` so missing entries are silently
skipped (e.g. /lib32 on x86_64-only systems, /run/systemd/resolve on
hosts without systemd-resolved).

Files in /etc with restrictive perms (/etc/shadow, /etc/sudoers,
/etc/ssh/ssh_host_*_key) remain unreadable to a non-root caller even
though /etc is bound whole -- user-namespace UID mapping does not
bypass kernel DAC.

Updated the existing `filesystem_policy_produces_correct_mounts` test
and added 6 new tests covering the new contract (no host-root bind,
required baseline paths emitted, /usr/local not exposed, confidential
paths excluded, DNS dirs included, baseline precedes policy mounts).

Docs in docs/bwrap-support/bubblewrap-backend.md updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>

fix(nanvix): compile as build-dep from non-Linux/Windows hosts

`nanvix_common` is a `[build-dependency]` of `lxc` and `wxc`. Build
deps are compiled for the host, so cross-compiling lxc-exec from macOS
to aarch64-unknown-linux-gnu pulled nanvix_common into a host build
where `target_os` was neither "windows" nor "linux" -- the
`REQUIRED_BINARIES` and `NANVIXD_BINARY` constants then had no
definition and the crate failed to compile.

Add empty/zero fallbacks for non-Windows/Linux hosts. The empty slice
is correct because:
- NanVix only runs on Windows and Linux, so iterating `REQUIRED_BINARIES`
  on other hosts must be a no-op.
- The consuming build scripts (e.g. `src/core/lxc/build.rs`) already
  gate the surrounding logic behind `cfg(target_os = "linux")` and
  `feature = "microvm"`, so the fallback values are never reached
  in practice.

Zero runtime impact on supported platforms; pure build-time
portability fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 3, 2026 12:40
@caarlos0 caarlos0 requested a review from a team as a code owner June 3, 2026 12:40

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR tightens the Bubblewrap backend’s default filesystem exposure by switching from a full host-root bind mount to a minimal allowlist baseline, adds regression tests for the new deny-by-default posture, and updates docs accordingly. It also adds NanVix constant fallbacks so the NanVix common crate can compile on non-Windows/Linux hosts when used as a build dependency.

Changes:

  • Bubblewrap: replace --ro-bind / / with a minimal baseline set of --ro-bind-try mounts and add targeted regression tests.
  • NanVix: add non-Windows/Linux fallbacks for REQUIRED_BINARIES and NANVIXD_BINARY to support host builds on macOS/BSD.
  • Docs: document the Bubblewrap deny-by-default filesystem model and its consequences.
| File | Description |
| --- | --- |
| src/backends/nanvix/common/src/lib.rs | Adds non-Windows/Linux fallbacks for NanVix host-compiled constants to keep builds working when cross-compiling. |
| src/backends/bubblewrap/common/src/bwrap_command.rs | Introduces a minimal baseline allowlist (deny-by-default) via --ro-bind-try and expands/updates tests. |
| docs/bwrap-support/bubblewrap-backend.md | Documents the new baseline filesystem behavior and user-facing implications. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 5

Comment on lines +41 to +43
/// Fallback for non-Windows/Linux hosts. See `REQUIRED_BINARIES` above.
#[cfg(not(any(target_os = "windows", target_os = "linux")))]
pub const NANVIXD_BINARY: &str = "";
Comment on lines +134 to +136
for path in BASELINE_RO_BIND_PATHS {
    args.extend(["--ro-bind-try".into(), (*path).into(), (*path).into()]);
}
Comment on lines +274 to +280
// ro: baseline paths are emitted via --ro-bind-try, so a bare
// --ro-bind must correspond to the user's readonlyPaths entry.
let ro_pos = args
    .windows(3)
    .position(|w| w[0] == "--ro-bind" && w[1] == "/data" && w[2] == "/data")
    .expect("readonly policy path /data should produce a --ro-bind mount");
assert!(ro_pos > 0);
Comment on lines +533 to +535
let usr_local = args.iter().any(|a| a == "/usr/local");
assert!(!usr_local, "baseline must not expose /usr/local by default");
}
Comment on lines +67 to +68
// /usr subpaths — mirrors seatbelt's baseline exactly, intentionally
// excluding /usr/local.
Collaborator

bbonaby commented Jun 3, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +88 to +100
Common consequences of this default:

- `$HOME` (e.g. `~/.aws/credentials`, `~/.ssh/id_*`, browser cookies) is
not readable from the sandbox.
- `/opt` and `/usr/local` tooling is not on PATH; list either path under
`readonlyPaths` if the script depends on it.
- `working_directory` must live under the baseline or a policy path — a
`cwd` of `~/project` without a matching `readonlyPaths` entry will fail.
- DNS works on systemd-resolved, NetworkManager, and resolvconf hosts
because the corresponding `/run/...` directories are bound. Hosts where
`/etc/resolv.conf` symlinks somewhere else need that target listed in
`readonlyPaths`.

Collaborator

question: Should we return readable errors in these cases if we know beforehand that these things may not work? Also it'd probably be worth it to create a GitHub issue if you think these can be resolved at some point in the future
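
As one possible shape for such a readable error, here is a sketch of a pre-flight check for the `working_directory` case. Everything in it is an assumption for discussion: `check_working_dir` is a hypothetical function, the comparison is purely lexical (it does not resolve symlinks), and a real check would also need to treat the `/tmp`, `/dev`, and `/proc` overlays as visible.

```rust
use std::path::Path;

/// Hypothetical pre-flight check: returns a readable error when `cwd` is not
/// under any path that will be visible inside the sandbox.
/// `visible_roots` would be the baseline plus readonlyPaths/readwritePaths.
fn check_working_dir(cwd: &Path, visible_roots: &[&Path]) -> Result<(), String> {
    // Path::starts_with compares whole components, so /home/alice/projects
    // does not count as being under /home/alice/project.
    if visible_roots.iter().any(|root| cwd.starts_with(root)) {
        Ok(())
    } else {
        Err(format!(
            "working directory {} is not visible inside the sandbox; \
             add it (or a parent) to readonlyPaths or readwritePaths",
            cwd.display()
        ))
    }
}

fn main() {
    let roots = [Path::new("/usr/bin"), Path::new("/home/alice/project")];
    match check_working_dir(Path::new("/home/alice/other"), &roots) {
        Ok(()) => println!("ok"),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```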

`/etc/resolv.conf` symlinks somewhere else need that target listed in
`readonlyPaths`.

Files in `/etc` that contain secrets (`/etc/shadow`, `/etc/sudoers`,
Collaborator

note: same question as above

/// (`/etc/shadow`, `/etc/sudoers`, `/etc/ssh/ssh_host_*_key`) are mode
/// `0400` / `0640` root and remain unreadable to a non-root caller —
/// user-namespace UID mapping does not bypass kernel DAC.
const BASELINE_RO_BIND_PATHS: &[&str] = &[
Collaborator

question: Is the addition of this a breaking change for what we have now? E.g., these were supposed to be default-deny but were not prior to this change. We'd want to know so we can call this out via SDK versioning.

Collaborator

Oh wait, I may have read this wrong initially. It looks like you want to allow these read only paths and disallow everything else unless specified right? If so, then I think we're good here.
