-
Notifications
You must be signed in to change notification settings - Fork 25
fix(bwrap): default to deny-by-default filesystem (mirror seatbelt) #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,7 +54,12 @@ lxc-exec --experimental --config-base64 "$(base64 -w0 bubblewrap_hello.json)" | |
| Bubblewrap creates a namespace-isolated process by: | ||
|
|
||
| 1. Unsharing user, PID, IPC, and UTS namespaces (`--unshare-*`) | ||
| 2. Bind-mounting the host root filesystem read-only as a base | ||
| 2. Bind-mounting a **minimal deny-by-default baseline** read-only into the | ||
| sandbox (`/bin`, `/sbin`, `/lib*`, `/usr/bin`, `/usr/sbin`, `/usr/lib*`, | ||
| `/usr/libexec`, `/usr/share`, `/etc`, plus DNS stub-resolver dirs | ||
| under `/run`). Everything else on the host β including the caller's | ||
| `$HOME`, `/root`, `/opt`, `/var`, `/sys`, and `/run/user/<uid>` β is | ||
| invisible inside the sandbox. | ||
| 3. Layering filesystem policy overrides (read-write, read-only, denied paths) | ||
| 4. Setting up minimal `/dev`, `/proc`, and `/tmp` | ||
| 5. Clearing the environment and applying only requested variables | ||
|
|
@@ -63,6 +68,41 @@ Bubblewrap creates a namespace-isolated process by: | |
| The sandboxed process runs as a child of `bwrap` and dies automatically when | ||
| execution completes β no container lifecycle management required. | ||
|
|
||
| ### Deny-by-default filesystem | ||
|
|
||
| The baseline mirrors the macOS Seatbelt backend's `(deny default)` posture: | ||
| the sandbox can read the dynamic linker, libc, system tools, and system | ||
| configuration β and **nothing else** β until the caller opts in via | ||
| `readonlyPaths` / `readwritePaths`. To make a host directory visible inside | ||
| the sandbox, list it explicitly: | ||
|
|
||
| ```json | ||
| { | ||
| "filesystem": { | ||
| "readonlyPaths": ["/home/alice/project", "/usr/local"], | ||
| "readwritePaths": ["/tmp/workspace"] | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Common consequences of this default: | ||
|
|
||
| - `$HOME` (e.g. `~/.aws/credentials`, `~/.ssh/id_*`, browser cookies) is | ||
| not readable from the sandbox. | ||
| - `/opt` and `/usr/local` tooling is not on PATH; list either path under | ||
| `readonlyPaths` if the script depends on it. | ||
| - `working_directory` must live under the baseline or a policy path β a | ||
| `cwd` of `~/project` without a matching `readonlyPaths` entry will fail. | ||
| - DNS works on systemd-resolved, NetworkManager, and resolvconf hosts | ||
| because the corresponding `/run/...` directories are bound. Hosts where | ||
| `/etc/resolv.conf` symlinks somewhere else need that target listed in | ||
| `readonlyPaths`. | ||
|
|
||
| Files in `/etc` that contain secrets (`/etc/shadow`, `/etc/sudoers`, | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. note: same question as above |
||
| `/etc/ssh/ssh_host_*_key`) are mode `0400` / `0640` `root` and remain | ||
| unreadable to a non-root caller β user-namespace UID mapping does not | ||
| bypass kernel DAC. | ||
|
|
||
| ## Configuration | ||
|
|
||
| Bubblewrap uses the shared cross-backend configuration fields. No | ||
|
|
@@ -285,8 +325,12 @@ Test configs are in `tests/configs/bubblewrap_*.json`. | |
|
|
||
| - **Experimental** β requires `--experimental` flag | ||
| - **Linux only** β Bubblewrap requires Linux kernel namespaces | ||
| - **Host filesystem** β the sandbox sees the host's files (read-only by | ||
| default); there is no separate rootfs | ||
| - **Deny-by-default filesystem** β the sandbox sees a minimal allowlist | ||
| of host paths (system binaries, libs, `/etc`, DNS stub-resolver dirs) | ||
| and nothing else. `$HOME`, `/opt`, `/var`, `/sys`, `/run/user/<uid>`, | ||
| and `/usr/local` are invisible unless explicitly listed in | ||
| `readonlyPaths` / `readwritePaths`. There is no separate rootfs β the | ||
| visible paths are bind-mounted from the host. | ||
| - **Network filtering** β per-host `allowedHosts`/`blockedHosts` is best | ||
| done via the cooperative env-var **network proxy** (no privilege | ||
| required, see above). The legacy iptables path | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,69 @@ const PROXY_ENV_KEYS: &[&str] = &[ | |
| "no_proxy", | ||
| ]; | ||
|
|
||
| /// Read-only host paths bind-mounted into every Bubblewrap sandbox as the | ||
| /// deny-by-default baseline. Mirrors the seatbelt backend's | ||
| /// `SYSTEM_READ_ALLOW` (`src/backends/seatbelt/common/src/profile_builder.rs`): | ||
| /// just enough of the host for a shell, the dynamic linker, libc, and | ||
| /// system tools to work. Everything else β including the caller's `$HOME`, | ||
| /// `/root`, `/opt`, `/var`, `/sys`, `/mnt`, `/media`, and the rest of | ||
| /// `/run` β is invisible until the caller opts in via `readonlyPaths` / | ||
| /// `readwritePaths`. | ||
| /// | ||
| /// Notes: | ||
| /// - Missing paths are silently skipped because the runner emits these | ||
| /// via `--ro-bind-try` (e.g. `/lib32` does not exist on x86_64-only | ||
| /// systems; `/run/systemd/resolve` does not exist on hosts without | ||
| /// systemd-resolved). | ||
| /// - On merged-usr distros (modern Debian, Ubuntu, Fedora, Arch) the | ||
| /// top-level `/bin`, `/sbin`, `/lib*` entries are symlinks pointing | ||
| /// under `/usr`. `bwrap` follows the source-side symlink, so the | ||
| /// bind-mount still succeeds and the sandbox sees `/bin/sh` etc. | ||
| /// - We deliberately do NOT bind `/usr` wholesale: that would expose | ||
| /// `/usr/local`, which contains locally-installed (and sometimes | ||
| /// user-managed) software. Callers who need `/usr/local` must list it | ||
| /// explicitly in `readonlyPaths`. | ||
| /// - We deliberately do NOT bind `/run` wholesale: `/run/user/<uid>` | ||
| /// holds the caller's D-Bus session socket, keyring sockets, and | ||
| /// ssh-agent socket. We only bind the well-known DNS stub-resolver | ||
| /// directories so name resolution still works when `/etc/resolv.conf` | ||
| /// is a symlink (the default on systemd-resolved hosts). | ||
| /// - `/etc` is bound whole because cherry-picking files (`passwd`, | ||
| /// `nsswitch.conf`, `ssl/`, `ld.so.conf*`, β¦) is fragile and breaks | ||
| /// tools that read other config files. Files with sensitive contents | ||
| /// (`/etc/shadow`, `/etc/sudoers`, `/etc/ssh/ssh_host_*_key`) are mode | ||
| /// `0400` / `0640` root and remain unreadable to a non-root caller β | ||
| /// user-namespace UID mapping does not bypass kernel DAC. | ||
| const BASELINE_RO_BIND_PATHS: &[&str] = &[ | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. question: Is the addition of this a breaking change for what we have now? E.g These were suppose to be default deny but they were not prior to this change. We'd want to know so we can call this out via SDK versioning.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh wait, I may have read this wrong initially. It looks like you want to allow these read only paths and disallow everything else unless specified right? If so, then I think we're good here. |
||
| // Top-level executable / library dirs (symlinks under /usr on | ||
| // merged-usr distros, real directories on Alpine and older Debian). | ||
| "/bin", | ||
| "/sbin", | ||
| "/lib", | ||
| "/lib32", | ||
| "/lib64", | ||
| "/libx32", | ||
| // /usr subpaths β mirrors seatbelt's baseline exactly, intentionally | ||
| // excluding /usr/local. | ||
|
Comment on lines
+67
to
+68
|
||
| "/usr/bin", | ||
| "/usr/sbin", | ||
| "/usr/lib", | ||
| "/usr/lib32", | ||
| "/usr/lib64", | ||
| "/usr/libexec", | ||
| "/usr/share", | ||
| // System configuration (ld.so config, certs, resolv.conf, hosts, | ||
| // passwd, group, machine-id, β¦). See module-level note on DAC. | ||
| "/etc", | ||
| // DNS stub-resolver directories. /etc/resolv.conf is usually a | ||
| // symlink into one of these on modern Linux distros (systemd-resolved | ||
| // / NetworkManager / resolvconf). We bind the narrow subdirectories | ||
| // rather than all of /run to avoid exposing /run/user/<uid>. | ||
| "/run/systemd/resolve", | ||
| "/run/NetworkManager", | ||
| "/run/resolvconf", | ||
| ]; | ||
|
|
||
| /// Build the complete argument list for `bwrap` from the given request. | ||
| /// | ||
| /// The returned vector does **not** include the `bwrap` binary name itself β | ||
|
|
@@ -62,13 +125,15 @@ pub fn build_args(request: &ExecutionRequest, proxy_address: Option<&ProxyAddres | |
| args.push("--unshare-net".into()); | ||
| } | ||
|
|
||
| // -- Base filesystem --------------------------------------------------- | ||
| // -- Base filesystem (deny-by-default; see `BASELINE_RO_BIND_PATHS`) --- | ||
| // bwrap applies mounts in order; later mounts at the same path shadow | ||
| // earlier ones. We therefore lay down the base + standard virtual | ||
| // filesystems first, then apply user-supplied policy mounts last so they | ||
| // always win, including when policy paths overlap a standard mount such | ||
| // as `/tmp` (e.g. `readwritePaths: ["/tmp/workspace"]`). | ||
| args.extend(["--ro-bind".into(), "/".into(), "/".into()]); | ||
| // earlier ones. We therefore lay the baseline + standard virtual | ||
| // filesystems down first, then apply user-supplied policy mounts last | ||
| // so they always win when paths overlap (e.g. `readwritePaths: | ||
| // ["/tmp/workspace"]` must beat the standard `--tmpfs /tmp`). | ||
| for path in BASELINE_RO_BIND_PATHS { | ||
| args.extend(["--ro-bind-try".into(), (*path).into(), (*path).into()]); | ||
| } | ||
|
Comment on lines
+134
to
+136
|
||
|
|
||
| // Standard virtual filesystems (applied before policy mounts so policy | ||
| // paths under /dev, /proc, or /tmp survive). | ||
|
|
@@ -206,17 +271,13 @@ mod tests { | |
| assert_eq!(args[rw_pos + 1], "/workspace"); | ||
| assert_eq!(args[rw_pos + 2], "/workspace"); | ||
|
|
||
| // ro | ||
| let ro_positions: Vec<_> = args | ||
| .iter() | ||
| .enumerate() | ||
| .filter(|(_, a)| *a == "--ro-bind") | ||
| .map(|(i, _)| i) | ||
| .collect(); | ||
| // First --ro-bind is the base "/" mount, second is "/data" | ||
| assert!(ro_positions.len() >= 2); | ||
| let data_pos = *ro_positions.last().unwrap(); | ||
| assert_eq!(args[data_pos + 1], "/data"); | ||
| // ro β baseline paths are emitted via --ro-bind-try, so a bare | ||
| // --ro-bind must correspond to the user's readonlyPaths entry. | ||
| let ro_pos = args | ||
| .windows(3) | ||
| .position(|w| w[0] == "--ro-bind" && w[1] == "/data" && w[2] == "/data") | ||
| .expect("readonly policy path /data should produce a --ro-bind mount"); | ||
| assert!(ro_pos > 0); | ||
|
Comment on lines
+274
to
+280
|
||
|
|
||
| // denied | ||
| let tmpfs_positions: Vec<_> = args | ||
|
|
@@ -401,4 +462,156 @@ mod tests { | |
| let pos = args.iter().position(|a| a == "HTTP_PROXY").unwrap(); | ||
| assert_eq!(args[pos + 1], "http://caller.example:8080"); | ||
| } | ||
|
|
||
| // ------- Deny-by-default baseline filesystem tests ------------------ | ||
|
|
||
| /// Regression test for the original `--ro-bind / /` baseline. The | ||
| /// builder must NOT bind-mount the entire host root, because that | ||
| /// exposed `$HOME` and other confidential dirs by default. Mirrors | ||
| /// the seatbelt backend's `(deny default)` posture. | ||
| #[test] | ||
| fn baseline_does_not_bind_mount_host_root() { | ||
| let args = build_args(&base_request(), None); | ||
| let root_bind = args | ||
| .windows(3) | ||
| .any(|w| (w[0] == "--ro-bind" || w[0] == "--bind") && w[1] == "/" && w[2] == "/"); | ||
| assert!( | ||
| !root_bind, | ||
| "baseline must not bind-mount host / into the sandbox; got: {:?}", | ||
| args | ||
| ); | ||
| } | ||
|
|
||
| /// The minimum baseline allowlist required for a shell + dynamic | ||
| /// linker + libc to function inside the sandbox. Emitted via | ||
| /// `--ro-bind-try` so missing paths are silently skipped on distros | ||
| /// where they don't exist (e.g. `/lib32` on x86_64-only systems). | ||
| #[test] | ||
| fn baseline_emits_required_ro_bind_try_paths() { | ||
| let args = build_args(&base_request(), None); | ||
| let required = [ | ||
| "/bin", | ||
| "/sbin", | ||
| "/lib", | ||
| "/lib64", | ||
| "/usr/bin", | ||
| "/usr/lib", | ||
| "/usr/share", | ||
| "/etc", | ||
| ]; | ||
| for path in required { | ||
| let found = args | ||
| .windows(3) | ||
| .any(|w| w[0] == "--ro-bind-try" && w[1] == path && w[2] == path); | ||
| assert!( | ||
| found, | ||
| "baseline must emit `--ro-bind-try {} {}` so sandboxed processes \ | ||
| can find sh / libc / system config", | ||
| path, path | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| /// The baseline must NOT include `/usr` wholesale because that would | ||
| /// expose `/usr/local` (locally-installed software, sometimes | ||
| /// user-managed). Seatbelt's `SYSTEM_READ_ALLOW` does not include | ||
| /// `/usr/local` either β match that posture. | ||
| #[test] | ||
| fn baseline_does_not_expose_usr_local() { | ||
| let args = build_args(&base_request(), None); | ||
| // No `--ro-bind /usr /usr` and no `--ro-bind-try /usr /usr`. | ||
| let usr_whole = args | ||
| .windows(3) | ||
| .any(|w| matches!(w[0].as_str(), "--ro-bind" | "--ro-bind-try") && w[1] == "/usr"); | ||
| assert!( | ||
| !usr_whole, | ||
| "baseline must bind /usr subpaths individually so /usr/local is \ | ||
| not implicitly exposed; got: {:?}", | ||
| args | ||
| ); | ||
| // And no explicit /usr/local entry either. | ||
| let usr_local = args.iter().any(|a| a == "/usr/local"); | ||
| assert!(!usr_local, "baseline must not expose /usr/local by default"); | ||
| } | ||
|
Comment on lines
+533
to
+535
|
||
|
|
||
| /// The baseline must keep confidential host locations out of the | ||
| /// sandbox. Callers who legitimately need any of these can opt in | ||
| /// via `readonlyPaths`. | ||
| #[test] | ||
| fn baseline_excludes_confidential_paths() { | ||
| let args = build_args(&base_request(), None); | ||
| for forbidden in [ | ||
| "/home", | ||
| "/root", | ||
| "/opt", | ||
| "/srv", | ||
| "/var", | ||
| "/sys", | ||
| "/run/user", | ||
| "/run/dbus", | ||
| ] { | ||
| let exposed = args.windows(2).any(|w| { | ||
| matches!(w[0].as_str(), "--bind" | "--ro-bind" | "--ro-bind-try") | ||
| && w[1] == forbidden | ||
| }); | ||
| assert!( | ||
| !exposed, | ||
| "baseline must not bind-mount {} β that would re-expose \ | ||
| confidential host state", | ||
| forbidden | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| /// DNS stub-resolver dirs must be in the baseline so `/etc/resolv.conf` | ||
| /// symlinks resolve when the caller has network access. Emitted via | ||
| /// `--ro-bind-try` so hosts without systemd-resolved / NetworkManager / | ||
| /// resolvconf still build a valid argument vector. | ||
| #[test] | ||
| fn baseline_includes_dns_stub_resolver_dirs() { | ||
| let args = build_args(&base_request(), None); | ||
| for path in [ | ||
| "/run/systemd/resolve", | ||
| "/run/NetworkManager", | ||
| "/run/resolvconf", | ||
| ] { | ||
| let found = args | ||
| .windows(3) | ||
| .any(|w| w[0] == "--ro-bind-try" && w[1] == path && w[2] == path); | ||
| assert!( | ||
| found, | ||
| "baseline must emit `--ro-bind-try {} {}` so DNS works when \ | ||
| /etc/resolv.conf is a symlink", | ||
| path, path | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| /// Baseline mounts must come before policy mounts so the user's | ||
| /// `readwritePaths` / `readonlyPaths` / `deniedPaths` always win on | ||
| /// conflict (same shadowing rule as the existing `/tmp` regression | ||
| /// test, applied here to the baseline). | ||
| #[test] | ||
| fn baseline_mounts_precede_policy_mounts() { | ||
| let mut r = base_request(); | ||
| r.policy.readwrite_paths = vec!["/etc/policy-writable".into()]; | ||
| let args = build_args(&r, None); | ||
|
|
||
| let baseline_etc = args | ||
| .windows(3) | ||
| .position(|w| w[0] == "--ro-bind-try" && w[1] == "/etc" && w[2] == "/etc") | ||
| .expect("baseline /etc bind missing"); | ||
| let policy_bind = args | ||
| .windows(3) | ||
| .position(|w| w[0] == "--bind" && w[1] == "/etc/policy-writable") | ||
| .expect("policy bind missing"); | ||
|
|
||
| assert!( | ||
| policy_bind > baseline_etc, | ||
| "policy mount at /etc/policy-writable (pos {}) must come after \ | ||
| baseline /etc bind (pos {}) so the policy mount wins", | ||
| policy_bind, | ||
| baseline_etc | ||
| ); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Should we return readable errors in these cases if we know beforehand that these things may not work? Also it'd probably be worth it to create a GitHub issue if you think these can be resolved at some point in the future