Commit 0f73d11

fix(podman): avoid host-gateway on macOS machines (#1637)
Closes #1307

Default the Podman host gateway alias override to gvproxy's host-loopback IP on macOS while preserving host-gateway resolution on Linux. Wire the setting through Podman config, gateway TOML inheritance, and the standalone driver, and document the platform behavior.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
1 parent f6d0fd1 commit 0f73d11

11 files changed

Lines changed: 175 additions & 17 deletions

architecture/gateway.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -367,6 +367,10 @@ table.
 - Docker-backed local gateways use Docker's `host-gateway` callback alias on
   macOS and Docker Desktop-style runtimes. Native Linux Docker may expose an
   additional bridge-gateway listener because the host can bind that bridge IP.
+- Podman-backed macOS gateways use gvproxy's host-loopback IP for sandbox host
+  aliases by default so stale Podman machine images do not need Podman's
+  `host-gateway` resolver. Linux Podman keeps the resolver unless
+  `host_gateway_ip` is configured.
 - Gateway restarts recover persisted objects from storage, but live relay
   streams must be re-established by supervisors.
 - User-facing behavior changes must update published docs in `docs/`; this file
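
The platform rule described in the added bullet can be sketched as a small pure function. This is an illustration only, not code from the commit: `AliasTarget`, `alias_target`, and the `is_macos` parameter are hypothetical names, and the platform is passed in explicitly (rather than read through `cfg`) so that both branches can be exercised on one host.

```rust
/// gvproxy's host-loopback IP, as named in this commit.
const MACOS_PODMAN_MACHINE_HOST_GATEWAY_IP: &str = "192.168.127.254";

/// Hypothetical type: what a sandbox host alias ends up pointing at.
#[derive(Debug, PartialEq, Eq)]
enum AliasTarget {
    /// Let Podman resolve the `host-gateway` keyword.
    Resolver,
    /// Use a literal IP address.
    Ip(String),
}

/// `configured` is `None` when the setting is omitted, `Some("")` when it is
/// explicitly emptied, and `Some(ip)` when an override is supplied.
fn alias_target(configured: Option<&str>, is_macos: bool) -> AliasTarget {
    // Platform default: gvproxy IP on macOS, empty (resolver) elsewhere.
    let platform_default = if is_macos {
        MACOS_PODMAN_MACHINE_HOST_GATEWAY_IP
    } else {
        ""
    };
    let value = configured.unwrap_or(platform_default).trim();
    if value.is_empty() {
        AliasTarget::Resolver
    } else {
        AliasTarget::Ip(value.to_string())
    }
}
```

Under these assumptions, an omitted setting yields the gvproxy IP on macOS and the resolver on Linux, while an explicit empty string selects the resolver on either platform.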

crates/openshell-driver-podman/NETWORKING.md

Lines changed: 11 additions & 4 deletions
@@ -223,15 +223,22 @@ The container spec configures:
 - `networks` to attach to the configured bridge, `openshell` by default.
 - `portmappings` with `host_port: 0`, `container_port: 2222`, and `protocol:
   "tcp"` to publish the SSH compatibility port on an ephemeral host port.
-- `hostadd` entries for `host.containers.internal:host-gateway` and
-  `host.openshell.internal:host-gateway`.
+- `hostadd` entries for `host.containers.internal` and
+  `host.openshell.internal`, using Podman's `host-gateway` resolver or the
+  configured `host_gateway_ip`.

 Pasta is not explicitly configured by the driver. The driver requests bridge
 mode and logs the network backend that Podman reports at startup.

 The `host.containers.internal` hostname is injected into `/etc/hosts` so the
-supervisor can reach the gateway on the host. If `OPENSHELL_GRPC_ENDPOINT` is
-empty, the driver auto-detects:
+supervisor can reach the gateway on the host. Linux defaults to
+`host-gateway`; macOS Podman machine defaults to `192.168.127.254`, gvproxy's
+host-loopback IP, because older Podman machine images can fail to resolve
+`host-gateway`. Override this with `host_gateway_ip` or
+`OPENSHELL_PODMAN_HOST_GATEWAY_IP` when a Podman machine uses a non-standard
+host-loopback address.
+
+If `OPENSHELL_GRPC_ENDPOINT` is empty, the driver auto-detects:

 ```rust
 if config.grpc_endpoint.is_empty() {

crates/openshell-driver-podman/README.md

Lines changed: 11 additions & 7 deletions
@@ -167,9 +167,11 @@ Key points:
 - Port publishing: the container spec still requests `host_port: 0` for the
   configured SSH port. The gateway SSH tunnel uses the supervisor relay rather
   than connecting directly to the published port.
-- Host gateway: `host.containers.internal:host-gateway` and
-  `host.openshell.internal:host-gateway` in `/etc/hosts` allow containers to
-  reach services on the gateway host.
+- Host gateway: `host.containers.internal` and `host.openshell.internal` are
+  injected into `/etc/hosts` so containers can reach services on the gateway
+  host. Linux defaults to Podman's `host-gateway` resolver. macOS Podman
+  machine defaults to gvproxy's host-loopback IP, `192.168.127.254`, because
+  stale Podman machines may fail to resolve `host-gateway`.
 - nsenter: the supervisor uses `nsenter --net=` instead of `ip netns exec` for
   namespace operations, avoiding the sysfs remount path that fails in rootless
   containers.
@@ -291,6 +293,7 @@ Podman resources after out-of-band container removal or label drift.
 | `OPENSHELL_GRPC_ENDPOINT` | `--grpc-endpoint` | Auto-detected via `host.containers.internal` | Gateway gRPC endpoint for sandbox callbacks. |
 | `OPENSHELL_GATEWAY_PORT` | `--gateway-port` | `17670` | Gateway port used for endpoint auto-detection by the standalone binary. |
 | `OPENSHELL_NETWORK_NAME` | `--network-name` | `openshell` | Podman bridge network name. |
+| `OPENSHELL_PODMAN_HOST_GATEWAY_IP` | `--host-gateway-ip` | empty on Linux, `192.168.127.254` on macOS | Host gateway IP used for sandbox host aliases. Empty uses Podman's `host-gateway` resolver. |
 | `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` | `--sandbox-ssh-socket-path` | `/run/openshell/ssh.sock` | Supervisor Unix socket path in `PodmanComputeConfig`. |
 | `OPENSHELL_STOP_TIMEOUT` | `--stop-timeout` | `10` | Container stop timeout in seconds. |
 | `OPENSHELL_SANDBOX_PIDS_LIMIT` | `--sandbox-pids-limit` | `2048` | Podman cgroup PID limit for sandbox containers. Set `0` to inherit Podman's runtime/default PID limit. |
@@ -304,10 +307,11 @@ Podman resources after out-of-band container removal or label drift.
 The Podman driver is designed for rootless operation. The following adaptations
 matter compared to cluster or rootful runtimes:

-1. subuid/subgid preflight check: `check_subuid_range()` in `driver.rs` warns
-   operators if `/etc/subuid` or `/etc/subgid` entries are missing for the
-   current user. This is not a hard error because some systems use LDAP or
-   other mechanisms.
+1. subuid/subgid preflight check: on non-macOS hosts, `check_subuid_range()` in
+   `driver.rs` warns operators if `/etc/subuid` or `/etc/subgid` entries are
+   missing for the current user. This is not a hard error because some systems
+   use LDAP or other mechanisms. macOS skips the check because `podman machine`
+   runs the Podman service inside a Linux VM.
 2. cgroups v2 requirement: the driver refuses to start if cgroups v1 is
    detected. Rootless Podman requires the unified cgroup hierarchy.
 3. `nsenter` for namespace operations: `openshell-sandbox` uses

crates/openshell-driver-podman/src/config.rs

Lines changed: 64 additions & 0 deletions
@@ -2,12 +2,14 @@
 // SPDX-License-Identifier: Apache-2.0

 use openshell_core::config::{DEFAULT_STOP_TIMEOUT_SECS, DEFAULT_SUPERVISOR_IMAGE};
+use std::net::IpAddr;
 use std::path::PathBuf;
 use std::str::FromStr;

 /// Default Podman bridge network name.
 pub const DEFAULT_NETWORK_NAME: &str = "openshell";
 pub const DEFAULT_SANDBOX_PIDS_LIMIT: i64 = 2048;
+pub const MACOS_PODMAN_MACHINE_HOST_GATEWAY_IP: &str = "192.168.127.254";

 /// Image pull policy for sandbox and supervisor images.
 ///
@@ -90,6 +92,13 @@ pub struct PodmanComputeConfig {
     /// Name of the Podman bridge network.
     /// Created automatically if it does not exist.
     pub network_name: String,
+    /// Host gateway IP used for sandbox host aliases.
+    ///
+    /// Empty uses Podman's `host-gateway` resolver. macOS defaults to
+    /// gvproxy's host-loopback IP because stale Podman machines may fail to
+    /// resolve `host-gateway` while still serving `host.containers.internal`
+    /// through gvproxy.
+    pub host_gateway_ip: String,
     /// Container stop timeout in seconds (SIGTERM → SIGKILL).
     pub stop_timeout_secs: u32,
     /// OCI image containing the openshell-sandbox supervisor binary.
@@ -164,6 +173,33 @@ impl PodmanComputeConfig {
         Ok(())
     }

+    /// Validate optional host gateway override.
+    pub fn validate_host_gateway_ip(&self) -> Result<(), crate::client::PodmanApiError> {
+        let trimmed = self.host_gateway_ip.trim();
+        if trimmed.is_empty() {
+            return Ok(());
+        }
+
+        trimmed.parse::<IpAddr>().map(|_| ()).map_err(|err| {
+            crate::client::PodmanApiError::InvalidInput(format!(
+                "invalid host_gateway_ip value '{trimmed}': {err}"
+            ))
+        })
+    }
+
+    /// Resolve the default host gateway override for the current platform.
+    #[must_use]
+    pub fn default_host_gateway_ip() -> String {
+        #[cfg(target_os = "macos")]
+        {
+            MACOS_PODMAN_MACHINE_HOST_GATEWAY_IP.to_string()
+        }
+        #[cfg(not(target_os = "macos"))]
+        {
+            String::new()
+        }
+    }
+
     /// Resolve the default socket path from the environment.
     ///
     /// - **macOS**: `$HOME/.local/share/containers/podman/machine/podman.sock`
@@ -201,6 +237,7 @@ impl Default for PodmanComputeConfig {
             gateway_port: openshell_core::config::DEFAULT_SERVER_PORT,
             sandbox_ssh_socket_path: "/run/openshell/ssh.sock".to_string(),
             network_name: DEFAULT_NETWORK_NAME.to_string(),
+            host_gateway_ip: Self::default_host_gateway_ip(),
             stop_timeout_secs: DEFAULT_STOP_TIMEOUT_SECS,
             supervisor_image: DEFAULT_SUPERVISOR_IMAGE.to_string(),
             guest_tls_ca: None,
@@ -221,6 +258,7 @@ impl std::fmt::Debug for PodmanComputeConfig {
             .field("gateway_port", &self.gateway_port)
             .field("sandbox_ssh_socket_path", &self.sandbox_ssh_socket_path)
             .field("network_name", &self.network_name)
+            .field("host_gateway_ip", &self.host_gateway_ip)
             .field("stop_timeout_secs", &self.stop_timeout_secs)
             .field("supervisor_image", &self.supervisor_image)
             .field("guest_tls_ca", &self.guest_tls_ca)
@@ -275,6 +313,32 @@ mod tests {
         assert!(cfg.validate_runtime_limits().is_ok());
     }

+    #[test]
+    #[cfg(target_os = "macos")]
+    fn default_config_uses_gvproxy_host_gateway_ip_on_macos() {
+        let cfg = PodmanComputeConfig::default();
+        assert_eq!(cfg.host_gateway_ip, MACOS_PODMAN_MACHINE_HOST_GATEWAY_IP);
+        assert!(cfg.validate_host_gateway_ip().is_ok());
+    }
+
+    #[test]
+    #[cfg(not(target_os = "macos"))]
+    fn default_config_leaves_host_gateway_ip_empty_off_macos() {
+        let cfg = PodmanComputeConfig::default();
+        assert!(cfg.host_gateway_ip.is_empty());
+        assert!(cfg.validate_host_gateway_ip().is_ok());
+    }
+
+    #[test]
+    fn host_gateway_ip_validation_rejects_invalid_values() {
+        let cfg = PodmanComputeConfig {
+            host_gateway_ip: "not-an-ip".to_string(),
+            ..PodmanComputeConfig::default()
+        };
+        let err = cfg.validate_host_gateway_ip().unwrap_err();
+        assert!(err.to_string().contains("host_gateway_ip"));
+    }
+
     #[test]
     fn runtime_limit_validation_rejects_negative_pids_limit() {
         let cfg = PodmanComputeConfig {
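
The validation added in this file can be tried outside the crate with a standalone sketch. It follows the same trim, empty-check, and `IpAddr` parse steps as `validate_host_gateway_ip`, but `parse_host_gateway_ip` and the `InvalidInput` struct are hypothetical stand-ins (the commit returns `PodmanApiError::InvalidInput` and discards the parsed address).

```rust
use std::net::IpAddr;

/// Stand-in for the driver's error type; an assumption for this sketch.
#[derive(Debug, PartialEq, Eq)]
struct InvalidInput(String);

/// Returns `Ok(None)` for an empty value (use the resolver), `Ok(Some(ip))`
/// for a parseable IPv4 or IPv6 address, and an error otherwise.
fn parse_host_gateway_ip(raw: &str) -> Result<Option<IpAddr>, InvalidInput> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Ok(None);
    }
    trimmed
        .parse::<IpAddr>()
        .map(Some)
        .map_err(|err| InvalidInput(format!("invalid host_gateway_ip value '{trimmed}': {err}")))
}
```

One consequence worth noting: because only IP literals parse, the keyword `host-gateway` itself is rejected as a value, so the resolver is selected by leaving the setting empty rather than by naming it.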

crates/openshell-driver-podman/src/container.rs

Lines changed: 45 additions & 4 deletions
@@ -537,10 +537,7 @@ pub fn build_container_spec_with_token(
         // Inject stable host aliases into /etc/hosts so sandbox containers can
         // reach services on the host. `host.openshell.internal` is the driver-
         // neutral alias used by policies and e2e tests.
-        hostadd: vec![
-            "host.containers.internal:host-gateway".into(),
-            "host.openshell.internal:host-gateway".into(),
-        ],
+        hostadd: hostadd_entries(config),
         netns: NetNS {
             nsmode: "bridge".to_string(),
         },
@@ -622,6 +619,21 @@ pub fn build_container_spec_with_token(
     serde_json::to_value(container_spec).expect("ContainerSpec serialization cannot fail")
 }

+fn hostadd_entries(config: &PodmanComputeConfig) -> Vec<String> {
+    let host_gateway_ip = config.host_gateway_ip.trim();
+    if host_gateway_ip.is_empty() {
+        return vec![
+            "host.containers.internal:host-gateway".into(),
+            "host.openshell.internal:host-gateway".into(),
+        ];
+    }
+
+    vec![
+        format!("host.containers.internal:{host_gateway_ip}"),
+        format!("host.openshell.internal:{host_gateway_ip}"),
+    ]
+}
+
 /// Parse a Kubernetes-style CPU quantity to cgroup quota microseconds
 /// (for a 100ms period).
 ///
@@ -1071,6 +1083,7 @@ mod tests {
             socket_path: std::path::PathBuf::from("/tmp/test.sock"),
             default_image: "test-image:latest".to_string(),
             grpc_endpoint: "http://localhost:50051".to_string(),
+            host_gateway_ip: String::new(),
             sandbox_ssh_socket_path: "/run/openshell/test-ssh.sock".to_string(),
             ..PodmanComputeConfig::default()
         }
@@ -1109,6 +1122,34 @@ mod tests {
         );
     }

+    #[test]
+    fn container_spec_uses_configured_host_gateway_ip() {
+        let sandbox = test_sandbox("test-id", "test-name");
+        let mut config = test_config();
+        config.host_gateway_ip = "192.168.127.254".to_string();
+        let spec = build_container_spec(&sandbox, &config);
+
+        let hostadd: Vec<&str> = spec["hostadd"]
+            .as_array()
+            .expect("hostadd should be an array")
+            .iter()
+            .filter_map(|v| v.as_str())
+            .collect();
+
+        assert!(
+            hostadd.contains(&"host.containers.internal:192.168.127.254"),
+            "missing Podman host alias with configured host gateway IP"
+        );
+        assert!(
+            hostadd.contains(&"host.openshell.internal:192.168.127.254"),
+            "missing OpenShell host alias with configured host gateway IP"
+        );
+        assert!(
+            !hostadd.contains(&"host.containers.internal:host-gateway"),
+            "configured host gateway IP should avoid Podman's host-gateway resolver"
+        );
+    }
+
     #[test]
     fn container_spec_includes_tls_mounts_when_configured() {
         let sandbox = test_sandbox("tls-id", "tls-name");

crates/openshell-driver-podman/src/driver.rs

Lines changed: 2 additions & 1 deletion
@@ -142,6 +142,7 @@ impl PodmanComputeDriver {
         // get a clear error instead of a silent fallback to plaintext HTTP.
         config.validate_tls_config()?;
         config.validate_runtime_limits()?;
+        config.validate_host_gateway_ip()?;

         let client = PodmanClient::new(config.socket_path.clone());

@@ -195,7 +196,7 @@ impl PodmanComputeDriver {
         // Rootless pre-flight: warn if subuid/subgid ranges look missing.
         // Not a hard error because some systems configure these via LDAP or
         // other mechanisms that /etc/subuid does not reflect.
-        if rustix::process::getuid().as_raw() != 0 {
+        if !cfg!(target_os = "macos") && rustix::process::getuid().as_raw() != 0 {
             check_subuid_range();
         }
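
The guard in this hunk uses the `cfg!` macro rather than a `#[cfg]` attribute. A minimal sketch of that condition, assuming a hypothetical `should_check_subuid` helper that takes the UID as a parameter so no external crate is needed (the driver reads it through `rustix`):

```rust
/// Hypothetical predicate mirroring the preflight guard.
fn should_check_subuid(uid: u32) -> bool {
    // `cfg!` expands to a compile-time `true` or `false`. Unlike `#[cfg]`,
    // the whole expression is still compiled on every platform, so the
    // non-macOS path keeps being type-checked on macOS builds.
    !cfg!(target_os = "macos") && uid != 0
}
```

Root (UID 0) never triggers the check, and on macOS builds the function returns `false` for every UID.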

crates/openshell-driver-podman/src/main.rs

Lines changed: 9 additions & 0 deletions
@@ -58,6 +58,12 @@ struct Args {
     )]
     gateway_port: u16,

+    /// Host gateway IP used for sandbox host aliases.
+    ///
+    /// Empty uses Podman's `host-gateway` resolver.
+    #[arg(long, env = "OPENSHELL_PODMAN_HOST_GATEWAY_IP")]
+    host_gateway_ip: Option<String>,
+
     #[arg(
         long,
         env = "OPENSHELL_SANDBOX_SSH_SOCKET_PATH",
@@ -118,6 +124,9 @@ async fn main() -> Result<()> {
         image_pull_policy: args.sandbox_image_pull_policy,
         grpc_endpoint: args.grpc_endpoint.unwrap_or_default(),
         gateway_port: args.gateway_port,
+        host_gateway_ip: args
+            .host_gateway_ip
+            .unwrap_or_else(PodmanComputeConfig::default_host_gateway_ip),
         sandbox_ssh_socket_path: args.sandbox_ssh_socket_path,
         network_name: args.network_name,
         stop_timeout_secs: args.stop_timeout,

crates/openshell-server/src/config_file.rs

Lines changed: 20 additions & 0 deletions
@@ -275,6 +275,7 @@ fn inheritable_keys(driver: ComputeDriverKind) -> &'static [&'static str] {
         ComputeDriverKind::Podman => &[
             "default_image",
             "supervisor_image",
+            "host_gateway_ip",
             "guest_tls_ca",
             "guest_tls_cert",
             "guest_tls_key",
@@ -498,6 +499,25 @@ version = 2
         );
     }

+    #[test]
+    fn podman_driver_table_inherits_gateway_host_gateway_ip() {
+        let gateway = GatewayFileSection {
+            default_image: Some("ghcr.io/nvidia/openshell/sandbox:0.9".to_string()),
+            host_gateway_ip: Some("192.168.127.254".to_string()),
+            ..Default::default()
+        };
+        let merged = driver_table(ComputeDriverKind::Podman, &gateway, None);
+        let table = merged.as_table().expect("table");
+        assert_eq!(
+            table.get("default_image").and_then(|v| v.as_str()),
+            Some("ghcr.io/nvidia/openshell/sandbox:0.9")
+        );
+        assert_eq!(
+            table.get("host_gateway_ip").and_then(|v| v.as_str()),
+            Some("192.168.127.254")
+        );
+    }
+
     #[test]
     fn driver_table_specific_value_overrides_gateway_default() {
         let gateway = GatewayFileSection {
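
The inheritance that these tests exercise can be approximated with plain maps. This is a sketch under assumptions, not the `driver_table` implementation: `merge_driver_table` is a hypothetical name, values are simplified to strings instead of TOML values, and the key list is abbreviated to three of the Podman keys shown in `inheritable_keys`.

```rust
use std::collections::BTreeMap;

/// Abbreviated list of gateway keys a Podman driver table may inherit.
const PODMAN_INHERITABLE_KEYS: &[&str] = &["default_image", "supervisor_image", "host_gateway_ip"];

/// Hypothetical merge: start from the driver's own table, then copy each
/// inheritable gateway key that the driver table did not set itself.
fn merge_driver_table(
    gateway: &BTreeMap<String, String>,
    driver: Option<&BTreeMap<String, String>>,
) -> BTreeMap<String, String> {
    let mut merged = driver.cloned().unwrap_or_default();
    for key in PODMAN_INHERITABLE_KEYS {
        if let Some(value) = gateway.get(*key) {
            // A driver-specific value wins; the gateway value only fills gaps.
            merged
                .entry((*key).to_string())
                .or_insert_with(|| value.clone());
        }
    }
    merged
}
```

With this shape, a gateway-level `host_gateway_ip` reaches the Podman driver when the driver table omits it, a driver-level value (including an explicit empty string) takes precedence, and gateway keys outside the inheritable list are not copied.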

crates/openshell-server/src/lib.rs

Lines changed: 3 additions & 0 deletions
@@ -749,6 +749,9 @@ async fn build_compute_runtime(
             if let Ok(p) = std::env::var("OPENSHELL_PODMAN_SOCKET") {
                 podman.socket_path = PathBuf::from(p);
             }
+            if let Ok(ip) = std::env::var("OPENSHELL_PODMAN_HOST_GATEWAY_IP") {
+                podman.host_gateway_ip = ip;
+            }
             apply_podman_local_tls_defaults(config, &mut podman)?;

             ComputeRuntime::new_podman(

docs/reference/gateway-config.mdx

Lines changed: 3 additions & 0 deletions
@@ -233,6 +233,9 @@ grpc_endpoint = "https://host.containers.internal:17670"
 # The gateway overwrites gateway_port from bind_address at runtime.
 gateway_port = 17670
 network_name = "openshell"
+# Omit for the platform default: empty on Linux, 192.168.127.254 on macOS Podman machine.
+# Set "" to force Podman's host-gateway resolver.
+# host_gateway_ip = "192.168.127.254"
 sandbox_ssh_socket_path = "/run/openshell/ssh.sock"
 stop_timeout_secs = 10
 supervisor_image = "ghcr.io/nvidia/openshell/supervisor:latest"
