fix(net): surface gvproxy bind errors through FFI by G4614 · Pull Request #612 · boxlite-ai/boxlite

G4614 · 2026-05-28T07:54:41Z

Surface gvproxy `virtualnetwork.New` bind failures (port-in-use, privileged-port, …) through the Go `initErr` channel → FFI `errOut` → `BoxliteError::Network`, instead of swallowing them into an opaque box failure.

Test plan

Two regression tests, each verified on both sides (with the fix reverted and with it applied):

- `gvproxy_port_conflict_fails_fast_with_named_error`: the test pre-binds a host port, then `boxlite run -p` collides with it (EADDRINUSE).
- `gvproxy_privileged_port_fails_fast_with_named_error`: a non-root `boxlite run -p 80:80` (EACCES).

| observed | pre-fix (main.go/ffi.rs reverted) | post-fix |
|---|---|---|
| box exit code | `rc=0`: boots despite the bind failure | `rc≠0`: fails fast |
| stderr | no `gvproxy_create failed`, no OS reason | `gvproxy_create failed: … address already in use` / `… permission denied` |
| test result | both FAIL (all 3 checks) | both PASS |

Pre-fix, the bind error was swallowed by a logrus log line inside a goroutine and the box booted with broken networking; post-fix, it interrupts startup with the named OS reason.

G4614 · 2026-05-28T08:48:08Z

Where the error is produced: `gvproxy-bridge/main.go:440`, inside the network goroutine:

```go
go func() {
    vn, err := virtualnetwork.New(tapConfig)   // :440  the bind happens here (EADDRINUSE, EACCES, …)
    if err != nil {
        initErr <- err                          // :443  hand the error out of the goroutine
        return
    }
    initErr <- nil                              // :446  success
    // ...
}()
```

Where it's captured: at `main.go:534`, `gvproxy_create` blocks until that goroutine reports back, and a non-nil result becomes a hard stop:

```go
if err := <-initErr; err != nil {   // :534  wait for the bind result
    setErr(err)                     //        copy the reason into *errOut
    // tear down the half-created instance
    cancel()
    delete(instances, id)
    listener.Close()
    os.Remove(socketPath)
    return -1                       // :547  ← turn the failure into -1
}
```

Pre-fix there was no `initErr`: the goroutine only logged the bind failure with `logrus.Error` and returned, while `gvproxy_create` fell straight through to `return C.longlong(id)` without waiting. The error therefore never reached the return value, and the box booted with broken networking (the test's `rc=0`).

The Rust side (`net/gvproxy/ffi.rs`, which returns `Err(...)` when `id < 0`) already aborted box startup on -1; it simply never received a -1 until this fix produced one.

DorianZheng · 2026-05-28T10:22:56Z

```diff
 		os.Remove(socketPath)
 	}()
 
+	// Wait for virtualnetwork.New to complete before returning a valid id.
```


Will this hurt performance in the normal case?

Added `gvproxy_create_latency_test` for the normal case: it asserts that the `virtualnetwork.New` wait stays under a 0.5 ms threshold. The wait is usually 0.1 to 0.3 ms on an Amazon c8i.xlarge running Ubuntu 22.04.

An A/B test of the latency this wait adds also lands in that range.

Thanks for the guidance.

`virtualnetwork.New` failures (e.g. host port EADDRINUSE when another process holds the port a user passed to `-p HOST:GUEST`) used to die inside a goroutine at gvproxy-bridge/main.go:417, while the surrounding `gvproxy_create` had already returned a valid id. The Rust runtime shipped the broken socket downstream, the guest booted, and the failure surfaced ~20s later as "DNS lookup … i/o timeout" from inside the guest, several layers removed from the root cause.

Add an `initErr` channel so `gvproxy_create` waits for the `virtualnetwork.New` result before returning. On failure it tears down the instance and returns -1; the Rust runtime maps that to `Network("gvproxy_create failed")`, so the user sees a clear, named, fail-fast error in well under a second.

Regression guard: `src/cli/tests/gvproxy_port_conflict.rs`. A non-boxlite `TcpListener` holds an ephemeral host port, then `boxlite run -p PORT:80 alpine:latest true` must exit non-zero with stderr containing "gvproxy_create failed".
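The socket-level premise of that regression guard (hold an ephemeral host port, then have something else try to bind it) can be sketched in Go. This only illustrates the premise; the real guard is the Rust test that drives `boxlite run`. `holdEphemeralPort`, `bindPort`, and `isPortInUse` are hypothetical helpers, and matching with `errors.Is(err, syscall.EADDRINUSE)` assumes a Unix-like host.

```go
package main

import (
	"errors"
	"net"
	"syscall"
)

// holdEphemeralPort binds 127.0.0.1:0 so the kernel picks a free port, and
// keeps the listener open so the port stays busy while the caller tests.
func holdEphemeralPort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

// bindPort tries to listen on 127.0.0.1:port, closing the listener on success.
func bindPort(port int) error {
	ln, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1), Port: port})
	if err != nil {
		return err
	}
	return ln.Close()
}

// isPortInUse reports whether err is the kernel's EADDRINUSE; errors.Is
// unwraps *net.OpError and *os.SyscallError down to the syscall.Errno.
func isPortInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}
```

Letting the kernel choose the port (`:0`) avoids collisions with whatever else the CI runner happens to be listening on.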

The previous commit surfaced bind failures as `gvproxy_create failed`, but that string alone doesn't tell the user *why*; they still have to read box debug files to find "address already in use". Extend `gvproxy_create` with an `errOut **C.char` out-parameter: on each -1 return path, the Go side writes the underlying `err.Error()` as a heap-allocated C string. The Rust caller reads it, frees it, and folds the detail into `BoxliteError::Network`, so `boxlite run` stderr now reads `Error: network error: gvproxy_create failed: cannot add network services: listen tcp 0.0.0.0:27499: bind: address already in use` instead of an opaque "gvproxy_create failed". Regression guard: `gvproxy_port_conflict_fails_fast_with_named_error` now also asserts that stderr contains "address already in use", locking the FFI plumbing down against future regressions that would re-collapse the message.

The 3 assertions guard distinct regression classes: boxlite's exit code, the named gvproxy error, and the underlying OS-level bind detail. Because `assert!` short-circuits, a regression that breaks all three shows only the first failure in the test panic, forcing iterative debugging. Collect each check into a `Vec` and emit a single combined panic at the end, so a maintainer sees every level that regressed in one test run. Failure mode on plain main (verified): "3 of 3 checks failed: - L1 [boxlite rc != 0]: ... - L2 [stderr names gvproxy]: ... - L3 [stderr carries OS detail]: ..."

Companion to `gvproxy_port_conflict_fails_fast_with_named_error`, exercising a different `virtualnetwork.New` bind failure mode: EACCES when a non-root caller maps a privileged host port (e.g. `-p 80:80`), rather than EADDRINUSE when the port is busy. Same FFI/error plumbing, different kernel error string. This confirms the fix surfaces whatever the kernel returned at bind time rather than hard-coding a port-conflict shortcut. Verified on the main worktree: 3 of 3 soft assertions fire (boxlite rc=0; stderr contains neither "gvproxy_create failed" nor "permission denied", only qcow2 warnings and "Auto-stopping non-detached box"). On the fix branch, all three pass. Skipped when the runner has permission to bind ports below 1024 (root, CAP_NET_BIND_SERVICE, or a lowered ip_unprivileged_port_start), since the test premise doesn't hold there.

The fix makes `gvproxy_create` block on `<-initErr` until `virtualnetwork.New` finishes, which prompted the review question about normal-case cost. `virtualnetwork.New` is the upper bound on that wait; this test pins its median (over N warmed runs, robust to one-off GC/scheduler spikes) under a 0.5 ms budget, with ~120 µs observed. Verified two-sided: injecting a 600 µs delay pushes the median to ~1.2 ms and the test fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 mentioned this pull request May 28, 2026

feat(net): auto-remap EXPOSE host port on conflict #614

Open

G4614 force-pushed the fix/gvproxy-init-error-passthrough branch from 47aefa2 to 0872494 May 28, 2026 08:29

G4614 changed the title ~~fix(net): gvproxy bind-error passthrough + EXPOSE auto-remap~~ fix(net): surface gvproxy bind errors through FFI May 28, 2026

G4614 marked this pull request as ready for review May 28, 2026 08:46

DorianZheng reviewed May 28, 2026

gamnaansong added 4 commits May 28, 2026 10:31

G4614 marked this pull request as draft May 28, 2026 10:35

G4614 force-pushed the fix/gvproxy-init-error-passthrough branch from 0872494 to 93cdbc3 May 28, 2026 10:36

G4614 marked this pull request as ready for review May 28, 2026 11:06

G4614 wants to merge 5 commits into boxlite-ai:main from G4614:fix/gvproxy-init-error-passthrough

2 participants

Conversation

G4614 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

G4614 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DorianZheng May 28, 2026

Choose a reason for hiding this comment

Uh oh!

G4614 May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

G4614 commented May 28, 2026 •

edited

Loading

G4614 commented May 28, 2026 •

edited

Loading

G4614 May 28, 2026 •

edited

Loading