Harden networking: auth, memory-safety, DoS, and thread-safety fixes by igorls · Pull Request #101 · igorls/meshguard

igorls · 2026-06-06T12:38:58Z

Hardening pass over the networking stack (src/net, src/nat, src/discovery, src/protocol, src/wireguard) addressing confirmed Critical/High issues plus two low-risk Mediums from an adversarial review. Every fix has a regression test. Green on zig build test, Debug, ReleaseFast (shipping config), ReleaseSafe, and zig fmt.

What's hardened

Authentication & spoofing

Org control messages (revoke / alias / vouch) now require a verified Ed25519 signature and a trusted org before they take effect.
RX cryptokey routing: a decrypted packet's inner source IP must belong to the sending peer (IPv4 + IPv6) before it reaches the TUN.
Gossiped dead/leave/suspect about a third party now only marks it suspected locally and must be confirmed by our own failure detector before eviction; gossip can no longer clear local suspicion or resurrect a peer.
DNS bootstrap: randomized per-query transaction ID, connected socket (source filtering), and transaction-ID/question validation.

Memory safety (the shipped ReleaseFast build runs with safety checks off)

Bound the base64 token decode against its output buffer before decoding (fixes a stack OOB write on an over-long mg:// token).
Return expired-peer keys via a caller-owned buffer instead of a slice into a reclaimed stack frame.

Resource exhaustion / DoS

Cap the membership table and reclaim non-alive entries so unauthenticated gossip can't grow it without bound.
Rate-limit inbound WireGuard handshake initiations (per-source + global token bucket) before the expensive X25519.

Thread safety

RwLock on the WgDevice peer table (exclusive for mutation/handshake, shared for the data plane) so peer removal can't zero key material mid-encrypt.
RwLock on the membership table — single writer takes it exclusively, external readers (data plane, FFI) share it.

Service policy

The service filter now classifies IPv6 and fails closed on anything unclassifiable under a default-deny policy (previously IPv6 bypassed filtering).

Tests

9 new regression tests covering each fix (membership cap/reclaim, expired-peer buffer, org-signature verification, gossip non-eviction, base64 bound, DNS response validation, IPv6 filter fail-closed, handshake rate limiter, RX source-IP enforcement).

Notes for review

The thread-safety locks are covered by reasoning + unit tests; a multi-node / TSan soak under adversarial churn is recommended before relying on them in production.
A full WireGuard cookie/MAC2 reply (to fully mitigate source-spoofed handshake floods) is a sensible follow-up; this PR adds the token-bucket bound.

🤖 Generated with Claude Code

Addresses the confirmed Critical/High issues (plus two low-risk Mediums) from an adversarial review of the networking stack. Each fix ships with a regression test; green on `zig build test`, Debug, ReleaseFast, ReleaseSafe. Authentication / spoofing - Verify the Ed25519 signature on org control messages (revoke/alias/vouch) and require a trusted org before acting on them (swim.zig, codec.zig). - Enforce RX cryptokey routing: a decrypted packet's inner source IP must belong to the sending peer, for IPv4 and IPv6 (device.zig). - Gossiped dead/leave/suspect about a third party now only marks it suspected locally and is confirmed by our own failure detector before any eviction; gossip can no longer clear local suspicion or resurrect a peer (swim.zig). - DNS bootstrap: random per-query transaction ID, connected socket for source filtering, and transaction-ID/question validation (dns.zig). Memory safety (safety checks are off in the shipped ReleaseFast build) - Bound the base64 token decode against the output buffer before decoding, fixing a stack out-of-bounds write on an over-long mg:// token (coordinated_punch.zig). - Return expired-peer keys via a caller-owned buffer instead of a slice into a reclaimed stack frame (membership.zig). Resource exhaustion / DoS - Cap the membership table and reclaim non-alive entries so unauthenticated gossip cannot grow it without bound (membership.zig). - Rate-limit inbound WireGuard handshake initiations (per-source + global token bucket) before the X25519 (device.zig). Thread safety - Add an RwLock to the WgDevice peer table (exclusive for mutation/handshake, shared for the data plane) so peer removal cannot zero key material while a worker is mid-encrypt (device.zig, main.zig). - Add an RwLock to the membership table; the single writer takes it exclusively for mutations, external readers (data plane, FFI) share it (membership.zig, main.zig, meshguard_ffi.zig). Service policy - Classify IPv6 packets in the service filter and fail closed on anything unclassifiable under a default-deny policy (policy.zig).

The H4 rate-limit change added a src_ip parameter to WgDevice.handleInitiation; wg_interop.zig (built only on Linux) still used the old 1-arg call, breaking the ubuntu CI build. macOS doesn't compile this binary, so local builds missed it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70e3460cb8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T12:43:11Z

+        if (!innerSourceAllowed(peer.mesh_ip, peer.mesh_ip6, out[0..len])) {
+            return error.SourceSpoofed;


Allow framed tunnel payloads through decryptTransport

When the FFI tunnel API is used, meshguard_tunnel_send deliberately frames arbitrary payloads with a leading 0x00 header before encrypting them, and the receiver routes those packets through handleWgPacket → dev.decryptTransport. This new unconditional inner-IP check makes decryptTransport return SourceSpoofed for any non-IPv4/IPv6 plaintext, so every framed tunnel message is dropped after decryption rather than being delivered to the tunnel inbox. The source-IP enforcement needs to be limited to TUN IP packets or provide a separate decrypt path for the framed tunnel transport.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-06T12:43:11Z

+        if (!self.hs_limiter.allow(src_ip, nowNs())) {
+            return error.HandshakeRateLimited;


Avoid charging auto-registered handshakes twice

In the FFI responder path, the first packet from a membership-known peer that is not yet in WgDevice is handled by calling handleInitiation, catching UnknownPeer, auto-registering, and then immediately retrying handleInitiation. Since the limiter token is consumed here before the UnknownPeer lookup, that single legitimate first handshake burns one token on the failed probe and another on the retry; if the source has only one token left, the retry is rate-limited and no response is sent. Consider not consuming the bucket for UnknownPeer probes or making the auto-register retry reuse the admission decision.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Hardening pass across Meshguard’s networking stack to close multiple security and reliability gaps in the WireGuard/userspace data plane, SWIM membership handling, DNS bootstrap, and service-policy enforcement, with accompanying regression tests and documentation updates.

Changes:

Add WireGuard peer-table/thread-safety protections (device RwLock) plus a handshake-init token-bucket limiter and RX inner-source enforcement.
Harden discovery/membership against spoofing and resource exhaustion (org control signature verification, gossip “dead” handling, membership cap + reclaim, safer expireSuspected API).
Fix service policy IPv6 bypass and DNS bootstrap response spoofing weaknesses (random TXID, connected UDP, response validation).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/wireguard/device.zig	Adds handshake rate limiter, peer-table `RwLock`, and RX inner-source enforcement plus tests.
src/services/policy.zig	Adds IPv6-capable packet classification and a fail-closed `allowPacket` API plus tests.
src/protocol/codec.zig	Adds canonical signed-payload builders for org control message verification.
src/net/dns.zig	Randomizes DNS TXID, connects UDP socket, validates response matches query, adds tests.
src/nat/coordinated_punch.zig	Bounds base64 decode output to prevent OOB writes; adds regression tests.
src/meshguard_ffi.zig	Adds membership read locks for several exported FFI entry points; updates WG initiation call signature.
src/main.zig	Uses new service-filter API and adds WG device shared locks in some dataplane paths.
src/discovery/swim.zig	Verifies org control signatures and changes gossip “dead/leave/suspect” handling; updates expire API usage; adds tests.
src/discovery/membership.zig	Adds membership cap + reclaim, introduces `RwLock`, fixes `expireSuspected` use-after-return, adds tests.
SECURITY.md	Documents the new security properties (RX routing, signed org control, confirmed failure detection, rate limiting).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        // Per-source bucket (hash → slot, reset on collision with a different IP).
+        const h = (((@as(usize, ip[0]) *% 131 +% ip[1]) *% 131 +% ip[2]) *% 131 +% ip[3]);
+        const b = &self.buckets[h % SLOTS];


    /// Handle an incoming handshake initiation (we are responder).
    /// O(1) lookup via static key table instead of O(N) peer iteration.
-    pub fn handleInitiation(self: *WgDevice, msg: *const noise.HandshakeInitiation) !struct { response: noise.HandshakeResponse, slot: usize } {
-        // Step 0: Verify MAC1 BEFORE expensive DH (anti-DoS gate)
+    pub fn handleInitiation(self: *WgDevice, msg: *const noise.HandshakeInitiation, src_ip: [4]u8) !struct { response: noise.HandshakeResponse, slot: usize } {
+        // Step 0: Verify MAC1 BEFORE expensive DH (cheap, no lock).


+    if (!std.ascii.eqlIgnoreCase(data[HEADER_LEN .. HEADER_LEN + q_end], q_buf[0..q_end])) return false;
+    if (readU16(data, HEADER_LEN + q_end) != qtype) return false; // QTYPE echoed
+    return true;


        const existing = self.peers.get(peer.pubkey);
        if (existing) |e| {
            // Only update if the incoming info is newer (higher Lamport timestamp)
            if (peer.lamport <= e.lamport) return;
+            try self.peers.put(peer.pubkey, peer);
+            return;
+        }


        if (wg_dev.decryptTransport(pkt, &out_buf)) |dec| {
-            // Service filter: check port access before TUN write
-            if (lib.services.Policy.parseTransportHeader(out_buf[0..dec.len])) |ti| {
-                const peer = wg_dev.peers[dec.slot] orelse continue;
-                const org_pk = if (membership.peers.getPtr(peer.identity_key)) |mp| mp.org_pubkey else null;
-                if (!service_filter.check(peer.identity_key, org_pk, ti.proto, ti.dst_port)) continue;
+            // Service filter: check access before TUN write (IPv4 + IPv6, M5).
+            if (wg_dev.peers[dec.slot]) |peer| {
+                const org_pk = orgPubkeyLocked(membership, peer.identity_key);
+                if (!service_filter.allowPacket(peer.identity_key, org_pk, out_buf[0..dec.len])) continue;
            }


            if (wg_dev.decryptTransport(pkt, &decrypt_storage[n_decrypted.*])) |result| {
-                // Check service filter before buffering
-                const PolicyMod = lib.services.Policy;
-                if (PolicyMod.parseTransportHeader(decrypt_storage[n_decrypted.*][0..result.len])) |ti| {
-                    if (wg_dev.peers[result.slot]) |peer| {
-                        const org_pk = if (swim.membership.peers.getPtr(peer.identity_key)) |mp| mp.org_pubkey else null;
-                        if (!service_filter.check(peer.identity_key, org_pk, ti.proto, ti.dst_port)) return;
-                    }
+                // Check service filter before buffering (IPv4 + IPv6, M5).
+                if (wg_dev.peers[result.slot]) |peer| {
+                    const org_pk = orgPubkeyLocked(swim.membership, peer.identity_key);
+                    if (!service_filter.allowPacket(peer.identity_key, org_pk, decrypt_storage[n_decrypted.*][0..result.len])) return;
                }


Code-review follow-ups on the hardening PR: - P1 (regression): the H1 RX source-IP check rejected non-IP plaintext, which dropped every FFI framed tunnel message (framed with a leading-zero length header → version nibble 0). innerSourceAllowed now only constrains IPv4/IPv6 packets (the IP-routed ones); non-IP framed payloads pass to the inbox. - H6 race: the daemon re-read wg_dev.peers[slot] after decryptTransport had released the device lock — a data race vs removePeer that could also skip the service filter. decryptTransport now returns the peer identity_key (captured under the lock); all four daemon filter blocks and the FFI use it, so peers[] is never re-read post-decrypt. - P2: the FFI auto-register path charged the handshake rate limiter twice for one legitimate handshake (initial UnknownPeer probe + retry). Added handleInitiationAdmitted for the retry, which skips re-admission. - DNS: responseMatchesQuery now also validates the echoed QCLASS. - membership.upsert: free the previous name allocation on the update path (consistent with remove/deinit) to avoid a leak. - device.zig: make the handshake-limiter hash operands explicitly usize; give the initiation handlers a shared named return type (InitiationResult). Verified: zig build test, Debug, ReleaseFast, ReleaseSafe, and x86_64-linux cross-compile (Debug + ReleaseFast).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab85b27106

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T12:59:01Z

+        wg_dev.lock.lockSharedUncancelable(zio());
+        defer wg_dev.lock.unlockShared(zio());


Release the device lock before the next batch

In this loop, defer is scoped to cryptoWorkerPipeline, not to the current iteration, so after the first popped batch each crypto worker keeps a shared wg_dev.lock held until the thread exits (including across continue and the next crypto_q.pop()). Any later exclusive operation on the same device lock, such as initiateHandshake, handleResponse, or removePeer, can then block indefinitely once data-plane traffic has passed through the worker, preventing rekeys/handshakes and peer cleanup in userspace mode.

Useful? React with 👍 / 👎.

Copilot AI review requested due to automatic review settings June 6, 2026 12:38

Copilot started reviewing on behalf of igorls June 6, 2026 12:39 View session

chatgpt-codex-connector Bot reviewed Jun 6, 2026

View reviewed changes

Copilot AI reviewed Jun 6, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 6, 2026

View reviewed changes

igorls merged commit 56d9d8d into main Jun 6, 2026
4 checks passed

igorls deleted the security/networking-hardening branch June 6, 2026 17:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden networking: auth, memory-safety, DoS, and thread-safety fixes#101

Harden networking: auth, memory-safety, DoS, and thread-safety fixes#101
igorls merged 3 commits into
mainfrom
security/networking-hardening

igorls commented Jun 6, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (!innerSourceAllowed(peer.mesh_ip, peer.mesh_ip6, out[0..len])) {
		return error.SourceSpoofed;

		if (!self.hs_limiter.allow(src_ip, nowNs())) {
		return error.HandshakeRateLimited;

		wg_dev.lock.lockSharedUncancelable(zio());
		defer wg_dev.lock.unlockShared(zio());

Conversation

igorls commented Jun 6, 2026

What's hardened

Tests

Notes for review

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants