Fix overlay network IP assignment using Peers-based positioning by firecow · Pull Request #57 · cego/container-manager

firecow · 2026-03-03T20:17:33Z

Summary

- Replace FNV hash-based node offset with Peers list-based deterministic positioning for overlay network IP assignment
- Each overlay container gets a unique overlay index, selecting its designated IP from the node's band (no more competing for the same candidates)
- Multi-round candidate generation: 3 rounds of fallback IPs per container, spaced across the top of the subnet
- Pre-filter IPs already visible on this node to skip known conflicts instantly (avoids the 20s Docker timeout)
- Remove Docker-assigned fallback: skip the network instead of getting a low IP that collides with Swarm

Test plan

- All 38 unit tests pass (broadcastAddr, addToIP, computeIPCandidates, multi-round, overlay index selection)
- Multi-round: verified no IP overlap across peers, rounds, and container lanes
- Tested on staging (swarm-node1-stage-lp1.spilnu.dk): heartbeat gets .254 and metricbeat gets .253 on all networks except those where Swarm holds the IP
- Verify on production after deploy

Replace the FNV hash-based nodeOffset with a deterministic position derived from the network's Peers list. Each node gets a tight band of IPs at the very top of the subnet based on its sorted index among peers, eliminating collisions on large networks where the hash spread overlapped with Swarm allocations.

Extract computeIPCandidates as a pure function for testability. Tests cover /21, /24, /16, /28, /30 subnets with 1-100 peers and 1-5 containers, including edge cases like subnet overflow, byte boundary crossing, unknown node, and ordering invariance.

- Clone peerIPs before sorting in computeIPCandidates to avoid mutating the caller's slice
- Fix test expectations that relied on lexicographic sort side effects
- Add tests: input mutation guards, subnet bounds, contiguous IPs, gateway avoidance, peer join stability, real-world spilnu-shared scenario, /30 edge cases

Lexicographic sort caused 10.0.0.10 to sort before 10.0.0.2, which meant adding a 10th node to a 9-node cluster would shift all existing nodes' IP bands. Numeric sort (bytes.Compare on parsed IPs) ensures new higher-IP nodes always append, so existing bands are never disrupted. Also remove unused containerName parameter from highIPCandidates.

Instead of all containers trying the same candidate list and competing (causing allocation failures), each container now gets only its specific IP based on its overlay index within the config.

Tests the production scenario: multiple overlay containers on the same node get different designated IPs, including simulation of the run() loop's index counting with mixed overlay and non-overlay containers.
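A sketch of the overlay-index counting described in the two paragraphs above, under the assumption that only overlay containers advance the index, so non-overlay containers in the config never shift anyone's designated IP. The `containerConfig` type, its `Overlay` field, and `assignOverlayIPs` are hypothetical stand-ins for the real config and `run()` loop; the band values mirror the staging result.

```go
package main

import "fmt"

// containerConfig is a hypothetical stand-in for the manager's per-container config.
type containerConfig struct {
	Name    string
	Overlay bool
}

// assignOverlayIPs gives the k-th overlay container band[k]. Overlay
// containers beyond the band size get no entry rather than sharing an IP.
func assignOverlayIPs(configs []containerConfig, band []string) map[string]string {
	out := make(map[string]string)
	overlayIndex := 0
	for _, c := range configs {
		if !c.Overlay {
			continue // non-overlay containers do not consume an index
		}
		if overlayIndex < len(band) {
			out[c.Name] = band[overlayIndex]
		}
		overlayIndex++
	}
	return out
}

func main() {
	configs := []containerConfig{
		{Name: "heartbeat", Overlay: true},
		{Name: "nginx", Overlay: false},
		{Name: "metricbeat", Overlay: true},
	}
	fmt.Println(assignOverlayIPs(configs, []string{"10.0.0.254", "10.0.0.253"}))
}
```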

…allback

When an overlay NetworkConnect with a specific IP fails (e.g. the IP is held by a Swarm service on another node), Docker returns "context deadline exceeded" after 20 seconds instead of an immediate error. The old code then fell back to a Docker-assigned IP, which is taken from the low end of the subnet and collides with Swarm's bottom-up allocation, causing "could not allocate IP from IPAM: Address already in use" task allocation failures.

Three changes:

1. Multi-round candidates: computeIPCandidatesMultiRound generates IPs across 3 rounds (spaced below all peers' primary bands). Each container gets 3 fallback IPs in its own lane instead of just 1.
2. Pre-filter locally visible IPs: highIPCandidates checks networkInfo.Containers to skip IPs already held on this node, avoiding the 20-second timeout for known conflicts.
3. Remove Docker-assigned fallback: if all candidate IPs fail, skip the network instead of getting a dangerous low IP. The next run cycle will retry.

firecow self-assigned this Mar 3, 2026

firecow added 8 commits March 3, 2026 21:32

Add unit tests and test workflow

3ed4f68

Fix misleading /30 test names and add third-peer-no-room case

0fbca43

Add go fix check to test workflow

7f45979

Give each overlay container its own designated IP from the band

e6c5543

Add overlay index selection tests

9240aa6

firecow changed the title ~~Fix overlay IP collisions using Peers-based positioning~~ Fix overlay network IP assignment using Peers-based positioning Mar 3, 2026

firecow closed this Mar 3, 2026

firecow deleted the fix/peers-based-ip-assignment branch March 3, 2026 22:03

firecow wants to merge 9 commits into main from fix/peers-based-ip-assignment
