fix(NSC): check error from rand.Int before using result in shuffle by Aprazor · Pull Request #2020 · cloudnativelabs/kube-router

Aprazor · 2026-03-18T19:04:44Z

What type of PR is this?

bug

What this PR does / why we need it:

In shuffle() (pkg/controllers/proxy/network_services_controller.go), the return value of rand.Int() is dereferenced on the line before the error check. If rand.Int returns an error, it returns (nil, err), and randBitInt.Int64() panics with a nil pointer dereference.

The fix moves the error check before the dereference and uses continue to skip the swap on error.

Which issue(s) this PR is related to:

None found.

Was AI used during the creation of this PR?

What tool was used: Claude Code
To what extent was the tool used? Code review identified the bug, human reviewed and confirmed the fix
If drafted, how detailed of a plan did you create for the AI? Detailed — identified the exact bug pattern, confirmed with line-by-line analysis, minimal fix scoped to the affected function
Help us understand if a human was in the loop or not for this PR? Yes — human confirmed the finding, reviewed the diff, and approved before submission

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

Unit tests pass. No integration testing — the fix is a simple reordering of existing lines.

Does this PR introduce a breaking change?

NONE

Anything else the reviewer should know that wasn't already covered?

The bug is latent — crypto/rand.Reader rarely fails, but when it does (e.g., entropy exhaustion), this would crash the controller.

rand.Int returns (nil, err) on failure, but the return value was dereferenced on the next line before the error check. This causes a nil pointer panic if rand.Int ever fails. Move the error check before the dereference and continue the loop on error.

aauren · 2026-03-23T01:51:21Z

Hi @Aprazor! Thanks for your interest in kube-router and for looking into some of these data race, contention, and
stability bugs!

Hopefully we'll have a chance to review them in the next week or two. In the meantime, it would be super helpful if
you could condense them for us a bit. kube-router has a small maintainer team and it is easier to review work when it
is chunked up along functionality lines instead of as 8 different PRs like it is now.

Here's a suggestion for how you could reorganize these into 3 PRs:

PR 1: fix(NSC): harden Network Services Controller against panics, races, and sync errors

This would combine #2037, #2036, #2023, #2021, and #2020 — all of which are bug fixes in pkg/controllers/proxy/.
Three of them touch the same file (network_services_controller.go), so they're likely to conflict with each other
as-is anyway. The common thread is defensive hardening of the services controller: nil checks, race protection, error
propagation, and control flow fixes. Note that #2020 currently has a CI failure on ci-unit-tests that would need to
be resolved.

PR 2: fix(NRC): use atomic.Bool for bgpServerStarted to prevent data race

This would just be #2038 as-is. It's the only change in the routing controller (pkg/controllers/routing/), it's
self-contained, and it doesn't overlap with anything else. No changes needed here.

PR 3: fix(NPC,LBC): harden network policy and load balancer controllers

This would combine #2039 and #2022 — the sanitization of iptables comment strings in the network policy controller
and the nil panic fix in the load balancer allocator. They share a defense-in-depth / input validation theme and the
combined PR would still be quite small (~41 lines changed).

One other thing — please make sure the consolidated PRs include unit tests for the new code. Most of the current PRs don't add any test coverage. Our Developer's Guide covers the testing expectations, and since you
mentioned using Claude, the repo's AGENTS.md file has detailed testing guidelines that should help your AI follow
the project's conventions (table-driven tests, testify assertions, 70% coverage target, etc.) if you include it in
context. You can run make test-pretty locally to verify everything passes before pushing.

Thanks again for the contributions — looking forward to reviewing once they're consolidated!

Aprazor · 2026-03-23T02:12:13Z

Superseded by #2041 (consolidated NSC PR per @aauren's feedback)

This was referenced Mar 23, 2026

fix(NSC): harden Network Services Controller against panics, races, and sync errors #2041

Merged

fix(NPC,LBC): harden network policy and load balancer controllers #2042

Open

Aprazor closed this Mar 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(NSC): check error from rand.Int before using result in shuffle#2020

fix(NSC): check error from rand.Int before using result in shuffle#2020
Aprazor wants to merge 1 commit intocloudnativelabs:masterfrom
Aprazor:fix/shuffle-nil-deref-before-error-check

Aprazor commented Mar 18, 2026

Uh oh!

aauren commented Mar 23, 2026 •

edited

Loading

Uh oh!

Aprazor commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Aprazor commented Mar 18, 2026

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Was AI used during the creation of this PR?

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

Does this PR introduce a breaking change?

Anything else the reviewer should know that wasn't already covered?

Uh oh!

aauren commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Aprazor commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aauren commented Mar 23, 2026 •

edited

Loading