Skip to content

fix(NSC): check error from rand.Int before using result in shuffle#2020

Closed
Aprazor wants to merge 1 commit intocloudnativelabs:masterfrom
Aprazor:fix/shuffle-nil-deref-before-error-check
Closed

fix(NSC): check error from rand.Int before using result in shuffle#2020
Aprazor wants to merge 1 commit intocloudnativelabs:masterfrom
Aprazor:fix/shuffle-nil-deref-before-error-check

Conversation

@Aprazor
Copy link
Copy Markdown
Contributor

@Aprazor Aprazor commented Mar 18, 2026

What type of PR is this?

bug

What this PR does / why we need it:

In shuffle() (pkg/controllers/proxy/network_services_controller.go), the return value of rand.Int() is dereferenced on the line before the error check. If rand.Int returns an error, it returns (nil, err), and randBitInt.Int64() panics with a nil pointer dereference.

The fix moves the error check before the dereference and uses continue to skip the swap on error.

Which issue(s) this PR is related to:

None found.

Was AI used during the creation of this PR?

  • What tool was used: Claude Code
  • To what extent was the tool used? Code review identified the bug, human reviewed and confirmed the fix
  • If drafted, how detailed of a plan did you create for the AI? Detailed — identified the exact bug pattern, confirmed with line-by-line analysis, minimal fix scoped to the affected function
  • Help us understand if a human was in the loop or not for this PR? Yes — human confirmed the finding, reviewed the diff, and approved before submission

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

Unit tests pass. No integration testing — the fix is a simple reordering of existing lines.

Does this PR introduce a breaking change?

NONE

Anything else the reviewer should know that wasn't already covered?

The bug is latent — crypto/rand.Reader rarely fails, but when it does (e.g., entropy exhaustion), this would crash the controller.

rand.Int returns (nil, err) on failure, but the return value was
dereferenced on the next line before the error check. This causes a
nil pointer panic if rand.Int ever fails. Move the error check before
the dereference and continue the loop on error.
@aauren
Copy link
Copy Markdown
Collaborator

aauren commented Mar 23, 2026

Hi @Aprazor! Thanks for your interest in kube-router and for looking into some of these data race, contention, and
stability bugs!

Hopefully we'll have a chance to review them in the next week or two. In the meantime, it would be super helpful if
you could condense them for us a bit. kube-router has a small maintainer team and it is easier to review work when it
is chunked up along functionality lines instead of as 8 different PRs like it is now.

Here's a suggestion for how you could reorganize these into 3 PRs:

PR 1: fix(NSC): harden Network Services Controller against panics, races, and sync errors

This would combine #2037, #2036, #2023, #2021, and #2020 — all of which are bug fixes in pkg/controllers/proxy/.
Three of them touch the same file (network_services_controller.go), so they're likely to conflict with each other
as-is anyway. The common thread is defensive hardening of the services controller: nil checks, race protection, error
propagation, and control flow fixes. Note that #2020 currently has a CI failure on ci-unit-tests that would need to
be resolved.

PR 2: fix(NRC): use atomic.Bool for bgpServerStarted to prevent data race

This would just be #2038 as-is. It's the only change in the routing controller (pkg/controllers/routing/), it's
self-contained, and it doesn't overlap with anything else. No changes needed here.

PR 3: fix(NPC,LBC): harden network policy and load balancer controllers

This would combine #2039 and #2022 — the sanitization of iptables comment strings in the network policy controller
and the nil panic fix in the load balancer allocator. They share a defense-in-depth / input validation theme and the
combined PR would still be quite small (~41 lines changed).

One other thing — please make sure the consolidated PRs include unit tests for the new code. Most of the current PRs don't add any test coverage. Our Developer's Guide covers the testing expectations, and since you
mentioned using Claude, the repo's AGENTS.md file has detailed testing guidelines that should help your AI follow
the project's conventions (table-driven tests, testify assertions, 70% coverage target, etc.) if you include it in
context. You can run make test-pretty locally to verify everything passes before pushing.

Thanks again for the contributions — looking forward to reviewing once they're consolidated!

@Aprazor
Copy link
Copy Markdown
Contributor Author

Aprazor commented Mar 23, 2026

Superseded by #2041 (consolidated NSC PR per @aauren's feedback)

@Aprazor Aprazor closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants