
operator - showing continuous panic errors related to memory and appears to be causing pods to lose network access until restarted #4703

@JohnPolansky

Description

We have been using Calico very successfully for 5+ years now. However, in the last couple of weeks we started to notice that some of our Kubernetes pods were experiencing network failures: they would effectively be offline for ingress traffic for 2-3 minutes, then start working for 1-2 minutes, then go down again.
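
A minimal sketch of how this flapping can be observed from outside the cluster (the endpoint below is a placeholder, not our actual service):

    # Probe the service every 5 seconds and log timestamped results to
    # capture the offline-for-minutes / online-for-minutes cycling.
    while true; do
      if curl -s -m 2 -o /dev/null https://my-service.example.com/healthz; then
        echo "$(date -u +%H:%M:%S) OK"
      else
        echo "$(date -u +%H:%M:%S) FAIL"
      fi
      sleep 5
    done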

After a lot of troubleshooting, we found that the tigera-operator was continuously spamming:
"panic\":\"runtime error: invalid memory address or nil pointer dereference\"

Immediately after that message there is another error:
"error\":\"panic: runtime error: invalid memory address or nil pointer dereference [recovered]\"

Since the panic is marked [recovered], we aren't sure whether this indicates a real problem, but if not, why does the operator continuously spam these messages until we restart it? After a restart it seems to work fine for 1-3 days and then starts panicking again.
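
For reference, a sketch of how the panic spam can be surfaced, assuming the default namespace and deployment name from an operator-based install (adjust if yours differ):

    # Count recovered panics in the operator log over the last hour.
    # "tigera-operator" is the default namespace/deployment for the operator.
    kubectl logs -n tigera-operator deployment/tigera-operator --since=1h \
      | grep -c "invalid memory address or nil pointer dereference"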

What we did find is that if we restart only the impacted pods, they continue to experience network issues. However, if we FIRST restart the tigera-operator, all the panic messages stop, and if we then restart the impacted pods they work normally until the panics start again.
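
Sketched as commands, the workaround sequence looks roughly like this (the app namespace and deployment names are placeholders for our impacted workloads):

    # 1. Restart the operator FIRST -- this is what stops the panic spam.
    kubectl rollout restart -n tigera-operator deployment/tigera-operator
    kubectl rollout status -n tigera-operator deployment/tigera-operator

    # 2. Only then restart the impacted workloads; restarting them before
    #    the operator leaves them with the same network issues.
    kubectl rollout restart -n my-app-namespace deployment/my-app
    kubectl rollout status -n my-app-namespace deployment/my-app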

At this point we are pretty stumped. We upgraded to the latest versions:

  • image: quay.io/tigera/operator:v1.40.8
  • image: quay.io/calico/apiserver:v3.31.5
  • image: quay.io/calico/kube-controllers:v3.31.5
  • image: quay.io/calico/node:v3.31.5
  • image: quay.io/calico/pod2daemon-flexvol:v3.31.5
  • image: quay.io/calico/typha:v3.31.5
  • AWS EKS 1.35.0 / BottleRocket 1.57.0 / containerd 2.1.6+bottlerocket
  • vpc-cni: v1.21.1-eksbuild.7 (ACTIVE)
  • coredns: v1.13.2-eksbuild.4 (ACTIVE)
  • kube-proxy: v1.35.3-eksbuild.2 (ACTIVE)
  • aws-ebs-csi-driver: v1.58.0-eksbuild.1 (ACTIVE)

I did see another issue from Oct 2025 that looks very similar to the panic I'm getting, but that ticket appears to have been closed without resolution.

At this point this is having a serious impact on our clusters. We have 6 separate EKS clusters, all on similar versions, and 4 out of 5 are showing the issue. The same restart-operator-then-restart-pods sequence seems to fix things for a few days each time. We are pretty desperate to find a solution, so we would appreciate any help you can offer.

One other oddity: our production cluster has never experienced the same network failures to date, even though all clusters run the same versions. And even though Prod never shows the symptoms, I can see this panic error in its logs going back to Jan 2026, which makes me second-guess the cause; however, the restarts do appear to solve it.

Thanks for any help.

operator-panic-errors.txt
