From 97093f42f45ef12075eab1f045c426da8a3b7ec5 Mon Sep 17 00:00:00 2001
From: Jiawei Huang
Date: Thu, 23 Oct 2025 22:21:48 -0700
Subject: [PATCH 1/2] Add troubleshooting guide for non-cluster hosts and VMs setup

---
 .../bare-metal/troubleshoot.mdx | 88 +++++++++++++++++++
 sidebars-calico-enterprise.js | 1 +
 2 files changed, 89 insertions(+)
 create mode 100644 calico-enterprise/getting-started/bare-metal/troubleshoot.mdx

diff --git a/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx b/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx
new file mode 100644
index 0000000000..231d128a98
--- /dev/null
+++ b/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx
@@ -0,0 +1,88 @@
+---
+description: Troubleshoot non-cluster hosts and VMs setup
+---
+
+# Troubleshoot non-cluster hosts and VMs setup
+
+This document provides guidance for troubleshooting non-cluster host setup for hosts and VMs.
+
+## Useful commands
+
+These commands can help you collect logs and monitor system activities during troubleshooting.
+
+### On non-cluster hosts or VMs
+
+```bash
+journalctl -xue calico-node.service -f
+journalctl -xue calico-fluent-bit.service -f
+```
+
+### On the cluster side
+
+```bash
+kubectl logs -n calico-system -l k8s-app=calico-typha-noncluster-host
+kubectl logs -n tigera-manager -l k8s-app=tigera-manager -c tigera-voltron
+```
+
+Monitor CertificateSigningRequests (CSR):
+
+```bash
+kubectl get certificatesigningrequest -w
+```
+
+## Common problems
+
+### No internet connection after installing the Calico Node package
+
+By default, $[prodname] blocks all traffic to and from host interfaces. You can use a profile with host endpoints to modify default behavior. Apply the built-in profile `projectcalico-default-allow`, which allows all ingress and egress traffic. Host endpoints that use this profile will have *allow-all* behavior instead of *deny-all* when no network policy is applied.
+
+### Certificate signed by unknown authority
+
+If the certificate presented by the Kubernetes API server or Tigera Manager endpoint is not signed by a trusted Certificate Authority (CA), add the correct CA certificate to the system trust store. For the Calico fluent-bit log forwarding, you can temporarily disable TLS verification by setting:
+
+```conf
+[OUTPUT]
+    ...
+    tls.verify Off
+    ...
+```
+
+in the configuration file `/etc/calico/calico-fluent-bit/calico-fluent-bit.conf`.
+
+:::note
+
+Disabling TLS verification should only be used for testing or troubleshooting.
+
+:::
+
+### No object can be associated with CSR error
+
+If a CSR is denied with the following error:
+
+```text
+invalid: no object can be associated with CSR node-certs-noncluster-host:
+```
+
+verify the following:
+
+* A corresponding host endpoint resource exists for the non-cluster host or VM.
+* The `spec.node` field in the host endpoint resource matches the non-cluster host name exactly.
+
+### Peer certificate does not have required CN
+
+If the non-cluster host fails to connect to the dedicated Typha deployment, check that the certificate Common Name (CN) values are consistent on both sides.
+
+On the non-cluster host or VM under the `/etc/calico/calico-node` folder:
+
+* In `calico-node.conf`, verify the `TyphaCN` value matches the remote Typha server certificate CN, or
+* In `calico-node.env`, verify the `FELIX_TYPHACN` value matches the remote Typha server certificate CN.
+
+On the cluster side (`calico-system/calico-typha-noncluster-host` deployment):
+
+* The `TYPHA_CLIENTCN` environment variable must match the CN used in the non-cluster node certificate.
+
+### Certificate is not renewed or updated
+
+The `calico-noncluster-host-init` process, which runs before the main `calico-node` service, is responsible for renewing certificates that are expired or near expiry. Certificates are renewed automatically within 90 days of expiry.
+
+If you need to force immediate renewal, manually delete the existing certificate (`calico-node.crt`) and private key (`calico-node.key`) under the `/etc/calico/calico-node` folder and restart the service.
diff --git a/sidebars-calico-enterprise.js b/sidebars-calico-enterprise.js
index 8d0106d7e1..003f589622 100644
--- a/sidebars-calico-enterprise.js
+++ b/sidebars-calico-enterprise.js
@@ -89,6 +89,7 @@ module.exports = {
         items: [
           'getting-started/bare-metal/about',
           'getting-started/bare-metal/typha-node-tls',
+          'getting-started/bare-metal/troubleshoot',
         ],
       },
       {

From 53a88b8147d9116132413f4129ed75c717a06c78 Mon Sep 17 00:00:00 2001
From: Jiawei Huang
Date: Wed, 29 Oct 2025 16:21:03 -0700
Subject: [PATCH 2/2] Address review comments and add explain CSR monitor

---
 .../bare-metal/troubleshoot.mdx | 29 +++++++++++++++++--
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx b/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx
index 231d128a98..322b6ac14c 100644
--- a/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx
+++ b/calico-enterprise/getting-started/bare-metal/troubleshoot.mdx
@@ -4,7 +4,7 @@ description: Troubleshoot non-cluster hosts and VMs setup
 
 # Troubleshoot non-cluster hosts and VMs setup
 
-This document provides guidance for troubleshooting non-cluster host setup for hosts and VMs.
+This document provides guidance for troubleshooting Calico running on hosts and VMs outside of a cluster.
 
 ## Useful commands
 
@@ -24,21 +24,44 @@ kubectl logs -n calico-system -l k8s-app=calico-typha-noncluster-host
 kubectl logs -n tigera-manager -l k8s-app=tigera-manager -c tigera-voltron
 ```
 
-Monitor CertificateSigningRequests (CSR):
+You can monitor CertificateSigningRequests (CSRs) by running:
 
 ```bash
 kubectl get certificatesigningrequest -w
 ```
 
+Monitoring CSRs is useful for debugging certificates used for Calico Node and Typha mutual TLS (mTLS) communication. The automatic CSR approval and signing flow can fail in several ways. For example:
+
+- The CSR request might not be created or submitted correctly.
+- The Tigera Operator CSR controller might not process it.
+- The Tigera Operator signer might reject the request due to invalid fields or missing permissions.
+
+When such failures occur, the CSR status object contains detailed condition and error messages that help identify the root cause.
+
 ## Common problems
 
 ### No internet connection after installing the Calico Node package
 
 By default, $[prodname] blocks all traffic to and from host interfaces. You can use a profile with host endpoints to modify default behavior. Apply the built-in profile `projectcalico-default-allow`, which allows all ingress and egress traffic. Host endpoints that use this profile will have *allow-all* behavior instead of *deny-all* when no network policy is applied.
 
+Example `HostEndpoint` with the `projectcalico-default-allow` profile:
+
+```yaml
+apiVersion: projectcalico.org/v3
+kind: HostEndpoint
+metadata:
+  name:
+spec:
+  interfaceName:
+  node:
+  expectedIPs: [""]
+  profiles:
+    - projectcalico-default-allow
+```
+
 ### Certificate signed by unknown authority
 
-If the certificate presented by the Kubernetes API server or Tigera Manager endpoint is not signed by a trusted Certificate Authority (CA), add the correct CA certificate to the system trust store. For the Calico fluent-bit log forwarding, you can temporarily disable TLS verification by setting:
+If the certificate presented by the Kubernetes API server or Tigera Manager endpoint is not signed by a trusted Certificate Authority (CA), add the correct CA certificate to the system trust store. Alternatively, for the Calico fluent-bit log forwarder, you can temporarily disable TLS verification by setting:
 
 ```conf
 [OUTPUT]