From e6f81ae7a0dbde26abf41d4574c19fea23a6f1b6 Mon Sep 17 00:00:00 2001
From: Matt Dupre
Date: Mon, 27 Apr 2026 10:53:00 -0700
Subject: [PATCH] Claude WIP

---
 .../networking/configuring/multi-vrf.mdx | 544 ++++++++++++++++++
 .../reference/resources/network.mdx      | 169 ++++++
 sidebars-calico-enterprise.js            |   2 +
 3 files changed, 715 insertions(+)
 create mode 100644 calico-enterprise/networking/configuring/multi-vrf.mdx
 create mode 100644 calico-enterprise/reference/resources/network.mdx

diff --git a/calico-enterprise/networking/configuring/multi-vrf.mdx b/calico-enterprise/networking/configuring/multi-vrf.mdx
new file mode 100644
index 0000000000..d12920f4e7
--- /dev/null
+++ b/calico-enterprise/networking/configuring/multi-vrf.mdx
@@ -0,0 +1,544 @@
---
description: Attach pods to one or more Linux VRFs to isolate routing between tenants and support overlapping external IPs.
---

# Configure multi-VRF networking

:::note

Multi-VRF support is a tech preview feature. Tech preview features may be
subject to significant changes before they become GA.

:::

## Big picture

Attach pods to one or more Linux Virtual Routing and Forwarding (VRF) domains
so that traffic from those pods is routed in a dedicated routing table, peered
with a dedicated upstream fabric, and isolated from pods on other VRFs (and
from the default flat pod network).

## Value

A VRF is a virtual routing and forwarding domain with its own routing table.
Attaching pods to VRFs lets you:

- **Reach overlapping external IPs.** The same external IP address can be reused
  in multiple tenant networks outside the cluster. Pods on a VRF can reach the
  copy of that IP in their own tenant network, and responses come back in the
  correct VRF.
- **Use VRFs as a routing security boundary.** Pods on different VRFs cannot
  reach each other inside the cluster (unless an external router bridges the
  two VRFs). Network policy still applies on top of this.
- **Map workloads onto an existing multi-tenant fabric.** Each VRF on each node
  is connected to one of your tenant networks (typically over a VLAN
  subinterface) and exchanges routes with that tenant's BGP fabric.

## Concepts

### Networks and the default flat network

A pod is attached to one or more **Networks** through one or more interfaces:

- The default flat pod network (the existing $[prodname] pod network) is
  always available; pods that don't opt in to a VRF use only this network.
- A `Network` of type `vrf` represents a single Linux VRF, with its own
  routing table on each node.
- A pod can be attached to the flat network, a VRF, or both, and to up to nine
  VRFs simultaneously through Multus secondary interfaces.

### Routing topology

Each VRF has a routing table on every node where it is configured. Felix:

- Creates a Linux VRF device on each node and enslaves the host interfaces
  listed in `hostConfig.hostInterfaces` (typically a VLAN subinterface).
- Programs `/32` routes for **local** pods on that VRF into the VRF's routing
  table.
- Programs static routes from `hostConfig.staticRoutes` (typically a default
  route to the upstream router).

Routes to remote pods (on other nodes) are distributed by **BGP**: each node
peers with its upstream router on the VRF, advertises its own pod `/32`s, and
imports the others. Pod-to-pod traffic on the same VRF crossing nodes therefore
leaves the source node on the VRF's host interface, transits the upstream
fabric, and returns to the destination node still on that VRF — it does **not**
use the default flat pod network.

The default routing table on each node still handles the flat pod network as
before.
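To make this concrete, the sketch below shows the kind of routing state you
might find for `vrf1` on a node, using this guide's example values (the node's
own `2.100.0.2` address, the pod addresses, and the interface names in the
output are hypothetical):

```bash
ip route show table 100
# default via 2.100.0.1 dev eth1.100                      <- from staticRoutes
# 2.100.0.0/24 dev eth1.100 proto kernel scope link src 2.100.0.2
# 10.244.100.10/32 dev cali1a2b3c4d5e6 scope link         <- local pod (Felix)
# 10.244.100.34/32 via 2.100.0.1 dev eth1.100 proto bird  <- remote pod (BGP)
```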
### VRF isolation

Pods on different VRFs are isolated inside the cluster:

- Their veths sit in different VRFs, so traffic from a VRF pod is looked up
  only in that VRF's routing table.
- Service traffic must terminate on a backend that is in the same VRF as the
  source pod (kube-proxy programs DNAT in a VRF-agnostic way; if the chosen
  backend is in a different VRF, the return path will not work).

If you need traffic to flow between two VRFs, route between them outside the
cluster.

### Required services on the default network

Several core Kubernetes services live on the default flat network — most
notably `kube-dns` and the Kubernetes API server. A pod attached **only** to
a VRF cannot reach those services unless an external router bridges the VRF to
the flat pod network. Most workloads should therefore have a primary interface
on the flat network and an additional Multus interface for the VRF, unless the
deployment can tolerate the loss of cluster DNS / API access.

For the same reason, kubelet (which sits on the flat network) cannot perform
HTTP or TCP liveness / readiness probes against a pod that has no flat-network
interface — use exec probes for those pods.

## Before you begin

**Required**

- $[prodname] installed with the **nftables dataplane** (`linuxDataplane: Nftables` in the [Installation](../../reference/installation/api.mdx) resource). The iptables and eBPF dataplanes are not supported in the tech preview.
- **kube-proxy must also be in nftables mode**. If kube-proxy is in iptables or ipvs mode, multi-VRF will not work correctly. ipvs mode is **not** supported.
- **Linux kernel 5.6 or later** on every node.
- The **`vrf` kernel module** loaded on every node. On Ubuntu, this is shipped in `linux-modules-extra-$(uname -r)`. Confirm with:

  ```bash
  sudo modprobe vrf && lsmod | grep '^vrf '
  ```

  :::tip

  On Ubuntu HWE kernels, install the HWE-tracking modules-extra package
  (for example `linux-modules-extra-generic-hwe-24.04`) so that the VRF module
  follows the kernel through HWE upgrades. Do **not** install the
  non-HWE `linux-modules-extra-generic` / `linux-image-generic` metapackages
  on an HWE kernel — they will pull in and pin the GA kernel.

  :::

- BGP peering configured between each node and the upstream router on each VRF (see [Create BGPPeers and BGPFilters](#create-bgppeers-and-bgpfilters)).
- [Multus](./multiple-networks.mdx) installed if you want to attach pods to a VRF using a secondary interface (the most common topology).

**Recommended**

- Pin `nodeAddressAutodetection` in the [Installation](../../reference/installation/api.mdx) to a specific interface (for example `eth0`) or to `kubernetes: NodeInternalIP`, as in the sketch after this list. The default "first found" autodetection can pick a VRF-attached interface when additional interfaces are brought up, breaking the cluster.
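A minimal sketch of the relevant Installation fields (the interface name
`eth0` is an assumption; use the interface that carries your node IPs):

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: Nftables
    nodeAddressAutodetectionV4:
      interface: eth0
```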
## How to

1. [Plan the VRF topology](#plan-the-vrf-topology)
1. [Bring the VRF interfaces onto the nodes](#bring-the-vrf-interfaces-onto-the-nodes)
1. [Create per-VRF IP pools](#create-per-vrf-ip-pools)
1. [Create the Network resources](#create-the-network-resources)
1. [Create BGPPeers and BGPFilters](#create-bgppeers-and-bgpfilters)
1. [Attach pods to a VRF](#attach-pods-to-a-vrf)
1. [Advertise services scoped to a VRF](#advertise-services-scoped-to-a-vrf)
1. [Verify](#verify)

The examples throughout this guide configure two VRFs (`vrf1` and `vrf2`),
each carried over its own VLAN subinterface and peered with its own upstream
router. The configuration shown matches the test topology shipped in
`hack/test/kind/vrf/` in the $[prodname] source tree.

### Plan the VRF topology

For each VRF, decide:

| Field            | Example value (`vrf1`) | Example value (`vrf2`) | Notes |
| ---------------- | ---------------------- | ---------------------- | ----- |
| Network name     | `vrf1`                 | `vrf2`                 | Name of the `Network` resource. Referenced from pod annotations and from `BGPPeer.network`. |
| Host interface   | `eth1.100`             | `eth2.200`             | The interface (typically a VLAN subinterface) that connects the node to the tenant fabric. Must already exist on the node. |
| Routing table    | `100`                  | `200`                  | Linux routing table number. Must be unique on each node and must not overlap with `RouteTableRanges` in [FelixConfiguration](../../reference/resources/felixconfig.mdx) or with tables used by other software. |
| Pod IP pool CIDR | `10.244.100.0/24`      | `10.244.200.0/24`      | Pod IPs must be unique across all VRFs **and** must not be used outside the cluster. |
| Upstream router  | `2.100.0.1`            | `2.200.0.1`            | Reachable on the host interface above. |
| Upstream AS      | `65001`                | `65002`                | Used for the eBGP session to that VRF's router. |

### Bring the VRF interfaces onto the nodes

The host interfaces listed in `hostConfig.hostInterfaces` must already exist
on each node before the VRF is created — $[prodname] does not create the
underlying VLAN subinterfaces or physical links.

For VLAN subinterfaces, configure them through your node provisioning tool
(`netplan`, `NetworkManager`, `systemd-networkd`, etc.), as in the sketch
below. When the `Network` is created, $[prodname] enslaves the interface into
the VRF, which moves the interface's IP addresses (and their local/connected
routes) into the VRF's routing table.
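For example, with `netplan` the two example subinterfaces might be defined as
follows (a sketch; the per-node addresses are hypothetical and the parent
interfaces must match your hardware):

```yaml
network:
  version: 2
  vlans:
    eth1.100:
      id: 100
      link: eth1
      addresses: [2.100.0.2/24]
    eth2.200:
      id: 200
      link: eth2
      addresses: [2.200.0.2/24]
```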
### Create per-VRF IP pools

IPAM is not VRF-aware in the tech preview, so the simplest way to give each
VRF its own pod address range is to create a dedicated `IPPool` for each VRF
and pin it to pods using the `cni.projectcalico.org/ipv4pools` annotation.

Use `nodeSelector: "!all()"` so that the pool is only used by pods that
explicitly request it.

```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: vrf1pool
spec:
  cidr: 10.244.100.0/24
  blockSize: 29
  nodeSelector: "!all()"
  ipipMode: Never
  vxlanMode: Never
  natOutgoing: false
  disabled: false
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: vrf2pool
spec:
  cidr: 10.244.200.0/24
  blockSize: 29
  nodeSelector: "!all()"
  ipipMode: Never
  vxlanMode: Never
  natOutgoing: false
  disabled: false
```

### Create the Network resources

Create a [`Network`](../../reference/resources/network.mdx) for each VRF. In a
homogeneous cluster, a single `hostConfig` entry with an empty `nodeSelector`
applies the same configuration to every node:

```yaml
apiVersion: projectcalico.org/v3
kind: Network
metadata:
  name: vrf1
spec:
  vrf:
    routing:
      inClusterMode: Local
    hostConfig:
      - nodeSelector: ""
        routeTableIndex: 100
        hostInterfaces:
          - name: "eth1.100"
        staticRoutes:
          - destination: 0.0.0.0/0
            action:
              nextHop: "2.100.0.1"
---
apiVersion: projectcalico.org/v3
kind: Network
metadata:
  name: vrf2
spec:
  vrf:
    routing:
      inClusterMode: Local
    hostConfig:
      - nodeSelector: ""
        routeTableIndex: 200
        hostInterfaces:
          - name: "eth2.200"
        staticRoutes:
          - destination: 0.0.0.0/0
            action:
              nextHop: "2.200.0.1"
```

For heterogeneous clusters (for example, different racks with different VLAN
IDs or interface names), you can list multiple `hostConfig` entries with
distinct `nodeSelector`s, as in the sketch below. Each node is matched against
the entries in order and the **first match wins** — entries should have
non-overlapping selectors.
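For example, a sketch in which nodes carrying a hypothetical `rack` label get
a rack-specific VLAN and next-hop, with a catch-all entry for everything else
(the label, VLAN 110, and next-hops are assumptions):

```yaml
apiVersion: projectcalico.org/v3
kind: Network
metadata:
  name: vrf1
spec:
  vrf:
    routing:
      inClusterMode: Local
    hostConfig:
      # Nodes in rack1 reach the vrf1 fabric over VLAN 100 on eth1.
      - nodeSelector: "rack == 'rack1'"
        routeTableIndex: 100
        hostInterfaces:
          - name: "eth1.100"
        staticRoutes:
          - destination: 0.0.0.0/0
            action:
              nextHop: "2.100.0.1"
      # All remaining nodes use VLAN 110. First match wins, so this
      # catch-all entry must come last.
      - nodeSelector: ""
        routeTableIndex: 100
        hostInterfaces:
          - name: "eth1.110"
        staticRoutes:
          - destination: 0.0.0.0/0
            action:
              nextHop: "2.110.0.1"
```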
#### Setting per-VRF static routes

`staticRoutes` are programmed into the VRF's routing table in addition to:

- The local/connected routes derived from the IP addresses on `hostInterfaces`
  (added automatically by the kernel when the interface is enslaved).
- The pod `/32`s that Felix manages.
- Any routes learned over BGP into this VRF.

The most common static route is a default route to the upstream router so
that pods on the VRF can reach external destinations. The next-hop must be
reachable on the subnet of one of the VRF host interfaces on the node.

### Create BGPPeers and BGPFilters

Create one [`BGPPeer`](../../reference/resources/bgppeer.mdx) per upstream
router per VRF, setting the `network` field to the matching `Network` name.
This makes BIRD program the routes received from that peer into the VRF's
routing table (instead of the main table).

Use [`BGPFilter`](../../reference/resources/bgpfilter.mdx) to scope what is
exported to and imported from each VRF's peers. The simplest pattern accepts
the per-VRF pod CIDR and the per-VRF service CIDRs and rejects everything
else, which keeps each VRF's prefixes from leaking into the other.

```yaml
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: vrf1-routes
spec:
  exportV4:
    - cidr: 10.244.100.0/24 # VRF1 pod CIDR
      matchOperator: In
      action: Accept
    - cidr: 10.96.100.0/24 # VRF1 service cluster IPs (see below)
      matchOperator: In
      action: Accept
    - action: Reject
  importV4:
    - cidr: 10.244.100.0/24
      matchOperator: In
      action: Accept
    - action: Reject
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: ext-router-1
spec:
  peerIP: 2.100.0.1
  asNumber: 65001
  network: vrf1
  sourceAddress: None # let BIRD pick the source from the VRF table
  filters:
    - vrf1-routes
```

:::note

The `sourceAddress: None` setting prevents BIRD from forcing a node-IP source
address that is not on the VRF interface; the kernel picks the correct source
from the VRF routing table.

:::

Repeat for each VRF.
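To check that each per-VRF BGP session actually comes up, you can ask
$[prodname] to publish BGP status for a node with a `CalicoNodeStatus`
resource (a sketch; `node1` is a placeholder for one of your node names):

```yaml
apiVersion: projectcalico.org/v3
kind: CalicoNodeStatus
metadata:
  name: node1-status
spec:
  node: node1
  classes:
    - BGP
  updatePeriodSeconds: 10
```

Then `kubectl get caliconodestatus node1-status -o yaml` lists each BGP
session; the per-VRF peers should be in the `Established` state.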
### Attach pods to a VRF

There are two ways to attach a pod to a `Network`:

#### Option 1: Primary interface on the VRF

Use this when the pod only needs the VRF (and the pod can tolerate the loss of
cluster DNS / API access, or you've bridged those services in via an external
router).

Set the `projectcalico.org/networks` annotation on the pod (or on its
namespace, with pod-level annotations winning over namespace annotations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-on-vrf1
  annotations:
    projectcalico.org/networks: "vrf1"
    cni.projectcalico.org/ipv4pools: '["vrf1pool"]'
spec:
  containers:
    - name: app
      image: my-app:latest
```

#### Option 2: Secondary interface via Multus

Use this when the pod needs both the flat network (for `kube-dns`, the
Kubernetes API, etc.) and a VRF.

Create a Multus `NetworkAttachmentDefinition` whose CNI configuration sets
`"network": "<network name>"` and pins the IP pool to the VRF's pool:

```yaml
apiVersion: 'k8s.cni.cncf.io/v1'
kind: NetworkAttachmentDefinition
metadata:
  name: vrf1-secondary
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "calico",
    "network": "vrf1",
    "log_level": "info",
    "datastore_type": "kubernetes",
    "nodename_file_optional": false,
    "ipam": {
      "type": "calico-ipam",
      "assign_ipv4": "true",
      "assign_ipv6": "false",
      "ipv4_pools": ["vrf1pool"]
    },
    "policy": {
      "type": "k8s"
    },
    "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
  }'
```

Reference the NAD from the pod's `k8s.v1.cni.cncf.io/networks` annotation —
the pod will get its primary interface on the flat pod network and a
secondary interface on the VRF:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-vrf1
  annotations:
    k8s.v1.cni.cncf.io/networks: vrf1-secondary@vrf1eth
spec:
  containers:
    - name: app
      image: my-app:latest
```

The `@vrf1eth` suffix names the interface inside the pod (defaults to `net1`
if omitted).

:::note

The set of networks attached to a pod is **immutable**. To change them, the pod
must be deleted and recreated (which happens automatically when you edit a
Deployment / DaemonSet / StatefulSet, but not for standalone pods). A `Network`
must not be deleted while pods are still attached to it.

:::

#### Routing inside multi-interface pods

For pods with both a flat-network interface and a VRF interface, $[prodname]
programs source-based `ip` rules so that responses always go out the same
interface they came in on.

Outbound connections that the application doesn't bind to a specific interface
or source IP follow the pod's default routing table — by default this means
the flat-network interface. To direct specific outbound destinations down the
VRF interface without changing the application, add a `routes` section to the
secondary interface's IPAM block:

```json
"ipam": {
  "type": "calico-ipam",
  "assign_ipv4": "true",
  "ipv4_pools": ["vrf1pool"],
  "routes": [
    {"dst": "10.0.0.0/8"}
  ]
}
```

With this configuration, traffic to `10.0.0.0/8` from the pod uses the
secondary VRF interface, while everything else continues out the primary
flat-network interface.
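With the Option 2 pod running, a quick way to confirm the split is
`ip route get` from inside the pod (a sketch; the two destination addresses
and the `src` values in the expected output are arbitrary examples):

```bash
# A destination inside 10.0.0.0/8 should resolve via the VRF interface.
kubectl exec app-with-vrf1 -- ip route get 10.1.2.3
# e.g. 10.1.2.3 dev vrf1eth src 10.244.100.12 ...

# Anything else should resolve via the primary flat-network interface
# (169.254.1.1 is the link-local gateway Calico programs into pods).
kubectl exec app-with-vrf1 -- ip route get 1.1.1.1
# e.g. 1.1.1.1 via 169.254.1.1 dev eth0 src 10.244.0.7 ...
```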
### Advertise services scoped to a VRF

Kubernetes services are not VRF-aware. To make a service usable on a VRF:

- **Pick backends in a single VRF.** The pods selected by the service should
  all be on the same VRF. If you mix VRFs, kube-proxy may DNAT to a backend
  that the source cannot reach.
- **Advertise the service CIDR only to that VRF's peers.** Add the service
  CIDR to [BGPConfiguration](../../reference/resources/bgpconfig.mdx)'s
  `serviceClusterIPs` (and/or `serviceExternalIPs` /
  `serviceLoadBalancerIPs`), and add it to that VRF's `BGPFilter` so it is
  only exported to that VRF's peer:

  ```yaml
  apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    serviceClusterIPs:
      - cidr: 10.96.100.0/24 # VRF1 service IPs
      - cidr: 10.96.200.0/24 # VRF2 service IPs
  ```

- **Pin each service's cluster IP into the matching CIDR.** Either reserve
  cluster IPs from the per-VRF service CIDR for each VRF service, or use
  `LoadBalancer` services with explicit `loadBalancerIP` values from the
  per-VRF CIDR.

For services backed by pods reached via Multus secondary interfaces, you will
typically also need [multus-service](https://github.com/k8snetworkplumbingwg/multus-service)
or an equivalent controller so that the service `Endpoints` use the secondary
(VRF) IP rather than the primary (flat) IP.

NodePort services are not supported in the tech preview; use `LoadBalancer`
services advertised over BGP instead.

### Verify

Once the resources have been applied and at least one pod is attached to a VRF,
you can spot-check the dataplane on a node:

```bash
# 1. The VRF device exists and the configured host interface is enslaved.
ip -d link show eth1.100 | grep 'master calinv'

# 2. The VRF routing table contains the default route to the upstream router.
ip route show table 100
# Expect a "default via 2.100.0.1 ..." line, plus /32s for local pods on vrf1.

# 3. List configured VRFs.
ip vrf show
```

Inside a VRF pod:

```bash
ip route # the pod's own table; default route via the VRF interface
ip rule # source-based rules pinning each interface's source IP to its table
```

You can find the VRF associated with a $[prodname] `Network` by matching
`routeTableIndex` from the `Network` spec to the table number used in
`ip route show table N` on the node. The VRF device on the node is named
with a `calinv` prefix (for example `calinv5kv...`); $[prodname] generates
the suffix from the network name, and `ip vrf show` lists each VRF device
alongside its table number.
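One way to list every VRF device together with its table number in a single
step:

```bash
# The "vrf table <N>" detail line under each calinv* device gives the table
# number to match against routeTableIndex in the corresponding Network spec.
ip -d link show type vrf
```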
## Limitations

The current tech preview has the following limitations:

- **Dataplane**: only the [nftables dataplane](../../operations/nftables.mdx) is supported. iptables and eBPF are not supported.
- **kube-proxy**: must be in `nftables` mode. `ipvs` mode is **not** supported in any cluster that uses VRFs.
- **NodePort services** are not supported on VRF networks. Use `LoadBalancer` services advertised over BGP instead.
- **Egress gateways** cannot be placed on a VRF network. Use [external networks](../egress/external-network.mdx) for that use case.
- **ExternalNetworks** and `Network` resources cannot be used in the same cluster.
- **Host endpoints** should not be applied to interfaces inside a VRF.
- **Pods can be attached to at most 9 VRFs**, the Multus secondary-interface limit.
- **Pod IPs must be unique** across all VRFs **and** must not be used outside the cluster in any VRF.
- **Node IPs** (including those on VRF subinterfaces) must be unique across all VRFs and nodes.
- **Networks attached to a pod are immutable** — change requires pod deletion and recreation.
- **Networks must not be deleted** while any pod is still attached.
- **Pods without a flat-network interface** cannot be reached by `kubelet` for HTTP/TCP liveness or readiness probes — use exec probes for those pods.
- **IPv6** has not been verified in the tech preview.
- IPAM is not VRF-aware. Use a dedicated `IPPool` per VRF and pin pods to it via `cni.projectcalico.org/ipv4pools`.

## Troubleshooting

| Symptom | Likely cause | What to check |
| ------- | ------------ | ------------- |
| Pod stuck `ContainerCreating`, CNI ADD failing for a VRF pod. | The `Network` exists, but Felix has not yet created the VRF device on the node. The CNI plugin waits up to ~30s before failing. | `kubectl describe network <name>`, then `ip vrf show` and `ip -d link show calinv*` on the node. Check Felix logs for VRF errors (see the example below). |
| Pod cannot reach the upstream router. | The VRF host interface has no IP, or the static route's next-hop is not on its subnet. | On the node: `ip addr show eth1.100`, `ip route show table 100`. Confirm the next-hop in `staticRoutes` is on the host interface's subnet. |
| Cross-node pod-to-pod within the same VRF fails. | BGP is not established with the upstream router, or `BGPFilter` is blocking the per-VRF pod CIDR. | `kubectl get caliconodestatus` for the BGP session state; check that the BGP filters' `exportV4` / `importV4` rules cover the pod CIDR. |
| Service cluster IP works on one VRF but not the other. | Service CIDR is missing from `BGPConfiguration.serviceClusterIPs` or from the per-VRF `BGPFilter` `exportV4`, so the route isn't advertised. | Inspect both, then look for the `/32` on the upstream router. |
| `kube-dns` lookups fail from a VRF pod. | Pod has no flat-network interface. VRF-only pods can't reach in-cluster DNS unless an external router bridges the VRF to the flat network. | Use a Multus secondary interface for the VRF and keep the primary interface on the flat network. Use exec probes if needed. |
| `vrf` module not loaded; Felix logs complain about VRF setup. | The kernel `vrf` module is not loaded on the node. | `sudo modprobe vrf && lsmod \| grep '^vrf '`. Install `linux-modules-extra-$(uname -r)` if missing. |
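To pull VRF-related Felix messages out of the node logs (a sketch for a
typical operator-managed install; adjust the namespace and label if your
cluster differs):

```bash
kubectl logs -n calico-system -l k8s-app=calico-node -c calico-node \
  --tail=500 | grep -i vrf
```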
## Additional resources

- [`Network` resource reference](../../reference/resources/network.mdx)
- [`BGPPeer` resource reference](../../reference/resources/bgppeer.mdx)
- [`BGPFilter` resource reference](../../reference/resources/bgpfilter.mdx)
- [`IPPool` resource reference](../../reference/resources/ippool.mdx)
- [Configure multiple Calico Enterprise networks on a pod](./multiple-networks.mdx) (Multus setup)
- [Install Calico Enterprise using the nftables data plane](../../operations/nftables.mdx)

diff --git a/calico-enterprise/reference/resources/network.mdx b/calico-enterprise/reference/resources/network.mdx
new file mode 100644
index 0000000000..d4f3b9ec4c
--- /dev/null
+++ b/calico-enterprise/reference/resources/network.mdx
@@ -0,0 +1,169 @@
---
description: API for this Calico Enterprise resource.
---

# Network

:::note

The `Network` resource is a tech preview feature. Tech preview features may be subject to significant changes before they become GA.

:::

A `Network` resource represents a logical network within a $[prodname] cluster. Each
`Network` has a type (currently `vrf`) that determines how pods on that network are
isolated and how their traffic is routed.

A `Network` of type `vrf` configures a Linux Virtual Routing and Forwarding (VRF)
domain. $[prodname] creates a Linux VRF device on each selected node, moves the
configured host interfaces into the VRF, and programs pod routes for workloads
attached to the network into the VRF's routing table. Pods on a VRF network are
isolated from pods on other networks (including the default flat pod network)
unless they are explicitly bridged outside the cluster.

For an end-to-end how-to, see [Configure multi-VRF networking](../../networking/configuring/multi-vrf.mdx).

For `kubectl` [commands](https://kubernetes.io/docs/reference/kubectl/overview/),
the following case-insensitive aliases may be used to specify the resource type
on the CLI: `network.projectcalico.org`, `networks.projectcalico.org` and
abbreviations such as `network.p` and `networks.p`.

## Sample YAML

```yaml
apiVersion: projectcalico.org/v3
kind: Network
metadata:
  name: vrf1
spec:
  vrf:
    routing:
      inClusterMode: Local
    hostConfig:
      - nodeSelector: ""
        routeTableIndex: 100
        hostInterfaces:
          - name: "eth1.100"
        staticRoutes:
          - destination: 0.0.0.0/0
            action:
              nextHop: "10.100.0.1"
```

## Definition

### Metadata

| Field | Description | Accepted Values | Schema |
| ----- | ----------- | --------------- | ------ |
| name  | Unique name to describe this resource instance. Must be specified. | Alphanumeric string with optional `.`, `_`, or `-`. | string |

### Spec

Exactly one of the network-type fields must be set. Currently only `vrf` is supported.

| Field | Description | Schema |
| ----- | ----------- | ------ |
| vrf   | VRF network configuration. | [VRFNetworkSpec](#vrfnetworkspec) |

### VRFNetworkSpec

| Field | Description | Schema | Default |
| ----- | ----------- | ------ | ------- |
| routing | Cluster-wide routing behaviour for this VRF network. | [VRFRouting](#vrfrouting) | |
| hostConfig | Per-node configuration for this VRF network. At least one entry is required and at most 100 entries are allowed. When multiple entries are present (for example, one per rack), each node is matched against the entries in order and the **first matching entry wins** — all others are ignored for that node. | List of [VRFHostConfig](#vrfhostconfig) | |

### VRFRouting

| Field | Description | Accepted Values | Schema | Default |
| ----- | ----------- | --------------- | ------ | ------- |
| inClusterMode | Controls how Felix programs routes to pods on remote nodes inside the VRF routing table. **`Local`**: program routes only to VRF pods that are local to this node; routes to pods on other nodes must be distributed via BGP (a node-to-node mesh is **not** created for VRF networks). | `Local` | string | `Local` |

### VRFHostConfig

| Field | Description | Accepted Values | Schema | Default |
| ----- | ----------- | --------------- | ------ | ------- |
| nodeSelector | $[prodname] selector expression that determines which nodes this configuration applies to. If omitted (or empty), the entry applies to all nodes. When multiple entries are present, the first entry whose selector matches a given node is applied and all others are ignored. | | [selector](bgppeer.mdx#selector) | `""` |
| routeTableIndex | Linux kernel routing table number to use for this VRF on the selected nodes. **Must be unique** across all VRF networks on a node, must not overlap with the `RouteTableRanges` field in [FelixConfiguration](felixconfig.mdx) (see the check below), and must not collide with tables used by other software on the node. Tables 253 (default), 254 (main), and 255 (local) are reserved by the kernel. **A conflict can result in network outages.** | 1 – 2147483647 | int | |
| hostInterfaces | Interfaces on the node to attach to this VRF. The IP address(es) on the interface (and their local/connected routes) move into the VRF routing table when the interface is enslaved. At least one interface should be specified for pods in the VRF to communicate outside the node. | List of [InterfaceMatch](#interfacematch) entries. | list | |
| staticRoutes | Additional routes programmed into the VRF routing table, beyond the pod routes that Felix manages automatically and the routes derived from the VRF interface addresses. Typically used to add a default route via the upstream router. | | List of [VRFStaticRoute](#vrfstaticroute) | |
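To check which table numbers Felix already claims for itself before picking a
`routeTableIndex` (a sketch; output shape depends on your configuration):

```bash
kubectl get felixconfiguration default -o jsonpath='{.spec.routeTableRanges}'
# Example output: [{"max":250,"min":1}]. Choose routeTableIndex values outside
# any range printed here; empty output means Felix uses its default range.
```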
### InterfaceMatch

Identifies a network interface. Exactly one match criterion must be set.

| Field | Description | Accepted Values | Schema |
| ----- | ----------- | --------------- | ------ |
| name  | Match a network interface by its exact device name (for example, `bond0`, `eth1`, `ens192`, `eth1.100`). | 1 – 15 characters | string |

### VRFStaticRoute

| Field | Description | Accepted Values | Schema |
| ----- | ----------- | --------------- | ------ |
| destination | CIDR prefix for this route. Use `0.0.0.0/0` or `::/0` for a default route. | A valid CIDR | string |
| action | Forwarding behaviour for traffic matching this route. Exactly one action field must be set. | | [StaticRouteAction](#staticrouteaction) |

### StaticRouteAction

Exactly one field must be set.

| Field | Description | Schema |
| ----- | ----------- | ------ |
| nextHop | Forward matching traffic to the specified gateway IP. The address must be reachable on the subnet of one of the VRF host interfaces on the node. | string |
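For example, a `staticRoutes` list that adds a route for a hypothetical
shared-services range alongside the default route (both next-hops must be on
the subnet of a VRF host interface; the `172.16.0.0/12` range and the second
gateway are illustrative):

```yaml
staticRoutes:
  - destination: 0.0.0.0/0
    action:
      nextHop: "10.100.0.1"
  - destination: 172.16.0.0/12
    action:
      nextHop: "10.100.0.254"
```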
### Status

`Network.status.conditions` reports the observed state of the resource as a
list of standard Kubernetes [conditions](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/condition/).

## Attaching pods to a Network

Pods are attached to a `Network` either through their primary CNI interface or
through Multus secondary interfaces. See [Configure multi-VRF networking](../../networking/configuring/multi-vrf.mdx)
for the full configuration.

**Primary interface** — set an annotation on the pod (or its namespace) referencing the `Network` by name:

```yaml
metadata:
  annotations:
    projectcalico.org/networks: "vrf1"
```

**Secondary interface (Multus)** — create a `NetworkAttachmentDefinition` whose
CNI configuration sets `"network": "<network name>"`, and reference the NAD from
the pod's `k8s.v1.cni.cncf.io/networks` annotation.

The networks attached to a pod are **immutable**. To change them, the pod must
be deleted and recreated.

## BGP peering for VRFs

To distribute pod and service routes between nodes inside a VRF — and between
the cluster and the upstream fabric — create a [BGPPeer](bgppeer.mdx) for each
upstream router and set its `network` field to the name of the corresponding
`Network`. Routes received from that peer are programmed into the VRF's routing
table (instead of the main table). Use [BGPFilter](bgpfilter.mdx) to constrain
which prefixes are exported to and imported from each VRF's peers.
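For example, a peer for `vrf1`'s upstream router, matching the sample above
(the peer IP, AS number, and the `vrf1-routes` filter name are illustrative):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: vrf1-upstream
spec:
  peerIP: 10.100.0.1
  asNumber: 65001
  network: vrf1
  sourceAddress: None # let the kernel pick the source from the VRF table
  filters:
    - vrf1-routes
```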
## Limitations (tech preview)

- **Dataplane**: only the [nftables dataplane](../../operations/nftables.mdx) is supported. iptables and eBPF are not supported.
- **kube-proxy**: must be in `nftables` mode. `ipvs` mode is **not** supported.
- **NodePort services** are not supported on VRF networks; advertise services as `LoadBalancer` cluster IPs instead. If you do use NodePorts on a cluster that uses VRFs, set kube-proxy's `nodePortAddresses` to a CIDR that covers the relevant interface IPs.
- **Egress gateways** cannot be placed on a VRF network.
- **ExternalNetworks** and `Network` resources cannot be used in the same cluster.
- **Host endpoints** should not be applied to interfaces inside a VRF.
- **IPv6** has not been verified in the tech preview.
- A pod can be attached to at most **9** VRFs (the Multus secondary-interface limit).
- Pod IPs must be unique across all VRFs and must not be used outside the cluster in any VRF.
- Node IPs (including those on VRF subinterfaces) must be unique across all VRFs and nodes.
- Networks attached to a pod cannot be changed without deleting and recreating the pod.
- A `Network` must not be deleted while pods are still attached to it.

## Requirements

- Linux kernel **5.6 or later** (for the `meta sdifname` nftables match used by VRF policy dispatch).
- The `vrf` kernel module must be loaded on every node. On Ubuntu this is part of `linux-modules-extra-$(uname -r)`. Confirm with `sudo modprobe vrf && lsmod | grep '^vrf '`.
- $[prodname] must be installed with `linuxDataplane: Nftables` and kube-proxy must also be in nftables mode.
- The cluster's [Installation](../installation/api.mdx) should pin `nodeAddressAutodetection` to a specific interface or to `kubernetes: NodeInternalIP`. The default "first found" autodetection can pick a VRF-attached interface and break the cluster when extra interfaces are added.

diff --git a/sidebars-calico-enterprise.js b/sidebars-calico-enterprise.js
index 59e1353926..aaca0ba572 100644
--- a/sidebars-calico-enterprise.js
+++ b/sidebars-calico-enterprise.js
@@ -175,6 +175,7 @@ module.exports = {
         'networking/configuring/bgp-to-workload',
         'networking/configuring/dual-tor',
         'networking/configuring/multiple-networks',
+        'networking/configuring/multi-vrf',
         'networking/configuring/vxlan-ipip',
         'networking/configuring/advertise-service-ips',
         'networking/configuring/mtu',
@@ -777,6 +778,7 @@ module.exports = {
         'reference/resources/licensekey',
         'reference/resources/kubecontrollersconfig',
         'reference/resources/managedcluster',
+        'reference/resources/network',
         'reference/resources/networkpolicy',
         'reference/resources/networkset',
         'reference/resources/node',