Expected Behavior
When a node is (re)provisioned, the CNI plugin should use the current node's identity for all IPAM allocations and WEP creation. Even if /var/lib/calico/nodename contains a stale value from a previous node, the system should not:
- Book IPAM allocations under the wrong node name
- Garbage-collect active IP allocations that are still bound to running Pods
- Re-assign an IP that is actively in use, resulting in duplicate Pod IPs
Current Behavior
When a node boots with a stale /var/lib/calico/nodename left over from a previous node identity (e.g., node reimaged from a VM template), the following chain of events occurs:
- install-cni init container completes and makes the CNI plugin available
- kubelet immediately triggers CmdAddK8s for pending DaemonSet Pods
- The CNI plugin reads the stale nodename from /var/lib/calico/nodename (via DetermineNodename()) and uses it for:
  - WorkloadEndpoint.Spec.Node
  - IPAM allocation attrs["node"]
  - IPAM AutoAssignArgs.Hostname
- calico-node starts after the first CNI ADD calls and overwrites the nodename file with the correct value — subsequent Pods get the right identity
- ~15 minutes later, calico-kube-controllers runs allocationIsValid() and compares Pod.Spec.NodeName (correct, e.g., 10-199-0-105) against allocation.attrs.node (stale, e.g., 10-199-0-21)
- The controller concludes "Pod rescheduled on new node. Allocation no longer valid" and GCs the allocation
- The IP is returned to the pool while the original Pod still uses it on its network interface
- A new Pod on another node gets assigned the same IP → duplicate Pod IP
Evidence from CNI log on node 10-199-0-105 — first Pod booked under stale nodename:
2026-03-23 10:42:13.497 [INFO] k8s.go 77: Extracted identifiers for CmdAddK8s
ContainerID="fc3a04..." Pod="csi-node-driver-rlfpx"
WorkloadEndpoint="10--199--0--21-k8s-csi--node--driver--rlfpx-eth0"
2026-03-23 10:42:13.531 [INFO] ipam_plugin.go 270: Auto assigning IP
Attrs:{"node":"10-199-0-21", "pod":"csi-node-driver-rlfpx", ...}
Hostname:"10-199-0-21"
2026-03-23 10:42:13.732 [INFO] ipam.go 1216: Successfully claimed IPs: [10.200.129.198/26]
Six seconds later, the same node uses the correct identity for the next Pod:
2026-03-23 10:42:19.137 [INFO] k8s.go 77: Extracted identifiers for CmdAddK8s
ContainerID="97062860..." Pod="node-problem-detector-774b4"
WorkloadEndpoint="10--199--0--105-k8s-node--problem--detector--774b4-eth0"
Attrs:{"node":"10-199-0-105", ...}
Hostname:"10-199-0-105"
Controller log showing incorrect GC of the active allocation:
Pod rescheduled on new node. Allocation no longer valid old=10-199-0-21 new=10-199-0-105
Candidate IP leak ip=10.200.129.198
Confirmed IP leak after 15m0s ip=10.200.129.198
Garbage collecting leaked IP address ip=10.200.129.198
Resulting duplicate IP — two Pods on different nodes holding the same IP:
$ calicoctl get wep -A -o wide | grep '10.200.129.198'
calico-system 10--199--1--92-k8s-csi--node--driver--6bhc7-eth0 10-199-1-92 10.200.129.198/32
calico-system 10--199--0--105-k8s-csi--node--driver--rlfpx-eth0 10-199-0-105 10.200.129.198/32
This was reproduced on multiple nodes (10-199-0-105, 10-199-1-92) in the same cluster, all with the same stale nodename 10-199-0-21.
Possible Solution
There are two contributing issues that could each be addressed:
1. CNI plugin: DetermineNodename() trusts stale file without validation
In cni-plugin/internal/pkg/utils/utils.go, DetermineNodename() reads /var/lib/calico/nodename and trusts its content unconditionally:
func DetermineNodename(conf types.NetConf) (nodename string) {
	if conf.Nodename != "" {
		nodename = conf.Nodename
	} else if nff := nodenameFromFile(conf.NodenameFile); nff != "" {
		nodename = nff // ← reads stale file without validation
	} else if conf.Hostname != "" {
		nodename = conf.Hostname
	} else {
		nodename, _ = names.Hostname()
	}
	return
}
Suggested fix: Cross-validate the nodename file content against KUBERNETES_NODE_NAME (available from CNI args / Pod downward API environment). If they differ, prefer KUBERNETES_NODE_NAME or return an error. Alternatively, ensure calico-node writes the nodename file before install-cni signals CNI readiness.
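As a rough illustration of the first option, the sketch below cross-checks the file content against an environment variable before trusting it. It assumes KUBERNETES_NODE_NAME is actually visible to the plugin process (in practice it may arrive via the CNI config rather than the environment, as noted above), keeps the surrounding package context of utils.go (types, names, nodenameFromFile), and would need an os import; names and structure are illustrative only, not the current implementation:

// Hypothetical variant of DetermineNodename: trust the nodename file only
// when it agrees with the identity Kubernetes reports for this node.
func determineNodenameValidated(conf types.NetConf) (string, error) {
	if conf.Nodename != "" {
		return conf.Nodename, nil
	}
	fromEnv := os.Getenv("KUBERNETES_NODE_NAME")
	if fromFile := nodenameFromFile(conf.NodenameFile); fromFile != "" {
		if fromEnv != "" && fromEnv != fromFile {
			// The file disagrees with the Kubernetes-provided node name,
			// e.g. a stale file left behind by a VM template; prefer the
			// Kubernetes-provided name instead of booking under the old one.
			return fromEnv, nil
		}
		return fromFile, nil
	}
	if conf.Hostname != "" {
		return conf.Hostname, nil
	}
	return names.Hostname()
}

The same comparison could instead return an error so the ADD fails fast, which is the stricter alternative mentioned above.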
2. kube-controllers: allocationIsValid() treats node mismatch as definitive evidence of rescheduling
In kube-controllers/pkg/controllers/node/ipam.go:
// TODO: Do we need this check?
if p.Spec.NodeName != "" && a.knode != "" && p.Spec.NodeName != a.knode {
	logc.WithFields(fields).Info("Pod rescheduled on new node. Allocation no longer valid")
	return false
}
Note the existing // TODO: Do we need this check? comment.
This check assumes that a node mismatch means the Pod was rescheduled. But in this scenario, the Pod never moved — the allocation was simply recorded under the wrong node by CNI. The Pod is Running, its status.podIP matches the allocation, and it is actively using the IP.
Suggested fix: Before concluding the allocation is invalid, additionally verify whether Pod.Status.PodIP matches the allocated IP. If the Pod is Running on the "new" node with the exact same IP, the allocation is likely a bookkeeping error rather than a genuine reschedule — it should not be GC'd.
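A sketch of how that extra verification could slot into the check above. It assumes the allocation record exposes the allocated IP (written here as a.ip) and that p is the corev1 Pod the controller already fetched; those names and the helper are illustrative, not the existing ipam.go API:

// Sketch only: do not treat a node-name mismatch as a reschedule while the
// Running Pod still owns this exact IP.
if p.Spec.NodeName != "" && a.knode != "" && p.Spec.NodeName != a.knode {
	if p.Status.Phase == v1.PodRunning && podStillOwnsIP(p, a.ip) {
		// The Pod never moved; the allocation was merely booked under a
		// stale node name by CNI. Keep it instead of garbage collecting.
		logc.WithFields(fields).Info("Node name mismatch but Pod still owns the allocated IP; keeping allocation")
		return true
	}
	logc.WithFields(fields).Info("Pod rescheduled on new node. Allocation no longer valid")
	return false
}

// podStillOwnsIP reports whether the Pod's status lists the given IP.
func podStillOwnsIP(p *v1.Pod, ip string) bool {
	if p.Status.PodIP == ip {
		return true
	}
	for _, pip := range p.Status.PodIPs {
		if pip.IP == ip {
			return true
		}
	}
	return false
}

Keeping the allocation only when the IP is demonstrably still in use means genuine reschedules (where the Pod comes back with a different IP) would still be released as before.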
Steps to Reproduce (for bugs)
- Set up a Calico cluster using KDD (Kubernetes datastore)
- Provision a node from a VM image/template that retains /var/lib/calico/nodename from a different node (e.g., node B has nodename file containing node A's name); the stale file can also be seeded by hand, as in the sketch after this list
- Start the node — kubelet will schedule DaemonSet Pods immediately
- Observe the startup ordering:
  - install-cni completes → CNI becomes available
  - First CmdAddK8s calls use stale nodename from the file (within ~1-3 seconds)
  - calico-node starts and corrects the nodename file (~3-6 seconds after install-cni)
  - Subsequent CmdAddK8s calls use the correct nodename
- Wait ~15 minutes (default leakGracePeriod)
- calico-kube-controllers logs Garbage collecting leaked IP address for the affected IPs
- New Pods scheduled elsewhere may now receive the same IP → duplicate Pod IP
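For the provisioning step, a trivial sketch for seeding the stale-file condition on a test node before kubelet starts (the path and node name are the ones from the logs above; an equivalent shell one-liner works just as well):

package main

import "os"

// Seed the stale identity: write a previous node's name into the nodename
// file, simulating a VM template that kept /var/lib/calico/ from 10-199-0-21.
func main() {
	if err := os.WriteFile("/var/lib/calico/nodename", []byte("10-199-0-21"), 0o644); err != nil {
		panic(err)
	}
}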
Context
We operate a large Kubernetes cluster and frequently batch-add ~30 nodes at a time. Nodes are provisioned from VM templates that may retain /var/lib/calico/ data from a previous node identity. After each batch expansion, we consistently observe duplicate Pod IPs caused by this race condition.
The impact is severe:
- Silent traffic misrouting — two Pods on different nodes hold the same IP, causing unpredictable network behavior
- No error surfaced — the duplicate is only discovered through manual inspection or when applications fail
- Scales with cluster growth — the more nodes provisioned in parallel, the more Pods are affected
Current workarounds:
- Deleting /var/lib/calico/nodename before node joins (requires changes to provisioning pipeline)
- Increasing leakGracePeriod (reduces probability but does not eliminate the root cause)
- Manually deleting affected Pods after detection
Your Environment
- Calico version: v3.28.1
- Calico dataplane: iptables
- Orchestrator version: Kubernetes v1.30.5
- Operating System and version: Ubuntu 22.04 LTS (kernel 5.15.0-94-generic)
- Container runtime: containerd 1.7.23
- IPAM config: 2 workload IPPools (10.196.0.0/15, 10.195.128.0/17), blockSize 26, ipipMode Always, strictAffinity false