Stale /var/lib/calico/nodename causes duplicate Pod IPs via incorrect IPAM bookkeeping and subsequent leak GC #12257

@seriousgong

Description

Expected Behavior

When a node is (re)provisioned, the CNI plugin should use the current node's identity for all IPAM allocations and WEP creation. Even if /var/lib/calico/nodename contains a stale value from a previous node, the system should not:

  1. Book IPAM allocations under the wrong node name
  2. Garbage-collect active IP allocations that are still bound to running Pods
  3. Re-assign an IP that is actively in use, resulting in duplicate Pod IPs

Current Behavior

When a node boots with a stale /var/lib/calico/nodename left over from a previous node identity (e.g., node reimaged from a VM template), the following chain of events occurs:

  1. install-cni init container completes and makes the CNI plugin available
  2. kubelet immediately triggers CmdAddK8s for pending DaemonSet Pods
  3. The CNI plugin reads the stale nodename from /var/lib/calico/nodename (via DetermineNodename()) and uses it for:
    • WorkloadEndpoint.Spec.Node
    • IPAM allocation attrs["node"]
    • IPAM AutoAssignArgs.Hostname
  4. calico-node starts after the first CNI ADD calls and overwrites the nodename file with the correct value — subsequent Pods get the right identity
  5. ~15 minutes later, calico-kube-controllers runs allocationIsValid() and compares Pod.Spec.NodeName (correct, e.g., 10-199-0-105) against allocation.attrs.node (stale, e.g., 10-199-0-21)
  6. The controller concludes "Pod rescheduled on new node. Allocation no longer valid" and GCs the allocation
  7. The IP is returned to the pool while the original Pod still uses it on its network interface
  8. A new Pod on another node gets assigned the same IP → duplicate Pod IP

Evidence from the CNI log on node 10-199-0-105 — the first Pod is booked under the stale nodename:

2026-03-23 10:42:13.497 [INFO] k8s.go 77: Extracted identifiers for CmdAddK8s
  ContainerID="fc3a04..." Pod="csi-node-driver-rlfpx"
  WorkloadEndpoint="10--199--0--21-k8s-csi--node--driver--rlfpx-eth0"

2026-03-23 10:42:13.531 [INFO] ipam_plugin.go 270: Auto assigning IP
  Attrs:{"node":"10-199-0-21", "pod":"csi-node-driver-rlfpx", ...}
  Hostname:"10-199-0-21"

2026-03-23 10:42:13.732 [INFO] ipam.go 1216: Successfully claimed IPs: [10.200.129.198/26]

Six seconds later, the same node uses the correct identity for the next Pod:

2026-03-23 10:42:19.137 [INFO] k8s.go 77: Extracted identifiers for CmdAddK8s
  ContainerID="97062860..." Pod="node-problem-detector-774b4"
  WorkloadEndpoint="10--199--0--105-k8s-node--problem--detector--774b4-eth0"

  Attrs:{"node":"10-199-0-105", ...}
  Hostname:"10-199-0-105"

Controller log showing incorrect GC of the active allocation:

Pod rescheduled on new node. Allocation no longer valid  old=10-199-0-21 new=10-199-0-105
Candidate IP leak  ip=10.200.129.198
Confirmed IP leak after 15m0s  ip=10.200.129.198
Garbage collecting leaked IP address  ip=10.200.129.198

Resulting duplicate IP — two Pods on different nodes holding the same IP:

$ calicoctl get wep -A -o wide | grep '10.200.129.198'
calico-system  10--199--1--92-k8s-csi--node--driver--6bhc7-eth0   10-199-1-92   10.200.129.198/32
calico-system  10--199--0--105-k8s-csi--node--driver--rlfpx-eth0  10-199-0-105  10.200.129.198/32

This was reproduced on multiple nodes (10-199-0-105, 10-199-1-92) in the same cluster, all with the same stale nodename 10-199-0-21.

Possible Solution

There are two contributing issues that could each be addressed:

1. CNI plugin: DetermineNodename() trusts stale file without validation

In cni-plugin/internal/pkg/utils/utils.go, DetermineNodename() reads /var/lib/calico/nodename and trusts its content unconditionally:

func DetermineNodename(conf types.NetConf) (nodename string) {
    if conf.Nodename != "" {
        nodename = conf.Nodename
    } else if nff := nodenameFromFile(conf.NodenameFile); nff != "" {
        nodename = nff                    // ← reads stale file without validation
    } else if conf.Hostname != "" {
        nodename = conf.Hostname
    } else {
        nodename, _ = names.Hostname()
    }
    return
}

Suggested fix: Cross-validate the nodename file content against KUBERNETES_NODE_NAME (available from CNI args / Pod downward API environment). If they differ, prefer KUBERNETES_NODE_NAME or return an error. Alternatively, ensure calico-node writes the nodename file before install-cni signals CNI readiness.

2. kube-controllers: allocationIsValid() treats node mismatch as definitive evidence of rescheduling

In kube-controllers/pkg/controllers/node/ipam.go:

// TODO: Do we need this check?
if p.Spec.NodeName != "" && a.knode != "" && p.Spec.NodeName != a.knode {
    logc.WithFields(fields).Info("Pod rescheduled on new node. Allocation no longer valid")
    return false
}

Note the existing // TODO: Do we need this check? comment.

This check assumes that a node mismatch means the Pod was rescheduled. But in this scenario, the Pod never moved — the allocation was simply recorded under the wrong node by CNI. The Pod is Running, its status.podIP matches the allocation, and it is actively using the IP.

Suggested fix: Before concluding the allocation is invalid, additionally verify whether Pod.Status.PodIP matches the allocated IP. If the Pod is Running on the "new" node with the exact same IP, the allocation is likely a bookkeeping error rather than a genuine reschedule — it should not be GC'd.

Steps to Reproduce (for bugs)

  1. Set up a Calico cluster using KDD (Kubernetes datastore)
  2. Provision a node from a VM image/template that retains /var/lib/calico/nodename from a different node (e.g., node B has nodename file containing node A's name)
  3. Start the node — kubelet will schedule DaemonSet Pods immediately
  4. Observe the startup ordering:
    • install-cni completes → CNI becomes available
    • First CmdAddK8s calls use stale nodename from the file (within ~1-3 seconds)
    • calico-node starts and corrects the nodename file (~3-6 seconds after install-cni)
    • Subsequent CmdAddK8s calls use the correct nodename
  5. Wait ~15 minutes (default leakGracePeriod)
  6. calico-kube-controllers logs Garbage collecting leaked IP address for the affected IPs
  7. New Pods scheduled elsewhere may now receive the same IP → duplicate Pod IP

Context

We operate a large Kubernetes cluster and frequently batch-add ~30 nodes at a time. Nodes are provisioned from VM templates that may retain /var/lib/calico/ data from a previous node identity. After each batch expansion, we consistently observe duplicate Pod IPs caused by this race condition.

The impact is severe:

  • Silent traffic misrouting — two Pods on different nodes hold the same IP, causing unpredictable network behavior
  • No error surfaced — the duplicate is only discovered through manual inspection or when applications fail
  • Scales with cluster growth — the more nodes provisioned in parallel, the more Pods are affected

Current workarounds:

  • Deleting /var/lib/calico/nodename before node joins (requires changes to provisioning pipeline)
  • Increasing leakGracePeriod (reduces probability but does not eliminate the root cause)
  • Manually deleting affected Pods after detection

Your Environment

  • Calico version: v3.28.1
  • Calico dataplane: iptables
  • Orchestrator version: Kubernetes v1.30.5
  • Operating System and version: Ubuntu 22.04 LTS (kernel 5.15.0-94-generic)
  • Container runtime: containerd 1.7.23
  • IPAM config: 2 workload IPPools (10.196.0.0/15, 10.195.128.0/17), blockSize 26, ipipMode Always, strictAffinity false
