fix: Reprogram NCs missing from CNS for Succeeded CRs after restart by hunter32292 · Pull Request #4357 · Azure/azure-container-networking

hunter32292 · 2026-04-16T16:06:19Z

When CNS restarts or loses persisted state, NetworkContainers may be lost from the in-memory ContainerIDByOrchestratorContext map while the corresponding MultiTenantNetworkContainer CRs remain in Succeeded state.

Previously, the reconciler skipped all CRs not in Initialized state, meaning Succeeded CRs with missing NCs were never reprogrammed. This caused permanent CNI ADD failures (Code 18: UnknownContainerID) with no self-healing path.
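For illustration, the failure mode above can be sketched with a toy in-memory store. Everything here is a hypothetical stand-in rather than the CNS implementation: the `ncStore` type, the `lookup` method, the `"pod-a/ns-a"` key format, and the use of `0` as a success code are all assumptions. Only the map name and code 18 come from the description.

```go
package main

import "fmt"

// unknownContainerID mirrors the code named in the PR description (Code 18).
const unknownContainerID = 18

// ncStore is a hypothetical stand-in for the in-memory CNS state.
type ncStore struct {
	containerIDByOrchestratorContext map[string]string
}

func newNCStore() *ncStore {
	return &ncStore{containerIDByOrchestratorContext: map[string]string{}}
}

// lookup returns the NC ID for an orchestrator context, or code 18 when
// the entry is absent. A return code of 0 for success is an assumption.
func (s *ncStore) lookup(orchestratorContext string) (string, int) {
	id, ok := s.containerIDByOrchestratorContext[orchestratorContext]
	if !ok {
		return "", unknownContainerID
	}
	return id, 0
}

func main() {
	store := newNCStore()
	store.containerIDByOrchestratorContext["pod-a/ns-a"] = "nc-1"
	_, code := store.lookup("pod-a/ns-a")
	fmt.Println("before restart:", code)

	// Simulated restart: the map is rebuilt empty, but nothing updates the CR,
	// so it still reports Succeeded while every lookup now fails.
	store = newNCStore()
	_, code = store.lookup("pod-a/ns-a")
	fmt.Println("after restart:", code)
}
```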

Now, the reconciler allows Succeeded CRs through to the NC existence check. If the NC exists in CNS, reconciliation is skipped as before (no behavior change for the happy path). If the NC is missing, it is reprogrammed from the CR's status fields.

Transient CNS errors are not masked — only UnknownContainerID triggers reprogramming, matching the existing behavior for Initialized CRs.

Reason for Change: Fix permanent CNI ADD failures after CNS restart when NCs are lost from memory but CRs remain in Succeeded state.

Issue Fixed:

Requirements:

uses conventional commit messages
includes documentation
adds unit tests
relevant PR labels added

Notes:

No behavior change for the happy path (Succeeded CRs with NCs present in CNS are still skipped)
Only UnknownContainerID triggers reprogramming; transient errors are surfaced as before
Three new test cases cover: missing NC reprogramming, existing NC skip, and transient error handling
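The three scenarios listed in the notes could be exercised with a table of cases against a fake CNS client, roughly as below. The fake client, the `reconcileSucceeded` helper, and the case names are hypothetical and written for this illustration. The tests added in the PR are not shown on this page and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownContainerID models CNS response code 18 (hypothetical name).
var errUnknownContainerID = errors.New("unknown container ID (code 18)")

// fakeCNS is a hypothetical test double that records create calls.
type fakeCNS struct {
	getErr  error // error returned by the NC existence check
	created int   // number of times the NC was (re)programmed
}

func (f *fakeCNS) getNC(id string) error    { return f.getErr }
func (f *fakeCNS) createNC(id string) error { f.created++; return nil }

// reconcileSucceeded sketches the path taken for a CR in Succeeded state.
func reconcileSucceeded(c *fakeCNS, ncID string) error {
	err := c.getNC(ncID)
	if err == nil {
		return nil // NC exists: skip
	}
	if !errors.Is(err, errUnknownContainerID) {
		return err // transient error: surface it
	}
	return c.createNC(ncID) // NC missing: reprogram
}

type testCase struct {
	name        string
	getErr      error
	wantCreated int
	wantErr     bool
}

// runCases returns the names of the cases that did not behave as expected.
func runCases() []string {
	cases := []testCase{
		{"succeeded, NC missing", errUnknownContainerID, 1, false},
		{"succeeded, NC present", nil, 0, false},
		{"succeeded, transient error", errors.New("connection refused"), 0, true},
	}
	var failed []string
	for _, tc := range cases {
		c := &fakeCNS{getErr: tc.getErr}
		err := reconcileSucceeded(c, "nc-1")
		if (err != nil) != tc.wantErr || c.created != tc.wantCreated {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(runCases()))
}
```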

Co-authored-by: Copilot <[email protected]>

Copilot

Pull request overview

Updates the multi-tenant NetworkContainer CR reconciler so that Succeeded CRs are no longer skipped before verifying NC existence in CNS, enabling self-healing reprogramming after CNS restarts that lose in-memory NC state.

Changes:

Allow Succeeded CRs to proceed to the CNS NC existence check (previously only Initialized CRs were reconciled).
If CNS reports UnknownContainerID for a Succeeded CR, reprogram the NC from CR status; otherwise surface transient CNS errors.
Add unit tests covering succeeded+missing NC reprogramming, succeeded+existing NC skip, and transient error propagation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
cns/multitenantcontroller/multitenantoperator/multitenantcrdreconciler.go	Adjusts reconcile gating to include Succeeded CRs and adds a warning log when reprogramming a missing NC.
cns/multitenantcontroller/multitenantoperator/multitenantcrdreconciler_test.go	Adds targeted tests for the new Succeeded-state reconciliation behavior and error handling.

rbtr · 2026-04-16T19:57:42Z

/azp run Azure Container Networking PR

azure-pipelines · 2026-04-16T19:57:53Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Agent-Logs-Url: https://github.com/Azure/azure-container-networking/sessions/808be914-b861-4e6d-bfbb-db52b24297d2 Co-authored-by: rbtr <[email protected]>

rbtr · 2026-04-16T20:29:03Z

/azp run Azure Container Networking PR

azure-pipelines · 2026-04-16T20:29:16Z

Azure Pipelines successfully started running 1 pipeline(s).

rbtr · 2026-04-17T15:36:52Z

/azp run Azure Container Networking PR

azure-pipelines · 2026-04-17T15:37:03Z

Azure Pipelines successfully started running 1 pipeline(s).

rbtr · 2026-04-20T20:31:20Z

/azp run Azure Container Networking PR

azure-pipelines · 2026-04-20T20:31:31Z

Azure Pipelines successfully started running 1 pipeline(s).

QxBytes

signing off for dockerfile changes

hunter32292 requested a review from a team as a code owner April 16, 2026 16:06

hunter32292 requested review from Copilot and rbtr April 16, 2026 16:06

Copilot started reviewing on behalf of hunter32292 April 16, 2026 16:07

Copilot AI reviewed Apr 16, 2026

Comment thread cns/multitenantcontroller/multitenantoperator/multitenantcrdreconciler.go Outdated

rbtr requested a review from Copilot April 16, 2026 19:57

Copilot started reviewing on behalf of rbtr April 16, 2026 19:57

Copilot AI reviewed Apr 16, 2026

Copilot started work on behalf of rbtr April 16, 2026 20:21

fix: use warn-level log when reprogramming missing succeeded NC

4b77e1e

Agent-Logs-Url: https://github.com/Azure/azure-container-networking/sessions/808be914-b861-4e6d-bfbb-db52b24297d2 Co-authored-by: rbtr <[email protected]>

rbtr enabled auto-merge April 16, 2026 20:28

chore: finalize review feedback follow-up

0f599f1

Agent-Logs-Url: https://github.com/Azure/azure-container-networking/sessions/808be914-b861-4e6d-bfbb-db52b24297d2 Co-authored-by: rbtr <[email protected]>

Copilot AI requested review from QxBytes and santhoshmprabhu as code owners April 16, 2026 20:31

chore: revert unintended go.sum update

a17c0b0

Agent-Logs-Url: https://github.com/Azure/azure-container-networking/sessions/808be914-b861-4e6d-bfbb-db52b24297d2 Co-authored-by: rbtr <[email protected]>

Copilot finished work on behalf of rbtr April 16, 2026 20:33

rbtr requested a review from thatmattlong April 17, 2026 15:37

rbtr assigned rbtr and unassigned rbtr Apr 17, 2026

hunter32292 added 2 commits April 20, 2026 14:56

fix linting issues

73e6c39

Run based on failure Changes detected. Please run 'make dockerfiles' …

9bf256a

…locally to update the base images.

hunter32292 requested review from a team as code owners April 20, 2026 20:03

thatmattlong reviewed Apr 20, 2026

Comment thread .pipelines/build/dockerfiles/azure-iptables-monitor.Dockerfile

thatmattlong approved these changes Apr 20, 2026

rbtr approved these changes Apr 20, 2026

QxBytes approved these changes Apr 20, 2026

rbtr added this pull request to the merge queue Apr 20, 2026

Merged via the queue into master with commit 754f60b Apr 20, 2026
33 checks passed

rbtr deleted the jostupka/fix-stale-nc-reconciliation branch April 20, 2026 23:11

rbtr merged 6 commits into master from jostupka/fix-stale-nc-reconciliation

6 participants
