Skip to content

feat(ci): Implement drift detection as Go test in test/drift/#2933

Open
RobuRishabh wants to merge 5 commits into
kubeflow:masterfrom
RobuRishabh:kustomize-drift-check
Open

feat(ci): Implement drift detection as Go test in test/drift/#2933
RobuRishabh wants to merge 5 commits into
kubeflow:masterfrom
RobuRishabh:kustomize-drift-check

Conversation

@RobuRishabh
Copy link
Copy Markdown
Contributor

@RobuRishabh RobuRishabh commented May 7, 2026

Purpose of this PR

This PR implements an automated Helm-Kustomize Drift Detection Pipeline that uses the Helm chart as the "source of truth" and compares its rendered output against Kustomize build output, reporting any unauthorized divergence.

Proposed changes:

  • Add test/drift/drift_test.go — a Go test package that renders both Helm and Kustomize outputs into typed Kubernetes objects and performs semantic comparison of RBAC rules, webhook configurations, and deployment specs
  • Add test/drift/drift-check-values.yaml — Helm values that align template output with Kustomize defaults for fair comparison
  • Add .github/workflows/kustomize-drift-check.yaml — CI workflow triggered on PRs/pushes affecting config/, charts/, internal/, or test/drift/
  • Update Makefile:
    • Add drift-check target for local developer experience (make drift-check)
    • Exclude test/drift/ from unit-test target (it has its own dedicated workflow)

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

The Kustomize manifests were abandoned and broke because there was no automated gate to ensure they stayed current with Helm chart updates. A byte-for-byte comparison isn't feasible due to inherent differences in naming, labeling, and parameterization schemes between Helm and Kustomize. Instead, this test performs semantic comparison by:

  1. Parsing rendered YAML into typed Go structs (rbacv1.ClusterRole, appsv1.Deployment, admissionregistrationv1.MutatingWebhookConfiguration, etc.)
  2. Normalizing expected differences (naming conventions, namespace variable substitution, rule merging)
  3. Comparing the fields that matter: RBAC verbs/resources, webhook paths/operations/failurePolicy, and deployment args/ports/probes/securityContext

This approach uses only libraries already in go.mod (k8s.io/api, k8s.io/apimachinery, testify) and follows the same pattern as the existing test/kustomize/ package — no new external dependencies.

Note: The drift-check workflow uses continue-on-error: true because 7 genuine drift items currently exist between Helm and Kustomize. These will be fixed in follow-up PRs, after which the continue-on-error will be removed to make the check blocking.

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

The values file is placed under test/drift/ (not charts/spark-operator-chart/ci/) to avoid chart-testing automatically using it as an install test configuration.

The test currently detects 7 genuine drift items:

Area Drift
WebhookClusterRole Kustomize has extra permissions (pods, scheduledsparkapplications, finalizers/status)
ControllerRole Kustomize missing events permission
MutatingWebhookPaths Kustomize missing scheduledsparkapplication webhook
MutatingWebhookRules Pod webhook operations differ (CREATE vs CREATE+UPDATE)
ValidatingWebhookPaths Kustomize has duplicate sparkapplication entry
ControllerArgs Kustomize missing --feature-gates, --kube-api-burst/qps, --scheduled-spark-application-timestamp-precision
WebhookArgs Kustomize missing --kube-api-burst/qps

These represent real accumulated drift that will be fixed in follow-up PRs. Once fixed, the CI gate will prevent future drift from being introduced silently.


Commands to test

# Run via Makefile (same as CI)
make drift-check

# Run Go test directly
go test ./test/drift/ -v -count=1

# Run with explicit binary paths
HELM=/opt/homebrew/bin/helm go test ./test/drift/ -v -count=1

# Run with both explicit paths
HELM=/opt/homebrew/bin/helm KUSTOMIZE=./bin/kustomize-v5.4.1 go test ./test/drift/ -v -count=1

# Run specific subtest groups
HELM=/opt/homebrew/bin/helm go test ./test/drift/ -v -count=1 -run "TestDrift/RBAC"
HELM=/opt/homebrew/bin/helm go test ./test/drift/ -v -count=1 -run "TestDrift/WebhookConfiguration"
HELM=/opt/homebrew/bin/helm go test ./test/drift/ -v -count=1 -run "TestDrift/DeploymentSpecs"

# Verify existing unit tests still pass (drift is excluded)
make unit-test

# Verify compilation
go vet ./test/drift/

- Add test/drift/drift_test.go with 17 subtests covering resource
  inventory, ClusterRoles, Roles, MutatingWebhookConfiguration,
  ValidatingWebhookConfiguration, and Deployment container specs
- Update Makefile drift-check target to run go test ./test/drift/
- Update workflow path triggers for test/drift/**
- Remove scripts/drift-check.sh and config/drift-check-allowlist.yaml
  in favor of logic encoded directly in Go

Signed-off-by: roburishabh <roburishabh@outlook.com>
Copilot AI review requested due to automatic review settings May 7, 2026 16:42
@google-oss-prow google-oss-prow Bot requested review from ImpSy and nabuskey May 7, 2026 16:43
@google-oss-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a CI drift-detection gate that semantically compares rendered Kubernetes objects from the Helm chart (source of truth) against the repository’s Kustomize manifests, to prevent future divergence.

Changes:

  • Introduces a new Go test suite under test/drift/ that renders Helm + Kustomize outputs and compares RBAC rules, webhook configs, and deployment container specs.
  • Adds a Helm values file used to align Helm rendering with Kustomize defaults for fair comparison.
  • Wires the drift check into CI via a dedicated GitHub Actions workflow and a make drift-check target.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
test/drift/drift_test.go Implements Helm/Kustomize rendering, normalization, and semantic comparisons for drift detection.
Makefile Adds a drift-check target for running the drift test locally and in CI.
charts/spark-operator-chart/ci/drift-check-values.yaml Helm values used to render comparable output to Kustomize defaults.
.github/workflows/kustomize-drift-check.yaml Adds a workflow that runs the drift check on relevant PRs/pushes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/drift/drift_test.go
if err == io.EOF {
break
}
continue
Comment thread test/drift/drift_test.go
Comment on lines +350 to +355
for _, w := range wh.Webhooks {
e := webhookEntry{
FailPolicy: string(*w.FailurePolicy),
SideEffects: string(*w.SideEffects),
}
if w.ClientConfig.Service != nil {
Comment thread test/drift/drift_test.go
Comment on lines +372 to +377
var entries []webhookEntry
for _, w := range wh.Webhooks {
e := webhookEntry{
FailPolicy: string(*w.FailurePolicy),
SideEffects: string(*w.SideEffects),
}
Comment thread test/drift/drift_test.go
Comment on lines +516 to +522
for kind := range helmKinds {
assert.Contains(t, kustKinds, kind,
"resource kind %q exists in Helm but not in Kustomize", kind)
}
for kind := range kustKinds {
assert.Contains(t, helmKinds, kind,
"resource kind %q exists in Kustomize but not in Helm", kind)
Comment thread test/drift/drift_test.go
Comment on lines +59 to +64
// Role names differ
helmControllerRole = "spark-operator-controller"
kustControllerRole = "spark-operator-controller"
helmWebhookRole = "spark-operator-webhook"
kustWebhookRole = "spark-operator-webhook"

uses: actions/setup-go@4b73464bb391d4059bd26b0524d20df3927bd417 # v6
with:
go-version-file: go.mod

Signed-off-by: roburishabh <roburishabh@outlook.com>
Signed-off-by: roburishabh <roburishabh@outlook.com>
…Selector

Signed-off-by: roburishabh <roburishabh@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants