fix(kustomize,helm): Align Helm and Kustomize RBAC, args, and sideEffects via code tracing#2936
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate semantic drift between the Helm chart and Kustomize manifests for the spark-operator controller/webhook by aligning RBAC permissions, deployment args (notably API client throttling and feature flags), and webhook sideEffects settings based on traced runtime requirements.
Changes:
- Adjust webhook/controller RBAC rules to remove unused permissions and add missing cache-required permissions.
- Make Kustomize controller/webhook deployments explicit about API client throttling (QPS/burst) and controller feature flags/timestamp precision.
- Align Helm webhook configurations to
sideEffects: None(matching kubebuilder markers).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
config/webhook/deployment.yaml |
Adds explicit --kube-api-qps / --kube-api-burst args to the Kustomize webhook deployment. |
config/webhook/clusterrole.yaml |
Removes webhook ClusterRole permissions deemed unused (e.g., events/resourcequotas/status/finalizers). |
config/rbac/leader-election-role.yaml |
Adds events create/update/patch to the Kustomize controller leader-election Role. |
config/manager/manager.yaml |
Adds explicit controller args for QPS/burst, feature gates, and ScheduledSparkApplication timestamp precision. |
charts/spark-operator-chart/templates/webhook/validatingwebhookconfiguration.yaml |
Changes validating webhooks sideEffects to None. |
charts/spark-operator-chart/templates/webhook/mutatingwebhookconfiguration.yaml |
Changes mutating webhooks sideEffects to None. |
charts/spark-operator-chart/templates/webhook/rbac.yaml |
Removes events permissions from the Helm webhook ClusterRole. |
charts/spark-operator-chart/templates/webhook/_helpers.tpl |
Updates Helm webhook policyRules to include pods watch/list and scheduledsparkapplications access. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - watch | ||
| - apiGroups: |
| resources: | ||
| - pods |
7ed2b2c to
fdf98de
Compare
…ects via code tracing Remove over-provisioned webhook ClusterRole permissions (events, resourcequotas, finalizers/status, extra verbs) that have zero code usage in the webhook binary. Add missing pods and scheduledsparkapplications to Helm policyRules that the controller-runtime cache requires. Add events permission to Kustomize controller leader-election Role (controller has 19 recorder.Eventf calls). Add explicit --kube-api-qps, --kube-api-burst, --feature-gates, and --timestamp-precision flags to Kustomize deployments matching Helm defaults. Fix Helm webhook sideEffects from NoneOnDryRun to None matching all 7 kubebuilder markers (follow-up PR#2915) Signed-off-by: roburishabh <[email protected]>
fdf98de to
642a82e
Compare
| - watch | ||
| - apiGroups: | ||
| - "" | ||
| resources: |
There was a problem hiding this comment.
This removes resourcequotas access, but the webhook still needs it. SparkApplicationValidator.validateResourceUsage() in internal/webhook/sparkapplication_validator.go:192 calls v.client.List(ctx, resourceQuotaList, ...) to list ResourceQuotas when --enable-resource-quota-enforcement is enabled. Removing this RBAC rule will cause 403 errors for anyone using that flag.
The flag defaults to false so this won't be caught by CI or default installs — it'll only break at runtime for users who opted in.
There was a problem hiding this comment.
Thanks for the review @alinaryan. So the removal here was intentional. Kustomize's config/webhook/deployment.yaml doesn't expose --enable-resource-quota-enforcement at all, the flag is always at its default (false), so the webhook will never call List on ResourceQuota. But, I understand your point, replacing resourcequotas with pods instead of adding pods alongside it will cause this 403 error. I removed it to make it aligned with Kustomize, because If Helm has resourcequotas in the ClusterRole but Kustomize doesn't, that's a drift item that the drift detection CI gate (PR #2933) will flag. so what I can do, I will add resourcequotas: list to both Helm policyRules AND Kustomize ClusterRole, zero drift, and both deployment methods support the feature if a user enables it. Thanks for catching it.
| - --kube-api-qps=20 | ||
| - --kube-api-burst=30 | ||
| - --feature-gates=PartialRestart=false,LoadSparkDefaults=false | ||
| - --scheduled-spark-application-timestamp-precision=nanos |
There was a problem hiding this comment.
These values (--kube-api-qps=20, --kube-api-burst=30) override the Go client defaults and force every downstream user to either accept them or patch around them. Same for --feature-gates=PartialRestart=false,LoadSparkDefaults=false. I believe these are already the defaults. Consider moving these to an overlay or removing them.
There was a problem hiding this comment.
I agree here all four values match the Go defaults and adding them doesn't change runtime behavior. The reason they're here: Helm's values.yaml renders all four as explicit args in every deployment. Without them in Kustomize, the drift detection CI gate (PR #2933) flags them as missing, Helm has them, Kustomize doesn't.
So its a tradeoff:
- Keeping them: Zero drift, args serve as documentation of the operating parameters, and the drift test passes without special-case exclusions.
- Removing them: Fewer args to maintain, but re-introduces drift items that the CI gate will flag, requiring either
continue-on-errorin the workflow or normalization logic in the test
I'd prefer to keep them for consistency with Helm, but I understand the maintenance concern. What do you prefer ?
…sterRole Signed-off-by: roburishabh <[email protected]>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Purpose of this PR
Follow-up to #2915 where the review called out: "We need to file a follow-up issue to fix the Helm chart templates too, if required to keep Helm chart in proper alignment with Kustomize." and it fixes all the drifts occurring in #2933 and eliminates all semantic drift between Helm and Kustomize.
Proposed changes:
events,resourcequotas,finalizers/status, extragetverb on pods) — code tracing proves zero usage in the webhook binarypods(list/watch) andscheduledsparkapplications(get/list/watch) to Helm webhook policyRules — the controller-runtime cache unconditionally needs these (Helm bug)events(create/update/patch) to Kustomize controller leader-election Role — controller has 3 EventRecorders and 19recorder.Eventfcalls--kube-api-qps=20,--kube-api-burst=30,--feature-gates,--scheduled-spark-application-timestamp-precision=nanosto Kustomize controller deployment--kube-api-qps=20,--kube-api-burst=30to Kustomize webhook deploymentsideEffectsfromNoneOnDryRun→Noneto match all 7 kubebuilder markers (completes fix(webhook): Fix path for mutating ScheduledSparkApplication, correct sideEffects, remove update verb in manifests #2915 Helm-side)Change Category
Rationale
The Kustomize manifests had accumulated drift from the Helm chart over time. Rather than blindly aligning to Helm, each permission was code-traced to determine what's genuinely required:
eventsfrom webhookEventRecorder/Eventfininternal/webhook/orcmd/operator/webhook/resourcequotasfrom webhook--enable-resource-quota-enforcement=true(defaults tofalse)pods+scheduledsparkapplicationsto Helmcache.ByObjectincmd/operator/webhook/start.go:396-403eventsrecorder.Eventfcalls ininternal/controller/sparkapplication/controller.goleasesbyresourceNamessideEffects: None+kubebuilder:webhookmarkers declaresideEffects=NoneResult: 14 semantic checks pass with zero drift between Helm and Kustomize rendered output.
Checklist
Additional Notes
kustomize-drift-checkbranch) will validate this PR passes once both branches are merged