fix(shield): correct response_actions ClusterRole RBAC scoping #2605

Merged
francesco-furlan merged 2 commits into main from fix/shield-response-actions-rbac-leak on Apr 29, 2026

@francesco-furlan
Contributor

@francesco-furlan francesco-furlan commented Apr 29, 2026

Summary

Fixes two related defects in the cluster-shield ClusterRole template that caused response_actions RBAC to render incorrectly.

Bug 1 — per-action gates ignored the master enabled flag

cluster.response_actions.<action>.is_enabled (in templates/cluster/_config.tpl) only short-circuited when an individual action's trigger was explicitly "none"; it never consulted features.respond.response_actions.enabled. With chart defaults (master flag false), every per-action gate still resolved to "true", so the cluster-shield ClusterRole was granted delete on pods, networkpolicies, and volumesnapshots, plus the isolate_network / rollout_restart / get_logs / volume_snapshot rules. This is a real least-privilege violation: permissions for a disabled feature were granted by default.

Bug 2 — response_actions blocks rendered outside the cluster.rbac.create wrapper

The seven response_actions.* rule blocks in templates/cluster/clusterrole.yaml lived outside the outer {{- if .Values.cluster.rbac.create }} ... {{- end }} guard. With cluster.rbac.create: false, the document head (apiVersion / kind / metadata / rules:) was correctly suppressed but the per-action rule snippets still rendered, producing a top-level YAML array that Helm could not parse:

    Error: YAML parse error on shield/templates/cluster/clusterrole.yaml:
    error unmarshaling JSON: while decoding JSON:
    json: cannot unmarshal array into Go value of type util.SimpleHead
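
To illustrate why the stray rule blocks break parsing, here is a minimal Python model of the template's control flow. It is a sketch under stated assumptions, not the chart's actual template: `render_clusterrole`, `HEAD`, `POD_RULE`, and the `guard_covers_actions` switch are hypothetical names, the `metadata.name` value is illustrative, and the rule body only borrows the `pods` verbs mentioned in this PR's test description.

```python
# Hypothetical document head; in the chart this part sits inside the
# cluster.rbac.create guard.
HEAD = (
    "apiVersion: rbac.authorization.k8s.io/v1\n"
    "kind: ClusterRole\n"
    "metadata:\n"
    "  name: cluster-shield\n"
    "rules:\n"
)

# Illustrative per-action rule snippet (assumed shape, not copied from the chart).
POD_RULE = (
    '- apiGroups: [""]\n'
    '  resources: ["pods"]\n'
    '  verbs: ["delete", "get"]\n'
)


def render_clusterrole(rbac_create, action_rules, guard_covers_actions):
    """Model the template output for a given guard layout."""
    out = []
    if rbac_create:
        out.append(HEAD)
        out.extend(action_rules)
    elif not guard_covers_actions:
        # Pre-fix layout: the guard closed before the action blocks,
        # so they render even though the head was suppressed.
        out.extend(action_rules)
    return "".join(out)


before = render_clusterrole(False, [POD_RULE], guard_covers_actions=False)
after = render_clusterrole(False, [POD_RULE], guard_covers_actions=True)

print(repr(before.splitlines()[0]))  # first line of the pre-fix output
print(repr(after))                   # post-fix output
```

In this model the pre-fix output starts with a list item and has no `apiVersion` or `kind`, which is the bare top-level sequence the error above complains about. The post-fix output is empty, so no document is emitted.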

Reported by @EdwardArchive in #2603.

Fix

  • templates/cluster/_config.tpl: is_enabled returns false early when the master features.respond.response_actions.enabled flag is falsy. Per-action trigger: "none" overrides remain effective when the master flag is on.
  • templates/cluster/clusterrole.yaml: moved the {{- end }} of the cluster.rbac.create wrapper past the seven response_actions.* blocks so they live inside it (matching the working pattern in templates/host/clusterrole.yaml).
  • Chart.yaml: bumped to 1.36.1.
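
The gate change in the first bullet can be modelled in a few lines of Python. This is a sketch of the logic as described above, not the Helm helper itself. The function names are hypothetical, booleans stand in for the `"true"` / `"false"` strings the helper returns, an unset trigger is represented as `None`, and treating any trigger other than `"none"` as enabled is an assumption.

```python
from itertools import product
from typing import Optional


def is_enabled_before(master_enabled: bool, trigger: Optional[str]) -> bool:
    # Pre-fix behaviour: the master flag is never consulted;
    # only an explicit trigger of "none" disables the action.
    return trigger != "none"


def is_enabled_after(master_enabled: bool, trigger: Optional[str]) -> bool:
    # Post-fix behaviour: a falsy master flag short-circuits to disabled.
    if not master_enabled:
        return False
    # With the master flag on, the per-action override still applies.
    return trigger != "none"


# Enumerate every combination of master flag and per-action trigger.
for master, trigger in product([False, True], [None, "none"]):
    print(
        f"enabled={master!s:5} trigger={trigger!s:4} "
        f"before={is_enabled_before(master, trigger)} "
        f"after={is_enabled_after(master, trigger)}"
    )
```

The only combination whose result changes is the chart default (master flag off, trigger unset), which is the case that leaked rules into the ClusterRole.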

Behavior matrix

| response_actions.enabled | <action>.trigger | Before | After |
|---|---|---|---|
| false (default) | unset | rules leaked into ClusterRole | rules absent |
| false | "none" | rules absent | rules absent |
| true | unset | rules present | rules present |
| true | "none" | rules absent | rules absent |

| cluster.rbac.create | response_actions.enabled | Before | After |
|---|---|---|---|
| true (default) | any | renders ClusterRole | renders ClusterRole |
| false | false (default) | helm template error | renders no document |
| false | true | helm template error | renders no document |

Tests

charts/shield/tests/cluster/clusterrole_test.yaml:

  • response_actions disabled by default does not leak per-action RBAC — asserts pods: delete,get, networkpolicies: get,delete, volumesnapshots: delete,get,watch,patch, and pods/log: get are absent under chart defaults.
  • cluster.rbac.create false renders no ClusterRole: asserts hasDocuments: count: 0.
  • cluster.rbac.create false with response_actions enabled still renders no ClusterRole: covers the original failure mode from #2603 ("[BUG] cluster.rbac.create: false causes helm template to fail with SimpleHead unmarshal error in templates/cluster/clusterrole.yaml").
  • response_actions enabled with delete_pod trigger none suppresses only delete_pod rule — verifies fine-grained per-action overrides still work.

helm unittest --strict -f "tests/**/*_test.yaml" charts/shield: 463 / 463 pass (459 baseline + 4 new).
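
The "rules absent" assertions boil down to checking that no rule grants a given verb on a given resource. The actual tests are helm-unittest YAML. The Python below is only a rough sketch of that check against already-parsed rules. The `grants` helper and both rule lists are hypothetical, the `namespaces` rule is invented purely as a placeholder, and wildcard (`*`) matching is deliberately not handled.

```python
def grants(rules, resource, verb):
    """Return True if any rule allows `verb` on `resource` (no wildcard handling)."""
    return any(
        resource in rule.get("resources", []) and verb in rule.get("verbs", [])
        for rule in rules
    )


# Hypothetical rendered rules, for illustration only.
default_rules = [
    {"apiGroups": [""], "resources": ["namespaces"], "verbs": ["get", "list"]},
]

# The same rules plus two of the per-action rules named in the test description.
leaky_rules = default_rules + [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["delete", "get"]},
    {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
]

print(grants(default_rules, "pods", "delete"))
print(grants(leaky_rules, "pods", "delete"))
```

A post-deploy audit could apply the same predicate to the `rules` field of a live ClusterRole, with wildcard handling added.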

Test plan

  • helm unittest passes (463/463)
  • helm template with cluster.rbac.create: false no longer errors
  • Default render (no overrides) emits no response-action rules in cluster-shield ClusterRole
  • features.respond.response_actions.enabled: true still emits all per-action rules
  • enabled: true + per-action trigger: "none" suppresses only that rule
  • CI lint + helm-unit-test workflows

Closes #2603

@francesco-furlan francesco-furlan requested a review from a team as a code owner April 29, 2026 07:56
@francesco-furlan francesco-furlan force-pushed the fix/shield-response-actions-rbac-leak branch from 97e5a53 to 5999276 Compare April 29, 2026 08:01
francesco-furlan and others added 2 commits April 29, 2026 10:04
…flag

`cluster.response_actions.<action>.is_enabled` only short-circuited when
an individual action's `trigger` was explicitly set to "none"; it never
consulted `features.respond.response_actions.enabled`. With chart
defaults (master flag false), every per-action gate still resolved to
"true" and the cluster-shield ClusterRole was granted `delete` on
`pods`, `networkpolicies`, and `volumesnapshots`, plus the
`isolate_network` / `rollout_restart` / `get_logs` / `volume_snapshot`
rules — a real least-privilege violation, surfaced by a customer
during an RBAC security review.

The helper now returns "false" early when the master flag is falsy.
Per-action `trigger: "none"` overrides remain effective when the master
flag is on, so user-facing per-action disablement keeps working.

Adds unit tests covering: defaults no longer leak per-action rules, and
per-action `trigger: "none"` still suppresses only the targeted rule
when the master flag is enabled.

The seven `response_actions.*` rule blocks in
`templates/cluster/clusterrole.yaml` lived outside the outer
`{{ if .Values.cluster.rbac.create }} ... {{ end }}` wrapper. With
`cluster.rbac.create: false`, the document head (apiVersion / kind /
metadata / rules:) was correctly suppressed but the per-action rule
snippets still rendered, producing a top-level YAML array that Helm
could not parse:

    Error: YAML parse error on shield/templates/cluster/clusterrole.yaml:
    error unmarshaling JSON: while decoding JSON:
    json: cannot unmarshal array into Go value of type util.SimpleHead

Move the closing `{{ end }}` past the response_actions blocks so the
entire ClusterRole template (including those rules) is suppressed when
`cluster.rbac.create` is false, matching the pattern already used in
`templates/host/clusterrole.yaml`. Bumps the chart to 1.36.1 and adds
regression unittests covering the failure mode.

Reported and originally fixed by @EdwardArchive in #2604; this commit
folds the same change into the broader response_actions RBAC fix on
this branch.

Closes #2603

Co-Authored-By: Edward Kim <[email protected]>
@francesco-furlan francesco-furlan force-pushed the fix/shield-response-actions-rbac-leak branch from 5999276 to e1e84ac Compare April 29, 2026 08:05
@EdwardArchive
Contributor

Thank you for hard work @francesco-furlan !

@francesco-furlan
Contributor Author

> Thank you for hard work @francesco-furlan !

Thank you for raising the issue @EdwardArchive ! 🚀

@francesco-furlan francesco-furlan merged commit 9694ac9 into main Apr 29, 2026
6 checks passed
@francesco-furlan francesco-furlan deleted the fix/shield-response-actions-rbac-leak branch April 29, 2026 08:24
Development

Successfully merging this pull request may close these issues.

[BUG] cluster.rbac.create: false causes helm template to fail with SimpleHead unmarshal error in templates/cluster/clusterrole.yaml