
Update Helm release netdata to v3.7.164#2136

Open
renovate[bot] wants to merge 2 commits into main from renovate/netdata-3.x

Conversation


@renovate renovate Bot commented May 4, 2026

This PR contains the following updates:

Package            Update   Change               Pending
netdata (source)   patch    3.7.163 → 3.7.164    3.7.165

Release Notes

netdata/helmchart (netdata)

v3.7.164

Compare Source

Real-time performance monitoring, done right!


Configuration

📅 Schedule: (in timezone America/New_York)

  • Branch creation
    • "after 2am and before 8am on monday"
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate Bot added the renovate label May 4, 2026
@renovate renovate Bot requested a review from claytono as a code owner May 4, 2026 06:14

github-actions Bot commented May 4, 2026

netdata (helm) 3.7.163 -> 3.7.164

Risk: 🔴 Risk

The Deep Dive

Update Scope

Helm chart netdata/netdata 3.7.163 → 3.7.164. The chart bump itself is a single commit (Update agent version to v2.10.2 (#529)) with no template changes. The substantive change is the bundled netdata agent image: netdata/netdata:v2.9.0 → netdata/netdata:v2.10.2 (visible at child/daemonset.yaml:54, parent/deployment.yaml, and k8s-state/deployment.yaml). That is a minor agent bump that crosses v2.10.0, v2.10.1, and v2.10.2. The kustomize overlay does NOT pin the image (kubernetes/netdata/kustomization.yaml has no images: entry), so the agent really is upgrading from v2.9.0 to v2.10.2 on the parent, the child DaemonSet, and the k8s-state Deployment.
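Because the overlay does not pin the image, the agent tag floats with whatever the chart bundles. If decoupling were desired (for example, taking the chart bump while holding the agent at v2.9.0 until v2.10.3 is available), a kustomize images: transformer would do it. This is a sketch only; the overlay path is the one cited above and the pin value is illustrative:

```yaml
# Hypothetical addition to kubernetes/netdata/kustomization.yaml.
# Pins the bundled agent image independently of the chart version,
# so a chart bump no longer implicitly changes the agent.
images:
  - name: netdata/netdata
    newTag: v2.9.0   # hold here until chart 3.7.165 (agent v2.10.3) lands
```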

Performance & Stability

  • From v2.10.0: faster agent startup, ML prediction optimization, Alerts API speedup, multiple crash and race fixes, plus an eBPF memory/PID indexing optimization (#22050) that (see Hazards) actually introduced the regression flagged below.
  • From v2.10.1: SNMP collector retries-serialization fix (#22179) and a buffered dyncfg command channel that reduces transient 503s on dynamic enable/disable (#22183).
  • From v2.10.2: diskspace.plugin ZFS NULL-guard / lightweight pool-capacity cache (#22188), not relevant here since the homelab is on Synology iSCSI + NFS, not ZFS; SNMP MaxOIDs default reduced 60 → 20 and IF-MIB 32-bit fallbacks removed (#22203), not relevant since no SNMP collector is configured; and a false-positive dyncfg "timed out" warning fix (#22201).

Features & UX

Most v2.10.0 features ride along but require explicit opt-in and are not configured here:

  • Secrets management (#21951, #22081, #22083; docs) — env/file/cmd resolvers + AWS/Azure/GCP/Vault secretstores. Not configured. Current values inject the streaming API key via an init container that templates ${NETDATA_STREAM_API_KEY} from a K8s Secret (values.yaml:42-67); this continues to work unchanged. To use the new feature you would put ${env:NETDATA_STREAM_API_KEY} directly in stream.conf and drop the init container, but there is no benefit — the existing setup already keeps the secret out of the configmap.
  • Nagios plugins collector (#21908, #22008; docs) and Azure Monitor collector (#21993, #22007, #22095; docs) — not relevant, no Nagios checks or Azure resources in this homelab.
  • Dell PowerStore (#21929) / PowerVault ME4/ME5 (#21936) collectors — not relevant, storage is Synology iSCSI + NFS.
  • vSphere datastore/cluster/resource-pool monitoring (#21924), MSSQL Always-On AG monitoring (#21927), and SNMP IPSec/VPN profiles for FortiGate, Juniper, MikroTik, Check Point (#21926) — none configured (only prometheus go.d module is enabled per values.yaml:78-93 and the child configmap; PVE exporter is the only prometheus job; no vSphere, MSSQL, or SNMP devices in the homelab).
  • Docker container-listing function (#21868): NOT enabled in this deployment. The go.d/docker module needs /var/run/docker.sock; the child DaemonSet's volumeMounts (helm/netdata/child/daemonset.yaml:97-129) include /host/proc, /host/sys, /host/etc/os-release, /host/var/log, and the netdata configs only, with no docker socket mount. Cluster nodes also use containerd (no docker daemon), so the listing function would have nothing to talk to even if the module loaded. The child go.d configmap explicitly enables prometheus (prometheus: yes) and disables pulsar (pulsar: no) (helm/netdata/child/configmap.yaml:15-18), with no docker: directive, leaving the docker module at its default (disabled) for this deployment.
  • Advanced alerting: recurrence rules for Alerts Silencing, evaluating alert definitions against historical data, and acknowledging in-flight alerts (introduced in v2.10.0; see the "Advanced Alerting" highlight in the v2.10.0 release notes; these are Netdata Cloud / dashboard-side features and do not appear individually in the agent contributions list). Not applicable on the child, where [health] enabled = no is set in the child configmap (helm/netdata/child/configmap.yaml:39-40); alerts run on the parent only, and most of these features are surfaced through Netdata Cloud, which is not used here (the parent is exposed via an Authentik-protected ingress on netdata.k.oneill.net).
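For reference, the secrets-resolver alternative mentioned in the first bullet would look roughly like the fragment below. This is a sketch, not a tested config: it assumes the key is exposed to the container as the NETDATA_STREAM_API_KEY environment variable (e.g. via env/envFrom from the same Kubernetes Secret the init container reads today), and the destination is the one already used by this deployment:

```ini
# stream.conf sketch using the v2.10.0 ${env:...} resolver instead of the
# init-container templating step. Assumes NETDATA_STREAM_API_KEY is set in
# the pod environment from the existing K8s Secret.
[stream]
    enabled = yes
    destination = netdata-stream.k.oneill.net:19999:SSL
    api key = ${env:NETDATA_STREAM_API_KEY}
```

As the review notes, this buys nothing over the current init-container approach, which already keeps the secret out of the configmap.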

Security

No CVEs introduced or resolved by this update. Searched GitHub Security Advisories for netdata/netdata; the only published advisories all affect netdata 1.x and were fixed before v2.0.0.

None apply to either the current v2.9.0 deployment or the proposed v2.10.2 — both are well above the highest patched range. Posture is unchanged by this PR.

Key Fixes

Most fixes do not apply to this deployment:

  • SNMP retries/MaxOIDs/IF-MIB fixes (#22179, #22203) — no SNMP collector configured (go.d.conf enables only prometheus).
  • diskspace.plugin ZFS crash guard (#22188) — no ZFS in use; storage is Synology iSCSI and NFS (values.yaml:4-6,11-16).
  • Dyncfg back-pressure / 503 fixes (#22183, #22201) — applies whenever you toggle collectors via the Netdata UI; mildly useful but not load-bearing for this deployment.
  • The eBPF memory optimization (#22050) bundled in v2.10.0 is what introduces the hazard below — it is not a fix from the user's perspective.

Newer Versions

Strongly consider waiting for chart 3.7.165 instead of merging this PR.

  • Chart netdata-3.7.165 was published 2026-04-28 and bumps the agent to v2.10.3 (#530).
  • Agent v2.10.3 ships #22232, which fixes a critical eBPF regression introduced in v2.10.0 (see Hazards).
  • Renovate's 14-day minimumReleaseAge means the 3.7.165 PR will land soon. Holding this PR until 3.7.165 is proposed avoids deploying a known-buggy intermediate. Alternatively, merging this and immediately taking the next Renovate PR also works; just don't sit on v2.10.2 for an extended period on the child DaemonSets.

Hazards & Risks

Critical regression in v2.10.0–v2.10.2 fixed in v2.10.3: #22232 — "ebpf.plugin: fix PID accounting shared-memory pool leak and 100% CPU spin". Root cause: the eBPF memory/PID indexing rewrite (#22050) merged 2026-03-29 (post-v2.9.0, in v2.10.0) added aggregation call sites that allocate slots in the per-PID shared-memory pool (/dev/shm/netdata_shm_integration_ebpf, 32,768 slots) without a matching reset path. Symptoms reported on the upstream PR: on a host with ~1,300 live PIDs and normal churn, the pool fills in ~15 hours, after which bpf_map_get_next_key() enters an infinite loop and one CPU core is pegged at 100%.
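The leak mechanism described above (slots allocated per PID with no matching reset path, so slots for exited PIDs are never reclaimed) can be illustrated with a small simulation. This is a sketch only, not netdata's actual code; the pool size is the 32,768 slots cited in #22232, while the churn rate per pass is a hypothetical number chosen for illustration:

```python
# Illustrative model of the #22232 leak pattern: a fixed-size per-PID slot
# pool where every new PID allocates a slot, but exited PIDs never release
# theirs. Once the pool exhausts, the real plugin's bpf_map_get_next_key()
# loop spins at 100% CPU.

POOL_SLOTS = 32_768  # shared-memory pool size cited in the upstream PR


def passes_until_exhaustion(live_pids: int, churn_per_pass: int) -> int:
    """Return how many churn passes it takes to exhaust the pool.

    Each pass, churn_per_pass PIDs exit and are replaced by new ones;
    new PIDs allocate fresh slots, old slots are never reset.
    """
    used = set(range(live_pids))  # slots held by the initial PID population
    next_pid = live_pids
    passes = 0
    while len(used) < POOL_SLOTS:
        passes += 1
        for _ in range(churn_per_pass):
            used.add(next_pid)  # new slot allocated, old one never freed
            next_pid += 1
    return passes


# With ~1,300 live PIDs and a hypothetical churn of 35 new PIDs per pass:
# (32768 - 1300) / 35 ≈ 899.1, so the pool exhausts on pass 900.
print(passes_until_exhaustion(1300, 35))
```

The point of the model is that exhaustion time scales with PID churn, not load, which matches the ~15-hour figure reported upstream for a host with ordinary process turnover.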

Why this affects this deployment:

  • The child DaemonSet (helm/netdata/child/daemonset.yaml:32-34,131-134) runs on every node with hostPID: true, hostIPC: true, hostNetwork: true, and securityContext.capabilities.add: [SYS_PTRACE, SYS_ADMIN] — exactly the privileges ebpf.plugin needs.
  • The child ConfigMap (helm/netdata/child/configmap.yaml) does not contain a [plugins] section disabling ebpf — only [health] enabled = no and [ml] enabled = no are set. With default plugin gating and the privileges above, ebpf.plugin will load.
  • Each child pod accumulates the leak independently. After ~15 hours per node, expect a stuck thread pegging a CPU until the pod is restarted.
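If this PR were merged before 3.7.165, the exposure could likely be closed by gating the plugin off in the child config until v2.10.3 arrives. A sketch of the netdata.conf fragment, using netdata's standard per-plugin gating (section/key names per upstream convention; untested against this chart's configmap layout):

```ini
# Sketch: add to the child netdata.conf, alongside the existing
# [health] / [ml] overrides, to keep ebpf.plugin from loading on v2.10.2.
[plugins]
    ebpf = no
```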

Other notable items (not blocking):

  • v2.10.0 deprecation notice: API v1 and v2 are deprecated; only v3 will be supported in the next major release. No action required for this minor bump — the parent ingress (netdata.k.oneill.net) and dashboard continue to work.
  • v2.10.0 SNMP profile additions (FortiGate, Juniper, MikroTik, Check Point IPSec/VPN) are new but unused — no SNMP jobs configured.
  • Chart label chart: netdata-3.7.163 → chart: netdata-3.7.164 updates on every rendered resource; purely cosmetic.

No effect on these features: Authentik SSO via the components/authentik overlay, External Secrets streaming TLS bundle (netdata-streaming-tls), Synology iSCSI/NFS PVCs, parent streaming over netdata-stream.k.oneill.net:19999:SSL, kubelet/kubeproxy go.d jobs on the child, prometheus PVE-exporter scrape on the parent. None of these were touched between v2.9.0 and v2.10.2.

Sources


🔴 Verdict: Risk

Marked renovate:risk because this chart "patch" actually carries a v2.9.0 → v2.10.2 minor agent bump that includes the eBPF.plugin shared-memory pool leak / 100% CPU spin regression introduced in v2.10.0 (#22050) and not fixed until v2.10.3 / chart 3.7.165. The deployment runs the privileged child DaemonSet on every node where ebpf.plugin will load by default, so each node would hit the leak after ~15 hours. Recommend waiting for the upcoming Renovate PR for chart 3.7.165 (v2.10.3) and merging that instead, or merging this and the 3.7.165 PR back-to-back rather than sitting on v2.10.2.

@renovate renovate Bot force-pushed the renovate/netdata-3.x branch 16 times, most recently from 15bfbfe to 9987978 Compare May 10, 2026 14:33
@renovate renovate Bot force-pushed the renovate/netdata-3.x branch from 5f797d6 to 5b5add2 Compare May 11, 2026 06:28