fix(metrics): use sanitized labels for executor/application metric vecs #2931
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the changed files.
Pull request overview
Fixes Prometheus metric label sanitization mismatches in the Spark executor and application metrics, so that the label keys used at observation time match the label names used to register the corresponding *Vec metrics. This prevents dropped data points when user-provided labels contain '-' or when a metrics prefix is configured.
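To see why a mismatched key drops the data point instead of recording it, here is a minimal stdlib-only sketch. `toyVec` and `observe` are hypothetical stand-ins for a Prometheus `*Vec` and its `GetMetricWith`; they model only the behavior that matters here: the label map's keys must exactly match the names the vec was registered with, otherwise an error is returned and nothing is recorded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// toyVec is a hypothetical, simplified stand-in for a Prometheus *Vec.
// Like GetMetricWith, it rejects a label map whose keys are inconsistent
// with the label names the vec was registered with.
type toyVec struct {
	labelNames []string
	counts     map[string]int
}

func newToyVec(labelNames ...string) *toyVec {
	return &toyVec{labelNames: labelNames, counts: map[string]int{}}
}

// observe records one data point, or returns an error (and records
// nothing) if the label keys do not match the registered names.
func (v *toyVec) observe(labels map[string]string) error {
	if len(labels) != len(v.labelNames) {
		return fmt.Errorf("expected %d labels, got %d", len(v.labelNames), len(labels))
	}
	values := make([]string, 0, len(v.labelNames))
	for _, name := range v.labelNames {
		val, ok := labels[name]
		if !ok {
			return errors.New("missing label " + name)
		}
		values = append(values, val)
	}
	v.counts[strings.Join(values, ",")]++
	return nil
}

func main() {
	// Registered with the sanitized name.
	vec := newToyVec("app_name")
	// The raw key "app-name" does not match, so this point is dropped.
	fmt.Println(vec.observe(map[string]string{"app-name": "pi"})) // missing label app_name
	// The sanitized key matches and the point is recorded.
	fmt.Println(vec.observe(map[string]string{"app_name": "pi"})) // <nil>
}
```

If the caller ignores or only logs the error, the failure is silent, which matches the symptom described in this PR.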
Changes:
- Store the sanitized label list inside SparkExecutorMetrics so GetMetricWith uses the same label keys as the registered *Vecs.
- Sanitize SparkApplication label keys without re-applying the metrics prefix in getMetricLabels, aligning it with constructor-time sanitization.
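A minimal sketch of the executor-side fix, under simplifying assumptions. `sanitize`, `executorMetrics`, `newExecutorMetrics`, and `metricLabels` are hypothetical names, not the repository's actual code, and this `sanitize` only prepends a prefix and replaces characters outside `[A-Za-z0-9_]` with `_`. The point it illustrates is that the constructor stores the same sanitized list it registers the vecs with, so the keys built at observation time line up.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize is a hypothetical label sanitizer: it prepends prefix and
// replaces every character that is not a letter, digit, or '_' with '_'.
func sanitize(prefix, name string) string {
	return strings.Map(func(r rune) rune {
		if r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			return r
		}
		return '_'
	}, prefix+name)
}

// executorMetrics is a hypothetical, simplified view of the executor
// metrics struct: labels holds the names the *Vecs are registered with.
type executorMetrics struct {
	labels []string
}

// newExecutorMetrics sanitizes user labels once, with an empty prefix,
// and stores that sanitized list. Storing userLabels instead is the bug.
func newExecutorMetrics(userLabels []string) *executorMetrics {
	valid := make([]string, len(userLabels))
	for i, l := range userLabels {
		valid[i] = sanitize("", l)
	}
	// The *Vecs would be registered with valid at this point.
	return &executorMetrics{labels: valid}
}

// metricLabels builds the map passed to GetMetricWith. Pod label keys are
// sanitized the same way, so they line up with m.labels; labels the pod
// does not carry fall back to "Unknown".
func (m *executorMetrics) metricLabels(podLabels map[string]string) map[string]string {
	validPod := make(map[string]string, len(podLabels))
	for k, v := range podLabels {
		validPod[sanitize("", k)] = v
	}
	out := make(map[string]string, len(m.labels))
	for _, l := range m.labels {
		if v, ok := validPod[l]; ok {
			out[l] = v
		} else {
			out[l] = "Unknown"
		}
	}
	return out
}

func main() {
	m := newExecutorMetrics([]string{"app-name"})
	fmt.Println(m.labels)                                            // [app_name]
	fmt.Println(m.metricLabels(map[string]string{"app-name": "pi"})) // map[app_name:pi]
}
```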
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/metrics/sparkpod_metrics.go | Aligns stored executor label names with the sanitized label names used to register executor metric vecs. |
| internal/metrics/sparkapplication_metrics.go | Ensures SparkApplication label sanitization at observation time matches constructor-time sanitization (no prefix re-application). |
SparkExecutorMetrics stored the unsanitized label list while registering the Prometheus vecs with the sanitized list, so any user label containing '-' resolved to an unknown key in getMetricLabels and GetMetricWith silently failed; the data point was dropped instead of recorded.

SparkApplicationMetrics had the symmetric bug in getMetricLabels: it re-applied the metric prefix when sanitizing pod label keys, while the constructor sanitized with an empty prefix, so whenever the prefix was non-empty every label collapsed to "Unknown".

Align both paths so labels are sanitized once with an empty prefix and the stored label set matches the keys produced at observation time.

Signed-off-by: dineshkumar181094 <dineshkumar181094@gmail.com>
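The application-side prefix bug can be sketched in a similar way. `sanitize` and `resolveLabels` are hypothetical helpers rather than the repository's code, and `spark_` is an arbitrary example prefix. Sanitizing the app's label keys with the prefix produces `spark_app_name`, which never matches the constructor's `app_name`, so every lookup falls back to "Unknown"; sanitizing with an empty prefix on both sides makes them match.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize is a hypothetical label sanitizer: it prepends prefix and
// replaces every character that is not a letter, digit, or '_' with '_'.
func sanitize(prefix, name string) string {
	return strings.Map(func(r rune) rune {
		if r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			return r
		}
		return '_'
	}, prefix+name)
}

// resolveLabels looks up each registered (constructor-sanitized) label
// name among the app's labels, whose keys are sanitized with keyPrefix.
// keyPrefix == metrics prefix models the bug; keyPrefix == "" the fix.
func resolveLabels(registered []string, labels map[string]string, keyPrefix string) map[string]string {
	valid := make(map[string]string, len(labels))
	for k, v := range labels {
		valid[sanitize(keyPrefix, k)] = v
	}
	out := make(map[string]string, len(registered))
	for _, l := range registered {
		if v, ok := valid[l]; ok {
			out[l] = v
		} else {
			out[l] = "Unknown"
		}
	}
	return out
}

func main() {
	const metricsPrefix = "spark_" // hypothetical configured prefix
	// Constructor-time sanitization uses an empty prefix.
	registered := []string{sanitize("", "app-name")}
	labels := map[string]string{"app-name": "pi"}
	fmt.Println(resolveLabels(registered, labels, metricsPrefix)) // map[app_name:Unknown]
	fmt.Println(resolveLabels(registered, labels, ""))            // map[app_name:pi]
}
```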
The branch was force-pushed from 6aeffde to 806e251.
@vjanelle: changing LGTM is restricted to collaborators.
nabuskey left a comment
Great catch. Thank you!
Purpose of this PR
Proposed changes:
Change Category
Rationale
Checklist
Additional Notes