fix(metrics): use sanitized labels for executor/application metric vecs #2931

Open
dineshkumar181094 wants to merge 1 commit into kubeflow:master from dineshkumar181094:fix/metrics-invalid-prometheus-labels

@dineshkumar181094

Purpose of this PR

SparkExecutorMetrics stored the unsanitized label list but registered the Prometheus vecs with the sanitized list. Any user label containing '-' therefore resolved to an unknown key in getMetricLabels, and GetMetricWith failed silently: the data point was dropped instead of recorded.
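
The executor mismatch can be reproduced with a small standard-library model. This is only a sketch under stated assumptions, not the operator's code: `sanitize`, `fakeVec`, `metricLabels`, and `sanitizeAll` are hypothetical names, `fakeVec` imitates just the one rule that a Prometheus vec rejects label keys it was not registered with, and the `app-name` label is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize is a hypothetical stand-in for the operator's label sanitizer.
// It only models the '-' to '_' replacement mentioned in this PR.
func sanitize(name string) string {
	return strings.ReplaceAll(name, "-", "_")
}

// fakeVec models one property of a Prometheus *Vec: a lookup fails unless
// the supplied label keys are exactly the registered label names.
type fakeVec struct {
	names map[string]bool
}

func newFakeVec(names []string) *fakeVec {
	v := &fakeVec{names: make(map[string]bool)}
	for _, n := range names {
		v.names[n] = true
	}
	return v
}

func (v *fakeVec) getMetricWith(labels map[string]string) error {
	if len(labels) != len(v.names) {
		return fmt.Errorf("expected %d labels, got %d", len(v.names), len(labels))
	}
	for k := range labels {
		if !v.names[k] {
			return fmt.Errorf("unknown label name %q", k)
		}
	}
	return nil
}

// metricLabels builds the observation-time label map: one entry per stored
// label name, filled from the sanitized pod labels or set to "Unknown".
func metricLabels(stored []string, podLabels map[string]string) map[string]string {
	valid := make(map[string]string)
	for k, v := range podLabels {
		valid[sanitize(k)] = v
	}
	out := make(map[string]string)
	for _, name := range stored {
		if v, ok := valid[name]; ok {
			out[name] = v
		} else {
			out[name] = "Unknown"
		}
	}
	return out
}

// sanitizeAll applies sanitize to every configured label name.
func sanitizeAll(names []string) []string {
	out := make([]string, len(names))
	for i, n := range names {
		out[i] = sanitize(n)
	}
	return out
}

func main() {
	userLabels := []string{"app-name"} // hypothetical user-configured label
	podLabels := map[string]string{"app-name": "etl"}
	vec := newFakeVec(sanitizeAll(userLabels)) // registered as "app_name"

	// Before: the raw list is stored, so the key "app-name" is rejected.
	fmt.Println("before:", vec.getMetricWith(metricLabels(userLabels, podLabels)))
	// After: the sanitized list is stored, so the lookup succeeds.
	fmt.Println("after: ", vec.getMetricWith(metricLabels(sanitizeAll(userLabels), podLabels)))
}
```

In this model the only difference between the two calls is which label list is stored, which is the single change the PR describes for the executor metrics.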

SparkApplicationMetrics had the mirror-image bug in getMetricLabels: it re-applied the metric prefix when sanitizing pod label keys, whereas the constructor sanitized with an empty prefix. With a non-empty prefix, every label collapsed to "Unknown".
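
The prefix mismatch can be sketched the same way. Everything here is an assumption made for illustration: `sanitizeWithPrefix` and `appMetricLabels` are hypothetical names, the sanitizer is assumed to prepend the prefix before replacing '-', and the `spark_` prefix and `team-id` label are invented values.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeWithPrefix is a hypothetical stand-in for a sanitizer that
// prepends a prefix and then replaces '-' with '_'.
func sanitizeWithPrefix(prefix, name string) string {
	return strings.ReplaceAll(prefix+name, "-", "_")
}

// appMetricLabels models observation time. keyPrefix is the prefix applied
// to pod label keys; stored holds names sanitized with an empty prefix.
func appMetricLabels(keyPrefix string, stored []string, podLabels map[string]string) map[string]string {
	valid := make(map[string]string)
	for k, v := range podLabels {
		valid[sanitizeWithPrefix(keyPrefix, k)] = v
	}
	out := make(map[string]string)
	for _, name := range stored {
		if v, ok := valid[name]; ok {
			out[name] = v
		} else {
			out[name] = "Unknown"
		}
	}
	return out
}

func main() {
	// Constructor-time sanitization with an empty prefix yields "team_id".
	stored := []string{sanitizeWithPrefix("", "team-id")}
	podLabels := map[string]string{"team-id": "data"}

	// Before: pod keys become "spark_team_id" and never match "team_id".
	fmt.Println(appMetricLabels("spark_", stored, podLabels)) // map[team_id:Unknown]
	// After: pod keys are sanitized with an empty prefix and match.
	fmt.Println(appMetricLabels("", stored, podLabels)) // map[team_id:data]
}
```

Passing the prefix as a parameter keeps the before and after cases side by side; the PR itself describes simply sanitizing with an empty prefix at observation time.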

Align both paths so labels are sanitized once with an empty prefix and the stored label set matches the keys produced at observation time.

Proposed changes:

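  • Store the sanitized label list in SparkExecutorMetrics so that it matches the registered vecs.
  • Sanitize pod label keys with an empty prefix in SparkApplicationMetrics.getMetricLabels.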

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

Copilot AI review requested due to automatic review settings May 6, 2026 14:17
@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment

Pull request overview

Fixes Prometheus label sanitization mismatches in the Spark executor and application metrics so that the label keys used during observation match the label names used when registering the corresponding *Vec metrics. This prevents dropped data points both when user-provided labels include '-' and when a metrics prefix is configured.

Changes:

  • Store the sanitized label list inside SparkExecutorMetrics so GetMetricWith uses the same label keys as the registered *Vecs.
  • Sanitize SparkApplication label keys without re-applying the metrics prefix in getMetricLabels, aligning it with constructor-time sanitization.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| internal/metrics/sparkpod_metrics.go | Aligns stored executor label names with the sanitized label names used to register executor metric vecs. |
| internal/metrics/sparkapplication_metrics.go | Ensures SparkApplication label sanitization at observation time matches constructor-time sanitization (no prefix re-application). |

Signed-off-by: dineshkumar181094 <dineshkumar181094@gmail.com>
@dineshkumar181094 force-pushed the fix/metrics-invalid-prometheus-labels branch from 6aeffde to 806e251 on May 6, 2026 14:25

@vjanelle vjanelle left a comment

LGTM

@google-oss-prow
Contributor

@vjanelle: changing LGTM is restricted to collaborators

Details

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@nabuskey nabuskey left a comment

Great catch. Thank you!

4 participants