fix(metrics): use sanitized labels for executor/application metric vecs by dineshkumar181094 · Pull Request #2931 · kubeflow/spark-operator

dineshkumar181094 · 2026-05-06T14:17:38Z

Purpose of this PR

SparkExecutorMetrics stored the unsanitized label list while registering the Prometheus vecs with the sanitized list, so any user label containing '-' resolved to an unknown key in getMetricLabels and silently failed GetMetricWith — the data point was dropped instead of recorded.
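
A minimal sketch of the executor-side mismatch described above, using only the standard library. Everything here is a hypothetical stand-in, not the operator's code: sanitizeLabel assumes the sanitizer prepends a prefix and replaces '-' with '_', fakeVec only mimics how a Prometheus vec rejects a label map whose keys differ from the registered names, and the team-name label is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabel is a hypothetical stand-in for the operator's sanitizer:
// it prepends prefix and replaces '-' with '_'.
func sanitizeLabel(prefix, name string) string {
	return strings.ReplaceAll(prefix+name, "-", "_")
}

// fakeVec is a toy model of a metric vec. Like GetMetricWith, incWith
// returns an error when the label keys differ from the registered names.
type fakeVec struct {
	names  []string
	counts map[string]int
}

func newFakeVec(names []string) *fakeVec {
	return &fakeVec{names: names, counts: map[string]int{}}
}

func (v *fakeVec) incWith(labels map[string]string) error {
	if len(labels) != len(v.names) {
		return fmt.Errorf("expected %d labels, got %d", len(v.names), len(labels))
	}
	values := make([]string, 0, len(v.names))
	for _, n := range v.names {
		val, ok := labels[n]
		if !ok {
			return fmt.Errorf("label name %q missing", n)
		}
		values = append(values, val)
	}
	v.counts[strings.Join(values, ",")]++
	return nil
}

// metricLabels sanitizes the pod label keys, then looks up each stored
// label name; names that are not found map to "Unknown".
func metricLabels(stored []string, podLabels map[string]string) map[string]string {
	valid := make(map[string]string, len(podLabels))
	for k, v := range podLabels {
		valid[sanitizeLabel("", k)] = v
	}
	out := make(map[string]string, len(stored))
	for _, name := range stored {
		if v, ok := valid[name]; ok {
			out[name] = v
		} else {
			out[name] = "Unknown"
		}
	}
	return out
}

// observe reports whether an observation would be recorded when the struct
// keeps `stored` but the vec was registered with `registered`.
func observe(stored, registered []string, podLabels map[string]string) bool {
	vec := newFakeVec(registered)
	return vec.incWith(metricLabels(stored, podLabels)) == nil
}

func main() {
	raw := []string{"team-name"}
	sanitized := []string{sanitizeLabel("", "team-name")}
	pod := map[string]string{"team-name": "data"}

	fmt.Println("stored raw:", observe(raw, sanitized, pod))             // stored raw: false
	fmt.Println("stored sanitized:", observe(sanitized, sanitized, pod)) // stored sanitized: true
}
```

With the raw list stored, the lookup builds the key team-name, which the vec never registered, so the observation is rejected. Storing the sanitized list makes the lookup and the registration agree.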

SparkApplicationMetrics had the symmetric bug in getMetricLabels: it re-applied the metric prefix when sanitizing pod label keys while the constructor sanitized with an empty prefix, so when prefix was non-empty every label collapsed to "Unknown".
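
The prefix case can be sketched the same way. This is an illustration under assumptions, not the operator's implementation: sanitizeLabel is a hypothetical sanitizer that prepends the prefix and replaces '-' with '_', resolve is a simplified model of the lookup in getMetricLabels, and the prefix spark_operator_ and the label team-name are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabel is a hypothetical stand-in for the operator's sanitizer:
// it prepends prefix and replaces '-' with '_'.
func sanitizeLabel(prefix, name string) string {
	return strings.ReplaceAll(prefix+name, "-", "_")
}

// resolve sanitizes pod label keys with lookupPrefix and then looks up each
// stored label name; names that are not found map to "Unknown".
func resolve(stored []string, podLabels map[string]string, lookupPrefix string) map[string]string {
	valid := make(map[string]string, len(podLabels))
	for k, v := range podLabels {
		valid[sanitizeLabel(lookupPrefix, k)] = v
	}
	out := make(map[string]string, len(stored))
	for _, name := range stored {
		if v, ok := valid[name]; ok {
			out[name] = v
		} else {
			out[name] = "Unknown"
		}
	}
	return out
}

func main() {
	// Constructor-time sanitization uses an empty prefix.
	stored := []string{sanitizeLabel("", "team-name")}
	pod := map[string]string{"team-name": "data"}

	// Re-applying the prefix at observation time yields the key
	// "spark_operator_team_name", which never matches "team_name".
	fmt.Println(resolve(stored, pod, "spark_operator_")) // map[team_name:Unknown]

	// Sanitizing with an empty prefix matches the stored name.
	fmt.Println(resolve(stored, pod, "")) // map[team_name:data]
}
```

With an empty metric prefix the two sanitization calls coincide, which is consistent with the description that the collapse to "Unknown" only appears when the prefix is non-empty.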

Align both paths so labels are sanitized once with an empty prefix and the stored label set matches the keys produced at observation time.

Proposed changes:

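Store the sanitized label list in SparkExecutorMetrics, so the keys passed to GetMetricWith match the label names the vecs were registered with.
Sanitize pod label keys in SparkApplicationMetrics.getMetricLabels with an empty prefix, matching the sanitization done in the constructor.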

Change Category

Bugfix (non-breaking change which fixes an issue)
Feature (non-breaking change which adds functionality)
Breaking change (fix or feature that could affect existing functionality)
Documentation update

Rationale

Checklist

I have conducted a self-review of my own code.
I have updated documentation accordingly.
I have added tests that prove my changes are effective or that my feature works.
Existing unit tests pass locally with my changes.

Additional Notes

google-oss-prow · 2026-05-06T14:17:46Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

Fixes Prometheus metric label sanitization mismatches in Spark executor/application metrics so that label keys used during observation match the label names used when registering the corresponding *Vec metrics (preventing dropped datapoints when user-provided labels include - and when a metrics prefix is configured).

Changes:

Store the sanitized label list inside SparkExecutorMetrics so GetMetricWith uses the same label keys as the registered *Vecs.
Sanitize SparkApplication label keys without re-applying the metrics prefix in getMetricLabels, aligning it with constructor-time sanitization.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
internal/metrics/sparkpod_metrics.go	Aligns stored executor label names with the sanitized label names used to register executor metric vecs.
internal/metrics/sparkapplication_metrics.go	Ensures SparkApplication label sanitization at observation time matches constructor-time sanitization (no prefix re-application).

SparkExecutorMetrics stored the unsanitized label list while registering the Prometheus vecs with the sanitized list, so any user label containing '-' resolved to an unknown key in getMetricLabels and silently failed GetMetricWith; the data point was dropped instead of recorded.

SparkApplicationMetrics had the symmetric bug in getMetricLabels: it re-applied the metric prefix when sanitizing pod label keys while the constructor sanitized with an empty prefix, so when prefix was non-empty every label collapsed to "Unknown".

Align both paths so labels are sanitized once with an empty prefix and the stored label set matches the keys produced at observation time.

Signed-off-by: dineshkumar181094 <dineshkumar181094@gmail.com>

vjanelle

LGTM

google-oss-prow · 2026-05-06T14:32:40Z

@vjanelle: changing LGTM is restricted to collaborators

Details

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

nabuskey

Great catch. Thank you!

Copilot AI review requested due to automatic review settings May 6, 2026 14:17

google-oss-prow Bot requested review from ImpSy and nabuskey May 6, 2026 14:17

google-oss-prow Bot added the size/XS label May 6, 2026

Copilot started reviewing on behalf of dineshkumar181094 May 6, 2026 14:18

Copilot AI reviewed May 6, 2026

dineshkumar181094 force-pushed the fix/metrics-invalid-prometheus-labels branch from 6aeffde to 806e251 Compare May 6, 2026 14:25

vjanelle approved these changes May 6, 2026

nabuskey approved these changes May 8, 2026

google-oss-prow Bot assigned nabuskey May 8, 2026

google-oss-prow Bot added the lgtm label May 8, 2026

dineshkumar181094 wants to merge 1 commit into kubeflow:master from dineshkumar181094:fix/metrics-invalid-prometheus-labels
