feat(inkless): extended metrics for diskless migration states tracking #589
Draft
giuseppelillo wants to merge 1 commit into
71a7db1 to 25c0544
Add gauges for:
- number of migrations in flight
- partitions in intermediate migration states
- oldest age per state

Add meters for completed, failed and retried migrations.
25c0544 to 931671f
Pull request overview
This PR adds expanded Yammer/JMX metrics to track classic→diskless migration progress in InitDisklessLogManager, including per-state in-flight gauges, oldest-age-per-state gauges, and completed/failed/retried meters. It also updates the batch queues and tests to validate the new metrics and to avoid cross-test metric leakage.
Changes:
- Added `InitDisklessLogManager` metrics (per-state count + oldest-age gauges; completed/failed/retried meters) and wired metric refreshes into state transitions.
- Added an `onRetry` callback to `RetriableInitDisklessLogBatchQueue` so retries can be metered.
- Updated unit/flow tests to assert metric behavior and clean up metrics between test runs; plumbed `Time` into `InitDisklessLogManager`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| core/src/main/scala/kafka/server/InitDisklessLogManager.scala | Adds migration metrics, refresh hooks, retry/failure/completion metering, and a removeMetrics() helper. |
| core/src/main/scala/kafka/server/InitDisklessLogBatchQueue.scala | Adds onRetry callback and invokes it when a retry is enqueued. |
| core/src/main/scala/kafka/server/BrokerServer.scala | Passes time into InitDisklessLogManager construction. |
| core/src/test/scala/unit/kafka/server/InitDisklessLogManagerTest.scala | Ensures metrics are removed after each test; updates manager construction to pass time. |
| core/src/test/scala/unit/kafka/server/metadata/InitDisklessLogFlowTest.scala | Adds an end-to-end test asserting per-state gauges/ages/meters; cleans up metrics on shutdown. |
Comment on lines +275 to +278

    metricsGroup.newGauge(MigrationsInFlightMetricName, () => trackedSize())
    metricsGroup.newGauge(WaitingForReplicationCountMetricName, () => enteredAtByTp.values().asScala.count(_.stateClass eq classOf[WaitingForReplication]))
    metricsGroup.newGauge(SendingToControllerCountMetricName, () => enteredAtByTp.values().asScala.count(_.stateClass eq classOf[SendingToController]))
    metricsGroup.newGauge(AwaitingMetadataCountMetricName, () => enteredAtByTp.values().asScala.count(_.stateClass eq classOf[AwaitingMetadata]))
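
For readers without the full diff, here is a minimal, self-contained sketch of how per-state count gauges like these could be computed. Only `enteredAtByTp` and `stateClass` come from the snippet above; the `Entry` case class, the empty state classes, the `String` partition key, and the `countInState` helper are hypothetical stand-ins, and the real types in `InitDisklessLogManager` may differ.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object PerStateCountSketch {
  // Hypothetical stand-ins for the migration state classes named in the diff.
  sealed trait MigrationState
  final class WaitingForReplication extends MigrationState
  final class SendingToController extends MigrationState
  final class AwaitingMetadata extends MigrationState

  // Assumed shape of a tracked entry: the state class plus the time it was entered.
  final case class Entry(stateClass: Class[_ <: MigrationState], enteredAtMs: Long)

  // Partition key simplified to a String for illustration.
  val enteredAtByTp = new ConcurrentHashMap[String, Entry]()

  // Counts tracked partitions whose current state class is exactly `cls`.
  // Each call scans the whole map, so the cost is linear in tracked partitions.
  def countInState(cls: Class[_ <: MigrationState]): Int =
    enteredAtByTp.values().asScala.count(_.stateClass eq cls)
}
```

Under these assumptions, a gauge supplier would simply be `() => countInState(classOf[WaitingForReplication])`. Because every gauge read is a full scan, three per-state gauges mean three scans per reporting interval; that may matter if many partitions are tracked.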
Comment on lines +280 to +282

    metricsGroup.newGauge(OldestWaitingForReplicationAgeMsMetricName, () => oldestAgeMs(classOf[WaitingForReplication]))
    metricsGroup.newGauge(OldestSendingToControllerAgeMsMetricName, () => oldestAgeMs(classOf[SendingToController]))
    metricsGroup.newGauge(OldestAwaitingMetadataAgeMsMetricName, () => oldestAgeMs(classOf[AwaitingMetadata]))
      }
    }

    def removeMetrics(): Unit = metrics.removeMetrics()
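
The body of `oldestAgeMs` is not shown in this excerpt. One plausible shape, sketched under assumptions: entries record the timestamp at which a partition entered its current state, the current time is passed in explicitly (the PR plumbs `Time` into the manager instead), and the gauge reports 0 when no partition is in the requested state. `Entry`, the state classes, and the three-argument signature are hypothetical.

```scala
object OldestAgeSketch {
  // Hypothetical stand-ins for two of the migration state classes.
  sealed trait MigrationState
  final class WaitingForReplication extends MigrationState
  final class SendingToController extends MigrationState

  // Assumed entry shape: state class plus the time the state was entered.
  final case class Entry(stateClass: Class[_ <: MigrationState], enteredAtMs: Long)

  // Age in milliseconds of the longest-lived entry in state `cls`.
  // Returns 0 when no entry is in that state (an assumption, not confirmed by the PR)
  // and clamps negative ages that a non-monotonic clock could produce.
  def oldestAgeMs(entries: Iterable[Entry], cls: Class[_ <: MigrationState], nowMs: Long): Long =
    entries.iterator
      .filter(_.stateClass eq cls)
      .map(_.enteredAtMs)
      .minOption
      .fold(0L)(oldest => math.max(0L, nowMs - oldest))
}
```

Passing the clock value in keeps the function deterministic, which is presumably also why the PR threads `Time` through the constructor: tests can advance a mock clock and assert exact ages.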
Comment on lines +231 to +241

    withQueueLock {
      val retryAttemptNumber = attempt.attemptNumber + 1
      Option(queuedByTp.get(tp)) match {
        case Some(existing) =>
          // Keep the already queued state (it may be fresher), but ensure retry progression is not lost.
          queuedByTp.put(tp, Attempt(existing.state, Math.max(existing.attemptNumber, retryAttemptNumber)))
        case None =>
          queuedByTp.put(tp, Attempt(attempt.state, retryAttemptNumber))
      }
    }
    onRetry()
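
The merge rule in this hunk can be exercised in isolation. The sketch below is a simplified model, not the PR's `RetriableInitDisklessLogBatchQueue`: `RetryQueue`, `enqueueRetry`, and `queued` are hypothetical names, states are plain strings, and a simple monitor stands in for `withQueueLock`. As in the snippet, the callback is invoked after the lock is released, and it fires once per retry even when the retry is merged into an already queued entry.

```scala
import scala.collection.mutable

object RetryMergeSketch {
  // Simplified attempt: the state is a String here, not the PR's state type.
  final case class Attempt(state: String, attemptNumber: Int)

  // Hypothetical queue; onRetry models the callback used to mark a retry meter.
  final class RetryQueue(onRetry: () => Unit) {
    private val lock = new Object
    private val queuedByTp = mutable.Map.empty[String, Attempt]

    def enqueueRetry(tp: String, attempt: Attempt): Unit = {
      lock.synchronized {
        val retryAttemptNumber = attempt.attemptNumber + 1
        val merged = queuedByTp.get(tp) match {
          // Keep the queued state, but never let the attempt number go backwards.
          case Some(existing) =>
            Attempt(existing.state, math.max(existing.attemptNumber, retryAttemptNumber))
          case None =>
            Attempt(attempt.state, retryAttemptNumber)
        }
        queuedByTp.put(tp, merged)
      }
      // Called outside the lock so a slow callback cannot block other producers.
      onRetry()
    }

    def queued(tp: String): Option[Attempt] = lock.synchronized(queuedByTp.get(tp))
  }
}
```

One consequence worth noting for the meter: under this rule the retried count can exceed the number of entries that actually end up in the queue, since merged retries are still counted.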
Add the following metrics:
- `ClassicToDisklessMigrationsInFlight`
- `ClassicToDisklessMigrationsWaitingForReplicationCount`
- `ClassicToDisklessMigrationsSendingToControllerCount`
- `ClassicToDisklessMigrationsAwaitingMetadataCount`
- `ClassicToDisklessMigrationOldestWaitingForReplicationAgeMs`
- `ClassicToDisklessMigrationOldestSendingToControllerAgeMs`
- `ClassicToDisklessMigrationOldestAwaitingMetadataAgeMs`
- `ClassicToDisklessMigrationsCompletedPerSec`
- `ClassicToDisklessMigrationsFailedPerSec`
- `ClassicToDisklessMigrationsRetriedPerSec`