
client/v3: warn when watcher stream buffer is backlogged#21704

Open
xigang wants to merge 1 commit into etcd-io:main from xigang:watchstream_buf

Conversation


@xigang xigang commented May 4, 2026

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xigang
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @xigang. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius
Member

/cc @richabanker

@k8s-ci-robot

@serathius: GitHub didn't allow me to request PR reviews from the following users: richabanker.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @richabanker

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius
Member

This looks similar to the K8s logic for detecting buffer growth. Let's take learnings from it: kubernetes/kubernetes#138126

@serathius
Member

/cc @mborsz

@k8s-ci-robot

@serathius: GitHub didn't allow me to request PR reviews from the following users: mborsz.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @mborsz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread on client/v3/watch.go (Outdated)
@serathius
Member

/ok-to-test

@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 92.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.35%. Comparing base (a3ad218) to head (be24a63).

Files with missing lines     Patch %   Lines
client/v3/block_logger.go    91.66%    1 Missing and 1 partial ⚠️
client/v3/watch.go           92.30%    1 Missing and 1 partial ⚠️

Additional details and impacted files:

Files with missing lines     Coverage Δ
client/v3/block_logger.go    91.66% <91.66%> (ø)
client/v3/watch.go           94.10% <92.30%> (-0.10%) ⬇️

... and 19 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21704      +/-   ##
==========================================
+ Coverage   70.22%   70.35%   +0.12%     
==========================================
  Files         426      427       +1     
  Lines       35211    35261      +50     
==========================================
+ Hits        24727    24807      +80     
+ Misses       9090     9059      -31     
- Partials     1394     1395       +1     

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a3ad218...be24a63. Read the comment docs.


@xigang xigang force-pushed the watchstream_buf branch from 2bc61ec to e2661b1 Compare May 5, 2026 11:01
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels May 5, 2026
@xigang
Author

xigang commented May 6, 2026

@richabanker PTAL. Thanks.

Comment thread on client/v3/watch.go (Outdated)
zap.String("range-end", ws.initReq.end),
zap.Int("buffered-responses", bufLen),
)
ws.nextBufWarnLen *= 2
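
For context, a minimal self-contained sketch of the size-doubling scheme in the snippet above. The watcherStream wrapper and maybeWarn helper here are illustrative stand-ins, not the PR's actual code; only the nextBufWarnLen field mirrors the diff:

```go
package main

import "fmt"

// watcherStream is an illustrative stand-in for the client's stream state;
// only nextBufWarnLen mirrors the field in the diff above.
type watcherStream struct {
	nextBufWarnLen int
}

// maybeWarn logs once each time the buffer length crosses the current
// threshold, then doubles the threshold, so warnings fire at 1024, 2048,
// 4096, and so on.
func (ws *watcherStream) maybeWarn(bufLen int) bool {
	if bufLen < ws.nextBufWarnLen {
		return false
	}
	fmt.Printf("watch buffer backlogged: buffered-responses=%d\n", bufLen)
	ws.nextBufWarnLen *= 2
	return true
}

func main() {
	ws := &watcherStream{nextBufWarnLen: 1024}
	for _, n := range []int{500, 1100, 1500, 2100} {
		ws.maybeWarn(n)
	}
	// Warns at 1100 and 2100; a buffer hovering at 1500 never warns again,
	// which is exactly the gap the review comment below points out.
}
```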

So this would only log when the buffer hits 1024, 2048, 4096, etc. Which is great for detecting growth of the buffer, but if the consumer is slow for an extended period of time and say the buffer hovers at 1500 for an hour, it would only log once at the very beginning?

What about doing some time-interval-based logging instead of this size-based logging (perhaps that's what @serathius meant by this comment)? Basically, log at most once per interval (say, every 1 minute) as long as the buffer is above the threshold (keeping the threshold static).

Author


Thank you. The current implementation has this issue.

I checked the PR @serathius mentioned, and I’ll adjust this to use time-based logging with a static threshold instead.

Author


Thanks, that makes sense. I changed this to time-based rate-limited logging, so it logs at most once per interval while buffered delivery is blocked.

@xigang xigang force-pushed the watchstream_buf branch 2 times, most recently from 38bc8e0 to 768cfb5 Compare May 12, 2026 06:08
@xigang xigang force-pushed the watchstream_buf branch from 768cfb5 to be24a63 Compare May 12, 2026 06:48
@serathius
Copy link
Copy Markdown
Member

/retest

@serathius
Copy link
Copy Markdown
Member

@mborsz could you take a look to see whether this implementation matches the same requirements you proposed in kubernetes/kubernetes#138126?


Development

Successfully merging this pull request may close these issues.

watcherStream.buf in apiserver can grow indefinitely

4 participants