Avoid reusing db_stress listeners across open retries #14765
| Check | Count |
|---|---|
| concurrency-mt-unsafe | 1 |
| Total | 1 |
Details
db_stress_tool/db_stress_test_base.cc (1 warning(s))
db_stress_tool/db_stress_test_base.cc:149:7: warning: function is not thread safe [concurrency-mt-unsafe]
Codex Code Review (OBSOLETE: superseded by a newer AI review)
🟡 Auto-triggered after CI passed; reviewing commit 84045ee.
❌ Codex review failed before producing findings.
Generated by Codex CLI.
Claude Code Review (OBSOLETE: superseded by a newer AI review)
✅ Auto-triggered after CI passed; reviewing commit 84045ee.
Summary
Clean, well-motivated refactoring of db_stress listener initialization into a reusable helper, correctly addressing stale listener state across […] No high-severity findings.
Findings
🔴 HIGH
None.
🟡 MEDIUM
M1. […]
| Context | Affected? | Notes |
|---|---|---|
| MultiOpsTxnsStressTest | Yes, safe | RegisterAdditionalListeners() is virtual and overridden to add MultiOpsTxnsStressListener. The helper calls it correctly, so fresh listeners are created for that subclass too. |
| Transaction DB path | No | The retry loop with InitializeListenersForOpen is inside the !FLAGS_use_txn branch. Transaction DB opens don't use fault injection retries. |
| ReadOnly DB | No | ReadOnly opens go through the same while(true) loop and benefit from the fix. |
Positive Observations
- The fix correctly identifies the root cause: stale per-DBImpl listener state carried across open retries.
- Extracting InitializeListenersForOpen reduces code duplication and makes the retry path clearly match the initial path.
- The options_.listeners.clear() at the top of the helper ensures no accumulation of listeners across retries.
- Good commit message with detailed diagnosis of the original failure.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
- /claude-review [context]: Request a code review
- /claude-query <question>: Ask about the PR or codebase
Root cause: StressTest::Open() built DbStressListener once before its DB::Open() retry loop. With open fault injection, a DB::Open() attempt can create a DBImpl, schedule background compaction, and then fail during late open work such as persisting OPTIONS. During teardown of that failed DBImpl, DBImpl's shutdown flag can suppress later compaction callbacks, leaving listener-local compaction bookkeeping stale.

Diagnosis: Sandcastle DB LOGs showed file 16821 flushed, then a failed open attempt with an injected read error and a background compaction picking 16821. The crash was in DbStressListener::OnCompactionBegin, so this was stale db_stress listener state across open attempts rather than DBImpl allowing a real concurrent compaction of the same SST.

Fix: factor listener construction into InitializeListenersForOpen() and call it before each DB::Open() attempt, including the retry path after open/open-compaction failure. Each DBImpl open attempt now gets fresh listener state.

Verification: make clean; make db_stress -j192; make check-sources; git diff --check.
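The failure mode and the fix can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the actual db_stress code: ToyListener, ToyOptions, ToyOpen, InitializeListenersForOpenSketch, and OpenWithRetrySketch are hypothetical names, and the real DbStressListener bookkeeping is likely more involved. The model assumes a failed open delivers the compaction-begin callback but never the completion callback.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a listener that tracks in-flight compactions.
struct ToyListener {
  std::set<int> compacting_files;

  // Returns false if the file is already recorded as being compacted,
  // which is the kind of invariant violation the stale state triggered.
  bool OnCompactionBegin(int file) {
    return compacting_files.insert(file).second;
  }
  void OnCompactionCompleted(int file) { compacting_files.erase(file); }
};

// Hypothetical stand-in for the options object that owns the listeners.
struct ToyOptions {
  std::vector<std::shared_ptr<ToyListener>> listeners;
};

// Sketch of the helper: drop old listeners, then build fresh ones.
void InitializeListenersForOpenSketch(ToyOptions& options) {
  options.listeners.clear();
  options.listeners.push_back(std::make_shared<ToyListener>());
}

// Simulates one open attempt that schedules a compaction of `file`.
// When fail_late is true, the attempt fails after the begin callback and
// the completion callback is never delivered (assumed behavior).
bool ToyOpen(ToyOptions& options, int file, bool fail_late,
             bool* bookkeeping_ok) {
  for (auto& listener : options.listeners) {
    if (!listener->OnCompactionBegin(file)) {
      *bookkeeping_ok = false;
    }
  }
  if (fail_late) {
    return false;
  }
  for (auto& listener : options.listeners) {
    listener->OnCompactionCompleted(file);
  }
  return true;
}

// Runs the retry loop; returns whether listener bookkeeping stayed
// consistent. The first attempt always fails late in this model.
bool OpenWithRetrySketch(bool refresh_listeners_on_retry) {
  ToyOptions options;
  InitializeListenersForOpenSketch(options);
  bool bookkeeping_ok = true;
  bool fail_late = true;
  const int kFile = 16821;  // file number borrowed from the diagnosis above
  while (!ToyOpen(options, kFile, fail_late, &bookkeeping_ok)) {
    fail_late = false;
    if (refresh_listeners_on_retry) {
      InitializeListenersForOpenSketch(options);
    }
  }
  return bookkeeping_ok;
}
```

In this model, OpenWithRetrySketch(false) reports inconsistent bookkeeping because the second attempt sees the leftover entry for the file, while OpenWithRetrySketch(true) stays consistent because each attempt starts with a fresh listener.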
84045ee to 3edfee1
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D105969381.
🟡 Codex Code Review
Auto-triggered after CI passed; reviewing commit 3edfee1.
❌ Codex review failed before producing findings.
Generated by Codex CLI.
✅ Claude Code Review
Auto-triggered after CI passed; reviewing commit 3edfee1.
Summary
Clean, well-motivated fix for a real stress-test crash. The refactoring is correct and minimal. The […] No high-severity findings.
Findings
🔴 HIGH
None.
🟡 MEDIUM
None.
🟢 LOW / NIT
L1. Second retry path also reinitializes listeners before […]
| Context | Applies? | Assessment |
|---|---|---|
| TransactionDB path | NO | Retry loop is inside !FLAGS_use_txn branch. Unaffected. |
| MultiOpsTxnsStressTest | YES | RegisterAdditionalListeners() correctly called inside the helper, so multi-ops txn listener also refreshed on retry. |
| BlobDB path | YES | Both BlobDB and regular DB::Open share options_.listeners. Correct. |
| ReadOnly open | YES | DB::OpenForReadOnly also uses options_.listeners. Correct. |
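The MultiOpsTxnsStressTest row depends on the helper invoking a virtual hook, so subclass listeners are rebuilt along with the base one. A minimal sketch of that pattern follows, assuming hypothetical class and member names (ToyStressTest, ToyMultiOpsTest, listeners_); only RegisterAdditionalListeners() is a name taken from the review, and its real signature may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical listener type; only the name matters for this sketch.
struct ToyListener {
  std::string name;
};

class ToyStressTest {
 public:
  virtual ~ToyStressTest() = default;

  // Sketch of the helper: clear, add the base listener, then let
  // subclasses add theirs through the virtual hook.
  void InitializeListenersForOpenSketch() {
    listeners_.clear();
    listeners_.push_back(std::make_shared<ToyListener>(ToyListener{"base"}));
    RegisterAdditionalListeners();
  }

  const std::vector<std::shared_ptr<ToyListener>>& listeners() const {
    return listeners_;
  }

 protected:
  // Default: no extra listeners.
  virtual void RegisterAdditionalListeners() {}

  std::vector<std::shared_ptr<ToyListener>> listeners_;
};

class ToyMultiOpsTest : public ToyStressTest {
 protected:
  // Subclass contributes its own listener on every (re)initialization.
  void RegisterAdditionalListeners() override {
    listeners_.push_back(
        std::make_shared<ToyListener>(ToyListener{"multi_ops"}));
  }
};

// Returns true if a second initialization yields exactly two listeners
// and the subclass listener is a new object rather than the old one.
bool RefreshReplacesSubclassListener() {
  ToyMultiOpsTest test;
  test.InitializeListenersForOpenSketch();
  std::shared_ptr<ToyListener> first = test.listeners()[1];
  test.InitializeListenersForOpenSketch();
  return test.listeners().size() == 2 &&
         test.listeners()[1] != first &&
         test.listeners()[1]->name == "multi_ops";
}
```

Because the helper clears the vector before calling the hook, repeated calls do not accumulate listeners, and the subclass listener is recreated on each retry.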
Positive Observations
- Excellent root cause analysis in the PR description.
- Minimal, focused refactoring -- extracts exactly the right code.
- No behavioral change on the happy path.
@xingbowang merged this pull request in e42af37.
Summary
- Factor db_stress listener construction into an InitializeListenersForOpen() helper.
- Rebuild listeners before retrying DB::Open() after injected open or open-compaction failures.
Context
Test Plan
- make clean
- make db_stress -j192
- make check-sources
- git diff --check