Expose ReadOptions::optimize_multiget_for_io in the C API#14752
Expose ReadOptions::optimize_multiget_for_io in the C API#14752zaidoon1 wants to merge 1 commit into
Conversation
✅ clang-tidy: No findings on changed linesCompleted in 56.4s. |
f026bcb to
09ef52c
Compare
The async MultiGet support introduced two ReadOptions flags: async_io and optimize_multiget_for_io. The setter/getter for async_io was exposed in the C API in ff04fb1; the corresponding pair for optimize_multiget_for_io was not. Add the missing pair, mirroring the async_io implementation exactly. The flag is consulted in db/version_set.cc only inside the #if USE_COROUTINES guard, so this setter has no behavioral effect in non-coroutine builds. It matters for: 1. API parity with the C++ surface and with the existing async_io C API. The two flags were introduced together as part of the async MultiGet feature; only exposing one is an oversight. 2. CPU/latency tuning in USE_COROUTINES builds. Per the Asynchronous IO in RocksDB blog post, async_io=true with optimize_multiget_for_io=false (single-level parallel reads) ran 775 us/op vs 508 us/op with optimize_multiget_for_io=true (multi-level), with the latter incurring additional CPU overhead from coroutine scheduling. Without this setter, coroutine-enabled builds cannot reach the single-level configuration from C. No change to the C++ API.
09ef52c to
b3b060f
Compare
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit b3b060f ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit b3b060f SummaryClean, minimal PR that adds C API getter/setter for No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMNone. 🟢 LOW / NITL1. Missing
|
| Context | Relevant? | Notes |
|---|---|---|
| Non-coroutine builds | No effect | optimize_multiget_for_io is only read inside #if USE_COROUTINES |
| Coroutine builds | Setter works | Allows C users to toggle single-level vs multi-level parallel reads |
| All other contexts | N/A | Pure C API plumbing, no new logic |
The field is a simple bool with default true in options.h:2168. The setter uses unsigned char (C convention for bool), matching the async_io pattern. Implicit conversion from unsigned char to bool and back is well-defined.
Positive Observations
- Exact mirror of the
async_ioC API pattern — easy to review - Test checks both the default value (
true/1) and the round-trip after setting to0 - Release note is well-written with appropriate context about
USE_COROUTINES - Correct file placement in
unreleased_history/public_api_changes/
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Three corrections: 1. Performance table row was wrong about which config gets 775 us/op. The original wording said 'single-level parallel reads (works without this feature)' which is false - 775 us/op requires the coroutines feature AND async_io=true AND optimize_multiget_for_io=false. Without coroutines, both parallel paths in version_set.cc are short-circuited by the !using_coroutines() check, and MultiGet falls back to one-file-at-a-time. Now the table labels each row with the exact ReadOptions combo. 2. optimize_multiget_for_io paragraph now explains the flag as a CPU/latency knob within the coroutine-enabled space rather than an absent setter. Both 'on' (multi-level) and 'off' (within-level only) need the coroutines feature compiled in; the flag chooses which coroutine path runs. Setting it to false keeps ~40% of the latency win (1292->775 us/op) at lower CPU than the multi-level path. Notes that the setter goes in once facebook/rocksdb#14752 merges. 3. Replace 'Meta' / 'Meta's' with 'RocksDB team' / 'remote/warm-storage flash' throughout. The benchmark is the RocksDB team's, the project's affiliation isn't relevant to the section.
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106080144. |
The RocksDB C API (include/rocksdb/c.h, db/c.cc) is maintained
reactively: new C++ options land in the C++ headers first, and a
matching C wrapper is added only when someone requests it. This crate
exposes the C++ feature surface through that C API, so any feature
without a C wrapper is unreachable from Rust. Vendoring small,
upstream-shaped patches against the vendored RocksDB submodule lets us
ship those features without waiting for round-trips through
facebook/rocksdb on every release.
Three patches ship initially, each mirroring an upstream PR against
facebook/rocksdb. Each is a standard unified diff, byte-equivalent to
the C-side additions in its upstream PR (the c_test.c additions and
unreleased_history/ release-note files from the upstream PRs are
intentionally omitted: c_test.c is not compiled by our build, and the
release-note file would collide with upstream on a submodule bump).
- 0001-expose-optimize_multiget_for_io.patch — adds
rocksdb_readoptions_{set,get}_optimize_multiget_for_io, matching
upstream PR facebook/rocksdb#14752.
- 0002-expose-uniform_cv_threshold-and-kAuto.patch — adds the
rocksdb_block_based_table_index_block_search_type_auto = 2 enum
constant and the rocksdb_block_based_options_set_uniform_cv_threshold
setter needed to make kAuto index-block search take effect.
- 0003-expose-memtable_batch_lookup_optimization.patch — adds
rocksdb_options_{set,get}_memtable_batch_lookup_optimization to
enable the skip-list memtable's batch-lookup optimization for
MultiGet.
Patch application
build.rs picks up patches/*.patch and applies them to the vendored
submodule before bindgen runs. The flow is intentionally idempotent
and resilient:
* Each patch is paired with a sentinel: a string that appears in the
LAST file the patch modifies, once it has been applied. The
sentinel lets us detect already-applied state without a marker file
and without invoking git (important: crates.io tarball consumers
ship pre-patched sources and never have a working git submodule).
The sentinel-selection rule ("last file the patch modifies") is
documented in both the build.rs module comment and
patches/README.md, because a sentinel pointing at the first file
could be tricked by a half-applied state into reporting
"applied" when it isn't.
* git apply runs in two phases against the submodule directory:
first git apply --check --ignore-whitespace as an atomic preflight
(does not mutate the working tree on failure), then the real
git apply --ignore-whitespace once the preflight passes. The flow
invokes git from inside the submodule (current_dir = rocksdb/)
rather than --directory=rocksdb from the parent crate dir: the
parent repo's git treats rocksdb/ as a submodule and silently
refuses to touch paths under it ("Skipped patch 'rocksdb/db/c.cc'"
with exit 0). --ignore-whitespace, combined with the .gitattributes
pin of patches/*.patch to LF, makes patch application robust
across Linux/macOS/Windows hosts regardless of core.autocrlf
settings.
* Both git invocations capture stderr via .output() and include it
in the panic message on failure, so triaging a patch-apply
failure in CI doesn't require digging through the cargo log's
child-process stream.
System backend safety
The safe Rust wrappers (added in the next commit) reference the
patched symbols unconditionally. When a consumer builds against a
system rocksdb via ROCKSDB_LIB_DIR or pkg-config, the patches are
not applied and bindgen would silently emit a bindings.rs missing
the new declarations, surfacing as opaque ffi::… resolution errors
deep in the parent crate. To prevent that, bindings::generate
invokes patches::assert_system_header_has_required_symbols on the
System backend path: the function reads the resolved system c.h,
reuses the same PATCHES sentinel table, and panics early with the
exact list of missing symbols, the patch file that adds each, and
two concrete resolution paths (apply the patch(es) to the system
rocksdb, or drop ROCKSDB_LIB_DIR to use the bundled vendored
rocksdb). Verified by pointing ROCKSDB_LIB_DIR + ROCKSDB_INCLUDE_DIR
at a synthesised header tree with the sentinel strings stripped:
build aborts with the expected message naming all three patches.
Tarball packaging
librocksdb-sys/Cargo.toml previously had a broad "*.md" exclude
which kept the (large) vendored RocksDB documentation out of the
published tarball but also stripped patches/README.md — referenced
by the build-script panic that fires when a patch fails to apply,
and by the System-backend panic above. Narrowed the exclude to
"rocksdb/**/*.md" so the vendored docs are still stripped while
patches/README.md ships. The crate's own README.md continues to ship
via the existing readme = "README.md" Cargo override. Verified
with cargo package --list -p rust-librocksdb-sys --allow-dirty:
patches/README.md now appears alongside the three .patch files.
Maintenance
When upstream lands one of these patches in a release we bump to,
the sentinel string becomes present in the vendored sources, the
patch becomes a no-op, and both the patch file and its entry in the
PATCHES table in build.rs should be deleted on the same submodule
bump. See patches/README.md for the full hygiene protocol,
including how to regenerate a patch against a bumped submodule.
(zaidoon1)
The RocksDB C API (include/rocksdb/c.h, db/c.cc) is maintained
reactively: new C++ options land in the C++ headers first, and a
matching C wrapper is added only when someone requests it. This crate
exposes the C++ feature surface through that C API, so any feature
without a C wrapper is unreachable from Rust. Vendoring small,
upstream-shaped patches against the vendored RocksDB submodule lets us
ship those features without waiting for round-trips through
facebook/rocksdb on every release.
Three patches ship initially, each mirroring an upstream PR against
facebook/rocksdb. Each is a standard unified diff, byte-equivalent to
the C-side additions in its upstream PR (the c_test.c additions and
unreleased_history/ release-note files from the upstream PRs are
intentionally omitted: c_test.c is not compiled by our build, and the
release-note file would collide with upstream on a submodule bump).
- 0001-expose-optimize_multiget_for_io.patch — adds
rocksdb_readoptions_{set,get}_optimize_multiget_for_io, matching
upstream PR facebook/rocksdb#14752.
- 0002-expose-uniform_cv_threshold-and-kAuto.patch — adds the
rocksdb_block_based_table_index_block_search_type_auto = 2 enum
constant and the rocksdb_block_based_options_set_uniform_cv_threshold
setter needed to make kAuto index-block search take effect.
- 0003-expose-memtable_batch_lookup_optimization.patch — adds
rocksdb_options_{set,get}_memtable_batch_lookup_optimization to
enable the skip-list memtable's batch-lookup optimization for
MultiGet.
Patch application: the shadow-tree approach
Patches are NOT applied to the submodule working tree. Instead,
build.rs materialises a shadow tree under
$OUT_DIR/rocksdb-patches-shadow/ containing patched copies of every
file the patches touch, and the rest of build.rs is told to consult
the shadow for those specific paths:
- bindings::generate uses the shadow's include/rocksdb/c.h as
bindgen's primary header, and prepends the shadow's include/
directory to the bindgen include path so transitive
'#include "rocksdb/c.h"' directives resolve to the patched copy.
- vendor::build substitutes the shadow's db/c.cc for the submodule's
in the cc::Build source list, and prepends the shadow's include/
to the C++ include path so the patched header wins there too.
The submodule itself is never modified. This is the right design for
two reasons:
1. Read-only source trees. CI workflows on this repo do
'chmod -R a-w .' before building to verify build scripts don't
pollute the source tree. Other environments (Nix sandboxes, distro
packaging) impose the same constraint. Writing into the submodule
would fail there. Restricting build.rs to $OUT_DIR satisfies the
convention and keeps the CI guard rail intact.
2. Local-dev cleanliness. Modifying the submodule working tree (even
though the parent repo doesn't track those changes) shows up as
'submodule (modified content)' on every git status after a build,
which is noise.
Shadow lifecycle: build.rs fingerprints every input that affects
shadow content (each .patch file's bytes plus each shadowed source
file's bytes from the submodule) and stores the fingerprint at
<shadow>/.fingerprint. On rerun, if the stored fingerprint matches
the freshly-computed one the shadow is reused as-is — preserving
file mtimes so cc::Build doesn't redundantly rebuild db/c.cc.o on
every build.rs rerun triggered by an unrelated env-var change. On
mismatch (or no shadow yet), the shadow is wiped, recreated, and
re-patched from scratch.
git apply invocation details: git apply runs with current_dir set to
the shadow root, and with GIT_CEILING_DIRECTORIES set to the shadow's
parent. $OUT_DIR is under target/ which is inside the parent crate's
git working tree; without the ceiling, git apply would walk up to
the parent .git, treat the shadow as inside that repo, and resolve
the patch's relative paths against the parent repo's root — silently
writing nothing. GIT_CEILING_DIRECTORIES stops the upward .git
search at the named directory, so git treats the shadow as outside
any repo and resolves paths relative to current_dir.
Each patch goes through 'git apply --check --ignore-whitespace' as
an atomic preflight (does not mutate files on failure) followed by
the real 'git apply --ignore-whitespace'. --ignore-whitespace covers
'core.autocrlf=true' (Windows default) rewriting line endings in the
source files; combined with the .gitattributes pin of
'librocksdb-sys/patches/*.patch text eol=lf' at the repo root, patch
application is robust across Linux/macOS/Windows builders. Both
invocations capture stderr via .output() and surface it in the panic
message so triaging a context-mismatch failure in CI doesn't require
digging through child-process logs.
System backend safety
The safe Rust wrappers (added in the next commit) reference the
patched symbols unconditionally. When a consumer builds against a
system rocksdb via ROCKSDB_LIB_DIR or pkg-config, no shadow is built
and bindgen would silently emit a bindings.rs missing the new
declarations, surfacing as opaque ffi::… resolution errors deep in
the parent crate. To prevent that, bindings::generate invokes
patches::assert_system_header_has_required_symbols on the System
backend path: the function reads the resolved system c.h, scans for
each patch's sentinel_string, and panics early with the exact list
of missing symbols, the patch file that adds each, and two concrete
resolution paths (apply the patch(es) to the system rocksdb, or
drop ROCKSDB_LIB_DIR to use the bundled vendored rocksdb). Verified
by pointing ROCKSDB_LIB_DIR + ROCKSDB_INCLUDE_DIR at a synthesised
header tree with the sentinel strings stripped: build aborts with
the expected message naming all three patches.
Tarball packaging
librocksdb-sys/Cargo.toml previously had a broad "*.md" exclude
which kept the (large) vendored RocksDB documentation out of the
published tarball but also stripped patches/README.md — referenced
by the System-backend panic above. Narrowed the exclude to
"rocksdb/**/*.md" so the vendored docs are still stripped while
patches/README.md ships. The crate's own README.md continues to ship
via the existing readme = "README.md" Cargo override. Verified
with 'cargo package --list -p rust-librocksdb-sys --allow-dirty':
patches/README.md now appears alongside the three .patch files.
Maintenance
When upstream lands one of these patches in a release we bump to,
both the patch file and its entry in the PATCHES table in build.rs
should be deleted on the same submodule bump. See patches/README.md
for the full hygiene protocol, including how to regenerate a patch
against a bumped submodule and how to add a new patch (including the
SHADOWED_FILES list in build.rs that must be kept in sync with the
files the patches touch).
(zaidoon1)
The RocksDB C API (include/rocksdb/c.h, db/c.cc) is maintained
reactively: new C++ options land in the C++ headers first, and a
matching C wrapper is added only when someone requests it. This
crate exposes the C++ feature surface through that C API, so any
feature without a C wrapper is unreachable from Rust.
This commit adds three such missing wrappers as a purely additive
local extension. Two new files in librocksdb-sys/c-api-extensions/
declare and define the new C symbols:
- c_api_extensions.h declares the new symbols and #includes
rocksdb/c.h, making it a clean superset of the upstream C API
header. bindgen scans it as its primary input and the new
symbols flow into the generated bindings.rs automatically.
- c_api_extensions.cc defines the new symbols by reinterpret_cast-ing
the opaque-handle pointers from c.h to their C++ rep types
(ReadOptions, Options, BlockBasedTableOptions). This works because
each opaque-handle struct in rocksdb/db/c.cc is a POD wrapper with
rep as its first field; a pointer to the C type also points at
the start of rep.
Three symbols ship initially, each mirroring an upstream PR against
facebook/rocksdb:
- rocksdb_readoptions_{set,get}_optimize_multiget_for_io, matching
upstream PR facebook/rocksdb#14752.
- rocksdb_block_based_options_set_uniform_cv_threshold setter and
the rocksdb_block_based_table_index_block_search_type_auto = 2
enum constant; both are needed for kAuto index-block search to
take effect (kAuto reads a per-block is_uniform footer bit that
is only written when uniform_cv_threshold >= 0).
- rocksdb_options_{set,get}_memtable_batch_lookup_optimization for
the skip-list memtable's batch-lookup optimization for MultiGet.
build.rs
- vendor::build adds c-api-extensions/c_api_extensions.cc to the
cc::Build source list alongside the submodule's c.cc. The linker
resolves the new symbols from the extension's .o and everything
else from the submodule's .o.
- bindings::generate uses c_api_extensions.h as the primary header.
Bindgen scans the extension header + (transitively) the submodule's
c.h and emits a single bindings.rs covering both surfaces.
- A new extensions::build_for_system_backend runs on the System
backend (ROCKSDB_LIB_DIR / pkg-config) to compile the extension
as a tiny standalone static archive and link it alongside the
user's pre-built librocksdb. The user's rocksdb is unaware of the
extension; the new symbols come from our linked .a, the rest from
the system .so/.dylib/.dll.
User-facing properties
The submodule is NEVER modified. The build script writes only to
$OUT_DIR (Rust crate convention). Two consequences:
1. The build works on read-only source trees. CI workflows that do
chmod -R a-w . to verify build scripts don't pollute the source
tree pass with no special-casing. Nix sandboxes, distro
packagers, and any other read-only-source environment work
identically.
2. No git binary is required at build time. The C++ compiler the
crate already needs is sufficient. Verified by building with
PATH stripped of git and only the C++ toolchain present.
Maintenance
When upstream merges one of the matching PRs and the submodule is
bumped to a release containing it, the corresponding declaration in
c_api_extensions.h and definition in c_api_extensions.cc can be
deleted in the same commit that does the submodule bump — bindgen
will pick up the upstream definition from rocksdb/c.h automatically.
The extension's reinterpret_cast pattern relies only on rep being
the first field of each opaque-handle wrapper (a property that has
been stable in upstream c.cc for years); a hypothetical future
upstream change that adds a field before rep would surface as a
test failure rather than silent UB, because the integration tests
in tests/test_rocksdb_options.rs round-trip every option through
both an upstream code path and our extension.
(zaidoon1)
The RocksDB C API (include/rocksdb/c.h, db/c.cc) is maintained
reactively: new C++ options land in the C++ headers first, and a
matching C wrapper is added only when someone requests it. This
crate exposes the C++ feature surface through that C API, so any
feature without a C wrapper is unreachable from Rust.
This commit adds three such missing wrappers as a purely additive
local extension. Two new files in librocksdb-sys/c-api-extensions/
declare and define the new C symbols:
- c_api_extensions.h declares the new symbols and #includes
rocksdb/c.h, making it a clean superset of the upstream C API
header. bindgen scans it as its primary input and the new
symbols flow into the generated bindings.rs automatically.
- c_api_extensions.cc defines the new symbols by reinterpret_cast-ing
the opaque-handle pointers from c.h to their C++ rep types
(ReadOptions, Options, BlockBasedTableOptions). This works because
each opaque-handle struct in rocksdb/db/c.cc is a POD wrapper with
rep as its first field; a pointer to the C type also points at
the start of rep.
Three symbols ship initially, each mirroring an upstream PR against
facebook/rocksdb:
- rocksdb_readoptions_{set,get}_optimize_multiget_for_io, matching
upstream PR facebook/rocksdb#14752.
- rocksdb_block_based_options_set_uniform_cv_threshold setter and
the rocksdb_block_based_table_index_block_search_type_auto = 2
enum constant; both are needed for kAuto index-block search to
take effect (kAuto reads a per-block is_uniform footer bit that
is only written when uniform_cv_threshold >= 0).
- rocksdb_options_{set,get}_memtable_batch_lookup_optimization for
the skip-list memtable's batch-lookup optimization for MultiGet.
build.rs
- vendor::build adds c-api-extensions/c_api_extensions.cc to the
cc::Build source list alongside the submodule's c.cc. The linker
resolves the new symbols from the extension's .o and everything
else from the submodule's .o.
- bindings::generate uses c_api_extensions.h as the primary header.
Bindgen scans the extension header + (transitively) the submodule's
c.h and emits a single bindings.rs covering both surfaces.
- A new extensions::build_for_system_backend runs on the System
backend (ROCKSDB_LIB_DIR / pkg-config) to compile the extension
as a tiny standalone static archive and link it alongside the
user's pre-built librocksdb. The user's rocksdb is unaware of the
extension; the new symbols come from our linked .a, the rest from
the system .so/.dylib/.dll.
User-facing properties
The submodule is NEVER modified. The build script writes only to
$OUT_DIR (Rust crate convention). Two consequences:
1. The build works on read-only source trees. CI workflows that do
chmod -R a-w . to verify build scripts don't pollute the source
tree pass with no special-casing. Nix sandboxes, distro
packagers, and any other read-only-source environment work
identically.
2. No git binary is required at build time. The C++ compiler the
crate already needs is sufficient. Verified by building with
PATH stripped of git and only the C++ toolchain present.
Maintenance
When upstream merges one of the matching PRs and the submodule is
bumped to a release containing it, the corresponding declaration in
c_api_extensions.h and definition in c_api_extensions.cc can be
deleted in the same commit that does the submodule bump — bindgen
will pick up the upstream definition from rocksdb/c.h automatically.
The extension's reinterpret_cast pattern relies only on rep being
the first field of each opaque-handle wrapper (a property that has
been stable in upstream c.cc for years); a hypothetical future
upstream change that adds a field before rep would surface as a
test failure rather than silent UB, because the integration tests
in tests/test_rocksdb_options.rs round-trip every option through
both an upstream code path and our extension.
(zaidoon1)
|
@xingbowang merged this pull request in a3ba3e8. |
Summary
The async MultiGet support introduced two
ReadOptionsflags:async_ioandoptimize_multiget_for_io. The setter/getter forasync_iowas exposed in the C API in ff04fb154; the corresponding pair foroptimize_multiget_for_iowas not.This PR adds
rocksdb_readoptions_set_optimize_multiget_for_ioandrocksdb_readoptions_get_optimize_multiget_for_io, mirroring the existingasync_iopattern exactly.Motivation
The flag is consulted in
db/version_set.cconly inside the#if USE_COROUTINESguard, so this setter has no behavioral effect in non-coroutine builds. It matters for:API parity with the C++ surface and with the existing
async_ioC API. The two flags were introduced together as part of the async MultiGet feature; only exposing one is an oversight.CPU/latency tuning in
USE_COROUTINESbuilds. Per the Asynchronous IO in RocksDB blog post,async_io=truewithoptimize_multiget_for_io=false(single-level parallel reads) ran 775 μs/op vs 508 μs/op withoptimize_multiget_for_io=true(multi-level), with the latter incurring additional CPU overhead from coroutine scheduling. Without this setter, coroutine-enabled builds cannot reach the single-level configuration from C.