
Add opt-in coroutines feature for multi-level async MultiGet #218

Merged
zaidoon1 merged 10 commits into master from add-coroutines-feature on May 18, 2026

@zaidoon1 zaidoon1 commented May 18, 2026

Adds a coroutines cargo feature. When enabled, RocksDB is compiled with USE_COROUTINES=1 and linked against folly. Calling ReadOptions::set_async_io(true) on a MultiGet then issues parallel io_uring reads across SST files in multiple LSM levels, which can lower MultiGet latency on slow storage.

Meta's published benchmark is on remote/network flash. They haven't published numbers for local NVMe; the gain there is probably smaller, but I haven't measured it.

Linux only. Needs liburing >= 2.7 (Ubuntu 25.10+ ships it via apt; on older distros the script builds liburing from source). Needs gcc <= 14 or clang: gcc-15 fails to build the in-tree tests of folly's pinned libunwind.

Usage:

./scripts/build_folly.sh
export ROCKSDB_FOLLY_INSTALL_PATH=$PWD/librocksdb-sys/folly-build/installed
cargo build --features coroutines,io-uring

The folly build takes ~20-30 minutes the first time. CI caches it.

Caveats:

  • folly's getdeps only builds libglog and libgflags as .so files. The final binary needs them at runtime via LD_LIBRARY_PATH or an rpath in the binary crate's build.rs. README has details.
  • Not compatible with mt_static (folly precludes a fully static build).
  • optimize_multiget_for_io can't be set from Rust yet. It defaults to true, which is the right choice for coroutine builds. facebook/rocksdb#14752 ("Expose ReadOptions::optimize_multiget_for_io in the C API") adds the C API; once that merges and we bump the submodule, the setter goes in here.

Full details in the README's "Async MultiGet with C++20 Coroutines" section.

zaidoon1 added 10 commits May 17, 2026 22:59
Adds a new cargo feature, `coroutines`, that compiles RocksDB with
USE_COROUTINES=1 + USE_FOLLY=1 and links against folly. When enabled
together with `io-uring` and ReadOptions::set_async_io(true), MultiGet
issues parallel io_uring reads across SST files in different LSM
levels, not just within a single level. Per the RocksDB Asynchronous
IO blog post, on remote/slow storage this drops MultiGet latency from
~775 to ~508 us/op (~34% reduction) at the cost of ~6-15% extra CPU.

Linux only. The feature panics at build time on other targets.

Changes:

  librocksdb-sys/Cargo.toml, Cargo.toml
    New `coroutines` feature in both crates.

  librocksdb-sys/build.rs
    - validate_coroutines_target() rejects non-Linux targets early.
    - coroutines_compile_config() sets USE_COROUTINES/USE_FOLLY/
      FOLLY_NO_CONFIG/HAVE_CXX11_ATOMIC, adds -fcoroutines on GCC,
      silences folly-induced warnings, and adds include paths for
      folly + its 8 dependencies.
    - coroutines_link_config() emits the cargo:rustc-link-* directives
      for folly, boost (7 components), double-conversion, libevent,
      libsodium, fmt, glog, and gflags. Glog and gflags are linked
      dynamically because folly's getdeps does not produce static
      archives for them, with rpath entries embedded so the final
      binary can find them without LD_LIBRARY_PATH.
    - Link config runs in main() (not inside build_rocksdb) so it also
      applies when ROCKSDB_LIB_DIR points at an externally-built
      librocksdb compiled with USE_COROUTINES.
    - Dependency directories are resolved via glob since folly's
      getdeps install layout uses commit-hash-suffixed dir names.

  scripts/build_folly.sh
    Helper that wraps RocksDB's own `make build_folly` target, which
    invokes folly's getdeps.py to build folly + 8 dependencies at the
    commit pinned by librocksdb-sys/rocksdb/folly.mk:FOLLY_COMMIT_HASH.
    Prints the install path to set ROCKSDB_FOLLY_INSTALL_PATH to.

  src/lib.rs
    Adds built_with_coroutines() runtime helper that returns
    cfg!(feature = "coroutines"). Documented caveat: if linking
    against an externally-built librocksdb via ROCKSDB_LIB_DIR, this
    may not match what that library was actually compiled with.

  tests/test_coroutines.rs
    Three smoke tests that run in both feature configurations:
    - built_with_coroutines() matches the feature flag.
    - async_io=true MultiGet across multiple LSM levels returns
      results identical to a loop of single Gets.
    - same for a same-level batch.

  .github/workflows/coroutines.yml
    Ubuntu CI job that caches folly keyed on FOLLY_COMMIT_HASH, then
    builds and runs tests with --features coroutines,io-uring.

  README.md
    New "Async MultiGet with C++20 Coroutines" section under Advanced
    Features with the Meta benchmark table, build prerequisites, and
    runtime constraints (dynamic glog/gflags, mt_static incompatible).

The optimize_multiget_for_io ReadOption is not yet exposed at the Rust
level - that depends on facebook/rocksdb#14752 merging and a release
being cut. For coroutine builds the C++ default of true is the right
choice for most workloads anyway.
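
The early target check described under librocksdb-sys/build.rs can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: only the name `validate_coroutines_target` and the Linux-only rule come from the commit message, while the function body, the error text, and the `check_target_from_env` helper are assumptions made for illustration. `CARGO_CFG_TARGET_OS` is the variable Cargo sets for build scripts.

```rust
/// Hypothetical stand-in for the build-script check described above.
/// `target_os` is the value Cargo exposes to build scripts through the
/// CARGO_CFG_TARGET_OS environment variable (e.g. "linux", "macos").
fn validate_coroutines_target(target_os: &str) -> Result<(), String> {
    if target_os == "linux" {
        Ok(())
    } else {
        Err(format!(
            "the `coroutines` feature is Linux-only (it relies on io_uring); target OS is `{}`",
            target_os
        ))
    }
}

/// Assumed call site: a build script would run this before any compile
/// or link configuration, so unsupported targets fail with one clear message.
#[allow(dead_code)]
fn check_target_from_env() {
    // Only meaningful inside a Cargo build script, where the variable is set.
    let target_os = std::env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    if let Err(msg) = validate_coroutines_target(&target_os) {
        panic!("{}", msg);
    }
}
```

Returning a `Result` from the pure check and panicking only at the call site keeps the rule itself easy to unit test.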

Five fixes responding to review comments on the previous commit:

1. Drop `cargo:rustc-link-arg=-Wl,-rpath,...` for glog and gflags
   (rust-lang/cargo#9554): these directives only apply to artifacts of
   the crate that emits them, not to downstream binaries. Embedding
   rpath that only covers our own test binaries was misleading - CI
   tests would pass while user binaries would fail at startup with
   "libglog.so: cannot open shared object file". Instead, expose the
   discovered lib directories as `cargo:folly_glog_libdir` and
   `cargo:folly_gflags_libdir` (accessible to downstream build scripts
   as `DEP_ROCKSDB_FOLLY_GLOG_LIBDIR` and `DEP_ROCKSDB_FOLLY_GFLAGS_LIBDIR`)
   and document the LD_LIBRARY_PATH / rpath / system-install options
   in the README.

2. `resolve_folly_dep`: panic with a clear message when multiple
   matching directories exist (typically a stale install from a prior
   FOLLY_COMMIT_HASH mixed with the current one). Previously we picked
   the first glob entry non-deterministically, which could silently
   link the wrong version.

3. `lib_or_lib64`: add a comment documenting the assumption that
   exactly one of `lib/` or `lib64/` holds a given dependency, derived
   from the folly version at the pinned FOLLY_COMMIT_HASH.

4. Rewrite `multi_get_async_io_matches_serial_get` to explicitly build
   and verify a multi-level LSM layout via `compact_range_opt` with
   `CompactOptions::set_target_level`, then assert via
   `rocksdb.num-files-at-levelN` that data actually spans multiple
   levels before running the MultiGet. The previous version called
   `compact_range(None, None)` which collapses everything to the
   bottom level, so the multi-level dispatch path was never exercised
   despite the test name suggesting otherwise.

5. Rename `built_with_coroutines_matches_feature_flag` to
   `built_with_coroutines_helper_is_callable` and drop the tautology
   (`built_with_coroutines()` is `cfg!(feature = "coroutines")`, so
   asserting equality between the two tested nothing). Verify
   call-stability instead and document why the function is still
   useful (single source of truth for logging/diagnostics).

Also add `timeout-minutes: 90` to the CI job so a hung cold-cache
folly build fails fast instead of running silently up to GHA's
default 6-hour limit.
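
For fix 1 above, a downstream build script could consume the exported lib directories along these lines. This is a hedged sketch under stated assumptions: the two `DEP_ROCKSDB_FOLLY_*` variable names come from the commit message, but `rpath_directives` and `emit_rpaths_from_env` are hypothetical helpers. Cargo only passes `DEP_*` variables to direct dependents of the crate that declares `links`, so the sketch assumes the binary crate lists librocksdb-sys as a direct dependency.

```rust
use std::env;

/// Turn the discovered lib directories into linker directives for the
/// crate that produces the final binary. Missing or empty entries are skipped.
fn rpath_directives(libdirs: &[Option<String>]) -> Vec<String> {
    libdirs
        .iter()
        .flatten()
        .filter(|dir| !dir.is_empty())
        .map(|dir| format!("cargo:rustc-link-arg=-Wl,-rpath,{}", dir))
        .collect()
}

/// Assumed body of `main` in the build.rs of the crate that owns the
/// binary target (hypothetical helper, not part of this PR).
#[allow(dead_code)]
fn emit_rpaths_from_env() {
    let libdirs = [
        env::var("DEP_ROCKSDB_FOLLY_GLOG_LIBDIR").ok(),
        env::var("DEP_ROCKSDB_FOLLY_GFLAGS_LIBDIR").ok(),
    ];
    for directive in rpath_directives(&libdirs) {
        // Cargo reads build-script directives from stdout.
        println!("{}", directive);
    }
}
```

Because `rustc-link-arg` applies only to the crate whose build script emits it, this logic belongs in the binary crate rather than in an intermediate library.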

Investigated and fixed all 10 issues raised in the second review:

#1 - CI cache key omitted OS image version.
   Pin runs-on to ubuntu-24.04 (was ubuntu-latest) and include the image
   name in the cache key. Folly is built against the host's glibc and
   libstdc++; a silent ubuntu-latest rollover with a cache hit would
   produce binaries with mismatched ABI vs the cargo build step.

#2 - getdeps.py default scratch dir was outside the cached path.
   Investigated buildopts.py:setup_build_options upstream. Confirmed:
   with no --scratch-path argument and no Facebook-internal mkscratch,
   getdeps falls back to /tmp/fbcode_builder_getdeps-<munged-cwd>. This
   is outside librocksdb-sys/rocksdb/third-party/folly, so my original
   cache covered only source - on a cache hit, show-inst-dir would
   return a /tmp/... path that doesn't exist on disk.

   Fixed by rewriting build_folly.sh to bypass 'make build_folly' and
   call getdeps.py directly with --scratch-path=<workspace>/librocksdb-sys/folly-build/.
   Replicates the few things make build_folly did:
   - Clone folly + reset to FOLLY_COMMIT_HASH
   - Apply two upstream-required perl patches (idempotent)
   - Run getdeps with CXXFLAGS=-DHAVE_CXX11_ATOMIC and GETDEPS_USE_WGET=1
   - patchelf libglog.so to embed libgflags rpath (matches folly.mk).

   CI now caches both the scratch dir AND the folly source checkout
   under a path that's stable and inside the workspace.

#3 - freebsd early-return interaction with coroutines_link_config.
   Practically blocked by validate_coroutines_target panicking earlier,
   but the comment claimed the link config 'also applies when
   ROCKSDB_LIB_DIR points at an externally-built librocksdb'. Updated
   the comment to acknowledge that this branch only handles Linux today
   and that relaxing the target validation would require revisiting.

#4 - Test compaction race.
   With disable_auto_compactions=true and L0 trigger=64, no background
   work runs concurrently, so the original test was probably safe. Added
   wait_for_compact between every put_batch/compact_range_opt phase as
   defense-in-depth so the layout is settled before level_layout()
   queries it. Cheap: each wait is a no-op with no scheduled work.

#5 - lib_or_lib64 only checked directory existence, not file presence.
   Replaced with libdir_containing(prefix, lib_name), which probes for
   lib<name>.{so,a}* in each candidate dir via glob and panics with a
   clear error if neither contains the library. Catches folly's habit
   of producing an empty lib64/ on Debian-family distros (or vice
   versa) at config time instead of via a confusing 'cannot find -l<x>'
   from the linker later.

#6 - README rpath snippet could mislead users into adding it in a
   library crate.
   Added explicit note: 'this must live in the crate that produces the
   binary you're shipping (a [[bin]] target), NOT in an intermediate
   library crate' along with the reason (rustc-link-arg doesn't
   propagate through transitive library dependencies, the same problem
   we documented above).

#7 - built_with_coroutines doc made an unverified 'still works for
   scans' claim.
   That claim is probably true per RocksDB's blog but isn't tested by
   this PR. Trimmed to just the MultiGet behavior the doc can actually
   substantiate.

#8 - Boost component list lacked a comment about which folly version it
   matches.
   Added a comment pointing at folly.mk's PLATFORM_LDFLAGS and noting
   how to react when a future FOLLY_COMMIT_HASH bump invalidates a
   component ('cannot find -lboost_<x>' at link time signals to trim).

#9 - validate_coroutines_target was called twice (in main() and again
   inside coroutines_compile_config).
   Removed the inner call and documented the precondition. The outer
   call in main() runs first so the inner call was redundant.

#10 - built_with_coroutines_helper_is_callable was a no-op.
   Replaced with built_with_coroutines_matches_feature_flag. Yes, the
   assertion is currently tautological (the function literally is
   cfg!(feature = "coroutines")), and the doc acknowledges that. The
   test catches a refactor regression if/when we later wire the value
   to a runtime symbol (e.g. after upstream rocksdb#14752 merges and
   exposes rocksdb_compiled_with_coroutines).
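
The file-presence probe from #5 can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's implementation: the real helper is described as using a glob and panicking, whereas this version walks the directory with `std::fs::read_dir` and returns a `Result` so it can be tested in isolation. The name `libdir_containing` and the `lib<name>.{so,a}*` pattern are taken from the commit message.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Return whichever of `<prefix>/lib` or `<prefix>/lib64` actually holds a
/// file whose name starts with `lib<name>.so` or `lib<name>.a`.
fn libdir_containing(prefix: &Path, lib_name: &str) -> Result<PathBuf, String> {
    let shared = format!("lib{}.so", lib_name);
    let archive = format!("lib{}.a", lib_name);
    for candidate in ["lib", "lib64"] {
        let dir = prefix.join(candidate);
        // A missing directory is not an error here; try the next candidate.
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => continue,
        };
        let found = entries.flatten().any(|entry| {
            let file_name = entry.file_name();
            let file_name = file_name.to_string_lossy();
            // Prefix match covers versioned names such as libglog.so.0.
            file_name.starts_with(shared.as_str()) || file_name.starts_with(archive.as_str())
        });
        if found {
            return Ok(dir);
        }
    }
    Err(format!(
        "neither {0}/lib nor {0}/lib64 contains lib{1}.so* or lib{1}.a*",
        prefix.display(),
        lib_name
    ))
}
```

Checking for the file rather than the directory is what surfaces an empty `lib64/` at configuration time instead of at link time.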

The pinned folly commit references symbols from liburing 2.6
(`IORING_CQE_F_BUF_MORE`, `IOU_PBUF_RING_INC`, `io_uring_buf_ring_head`)
and 2.7+ (the entire `io_uring_zcrx_*` zero-copy receive API in
`folly/io/async/IoUringZeroCopyBufferPool.cpp`). Ubuntu 24.04 LTS
only ships liburing 2.5 via apt, causing folly to fail at compile
time with errors like "struct io_uring_zcrx_rq has incomplete type".

Two changes:

1. CI workflow: run the whole job inside an `ubuntu:25.10` Docker
   container. Ubuntu 25.10 ships liburing 2.11 via apt, comfortably
   above the required 2.7. Avoids any manual build step in CI.
   `runs-on` stays `ubuntu-24.04` (the host); `container:` makes
   every step run inside the newer image.

   Containers start minimal, so a bootstrap step installs git +
   curl + ca-certificates + sudo before `actions/checkout` and
   `setup-rust-toolchain` need them.

   Cache key suffix bumped from `-v2` to `-v3` and the embedded image
   name updated to `ubuntu-25.10` so prior caches (built against
   24.04's liburing 2.5) are invalidated.

2. `scripts/build_folly.sh`: keep a from-source liburing fallback
   for local users on older distros. It now checks the system
   liburing version via pkg-config and only builds 2.9 from source
   if the system version is < 2.7. On Ubuntu 25.10+ (which is what
   CI uses) the check passes and the from-source build is skipped.
   When the build does run, the resulting headers/libs are exported
   via PKG_CONFIG_PATH/CPATH/LIBRARY_PATH/LD_LIBRARY_PATH so both
   folly's CMake and rust-rocksdb's `io-uring` pkg-config lookup
   pick it up.
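
The version gate in item 2 is a shell check in `scripts/build_folly.sh`; the comparison logic behind it can be sketched in Rust as follows. Everything here is illustrative: the helper names are hypothetical, and treating a missing or unparseable version as "build from source" is an assumption about the script's behavior. The one real external call is `pkg-config --modversion liburing`.

```rust
use std::process::Command;

/// Minimum liburing version required by the pinned folly commit.
const MIN_LIBURING: (u32, u32) = (2, 7);

/// Parse the leading "major.minor" of a version string such as "2.5" or "2.11".
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Decide whether liburing has to be built from source.
/// Assumption: an absent or unparseable system version also triggers the build.
fn needs_source_build(system_version: Option<&str>) -> bool {
    match system_version.and_then(parse_major_minor) {
        // Tuples compare numerically field by field, so 2.11 ranks above 2.7.
        Some(found) => found < MIN_LIBURING,
        None => true,
    }
}

/// Ask pkg-config for the installed liburing version, if any.
#[allow(dead_code)]
fn system_liburing_version() -> Option<String> {
    let output = Command::new("pkg-config")
        .args(["--modversion", "liburing"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    Some(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Comparing parsed integers rather than strings matters here: as text, "2.11" sorts before "2.7", which would wrongly trigger a source build on Ubuntu 25.10.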

Also fixes a cosmetic CI warning: `save-if` is not a valid input
for `actions-rust-lang/setup-rust-toolchain@v1` (the right name is
`cache-save-if`); this was a copy-paste error from the original
workflow and produced a warning on every run.

`build_folly.sh` (mirroring RocksDB's folly.mk) sets
GETDEPS_USE_WGET=1, which makes folly's getdeps download sources via
wget instead of Python's built-in urllib. RocksDB uses this because
some mirrors are unreliable with urllib's default handling, and the
shipping fallback mirror script also assumes wget.

ubuntu:25.10 minimal does not ship wget. Result: folly's first
download attempt (boost-1.83.0.tar.gz) fails with
`[Errno 2] No such file or directory: 'wget'` and getdeps retries
five times before giving up.

Add wget to the apt install list. Also add wget to the local
prereq check in build_folly.sh so users on minimal hosts see a
clear error before getdeps does.

GCC 15 defaults to `-std=gnu23` for C, which makes empty
parameter lists `()` mean "no arguments" instead of the pre-C23
"unspecified arguments" semantic. Folly's pinned libunwind
(f081cf4...) was written under the older rule and its test files
contain code like:

    return func(s);  // func declared as void *(*func)()

which gcc-15 rejects with "too many arguments to function 'func';
expected 0, have 1". The libunwind library itself builds OK but
the in-tree tests folly's getdeps tries to build do not.

Install gcc-14 and g++-14 alongside the default gcc-15, then point
the cc/gcc/c++/g++ alternatives at gcc-14 for the rest of the job.
gcc-14 defaults to `-std=gnu17` where this is still permitted.

gcc-14 and gcc-15 share the same libstdc++ ABI on Ubuntu so the
subsequent cargo build (which links folly's static archives into
the test binaries) is unaffected by the switch.

folly's getdeps tree uses two naming conventions:

  - The *project being built* (folly itself) installs to
    `<install_root>/folly` with no suffix.
  - The project's *dependencies* install to
    `<install_root>/<dep>-<hash>` where the hash captures manifest+ctx.

My `resolve_folly_dep` only globbed for the hashed pattern, so
`coroutines_compile_config("folly", ...)` and the matching
`coroutines_link_config` call both failed with:

  thread 'main' panicked at librocksdb-sys/build.rs:785:15:
  could not find `folly-*` under .../folly-build/installed;
  did scripts/build_folly.sh finish successfully?

even though folly itself had built and installed correctly to
`.../folly-build/installed/folly/lib/libfolly*.a`.

Fix: probe for the unsuffixed dir first; fall back to globbing
`<name>-*` for the dependency case. Error message updated to
mention both shapes.
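
The lookup order described in this fix, combined with the earlier "panic on multiple matches" rule, might look roughly like the sketch below. It is not the code from build.rs: it uses `std::fs::read_dir` in place of a glob and returns a `Result` instead of panicking, both of which are assumptions made to keep the example self-contained and testable. Only the name `resolve_folly_dep` and the two directory shapes come from the commit messages.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Locate the install directory for `name` under folly's getdeps install root.
fn resolve_folly_dep(install_root: &Path, name: &str) -> Result<PathBuf, String> {
    // Shape 1: the project being built installs to `<install_root>/<name>`.
    let unsuffixed = install_root.join(name);
    if unsuffixed.is_dir() {
        return Ok(unsuffixed);
    }

    // Shape 2: dependencies install to `<install_root>/<name>-<hash>`.
    let prefix = format!("{}-", name);
    let mut matches: Vec<PathBuf> = fs::read_dir(install_root)
        .map_err(|e| format!("cannot read {}: {}", install_root.display(), e))?
        .flatten()
        .map(|entry| entry.path())
        .filter(|path| {
            path.is_dir()
                && path
                    .file_name()
                    .map_or(false, |f| f.to_string_lossy().starts_with(prefix.as_str()))
        })
        .collect();
    matches.sort();

    match matches.len() {
        0 => Err(format!(
            "could not find `{}` or `{}*` under {}",
            name,
            prefix,
            install_root.display()
        )),
        1 => Ok(matches.remove(0)),
        // Usually a stale install from an earlier FOLLY_COMMIT_HASH.
        _ => Err(format!(
            "multiple `{}*` directories under {}: {:?}",
            prefix,
            install_root.display(),
            matches
        )),
    }
}
```

Refusing to choose between several hashed directories keeps the build from silently linking a stale version.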

cargo build --release --features coroutines,io-uring now succeeds
(folly compiled, linked into the test binary), but cargo nextest
fails immediately at the "list tests" step:

  target/release/deps/rust_rocksdb-...: error while loading shared
  libraries: libglog.so.0: cannot open shared object file: No such
  file or directory

This is the exact rpath-doesn't-propagate situation the README's
"Runtime constraints" section documents: cargo:rustc-link-arg from
librocksdb-sys doesn't apply to downstream test binaries (per
rust-lang/cargo#9554), and folly's getdeps only produces glog/gflags
as .so files (no static archives). The nextest-spawned test binary
needs to find them at runtime.

Set LD_LIBRARY_PATH to <install_root>/glog-<hash>/lib(64) and
<install_root>/gflags-<hash>/lib(64) in $GITHUB_ENV right after the
folly install path step, so every subsequent step (cargo build,
nextest, doc tests) inherits it. This matches the first option the
README recommends to downstream users for the same problem.
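
The CI fix itself is a workflow step that writes to $GITHUB_ENV. For a Rust launcher or test harness that has to do the same thing, composing the variable could be sketched as below. The helper name `prepend_library_dirs` is hypothetical and the paths in the usage note are placeholders; the sketch relies only on `std::env::split_paths` and `std::env::join_paths`.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Build an LD_LIBRARY_PATH value that puts `new_dirs` ahead of whatever
/// was already set, dropping empty entries from the existing value.
fn prepend_library_dirs(
    new_dirs: &[PathBuf],
    existing: Option<OsString>,
) -> Result<OsString, env::JoinPathsError> {
    let existing_dirs: Vec<PathBuf> = match existing {
        Some(value) => env::split_paths(&value)
            // An empty entry would make the loader search the current directory.
            .filter(|dir| !dir.as_os_str().is_empty())
            .collect(),
        None => Vec::new(),
    };
    env::join_paths(new_dirs.iter().cloned().chain(existing_dirs))
}
```

A caller would pass the resolved glog and gflags lib directories together with `std::env::var_os("LD_LIBRARY_PATH")`, then hand the result to `std::process::Command::env` when spawning the binary.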

Update the "Async MultiGet with C++20 Coroutines" section to reflect
what we actually learned getting the CI green on this branch. The
prior version had several inaccuracies and a soft-pedaled performance
claim.

Performance section
-------------------

- Stop presenting the 1292/775/508 us/op numbers as if they apply
  uniformly. They came from Meta's internal warm-storage flash
  (ws.flash.ftw3preprod1), which has ~100-1000x the per-read latency
  of a modern local NVMe.
- Drop the prior "the gain shrinks on local NVMe" sentence as an
  unverified extrapolation, and explicitly note that Meta has NOT
  published a local-NVMe equivalent. Frame the underlying reasoning
  (async_io hides per-read latency; less latency to hide means less
  to save) as reasoning, not measurement.
- Give a workload heuristic for when this is worth turning on:
  remote/network storage, or many SST files spanning multiple LSM
  levels per MultiGet batch, or both.

Prerequisites section
---------------------

- Document the actual liburing version requirement (>= 2.7) and what
  each common distro ships. Note that build_folly.sh auto-builds
  liburing 2.9 from source when the system version is too old (so
  Ubuntu 24.04 LTS users don't have to do anything special, while
  Ubuntu 25.10+ uses the apt-shipped liburing 2.11).
- Document the GCC 15 incompatibility. The pinned folly libunwind
  uses K&R-style empty parameter lists that gcc-15 rejects under its
  default -std=gnu23. Tell users to use gcc-14, clang, or any GCC
  <= 14. Point them at the CI workflow as the working example.
- Replace the vague "folly + 8 transitive deps" line with the actual
  apt package list the build needs (build-essential, cmake,
  ninja-build, double-conversion-dev, libssl-dev, liburing-dev,
  patchelf, wget, etc.). wget specifically is required because
  RocksDB's folly.mk sets GETDEPS_USE_WGET=1.
- Fix the description of scripts/build_folly.sh. It used to wrap
  RocksDB's `make build_folly`, but the script in this branch
  invokes getdeps.py directly with --scratch-path so install
  artifacts land at a predictable workspace-relative location
  (librocksdb-sys/folly-build/installed/), not in a /tmp scratch
  dir.

Runtime constraints section
---------------------------

- Reword the LD_LIBRARY_PATH option with an actual concrete glob
  example (`ls -d ""/glog-*/lib* | head -1`)
  instead of the prior <hash> placeholder that users would have had
  to manually fill in.
- Point at the CI workflow as the canonical worked example for
  setting LD_LIBRARY_PATH, since the workflow's "Export
  ROCKSDB_FOLLY_INSTALL_PATH and LD_LIBRARY_PATH" step does exactly
  this (we hit this exact runtime failure in CI and added that step
  to fix it).
- Note the folly install is large (~2 GB) and slow to rebuild, and
  reference the workflow's cache key example.

Other
-----

- Replace the unqualified asterisk on the "Experimental" tag with a
  more honest qualifier ("builds and tests in CI but has not been
  exercised on production workloads from this crate").

Three corrections:

1. Performance table row was wrong about which config gets 775 us/op.
   The original wording said 'single-level parallel reads (works
   without this feature)' which is false - 775 us/op requires the
   coroutines feature AND async_io=true AND optimize_multiget_for_io=false.
   Without coroutines, both parallel paths in version_set.cc are
   short-circuited by the !using_coroutines() check, and MultiGet
   falls back to one-file-at-a-time. Now the table labels each row
   with the exact ReadOptions combo.

2. optimize_multiget_for_io paragraph now explains the flag as a
   CPU/latency knob within the coroutine-enabled space rather than
   an absent setter. Both 'on' (multi-level) and 'off' (within-level
   only) need the coroutines feature compiled in; the flag chooses
   which coroutine path runs. Setting it to false still cuts latency
   by ~40% (1292->775 us/op) at lower CPU than the multi-level
   path. Notes that the setter goes in once
   facebook/rocksdb#14752 merges.

3. Replace 'Meta' / 'Meta's' with 'RocksDB team' / 'remote/warm-storage
   flash' throughout. The benchmark is the RocksDB team's; the
   project's affiliation isn't relevant to the section.
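
The three latency figures quoted in this PR (1292, 775, and 508 us/op) can be related with a little arithmetic. The sketch below is only a worked calculation: the constant and function names are hypothetical, and it takes 1292 us/op as the starting point, as the "1292->775" notation suggests.

```rust
/// Latencies in microseconds per operation, as quoted in this PR.
const BASELINE_US: f64 = 1292.0;
const WITHIN_LEVEL_US: f64 = 775.0; // optimize_multiget_for_io = false
const MULTI_LEVEL_US: f64 = 508.0; // optimize_multiget_for_io = true

/// Percentage by which latency drops when going from `before_us` to `after_us`.
fn reduction_percent(before_us: f64, after_us: f64) -> f64 {
    (before_us - after_us) / before_us * 100.0
}

/// Share of the total baseline-to-multi-level saving that the
/// within-level path already delivers.
fn share_of_full_win_percent() -> f64 {
    (BASELINE_US - WITHIN_LEVEL_US) / (BASELINE_US - MULTI_LEVEL_US) * 100.0
}
```

With these inputs, `reduction_percent(BASELINE_US, WITHIN_LEVEL_US)` comes to about 40%, `reduction_percent(WITHIN_LEVEL_US, MULTI_LEVEL_US)` to about 34%, and `reduction_percent(BASELINE_US, MULTI_LEVEL_US)` to about 61%. The within-level path therefore accounts for roughly two-thirds of the full saving.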
@zaidoon1 zaidoon1 merged commit 769ef60 into master May 18, 2026
18 checks passed