fix: enable unprefixed jemalloc to override musl malloc for SQLite C code #584
Merged
ahmedhesham6 merged 14 commits into main on Feb 18, 2026
…code

`tikv-jemallocator` without `unprefixed_malloc_on_supported_platforms` only provides prefixed symbols (`_rjem_je_malloc`). Rust allocations go through jemalloc via `#[global_allocator]`, but SQLite's embedded C code (compiled via `libsql-ffi`) calls the system `malloc()` directly. On musl that is the aggressive allocator that munmaps freed pages, causing SIGSEGV in `sqlite3Close()` during autopilot schedule runs.

Enabling the feature makes jemalloc provide the actual `malloc`/`free` symbols, overriding musl's allocator for ALL code in the process.

Also:
- Re-enable `aarch64-unknown-linux-musl` builds (same root cause)
- Fix build-and-release test step to pass `--features jemalloc` on musl
- Add musl-jemalloc CI workflow to verify malloc symbol override
- Add temp validation workflow (delete after confirming fix)
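The split this commit describes (Rust allocations honour `#[global_allocator]`, direct C `malloc()` calls do not) can be shown without jemalloc at all. The following is a minimal sketch, not code from this PR: a hypothetical `CountingSystem` wrapper around `std::alloc::System` stands in for `tikv_jemallocator::Jemalloc`, and a direct FFI call to `malloc` stands in for SQLite's C code. It does not model the `unprefixed_malloc_on_supported_platforms` fix, which works at the linker-symbol level rather than in Rust code.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::ffi::c_void;

thread_local! {
    // Const-initialized and Drop-free, so touching it from inside the allocator never allocates.
    static RUST_ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical stand-in for `tikv_jemallocator::Jemalloc`: forwards to the system
/// allocator and counts how often Rust code allocates through it.
struct CountingSystem;

unsafe impl GlobalAlloc for CountingSystem {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Count per thread so allocations on other threads cannot skew the numbers.
        let _ = RUST_ALLOCS.try_with(|n| n.set(n.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only Rust-side allocations (Box, Vec, String, ...) are routed through this.
#[global_allocator]
static GLOBAL: CountingSystem = CountingSystem;

// The plain C allocator entry points, i.e. what SQLite's C code links against.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

fn rust_allocs() -> usize {
    RUST_ALLOCS.with(|n| n.get())
}

/// Allocates through Rust (`Box`), which always goes via `#[global_allocator]`.
pub fn rust_path_allocs() -> usize {
    let before = rust_allocs();
    // black_box keeps the optimizer from eliding the heap allocation.
    let boxed = std::hint::black_box(Box::new([0u8; 64]));
    let after = rust_allocs();
    drop(boxed);
    after - before
}

/// Calls C `malloc`/`free` directly, bypassing `#[global_allocator]` entirely.
pub fn c_path_allocs() -> usize {
    let before = rust_allocs();
    unsafe {
        let p = malloc(64);
        assert!(!p.is_null());
        free(p);
    }
    rust_allocs() - before
}
```

`rust_path_allocs()` sees one allocation pass through the wrapper, while `c_path_allocs()` sees none: the C call resolves to whatever `malloc` symbol the binary links, which is the gap that left SQLite on musl's allocator until jemalloc exported the unprefixed symbols.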
libsql has a separate SIGSEGV during process teardown on aarch64-musl that is unrelated to the runtime allocator bug. Tests pass but the process crashes on exit during sqlite3 cleanup. This is a known upstream issue. Build-only on aarch64-musl; tests already run on x86_64 and in ci.yml.
The action bundles an old Docker client (API v1.41) incompatible with newer GitHub runners that require API v1.44+. Using docker run directly avoids the version mismatch.
The openssl-sys crate requires pkg-config and the OpenSSL headers to build on Alpine musl.
…se, lightweight PR guard
- Inline nm symbol verification into build-and-release.yml for both Docker (aarch64) and non-Docker (x86_64) musl paths
- Add binutils + ca-certificates + SSL_CERT_FILE to Alpine container
- Rewrite musl-jemalloc.yml as lightweight PR-only guard with path filters (cli/Cargo.toml, main.rs, Cargo.lock, workflow files)
- Unify cache key prefixes across workflows to share cargo caches
- Tighten jemalloc cfg guard to target_os=linux only (matches Cargo, TiKV, Vector patterns; prevents accidental override on macOS/Windows)
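The `nm` verification itself runs as a shell step in the workflow, and its exact command is not shown on this page. As a hedged illustration of the logic only, the Rust sketch below classifies the `malloc` line of `nm` output. The enum and function names are hypothetical, and treating a strong text-section definition (`T`) as the passing case is an assumption: the PR only states that the broken binary showed musl's weak (`W`) `malloc`.

```rust
/// How the `malloc` symbol appears in `nm` output (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub enum MallocSymbol {
    /// Strong text-section definition (`T`): assumed here to mean jemalloc's unprefixed `malloc` won.
    Strong,
    /// Weak definition (`W` or `w`): what the PR reports for musl's `malloc`.
    Weak,
    /// Any other type letter, e.g. `U` (undefined) in a dynamically linked binary.
    Other(char),
    /// No `malloc` entry at all, e.g. a stripped binary.
    Missing,
}

/// Scans `nm` output for the symbol named exactly `malloc` and classifies its type letter.
pub fn classify_malloc(nm_output: &str) -> MallocSymbol {
    for line in nm_output.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Defined symbols print as "<address> <type> <name>", undefined ones as "<type> <name>".
        let (kind, name) = match fields.as_slice() {
            [_addr, kind, name] => (*kind, *name),
            [kind, name] => (*kind, *name),
            _ => continue,
        };
        // Exact match, so prefixed names like `_rjem_je_malloc` are ignored.
        if name != "malloc" {
            continue;
        }
        // nm type codes are a single character; skip anything else.
        let mut chars = kind.chars();
        let (Some(code), None) = (chars.next(), chars.next()) else {
            continue;
        };
        return match code {
            'T' => MallocSymbol::Strong,
            'W' | 'w' => MallocSymbol::Weak,
            other => MallocSymbol::Other(other),
        };
    }
    MallocSymbol::Missing
}

/// Pass/fail rule for a CI step: only a strong `malloc` counts as overridden.
pub fn jemalloc_overrides_malloc(nm_output: &str) -> bool {
    classify_malloc(nm_output) == MallocSymbol::Strong
}
```

The input would be the text printed by running `nm` on the built musl binary. Reporting `Missing` for a stripped binary, instead of passing it, keeps the check from silently succeeding when there is no symbol table to inspect.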
…-tools
- Remove Docker Alpine container for aarch64-musl builds; use ubuntu-22.04-arm runner with musl-tools (same as x86_64)
- Eliminates: apk packages, SSL_CERT_FILE, binutils, ca-certificates, Docker cache, container permission fixes, missing vim/nano test failures
- Inline nm jemalloc verification in build-and-release for both musl targets
- Lightweight PR-only musl-jemalloc.yml with path filters
- Tighten jemalloc cfg to target_os=linux only
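A minimal sketch of what a `target_os = "linux"` guard on the global allocator can look like. The feature name `jemalloc` matches the `--features jemalloc` flag used in these commits; the exact cfg expression and the helper `jemalloc_compiled_in` are assumptions for illustration, not the contents of `cli/src/main.rs`. Because the item is cfg-gated, the `tikv_jemallocator` path is only resolved when the feature is enabled on Linux, so the same file still builds on macOS and Windows.

```rust
// Assumed shape of the guard; the real cli/src/main.rs is not shown on this page.
// This attribute routes Rust allocations (Box, Vec, String, ...) through jemalloc.
// It does NOT affect C code such as libsql-ffi's SQLite: that is covered only by
// tikv-jemallocator's `unprefixed_malloc_on_supported_platforms` Cargo feature,
// which makes jemalloc export the plain `malloc`/`free` symbols.
#[cfg(all(feature = "jemalloc", target_os = "linux"))]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

/// Hypothetical helper: true only when the override above is compiled in.
/// On macOS/Windows this is always false, even with `--features jemalloc`.
pub const fn jemalloc_compiled_in() -> bool {
    cfg!(all(feature = "jemalloc", target_os = "linux"))
}
```

This attribute on its own is what the original build already had; the crash fix lives in the Cargo feature, while the guard only limits where either half applies.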
Force-pushed from 967ba14 to 06f2c7b
…atomics on ARM
- aarch64-musl: back to Alpine Docker (musl-gcc atomics broken on ARM)
- x86_64-musl: stays native Ubuntu + musl-tools
- Add vim to Alpine apk (fixes test_detect_editor)
- Add inline nm jemalloc verification in both paths
- Unified cache keys across workflows
…ve redundant workflows
- Uncomment pull_request trigger on build-and-release.yml (main, beta)
- Delete musl-jemalloc.yml (PR guard, now covered by build-and-release)
- Delete test-jemalloc-fix.yml (temp validation, no longer needed)
Description
Fix autopilot SIGSEGV crash-loop on Linux musl by making jemalloc actually override the system `malloc` for all code in the process, including SQLite's embedded C.

Root Cause
`tikv-jemallocator` without `unprefixed_malloc_on_supported_platforms` only provides prefixed symbols (`_rjem_je_malloc`). The `#[global_allocator]` only routes Rust allocations through jemalloc. SQLite's C code (compiled via `libsql-ffi`) calls `malloc()` directly, which in a statically linked musl binary resolves to musl's aggressive allocator that `munmap`s freed pages.

Confirmed via GDB core dump on `i-0773f92db43188a0c` (eu-north-1):

And via `nm`: `malloc` was musl's weak (`W`) symbol, not jemalloc's.

Changes Made
- `cli/Cargo.toml`: Add `unprefixed_malloc_on_supported_platforms` feature to `tikv-jemallocator` so jemalloc provides the real `malloc`/`free` symbols
- `cli/src/main.rs`: Document why the feature is critical
- `build-and-release.yml`: Re-enable `aarch64-unknown-linux-musl` builds (same root cause was blocking them); fix test step to pass `--features jemalloc` on musl
- `musl-jemalloc.yml` (new): CI workflow that verifies via `nm` that `malloc` is not musl's weak symbol
- `test-jemalloc-fix.yml` (new, temporary): Full build+test+verify for both arches; delete after confirming

Testing
- `cargo check --features jemalloc` passes
- `cargo test --workspace --lib` passes
- `cargo test --workspace --lib --features jemalloc` passes
- `cargo fmt --check` clean
- `cargo clippy --all-targets` clean

Breaking Changes
None