fix: enable unprefixed jemalloc to override musl malloc for SQLite C code#584

Merged
ahmedhesham6 merged 14 commits into main from fix/jemalloc-unprefixed-malloc-override on Feb 18, 2026

@ahmedhesham6 (Collaborator)

Description

Fix the autopilot SIGSEGV crash-loop on Linux musl by making jemalloc actually override the system malloc for all code in the process, including SQLite's embedded C code.

Root Cause

Without the unprefixed_malloc_on_supported_platforms feature, tikv-jemallocator only provides prefixed symbols (_rjem_je_malloc). #[global_allocator] routes only Rust allocations through jemalloc. SQLite's C code (compiled via libsql-ffi) calls malloc() directly, and in a statically linked musl binary that symbol resolves to musl's own aggressive allocator, which munmaps freed pages. Any access to memory that has already been freed then hits an unmapped page and faults, as in the sqlite3Close() trace below.

Confirmed via a GDB core dump from instance i-0773f92db43188a0c (eu-north-1):

#0  sqlite3Close+25: movzbl 0x71(%rdi),%eax   ← reading unmapped page
#1  drop_in_place<libsql::local::connection::Connection>
#2  Arc::drop_slow()
#3  run_scheduler::{{closure}}

nm confirmed it: malloc resolved to musl's weak (W) symbol, not to jemalloc.
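
The split between Rust and C allocations can be reproduced with the standard library alone. This is a minimal sketch, not code from this PR: a hypothetical counting allocator (CountingAlloc) stands in for tikv_jemallocator::Jemalloc under #[global_allocator], and malloc/free are declared directly against the C library, the way SQLite's C code reaches them. It assumes Linux (native thread-local storage) and a pre-2024 Rust edition (plain extern "C" block).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::ffi::c_void;

// Hypothetical counting allocator: forwards to the system allocator and
// counts how many times Rust code on the current thread asked for memory.
struct CountingAlloc;

thread_local! {
    // A const-initialised Cell<usize> needs no destructor or lazy setup on
    // Linux, so touching it from inside the allocator does not recurse.
    static RUST_ALLOCS: Cell<usize> = const { Cell::new(0) };
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // try_with returns Err instead of panicking during thread teardown.
        let _ = RUST_ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Stand-in for the real `#[global_allocator]` static that uses jemalloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

extern "C" {
    // The C library's allocator, called directly as C code would call it.
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

fn rust_allocs() -> usize {
    RUST_ALLOCS.with(|c| c.get())
}

/// Allocator calls observed while allocating a Vec from Rust.
fn rust_alloc_delta() -> usize {
    let before = rust_allocs();
    let v: Vec<u8> = Vec::with_capacity(64);
    std::hint::black_box(&v); // keep the allocation from being optimised out
    let after = rust_allocs();
    drop(v);
    after - before
}

/// Allocator calls observed while calling C malloc()/free() directly.
fn c_malloc_delta() -> usize {
    let before = rust_allocs();
    unsafe {
        let p = std::hint::black_box(malloc(64));
        free(p);
    }
    rust_allocs() - before
}

fn main() {
    let rust = rust_alloc_delta();
    let c = c_malloc_delta();
    println!("Rust Vec allocation reached #[global_allocator] {rust} time(s)");
    println!("C malloc() reached #[global_allocator] {c} time(s)");
}
```

Only the Vec allocation moves the counter; the direct malloc() call never passes through #[global_allocator], which is why a prefixed-only jemalloc build still left SQLite on musl's allocator.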

Changes Made

  • cli/Cargo.toml: Add unprefixed_malloc_on_supported_platforms feature to tikv-jemallocator so jemalloc provides the real malloc/free symbols
  • cli/src/main.rs: Document why the feature is critical
  • build-and-release.yml: Re-enable aarch64-unknown-linux-musl builds (the same root cause was blocking them); fix the test step to pass --features jemalloc on musl
  • musl-jemalloc.yml (new): CI workflow that verifies via nm that malloc is not musl's weak symbol
  • test-jemalloc-fix.yml (new, temporary): full build, test, and verify for both architectures; delete once the fix is confirmed
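
The nm check can be sketched as a small program. This is an illustration only: the actual logic in musl-jemalloc.yml is not shown in this PR, the binary name check-malloc is hypothetical, and treating a strong defined text symbol (T) as the pass condition is an assumption. The PR itself only establishes that the broken build showed malloc as musl's weak (W) symbol.

```rust
use std::process::Command;

/// Returns the nm type letter (e.g. 'T' or 'W') of the first symbol named
/// exactly `malloc`, or None if no such symbol is listed.
fn malloc_symbol_type(nm_output: &str) -> Option<char> {
    nm_output.lines().find_map(|line| {
        // Default (BSD-format) nm lines are "<addr> <type> <name>";
        // undefined symbols omit the address column.
        let fields: Vec<&str> = line.split_whitespace().collect();
        match fields.as_slice() {
            [.., kind, "malloc"] if kind.len() == 1 => kind.chars().next(),
            _ => None,
        }
    })
}

/// Assumed pass condition: malloc is a strong, defined text symbol ('T'),
/// not musl's weak ('W') default and not an undefined import ('U').
fn malloc_is_overridden(nm_output: &str) -> bool {
    malloc_symbol_type(nm_output) == Some('T')
}

fn main() {
    let path = match std::env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: check-malloc <path-to-unstripped-binary>");
            return;
        }
    };
    let output = Command::new("nm").arg(&path).output().expect("failed to run nm");
    let symbols = String::from_utf8_lossy(&output.stdout);

    match malloc_symbol_type(&symbols) {
        Some(kind) => println!("malloc symbol type: {kind}"),
        None => println!("no malloc symbol found (stripped binary?)"),
    }
    if !malloc_is_overridden(&symbols) {
        eprintln!("FAIL: malloc is not a strong defined symbol");
        std::process::exit(1);
    }
    println!("OK: malloc is overridden");
}
```

A binary that exports only _rjem_je_malloc alongside musl's weak malloc fails this check, which matches the failure mode diagnosed above. The check needs an unstripped binary, since a stripped one lists no malloc symbol at all.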

Testing

  • cargo check --features jemalloc passes
  • cargo test --workspace --lib passes
  • cargo test --workspace --lib --features jemalloc passes
  • cargo fmt --check clean
  • cargo clippy --all-targets clean
  • Temporary CI workflow validates the malloc symbol override on x86_64 and aarch64

Breaking Changes

None

…code

tikv-jemallocator without `unprefixed_malloc_on_supported_platforms` only
provides prefixed symbols (_rjem_je_malloc). Rust allocations go through
jemalloc via #[global_allocator], but SQLite's embedded C code (compiled via
libsql-ffi) calls the system malloc() directly — which on musl is the
aggressive allocator that munmaps freed pages, causing SIGSEGV in
sqlite3Close() during autopilot schedule runs.

Enabling the feature makes jemalloc provide the actual malloc/free symbols,
overriding musl's allocator for ALL code in the process.

Also:
- Re-enable aarch64-unknown-linux-musl builds (same root cause)
- Fix build-and-release test step to pass --features jemalloc on musl
- Add musl-jemalloc CI workflow to verify malloc symbol override
- Add temp validation workflow (delete after confirming fix)

libsql has a separate SIGSEGV during process teardown on aarch64-musl
that is unrelated to the runtime allocator bug. Tests pass, but the
process crashes on exit during SQLite cleanup. This is a known
upstream issue.

Build-only on aarch64-musl; tests already run on x86_64 and in ci.yml.

The action bundles an old Docker client (API v1.41) that is incompatible
with newer GitHub runners, which require API v1.44+. Invoking docker run
directly avoids the version mismatch.

The openssl-sys crate requires pkg-config and OpenSSL headers to build
on Alpine musl.

…se, lightweight PR guard

- Inline nm symbol verification into build-and-release.yml for both
  Docker (aarch64) and non-Docker (x86_64) musl paths
- Add binutils + ca-certificates + SSL_CERT_FILE to Alpine container
- Rewrite musl-jemalloc.yml as lightweight PR-only guard with path
  filters (cli/Cargo.toml, main.rs, Cargo.lock, workflow files)
- Unify cache key prefixes across workflows to share cargo caches
- Tighten jemalloc cfg guard to target_os=linux only (matching the
  Cargo, TiKV, and Vector patterns; prevents accidental override on
  macOS/Windows)
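
A minimal sketch of what a Linux-only guard in cli/src/main.rs could look like. The feature name jemalloc comes from the --features jemalloc flag used in this PR, but the exact attribute in main.rs is not shown, so treat this as an illustration. tikv_jemallocator::Jemalloc is the crate's allocator type. The guard only decides where Rust allocations go; the C-level override still comes from the unprefixed_malloc_on_supported_platforms Cargo feature.

```rust
// Compiled only when the (assumed) `jemalloc` feature is on AND the target
// is Linux; macOS and Windows keep the system allocator. cfg stripping runs
// before name resolution, so this compiles even when tikv-jemallocator is
// not a dependency of the current build.
#[cfg(all(feature = "jemalloc", target_os = "linux"))]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

/// Reports which allocator the guard above selected for this build.
fn active_allocator() -> &'static str {
    if cfg!(all(feature = "jemalloc", target_os = "linux")) {
        "jemalloc"
    } else {
        "system"
    }
}

fn main() {
    println!("global allocator: {}", active_allocator());
}
```

Built without the feature, or on a non-Linux target, active_allocator() reports the system allocator and the jemalloc static is compiled out entirely.
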
…-tools

- Remove Docker Alpine container for aarch64-musl builds; use
  ubuntu-22.04-arm runner with musl-tools (same as x86_64)
- Eliminates: apk packages, SSL_CERT_FILE, binutils, ca-certificates,
  Docker cache, container permission fixes, missing vim/nano test failures
- Inline nm jemalloc verification in build-and-release for both musl targets
- Lightweight PR-only musl-jemalloc.yml with path filters
- Tighten jemalloc cfg to target_os=linux only

@ahmedhesham6 force-pushed the fix/jemalloc-unprefixed-malloc-override branch from 967ba14 to 06f2c7b on February 18, 2026 12:56

…atomics on ARM

- aarch64-musl: back to Alpine Docker (musl-gcc atomics broken on ARM)
- x86_64-musl: stays native Ubuntu + musl-tools
- Add vim to Alpine apk (fixes test_detect_editor)
- Add inline nm jemalloc verification in both paths
- Unified cache keys across workflows

…ve redundant workflows

- Uncomment pull_request trigger on build-and-release.yml (main, beta)
- Delete musl-jemalloc.yml (PR guard — now covered by build-and-release)
- Delete test-jemalloc-fix.yml (temp validation — no longer needed)

@ahmedhesham6 merged commit d8af82b into main on Feb 18, 2026
1 check passed
@ahmedhesham6 deleted the fix/jemalloc-unprefixed-malloc-override branch on February 18, 2026 14:00