fix: cap ONNX intra_op threads via MEMPAL_MAX_THREADS #1071

Open
sha2fiddy wants to merge 1 commit into MemPalace:develop from sha2fiddy:fix/ort-intra-op-thread-cap

Conversation

sha2fiddy commented Apr 21, 2026

Summary

Caps the ONNX Runtime intra_op pool so background mines don't pin every core. Controlled by MEMPAL_MAX_THREADS (default 2; 0/off/default/none disables). ORT has its own intra-op pool, so OMP_NUM_THREADS doesn't reach it.

HNSW thread pinning was independently fixed by #1191, so after the rebase this PR is ONNX-only.

Closes #1068.

Changes

  • mempalace/embedding.py: _read_thread_cap() parses the env var. _build_ef_class(thread_cap) overrides the @cached_property model on the EF subclass so the ORT session is built with SessionOptions(intra_op_num_threads=N, inter_op_num_threads=1, log_severity_level=3). The get_embedding_function() cache key is extended to (providers, thread_cap).
  • hooks/mempal_save_hook.sh, hooks/mempal_precompact_hook.sh: export MEMPAL_MAX_THREADS=2 TOKENIZERS_PARALLELISM=false so the background mine inherits the cap.
  • tests/test_embedding.py: 7 new tests covering env-var parsing, the capped subclass, session options, and cache keying.
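
The env-var parsing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `read_thread_cap` is a hypothetical stand-in for `_read_thread_cap()`, and the handling of unparsable or negative values (falling back to the default) is an assumption that the description does not confirm.

```python
import os

DEFAULT_THREAD_CAP = 2  # default stated in the PR description
DISABLE_TOKENS = {"0", "off", "default", "none"}  # values the PR says disable the cap


def read_thread_cap(environ=None):
    """Return the intra-op thread cap, or 0 when the cap is disabled.

    Hypothetical stand-in for the PR's _read_thread_cap().
    """
    env = os.environ if environ is None else environ
    raw = env.get("MEMPAL_MAX_THREADS")
    if raw is None or not raw.strip():
        return DEFAULT_THREAD_CAP  # unset or empty: use the default
    value = raw.strip().lower()
    if value in DISABLE_TOKENS:
        return 0  # cap disabled, library defaults apply
    try:
        cap = int(value)
    except ValueError:
        return DEFAULT_THREAD_CAP  # assumption: fall back on unparsable input
    # Assumption: negative values are rejected in favour of the default.
    return cap if cap >= 0 else DEFAULT_THREAD_CAP
```

Passing a plain dict instead of `os.environ` keeps the function easy to unit-test, for example `read_thread_cap({"MEMPAL_MAX_THREADS": "4"})`.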

How to test

```bash
ruff check . && python -m pytest tests/ -v --ignore=tests/benchmarks
```

1478 passed locally.

CPU sanity check:

```bash
MEMPAL_MAX_THREADS=2 mempalace mine

# In another terminal:

while true; do ps -o pid,%cpu -p "$(pgrep -f 'mempalace mine')" 2>/dev/null; sleep 1; done
```

Benchmarks (M-series Mac, 10 cores)

| Run | Peak %CPU |
| --- | --- |
| Uncapped (stock develop) | 372-463 |
| `MEMPAL_MAX_THREADS=2` | ~190 |

Design

chromadb 1.5 doesn't expose `SessionOptions` on `__init__`, so the cap goes on the EF subclass. Overriding the `@cached_property model` is the smallest stable hook into ORT session construction. When `thread_cap=0` the override is skipped and chromadb's defaults apply.
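
The shape of that hook, together with the `(providers, thread_cap)` cache key, might look like the sketch below. It deliberately uses stand-in classes so it runs without chromadb or onnxruntime installed: `FakeSession`, `BaseEF`, `build_ef_class`, and `_EF_CACHE` are all hypothetical names, and in the real code the options would be set on an `onnxruntime.SessionOptions` object rather than stored in a dict.

```python
from functools import cached_property


class FakeSession:
    """Stand-in for an ORT InferenceSession; records the options it was given."""

    def __init__(self, options=None):
        self.options = options or {}


class BaseEF:
    """Stand-in for the library's ONNX embedding function (hypothetical)."""

    @cached_property
    def model(self):
        return FakeSession()  # library defaults: no explicit thread cap


def build_ef_class(thread_cap):
    """Return BaseEF when uncapped, else a subclass that builds a capped session."""
    if thread_cap == 0:
        return BaseEF  # override skipped, defaults apply

    class CappedEF(BaseEF):
        @cached_property
        def model(self):
            # With real ORT these would be attributes on SessionOptions.
            return FakeSession({
                "intra_op_num_threads": thread_cap,
                "inter_op_num_threads": 1,
                "log_severity_level": 3,
            })

    return CappedEF


_EF_CACHE = {}  # keyed on (providers, thread_cap), as described in the PR


def get_embedding_function(providers, thread_cap):
    """Reuse one embedding function per (providers, thread_cap) pair."""
    key = (tuple(providers), thread_cap)
    if key not in _EF_CACHE:
        _EF_CACHE[key] = build_ef_class(thread_cap)()
    return _EF_CACHE[key]
```

Including `thread_cap` in the key matters because a session's thread settings are fixed when it is built; without it, a caller asking for a different cap could silently receive an instance created under the old one.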

Without this, ORT spawns ~physical-core-count workers in its intra_op
pool and a background mine pegs 400-500% CPU. OMP_NUM_THREADS does not
control the ORT pool — ORT has its own.

Cap is applied in mempalace.embedding by overriding the model
cached_property on the EF subclass so the InferenceSession is built
with explicit SessionOptions (intra_op_num_threads=N, inter_op=1).
Default cap is 2; "0"/"off"/"default"/"none" disables it.

HNSW thread pinning is already handled by _pin_hnsw_threads on develop,
so this PR only addresses the ONNX side. The auto_save and precompact
hooks export MEMPAL_MAX_THREADS=2 so background mines stay throttled.

Closes MemPalace#1068.
@sha2fiddy force-pushed the fix/ort-intra-op-thread-cap branch from fcd52aa to a1b076b on April 29, 2026 23:01
Labels

bug, performance

Development

Successfully merging this pull request may close these issues.

Background mempalace mine pins 400–500 % CPU — ORT intra_op pool ignores OMP env vars

2 participants