Context
On our fork's palace (165,632 drawers, mixed project files + conversation transcripts), a cold-run single-threaded `mempalace mine` takes ~14 min. Two changes bring it to ~3 min on a 4-core machine without changing any other semantics:

- `bulk_check_mined()`: paginated pre-fetch of `(source_file, mtime)` pairs in 10 K batches, so the "is this file already mined?" check doesn't issue a per-file ChromaDB query (sketched below).
- `--workers N` flag on `mempalace mine`: a `ThreadPoolExecutor(max_workers=N)` fans out `process_file()` across not-yet-mined files.
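For concreteness, a minimal sketch of the pre-fetch, written against ChromaDB's paginated `Collection.get(limit=..., offset=...)` API. The metadata key names `source_file` and `mtime` are assumptions about the fork's drawer schema, not confirmed upstream names.

```python
def bulk_check_mined(collection, batch: int = 10_000) -> dict[str, float]:
    """Return {source_file: newest mined mtime} in one paginated scan,
    replacing the per-file "is this already mined?" ChromaDB query.

    Assumes each drawer's metadata carries "source_file" and "mtime" keys.
    """
    mined: dict[str, float] = {}
    offset = 0
    while True:
        page = collection.get(include=["metadatas"], limit=batch, offset=offset)
        metadatas = page["metadatas"]
        if not metadatas:
            break  # scanned past the last drawer
        for meta in metadatas:
            src, mtime = meta["source_file"], meta["mtime"]
            if mined.get(src, -1.0) < mtime:
                mined[src] = mtime  # keep the newest mtime seen per file
        offset += len(metadatas)
    return mined
```

The hot path then never touches ChromaDB per file: a file needs mining only when `mined.get(path)` is missing or older than the file's current `st_mtime`.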
Both have been running in production on jphein/mempalace since 2026-04-10. Per-file correctness is preserved because `process_file()` still acquires `mine_lock()` per file (from #784) before writing, so the fan-out never races on the same file's drawers. A sketch of the fan-out follows.
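This sketch assumes the existing `process_file()` (which takes `mine_lock()` internally, per the above) and a pre-computed `mined` map from `bulk_check_mined()`; the import path and signatures are illustrative, not the fork's exact code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from mempalace.miner import process_file  # assumed location of the existing per-file miner

def mine(paths: list[Path], mined: dict[str, float], workers: int = 1) -> None:
    # Files whose on-disk mtime is newer than the mined one still need work.
    todo = [p for p in paths if mined.get(str(p), -1.0) < p.stat().st_mtime]
    if workers <= 1:
        # Preserve today's single-threaded behavior exactly.
        for p in todo:
            process_file(p)
        return
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_file, p): p for p in todo}
        for fut in as_completed(futures):
            fut.result()  # re-raise any per-file exception instead of swallowing it
```

Because each `process_file()` call holds the #784 lock for its file, the pool never needs coordination of its own; `workers=1` keeps the current code path byte-for-byte.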
Why an issue rather than a PR
`--workers` overlaps in intent with #784 (file-level locking, merged 2026-04-13), since both care about safe concurrency during mining, but the mechanisms differ:

- #784's `fcntl.flock` prevents two *processes* from writing the same file's drawers concurrently.
- `--workers` fans out *threads* within a single `mine` invocation; each thread still takes the #784 lock before writing.
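For readers who haven't seen #784, here is a plausible shape for that per-file lock. This is a sketch, not the merged implementation; the sidecar lockfile directory and naming scheme are invented for illustration.

```python
import fcntl
import hashlib
from contextlib import contextmanager
from pathlib import Path

LOCK_DIR = Path("/tmp/mempalace-locks")  # illustrative location, not #784's actual path

@contextmanager
def mine_lock(source_file: str):
    """Exclusive per-file lock via fcntl.flock on a sidecar lockfile."""
    LOCK_DIR.mkdir(parents=True, exist_ok=True)
    lock_path = LOCK_DIR / hashlib.sha256(source_file.encode()).hexdigest()
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other holder for this file
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```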
The question is whether the maintainer sees this as a natural extension of #784's concurrency story, or prefers a different direction (e.g., multi-process orchestration outside a single `mine` invocation). We'd rather ask than file a 200-line PR that goes the wrong way.
Concrete numbers
Run environment: MacBook M2, Python 3.13, chromadb 1.5.8, 165 K-drawer palace (~8 K unique source files).
| Config | Wall time |
| --- | --- |
| Single-threaded (upstream current) | ~14 min |
| `bulk_check_mined()` pre-fetch only | ~8 min |
| `--workers 4` + `bulk_check_mined()` | ~3 min |
| `--workers 8` + `bulk_check_mined()` | ~2 min 50 s (diminishing returns past 4) |
Open questions
- Is in-process fan-out with `ThreadPoolExecutor` of interest upstream, or is "multiple `mempalace mine` invocations handling disjoint subsets" the preferred concurrency model?
- Should `--workers` default to `1` (current single-threaded behavior) with explicit opt-in, or to `min(4, cpu_count())`?
- Any concern about `bulk_check_mined()` memory footprint for palaces with O(100 K+) unique source files? On our 165 K-drawer palace with ~8 K unique files, the pre-fetch is <1 MB; an O(500 K)-file palace would be ~4 MB, still fine, but a chunked iterator (sketched below) might be warranted at extreme scales.
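If that footprint ever becomes a problem, the same paginated scan can be exposed as a generator so no caller has to materialize every `(source_file, mtime)` pair at once. `iter_mined` is a hypothetical name, reusing the assumed metadata keys from the sketch above.

```python
from collections.abc import Iterator

def iter_mined(collection, batch: int = 10_000) -> Iterator[tuple[str, float]]:
    """Yield (source_file, mtime) pairs page by page, never holding them all."""
    offset = 0
    while True:
        page = collection.get(include=["metadatas"], limit=batch, offset=offset)
        metadatas = page["metadatas"]
        if not metadatas:
            return
        for meta in metadatas:
            yield meta["source_file"], meta["mtime"]
        offset += len(metadatas)
```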
Happy to open a PR immediately if the direction is approved. If not, close this with a note and we'll keep it fork-local.
Code for reference (fork `main`):
- `mempalace/palace.py`: `bulk_check_mined()`
- `mempalace/miner.py`: concurrent `mine()` with `--workers`