Context
On our fork's palace (165,632 drawers, mixed project files + conversation transcripts), a cold-run single-threaded `mempalace mine` takes ~14 min. Two changes bring it to ~3 min on a 4-core machine without changing any other semantics:

- `bulk_check_mined()`: paginated pre-fetch of `(source_file, mtime)` pairs in 10 K batches, so the "is this file already mined?" check doesn't issue a per-file ChromaDB query (sketched below).
- `--workers N` flag on `mempalace mine`: a `ThreadPoolExecutor(max_workers=N)` fans out `process_file()` across not-yet-mined files.
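For concreteness, a minimal sketch of the pre-fetch, written against ChromaDB's paginated `Collection.get(limit=..., offset=...)` API. The metadata key names `source_file` and `mtime` are assumptions about the fork's drawer schema, not confirmed upstream names.

```python
def bulk_check_mined(collection, batch: int = 10_000) -> dict[str, float]:
    """Return {source_file: newest mined mtime} in one paginated scan,
    replacing the per-file "is this already mined?" ChromaDB query.

    Assumes each drawer's metadata carries "source_file" and "mtime" keys.
    """
    mined: dict[str, float] = {}
    offset = 0
    while True:
        page = collection.get(include=["metadatas"], limit=batch, offset=offset)
        metadatas = page["metadatas"]
        if not metadatas:
            break  # scanned past the last drawer
        for meta in metadatas:
            src, mtime = meta["source_file"], meta["mtime"]
            if mined.get(src, -1.0) < mtime:
                mined[src] = mtime  # keep the newest mtime seen per file
        offset += len(metadatas)
    return mined
```

The hot path then never touches ChromaDB per file: a file needs mining only when `mined.get(path)` is missing or older than the file's current `st_mtime`.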
Both have been running in production on jphein/mempalace since 2026-04-10. Per-file correctness is preserved because `process_file()` still acquires `mine_lock()` per file (from #784) before writing, so the fan-out never races on the same file's drawers. A sketch of the fan-out follows.
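This sketch assumes the existing `process_file()` (which takes `mine_lock()` internally, per the above) and a pre-computed `mined` map from `bulk_check_mined()`; the import path and signatures are illustrative, not the fork's exact code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from mempalace.miner import process_file  # assumed location of the existing per-file miner

def mine(paths: list[Path], mined: dict[str, float], workers: int = 1) -> None:
    # Files whose on-disk mtime is newer than the mined one still need work.
    todo = [p for p in paths if mined.get(str(p), -1.0) < p.stat().st_mtime]
    if workers <= 1:
        # Preserve today's single-threaded behavior exactly.
        for p in todo:
            process_file(p)
        return
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_file, p): p for p in todo}
        for fut in as_completed(futures):
            fut.result()  # re-raise any per-file exception instead of swallowing it
```

Because each `process_file()` call holds the #784 lock for its file, the pool never needs coordination of its own; `workers=1` keeps the current code path byte-for-byte.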
Why an issue rather than a PR
`--workers` overlaps in intent with #784 (file-level locking, merged 2026-04-13), since both care about safe concurrency during mining, but the mechanisms differ:

- #784's `fcntl.flock` prevents two *processes* from writing the same file's drawers concurrently.
- `--workers` fans out *threads* within a single `mine` invocation; each thread still takes the #784 lock before writing.
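For readers who haven't seen #784, here is a plausible shape for that per-file lock. This is a sketch, not the merged implementation; the sidecar lockfile directory and naming scheme are invented for illustration.

```python
import fcntl
import hashlib
from contextlib import contextmanager
from pathlib import Path

LOCK_DIR = Path("/tmp/mempalace-locks")  # illustrative location, not #784's actual path

@contextmanager
def mine_lock(source_file: str):
    """Exclusive per-file lock via fcntl.flock on a sidecar lockfile."""
    LOCK_DIR.mkdir(parents=True, exist_ok=True)
    lock_path = LOCK_DIR / hashlib.sha256(source_file.encode()).hexdigest()
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other holder for this file
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```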
The question is whether the maintainer sees this as a natural extension of #784's concurrency story, or prefers a different direction (e.g., multi-process orchestration outside a single `mine` invocation). We'd rather ask than file a 200-line PR that goes the wrong way.
Concrete numbers
Run environment: MacBook M2, Python 3.13, chromadb 1.5.8, 165 K-drawer palace (~8 K unique source files).
| Config | Wall time |
| --- | --- |
| Single-threaded (upstream current) | ~14 min |
| `bulk_check_mined()` pre-fetch only | ~8 min |
| `--workers 4` + `bulk_check_mined()` | ~3 min |
| `--workers 8` + `bulk_check_mined()` | ~2 min 50 s (diminishing returns past 4) |
Open questions
- Is in-process fan-out with `ThreadPoolExecutor` of interest upstream, or is "multiple `mempalace mine` invocations handling disjoint subsets" the preferred concurrency model?
- Should `--workers` default to `1` (current single-threaded behavior) with explicit opt-in, or to `min(4, cpu_count())`?
- Any concern about `bulk_check_mined()` memory footprint for palaces with O(100 K+) unique source files? On our 165 K-drawer palace with ~8 K unique files, the pre-fetch is <1 MB; an O(500 K)-file palace would be ~4 MB, still fine, but a chunked iterator (sketched below) might be warranted at extreme scales.
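If that footprint ever becomes a problem, the same paginated scan can be exposed as a generator so no caller has to materialize every `(source_file, mtime)` pair at once. `iter_mined` is a hypothetical name, reusing the assumed metadata keys from the sketch above.

```python
from collections.abc import Iterator

def iter_mined(collection, batch: int = 10_000) -> Iterator[tuple[str, float]]:
    """Yield (source_file, mtime) pairs page by page, never holding them all."""
    offset = 0
    while True:
        page = collection.get(include=["metadatas"], limit=batch, offset=offset)
        metadatas = page["metadatas"]
        if not metadatas:
            return
        for meta in metadatas:
            yield meta["source_file"], meta["mtime"]
        offset += len(metadatas)
```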
Happy to open a PR immediately if the direction is approved. If not, close this with a note and we'll keep it fork-local.
Code for reference (fork `main`):
- `mempalace/palace.py`: `bulk_check_mined()`
- `mempalace/miner.py`: concurrent `mine()` with `--workers`