## Summary

MemPalace 3.3.3 has a race condition in `hooks_cli.py` that causes concurrent ChromaDB writes when the `PreCompact` hook fires, leading to HNSW index corruption.
## Root cause

`hook_precompact()` at line ~656 calls two ingest paths without any palace-wide lock:

- `_ingest_transcript(transcript_path)` — spawns an async `subprocess.Popen` (fire-and-forget)
- `_mine_sync(...)` — runs a sync `subprocess.run` immediately after

Both paths write to the same ChromaDB collection. If a `Stop`/`SessionEnd` hook or a background `mempalace mine` is already running, the result is two concurrent HNSW writers.

Neither path is gated by the existing `mine.pid` PID guard — `_ingest_transcript` bypasses it entirely, and `_mine_sync` is a separate code path.
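To make the race concrete, here is a minimal sketch of the call pattern described above. The function names follow this report; the subprocess command lines are placeholders, not the real MemPalace invocations:

```python
import subprocess

def hook_precompact(transcript_path: str) -> None:
    # Writer #1: fire-and-forget child that ingests the transcript into the
    # ChromaDB collection; the hook does not wait for it to finish.
    subprocess.Popen(["mempalace-ingest", transcript_path])  # placeholder command

    # Writer #2: synchronous mine launched immediately afterwards, so it can
    # run while writer #1 is still mid-write to the same HNSW index.
    subprocess.run(["mempalace", "mine"], check=False)
```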
## Observable symptom

`hook.log` shows:

```
chromadb.errors.InternalError: Error in compaction: Failed to apply logs to the hnsw segment writer
```

In our case, `link_lists.bin` (HNSW higher-level connections) grew from ~50 MB to 210 GB apparent / 41 GB real (sparse file expansion) before we caught it.
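Sparse expansion is easy to miss because apparent size and allocated blocks diverge; both can be checked directly (the path below is illustrative):

```sh
# Adjust the glob to the palace's Chroma segment directory.
ls -lh "$PALACE"/chroma/*/link_lists.bin                 # apparent size (210 GB for us)
du -h  "$PALACE"/chroma/*/link_lists.bin                 # blocks actually on disk (41 GB)
du -h --apparent-size "$PALACE"/chroma/*/link_lists.bin  # apparent size as du reports it
```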
## Reproduction

- Enable both `PreCompact` and `SessionEnd` hooks in Claude Code `settings.json` (sketch below)
- Have a long session (so both hooks fire close together on context compaction)
- Observe HNSW errors in `hook.log`; `du` will show the palace growing
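For reference, a sketch of the hook wiring in `settings.json`. The event/command structure follows Claude Code's hooks schema, but the `mempalace hook …` command strings are placeholders for whatever MemPalace's installer actually registers:

```json
{
  "hooks": {
    "PreCompact": [
      { "hooks": [{ "type": "command", "command": "mempalace hook precompact" }] }
    ],
    "SessionEnd": [
      { "hooks": [{ "type": "command", "command": "mempalace hook session-end" }] }
    ]
  }
}
```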
## Proposed fix

Add a palace-wide `fcntl.flock(LOCK_EX)` in `hooks_cli.py` before any HNSW write operation. The lock file should be shared across all three write paths:

- `hook_stop()` → `_ingest_transcript()` / `_maybe_auto_ingest()`
- `hook_precompact()` → both ingest calls
- CLI `mempalace mine` command
Example pattern (already working in our workaround):

```python
import fcntl, os

LOCK_FILE = os.path.join(palace_dir, ".palace-write.lock")
with open(LOCK_FILE, "w") as lf:
    fcntl.flock(lf, fcntl.LOCK_EX)
    # ... all HNSW writes here
```
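Packaged as a context manager, the same lock drops into all three call sites. This is a sketch; the helper name `palace_write_lock` is ours, not MemPalace's:

```python
import contextlib
import fcntl
import os

@contextlib.contextmanager
def palace_write_lock(palace_dir: str):
    """Serialize all HNSW writers on one palace-wide advisory lock."""
    lock_path = os.path.join(palace_dir, ".palace-write.lock")
    with open(lock_path, "w") as lf:
        # Blocks until any current writer releases the lock; released
        # automatically on close, even if the write path raises.
        fcntl.flock(lf, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)

# Usage in hook_stop(), hook_precompact(), and the mine command:
#   with palace_write_lock(palace_dir):
#       ...  # all HNSW writes happen inside the lock
```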
## Workaround (used in production)

We disabled `PreCompact` entirely (no-op `exit 0` script) and moved conversation mining to a cron job with a shared `flock -n` guard. See the monkey-patch approach in `stop_diary_only.py` that disables the async paths while keeping the diary checkpoint.
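The cron guard looks roughly like this (schedule and paths are illustrative); `flock -n` skips the run instead of blocking when another writer already holds the lock:

```sh
# m h dom mon dow  command
*/30 * * * * flock -n /path/to/palace/.palace-write.lock mempalace mine >> /var/log/mempalace-mine.log 2>&1
```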
## Environment

- MemPalace 3.3.3
- ChromaDB (bundled version)
- Claude Code with `SessionEnd` + `PreCompact` hooks enabled
- Host: WSL2 on Windows
## Related

- The single-slot `mine.pid` PID guard has no `O_EXCL` and is bypassed by both `_ingest_transcript` (`Popen`) and `_mine_sync` paths — worth hardening separately (sketch below)
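A sketch of a hardened guard; our suggestion, not current MemPalace code. `O_CREAT | O_EXCL` makes creation fail atomically if the file already exists, closing the check-then-create race that a plain existence test leaves open:

```python
import os

def acquire_pid_guard(pid_path: str) -> bool:
    """Atomically claim the single mining slot; True if we own it."""
    try:
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # another miner already holds the slot
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True

# Both bypassing paths (_ingest_transcript, _mine_sync) would need to call
# acquire_pid_guard() before spawning their writers, and remove the PID file
# when done.
```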