PreCompact hook: double-ingest race condition causes HNSW corruption (no palace-wide write lock) #1253

## Summary

MemPalace 3.3.3 has a race condition in `hooks_cli.py` that causes concurrent ChromaDB writes when the `PreCompact` hook fires, leading to HNSW index corruption.

## Root cause

`hook_precompact()` at line ~656 calls two ingest paths without any palace-wide lock:

1. `_ingest_transcript(transcript_path)` — spawns an **async** `subprocess.Popen` (fire-and-forget)
2. `_mine_sync(...)` — runs a **sync** `subprocess.run` immediately after

Both paths write to the same ChromaDB collection. Nothing waits for the `Popen` child, so the two paths can overlap each other. If a `Stop`/`SessionEnd` hook or a background `mempalace mine` is already running, that adds yet another concurrent HNSW writer.

Neither path is gated by the existing `mine.pid` PID guard — `_ingest_transcript` bypasses it entirely, and `_mine_sync` is a separate code path.
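The sketch below makes the ordering concrete. It is self-contained and does not touch ChromaDB. `SLOW_WRITER`, `FAST_WRITER`, and `precompact_shape()` are hypothetical stand-ins, not MemPalace code. The slow child plays the role of `_ingest_transcript`'s detached `Popen`, and the `subprocess.run` call plays `_mine_sync`. The function only shows that the async child is still alive when the sync one starts.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two ingest subprocesses (not MemPalace code).
SLOW_WRITER = [sys.executable, "-c", "import time; time.sleep(1.0)"]
FAST_WRITER = [sys.executable, "-c", "pass"]


def precompact_shape() -> bool:
    """Mimic the call shape of hook_precompact(): fire-and-forget, then blocking.

    Returns True if the async child was still running when the sync child
    was launched, i.e. the two "writers" overlapped.
    """
    # Like _ingest_transcript(): start a child and do not wait for it.
    async_child = subprocess.Popen(SLOW_WRITER)

    # Like _mine_sync(): starts immediately afterwards, with no lock in between.
    overlapped = async_child.poll() is None
    subprocess.run(FAST_WRITER, check=True)

    # Reap the child so this demo leaves no zombie; the hook itself never waits.
    async_child.wait()
    return overlapped
```

Calling `precompact_shape()` on an idle machine should report an overlap. The real ingest takes far longer than the one-second sleep used here, so the overlap window in the hook is correspondingly wider.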

## Observable symptom

`hook.log` shows:
```
chromadb.errors.InternalError: Error in compaction: Failed to apply logs to the hnsw segment writer
```

In our case, `link_lists.bin` (HNSW higher-level connections) grew from ~50 MB to **210 GB apparent / 41 GB real** (sparse file expansion) before we caught it.
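To tell sparse-file expansion apart from real disk usage, you can compare each `link_lists.bin` file's apparent size (`st_size`) with its allocated size (`st_blocks`, which Linux counts in 512-byte units). `hnsw_link_list_sizes()` below is a hypothetical diagnostic sketch, not a MemPalace command. It assumes a Unix host, such as the WSL2 setup listed under Environment.

```python
import os


def hnsw_link_list_sizes(palace_dir: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: (path, apparent_bytes, allocated_bytes) per link_lists.bin.

    apparent_bytes is st_size (what `du --apparent-size` reports);
    allocated_bytes is st_blocks * 512 (what plain `du` reports on Linux).
    A large gap between the two indicates a sparse file.
    """
    results = []
    for root, _dirs, files in os.walk(palace_dir):
        for name in files:
            if name == "link_lists.bin":
                path = os.path.join(root, name)
                st = os.stat(path)
                results.append((path, st.st_size, st.st_blocks * 512))
    return results
```

Run it against the palace directory, for example `hnsw_link_list_sizes("/path/to/palace")` with your own path. Apparent sizes far above the allocated ones would match the apparent-vs-real pattern reported here.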

## Reproduction

1. Enable both `PreCompact` and `SessionEnd` hooks in Claude Code `settings.json`
2. Have a long session (so both hooks fire close together on context compaction)
3. Observe HNSW errors in `hook.log`; `du` will show the palace growing

## Proposed fix

Add a palace-wide `fcntl.flock(LOCK_EX)` in `hooks_cli.py` **before** any HNSW write operation. The lock file should be shared across all three write paths:

- `hook_stop()` → `_ingest_transcript()` / `_maybe_auto_ingest()`
- `hook_precompact()` → both ingest calls
- CLI `mempalace mine` command

Example pattern (already working in our workaround):
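```python
import fcntl
import os

palace_dir = "/path/to/palace"  # placeholder: the palace directory in use
LOCK_FILE = os.path.join(palace_dir, ".palace-write.lock")

# "a" creates the lock file if needed without truncating it.
with open(LOCK_FILE, "a") as lf:
    # Blocks until no other process holds LOCK_EX on this file.
    fcntl.flock(lf, fcntl.LOCK_EX)
    # ... all HNSW writes here
# Closing lf at the end of the with-block releases the lock.
```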
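All three call sites need to share the same lock. One way to guarantee that is to wrap the pattern in a single helper that each of them imports. The sketch below uses a hypothetical `palace_write_lock()` context manager; the helper name and lock-file name are illustrative, not part of MemPalace.

One caveat applies to the detached `Popen` child started by `_ingest_transcript`. An `flock` lock belongs to the open file description that took it, and `subprocess.Popen` closes inherited descriptors by default. A lock held by the parent only around the `Popen` call therefore does not cover the child's later writes. The child has to enter the lock itself.

```python
import contextlib
import fcntl
import os
from typing import Iterator

LOCK_NAME = ".palace-write.lock"  # assumed name, matching the example above


@contextlib.contextmanager
def palace_write_lock(palace_dir: str) -> Iterator[None]:
    """Hypothetical helper: hold an exclusive, palace-wide flock() for the block.

    Blocks until every other holder (hook, CLI `mine`, detached child) has
    released the lock. The kernel drops the lock when the file is closed,
    including when the holder process dies, so no stale lock is left behind.
    """
    path = os.path.join(palace_dir, LOCK_NAME)
    # "a" creates the file if missing and never truncates it.
    with open(path, "a") as lf:
        fcntl.flock(lf, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)
```

Each write path (`hook_stop`, `hook_precompact`, the detached ingest child, and `mempalace mine`) would then wrap its ChromaDB writes in `with palace_write_lock(palace_dir): ...`.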

## Workaround (used in production)

We disabled `PreCompact` entirely (no-op `exit 0` script) and moved conversation mining to a cron job with a shared `flock -n` guard. See the monkey-patch approach in `stop_diary_only.py` that disables the async paths while keeping the diary checkpoint.
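On the cron side, the `flock -n` guard means "skip this run if someone else holds the lock" rather than "wait for it". In Python the equivalent is `LOCK_NB`. The sketch below uses a hypothetical `run_if_idle()` helper and lock path. It returns `False` when the palace is busy, so the caller can skip the tick quietly.

```python
import fcntl
import os
from typing import Callable


def run_if_idle(lock_path: str, job: Callable[[], None]) -> bool:
    """Hypothetical helper: run job() only if the lock is free, like `flock -n`.

    Returns True if job() ran, False if another process held the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # busy: skip this cron tick instead of queueing
        job()
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```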

## Environment

- MemPalace 3.3.3
- ChromaDB (bundled version)
- Claude Code `SessionEnd` + `PreCompact` hooks enabled
- Host: WSL2 on Windows

## Related

- Single-slot `mine.pid` PID guard has no `O_EXCL` and is bypassed by both `_ingest_transcript` (Popen) and `_mine_sync` paths — worth hardening separately
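For the separate hardening item, the usual way to make a single-slot PID file atomic is to create it with `O_CREAT | O_EXCL`, so only one process can win the create. The sketch below uses hypothetical `try_claim_pid_file()` and `release_pid_file()` helpers. It deliberately leaves out stale-PID recovery, which a real guard would also need. On its own it would still not cover the `_ingest_transcript` / `_mine_sync` bypass.

```python
import os


def try_claim_pid_file(path: str) -> bool:
    """Hypothetical helper: atomically create path and write our PID into it.

    Returns False if the file already exists (another miner holds the slot).
    O_EXCL turns "does it exist?" and "create it" into one step, so two
    processes cannot both succeed.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True


def release_pid_file(path: str) -> None:
    """Hypothetical helper: remove the PID file when the miner finishes."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```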