fix(hooks): restore precompact save nudge with opt-in two-phase block (refs #858) #1226
Open
junbuluv wants to merge 77 commits into MemPalace:main from
Conversation
- repair.py: wrap the upsert loop in try/except; restore from backup on failure instead of leaving a partially rebuilt collection
- migrate.py: replace the non-atomic rmtree+move with a rename-aside swap so a crash between the two calls does not destroy both copies
- cli.py: use offset += len(batch["ids"]) with an empty-batch guard instead of a fixed offset += batch_size to prevent skipping drawers
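The offset fix in the last bullet can be sketched with an in-memory stand-in for the collection. `iter_drawers` and `FakeCollection` are illustrative names, not MemPalace's actual API; the real loop lives in cli.py:

```python
class FakeCollection:
    """In-memory stand-in for a ChromaDB collection's paged get()."""
    def __init__(self, ids):
        self._ids = ids

    def get(self, limit, offset):
        return {"ids": self._ids[offset:offset + limit]}

def iter_drawers(collection, batch_size=100):
    offset = 0
    while True:
        batch = collection.get(limit=batch_size, offset=offset)
        ids = batch["ids"]
        if not ids:  # empty-batch guard: stop instead of looping forever
            break
        yield batch
        # Advance by the rows actually returned, not the requested
        # batch_size, so a short batch never makes the next read skip rows.
        offset += len(ids)
```

A fixed `offset += batch_size` would jump past unreturned rows whenever the backend hands back fewer than `batch_size` ids, which is exactly the skipped-drawers symptom the bullet describes.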
agent_name and entry are validated via sanitize_name/sanitize_content, but topic is stored raw into ChromaDB metadata. Apply the same sanitize_name guard to reject null bytes, path traversal, and oversized payloads.
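A minimal sketch of the kind of guard `sanitize_name` is described as providing; the length constant and error messages here are assumptions for illustration, not MemPalace's actual implementation:

```python
MAX_NAME_LEN = 256  # assumed limit, for illustration only

def sanitize_name(value: str) -> str:
    """Reject null bytes, path traversal, and oversized payloads (sketch)."""
    if "\x00" in value:
        raise ValueError("null byte in name")
    if ".." in value or "/" in value or "\\" in value:
        raise ValueError("path traversal characters in name")
    if len(value) > MAX_NAME_LEN:
        raise ValueError("name too long")
    return value
```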
- repair.py: define backup_path before the conditional block so it is always in scope when the except handler references it
- migrate.py: restore the old palace from .old if both os.rename and shutil.move fail during the swap step
Bumps [actions/deploy-pages](https://github.com/actions/deploy-pages) from 4 to 5.
- [Release notes](https://github.com/actions/deploy-pages/releases)
- [Commits](actions/deploy-pages@v4...v5)

updated-dependencies:
- dependency-name: actions/deploy-pages
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 5.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](actions/upload-pages-artifact@v3...v5)

updated-dependencies:
- dependency-name: actions/upload-pages-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <[email protected]>
Fixes four issues causing silent hook failures:

1. **Relative paths** → absolute paths (/absolute/path/to/hooks/...). Claude Code resolves hooks from the working directory, not the repo root.
2. **Wrong matcher** → Stop uses *, PreCompact has no matcher. PreCompact doesn't use a matcher (only Stop hooks do).
3. **Missing timeout** → added timeout: 30 to both hooks, matching the hooks/README.md specification.
4. **Ambiguous target** → specified ~/.claude/settings.local.json, clarifying global vs project-scoped config.

Also added executable chmod instructions and a path replacement note.

Fixes MemPalace#1037
shutil.move() can partially create palace_path before raising, which would trip a bare os.replace(stale_path, palace_path) rollback (dest exists).

- Switch the primary swap to os.replace so same-filesystem moves stay atomic.
- Branch on errno.EXDEV before falling back to shutil.move, so real errors (permissions, EIO) surface instead of silently attempting a slow copy.
- Extract rollback into _restore_stale_palace, which clears any partial destination and, if the restore itself fails, logs both stale_path and palace_path so the operator can recover by hand.

Adds three regression tests covering clean rollback, partial-copy cleanup, and a logged failure when the rollback itself fails. Flagged by the Qodo reviewer on MemPalace#935.
Enable setup-python's built-in pip cache on all CI jobs to avoid re-downloading ~300 MB of dependencies (chromadb, onnxruntime, hnswlib) on every run. Bump macOS and Windows from Python 3.9 to 3.11 -- Linux matrix already covers 3.9 compatibility, and 3.11 is faster on these platforms.
~/.mempalace/tunnels.json (introduced in MemPalace#790) was created via plain open(..., "w") with no chmod, and its parent dir via os.makedirs() without mode=0o700. On Linux with default umask 022 both end up world-readable (0o644 / 0o755). Tunnels reveal cross-wing connections — which projects, people, and rooms the user has explicitly linked — so they are sensitive metadata that should not be readable by other local users on shared systems. Apply the same 0o700 / 0o600 pattern that MemPalace#814 established for the other sensitive palace files. Chmod calls are wrapped in try/except (OSError, NotImplementedError) for Windows / unsupported-filesystem compatibility. Closes MemPalace#1165
The miner upserted one drawer per ChromaDB call, paying tokenizer + ONNX session setup per chunk. The embedding device was CPU-only because no EmbeddingFunction was ever wired through the backend. Two changes, each a speedup in its own right; stacked they give ~10x end-to-end on a medium corpus (20 files, 568 drawers):

1. Batched upsert. `process_file` and `_file_chunks_locked` now collect all chunks of a file into a single `collection.upsert(...)` so the embedding model runs one forward pass per file instead of N.
2. Hardware-accelerated embedding function. New `mempalace/embedding.py` wraps `ONNXMiniLM_L6_V2` with configurable `preferred_providers`. `MEMPALACE_EMBEDDING_DEVICE` (or `embedding_device` in config.json) selects auto / cpu / cuda / coreml / dml. Unavailable accelerators log a warning and fall back to CPU.

The factory subclasses `ONNXMiniLM_L6_V2` and spoofs its `name()` to `"default"` so the persisted EF identity matches existing palaces created with ChromaDB's bare `DefaultEmbeddingFunction` -- same model, same 384-dim vectors, no rebuild needed when turning GPU on. `ChromaBackend.get_collection` / `create_collection` now pass the resolved EF on every call so miner writes and searcher reads agree.

Benchmarks (i9-12900KF + RTX 3090, medium scenario, 568 drawers):

    per-chunk + CPU   19.77s ·  29 drw/s (baseline)
    batched  + CPU     8.07s ·  70 drw/s (2.4x)
    batched  + CUDA    2.15s · 264 drw/s (9.2x)

Reproducible via `benchmarks/mine_bench.py`. Install paths:

    pip install mempalace[gpu]     # NVIDIA CUDA
    pip install mempalace[dml]     # DirectML (Windows)
    pip install mempalace[coreml]  # macOS Neural Engine

The mine header now prints `Device: cpu|cuda|...` so users can confirm the accelerator engaged.
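The device-to-provider resolution implied by `MEMPALACE_EMBEDDING_DEVICE` might look like the sketch below. The provider strings are ONNX Runtime's real execution-provider names; the function name, the `auto` ordering, and the fallback shape are assumptions about mempalace/embedding.py, not its actual code:

```python
import os

# device keyword -> ONNX Runtime provider preference list (CPU last as fallback)
_PROVIDERS = {
    "cpu":    ["CPUExecutionProvider"],
    "cuda":   ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "dml":    ["DmlExecutionProvider", "CPUExecutionProvider"],
    "coreml": ["CoreMLExecutionProvider", "CPUExecutionProvider"],
}

def resolve_providers(device=None, available=("CPUExecutionProvider",)):
    """Resolve a device keyword to a provider list (sketch, names assumed)."""
    device = (device or os.environ.get("MEMPALACE_EMBEDDING_DEVICE", "auto")).lower()
    if device == "auto":
        # Prefer the first accelerated provider that is actually available.
        for name, provs in _PROVIDERS.items():
            if name != "cpu" and provs[0] in available:
                return provs
        return _PROVIDERS["cpu"]
    provs = _PROVIDERS.get(device, _PROVIDERS["cpu"])
    if provs[0] not in available:
        # Unavailable accelerator: the real code logs a warning here.
        return _PROVIDERS["cpu"]
    return provs
```

In real use `available` would come from `onnxruntime.get_available_providers()`, and the chosen list would be handed to the wrapped `ONNXMiniLM_L6_V2` as `preferred_providers`.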
Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/3213a67a-6871-4bb2-9ae0-23fa11001a22 Co-authored-by: igorls <[email protected]>
perf(mining): batch per-chunk upserts + optional GPU acceleration
When two wings have one or more confirmed TOPIC labels in common, the miner now drops a symmetric tunnel between them at mine time so the palace graph reflects shared themes (frameworks, vendors, recurring concepts).

- llm_refine: the TOPIC label routes to a dedicated `topics` bucket so the signal survives confirmation instead of getting collapsed into `uncertain` and dropped.
- entity_detector / project_scanner: the bucket is plumbed through the detection pipeline; `confirm_entities` returns confirmed topics alongside people/projects.
- miner.add_to_known_entities: an optional `wing` parameter records the confirmed topics under `topics_by_wing` in `~/.mempalace/known_entities.json`. Wing names do NOT leak into the flat known-name set used by drawer-tagging.
- palace_graph: `compute_topic_tunnels` and `topic_tunnels_for_wing` create symmetric tunnels via the existing `create_tunnel` API so they share dedup and persistence with explicit tunnels.
- miner.mine: a post-file-loop pass calls `topic_tunnels_for_wing` for the freshly-mined wing. Failures are logged but never abort the mine.
- config: `topic_tunnel_min_count` knob (env `MEMPALACE_TOPIC_TUNNEL_MIN_COUNT` or `~/.mempalace/config.json`), default 1.

Tests cover topic persistence through init->mine, tunnel creation when wings share a topic, no tunnel below threshold, cross-wing tunnel retrieval via `list_tunnels`, dedup on recompute, case-insensitive overlap, and the end-to-end mine-time wiring.

Out of scope for this PR (called out in the PR body): manifest-dependency overlap, per-topic allow/deny lists, search-result surfacing.
… field
Previously a cross-wing topic tunnel for "Angular" stored the room as
"Angular" — colliding with a wing's literal folder-derived "Angular" room
at follow_tunnels/list_tunnels read time, and exposing raw topic strings
(which may contain characters rejected by sanitize_name) to the MCP
surface.
Topic tunnels now store their room as "topic:<original-casing>" and carry
kind="topic" on the stored dict. Explicit tunnels get kind="explicit"
(default). follow_tunnels("wing", "Angular") on a literal Angular room
no longer surfaces topic connections for the same name, and any LLM
scanning list_tunnels has a visible discriminator.
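The namespacing scheme above can be sketched as follows; the dict shape and helper names are assumptions for illustration, not palace_graph's actual code:

```python
def topic_room(topic: str) -> str:
    """Namespace a topic tunnel's room so it cannot collide with a literal
    folder-derived room of the same name (sketch, original casing kept)."""
    return f"topic:{topic}"

def make_tunnel(wing_a, wing_b, room, kind="explicit"):
    """Assumed shape of a stored tunnel dict: explicit tunnels default to
    kind='explicit'; topic tunnels carry kind='topic' as a discriminator."""
    return {"wings": sorted([wing_a, wing_b]), "room": room, "kind": kind}
```

With this scheme a `follow_tunnels("wing", "Angular")` lookup on the literal room never matches `"topic:Angular"`, and any reader of `list_tunnels` can branch on `kind`.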
…c-tunnels feat(graph): cross-wing tunnels by shared topics (MemPalace#1180)
chore: add OpenArena owner claim verification file
Three tightly-coupled search-quality fixes for v3.3.3:

1. CLI `mempalace search` now routes through the same `_hybrid_rank` the MCP path already used. Drawers whose text contains every query term but embed as file-tree noise (directory listings, diffs, log fragments) were scoring cosine distance >= 1.0 — the display formula `max(0, 1 - dist)` then floored every result to `Match: 0.0`, with no way for the user to tell a lexical match from a total miss. BM25 catches these cleanly; the display surfaces both `cosine=` and `bm25=` so users see which component is firing.

2. Legacy-palace distance-metric warning. Palaces created before `hnsw:space=cosine` was consistently set silently use ChromaDB's default L2 metric, which breaks the cosine-similarity formula (L2 distances routinely exceed 1.0 on normalized 384-dim vectors). The search path now detects this at query time and prints a one-line notice pointing at `mempalace repair`. Only fires for legacy palaces; new palaces already set cosine correctly.

3. Invariant tests pinning `hnsw:space=cosine` on every collection-creation path — legacy `get_or_create_collection`, legacy `create_collection`, RFC 001 `get_collection(create=True)`, the public `palace.get_collection`, and a round-trip through reopen. Locks down the correctness that new-user palaces already have so a future refactor can't silently regress it.

Also adds a `metadata` property to `ChromaCollection` so callers can read the underlying hnsw:space without reaching into `_collection`.

Tests:
- New regression: simulate three candidates at distance 1.5 (cosine=0), one containing query terms — must rank first with non-zero bm25.
- New: legacy metric (empty or non-cosine) produces a stderr warning.
- New: a correctly-configured palace produces no warning.
- New: all five creation paths pin cosine metadata.

All existing tests still pass.
`test_fresh_palace_via_full_stack_gets_cosine` used
`tempfile.TemporaryDirectory()` as a context manager, which tries to
delete the temp path
on exit. On Windows, ChromaDB still holds SQLite file handles to
`chroma.sqlite3` when the context closes, producing:
PermissionError: [WinError 32] The process cannot access the file
because it is being used by another process: '...\\chroma.sqlite3'
NotADirectoryError: [WinError 267] The directory name is invalid
Other tests in the same file use pytest's `tmp_path` fixture, which
defers cleanup to session end (when the process is exiting and the
file-lock contention is moot). Align this one with the rest of the
file.
CLAUDE.md already documents the 80% Windows coverage allowance due to
"ChromaDB file lock cleanup" — the fix is to stop fighting the lock.
…ality fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4)
`mempalace init` now ends with a `Mine this directory now? [Y/n]` prompt and runs `mine()` in-process when accepted; `--yes` skips the prompt and auto-mines for non-interactive callers. Declining prints the resume command. Removes the "remember to type the next command" friction since rooms + entities just got set up.

`mempalace mine` now wraps its main loop in `try / except KeyboardInterrupt` and prints `files_processed`, `drawers_filed`, and `last_file` before exiting with code 130 on Ctrl-C. Re-mining is safe because deterministic drawer IDs make the upsert idempotent.

The hooks PID lock at `~/.mempalace/hook_state/mine.pid` is now actively removed in a `finally` when its entry points at us, on clean exit, error, or interrupt, preventing the next hook fire from briefly waiting on a stale PID.

Closes MemPalace#1181, MemPalace#1182.
…ore mine prompt
Reviewer feedback on the previous commit flagged two real problems:
1. Overloading --yes to also auto-mine was a silent behaviour change for
scripted callers. Today --yes only auto-accepts entities — making it
ALSO trigger a multi-minute ChromaDB write breaks every script that
currently runs `mempalace init --yes <dir>` for the fast non-interactive
entity path. Add a separate `--auto-mine` flag instead. Combinations:
mempalace init --yes <dir> # entities auto, STILL prompt mine
mempalace init --auto-mine <dir> # prompt entities, skip mine prompt
mempalace init --yes --auto-mine <dir> # fully non-interactive
--yes behaviour is now identical to pre-PR.
2. The mine prompt was firing without telling the user how big the job
was. On a real corpus mine takes minutes-to-tens-of-minutes; hitting
Enter on default-Y with no size cue is a footgun. Show a one-line
estimate computed from scan_project (the same walk we hand into mine)
BEFORE the prompt:
~423 files (~12 MB) would be mined into this palace.
Mine this directory now? [Y/n]
The estimate uses a single corpus walk: scan_project's output is
passed into mine() via a new optional files= kwarg, so we never walk
the tree twice.
Tests: replaced the old "--yes auto-mines" assertion with a regression
guard that --yes alone STILL prompts; added coverage for --auto-mine
alone, --yes --auto-mine together, and the pre-prompt estimate line.
The "Skipped. Run mempalace mine <dir>" hint after declining the init prompt and the "Re-run mempalace mine <dir> to resume" hint after a Ctrl-C interruption both interpolated project_dir without shell-quoting. A path containing spaces or metacharacters produced a copy-paste-broken command. Both spots now use shlex.quote(project_dir). Adds regression tests covering each hint with a path that contains a space.
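The quoting fix, sketched with a hypothetical hint builder (the real code interpolates inside the CLI's print calls):

```python
import shlex

def resume_hint(project_dir: str) -> str:
    """Build a copy-paste-safe resume command (sketch, wording assumed)."""
    # shlex.quote wraps the path in single quotes only when it contains
    # spaces or shell metacharacters; plain paths pass through untouched.
    return f"Re-run mempalace mine {shlex.quote(project_dir)} to resume"
```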
The pre-existing test_maybe_run_mine_prompt_declined_prints_hint
asserted the bare unquoted form `mempalace mine {tmp_path}`. After
the production code switched to shlex.quote on the resume hint, this
passed on Linux/macOS (POSIX paths have no characters that trigger
quoting) but failed on Windows where backslashes always get wrapped
in single quotes.
Mirror the production code in the assertion via shlex.quote so it's
portable across platforms; do the same for the two new
spaces-in-path tests for consistency.
…Palace#1208)

The user-reported case in MemPalace#1208: a palace with 67,580 drawers had its HNSW files manually quarantined to recover from corruption. ``mempalace repair`` then ran cleanly and reported "Drawers found: 10000 ... Repair complete. 10000 drawers rebuilt." The backup was the v3.3.3 chroma.sqlite3 that did contain the full 67,580 — but the rebuilt collection only had the first 10K. 85% data loss, no warning.

Root cause: ChromaDB's collection-layer get() silently caps at ``CHROMADB_DEFAULT_GET_LIMIT = 10_000`` rows when reading from a collection whose segment metadata is stale (the typical post-quarantine state). col.count() returns the same capped value, so neither the loop bound nor the extraction count flagged the truncation.

The fix is defense-in-depth, not a recovery mechanism. Repair now:

1. After extraction, queries chroma.sqlite3 directly via a read-only sqlite3 connection: COUNT(*) FROM embeddings JOIN segments JOIN collections WHERE name='mempalace_drawers'. If that count exceeds the extracted count, abort with a clear message before any destructive operation.
2. Falls back to a weaker check when the SQLite query can't run (chromadb schema drift, locked file): if the extracted count exactly equals CHROMADB_DEFAULT_GET_LIMIT, that's a strong enough cap signal to refuse without explicit acknowledgement.
3. Adds ``--confirm-truncation-ok`` (CLI) and ``confirm_truncation_ok`` (rebuild_index kwarg) to override after independent verification. Useful for the rare case of a palace genuinely sized at exactly 10,000 drawers.

The guard logic lives in ``repair.check_extraction_safety()`` so the two extraction paths (CLI ``cmd_repair`` and the lower-level ``rebuild_index``) share a single implementation. It raises ``TruncationDetected`` carrying the printable message.
Tests: 9 new cases covering the safe path (counts match; SQLite unreadable but well under cap), both abort paths (SQLite higher than extracted; unreadable + at cap), the override flag, and the end-to-end behavior of ``rebuild_index`` with the guard wired in. Plus two ``sqlite_drawer_count`` tests for the missing-file and bad-schema cases.

What's NOT in this PR: actually recovering the missing 57,580 drawers from the user's case. The on-disk SQLite still holds them; recovery is a separate flow (direct-extract from chroma.sqlite3, bypassing the chromadb collection layer entirely). This PR's job is to stop repair from making it worse.

Refs MemPalace#1208.
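The guard described above might be shaped like this sketch; the signature approximates `repair.check_extraction_safety()` from the commit text, but parameter names and messages are assumptions:

```python
CHROMADB_DEFAULT_GET_LIMIT = 10_000

class TruncationDetected(Exception):
    """Raised when extraction looks truncated; carries the printable message."""

def check_extraction_safety(extracted_count, sqlite_count=None, override=False):
    """Refuse to proceed when extraction looks capped (sketch, names assumed).

    sqlite_count is the direct COUNT(*) from chroma.sqlite3, or None when
    that cross-check could not run (schema drift, locked file).
    """
    if override:  # --confirm-truncation-ok: user verified independently
        return
    if sqlite_count is not None and sqlite_count > extracted_count:
        raise TruncationDetected(
            f"SQLite holds {sqlite_count} drawers but only {extracted_count} "
            "were extracted; refusing to rebuild. "
            "Pass --confirm-truncation-ok to override.")
    if sqlite_count is None and extracted_count == CHROMADB_DEFAULT_GET_LIMIT:
        raise TruncationDetected(
            "Extraction stopped exactly at ChromaDB's 10,000-row get() cap "
            "and the SQLite cross-check was unavailable; refusing without "
            "--confirm-truncation-ok.")
```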
…n-cap-detection fix(repair): refuse to overwrite when extraction looks truncated (MemPalace#1208)
…o 5min

make_client() called _fix_blob_seq_ids but skipped quarantine_stale_hnsw, so every fresh process (stop hook, precompact hook, CLI) opened a drifted palace and segfaulted in chromadb_rust_bindings before any write-path protection could fire.

MemPalace#1062 wires the quarantine call at MCP server startup (covers long-lived server processes). This fix adds it to make_client() itself — the call site that all callers (server, hooks, CLI, tests) pass through — so every fresh PersistentClient open is protected regardless of entry point.

Also lowers the stale_seconds default from 3600 to 300: a 0.96h-drifted segment caused production segfaults before the 1h threshold fired. ChromaDB's HNSW flush cadence means legitimate drift is seconds to low minutes; 5min gives headroom without admitting clearly corrupt segments.
Symptom on the canonical disks daemon: drift quarantines firing every 10–30 minutes throughout the day under steady write load. Logs show .drift-* directories accumulating despite the daemon being the only writer (no Syncthing replication of palace data).

Root cause is a false-positive thrash in the quarantine heuristic:

- chroma.sqlite3 mtime bumps on every write (millisecond cadence).
- HNSW segment files (data_level0.bin) only flush to disk on chromadb's internal cadence, which can lag minutes behind sqlite under continuous write load.

Once the gap exceeds the 300s threshold, quarantine_stale_hnsw renames a perfectly valid HNSW segment, chromadb rebuilds it from scratch, and the cycle repeats as soon as the next batch of writes lands.

The 300s threshold (lowered from 3600s in PR MemPalace#1173 after a 0.96h-drift production segfault) is correct for the cross-machine-replication failure mode it was designed for, but wrong for a daemon-strict deployment whose only "drift" source is its own benign flush lag.

Fix: gate the proactive quarantine check to the first ``make_client()`` invocation per palace per process (``ChromaBackend._quarantined_paths`` set). Real cold-start drift (replication, partial restore, crashed-mid-write) still gets caught — that's exactly when a fresh daemon process opens the palace. Real runtime drift on observed HNSW errors still gets caught via palace-daemon's ``_auto_repair``, which calls ``quarantine_stale_hnsw`` directly, bypassing this gate.

Two new tests in test_backends.py verify single-fire-per-palace and per-palace independence. Conftest clears the gate between tests. Suite 1362/1362 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
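The first-invocation gate can be sketched as below; the class and attribute names follow the commit text, but the method body is a stand-in, not ChromaBackend's actual code:

```python
class ChromaBackend:
    """Minimal sketch of the per-process quarantine gate (body assumed)."""

    # Class-level set: one gate per process, shared across instances.
    _quarantined_paths: set = set()

    def make_client(self, palace_path, quarantine):
        # The proactive quarantine check fires once per palace per process;
        # later reconnects to the same palace skip it.
        if palace_path not in ChromaBackend._quarantined_paths:
            quarantine(palace_path)
            ChromaBackend._quarantined_paths.add(palace_path)
        # ... open and return the PersistentClient here (elided) ...
```

Tests would clear `ChromaBackend._quarantined_paths` between runs, matching the conftest reset mentioned above.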
…ush-lag

Previous: quarantine fired whenever sqlite_mtime - hnsw_mtime exceeded the (lowered in MemPalace#1173) 300s threshold. ChromaDB 1.5.x flushes HNSW asynchronously and a clean shutdown does not force-flush, so the on-disk HNSW is *always* meaningfully older than chroma.sqlite3 — that's the steady state, not corruption. Quarantine renamed valid HNSW segments on every cold-start, chromadb created empty replacements, and vector recall went to 0/N until rebuild.

Confirmed in production on the disks daemon journal, 2026-04-26 06:56:45: three of three HNSW segments quarantined on cold-start with 538-557s mtime gaps (post-clean-shutdown flush lag), leaving a 151,478-drawer palace with vector_ranked=0. Drift directories at *.drift-20260426-065645/ each contained a complete 253MB data_level0.bin plus 18MB index_metadata.pickle — clearly healthy indexes, renamed by the false-positive heuristic.

Fix: two-stage gate.

1. mtime gate (existing) — gap > stale_seconds is necessary.
2. integrity gate (new) — sniff index_metadata.pickle for chromadb's expected protocol/terminator bytes (PROTO 0x80 head, STOP 0x2e tail) and a non-trivial size, WITHOUT deserializing the file. A healthy segment with mtime drift is kept in place; a truncated / zero-filled / partial-flush segment is quarantined.

The format-sniff is deliberately non-deserializing — pickle deserialization can execute arbitrary code, and the PROTO+STOP byte presence plus a size floor is sufficient to distinguish a complete chromadb write from truncation, zero-fill, or a partial flush during process kill.

Real load failures (the rare case where the bytes look right but chromadb fails to load) still surface to palace-daemon's _auto_repair, which calls quarantine_stale_hnsw directly on observed HNSW errors and bypasses this gate.
The cold-start gate from 70c4bc6 (row 24) remains as a perf optimization — even with the integrity check, repeating the sniff on every reconnect is unnecessary work — but its load-bearing role is now covered by this deeper fix.

4 new tests in test_backends.py:

- test_quarantine_stale_hnsw_renames_corrupt_segment (drift + bad meta)
- test_quarantine_stale_hnsw_leaves_healthy_segment_with_drift_alone (drift + valid meta — the production case at 06:24)
- test_quarantine_stale_hnsw_leaves_segment_without_metadata_alone (fresh / never-flushed, no meta file)
- test_quarantine_stale_hnsw_renames_truncated_metadata (under-floor size, partial-flush shape)

Existing test_quarantine_stale_hnsw_renames_drifted_segment renamed to renames_corrupt_segment with explicit corrupt meta_bytes — the old "renames any drift" contract is gone. Suite 1366/1366 pass.

Coordinated cross-repo with palace-daemon's auto-repair-on-startup workaround (separate agent's commit ed3a892). With this fork-side fix the auto-repair becomes belt-and-suspenders; the structural cause of empty-HNSW-on-restart is addressed at the quarantine layer.

CLAUDE.md row 26 + README fork-change-queue row + test count 1363→1366.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
fix: sanitize topic parameter in tool_diary_write
10 files changed. 2,563 insertions, 30 deletions. 48 new tests, including end-to-end coverage live-tested with Anthropic Haiku 4.5.

This PR overhauls the first-run experience of `mempalace init` end-to-end, ships a new corpus-origin detection module from scratch, wires it into entity classification and LLM refinement, adds a graceful-fallback path that means `init` never crashes on a missing LLM, and ships a meta-test that prevents internal-coordination jargon from leaking into source or tests.

The headline change is that `mempalace init` now understands what kind of folder you're pointing it at — AI conversations, regular writing, code, narrative — and adapts how it classifies entities accordingly. The same folder containing `Echo`, `Sparrow`, and `Cipher` (names you've assigned to AI agents) used to dump those into your "people" list alongside biological humans. Now they go into a separate `agent_personas` bucket, and your `people` list stays clean. But the broader change is that `mempalace init` got upgraded across the board — smarter defaults, smarter degradation, smarter classification, smarter persistence, and a new way to refresh as your folder grows.

Built and live-verified with Anthropic Haiku 4.5; runs unmodified on the local LLM runtimes mempalace already supports.

## What changes for users (in order, from `pip install` onwards)

**Install** — `pip install mempalace` is unchanged. The package itself didn't shift.

**First run — `mempalace init <folder>`:**

1. **`init` examines your folder before classifying anything.** A free regex heuristic decides in milliseconds: AI conversations, regular writing, narrative, or code? If an LLM is reachable, a second pass extracts the corpus author's name and any agent persona names from the dialogue. v3.3.3 had no such step — it dove straight into entity detection with no corpus context.
2. **LLM-assisted classification is now ON by default.** v3.3.3 made `--llm` opt-in.
The LLM-assisted path is qualitatively better (extracts persona names, refines ambiguous classifications, gives the model corpus context), so it now runs by default. The provider abstraction is unchanged from v3.3.3 — three buckets are supported by `mempalace.llm_client`:
   - **Anthropic** (`--llm-provider anthropic` + `ANTHROPIC_API_KEY`) — the official Messages API. **This is the path live-verified end-to-end in this PR with Haiku 4.5.** Cost: ~\$0.01 per `init`.
   - **Ollama** (`--llm-provider ollama` — the default) — local models via `http://localhost:11434`. Fully offline. Honors the "zero-API required" promise.
   - **OpenAI-compatible** (`--llm-provider openai-compat` + `--llm-endpoint`) — per the v3.3.3 `mempalace/llm_client.py` docstring, this covers "OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Fireworks, Together, and most self-hosted setups." We did not test each of those individually as part of this PR; the abstraction has been stable since v3.3.3. If you try this PR with a specific provider and hit a quirk, please file an issue or comment here.
3. **`init` never blocks on a missing LLM.** No Ollama running, no API key set? `init` prints a one-line message pointing at `--no-llm` and falls through to the heuristic-only path. New default behavior, new graceful fallback to support it. `--no-llm` is the new explicit opt-out.
4. **`init` shows you what it detected.** A one-line banner — `Detected: Claude (Anthropic) (user: Jordan, agents: Echo, Sparrow, Cipher)` or `Corpus origin: not AI-dialogue (confidence: 0.98)` — tells you at a glance whether mempalace understood your folder.
5. **Entity classification gets smarter across the board.** Even non-persona candidates benefit: the LLM has corpus context (this is AI-dialogue, this is the user's name, these are agent names) and uses it to disambiguate ambiguous candidates that aren't personas at all.
6. **Agent personas live in their own bucket.** Names you've assigned to AI agents (Echo, Sparrow, Cipher) go into a new `agent_personas` bucket instead of your `people` list. Your real-person entity list stays clean.
7. **Detection result persists to `<palace>/.mempalace/origin.json`** with a `schema_version: 1` envelope, so downstream tools can read it.
8. **Re-running `init` is now idempotent.** Bug fix — running `init` twice on the same folder used to give different classification results because the detection step was sampling its own `entities.json` output. Caught by integration testing during this PR.

**Later — when your folder grows:**

9. **`mempalace mine --redetect-origin`** is a new flag for refreshing the stored detection without redoing the whole `init`. Heuristic-only by design (the flag is meant to be cheap). If you want the full LLM-extracted detection refreshed (persona names, user name, etc.), run `mempalace init <yourfolder>` again — `init` is now idempotent (item 8), so re-running it on the same folder is safe.

## Behind the changes

- **New module** `mempalace/corpus_origin.py` (422 lines) with two-tier detection: a regex heuristic with a co-occurrence rule (suppresses ambiguous terms like `Claude` / `Gemini` / `Haiku` when no unambiguous AI signal is present, so French novels, astrology forums, poetry corpora, and llama-rancher journals don't false-positive), and an LLM tier that extracts `user_name` and `agent_persona_names` from dialogue structure with belt-and-suspenders user-vs-agent disambiguation.
- **Entity-classification consumer wiring.** `entity_detector.detect_entities` and `project_scanner.discover_entities` accept an optional `corpus_origin` kwarg. When present and the corpus is identified as AI-dialogue, candidates whose name case-insensitively matches an `agent_persona_name` are routed into the `agent_personas` bucket instead of `people`. Per-entity `type` is rewritten to `"agent_persona"`.
- **LLM-refine consumer wiring.** `llm_refine.refine_entities` accepts the same `corpus_origin` kwarg and prepends a `CORPUS CONTEXT` preamble to its system prompt giving the LLM the platform / user / persona context. Existing `TOPIC` / `PERSON` / `PROJECT` / `COMMON_WORD` / `AMBIGUOUS` labels are unchanged.
- **`init` overhaul.** Pass 0 (corpus-origin detection) is inserted before the existing Pass 1 (entity discovery). `--llm` flipped to default-on. `--no-llm` added. A graceful-fallback path replaces the previous hard error on a missing LLM. Provider precedence is unchanged from the existing `llm_client` module.
- **`mine` flag.** `mempalace mine --redetect-origin` re-runs corpus-origin detection on the current corpus state and overwrites `<palace>/.mempalace/origin.json`.
- **`CLAUDE.md` design principle reworded** — "Local-first, zero external API by default." Local LLMs running on `localhost` (Ollama, LM Studio, llama.cpp, vLLM, unsloth studio) are part of the user's machine, not external APIs. External BYOK providers (Anthropic, OpenAI, Google) are supported but always opt-in, never default, never silent fallback.

## Cost story

- **Anthropic (verified path):** ~\$0.01 per `init` via Haiku 4.5 with `ANTHROPIC_API_KEY`.
- **Ollama / local LLM runtime:** zero cost. Fully offline.
- **OpenAI-compatible service:** depends entirely on the service. The abstraction supports any service speaking the standard `/v1/chat/completions` API; specific quirks vary per provider. Try it and tell us how it goes.
- **No LLM at all:** graceful fallback to heuristic-only. Zero cost. `init` never blocks.

## Backwards compatibility

- All public function signatures gained the `corpus_origin` kwarg as optional (default `None`). Callers that don't pass it see the v3.3.3 return shape unchanged — no `agent_personas` key, no behavioral change.
- The `--llm` CLI flag is preserved as a deprecated alias of the default. Existing scripts that pass it continue to work.
- `corpus_origin=None` keeps `llm_refine.SYSTEM_PROMPT` byte-identical to v3.3.3.

## Test coverage

- **19 unit tests** in `tests/test_corpus_origin.py` covering both tiers, the co-occurrence rule, ambiguous-term suppression, word-boundary brand matching, and user/persona disambiguation.
- **29 integration tests** in `tests/test_corpus_origin_integration.py` covering end-to-end through `mempalace init`, persona reclassification, the `--redetect-origin` flag, the `--llm` default flip, graceful fallback paths, and re-init idempotency. Of those 29, five specifically cover the intersection with develop's other in-flight work (Pass 0 ↔ auto-mine ordering, topics + agent_personas bucket coexistence, entities.json shape, the `wing=` kwarg threading, llm_refine TOPIC label + corpus_origin preamble composition).
- **1354 total mempalace tests pass.** 2 pre-existing environmental failures (`test_mcp_stdio_protection` — chromadb optional dep) are unrelated to this change; they fail on plain `develop` too.
- **Live-smoke-tested** with real Anthropic Haiku 4.5 on AI-dialogue and narrative fixtures.

## Hygiene guardrail

This PR also adds a meta-test (`test_no_internal_coordination_jargon_in_source_or_tests`) that walks the source tree and asserts no internal-coordination jargon (e.g. development-phase markers, internal review-section references) leaks into runtime code, comments, docstrings, or LLM prompts. RED if anything slips in. Allowlist for legitimate RFC/spec section citations in `sources/`, `backends/`, `knowledge_graph.py`, and `i18n/`.
feat(init): context-aware corpus detection
Restore three `_pin_hnsw_threads` tests that the previous integrity-gate commit deleted. The function is still live code on develop (defined at chroma.py:207, called from chroma.py:705 + mcp_server.py), so the deletion left the import unused (ruff F401) and dropped coverage on a function unrelated to this PR's scope. Restored verbatim from main.

Plus three nits @igorls flagged:

- **Thread-safety doc**: `_quarantined_paths` mutation is lock-free; documented that idempotency of `quarantine_stale_hnsw` is the safety property (concurrent same-palace calls produce a benign redundant rename attempt that no-ops, so no lock is needed).
- **Pickle protocol assumption**: `_segment_appears_healthy` requires PROTO ≥ 2 (`0x80`). Documented; this matches what chromadb writes today, and a future protocol-0/1 emission would conservatively quarantine + lazy-rebuild rather than be mis-classified as healthy.
- **Class-level vs module-level scope**: keeping class-level — the conftest reset is the controlled case, and module-level wouldn't remove the foot-gun, just relocate it. The conftest reset documented in the existing comment is the right pattern for test isolation.

Style nit (`Path(marker).touch()` vs `open(marker, "a").close()`) deferred — that pattern lives in MemPalace#1177's territory, not MemPalace#1173's.

37/37 tests pass on the PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…view Three new tests cover the marker contract:

- test_fix_blob_seq_ids_writes_marker_after_blob_path — marker is written after a successful BLOB → INTEGER conversion.
- test_fix_blob_seq_ids_writes_marker_when_already_integer — marker is written even on a no-op (already-INTEGER) path. The point of the marker is to skip sqlite3.connect() on subsequent calls, not to record that a conversion happened.
- test_fix_blob_seq_ids_skips_sqlite_when_marker_present — verifies the load-bearing property via monkeypatched sqlite3.connect: when the marker exists, the function never opens sqlite. This is the MemPalace#1090 bug fix — opening Python's sqlite3 against a live ChromaDB 1.5.x WAL DB corrupts the next PersistentClient call, and we must never do it again post-migration.

Also folded in @igorls's style nit:

- Path(marker).touch() instead of open(marker, "a").close(). Same effect, reads cleaner. Moved the Path import to the top of the module since it didn't yet exist there.

35/35 backend tests pass.

The "Production: fresh MCP server + stop hook + mempalace mine subprocess" checkbox in the PR body is checked in the PR reply — the canonical 151K palace has been running this fix since MemPalace#1177 was filed (via Syncthing-replicated source on the disks daemon at v1.7.0); zero PersistentClient corruption since.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
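The marker contract above can be sketched in a few lines. This is not the shipped MemPalace code — the function and marker names here are hypothetical stand-ins — but it shows the pattern the three tests pin: write the marker on both the conversion and no-op paths, and never touch sqlite once the marker exists.

```python
from pathlib import Path
import sqlite3

def fix_blob_seq_ids(palace_dir: str) -> None:
    """Sketch of the marker-file skip pattern (names hypothetical).

    Once the BLOB -> INTEGER migration has run, or been proven
    unnecessary, the marker lets every later call return without
    opening sqlite. That is the load-bearing property: opening
    Python's sqlite3 against a live ChromaDB WAL database can
    corrupt the next PersistentClient call.
    """
    marker = Path(palace_dir) / ".seq_ids_migrated"
    if marker.exists():
        return  # already migrated: never reopen sqlite post-migration

    conn = sqlite3.connect(Path(palace_dir) / "chroma.sqlite3")
    try:
        pass  # ... inspect / convert BLOB seq ids to INTEGER here ...
    finally:
        conn.close()
    # Written on both the conversion and the already-INTEGER path.
    marker.touch()
```

The skip check deliberately comes before any sqlite work, so the "marker present" path performs only a single `stat` call.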
…f replacing 2 files changed, 260 insertions, 7 deletions. 4 new tests (all RED-first).

Per @igorls's review of PR MemPalace#1211 (MemPalace#1211 (comment)): the corpus-origin Pass 0 currently lets a Tier 2 LLM result REPLACE the heuristic result wholesale. With ``--llm`` default-on (since MemPalace#1211) and a small local model like Ollama gemma4:e4b, the LLM can return a wrong ``likely_ai_dialogue=False, confidence=0.90`` that overrides a confident heuristic ``True``. Tier 2's persona/user/platform extraction is the whole reason to run it; the YES/NO call should stay with the heuristic.

This PR changes ``_run_pass_zero`` in ``mempalace/cli.py`` to merge fields instead of replacing:

- ``likely_ai_dialogue`` → KEEP heuristic's (don't let a weak LLM flip it)
- ``confidence`` → KEEP heuristic's (paired with the bool above)
- ``primary_platform`` → TAKE LLM's when the LLM provides one
- ``user_name`` → TAKE LLM's when the LLM provides one
- ``agent_persona_names`` → TAKE LLM's when the LLM provides any
- ``evidence`` → COMBINE both signal trails

This preserves the persona-extraction value of Tier 2 (the whole point of running it) while preventing a weak local model from flipping a confident heuristic.

TDD: 4 tests added in tests/test_corpus_origin_integration.py covering the four state combinations:

1. test_merge_tier_fields_heuristic_yes_llm_no_keeps_heuristic_bool — the exact failure mode Igor caught. The heuristic confidently flags AI-dialogue; a mocked LLM contradicts it. Asserts the merged result keeps the heuristic's True AND merges the LLM's persona/user/platform fields. This test was the RED that drove the implementation.
2. test_merge_tier_fields_heuristic_no_no_personas_leak — both tiers agree NOT AI-dialogue, both report empty personas. Pins that the merge doesn't accidentally introduce personas.
3. test_merge_tier_fields_heuristic_yes_llm_yes_combines_evidence — both tiers agree AI-dialogue, the LLM extracts personas. Pins that evidence from BOTH tiers ends up in the merged audit trail and that persona/user/platform come from the LLM.
4. test_merge_tier_fields_no_llm_provider_returns_heuristic_only — backwards compat: with no LLM provider (the ``--no-llm`` path), the merge logic doesn't fire and behavior is identical to v3.3.4.

Tests: 1367 pass on the full mempalace suite. 2 pre-existing environmental failures unrelated to this change (chromadb optional dep). Ruff check + format both clean.
…client fix: call quarantine_stale_hnsw() in make_client(); lower threshold to 5min
…erge-tier-fields feat(corpus-origin): merge LLM fields into heuristic result instead of replacing
fix: skip _fix_blob_seq_ids sqlite open on already-migrated palaces (MemPalace#1090)
…etadata fix(palace_graph): skip None metadata in build_graph
fix(searcher): tolerate None documents in BM25 reranker
4 files changed, 248 insertions, 0 deletions. 7 new tests (4 unit + 3 integration), all RED-first.

Per @milla-jovovich's question to @igorls during the PR MemPalace#1221 review: users running `mempalace init` with an external LLM provider (Anthropic API, OpenAI hosted, etc.) need a clear, explicit warning that their folder content will be sent to the provider, that MemPalace doesn't control how the provider logs/retains/uses that data, and how to opt out. @igorls confirmed this should be a small follow-up PR scoped to the warning itself, before the v3.3.4 tag.

This PR adds:

- `_endpoint_is_local(url)` helper in `mempalace/llm_client.py` — a URL-based heuristic returning True if the hostname is on the user's machine or private network. Covers: localhost, 127.0.0.1, ::1, hostnames ending in .local (mDNS/Bonjour), IPv4 RFC1918 ranges (10/8, 172.16-31/12, 192.168/16), and IPv6 unique-local addresses (fc00::/7).
- `is_external_service` property on the `LLMProvider` base class. Subclasses inherit it; the URL decides (no provider-specific hardcoding). This means: Ollama on localhost = local. LM Studio on LAN = local. Anthropic with the default `https://api.anthropic.com` = external. A user proxying Anthropic through localhost (advanced setup) = local, no false-positive warning.
- A one-line warning print in `cmd_init` after successful provider acquisition, gated on `is_external_service`:

  ⚠ {provider_name} is an EXTERNAL API. Your folder content will be sent to the provider during init. MemPalace does not control how the provider logs, retains, or uses your data. Pass --no-llm to keep init fully local.

The warning fires AFTER `LLM enabled: ...` so users see both that the LLM is engaged AND the privacy implications of where it lives, before Pass 0 / entity detection actually runs. LOCAL providers (Ollama on localhost, LM Studio on localhost or LAN, llama.cpp on localhost, vLLM on localhost) DO NOT trigger the warning — nothing leaves the user's machine/network in those configurations.

TDD: 7 tests added across 2 files. Unit tests in `tests/test_llm_client.py` (4 tests, all RED-first):

1. test_ollama_provider_default_endpoint_is_local — pins that the default `http://localhost:11434` is classified local.
2. test_openai_compat_provider_localhost_endpoint_is_local — covers the LM Studio / llama.cpp / vLLM common case (localhost, 127.0.0.1, and 192.168.x LAN).
3. test_openai_compat_provider_cloud_endpoint_is_external — pins that pointing openai-compat at https://api.openai.com (or any non-local URL) classifies as external.
4. test_anthropic_provider_default_endpoint_is_external — pins that AnthropicProvider's default endpoint is external (the dominant user-facing case for `--llm-provider anthropic`).

Integration tests in `tests/test_corpus_origin_integration.py` (3 tests, RED-first; 1 was the critical RED — the other 2 passed by accident since nothing printed "EXTERNAL API" before this PR):

5. test_init_prints_privacy_warning_when_provider_is_external — captures stdout from cmd_init with a mocked external provider, asserts the warning text contains "EXTERNAL API" + "--no-llm" + language about MemPalace not controlling provider behavior.
6. test_init_no_privacy_warning_when_provider_is_local — same flow with a mocked local provider, asserts the warning text does NOT appear.
7. test_init_no_privacy_warning_with_no_llm_flag — pins the --no-llm path: no provider acquisition attempted, no warning fires.

Tests: 1382 total mempalace tests pass. 2 pre-existing environmental failures unrelated to this change (chromadb optional dep). Ruff check + format both clean.

Backwards compatible: `is_external_service` is a new property; existing callers don't reference it. The warning is a new print statement that fires only when an external endpoint is acquired. The `--no-llm` opt-out existed before this PR and continues to work identically.

Out of scope (deliberately not in this PR, per Igor's "small PR" guidance): Tailscale CGNAT (100.64.0.0/10) treatment, a pre-init confirmation prompt, a persistent privacy-mode config flag, explicit cloud-provider name detection. Tracked for future iteration.
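The classification rules listed above map almost directly onto the stdlib `ipaddress` module. A reimplementation sketch — the real helper is `mempalace.llm_client._endpoint_is_local`, and this is not the shipped code, just the documented rules expressed independently:

```python
import ipaddress
from urllib.parse import urlparse

def endpoint_is_local(url: str) -> bool:
    """Sketch: True if the endpoint hostname is on the user's machine
    or private network, per the rules in the commit message."""
    host = urlparse(url).hostname or ""
    if host == "localhost" or host.endswith(".local"):
        return True  # loopback name or mDNS/Bonjour hostname
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # public hostname, e.g. api.anthropic.com
    # is_loopback covers 127.0.0.1 and ::1; is_private covers the
    # RFC1918 ranges (10/8, 172.16-31/12, 192.168/16) and IPv6
    # unique-local addresses (fc00::/7).
    return ip.is_loopback or ip.is_private
```

Note that `urlparse(...).hostname` already strips the brackets from IPv6 literals like `http://[::1]:8080`, so the IPv6 cases fall through to `ip_address` cleanly.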
…ternal-llm feat(privacy): warn when LLM tier sends content to external API
…urce contract Two follow-ups to PR MemPalace#1221's merge-fields behavior, both raised by the Copilot review on that PR:

- The evidence merge now prefixes each entry with `Tier-1 heuristic: ` or `Tier-2 LLM: ` so the on-disk `origin.json` audit record retains tier provenance. The pre-MemPalace#1221 code labeled heuristic evidence; the merge-fields refactor flattened that. Re-prefixing is idempotent.
- Tests now assert that the merged `confidence` is the heuristic's, not the LLM's. Added inline assertions to the two existing contradiction/disagreement tests, plus a dedicated `test_merge_tier_fields_confidence_matches_heuristic_call` that compares to `detect_origin_heuristic` directly, so a future regression letting Tier 2 confidence leak through cannot pass silently.

Tests: 1378 pass. Ruff check + format both clean (CI-pinned 0.4.x).
- cli.py: stringify each evidence entry exactly once before the startswith check (it was calling str(e) twice per element).
- tests: replace the brittle `confidence != 0.90` assertion with an equality check against detect_origin_heuristic on the same samples. The original would have spuriously fired if the heuristic ever legitimately produced 0.90 for these samples; the new form pins the contract directly.
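The idempotent tier-prefixing and the single-stringify fix can be sketched together. The prefix strings come from the commit message; the function name and shapes here are hypothetical, not the shipped cli.py code:

```python
_TIER_PREFIXES = ("Tier-1 heuristic: ", "Tier-2 LLM: ")

def tag_evidence(heuristic_ev: list, llm_ev: list) -> list[str]:
    """Sketch: prefix each evidence entry with its tier of origin so
    the origin.json audit record retains provenance. Re-running on
    already-prefixed entries is a no-op (idempotent)."""
    def prefix(entries, tag):
        out = []
        for e in entries:
            s = str(e)  # stringify exactly once, then reuse `s`
            out.append(s if s.startswith(_TIER_PREFIXES) else tag + s)
        return out
    return prefix(heuristic_ev, "Tier-1 heuristic: ") + prefix(llm_ev, "Tier-2 LLM: ")
```

The `startswith` check against both prefixes is what makes re-prefixing safe when merged evidence is fed back through the function.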
2 files changed, 60 insertions, 0 deletions. 2 new tests (RED-first).

Follow-up to MemPalace#1224's privacy warning. The URL-based heuristic in ``mempalace.llm_client._endpoint_is_local`` shipped without recognizing Tailscale's CGNAT range (100.64.0.0/10), so a user running LM Studio, Ollama, or any local LLM accessible via a Tailscale-assigned 100.x.x.x address would currently get a wrong privacy warning — Tailscale addresses are network-private (only reachable inside the user's Tailnet), but they're not RFC1918, so the heuristic was treating them as external.

This PR adds CGNAT recognition: when the hostname starts with ``100.`` AND the second octet is between 64 and 127 inclusive, it's classified as local. Addresses in 100.x.x.x outside that range (i.e. second octet < 64 or > 127) are regular allocated public space and remain external, so a user pointing at a public 100.0.0.1 still gets the warning.

Concrete user impact:

- Before: ``mempalace init --llm-provider openai-compat --llm-endpoint http://100.100.50.50:1234`` (LM Studio on a Tailnet) → triggers the privacy warning incorrectly.
- After: same command → no warning. The data stays inside the user's Tailnet, which is exactly the boundary the warning is meant to protect.

TDD: 2 tests added in ``tests/test_llm_client.py``, both RED-first.

1. ``test_openai_compat_provider_tailscale_cgnat_endpoint_is_local`` — covers three Tailscale CGNAT addresses (start, middle, near-end of the range) and pins that they're all classified local. This was the RED that drove the implementation.
2. ``test_openai_compat_provider_outside_tailscale_cgnat_is_external`` — pins the boundary on both sides: addresses with second octet 0-63 and 128-255 stay external. Prevents a future "treat all 100.x.x.x as local" overcorrection.

Tests: 1388 total mempalace tests pass. 2 pre-existing environmental failures unrelated to this change (chromadb optional dep). Ruff check + format both clean.

Backwards compatible: this only widens the local-recognition set. Anything classified local before is still classified local; anything classified external before remains so unless it's specifically in the CGNAT range.

Out of scope (tracked for future iteration based on real user feedback, not built speculatively): a pre-init confirmation prompt before sending to an external API, a persistent ``private-only`` config flag that refuses external endpoints entirely, explicit cloud-provider name detection ("Using Anthropic's hosted API at ..." vs the current generic warning).
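The "second octet between 64 and 127 inclusive" rule is exactly membership in the 100.64.0.0/10 network, so the boundary check can be sketched with the stdlib `ipaddress` module rather than string math on octets. The helper name here is hypothetical, not the shipped code:

```python
import ipaddress

# Tailscale assigns addresses from the shared CGNAT range (RFC 6598),
# i.e. 100.64.0.0 through 100.127.255.255.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")

def is_tailscale_cgnat(host: str) -> bool:
    """Sketch: True only for IPv4 addresses inside 100.64.0.0/10.
    Public 100.x.x.x space outside that range stays external."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostname, not an IP literal
    return ip.version == 4 and ip in TAILSCALE_CGNAT
```

Delegating the /10 membership test to `ipaddress` sidesteps the off-by-one risk at the 64 and 127 boundaries that hand-rolled octet comparisons invite.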
…merge-followup chore(corpus-origin): tag merged evidence by tier + pin confidence-source contract
…ilscale-cgnat feat(privacy): treat Tailscale CGNAT range (100.64.0.0/10) as local
Issue MemPalace#858 reported that the precompact hook unconditionally returned decision: block with no escape, permanently blocking /compact. PR MemPalace#885 fixed the symptom by always returning {}, but in doing so dropped the save nudge entirely — the model now gets no signal that compaction is imminent, defeating the safety feature.

This restores the nudge in two ways:

1. Default (warn-only mode, issue MemPalace#858 Option 3): print the nudge to stderr — which Claude Code surfaces to the model — and return {} so compaction still proceeds. A strict superset of the post-MemPalace#885 behavior.
2. Opt-in (MEMPAL_BLOCK_COMPACT=1, issue MemPalace#858 Option 2): two-phase hard-block. The first /compact attempt this session blocks with the nudge as the block reason; the next attempt clears the per-session flag (under $STATE_DIR/${SESSION_ID}_blocked_compact) and lets compaction through. Gives users a forced save-before-compact pause without re-introducing the original MemPalace#858 footgun.

Also updates the stale "ALWAYS blocks" header comments to describe the actual current behavior.
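The two-phase flow above can be sketched as a small bash function. The env var and flag-file path come from the commit message; the nudge wording, function name, and defaults are illustrative, and the real hook's surrounding plumbing (stdin JSON parsing, background mine) is omitted:

```shell
#!/usr/bin/env bash
# Sketch of the warn-only default vs. opt-in two-phase block.
precompact_hook() {
    local state_dir="${STATE_DIR:-/tmp/mempal_state}"
    local flag="$state_dir/${SESSION_ID:-default}_blocked_compact"
    local nudge="Compaction imminent: save important context to the palace now."
    mkdir -p "$state_dir"

    if [ "${MEMPAL_BLOCK_COMPACT:-0}" = "1" ] && [ ! -f "$flag" ]; then
        # Phase 1: first /compact this session hard-blocks once,
        # with the nudge as the block reason.
        touch "$flag"
        printf '{"decision":"block","reason":"%s"}\n' "$nudge"
    else
        # Phase 2 (flag already set) or the warn-only default:
        # nudge goes to stderr, {} on stdout lets compaction proceed.
        rm -f "$flag"
        echo "$nudge" >&2
        echo '{}'
    fi
}
```

Because the flag is keyed by session id and removed on the second pass, a fresh session always gets exactly one forced pause, and the default (unset `MEMPAL_BLOCK_COMPACT`) path never blocks at all.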
Heads up: this PR targets `main`.

One framing nit on the PR description: #885 was a different fix (replacing the invalid `decision: allow` with `{}`).
## Summary

Follow-up to #858 / #885. Issue #858 reported that the precompact hook hard-blocked `/compact` with no escape. PR #885 fixed the symptom by always returning `{}` — but dropped the save nudge entirely, so the model now gets no signal that compaction is imminent. This PR restores the nudge without re-introducing the original footgun.

## What changed

- Default (warn-only mode, issue #858 Option 3): print the nudge to stderr and return `{}`. Compaction still proceeds; the model just sees the nudge first. Strict superset of the post-#885 behavior.
- Opt-in (`MEMPAL_BLOCK_COMPACT=1`, issue #858 Option 2): two-phase hard-block. The first `/compact` this session blocks with the nudge as the reason; the next attempt clears a per-session flag at `$STATE_DIR/${SESSION_ID}_blocked_compact` and lets compaction proceed. Gives users a forced save-before-compact pause without re-creating the #858 footgun (the original block had no escape).

Mirrors the `MEMPAL_VERBOSE` opt-in pattern already used by `mempal_save_hook.sh`.

## Why not just leave it as `echo '{}'`

The hook header in `mempal_precompact_hook.sh` describes its purpose as a safety net — "forces one final save of EVERYTHING before [compaction]". After #885, there's no save signal at all: the hook does its background `mempalace mine` (only if `MEMPAL_DIR` is set, which is empty by default) and exits silent. The model never learns compaction is about to happen.

Restoring a model-visible nudge re-establishes the documented purpose; routing it through stderr (rather than a `decision: block`) keeps `/compact` non-blocking for the default case.

## Test plan

- `bash -n hooks/mempal_precompact_hook.sh` — syntax OK
- Default: `{"session_id":"X"}` → stdout `{}`, stderr contains the nudge, exit 0
- `MEMPAL_BLOCK_COMPACT=1`, first call: stdout `{"decision":"block","reason":"…"}`, flag file created
- `MEMPAL_BLOCK_COMPACT=1`, second call: stdout `{}`, flag file removed

## Refs