
feat: add configurable multilingual embedding model support #442

Open
NickShtefan wants to merge 2 commits into MemPalace:develop from NickShtefan:feat/multilingual-embedding

Conversation

@NickShtefan commented Apr 9, 2026

What does this PR do?

Adds configurable multilingual embedding model support with init-time binding and production-tested infrastructure for safe model switching. The embedding model becomes a property of the palace itself (stamped in collection metadata at init), not an environment variable that can drift between processes.

Closes #903 — bug: embedding model mismatch — MCP server uses MiniLM (384-dim) while ingest can use mpnet (768-dim). The fix routes mcp_server._get_collection() through the same path the rest of the codebase already uses, so MCP queries always use the embedding function the palace was built with.

This PR is the embedding infrastructure layer that #488 (multilingual classification by @EndeavorYen) builds on top of. Agreed coordination: merge #442 first, then rebase #488 on top.

Related parallel work: #912 (@OmkarKirpan) targets the same _get_collection mismatch bug with a narrower scope. This PR covers that fix plus configurable model name, init-time binding, mismatch detection, re-mine command, and MCP error propagation. See cross-linked discussion for coordination.

Changes

Init-time embedding model binding (replaces env vars):

  • mempalace init <dir> --model intfloat/multilingual-e5-base — model is stamped in ChromaDB collection metadata at palace creation
  • mempalace re-mine --model <new> — backup → drop → re-mine with a different model
  • mempalace init --chunk-size N --chunk-overlap M — chunk size is per-palace, no env var
  • Auto-detect device: cuda > mps (arm64 only) > cpu (no MEMPALACE_EMBEDDING_DEVICE needed)
  • Removed env vars: MEMPALACE_EMBEDDING_MODEL, MEMPALACE_EMBEDDING_DEVICE, MEMPALACE_CHUNK_SIZE, MEMPALACE_CHUNK_OVERLAP, MEMPALACE_FORCE_EMBEDDING

MCP fix (closes #903):

  • mcp_server._get_collection() now reads embedding_model from collection metadata and resolves the matching embedding_function — previously it silently fell back to ChromaDB's all-MiniLM-L6-v2 regardless of the palace's actual model (see the sketch after this list)
  • EmbeddingModelMismatchError propagated through _get_collection() and handle_request() (not swallowed by generic except)
  • The CLI and searcher.py paths were already correct via palace.get_collection() — only the MCP path was broken; this commit mirrors that behaviour for MCP
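
A rough sketch of the corrected MCP lookup (helper names appear elsewhere in this PR; exact shapes are simplified assumptions, not the final code):

def _get_collection(create=False):
    # Read the stamped model without instantiating any embedder
    model_name = read_collection_metadata().get("embedding_model", "chromadb-default")
    # Resolve the matching embedding function and pass it explicitly,
    # rather than letting ChromaDB fall back to all-MiniLM-L6-v2.
    # EmbeddingModelMismatchError propagates here; nothing swallows it.
    ef = get_embedding_function(model_name)
    return _palace_get_collection(create=create, embedding_function=ef)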

Other:

  • Backend seam abstraction (mempalace/backends/chroma.py) for pluggable storage layer (RFC 001 §10)
  • Legacy palace auto-migration: palaces without embedding_model metadata are stamped as chromadb-default
  • EmbeddingModelMismatchError with clear recovery instructions
  • pyproject.toml[multilingual] optional dependency
  • New tests: tests/test_multilingual.py (28 tests), tests/test_remine.py (re-mine flow)

Backward compatible: existing palaces continue working with their original model (auto-stamped as chromadb-default); no re-mining required.

Production data (Russian): search scores 0.19→0.77 with intfloat/multilingual-e5-base on the same 247-blog-post corpus. Note: paraphrase-multilingual-MiniLM-L12-v2 (proposed in #231) has a 128-token window that truncates chunks and degrades quality — e5-base (512 tokens) is the recommended default for multilingual.

How to test

# Install with multilingual support
pip install -e ".[dev,multilingual]"

# Run tests
python -m pytest tests/ -v --ignore=tests/benchmarks

# Functional test — init with multilingual model
mempalace init ~/some-content/ --model intfloat/multilingual-e5-base
mempalace mine ~/some-content/
mempalace search "запрос на вашем языке"

# Test model switching via re-mine
mempalace re-mine --model all-mpnet-base-v2
mempalace search "english query"

# MCP path (closes #903)
# Connect via MCP client; queries now use the palace's actual embedding model,
# not ChromaDB's default. No more silent dimension mismatch.

Checklist

  • Tests pass — 1336 passed, 0 failed on rebased branch (~6 min, full suite excluding benchmarks)
  • Linter passes (ruff check .)
  • Backward compatible (legacy palaces auto-stamped on first read)
  • MCP fix verified against live palace (15K+ drawers, e5-base) — tool_diary_read returns full results

Contributors

  • @NickShtefan — core embedding config, init-time binding refactor, chunk size, production testing, MCP _get_collection fix
  • @FabioLissi — device support, mismatch detection, re-mine, original MCP error propagation prototype

Closes #903
Refs #390 (chunk size), #231 (multilingual config), #488 (multilingual classification builds on top), #912 (parallel narrower fix for same bug)

@sidonsoft

Real-World Testing Results

We applied this patch to our production MemPalace instance and re-mined ~150 files across 3 projects. Here are the findings:

📊 Impact

| Metric | Before (800/100) | After (400/50) | Change |
|---|---|---|---|
| Total drawers | 10,000+ | 5,873 | -41% |
| HexClamp project | 30,653 drawers | 665 drawers | -98% |
| Search relevance | Poor (truncation artifacts) | Good | |

🔍 Search Quality Improvement

Before: Searching for "extension config" returned mypy type cache data (.data.json files) with match scores around -0.36

After: Same query returns actual extension code and documentation with match scores around 0.19 (a positive score indicates a genuine semantic match)

🐛 Additional Fix Applied

We also excluded mypy cache files which were polluting search results:

# miner.py line 304 — extend the existing skip condition:
) or filename.endswith((".data.json", ".meta.json")):
    continue

This reduced our palace size by an additional ~24,000 drawers of type metadata that have no semantic search value.

✅ Recommendation

Merge this PR. The smaller chunk size:

  1. Fits within the embedding model's 256-token limit (~512 chars)
  2. Improves search relevance significantly
  3. Reduces storage requirements
  4. Prevents silent truncation of important context

The overlap reduction from 100→50 is also appropriate since chunks are now half the size.


Test environment:

  • mempalace 3.0.0
  • ChromaDB 1.5.7
  • Default embedding: all-MiniLM-L6-v2 (ONNX)
  • Projects mined: pi agent extensions, HexClamp CLI, openclaw workspace

@NickShtefan (Author)

This PR covers the embedding function centralization part of #231 — configurable model via env var / config.json, all 7 ChromaDB consumer modules patched, optional [multilingual] dependency, and chunk size fix for #390.

I tested this on 245 Russian blog posts (5 years of content). With intfloat/multilingual-e5-base, relevance scores jumped from 0.19–0.40 to 0.70–0.77. Happy to share the full benchmark data if useful.

If @EndeavorYen's more comprehensive implementation lands first (room classification, entity detection, spellcheck), happy to close this — it can serve as a reference for the embedding centralization pattern. Either way, our real-world Russian-language benchmarks might be useful validation for the final PR.

@FabioLissi

Nice PR! The env → config.json → default chain, module-level caching, and graceful fallback when sentence-transformers isn't installed all feel really clean. The CHUNK_SIZE 800→450 fix is a great bonus too.

One small thing I'd love to see on top of this (happy to do it as a follow-up PR if you'd rather keep #442 focused on multilingual): also expose the embedding device.

Since get_embedding_function() already instantiates SentenceTransformerEmbeddingFunction(model_name=...), and that class accepts a device kwarg, it's a pretty small addition. A sibling embedding_device property on MempalaceConfig reading MEMPALACE_EMBEDDING_DEVICE / config.json, and conditionally passing it through:

kwargs = {"model_name": model_name}
if cfg.embedding_device:
    kwargs["device"] = cfg.embedding_device
_embedding_function = SentenceTransformerEmbeddingFunction(**kwargs)

The reason I'm asking: MemPalace currently disables CoreML in init.py (for good reason, it segfaults on M-series), so the default path is pinned to CPU. On a large corpus like ~/.claude/projects/ (1.5 GB of JSONL for me), that's 30 to 90 minutes of embedding while an otherwise-idle Apple Silicon GPU sits unused. With device='mps' (or 'cuda' for NVIDIA folks), that drops dramatically, and since all-MiniLM-L6-v2 produces identical vectors regardless of backend, existing palaces stay compatible.

It should be pretty low-risk:
• Backward compatible (no device set = current behavior exactly)
• Fully opt-in
• Uses the PyTorch path, not the ONNX/CoreML one, so it sidesteps the segfault entirely
• Your existing try/except already handles the "device unavailable" failure mode gracefully

One minor gotcha worth mentioning in docs: SentenceTransformerEmbeddingFunction caches models by name only (not (name, device)), so switching devices mid-process returns a stale cached model. Only matters for tests/benchmarks, but good to know.
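
To illustrate the gotcha (hypothetical snippet, not project code):

from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

ef_cpu = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
# Same model name, new device: the underlying SentenceTransformer is cached
# by model name alone, so this call reuses the CPU-loaded model and the
# device argument silently has no effect for the rest of the process
ef_mps = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2", device="mps")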

Very happy to send a follow-up PR if you'd prefer to keep this one scoped. Either way, thanks for the contribution, this unlocks a lot!

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 9, 2026
…ression

Builds on top of MemPalace#442 to add two improvements and a regression fix:

1. Expose embedding device via MEMPALACE_EMBEDDING_DEVICE env var / embedding_device config key.
   This lets Apple Silicon users set device='mps' and NVIDIA users set device='cuda' to
   dramatically speed up embedding generation during mempalace mine (5-15x measured on M-series).

2. Ergonomic default: setting only MEMPALACE_EMBEDDING_DEVICE automatically uses
   sentence-transformers/all-MiniLM-L6-v2 (same weights as ChromaDB's default ONNX embedder),
   so users don't have to know the model name to get GPU acceleration, and existing palaces
   remain vector-compatible.

3. Fix a regression in MemPalace#442: when no model is configured, get_embedding_function() used to
   return None, which newer ChromaDB rejects with 'You must provide an embedding function'
   at collection.add() time. Now returns ChromaDB's DefaultEmbeddingFunction() explicitly,
   restoring the pre-MemPalace#442 default behavior and making tests/test_convo_miner.py pass again.

All 552 tests pass, including 7 new tests covering:
- embedding_device property reads from env var and config.json
- device is passed through to SentenceTransformerEmbeddingFunction when set
- device alone activates the default model
- device is NOT passed when unset (preserves original MemPalace#442 call signature)
- device can be set via config.json independent of model
@FabioLissi

Hey, one more thing while testing this branch locally. I think there's a small regression in the default path.

When MEMPALACE_EMBEDDING_MODEL isn't set (the case for every existing user post-upgrade), get_embedding_function() returns None, which then gets passed explicitly to every ChromaDB call site:

col = client.get_collection(collection_name, embedding_function=ef) # ef is None

In chromadb 0.6.3 (the version I'm using), embedding_function=None is no longer equivalent to omitting the argument. The collection gets created with a null embedder, and then collection.add() raises:

ValueError: You must provide an embedding function to compute embeddings.

This breaks tests/test_convo_miner.py::test_convo_mining on this branch. Repro:

git fetch upstream pull/442/head:pr-442 && git checkout pr-442
pytest tests/test_convo_miner.py -x

Passes on main, fails on pr-442.

The fix is a one-liner in config.py: return DefaultEmbeddingFunction() instead of None for the "no model configured" path, so the 14 call sites you patched can keep passing ef unconditionally:

if not model_name:
    from chromadb.utils.embedding_functions import DefaultEmbeddingFunction
    _embedding_function = DefaultEmbeddingFunction()
    return _embedding_function

Same pattern in the except block for the missing-sentence-transformers case.

Happy to send a small follow-up PR against your branch if easier, or you can patch it directly.

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 9, 2026
Port PR MemPalace#416's parallelization pattern from miner.py to convo_miner.py.
PR MemPalace#416 only batched project mining (mempalace mine <dir>), leaving
conversation mining (mempalace mine <dir> --mode convos) on the slow
per-chunk collection.add() path. Since convo mining is the primary
ingest path for large corpora like ~/.claude/projects/ (~1.5 GB of
Claude Code JSONL), this omission negated most of the speedup potential
from both PR MemPalace#416 (batching) and PR MemPalace#442 (GPU embedding).

Changes:

1. Extract process_convo_file_cpu() — pure CPU worker that normalizes,
   chunks, detects room, and builds drawer records. Thread-safe by
   construction: no ChromaDB calls, no shared state, all inputs
   passed explicitly. Returns (source_file, room, records, room_counts_delta)
   or None for skipped files.

2. Rewrite mine_convos() ingest loop:
   - Pre-filter pending files with file_already_mined() (sequential,
     matches PR MemPalace#416's pattern)
   - Submit all pending files to ThreadPoolExecutor with
     MAX_WORKERS = min(32, cpu_count()*2) for parallel normalize/chunk
   - Main thread accumulates records into batch_ids/docs/metas lists
     and flushes via collection.upsert() every BATCH_SIZE (128) records
   - try/finally around the executor guarantees final flush_batch()
     runs even if a worker raises, preventing silent loss of up to
     BATCH_SIZE-1 pending drawers
   - Per-worker exceptions are caught and logged instead of aborting
     the whole run (each file is independent)

3. Keep dry_run path sequential — matches miner.py, preserves original
   output formatting (per-file [DRY RUN] lines, room distribution),
   uses the same extracted worker for consistency.

4. Switch collection.add() -> collection.upsert() — idempotent, removes
   the try/except 'already exists' dance, matches miner.py.

Performance expectations (M-series Mac with MEMPALACE_EMBEDDING_DEVICE=mps):

Before (single-file sequential loop + per-chunk .add()):
  502 drawers in 12.3s = 40.7 drawers/s

After (parallel reads + batched upserts):
  Expected ~3-5x improvement from batching alone (GPU finally gets
  meaningful batch sizes), plus another ~2-3x from parallelizing
  the normalize() step on large JSONL files. Combined: ~5-15x.

All 556 tests still pass, including tests/test_convo_miner.py which
exercises the real ChromaDB write path end-to-end.

No changes required to the public API. Callers (cli.py, mcp_server.py)
are unaffected.
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 9, 2026
@NickShtefan (Author)

Thanks @FabioLissi — great catch on the None regression, and the device support is a clean addition. Cherry-picked your commit (5cf62ee) onto this branch. PR now includes both the multilingual embedding centralization and the DefaultEmbeddingFunction + device fix.

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>

@FabioLissi commented Apr 10, 2026

@NickShtefan Thanks again for pulling in the device support commit. One follow-up thought I wanted to float while it's still fresh.
Now that this PR lets users configure the embedding model, there's an adjacent footgun worth considering: if a user switches models on a populated palace, search either silently corrupts (same-dimension change → incompatible vectors in the same collection) or mining crashes midway (dimension change → ChromaDB rejects the insert). Neither failure mode is obvious, and recovery means rm -rf ~/.mempalace/palace/ and re-mining everything.

A tiny detection mechanism would close this: store the embedding model name in collection metadata at creation, and on subsequent opens compare it to the currently configured model. On a mismatch, raise a clear error with recovery instructions.
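
A minimal sketch of that check (illustrative names; collection.metadata and collection.modify per the ChromaDB collection API):

class EmbeddingModelMismatchError(RuntimeError):
    """Configured model differs from the one stamped at palace creation."""

def check_embedding_model(collection, configured_model):
    stored = (collection.metadata or {}).get("embedding_model")
    if stored is None:
        # Legacy palace: stamp the current model on first open
        collection.modify(metadata={"embedding_model": configured_model})
    elif stored != configured_model:
        raise EmbeddingModelMismatchError(
            f"Palace was mined with '{stored}' but '{configured_model}' is "
            f"configured; re-mine with the new model or restore the old config."
        )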
I'm happy to write it either way, but want to check first: does this fit better as

  • part of this PR (a natural extension of "this PR lets you configure embedding models"),
  • a follow-up PR after this one merges (keeps #442 scoped to the config feature), or
  • not something you want at all (maybe there's a reason the current behavior is intentional that I'm missing)?

No strong preference from my end. Just want to avoid duplicating work or stepping on toes.

@web3guru888 left a comment


The configurable embedding model support is welcome — we've been using all-MiniLM-L6-v2 for English-only discovery mining, and being able to swap to e5-base for multilingual domains without forking would be useful.

The implementation is clean: env var → config file → default fallback, with proper caching. The device support (mps/cuda) is a nice ergonomic touch.

However, this PR includes an unrelated breaking change that should be split out:

-CHUNK_SIZE = 800  # chars per drawer
-CHUNK_OVERLAP = 100  # overlap between chunks
+CHUNK_SIZE = 450  # chars per drawer
+CHUNK_OVERLAP = 50  # overlap between chunks

Halving the chunk size from 800→450 and the overlap from 100→50 is a significant behavioral change that:

  1. Will produce ~2x more drawers for the same corpus (more ChromaDB storage, slower queries)
  2. Makes existing palaces inconsistent — old drawers are 800-char, new ones 450-char
  3. Has nothing to do with multilingual embedding support

If smaller chunks improve retrieval for multilingual models (plausible — shorter contexts embed more precisely), that's worth doing, but it should be a separate PR with its own benchmarks showing the improvement. Bundling it here means adopters of multilingual support get a chunk size change they didn't ask for.

Also: the get_embedding_function() singleton means the embedding model is locked at first call for the entire process lifetime. This is fine for CLI usage, but in long-running MCP servers it means you can't change the model without restarting. Might be worth documenting that constraint.
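
For reference, the singleton in question is roughly this shape (inferred from the snippets earlier in the thread; _build_from_config is a stand-in name for the env → config.json → default chain):

_embedding_function = None

def get_embedding_function():
    global _embedding_function
    if _embedding_function is None:
        # Resolved exactly once; every later call returns the cached instance,
        # which is why the model is locked for the process lifetime
        _embedding_function = _build_from_config()
    return _embedding_function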

@sidonsoft

Real-World Production Findings

I opened issue #390 reporting the truncation problem, and this PR directly addresses it. We've already deployed
this change locally and have empirical data that contradicts the reviewer's concerns.

Our Setup

  • Corpus: ~6 months of AI conversation logs + project documentation
  • Original: CHUNK_SIZE = 800, OVERLAP = 100
  • Updated: CHUNK_SIZE = 450, OVERLAP = 50
  • Embedding: Default (all-MiniLM-L6-v2)

Measured Results

| Metric | Before (800) | After (450) | Change |
|---|---|---|---|
| Total drawers | 5,991 | 5,247 | -12.4% |
| Avg search relevance (1-5) | 3.2 | 4.1 | +28% |
| Truncated embeddings | ~35% of chunks | 0% | Fixed |
| Search latency (p50) | 240ms | 180ms | -25% |

Why Drawers Decreased (Not Increased)

The reviewer's concern about "2x more drawers" didn't materialize because:

  1. No wasted drawers: With 800-char chunks, ~35% were being truncated at embedding time, creating ineffective drawers that
    didn't contribute to retrieval
  2. Better semantic boundaries: 450 chars fits complete thoughts better than 800 chars (which often split mid-context)
  3. Reduced duplication: Overlap of 50 is sufficient for context continuity without bloating storage

Chunk Size & Multilingual Embeddings

Important: Even with the new multilingual embedding support, the default embedding issue still applies:

  • Default model (all-MiniLM-L6-v2): 256 token limit (~512 chars) → 450-char chunks are required
  • Multilingual models (e.g., e5-base, paraphrase-multilingual): Typically 512 tokens (~1024 chars) → Could use larger
    chunks

The 450-char default ensures all users get working embeddings out of the box, regardless of which model they choose. If
someone switches to a multilingual model with a higher token limit, they can adjust the chunk size accordingly.

With custom embedding providers, users can set whatever chunk size matches their model's token limit. But the default
needs to work safely for everyone, and 450 chars is the safe upper bound for the most common embedding models.

Quality Improvement

The +28% relevance gain comes from:

  • Every embedding now contains its full intended content (no silent truncation)
  • Search vectors match against complete semantic units
  • Fewer false negatives from corrupted/partial embeddings

Recommendation

Merge this PR. The chunk size change isn't just related to multilingual support — it's a critical fix for the default embedding model's token limit. Our production data shows:

  • ✅ Storage decreased (not increased)
  • ✅ Search quality improved significantly
  • ✅ Works for both default and custom embedding providers
  • ✅ No palace inconsistency issues (change is backward-compatible)

The singleton constraint noted by the reviewer is worth documenting, but shouldn't block this fix. Users with long-running MCP
servers can restart to change models — that's a reasonable tradeoff for fixing silent data loss.


TL;DR: We reported the problem in #390, this PR fixes it, and our real-world deployment proves it works better than the
original. The 450-char default ensures all embedding models (default or multilingual) work without truncation.

@NickShtefan (Author)

Good point about the singleton — worth clarifying why it's intentional rather than a limitation.

Switching the embedding model isn't a hot-swap operation. Old vectors were generated in model A's vector space; new queries would use model B's vector space. Even if dimensions match, cosine similarity between incompatible spaces produces meaningless results. If dimensions differ, ChromaDB rejects the insert outright.

So after changing the model, you must re-mine the entire palace — which is a bulk operation (minutes to hours depending on corpus size). Given that a restart is already required for the re-mine to take effect, the singleton cache is the correct behavior: it prevents accidentally mixing incompatible vectors within a single process.

What would genuinely help is model mismatch detection (as @FabioLissi suggested): store the model name in collection metadata at creation time, and on subsequent opens, compare it to the current config. On mismatch → clear error with recovery instructions. That's a much better safeguard than making the singleton resettable, which would only make the "silent corruption" failure mode easier to hit.

Happy to add a documentation note about the restart requirement for model changes.

@NickShtefan (Author)

@FabioLissi Great suggestion — this is exactly the kind of safeguard that should ship alongside configurable embedding models. Without it, users can silently corrupt their palace by switching models, and the failure mode isn't obvious at all.

I'd love to include this in #442 rather than a follow-up — it makes the PR more thoughtful and robust. If you're up for writing it, go ahead! I'll cherry-pick your commit onto this branch, same as with the device support.

Thanks for flagging this and for the continued contributions!

@FabioLissi

Three commits:

6a23c8f: Chunk size env vars
Made your 800→450 chunk size reduction configurable via MEMPALACE_CHUNK_SIZE and MEMPALACE_CHUNK_OVERLAP env vars. Your PR defaulted to 450 to fit MiniLM's 256-token window, which is the right call for the default model. On our side, we switched to BAAI/bge-base-en-v1.5 (better retrieval quality, ~63 vs ~56 on MTEB) and are running it on MPS for GPU acceleration. bge-base has a 512-token context window, double that of MiniLM, so we set MEMPALACE_CHUNK_SIZE=900 (~225 tokens, safely within the 512 limit). The 450 default was leaving half the model's capacity on the table. Making it configurable means users can size chunks to their model's actual token window, rather than being constrained by MiniLM's limits.

7aa96cc: Embedding model mismatch detection
If someone switches to a model with the same vector dimension, ChromaDB silently mixes incompatible vectors; search returns results, they're just garbage. The dimension-mismatch case is actually easy because ChromaDB rejects the insert.
The fix stores the model name in collection metadata at creation time and checks it on every open. A mismatch raises a hard error with recovery instructions. MEMPALACE_FORCE_EMBEDDING=true bypasses the check for power users switching between models with compatible vector spaces. Legacy palaces get silently stamped on upgrade. All seven direct ChromaDB callers now go through a single palace.get_collection() chokepoint where the check lives, rather than some going directly to ChromaDB.
Also fixed a pre-existing bug: your tests used patch.dict, which doesn't clear existing env vars from the host shell, so tests that assume a clean environment break if the machine has any MEMPALACE_EMBEDDING_* variables set. Switched to monkeypatch.delenv().

94412e8: Re-mine command
The recovery path for intentional model switches. Extracts source files from existing metadata, backs up the palace, drops the collection, and re-mines using the exact file list (not the whole directory, so it won't pull in extra files). Supports --dry-run. One documented limitation: convo-mode files get re-mined as project files (correct vectors, but different chunk boundaries). Fixing that properly requires storing the mining mode in metadata; felt like scope creep for now.
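
Rough shape of the flow (sketch only; the source_file metadata key and the backup/drop/init/mine helpers are stand-in names):

def remine(col, palace, new_model, dry_run=False):
    # Recover the exact source list from existing drawer metadata
    sources = sorted({m["source_file"] for m in iter_all_metadatas(col)})
    if dry_run:
        for path in sources:
            print(f"[DRY RUN] would re-mine {path}")
        return
    backup_palace(palace)        # copy the palace aside before destructive steps
    drop_collection(palace)      # old vectors are incompatible with the new model
    init_palace(palace, model=new_model)  # stamps the new model into metadata
    mine_files(palace, sources)  # exact file list, not the whole directory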

Cherry-picking: first commit is standalone, detection is one squashed commit, re-mine depends on detection. Happy to answer questions or make any adjustments!

NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 10, 2026
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from e1ce332 to 826e3bc Compare April 10, 2026 19:39
@NickShtefan (Author)

@FabioLissi Thanks for the excellent contributions! I've cherry-picked all three commits into this PR:

  • 6a23c8f — CHUNK_SIZE/OVERLAP env vars
  • 7aa96cc — embedding model mismatch detection
  • 94412e8 — re-mine command

During integration I found and fixed two issues:

  1. Missing iter_all_metadatas — your mismatch detection and re-mine reference it from palace.py, but the function lives in your f153a48 commit (paginated metadata reads) which wasn't part of the three. Added it to palace.py.

  2. EmbeddingModelMismatchError swallowed in searcher.py — both search() and search_memories() catch Exception broadly, so mismatch errors were showing "No palace found" instead of the actual error. Added except EmbeddingModelMismatchError: raise before the generic handler.

Full test cycle passed: mine → search → model switch (mismatch detected) → force bypass (dimension error, expected) → re-mine → search with new model. Everything works.

Rebased on latest upstream/main, all clean.

@FabioLissi commented Apr 11, 2026

@NickShtefan — great catch on the searcher.py swallow. Running the fix through our MPS setup
turned up two more spots where EmbeddingModelMismatchError gets eaten before it reaches the
user. Both are on the MCP path, so your PR as-is works for CLI mempalace search but every MCP
tool call still shows the misleading "No palace found" message.

1. mcp_server.py:_get_collection() wrapper

def _get_collection(create=False):
    ...
    try:
        if create or _collection_cache is None:
            _collection_cache = _palace_get_collection(...)
        return _collection_cache
    except Exception:  # ← swallows the mismatch
        return None

Every MCP tool that routes through _get_collection() — status, list_wings, list_rooms,
get_taxonomy, check_duplicate, add_drawer, search (via cached handle), etc. — sees None
and falls through to _no_palace(), which returns the generic "No palace found / run init + mine"
hint. The real error and recovery instructions never surface.

2. Top-level MCP dispatcher (handle_request)

try:
    result = TOOLS[tool_name]["handler"](**tool_args)
    return {...}
except Exception:
    logger.exception(f"Tool error in {tool_name}")
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32000, "message": "Internal tool error"},
    }

Even if _get_collection() re-raises, this top-level catch replaces the actionable message with
hardcoded "Internal tool error", so the MCP client still doesn't see what's wrong.

Fix (applied on our fork, tests green)

Three spots:

  1. searcher.py — re-raise EmbeddingModelMismatchError in both search() and search_memories()
    before the generic except (matches your fix).
  2. mcp_server.py:_get_collection() — except EmbeddingModelMismatchError: raise before the
    generic swallow.
  3. mcp_server.py:handle_request() — catch EmbeddingModelMismatchError explicitly and return
    {"error": str(e)} as a JSON result so MCP clients see the message (sketched below).

Commit on our fork: FabioLissi/mempalace@e64cdd8 — happy to open a follow-up PR, or you can
cherry-pick directly.

Heads-up on rebasing #442 onto current upstream/main

When you rebase, you'll hit a conflict in mcp_server.py at four call sites (tool_status, tool_list_wings, tool_list_rooms,
tool_get_taxonomy). Our commits route those through iter_all_metadatas(col) from f153a48, but PR #371 (silent-exceptions-pagination,
already merged to main) replaced them with inline pagination that surfaces partial failures with offset info.

Resolution: take upstream's version at all four sites; it's strictly better UX (reports "failed at offset N" instead of a silent except Exception: pass). Then drop the now-unused iter_all_metadatas import from mcp_server.py:

was:
from .palace import iter_all_metadatas, get_collection as _palace_get_collection
becomes:
from .palace import get_collection as _palace_get_collection

iter_all_metadatas stays in palace.py, still used by cli.py (re-mine) and miner.py (mismatch detection), just no longer by mcp_server.py.

We already did this merge locally on FabioLissi/mempalace:mps-device-support if you want to look at the resolution: 180496c (the merge commit) followed by e64cdd8 (this fix).

PR #488 (multilingual, @EndeavorYen) touches config.py, mcp_server.py, searcher.py,
palace.py, miner.py — same files as this fix and our original #442 work. Whichever lands
second will need a rebase, but the fix above is small enough that it should slot in cleanly either way.

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 11, 2026
@NickShtefan fixed the CLI path in PR MemPalace#442 (searcher.py) by re-raising
EmbeddingModelMismatchError past the generic except clause. The MCP
path had the same bug via two additional swallows:

- mcp_server.py:_get_collection() wraps _palace_get_collection in
  except Exception: return None, so every tool using _get_collection
  (status, list_wings, list_rooms, get_taxonomy, check_duplicate,
  search, etc.) silently showed "No palace found" instead of the
  actual mismatch error.

- The top-level dispatcher in handle_request() catches Exception and
  returns hardcoded "Internal tool error", so even if the wrapper
  re-raises, the actionable message (stored vs current model + recovery
  instructions) never reaches the caller.

Fix in three spots:
1. searcher.py: re-raise EmbeddingModelMismatchError in search() and
   search_memories() before the generic except (matches @NickShtefan's fix).
2. mcp_server.py:_get_collection(): re-raise EmbeddingModelMismatchError
   before the generic swallow.
3. mcp_server.py handle_request(): catch EmbeddingModelMismatchError
   explicitly and return the message as a JSON result so MCP clients
   see the real error and recovery instructions.
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from 826e3bc to f18187b Compare April 11, 2026 19:12
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 11, 2026
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 11, 2026
@NickShtefan (Author)

@FabioLissi cherry-picked your MCP fix (e64cdd8) — thanks. PR rebased on latest upstream/main, all 604 tests pass. Ready for review.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig bensig requested a review from igorls as a code owner April 11, 2026 22:22
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from d5cb9ec to 5c2c573 Compare April 12, 2026 13:01
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 12, 2026
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 12, 2026
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from 5c2c573 to e249b45 Compare April 12, 2026 13:11
@NickShtefan (Author)

@igorls — agreed, and done. Rewrote the PR to eliminate all embedding-related env vars.

What changed

The embedding model is now bound to the palace at init time via ChromaDB collection metadata — not through environment variables. Setting or changing the model is only possible through explicit CLI commands:

mempalace init <dir> --model intfloat/multilingual-e5-base
mempalace re-mine --model <new-model>

Removed env vars

  • MEMPALACE_EMBEDDING_MODEL → replaced by --model flag on init and re-mine
  • MEMPALACE_EMBEDDING_DEVICE → replaced by auto-detection (cuda > mps > cpu)
  • MEMPALACE_CHUNK_SIZE / MEMPALACE_CHUNK_OVERLAP → replaced by --chunk-size / --chunk-overlap flags
  • MEMPALACE_FORCE_EMBEDDING → replaced by --force flag on get_collection()

Where config lives now

Collection metadata is the single source of truth:

{
  "embedding_model": "intfloat/multilingual-e5-base",
  "chunk_size": 450,
  "chunk_overlap": 50
}

Set once at init, readable without instantiating the embedding function (read_collection_metadata() opens ChromaDB with no embedder, reads metadata, closes). Changed only via re-mine.
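
Roughly (sketch; the persistent-client call and collection name are illustrative, not the exact implementation):

import chromadb

def read_collection_metadata(palace_path, name="palace"):
    # Metadata-only read: no embedding function is ever instantiated here
    client = chromadb.PersistentClient(path=palace_path)
    col = client.get_collection(name)
    return dict(col.metadata or {})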

Device auto-detection

MempalaceConfig.detect_device() probes hardware at runtime:

  • torch.cuda.is_available()"cuda"
  • torch.backends.mps.is_available() + platform.machine() == "arm64""mps"
  • fallback → "cpu"

No configuration needed. The arm64 guard prevents a false positive on Intel Macs where mps.is_available() returns True but MPS doesn't actually work.
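
Roughly (sketch of the probe order described above; the method shape is an assumption):

import platform
import torch

def detect_device():
    if torch.cuda.is_available():
        return "cuda"
    # arm64 guard: Intel Macs can report MPS available even though it fails
    if platform.machine() == "arm64" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"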

Legacy migration

Existing palaces without an embedding_model in their collection metadata are silently stamped "chromadb-default" on first access. No user action required.

What's preserved from community contributions

This PR incorporates work from @FabioLissi:

  • Embedding model mismatch detection — stores model name in collection metadata, raises EmbeddingModelMismatchError on mismatch with clear recovery instructions
  • mempalace re-mine command — extracts source files from existing palace, backs up, drops collection, re-mines with the new model
  • MCP error propagationEmbeddingModelMismatchError surfaces through _get_collection() and handle_request() instead of being swallowed by generic except Exception

The core ideas remain — init-time binding just moves the config source from env vars to collection metadata, which is where the model was already being tracked.

Benchmark

Tested end-to-end on 247 Russian blog posts with intfloat/multilingual-e5-base:

| Step | Result |
|---|---|
| init --model intfloat/multilingual-e5-base | 13s, metadata stamped correctly |
| mine (247 files) | 758 drawers, 1:54 on CPU |
| search (10 queries, warm) | 48ms avg, 0.719 avg similarity, 100% high-relevance (≥0.65) |

Tests: 646 passed, 0 failed (35 errors are onnxruntime platform issues on macOS x86_64, not code-related).

Rebased on latest develop, single commit, no conflicts.

@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch 3 times, most recently from 6742244 to 37d6177 Compare April 13, 2026 21:58
@igorls added the area/cli, area/install, area/mcp, area/mining, area/search, enhancement, and storage labels on Apr 14, 2026
@igorls (Member) commented Apr 14, 2026

Hey @NickShtefan, we'd like to include this in the next release once the plugin-based storage layer is added. Please hold off on rebasing until then.

kAI Shtefan and others added 2 commits April 26, 2026 03:59
Embedding model is bound to the palace at init time via collection
metadata and changeable only through re-mine. Removes all embedding-
related environment variables in favor of explicit CLI flags and
auto-detection.

Features:
- `mempalace init --model <name>` binds embedding model at creation
- `mempalace re-mine --model <new>` migrates to a different model
- `mempalace init --chunk-size N` configures chunk size per palace
- Auto-detect device: cuda > mps (arm64 only) > cpu
- Embedding model mismatch detection with clear error messages
- Legacy palace auto-migration (stamps "chromadb-default")
- Backend seam abstraction for storage layer
- MCP ping health checks

Removed env vars:
- MEMPALACE_EMBEDDING_MODEL (use --model flag)
- MEMPALACE_EMBEDDING_DEVICE (auto-detected)
- MEMPALACE_CHUNK_SIZE / MEMPALACE_CHUNK_OVERLAP (use --chunk-size)
- MEMPALACE_FORCE_EMBEDDING (use --force flag)

Benchmarked on 247 Russian blog posts (intfloat/multilingual-e5-base):
  758 drawers, 49ms avg search, 0.719 avg similarity, 100% high-relevance

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lection

After PR MemPalace#442 moved the embedding model from MEMPALACE_EMBEDDING_MODEL env
var to collection metadata, mcp_server._get_collection() stopped passing
embedding_function to ChromaDB. This made the MCP server silently fall back
to ChromaDB's built-in default (all-MiniLM-L6-v2, dim 384), regardless of
what model the palace was actually built with.

For palaces created via `mempalace init --model intfloat/multilingual-e5-base`
(the headline use case of MemPalace#442), this means inserts and queries through MCP
would attempt 384-dim embeddings against 768-dim stored vectors and fail
with a dimension mismatch. searcher.py + cli.py work correctly because they
go through palace.get_collection() — only the MCP path was broken.

This commit makes _get_collection mirror palace.get_collection's behaviour:

- Read embedding_model from collection metadata via read_collection_metadata
- Instantiate matching embedding function via get_embedding_function
- Pass that function explicitly to ChromaDB
- Stamp embedding_model into metadata for legacy palaces (matches the
  legacy migration path in palace.get_collection)

Local-only commit for now; will be evaluated for inclusion into PR MemPalace#442
or a follow-up upstream PR.
@NickShtefan (Author)

Quick context on why I came back to this now: I wanted to update my personal install to the latest develop (had been running my own PR for a while in production). Since my own copy was already running on this PR, I figured I'd refresh the PR itself at the same time — rebased onto current develop, dropped the env-var device support that's now in develop via @igorls's a4868a35 (perf(mining): batch + GPU acceleration), and folded in the _get_collection MCP fix that closes #903.

Everything verified against my live palace (15K+ drawers, intfloat/multilingual-e5-base) — tool_diary_read and mempalace_search over MCP both return correct results after the rebase.

Following @igorls's earlier note that this should land "once the plugin based storage layer is added" — happy to wait for #1196 (SQLiteVec backend) before merging if that's still the plan, or merge sooner if the storage layer timing has shifted. Whatever works for you.

@jphein (Collaborator) commented Apr 26, 2026

Following over from #390 where @NickShtefan tied the threads together — a few thoughts on direction.

The bind-chunk-size-to-collection-metadata property is the load-bearing piece. The reason is the silent-drift failure mode: change chunk_size after a palace is mined and the new chunks may exceed the model's token window, getting silently truncated before embedding. With the binding stamped into chromadb collection metadata at init, that mismatch becomes a property chromadb itself can refuse — much harder to footgun than env-var resolution.

Composition with #1024: they sit at different layers, and both are wanted.

If maintainer direction is to wait for #1196 (SQLiteVec backend) per @igorls's earlier note, that's reasonable — the plugin-storage layer changes the metadata-binding shape and rebasing now risks a third rebase later. The intermediate state where users have #1024 (config override) but not #442 (collection-bound) is acceptable as long as the gap is documented in CHANGELOG: "changing chunk_size after a palace is mined will produce drift; re-mining a wing is the recovery path until #442 lands."

The _get_collection MCP fix you folded in is independently valuable and would be a clean small extraction if the rest of #442 has to wait — happy to comment on a smaller scoped PR for just that piece if you'd find it useful.

(For context: the fork has #1024-equivalent in production but not #442-equivalent; the silent-drift failure mode you describe hasn't bitten us yet only because we haven't changed embedding model. So I'm reading #442 as the durable fix, #1024 as the bridge.)

