feat: add configurable multilingual embedding model support #442

NickShtefan wants to merge 2 commits into MemPalace:develop
Conversation
**Real-World Testing Results**

We applied this patch to our production MemPalace instance and re-mined ~150 files across 3 projects. Here are the findings:

**📊 Impact**

**🔍 Search Quality Improvement**

Before: searching for "extension config" returned mypy type cache data. After: the same query returns actual extension code and documentation with much better match scores.

**🐛 Additional Fix Applied**

We also excluded mypy cache files, which were polluting search results:

```python
# miner.py line 304 (excerpt; the start of the condition is omitted here)
) or filename.endswith((".data.json", ".meta.json")):
    continue
```

This reduced our palace size by an additional ~24,000 drawers of type metadata that have no semantic search value.

**✅ Recommendation**

Merge this PR. The smaller chunk size helps retrieval, and the overlap reduction from 100→50 is also appropriate since chunks are now half the size.

Test environment:
This PR covers the embedding function centralization part of #231 — configurable model via env var / config.json, all 7 ChromaDB consumer modules patched, plus an optional dependency.

I tested this on 245 Russian blog posts (5 years of content). If @EndeavorYen's more comprehensive implementation lands first (room classification, entity detection, spellcheck), I'm happy to close this — it can serve as a reference for the embedding centralization pattern. Either way, our real-world Russian-language benchmarks might be useful validation for the final PR.
Nice PR! The env → config.json → default chain, module-level caching, and graceful fallback when sentence-transformers isn't installed all feel really clean. The CHUNK_SIZE 800→450 fix is a great bonus too.

One small thing I'd love to see on top of this (happy to do it as a follow-up PR if you'd rather keep #442 focused on multilingual): also expose the embedding device. Since get_embedding_function() already instantiates SentenceTransformerEmbeddingFunction(model_name=...), and that class accepts a device kwarg, it's a pretty small addition: a sibling embedding_device property on MempalaceConfig reading MEMPALACE_EMBEDDING_DEVICE / config.json, and conditionally passing it through via kwargs = {"model_name": model_name}.

The reason I'm asking: MemPalace currently disables CoreML in __init__.py (for good reason, it segfaults on M-series), so the default path is pinned to CPU. On a large corpus like ~/.claude/projects/ (1.5 GB of JSONL for me), that's 30 to 90 minutes of embedding while an Apple Silicon GPU sits idle. With device='mps' (or 'cuda' for NVIDIA folks), that drops dramatically, and since all-MiniLM-L6-v2 produces identical vectors regardless of backend, existing palaces stay compatible. It should be pretty low-risk.

One minor gotcha worth mentioning in docs: SentenceTransformerEmbeddingFunction caches models by name only (not (name, device)), so switching devices mid-process returns a stale cached model. Only matters for tests/benchmarks, but good to know.

Very happy to send a follow-up PR if you'd prefer to keep this one scoped. Either way, thanks for the contribution, this unlocks a lot!
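A minimal sketch of that pass-through, assuming a config object with `embedding_model` and `embedding_device` properties (the property names and the module-level cache follow the suggestion above, not the project's actual code):

```python
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

_cached_ef = None  # module-level cache, mirroring the existing singleton behaviour


def get_embedding_function(config):
    """Build the sentence-transformers embedding function once per process."""
    global _cached_ef
    if _cached_ef is None:
        kwargs = {"model_name": config.embedding_model}
        # Hypothetical property backed by MEMPALACE_EMBEDDING_DEVICE / config.json
        if getattr(config, "embedding_device", None):
            kwargs["device"] = config.embedding_device  # e.g. "mps", "cuda", "cpu"
        _cached_ef = SentenceTransformerEmbeddingFunction(**kwargs)
    return _cached_ef
```

Passing `device` only when it is set keeps the default call signature untouched for everyone who doesn't opt in.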
…ression

Builds on top of MemPalace#442 to add two improvements and a regression fix:

1. Expose the embedding device via MEMPALACE_EMBEDDING_DEVICE env var / embedding_device config key. This lets Apple Silicon users set device='mps' and NVIDIA users set device='cuda' to dramatically speed up embedding generation during mempalace mine (5-15x measured on M-series).

2. Ergonomic default: setting only MEMPALACE_EMBEDDING_DEVICE automatically uses sentence-transformers/all-MiniLM-L6-v2 (same weights as ChromaDB's default ONNX embedder), so users don't have to know the model name to get GPU acceleration, and existing palaces remain vector-compatible.

3. Fix a regression in MemPalace#442: when no model is configured, get_embedding_function() used to return None, which newer ChromaDB rejects with 'You must provide an embedding function' at collection.add() time. Now returns ChromaDB's DefaultEmbeddingFunction() explicitly, restoring the pre-MemPalace#442 default behavior and making tests/test_convo_miner.py pass again.

All 552 tests pass, including 7 new tests covering:
- embedding_device property reads from env var and config.json
- device is passed through to SentenceTransformerEmbeddingFunction when set
- device alone activates the default model
- device is NOT passed when unset (preserves original MemPalace#442 call signature)
- device can be set via config.json independent of model
Hey, one more thing while testing this branch locally. I think there's a small regression in the default path.

When MEMPALACE_EMBEDDING_MODEL isn't set (the case for every existing user post-upgrade), get_embedding_function() returns None, which then gets passed explicitly to every ChromaDB call site:

```python
col = client.get_collection(collection_name, embedding_function=ef)  # ef is None
```

In chromadb 0.6.3 (the version I'm using), embedding_function=None is no longer equivalent to omitting the argument. The collection gets created with a null embedder, and then collection.add() raises:

ValueError: You must provide an embedding function to compute embeddings.

This breaks tests/test_convo_miner.py::test_convo_mining on this branch. Repro: `git fetch upstream pull/442/head:pr-442 && git checkout pr-442` — passes on main, fails on pr-442.

The fix is a one-liner in config.py: return DefaultEmbeddingFunction() instead of None for the "no model configured" path (the `if not model_name:` branch), so the 14 call sites you patched can keep passing ef unconditionally. Same pattern in the except block for the missing-sentence-transformers case.

Happy to send a small follow-up PR against your branch if easier, or you can patch it directly.
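A sketch of that one-liner fix in context (the surrounding structure of config.py is assumed here, not copied from the repo):

```python
from chromadb.utils.embedding_functions import (
    DefaultEmbeddingFunction,
    SentenceTransformerEmbeddingFunction,
)


def get_embedding_function(model_name=None):
    # No model configured: return ChromaDB's default embedder explicitly instead
    # of None, so the patched call sites can pass embedding_function=ef unconditionally.
    if not model_name:
        return DefaultEmbeddingFunction()
    try:
        return SentenceTransformerEmbeddingFunction(model_name=model_name)
    except (ImportError, ValueError):
        # sentence-transformers missing or model unavailable: same explicit fallback
        return DefaultEmbeddingFunction()
```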
Port PR MemPalace#416's parallelization pattern from miner.py to convo_miner.py.

PR MemPalace#416 only batched project mining (mempalace mine <dir>), leaving conversation mining (mempalace mine <dir> --mode convos) on the slow per-chunk collection.add() path. Since convo mining is the primary ingest path for large corpora like ~/.claude/projects/ (~1.5 GB of Claude Code JSONL), this omission negated most of the speedup potential from both PR MemPalace#416 (batching) and PR MemPalace#442 (GPU embedding).

Changes:

1. Extract process_convo_file_cpu() — a pure CPU worker that normalizes, chunks, detects room, and builds drawer records. Thread-safe by construction: no ChromaDB calls, no shared state, all inputs passed explicitly. Returns (source_file, room, records, room_counts_delta) or None for skipped files.

2. Rewrite the mine_convos() ingest loop:
   - Pre-filter pending files with file_already_mined() (sequential, matches PR MemPalace#416's pattern)
   - Submit all pending files to a ThreadPoolExecutor with MAX_WORKERS = min(32, cpu_count()*2) for parallel normalize/chunk
   - Main thread accumulates records into batch_ids/docs/metas lists and flushes via collection.upsert() every BATCH_SIZE (128) records
   - try/finally around the executor guarantees the final flush_batch() runs even if a worker raises, preventing silent loss of up to BATCH_SIZE-1 pending drawers
   - Per-worker exceptions are caught and logged instead of aborting the whole run (each file is independent)

3. Keep the dry_run path sequential — matches miner.py, preserves the original output formatting (per-file [DRY RUN] lines, room distribution), and uses the same extracted worker for consistency.

4. Switch collection.add() -> collection.upsert() — idempotent, removes the try/except 'already exists' dance, matches miner.py.

Performance expectations (M-series Mac with MEMPALACE_EMBEDDING_DEVICE=mps):
- Before (single-file sequential loop + per-chunk .add()): 502 drawers in 12.3s = 40.7 drawers/s
- After (parallel reads + batched upserts): expected ~3-5x improvement from batching alone (the GPU finally gets meaningful batch sizes), plus another ~2-3x from parallelizing the normalize() step on large JSONL files. Combined: ~5-15x.

All 556 tests still pass, including tests/test_convo_miner.py, which exercises the real ChromaDB write path end-to-end. No changes required to the public API. Callers (cli.py, mcp_server.py) are unaffected.
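A condensed sketch of the ingest-loop shape described above; `process_convo_file_cpu` and the record fields are named in the commit message, but their exact signatures here are assumptions:

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = min(32, (os.cpu_count() or 1) * 2)
BATCH_SIZE = 128


def ingest_convos(collection, pending_files):
    batch_ids, batch_docs, batch_metas = [], [], []

    def flush_batch():
        if batch_ids:
            # upsert is idempotent, so no "already exists" handling is needed
            collection.upsert(ids=list(batch_ids), documents=list(batch_docs),
                              metadatas=list(batch_metas))
            batch_ids.clear()
            batch_docs.clear()
            batch_metas.clear()

    executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
    try:
        futures = [executor.submit(process_convo_file_cpu, path) for path in pending_files]
        for fut in as_completed(futures):
            try:
                result = fut.result()
            except Exception as exc:   # one bad file should not abort the whole run
                print(f"[warn] worker failed: {exc}")
                continue
            if result is None:          # file was skipped by the worker
                continue
            _source, _room, records, _counts = result
            for rec in records:         # assumed record shape: id / document / metadata
                batch_ids.append(rec["id"])
                batch_docs.append(rec["document"])
                batch_metas.append(rec["metadata"])
                if len(batch_ids) >= BATCH_SIZE:
                    flush_batch()
    finally:
        flush_batch()                   # guarantee the partial final batch is written
        executor.shutdown(wait=True)
```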
Thanks @FabioLissi — great catch on the regression. Cherry-picked.

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>
@NickShtefan Thanks again for pulling in the device support commit. One follow-up thought I wanted to float while it's still fresh.
web3guru888
left a comment
The configurable embedding model support is welcome — we've been using all-MiniLM-L6-v2 for English-only discovery mining, and being able to swap to e5-base for multilingual domains without forking would be useful.
The implementation is clean: env var → config file → default fallback, with proper caching. The device support (mps/cuda) is a nice ergonomic touch.
However, this PR includes an unrelated breaking change that should be split out:
```diff
-CHUNK_SIZE = 800     # chars per drawer
-CHUNK_OVERLAP = 100  # overlap between chunks
+CHUNK_SIZE = 450     # chars per drawer
+CHUNK_OVERLAP = 50   # overlap between chunks
```

Halving the chunk size from 800→450 and the overlap from 100→50 is a significant behavioral change that:
- Will produce ~2x more drawers for the same corpus (more ChromaDB storage, slower queries)
- Makes existing palaces inconsistent — old drawers are 800-char, new ones 450-char
- Has nothing to do with multilingual embedding support
If smaller chunks improve retrieval for multilingual models (plausible — shorter contexts embed more precisely), that's worth doing, but it should be a separate PR with its own benchmarks showing the improvement. Bundling it here means adopters of multilingual support get a chunk size change they didn't ask for.
Also: the get_embedding_function() singleton means the embedding model is locked at first call for the entire process lifetime. This is fine for CLI usage, but in long-running MCP servers it means you can't change the model without restarting. Might be worth documenting that constraint.
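For context on the "locked at first call" behaviour, the module-level caching pattern being described looks roughly like this (a sketch with hypothetical helper names, not the project's exact code):

```python
_embedding_function = None  # module-level cache: resolved once per process


def get_embedding_function():
    global _embedding_function
    if _embedding_function is None:
        model_name = resolve_model_name()            # env var -> config.json -> default (hypothetical helper)
        _embedding_function = build_embedding_fn(model_name)  # hypothetical helper
    # Later calls return the same object even if env/config changed, so a
    # long-running MCP server must restart to pick up a new model.
    return _embedding_function
```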
**Real-World Production Findings**

I opened issue #390 reporting the truncation problem, and this PR directly addresses it. We've already deployed it.

**Our Setup**

**Measured Results**

**Why Drawers Decreased (Not Increased)**

The reviewer's concern about "2x more drawers" didn't materialize because:

**Chunk Size & Multilingual Embeddings**

Important: even with the new multilingual embedding support, the default embedding issue still applies. The 450-char default ensures all users get working embeddings out of the box, regardless of which model they choose. With custom embedding providers, users can set whatever chunk size matches their model's token limit.

**Quality Improvement**

The +28% relevance gain comes from:

**Recommendation**

Merge this PR. The chunk size change isn't just related to multilingual support — it's a critical fix for the default embedding path. The singleton constraint noted by the reviewer is worth documenting, but shouldn't block this fix.

TL;DR: We reported the problem in #390, this PR fixes it, and our real-world deployment shows it works better than the old defaults.
Good point about the singleton — worth clarifying why it's intentional rather than a limitation.

Switching the embedding model isn't a hot-swap operation. Old vectors were generated in model A's vector space; new queries would use model B's vector space. Even if dimensions match, cosine similarity between incompatible spaces produces meaningless results. If dimensions differ, ChromaDB rejects the insert outright. So after changing the model, you must re-mine the entire palace — a bulk operation (minutes to hours depending on corpus size).

Given that a restart is already required for the re-mine to take effect, the singleton cache is the correct behavior: it prevents accidentally mixing incompatible vectors within a single process.

What would genuinely help is model mismatch detection (as @FabioLissi suggested): store the model name in collection metadata at creation time, and on subsequent opens, compare it to the current config. On mismatch, raise a clear error with recovery instructions. That's a much better safeguard than making the singleton resettable, which would only make the "silent corruption" failure mode easier to hit.

Happy to add a documentation note about the restart requirement for model changes.
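A rough sketch of that detection, using ChromaDB collection metadata; the metadata key and exception name here are illustrative (the thread later settles on `EmbeddingModelMismatchError`):

```python
class EmbeddingModelMismatchError(RuntimeError):
    """Palace was built with a different embedding model than the current config."""


def open_palace_collection(client, name, configured_model, embedding_function):
    # Metadata is applied only when the collection is first created,
    # so the stamp records the model the palace was originally built with.
    col = client.get_or_create_collection(
        name,
        embedding_function=embedding_function,
        metadata={"embedding_model": configured_model},
    )
    stored_model = (col.metadata or {}).get("embedding_model")
    if stored_model and stored_model != configured_model:
        raise EmbeddingModelMismatchError(
            f"Palace was built with '{stored_model}' but the current configuration "
            f"uses '{configured_model}'. Re-mine the palace (or restore the original "
            "model) before adding or searching drawers."
        )
    return col
```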
@FabioLissi Great suggestion — this is exactly the kind of safeguard that should ship alongside configurable embedding models. Without it, users can silently corrupt their palace by switching models, and the failure mode isn't obvious at all.

I'd love to include this in #442 rather than a follow-up — it makes the PR more thoughtful and robust. If you're up for writing it, go ahead! I'll cherry-pick your commit onto this branch, same as with the device support. Thanks for flagging this and for the continued contributions!
Three commits:

- 6a23c8f — chunk size env vars
- 7aa96cc — embedding model mismatch detection
- 94412e8 — re-mine command

The re-mine command is the recovery path for intentional model switches. It extracts source files from existing metadata, backs up the palace, drops it, and re-mines using the exact file list (not the whole directory, so it won't pull in extra files). Supports --dry-run.

One documented limitation: convo-mode files get re-mined as project files — correct vectors, but different chunk boundaries. Fixing that properly requires storing the mining mode in metadata, which felt like scope creep for now.

Cherry-picking: the first commit is standalone, detection is one squashed commit, and re-mine depends on detection. Happy to answer questions or make any adjustments!
@FabioLissi Thanks for the excellent contributions! I've cherry-picked all three commits into this PR:
During integration I found and fixed two issues:
Full test cycle passed: mine → search → model switch (mismatch detected) → force bypass (dimension error, expected) → re-mine → search with new model. Everything works. Rebased on latest upstream/main, all clean.
@NickShtefan — great catch on those integration issues.
@NickShtefan fixed the CLI path in PR MemPalace#442 (searcher.py) by re-raising EmbeddingModelMismatchError past the generic except clause. The MCP path had the same bug via two additional swallows:

- mcp_server.py:_get_collection() wraps _palace_get_collection in `except Exception: return None`, so every tool using _get_collection (status, list_wings, list_rooms, get_taxonomy, check_duplicate, search, etc.) silently showed "No palace found" instead of the actual mismatch error.
- The top-level dispatcher in handle_request() catches Exception and returns a hardcoded "Internal tool error", so even if the wrapper re-raises, the actionable message (stored vs current model + recovery instructions) never reaches the caller.

Fix in three spots:

1. searcher.py: re-raise EmbeddingModelMismatchError in search() and search_memories() before the generic except (matches @NickShtefan's fix).
2. mcp_server.py:_get_collection(): re-raise EmbeddingModelMismatchError before the generic swallow.
3. mcp_server.py handle_request(): catch EmbeddingModelMismatchError explicitly and return the message as a JSON result so MCP clients see the real error and recovery instructions.
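The shape of the fix, sketched; `_palace_get_collection` and `dispatch_tool` stand in for the real internals and are assumptions:

```python
# mcp_server.py: re-raise the specific error before the generic handler swallows it


def _get_collection():
    try:
        return _palace_get_collection()
    except EmbeddingModelMismatchError:
        raise          # let the actionable mismatch message propagate
    except Exception:
        return None    # everything else still degrades to "No palace found"


def handle_request(request):
    try:
        return dispatch_tool(request)
    except EmbeddingModelMismatchError as exc:
        # Surface stored-vs-current model details and recovery steps to the MCP client
        return {"error": str(exc)}
    except Exception:
        return {"error": "Internal tool error"}
```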
@FabioLissi cherry-picked your MCP fix (e64cdd8) — thanks. PR rebased on latest upstream/main, all 604 tests pass. Ready for review.
@igorls — agreed, and done. Rewrote the PR to eliminate all embedding-related env vars.

**What changed**

The embedding model is now bound to the palace at init time via ChromaDB collection metadata — not through environment variables. Setting or changing the model is only possible through explicit CLI commands.

**Removed env vars**

**Where config lives now**

Collection metadata is the single source of truth:

```json
{
  "embedding_model": "intfloat/multilingual-e5-base",
  "chunk_size": 450,
  "chunk_overlap": 50
}
```

Set once at init time.

**Device auto-detection**

No configuration needed.

**Legacy migration**

Existing palaces without an embedding_model metadata key keep working.

**What's preserved from community contributions**

This PR incorporates work from @FabioLissi:

The core ideas remain — init-time binding just moves the config source from env vars to collection metadata, which is where the model was already being tracked.

**Benchmark**

Tested end-to-end on 247 Russian blog posts with intfloat/multilingual-e5-base.

Tests: 646 passed, 0 failed (35 errors are onnxruntime platform issues on macOS x86_64, not code-related). Rebased on latest upstream/main.
Hey @NickShtefan, we would like to include this in the next release once the plugin-based storage layer is added; please wait to rebase.
Embedding model is bound to the palace at init time via collection metadata and changeable only through re-mine. Removes all embedding-related environment variables in favor of explicit CLI flags and auto-detection.

Features:
- `mempalace init --model <name>` binds embedding model at creation
- `mempalace re-mine --model <new>` migrates to a different model
- `mempalace init --chunk-size N` configures chunk size per palace
- Auto-detect device: cuda > mps (arm64 only) > cpu
- Embedding model mismatch detection with clear error messages
- Legacy palace auto-migration (stamps "chromadb-default")
- Backend seam abstraction for storage layer
- MCP ping health checks

Removed env vars:
- MEMPALACE_EMBEDDING_MODEL (use --model flag)
- MEMPALACE_EMBEDDING_DEVICE (auto-detected)
- MEMPALACE_CHUNK_SIZE / MEMPALACE_CHUNK_OVERLAP (use --chunk-size)
- MEMPALACE_FORCE_EMBEDDING (use --force flag)

Benchmarked on 247 Russian blog posts (intfloat/multilingual-e5-base): 758 drawers, 49ms avg search, 0.719 avg similarity, 100% high-relevance.

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
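A plausible sketch of the cuda > mps (arm64 only) > cpu auto-detection order; the actual implementation in the branch may differ:

```python
import platform


def detect_embedding_device() -> str:
    """Pick the fastest available backend without any environment variable."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # MPS only on Apple Silicon; Intel Macs fall through to CPU
    if platform.machine() == "arm64" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```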
…lection

After PR MemPalace#442 moved the embedding model from the MEMPALACE_EMBEDDING_MODEL env var to collection metadata, mcp_server._get_collection() stopped passing embedding_function to ChromaDB. This made the MCP server silently fall back to ChromaDB's built-in default (all-MiniLM-L6-v2, dim 384), regardless of what model the palace was actually built with.

For palaces created via `mempalace init --model intfloat/multilingual-e5-base` (the headline use case of MemPalace#442), this means inserts and queries through MCP would attempt 384-dim embeddings against 768-dim stored vectors and fail with a dimension mismatch. searcher.py and cli.py work correctly because they go through palace.get_collection() — only the MCP path was broken.

This commit makes _get_collection mirror palace.get_collection's behaviour:
- Read embedding_model from collection metadata via read_collection_metadata
- Instantiate the matching embedding function via get_embedding_function
- Pass that function explicitly to ChromaDB
- Stamp embedding_model into metadata for legacy palaces (matches the legacy migration path in palace.get_collection)

Local-only commit for now; will be evaluated for inclusion into PR MemPalace#442 or a follow-up upstream PR.
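In outline, the mirrored behaviour looks like this (helper names are taken from the commit message, signatures are assumed, and the legacy-stamping step is omitted for brevity):

```python
def _get_collection(client, name):
    # Resolve the embedding function from what the palace was built with,
    # not from an env var (mirrors palace.get_collection()).
    meta = read_collection_metadata(client, name) or {}
    model_name = meta.get("embedding_model", "chromadb-default")  # legacy palaces fall back
    ef = get_embedding_function(model_name)
    return client.get_collection(name, embedding_function=ef)
```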
Quick context on why I came back to this now: I wanted to update my personal install to the latest version, and everything is verified against my live palace (15K+ drawers).

Following @igorls's earlier note that this should land "once the plugin based storage layer is added" — happy to wait for #1196 (SQLiteVec backend) before merging if that's still the plan, or merge sooner if the storage layer timing has shifted. Whatever works for you.
Following over from #390 where @NickShtefan tied the threads together — a few thoughts on direction.

The bind-chunk-size-to-collection-metadata property is the load-bearing piece. The reason is the silent-drift failure mode: change the chunk size out of band and old and new drawers quietly stop being comparable.

**Composition with #1024**

Different layers; both wanted:

If the maintainer direction is to wait for #1196 (SQLiteVec backend) per @igorls's earlier note, that's reasonable — the plugin-storage layer changes the metadata-binding shape, and rebasing now risks a third rebase later. The intermediate state where users have #1024 (config override) but not #442 (collection-bound) is acceptable as long as the gap is documented in the CHANGELOG.

(For context: the fork has the #1024-equivalent in production but not the #442-equivalent; the silent-drift failure mode you describe hasn't bitten us yet only because we haven't changed embedding model. So I'm reading #442 as the durable fix and #1024 as the bridge.)
**What does this PR do?**

Adds configurable multilingual embedding model support with init-time binding and production-tested infrastructure for safe model switching. The embedding model becomes a property of the palace itself (stamped in collection metadata at init), not an environment variable that can drift between processes.

Closes #903 — bug: embedding model mismatch — MCP server uses MiniLM (384-dim) while ingest can use mpnet (768-dim). The fix routes `mcp_server._get_collection()` through the same path the rest of the codebase already uses, so MCP queries always use the embedding function the palace was built with.

This PR is the embedding infrastructure layer that #488 (multilingual classification by @EndeavorYen) builds on top of. Agreed coordination: merge #442 first, then rebase #488 on top.

Related parallel work: #912 (@OmkarKirpan) targets the same `_get_collection` mismatch bug with a narrower scope. This PR covers that fix plus configurable model name, init-time binding, mismatch detection, the `re-mine` command, and MCP error propagation. See the cross-linked discussion for coordination.

**Changes**

Init-time embedding model binding (replaces env vars):

- `mempalace init <dir> --model intfloat/multilingual-e5-base` — model is stamped in ChromaDB collection metadata at palace creation
- `mempalace re-mine --model <new>` — backup → drop → re-mine with a different model
- `mempalace init --chunk-size N --chunk-overlap M` — chunk size is per-palace, no env var
- Device auto-detection: `cuda` > `mps` (arm64 only) > `cpu` (no `MEMPALACE_EMBEDDING_DEVICE` needed)
- Removed env vars: `MEMPALACE_EMBEDDING_MODEL`, `MEMPALACE_EMBEDDING_DEVICE`, `MEMPALACE_CHUNK_SIZE`, `MEMPALACE_CHUNK_OVERLAP`, `MEMPALACE_FORCE_EMBEDDING`

MCP fix (closes #903):

- `mcp_server._get_collection()` now reads `embedding_model` from collection metadata and resolves the matching `embedding_function` — previously it silently fell back to ChromaDB's `all-MiniLM-L6-v2` regardless of the palace's actual model
- `EmbeddingModelMismatchError` propagated through `_get_collection()` and `handle_request()` (not swallowed by a generic `except`)
- `searcher.py` paths were already correct via `palace.get_collection()` — only the MCP path was broken; this commit mirrors that behaviour for MCP

Other:

- Backend seam abstraction (`mempalace/backends/chroma.py`) for the pluggable storage layer (RFC 001 §10)
- Legacy palaces without `embedding_model` metadata are stamped as `chromadb-default`
- `EmbeddingModelMismatchError` with clear recovery instructions
- `pyproject.toml` — `[multilingual]` optional dependency
- `tests/test_multilingual.py` (28 tests), `tests/test_remine.py` (re-mine flow)

Backward compatible: existing palaces continue working with their original model (auto-stamped as `chromadb-default`); no re-mining required.

Production data (Russian): search scores 0.19→0.77 with `intfloat/multilingual-e5-base` on the same 247-blog-post corpus. Note: `paraphrase-multilingual-MiniLM-L12-v2` (proposed in #231) has a 128-token window that truncates chunks and degrades quality — `e5-base` (512 tokens) is the recommended default for multilingual.

**How to test**

**Checklist**

- `1336 passed, 0 failed` on the rebased branch (~6 min, full suite excluding benchmarks)
- Lint clean (`ruff check .`)
- `tool_diary_read` returns full results

**Contributors**

- `_get_collection` fix

Closes #903

Refs #390 (chunk size), #231 (multilingual config), #488 (multilingual classification builds on top), #912 (parallel narrower fix for same bug)