Problem
▎
▎ MemPalace uses ChromaDB with the default embedding function (all-MiniLM-L6-v2 from sentence-transformers). This model was trained primarily on English text, so its embeddings for non-English
▎ languages (Russian, Chinese, Arabic, etc.) carry little semantic signal.
▎
▎ As a result, semantic search queries in Russian return irrelevant results with very low similarity scores (0.04–0.18), making the tool practically unusable for non-English projects or users who store
▎ memories in their native language.
▎
▎ Steps to reproduce
▎ 1. Mine a project that contains Russian-language text, or store a drawer with Russian content via mempalace_add_drawer
▎ 2. Search for it in Russian: mempalace search "запрос на русском"
▎ 3. Observe that the results are unrelated to the query and the similarity scores are below 0.2
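
To check the embedding model in isolation from MemPalace, the comparison can be sketched directly with sentence-transformers. This is a minimal sketch, not MemPalace code: `compare_models` and the sample document sentence are hypothetical, and the raw cosine similarity it prints may not match the score MemPalace reports, since that depends on how ChromaDB distances are converted.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_models(query: str, document: str, model_names: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: embed the same query/document pair with each model."""
    # Imported lazily so the pure-Python helper above works without the package.
    from sentence_transformers import SentenceTransformer

    scores = {}
    for name in model_names:
        model = SentenceTransformer(name)  # downloads the model on first use
        query_vec, doc_vec = model.encode([query, document])
        scores[name] = cosine_similarity(query_vec.tolist(), doc_vec.tolist())
    return scores


if __name__ == "__main__":
    # Illustrative Russian text; replace with content from your own drawers.
    results = compare_models(
        query="как настроить подключение к базе данных",
        document="Инструкция по настройке соединения с базой данных",
        model_names=["all-MiniLM-L6-v2", "paraphrase-multilingual-mpnet-base-v2"],
    )
    for model_name, score in results.items():
        print(f"{model_name}: {score:.3f}")
```

Running it requires `pip install sentence-transformers` and network access to download both models.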
▎
▎ Expected behavior
▎
▎ Search should work across languages, or at minimum there should be a way to configure a multilingual embedding model.
▎
▎ Suggested fix
▎
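▎ Allow configuring the embedding model in ~/.mempalace/config.json, defaulting to a multilingual model such as paraphrase-multilingual-mpnet-base-v2 (supports 50+ languages). Note that it is considerably larger than all-MiniLM-L6-v2 (roughly 278M parameters versus 22M) and produces 768-dimensional vectors instead of 384-dimensional ones.
▎ Alternatively, use multilingual-e5-small for a lighter option (384-dimensional vectors); E5 models expect "query: " and "passage: " prefixes on their inputs.
▎ In either case, existing collections would need to be re-embedded after switching models, because vectors from different models are not comparable.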
▎
▎ {
▎   "embedding_model": "paraphrase-multilingual-mpnet-base-v2"
▎ }
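
One way the config key could be wired into ChromaDB is sketched below. This is an illustration under assumptions, not MemPalace's actual code: `resolve_embedding_model`, `open_collection`, and the fallback behaviour are hypothetical, and only the `embedding_model` key and the config path come from this issue. The ChromaDB calls used (`PersistentClient`, `SentenceTransformerEmbeddingFunction`, `get_or_create_collection`) are part of its public API.

```python
import json
from pathlib import Path

# Assumption: fall back to the current default so existing palaces keep working.
# The issue proposes a multilingual default instead; that is a maintainer decision.
DEFAULT_MODEL = "all-MiniLM-L6-v2"
CONFIG_PATH = Path.home() / ".mempalace" / "config.json"


def resolve_embedding_model(config_path=CONFIG_PATH) -> str:
    """Return the configured model name, or DEFAULT_MODEL if unset or unreadable."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_MODEL
    if not isinstance(config, dict):
        return DEFAULT_MODEL
    model = config.get("embedding_model")
    if isinstance(model, str) and model.strip():
        return model.strip()
    return DEFAULT_MODEL


def open_collection(palace_path: str, name: str, config_path=CONFIG_PATH):
    """Hypothetical helper: open a collection using the configured embedding model."""
    # Imported lazily so the config logic can be used without chromadb installed.
    import chromadb
    from chromadb.utils import embedding_functions

    embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=resolve_embedding_model(config_path)
    )
    client = chromadb.PersistentClient(path=palace_path)
    return client.get_or_create_collection(name=name, embedding_function=embedding_fn)
```

The same embedding function has to be passed every time the collection is opened, for both indexing and querying; otherwise queries would be embedded with a different model than the stored documents.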