Title: Search does not work for non-English languages (Russian, etc.) — default embedding model is English-only

Problem

MemPalace uses ChromaDB with its default embedding function, all-MiniLM-L6-v2 from sentence-transformers. This model was trained primarily on English text and produces near-random vectors for other languages (Russian, Chinese, Arabic, etc.).

As a result, semantic search queries in Russian return irrelevant results with very low similarity scores (0.04–0.18), which makes the tool practically unusable for non-English projects and for users who store memories in their native language.

Steps to reproduce

1. Mine a project that contains Russian-language text, or store a drawer with Russian content via mempalace_add_drawer.
2. Search for it in Russian: mempalace search "запрос на русском" (the placeholder means "query in Russian"; any Russian-language query shows the problem).
3. Observe that the results are unrelated to the query and the similarity scores are below 0.2.
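
To check whether the low scores come from the embedding model rather than from MemPalace itself, the comparison can be reproduced outside the tool. The following is a minimal sketch, assuming cosine similarity is the metric of interest (how MemPalace derives its reported score is not confirmed here). `rank_documents` and the `embed` parameter are hypothetical names introduced for illustration; `embed` is any callable that maps a string to a vector.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query: str,
    documents: Sequence[str],
    embed: Callable[[str], Vector],  # hypothetical: any text -> vector function
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs sorted from most to least similar."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With sentence-transformers installed, `embed` could be the `encode` method of `SentenceTransformer("all-MiniLM-L6-v2")`. Running the same Russian query and documents through a multilingual model then shows whether the ranking improves.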

Expected behavior

Search should work across languages, or at minimum there should be a way to configure a multilingual embedding model.

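Suggested fix

Allow configuring the embedding model in ~/.mempalace/config.json, defaulting to a multilingual model such as paraphrase-multilingual-mpnet-base-v2. It supports 50+ languages, but it is considerably larger than all-MiniLM-L6-v2 (roughly 278M parameters versus about 22M) and produces 768-dimensional vectors instead of 384-dimensional ones, so existing collections would need to be re-embedded after switching. Alternatively, use multilingual-e5-small for a lighter option.

{
  "embedding_model": "paraphrase-multilingual-mpnet-base-v2"
}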
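
One way the proposed setting could be read is sketched below. This is not MemPalace's actual code: `load_embedding_model`, `build_embedding_function`, `DEFAULT_MODEL`, and the `embedding_model` key handling are assumptions based only on the config snippet above. The ChromaDB class used, `SentenceTransformerEmbeddingFunction`, ships with chromadb and needs sentence-transformers installed.

```python
import json
from pathlib import Path

# Assumed default and config location, taken from the proposal above.
DEFAULT_MODEL = "paraphrase-multilingual-mpnet-base-v2"
CONFIG_PATH = Path.home() / ".mempalace" / "config.json"


def load_embedding_model(config_path: Path = CONFIG_PATH) -> str:
    """Return the configured model name, or DEFAULT_MODEL if unset or unreadable."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_MODEL
    if not isinstance(config, dict):
        return DEFAULT_MODEL
    model = config.get("embedding_model")
    if isinstance(model, str) and model.strip():
        return model.strip()
    return DEFAULT_MODEL


def build_embedding_function(model_name: str):
    """Create a ChromaDB embedding function for the given model name."""
    # Imported lazily so the config helper works without chromadb installed.
    from chromadb.utils import embedding_functions

    return embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=model_name
    )
```

The returned object can be passed as the `embedding_function` argument when a ChromaDB collection is created. Collections built with a different model would have to be rebuilt, since stored vectors from one model are not comparable with query vectors from another.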

Title: Search does not work for non-English languages (Russian, etc.) — default embedding model is English-only #712
