Multilingual support: French (and 100+ languages) via Ollama BGE-M3 #92

@Motokiyo

Description

Problem

MemPalace v3.0.0 uses ChromaDB's default embedding model (all-MiniLM-L6-v2), which is trained primarily on English text. With French content (or other non-English languages), search results are irrelevant: similarity scores go negative and the wrong documents come back.

Solution

We created a patch that replaces the default embedding with BGE-M3 via Ollama, supporting 100+ languages while keeping everything local (no cloud dependency).

Changes:

- New ollama_embedding.py: a ChromaDB-compatible EmbeddingFunction using Ollama's /api/embed endpoint
- Patched miner.py and searcher.py to use BGE-M3 when Ollama is available, with graceful fallback to MiniLM-L6
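For reference, a minimal sketch of such an embedding function. Class and helper names here are illustrative, not copied from the patch; the assumption is only Ollama's documented /api/embed contract, which takes a batch "input" list and returns an "embeddings" list of vectors:

```python
# Sketch of an Ollama-backed, ChromaDB-compatible embedding function.
# Names (OllamaEmbeddingFunction, parse_embed_response) are hypothetical.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def parse_embed_response(payload: dict) -> list:
    """Extract the list of embedding vectors from an /api/embed response."""
    return payload["embeddings"]


class OllamaEmbeddingFunction:
    """Implements ChromaDB's embedding protocol: __call__(input) -> embeddings."""

    def __init__(self, model: str = "bge-m3", base_url: str = OLLAMA_URL):
        self.model = model
        self.url = f"{base_url}/api/embed"

    def __call__(self, input: list) -> list:
        # POST {"model": ..., "input": [...]} and return the vectors.
        req = urllib.request.Request(
            self.url,
            data=json.dumps({"model": self.model, "input": input}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return parse_embed_response(json.load(resp))
```

Because ChromaDB only requires a callable mapping documents to vectors, a class like this can be passed wherever the default embedding function is used today.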

Results on our French corpus (1500 files), query "détection de chute reachy care":

|            | Before (MiniLM)                                | After (BGE-M3)                                               |
|------------|------------------------------------------------|--------------------------------------------------------------|
| Top result | store.js (JS in-app purchases, zero relevance) | PROJET_ARISTOTE_VISION.md (Reachy Care stack, perfect match) |
| Score      | -0.428 (negative = no match)                   | +0.101 (positive = relevant)                                 |
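To make the score column concrete, assuming these are cosine similarities between query and document vectors (the usual reading: near +1 means similar, near 0 unrelated, negative pointing away in embedding space), a pure-Python check looks like:

```python
# Cosine similarity between two vectors, used here only to illustrate
# how the scores above are interpreted. Not part of the patch.
import math


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A score of -0.428 means the MiniLM query vector actually points away from the returned document, which matches the "zero relevance" result above.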

Patch

Available at: https://github.com/Motokiyo/mempalace-multilingual

One-line install:

```shell
git clone https://github.com/Motokiyo/mempalace-multilingual && cd mempalace-multilingual && ./install.sh
```

Suggestion for upstream

Consider making the embedding function configurable in mempalace.yaml:

```yaml
embedding:
  provider: ollama            # or "default" for MiniLM
  model: bge-m3
  base_url: http://localhost:11434
```
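The selection logic this config implies could be sketched as follows. The key names come from the YAML above; the function name, defaults, and return shape are assumptions for illustration, not MemPalace internals:

```python
# Hypothetical provider selection for the proposed "embedding" config section.
# Unknown or missing providers fall back to the current default (MiniLM).
def choose_embedding(cfg: dict) -> tuple:
    emb = cfg.get("embedding", {})
    provider = emb.get("provider", "default")
    if provider == "ollama":
        return (
            "ollama",
            emb.get("model", "bge-m3"),
            emb.get("base_url", "http://localhost:11434"),
        )
    return ("default", "all-MiniLM-L6-v2", None)
```

Keeping "default" as the fallback means existing installs behave exactly as before when the section is absent.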

This would let users choose their embedding model without patching source files. Happy to submit a PR if you're interested.

— Alexandre Ferran / EIFFEL AI
