mistral.rs is compatible with OpenAI's `web_search_options` parameter! When enabled, this allows models to search the web as part of answering a request.
This works with all models that support tool calling. The following models have been tested:
- Hermes 3 3B/8B
- Mistral 3 24B
- Llama 4 Scout/Maverick
- Qwen 3
Web search is supported in both streaming and completion responses! This makes it easy to integrate and to try out in interactive mode!
Besides tool calling and parsing of web content, we also use an embedding model to select the most relevant search results.
You can use the web search tool in all the APIs: Python, Rust, and server.
Internally, we now use `google/embeddinggemma-300m` to embed documents for ranking. You can pick from the built-in reranker variants (currently just `embedding_gemma`) in every API:

- Rust: `with_search(SearchEmbeddingModel::EmbeddingGemma300M)` in the builder
- Python: `search_embedding_model="embedding_gemma"` in the `Runner` (see the sketch after this list)
- Server: the `--search-embedding-model embedding_gemma` flag
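For example, selecting the reranker from Python (a minimal sketch; the model ID is a placeholder, and `enable_search=True` is carried over from the Runner examples below):

```python
from mistralrs import Runner, Which, Architecture

runner = Runner(
    which=Which.Plain(
        model_id="NousResearch/Hermes-3-Llama-3.1-8B",
        arch=Architecture.Llama,
    ),
    enable_search=True,
    # "embedding_gemma" is currently the only built-in reranker variant.
    search_embedding_model="embedding_gemma",
)
```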
By default, mistral.rs uses a DuckDuckGo-based search callback. To override this, you can provide your own search function:

- Rust: use `.with_search_callback(...)` on the model builder with an `Arc<dyn Fn(&SearchFunctionParameters) -> anyhow::Result<Vec<SearchResult>> + Send + Sync>`; a sketch follows this list.
- Python: pass the `search_callback` keyword argument to `Runner`, which should be a function `def search_callback(query: str) -> List[Dict[str, str]]` returning a list of results with the keys `"title"`, `"description"`, `"url"`, and `"content"`.
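Here is a minimal Rust sketch of such a callback. This is hedged: the `SearchResult` field names are assumed to mirror the Python result keys above, and `params.query` is assumed to carry the query string; check both against the crate version you build with.

```rust
use std::sync::Arc;

use mistralrs::{SearchFunctionParameters, SearchResult, TextModelBuilder};

// Attach a stubbed custom search callback to a model builder.
fn with_static_search(builder: TextModelBuilder) -> TextModelBuilder {
    builder.with_search_callback(Arc::new(
        |params: &SearchFunctionParameters| -> anyhow::Result<Vec<SearchResult>> {
            // Replace this stub with a real backend lookup using `params.query`.
            let _ = &params.query;
            Ok(vec![SearchResult {
                title: "Example Result".to_string(),
                description: "An example description".to_string(),
                url: "https://example.com".to_string(),
                content: "Full text content of the page".to_string(),
            }])
        },
    ))
}
```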
Example in Python:

```python
from mistralrs import Runner, Which, Architecture


def search_callback(query: str) -> list[dict[str, str]]:
    # Implement your custom search logic here, returning a list of result dicts.
    return [
        {
            "title": "Example Result",
            "description": "An example description",
            "url": "https://example.com",
            "content": "Full text content of the page",
        },
        # more results...
    ]


runner = Runner(
    which=Which.Plain(model_id="YourModel/ID", arch=Architecture.Mistral),
    enable_search=True,
    search_callback=search_callback,
)
```
Here are some examples using various models. Note that this works for both streaming and completion requests, so interactive mode is featured here! Be sure to add `--enable-search`!

Interactive mode:

```bash
mistralrs run --enable-search --isq 4 -m Qwen/Qwen3-4B
```

```bash
mistralrs run --enable-search --isq 4 -m NousResearch/Hermes-3-Llama-3.1-8B
```

HTTP server:

```bash
mistralrs serve --enable-search -p 1234 --isq 4 --jinja-explicit chat_templates/mistral_small_tool_call.jinja -m mistralai/Mistral-Small-3.1-24B-Instruct-2503
```

Querying the server with the OpenAI Python client:

```python
from openai import OpenAI
```
```python
client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

messages = [
    {
        "role": "user",
        "content": "Can you show me some code using mistral.rs for running Llama 3.2 Vision?",
    }
]

completion = client.chat.completions.create(
    model="default",
    messages=messages,
    tool_choice="auto",
    max_tokens=1024,
    web_search_options={},
)
# print(completion.usage)
print(completion.choices[0].message.content)
if completion.choices[0].message.tool_calls is not None:
    # Should never happen: the search tool calls are resolved internally
    # before the final response is returned.
    tool_called = completion.choices[0].message.tool_calls[0].function
    print(tool_called)
```
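The same request works with streaming as well. Here is a quick sketch using the standard OpenAI streaming interface, reusing `client` and `messages` from above (the stream handling itself is the only new part):

```python
stream = client.chat.completions.create(
    model="default",
    messages=messages,
    tool_choice="auto",
    max_tokens=1024,
    web_search_options={},
    stream=True,
)
for chunk in stream:
    # Chunks carry incremental deltas; `content` may be None on some chunks.
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()
```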
Using the Python API directly:

```python
from mistralrs import (
    Runner,
    Which,
    ChatCompletionRequest,
    Architecture,
    WebSearchOptions,
)


# Define a custom search callback if desired
def my_search_callback(query: str) -> list[dict[str, str]]:
    # Fetch or compute search results here
    return [
        {
            "title": "Mistral.rs GitHub",
            "description": "Official mistral.rs repository",
            "url": "https://github.com/EricLBuehler/mistral.rs",
            "content": "mistral.rs is a fast LLM inference engine written in Rust...",
        },
    ]


runner = Runner(
    which=Which.Plain(
        model_id="NousResearch/Hermes-3-Llama-3.1-8B",
        arch=Architecture.Llama,
    ),
    enable_search=True,
    search_callback=my_search_callback,
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[
            {
                "role": "user",
                "content": "Can you show me some code using mistral.rs for running Llama 3.2 Vision?",
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
        web_search_options=WebSearchOptions(
            search_context_size=None, user_location=None
        ),
    )
)
print(res.choices[0].message.content)
print(res.usage)
```

And the Rust API:

```rust
use anyhow::Result;
use mistralrs::{
    IsqType, RequestBuilder, SearchEmbeddingModel, TextMessageRole, TextMessages,
    TextModelBuilder, WebSearchOptions,
};

#[tokio::main]
async fn main() -> Result<()> {
    let model = TextModelBuilder::new("NousResearch/Hermes-3-Llama-3.1-8B")
        .with_isq(IsqType::Q4K)
        .with_logging()
        .with_search(SearchEmbeddingModel::default())
        .build()
        .await?;

    let messages = TextMessages::new().add_message(
        TextMessageRole::User,
        "What is the weather forecast for Boston?",
    );
    let messages =
        RequestBuilder::from(messages).with_web_search_options(WebSearchOptions::default());

    let response = model.send_chat_request(messages).await?;
    println!("What is the weather forecast for Boston?\n\n");
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );
    Ok(())
}
```

See also:

- Agentic Features Guide: Complete agentic capabilities overview
- Tool Calling: How tool calling works under the hood
- MCP Client: Connect to external tool servers