mistral.rs supports video input for compatible multimodal models. Videos are decoded into frames and passed through the model's vision encoder alongside any image or audio inputs.
Supported models: Gemma 4
Non-GIF video formats require the FFmpeg binary to be available on your PATH. FFmpeg is used to decode video files into individual frames for processing.
GIF files are decoded natively using the image crate and do not require FFmpeg.
Linux (Debian/Ubuntu):

```bash
sudo apt install ffmpeg
```

Linux (Fedora/RHEL):

```bash
sudo dnf install ffmpeg
```

macOS:

```bash
brew install ffmpeg
```

Windows:

Download from https://ffmpeg.org/download.html and add the binary to your PATH.
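Before sending non-GIF video requests, it can be worth confirming that the FFmpeg binary is actually reachable. A minimal sketch (the `binary_on_path` helper is illustrative, not part of mistral.rs):

```python
import shutil

def binary_on_path(name: str) -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None

if not binary_on_path("ffmpeg"):
    print("FFmpeg not found on PATH; non-GIF video inputs will fail to decode.")
```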
Docker:

```dockerfile
RUN apt-get update && apt-get install -y ffmpeg
```

Videos are uniformly sampled to 32 frames by default. The sampled frames are evenly spaced across the full duration of the video, preserving temporal coverage regardless of the original frame rate or video length.
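The uniform sampling described above can be sketched as follows. This computes which frame indices would be kept for a given clip; it illustrates the idea of evenly spaced sampling and is not mistral.rs's actual implementation:

```python
def sample_frame_indices(total_frames: int, num_samples: int = 32) -> list[int]:
    """Pick num_samples frame indices evenly spaced across the video."""
    if total_frames <= num_samples:
        return list(range(total_frames))  # short video: keep every frame
    # Evenly spaced positions across [0, total_frames - 1]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

# A 300-frame clip (~10 s at 30 fps) keeps frames 0, 10, 19, ..., 299
print(sample_frame_indices(300)[:3], sample_frame_indices(300)[-1])  # → [0, 10, 19] 299
```

Because the indices are derived from the total frame count rather than the frame rate, a 5-second clip and a 5-minute clip both yield 32 frames covering their full duration.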
Any format that FFmpeg can decode is supported, including:
- mp4
- avi
- mov
- mkv
- webm
- m4v
- gif (decoded natively without FFmpeg)
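Following the split described above (GIFs decoded natively, everything else through FFmpeg), the choice of decode path comes down to the file type. A sketch of that dispatch by extension; the `decode_backend` function is illustrative, not mistral.rs's API:

```python
from pathlib import Path

def decode_backend(video_path: str) -> str:
    """Pick a decode path: native GIF decoding vs. FFmpeg for everything else."""
    if Path(video_path).suffix.lower() == ".gif":
        return "native-gif"  # handled by the image crate, no FFmpeg needed
    return "ffmpeg"          # requires the ffmpeg binary on PATH

print(decode_backend("clip.mp4"))  # → ffmpeg
print(decode_backend("anim.GIF"))  # → native-gif
```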
Video inputs use the video_url content type in the OpenAI-compatible chat completion API:
```json
{
  "type": "video_url",
  "video_url": {
    "url": "path/to/video.mp4"
  }
}
```

The `url` field accepts either a local file path or a URL.
Use --video with -i for one-shot video queries from the command line:
```bash
# Describe a local video
mistralrs run -m google/gemma-4-E4B-it --video clip.mp4 -i "What happens in this video?"

# Use a URL
mistralrs run -m google/gemma-4-E4B-it --video https://example.com/video.mp4 -i "Summarize this video"

# Multiple videos
mistralrs run -m google/gemma-4-E4B-it --video clip1.mp4 --video clip2.mp4 -i "Compare these two videos"
```

Or use video files directly in interactive mode by including the path in your prompt:

```
> What happens in this video? clip.mp4
```
With the HTTP server running, videos can also be sent through the OpenAI-compatible API, for example with the official Python client:

```python
from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "path/to/video.mp4"
                    },
                },
                {
                    "type": "text",
                    "text": "What happens in this video?",
                },
            ],
        }
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```

See GEMMA4.md for full examples across all APIs.