This document covers environment variables and server configuration for mistral.rs.
Runtime Environment Variables

| Variable | Description |
|---|---|
| MISTRALRS_DEBUG=1 | Enable debug mode: writes tensor info files for GGUF/GGML models and increases logging verbosity |
| MISTRALRS_NO_MMAP=1 | Disable memory-mapped file loading, forcing all tensor data into memory |
| MISTRALRS_NO_MLA=1 | Disable the MLA (Multi-head Latent Attention) optimization for DeepSeek V2/V3 and GLM-4.7-Flash |
| MISTRALRS_ISQ_SINGLETHREAD=1 | Force ISQ (In-Situ Quantization) to run single-threaded |
| MISTRALRS_IGPU_MEMORY_FRACTION | Memory fraction for integrated/unified-memory CUDA GPUs (e.g., NVIDIA Grace Blackwell, Jetson). Float between 0.0 and 1.0; default: 0.75 |
| MCP_CONFIG_PATH | Fallback path for the MCP client configuration (used if --mcp-config is not provided) |
| KEEP_ALIVE_INTERVAL | SSE keep-alive interval in milliseconds (default: 10000) |
| HF_HUB_CACHE | Override the Hugging Face Hub cache directory |

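As an illustration of how these variables might be set from a launch script, the Python sketch below builds an environment mapping and validates the two numeric settings. The helper name `runtime_env` and its parameters are hypothetical. Treating both bounds of the 0.0 to 1.0 range as valid is an assumption, since the table does not say whether they are inclusive.

```python
import os


def runtime_env(debug=False, no_mmap=False, igpu_fraction=None,
                keep_alive_ms=None, base=None):
    """Return an environment mapping with selected mistral.rs variables set.

    Hypothetical helper for illustration; only the variable names and
    documented ranges come from the table above.
    """
    env = dict(os.environ if base is None else base)
    if debug:
        env["MISTRALRS_DEBUG"] = "1"
    if no_mmap:
        env["MISTRALRS_NO_MMAP"] = "1"
    if igpu_fraction is not None:
        # Assumption: both 0.0 and 1.0 are accepted.
        if not 0.0 <= igpu_fraction <= 1.0:
            raise ValueError("MISTRALRS_IGPU_MEMORY_FRACTION must be between 0.0 and 1.0")
        env["MISTRALRS_IGPU_MEMORY_FRACTION"] = str(igpu_fraction)
    if keep_alive_ms is not None:
        # The interval is expressed in milliseconds.
        if keep_alive_ms <= 0:
            raise ValueError("KEEP_ALIVE_INTERVAL must be a positive number of milliseconds")
        env["KEEP_ALIVE_INTERVAL"] = str(int(keep_alive_ms))
    return env
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when starting the server process.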
Build-Time Environment Variables

| Variable | Description |
|---|---|
| MISTRALRS_METAL_PRECOMPILE=0 | Skip Metal kernel precompilation (useful for CI) |
| NVCC_CCBIN | Set the CUDA compiler path |
| CUDA_NVCC_FLAGS=-fPIE | Required on some Linux distributions |
| CUDA_COMPUTE_CAP | Override the CUDA compute capability (e.g., "86" for RTX 3090) |

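CUDA_COMPUTE_CAP takes the compute capability with the dot removed. As a minimal sketch, the hypothetical Python helper below converts the dotted form that NVIDIA publishes (such as "8.0") into that value and collects the other build-time variables into one mapping. The function and parameter names are illustrative and not part of mistral.rs.

```python
def build_env(compute_capability=None, nvcc_ccbin=None,
              pie=False, metal_precompile=True):
    """Return build-time variables as a dict (hypothetical helper).

    compute_capability is the dotted "major.minor" form, e.g. "8.0".
    """
    env = {}
    if compute_capability is not None:
        major, minor = compute_capability.split(".")
        # "8.0" becomes "80": major and minor digits are concatenated.
        env["CUDA_COMPUTE_CAP"] = f"{int(major)}{int(minor)}"
    if nvcc_ccbin is not None:
        env["NVCC_CCBIN"] = nvcc_ccbin
    if pie:
        env["CUDA_NVCC_FLAGS"] = "-fPIE"
    if not metal_precompile:
        env["MISTRALRS_METAL_PRECOMPILE"] = "0"
    return env
```

These values only take effect if they are present in the environment of the build command itself, for example by merging the mapping into `os.environ` before invoking the build.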
Server Defaults

When running the HTTP server with mistralrs serve, these defaults apply:

| Setting | Default Value |
|---|---|
| Server IP | 0.0.0.0 (all interfaces) |
| Max request body | 50 MB |
| Max running sequences | 32 |
| Prefix cache count | 16 |
| SSE keep-alive | 10 seconds |
| PagedAttention (CUDA) | Enabled |
| PagedAttention (Metal) | Disabled |
| PA GPU memory usage | 90% of free memory |
| PA block size | 32 tokens |

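To make the PagedAttention defaults concrete, the following Python sketch works through the arithmetic they imply. It assumes the usual paged-attention scheme in which a sequence occupies whole blocks, so token counts are rounded up to a multiple of the block size. That rounding rule and the function names are assumptions for illustration, not taken from the mistral.rs source.

```python
PA_MEMORY_PERCENT = 90   # default: 90% of free GPU memory
PA_BLOCK_SIZE = 32       # default: tokens per block


def pa_budget_bytes(free_bytes):
    """Bytes available to PagedAttention under the default 90% setting."""
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return free_bytes * PA_MEMORY_PERCENT // 100


def blocks_for_tokens(num_tokens, block_size=PA_BLOCK_SIZE):
    """Whole blocks needed to hold num_tokens (assumed ceiling division)."""
    return (num_tokens + block_size - 1) // block_size
```

Under these assumptions, a 100-token sequence needs 4 blocks (128 token slots), and a GPU with 10 GiB free leaves 9 GiB for PagedAttention.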
Multi-Node Distributed Configuration
For multi-node setups, configure the head node and workers using environment variables.

| Variable | Description |
|---|---|
| MISTRALRS_MN_GLOBAL_WORLD_SIZE | Total number of devices across all nodes |
| MISTRALRS_MN_HEAD_NUM_WORKERS | Number of worker nodes |
| MISTRALRS_MN_HEAD_PORT | Port for head node communication |

| Variable | Description |
|---|---|
| MISTRALRS_MN_WORKER_SERVER_ADDR | Address of head server to connect to |
| MISTRALRS_MN_WORKER_ID | This worker's ID |
| MISTRALRS_MN_LOCAL_WORLD_SIZE | Number of GPUs on this node |
| MISTRALRS_NO_NCCL=1 | Disable NCCL (use alternative backend) |

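A launch script could assemble these variables per node. The Python sketch below is one way to do that, grouping the variables by the HEAD and WORKER parts of their names. The helper names are hypothetical, and the exact value formats (for instance whether the server address includes a port, or whether worker IDs start at 0) are assumptions that should be checked against the mistral.rs documentation.

```python
def head_env(global_world_size, num_workers, port):
    """Variables for the head node (hypothetical helper)."""
    return {
        "MISTRALRS_MN_GLOBAL_WORLD_SIZE": str(global_world_size),
        "MISTRALRS_MN_HEAD_NUM_WORKERS": str(num_workers),
        "MISTRALRS_MN_HEAD_PORT": str(port),
    }


def worker_env(server_addr, worker_id, local_world_size, no_nccl=False):
    """Variables for one worker node (hypothetical helper)."""
    env = {
        # Assumption: the address is passed through as given by the caller.
        "MISTRALRS_MN_WORKER_SERVER_ADDR": server_addr,
        "MISTRALRS_MN_WORKER_ID": str(worker_id),
        "MISTRALRS_MN_LOCAL_WORLD_SIZE": str(local_world_size),
    }
    if no_nccl:
        env["MISTRALRS_NO_NCCL"] = "1"
    return env
```

For example, `head_env(16, 1, 9000)` and `worker_env("192.0.2.10:9000", 0, 8)` would describe a hypothetical two-node setup with 16 devices in total. The address, port, and ID here are placeholder values.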