This document describes Hugind server config parsing as implemented in
src/core/config/loader.rs and src/core/config/server.rs.
Config files are written by `hugind config init` to `~/.hugind/configs/<name>.yml`.
A config file has these top-level sections: `server`, `model`, `context`, `multimodal`, `sampling`, `lora`, `fit`, `quantize`, and `advanced`.
The base template is in src/resources/config.yml.
The `server` section:
- `host` (default `0.0.0.0`)
- `port` (default `8080`)
- `api_key` (optional bearer token)
- `max_slots` (defaults to `context.seq_max`)
- `system_prompt` (default `"You are a helpful assistant."`)
- `system_prompt_file` (optional path; errors if unreadable)
- `embeddings` (boolish; default `false`)
- `session_home` (optional path; defaults to `~/.hugind/sessions`)
- `unified_memory_mode` (boolish; default `false`)
- `verbose` (boolish; default `false`)
- `enable_thinking_default` (boolish; default `false`)
- `thinking_budget_tokens` (optional u32; `null` = no cap)
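Putting the defaults above together, a minimal `server` section might look like this (values are illustrative, not recommendations):

```yaml
server:
  host: 0.0.0.0
  port: 8080
  # api_key omitted: no bearer token required
  system_prompt: "You are a helpful assistant."
  embeddings: false
  verbose: false
  enable_thinking_default: false
  thinking_budget_tokens: null  # null = no cap
```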
The `model` section:
- `path` (`model.gguf` path)
- `name` (optional public model name)
- `mmproj_path` (optional vision projector path)
- Model params: `gpu_layers`, `split_mode`, `main_gpu`, `tensor_split`, `use_mmap`, `use_mlock`, etc.
Notes:
- `gpu_layers` also accepts the alias `n_gpu_layers`.
- Relative paths are resolved relative to the config file's directory.
- `~` in paths is expanded to the home directory.
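A sketch of a `model` section using the keys and path rules above (the model file path and name are placeholders):

```yaml
model:
  path: ./models/example.gguf  # relative: resolved against the config file's directory
  name: example                # optional public model name
  gpu_layers: 99               # the alias n_gpu_layers is also accepted
  use_mmap: true
```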
The `context` section:
- `size` (`n_ctx`)
- `batch_size` (`n_batch`)
- `ubatch_size` (`n_ubatch`)
- `seq_max` (`n_seq_max`)
- `threads` / `threads_batch`
- Attention/rope/pooling options
- KV cache settings (`cache_type_k`, `cache_type_v`, `offload_kqv`, etc.)
Notes:
- `flash_attention: true` auto-sets `flash_attn_type` to `on`.
- For vision models, a low `batch_size` is auto-raised to `8192`.
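For example, a `context` section combining the keys above (all values illustrative; the `f16` cache type values are an assumption, not taken from this document):

```yaml
context:
  size: 8192             # n_ctx
  batch_size: 2048       # n_batch
  ubatch_size: 512       # n_ubatch
  seq_max: 4             # n_seq_max
  flash_attention: true  # auto-sets flash_attn_type to on
  cache_type_k: f16      # assumed value; see src/resources/config.yml for defaults
  cache_type_v: f16
  offload_kqv: true
```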
The `fit` section:
- `enabled` (default `false`)
- `target_mib` (per-device memory targets in MiB)
- `min_ctx` (minimum context size when fitting)
When enabled, the engine adjusts context size at startup to fit available memory.
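A sketch of a `fit` section; the list shape of `target_mib` is an assumption based on its description as per-device targets:

```yaml
fit:
  enabled: true
  target_mib: [20480, 20480]  # assumed list form: one MiB target per device
  min_ctx: 4096               # never fit below this context size
```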
The `multimodal`, `sampling`, `lora`, `quantize`, and `advanced` sections map directly to structs in src/core/config/server.rs. See src/resources/config.yml for the full key set with defaults.
For boolish fields, Hugind accepts booleans and common strings:
- true: `true`, `on`, `yes`, `enabled`, `1`
- false: `false`, `off`, `no`, `disabled`, `0`
Unrecognized values produce a warning.
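For instance, the following spellings are equivalent under boolish parsing (a sketch using two boolish keys from the `server` section):

```yaml
server:
  embeddings: "yes"  # boolish: parsed as true
  verbose: "off"     # boolish: parsed as false
```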
`hugind config init` detects hardware and auto-configures:
- GPU layers (99 for NVIDIA/Apple Silicon, 0 for CPU-only)
- Flash attention (enabled for NVIDIA)
- KV offload (enabled when GPU available)
- Thread count (optimized for CPU architecture)
- Unified memory mode (enabled for Apple Silicon)
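Based on the list above, the auto-configured values on an NVIDIA GPU machine would resemble the following (a sketch; exact key placement and defaults follow src/resources/config.yml):

```yaml
model:
  gpu_layers: 99         # NVIDIA/Apple Silicon; 0 for CPU-only
context:
  flash_attention: true  # enabled for NVIDIA
  offload_kqv: true      # enabled when a GPU is available
server:
  unified_memory_mode: false  # true only on Apple Silicon
```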