v0.9.3
Highlights
- server: add support for flash attention v2
- server: add support for Llama v2
Features
- launcher: add debug logs
- server: rework quantization to support all models
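
As a rough illustration of using a server from this release (not part of the release notes): a minimal Python sketch of a client sending a generation request, e.g. to a Llama v2 model served by the new version. The host, port, `/generate` endpoint path, payload fields, and `generated_text` response field are assumptions and may differ in your deployment.

```python
# Hypothetical client sketch. Endpoint path, payload shape, port, and
# response fields are assumptions about the serving API, not confirmed
# by these release notes.
import json
import urllib.request


def generate(prompt: str, host: str = "http://localhost:8080") -> str:
    # Build a JSON generation request for the assumed /generate endpoint.
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 50},
    }
    req = urllib.request.Request(
        f"{host}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Send the request and parse the JSON response body.
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumes the response carries the completion in "generated_text".
    return body.get("generated_text", "")


if __name__ == "__main__":
    print(generate("Explain flash attention in one sentence."))
```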
Full Changelog: v0.9.2...v0.9.3