forked from TheTom/llama-cpp-turboquant
Forks: 22
Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant
feat(wasm): add tools/wasm/ Emscripten entrypoint for browser-resident inference
AMD ZenDNN
Apple Metal
Ascend NPU
build
devops
documentation
examples
ggml
Hexagon
IBM zDNN
jinja parser
model
nix
Nvidia GPU
OpenCL
OpenVINO
python
script
server
SYCL
testing
Vulkan
WebGPU
#15 opened May 17, 2026 by wordingone
fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error
ggml
#12 opened May 13, 2026 by sujitvasanth
feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone
examples
ggml
server
#8 opened May 11, 2026 by sujitvasanth
Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…
ggml
Nvidia GPU
#6 opened May 8, 2026 by Ooooze
Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86
documentation
#5 opened May 8, 2026 by jameseiten (Draft)