Pull requests: ggml-org/llama.cpp
Pull requests list
fix: pass ctx (not &ctx) to HAP_power_set in ggml-hexagon HMX power-up calls
#23367
opened May 19, 2026 by
mvanhorn
fix: guard cpy_scalar_transpose against strided dst in ggml-cuda cpy dispatch
#23366
opened May 19, 2026 by
mvanhorn
Add trace (level 5) to -lv help text and emit 'T' prefix for trace log lines
#23365
opened May 19, 2026 by
mvanhorn
fix: restore inter-iteration __syncthreads in ggml-cuda ssm_scan_f32 to prevent smemB/smemC race
#23364
opened May 19, 2026 by
mvanhorn
Align openvino and cann Dockerfiles with other backends by adding GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS flags
#23363
opened May 19, 2026 by
mvanhorn
Use password input type for WebUI API key field to prevent browser autofill in clear text
#23362
opened May 19, 2026 by
mvanhorn
ggml: fix SCHED_DEBUG output when logging with timestamps
#23360
opened May 19, 2026 by
yiding
ggml-cpu: Optimized Arm NEON cpu q1_0 dot (with plain/DP/I8MM)
#23358
opened May 19, 2026 by
pl752
Contributor
ui: fix stop/continue during an agentic loop
#23356
opened May 19, 2026 by
ServeurpersoCom
Contributor
Add keepalive messages to HTTP streaming connections
#23355
opened May 19, 2026 by
komadori82
Draft
ngram-map : use quality-weighted scoring and EMA acceptance tracking
#23350
opened May 19, 2026 by
neerajdad123-byte
1 task done
Support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
python
python script changes
testing
Everything test related
#23346
opened May 19, 2026 by
fairydreaming
Collaborator
vocab : add domyn-small Metaspace BPE pre-tokenizer type
#23343
opened May 19, 2026 by
martin-cimmino
server : add --slot-context for per-slot context size configuration
examples
server
#23340
opened May 19, 2026 by
AlexRednic
Draft
CUDA: add STQ1_0 dequantization kernel
examples
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
#23332
opened May 19, 2026 by
antrc2
ui: Refactor isMobile as reactive value in viewport store
examples
server/ui
#23330
opened May 19, 2026 by
allozaur
Contributor
hunyuan-vl : merge HunyuanOCR into HunyuanVL and fix OCR vision precision
examples
model
Model specific
python
python script changes
server
#23329
opened May 19, 2026 by
wendadawen
opencl: refactor backend initialization
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#23318
opened May 19, 2026 by
lhez
Contributor
OpenCL: OP_GATED_DELTA_NET
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#23312
opened May 19, 2026 by
ymcki
Contributor