add rocm-7.2.1-pr21344 toolbox (gfx1151 MMQ/MMVQ tile + nwarp tuning)

Adds a new toolbox variant based on PR #21344 (pedapudi/llama.cpp@gfx1151-opt)
which tunes MMQ tile sizes (x_max=48, y=64) and warp counts (nwarps=4) for
RDNA3_5 gfx1151, yielding up to +100% prefill throughput at small batch sizes.

Also adds BMI2/FMA/F16C CPU SIMD flags and GGML_CUDA_FA_ALL_QUANTS=ON to match
the benchmark build used in the PR. Wire up CI (build matrix + prune), the
refresh script, and run_benchmarks.sh so results land alongside rocm-7.2.1.
This commit is contained in:
Donato Capitella
2026-04-15 09:23:58 +01:00
parent 14fae26ad0
commit 2c2c36d3da
6 changed files with 154 additions and 2 deletions
+1
View File
@@ -9,6 +9,7 @@ TOOLBOXES["llama-vulkan-amdvlk"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulka
TOOLBOXES["llama-vulkan-radv"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-radv --device /dev/dri --group-add video --security-opt seccomp=unconfined"
TOOLBOXES["llama-rocm-6.4.4"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.4 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
TOOLBOXES["llama-rocm-7.2.1"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
TOOLBOXES["llama-rocm-7.2.1-pr21344"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1-pr21344 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
TOOLBOXES["llama-rocm7-nightlies"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm7-nightlies --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
function usage() {