add rocm-7.2.1-pr21344 toolbox (gfx1151 MMQ/MMVQ tile + nwarp tuning)
Adds a new toolbox variant based on PR #21344 (pedapudi/llama.cpp@gfx1151-opt), which tunes MMQ tile sizes (x_max=48, y=64) and warp counts (nwarps=4) for RDNA3_5 gfx1151, yielding up to +100% prefill throughput at small batch sizes.

Also adds the BMI2/FMA/F16C CPU SIMD flags and GGML_CUDA_FA_ALL_QUANTS=ON to match the benchmark build used in the PR.

Wires up CI (build matrix + prune), the refresh script, and run_benchmarks.sh so that results land alongside rocm-7.2.1.
@@ -63,6 +63,7 @@ echo
 declare -A CMDS=(
   [rocm6_4_4]="toolbox run -c llama-rocm-6.4.4 -- /usr/local/bin/llama-bench"
   [rocm-7_2_1]="toolbox run -c llama-rocm-7.2.1 -- /usr/local/bin/llama-bench"
+  [rocm-7_2_1-pr21344]="toolbox run -c llama-rocm-7.2.1-pr21344 -- /usr/local/bin/llama-bench"
   [rocm7-nightlies]="toolbox run -c llama-rocm7-nightlies -- /usr/local/bin/llama-bench"
   [vulkan_amdvlk]="toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench"
   [vulkan_radv]="toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench"
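For readers unfamiliar with the pattern, a map like CMDS is usually consumed by looking up a backend key and appending per-run arguments to the stored command. The sketch below is only an illustration of that idea, not the actual body of run_benchmarks.sh: the `bench_cmdline` helper, the placeholder model path, and the choice to print commands instead of executing them are all assumptions. Only the `-m` model option of llama-bench and the two ROCm 7.2.1 entries are taken as given.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (requires Bash 4+ for associative arrays).
# Only the two ROCm 7.2.1 entries from the hunk above are reproduced.
declare -A CMDS=(
  [rocm-7_2_1]="toolbox run -c llama-rocm-7.2.1 -- /usr/local/bin/llama-bench"
  [rocm-7_2_1-pr21344]="toolbox run -c llama-rocm-7.2.1-pr21344 -- /usr/local/bin/llama-bench"
)

# bench_cmdline BACKEND MODEL
# Prints the full command line for one backend; returns 1 for unknown keys.
# The helper name and signature are illustrative, not part of the script.
bench_cmdline() {
  local backend=$1 model=$2
  local cmd=${CMDS[$backend]:-}
  if [[ -z $cmd ]]; then
    echo "unknown backend: $backend" >&2
    return 1
  fi
  printf '%s -m %s\n' "$cmd" "$model"
}

# Dry run: list what would be executed, iterating keys in sorted order
# so the baseline and the PR variant appear next to each other.
for backend in $(printf '%s\n' "${!CMDS[@]}" | sort); do
  bench_cmdline "$backend" "/path/to/model.gguf"
done
```

Keeping the PR variant under its own key next to `rocm-7_2_1` means both toolboxes can be benchmarked with identical arguments, which is what makes the results directly comparable.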