diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index 8679479..4927f03 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -39,35 +39,49 @@ All scripts expect models in the `models/` directory (absolute path is recommend
 
 ### Prompt Processing (pp512) — tokens/second
 
-| Model | Vulkan Radv | Vulkan Amdvlk | Rocm6 4 2 | Rocm7 Beta | Rocm7 Rc | Winner |
-|---|---|---|---|---|---|---|
-| **gemma-3-12b-it-UD-Q8_K_XL** | 508.55 ± 0.90 | 683.07 ± 1.03 | 223.36 ± 0.23 | 222.95 ± 0.15 | 222.99 ± 0.24 | 🏆 **vulkan_amdvlk** (+34%) |
-| **gemma-3-27b-it-BF16** | 135.40 ± 0.29 | ⚠️ Load Error | 88.73 ± 0.50 | 82.31 ± 0.29 | 83.18 ± 0.41 | 🏆 **vulkan_radv** (+53%) |
-| **Kimi-Dev-72B-UD-Q8_K_XL** | 76.48 ± 0.23 | ⚠️ Load Error | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **vulkan_radv** |
-| **Llama-3.3-70B-Instruct-UD-Q8_K_XL** | 79.71 ± 0.13 | 96.23 ± 0.16 | 33.17 ± 0.07 | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **vulkan_amdvlk** (+21%) |
-| **Llama-4-Scout-17B-16E-Instruct-Q6_K** | 137.97 ± 0.99 | 243.19 ± 1.20 | 121.52 ± 0.98 | ⚠️ GPU Hang | 135.36 ± 0.39 | 🏆 **vulkan_amdvlk** (+76%) |
-| **Llama-4-Scout-17B-16E-Instruct-Q8_0** | 145.86 ± 2.44 | 238.93 ± 2.89 | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **vulkan_amdvlk** (+64%) |
-| **Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL** | 133.49 ± 1.83 | 208.84 ± 1.35 | 132.66 ± 0.56 | 133.71 ± 0.64 | ⚠️ Runtime Error | 🏆 **vulkan_amdvlk** (+56%) |
-| **llama3.3-70.6B-Q4_K_M** | 79.12 ± 0.14 | 72.75 ± 0.03 | 33.89 ± 0.03 | 33.91 ± 0.04 | 33.82 ± 0.05 | 🏆 **vulkan_radv** (+9%) |
-| **Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL** | 58.40 ± 0.21 | 99.94 ± 0.91 | 69.48 ± 0.09 | ⚠️ GPU Hang | 74.69 ± 0.17 | 🏆 **vulkan_amdvlk** (+34%) |
-| **Qwen3-30B-A3B-BF16** | 71.16 ± 0.92 | 90.91 ± 0.35 | 157.74 ± 2.65 | 151.25 ± 3.33 | 154.95 ± 1.58 | 🏆 **rocm6_4_2** (+2%) |
-| **Qwen3-Coder-30B-A3B-Instruct-BF16** | 71.53 ± 1.06 | 90.38 ± 0.57 | 150.53 ± 1.83 | 147.31 ± 2.22 | 144.59 ± 3.08 | 🏆 **rocm6_4_2** (+2%) |
+| Model | Host | Rocm6 4 2 | Rocm7 Beta | Rocm7 Rc | Vulkan Amdvlk | Vulkan Radv | Winner |
+|---|---|---|---|---|---|---|---|
+| **gemma-3-12b-it-UD-Q8_K_XL** | — | 223.36 ± 0.23 | 222.95 ± 0.15 | 222.99 ± 0.24 | 683.07 ± 1.03 | 508.55 ± 0.90 | 🏆 **vulkan_amdvlk** (+34%) |
+| **gemma-3-27b-it-BF16** | — | 88.73 ± 0.50 | 82.31 ± 0.29 | 83.18 ± 0.41 | ⚠️ Load Error | 135.40 ± 0.29 | 🏆 **vulkan_radv** (+53%) |
+| **gemma-3-4b-it-Q3_K_S** | — | 729.02 ± 0.82 | 729.93 ± 1.29 | 728.63 ± 1.23 | 1616.55 ± 4.61 | 1520.07 ± 5.39 | 🏆 **vulkan_amdvlk** (+6%) |
+| **GLM-4.5-Air-UD-Q4_K_XL** | — | ⚠️ Runtime Error | ⚠️ GPU Hang | 129.20 ± 0.38 | 199.54 ± 0.38 | 128.00 ± 0.23 | 🏆 **vulkan_amdvlk** (+54%) |
+| **GLM-4.5-Air-UD-Q6_K_XL** | — | 124.86 ± 0.54 | ⚠️ GPU Hang | ⚠️ Runtime Error | 221.02 ± 0.58 | 126.86 ± 0.40 | 🏆 **vulkan_amdvlk** (+74%) |
+| **gpt-oss-120b-F16** | — | ⚠️ GPU Hang | 357.68 ± 1.49 | 355.47 ± 0.55 | 449.22 ± 1.12 | 230.32 ± 0.72 | 🏆 **vulkan_amdvlk** (+26%) |
+| **gpt-oss-120b-mxfp4** | — | 352.53 ± 1.06 | ⚠️ GPU Hang | 351.08 ± 0.86 | 485.98 ± 2.23 | 239.16 ± 1.26 | 🏆 **vulkan_amdvlk** (+38%) |
+| **gpt-oss-20b-F32** | — | 323.64 ± 4.29 | 324.15 ± 3.76 | 324.27 ± 5.39 | 369.86 ± 1.57 | 318.82 ± 1.63 | 🏆 **vulkan_amdvlk** (+14%) |
+| **gpt-oss-20b-mxfp4** | — | 580.67 ± 2.03 | 584.04 ± 2.48 | 584.15 ± 2.11 | 1206.08 ± 8.80 | 646.77 ± 4.63 | 🏆 **vulkan_amdvlk** (+86%) |
+| **Kimi-Dev-72B-UD-Q8_K_XL** | — | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | ⚠️ Load Error | 76.48 ± 0.23 | 🏆 **vulkan_radv** |
+| **Llama-3.3-70B-Instruct-UD-Q8_K_XL** | — | 33.17 ± 0.07 | ⚠️ GPU Hang | ⚠️ Runtime Error | 96.23 ± 0.16 | 79.71 ± 0.13 | 🏆 **vulkan_amdvlk** (+21%) |
+| **Llama-4-Scout-17B-16E-Instruct-Q6_K** | — | 121.52 ± 0.98 | ⚠️ GPU Hang | 135.36 ± 0.39 | 243.19 ± 1.20 | 137.97 ± 0.99 | 🏆 **vulkan_amdvlk** (+76%) |
+| **Llama-4-Scout-17B-16E-Instruct-Q8_0** | — | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 238.93 ± 2.89 | 145.86 ± 2.44 | 🏆 **vulkan_amdvlk** (+64%) |
+| **Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL** | — | 132.66 ± 0.56 | 133.71 ± 0.64 | ⚠️ Runtime Error | 208.84 ± 1.35 | 133.49 ± 1.83 | 🏆 **vulkan_amdvlk** (+56%) |
+| **llama3.3-70.6B-Q4_K_M** | — | 33.89 ± 0.03 | 33.91 ± 0.04 | 33.82 ± 0.05 | 72.75 ± 0.03 | 79.12 ± 0.14 | 🏆 **vulkan_radv** (+9%) |
+| **Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL** | — | 69.48 ± 0.09 | ⚠️ GPU Hang | 74.69 ± 0.17 | 99.94 ± 0.91 | 58.40 ± 0.21 | 🏆 **vulkan_amdvlk** (+34%) |
+| **Qwen3-30B-A3B-BF16** | — | 157.74 ± 2.65 | 151.25 ± 3.33 | 154.95 ± 1.58 | 90.91 ± 0.35 | 71.16 ± 0.92 | 🏆 **rocm6_4_2** (+2%) |
+| **Qwen3-Coder-30B-A3B-Instruct-BF16** | — | 150.53 ± 1.83 | 147.31 ± 2.22 | 144.59 ± 3.08 | 90.38 ± 0.57 | 71.53 ± 1.06 | 🏆 **rocm6_4_2** (+2%) |
 
 ### Text Generation (tg128) — tokens/second
 
-| Model | Vulkan Radv | Vulkan Amdvlk | Rocm6 4 2 | Rocm7 Beta | Rocm7 Rc | Winner |
-|---|---|---|---|---|---|---|
-| **gemma-3-12b-it-UD-Q8_K_XL** | 13.65 ± 0.02 | 13.84 ± 0.02 | 13.81 ± 0.00 | 13.80 ± 0.00 | 13.81 ± 0.00 | 🏆 **vulkan_amdvlk** (+0%) |
-| **gemma-3-27b-it-BF16** | 3.98 ± 0.00 | ⚠️ Load Error | 4.02 ± 0.00 | 3.99 ± 0.01 | 3.99 ± 0.00 | 🏆 **rocm6_4_2** (+1%) |
-| **Kimi-Dev-72B-UD-Q8_K_XL** | 2.65 ± 0.00 | ⚠️ Load Error | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **vulkan_radv** |
-| **Llama-3.3-70B-Instruct-UD-Q8_K_XL** | 2.72 ± 0.00 | 2.72 ± 0.00 | 2.72 ± 0.00 | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **rocm6_4_2** (+0%) |
-| **Llama-4-Scout-17B-16E-Instruct-Q6_K** | 15.07 ± 0.05 | 15.28 ± 0.03 | 14.28 ± 0.00 | ⚠️ GPU Hang | 14.29 ± 0.00 | 🏆 **vulkan_amdvlk** (+1%) |
-| **Llama-4-Scout-17B-16E-Instruct-Q8_0** | 12.27 ± 0.00 | 12.25 ± 0.01 | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 🏆 **vulkan_radv** (+0%) |
-| **Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL** | 19.99 ± 0.01 | 20.06 ± 0.01 | 17.29 ± 0.00 | 17.35 ± 0.00 | ⚠️ Runtime Error | 🏆 **vulkan_amdvlk** (+0%) |
-| **llama3.3-70.6B-Q4_K_M** | 4.97 ± 0.00 | 5.01 ± 0.00 | 4.59 ± 0.00 | 4.60 ± 0.00 | 4.52 ± 0.00 | 🏆 **vulkan_amdvlk** (+1%) |
-| **Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL** | 16.29 ± 0.01 | 15.72 ± 0.01 | 13.54 ± 0.01 | ⚠️ GPU Hang | 13.56 ± 0.00 | 🏆 **vulkan_radv** (+4%) |
-| **Qwen3-30B-A3B-BF16** | 7.33 ± 0.00 | 7.96 ± 0.03 | 22.88 ± 0.01 | 23.80 ± 0.09 | 23.08 ± 0.08 | 🏆 **rocm7_beta** (+3%) |
-| **Qwen3-Coder-30B-A3B-Instruct-BF16** | 7.34 ± 0.01 | 8.00 ± 0.03 | 22.13 ± 0.00 | 24.12 ± 0.06 | 23.48 ± 0.01 | 🏆 **rocm7_beta** (+3%) |
+| Model | Host | Rocm6 4 2 | Rocm7 Beta | Rocm7 Rc | Vulkan Amdvlk | Vulkan Radv | Winner |
+|---|---|---|---|---|---|---|---|
+| **gemma-3-12b-it-UD-Q8_K_XL** | — | 13.81 ± 0.00 | 13.80 ± 0.00 | 13.81 ± 0.00 | 13.84 ± 0.02 | 13.65 ± 0.02 | 🏆 **vulkan_amdvlk** (+0%) |
+| **gemma-3-27b-it-BF16** | — | 4.02 ± 0.00 | 3.99 ± 0.01 | 3.99 ± 0.00 | ⚠️ Load Error | 3.98 ± 0.00 | 🏆 **rocm6_4_2** (+1%) |
+| **gemma-3-4b-it-Q3_K_S** | — | 76.04 ± 0.03 | 76.52 ± 0.03 | 75.59 ± 0.03 | 83.89 ± 0.22 | 85.93 ± 0.09 | 🏆 **vulkan_radv** (+2%) |
+| **GLM-4.5-Air-UD-Q4_K_XL** | — | ⚠️ Runtime Error | ⚠️ GPU Hang | 19.61 ± 0.00 | 22.75 ± 0.01 | 22.88 ± 0.02 | 🏆 **vulkan_radv** (+1%) |
+| **GLM-4.5-Air-UD-Q6_K_XL** | — | 15.27 ± 0.00 | ⚠️ GPU Hang | ⚠️ Runtime Error | 16.47 ± 0.01 | 16.76 ± 0.00 | 🏆 **vulkan_radv** (+2%) |
+| **gpt-oss-120b-F16** | — | ⚠️ GPU Hang | 33.70 ± 0.01 | 33.65 ± 0.00 | 33.49 ± 0.05 | 33.06 ± 0.02 | 🏆 **rocm7_beta** (+0%) |
+| **gpt-oss-120b-mxfp4** | — | 43.56 ± 0.00 | ⚠️ GPU Hang | 44.63 ± 0.03 | 48.09 ± 0.04 | 48.93 ± 0.06 | 🏆 **vulkan_radv** (+2%) |
+| **gpt-oss-20b-F32** | — | 26.64 ± 0.06 | 26.90 ± 0.00 | 26.86 ± 0.00 | 8.59 ± 0.01 | 7.77 ± 0.01 | 🏆 **rocm7_beta** (+0%) |
+| **gpt-oss-20b-mxfp4** | — | 64.26 ± 0.01 | 64.37 ± 0.01 | 64.38 ± 0.01 | 68.90 ± 0.18 | 69.82 ± 0.03 | 🏆 **vulkan_radv** (+1%) |
+| **Kimi-Dev-72B-UD-Q8_K_XL** | — | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | ⚠️ Load Error | 2.65 ± 0.00 | 🏆 **vulkan_radv** |
+| **Llama-3.3-70B-Instruct-UD-Q8_K_XL** | — | 2.72 ± 0.00 | ⚠️ GPU Hang | ⚠️ Runtime Error | 2.72 ± 0.00 | 2.72 ± 0.00 | 🏆 **rocm6_4_2** (+0%) |
+| **Llama-4-Scout-17B-16E-Instruct-Q6_K** | — | 14.28 ± 0.00 | ⚠️ GPU Hang | 14.29 ± 0.00 | 15.28 ± 0.03 | 15.07 ± 0.05 | 🏆 **vulkan_amdvlk** (+1%) |
+| **Llama-4-Scout-17B-16E-Instruct-Q8_0** | — | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | 12.25 ± 0.01 | 12.27 ± 0.00 | 🏆 **vulkan_radv** (+0%) |
+| **Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL** | — | 17.29 ± 0.00 | 17.35 ± 0.00 | ⚠️ Runtime Error | 20.06 ± 0.01 | 19.99 ± 0.01 | 🏆 **vulkan_amdvlk** (+0%) |
+| **llama3.3-70.6B-Q4_K_M** | — | 4.59 ± 0.00 | 4.60 ± 0.00 | 4.52 ± 0.00 | 5.01 ± 0.00 | 4.97 ± 0.00 | 🏆 **vulkan_amdvlk** (+1%) |
+| **Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL** | — | 13.54 ± 0.01 | ⚠️ GPU Hang | 13.56 ± 0.00 | 15.72 ± 0.01 | 16.29 ± 0.01 | 🏆 **vulkan_radv** (+4%) |
+| **Qwen3-30B-A3B-BF16** | — | 22.88 ± 0.01 | 23.80 ± 0.09 | 23.08 ± 0.08 | 7.96 ± 0.03 | 7.33 ± 0.00 | 🏆 **rocm7_beta** (+3%) |
+| **Qwen3-Coder-30B-A3B-Instruct-BF16** | — | 22.13 ± 0.00 | 24.12 ± 0.06 | 23.48 ± 0.01 | 8.00 ± 0.03 | 7.34 ± 0.01 | 🏆 **rocm7_beta** (+3%) |
 
 ##### Error Legend