From a9618d881b187672421405c24d167ddaae8dbda1 Mon Sep 17 00:00:00 2001
From: Donato Capitella
Date: Sun, 10 Aug 2025 13:21:06 +0100
Subject: [PATCH] - Corrected typo in WMMA (was misspelled as waam)
 - Included rocm-7rc-rocwmma toolbox
 - Included updated benchmark results, including ROCm 7 RC with rocWMMA and hipBLASLt

---
 .github/workflows/build_and_publish.yml | 2 +- README.md | 67 +- benchmark/generate_results.json.py | 10 +- ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 2 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 8 +- ...r-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log | 6 +- ...Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 2 +- ...-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log | 8 +- ...4_K_XL-00001-of-00002__rocm7_beta__fa1.log | 8 +- ...K_XL-00001-of-00002__rocm7_beta__hblt0.log | 10 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 6 + ..._K_XL-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + ...ir-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log | 6 +- ...-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log | 6 +- ...4_K_XL-00001-of-00002__rocm7_rc__hblt0.log | 10 + ...L-00001-of-00002__rocm7_rc__hblt0__fa1.log | 10 + ...-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...UD-Q4_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 4 +- ...K_XL-00001-of-00003__rocm6_4_2-rocwmma.log | 6 +- ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 2 +- ...r-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log | 4 +- ...Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log | 8 +- ...-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log | 6 +- ...6_K_XL-00001-of-00003__rocm7_beta__fa1.log | 8 +- ...K_XL-00001-of-00003__rocm7_beta__hblt0.log | 10 + ...00001-of-00003__rocm7_beta__hblt0__fa1.log | 6 + ..._K_XL-00001-of-00003__rocm7_rc-rocwmma.log | 10 + 
...-00001-of-00003__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00003__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00003__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...ir-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log | 6 +- ...-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log | 7 +- ...6_K_XL-00001-of-00003__rocm7_rc__hblt0.log | 5 + ...L-00001-of-00003__rocm7_rc__hblt0__fa1.log | 5 + ...-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00003__vulkan_amdvlk__fa1.log | 6 +- ...UD-Q6_K_XL-00001-of-00003__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00003__vulkan_radv__fa1.log | 6 +- ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 2 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 8 +- ...B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log | 2 +- ...Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 2 +- ...-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log | 8 +- ...8_K_XL-00001-of-00002__rocm7_beta__fa1.log | 2 +- ...K_XL-00001-of-00002__rocm7_beta__hblt0.log | 6 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 6 + ..._K_XL-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...2B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log | 6 +- ...-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log | 7 +- ...8_K_XL-00001-of-00002__rocm7_rc__hblt0.log | 5 + ...L-00001-of-00002__rocm7_rc__hblt0__fa1.log | 5 + ...UD-Q8_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 2 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 2 +- ...t-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log | 4 +- ...Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 6 +- ...-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log | 8 +- ...8_K_XL-00001-of-00002__rocm7_beta__fa1.log | 2 +- ...K_XL-00001-of-00002__rocm7_beta__hblt0.log | 6 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 6 + ..._K_XL-00001-of-00002__rocm7_rc-rocwmma.log | 10 
+ ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...ct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log | 7 +- ...-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log | 1 + ...8_K_XL-00001-of-00002__rocm7_rc__hblt0.log | 5 + ...L-00001-of-00002__rocm7_rc__hblt0__fa1.log | 10 + ...-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...UD-Q8_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log | 8 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 2 +- ...nstruct-Q6_K-00001-of-00002__rocm6_4_2.log | 6 +- ...ct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log | 2 +- ...struct-Q6_K-00001-of-00002__rocm7_beta.log | 2 +- ...t-Q6_K-00001-of-00002__rocm7_beta__fa1.log | 2 +- ...Q6_K-00001-of-00002__rocm7_beta__hblt0.log | 6 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 10 + ...-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...Instruct-Q6_K-00001-of-00002__rocm7_rc.log | 4 +- ...uct-Q6_K-00001-of-00002__rocm7_rc__fa1.log | 7 +- ...t-Q6_K-00001-of-00002__rocm7_rc__hblt0.log | 10 + ...K-00001-of-00002__rocm7_rc__hblt0__fa1.log | 5 + ...uct-Q6_K-00001-of-00002__vulkan_amdvlk.log | 6 +- ...6_K-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...truct-Q6_K-00001-of-00002__vulkan_radv.log | 6 +- ...-Q6_K-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log | 8 +- ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 2 +- ...nstruct-Q8_0-00001-of-00003__rocm6_4_2.log | 6 +- ...ct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log | 2 +- ...struct-Q8_0-00001-of-00003__rocm7_beta.log | 8 +- ...t-Q8_0-00001-of-00003__rocm7_beta__fa1.log | 2 +- 
...Q8_0-00001-of-00003__rocm7_beta__hblt0.log | 6 + ...00001-of-00003__rocm7_beta__hblt0__fa1.log | 6 + ...-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00003__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00003__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00003__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...Instruct-Q8_0-00001-of-00003__rocm7_rc.log | 7 +- ...t-Q8_0-00001-of-00003__rocm7_rc__hblt0.log | 5 + ...0-00001-of-00003__rocm7_rc__hblt0__fa1.log | 5 + ...uct-Q8_0-00001-of-00003__vulkan_amdvlk.log | 6 +- ...8_0-00001-of-00003__vulkan_amdvlk__fa1.log | 6 +- ...truct-Q8_0-00001-of-00003__vulkan_radv.log | 6 +- ...-Q8_0-00001-of-00003__vulkan_radv__fa1.log | 6 +- ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 8 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 2 +- ...t-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log | 6 +- ...Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 8 +- ...-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log | 6 +- ...4_K_XL-00001-of-00002__rocm7_beta__fa1.log | 6 +- ...K_XL-00001-of-00002__rocm7_beta__hblt0.log | 6 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 6 + ..._K_XL-00001-of-00002__rocm7_rc-rocwmma.log | 5 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...ct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log | 7 +- ...-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log | 7 +- ...4_K_XL-00001-of-00002__rocm7_rc__hblt0.log | 5 + ...L-00001-of-00002__rocm7_rc__hblt0__fa1.log | 5 + ...-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...UD-Q4_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...K_XL-00001-of-00003__rocm6_4_2-rocwmma.log | 2 +- ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 2 +- ...7-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log | 6 +- ...Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log | 8 +- ...-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log | 2 
+- ...3_K_XL-00001-of-00003__rocm7_beta__fa1.log | 3 +- ...K_XL-00001-of-00003__rocm7_beta__hblt0.log | 6 + ...00001-of-00003__rocm7_beta__hblt0__fa1.log | 6 + ...K_XL-00001-of-00003__rocm7_rc-rocwmma.log} | 6 +- ...-00001-of-00003__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00003__rocm7_rc-rocwmma__hblt0.log | 5 + ...of-00003__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...07-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log | 7 +- ...-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log | 7 +- ...3_K_XL-00001-of-00003__rocm7_rc__hblt0.log | 5 + ...L-00001-of-00003__rocm7_rc__hblt0__fa1.log | 5 + ...-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00003__vulkan_amdvlk__fa1.log | 6 +- ...UD-Q3_K_XL-00001-of-00003__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00003__vulkan_radv__fa1.log | 6 +- ...BF16-00001-of-00002__rocm6_4_2-rocwmma.log | 6 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 +- ...30B-A3B-BF16-00001-of-00002__rocm6_4_2.log | 6 +- ...3B-BF16-00001-of-00002__rocm6_4_2__fa1.log | 6 +- ...0B-A3B-BF16-00001-of-00002__rocm7_beta.log | 6 +- ...B-BF16-00001-of-00002__rocm7_beta__fa1.log | 8 +- ...BF16-00001-of-00002__rocm7_beta__hblt0.log | 10 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 10 + ...-BF16-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + ...-30B-A3B-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...A3B-BF16-00001-of-00002__rocm7_rc__fa1.log | 7 +- ...B-BF16-00001-of-00002__rocm7_rc__hblt0.log | 10 + ...6-00001-of-00002__rocm7_rc__hblt0__fa1.log | 10 + ...A3B-BF16-00001-of-00002__vulkan_amdvlk.log | 6 +- ...F16-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...B-A3B-BF16-00001-of-00002__vulkan_radv.log | 6 +- ...-BF16-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...uct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log | 6 +- ...507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log | 6 +- 
...3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log | 6 +- ...struct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log | 6 +- ...B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log | 6 +- ...truct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log | 6 +- ...uct-2507-UD-Q6_K_XL__rocm7_beta__hblt0.log | 10 + ...507-UD-Q6_K_XL__rocm7_beta__hblt0__fa1.log | 10 + ...ruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log | 10 + ...2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log | 10 + ...07-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0.log | 10 + ...-Q6_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + ...A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log | 6 +- ...nstruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log | 6 +- ...truct-2507-UD-Q6_K_XL__rocm7_rc__hblt0.log | 10 + ...-2507-UD-Q6_K_XL__rocm7_rc__hblt0__fa1.log | 10 + ...nstruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log | 6 +- ...ct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log | 6 +- ...-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log | 6 +- ...ruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log | 6 +- ...BF16-00001-of-00002__rocm6_4_2-rocwmma.log | 6 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 +- ...nstruct-BF16-00001-of-00002__rocm6_4_2.log | 6 +- ...ct-BF16-00001-of-00002__rocm6_4_2__fa1.log | 8 +- ...struct-BF16-00001-of-00002__rocm7_beta.log | 6 +- ...t-BF16-00001-of-00002__rocm7_beta__fa1.log | 2 +- ...BF16-00001-of-00002__rocm7_beta__hblt0.log | 10 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 10 + ...-BF16-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + ...Instruct-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...uct-BF16-00001-of-00002__rocm7_rc__fa1.log | 7 +- ...t-BF16-00001-of-00002__rocm7_rc__hblt0.log | 10 + ...6-00001-of-00002__rocm7_rc__hblt0__fa1.log | 10 + ...uct-BF16-00001-of-00002__vulkan_amdvlk.log | 6 +- ...F16-00001-of-00002__vulkan_amdvlk__fa1.log | 6 +- ...truct-BF16-00001-of-00002__vulkan_radv.log | 6 +- 
...-BF16-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log | 6 +- ...-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log | 6 +- .../gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log | 6 +- ...ma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log | 6 +- .../gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log | 6 +- ...a-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log | 4 +- ...3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0.log | 10 + ...-it-UD-Q8_K_XL__rocm7_beta__hblt0__fa1.log | 10 + ...-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log | 10 + ...b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log | 10 + ...it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0.log | 10 + ...-Q8_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log | 6 +- ...mma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log | 4 +- ...a-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0.log | 10 + ...2b-it-UD-Q8_K_XL__rocm7_rc__hblt0__fa1.log | 10 + ...mma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log | 6 +- ...-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log | 6 +- ...gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log | 6 +- ...-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log | 6 +- ...BF16-00001-of-00002__rocm6_4_2-rocwmma.log | 6 +- ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 +- ...-27b-it-BF16-00001-of-00002__rocm6_4_2.log | 8 +- ...it-BF16-00001-of-00002__rocm6_4_2__fa1.log | 4 +- ...27b-it-BF16-00001-of-00002__rocm7_beta.log | 4 +- ...t-BF16-00001-of-00002__rocm7_beta__fa1.log | 6 +- ...BF16-00001-of-00002__rocm7_beta__hblt0.log | 10 + ...00001-of-00002__rocm7_beta__hblt0__fa1.log | 10 + ...-BF16-00001-of-00002__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00002__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00002__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00002__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + ...3-27b-it-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...-it-BF16-00001-of-00002__rocm7_rc__fa1.log | 4 +- ...t-BF16-00001-of-00002__rocm7_rc__hblt0.log | 10 + ...6-00001-of-00002__rocm7_rc__hblt0__fa1.log | 5 + 
...7b-it-BF16-00001-of-00002__vulkan_radv.log | 6 +- ...-BF16-00001-of-00002__vulkan_radv__fa1.log | 6 +- ...emma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log | 6 +- ...3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm6_4_2.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_beta.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log | 6 +- ...emma-3-4b-it-Q3_K_S__rocm7_beta__hblt0.log | 10 + ...3-4b-it-Q3_K_S__rocm7_beta__hblt0__fa1.log | 10 + ...gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log | 10 + ...-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log | 10 + ...-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0.log | 10 + ...t-Q3_K_S__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__rocm7_rc.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0.log | 10 + ...a-3-4b-it-Q3_K_S__rocm7_rc__hblt0__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log | 6 +- ...mma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log | 6 +- .../gemma-3-4b-it-Q3_K_S__vulkan_radv.log | 6 +- ...gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log | 6 +- .../gpt-oss-120b-F16__rocm6_4_2-rocwmma.log | 6 +- ...t-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log | 8 +- .../results/gpt-oss-120b-F16__rocm6_4_2.log | 6 +- .../gpt-oss-120b-F16__rocm6_4_2__fa1.log | 6 +- .../results/gpt-oss-120b-F16__rocm7_beta.log | 6 +- .../gpt-oss-120b-F16__rocm7_beta__fa1.log | 6 +- .../gpt-oss-120b-F16__rocm7_beta__hblt0.log | 10 + ...t-oss-120b-F16__rocm7_beta__hblt0__fa1.log | 10 + .../gpt-oss-120b-F16__rocm7_rc-rocwmma.log | 10 + ...pt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log | 10 + ...-oss-120b-F16__rocm7_rc-rocwmma__hblt0.log | 10 + ...120b-F16__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../results/gpt-oss-120b-F16__rocm7_rc.log | 6 +- .../gpt-oss-120b-F16__rocm7_rc__fa1.log | 6 +- .../gpt-oss-120b-F16__rocm7_rc__hblt0.log | 10 + ...gpt-oss-120b-F16__rocm7_rc__hblt0__fa1.log | 10 + .../gpt-oss-120b-F16__vulkan_amdvlk.log | 6 
+- .../gpt-oss-120b-F16__vulkan_amdvlk__fa1.log | 6 +- .../results/gpt-oss-120b-F16__vulkan_radv.log | 6 +- .../gpt-oss-120b-F16__vulkan_radv__fa1.log | 6 +- ...xfp4-00001-of-00003__rocm6_4_2-rocwmma.log | 6 +- ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 8 +- ...s-120b-mxfp4-00001-of-00003__rocm6_4_2.log | 8 +- ...b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log | 6 +- ...-120b-mxfp4-00001-of-00003__rocm7_beta.log | 6 +- ...-mxfp4-00001-of-00003__rocm7_beta__fa1.log | 6 +- ...xfp4-00001-of-00003__rocm7_beta__hblt0.log | 6 + ...00001-of-00003__rocm7_beta__hblt0__fa1.log | 10 + ...mxfp4-00001-of-00003__rocm7_rc-rocwmma.log | 10 + ...-00001-of-00003__rocm7_rc-rocwmma__fa1.log | 10 + ...0001-of-00003__rocm7_rc-rocwmma__hblt0.log | 10 + ...of-00003__rocm7_rc-rocwmma__hblt0__fa1.log | 5 + ...ss-120b-mxfp4-00001-of-00003__rocm7_rc.log | 6 +- ...0b-mxfp4-00001-of-00003__rocm7_rc__fa1.log | 7 +- ...-mxfp4-00001-of-00003__rocm7_rc__hblt0.log | 10 + ...4-00001-of-00003__rocm7_rc__hblt0__fa1.log | 10 + ...0b-mxfp4-00001-of-00003__vulkan_amdvlk.log | 6 +- ...fp4-00001-of-00003__vulkan_amdvlk__fa1.log | 6 +- ...120b-mxfp4-00001-of-00003__vulkan_radv.log | 6 +- ...mxfp4-00001-of-00003__vulkan_radv__fa1.log | 6 +- .../gpt-oss-20b-F32__rocm6_4_2-rocwmma.log | 6 +- ...pt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log | 6 +- .../results/gpt-oss-20b-F32__rocm6_4_2.log | 6 +- .../gpt-oss-20b-F32__rocm6_4_2__fa1.log | 6 +- .../results/gpt-oss-20b-F32__rocm7_beta.log | 6 +- .../gpt-oss-20b-F32__rocm7_beta__fa1.log | 6 +- .../gpt-oss-20b-F32__rocm7_beta__hblt0.log | 10 + ...pt-oss-20b-F32__rocm7_beta__hblt0__fa1.log | 10 + .../gpt-oss-20b-F32__rocm7_rc-rocwmma.log | 10 + ...gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log | 10 + ...t-oss-20b-F32__rocm7_rc-rocwmma__hblt0.log | 10 + ...-20b-F32__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../results/gpt-oss-20b-F32__rocm7_rc.log | 6 +- .../gpt-oss-20b-F32__rocm7_rc__fa1.log | 6 +- .../gpt-oss-20b-F32__rocm7_rc__hblt0.log | 10 + 
.../gpt-oss-20b-F32__rocm7_rc__hblt0__fa1.log | 10 + .../gpt-oss-20b-F32__vulkan_amdvlk.log | 6 +- .../gpt-oss-20b-F32__vulkan_amdvlk__fa1.log | 6 +- .../results/gpt-oss-20b-F32__vulkan_radv.log | 6 +- .../gpt-oss-20b-F32__vulkan_radv__fa1.log | 6 +- .../gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log | 6 +- ...-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log | 6 +- .../results/gpt-oss-20b-mxfp4__rocm6_4_2.log | 6 +- .../gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log | 6 +- .../results/gpt-oss-20b-mxfp4__rocm7_beta.log | 4 +- .../gpt-oss-20b-mxfp4__rocm7_beta__fa1.log | 6 +- .../gpt-oss-20b-mxfp4__rocm7_beta__hblt0.log | 10 + ...-oss-20b-mxfp4__rocm7_beta__hblt0__fa1.log | 10 + .../gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log | 10 + ...t-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log | 10 + ...oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0.log | 10 + ...0b-mxfp4__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../results/gpt-oss-20b-mxfp4__rocm7_rc.log | 6 +- .../gpt-oss-20b-mxfp4__rocm7_rc__fa1.log | 6 +- .../gpt-oss-20b-mxfp4__rocm7_rc__hblt0.log | 10 + ...pt-oss-20b-mxfp4__rocm7_rc__hblt0__fa1.log | 10 + .../gpt-oss-20b-mxfp4__vulkan_amdvlk.log | 6 +- .../gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log | 6 +- .../gpt-oss-20b-mxfp4__vulkan_radv.log | 6 +- .../gpt-oss-20b-mxfp4__vulkan_radv__fa1.log | 6 +- ...ama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log | 8 +- ...3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log | 8 +- .../llama3.3-70.6B-Q4_K_M__rocm6_4_2.log | 6 +- .../llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log | 6 +- .../llama3.3-70.6B-Q4_K_M__rocm7_beta.log | 4 +- ...llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log | 4 +- ...ma3.3-70.6B-Q4_K_M__rocm7_beta__hblt0.log} | 4 +- ...3-70.6B-Q4_K_M__rocm7_beta__hblt0__fa1.log | 10 + ...lama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log | 10 + ....3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log | 10 + ...-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0.log | 10 + ...B-Q4_K_M__rocm7_rc-rocwmma__hblt0__fa1.log | 10 + .../llama3.3-70.6B-Q4_K_M__rocm7_rc.log | 6 +- .../llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log | 6 
+- ...lama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0.log} | 2 +- ....3-70.6B-Q4_K_M__rocm7_rc__hblt0__fa1.log} | 2 +- .../llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log | 6 +- ...ma3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log | 6 +- .../llama3.3-70.6B-Q4_K_M__vulkan_radv.log | 6 +- ...lama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log | 6 +- benchmark/results/run_benchmarks.log | 283 +- ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 + ...r-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log | 10 + ...4_K_XL-00001-of-00002__rocm6_4_2__fa1.log} | 4 +- ...-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log | 2 +- ...4_K_XL-00001-of-00002__rocm7_beta__fa1.log | 6 + ...ir-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log | 6 +- ...-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log | 10 + ...-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...UD-Q4_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 8 + ...K_XL-00001-of-00003__rocm6_4_2-rocwmma.log | 10 + ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 6 + ...r-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log | 6 +- ...Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log | 6 + ...-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log | 10 + ...6_K_XL-00001-of-00003__rocm7_beta__fa1.log | 6 + ...ir-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log | 10 + ...-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log | 5 + ...-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00003__vulkan_amdvlk__fa1.log | 8 + ...UD-Q6_K_XL-00001-of-00003__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00003__vulkan_radv__fa1.log | 8 + ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 + ...B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log | 2 +- ...Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 6 + ...-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log | 2 +- ...8_K_XL-00001-of-00002__rocm7_beta__fa1.log | 6 + ...2B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log | 10 + 
...-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log | 10 + ...-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log | 2 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...UD-Q8_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 8 + ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 + ...t-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log | 6 +- ...Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 10 + ...-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log | 2 +- ...8_K_XL-00001-of-00002__rocm7_beta__fa1.log | 6 + ...ct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log | 0 ...-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log | 5 + ...-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...UD-Q8_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 8 + ...Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 + ...nstruct-Q6_K-00001-of-00002__rocm6_4_2.log | 6 +- ...ct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log | 6 + ...struct-Q6_K-00001-of-00002__rocm7_beta.log | 2 +- ...t-Q6_K-00001-of-00002__rocm7_beta__fa1.log | 6 + ...Instruct-Q6_K-00001-of-00002__rocm7_rc.log | 6 +- ...uct-Q6_K-00001-of-00002__rocm7_rc__fa1.log | 5 + ...uct-Q6_K-00001-of-00002__vulkan_amdvlk.log | 6 +- ...6_K-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...truct-Q6_K-00001-of-00002__vulkan_radv.log | 6 +- ...-Q6_K-00001-of-00002__vulkan_radv__fa1.log | 8 + ...Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 6 + ...nstruct-Q8_0-00001-of-00003__rocm6_4_2.log | 10 + ...t-Q8_0-00001-of-00003__rocm6_4_2__fa1.log} | 4 +- ...struct-Q8_0-00001-of-00003__rocm7_beta.log | 2 +- ...t-Q8_0-00001-of-00003__rocm7_beta__fa1.log | 6 + ...Instruct-Q8_0-00001-of-00003__rocm7_rc.log | 0 ...uct-Q8_0-00001-of-00003__rocm7_rc__fa1.log | 5 + ...uct-Q8_0-00001-of-00003__vulkan_amdvlk.log | 6 +- 
...8_0-00001-of-00003__vulkan_amdvlk__fa1.log | 8 + ...truct-Q8_0-00001-of-00003__vulkan_radv.log | 6 +- ...-Q8_0-00001-of-00003__vulkan_radv__fa1.log | 8 + ...K_XL-00001-of-00002__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 6 + ...t-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log | 6 +- ...Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log | 6 + ...-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log | 6 +- ...4_K_XL-00001-of-00002__rocm7_beta__fa1.log | 10 + ...ct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log | 0 ...-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log | 10 + ...-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...UD-Q4_K_XL-00001-of-00002__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00002__vulkan_radv__fa1.log | 8 + ...K_XL-00001-of-00003__rocm6_4_2-rocwmma.log | 6 + ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 6 + ...7-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log | 6 +- ...Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log | 6 + ...-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log | 2 +- ...3_K_XL-00001-of-00003__rocm7_beta__fa1.log | 6 + ...07-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log | 5 + ...-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log | 5 + ...-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log | 6 +- ..._XL-00001-of-00003__vulkan_amdvlk__fa1.log | 8 + ...UD-Q3_K_XL-00001-of-00003__vulkan_radv.log | 6 +- ..._K_XL-00001-of-00003__vulkan_radv__fa1.log | 8 + ...BF16-00001-of-00002__rocm6_4_2-rocwmma.log | 10 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 10 + ...30B-A3B-BF16-00001-of-00002__rocm6_4_2.log | 6 +- ...3B-BF16-00001-of-00002__rocm6_4_2__fa1.log | 10 + ...0B-A3B-BF16-00001-of-00002__rocm7_beta.log | 6 +- ...B-BF16-00001-of-00002__rocm7_beta__fa1.log | 10 + ...-30B-A3B-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...A3B-BF16-00001-of-00002__rocm7_rc__fa1.log | 10 + ...A3B-BF16-00001-of-00002__vulkan_amdvlk.log | 6 +- ...F16-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...B-A3B-BF16-00001-of-00002__vulkan_radv.log | 6 +- 
...-BF16-00001-of-00002__vulkan_radv__fa1.log | 8 + ...uct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log | 10 + ...507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log | 10 + ...3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log | 10 + ...struct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log | 10 + ...B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log | 10 + ...truct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log | 10 + ...A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log | 10 + ...nstruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log | 10 + ...nstruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log | 8 + ...ct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log | 8 + ...-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log | 8 + ...ruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log | 8 + ...BF16-00001-of-00002__rocm6_4_2-rocwmma.log | 10 + ...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 10 + ...nstruct-BF16-00001-of-00002__rocm6_4_2.log | 6 +- ...ct-BF16-00001-of-00002__rocm6_4_2__fa1.log | 10 + ...struct-BF16-00001-of-00002__rocm7_beta.log | 6 +- ...t-BF16-00001-of-00002__rocm7_beta__fa1.log | 6 + ...Instruct-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...uct-BF16-00001-of-00002__rocm7_rc__fa1.log | 5 + ...uct-BF16-00001-of-00002__vulkan_amdvlk.log | 6 +- ...F16-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...truct-BF16-00001-of-00002__vulkan_radv.log | 6 +- ...-BF16-00001-of-00002__vulkan_radv__fa1.log | 8 + ...3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log | 10 + ...-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log | 10 + .../gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log | 6 +- ...ma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log | 10 + .../gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log | 6 +- ...a-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log | 10 + .../gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log | 6 +- ...mma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log | 10 + ...mma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log | 6 +- ...-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log | 8 + ...gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log | 6 +- ...-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log | 8 + ...F16-00001-of-00002__rocm6_4_2-rocwmma.log} | 6 +- 
...00001-of-00002__rocm6_4_2-rocwmma__fa1.log | 10 + ...27b-it-BF16-00001-of-00002__rocm6_4_2.log} | 4 +- ...it-BF16-00001-of-00002__rocm6_4_2__fa1.log | 10 + ...27b-it-BF16-00001-of-00002__rocm7_beta.log | 6 +- ...t-BF16-00001-of-00002__rocm7_beta__fa1.log | 10 + ...3-27b-it-BF16-00001-of-00002__rocm7_rc.log | 6 +- ...-it-BF16-00001-of-00002__rocm7_rc__fa1.log | 10 + ...-it-BF16-00001-of-00002__vulkan_amdvlk.log | 2 +- ...F16-00001-of-00002__vulkan_amdvlk__fa1.log | 8 + ...7b-it-BF16-00001-of-00002__vulkan_radv.log | 6 +- ...-BF16-00001-of-00002__vulkan_radv__fa1.log | 8 + ...emma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log | 10 + ...3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__rocm6_4_2.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__rocm7_beta.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__rocm7_rc.log | 6 +- .../gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log | 10 + .../gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log | 6 +- ...mma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log | 8 + .../gemma-3-4b-it-Q3_K_S__vulkan_radv.log | 6 +- ...gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log | 8 + .../gpt-oss-120b-F16__rocm6_4_2-rocwmma.log | 10 + ...t-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log | 10 + .../gpt-oss-120b-F16__rocm6_4_2.log | 10 + .../gpt-oss-120b-F16__rocm6_4_2__fa1.log | 10 + .../gpt-oss-120b-F16__rocm7_beta.log | 6 +- .../gpt-oss-120b-F16__rocm7_beta__fa1.log | 10 + .../gpt-oss-120b-F16__rocm7_rc.log | 6 +- .../gpt-oss-120b-F16__rocm7_rc__fa1.log | 10 + .../gpt-oss-120b-F16__vulkan_amdvlk.log | 6 +- .../gpt-oss-120b-F16__vulkan_amdvlk__fa1.log | 8 + .../gpt-oss-120b-F16__vulkan_radv.log | 6 +- .../gpt-oss-120b-F16__vulkan_radv__fa1.log | 8 + ...xfp4-00001-of-00003__rocm6_4_2-rocwmma.log | 10 + ...00001-of-00003__rocm6_4_2-rocwmma__fa1.log | 10 + ...s-120b-mxfp4-00001-of-00003__rocm6_4_2.log | 6 +- ...b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log | 10 + 
...-120b-mxfp4-00001-of-00003__rocm7_beta.log | 10 + ...-mxfp4-00001-of-00003__rocm7_beta__fa1.log | 10 + ...ss-120b-mxfp4-00001-of-00003__rocm7_rc.log | 6 +- ...b-mxfp4-00001-of-00003__rocm7_rc__fa1.log} | 3 +- ...0b-mxfp4-00001-of-00003__vulkan_amdvlk.log | 6 +- ...fp4-00001-of-00003__vulkan_amdvlk__fa1.log | 8 + ...120b-mxfp4-00001-of-00003__vulkan_radv.log | 6 +- ...mxfp4-00001-of-00003__vulkan_radv__fa1.log | 8 + .../gpt-oss-20b-F32__rocm6_4_2-rocwmma.log | 10 + ...pt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log | 10 + .../gpt-oss-20b-F32__rocm6_4_2.log | 6 +- .../gpt-oss-20b-F32__rocm6_4_2__fa1.log | 10 + .../gpt-oss-20b-F32__rocm7_beta.log | 6 +- .../gpt-oss-20b-F32__rocm7_beta__fa1.log | 10 + .../gpt-oss-20b-F32__rocm7_rc.log | 6 +- .../gpt-oss-20b-F32__rocm7_rc__fa1.log | 10 + .../gpt-oss-20b-F32__vulkan_amdvlk.log | 6 +- .../gpt-oss-20b-F32__vulkan_amdvlk__fa1.log | 8 + .../gpt-oss-20b-F32__vulkan_radv.log | 6 +- .../gpt-oss-20b-F32__vulkan_radv__fa1.log | 8 + .../gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log | 10 + ...-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log | 10 + .../gpt-oss-20b-mxfp4__rocm6_4_2.log | 6 +- .../gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log | 10 + .../gpt-oss-20b-mxfp4__rocm7_beta.log | 6 +- .../gpt-oss-20b-mxfp4__rocm7_beta__fa1.log | 10 + .../gpt-oss-20b-mxfp4__rocm7_rc.log | 6 +- .../gpt-oss-20b-mxfp4__rocm7_rc__fa1.log | 10 + .../gpt-oss-20b-mxfp4__vulkan_amdvlk.log | 6 +- .../gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log | 8 + .../gpt-oss-20b-mxfp4__vulkan_radv.log | 6 +- .../gpt-oss-20b-mxfp4__vulkan_radv__fa1.log | 8 + ...ama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log | 10 + ...3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log | 10 + .../llama3.3-70.6B-Q4_K_M__rocm6_4_2.log | 6 +- .../llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log | 10 + .../llama3.3-70.6B-Q4_K_M__rocm7_beta.log | 6 +- ...llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log | 10 + .../llama3.3-70.6B-Q4_K_M__rocm7_rc.log | 6 +- .../llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log | 10 + 
.../llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log | 6 +- ...ma3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log | 8 + .../llama3.3-70.6B-Q4_K_M__vulkan_radv.log | 6 +- ...lama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log | 8 + .../results_08-08-2025/run_benchmarks.log | 1153 ++ benchmark/run_benchmarks.log | 1274 +- benchmark/run_benchmarks.sh | 65 +- docs/benchmarks.md | 105 +- docs/index.html | 72 +- docs/results.json | 13820 ++++++++++++---- refresh-toolboxes.sh | 4 +- toolboxes/Dockerfile.rocm-6.4.2 | 7 - ...-rocwaam => Dockerfile.rocm-6.4.2-rocwmma} | 7 - ...rc-rocwaam => Dockerfile.rocm-7rc-rocwmma} | 10 +- toolboxes/apply-rocwmma-fix.sh | 1 + toolboxes/build-rocwaam.sh | 1 + 619 files changed, 16448 insertions(+), 4651 deletions(-) create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log create mode 100644 
benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0.log create mode 100644 benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log 
create mode 100644 benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0__fa1.log create mode 100644 
benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log rename benchmark/{results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log => results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma.log} (79%) create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log create mode 100644 
benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 
benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0.log create mode 100644 
benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log create mode 
100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0__fa1.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0.log create mode 100644 benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0__fa1.log rename benchmark/{results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log => results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0.log} (60%) create mode 100644 benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0__fa1.log create mode 100644 benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log create mode 100644 benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log create mode 100644 benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0.log create mode 100644 
benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0__fa1.log rename benchmark/{results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log => results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0.log} (73%) rename benchmark/{results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log => results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0__fa1.log} (72%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log rename benchmark/{results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log => results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log} (50%) rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log (81%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => 
results_08-08-2025}/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => 
results_08-08-2025}/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log (84%) create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log (81%) create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log (100%) create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log 
create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log (82%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log rename benchmark/{results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log => results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log} (74%) rename benchmark/{results_old => 
results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log (80%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log (100%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log (78%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log (78%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log (100%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log rename 
benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log (78%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log (82%) create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log (79%) create mode 100644 
benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log create mode 100644 
benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log create mode 100644 benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => 
results_08-08-2025}/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log rename benchmark/{results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log => results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log} (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old/gpt-oss-120b-F16__rocm6_4_2.log => results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log} (58%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log rename benchmark/{results_old => 
results_08-08-2025}/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log (85%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-4b-it-Q3_K_S__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-4b-it-Q3_K_S__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gemma-3-4b-it-Q3_K_S__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log create mode 100644 
benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-F16__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-F16__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-F16__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-F16__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log (79%) rename benchmark/{results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log => results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log} (60%) rename benchmark/{results_old => 
results_08-08-2025}/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-F32__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-F32__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-F32__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-F32__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-F32__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-mxfp4__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-mxfp4__rocm7_beta.log (79%) create mode 100644 
benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-mxfp4__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-mxfp4__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/gpt-oss-20b-mxfp4__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log rename benchmark/{results_old => results_08-08-2025}/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log (79%) create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log rename benchmark/{results_old => results_08-08-2025}/llama3.3-70.6B-Q4_K_M__rocm7_beta.log (79%) create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log rename benchmark/{results_old => results_08-08-2025}/llama3.3-70.6B-Q4_K_M__rocm7_rc.log (79%) create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log rename benchmark/{results_old => results_08-08-2025}/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log (79%) create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log rename benchmark/{results_old => results_08-08-2025}/llama3.3-70.6B-Q4_K_M__vulkan_radv.log (79%) create mode 100644 benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log create mode 100644 benchmark/results_08-08-2025/run_benchmarks.log rename toolboxes/{Dockerfile.rocm-6.4.2-rocwaam => Dockerfile.rocm-6.4.2-rocwmma} (87%) rename toolboxes/{Dockerfile.rocm-7rc-rocwaam => Dockerfile.rocm-7rc-rocwmma} (96%) diff --git 
a/.github/workflows/build_and_publish.yml b/.github/workflows/build_and_publish.yml index fd1e3b5..d58bb0d 100644 --- a/.github/workflows/build_and_publish.yml +++ b/.github/workflows/build_and_publish.yml @@ -28,7 +28,7 @@ jobs: IN='${{ inputs.backends }}' if [[ "$IN" == "all" || -z "$IN" ]]; then - JSON='["rocm-6.4.2","rocm-6.4.2-rocwaam", "rocm-7beta","rocm-7rc","rocm-7rc-rocwaam","vulkan-amdvlk","vulkan-radv"]' + JSON='["rocm-6.4.2","rocm-6.4.2-rocwmma", "rocm-7beta","rocm-7rc","rocm-7rc-rocwmma","vulkan-amdvlk","vulkan-radv"]' else # Remove spaces and build JSON array from comma list IN_CLEAN=$(echo "$IN" | tr -d '[:space:]') diff --git a/README.md b/README.md index b99bf4e..128a355 100644 --- a/README.md +++ b/README.md @@ -41,17 +41,19 @@ This project uses [Llama.cpp](https://github.com/ggerganov/llama.cpp), a high-pe ### 1.1 Supported Container Images +You can check the containers on DockerHub: https://hub.docker.com/r/kyuz0/amd-strix-halo-toolboxes/tags. + | Container Tag | Backend/Stack | Purpose / Notes | | -------------------- | ------------------------ | --------------- | | `vulkan-amdvlk` | Vulkan (AMDVLK) | Fastest backend—AMD open-source driver. ≤2 GiB single buffer allocation limit, some large models won't load. | | `vulkan-radv` | Vulkan (Mesa RADV) | Most stable and compatible. Recommended for most users and all models. | | `rocm-6.4.2` | ROCm 6.4.2 (HIP) | Latest stable ROCm. Great for BF16 models. Occasional crashes possible. | -| `rocm-6.4.2-rocwaam` | ROCm 6.4.2 (HIP) + ROCWMMA | ROCm with ROCWMMA enabled for improved flash attention on RDNA3+/CDNA. | -| `rocm-7beta` | ROCm 7.0 Beta (HIP) | Latest ROCm beta. No real gain for Llama.cpp. Same model limits as 6.4.2. | -| `rocm-7rc` | ROCm 7.0 RC (HIP) | Release candidate for ROCm 7.0. Same behavior as beta. | +| `rocm-6.4.2-rocwmma` | ROCm 6.4.2 (HIP) + ROCWMMA | ROCm with ROCWMMA enabled for improved flash attention on RDNA3+/CDNA. 
| +| `rocm-7beta` | ROCm 7.0 Beta (HIP) + hipBLASLt* | Latest ROCm beta. No real gain for Llama.cpp. Same model limits as 6.4.2. | +| `rocm-7rc` | ROCm 7.0 RC (HIP) + hipBLASLt* | Release candidate for ROCm 7.0. Same behavior as beta. | +| `rocm-7rc-rocwmma` | ROCm 7.0 RC (HIP) + ROCWMMA + hipBLASLt* | Release candidate for ROCm 7.0, with hipBLASLt and ROCWMMA for improved flash attention on RDNA3+/CDNA | - -You can also check the containers on DockerHub: https://hub.docker.com/r/kyuz0/amd-strix-halo-toolboxes/tags. +* All ROCm 7 toolboxes now export `ROCBLAS_USE_HIPBLASLT=1` as this currently results in better performance and stability. > These containers are **automatically** rebuilt whenever the Llama.cpp master branch is updated, ensuring you get the latest bug fixes and new model support. The easiest way to update to the newest versions is by running the `refresh-toolboxes.sh` [script below](#211-toolbox-refresh-script-automatic-updates). @@ -146,39 +148,42 @@ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download unsloth/Qwen3-Coder-30B-A3B `HF_HUB_ENABLE_HF_TRANSFER=1` uses a Rust-based package that enables faster download (install from [Pypi](https://pypi.org/project/hf-transfer/)). ## 3. Performance Benchmarks (Key Results) -Got it — here’s the **concise, no-“we”** version, with the table embedded and pointing to deeper analysis. ---- +Benchmarks were run on **AMD Ryzen AI Max “Strix Halo”** across all supported backends, testing both **prompt processing (PP)** and **token generation (TG)** throughput. +Reported values were analysed using error margins (mean ± σ). Backends whose ranges overlapped were treated as statistical ties rather than hard wins. -## 🔍 Key Findings from Benchmarks - -Representative LLMs were tested on **AMD Ryzen AI Max “Strix Halo”** across all supported backends, using identical model builds in [Llama.cpp](https://github.com/ggerganov/llama.cpp). 
- -PP = prompt processing (tokens/sec prefill), TG = token generation (tokens/sec interactive). - -| Model | 🏆 Best PP | 🏆 Best TG | Vulkan (AMDVLK) | Vulkan (RADV) | ROCm 6.4.2 | ROCm 6.4.2 + ROCWMMA | ROCm 7.0 Beta | ROCm 7.0 RC | -|---|---|---|---|---|---|---|---|---| -| **Gemma3 12B Q8_0** | 🏆 **AMDVLK** (FA off) | 🏆 **AMDVLK** (FA off) | 677 pp (FA off) / 14.0 tg (FA off) | 503 pp (FA off) / 13.8 tg (FA off) | 223 pp (FA off) / 13.8 tg (FA off) | 230 pp (FA on) / 13.9 tg (FA off) | 223 pp (FA off) / 13.9 tg (FA off) | 222 pp (FA off) / 13.9 tg (FA off) | -| **Gemma3 27B BF16** | 🏆 **RADV** (FA on) | 🏆 **ROCm6.4.2+ROCWMMA** (FA off) | ⚠️ Load Error | 139 pp (FA on) / 4.0 tg (FA off) | 84 pp (FA on) / 4.0 tg (FA on) | 95 pp (FA on) / 4.0 tg (FA off) | 92 pp (FA off) / 4.0 tg (FA off) | 83 pp (FA on) / 4.0 tg (FA on) | -| **Llama-4-Scout 17B Q8_0** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA off) | 260 pp (FA on) / 12.2 tg (FA off) | 172 pp (FA on) / 12.3 tg (FA off) | 135 pp (FA off) / 11.6 tg (FA off) | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | -| **Llama-4-Scout 17B Q4_K XL** | 🏆 **AMDVLK** (FA on) | 🏆 **AMDVLK** (FA off) | 221 pp (FA on) / 20.0 tg (FA off) | 155 pp (FA on) / 20.0 tg (FA off) | 138 pp (FA off) / 17.4 tg (FA off) | ⚠️ GPU Hang | 139 pp (FA off) / 17.6 tg (FA off) | 124 pp (FA on) / 17.6 tg (FA on) | -| **Qwen3 30B BF16** | 🏆 **ROCm6.4.2+ROCWMMA** (FA on) | 🏆 **ROCm7 RC** (FA off) | 108 pp (FA on) / 8.0 tg (FA off) | 87 pp (FA on) / 7.4 tg (FA on) | 158 pp (FA off) / 24.3 tg (FA on) | 162 pp (FA on) / 24.5 tg (FA off) | 153 pp (FA off) / 24.5 tg (FA off) | 152 pp (FA off) / 24.6 tg (FA off) | -| **Qwen3-235B Q3_K XL** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA on) | 116 pp (FA on) / 16.0 tg (FA off) | 67 pp (FA on) / 16.8 tg (FA on) | 74 pp (FA off) / 13.7 tg (FA off) | ⚠️ GPU Hang | ⚠️ GPU Hang | ⚠️ Runtime Error | -| **GLM-4.5-Air-Q4_K_XL** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA on) | 202 pp (FA on) / 22.8 tg (FA on) | 133 pp (FA on) / 23.3 tg 
(FA on) | 130 pp (FA off) / 19.4 tg (FA off) | ⚠️ GPU Hang | ⚠️ GPU Hang | 130 pp (FA off) / 20.1 tg (FA on) | -| **GLM-4.5-Air-Q6_K_XL** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA on) | 225 pp (FA on) / 16.5 tg (FA on) | 132 pp (FA on) / 17.0 tg (FA on) | 125 pp (FA off) / 15.3 tg (FA off) | 114 pp (FA off) / 15.5 tg (FA off) | 121 pp (FA off) / 15.5 tg (FA off) | 124 pp (FA off) / 15.5 tg (FA off) | -| **gpt-oss-120b-mxfp4** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA off) | 546 pp (FA on) / 48.1 tg (FA off) | 255 pp (FA on) / 49.0 tg (FA off) | 353 pp (FA off) / 44.1 tg (FA off) | 408 pp (FA on) / 45.0 tg (FA off) | 355 pp (FA off) / 45.0 tg (FA off) | 353 pp (FA off) / 45.1 tg (FA off) | -| **gpt-oss-20b-mxfp4** | 🏆 **AMDVLK** (FA on) | 🏆 **RADV** (FA off) | 1473 pp (FA on) / 68.8 tg (FA off) | 728 pp (FA on) / 69.9 tg (FA off) | 583 pp (FA off) / 64.5 tg (FA off) | 649 pp (FA on) / 64.5 tg (FA off) | 584 pp (FA off) / 64.4 tg (FA off) | 582 pp (FA off) / 64.5 tg (FA off) | +🌐 Interactive exploration of the latest benchmark runs: [Interactive Benchmark Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/) -**Observations:** +| Workload Focus | 🏆 Recommended Backend/Config | Win + Tie Count¹ | Typical Runner-Up | Stability Notes | +| ------------------------------------------------- | ----------------------------------- | ---------------: | ---------------------------------- | ------------------------------------------------------------------------------------- | +| **Prompt processing** (pp512, Flash Attention ON) | **ROCm 7 RC + ROCWMMA + hipBLASLt** | 15 | Vulkan AMDVLK (4) | 0% errors in tests | +| **Token generation** (tg128, Flash Attention ON) | **Vulkan RADV** | 13 | Vulkan AMDVLK (1) | 0% errors in tests | +| **Balanced workloads** | **Vulkan AMDVLK** | — | RADV / ROCm 7 RC+ROCWMMA+hipBLASLt | Fast PP & decent TG; \~5.6 % load failure rate due to ≤ 2 GiB single-allocation limit | +| **BF16 models** | **ROCm 7 RC + ROCWMMA + hipBLASLt** | — | ROCm 6.4.2 + 
ROCWMMA | Best PP & TG among ROCm backends; stable with Flash Attention ON | -* **AMDVLK (Vulkan)** delivers the highest prompt processing speeds for most models, but is limited by ≤2 GiB single-buffer allocation and may fail to load some models. -* **RADV (Vulkan)** is the most stable and compatible backend; typically slower than AMDVLK in PP but often competitive in TG. -* **ROCm 6.4.2 + ROCWMMA** excels in BF16 workloads and can outperform Vulkan in certain cases, though ROCm stability issues remain. -* ROCm 7.0 Beta/RC show similar performance to 6.4.2 without consistent gains. +¹ Counts show number of times the backend placed 1st (alone or tied) across tested models/quantisations. + + +### Key take-aways + +* **ROCm 7 RC + ROCWMMA + hipBLASLt + Flash Attention ON** + * Fastest prompt processing in the vast majority of tests (15/22 wins or ties). + * Best ROCm option for BF16 models. + * Zero recorded errors with Flash Attention ON. + +* **Vulkan RADV** + * Best token generation throughput (13/15 wins or ties). + * Most stable and broadly compatible backend overall. + +* **Vulkan AMDVLK** + + * Competitive in both PP and TG; benefits from margin-aware tie handling. + * Limited by ≤ 2 GiB single buffer allocation, which can block some model architectures. + * Other ROCm variants (beta, hblt0, 6.4.2 w/o ROCWMMA) + * Inconsistent performance and/or higher error rates; best suited for experimental use. 📄 Full per-model analysis: [docs/benchmarks.md](docs/benchmarks.md) -🌐 Interactive exploration: [Live Benchmark Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/) ## 4. 
Memory Planning & VRAM Estimator
diff --git a/benchmark/generate_results.json.py b/benchmark/generate_results.json.py
index 3baac1d..4a0937a 100644
--- a/benchmark/generate_results.json.py
+++ b/benchmark/generate_results.json.py
@@ -41,12 +41,18 @@ def clean_model_name(raw):
     return base
 
 def parse_env_and_fa(basename):
-    # pattern: __[__fa1]
+    # pattern: __[__fa1][__hblt0]
     parts = basename.split("__")
     if len(parts) < 2:
         return None, False
+
     env = parts[1]
-    fa = (len(parts) > 2 and parts[2].lower() == "fa1")
+    # scan any extra suffix segments
+    suffixes = {p.lower() for p in parts[2:]}
+    fa = ("fa1" in suffixes)
+    if "hblt0" in suffixes:
+        env = f"{env}-hblt0"
+
     return env, fa
 
 def env_base_and_variant(env):
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
index d97d416..f7d6678 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
   Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x2edd2a90) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0x19cb8050) reason :GPU Hang
 ✖ ! 
[rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 failed (exit 134) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log index d044208..b800555 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x432ea90) reason :GPU Hang -✖ ! [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 139.31 ± 0.13 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 19.97 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log index e1a550e..379088a 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 
999 | 0 | pp512 | 129.88 ± 0.57 | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.43 ± 0.00 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 130.07 ± 0.32 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.48 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log index 268535b..8df6842 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x834aa90) on address 0x7f10fb96f000. Reason: Page not present or supervisor privilege. +HW Exception by GPU node-1 (Agent handle: 0x50e2050) reason :GPU Hang ✖ ! [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log index 52deb8e..00fe3c5 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x100d3790) reason :GPU Hang -✖ ! 
[rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.50 ± 0.25 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 20.02 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log index 8039123..d5b577e 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x13829790) on address 0x7fa8ef9a9000. Reason: Page not present or supervisor privilege. -✖ ! 
[rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 100.80 ± 0.14 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 20.13 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log new file mode 100644 index 0000000..305470f --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 130.22 ± 0.35 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 20.00 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..7cca3a8 --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave 
Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x1f3f20c0) reason :GPU Hang +✖ ! [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..a945543 --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 120.16 ± 0.21 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.96 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..d65e6e5 --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 133.91 ± 0.57 | +| glm4moe 106B.A12B Q4_K - Medium 
| 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 19.94 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..10c8b0a --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 129.49 ± 0.48 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 19.95 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..1d0ab1d --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 138.34 ± 0.27 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 
99 | 1 | 0 | tg128 | 19.90 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log index fcf0f01..9927f0b 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 130.17 ± 0.38 | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.83 ± 0.00 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.65 ± 0.23 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.91 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log index 94079a7..86f99ad 100644 --- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 103.63 ± 0.10 | -| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB 
| 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 20.09 ± 0.00 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 100.90 ± 0.22 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 20.15 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..441e956 --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 129.49 ± 0.14 | +| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 19.88 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..28b4354 --- /dev/null +++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | 
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 103.73 ± 0.14 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 20.07 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
index 4ef718e..15c9127 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 200.76 ± 0.32 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.78 ± 0.00 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 201.03 ± 0.31 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.82 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
index 4bbf6de..c0e6775 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 201.86 ± 0.27 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 22.83 ± 0.00 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 201.89 ± 0.37 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 22.85 ± 0.01 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
index 90347e7..c38f2a1 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 127.73 ± 0.23 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.88 ± 0.02 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 128.01 ± 0.31 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.92 ± 0.01 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log
index cf98168..12bf239 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 132.54 ± 0.34 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 132.56 ± 0.31 |
 | glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 23.31 ± 0.01 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log
index 5dc10c6..deada9f 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 113.62 ± 0.21 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.47 ± 0.04 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 124.75 ± 0.42 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 15.43 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
index a0c808c..6e7bcaa 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x2f508a90) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0x2d9b050) reason :GPU Hang
 ✖ ! [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
index d1de7a1..685d734 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.82 ± 0.18 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.94 ± 0.42 |
 | glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.35 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log
index 5ed10e0..b9a03bd 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log
@@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-Memory access fault by GPU node-1 (Agent handle: 0x1527fa90) on address 0x7f55d5f6f000. Reason: Page not present or supervisor privilege.
-✖ ! [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 100.41 ± 0.16 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 15.53 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
index 273166e..d839f3e 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 120.54 ± 0.30 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.49 ± 0.00 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 118.61 ± 0.54 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.51 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log
index c23fe13..1208365 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log
@@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x2a849790) reason :GPU Hang
-✖ ! [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 90.24 ± 0.13 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 15.55 ± 0.04 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..8a0ff5a
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 123.75 ± 0.39 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 15.48 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..0aab0d1
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+Memory access fault by GPU node-1 (Agent handle: 0x36bce0c0) on address 0x7f6ee1f6f000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..55995b2
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 118.92 ± 0.39 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.47 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..161145c
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 127.14 ± 0.27 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 15.47 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..bff1e47
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __hblt0 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..b3eb2c2
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log
index 5fbf5b3..97c3a25 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.18 ± 0.48 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.49 ± 0.00 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 118.52 ± 0.35 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.52 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log
index 28ae734..c6194e3 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log
@@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 97.36 ± 0.07 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 15.57 ± 0.02 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0.log
new file mode 100644
index 0000000..7f9bb58
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __hblt0 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log
new file mode 100644
index 0000000..8e70bdf
--- /dev/null
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
index 4247170..f5209b4 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 223.02 ± 0.69 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.47 ± 0.01 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 223.59 ± 0.50 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.51 ± 0.01 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log
index e3bc753..3ba2fc2 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 224.54 ± 0.65 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 16.49 ± 0.00 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 225.75 ± 0.69 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 16.53 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
index 5f0ace5..c01b8a2 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 127.36 ± 0.46 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.78 ± 0.01 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 127.35 ± 0.43 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.80 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log
index 1973a52..3647c19 100644
--- a/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log
+++ b/benchmark/results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 131.78 ± 0.46 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 16.99 ± 0.01 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 131.91 ± 0.42 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 17.02 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
index 135d108..2da2c5e 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x121f0a90) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0xd98d050) reason :GPU Hang
 ✖ ! [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
index 29b2095..bb5ac9d 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x17018a90) reason :GPU Hang
-✖ ! [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 99 | 1 | 0 | pp512 | 33.87 ± 0.05 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 99 | 1 | 0 | tg128 | 2.64 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
index 08dae7b..c119ae0 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x11442a90) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0x2dab2050) reason :GPU Hang
 ✖ ! [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log
index 1849a77..49b6a40 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x64dea90) reason :GPU Hang
+Memory access fault by GPU node-1 (Agent handle: 0xae0b050) on address 0x7f17943a9000. Reason: Page not present or supervisor privilege.
 ✖ ! [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
index e01b520..820c8ea 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
@@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0xa636790) reason :GPU Hang
-✖ ! [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | pp512 | 108.88 ± 0.21 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | tg128 | 2.65 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log
index 2f2342b..ecdf26e 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x1417b7b0) reason :GPU Hang
+Memory access fault by GPU node-1 (Agent handle: 0x1f7690e0) on address 0x7f6093d6f000. Reason: Page not present or supervisor privilege.
 ✖ ! [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..9bbdd27
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x2ae290c0) reason :GPU Hang
+✖ ! [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..3a354c6
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x19f880e0) reason :GPU Hang
+✖ ! [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..01916bd
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | pp512 | 109.02 ± 0.07 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | tg128 | 2.65 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..4d3f05b
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | pp512 | 117.34 ± 0.09 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | tg128 | 2.65 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..433bb8d
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc-rocwaam] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..b5a7803
--- /dev/null
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc-rocwaam] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log
index c479337..343727e 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | pp512 | 33.30 ± 0.04 |
-| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | tg128 | 2.64 ± 0.00 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | pp512 | 109.17 ± 0.12 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | tg128 | 2.65 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log
index 7b0ea20..d9b5fe6 100644
--- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log
+++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log
@@ -2,9 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-| model | size | params | backend | ngl | fa | mmap | test | t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | pp512 | 31.09 ± 0.02 |
-| qwen2 70B Q8_0 |
78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | tg128 | 2.65 ± 0.00 | - -build: cd6983d5 (6119) +✖ ! [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..fac1830 --- /dev/null +++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..f08d646 --- /dev/null +++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log index c6c72c5..f0955ad 100644 --- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | pp512 | 78.70 ± 0.20 | -| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | tg128 | 2.66 ± 0.00 | +| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | pp512 | 78.54 ± 0.14 | +| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | tg128 | 2.67 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log index ea12120..0c2bb42 100644 --- a/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: 
| -------------------: | -| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | pp512 | 81.29 ± 0.14 | -| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | tg128 | 2.66 ± 0.00 | +| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | pp512 | 81.12 ± 0.08 | +| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | tg128 | 2.67 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log index a418f5b..5c9071b 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0xcd80a90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0xd004050) reason :GPU Hang ✖ ! 
[rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log index 3de552f..182cfd1 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x1496da90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x1fdc2050) reason :GPU Hang ✖ ! [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log index 409a36b..1e5d45d 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.32 ± 0.04 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.28 ± 0.05 | | llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 2.73 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git 
a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log index 952dddb..db68588 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.28 ± 0.02 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 2.74 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 30.88 ± 0.02 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 2.73 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log index 2acc073..1275625 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0xfeef7b0) reason :GPU Hang -✖ ! 
[rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 95.65 ± 0.23 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 2.74 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log index 7a57ad3..d4fb01f 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x6d017c0) on address 0x7f967f1a9000. Reason: Page not present or supervisor privilege. +Memory access fault by GPU node-1 (Agent handle: 0x2e9460f0) on address 0x7f23cf58a000. Reason: Page not present or supervisor privilege. ✖ ! 
[rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log new file mode 100644 index 0000000..256df75 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x2c3170e0) reason :GPU Hang +✖ ! [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..51e9900 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0xe3f70e0) on address 0x7f4e23b6f000. Reason: Page not present or supervisor privilege. +✖ ! 
[rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..2d8ff98 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 95.63 ± 0.19 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 2.73 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..78ef763 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 103.15 ± 0.13 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 2.73 ± 0.00 | + +build: 
34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..eb0e257 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..588ee30 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log index b9ba150..7715818 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! 
[rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 95.15 ± 0.14 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 2.74 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log index c55bab8..a10eecc 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -2,4 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +:0:rocdevice.cpp :3594: 448132897452 us: Callback: Queue 0x7f7ecc400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 ✖ ! [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..e557f4b --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..f3c72e9 --- /dev/null +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 30.04 ± 0.04 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.74 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log index eb3efec..ac2d0df 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 98.14 ± 0.14 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | 
Vulkan | 999 | 0 | tg128 | 2.73 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 98.20 ± 0.18 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 2.75 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log index 966e109..9e22472 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 99.24 ± 0.16 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.72 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 99.14 ± 0.35 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.74 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log index 80c3a0e..b4da67c 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV 
GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 80.11 ± 0.09 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 2.73 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 79.91 ± 0.16 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 2.75 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log index 5826f3e..5e3f60f 100644 --- a/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 82.90 ± 0.14 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.73 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 82.40 ± 0.16 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.75 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git 
a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log index 40f418b..b5a7155 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x28bb9a90) reason :GPU Hang -✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 134.21 ± 0.58 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 14.43 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log index a94cdd6..3a30aaf 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x194fea90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x10997050) reason :GPU Hang 
✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log index f7132fb..f81dd87 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 134.39 ± 0.32 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.33 ± 0.00 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 133.77 ± 0.46 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.30 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log index 53feea1..b6d7516 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x3b11ea90) reason :GPU Hang +Memory access fault by GPU node-1 (Agent handle: 
0x1732e050) on address 0x7fcb1a36f000. Reason: Page not present or supervisor privilege. ✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log index 6d3b4ea..e2f4dbe 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x17ad57b0) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x225860e0) reason :GPU Hang ✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log index 107b01e..311e082 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x2314b7b0) on address 0x7f38249a9000. Reason: Page not present or supervisor privilege. +:0:rocdevice.cpp :3675: 454572762136 us: Callback: Queue 0x7fb3f1400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 ✖ ! 
[rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0.log new file mode 100644 index 0000000..7188df3 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x11dec0e0) reason :GPU Hang +✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..6375788 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 1 | 0 | pp512 | 103.96 ± 0.18 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 1 | 0 | tg128 | 14.47 ± 0.02 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log 
b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..be95461 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 273.64 ± 0.59 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.43 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..70811e5 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 293.87 ± 1.35 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 14.31 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0.log 
b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..b84d351 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..a6ad809 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log index ccf7ac1..07c550f 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 135.25 ± 0.50 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 269.30 ± 1.99 | | llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.43 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log index 8df0b3e..0a2d01d 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 225.70 ± 1.00 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 14.46 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..c9103d5 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 135.16 ± 0.44 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 14.41 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..560fe07 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + 
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log index dc80b9d..3cc2007 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 243.45 ± 1.29 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.29 ± 0.01 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 243.54 ± 1.24 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.34 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log index 08242f2..7dafa9d 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 
1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 247.48 ± 1.28 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.03 ± 0.00 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 246.48 ± 1.35 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.09 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log index ba7a655..80c940d 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 148.25 ± 0.91 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.21 ± 0.06 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 147.36 ± 0.80 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.30 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 
(6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log index 14f12dd..e72dffe 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 149.82 ± 0.83 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.21 ± 0.04 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 150.06 ± 1.13 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.27 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log index 2faeaa3..0910303 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x9ae6a90) 
reason :GPU Hang -✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 135.23 ± 0.81 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 11.62 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log index 6ff4745..07a24ef 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x6e9ba90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0xf461050) reason :GPU Hang ✖ ! 
[rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log index 8678b7b..db50520 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 135.44 ± 0.76 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.61 ± 0.00 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 135.29 ± 0.58 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.60 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log index 099d9b2..4d84455 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x2fba3a90) reason :GPU Hang +Memory access fault by GPU node-1 (Agent handle: 
0x13dd2050) on address 0x7f6913b6f000. Reason: Page not present or supervisor privilege. ✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log index c768b8e..1441b69 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x4081f7b0) reason :GPU Hang -✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 262.13 ± 9.71 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.65 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log index 98c472e..4dc7de6 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception 
by GPU node-1 (Agent handle: 0x3c0f27b0) reason :GPU Hang +Memory access fault by GPU node-1 (Agent handle: 0x2b4130e0) on address 0x7f8a7ed6f000. Reason: Page not present or supervisor privilege. ✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0.log new file mode 100644 index 0000000..236d063 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x12790e0) reason :GPU Hang +✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..3db64ad --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x14e4a0e0) on address 0x7f859916f000. Reason: Page not present or supervisor privilege. +✖ ! 
[rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..930340d --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 267.45 ± 1.90 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.60 ± 0.05 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..1ee598c --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 293.37 ± 7.08 | +| llama4 17Bx16E (Scout) Q8_0 | 
106.65 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 11.54 ± 0.03 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..89b1951 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..fa14ec0 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log index 9c06e2b..8f33074 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 272.38 ± 1.28 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.64 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0.log new file mode 100644 index 0000000..2758045 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..0ab337b --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log index 3bdeae7..40cd552 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 258.18 ± 1.38 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.23 ± 0.01 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 255.55 ± 1.38 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.27 ± 0.01 | -build: 
cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log index 2060565..e8041dc 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 260.16 ± 1.44 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.09 ± 0.00 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 259.07 ± 1.30 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.11 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log index d9b6ebc..e52091e 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | 
params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 168.63 ± 0.81 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.26 ± 0.01 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 168.01 ± 0.85 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.30 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log index 579e532..154221d 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 172.37 ± 0.92 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.25 ± 0.00 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 172.71 ± 0.91 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.28 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git 
a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log index 070646e..f13eb82 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x1a40fa90) reason :GPU Hang -✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 137.82 ± 0.73 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 17.41 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log index 3fa46c3..5cd6f40 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x2e0ffa90) reason :GPU Hang 
+HW Exception by GPU node-1 (Agent handle: 0x1624d050) reason :GPU Hang ✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log index 3ec496d..d0be2b8 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 138.27 ± 0.66 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.40 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 137.63 ± 0.80 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.29 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log index 9d0c061..e7a2f72 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 
(0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x3a741a90) reason :GPU Hang -✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 122.98 ± 0.59 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.53 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log index fb93137..cc14e7d 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 138.90 ± 0.66 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.62 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 281.87 ± 1.98 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.59 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log 
b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log index ee0d484..c46ae93 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 123.61 ± 0.50 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.60 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 233.14 ± 0.90 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.59 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log new file mode 100644 index 0000000..05da15f --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x2334b0e0) reason :GPU Hang +✖ ! 
[rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..20fd2ca --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x1b1f20f0) reason :GPU Hang +✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..d475882 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..4935293 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 307.08 ± 2.67 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.34 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..ac13496 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 
B | ROCm | 99 | 0 | pp512 | 138.22 ± 0.46 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 17.45 ± 0.09 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..5ed8446 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log index 2e1a6fc..1e4897a 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 281.24 ± 1.95 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.56 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log index bde171a..9eb1c08 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -2,9 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 123.58 ± 0.18 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.55 ± 0.00 | - -build: cd6983d5 (6119) +✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..ca4dda3 --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __hblt0 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..9086eec --- /dev/null +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log index 75ac351..f03b1f2 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 218.18 ± 0.83 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 20.04 ± 0.02 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 218.27 ± 0.80 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 20.09 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log index a745a31..2706deb 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | 
uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 221.15 ± 0.74 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.58 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 220.73 ± 0.69 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.64 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log index 4b78701..8058f91 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 152.21 ± 0.66 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 19.98 ± 0.01 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 152.77 ± 0.73 | +| llama4 17Bx16E (Scout) Q4_K - 
Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 20.02 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log index ee535dc..953d42a 100644 --- a/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 155.22 ± 1.09 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.93 ± 0.01 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 155.24 ± 1.01 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.99 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log index aa6dfe3..24b7806 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no 
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x153dfa90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x3eeda050) reason :GPU Hang ✖ ! [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log index e2df164..c0f73f3 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x2bd2ba90) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x2d723050) reason :GPU Hang ✖ ! 
[rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log index 1bc098e..95b0795 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | pp512 | 74.15 ± 0.18 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | tg128 | 13.73 ± 0.00 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | pp512 | 73.83 ± 0.16 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | tg128 | 13.68 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log index 40b3223..22c3d0b 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log @@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x25011a90) on address 
0x7fdcc1b6f000. Reason: Page not present or supervisor privilege. -✖ ! [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | pp512 | 61.47 ± 0.09 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | tg128 | 13.83 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log index b5a6749..c4bfe32 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x513c7b0) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x359cb0e0) reason :GPU Hang ✖ ! 
[rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log index 7826050..4707f93 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log @@ -2,5 +2,6 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -Memory access fault by GPU node-1 (Agent handle: 0x2567c7c0) on address 0x7ee66236f000. Reason: Page not present or supervisor privilege. +:0:rocdevice.cpp :3675: 456558403486 us: Callback: Queue 0x7f04ef600000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016 +Memory access fault by GPU node-1 (Agent handle: 0x2e8f0f0) on address 0x7eeca7f6f000. Reason: Page not present or supervisor privilege. ✖ ! [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0.log new file mode 100644 index 0000000..4554363 --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x1c2260e0) reason :GPU Hang +✖ ! 
[rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..d31397b --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x11f900f0) on address 0x7f6f91d6f000. Reason: Page not present or supervisor privilege. +✖ ! [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma.log similarity index 79% rename from benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log rename to benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma.log index 2421a1b..440a82e 100644 --- a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 99 | 0 | 
pp512 | 74.69 ± 0.17 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 99 | 0 | tg128 | 13.56 ± 0.00 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | pp512 | 129.70 ± 0.81 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | tg128 | 13.66 ± 0.00 | -build: 4cb208c9 (6066) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..15a84cb --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | pp512 | 145.18 ± 0.48 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | tg128 | 13.43 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..1e0e2f6 --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, 
Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..f7f488e --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log index dbd9c47..d8318fb 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! 
[rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134) +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | pp512 | 130.56 ± 0.46 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | tg128 | 13.87 ± 0.02 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log index 57b950a..3d86ec7 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! 
[rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | pp512 | 97.08 ± 0.34 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 1 | 0 | tg128 | 13.90 ± 0.03 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0.log new file mode 100644 index 0000000..3cb748b --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..443bad3 --- /dev/null +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log index af5c138..b14c784 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 114.49 ± 0.60 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 15.98 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 114.76 ± 0.62 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 16.06 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log index 19e5e37..c01f816 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 
| warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 116.07 ± 0.64 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 15.84 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 116.18 ± 0.67 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 15.90 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log index 2aefda4..077cf15 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 64.85 ± 0.38 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 16.58 ± 0.00 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 64.79 ± 0.39 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 16.61 ± 
0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log index c0359f0..d9c6cb3 100644 --- a/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log +++ b/benchmark/results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 66.76 ± 0.43 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 16.83 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 66.84 ± 0.42 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 16.86 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log index 3c0cef6..562318d 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | 
---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 157.95 ± 2.63 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.53 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 157.78 ± 2.71 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.56 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log index af7dc3f..77ad01b 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 162.19 ± 3.06 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.03 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 161.64 ± 2.99 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 23.94 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log index 03365ca..9a6adc0 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: 
Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 157.69 ± 2.52 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 23.89 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 157.64 ± 2.49 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 23.93 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log index 86ac559..000d477 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 140.32 ± 2.10 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.33 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 140.32 ± 1.99 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.32 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log index ea26bd0..6b685b0 100644 --- 
a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 153.49 ± 1.19 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.52 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 424.74 ± 7.06 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.48 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log index bb2103f..2940f7a 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 138.49 ± 2.52 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.35 ± 0.01 | - -build: cd6983d5 (6119) +Memory access fault by GPU node-1 (Agent handle: 0x16acc0c0) on address 0x7f24fed6f000. Reason: Page not present or supervisor privilege. +✖ ! 
[rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0.log new file mode 100644 index 0000000..b98ad69 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 154.45 ± 1.39 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.52 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..8773673 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 138.46 ± 1.64 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.29 ± 0.00 | + +build: 79c1160b (6123) diff --git 
a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..5088e35 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 425.56 ± 3.28 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.80 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..20a8ebc --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 472.05 ± 4.59 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.12 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log 
b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..6fb55b7 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 153.54 ± 2.25 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.74 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..e0dd74e --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 158.20 ± 2.47 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.12 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log index e446a9b..1c911fe 100644 --- 
a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 152.26 ± 2.41 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.55 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 426.72 ± 7.55 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.57 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log index d73c640..d2a18a1 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log @@ -2,9 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 137.52 ± 1.75 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.33 ± 0.00 | - -build: cd6983d5 (6119) +✖ ! 
[rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0.log new file mode 100644 index 0000000..25f3bf6 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 153.89 ± 1.73 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.57 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..53b2312 --- /dev/null +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 137.06 ± 2.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.32 ± 0.01 | + +build: 79c1160b (6123) diff --git 
a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log index 1687c7e..fc1a60f 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 107.48 ± 0.16 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 8.04 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 107.55 ± 0.11 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 8.09 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log index a9a752b..f84135e 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | 
Vulkan | 999 | 1 | 0 | pp512 | 107.64 ± 0.13 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.96 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 107.68 ± 0.13 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 8.03 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log index ccca043..1458372 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 85.97 ± 0.12 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 7.38 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 86.02 ± 0.11 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 7.46 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log index 48148ef..ec2a50a 100644 --- a/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) 
(radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 87.05 ± 0.10 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.40 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 86.93 ± 0.15 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.44 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log index dc6f1a9..015e9b4 100644 --- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log +++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.77 ± 0.97 | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.31 ± 0.01 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 387.45 ± 1.17 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 50.42 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log 
b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log index 60992bd..a5d6d7e 100644 --- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 412.35 ± 1.06 | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 48.26 ± 0.01 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 411.60 ± 0.78 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 48.14 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log index bd9bc1c..4565d4e 100644 --- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log +++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.72 ± 2.63 | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.19 ± 0.01 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 385.52 ± 0.67 | +| qwen3moe 30B.A3B 
Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.06 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log index 2a04531..a0064ea 100644 --- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log +++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 301.29 ± 0.54 | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.58 ± 0.00 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 300.86 ± 0.38 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.71 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log index a3987ef..7b4fa67 100644 --- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log +++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 390.07 ± 0.40 | -| qwen3moe 
30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.19 ± 0.01 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 534.84 ± 2.48 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.21 ± 0.01 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log
index a9ca9ef..bbdc595 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 300.60 ± 2.31 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.78 ± 0.00 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 411.72 ± 2.56 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.76 ± 0.00 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..f59b880
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl |
mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 387.34 ± 1.49 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 50.23 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..6a605f7
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 300.58 ± 1.17 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 49.78 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..bd849a1
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: |
--------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 535.44 ± 6.90 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.07 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..57f3363
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 619.02 ± 7.73 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 47.63 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..922286f
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B |
ROCm | 99 | 0 | pp512 | 387.98 ± 0.76 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 50.09 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..a66f360
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 413.28 ± 2.05 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 47.63 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log
index 8ff09a7..625d68e 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.99 ± 1.86 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.31 ± 0.01 |
+| qwen3moe 30B.A3B Q6_K |
24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 540.14 ± 5.22 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.65 ± 0.01 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log
index db6f9b0..43c8e66 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 302.87 ± 0.88 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.90 ± 0.00 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 418.60 ± 2.58 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.63 ± 0.00 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0.log
new file mode 100644
index 0000000..74fbd99
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: |
--------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 386.87 ± 1.67 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 50.50 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0__fa1.log
new file mode 100644
index 0000000..763fdaf
--- /dev/null
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 300.40 ± 1.44 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 49.69 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log
index 51b45f0..4f467f9 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B
Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 736.95 ± 3.72 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 56.89 ± 0.26 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 741.97 ± 2.92 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 57.22 ± 0.02 |

-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log
index 3f2a08e..8cbd25b 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 727.71 ± 2.81 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 53.34 ± 0.31 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 731.64 ± 2.80 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 53.53 ± 0.02 |

-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log
index 5140ff3..993ae07 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log
@@ -2,7 +2,7 @@
ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 395.16 ± 1.55 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 58.95 ± 0.45 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 396.38 ± 1.53 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 59.54 ± 0.02 |

-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log
index 6bbc4f2..296a137 100644
--- a/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log
+++ b/benchmark/results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 405.61 ± 1.85 |
-| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 58.06 ± 0.28 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 406.84 ± 1.62 |
+| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 58.50 ± 0.10 |

-build:
cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
index 6625574..3e8cb4e 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 150.50 ± 1.69 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.55 ± 0.01 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 150.37 ± 1.75 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.49 ± 0.01 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
index ded0220..911ffe7 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm |
999 | 1 | 0 | pp512 | 154.09 ± 1.98 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.02 ± 0.01 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 153.97 ± 1.90 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 23.98 ± 0.01 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log
index 222959d..dc8823d 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 150.34 ± 1.74 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.14 ± 0.00 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 150.06 ± 1.71 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 23.13 ± 0.00 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log
index 207a2a1..d63b9d0 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log
@@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found
1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-| model | size | params | backend | ngl | fa | mmap | test | t/s |
-| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 134.40 ± 1.47 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.32 ± 0.01 |
-
-build: cd6983d5 (6119)
+Memory access fault by GPU node-1 (Agent handle: 0x168bc050) on address 0x7ef358d6f000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log
index cc48f94..4d0291a 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 146.55 ± 1.77 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.54 ± 0.00 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 408.29 ± 1.82 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.53 ± 0.00 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log
b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log
index 285bed2..0830d17 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-Memory access fault by GPU node-1 (Agent handle: 0x2bd8a7b0) on address 0x7fe0b0d6f000. Reason: Page not present or supervisor privilege.
+Memory access fault by GPU node-1 (Agent handle: 0xf2660e0) on address 0x7fb2199a9000. Reason: Page not present or supervisor privilege.
 ✖ ! [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..eb240a1
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 145.29 ± 1.91 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.53 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log
b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..d7843f5
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 130.39 ± 1.57 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.31 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..0a1bfa8
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 414.47 ± 3.10 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.61 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..8c77605
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 460.12 ± 5.58 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.02 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..6c331f6
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 145.43 ± 1.04 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.80 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..b9050d5
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 150.58 ± 1.93 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.13 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log
index 29fa537..04d0c86 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 145.91 ± 1.76 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.57 ± 0.01 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 413.05 ± 2.36 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.15 ± 0.00 |

-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log
index 1416318..c7a5573 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log
@@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-✖ ! [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 failed (exit 134)
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 325.48 ± 1.77 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.31 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0.log
new file mode 100644
index 0000000..c900ebe
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 145.83 ± 2.39 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B
| ROCm | 99 | 0 | tg128 | 24.12 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log
new file mode 100644
index 0000000..a409195
--- /dev/null
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | pp512 | 130.20 ± 1.39 |
+| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 1 | 0 | tg128 | 24.35 ± 0.01 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log
index 65ecb3e..6c09327 100644
--- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log
+++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 106.99 ± 0.10 |
-| qwen3moe 30B.A3B BF16 | 56.89 GiB
| 30.53 B | Vulkan | 999 | 0 | tg128 | 8.03 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 107.16 ± 0.06 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 8.08 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log index 2b69233..5f7c40c 100644 --- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log +++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 107.10 ± 0.08 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.98 ± 0.02 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 107.26 ± 0.11 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 8.04 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log index 3a2d167..a65273a 100644 --- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan 
devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 85.50 ± 0.06 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 7.42 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 85.88 ± 0.10 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 7.48 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log index 9132fa2..a14f281 100644 --- a/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 86.52 ± 0.06 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.40 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 86.57 ± 0.11 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.49 ± 0.00 | 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log
index 9f1e992..4638587 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 223.38 ± 0.29 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.86 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 192.14 ± 0.71 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 10.75 ± 3.44 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log
index 348f5ed..2b45f78 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 229.77 ± 0.32 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.59 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | pp512 | 229.77 ± 0.18 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | tg128 | 13.58 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log
index 5872035..fe52481 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.86 ± 0.11 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.85 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.24 ± 0.39 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.86 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log
index de6f8de..62f3c9e 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 202.13 ± 0.24 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.58 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 201.58 ± 0.09 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.57 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log
index 6493650..c8ae56d 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.67 ± 0.37 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.88 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 706.58 ± 0.96 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.87 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log
index a535d64..214947a 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 203.12 ± 0.35 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 567.65 ± 0.94 |
 | gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.60 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..305aaa3
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 222.31 ± 0.28 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.88 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..71ec3f9
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | pp512 | 203.03 ± 0.17 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | tg128 | 13.58 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..86106ee
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 703.10 ± 0.68 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.83 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..3898840
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 818.63 ± 0.82 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.47 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..2e84f94
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 222.39 ± 0.17 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.81 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..a31080a
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | pp512 | 228.56 ± 0.31 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | tg128 | 13.51 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log
index f1ec100..b52b25c 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.49 ± 0.29 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.86 ± 0.00 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 706.92 ± 0.89 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.87 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log
index f4493e0..228b25b 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 201.47 ± 0.21 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 554.98 ± 0.46 |
 | gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.61 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0.log
new file mode 100644
index 0000000..419821e
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 222.26 ± 0.30 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.86 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0__fa1.log
new file mode 100644
index 0000000..8c0ebea
--- /dev/null
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | pp512 | 201.53 ± 0.07 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 1 | 0 | tg128 | 13.59 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log
index 5ac352f..670f9fa 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 676.94 ± 0.85 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 13.99 ± 0.01 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 675.90 ± 1.28 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 14.26 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log
index b3193bd..f680dfa 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 371.17 ± 0.24 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.30 ± 0.01 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 371.03 ± 0.33 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.49 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log
index b620676..36cc6ea 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 503.27 ± 1.09 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 13.76 ± 0.02 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 504.61 ± 2.97 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 14.05 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log
index 5e9431a..df5009a 100644
--- a/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log
+++ b/benchmark/results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 495.99 ± 2.36 |
-| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 13.61 ± 0.03 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 495.37 ± 0.71 |
+| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 13.87 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
index 96d541d..aab4706 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 92.52 ± 0.44 |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.05 ± 0.00 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 92.82 ± 0.46 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 4.05 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
index a5e826d..43f28f1 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 94.54 ± 0.52 |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.03 ± 0.00 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | pp512 | 94.62 ± 0.56 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | tg128 | 4.03 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log
index c646996..b7e4cd2 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log
@@ -2,5 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x10c4a90) reason :GPU Hang
-✖ ! [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 failed (exit 134)
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 91.25 ± 0.44 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.04 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log
index d3b262b..da3d8bb 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.75 ± 0.35 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 84.81 ± 0.48 |
 | gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log
index aaaffba..6b535e0 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 91.54 ± 0.50 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 405.35 ± 0.62 |
 | gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.04 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log
index 18449f7..6b1dd72 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.61 ± 0.31 |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 310.92 ± 0.73 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.05 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0.log
new file mode 100644
index 0000000..26890da
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 86.80 ± 0.36 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 4.02 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log
new file mode 100644
index 0000000..846a5fd
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | pp512 | 82.85 ± 0.49 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | tg128 | 4.03 ± 0.00 |
+
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma.log
new file mode 100644
index 0000000..8b094b5
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 404.79 ± 0.61 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.04 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
new file mode 100644
index 0000000..a690d20
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 472.91 ± 1.05 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.03 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
new file mode 100644
index 0000000..6f151a6
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 91.08 ± 0.67 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 4.03 ± 0.01 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
new file mode 100644
index 0000000..13775d4
--- /dev/null
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__hblt0__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | pp512 | 93.26 ± 0.55 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 1 | 0 | tg128 | 4.03 ± 0.00 |
+
+build: 34c9d765 (6122)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log
index 9e9f25c..1615077 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 55.68 ± 0.47 |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 3.11 ± 0.98 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 368.33 ± 0.38 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 3.71 ± 0.01 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log
index f7ce012..3803787 100644
--- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log
+++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | fa | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
-| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.08 ± 0.42 |
+| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 311.83 ± 0.31 |
 | gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 |
 
-build: cd6983d5 (6119)
+build: 79c1160b (6123)
diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0.log
new file mode 100644
index 0000000..eba50a2 --- /dev/null +++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 80.07 ± 0.21 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 4.00 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..4575c7f --- /dev/null +++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] gemma-3-27b-it-BF16-00001-of-00002 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log index 0c3a407..73c8358 100644 --- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | pp512 | 135.58 ± 0.45 | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | tg128 | 4.00 ± 0.00 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | pp512 | 135.01 ± 0.28 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | tg128 | 4.03 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log index f2077af..8c6f730 100644 --- a/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log +++ b/benchmark/results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| 
gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | pp512 | 138.61 ± 0.55 | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | tg128 | 4.00 ± 0.00 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | pp512 | 137.76 ± 0.25 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | tg128 | 4.03 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log index 34e7e86..43d8ffa 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 729.91 ± 1.22 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.14 ± 0.03 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 727.59 ± 1.45 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 76.22 ± 0.03 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log index 16dd036..5f4bc59 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | 
---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 752.25 ± 0.73 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.93 ± 0.01 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | pp512 | 750.30 ± 1.03 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | tg128 | 69.96 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log index f07fba3..8397b72 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 730.51 ± 1.49 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.35 ± 0.02 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 728.24 ± 0.55 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 75.89 ± 0.03 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log index 08d39fe..ac89e51 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | 
------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 645.88 ± 0.61 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.63 ± 0.01 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 643.29 ± 0.97 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.53 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log index ea76e52..e24e049 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 732.13 ± 1.42 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.23 ± 0.03 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 1812.73 ± 7.38 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.55 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log index 76e9619..f67fdc0 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | 
backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 652.29 ± 0.45 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.62 ± 0.02 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 1548.20 ± 4.48 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.64 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0.log new file mode 100644 index 0000000..2791d8d --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 729.03 ± 0.75 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 76.59 ± 0.03 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..e88558b --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | 
ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | pp512 | 651.26 ± 1.22 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | tg128 | 69.44 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..502bbc1 --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 1799.45 ± 7.32 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 75.43 ± 0.03 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..c34e4a4 --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 
| 1 | 0 | pp512 | 2267.56 ± 6.61 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 68.27 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..d86865b --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 729.58 ± 0.87 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 75.48 ± 0.02 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..9ddd54a --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | pp512 | 750.44 ± 0.80 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | tg128 | 68.27 ± 0.01 | + +build: 34c9d765 (6122) diff --git 
a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log index ce94640..b4f3d2a 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 730.59 ± 1.69 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.01 ± 0.03 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 1812.27 ± 4.63 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.22 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log index 4c71363..a78e906 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 646.16 ± 0.39 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.53 ± 0.02 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 1510.06 ± 4.96 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.58 ± 0.02 | -build: cd6983d5 (6119) +build: 
79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0.log new file mode 100644 index 0000000..e4cd337 --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 729.81 ± 1.15 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 76.03 ± 0.04 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..a600775 --- /dev/null +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | pp512 | 645.48 ± 1.40 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 1 | 0 | tg128 | 69.67 ± 0.02 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log index b707702..8dafbf8 100644 --- 
a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1614.72 ± 4.91 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 84.00 ± 0.23 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1628.18 ± 1.73 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 84.23 ± 0.15 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log index 6055d96..fc50285 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | pp512 | 942.34 ± 1.76 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 57.70 ± 0.22 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | pp512 | 947.36 ± 1.47 | +| gemma3 4B Q3_K - Small | 
1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 60.35 ± 0.15 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log index 5a56858..2ecedbd 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1527.75 ± 3.86 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 85.54 ± 0.99 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1529.98 ± 0.80 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 86.95 ± 0.31 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log index ab5608b..309d21b 100644 --- a/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log +++ b/benchmark/results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan 
| 999 | 1 | 0 | pp512 | 1489.57 ± 4.71 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 80.63 ± 0.22 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | pp512 | 1498.81 ± 1.70 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 81.29 ± 0.12 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log index 2301d16..aa28166 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 355.01 ± 0.57 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.66 ± 0.00 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 353.66 ± 0.64 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.65 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log index dbb739d..11185b5 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | 
---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 411.33 ± 1.01 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.50 ± 0.00 | - -build: cd6983d5 (6119) +HW Exception by GPU node-1 (Agent handle: 0x2ad71050) reason :GPU Hang +✖ ! [rocm6_4_2-rocwmma] gpt-oss-120b-F16 __fa1 failed (exit 134) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2.log b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2.log index fc1ded3..e00d36f 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.36 ± 0.53 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 31.90 ± 0.01 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 352.40 ± 1.12 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 31.99 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2__fa1.log index a62923c..5f9c27c 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm6_4_2__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B 
F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 247.95 ± 0.40 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.04 ± 0.00 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 321.54 ± 0.46 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.03 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_beta.log b/benchmark/results/gpt-oss-120b-F16__rocm7_beta.log index 4e2c281..218c087 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm7_beta.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 357.38 ± 0.76 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.62 ± 0.00 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 604.24 ± 4.34 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.69 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_beta__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__fa1.log index 707b558..5a91196 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm7_beta__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 
999 | 1 | 0 | pp512 | 249.65 ± 0.33 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.04 ± 0.01 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 548.27 ± 2.65 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.07 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0.log b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0.log new file mode 100644 index 0000000..957c7c2 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 355.23 ± 1.71 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.66 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..b665daa --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 1 | 0 | pp512 | 323.79 ± 0.87 | +| gpt-oss ?B F16 | 60.87 
GiB | 116.83 B | ROCm | 99 | 1 | 0 | tg128 | 33.04 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..f5db248 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 592.27 ± 5.61 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.68 ± 0.02 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..60e0975 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 735.02 ± 5.32 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.34 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 
0000000..7e97f90 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 353.49 ± 1.71 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.63 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..a07f108 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 1 | 0 | pp512 | 388.50 ± 1.06 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 1 | 0 | tg128 | 33.28 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc.log index 63dd9d9..c05f1a3 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm7_rc.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | 
params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 356.67 ± 0.74 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.68 ± 0.02 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 598.68 ± 9.32 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.75 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__fa1.log index 8096c36..f9a46a3 100644 --- a/benchmark/results/gpt-oss-120b-F16__rocm7_rc__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 247.49 ± 0.65 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.07 ± 0.00 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 546.30 ± 3.37 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.04 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0.log new file mode 100644 index 0000000..19aa96b --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, 
Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 354.34 ± 0.67 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.76 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0__fa1.log b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..2733c29 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-F16__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 1 | 0 | pp512 | 324.26 ± 0.80 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 1 | 0 | tg128 | 33.05 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk.log b/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk.log index 755a9cf..461cbc2 100644 --- a/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk.log +++ b/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 
116.83 B | Vulkan | 999 | 0 | pp512 | 448.17 ± 1.37 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.39 ± 0.03 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 450.26 ± 1.46 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.56 ± 0.03 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log b/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log index 152170f..2219116 100644 --- a/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 498.69 ± 2.19 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.06 ± 0.03 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 499.80 ± 1.95 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.18 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__vulkan_radv.log b/benchmark/results/gpt-oss-120b-F16__vulkan_radv.log index 5ab95e4..c45dee8 100644 --- a/benchmark/results/gpt-oss-120b-F16__vulkan_radv.log +++ b/benchmark/results/gpt-oss-120b-F16__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | 
backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 229.59 ± 0.74 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.08 ± 0.01 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 230.22 ± 0.76 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.16 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-F16__vulkan_radv__fa1.log b/benchmark/results/gpt-oss-120b-F16__vulkan_radv__fa1.log index 9d830ae..718febd 100644 --- a/benchmark/results/gpt-oss-120b-F16__vulkan_radv__fa1.log +++ b/benchmark/results/gpt-oss-120b-F16__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 243.40 ± 0.99 | -| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.07 ± 0.01 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 243.20 ± 1.11 | +| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.15 ± 0.02 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log index 3f432b9..c520d12 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log +++ 
b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.53 ± 0.62 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.05 ± 0.08 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 352.37 ± 0.72 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 45.11 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log index fa4767b..9dc5fe3 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 408.50 ± 1.91 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.69 ± 0.18 | - -build: cd6983d5 (6119) +HW Exception by GPU node-1 (Agent handle: 0x3c5a6050) reason :GPU Hang +✖ ! 
[rocm6_4_2-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log index c1f2f78..03da684 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.45 ± 1.22 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 44.12 ± 0.01 | - -build: cd6983d5 (6119) +HW Exception by GPU node-1 (Agent handle: 0x8bc5050) reason :GPU Hang +✖ ! 
[rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 failed (exit 134) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log index 769bedc..97f0889 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 246.76 ± 0.35 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 43.67 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 319.23 ± 0.62 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 43.79 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log index 3892e39..88c729a 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 354.82 ± 1.02 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.00 ± 0.01 | +| gpt-oss ?B 
MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 589.45 ± 4.75 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.00 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log index 69476e2..bbf6d17 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 248.22 ± 0.50 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.05 ± 0.00 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 539.93 ± 1.23 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.01 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0.log new file mode 100644 index 0000000..1b8a39f --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x261760b0) reason :GPU Hang +✖ ! 
[rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 __hblt0 failed (exit 134) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..2a30ca3 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | 0 | pp512 | 323.04 ± 0.94 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | 0 | tg128 | 44.01 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..77cb354 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 586.82 ± 5.23 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 44.72 ± 0.30 | + +build: 34c9d765 (6122) diff --git 
a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..ae5c27f --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 684.17 ± 67.05 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.14 ± 0.27 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..c0da5bb --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 350.89 ± 1.88 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 44.93 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log 
b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..60423c2 --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc-rocwaam] gpt-oss-120b-mxfp4-00001-of-00003 __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log index 3a57ced..933469d 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.20 ± 0.59 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.15 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 589.82 ± 5.12 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.12 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log index 93e7fca..456548d 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log @@ -2,4 +2,9 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: 
GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 failed (exit 134) +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 540.27 ± 2.82 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 43.89 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0.log new file mode 100644 index 0000000..b3222aa --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 354.60 ± 1.20 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 45.04 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..b82d07b --- /dev/null +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: 
AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | 0 | pp512 | 319.46 ± 0.48 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | 0 | tg128 | 43.90 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log index d229658..7d5b354 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 486.90 ± 2.23 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 48.08 ± 0.03 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 488.47 ± 2.30 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 48.21 ± 0.02 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log index b556c96..3441dc1 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log +++ 
b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 546.41 ± 2.88 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 47.25 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 547.53 ± 3.03 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 47.49 ± 0.08 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log index 802c652..f6cbe94 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 239.72 ± 1.23 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 49.01 ± 0.06 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 239.44 ± 1.23 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 
B | Vulkan | 999 | 0 | tg128 | 49.15 ± 0.02 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log index 6b8a8c4..5538ed2 100644 --- a/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log +++ b/benchmark/results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 255.17 ± 1.65 | -| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 48.93 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 255.37 ± 1.68 | +| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 49.31 ± 0.08 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log index e1b0205..f7c6172 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 324.54 ± 4.39 | -| gpt-oss 
?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.87 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 324.31 ± 4.50 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.87 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log index 8e851e8..020f7b9 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 380.87 ± 8.21 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.79 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 343.30 ± 5.27 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 26.76 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2.log b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2.log index d9bd7eb..88a7e15 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 323.86 ± 4.33 | -| gpt-oss ?B BF16 | 38.97 GiB 
| 20.91 B | ROCm | 999 | 0 | tg128 | 26.27 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 322.55 ± 4.18 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 24.90 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2__fa1.log index 266806c..41e5d8b 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm6_4_2__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 257.11 ± 2.63 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.47 ± 0.08 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 304.86 ± 3.77 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.58 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_beta.log b/benchmark/results/gpt-oss-20b-F32__rocm7_beta.log index d76138e..c97bff7 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm7_beta.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 322.43 ± 2.59 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.89 ± 
0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1135.90 ± 9.10 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.88 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_beta__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__fa1.log index 6dd4954..3123235 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm7_beta__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 254.08 ± 3.99 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.62 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 1011.32 ± 4.33 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.65 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0.log b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0.log new file mode 100644 index 0000000..c01b9a9 --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 313.05 ± 6.96 | +| gpt-oss ?B BF16 | 38.97 GiB | 
20.91 B | ROCm | 99 | 0 | tg128 | 26.86 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..41d7fd7 --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 301.30 ± 4.81 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 26.65 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..1912f8c --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1130.14 ± 7.45 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.84 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..5046a32 --- /dev/null 
+++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 1502.62 ± 12.84 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.67 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..c83d4a2 --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 319.92 ± 6.39 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.83 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..fe7a810 --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon 
Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 338.36 ± 5.02 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 26.71 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc.log index 67b820b..1c550a3 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm7_rc.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 319.36 ± 3.07 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.88 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1130.86 ± 14.88 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.89 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__fa1.log index e07a069..a59b260 100644 --- a/benchmark/results/gpt-oss-20b-F32__rocm7_rc__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| 
gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 254.87 ± 2.27 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.62 ± 0.00 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 1007.82 ± 22.14 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.66 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0.log new file mode 100644 index 0000000..84a8d46 --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 321.80 ± 6.18 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.83 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..ec0c58d --- /dev/null +++ b/benchmark/results/gpt-oss-20b-F32__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 302.84 ± 5.01 
| +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 26.61 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk.log b/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk.log index 52536d1..6875f68 100644 --- a/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk.log +++ b/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 369.69 ± 1.79 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 8.59 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 369.60 ± 1.30 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 8.72 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log b/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log index 974e845..b4f9322 100644 --- a/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 389.86 ± 2.13 | -| 
gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 8.58 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 389.96 ± 1.87 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 8.70 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__vulkan_radv.log b/benchmark/results/gpt-oss-20b-F32__vulkan_radv.log index 7decf08..ad6bfa4 100644 --- a/benchmark/results/gpt-oss-20b-F32__vulkan_radv.log +++ b/benchmark/results/gpt-oss-20b-F32__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 319.09 ± 1.46 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 7.79 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 318.04 ± 1.50 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 7.89 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-F32__vulkan_radv__fa1.log b/benchmark/results/gpt-oss-20b-F32__vulkan_radv__fa1.log index a9ce691..072c052 100644 --- a/benchmark/results/gpt-oss-20b-F32__vulkan_radv__fa1.log +++ b/benchmark/results/gpt-oss-20b-F32__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | 
---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 335.15 ± 1.80 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 7.79 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 334.64 ± 1.46 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 7.90 ± 0.01 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log index c377132..59744c9 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 580.83 ± 2.46 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.47 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 581.92 ± 2.00 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.34 ± 0.02 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log index cb2c45b..97f911b 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test 
| t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 649.48 ± 3.21 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 64.18 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 642.40 ± 3.59 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 63.74 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2.log index 343b2b0..2bd619e 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 582.89 ± 2.32 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.45 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 582.94 ± 2.35 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.35 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log index 34d817d..8a01f71 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap 
| test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 394.67 ± 1.08 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 62.97 ± 0.00 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 522.14 ± 1.92 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 62.97 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta.log index 441cec1..7ccf6e0 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 583.52 ± 2.76 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1128.54 ± 2.40 | | gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.39 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log index e5f1e99..9c6c567 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | 
---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 396.75 ± 0.60 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 62.98 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 1005.66 ± 1.52 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 63.07 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0.log new file mode 100644 index 0000000..57a687a --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 585.03 ± 1.84 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.36 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..661d58f --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: 
| --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 528.92 ± 2.02 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 63.00 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..f49d7ff --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1124.54 ± 9.14 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.19 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..82390bf --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 1474.70 ± 11.50 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 63.31 ± 0.01 | + +build: 
34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..b1e54db --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 583.69 ± 2.09 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.26 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..3068d5b --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 642.92 ± 1.97 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 63.28 ± 0.01 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc.log index 97fab79..d848311 100644 --- 
a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 581.83 ± 1.10 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.50 ± 0.02 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 1125.60 ± 1.90 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.35 ± 0.01 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log index 3e34f41..bd4d588 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 394.87 ± 0.73 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 63.06 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 997.74 ± 8.16 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 63.00 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0.log new file mode 100644 
index 0000000..8b8a81a --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 584.02 ± 1.44 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.50 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0__fa1.log new file mode 100644 index 0000000..d0567df --- /dev/null +++ b/benchmark/results/gpt-oss-20b-mxfp4__rocm7_rc__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | pp512 | 525.48 ± 1.39 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | 0 | tg128 | 63.04 ± 0.01 | + +build: 79c1160b (6123) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log index 2d4b788..ec3e361 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 
1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 1205.02 ± 7.18 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 68.84 ± 0.04 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 1218.18 ± 8.08 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 69.76 ± 0.07 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log index 9a5c4c5..fbda2c7 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 1472.56 ± 14.39 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 67.78 ± 0.18 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 1482.59 ± 12.76 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 68.63 ± 0.11 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv.log b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv.log index 
f400d0f..a2342cc 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 648.85 ± 6.28 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 69.88 ± 0.04 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 649.86 ± 5.16 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 70.72 ± 0.04 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log index 1959c7e..d1051a1 100644 --- a/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log +++ b/benchmark/results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 728.38 ± 8.17 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 69.80 ± 0.05 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 728.71 ± 8.40 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 
20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 70.49 ± 0.04 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log index e9da9da..1848af6 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.47 ± 0.04 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.62 ± 0.00 | - -build: cd6983d5 (6119) +HW Exception by GPU node-1 (Agent handle: 0x1fec7050) reason :GPU Hang +✖ ! 
[rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M failed (exit 134) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log index 0388774..d800da3 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log @@ -2,9 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -| model | size | params | backend | ngl | fa | mmap | test | t/s | -| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 34.51 ± 0.02 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.61 ± 0.00 | - -build: cd6983d5 (6119) +HW Exception by GPU node-1 (Agent handle: 0x3e596050) reason :GPU Hang +✖ ! 
[rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M __fa1 failed (exit 134) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log index 01f32df..15e8d8f 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.79 ± 0.03 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.52 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.76 ± 0.04 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.48 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log index f9ae86b..cc4c84c 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.67 ± 0.04 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.69 ± 0.04 | +| llama 70B Q4_K - 
Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.62 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log index f6959d1..ab19f84 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.88 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 99.09 ± 0.10 | | llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.61 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log index 2869c45..9bd37ad 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.67 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 81.54 ± 0.11 | | llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) 
diff --git a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0.log similarity index 60% rename from benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log rename to benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0.log index 784987e..b92eb71 100644 --- a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x299852d0) reason :GPU Hang -✖ ! [rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 failed (exit 134) +HW Exception by GPU node-1 (Agent handle: 0x2595b0b0) reason :GPU Hang +✖ ! [rocm7_beta] llama3.3-70.6B-Q4_K_M __hblt0 failed (exit 134) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0__fa1.log new file mode 100644 index 0000000..3fdb6c0 --- /dev/null +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_beta__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 31.63 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 4.62 ± 0.00 | + +build: 79c1160b (6123) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log 
b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log new file mode 100644 index 0000000..80974fd --- /dev/null +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 99.41 ± 0.11 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.62 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log new file mode 100644 index 0000000..928b750 --- /dev/null +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 106.70 ± 0.12 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.60 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0.log new file mode 100644 index 0000000..d78ff89 --- /dev/null +++ 
b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | pp512 | 33.87 ± 0.08 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | tg128 | 4.61 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0__fa1.log new file mode 100644 index 0000000..1757ab6 --- /dev/null +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__hblt0__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 34.48 ± 0.05 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 4.61 ± 0.00 | + +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log index 6bd1b01..5075f85 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, 
Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.91 ± 0.03 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.61 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 99.16 ± 0.09 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.62 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log index 77dd920..afc44e9 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.66 ± 0.04 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 81.56 ± 0.09 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.62 ± 0.00 | -build: cd6983d5 (6119) +build: 79c1160b (6123) diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0.log similarity index 73% rename from benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log rename to benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0.log index 
d44f4f5..9718332 100644 --- a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0.log @@ -2,4 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 failed (exit 134) +✖ ! [rocm7_rc] llama3.3-70.6B-Q4_K_M __hblt0 failed (exit 134) diff --git a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0__fa1.log similarity index 72% rename from benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log rename to benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0__fa1.log index d77c334..9d5bb3c 100644 --- a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__rocm7_rc__hblt0__fa1.log @@ -2,4 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -✖ ! [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134) +✖ ! 
[rocm7_rc] llama3.3-70.6B-Q4_K_M __hblt0__fa1 failed (exit 134) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log index bc604f8..f70a707 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 72.75 ± 0.02 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.03 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 72.73 ± 0.05 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.08 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log index 7ac44cb..d13dc99 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 
0 | pp512 | 73.57 ± 0.02 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 5.00 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 73.47 ± 0.03 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 5.04 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log index 4cc5212..5ccc1cf 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 78.99 ± 0.18 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.00 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 78.79 ± 0.21 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.04 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log index 869327e..375bd60 100644 --- a/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log +++ b/benchmark/results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: 
KHR_coopmat | model | size | params | backend | ngl | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 80.92 ± 0.05 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 4.99 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 80.58 ± 0.13 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 5.03 ± 0.00 | -build: cd6983d5 (6119) +build: 34c9d765 (6122) diff --git a/benchmark/results/run_benchmarks.log b/benchmark/results/run_benchmarks.log index 073dde1..98462df 100644 --- a/benchmark/results/run_benchmarks.log +++ b/benchmark/results/run_benchmarks.log @@ -1,4 +1,4 @@ -Found 18 model(s) to bench: +Found 19 model(s) to bench: • /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf • /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf • /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf @@ -16,6 +16,7 @@ Found 18 model(s) to bench: • /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf • /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf • /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + • /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf • /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf @@ -75,13 +76,22 @@ Found 18 model(s) to bench: → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf - * [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 : FAILED ▶ [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 __fa1 → log: 
results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + ▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf @@ -142,6 +152,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_rc-rocwaam] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + ▶ [rocm7_rc] gemma-3-4b-it-Q3_K_S → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf @@ -202,6 +222,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_rc-rocwaam] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + ▶ [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf @@ -216,13 +246,11 @@ Found 18 model(s) to bench: → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf - * [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 : FAILED ▶ [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 - * [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log @@ -234,7 +262,6 @@ Found 18 model(s) to bench: → log: 
results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 - * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED ▶ [vulkan_radv] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log @@ -267,6 +294,16 @@ Found 18 model(s) to bench: * [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED +▶ [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + ▶ [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf @@ -276,7 +313,6 @@ Found 18 model(s) to bench: → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 - * [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED ▶ [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log @@ -287,12 +323,12 
@@ Found 18 model(s) to bench: → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 - * [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED ▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 : FAILED ▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log @@ -329,7 +365,16 @@ Found 18 model(s) to bench: → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 - * [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwaam] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + ▶ [rocm7_rc] gpt-oss-120b-F16 → log: results/gpt-oss-120b-F16__rocm7_rc.log @@ -355,6 +400,7 @@ Found 18 model(s) 
to bench: → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + * [rocm6_4_2-rocwmma] gpt-oss-120b-F16 : FAILED ▶ [rocm6_4_2-rocwmma] gpt-oss-120b-F16 __fa1 → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log @@ -391,6 +437,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_rc-rocwaam] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + ▶ [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf @@ -400,7 +456,6 @@ Found 18 model(s) to bench: → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 - * [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 : FAILED ▶ [rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log @@ -446,12 +501,23 @@ Found 18 model(s) to bench: → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + * [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 : FAILED ▶ [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwaam] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + ▶ [rocm7_rc] gpt-oss-20b-F32 → log: results/gpt-oss-20b-F32__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf @@ -512,6 +578,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_rc-rocwaam] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + ▶ [rocm7_rc] gpt-oss-20b-mxfp4 → log: results/gpt-oss-20b-mxfp4__rocm7_rc.log → cmd: toolbox run -c 
llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf @@ -572,6 +648,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 +▶ [rocm7_rc-rocwaam] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_rc-rocwaam] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + ▶ [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf @@ -581,12 +667,12 @@ Found 18 model(s) to bench: → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + * [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED ▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -598,7 +684,6 @@ Found 
18 model(s) to bench: → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED ▶ [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -640,11 +725,20 @@ Found 18 model(s) to bench: * [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED +▶ [rocm7_rc-rocwaam] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + ▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED ▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -656,7 +750,6 @@ Found 18 model(s) to bench: → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log → cmd: toolbox run -c 
llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED ▶ [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -706,6 +799,16 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 +▶ [rocm7_rc-rocwaam] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + ▶ [rocm7_rc] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf @@ -766,16 +869,26 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 +▶ [rocm7_rc-rocwaam] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 
-m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_rc-rocwaam] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf - * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log @@ -828,7 +941,17 @@ Found 18 model(s) to bench: → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 - * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox 
run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log @@ -839,7 +962,6 @@ Found 18 model(s) to bench: → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 - * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log @@ -896,11 +1018,20 @@ Found 18 model(s) to bench: * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: 
results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log @@ -912,7 +1043,6 @@ Found 18 model(s) to bench: → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log @@ -963,17 +1093,25 @@ Found 18 model(s) to bench: * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwaam] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: 
results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + ▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf - * [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED ▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 - * [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED ▶ [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log @@ -1028,7 +1166,16 @@ Found 18 model(s) to bench: → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 - * [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc-rocwaam] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 
999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwaam] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + ▶ [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log @@ -1039,6 +1186,7 @@ Found 18 model(s) to bench: → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + * [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log @@ -1049,6 +1197,7 @@ Found 18 model(s) to bench: → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + * [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 : FAILED ▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log @@ -1090,6 +1239,86 @@ Found 18 model(s) to bench: → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 +▶ [rocm7_rc-rocwaam] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 
-m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwaam] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_rc] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm7_beta] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_beta] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: 
results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [vulkan_radv] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [vulkan_radv] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm6_4_2] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm7_rc-rocwaam] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwaam.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_rc-rocwaam] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwaam__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + ▶ [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf @@ -1099,7 +1328,6 @@ Found 18 model(s) to bench: → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 - * [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log @@ -1151,3 +1379,14 @@ Found 18 model(s) to bench: → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + * [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc-rocwaam] 
Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwaam.log
+ → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [rocm7_rc-rocwaam] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwaam__fa1.log
+ → cmd: toolbox run -c llama-rocm-7rc-rocwaam -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
+
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..d97d416
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x2edd2a90) reason :GPU Hang
+✖ !
[rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..d044208
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x432ea90) reason :GPU Hang
+✖ ! [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log
new file mode 100644
index 0000000..e1a550e
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 129.88 ± 0.57 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.43 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log
similarity index 50%
rename from
benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log
index 5e3d8f8..268535b 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
 Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-Memory access fault by GPU node-1 (Agent handle: 0x7f5e570) on address 0x7f3192c0f000. Reason: Page not present or supervisor privilege.
-✖ ! [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 failed (exit 134)
+Memory access fault by GPU node-1 (Agent handle: 0x834aa90) on address 0x7f10fb96f000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log
similarity index 81%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log
index c3f4dab..52deb8e 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
 Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x16bd82e0) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0x100d3790) reason :GPU Hang
 ✖ !
[rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log
new file mode 100644
index 0000000..8039123
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+Memory access fault by GPU node-1 (Agent handle: 0x13829790) on address 0x7fa8ef9a9000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log
index 4a8358a..fcf0f01 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 129.20 ± 0.38 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 19.61 ± 0.00 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 130.17 ± 0.38 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 19.83 ± 0.00 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log
new file mode 100644
index 0000000..94079a7
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | pp512 | 103.63 ± 0.10 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 999 | 1 | 0 | tg128 | 20.09 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
index a5b862a..4ef718e 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 0 | pp512 | 199.54 ± 0.38 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 0 | tg128 | 22.75 ± 0.01 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 200.76 ± 0.32 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.78 ± 0.00 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..4bbf6de
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 201.86 ± 0.27 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 22.83 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
index b242732..90347e7 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 0 | pp512 | 128.00 ± 0.23 |
-| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 0 | tg128 | 22.88 ± 0.02 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 127.73 ± 0.23 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 22.88 ± 0.02 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log
new file mode 100644
index 0000000..cf98168
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 132.54 ± 0.34 |
+| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 23.31 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..5dc10c6
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 113.62 ± 0.21 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.47 ± 0.04 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..a0c808c
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x2f508a90) reason :GPU Hang
+✖ !
[rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
index 8519a29..d1de7a1 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | pp512 | 124.86 ± 0.54 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 0 | tg128 | 15.27 ± 0.00 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.82 ± 0.18 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.35 ± 0.00 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log
new file mode 100644
index 0000000..5ed10e0
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+Memory access fault by GPU node-1 (Agent handle: 0x1527fa90) on address 0x7f55d5f6f000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
new file mode 100644
index 0000000..273166e
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 120.54 ± 0.30 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.49 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log
new file mode 100644
index 0000000..c23fe13
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x2a849790) reason :GPU Hang
+✖ !
[rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log
new file mode 100644
index 0000000..5fbf5b3
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | pp512 | 124.18 ± 0.48 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 999 | 0 | tg128 | 15.49 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log
new file mode 100644
index 0000000..28ae734
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ !
[rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
index 0fb7ad2..4247170 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 0 | pp512 | 221.02 ± 0.58 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 0 | tg128 | 16.47 ± 0.01 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 223.02 ± 0.69 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.47 ± 0.01 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..e3bc753
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 224.54 ± 0.65 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 16.49 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
rename to benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
index 9f7f467..5f0ace5 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 0 | pp512 | 126.86 ± 0.40 |
-| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 0 | tg128 | 16.76 ± 0.00 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | pp512 | 127.36 ± 0.46 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 0 | tg128 | 16.78 ± 0.01 |
 
-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log
new file mode 100644
index 0000000..1973a52
--- /dev/null
+++ b/benchmark/results_08-08-2025/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | pp512 | 131.78 ± 0.46 |
+| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 999 | 1 | 0 | tg128 | 16.99 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..135d108
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x121f0a90) reason :GPU Hang
+✖ !
[rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..29b2095
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x17018a90) reason :GPU Hang
+✖ ! [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
similarity index 79%
rename from benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
rename to benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
index 3cb770a..08dae7b 100644
--- a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x68b7b10) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0x11442a90) reason :GPU Hang
 ✖ !
[rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log
new file mode 100644
index 0000000..1849a77
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x64dea90) reason :GPU Hang
+✖ ! [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
similarity index 79%
rename from benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
rename to benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
index d0101aa..e01b520 100644
--- a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log
@@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x1587b430) reason :GPU Hang
+HW Exception by GPU node-1 (Agent handle: 0xa636790) reason :GPU Hang
 ✖ !
[rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log
new file mode 100644
index 0000000..2f2342b
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x1417b7b0) reason :GPU Hang
+✖ ! [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log
new file mode 100644
index 0000000..c479337
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | pp512 | 33.30 ± 0.04 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 0 | tg128 | 2.64 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log
new file mode 100644
index 0000000..7b0ea20
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | pp512 | 31.09 ± 0.02 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | ROCm | 999 | 1 | 0 | tg128 | 2.65 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log
similarity index 84%
rename from benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log
index 7d3b718..4581b23 100644
--- a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log
@@ -4,5 +4,5 @@ ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16:
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
 ggml_vulkan: Device memory allocation of size 2491416576 failed.
 ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
-main: error: failed to load model '/home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf'
+main: error: failed to load model '/mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf'
 ✖ !
[vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 failed (exit 1)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..8835330
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+ggml_vulkan: Device memory allocation of size 2491416576 failed.
+ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
+main: error: failed to load model '/mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf'
+✖ !
[vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 1)
diff --git a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log
rename to benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log
index f9cf4af..c6c72c5 100644
--- a/benchmark/results_old/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 99 | 0 | pp512 | 76.48 ± 0.23 |
-| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 99 | 0 | tg128 | 2.65 ± 0.00 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | pp512 | 78.70 ± 0.20 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 0 | tg128 | 2.66 ± 0.00 |
 
-build: 66625a59 (6040)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log
new file mode 100644
index 0000000..ea12120
--- /dev/null
+++ b/benchmark/results_08-08-2025/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | pp512 | 81.29 ± 0.14 |
+| qwen2 70B Q8_0 | 78.21 GiB | 72.71 B | Vulkan | 999 | 1 | 0 | tg128 | 2.66 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..a418f5b
--- /dev/null
+++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0xcd80a90) reason :GPU Hang
+✖ ! [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..3de552f
--- /dev/null
+++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x1496da90) reason :GPU Hang
+✖ !
[rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log index 1d1603d..409a36b 100644 --- a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 0 | pp512 | 33.17 ± 0.07 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 0 | tg128 | 2.72 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.32 ± 0.04 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 2.73 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..952dddb --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, 
Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.28 ± 0.02 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 2.74 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log similarity index 81% rename from benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log index 0d64f23..2acc073 100644 --- a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0xa5e9440) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0xfeef7b0) reason :GPU Hang ✖ ! 
[rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..7a57ad3 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x6d017c0) on address 0x7f967f1a9000. Reason: Page not present or supervisor privilege. +✖ ! [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log similarity index 100% rename from benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..c55bab8 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log index 747dc38..eb3efec 100644 --- a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 99 | 0 | pp512 | 96.23 ± 0.16 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 99 | 0 | tg128 | 2.72 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 98.14 ± 0.14 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 2.73 ± 0.00 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..966e109 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = 
Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 99.24 ± 0.16 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.72 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log rename to benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log index 22a6a30..80c3a0e 100644 --- a/benchmark/results_old/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 99 | 0 | pp512 | 79.71 ± 0.13 | -| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 99 | 0 | tg128 | 2.72 ± 0.00 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 80.11 ± 0.09 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 2.73 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 
(6119) diff --git a/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..5826f3e --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 82.90 ± 0.14 | +| llama 70B Q8_0 | 75.65 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 2.73 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..40f418b --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x28bb9a90) reason :GPU Hang +✖ ! 
[rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..a94cdd6 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x194fea90) reason :GPU Hang +✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log index 33fcb65..f7132fb 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 121.52 ± 0.98 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | 
ROCm | 99 | 0 | tg128 | 14.28 ± 0.00 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 134.39 ± 0.32 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.33 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..53feea1 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x3b11ea90) reason :GPU Hang +✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log similarity index 82% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log index 535626f..6d3b4ea 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x27159430) reason :GPU Hang +HW Exception by GPU 
node-1 (Agent handle: 0x17ad57b0) reason :GPU Hang ✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..107b01e --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x2314b7b0) on address 0x7f38249a9000. Reason: Page not present or supervisor privilege. +✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log index 995bcbd..ccf7ac1 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | 
pp512 | 135.36 ± 0.39 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 14.29 ± 0.00 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 135.25 ± 0.50 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 14.43 ± 0.00 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..8df0b3e --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log index f1d30fc..dc80b9d 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s 
| | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 243.19 ± 1.20 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 99 | 0 | tg128 | 15.28 ± 0.03 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 243.45 ± 1.29 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.29 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..08242f2 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 247.48 ± 1.28 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.03 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log rename to 
benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log index 89e18de..ba7a655 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 137.97 ± 0.99 | -| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 99 | 0 | tg128 | 15.07 ± 0.05 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 148.25 ± 0.91 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 15.21 ± 0.06 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..14f12dd --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| 
llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 149.82 ± 0.83 | +| llama4 17Bx16E (Scout) Q6_K | 82.35 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 15.21 ± 0.04 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..2faeaa3 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x9ae6a90) reason :GPU Hang +✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..6ff4745 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x6e9ba90) reason :GPU Hang +✖ ! 
[rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log new file mode 100644 index 0000000..8678b7b --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 135.44 ± 0.76 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 11.61 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log similarity index 74% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log index b7b6ab3..099d9b2 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent 
handle: 0x2b17db10) reason :GPU Hang -✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) +HW Exception by GPU node-1 (Agent handle: 0x2fba3a90) reason :GPU Hang +✖ ! [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log similarity index 80% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log index 4981e9d..c768b8e 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x1a77430) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x4081f7b0) reason :GPU Hang ✖ ! 
[rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log new file mode 100644 index 0000000..98c472e --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x3c0f27b0) reason :GPU Hang +✖ ! [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log similarity index 100% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log new file mode 100644 index 0000000..3ccfa82 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +✖ ! 
[rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log index cda78f6..3bdeae7 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 238.93 ± 2.89 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 99 | 0 | tg128 | 12.25 ± 0.01 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 258.18 ± 1.38 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.23 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..2060565 --- /dev/null +++ 
b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 260.16 ± 1.44 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.09 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log index 6a5f1fb..d9b6ebc 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 145.86 ± 2.44 | -| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | 
Vulkan | 99 | 0 | tg128 | 12.27 ± 0.00 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 168.63 ± 0.81 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 12.26 ± 0.01 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log new file mode 100644 index 0000000..579e532 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 172.37 ± 0.92 | +| llama4 17Bx16E (Scout) Q8_0 | 106.65 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.25 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..070646e --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 
0x1a40fa90) reason :GPU Hang +✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 failed (exit 134) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..3fa46c3 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x2e0ffa90) reason :GPU Hang +✖ ! [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log similarity index 78% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log index 5242c84..3ec496d 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB 
| 107.77 B | ROCm | 99 | 0 | pp512 | 132.66 ± 0.56 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 17.29 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 138.27 ± 0.66 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.40 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..9d0c061 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +HW Exception by GPU node-1 (Agent handle: 0x3a741a90) reason :GPU Hang +✖ ! 
[rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log similarity index 78% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log index c8275dd..fb93137 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | pp512 | 133.71 ± 0.64 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 99 | 0 | tg128 | 17.35 ± 0.00 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | pp512 | 138.90 ± 0.66 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 0 | tg128 | 17.62 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..ee0d484 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ 
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 123.61 ± 0.50 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.60 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log similarity index 100% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..bde171a --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | pp512 | 
123.58 ± 0.18 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | ROCm | 999 | 1 | 0 | tg128 | 17.55 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log similarity index 78% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log index 72a362e..75ac351 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 208.84 ± 1.35 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 99 | 0 | tg128 | 20.06 ± 0.01 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 218.18 ± 0.83 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 20.04 ± 0.02 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log new file 
mode 100644 index 0000000..a745a31 --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 221.15 ± 0.74 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.58 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log rename to benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log index 71adfea..4b78701 100644 --- a/benchmark/results_old/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama4 17Bx16E (Scout) Q4_K - Medium | 
57.73 GiB | 107.77 B | Vulkan | 99 | 0 | pp512 | 133.49 ± 1.83 | -| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 99 | 0 | tg128 | 19.99 ± 0.01 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | pp512 | 152.21 ± 0.66 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 0 | tg128 | 19.98 ± 0.01 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..ee535dc --- /dev/null +++ b/benchmark/results_08-08-2025/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | pp512 | 155.22 ± 1.09 | +| llama4 17Bx16E (Scout) Q4_K - Medium | 57.73 GiB | 107.77 B | Vulkan | 999 | 1 | 0 | tg128 | 19.93 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..aa6dfe3 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: 
GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x153dfa90) reason :GPU Hang
+✖ ! [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..e2df164
--- /dev/null
+++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+HW Exception by GPU node-1 (Agent handle: 0x2bd2ba90) reason :GPU Hang
+✖ !
[rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log rename to benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log index c21206d..1bc098e 100644 --- a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 99 | 0 | pp512 | 69.48 ± 0.09 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 99 | 0 | tg128 | 13.54 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | pp512 | 74.15 ± 0.18 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | ROCm | 999 | 0 | tg128 | 13.73 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log new file mode 100644 index 0000000..40b3223 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no 
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x25011a90) on address 0x7fdcc1b6f000. Reason: Page not present or supervisor privilege. +✖ ! [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log similarity index 82% rename from benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log rename to benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log index 6cb77a4..b5a6749 100644 --- a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x1a8d440) reason :GPU Hang +HW Exception by GPU node-1 (Agent handle: 0x513c7b0) reason :GPU Hang ✖ ! 
[rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log
new file mode 100644
index 0000000..7826050
--- /dev/null
+++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log
@@ -0,0 +1,6 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+Memory access fault by GPU node-1 (Agent handle: 0x2567c7c0) on address 0x7ee66236f000. Reason: Page not present or supervisor privilege.
+✖ ! [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log
new file mode 100644
index 0000000..dbd9c47
--- /dev/null
+++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ !
[rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 failed (exit 134)
diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log
new file mode 100644
index 0000000..57b950a
--- /dev/null
+++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log
@@ -0,0 +1,5 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+✖ ! [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log
index dba1565..af5c138 100644
--- a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 99 | 0 | pp512 | 99.94 ±
0.91 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 99 | 0 | tg128 | 15.72 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 114.49 ± 0.60 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 15.98 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..19e5e37 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 116.07 ± 0.64 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 15.84 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log rename to benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log index 11f7672..2aefda4 100644 --- 
a/benchmark/results_old/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 99 | 0 | pp512 | 58.40 ± 0.21 | -| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 99 | 0 | tg128 | 16.29 ± 0.01 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | pp512 | 64.85 ± 0.38 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 0 | tg128 | 16.58 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log new file mode 100644 index 0000000..c0359f0 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | pp512 | 
66.76 ± 0.43 | +| qwen3moe 235B.A22B Q3_K - Medium | 96.99 GiB | 235.09 B | Vulkan | 999 | 1 | 0 | tg128 | 16.83 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..3c0cef6 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 157.95 ± 2.63 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.53 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..af7dc3f --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 162.19 ± 3.06 | +| qwen3moe 30B.A3B BF16 | 56.89 
GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.03 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log index e10adbb..03365ca 100644 --- a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 157.74 ± 2.65 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 22.88 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 157.69 ± 2.52 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 23.89 ± 0.01 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..86ac559 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| 
------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 140.32 ± 2.10 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.33 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log similarity index 79% rename from benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log index bb3fa29..ea26bd0 100644 --- a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 151.25 ± 3.33 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 23.80 ± 0.09 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 153.49 ± 1.19 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.52 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..bb2103f --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: 
GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 138.49 ± 2.52 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.35 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log similarity index 79% rename from benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log index 47cba1d..e446a9b 100644 --- a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 154.95 ± 1.58 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 23.08 ± 0.08 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 152.26 ± 2.41 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.55 ± 0.00 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log 
b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..d73c640 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 137.52 ± 1.75 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.33 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log index c75a868..1687c7e 100644 --- a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | pp512 | 90.91 ± 0.35 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 
0 | tg128 | 7.96 ± 0.03 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 107.48 ± 0.16 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 8.04 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..a9a752b --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 107.64 ± 0.13 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.96 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log rename to benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log index ef72d92..ccca043 100644 --- a/benchmark/results_old/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | 
matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | pp512 | 71.16 ± 0.92 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | tg128 | 7.33 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 85.97 ± 0.12 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 7.38 ± 0.01 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..48148ef --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 87.05 ± 0.10 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.40 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..dc6f1a9 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: 
GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.77 ± 0.97 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.31 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..60992bd --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 412.35 ± 1.06 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 48.26 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log new file mode 100644 index 0000000..bd9bc1c --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: 
GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.72 ± 2.63 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.19 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log new file mode 100644 index 0000000..2a04531 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 301.29 ± 0.54 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.58 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log new file mode 100644 index 0000000..a3987ef --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: 
AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 390.07 ± 0.40 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.19 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log new file mode 100644 index 0000000..a9ca9ef --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 300.60 ± 2.31 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.78 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log new file mode 100644 index 0000000..8ff09a7 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params 
| backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 388.99 ± 1.86 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 50.31 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log new file mode 100644 index 0000000..db6f9b0 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 302.87 ± 0.88 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 49.90 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log new file mode 100644 index 0000000..51b45f0 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | mmap | test | t/s | +| 
------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 736.95 ± 3.72 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 56.89 ± 0.26 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..3f2a08e --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 727.71 ± 2.81 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 53.34 ± 0.31 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log new file mode 100644 index 0000000..5140ff3 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | 
---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 395.16 ± 1.55 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 58.95 ± 0.45 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log new file mode 100644 index 0000000..6bbc4f2 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 405.61 ± 1.85 | +| qwen3moe 30B.A3B Q6_K | 24.53 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 58.06 ± 0.28 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..6625574 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | 
---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 150.50 ± 1.69 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.55 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..ded0220 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 154.09 ± 1.98 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.02 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log index 445b37e..222959d 100644 --- a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: 
found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 150.53 ± 1.83 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 22.13 ± 0.00 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 150.34 ± 1.74 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.14 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..207a2a1 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | pp512 | 134.40 ± 1.47 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 1 | 0 | tg128 | 24.32 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log similarity index 79% rename from 
benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log index 1204c49..cc48f94 100644 --- a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 147.31 ± 2.22 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 24.12 ± 0.06 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 146.55 ± 1.77 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.54 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..285bed2 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,6 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +Memory access fault by GPU node-1 (Agent handle: 0x2bd8a7b0) on address 0x7fe0b0d6f000. Reason: Page not present or supervisor privilege. +✖ ! 
[rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log similarity index 79% rename from benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log index b366cae..29fa537 100644 --- a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | pp512 | 144.59 ± 3.08 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 99 | 0 | tg128 | 23.48 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | pp512 | 145.91 ± 1.76 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | ROCm | 999 | 0 | tg128 | 24.57 ± 0.01 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..1416318 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,5 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), 
VMM: no, Wave Size: 32 +✖ ! [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 failed (exit 134) diff --git a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log index 33fe404..65ecb3e 100644 --- a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | pp512 | 90.38 ± 0.57 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | tg128 | 8.00 ± 0.03 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 106.99 ± 0.10 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | tg128 | 8.03 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..2b69233 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ 
+ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 107.10 ± 0.08 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.98 ± 0.02 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log rename to benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log index cd83c90..3a2d167 100644 --- a/benchmark/results_old/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | pp512 | 71.53 ± 1.06 | -| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 99 | 0 | tg128 | 7.34 ± 0.01 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 0 | pp512 | 85.50 ± 0.06 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB 
| 30.53 B | Vulkan | 999 | 0 | tg128 | 7.42 ± 0.01 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..9132fa2 --- /dev/null +++ b/benchmark/results_08-08-2025/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | pp512 | 86.52 ± 0.06 | +| qwen3moe 30B.A3B BF16 | 56.89 GiB | 30.53 B | Vulkan | 999 | 1 | 0 | tg128 | 7.40 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..9f1e992 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 223.38 ± 0.29 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 
13.86 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..348f5ed --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 229.77 ± 0.32 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.59 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log rename to benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log index 21a2b99..5872035 100644 --- a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 223.36 ± 0.23 | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.81 ± 0.00 | +| gemma3 12B Q8_0 | 13.40 
GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.86 ± 0.11 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.85 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log new file mode 100644 index 0000000..de6f8de --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 202.13 ± 0.24 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.58 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log similarity index 79% rename from benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log rename to benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log index fc2cc5b..6493650 100644 --- a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 
222.95 ± 0.15 | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.80 ± 0.00 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.67 ± 0.37 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.88 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log new file mode 100644 index 0000000..a535d64 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 203.12 ± 0.35 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.60 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log similarity index 79% rename from benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log rename to benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log index acf4970..f1ec100 100644 --- a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | 
---: | --------------: | -------------------: | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | pp512 | 222.99 ± 0.24 | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 99 | 0 | tg128 | 13.81 ± 0.00 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | pp512 | 222.49 ± 0.29 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 0 | tg128 | 13.86 ± 0.00 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log new file mode 100644 index 0000000..f4493e0 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | pp512 | 201.47 ± 0.21 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | ROCm | 999 | 1 | 0 | tg128 | 13.61 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log index 2ba5269..5ac352f 100644 --- a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | 
fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 99 | 0 | pp512 | 683.07 ± 1.03 | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 99 | 0 | tg128 | 13.84 ± 0.02 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 676.94 ± 0.85 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 13.99 ± 0.01 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..b3193bd --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 371.17 ± 0.24 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 12.30 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log similarity index 79% rename from benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log rename to benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log index 5d31829..b620676 100644 --- 
a/benchmark/results_old/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 99 | 0 | pp512 | 508.55 ± 0.90 | -| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 99 | 0 | tg128 | 13.65 ± 0.02 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | pp512 | 503.27 ± 1.09 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 0 | tg128 | 13.76 ± 0.02 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log new file mode 100644 index 0000000..5e9431a --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | pp512 | 495.99 ± 2.36 | +| gemma3 12B Q8_0 | 13.40 GiB | 11.77 B | Vulkan | 999 | 1 | 0 | tg128 | 13.61 ± 0.03 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log 
b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log similarity index 79% rename from benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log index bbf9e04..96d541d 100644 --- a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 88.73 ± 0.50 | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 4.02 ± 0.00 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 92.52 ± 0.44 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.05 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..a5e826d --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 
B | ROCm | 999 | 1 | 0 | pp512 | 94.54 ± 0.52 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.03 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-120b-F16__rocm6_4_2.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log similarity index 58% rename from benchmark/results_old/gpt-oss-120b-F16__rocm6_4_2.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log index 8f074d4..c646996 100644 --- a/benchmark/results_old/gpt-oss-120b-F16__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log @@ -2,5 +2,5 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 -HW Exception by GPU node-1 (Agent handle: 0x394d3570) reason :GPU Hang -✖ ! [rocm6_4_2] gpt-oss-120b-F16 failed (exit 134) +HW Exception by GPU node-1 (Agent handle: 0x10c4a90) reason :GPU Hang +✖ ! 
[rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 failed (exit 134) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log new file mode 100644 index 0000000..d3b262b --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.75 ± 0.35 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log similarity index 79% rename from benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log index a664b0b..aaaffba 100644 --- a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 82.31 ± 0.29 | -| gemma3 27B BF16 | 50.31 
GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 3.99 ± 0.01 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 91.54 ± 0.50 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 4.04 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log new file mode 100644 index 0000000..18449f7 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.61 ± 0.31 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log similarity index 79% rename from benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log index 8ab75d9..9e9f25c 100644 --- a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | 
---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | pp512 | 83.18 ± 0.41 | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 99 | 0 | tg128 | 3.99 ± 0.00 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | pp512 | 55.68 ± 0.47 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 0 | tg128 | 3.11 ± 0.98 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log new file mode 100644 index 0000000..f7ce012 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | pp512 | 83.08 ± 0.42 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | ROCm | 999 | 1 | 0 | tg128 | 4.04 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log similarity index 85% rename from benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log index 45f0b37..d74242d 100644 --- a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log @@ -4,5 +4,5 @@ 
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | ggml_vulkan: Device memory allocation of size 2819260416 failed. ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory -main: error: failed to load model '/home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf' +main: error: failed to load model '/mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf' ✖ ! [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 failed (exit 1) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..a667917 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +ggml_vulkan: Device memory allocation of size 2819260416 failed. +ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory +main: error: failed to load model '/mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf' +✖ ! 
[vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 __fa1 failed (exit 1) diff --git a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log similarity index 79% rename from benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log rename to benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log index 0dccabf..0c3a407 100644 --- a/benchmark/results_old/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 99 | 0 | pp512 | 135.40 ± 0.29 | -| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 99 | 0 | tg128 | 3.98 ± 0.00 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | pp512 | 135.58 ± 0.45 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 0 | tg128 | 4.00 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log new file mode 100644 index 0000000..f2077af --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat 
+| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | pp512 | 138.61 ± 0.55 | +| gemma3 27B BF16 | 50.31 GiB | 27.01 B | Vulkan | 999 | 1 | 0 | tg128 | 4.00 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..34e7e86 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 729.91 ± 1.22 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.14 ± 0.03 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..16dd036 --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | 
---: | --------------: | -------------------: | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 752.25 ± 0.73 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.93 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log rename to benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log index 059fbe3..f07fba3 100644 --- a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 729.02 ± 0.82 | -| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 76.04 ± 0.03 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 730.51 ± 1.49 | +| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.35 ± 0.02 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log new file mode 100644 index 0000000..08d39fe --- /dev/null +++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | 
mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 645.88 ± 0.61 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.63 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_beta.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta.log
similarity index 79%
rename from benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_beta.log
rename to benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta.log
index 67c76bf..ea76e52 100644
--- a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 729.93 ± 1.29 |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 76.52 ± 0.03 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 732.13 ± 1.42 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.23 ± 0.03 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log
new file mode 100644
index 0000000..76e9619
--- /dev/null
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 652.29 ± 0.45 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.62 ± 0.02 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_rc.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc.log
similarity index 79%
rename from benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_rc.log
rename to benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc.log
index 7fb6f1c..ce94640 100644
--- a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__rocm7_rc.log
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | pp512 | 728.63 ± 1.23 |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 99 | 0 | tg128 | 75.59 ± 0.03 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | pp512 | 730.59 ± 1.69 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 0 | tg128 | 76.01 ± 0.03 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log
new file mode 100644
index 0000000..4c71363
--- /dev/null
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | pp512 | 646.16 ± 0.39 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | ROCm | 999 | 1 | 0 | tg128 | 69.53 ± 0.02 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log
index 2cfca97..b707702 100644
--- a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 99 | 0 | pp512 | 1616.55 ± 4.61 |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 99 | 0 | tg128 | 83.89 ± 0.22 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1614.72 ± 4.91 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 84.00 ± 0.23 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..6055d96
--- /dev/null
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | pp512 | 942.34 ± 1.76 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 57.70 ± 0.22 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_radv.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_radv.log
rename to benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv.log
index 1e319e2..5a56858 100644
--- a/benchmark/results_old/gemma-3-4b-it-Q3_K_S__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 99 | 0 | pp512 | 1520.07 ± 5.39 |
-| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 99 | 0 | tg128 | 85.93 ± 0.09 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | pp512 | 1527.75 ± 3.86 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 0 | tg128 | 85.54 ± 0.99 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log
new file mode 100644
index 0000000..ab5608b
--- /dev/null
+++ b/benchmark/results_08-08-2025/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | pp512 | 1489.57 ± 4.71 |
+| gemma3 4B Q3_K - Small | 1.80 GiB | 3.88 B | Vulkan | 999 | 1 | 0 | tg128 | 80.63 ± 0.22 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..2301d16
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 355.01 ± 0.57 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.66 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..dbb739d
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 411.33 ± 1.01 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.50 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2.log
new file mode 100644
index 0000000..fc1ded3
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.36 ± 0.53 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 31.90 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2__fa1.log
new file mode 100644
index 0000000..a62923c
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm6_4_2__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 247.95 ± 0.40 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.04 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-F16__rocm7_beta.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-F16__rocm7_beta.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta.log
index 035fbc5..4e2c281 100644
--- a/benchmark/results_old/gpt-oss-120b-F16__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 357.68 ± 1.49 |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.70 ± 0.01 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 357.38 ± 0.76 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.62 ± 0.00 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta__fa1.log
new file mode 100644
index 0000000..707b558
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_beta__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 249.65 ± 0.33 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.04 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-F16__rocm7_rc.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-F16__rocm7_rc.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc.log
index 6e2747e..63dd9d9 100644
--- a/benchmark/results_old/gpt-oss-120b-F16__rocm7_rc.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 355.47 ± 0.55 |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 33.65 ± 0.00 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 356.67 ± 0.74 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 33.68 ± 0.02 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc__fa1.log
new file mode 100644
index 0000000..8096c36
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__rocm7_rc__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 247.49 ± 0.65 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 33.07 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-F16__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-F16__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk.log
index 10f4c58..755a9cf 100644
--- a/benchmark/results_old/gpt-oss-120b-F16__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 99 | 0 | pp512 | 449.22 ± 1.12 |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 99 | 0 | tg128 | 33.49 ± 0.05 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 448.17 ± 1.37 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.39 ± 0.03 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..152170f
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 498.69 ± 2.19 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.06 ± 0.03 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-F16__vulkan_radv.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-F16__vulkan_radv.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv.log
index 9d49924..5ab95e4 100644
--- a/benchmark/results_old/gpt-oss-120b-F16__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 99 | 0 | pp512 | 230.32 ± 0.72 |
-| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 99 | 0 | tg128 | 33.06 ± 0.02 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 229.59 ± 0.74 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 33.08 ± 0.01 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv__fa1.log
new file mode 100644
index 0000000..9d830ae
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-F16__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 243.40 ± 0.99 |
+| gpt-oss ?B F16 | 60.87 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 33.07 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..3f432b9
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.53 ± 0.62 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.05 ± 0.08 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..fa4767b
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 408.50 ± 1.91 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.69 ± 0.18 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log
index 378cd44..c1f2f78 100644
--- a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 352.53 ± 1.06 |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 43.56 ± 0.00 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.45 ± 1.22 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 44.12 ± 0.01 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log
new file mode 100644
index 0000000..769bedc
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 246.76 ± 0.35 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 43.67 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log
new file mode 100644
index 0000000..3892e39
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 354.82 ± 1.02 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.00 ± 0.01 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log
new file mode 100644
index 0000000..69476e2
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | pp512 | 248.22 ± 0.50 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 1 | 0 | tg128 | 44.05 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log
index f5968ae..3a57ced 100644
--- a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | pp512 | 351.08 ± 0.86 |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 0 | tg128 | 44.63 ± 0.03 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | pp512 | 353.20 ± 0.59 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 999 | 0 | tg128 | 45.15 ± 0.01 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log
similarity index 60%
rename from benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log
index 4391361..93e7fca 100644
--- a/benchmark/results_old/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log
@@ -2,5 +2,4 @@ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
 ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
 ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
-HW Exception by GPU node-1 (Agent handle: 0x2a5da2e0) reason :GPU Hang
-✖ ! [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 failed (exit 134)
+✖ ! [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 failed (exit 134)
diff --git a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log
index 2dc8a85..d229658 100644
--- a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 0 | pp512 | 485.98 ± 2.23 |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 0 | tg128 | 48.09 ± 0.04 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 486.90 ± 2.23 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 48.08 ± 0.03 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log
new file mode 100644
index 0000000..b556c96
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 546.41 ± 2.88 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 47.25 ± 0.02 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log
rename to benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log
index 19e9a00..802c652 100644
--- a/benchmark/results_old/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log
@@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices:
 ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 0 | pp512 | 239.16 ± 1.26 |
-| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 0 | tg128 | 48.93 ± 0.06 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | pp512 | 239.72 ± 1.23 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 0 | tg128 | 49.01 ± 0.06 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log
new file mode 100644
index 0000000..6b8a8c4
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log
@@ -0,0 +1,8 @@
+ggml_vulkan: Found 1 Vulkan devices:
+ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | pp512 | 255.17 ± 1.65 |
+| gpt-oss ?B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 999 | 1 | 0 | tg128 | 48.93 ± 0.02 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log
new file mode 100644
index 0000000..e1b0205
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 324.54 ± 4.39 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.87 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log
new file mode 100644
index 0000000..8e851e8
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 380.87 ± 8.21 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.79 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-20b-F32__rocm6_4_2.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-20b-F32__rocm6_4_2.log
rename to benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2.log
index ffdd70e..d9bd7eb 100644
--- a/benchmark/results_old/gpt-oss-20b-F32__rocm6_4_2.log
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 323.64 ± 4.29 |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.64 ± 0.06 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 323.86 ± 4.33 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.27 ± 0.00 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2__fa1.log
new file mode 100644
index 0000000..266806c
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm6_4_2__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 257.11 ± 2.63 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.47 ± 0.08 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-20b-F32__rocm7_beta.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-20b-F32__rocm7_beta.log
rename to benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta.log
index 40fd017..d76138e 100644
--- a/benchmark/results_old/gpt-oss-20b-F32__rocm7_beta.log
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 324.15 ± 3.76 |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.90 ± 0.00 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 322.43 ± 2.59 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.89 ± 0.00 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta__fa1.log
new file mode 100644
index 0000000..6dd4954
--- /dev/null
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_beta__fa1.log
@@ -0,0 +1,10 @@
+ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
+ggml_cuda_init: found 1 ROCm devices:
+ Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
+| model | size | params | backend | ngl | fa | mmap | test | t/s |
+| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 254.08 ± 3.99 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.62 ± 0.00 |
+
+build: cd6983d5 (6119)
diff --git a/benchmark/results_old/gpt-oss-20b-F32__rocm7_rc.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc.log
similarity index 79%
rename from benchmark/results_old/gpt-oss-20b-F32__rocm7_rc.log
rename to benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc.log
index 272ca13..67b820b 100644
--- a/benchmark/results_old/gpt-oss-20b-F32__rocm7_rc.log
+++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc.log
@@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
 | model | size | params | backend | ngl | mmap | test | t/s |
 | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 324.27 ± 5.39 |
-| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 26.86 ± 0.00 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 319.36 ± 3.07 |
+| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 26.88 ± 0.00 |

-build: 0d883154 (6101)
+build: cd6983d5 (6119)
diff --git
a/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc__fa1.log new file mode 100644 index 0000000..e07a069 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 254.87 ± 2.27 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 26.62 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-F32__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-F32__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk.log index e8395dc..52536d1 100644 --- a/benchmark/results_old/gpt-oss-20b-F32__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 369.86 ± 1.57 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 8.59 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 
369.69 ± 1.79 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 8.59 ± 0.01 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..974e845 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 389.86 ± 2.13 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 8.58 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-F32__vulkan_radv.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-F32__vulkan_radv.log rename to benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv.log index 41c0d3f..7decf08 100644 --- a/benchmark/results_old/gpt-oss-20b-F32__vulkan_radv.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 
| 318.82 ± 1.63 | -| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 7.77 ± 0.01 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 319.09 ± 1.46 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 7.79 ± 0.01 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv__fa1.log new file mode 100644 index 0000000..a9ce691 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-F32__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 335.15 ± 1.80 | +| gpt-oss ?B BF16 | 38.97 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 7.79 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log new file mode 100644 index 0000000..c377132 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 
| 580.83 ± 2.46 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.47 ± 0.02 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..cb2c45b --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 649.48 ± 3.21 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 64.18 ± 0.02 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm6_4_2.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-mxfp4__rocm6_4_2.log rename to benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2.log index 03d8cf8..343b2b0 100644 --- a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 580.67 ± 2.03 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 
64.26 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 582.89 ± 2.32 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.45 ± 0.02 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log new file mode 100644 index 0000000..34d817d --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 394.67 ± 1.08 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 62.97 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_beta.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_beta.log rename to benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta.log index a4f7d4c..441cec1 100644 --- a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_beta.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 
584.04 ± 2.48 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.37 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 583.52 ± 2.76 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.39 ± 0.01 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log new file mode 100644 index 0000000..e5f1e99 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 396.75 ± 0.60 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 62.98 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_rc.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_rc.log rename to benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc.log index 7584083..97fab79 100644 --- a/benchmark/results_old/gpt-oss-20b-mxfp4__rocm7_rc.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | 
-| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 584.15 ± 2.11 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 64.38 ± 0.01 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | pp512 | 581.83 ± 1.10 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 0 | tg128 | 64.50 ± 0.02 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log new file mode 100644 index 0000000..3e34f41 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | pp512 | 394.87 ± 0.73 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 1 | 0 | tg128 | 63.06 ± 0.01 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_amdvlk.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk.log index 60e3b9f..2d4b788 100644 --- a/benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: 
KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 1206.08 ± 8.80 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 68.90 ± 0.18 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 1205.02 ± 7.18 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 68.84 ± 0.04 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..9a5c4c5 --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 1472.56 ± 14.39 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 67.78 ± 0.18 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_radv.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv.log similarity index 79% rename from benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_radv.log rename to benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv.log index d9302e5..f400d0f 100644 --- a/benchmark/results_old/gpt-oss-20b-mxfp4__vulkan_radv.log +++ 
b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 646.77 ± 4.63 | -| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 69.82 ± 0.03 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | pp512 | 648.85 ± 6.28 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 0 | tg128 | 69.88 ± 0.04 | -build: 0d883154 (6101) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log new file mode 100644 index 0000000..1959c7e --- /dev/null +++ b/benchmark/results_08-08-2025/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | pp512 | 728.38 ± 8.17 | +| gpt-oss ?B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 999 | 1 | 0 | tg128 | 69.80 ± 0.05 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log new file mode 100644 
index 0000000..e9da9da --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.47 ± 0.04 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.62 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log new file mode 100644 index 0000000..0388774 --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 34.51 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.61 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log similarity index 79% rename from benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log rename to benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log index 
cd91f9d..01f32df 100644 --- a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | pp512 | 33.89 ± 0.03 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | tg128 | 4.59 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.79 ± 0.03 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.52 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log new file mode 100644 index 0000000..f9ae86b --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.67 ± 0.04 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_beta.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta.log similarity index 79% rename from 
benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_beta.log rename to benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta.log index cdd01d1..f6959d1 100644 --- a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_beta.log +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | pp512 | 33.91 ± 0.04 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | tg128 | 4.60 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.88 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.61 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log new file mode 100644 index 0000000..2869c45 --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.67 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | + +build: cd6983d5 (6119) diff --git 
a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_rc.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc.log similarity index 79% rename from benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_rc.log rename to benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc.log index 782d37e..6bd1b01 100644 --- a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__rocm7_rc.log +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc.log @@ -4,7 +4,7 @@ ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | pp512 | 33.82 ± 0.05 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 99 | 0 | tg128 | 4.52 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | pp512 | 33.91 ± 0.03 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 0 | tg128 | 4.61 ± 0.00 | -build: 4cb208c9 (6066) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log new file mode 100644 index 0000000..77dd920 --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log @@ -0,0 +1,10 @@ +ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no +ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no +ggml_cuda_init: found 1 ROCm devices: + Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | pp512 | 31.66 ± 0.04 | +| llama 70B 
Q4_K - Medium | 39.59 GiB | 70.55 B | ROCm | 999 | 1 | 0 | tg128 | 4.63 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log similarity index 79% rename from benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log rename to benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log index 2755187..bc604f8 100644 --- a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 99 | 0 | pp512 | 72.75 ± 0.03 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 99 | 0 | tg128 | 5.01 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 72.75 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.03 ± 0.00 | -build: 9c35706b (6060) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log new file mode 100644 index 0000000..7ac44cb --- /dev/null +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | 
backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 73.57 ± 0.02 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 5.00 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_radv.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv.log similarity index 79% rename from benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_radv.log rename to benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv.log index b827d6f..4cc5212 100644 --- a/benchmark/results_old/llama3.3-70.6B-Q4_K_M__vulkan_radv.log +++ b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv.log @@ -2,7 +2,7 @@ ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 99 | 0 | pp512 | 79.12 ± 0.14 | -| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 99 | 0 | tg128 | 4.97 ± 0.00 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | pp512 | 78.99 ± 0.18 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 0 | tg128 | 5.00 ± 0.00 | -build: 66625a59 (6040) +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log new file mode 100644 index 0000000..869327e --- /dev/null +++ 
b/benchmark/results_08-08-2025/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log @@ -0,0 +1,8 @@ +ggml_vulkan: Found 1 Vulkan devices: +ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat +| model | size | params | backend | ngl | fa | mmap | test | t/s | +| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | pp512 | 80.92 ± 0.05 | +| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | Vulkan | 999 | 1 | 0 | tg128 | 4.99 ± 0.00 | + +build: cd6983d5 (6119) diff --git a/benchmark/results_08-08-2025/run_benchmarks.log b/benchmark/results_08-08-2025/run_benchmarks.log new file mode 100644 index 0000000..073dde1 --- /dev/null +++ b/benchmark/results_08-08-2025/run_benchmarks.log @@ -0,0 +1,1153 @@ +Found 18 model(s) to bench: + • /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + • /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + • /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + • /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + • /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + • /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + • /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + • /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + • /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + • /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + • /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + • /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + • /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + • 
/mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + • /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + • /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + • /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + • /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_beta] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: 
results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_radv] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [vulkan_radv] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + * [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 : FAILED + +▶ [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + * [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + * [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 : FAILED + +▶ [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log + → 
cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm7_beta] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_beta] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [vulkan_radv] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [vulkan_radv] 
gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [vulkan_amdvlk] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [vulkan_amdvlk] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm6_4_2] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm7_rc] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_rc] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm7_beta] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_beta] 
gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [vulkan_radv] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [vulkan_radv] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [vulkan_amdvlk] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [vulkan_amdvlk] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm6_4_2] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm6_4_2] gemma-3-4b-it-Q3_K_S __fa1 + → log: 
results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → 
log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log + 
→ cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: 
results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log + → 
cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_rc] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_beta] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [vulkan_radv] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__vulkan_radv__fa1.log + → cmd: 
toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [vulkan_amdvlk] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm6_4_2] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + * [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm7_beta] 
gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [vulkan_radv] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_rc] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_beta] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [vulkan_radv] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [vulkan_amdvlk] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm6_4_2] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_rc] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_beta.log + → cmd: toolbox run -c 
llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_beta] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [vulkan_radv] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [vulkan_amdvlk] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm6_4_2] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [vulkan_radv] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + * [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : 
FAILED + +▶ [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm7_beta] 
Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [vulkan_radv] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm6_4_2] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_rc] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 
+ + +▶ [rocm7_beta] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_beta] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [vulkan_radv] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [vulkan_radv] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [vulkan_amdvlk] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [vulkan_amdvlk] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [rocm6_4_2] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm6_4_2] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log + → cmd: 
toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 : FAILED + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [vulkan_radv] 
Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: 
results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: 
results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + * [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED + +▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_beta] 
Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + * [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED + +▶ [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + * [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [vulkan_radv] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: 
results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [vulkan_radv] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 
-mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ 
[vulkan_radv] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [vulkan_radv] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 
+ + +▶ [vulkan_radv] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [vulkan_radv] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 + diff --git a/benchmark/run_benchmarks.log b/benchmark/run_benchmarks.log index 0abefb2..b9e965c 100644 --- a/benchmark/run_benchmarks.log +++ b/benchmark/run_benchmarks.log @@ -1,314 +1,1392 @@ -Found 11 model(s) to bench: - • /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - • /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf - • /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf - • /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf - • /home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf - • /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf - • /home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf - • /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - • /home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf - • /home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf - • /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf - - -▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL - → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - - -▶ [rocm7_beta] gemma-3-12b-it-UD-Q8_K_XL - → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - - -▶ [vulkan_radv] gemma-3-12b-it-UD-Q8_K_XL - → log: 
results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - - -▶ [vulkan_amdvlk] gemma-3-12b-it-UD-Q8_K_XL - → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - - -▶ [rocm6_4_2] gemma-3-12b-it-UD-Q8_K_XL - → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-12b-it-UD-Q8_K_XL/gemma-3-12b-it-UD-Q8_K_XL.gguf - +Found 19 model(s) to bench: + • /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + • /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + • /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + • /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + • /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + • /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + • /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + • /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + • /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + • /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + • /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + • /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + • /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + • /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + • /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + • 
/mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + • /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + • /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + • /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf ▶ [rocm7_rc] gemma-3-27b-it-BF16-00001-of-00002 → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 ▶ [rocm7_beta] gemma-3-27b-it-BF16-00001-of-00002 → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_beta] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 ▶ [vulkan_radv] gemma-3-27b-it-BF16-00001-of-00002 → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [vulkan_radv] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf * [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 : FAILED +▶ [vulkan_amdvlk] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + * [vulkan_amdvlk] 
gemma-3-27b-it-BF16-00001-of-00002 __fa1 : FAILED + ▶ [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/gemma-3-27b-it-BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf - * [host] gemma-3-27b-it-BF16-00001-of-00002 : FAILED +▶ [rocm6_4_2] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] gemma-3-27b-it-BF16-00001-of-00002 __fa1 + → log: results/gemma-3-27b-it-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/BF16/gemma-3-27b-it-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_rc] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm7_beta] gemma-3-12b-it-UD-Q8_K_XL + → log: 
results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_beta] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [vulkan_radv] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [vulkan_radv] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [vulkan_amdvlk] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [vulkan_amdvlk] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm6_4_2] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gemma-3-12b-it-UD-Q8_K_XL + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf + + +▶ [rocm7_rc-rocwmma] gemma-3-12b-it-UD-Q8_K_XL __fa1 + → log: results/gemma-3-12b-it-UD-Q8_K_XL__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-12b-it-UD-Q8_K_XL.gguf -fa 1 + + +▶ [rocm7_rc] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_rc] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm7_beta] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_beta] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 
-m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm6_4_2-rocwmma] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [vulkan_radv] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [vulkan_radv] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [vulkan_amdvlk] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [vulkan_amdvlk] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm6_4_2] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm6_4_2] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gemma-3-4b-it-Q3_K_S + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf + + +▶ [rocm7_rc-rocwmma] gemma-3-4b-it-Q3_K_S __fa1 + → log: results/gemma-3-4b-it-Q3_K_S__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gemma-3/gemma-3-4b-it-Q3_K_S.gguf -fa 1 + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench 
-ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q4_K_XL/GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_beta] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 : FAILED + +▶ [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [vulkan_radv] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- 
/usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm6_4_2] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003 __fa1 + → log: results/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/GLM-4.5-Air/UD-Q6_K_XL/GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_rc] gpt-oss-120b-F16 __fa1 + → log: 
results/gpt-oss-120b-F16__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_beta] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + * [rocm6_4_2-rocwmma] gpt-oss-120b-F16 : FAILED + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [vulkan_radv] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [vulkan_amdvlk] gpt-oss-120b-F16 __fa1 + → log: 
results/gpt-oss-120b-F16__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm6_4_2] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gpt-oss-120b-F16 + → log: results/gpt-oss-120b-F16__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf + + +▶ [rocm7_rc-rocwmma] gpt-oss-120b-F16 __fa1 + → log: results/gpt-oss-120b-F16__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-F16.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm7_rc] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ 
[rocm7_beta] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [vulkan_radv] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + * [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 : FAILED + +▶ [rocm6_4_2] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwmma] gpt-oss-120b-mxfp4-00001-of-00003 __fa1 + → log: results/gpt-oss-120b-mxfp4-00001-of-00003__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-120b/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_rc] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_beta] gpt-oss-20b-F32 
__fa1 + → log: results/gpt-oss-20b-F32__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [vulkan_radv] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [vulkan_amdvlk] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm6_4_2] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gpt-oss-20b-F32 + → log: results/gpt-oss-20b-F32__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf + + +▶ [rocm7_rc-rocwmma] gpt-oss-20b-F32 __fa1 + → log: results/gpt-oss-20b-F32__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-F32.gguf -fa 1 + + +▶ [rocm7_rc] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_rc] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm7_beta] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_beta] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm6_4_2-rocwmma] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [vulkan_radv] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [vulkan_radv] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [vulkan_amdvlk] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [vulkan_amdvlk] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm6_4_2] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm6_4_2] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] gpt-oss-20b-mxfp4 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf + + +▶ [rocm7_rc-rocwmma] gpt-oss-20b-mxfp4 __fa1 + → log: results/gpt-oss-20b-mxfp4__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf -fa 1 + ▶ [rocm7_rc] 
Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: 
results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED ▶ [vulkan_radv] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf * [vulkan_amdvlk] 
Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED +▶ [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [vulkan_amdvlk] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + ▶ [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf * [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED - * [host] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 : FAILED +▶ [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/kimi-dev-72B-Q8_K_XL/UD-Q8_K_XL/Kimi-Dev-72B-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + ▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf - * [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 : FAILED ▶ [vulkan_radv] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 ▶ [rocm6_4_2] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf +▶ [rocm6_4_2] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 + → log: 
results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002 __fa1 + → log: results/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-70B-Instruct/UD-Q8_K_XL/Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002.gguf -fa 1 + ▶ [rocm7_rc] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_rc] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 ▶ [rocm7_beta] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_beta] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M + → log: 
results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm6_4_2-rocwmma] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 ▶ [vulkan_radv] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [vulkan_radv] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 ▶ [vulkan_amdvlk] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [vulkan_amdvlk] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 ▶ [rocm6_4_2] llama3.3-70.6B-Q4_K_M → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m 
/home/kyuz0/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf +▶ [rocm6_4_2] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] llama3.3-70.6B-Q4_K_M + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf + + +▶ [rocm7_rc-rocwmma] llama3.3-70.6B-Q4_K_M __fa1 + → log: results/llama3.3-70.6B-Q4_K_M__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-3.3-Q4_K_M/llama3.3-70.6B-Q4_K_M.gguf -fa 1 + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf - * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 
1 + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 : FAILED + ▶ 
[vulkan_radv] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 ▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 → log: 
results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf + + * [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 : FAILED + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf -fa 1 + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m 
/home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 : FAILED +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + ▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 ▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + + * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 : FAILED + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q6_K/Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00002.gguf -fa 1 + ▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm7_rc] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED ▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- 
/usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm7_beta] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED ▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [vulkan_radv] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 ▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 ▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf - * [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 : FAILED + +▶ [rocm6_4_2] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2] 
Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 : FAILED + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwmma] Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003 __fa1 + → log: results/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/llama-4-scout-17b-16e/Q8_0/Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00003.gguf -fa 1 ▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 ▶ [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m 
/home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf * [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED +▶ [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm7_beta] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + * [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + * [rocm6_4_2-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 : FAILED + ▶ [vulkan_radv] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv.log - → cmd: toolbox run -c 
llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [vulkan_radv] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 ▶ [vulkan_amdvlk] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [vulkan_amdvlk] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 ▶ [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m 
/home/kyuz0/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf +▶ [rocm6_4_2] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf + + +▶ [rocm7_rc-rocwmma] Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003 __fa1 + → log: results/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-235B-Q3_K-XL/UD-Q3_K_XL/Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL-00001-of-00003.gguf -fa 1 + ▶ [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf +▶ [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: 
results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + * [rocm7_rc] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 : FAILED + ▶ [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta.log - → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + * [rocm7_beta] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 : FAILED + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 ▶ [vulkan_radv] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv.log - → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m 
/home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [vulkan_radv] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 ▶ [vulkan_amdvlk] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk.log - → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 ▶ [rocm6_4_2] Qwen3-30B-A3B-BF16-00001-of-00002 → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2.log - → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf +▶ [rocm6_4_2] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc-rocwmma] Qwen3-30B-A3B-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-30B-A3B-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/BF16/Qwen3-30B-A3B-BF16-00001-of-00002.gguf -fa 1 + + +▶ [rocm7_rc] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_rc] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm7_beta] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_beta] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_beta__fa1.log + → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: 
results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm6_4_2-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [vulkan_radv] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [vulkan_radv] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_radv__fa1.log + → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [vulkan_amdvlk] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__vulkan_amdvlk__fa1.log + → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm6_4_2] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m 
/mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm6_4_2] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm6_4_2__fa1.log + → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + + +▶ [rocm7_rc-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf + + +▶ [rocm7_rc-rocwmma] Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL __fa1 + → log: results/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL__rocm7_rc-rocwmma__fa1.log + → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen-3-30B-A3B/Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf -fa 1 + ▶ [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc.log - → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf + + +▶ [rocm7_rc] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 + → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc__fa1.log + → cmd: toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1 ▶ [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 → log: 
results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta.log
- → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+ → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_beta__fa1.log
+ → cmd: toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
+
+ * [rocm7_beta] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED
+
+▶ [rocm6_4_2-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma.log
+ → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [rocm6_4_2-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2-rocwmma__fa1.log
+ → cmd: toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
 ▶ [vulkan_radv] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
  → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv.log
- → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+ → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [vulkan_radv] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_radv__fa1.log
+ → cmd: toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
 ▶ [vulkan_amdvlk] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
  → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk.log
- → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+ → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [vulkan_amdvlk] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__vulkan_amdvlk__fa1.log
+ → cmd: toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
 ▶ [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
  → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2.log
- → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 99 -mmp 0 -m /home/kyuz0/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+ → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+▶ [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm6_4_2__fa1.log
+ → cmd: toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
+
+ * [rocm6_4_2] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1 : FAILED
+
+▶ [rocm7_rc-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma.log
+ → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf
+
+
+▶ [rocm7_rc-rocwmma] Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002 __fa1
+ → log: results/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002__rocm7_rc-rocwmma__fa1.log
+ → cmd: toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench -ngl 999 -mmp 0 -m /mnt/models/qwen3-coder-30B-A3B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf -fa 1
+
diff --git a/benchmark/run_benchmarks.sh b/benchmark/run_benchmarks.sh
index 3254fcf..b4d15e7 100755
--- a/benchmark/run_benchmarks.sh
+++ b/benchmark/run_benchmarks.sh
@@ -27,8 +27,10 @@ echo
 declare -A CMDS=(
   [rocm6_4_2]="toolbox run -c llama-rocm-6.4.2 -- /usr/local/bin/llama-bench"
+  [rocm6_4_2-rocwmma]="toolbox run -c llama-rocm-6.4.2-rocwmma -- /usr/local/bin/llama-bench"
   [rocm7_beta]="toolbox run -c llama-rocm-7beta -- /usr/local/bin/llama-bench"
   [rocm7_rc]="toolbox run -c llama-rocm-7rc -- /usr/local/bin/llama-bench"
+  [rocm7_rc-rocwmma]="toolbox run -c llama-rocm-7rc-rocwmma -- /usr/local/bin/llama-bench"
   [vulkan_amdvlk]="toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench"
   [vulkan_radv]="toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench"
 )
@@ -39,35 +41,52 @@ for MODEL_PATH in "${MODEL_PATHS[@]}"; do
   for ENV in "${!CMDS[@]}"; do
     CMD="${CMDS[$ENV]}"
-    # run twice: baseline and with flash attention
-    for FA in 0 1; do
-      SUFFIX=""
-      EXTRA_ARGS=()
-      if (( FA == 1 )); then
-        SUFFIX="__fa1"
-        EXTRA_ARGS=( -fa 1 )
+    # For ROCm 7 envs, run default + HIPBLASLT=0 variants; others: default only
+    if [[ "$ENV" == rocm7_* ]]; then
+      HBLT_MODES=( default off )
+    else
+      HBLT_MODES=( default )
+    fi
+
+    for MODE in "${HBLT_MODES[@]}"; do
+      BASE_SUFFIX=""
+      CMD_EFFECTIVE="$CMD"
+      if [[ "$MODE" == off ]]; then
+        BASE_SUFFIX="__hblt0"
+        # inject env inside the container invocation: after the "--"
+        CMD_EFFECTIVE="${CMD_EFFECTIVE/-- /-- env ROCBLAS_USE_HIPBLASLT=0 }"
       fi
-      OUT="$RESULTDIR/${MODEL_NAME}__${ENV}${SUFFIX}.log"
+      # run twice: baseline and with flash attention
+      for FA in 0 1; do
+        SUFFIX="$BASE_SUFFIX"
+        EXTRA_ARGS=()
+        if (( FA == 1 )); then
+          SUFFIX="${SUFFIX}__fa1"
+          EXTRA_ARGS=( -fa 1 )
+        fi
-      # skip if we already have a non-empty log
-      if [[ -s "$OUT" ]]; then
-        echo "⏩ Skipping [${ENV}] ${MODEL_NAME}${SUFFIX:+ ($SUFFIX)}, log already exists at $OUT"
-        continue
-      fi
+        OUT="$RESULTDIR/${MODEL_NAME}__${ENV}${SUFFIX}.log"
-      # build command array
-      FULL_CMD=( $CMD -ngl 99 -mmp 0 -m "$MODEL_PATH" "${EXTRA_ARGS[@]}" )
+        # skip if we already have a non-empty log
+        if [[ -s "$OUT" ]]; then
+          echo "⏩ Skipping [${ENV}] ${MODEL_NAME}${SUFFIX:+ ($SUFFIX)}, log already exists at $OUT"
+          continue
+        fi
-      printf "\n▶ [%s] %s%s\n" "$ENV" "$MODEL_NAME" "${SUFFIX:+ $SUFFIX}"
-      printf " → log: %s\n" "$OUT"
-      printf " → cmd: %s\n\n" "${FULL_CMD[*]}"
+        # build command array
+        FULL_CMD=( $CMD_EFFECTIVE -ngl 99 -mmp 0 -m "$MODEL_PATH" "${EXTRA_ARGS[@]}" )
-      # execute
-      "${FULL_CMD[@]}" >"$OUT" 2>&1 || {
-        echo "✖ ! [${ENV}] ${MODEL_NAME}${SUFFIX:+ $SUFFIX} failed (exit $?)" >>"$OUT"
-        echo " * [${ENV}] ${MODEL_NAME}${SUFFIX:+ $SUFFIX} : FAILED"
-      }
+        printf "\n▶ [%s] %s%s\n" "$ENV" "$MODEL_NAME" "${SUFFIX:+ $SUFFIX}"
+        printf " → log: %s\n" "$OUT"
+        printf " → cmd: %s\n\n" "${FULL_CMD[*]}"
+
+        # execute
+        "${FULL_CMD[@]}" >"$OUT" 2>&1 || {
+          echo "✖ ! [${ENV}] ${MODEL_NAME}${SUFFIX:+ $SUFFIX} failed (exit $?)" >>"$OUT"
+          echo " * [${ENV}] ${MODEL_NAME}${SUFFIX:+ $SUFFIX} : FAILED"
+        }
+      done
     done
   done
 done
diff --git a/docs/benchmarks.md b/docs/benchmarks.md
index e14538a..5d74e16 100644
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
@@ -1,39 +1,44 @@
 # AMD Strix Halo — llama.cpp Toolboxes (Benchmarks)
-**Live results:** [https://kyuz0.github.io/amd-strix-halo-toolboxes/](https://kyuz0.github.io/amd-strix-halo-toolboxes/)
+**Interactive results:** [https://kyuz0.github.io/amd-strix-halo-toolboxes/](https://kyuz0.github.io/amd-strix-halo-toolboxes/)
-- Filter by model name, size, and quantization
-- Select backends with or without **Flash Attention (FA)**
-- Compare pp512 and tg128 side-by-side
-- Winners are computed with an error-aware tolerance rule.
+* Filter by model name, size, and quantization
+* Select backends with or without **Flash Attention**
+* Compare pp512 and tg128 side-by-side
+* Winners are computed using an **error-aware tolerance rule** — if two results overlap within their ± error margins, both are counted as winners.
 ---
 ## Benchmark methodology
-* **pp512** — prompt processing throughput (tokens/sec)
-* **tg128** — text generation throughput (tokens/sec)
-* Each backend tested twice:
+* **pp512** — prompt processing throughput (tokens/sec, prefill)
+* **tg128** — token generation throughput (tokens/sec, interactive)
+* Each backend tested twice per model:
-  * FA off: `-fa 0`
-  * FA on: `-fa 1`
-* Winners determined per model using pooled ± error from both results; multiple winners are possible.
+  * **Flash Attention OFF:** `-fa 0`
+  * **Flash Attention ON:** `-fa 1`
+* Winners are determined per model using pooled ± error from all relevant runs; multiple winners are possible.
+* All runs were built from the same `llama.cpp` commit for consistency.
-Tested backends:
+**Tested backends:**
 * Vulkan RADV
 * Vulkan AMDVLK
 * ROCm 6.4.2
-* ROCm 6.4.2 + rocWMMA
-* ROCm 7.x (beta / rc)
+* ROCm 6.4.2 + ROCWMMA
+* ROCm 7.x (beta / RC)
+* ROCm 7.x + ROCWMMA + hipBLASLt
-All runs built from the same llama.cpp commit.
+**Note on ROCm 7 hipBLASLt:**
+All ROCm 7 toolboxes ship with **hipBLASLt enabled by default** (`ROCBLAS_USE_HIPBLASLT=1`) because it improves performance and stability in most cases.
+However, the benchmark script also includes runs with **hipBLASLt disabled** (`-hblt0`) so we can measure the impact directly.
 ---
 ## Running benchmarks
 Place `.gguf` models in `models/` (for sharded models, include only the first shard: `*-00001-of-*.gguf`).
+
 Run:
 ```bash
@@ -60,66 +65,60 @@ python benchmark/summarize_results.py
 ---
-## Summary of current dataset
+## Summary of current dataset (margin-aware, Flash Attention ON)
-### pp512 (prompt processing)
+### Prompt Processing (pp512)
-* **Vulkan AMDVLK** leads in average throughput and most frequent wins.
+* **ROCm 7 RC + ROCWMMA + hipBLASLt** dominates — **15 wins/ties** out of 22 models.
+* **Vulkan AMDVLK** is second most frequent winner (**4 wins/ties**) but can’t load certain architectures due to the ≤ 2 GiB single-buffer limit.
+* **Vulkan RADV** rarely wins in PP but is highly stable.
-  * Winner count: AMDVLK (FA on) – 11 models; AMDVLK (FA off) – 3 models.
-  * Average t/s: AMDVLK (FA off) – 422.46; AMDVLK (FA on) – 388.68.
-* **Vulkan RADV** is competitive and shows wins on multiple models.
+### Token Generation (tg128)
-  * Winner count: RADV (FA on) – 3 models.
-  * Average t/s: RADV (FA on) – 279.95; RADV (FA off) – 273.54.
-* **ROCm 6.4.2 + rocWMMA** is strong in some cases.
-
-  * Winner count: 2 models (FA on).
-  * Average t/s: rocWMMA (FA on) – 335.44.
-* ROCm 7.x variants trail in pp512 averages.
-
-**Conclusion:** AMDVLK is generally fastest for prompt processing. RADV is close on certain models and is less prone to instability. ROCm+rocWMMA can match or exceed in select cases but is inconsistent.
ROCm+rocWMMA can match or exceed in select cases but is inconsistent. +* **Vulkan RADV** leads — **13 wins/ties** out of 15 possible. +* **Vulkan AMDVLK** is a strong second, usually just behind RADV in TG. +* **ROCm 7 RC + ROCWMMA + hipBLASLt** generally lags in TG but still posts competitive results for some models. --- -### tg128 (text generation) +### Placement counts (margin-aware, Flash Attention ON) -* **Vulkan RADV** shows the most frequent wins. +**Prompt Processing (pp512)** - * Winner count: RADV (FA off) – 6 models; RADV (FA on) – 5 models. - * Average t/s: RADV (FA off) – 23.73; RADV (FA on) – 23.45. -* **Vulkan AMDVLK** wins in some cases but is less dominant than in pp512. +| Backend | 1st | 2nd | 3rd | +| ------------------------------- | -----: | --: | --: | +| ROCm 7 RC + ROCWMMA + hipBLASLt | **15** | 2 | 1 | +| Vulkan AMDVLK | 4 | 5 | 1 | +| Vulkan RADV | 0 | 2 | 2 | - * Winner count: AMDVLK (FA off) – 4 models. - * Average t/s: AMDVLK (FA off) – 25.91; AMDVLK (FA on) – 23.85. -* **ROCm 6.4.2 + rocWMMA** achieves the highest average t/s. +**Token Generation (tg128)** - * Average t/s: rocWMMA (FA on) – 32.51; rocWMMA (FA off) – 31.96. -* ROCm 7.x and ROCm 6.4.2 also appear among winners in several models. - -**Conclusion:** RADV is the most consistent for text generation wins. ROCm+rocWMMA delivers the highest averages but with potential stability issues. AMDVLK is competitive but not consistently ahead. +| Backend | 1st | 2nd | 3rd | +| ------------------------------- | -----: | --: | --: | +| Vulkan RADV | **13** | 1 | 1 | +| Vulkan AMDVLK | 1 | 10 | 1 | +| ROCm 7 RC + ROCWMMA + hipBLASLt | 1 | 1 | 6 | --- -## Flash Attention (FA) +## Flash Attention -FA effects vary: - -* In pp512 averages, AMDVLK performs better without FA. -* In tg128, the effect depends on backend and model. - FA should be treated as a per-model tuning parameter rather than enabled or disabled globally. 
+* **ROCm 7 RC + ROCWMMA + hipBLASLt** benefits noticeably from Flash Attention ON in prompt processing, with no stability penalties recorded.
+* **Vulkan AMDVLK** and **Vulkan RADV** show mixed changes — some models improve with FA, others slow down slightly.
+* FA should be enabled or disabled **per model/backend** based on measured performance.
 ---
 ## Recommendations
-* **Stability priority:** Vulkan RADV.
-* **Maximum pp512 throughput:** Vulkan AMDVLK, validate per model.
-* **High tg128 averages:** ROCm 6.4.2 + rocWMMA, test stability.
-* **FA setting:** Evaluate per model/backend using side-by-side comparison.
+* **Fastest prompt processing:** ROCm 7 RC + ROCWMMA + hipBLASLt (Flash Attention ON)
+* **Fastest token generation:** Vulkan RADV (Flash Attention ON)
+* **Balanced performance:** Vulkan AMDVLK (fast PP & decent TG, but ≤ 2 GiB buffer limit)
+* **BF16 models:** ROCm 7 RC + ROCWMMA + hipBLASLt (best ROCm PP/TG combo, stable with FA ON)
+* **Maximum stability:** Vulkan RADV
 ---
 ## Winner calculation
-A backend is a winner if its mean throughput is within the best backend’s pooled ± error margin for that model and test type.
+A backend is counted as a winner if its mean throughput is within the best backend’s pooled ± error margin for that model/test type. This ensures results within measurement noise are treated as ties, not false losses.
diff --git a/docs/index.html b/docs/index.html
index f9e3011..b81fa7e 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -4,7 +4,7 @@
- Strix Halo — Model ↔ Backend Comparator
+ AMD Ryzen AI MAX+ 395 "Strix Halo" — Llama.cpp Backend Performance Comparison