neclean up of legacy toolboxes, removal of rocwmma and renamed rocm7-alpha to rocm-7nightlies. Added new benchmarks

This commit is contained in:
Donato Capitella
2026-01-10 10:31:04 +00:00
parent f0e9bc8865
commit 783998589e
1155 changed files with 20997 additions and 27513 deletions
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 247.81 ± 0.75 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 22.45 ± 0.27 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 37.61 ± 0.00 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 3.66 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 246.64 ± 0.87 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 22.63 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 37.54 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fbe5a1d45a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fbe5a1d496b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fbe5a1d4aef]
/usr/local/lib64/libggml-hip.so.0(+0x2cb1972) [0x7fbe5cf42972]
/usr/local/lib64/libggml-hip.so.0(+0x2cb6b0e) [0x7fbe5cf47b0e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fbe5a1ebe5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fbe5d63eab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7fbe59b6a5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fbe59b6a668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm-7alpha-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 250.33 ± 0.67 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 22.70 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 84.50 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fa4112eb5a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fa4112eb96b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fa4112ebaef]
/usr/local/lib64/libggml-hip.so.0(+0x2d5a8e2) [0x7fa4141028e2]
/usr/local/lib64/libggml-hip.so.0(+0x2d5fa7e) [0x7fa414107a7e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fa411302e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fa4147d3ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7fa410c815b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fa410c81668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm-7alpha] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 250.13 ± 0.62 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 22.71 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f3b59a565a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f3b59a5696b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f3b59a56aef]
/usr/local/lib64/libggml-hip.so.0(+0x2d5a8e2) [0x7f3b5c86d8e2]
/usr/local/lib64/libggml-hip.so.0(+0x2d5fa7e) [0x7f3b5c872a7e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f3b59a6de5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f3b5cf3eab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7f3b593ec5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f3b593ec668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm-7alpha] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.74 ± 2.03 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.74 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 33.80 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x107a8d10) reason :GPU Hang
✖ ! [rocm6_4_4-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.13 ± 0.85 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.73 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 33.91 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x1f16bd10) reason :GPU Hang
✖ ! [rocm6_4_4-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 333.45 ± 1.70 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.33 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 98.64 ± 0.00 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 13.16 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 336.20 ± 2.04 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.77 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 98.44 ± 0.00 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 12.88 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.36 ± 0.16 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.68 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 47.07 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f6af45f15a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f6af45f196b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f6af45f1aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7f6af734e682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7f6af735385e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f6af4608e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f6af7a23ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f6af3f875b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f6af3f87668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.91 ± 1.10 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.68 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 46.62 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f95789005a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f957890096b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f9578900aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7f957b65d682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7f957b66285e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f9578917e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f957bd32ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f95782965b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f9578296668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.90 ± 1.42 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.83 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
Hip error: 'an illegal memory access was encountered'(700) at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/hipblaslt.cpp:147
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
✖ ! [rocm7.1.1] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 329.23 ± 1.32 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.83 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fb26dd2a5a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fb26dd2a96b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fb26dd2aaef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7fb270b325f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7fb270b377ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fb26dd41e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fb271232ab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7fb26d6c05b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fb26d6c0668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.77 ± 1.72 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.70 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 46.38 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7ffa533f15a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7ffa533f196b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7ffa533f1aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7ffa5614e682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7ffa5615385e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7ffa53408e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7ffa56823ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7ffa52d875b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7ffa52d87668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.19 ± 0.84 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.69 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 46.51 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7ff4771b65a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7ff4771b696b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7ff4771b6aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7ff479f13682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7ff479f1885e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7ff4771cde5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7ff47a5e8ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7ff476b4c5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7ff476b4c668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.87 ± 0.79 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.59 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,17 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
:0:rocdevice.cpp :3582: 48997963017 us: Callback: Queue 0x7ff041800000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Hip error: 'an illegal memory access was encountered'(700) at /therock/src/rocm-libraries/projects/hipblaslt/library/src/amd_detail/hipblaslt.cpp:147
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
Kernel Name: _ZL15flash_attn_tileILi128ELi128ELi16ELi4ELb0EEvPKcS1_S1_S1_S1_PKiPfP15HIP_vector_typeIfLj2EEffffjfiS5_IjLj3EEiiiiiiiiiiiliiliiiiil
VGPU=0xe715690 SWq=0x7ff143a14000, HWq=0x7ff041800000, id=3
Dispatch Header =0xb02 (type=2, barrier=1, acquire=1, release=1), setup=0
grid=[4096, 8, 24], workgroup=[32, 8, 1]
private_seg_size=0, group_seg_size=33792
kernel_obj=0x7fdfbe030100, kernarg_address=0x0x7ff040801600
completion_signal=0x0, correlation_id=0
rptr=15, wptr=47
✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.19 ± 0.73 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.82 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f126437b5a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f126437b96b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f126437baef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7f12671835f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7f12671887ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f1264392e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f1267858ab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7f1263d115b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f1263d11668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q4_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp512 | 228.89 ± 0.52 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg128 | 24.48 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp2048 @ d16384 | 40.49 ± 0.00 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg32 @ d16384 | 9.30 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp512 | 243.57 ± 0.43 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg128 | 24.54 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp2048 @ d16384 | 52.62 ± 0.00 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg32 @ d16384 | 14.35 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 194.43 ± 0.27 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.65 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 36.61 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f4fa9af05a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f4fa9af096b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f4fa9af0aef]
/usr/local/lib64/libggml-hip.so.0(+0x2cb1972) [0x7f4fac85e972]
/usr/local/lib64/libggml-hip.so.0(+0x2cb6b0e) [0x7f4fac863b0e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f4fa9b07e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f4facf5aab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f4fa94865b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f4fa9486668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm-7alpha-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 195.23 ± 0.26 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.64 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 36.83 ± 0.00 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 3.40 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 195.45 ± 0.65 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.50 ± 0.31 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,4 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 195.71 ± 0.70 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.69 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,4 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 275.04 ± 0.75 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.57 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 33.70 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x9cb5d10) reason :GPU Hang
✖ ! [rocm6_4_4-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 272.75 ± 1.25 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.56 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 33.85 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x2738fd10) reason :GPU Hang
✖ ! [rocm6_4_4-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 277.38 ± 0.34 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.52 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 92.73 ± 0.00 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 11.12 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 277.33 ± 0.75 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.62 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 92.73 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x3c5c0d10) reason :GPU Hang
✖ ! [rocm6_4_4] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 254.32 ± 0.84 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.51 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 46.17 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f75321e15a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f75321e196b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f75321e1aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7f7534f3e682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7f7534f4385e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f75321f8e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f7535613ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f7531b775b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f7531b77668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 253.04 ± 1.12 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.50 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 46.53 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f042c8285a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f042c82896b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f042c828aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7f042f585682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7f042f58a85e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f042c83fe5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f042fc5aab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f042c1be5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f042c1be668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 257.70 ± 0.50 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.59 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,23 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fca6c0ef5a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fca6c0ef96b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fca6c0efaef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7fca6eef75f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7fca6eefc7ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x7dd) [0x7fca6c10a46d]
/usr/local/lib64/libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa0) [0x7fca6f5f87e0]
/usr/local/lib64/libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xe2) [0x7fca6f5fa2b2]
/usr/local/lib64/libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x3bf) [0x7fca6f5ff6ff]
/usr/local/lib64/libllama.so.0(llama_decode+0xe) [0x7fca6f6004fe]
/usr/local/bin/llama-bench() [0x40ad9b]
/usr/local/bin/llama-bench() [0x408a57]
/lib64/libc.so.6(+0x35b5) [0x7fca6ba855b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fca6ba85668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 259.40 ± 0.46 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.61 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f4b572795a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f4b5727996b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f4b57279aef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7f4b5a0815f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7f4b5a0867ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f4b57290e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f4b5a781ab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7f4b56c0f5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f4b56c0f668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 254.22 ± 1.28 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.50 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 45.90 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fc72ee915a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fc72ee9196b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fc72ee91aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7fc731bee682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7fc731bf385e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fc72eea8e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fc7322c3ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7fc72e8275b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fc72e827668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 253.25 ± 1.33 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.53 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 45.93 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fc83c7145a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fc83c71496b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fc83c714aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7fc83f471682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7fc83f47685e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fc83c72be5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fc83fb46ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7fc83c0aa5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fc83c0aa668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc-rocwmma] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 258.89 ± 0.25 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 16.54 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 79.91 ± 0.00 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 10.08 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,27 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f48fca035a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f48fca0396b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f48fca03aef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7f48ff80b5f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d5fe47) [0x7f48ff81fe47]
/usr/local/lib64/libggml-hip.so.0(_Z19ggml_cuda_mul_mat_qR25ggml_backend_cuda_contextPK11ggml_tensorS3_S3_PS1_+0x7d3) [0x7f48ff98aba3]
/usr/local/lib64/libggml-hip.so.0(+0x2d5802c) [0x7f48ff81802c]
/usr/local/lib64/libggml-hip.so.0(+0x2d53e28) [0x7f48ff813e28]
/usr/local/lib64/libggml-hip.so.0(+0x2d5083f) [0x7f48ff81083f]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x7f3) [0x7f48fca1e483]
/usr/local/lib64/libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa0) [0x7f48ffee17e0]
/usr/local/lib64/libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xe2) [0x7f48ffee32b2]
/usr/local/lib64/libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x3bf) [0x7f48ffee86ff]
/usr/local/lib64/libllama.so.0(llama_decode+0xe) [0x7f48ffee94fe]
/usr/local/bin/llama-bench() [0x40ad9b]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7f48fc3995b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f48fc399668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 failed (exit 0)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7fb299bf75a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7fb299bf796b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7fb299bf7aef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7fb29c9ff5f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7fb29ca047ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7fb299c0ee5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7fb29d0d4ab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x4088ac]
/lib64/libc.so.6(+0x35b5) [0x7fb29958d5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7fb29958d668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7_rc] GLM-4.5-Air-UD-Q6_K_XL-00001-of-00003__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp512 | 279.25 ± 0.28 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg128 | 17.61 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (AMD open-source driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp2048 @ d16384 | 42.15 ± 0.00 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg32 @ d16384 | 7.96 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp512 | 244.36 ± 0.45 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg128 | 17.73 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,8 @@
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | pp2048 @ d16384 | 54.92 ± 0.00 |
| glm4moe 106B.A12B Q6_K | 94.57 GiB | 110.47 B | Vulkan | 99 | 1 | 0 | tg32 @ d16384 | 11.62 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 65.74 ± 0.01 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 23.88 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.52 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 65.41 ± 0.02 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 24.05 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.52 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 65.85 ± 0.01 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 37.47 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 2.61 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 65.38 ± 0.05 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 37.86 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 2.51 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 145.84 ± 0.07 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.78 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 26.43 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.85 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 144.36 ± 0.18 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.78 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,9 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 26.46 ± 0.00 |
HW Exception by GPU node-1 (Agent handle: 0x31c9cd10) reason :GPU Hang
✖ ! [rocm6_4_4-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__hblt0__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 145.01 ± 0.05 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.77 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 56.24 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 2.61 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 146.28 ± 0.12 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.78 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 56.12 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 2.60 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 146.01 ± 0.05 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 35.23 ± 0.00 |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f2519e175a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f2519e1796b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f2519e17aef]
/usr/local/lib64/libggml-hip.so.0(+0x2ca0682) [0x7f251cb74682]
/usr/local/lib64/libggml-hip.so.0(+0x2ca585e) [0x7f251cb7985e]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f2519e2ee5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f251d249ab0]
/usr/local/bin/llama-bench() [0x408c12]
/lib64/libc.so.6(+0x35b5) [0x7f25197ad5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f25197ad668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1-rocwmma] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 143.94 ± 0.16 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 34.82 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.86 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 147.07 ± 0.01 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,20 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
/opt/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
/usr/local/lib64/libggml-base.so.0(+0x35a5) [0x7f3d030795a5]
/usr/local/lib64/libggml-base.so.0(ggml_print_backtrace+0x1eb) [0x7f3d0307996b]
/usr/local/lib64/libggml-base.so.0(ggml_abort+0x11f) [0x7f3d03079aef]
/usr/local/lib64/libggml-hip.so.0(+0x2d4b5f2) [0x7f3d05e815f2]
/usr/local/lib64/libggml-hip.so.0(+0x2d507ce) [0x7f3d05e867ce]
/usr/local/lib64/libggml-base.so.0(ggml_backend_sched_synchronize+0x2e) [0x7f3d03090e5e]
/usr/local/lib64/libllama.so.0(_ZN13llama_context11synchronizeEv+0x10) [0x7f3d06581ab0]
/usr/local/bin/llama-bench() [0x40adbc]
/usr/local/bin/llama-bench() [0x408b3d]
/lib64/libc.so.6(+0x35b5) [0x7f3d02a0f5b5]
/lib64/libc.so.6(__libc_start_main+0x88) [0x7f3d02a0f668]
/usr/local/bin/llama-bench() [0x409c25]
✖ ! [rocm7.1.1] Llama-3.3-70B-Instruct-UD-Q8_K_XL-00001-of-00002__fa1 __longctx16384 failed (exit 0)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 145.12 ± 0.04 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 52.68 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 2.50 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 145.84 ± 0.08 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 35.12 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.85 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | pp512 | 143.47 ± 0.06 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 1 | 0 | tg128 | 2.79 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | ---: | --------------: | -------------------: |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | pp2048 @ d16384 | 35.06 ± 0.00 |
| llama 70B Q8_0 | 75.65 GiB | 70.55 B | ROCm | 99 | 2048 | 1 | 0 | tg32 @ d16384 | 1.82 ± 0.00 |
build: 2aa45ef9e (7423)

Some files were not shown because too many files have changed in this diff Show More