# AMD Strix Halo — llama.cpp Toolboxes (Benchmarks)
**Interactive results:** https://kyuz0.github.io/amd-strix-halo-toolboxes/
## Table of Contents
- [Benchmark methodology](#benchmark-methodology)
- [Summary of current dataset (Flash Attention ON)](#summary-of-current-dataset-flash-attention-on)
  - [Placement counts](#placement-counts)
  - [Pairwise head-to-head wins](#pairwise-head-to-head-wins)
  - [Average ranks](#average-ranks)
- [Analyses by feature](#analyses-by-feature)
  - [Impact of Flash Attention](#impact-of-flash-attention)
  - [Impact of ROCWMMA](#impact-of-rocwmma)
  - [Impact of hipBLASLt](#impact-of-hipblaslt)
  - [Vulkan: AMDVLK vs RADV](#vulkan-amdvlk-vs-radv)
- [Recommendations](#recommendations)
- [Winner calculation](#winner-calculation)
---
## Benchmark methodology
- **pp512** — prompt processing throughput (tokens/sec, prefill)
- **tg128** — token generation throughput (tokens/sec, interactive)
- Each backend is tested twice per model: once with `-fa 0` and once with `-fa 1`
- Winners per model/test are **margin-aware**; multiple winners are possible when the mean±σ ranges overlap
- All backends are built from the same llama.cpp commit for consistency
**Backends in this dataset:** ROCm 7 RC + ROCWMMA + hipBLASLt, ROCm 7 RC (hipBLASLt), ROCm 7 RC (hipBLASLt OFF), ROCm 7 RC + ROCWMMA (hipBLASLt OFF), ROCm 6.4.3 (hipBLASLt), ROCm 6.4.3 (hipBLASLt OFF), ROCm 6.4.3 + ROCWMMA (hipBLASLt), ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF), Vulkan AMDVLK, Vulkan RADV
**ROCm 7 hipBLASLt policy:** Toolboxes ship with **hipBLASLt enabled** by default (`ROCBLAS_USE_HIPBLASLT=1`). The benchmark script also runs **hipBLASLt OFF** variants (`-hblt0`) to measure its effect.
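
As a rough illustration of the methodology above, the sketch below builds the command line and environment for one `llama-bench` run per Flash Attention setting. It is not the project's benchmark script: the helper name `bench_command` and the model path are hypothetical, and setting `ROCBLAS_USE_HIPBLASLT=0` for the hipBLASLt OFF variants is an assumption (the source only documents the value `1`).

```python
import os


def bench_command(model_path, flash_attention, hipblaslt=True):
    """Build (argv, env) for one llama-bench run. Hypothetical helper."""
    argv = [
        "llama-bench",
        "-m", model_path,
        "-p", "512",  # pp512: process a 512-token prompt
        "-n", "128",  # tg128: generate 128 tokens
        "-fa", "1" if flash_attention else "0",
        "-o", "json",
    ]
    env = dict(os.environ)
    # "1" is the documented toolbox default; "0" for OFF is an assumption.
    env["ROCBLAS_USE_HIPBLASLT"] = "1" if hipblaslt else "0"
    return argv, env


# One run with Flash Attention off and one with it on, as in the methodology.
runs = [bench_command("model.gguf", fa) for fa in (False, True)]
```

Each `(argv, env)` pair could then be passed to `subprocess.run(argv, env=env)` on a machine where `llama-bench` is installed.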
---
## Summary of current dataset (Flash Attention ON)
### Placement counts
**Prompt Processing (pp512)**
| Backend | 1st | 2nd | 3rd |
| --- | ---: | ---: | ---: |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt) | 9 | 5 | 0 |
| ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 3 | 3 | 8 |
| Vulkan AMDVLK | 3 | 0 | 2 |
| ROCm 7 RC + ROCWMMA + hipBLASLt | 1 | 8 | 4 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 0 | 0 | 1 |
| Vulkan RADV | 0 | 0 | 1 |
**Token Generation (tg128)**
| Backend | 1st | 2nd | 3rd |
| --- | ---: | ---: | ---: |
| Vulkan RADV | 13 | 0 | 0 |
| ROCm 6.4.3 (hipBLASLt) | 3 | 0 | 1 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt) | 1 | 4 | 3 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 1 | 2 | 4 |
| ROCm 6.4.3 (hipBLASLt OFF) | 1 | 1 | 1 |
| ROCm 7 RC (hipBLASLt OFF) | 1 | 1 | 1 |
| ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 1 | 1 | 1 |
| ROCm 7 RC (hipBLASLt) | 1 | 0 | 4 |
| Vulkan AMDVLK | 0 | 10 | 0 |
| ROCm 7 RC + ROCWMMA + hipBLASLt | 0 | 1 | 2 |
### Pairwise head-to-head wins
For every model+quant on which both backends completed the test, this counts which backend was faster; equal results count as ties.
| Comparison | Test | A wins | B wins | Ties | Total |
| --- | --- | ---: | ---: | ---: | ---: |
| ROCm 7 RC + ROCWMMA + hipBLASLt vs Vulkan AMDVLK | pp512 | 11 | 4 | 0 | 15 |
| ROCm 7 RC + ROCWMMA + hipBLASLt vs Vulkan AMDVLK | tg128 | 4 | 10 | 1 | 15 |
| ROCm 7 RC + ROCWMMA + hipBLASLt vs Vulkan RADV | pp512 | 14 | 2 | 0 | 16 |
| ROCm 7 RC + ROCWMMA + hipBLASLt vs Vulkan RADV | tg128 | 3 | 13 | 0 | 16 |
| Vulkan AMDVLK vs Vulkan RADV | pp512 | 13 | 2 | 0 | 15 |
| Vulkan AMDVLK vs Vulkan RADV | tg128 | 2 | 13 | 0 | 15 |
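
A minimal sketch of how such a head-to-head tally could be computed, assuming each backend's results are stored as a dictionary keyed by `(model, quant)`. The function name, data layout, and sample numbers are hypothetical, not taken from the project's scripts or dataset.

```python
def head_to_head(results_a, results_b):
    """Count wins over the model+quant pairs that both backends completed."""
    shared = results_a.keys() & results_b.keys()
    tally = {"a_wins": 0, "b_wins": 0, "ties": 0, "total": len(shared)}
    for key in shared:
        a, b = results_a[key], results_b[key]
        if a > b:
            tally["a_wins"] += 1
        elif b > a:
            tally["b_wins"] += 1
        else:
            tally["ties"] += 1
    return tally


# Made-up tokens/sec values, for illustration only.
backend_a = {("model-x", "Q4_K"): 50.0, ("model-y", "Q8_0"): 20.0, ("model-z", "F16"): 9.0}
backend_b = {("model-x", "Q4_K"): 55.0, ("model-y", "Q8_0"): 20.0}

tally = head_to_head(backend_a, backend_b)
```

Pairs where one backend failed (here `model-z`) are excluded, which is why the `Total` column differs between comparisons.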
### Average ranks
**Prompt Processing (pp512)**
| Backend | Avg Rank (↓ is better) |
| --- | ---: |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt) | 1.36 |
| Vulkan AMDVLK | 1.8 |
| ROCm 7 RC + ROCWMMA + hipBLASLt | 2.23 |
| ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 2.36 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 3.0 |
| Vulkan RADV | 3.0 |
**Token Generation (tg128)**
| Backend | Avg Rank (↓ is better) |
| --- | ---: |
| Vulkan RADV | 1.0 |
| ROCm 6.4.3 (hipBLASLt) | 1.5 |
| Vulkan AMDVLK | 2.0 |
| ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 2.0 |
| ROCm 7 RC (hipBLASLt OFF) | 2.0 |
| ROCm 6.4.3 (hipBLASLt OFF) | 2.0 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt) | 2.25 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 2.43 |
| ROCm 7 RC (hipBLASLt) | 2.6 |
| ROCm 7 RC + ROCWMMA + hipBLASLt | 2.67 |
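
The averages in both tables match what you get by averaging each backend's top-three placements from the placement-count tables. This is an observation from the published numbers rather than a documented formula, so the sketch below should be read as a consistency check, not as the project's actual ranking code.

```python
def average_rank(first, second, third):
    """Mean rank over the runs in which a backend placed in the top three."""
    placed = first + second + third
    return (1 * first + 2 * second + 3 * third) / placed


# (1st, 2nd, 3rd) counts copied from the pp512 placement table.
pp512_placements = {
    "ROCm 6.4.3 + ROCWMMA (hipBLASLt)": (9, 5, 0),
    "ROCm 7 RC + ROCWMMA (hipBLASLt OFF)": (3, 3, 8),
    "Vulkan AMDVLK": (3, 0, 2),
    "ROCm 7 RC + ROCWMMA + hipBLASLt": (1, 8, 4),
}

pp512_avg_rank = {
    name: round(average_rank(*counts), 2)
    for name, counts in pp512_placements.items()
}
```

If this reading is right, a backend that placed in the top three only a few times can still show a strong average rank, so the average ranks are best read alongside the placement counts.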
---
## Analyses by feature
### Impact of Flash Attention
Median percentage change in throughput with **Flash Attention ON relative to OFF**, paired by model+quant, per backend:
| Backend | pp512 Δ% (median, min..max, n) | tg128 Δ% (median, min..max, n) |
| --- | --- | --- |
| ROCm 7 RC + ROCWMMA + hipBLASLt | 8.4% (3.6..65.6), n=14 | -1.1% (-8.2..-0.3), n=14 |
| ROCm 7 RC (hipBLASLt) | -20.2% (-27.8..6.5), n=10 | -1.4% (-8.5..3.0), n=10 |
| ROCm 7 RC (hipBLASLt OFF) | -20.4% (-28.2..-16.1), n=9 | -1.9% (-8.6..0.1), n=9 |
| ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 5.8% (1.3..24.1), n=16 | -1.1% (-7.4..15.1), n=16 |
| ROCm 6.4.3 (hipBLASLt) | -19.5% (-25.7..-11.9), n=12 | -1.2% (-6.9..0.8), n=12 |
| ROCm 6.4.3 (hipBLASLt OFF) | -10.3% (-22.3..3.6), n=9 | -1.6% (-11.1..0.0), n=9 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt) | 10.9% (3.9..25.7), n=15 | -0.4% (-7.5..3.0), n=15 |
| ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 6.4% (1.8..12.3), n=10 | -0.6% (-6.5..2.3), n=10 |
| Vulkan AMDVLK | 1.1% (-45.4..20.2), n=15 | -1.5% (-28.6..0.1), n=15 |
| Vulkan RADV | 3.4% (-2.6..12.5), n=16 | 0.0% (-5.8..2.4), n=16 |
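
The paired comparison behind each cell can be sketched as follows, assuming throughput is stored per model+quant for each Flash Attention setting. The function name, data layout, and sample values are hypothetical.

```python
from statistics import median


def flash_attention_delta(fa_off, fa_on):
    """Percentage change (ON vs OFF) over model+quant keys present in both runs."""
    paired = sorted(fa_off.keys() & fa_on.keys())
    deltas = [100.0 * (fa_on[k] - fa_off[k]) / fa_off[k] for k in paired]
    return {
        "median": median(deltas),
        "min": min(deltas),
        "max": max(deltas),
        "n": len(deltas),
    }


# Made-up tokens/sec values, for illustration only.
fa_off = {"model-a": 100.0, "model-b": 200.0, "model-c": 50.0}
fa_on = {"model-a": 110.0, "model-b": 190.0, "model-c": 50.0, "model-d": 70.0}

summary = flash_attention_delta(fa_off, fa_on)
```

Only models measured under both settings contribute, which would explain why `n` varies from backend to backend in the table.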
### Impact of ROCWMMA
| Context | Test | Compared Envs | Pairs | Median Δ% |
| --- | --- | --- | ---: | ---: |
| ROCm 7 RC (hipBLASLt) | pp512 | ROCm 7 RC + ROCWMMA + hipBLASLt vs ROCm 7 RC (hipBLASLt) | 16 | 16.3% |
| ROCm 7 RC (hipBLASLt) | tg128 | ROCm 7 RC + ROCWMMA + hipBLASLt vs ROCm 7 RC (hipBLASLt) | 16 | -0.7% |
| ROCm 7 RC (hipBLASLt OFF) | pp512 | ROCm 7 RC + ROCWMMA (hipBLASLt OFF) vs ROCm 7 RC (hipBLASLt OFF) | 15 | 14.6% |
| ROCm 7 RC (hipBLASLt OFF) | tg128 | ROCm 7 RC + ROCWMMA (hipBLASLt OFF) vs ROCm 7 RC (hipBLASLt OFF) | 15 | -0.7% |
| ROCm 6.4.3 (hipBLASLt) | pp512 | ROCm 6.4.3 + ROCWMMA (hipBLASLt) vs ROCm 6.4.3 (hipBLASLt) | 15 | 17.4% |
| ROCm 6.4.3 (hipBLASLt) | tg128 | ROCm 6.4.3 + ROCWMMA (hipBLASLt) vs ROCm 6.4.3 (hipBLASLt) | 15 | -0.3% |
| ROCm 6.4.3 (hipBLASLt OFF) | pp512 | ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) vs ROCm 6.4.3 (hipBLASLt OFF) | 9 | 10.2% |
| ROCm 6.4.3 (hipBLASLt OFF) | tg128 | ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) vs ROCm 6.4.3 (hipBLASLt OFF) | 9 | 0.3% |
### Impact of hipBLASLt
| Context | Test | Compared Envs | Pairs | Median Δ% |
| --- | --- | --- | ---: | ---: |
| ROCm 7 RC (no ROCWMMA) | pp512 | ROCm 7 RC (hipBLASLt) vs ROCm 7 RC (hipBLASLt OFF) | 15 | -0.2% |
| ROCm 7 RC (no ROCWMMA) | tg128 | ROCm 7 RC (hipBLASLt) vs ROCm 7 RC (hipBLASLt OFF) | 15 | -0.1% |
| ROCm 7 RC + ROCWMMA | pp512 | ROCm 7 RC + ROCWMMA + hipBLASLt vs ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 16 | 1.4% |
| ROCm 7 RC + ROCWMMA | tg128 | ROCm 7 RC + ROCWMMA + hipBLASLt vs ROCm 7 RC + ROCWMMA (hipBLASLt OFF) | 16 | 0.0% |
| ROCm 6.4.3 (no ROCWMMA) | pp512 | ROCm 6.4.3 (hipBLASLt) vs ROCm 6.4.3 (hipBLASLt OFF) | 9 | 155.5% |
| ROCm 6.4.3 (no ROCWMMA) | tg128 | ROCm 6.4.3 (hipBLASLt) vs ROCm 6.4.3 (hipBLASLt OFF) | 9 | 0.0% |
| ROCm 6.4.3 + ROCWMMA | pp512 | ROCm 6.4.3 + ROCWMMA (hipBLASLt) vs ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 13 | 116.9% |
| ROCm 6.4.3 + ROCWMMA | tg128 | ROCm 6.4.3 + ROCWMMA (hipBLASLt) vs ROCm 6.4.3 + ROCWMMA (hipBLASLt OFF) | 13 | -0.0% |
### Vulkan: AMDVLK vs RADV
Head-to-head wins with Flash Attention ON:
| Test | AMDVLK wins | RADV wins | Ties | Total |
| --- | ---: | ---: | ---: | ---: |
| pp512 | 13 | 2 | 0 | 15 |
| tg128 | 2 | 13 | 0 | 15 |
---
## Recommendations
- **Fastest prompt processing:** ROCm 6.4.3 + ROCWMMA (hipBLASLt) (most 1st-place finishes with Flash Attention ON).
- **Fastest token generation:** Vulkan RADV (most 1st-place finishes with Flash Attention ON).
- **Balanced choice:** ROCm 6.4.3 + ROCWMMA (hipBLASLt) (consistently near the top across PP/TG).
---
## Winner calculation
A backend is counted as a winner for a given model and test type if its mean throughput falls within the pooled error margin (±) of the best backend's mean. Results that differ by less than measurement noise are therefore treated as ties rather than false losses.
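
One way such a margin-aware rule could look is sketched below. The source does not give the exact pooling formula, so combining the two standard deviations in quadrature is an assumption, and the function name, backend labels, and numbers are hypothetical.

```python
import math


def margin_aware_winners(results):
    """results maps backend name -> (mean tokens/sec, standard deviation)."""
    best_name = max(results, key=lambda name: results[name][0])
    best_mean, best_sd = results[best_name]
    winners = []
    for name, (mean, sd) in results.items():
        # Assumed pooling rule: combine both deviations in quadrature.
        margin = math.sqrt(best_sd ** 2 + sd ** 2)
        if best_mean - mean <= margin:
            winners.append(name)
    return sorted(winners)


# Made-up measurements for one model/test, for illustration only.
sample = {
    "backend-a": (100.0, 2.0),
    "backend-b": (98.5, 1.0),
    "backend-c": (90.0, 1.0),
}

winners = margin_aware_winners(sample)
```

Here `backend-b` trails the best mean by 1.5 tokens/sec, which is inside the combined margin of about 2.2, so it shares the win. `backend-c` trails by 10 and does not.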