Clean up of legacy toolboxes, removal of rocwmma, and rename of rocm-7alpha to rocm7-nightlies. Added new benchmarks.

Donato Capitella
2026-01-10 10:31:04 +00:00
parent f0e9bc8865
commit 783998589e
1155 changed files with 20997 additions and 27513 deletions
+1 -1
@@ -28,7 +28,7 @@ jobs:
 IN='${{ inputs.backends }}'
 if [[ "$IN" == "all" || -z "$IN" ]]; then
-  JSON='["rocm-6.4.4","rocm-6.4.4-rocwmma","rocm-7.1.1","rocm-7.1.1-rocwmma","rocm-7alpha","rocm-7alpha-rocwmma","rocm-7alpha-rocwmma-improved","rocm-7rc","rocm-7rc-rocwmma","vulkan-amdvlk","vulkan-radv"]'
+  JSON='["rocm-6.4.4","rocm-7.1.1","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
 else
   # Remove spaces and build JSON array from comma list
   IN_CLEAN=$(echo "$IN" | tr -d '[:space:]')
+1 -1
@@ -44,7 +44,7 @@ jobs:
 run: |
   IN='${{ github.event.inputs.backends }}'
   if [[ "$IN" == "all" || -z "$IN" ]]; then
-    JSON='["rocm-6.4.2","rocm-6.4.2-rocwmma","rocm-6.4.3","rocm-6.4.3-rocwmma","rocm-6.4.4","rocm-6.4.4-rocwmma","rocm-7.1","rocm-7.1-rocwmma","rocm-7beta","rocm-7alpha","rocm-7alpha-rocwmma","rocm-7alpha-rocwmma-improved","rocm-7rc","rocm-7rc-rocwmma","rocm-7rc-rocwmma-fa_all_quants","vulkan-amdvlk","vulkan-radv"]'
+    JSON='["rocm-6.4.2","rocm-6.4.3","rocm-6.4.4","rocm-7.1.1","rocm-7beta","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
   else
     IN_CLEAN=$(echo "$IN" | tr -d '[:space:]')
     JSON='["'${IN_CLEAN//,/\",\"}'"]'
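The workflow step above turns a comma-separated `backends` input into a JSON array by string substitution. As a rough illustration of the same logic, here is a minimal Python sketch. The helper name `backends_to_json` and the constant `DEFAULT_BACKENDS` are hypothetical and not part of the repository; the default list is copied from the new side of the first hunk.

```python
import json
from typing import List

# Default matrix, copied from the updated workflow line above.
DEFAULT_BACKENDS: List[str] = [
    "rocm-6.4.4",
    "rocm-7.1.1",
    "rocm7-nightlies",
    "vulkan-amdvlk",
    "vulkan-radv",
]


def backends_to_json(raw: str, default: List[str] = DEFAULT_BACKENDS) -> str:
    """Hypothetical helper mirroring the shell step: input string -> JSON array."""
    # Like the shell test, "all" or an empty input selects the full default list.
    if raw == "all" or raw == "":
        return json.dumps(default, separators=(",", ":"))
    # Equivalent of `tr -d '[:space:]'`: drop every whitespace character.
    cleaned = "".join(raw.split())
    # Split on commas and let json.dumps handle quoting and escaping.
    return json.dumps(cleaned.split(","), separators=(",", ":"))


if __name__ == "__main__":
    print(backends_to_json("rocm-6.4.4, vulkan-radv"))
```

Unlike the shell substitution, `json.dumps` escapes any quotes or backslashes in the input, which is one reason a sketch like this might be preferred if the input were untrusted.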
+23 -23
@@ -2,6 +2,14 @@
 This project provides pre-built containers (“toolboxes”) for running LLMs on **AMD Ryzen AI Max “Strix Halo”** integrated GPUs. Toolbx is the standard developer container system in Fedora (and now works on Ubuntu, openSUSE, Arch, etc).
+## 🚨 Updates — 2026-01-10
+- **Simplified Offering**: Removed `rocwmma` containers as standard kernels in newer `llama.cpp` are now faster and stable.
+- **Renamings**: `rocm-7alpha` is now `rocm7-nightlies` to better reflect that it tracks TheRock nightly builds.
+- **Discontinued**: `rocm-7rc` builds are discontinued as they are obsolete.
+- **Housekeeping**: Deprecated `rocm-7beta` and other older tags.
 ## 🚨 CRITICAL WARNING — 2026-01-08
 **Do NOT use `linux-firmware-20251125`.** It breaks ROCm support on Strix Halo (instability/crashes).
@@ -11,7 +19,7 @@ AMD has recalled this update, but if you have already installed it, you must dow
 ## 🚨 Updates — 2025-11-18
-- Released new toolboxes for ROCm 7 that track the nightly builds, these are now called `alpha`.
+- Released new toolboxes for ROCm 7 that track the nightly builds, these are now called `rocm7-nightlies`.
 - Updated and extended benchmarks across all llama.cpp backend configurations, and included benchmarks over RPC (two nodes) and long context (32k) -> [Interactive Benchmark Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/)
 ## Watch the YouTube Video
@@ -50,11 +58,11 @@ toolbox create llama-vulkan-radv \
   -- --device /dev/dri --group-add video --security-opt seccomp=unconfined
 ```
-**Command — Create ROCm toolbox (6.4.4/7.1.1/7rc/7alpha)**
+**Command — Create ROCm toolbox (6.4.4/7.1.1/rocm7-nightlies)**
 ```sh
-toolbox create llama-rocm-7.1.1-rocwmma \
-  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1.1-rocwmma \
+toolbox create llama-rocm-7.1.1 \
+  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1.1 \
   -- --device /dev/dri --device /dev/kfd \
   --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined
 ```
@@ -112,7 +120,7 @@ llama-cli --no-mmap -ngl 999 -fa 1 -m models/qwen3-coder-30B-A3B/BF16/Qwen3-Code
 **Command — Refresh specific toolboxes**
 ```bash
-./refresh-toolboxes.sh llama-vulkan-radv llama-rocm-7.1.1-rocwmma
+./refresh-toolboxes.sh llama-vulkan-radv llama-rocm-7.1.1
 ```
 ## 1. Llama.cpp Compiled for Every Backend
@@ -130,21 +138,13 @@ You can check the containers on DockerHub: https://hub.docker.com/r/kyuz0/amd-st
 | ------------------------------ | -------------------------------------- | --------------- |
 | `vulkan-amdvlk` | Vulkan (AMDVLK) | Fastest backend—AMD open-source driver. ≤2 GiB single buffer allocation limit, some large models won't load. |
 | `vulkan-radv` | Vulkan (Mesa RADV) | Most stable and compatible. Recommended for most users and all models. |
-| `rocm-6.4.4` | ROCm 6.4.4 (HIP) + hipBLASLt* | Latest stable build for ROCm 6.4.4, performs very well with most model architectures/quants. |
-| `rocm-6.4.4-rocwmma` | ROCm 6.4.4 + ROCWMMA + hipBLASLt* | 6.4.4 with ROCWMMA enabled for better flash attention on RDNA3+/CDNA. |
-| `rocm-7.1.1` | ROCm 7.1.1 GA (HIP) + hipBLASLt* | Current GA release for ROCm 7.x; improved scheduler and hipBLASLt kernels. |
-| `rocm-7.1.1-rocwmma` | ROCm 7.1.1 GA + ROCWMMA + hipBLASLt* | 7.1.1 with ROCWMMA for maximum flash-attention throughput. |
-| `rocm-7rc` | ROCm 7.9 (HIP) + hipBLASLt* | Used to be the release candidate for ROCm 7.9.0 (hence the `rc` tag in the name), now released. |
-| `rocm-7rc-rocwmma` | ROCm 7.9 + ROCWMMA + hipBLASLt* | 7.9.0 build with ROCWMMA—useful for early flash-attention validation. |
-| `rocm-7alpha` | ROCm 7 Nightly (“7rc-alpha”) + hipBLASLt* | Tracks ROCm 7 nightly (alpha) preview with bleeding-edge patches. |
-| `rocm-7alpha-rocwmma` | ROCm 7 Nightly + ROCWMMA + hipBLASLt* | Same nightly/alpha stack with ROCWMMA tuned for flash attention. |
-| `rocm-7alpha-rocwmma-improved` | ROCm 7 Nightly + ROCWMMA (improved) + hipBLASLt* | Nightly/Alpha stack plus extra ROCWMMA fixes; fastest but most experimental option. |
+| `rocm-6.4.4` | ROCm 6.4.4 (HIP) | Latest stable build for ROCm 6.4.4, performs very well with most model architectures/quants. |
+| `rocm-7.1.1` | ROCm 7.1.1 GA (HIP) | Current GA release for ROCm 7.x; improved scheduler and kernels. |
+| `rocm7-nightlies` | ROCm 7 Nightly | Tracks ROCm 7 nightly builds with bleeding-edge patches. |
-\* All these toolboxes export `ROCBLAS_USE_HIPBLASLT=1` because it historically delivered better performance and stability, although this might not be the case any more.
 > These containers are **automatically** rebuilt whenever the Llama.cpp master branch is updated, ensuring you get the latest bug fixes and new model support. The easiest way to update to the newest versions is by running the `refresh-toolboxes.sh` [script below](#211-toolbox-refresh-script-automatic-updates).
 >
-> Legacy images `rocm-6.4.2` and `rocm-6.4.3` are still on Docker Hub for reproducibility but are intentionally excluded from the active list above. Prefer `rocm-6.4.4+` or any `rocm-7.x` tag unless you must bisect an old regression. (The `rocm-7beta` images share the same status.)
+> Legacy images `rocm-6.4.2` and `rocm-6.4.3` are still on Docker Hub for reproducibility but are intentionally excluded from the active list above. Prefer `rocm-6.4.4+` or any `rocm-7.x` tag unless you must bisect an old regression. (The `rocm-7beta` and `rocm-7rc` images share the same status.)
 ---
@@ -164,16 +164,16 @@ toolbox create llama-vulkan-radv \
 *Only `/dev/dri` is required for Vulkan. Make sure your user is in the `video` group.*
-#### Command — Create ROCm toolbox (swap the tag for 6.4.4, 7.1, 7rc, 7alpha…)
+#### Command — Create ROCm toolbox (swap the tag for 6.4.4, 7.1, rocm7-nightlies…)
 ```sh
-toolbox create llama-rocm-7.1-rocwmma \
-  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1-rocwmma \
+toolbox create llama-rocm-7.1 \
+  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1 \
   -- --device /dev/dri --device /dev/kfd \
   --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined
 ```
-*ROCm needs both `/dev/dri` and `/dev/kfd`, plus the `video`, `render`, and sometimes `sudo` groups for full compute access. Swap `rocm-7.1-rocwmma` for any other active ROCm tag (6.4.4, 7rc, 7alpha, etc.).*
+*ROCm needs both `/dev/dri` and `/dev/kfd`, plus the `video`, `render`, and sometimes `sudo` groups for full compute access. Swap `rocm-7.1` for any other active ROCm tag (6.4.4, rocm7-nightlies, etc.).*
 > **Note:**
 >
@@ -188,7 +188,7 @@ Ubuntu's `toolbox` package still breaks GPU access, so follow gyhor's [issue
 ```sh
 distrobox create -n llama-rocm-7.1.1 \
-  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1.1-rocwmma \
+  --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.1.1 \
   --additional-flags "--device /dev/kfd --device /dev/dri --group-add video --group-add render --security-opt seccomp=unconfined"
 distrobox enter llama-rocm-7.1.1
 llama-cli --list-devices
@@ -213,7 +213,7 @@ This will:
 You can also refresh just one or more toolboxes:
 ```bash
-./refresh-toolboxes.sh llama-vulkan-radv llama-rocm-7.1.1-rocwmma
+./refresh-toolboxes.sh llama-vulkan-radv llama-rocm-7.1.1
 ```
 ### 2.2 Running models inside the toolboxes
+1 -1
@@ -18,7 +18,7 @@ from pathlib import Path
 from typing import Dict, Iterable, List, Tuple
-DEFAULT_RESULTS = Path("docs") / "results.json"
+DEFAULT_RESULTS = Path("../docs") / "results.json"
 # Matches the tolerance used in docs/assets/index2.js (MIN_TOL = 0.25)
 DEFAULT_TOLERANCE = 0.25
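The hunk above changes a default path that is resolved relative to the current working directory, so the script only finds `results.json` when it is launched from the expected folder. A minimal sketch of an alternative, assuming (hypothetically) that the script sits exactly one directory below the repository root, anchors the default to the script's own location instead. The helper name `default_results` is illustrative and not part of the repository.

```python
from pathlib import Path


def default_results(script_path: Path) -> Path:
    """Hypothetical helper: locate docs/results.json relative to the script file.

    Assumes the layout <repo>/<subdir>/<script>.py and <repo>/docs/results.json.
    """
    repo_root = script_path.parent.parent  # go up from <subdir> to <repo>
    return repo_root / "docs" / "results.json"


if __name__ == "__main__":
    # In a real script this would typically be Path(__file__).resolve().
    print(default_results(Path(__file__).resolve()))
```

With this approach the default no longer depends on where the command is run from; an explicit command-line argument can still override it.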
-141
@@ -1,141 +0,0 @@
#!/usr/bin/env python3
import argparse
import glob
import os
import re
RESULTS_DIR_DEFAULT = "results"
# Same detection logic as your extractor
HEADER_RE = re.compile(r"^\|\s*model\s*\|", re.IGNORECASE)
SEP_RE = re.compile(r"^\|\s*-+")
LOAD_ERR = re.compile(r"failed to load model|Device memory allocation.*failed|⚠️\s*Fail", re.IGNORECASE)
HANG_ERR = re.compile(r"GPU Hang|HW Exception", re.IGNORECASE)
GENERIC_ERR = re.compile(r"error:|exit \d+|runtime error|⚠️\s*Runtime Error", re.IGNORECASE)
def parse_table(text):
    lines = text.splitlines()
    rows = []
    header = None
    col_idx = {}
    for line in lines:
        if HEADER_RE.search(line):
            header = [c.strip().lower() for c in line.strip().strip("|").split("|")]
            for idx, name in enumerate(header):
                col_idx[name] = idx
            continue
        if header and (SEP_RE.search(line) or not line.strip()):
            continue
        if header and line.startswith("|"):
            parts = [c.strip() for c in line.strip().strip("|").split("|")]
            if len(parts) < len(header):
                continue
            row = {}
            for name, idx in col_idx.items():
                row[name] = parts[idx]
            rows.append(row)
        if header and line.strip() == "" and rows:
            break
    return rows
def detect_error(text):
    if LOAD_ERR.search(text):
        return True
    if HANG_ERR.search(text):
        return True
    if GENERIC_ERR.search(text):
        return True
    return False


def is_non_transient_vram_issue(text):
    # Do NOT delete logs with this kind of Vulkan OOM
    return (
        "ggml_vulkan: Device memory allocation of size" in text
        and "Requested buffer size exceeds device buffer size limit" in text
    )


def is_failed_run(text):
    table_rows = parse_table(text)
    has_pp = any(r.get("test", "").lower() == "pp512" for r in table_rows)
    has_tg = any(r.get("test", "").lower() == "tg128" for r in table_rows)
    if has_pp or has_tg:
        return False
    return detect_error(text)
def main():
    ap = argparse.ArgumentParser(
        description="Delete transient-failure benchmark logs in results/"
    )
    ap.add_argument(
        "--results-dir",
        default=RESULTS_DIR_DEFAULT,
        help="Directory containing *.log files (default: results)",
    )
    ap.add_argument(
        "--dry-run",
        action="store_true",
        help="Only print what would be deleted",
    )
    args = ap.parse_args()

    results_dir = args.results_dir
    pattern = os.path.join(results_dir, "*.log")
    to_delete = []
    skipped_non_transient = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="ignore") as f:
                text = f.read()
        except OSError as e:
            print(f"Could not read {path}: {e}")
            continue
        if not is_failed_run(text):
            continue
        if is_non_transient_vram_issue(text):
            skipped_non_transient.append(path)
            continue
        to_delete.append(path)

    if not to_delete and not skipped_non_transient:
        print("No failed logs found.")
        return
    if skipped_non_transient:
        print("Keeping logs with non transient VRAM issues:")
        for p in skipped_non_transient:
            print(f" KEEP {p}")
    if to_delete:
        print("Deleting logs with transient failures:")
        for p in to_delete:
            print(f" DELETE {p}")
            if not args.dry_run:
                try:
                    os.remove(p)
                except OSError as e:
                    print(f" Failed to delete {p}: {e}")
    else:
        print("No logs to delete.")


if __name__ == "__main__":
    main()
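The removed script above separates three cases: logs that contain benchmark rows, logs that failed transiently (safe to delete and rerun), and logs that failed because of the Vulkan buffer-size limit (kept, since a rerun would fail the same way). The following is a simplified sketch of that triage, not the script's exact logic: the function name `classify_log`, the return labels, and the single-regex row detection are assumptions made for illustration, while the error strings are taken from the script.

```python
import re

# Error markers taken from the removed script (merged into one pattern here).
ERROR_RE = re.compile(
    r"failed to load model|Device memory allocation.*failed|GPU Hang|HW Exception|error:",
    re.IGNORECASE,
)
# Simplification: a table row whose cell is exactly pp512 or tg128 counts as a result.
RESULT_ROW_RE = re.compile(r"^\|.*\|\s*(pp512|tg128)\s*\|", re.MULTILINE)


def classify_log(text: str) -> str:
    """Hypothetical helper: return 'ok', 'keep', or 'delete' for one log."""
    if RESULT_ROW_RE.search(text):
        return "ok"  # at least one benchmark row was produced
    if not ERROR_RE.search(text):
        return "ok"  # no recognised failure marker
    if (
        "ggml_vulkan: Device memory allocation of size" in text
        and "Requested buffer size exceeds device buffer size limit" in text
    ):
        return "keep"  # non-transient: the model does not fit the buffer limit
    return "delete"  # transient failure: rerunning may succeed


if __name__ == "__main__":
    print(classify_log("HW Exception by GPU node-1"))
```

Checking for result rows first means a run that printed warnings but still produced numbers is never discarded.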
-571
@@ -1,571 +0,0 @@
#!/usr/bin/env python3
"""
gen_benchmarks_md.py — Generate Markdown for README + detailed benchmarks from results.json
Defaults:
- Input JSON: ../docs/results.json
- Outputs: ./README_benchmarks_section.md and ./benchmarks_generated.md
"""
from __future__ import annotations
import json
import argparse
import statistics as stats
from pathlib import Path
from collections import defaultdict
from typing import Dict, List, Tuple, Optional
# === ENV LABELS ===
ENV_LABEL: Dict[str, str] = {
    # ROCm 7 RC
    "rocm7_rc-rocwmma": "ROCm 7 RC + ROCWMMA + hipBLASLt",
    "rocm7_rc": "ROCm 7 RC (hipBLASLt)",
    "rocm7_rc-hblt0": "ROCm 7 RC (hipBLASLt OFF)",
    "rocm7_rc-rocwmma-hblt0": "ROCm 7 RC + ROCWMMA (hipBLASLt OFF)",
    # ROCm 6.4.4
    "rocm6_4_4": "ROCm 6.4.4 (hipBLASLt)",
    "rocm6_4_4-hblt0": "ROCm 6.4.4 (hipBLASLt OFF)",
    "rocm6_4_4-rocwmma": "ROCm 6.4.4 + ROCWMMA (hipBLASLt)",
    "rocm6_4_4-rocwmma-hblt0": "ROCm 6.4.4 + ROCWMMA (hipBLASLt OFF)",
    # Vulkan
    "vulkan_amdvlk": "Vulkan AMDVLK",
    "vulkan_radv": "Vulkan RADV",
}

TESTS = ["pp512", "tg128"]


def md_row(values: List[str]) -> str:
    return "| " + " | ".join(values) + " |"


def load_results(path: Path) -> Dict:
    data = json.loads(path.read_text())
    assert "runs" in data and isinstance(data["runs"], list), "results.json must have a top-level 'runs' list"
    return data


def envs_present(runs: List[Dict], only_env: Optional[List[str]], include_all_envs: bool) -> List[str]:
    present = {r.get("env") for r in runs if r.get("env")}
    if only_env:
        present = present.intersection(set(only_env))
    if include_all_envs:
        # Include even if not present (might appear 0 rows in tables)
        envs = [e for e in ENV_LABEL.keys() if (not only_env or e in only_env)]
    else:
        envs = [e for e in ENV_LABEL.keys() if e in present and (not only_env or e in only_env)]
    return envs


def fa_to_filter(fa: str) -> Optional[bool]:
    fa = fa.lower().strip()
    if fa == "on":
        return True
    if fa == "off":
        return False
    if fa == "any":
        return None
    raise ValueError("--fa must be on/off/any")
def margin_aware_placements(
    runs: List[Dict],
    envs: List[str],
    test_filter: str,
    fa_filter: Optional[bool]
) -> Tuple[Dict[str, Dict[str, int]], int]:
    """
    Returns (placements, sample_count)
    placements[env] -> {"first": n, "second": n, "third": n}
    sample_count = number of model+quant comparisons considered
    """
    placements = defaultdict(lambda: {"first": 0, "second": 0, "third": 0})
    # group by (model, quant)
    grouped = defaultdict(list)
    for r in runs:
        if r.get("error"):
            continue
        if r.get("test") != test_filter:
            continue
        if fa_filter is not None and r.get("fa") != fa_filter:
            continue
        if r.get("env") not in envs:
            continue
        key = (r.get("model_clean"), r.get("quant"))
        grouped[key].append(r)

    samples = 0
    for key, entries in grouped.items():
        # collate by env
        env_groups = defaultdict(list)
        for e in entries:
            env_groups[e["env"]].append(e)
        env_list = [e for e in envs if e in env_groups]  # keep requested order
        if len(env_list) < 2:
            continue
        # summarize median mean ± median err per env
        summary = {}
        for env in env_list:
            means = [x["tps_mean"] for x in env_groups[env] if x.get("tps_mean") is not None]
            errs = [x.get("tps_err", 0.0) or 0.0 for x in env_groups[env]]
            if not means:
                continue
            m = stats.median(means)
            e = stats.median(errs) if errs else 0.0
            summary[env] = (m - e, m + e, m)
        if len(summary) < 2:
            continue
        samples += 1
        # rank with overlap -> ties share rank
        remaining = [env for env, _ in sorted(summary.items(), key=lambda kv: kv[1][2], reverse=True)]
        assigned = {}
        current_rank = 1
        while remaining and current_rank <= 3:
            env0 = remaining[0]
            low0, high0, _ = summary[env0]
            tied = [env0]
            for env in remaining[1:]:
                low, high, _ = summary[env]
                if not (low > high0 or high < low0):  # overlap -> tie
                    tied.append(env)
            for env in tied:
                assigned[env] = current_rank
            remaining = [e for e in remaining if e not in tied]
            current_rank += 1
        for env, rk in assigned.items():
            if rk == 1:
                placements[env]["first"] += 1
            elif rk == 2:
                placements[env]["second"] += 1
            elif rk == 3:
                placements[env]["third"] += 1
    return placements, samples
def pairwise_win_counts(runs: List[Dict], envA: str, envB: str, test: str, fa_filter: Optional[bool]) -> Tuple[int, int, int, int]:
    A = {}
    B = {}
    for r in runs:
        if r.get("error") or r.get("test") != test:
            continue
        if fa_filter is not None and r.get("fa") != fa_filter:
            continue
        key = (r.get("model_clean"), r.get("quant"))
        if r.get("env") == envA:
            A[key] = r["tps_mean"]
        elif r.get("env") == envB:
            B[key] = r["tps_mean"]
    winsA = winsB = ties = 0
    for k in (set(A) & set(B)):
        if A[k] > B[k]:
            winsA += 1
        elif B[k] > A[k]:
            winsB += 1
        else:
            ties += 1
    total = winsA + winsB + ties
    return winsA, winsB, ties, total


def average_ranks(place_dict: Dict[str, Dict[str, int]]) -> Dict[str, Optional[float]]:
    avg = {}
    for env, c in place_dict.items():
        total = c.get("first", 0) + c.get("second", 0) + c.get("third", 0)
        if total == 0:
            avg[env] = None
        else:
            avg[env] = round((1 * c.get("first", 0) + 2 * c.get("second", 0) + 3 * c.get("third", 0)) / total, 2)
    return avg
def flash_attention_effect(runs: List[Dict], envs: List[str]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """
    Returns: effects[env][test] = {n_pairs, median_pct, min, max}
    Based on paired model+quant runs (ON vs OFF).
    """
    model_pairs = defaultdict(lambda: defaultdict(dict))  # (env,test)->(model,quant)->{fa: tps}
    for r in runs:
        if r.get("error") or r.get("tps_mean") is None:
            continue
        if r.get("test") not in TESTS:
            continue
        if r.get("env") not in envs:
            continue
        model_key = (r.get("model_clean"), r.get("quant"))
        model_pairs[(r["env"], r["test"])][model_key][r.get("fa")] = r["tps_mean"]
    summary = defaultdict(dict)
    for (env, test), d in model_pairs.items():
        deltas = []
        for mk, vals in d.items():
            if True in vals and False in vals and vals[False] > 0:
                deltas.append((vals[True] - vals[False]) / vals[False] * 100.0)
        if deltas:
            summary[env][test] = {
                "n_pairs": len(deltas),
                "median_pct": round(stats.median(deltas), 1),
                "min": round(min(deltas), 1),
                "max": round(max(deltas), 1),
            }
    return summary
def rocwmma_effect(runs: List[Dict], pairs_to_compare: List[Tuple[str, str, str]], tests: List[str]) -> List[Tuple[str, str, str, str, int, float]]:
    """
    Compare ROCWMMA ON vs OFF with same hipBLASLt state.
    Returns rows of (context_label, test, env_on, env_off, n_pairs, median_delta_pct)
    where delta_pct = median(ON/OFF - 1)*100 over common model+quant.
    """
    rows = []
    for env_on, env_off, label in pairs_to_compare:
        for test in tests:
            data_on = defaultdict(list)
            data_off = defaultdict(list)
            for r in runs:
                if r.get("error") or r.get("test") != test:
                    continue
                if r.get("env") == env_on:
                    data_on[(r.get("model_clean"), r.get("quant"))].append(r["tps_mean"])
                elif r.get("env") == env_off:
                    data_off[(r.get("model_clean"), r.get("quant"))].append(r["tps_mean"])
            common = sorted(set(data_on) & set(data_off))
            if not common:
                continue
            ratios = []
            for k in common:
                aon = stats.median(data_on[k])
                aoff = stats.median(data_off[k])
                if aoff > 0:
                    ratios.append(aon / aoff - 1.0)
            if ratios:
                rows.append((label, test, env_on, env_off, len(ratios), round(100 * stats.median(ratios), 1)))
    return rows


def hipblaslt_effect(runs: List[Dict], pairs_to_compare: List[Tuple[str, str, str]], tests: List[str]) -> List[Tuple[str, str, str, str, int, float]]:
    """
    Compare hipBLASLt ON vs OFF with same ROCWMMA state.
    Returns rows of (context_label, test, env_on, env_off, n_pairs, median_delta_pct)
    where delta_pct = median(ON/OFF - 1)*100 over common model+quant.
    """
    rows = []
    for env_on, env_off, label in pairs_to_compare:
        for test in tests:
            data_on = defaultdict(list)
            data_off = defaultdict(list)
            for r in runs:
                if r.get("error") or r.get("test") != test:
                    continue
                if r.get("env") == env_on:
                    data_on[(r.get("model_clean"), r.get("quant"))].append(r["tps_mean"])
                elif r.get("env") == env_off:
                    data_off[(r.get("model_clean"), r.get("quant"))].append(r["tps_mean"])
            common = sorted(set(data_on) & set(data_off))
            if not common:
                continue
            ratios = []
            for k in common:
                aon = stats.median(data_on[k])
                aoff = stats.median(data_off[k])
                if aoff > 0:
                    ratios.append(aon / aoff - 1.0)
            if ratios:
                rows.append((label, test, env_on, env_off, len(ratios), round(100 * stats.median(ratios), 1)))
    return rows
def amdvlk_vs_radv(runs: List[Dict], fa_filter: Optional[bool]) -> List[Tuple[str, int, int, int, int]]:
    rows = []
    for test in TESTS:
        wa, wr, ties, total = pairwise_win_counts(runs, "vulkan_amdvlk", "vulkan_radv", test, fa_filter)
        rows.append((test, wa, wr, ties, total))
    return rows


def winners(place_dict: Dict[str, Dict[str, int]], slot="first") -> Tuple[List[str], int]:
    max_count = max((c.get(slot, 0) for c in place_dict.values()), default=0)
    win_list = [env for env, c in place_dict.items() if c.get(slot, 0) == max_count and max_count > 0]
    return win_list, max_count


def human_list(envs: List[str]) -> str:
    return ", ".join(ENV_LABEL.get(e, e) for e in envs) if envs else ""
def build_readme_section(
    envs: List[str],
    pp_place: Dict[str, Dict[str, int]],
    tg_place: Dict[str, Dict[str, int]],
    fa_filter: Optional[bool]
) -> str:
    # Winners
    pp_wins, _ = winners(pp_place, "first")
    tg_wins, _ = winners(tg_place, "first")
    lines: List[str] = []
    lines.append("## 3. Performance Benchmarks (Key Results)")
    lines.append("")
    lines.append("🌐 Interactive exploration of the latest benchmark runs: [Interactive Benchmark Viewer](https://kyuz0.github.io/amd-strix-halo-toolboxes/)")
    lines.append("")
    lines.append("Benchmarks were analysed with **error-aware ties** (mean ± σ). If two backends overlap within margins, they are treated as a tie. All placement counts below use **Flash Attention ON**.")
    lines.append("")

    # Placement tables
    def place_table(title: str, place_dict: Dict[str, Dict[str, int]]):
        lines.append(f"**{title}**")
        lines.append(md_row(["Backend", "1st", "2nd", "3rd"]))
        lines.append(md_row(["---", "---:", "---:", "---:"]))
        order = sorted(place_dict.items(), key=lambda kv: (-kv[1].get("first", 0), -kv[1].get("second", 0), kv[0]))
        for env, c in order:
            lines.append(md_row([ENV_LABEL.get(env, env), str(c.get("first", 0)), str(c.get("second", 0)), str(c.get("third", 0))]))
        lines.append("")

    place_table("Prompt Processing (pp512)", pp_place)
    place_table("Token Generation (tg128)", tg_place)

    # Data-driven recommendations
    def total_score(c: Dict[str, int]) -> int:
        # weight 1st more than 2nd
        return c.get("first", 0) * 2 + c.get("second", 0)

    best_bal_score = -1
    balanced: List[str] = []
    for env in envs:
        score = total_score(pp_place.get(env, {})) + total_score(tg_place.get(env, {}))
        if score > best_bal_score:
            best_bal_score = score
            balanced = [env]
        elif score == best_bal_score:
            balanced.append(env)
    lines.append("### Summary & Recommendations")
    lines.append(f"- **Fastest prompt processing:** {human_list(pp_wins)} (most 1st-place finishes).")
    lines.append(f"- **Fastest token generation:** {human_list(tg_wins)} (most 1st-place finishes).")
    lines.append(f"- **Balanced choice:** {human_list(balanced)} (consistently near the top across PP/TG).")
    lines.append("")
    lines.append("> **Note (ROCm 7):** Toolboxes enable **hipBLASLt** by default. The benchmark suite also runs **hipBLASLt OFF** variants to show its impact.")
    return "\n".join(lines)
def build_benchmarks_doc(
    runs: List[Dict],
    envs: List[str],
    pp_place: Dict[str, Dict[str, int]],
    tg_place: Dict[str, Dict[str, int]],
    fa_filter: Optional[bool],
) -> str:
    lines: List[str] = []
    lines.append("# AMD Strix Halo — llama.cpp Toolboxes (Benchmarks)")
    lines.append("")
    lines.append("**Interactive results:** https://kyuz0.github.io/amd-strix-halo-toolboxes/")
    lines.append("")
    lines.append("## Table of Contents")
    lines.append("- [Benchmark methodology](#benchmark-methodology)")
    lines.append("- [Summary of current dataset (Flash Attention ON)](#summary-of-current-dataset-flash-attention-on)")
    lines.append("  - [Placement counts](#placement-counts)")
    lines.append("  - [Pairwise head-to-head wins](#pairwise-head-to-head-wins)")
    lines.append("  - [Average ranks](#average-ranks)")
    lines.append("- [Analyses by feature](#analyses-by-feature)")
    lines.append("  - [Impact of Flash Attention](#impact-of-flash-attention)")
    lines.append("  - [Impact of ROCWMMA](#impact-of-rocwmma)")
    lines.append("  - [Impact of hipBLASLt](#impact-of-hipblaslt)")
    lines.append("  - [Vulkan: AMDVLK vs RADV](#vulkan-amdvlk-vs-radv)")
    lines.append("- [Recommendations](#recommendations)")
    lines.append("- [Winner calculation](#winner-calculation)")
    lines.append("")
    lines.append("---")
    lines.append("")
    lines.append("## Benchmark methodology")
    lines.append("")
    lines.append("- **pp512** — prompt processing throughput (tokens/sec, prefill)")
    lines.append("- **tg128** — token generation throughput (tokens/sec, interactive)")
    lines.append("- Each backend tested twice per model: `-fa 0` and `-fa 1`")
    lines.append("- Winners per model/test are **margin-aware**; multiple winners are possible when mean±σ overlap")
    lines.append("- Built from the same llama.cpp commit for consistency")
    lines.append("")
    lines.append("**Backends in this dataset:** " + ", ".join(ENV_LABEL.get(e, e) for e in envs))
    lines.append("")
    lines.append("**ROCm 7 hipBLASLt policy:** Toolboxes ship with **hipBLASLt enabled** by default (`ROCBLAS_USE_HIPBLASLT=1`). The benchmark script also runs **hipBLASLt OFF** variants (`-hblt0`) to measure its effect.")
    lines.append("")
    lines.append("---")
    lines.append("")
    lines.append("## Summary of current dataset (Flash Attention ON)")
    lines.append("")

    # Placement counts
    lines.append("### Placement counts")

    def place_block(title: str, place_dict: Dict[str, Dict[str, int]]):
        lines.append(f"**{title}**")
        lines.append(md_row(["Backend", "1st", "2nd", "3rd"]))
        lines.append(md_row(["---", "---:", "---:", "---:"]))
        order = sorted(place_dict.items(), key=lambda kv: (-kv[1].get("first", 0), -kv[1].get("second", 0), kv[0]))
        for env, c in order:
            lines.append(md_row([ENV_LABEL.get(env, env), str(c.get("first", 0)), str(c.get("second", 0)), str(c.get("third", 0))]))
        lines.append("")

    place_block("Prompt Processing (pp512)", pp_place)
    place_block("Token Generation (tg128)", tg_place)

    # Pairwise wins
    lines.append("### Pairwise head-to-head wins")
    lines.append("For any model+quant where both backends succeeded, this counts who was faster (ties when equal).")
    lines.append(md_row(["Comparison", "Test", "A wins", "B wins", "Ties", "Total"]))
    lines.append(md_row(["---", "---", "---:", "---:", "---:", "---:"]))
    pairs = [
        ("ROCm 7 RC + ROCWMMA + hipBLASLt", "Vulkan AMDVLK", "rocm7_rc-rocwmma", "vulkan_amdvlk"),
        ("ROCm 7 RC + ROCWMMA + hipBLASLt", "Vulkan RADV", "rocm7_rc-rocwmma", "vulkan_radv"),
        ("Vulkan AMDVLK", "Vulkan RADV", "vulkan_amdvlk", "vulkan_radv"),
    ]
    for labelA, labelB, envA, envB in pairs:
        for test in TESTS:
            a, b, t, total = pairwise_win_counts(runs, envA, envB, test, fa_filter)
            lines.append(md_row([f"{labelA} vs {labelB}", test, str(a), str(b), str(t), str(total)]))
    lines.append("")

    # Average ranks
    lines.append("### Average ranks")
    avg_pp = average_ranks(pp_place)
    avg_tg = average_ranks(tg_place)
    lines.append("**Prompt Processing (pp512)**")
    lines.append(md_row(["Backend", "Avg Rank (↓ is better)"]))
    lines.append(md_row(["---", "---:"]))
    for env, val in sorted(avg_pp.items(), key=lambda kv: (kv[1] is None, kv[1] or 99)):
        lines.append(md_row([ENV_LABEL.get(env, env), str(val) if val is not None else ""]))
    lines.append("")
    lines.append("**Token Generation (tg128)**")
    lines.append(md_row(["Backend", "Avg Rank (↓ is better)"]))
    lines.append(md_row(["---", "---:"]))
    for env, val in sorted(avg_tg.items(), key=lambda kv: (kv[1] is None, kv[1] or 99)):
        lines.append(md_row([ENV_LABEL.get(env, env), str(val) if val is not None else ""]))
    lines.append("")
    lines.append("---")
    lines.append("")
    lines.append("## Analyses by feature")
    lines.append("")

    # Flash Attention effect
    lines.append("### Impact of Flash Attention")
    fa_eff = flash_attention_effect(runs, envs)
    lines.append("Median % change when **Flash Attention ON vs OFF**, paired by model+quant, per backend:")
    lines.append(md_row(["Backend", "pp512 Δ% (median, min..max, n)", "tg128 Δ% (median, min..max, n)"]))
    lines.append(md_row(["---", "---", "---"]))

    def fmt_eff(row: Optional[Dict[str, float]]) -> str:
        return f"{row['median_pct']}% ({row['min']}..{row['max']}), n={row['n_pairs']}" if row else ""

    for env in envs:
        row_pp = fa_eff.get(env, {}).get("pp512")
        row_tg = fa_eff.get(env, {}).get("tg128")
        lines.append(md_row([ENV_LABEL.get(env, env), fmt_eff(row_pp), fmt_eff(row_tg)]))
    lines.append("")

    # ROCWMMA effect — check both ROCm 7 and 6.4.4 families if present
    lines.append("### Impact of ROCWMMA")
    rocwmma_pairs = []
    if "rocm7_rc-rocwmma" in envs and "rocm7_rc" in envs:
        rocwmma_pairs.append(("rocm7_rc-rocwmma", "rocm7_rc", "ROCm 7 RC (hipBLASLt)"))
    if "rocm7_rc-rocwmma-hblt0" in envs and "rocm7_rc-hblt0" in envs:
        rocwmma_pairs.append(("rocm7_rc-rocwmma-hblt0", "rocm7_rc-hblt0", "ROCm 7 RC (hipBLASLt OFF)"))
    if "rocm6_4_4-rocwmma" in envs and "rocm6_4_4" in envs:
        rocwmma_pairs.append(("rocm6_4_4-rocwmma", "rocm6_4_4", "ROCm 6.4.4 (hipBLASLt)"))
    if "rocm6_4_4-rocwmma-hblt0" in envs and "rocm6_4_4-hblt0" in envs:
        rocwmma_pairs.append(("rocm6_4_4-rocwmma-hblt0", "rocm6_4_4-hblt0", "ROCm 6.4.4 (hipBLASLt OFF)"))
    rocwmma_rows = rocwmma_effect(runs, rocwmma_pairs, TESTS)
    lines.append(md_row(["Context", "Test", "Compared Envs", "Pairs", "Median Δ%"]))
    lines.append(md_row(["---", "---", "---", "---:", "---:"]))
    for label, test, env_on, env_off, n, delta in rocwmma_rows:
        lines.append(md_row([label, test, f"{ENV_LABEL.get(env_on, env_on)} vs {ENV_LABEL.get(env_off, env_off)}", str(n), f"{delta}%"]))
    lines.append("")

    # hipBLASLt effect — for both ROCm 7 and 6.4.4 families
    lines.append("### Impact of hipBLASLt")
    hip_pairs = []
    if "rocm7_rc" in envs and "rocm7_rc-hblt0" in envs:
        hip_pairs.append(("rocm7_rc", "rocm7_rc-hblt0", "ROCm 7 RC (no ROCWMMA)"))
    if "rocm7_rc-rocwmma" in envs and "rocm7_rc-rocwmma-hblt0" in envs:
        hip_pairs.append(("rocm7_rc-rocwmma", "rocm7_rc-rocwmma-hblt0", "ROCm 7 RC + ROCWMMA"))
    if "rocm6_4_4" in envs and "rocm6_4_4-hblt0" in envs:
        hip_pairs.append(("rocm6_4_4", "rocm6_4_4-hblt0", "ROCm 6.4.4 (no ROCWMMA)"))
    if "rocm6_4_4-rocwmma" in envs and "rocm6_4_4-rocwmma-hblt0" in envs:
        hip_pairs.append(("rocm6_4_4-rocwmma", "rocm6_4_4-rocwmma-hblt0", "ROCm 6.4.4 + ROCWMMA"))
    hip_rows = hipblaslt_effect(runs, hip_pairs, TESTS)
    lines.append(md_row(["Context", "Test", "Compared Envs", "Pairs", "Median Δ%"]))
    lines.append(md_row(["---", "---", "---", "---:", "---:"]))
    for label, test, env_on, env_off, n, delta in hip_rows:
        lines.append(md_row([label, test, f"{ENV_LABEL.get(env_on, env_on)} vs {ENV_LABEL.get(env_off, env_off)}", str(n), f"{delta}%"]))
    lines.append("")

    # AMDVLK vs RADV
    lines.append("### Vulkan: AMDVLK vs RADV")
    lines.append("Head-to-head wins with selected Flash Attention filter:")
    lines.append(md_row(["Test", "AMDVLK wins", "RADV wins", "Ties", "Total"]))
    lines.append(md_row(["---", "---:", "---:", "---:", "---:"]))
    for test, wa, wr, t, total in amdvlk_vs_radv(runs, fa_filter):
        lines.append(md_row([test, str(wa), str(wr), str(t), str(total)]))
    lines.append("")
    lines.append("---")
    lines.append("")
    lines.append("## Recommendations")
    pp_wins, _ = winners(pp_place, "first")
    tg_wins, _ = winners(tg_place, "first")
    lines.append(f"- **Fastest prompt processing:** {human_list(pp_wins)} (most 1st-place finishes with selected Flash Attention filter).")
    lines.append(f"- **Fastest token generation:** {human_list(tg_wins)} (most 1st-place finishes with selected Flash Attention filter).")
# Balanced: highest (2*first + second) across PP+TG
def score(c: Dict[str, int]) -> int:
return c.get("first", 0) * 2 + c.get("second", 0)
best_bal = -1
balanced: List[str] = []
for env in envs:
s = score(pp_place.get(env, {})) + score(tg_place.get(env, {}))
if s > best_bal:
best_bal = s
balanced = [env]
elif s == best_bal:
balanced.append(env)
lines.append(f"- **Balanced choice:** {human_list(balanced)} (consistently near the top across PP/TG).")
lines.append("")
lines.append("---")
lines.append("")
lines.append("## Winner calculation")
lines.append("A backend is counted as a winner if its mean throughput is within the best backends pooled ± error margin for that model/test type. This treats results within measurement noise as ties instead of false losses.")
return "\n".join(lines)
def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--file", type=Path, default=Path("../docs/results.json"),
                    help="Path to results.json (default: ../docs/results.json)")
    ap.add_argument("--out-readme", type=Path, default=Path("./README_benchmarks_section.md"),
                    help="Path to write README section Markdown (default: ./README_benchmarks_section.md)")
    ap.add_argument("--out-bench", type=Path, default=Path("./benchmarks_generated.md"),
                    help="Path to write detailed benchmarks Markdown (default: ./benchmarks_generated.md)")
    ap.add_argument("--fa", choices=["on", "off", "any"], default="on",
                    help="Flash Attention filter (default: on)")
    ap.add_argument("--include-all-envs", action="store_true",
                    help="Include envs even if not present in results.json")
    ap.add_argument("--only-env", action="append",
                    help="Restrict analysis to specific env keys (repeatable)")
    args = ap.parse_args()
    data = load_results(args.file)
    runs: List[Dict] = data["runs"]
    fa_filter = fa_to_filter(args.fa)
    envs = envs_present(runs, args.only_env, args.include_all_envs)
    pp_place, _ = margin_aware_placements(runs, envs, "pp512", fa_filter)
    tg_place, _ = margin_aware_placements(runs, envs, "tg128", fa_filter)
    readme_md = build_readme_section(envs, pp_place, tg_place, fa_filter)
    args.out_readme.write_text(readme_md)
    bench_md = build_benchmarks_doc(runs, envs, pp_place, tg_place, fa_filter)
    args.out_bench.write_text(bench_md)
    print(f"Wrote:\n - {args.out_readme}\n - {args.out_bench}")
if __name__ == "__main__":
    main()
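Review note: the "Winner calculation" text emitted by this script describes margin-aware ties, but the body of `margin_aware_placements` is not part of this hunk. The rule can be sketched roughly as below. This is a minimal illustration, not the script's actual implementation: the helper name `margin_winners`, the backend labels, and the choice of `sqrt(std_best**2 + std_other**2)` as the pooled margin are all assumptions. The figures are borrowed from pp512 rows in the logs added by this commit.

```python
import math
from typing import Dict, List, Tuple


def margin_winners(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return every backend whose mean lies within the pooled margin of the best mean.

    `results` maps a backend name to (mean tokens/s, std).
    ASSUMPTION: the pooled margin is sqrt(std_best**2 + std_other**2).
    """
    if not results:
        return []
    best_env = max(results, key=lambda env: results[env][0])
    best_mean, best_std = results[best_env]
    tied = []
    for env, (mean, std) in results.items():
        margin = math.sqrt(best_std ** 2 + std ** 2)
        # A shortfall no larger than the margin is treated as measurement noise.
        if best_mean - mean <= margin:
            tied.append(env)
    return sorted(tied)


# Hypothetical backend labels; the numbers come from pp512 rows in the logs.
sample = {
    "backend_a": (330.74, 2.03),
    "backend_b": (330.13, 0.85),
    "backend_c": (323.36, 0.16),
}
print(margin_winners(sample))  # ['backend_a', 'backend_b']
```

Here `backend_b` trails the leader by 0.61 t/s, well inside the pooled margin of about 2.2 t/s, so both count as winners. `backend_c` trails by 7.38 t/s and does not.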
@@ -44,7 +44,8 @@ LONGCTX_RE = re.compile(r"longctx(\d+)", re.IGNORECASE)
 ENV_CANON = {
     "rocm7_1_1": "rocm7.1.1",
-    "rocm7_alpha": "rocm-7alpha",
+    "rocm7_alpha": "rocm7-nightlies",
+    "rocm-7alpha": "rocm7-nightlies",
 }
 def clean_model_name(raw):
@@ -1,120 +0,0 @@
#!/usr/bin/env python3
import re, glob, os
# This script parses llama-bench logs in 'results/' to produce
# Markdown tables for pp512 (prompt processing) and tg128 (text generation).
# Regex patterns to extract tokens/sec rows
PP_RE = re.compile(r"\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|\s*pp512\s*\|\s*([\d.]+)\s*±\s*([\d.]+)")
TG_RE = re.compile(r"\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|\s*tg128\s*\|\s*([\d.]+)\s*±\s*([\d.]+)")
# Patterns to classify errors
LOAD_ERR = re.compile(r"failed to load model|Device memory allocation.*failed", re.IGNORECASE)
HANG_ERR = re.compile(r"GPU Hang|HW Exception", re.IGNORECASE)
GENERIC_ERR = re.compile(r"error:|exit \d+", re.IGNORECASE)
# Env ordering
ENV_ORDER = ["vulkan_radv","vulkan_amdvlk","rocm6_4_2","rocm7_beta","rocm7_rc"]
data = {}
# Utility to clean model names
def clean_name(raw):
    return re.sub(r"-000\d+-of-000\d+", "", raw)
# Scan logs
glob_pattern = os.path.join("results", "*.log")
for path in sorted(glob.glob(glob_pattern)):
    # Strip the '.log' suffix; the remainder is '<model>__<env>'
    base = os.path.basename(path).rsplit('.log',1)[0]
    if '__' not in base:
        continue
    model_raw, env = base.split('__',1)
    model = clean_name(model_raw)
    text = open(path, errors='ignore').read()
    # Determine error type
    if LOAD_ERR.search(text):
        err_type = 'load'
    elif HANG_ERR.search(text):
        err_type = 'hang'
    elif GENERIC_ERR.search(text) and not (PP_RE.search(text) and TG_RE.search(text)):
        err_type = 'runtime'
    else:
        err_type = None
    # Extract performance only if no error of any type was detected
    pp_match = PP_RE.search(text) if err_type is None else None
    tg_match = TG_RE.search(text) if err_type is None else None
    for key, match in [('pp512', pp_match), ('tg128', tg_match)]:
        cell = {
            'mean': match.group(1) if match else None,
            'std': match.group(2) if match else None,
            'error': err_type is not None,
            'etype': err_type
        }
        data.setdefault(model, {}).setdefault(key, {})[env] = cell
# Select winner
def pick_winner(env_data):
    scores = {e: float(d['mean']) for e,d in env_data.items() if not d['error'] and d['mean']}
    if not scores:
        return ''
    best = max(scores, key=scores.get)
    others = [v for k,v in scores.items() if k!=best]
    tag = f"🏆 **{best}**"
    if others:
        gain = (scores[best]/max(others)-1)*100
        tag += f" (+{gain:.0f}%)"
    return tag
# Render table with distinct error messages
def render_table(test_label, display_name):
    print(f"### {display_name} — tokens/second\n")
    header = ['Model'] + [e.replace('_',' ').title() for e in ENV_ORDER] + ['Winner']
    print("| " + " | ".join(header) + " |")
    print("|" + "|".join(['---']*len(header)) + "|")
    for model in sorted(data, key=lambda s: s.lower()):
        row = [f"**{model}**"]
        env_data = data[model].get(test_label, {})
        for env in ENV_ORDER:
            d = env_data.get(env)
            if not d:
                cell = ''
            elif d['error']:
                et = d['etype']
                if et=='load':
                    cell = '⚠️ Load Error'
                elif et=='hang':
                    cell = '⚠️ GPU Hang'
                else:
                    cell = '⚠️ Runtime Error'
            else:
                cell = f"{float(d['mean']):.2f} ± {float(d['std']):.2f}"
            row.append(cell)
        row.append(pick_winner(env_data))
        print("| " + " | ".join(row) + " |")
    print()
# Output tables
render_table('pp512','Prompt Processing (pp512)')
render_table('tg128','Text Generation (tg128)')
# Summary of failures by type
fail_lines = []
for model in sorted(data, key=lambda s: s.lower()):
    for test_label, envs in data[model].items():
        for env,d in envs.items():
            if d['error']:
                et = d['etype'] or 'unknown'
                desc = {
                    'load':'failed to load',
                    'hang':'GPU hang',
                    'runtime':'runtime error',
                }.get(et, 'error')
                fail_lines.append(f"- **{model}** [{test_label}] on *{env}*: {desc}")
if fail_lines:
    print("## Failed Runs\n")
    print("\n".join(fail_lines))
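Review note: the removed script relied on `PP_RE` and `TG_RE` to pull `mean ± std` out of llama-bench table rows. As a quick check of what those patterns capture, the sketch below runs them against two rows copied from the logs added in this commit. The helper `extract` is a hypothetical name introduced only for this illustration; the two patterns are copied verbatim from the removed file.

```python
import re
from typing import Optional, Pattern, Tuple

# Patterns copied from the removed script: five arbitrary cells, then the
# test-name cell, then "<mean> ± <std>" in the t/s cell.
PP_RE = re.compile(r"\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|\s*pp512\s*\|\s*([\d.]+)\s*±\s*([\d.]+)")
TG_RE = re.compile(r"\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|\s*tg128\s*\|\s*([\d.]+)\s*±\s*([\d.]+)")

# Two rows taken from one of the llama-bench logs in this commit.
LOG = """\
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.74 ± 2.03 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.74 ± 0.00 |
"""


def extract(pattern: Pattern[str], text: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (mean, std) from the first matching row, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


print(extract(PP_RE, LOG))  # (330.74, 2.03)
print(extract(TG_RE, LOG))  # (21.74, 0.0)
```

The rows have seven cells before the test name while the patterns only require five. This still matches because `re.search` is free to start at a later `|` in the row.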
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.74 ± 2.03 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.74 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 330.13 ± 0.85 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.73 ± 0.01 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 333.45 ± 1.70 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.33 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 336.20 ± 2.04 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.77 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.36 ± 0.16 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.68 ± 0.00 |
build: 2aa45ef9e (7423)
@@ -0,0 +1,10 @@
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | pp512 | 323.91 ± 1.10 |
| glm4moe 106B.A12B Q4_K - Medium | 68.01 GiB | 110.47 B | ROCm | 99 | 1 | 0 | tg128 | 21.68 ± 0.00 |
build: 2aa45ef9e (7423)
