From 73be068e85832bcf12db2a07712ffa6e21fbf254 Mon Sep 17 00:00:00 2001 From: Donato Capitella Date: Sun, 26 Apr 2026 16:25:44 +0100 Subject: [PATCH] feat: upgrade ROCm toolboxes to 7.2.2 and update documentation and CI configurations --- .github/workflows/build_and_publish.yml | 2 +- .github/workflows/prune-old-toolboxes.yml | 2 +- AGENTS.md | 2 +- README.md | 22 +++++++------------ benchmark/run_benchmarks.sh | 4 ++-- refresh-toolboxes.sh | 4 ++-- ...rfile.rocm-7.2.1 => Dockerfile.rocm-7.2.2} | 16 +++++++------- ...-pr21344 => Dockerfile.rocm-7.2.2-pr21344} | 18 +++++++-------- 8 files changed, 32 insertions(+), 38 deletions(-) rename toolboxes/{Dockerfile.rocm-7.2.1 => Dockerfile.rocm-7.2.2} (93%) rename toolboxes/{Dockerfile.rocm-7.2.1-pr21344 => Dockerfile.rocm-7.2.2-pr21344} (92%) diff --git a/.github/workflows/build_and_publish.yml b/.github/workflows/build_and_publish.yml index 1ded5e2..eb9647a 100644 --- a/.github/workflows/build_and_publish.yml +++ b/.github/workflows/build_and_publish.yml @@ -28,7 +28,7 @@ jobs: IN='${{ inputs.backends }}' if [[ "$IN" == "all" || -z "$IN" ]]; then - JSON='["rocm-6.4.4","rocm-7.2.1","rocm-7.2.1-pr21344","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]' + JSON='["rocm-6.4.4","rocm-7.2.2","rocm-7.2.2-pr21344","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]' else # Remove spaces and build JSON array from comma list IN_CLEAN=$(echo "$IN" | tr -d '[:space:]') diff --git a/.github/workflows/prune-old-toolboxes.yml b/.github/workflows/prune-old-toolboxes.yml index 22bb3b4..78030d4 100644 --- a/.github/workflows/prune-old-toolboxes.yml +++ b/.github/workflows/prune-old-toolboxes.yml @@ -44,7 +44,7 @@ jobs: run: | IN='${{ github.event.inputs.backends }}' if [[ "$IN" == "all" || -z "$IN" ]]; then - JSON='["rocm-6.4.2","rocm-6.4.3","rocm-6.4.4","rocm-7.1.1","rocm-7.2","rocm-7.2.1","rocm-7.2.1-pr21344","rocm-7beta","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]' + JSON='["rocm-6.4.2","rocm-6.4.3","rocm-6.4.4","rocm-7.1.1","rocm-7.2","rocm-7.2.1","rocm-7.2.1-pr21344","rocm-7.2.2","rocm-7.2.2-pr21344","rocm-7beta","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]' else IN_CLEAN=$(echo "$IN" | tr -d '[:space:]') JSON='["'${IN_CLEAN//,/\",\"}'"]' diff --git a/AGENTS.md b/AGENTS.md index 89045a3..a2b7bda 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -8,7 +8,7 @@ * **Hardware / Drivers**: AMD "Strix Halo" APUs (Gfx1151). Implementations depend on ROCm (v6.4.4, v7.x) and Vulkan (Mesa RADV, AMDVLK). ## Repository Structure Overview -* `/toolboxes/`: Dockerfiles used to build the container images (e.g., `rocm-6.4.4`, `rocm-7.2.1`, `vulkan-radv`). These often use multi-stage builds to compile Llama.cpp and extract standalone binaries. +* `/toolboxes/`: Dockerfiles used to build the container images (e.g., `rocm-6.4.4`, `rocm-7.2.2`, `vulkan-radv`). These often use multi-stage builds to compile Llama.cpp and extract standalone binaries. * `/benchmark/`: Shell scripts and Python utilities (like `generate_results_json.py`) to systematically test Llama.cpp throughput, latency, and RPC performance. * `/docs/`: Markdown documents, along with HTML/CSS/JS (e.g., `index.html`, `assets/`) for the GitHub Pages website (`strix-halo-toolboxes.com`), plus interactive benchmark viewers and documentation on VRAM estimation. * `/scripts/`: Python utilities, including `run_distributed_llama.py` for distributed inference across nodes. diff --git a/README.md b/README.md index 8dcba5f..9f5b630 100644 --- a/README.md +++ b/README.md @@ -34,24 +34,18 @@ This is a hobby project maintained in my spare time. If you find these toolboxes ## Stable Configuration - **OS**: Fedora 42/43 -- **Linux Kernel**: 6.18.6-200 +- **Linux Kernel**: 6.18.9-200.fc43.x86_64 - **Linux Firmware**: 20260110 This is currently the most stable setup. Kernels older than 6.18.4 have a bug that causes stability issues on gfx1151 and should be avoided. Also, **do NOT use `linux-firmware-20251125`.** It breaks ROCm support on Strix Halo (instability/crashes). > ⚠️ **Important**: See [Host Configuration](#host-configuration) for critical kernel parameters. -## ROCm 7 Performance Regression Workaround - -The performance regression previously observed in ROCm 7+ builds (compared to ROCm 6.4.4) has been **resolved in the toolboxes** via a workaround. - -The issue was caused by a compiler regression (llvm/llvm-project#147700) affecting loop unrolling thresholds. We have applied the workaround (`-mllvm --amdgpu-unroll-threshold-local=600`) in the latest toolbox builds, restoring full performance. - -This workaround will be removed once the upstream fix lands. For details, see the issue: [kyuz0/amd-strix-halo-toolboxes#45](https://github.com/kyuz0/amd-strix-halo-toolboxes/issues/45) - - ## Supported Toolboxes +> [!WARNING] +> Current `rocm7-nightlies` builds have a bug that caps memory allocation to 64GB. If you need larger models, prefer stable builds like `rocm-7.2.2` (performance is similar). Track the issue here: https://github.com/ROCm/TheRock/issues/4645 + You can check the containers on DockerHub: [kyuz0/amd-strix-halo-toolboxes](https://hub.docker.com/r/kyuz0/amd-strix-halo-toolboxes/tags). | Container Tag | Backend/Stack | Purpose / Notes | @@ -59,7 +53,7 @@ You can check the containers on DockerHub: [kyuz0/amd-strix-halo-toolboxes](http | `vulkan-amdvlk` | Vulkan (AMDVLK) | Fastest backend—AMD open-source driver. ≤2 GiB single buffer allocation limit, some large models won't load. | | `vulkan-radv` | Vulkan (Mesa RADV) | Most stable and compatible. Recommended for most users and all models. | | `rocm-6.4.4` | ROCm 6.4.4 (Fedora 43) | Latest stable 6.x build. Uses Fedora 43 packages with backported patch for **kernel 6.18.4+** support. | -| `rocm-7.2.1` | ROCm 7.2.1 | Latest stable 7.x build. Includes patch for **kernel 6.18.4+** support. | +| `rocm-7.2.2` | ROCm 7.2.2 | Latest stable 7.x build. Includes patch for **kernel 6.18.4+** support. | | `rocm7-nightlies` | ROCm 7 Nightly | Tracks nightly builds. Includes patch for **kernel 6.18.4+** support. | > These containers are **automatically** rebuilt whenever the Llama.cpp master branch is updated. Legacy images (`rocm-6.4.2`, `rocm-6.4.3`, `rocm-7.1.1`) are excluded from this list. @@ -79,12 +73,12 @@ toolbox enter llama-vulkan-radv **Option B: ROCm (Recommended for Performance)** ```sh -toolbox create llama-rocm-7.2.1 \ - --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1 \ +toolbox create llama-rocm-7.2.2 \ + --image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2 \ -- --device /dev/dri --device /dev/kfd \ --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined -toolbox enter llama-rocm-7.2 +toolbox enter llama-rocm-7.2.2 ``` ### 2. Check GPU Access diff --git a/benchmark/run_benchmarks.sh b/benchmark/run_benchmarks.sh index 42bc29e..24f31d5 100755 --- a/benchmark/run_benchmarks.sh +++ b/benchmark/run_benchmarks.sh @@ -62,8 +62,8 @@ echo declare -A CMDS=( [rocm6_4_4]="toolbox run -c llama-rocm-6.4.4 -- /usr/local/bin/llama-bench" - [rocm-7_2_1]="toolbox run -c llama-rocm-7.2.1 -- /usr/local/bin/llama-bench" - [rocm-7_2_1-pr21344]="toolbox run -c llama-rocm-7.2.1-pr21344 -- /usr/local/bin/llama-bench" + [rocm-7_2_2]="toolbox run -c llama-rocm-7.2.2 -- /usr/local/bin/llama-bench" + [rocm-7_2_2-pr21344]="toolbox run -c llama-rocm-7.2.2-pr21344 -- /usr/local/bin/llama-bench" [rocm7-nightlies]="toolbox run -c llama-rocm7-nightlies -- /usr/local/bin/llama-bench" [vulkan_amdvlk]="toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench" [vulkan_radv]="toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench" diff --git a/refresh-toolboxes.sh b/refresh-toolboxes.sh index 936f141..fe1ced9 100755 --- a/refresh-toolboxes.sh +++ b/refresh-toolboxes.sh @@ -8,8 +8,8 @@ declare -A TOOLBOXES TOOLBOXES["llama-vulkan-amdvlk"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-amdvlk --device /dev/dri --group-add video --security-opt seccomp=unconfined" TOOLBOXES["llama-vulkan-radv"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-radv --device /dev/dri --group-add video --security-opt seccomp=unconfined" TOOLBOXES["llama-rocm-6.4.4"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.4 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" -TOOLBOXES["llama-rocm-7.2.1"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" -TOOLBOXES["llama-rocm-7.2.1-pr21344"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1-pr21344 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" +TOOLBOXES["llama-rocm-7.2.2"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" +TOOLBOXES["llama-rocm-7.2.2-pr21344"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2-pr21344 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" TOOLBOXES["llama-rocm7-nightlies"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm7-nightlies --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined" function usage() { diff --git a/toolboxes/Dockerfile.rocm-7.2.1 b/toolboxes/Dockerfile.rocm-7.2.2 similarity index 93% rename from toolboxes/Dockerfile.rocm-7.2.1 rename to toolboxes/Dockerfile.rocm-7.2.2 index 9f4902c..a5d88ed 100644 --- a/toolboxes/Dockerfile.rocm-7.2.1 +++ b/toolboxes/Dockerfile.rocm-7.2.2 @@ -1,12 +1,12 @@ # build stage FROM registry.fedoraproject.org/fedora:43 AS builder -# rocm 7.2.1 repo +# rocm 7.2.2 repo RUN <<'EOF' tee /etc/yum.repos.d/rocm.repo <