feat: upgrade ROCm toolboxes to 7.2.2 and update documentation and CI configurations
@@ -28,7 +28,7 @@ jobs:
 IN='${{ inputs.backends }}'
 
 if [[ "$IN" == "all" || -z "$IN" ]]; then
-JSON='["rocm-6.4.4","rocm-7.2.1","rocm-7.2.1-pr21344","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
+JSON='["rocm-6.4.4","rocm-7.2.2","rocm-7.2.2-pr21344","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
 else
 # Remove spaces and build JSON array from comma list
 IN_CLEAN=$(echo "$IN" | tr -d '[:space:]')
@@ -44,7 +44,7 @@ jobs:
 run: |
 IN='${{ github.event.inputs.backends }}'
 if [[ "$IN" == "all" || -z "$IN" ]]; then
-JSON='["rocm-6.4.2","rocm-6.4.3","rocm-6.4.4","rocm-7.1.1","rocm-7.2","rocm-7.2.1","rocm-7.2.1-pr21344","rocm-7beta","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
+JSON='["rocm-6.4.2","rocm-6.4.3","rocm-6.4.4","rocm-7.1.1","rocm-7.2","rocm-7.2.1","rocm-7.2.1-pr21344","rocm-7.2.2","rocm-7.2.2-pr21344","rocm-7beta","rocm7-nightlies","vulkan-amdvlk","vulkan-radv"]'
 else
 IN_CLEAN=$(echo "$IN" | tr -d '[:space:]')
 JSON='["'${IN_CLEAN//,/\",\"}'"]'
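Review note: the `else` branch in the workflow hunks above turns a comma-separated `backends` input into a JSON array by stripping whitespace and replacing each comma with `","`. The sketch below isolates that step so it can be tried outside CI. The function name `backends_to_json` and the `sep` variable are hypothetical and not part of this commit. It does no JSON escaping, so it assumes backend names contain no quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: convert a comma-separated backend list into a JSON array string.
# backends_to_json is a hypothetical name; the workflow inlines this logic.
backends_to_json() {
  local in_clean sep='","'
  # Strip all whitespace, as the workflow does with tr.
  in_clean=$(echo "$1" | tr -d '[:space:]')
  # Replace each comma with "," and wrap the result in ["..."].
  printf '["%s"]\n' "${in_clean//,/$sep}"
}

backends_to_json "rocm-7.2.2, vulkan-radv"
# prints ["rocm-7.2.2","vulkan-radv"]
```

Keeping the separator in a variable avoids the backslash-escaped quotes inside the expansion, which can be hard to read and easy to mistype.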
@@ -8,7 +8,7 @@
 * **Hardware / Drivers**: AMD "Strix Halo" APUs (Gfx1151). Implementations depend on ROCm (v6.4.4, v7.x) and Vulkan (Mesa RADV, AMDVLK).
 
 ## Repository Structure Overview
-* `/toolboxes/`: Dockerfiles used to build the container images (e.g., `rocm-6.4.4`, `rocm-7.2.1`, `vulkan-radv`). These often use multi-stage builds to compile Llama.cpp and extract standalone binaries.
+* `/toolboxes/`: Dockerfiles used to build the container images (e.g., `rocm-6.4.4`, `rocm-7.2.2`, `vulkan-radv`). These often use multi-stage builds to compile Llama.cpp and extract standalone binaries.
 * `/benchmark/`: Shell scripts and Python utilities (like `generate_results_json.py`) to systematically test Llama.cpp throughput, latency, and RPC performance.
 * `/docs/`: Markdown documents, along with HTML/CSS/JS (e.g., `index.html`, `assets/`) for the GitHub Pages website (`strix-halo-toolboxes.com`), plus interactive benchmark viewers and documentation on VRAM estimation.
 * `/scripts/`: Python utilities, including `run_distributed_llama.py` for distributed inference across nodes.
@@ -34,24 +34,18 @@ This is a hobby project maintained in my spare time. If you find these toolboxes
 ## Stable Configuration
 
 - **OS**: Fedora 42/43
-- **Linux Kernel**: 6.18.6-200
+- **Linux Kernel**: 6.18.9-200.fc43.x86_64
 - **Linux Firmware**: 20260110
 
 This is currently the most stable setup. Kernels older than 6.18.4 have a bug that causes stability issues on gfx1151 and should be avoided. Also, **do NOT use `linux-firmware-20251125`.** It breaks ROCm support on Strix Halo (instability/crashes).
 
 > ⚠️ **Important**: See [Host Configuration](#host-configuration) for critical kernel parameters.
 
-## ROCm 7 Performance Regression Workaround
-
-The performance regression previously observed in ROCm 7+ builds (compared to ROCm 6.4.4) has been **resolved in the toolboxes** via a workaround.
-
-The issue was caused by a compiler regression (llvm/llvm-project#147700) affecting loop unrolling thresholds. We have applied the workaround (`-mllvm --amdgpu-unroll-threshold-local=600`) in the latest toolbox builds, restoring full performance.
-
-This workaround will be removed once the upstream fix lands. For details, see the issue: [kyuz0/amd-strix-halo-toolboxes#45](https://github.com/kyuz0/amd-strix-halo-toolboxes/issues/45)
-
-
 ## Supported Toolboxes
 
+> [!WARNING]
+> Current `rocm7-nightlies` builds have a bug that caps memory allocation to 64GB. If you need larger models, prefer stable builds like `rocm-7.2.2` (performance is similar). Track the issue here: https://github.com/ROCm/TheRock/issues/4645
+
 You can check the containers on DockerHub: [kyuz0/amd-strix-halo-toolboxes](https://hub.docker.com/r/kyuz0/amd-strix-halo-toolboxes/tags).
 
 | Container Tag | Backend/Stack | Purpose / Notes |
@@ -59,7 +53,7 @@ You can check the containers on DockerHub: [kyuz0/amd-strix-halo-toolboxes](http
 | `vulkan-amdvlk` | Vulkan (AMDVLK) | Fastest backend—AMD open-source driver. ≤2 GiB single buffer allocation limit, some large models won't load. |
 | `vulkan-radv` | Vulkan (Mesa RADV) | Most stable and compatible. Recommended for most users and all models. |
 | `rocm-6.4.4` | ROCm 6.4.4 (Fedora 43) | Latest stable 6.x build. Uses Fedora 43 packages with backported patch for **kernel 6.18.4+** support. |
-| `rocm-7.2.1` | ROCm 7.2.1 | Latest stable 7.x build. Includes patch for **kernel 6.18.4+** support. |
+| `rocm-7.2.2` | ROCm 7.2.2 | Latest stable 7.x build. Includes patch for **kernel 6.18.4+** support. |
 | `rocm7-nightlies` | ROCm 7 Nightly | Tracks nightly builds. Includes patch for **kernel 6.18.4+** support. |
 
 > These containers are **automatically** rebuilt whenever the Llama.cpp master branch is updated. Legacy images (`rocm-6.4.2`, `rocm-6.4.3`, `rocm-7.1.1`) are excluded from this list.
@@ -79,12 +73,12 @@ toolbox enter llama-vulkan-radv
 
 **Option B: ROCm (Recommended for Performance)**
 ```sh
-toolbox create llama-rocm-7.2.1 \
---image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1 \
+toolbox create llama-rocm-7.2.2 \
+--image docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2 \
 -- --device /dev/dri --device /dev/kfd \
 --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined
 
-toolbox enter llama-rocm-7.2
+toolbox enter llama-rocm-7.2.2
 ```
 
 ### 2. Check GPU Access
@@ -62,8 +62,8 @@ echo
 
 declare -A CMDS=(
 [rocm6_4_4]="toolbox run -c llama-rocm-6.4.4 -- /usr/local/bin/llama-bench"
-[rocm-7_2_1]="toolbox run -c llama-rocm-7.2.1 -- /usr/local/bin/llama-bench"
-[rocm-7_2_1-pr21344]="toolbox run -c llama-rocm-7.2.1-pr21344 -- /usr/local/bin/llama-bench"
+[rocm-7_2_2]="toolbox run -c llama-rocm-7.2.2 -- /usr/local/bin/llama-bench"
+[rocm-7_2_2-pr21344]="toolbox run -c llama-rocm-7.2.2-pr21344 -- /usr/local/bin/llama-bench"
 [rocm7-nightlies]="toolbox run -c llama-rocm7-nightlies -- /usr/local/bin/llama-bench"
 [vulkan_amdvlk]="toolbox run -c llama-vulkan-amdvlk -- /usr/sbin/llama-bench"
 [vulkan_radv]="toolbox run -c llama-vulkan-radv -- /usr/sbin/llama-bench"
@@ -8,8 +8,8 @@ declare -A TOOLBOXES
 TOOLBOXES["llama-vulkan-amdvlk"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-amdvlk --device /dev/dri --group-add video --security-opt seccomp=unconfined"
 TOOLBOXES["llama-vulkan-radv"]="docker.io/kyuz0/amd-strix-halo-toolboxes:vulkan-radv --device /dev/dri --group-add video --security-opt seccomp=unconfined"
 TOOLBOXES["llama-rocm-6.4.4"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-6.4.4 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
-TOOLBOXES["llama-rocm-7.2.1"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
-TOOLBOXES["llama-rocm-7.2.1-pr21344"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.1-pr21344 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
+TOOLBOXES["llama-rocm-7.2.2"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
+TOOLBOXES["llama-rocm-7.2.2-pr21344"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm-7.2.2-pr21344 --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
 TOOLBOXES["llama-rocm7-nightlies"]="docker.io/kyuz0/amd-strix-halo-toolboxes:rocm7-nightlies --device /dev/dri --device /dev/kfd --group-add video --group-add render --group-add sudo --security-opt seccomp=unconfined"
 
 function usage() {
@@ -1,12 +1,12 @@
 # build stage
 FROM registry.fedoraproject.org/fedora:43 AS builder
 
-# rocm 7.2.1 repo
+# rocm 7.2.2 repo
 RUN <<'EOF'
 tee /etc/yum.repos.d/rocm.repo <<REPO
-[ROCm-7.2.1]
-name=ROCm7.2.1
-baseurl=https://repo.radeon.com/rocm/rhel10/7.2.1/main
+[ROCm-7.2.2]
+name=ROCm7.2.2
+baseurl=https://repo.radeon.com/rocm/rhel10/7.2.2/main
 enabled=1
 priority=50
 gpgcheck=1
@@ -69,12 +69,12 @@ RUN chmod +x /usr/local/bin/gguf-vram-estimator.py
 # runtime stage
 FROM registry.fedoraproject.org/fedora-minimal:43
 
-# rocm 7.2.1 repo
+# rocm 7.2.2 repo
 RUN <<'EOF'
 tee /etc/yum.repos.d/rocm.repo <<REPO
-[ROCm-7.2.1]
-name=ROCm7.2.1
-baseurl=https://repo.radeon.com/rocm/rhel10/7.2.1/main
+[ROCm-7.2.2]
+name=ROCm7.2.2
+baseurl=https://repo.radeon.com/rocm/rhel10/7.2.2/main
 enabled=1
 priority=50
 gpgcheck=1
@@ -1,14 +1,14 @@
 # build stage
-# Based on Dockerfile.rocm-7.2.1, but clones pedapudi/llama.cpp@gfx1151-opt
+# Based on Dockerfile.rocm-7.2.2, but clones pedapudi/llama.cpp@gfx1151-opt
 # (PR #21344: gfx1151 nwarps, tile sizing to curb VGPR pressure)
 FROM registry.fedoraproject.org/fedora:43 AS builder
 
-# rocm 7.2.1 repo
+# rocm 7.2.2 repo
 RUN <<'EOF'
 tee /etc/yum.repos.d/rocm.repo <<REPO
-[ROCm-7.2.1]
-name=ROCm7.2.1
-baseurl=https://repo.radeon.com/rocm/rhel10/7.2.1/main
+[ROCm-7.2.2]
+name=ROCm7.2.2
+baseurl=https://repo.radeon.com/rocm/rhel10/7.2.2/main
 enabled=1
 priority=50
 gpgcheck=1
@@ -78,12 +78,12 @@ RUN chmod +x /usr/local/bin/gguf-vram-estimator.py
 # runtime stage
 FROM registry.fedoraproject.org/fedora-minimal:43
 
-# rocm 7.2.1 repo
+# rocm 7.2.2 repo
 RUN <<'EOF'
 tee /etc/yum.repos.d/rocm.repo <<REPO
-[ROCm-7.2.1]
-name=ROCm7.2.1
-baseurl=https://repo.radeon.com/rocm/rhel10/7.2.1/main
+[ROCm-7.2.2]
+name=ROCm7.2.2
+baseurl=https://repo.radeon.com/rocm/rhel10/7.2.2/main
 enabled=1
 priority=50
 gpgcheck=1
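Review note: the same repo stanza is edited in four places across the two Dockerfiles (build and runtime stage of each). As a sketch only, the version could be held in one variable so that a future bump touches a single line. The names `rocm_repo_stanza` and `ROCM_VERSION` are hypothetical and do not appear in this commit. Only the keys visible in the hunks are reproduced: the hunks are cut off after `gpgcheck=1`, so whatever follows in the real files is deliberately left out here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the visible part of the ROCm repo stanza
# for a given version. The real Dockerfiles write this via tee in a heredoc.
rocm_repo_stanza() {
  local version="$1"
  cat <<REPO
[ROCm-${version}]
name=ROCm${version}
baseurl=https://repo.radeon.com/rocm/rhel10/${version}/main
enabled=1
priority=50
gpgcheck=1
REPO
}

ROCM_VERSION="7.2.2"  # assumed variable name, for illustration only
rocm_repo_stanza "$ROCM_VERSION"
```

Inside a Dockerfile the same idea would more likely be an `ARG`, but that is a design choice for the maintainer and is not something this commit does.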