From 51aab9665d561b8d4063d8d88b493d035ec42f3a Mon Sep 17 00:00:00 2001
From: Donato Capitella
Date: Tue, 3 Feb 2026 10:57:08 +0000
Subject: [PATCH] docs: Add a performance regression warning for Llama.cpp with ROCm 7.1+ or nightly builds.

---
 README.md                    | 6 ++++++
 .../run_distributed_llama.py | 0
 2 files changed, 6 insertions(+)
 rename run_distributed_llama.py => scripts/run_distributed_llama.py (100%)

diff --git a/README.md b/README.md
index 8f06520..a69b448 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,12 @@ This project provides pre-built containers (“toolboxes”) for running LLMs on
 
 This is currently the most stable setup. Switching to newer kernels, such as 6.18.4 breaks all versions of ROCm but the cutting edge nightly builds from TheRock.
 
+## ⚠️ Performance Regression Warning — 2026-02-03
+
+There is a considerable performance regression when using Llama.cpp with ROCm 7.1+ or the nightly builds (from TheRock) compared to ROCm 6.4.4. This is a known issue tracked in [ROCm-systems#2865](https://github.com/ROCm/rocm-systems/issues/2865).
+
+AMD has pinpointed the cause to a compiler patch (llvm/llvm-project#147700) which causes VGPR spills and drops kernel occupancy. Reverting this patch restores performance (e.g., from ~1416 t/s to ~4378 t/s on gfx1100). We are tracking this issue and will update the toolboxes once a fix is available.
+
 ## 🚨 Updates — 2026-01-10
 
 - **Simplified Offering**: Removed `rocwmma` containers as standard kernels in newer `llama.cpp` are now faster and stable.
diff --git a/run_distributed_llama.py b/scripts/run_distributed_llama.py
similarity index 100%
rename from run_distributed_llama.py
rename to scripts/run_distributed_llama.py