<!doctype html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>AMD Strix Halo — Backend Benchmarks (Grid View)</title>
    <link rel="stylesheet" href="assets/index2.css">
</head>

<body>
    <header>
        <h1>AMD Ryzen AI MAX+ 395 “Strix Halo” — Benchmark Grid</h1>
        <p>Framework Desktop · AMD Ryzen AI MAX+ 395 · 128&nbsp;GB unified RAM</p>
        <p>Fedora 42 · Linux 6.18.0-0.rc5.243.vanilla.fc42.x86_64 · llama.cpp build 1c398dc9e (7034)</p>
        <p>Benchmarks captured 14 Nov 2025 · Repo: <a href="https://github.com/kyuz0/amd-strix-halo-toolboxes"
                target="_blank" rel="noreferrer">kyuz0/amd-strix-halo-toolboxes</a></p>
        <div class="legend">
            <label>Legend</label>
            <div class="legend-pills">
                <button id="hipblas-modal-open" type="button" class="chip small legend-pill legend-pill-default">
                    hipBLASLt vs hblt0
                </button>
                <button id="rpc-modal-open" type="button" class="chip small legend-pill legend-pill-rpc">
                    RPC · dual server
                </button>
                <button id="rocwmma-modal-open" type="button" class="chip small legend-pill legend-pill-rocwmma">
                    rocWMMA
                </button>
                <button id="rocwmma-impr-modal-open" type="button" class="chip small legend-pill legend-pill-rocwmma-improved">
                    rocWMMA-improved
                </button>
            </div>
        </div>
    </header>

    <section class="controls">
        <div class="control">
            <label for="filter-search">Search models</label>
            <input id="filter-search" type="text" placeholder="e.g. llama, qwen, 30B…">
        </div>
        <div class="control">
            <label for="filter-quant">Quant</label>
            <select id="filter-quant">
                <option value="">Any</option>
            </select>
        </div>
        <div class="control grow slider-block">
            <label>Context windows</label>
            <div id="context-chips" class="chip-row tight"></div>
        </div>
        <div class="control grow slider-block">
            <label>Model params (B)</label>
            <div class="range-wrap">
                <input type="range" id="sizeLo" step="1">
                <input type="range" id="sizeHi" step="1">
                <div class="range-track" id="sizeTrack"></div>
            </div>
            <div class="range-values">
                <span id="sizeLoVal">0B</span> – <span id="sizeHiVal">0B</span>
            </div>
        </div>
    </section>

    <section class="panel compact">
        <div class="panel-split">
            <div class="backend-header">
                <div class="backend-label">
                    <label>Backends</label>
                    <div class="backend-actions">
                        <button type="button" id="backend-all" class="chip small">All</button>
                        <button type="button" id="backend-none" class="chip small">None</button>
                    </div>
                </div>
                <div id="backend-list" class="backend-list"></div>
            </div>
            <div class="stats-box">
                <div class="stat-line" id="stats-line">Loading…</div>
                <button id="reset-layout" type="button" class="chip small">Reset filters</button>
            </div>
        </div>
    </section>

    <section class="panel compact" id="tables-panel">
        <div id="tables"></div>
    </section>

    <div id="hipblas-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="hipblas-title">
        <div class="modal-content">
            <button id="hipblas-modal-close" class="modal-close" aria-label="Close dialog">×</button>
            <h2 id="hipblas-title">hipBLASLt &amp; hblt0 explained</h2>
            <p>The ROCm toolboxes set <code>ROCBLAS_USE_HIPBLASLT=1</code> by default. This tells rocBLAS to route supported
                matrix multiplications through the hipBLASLt kernel library, which has historically delivered the best throughput on gfx1151 (Strix Halo).</p>
            <p>Rows tagged <code>__hblt0</code> were re-run with <code>ROCBLAS_USE_HIPBLASLT=0</code>, which disables the
                hipBLASLt path so that rocBLAS falls back to its own Tensile-generated kernels. These runs show how performance
                shifts without hipBLASLt.</p>
            <p>hipBLASLt is AMD's lightweight GEMM library (the ROCm counterpart of NVIDIA's cuBLASLt), optimized for transformer
                workloads. Disabling it can expose regressions or improvements depending on the ROCm and driver versions, so both
                configurations are published for comparison.</p>
        </div>
    </div>

    <div id="rpc-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="rpc-title">
        <div class="modal-content">
            <button id="rpc-modal-close" class="modal-close" aria-label="Close dialog">×</button>
            <h2 id="rpc-title">RPC · dual server</h2>
            <p>These results were produced with two Strix Halo systems (Framework Desktop + HP G1a workstation, 128&nbsp;GB each)
                connected over 5&nbsp;Gbps Ethernet. One machine runs llama.cpp's <code>rpc-server</code>; the other runs
                <code>llama-bench --rpc</code> against it.</p>
            <p>This setup enables distributed inference by splitting large GGUF models across both machines. The numbers show what
                to expect when the workload is divided between two RPC participants and performance is bounded by network latency.</p>
        </div>
    </div>

    <div id="rocwmma-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="rocwmma-title">
        <div class="modal-content">
            <button id="rocwmma-modal-close" class="modal-close" aria-label="Close dialog">×</button>
            <h2 id="rocwmma-title">rocWMMA variants</h2>
            <p>Backends labeled <code>-rocwmma</code> are rebuilt with AMD's rocWMMA library, which lets llama.cpp run
                matrix-multiply kernels on the GPU's wave matrix multiply-accumulate (WMMA) instructions.</p>
            <p>rocWMMA kernels can significantly accelerate BF16/F16 workloads on RDNA3-class GPUs such as gfx1151, but they may
                cost stability or extra memory; comparing the plain toolboxes against the <code>-rocwmma</code> ones shows the net benefit or cost.</p>
        </div>
    </div>

    <div id="rocwmma-impr-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="rocwmma-impr-title">
        <div class="modal-content">
            <button id="rocwmma-impr-modal-close" class="modal-close" aria-label="Close dialog">×</button>
            <h2 id="rocwmma-impr-title">rocWMMA-improved builds</h2>
            <p>Toolboxes tagged <code>-rocwmma-improved</code> bake in an experimental llama.cpp patch that retunes rocWMMA
                kernels for long-context throughput on Strix Halo.</p>
            <p>Patch reference: <a href="https://github.com/hjc4869/llama.cpp/commit/12bb5c371bd3c647ef75e8e13de9e311edba604d"
                    target="_blank" rel="noreferrer">12bb5c371bd3</a>. These builds often run faster at 32k+ contexts, but
                the changes are not merged upstream and may be unstable.</p>
        </div>
    </div>

    <script src="assets/index2.js" type="module"></script>
</body>

</html>