refactor: remove hblt0 benchmark support and associated comparison scripts

Donato Capitella
2026-04-10 11:23:06 +01:00
parent 5acf54cd67
commit c129a04a1c
7 changed files with 47 additions and 327 deletions
+1 -23
@@ -19,9 +19,7 @@
<div class="legend">
<label>Legend</label>
<div class="legend-pills">
<button id="hipblas-modal-open" type="button" class="chip small legend-pill legend-pill-default">
hipBLASLt vs hblt0
</button>
<button id="rpc-modal-open" type="button" class="chip small legend-pill legend-pill-rpc">
RPC · dual server
</button>
@@ -83,26 +81,6 @@
<div id="tables"></div>
</section>
<div id="hipblas-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="hipblas-title">
<div class="modal-content">
<button id="hipblas-modal-close" class="modal-close" aria-label="Close dialog">×</button>
<h2 id="hipblas-title">hipBLASLt &amp; hblt0 explained</h2>
<p>The ROCm toolboxes ship with <code>ROCBLAS_USE_HIPBLASLT=1</code> by default. This forces rocBLAS to
prefer
the hipBLASLt kernel library, which historically delivered the best throughput on gfx1151 (Strix Halo).
</p>
<p>Rows tagged with <code>__hblt0</code> were re-run with <code>ROCBLAS_USE_HIPBLASLT=0</code>, letting
rocBLAS
auto-select between hipBLASLt, Tensile, or other kernel providers. These runs show how performance
shifts when
the tuned hipBLASLt path is disabled.</p>
<p>hipBLASLt is AMD's LT (low-level tuned) matmul backend, optimized for transformer workloads. Disabling it
can
expose regressions or improvements depending on driver versions, so both configurations are published
for
comparison.</p>
</div>
</div>
<div id="rpc-modal" class="modal hidden" role="dialog" aria-modal="true" aria-labelledby="rpc-title">
<div class="modal-content">