kyuz0
8648f93ad3
updated benchs
2026-05-12 12:32:07 +01:00
Donato Capitella
2e3dc657d2
chore: update ROCm version to 7.2.3 and remove deprecated pr21344 toolbox
2026-05-11 19:40:30 +01:00
Donato Capitella
07d2131d8c
added @64k benchmarks
2026-05-03 16:20:42 +01:00
Donato Capitella
1bffd6505f
feat: add longctx65536 support to standard and RPC benchmark scripts
2026-05-01 20:19:02 +01:00
Donato Capitella
d20bb42b04
updated results
2026-04-29 06:45:26 +01:00
Donato Capitella
73be068e85
feat: upgrade ROCm toolboxes to 7.2.2 and update documentation and CI configurations
2026-04-26 16:25:44 +01:00
Donato Capitella
9016c0f8f8
update benchs
2026-04-15 16:54:34 +01:00
Donato Capitella
66a3314c22
refactor: update MODEL_DIR path to use absolute home directory reference
2026-04-15 11:39:35 +01:00
Donato Capitella
9707a15df7
feat: add benchmark results for rocm-7_2_1-pr21344 and update results metadata
2026-04-15 11:39:10 +01:00
Donato Capitella
2c2c36d3da
add rocm-7.2.1-pr21344 toolbox (gfx1151 MMQ/MMVQ tile + nwarp tuning)
Adds a new toolbox variant based on PR #21344 (pedapudi/llama.cpp@gfx1151-opt)
which tunes MMQ tile sizes (x_max=48, y=64) and warp counts (nwarps=4) for
RDNA3_5 gfx1151, yielding up to +100% prefill throughput at small batch sizes.
Also adds BMI2/FMA/F16C CPU SIMD flags and GGML_CUDA_FA_ALL_QUANTS=ON to match
the benchmark build used in the PR. Wire up CI (build matrix + prune), the
refresh script, and run_benchmarks.sh so results land alongside rocm-7.2.1.
2026-04-15 09:23:58 +01:00
Donato Capitella
14fae26ad0
add minimax m2.7 benchmarks
2026-04-15 08:09:12 +01:00
Donato Capitella
d74db71362
archived old multi-node benchmarks
2026-04-11 11:20:30 +01:00
Donato Capitella
7aa6e6dea9
update benchmarks
2026-04-11 11:18:45 +01:00
Donato Capitella
a821bcb91d
chore: update rocm-7.2 benchmark configuration to version 7.2.1
2026-04-10 11:48:27 +01:00
Donato Capitella
c129a04a1c
refactor: remove hblt0 benchmark support and associated comparison scripts
2026-04-10 11:23:06 +01:00
Donato Capitella
a7ace8dba7
updated benchmarks
2026-03-30 08:37:15 +01:00
Donato Capitella
8ff812fbb5
updated benchmarks
2026-02-09 13:30:26 +00:00
Donato Capitella
2d09b9e6db
updated benchmarks
2026-02-05 19:03:13 +00:00
Donato Capitella
06fc789eba
chore: deprecate and remove ROCm 7.1.1 toolbox and all associated references.
2026-02-04 17:56:41 +00:00
Donato Capitella
d97efb0cb9
updated gpt-oss benchmarks to test rocm7 performance patch
2026-02-04 17:46:43 +00:00
Donato Capitella
d674531182
added rocm-7.2 benchmarks
2026-01-23 15:11:13 +00:00
Donato Capitella
0635552fec
updated benchmark scripts
2026-01-23 08:55:25 +00:00
Donato Capitella
6d70dfc73b
updated with dual-server benchmarks
2026-01-12 13:19:23 +00:00
Donato Capitella
7268e95b0f
updates
2026-01-12 11:05:31 +00:00
Donato Capitella
d6c7456bd0
adding system info to benchmark display
2026-01-11 10:04:05 +00:00
Donato Capitella
783998589e
clean up of legacy toolboxes, removal of rocwmma, and rename of rocm7-alpha to rocm-7nightlies. Added new benchmarks
2026-01-10 10:31:04 +00:00
Donato Capitella
2c8a1e2eef
updated benchmarks
2025-12-21 18:49:08 +00:00
Donato Capitella
9ba6812003
feat: upgrade ROCm to 7.1.1 and update associated tooling and documentation
2025-12-07 09:30:14 +00:00
Donato Capitella
7584a31548
updated Qwen Next Benchmarks
2025-12-05 08:32:58 +00:00
Donato Capitella
7f34f51202
Added Qwen-3-Next benchmarks
2025-11-28 17:50:21 +00:00
Donato Capitella
c7f4ffc346
updated rpc benchmarks with long context
2025-11-19 07:35:56 +00:00
Donato Capitella
1d88fca07d
added long context benchmarks for RPC
2025-11-18 10:43:17 +00:00
Donato Capitella
d19875828c
add script to compare performance with/without forcing the hipblaslt path
2025-11-18 08:45:25 +00:00
Donato Capitella
ccf29e6b22
fixed naming convention
2025-11-17 23:09:04 +00:00
Donato Capitella
1d6d48fae1
updated benchmarks
2025-11-17 23:02:56 +00:00
Donato Capitella
ad32126872
Updating retries for run_benchmark
2025-11-17 17:53:53 +00:00
Donato Capitella
12f057612b
restoring correct llama-bench flags
2025-11-17 16:00:10 +00:00
Donato Capitella
de02a53d96
restored correct benchmark behaviour
2025-11-17 15:55:31 +00:00
Donato Capitella
f62c6e47c5
updated benchmark script to cover HBLASLT for all rocm backends
2025-11-17 15:30:19 +00:00
Donato Capitella
1bb4c1f0cc
improve logic to check if a benchmark has already been run
2025-11-17 11:01:19 +00:00
Donato Capitella
de49d65b3c
Ensure ROCBLAS_USE_HIPBLASLT is properly set remotely as well
2025-11-17 09:33:14 +00:00
Donato Capitella
17b1ec2825
run llama-bench INSIDE the container (vibe coding is tiring)
2025-11-17 09:26:56 +00:00
Donato Capitella
6d8ac6d6f4
remove user from ssh script
2025-11-17 09:10:52 +00:00
Donato Capitella
1eade84757
fixed remote targets
2025-11-17 09:09:15 +00:00
Donato Capitella
1e184979df
remove user from default host
2025-11-17 09:05:13 +00:00
Donato Capitella
ecbe5c14c3
Fixed model path resolution
2025-11-17 08:57:42 +00:00
Donato Capitella
a50adb0c15
add benchmark script for RPC
2025-11-17 08:27:54 +00:00
Donato Capitella
67fb3a002b
Updated benchmarks
2025-11-15 08:36:25 +00:00
Donato Capitella
1d945f2c21
change llama-bench retries to 3
2025-11-12 14:19:47 +00:00
Donato Capitella
79479ec596
adding rocm7alpha to the benchmarks
2025-11-12 14:06:00 +00:00