Commit Graph

266 Commits

Author SHA1 Message Date
Donato Capitella 2e3dc657d2 chore: update ROCm version to 7.2.3 and remove deprecated pr21344 toolbox 2026-05-11 19:40:30 +01:00
kyuz0 0f9c2c85be Update issue templates 2026-05-06 18:06:18 +01:00
Donato Capitella 07d2131d8c added @64k benchmarks 2026-05-03 16:20:42 +01:00
Donato Capitella 1bffd6505f feat: add longctx65536 support to standard and RPC benchmark scripts 2026-05-01 20:19:02 +01:00
Donato Capitella d20bb42b04 updated results 2026-04-29 06:45:26 +01:00
Donato Capitella 73be068e85 feat: upgrade ROCm toolboxes to 7.2.2 and update documentation and CI configurations 2026-04-26 16:25:44 +01:00
Donato Capitella 1421e87060 feat: add Plausible analytics script to documentation index page 2026-04-21 09:47:16 +01:00
Donato Capitella 9016c0f8f8 update benchs 2026-04-15 16:54:34 +01:00
Donato Capitella 66a3314c22 refactor: update MODEL_DIR path to use absolute home directory reference 2026-04-15 11:39:35 +01:00
Donato Capitella 9707a15df7 feat: add benchmark results for rocm-7_2_1-pr21344 and update results metadata 2026-04-15 11:39:10 +01:00
Donato Capitella c2754a810a fix vulkan builds: add spirv-headers-devel to builder deps 2026-04-15 09:33:13 +01:00
Donato Capitella 2c2c36d3da add rocm-7.2.1-pr21344 toolbox (gfx1151 MMQ/MMVQ tile + nwarp tuning)
Adds a new toolbox variant based on PR #21344 (pedapudi/llama.cpp@gfx1151-opt)
which tunes MMQ tile sizes (x_max=48, y=64) and warp counts (nwarps=4) for
RDNA3_5 gfx1151, yielding up to +100% prefill throughput at small batch sizes.

Also adds BMI2/FMA/F16C CPU SIMD flags and GGML_CUDA_FA_ALL_QUANTS=ON to match
the benchmark build used in the PR. Wire up CI (build matrix + prune), the
refresh script, and run_benchmarks.sh so results land alongside rocm-7.2.1.
2026-04-15 09:23:58 +01:00
Donato Capitella 14fae26ad0 add minimax m2.7 benchmarks 2026-04-15 08:09:12 +01:00
kyuz0 4b3c02a405 Update README.md 2026-04-12 21:33:46 +01:00
Donato Capitella d74db71362 archived old multi-node benchmarks 2026-04-11 11:20:30 +01:00
Donato Capitella 7aa6e6dea9 update benchmarks 2026-04-11 11:18:45 +01:00
Donato Capitella a821bcb91d chore: update rocm-7.2 benchmark configuration to version 7.2.1 2026-04-10 11:48:27 +01:00
Donato Capitella c129a04a1c refactor: remove hblt0 benchmark support and associated comparison scripts 2026-04-10 11:23:06 +01:00
Savio 5acf54cd67 fix: Update HuggingFace download commands (#61) 2026-04-10 10:56:00 +01:00
Donato Capitella 1dea385f6a fix: remove trailing backtick causing syntax error in prune-old-toolboxes workflow 2026-04-09 19:00:44 +01:00
Donato Capitella 4ac481e7d1 chore: upgrade ROCm version from 7.2 to 7.2.1 across configuration and documentation 2026-04-09 18:33:52 +01:00
Donato Capitella d1e49d4aa0 chore: remove llama.cpp PR 21566 patch from rocm7-nightlies Dockerfile 2026-04-07 18:33:18 +01:00
Donato Capitella a58d133c5e chore: update llama.cpp patch to PR 21566 for gemma-4 inference fix 2026-04-07 17:49:16 +01:00
Donato Capitella d0281bb526 feat: apply upstream llama.cpp patch to fix Gemma-4 inference issues 2026-04-06 10:25:42 +01:00
Donato Capitella bbd8f02014 build: remove -DGGML_CUDA_DISABLE_FUSION=1 from cmake configuration in rocm7-nightlies Dockerfile (this was for a temporary test) 2026-04-03 15:21:58 +01:00
Donato Capitella b376d1558b build: disable GGML CUDA fusion in ROCm build configuration (temporary test) 2026-04-03 15:16:12 +01:00
Donato Capitella a7ace8dba7 updated benchmarks 2026-03-30 08:37:15 +01:00
Donato Capitella 614b00af3e fixed patch (AI slop!!!) 2026-03-25 09:36:50 +00:00
Donato Capitella ca84f4cbf3 patch: increasing MAX_REPETITION_THRESHOLD to allow complex agentic workflows 2026-03-25 09:23:19 +00:00
esc247 eb03432a50 added router mode section and example models.ini file for use with router mode (#67) 2026-03-04 12:09:11 +00:00
Donato Capitella 5f4698c959 build: Remove amdgpu-unroll-threshold-local CMAKE_HIP_FLAG from ROCm 7 nightlies Dockerfile. 2026-03-03 12:54:45 +00:00
zynzynack 8462d300e7 Update README.md to latest huggingface CLI (#64)
huggingface-cli changes to hf
HF_TOKEN concept introduced for download speed
2026-03-01 13:49:31 +00:00
Donato Capitella a8c3fa89ea docs: Update Quick Start section with a link to Strix Halo Toolboxes configuration details. 2026-02-22 19:47:30 +00:00
Donato Capitella ec245b9b17 feat: Implement OS-aware toolbox command selection (toolbox vs distrobox) in the script and clarify usage in the README. 2026-02-22 19:47:30 +00:00
Trevor Starick 95aaf23a47 fix: remove trailing backslash (#60)
* feat: add REPO/BRANCH build args for llama.cpp

- Introduce ARG REPO and ARG BRANCH to replace the hardcoded git clone with: `git clone -b ${BRANCH} --single-branch --recursive ${REPO}`. This allows overriding the llama.cpp repository and branch at build time via `--build-arg`.

- Update `docs/building.md` to recommend using `--build-arg` instead of updating the file

* fix: remove trailing backslash
2026-02-17 21:41:55 +00:00
Trevor Starick be936d6b59 feat: add REPO/BRANCH build args for llama.cpp (#59)
- Introduce ARG REPO and ARG BRANCH to replace the hardcoded git clone with: `git clone -b ${BRANCH} --single-branch --recursive ${REPO}`. This allows overriding the llama.cpp repository and branch at build time via `--build-arg`.

- Update `docs/building.md` to recommend using `--build-arg` instead of updating the file
2026-02-17 19:29:48 +00:00
Donato Capitella 13a0189b6a Simplify image cleanup by removing explicit logic for older tagged image versions. 2026-02-14 11:27:59 +00:00
Nicholas Burr 9418384c5f Updated HF_HUB_ENABLE_HF_TRANSFER to HF_XET_HIGH_PERFORMANCE. (#56) 2026-02-09 20:16:21 +00:00
Donato Capitella 8ff812fbb5 updated benchmarks 2026-02-09 13:30:26 +00:00
Donato Capitella 632130a2c3 fix: Correct typo in "buy me a coffee" link in README. 2026-02-06 06:53:19 +00:00
Donato Capitella 033585368c fix ToC 2026-02-05 19:46:50 +00:00
Donato Capitella 2d09b9e6db updated benchmarks 2026-02-05 19:03:13 +00:00
Donato Capitella 4a97c47c4f docs: add project context and support sections to README. 2026-02-05 17:53:32 +00:00
dougs f28dee87ef Update README with correction to model download and inference instructions (#54)
Updated instructions for downloading model files to include both parts of the example GGUF
2026-02-05 11:21:47 +00:00
Donato Capitella eb92804284 feat: move and expand host configuration details with updated kernel parameters and explanations, and add a warning note. 2026-02-04 18:39:07 +00:00
Donato Capitella 9c6946e4b5 docs: add Table of Contents to README. 2026-02-04 18:34:21 +00:00
Donato Capitella 616d034bc6 typo 2026-02-04 18:32:24 +00:00
Donato Capitella 4d09c88011 tidy up README 2026-02-04 18:31:39 +00:00
Donato Capitella 3684e49a9d docs: update README to announce the application of a workaround for the ROCm 7 performance regression. 2026-02-04 18:05:10 +00:00
Donato Capitella 06fc789eba chore: deprecate and remove ROCm 7.1.1 toolbox and all associated references. 2026-02-04 17:56:41 +00:00