Strix Halo — llama.cpp Backend Comparator

Compare model throughput across llama.cpp backends for prompt processing (pp512) and token generation (tg128). Repo: kyuz0/amd-strix-halo-toolboxes

Search model

Quant

Model size (B from name)

Flash Attention

Winner = every selected backend whose result falls within the best result's uncertainty range, where that range combines the ± errors of both results being compared.
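The winner rule can be sketched in a few lines of Python. This is only an illustration, not the page's actual code: the `Result` class, the `winners` function, and the backend names are hypothetical, and the sketch assumes the two ± errors are combined by simple addition (the page does not say whether it adds them or combines them in quadrature).

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark result for a backend (hypothetical structure)."""
    backend: str
    mean: float  # throughput in tokens/second
    err: float   # reported ± error in tokens/second


def winners(results):
    """Return every backend whose mean is within the combined error of the best.

    Assumption: the combined error is the plain sum of both ± errors.
    """
    if not results:
        return []
    best = max(results, key=lambda r: r.mean)
    return [
        r.backend
        for r in results
        if best.mean - r.mean <= best.err + r.err
    ]


# Illustrative numbers only, not measured data.
sample = [
    Result("backend_a", 1000.0, 10.0),
    Result("backend_b", 985.0, 8.0),
    Result("backend_c", 900.0, 5.0),
]
print(winners(sample))
```

With these made-up values, backend_b trails the best by 15 tokens/second, which is inside the combined error of 18, so it shares the win; backend_c trails by 100 and does not. Swapping the sum for a quadrature combination would make the rule stricter.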

Prompt Processing (pp512) — tokens/second