Updated performance summary

2025-07-29 18:25:12 +01:00
parent 41a0bc2229
commit 4478690f1c
1 changed files with 18 additions and 4 deletions
@@ -120,16 +120,24 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w

 **🏆 Vulkan Advantages:**
 - Consistently stable across all model sizes
-- Best performance on small models (Gemma3 12B) and very large models (235B+)
+- Significantly better prompt processing on smaller quantized models (127% faster on Gemma3 12B)
 - Only option that can handle >64GB models efficiently
+- Moderate advantage on larger quantized models (3-14% better on Llama4 17B)

 **🏆 ROCm 6.4.2 Advantages:**
-- Superior performance on medium-sized MoE models (30B Qwen3)
-- Better text generation speeds on some models
+- **Dramatically superior performance on BF16 models** (112% faster prompt processing, 222% faster text generation on Qwen3 MoE 30B)
+- Optimized native floating-point operations through HIP compute
+- Better suited for models using native precision formats
+
+**📊 Performance by Model Type:**
+- **BF16/Native Precision Models**: ROCm 6.4.2 is the clear winner with 2-3x better performance
+- **Small Quantized Models**: Vulkan has significant advantages for prompt processing
+- **Large Quantized Models**: Performance is broadly similar between backends (Vulkan leads by only 3-14% on Llama4 17B)
+- **Large Models (>64GB)**: Vulkan is the only viable option due to ROCm's memory allocation issues

 **❌ ROCm 6.4.2 Limitations:**
 - Extremely slow memory loading for models >64GB (unusable)
-- Performance varies significantly by model type
+- Performance advantage limited to BF16/native precision models

 **❌ ROCm 7.0 Beta Issues:**
 - GPU hangs/crashes on larger models (Llama4 17B causes "GPU Hang" and core dump)
@@ -137,6 +145,12 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w
 - Performance similar to ROCm 6.4.2 when it works, but reliability is poor
 - Uses [official AMD RPMs](https://repo.radeon.com/rocm/el9/7.0_beta/main) (beta quality)

+**💡 Recommendation Strategy:**
+- Use **ROCm 6.4.2** for BF16/native precision models under 64GB
+- Use **Vulkan** for quantized models (especially smaller ones) and all models over 64GB
+- For large quantized models under 64GB, either backend performs similarly
+- Avoid ROCm 7.0 beta for production workloads
+
 ## Building Containers Locally (Optional)

 If you prefer to build the containers yourself:
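The recommendation strategy added in this commit is a simple decision rule based on weight format and model size. The Python sketch below encodes that rule. It rests on a few assumptions:

- `Backend`, `pick_backend` and `percent_faster_to_multiplier` are hypothetical names. They are not part of this repository or of `llama-bench`.
- The 64GB cutoff comes from the summary.
- "X% faster" is read as a throughput ratio of 1 + X/100. Under that reading, 112% and 222% faster correspond to about 2.12x and 3.22x, which roughly matches the "2-3x" figure.
- The model sizes in the usage lines are illustrative only.

```python
from enum import Enum


class Backend(Enum):
    VULKAN = "Vulkan"
    ROCM_642 = "ROCm 6.4.2"
    # ROCm 7.0 beta is deliberately absent: the summary advises against it
    # for production workloads.


# Cutoff taken from the summary: above this size, ROCm 6.4.2 model loading
# was unusably slow in these benchmarks.
LARGE_MODEL_THRESHOLD_GB = 64


def pick_backend(model_size_gb: float, is_bf16: bool) -> Backend:
    """Hypothetical helper mirroring the Recommendation Strategy.

    model_size_gb: size of the model weights in GB.
    is_bf16: True for BF16/native-precision weights, False for quantized ones.
    """
    if model_size_gb > LARGE_MODEL_THRESHOLD_GB:
        # Only Vulkan handled >64GB models efficiently.
        return Backend.VULKAN
    if is_bf16:
        # BF16 under 64GB: ROCm 6.4.2 was roughly 2-3x faster.
        return Backend.ROCM_642
    # Quantized under 64GB: Vulkan is ahead on prompt processing for small
    # models and roughly on par for larger ones, so it is the default here.
    return Backend.VULKAN


def percent_faster_to_multiplier(percent_faster: float) -> float:
    """Assumes 'X% faster' means new throughput = old * (1 + X/100)."""
    return 1 + percent_faster / 100


if __name__ == "__main__":
    # Illustrative (hypothetical) model sizes, not measured values.
    print(pick_backend(24, is_bf16=False).value)   # Vulkan
    print(pick_backend(60, is_bf16=True).value)    # ROCm 6.4.2
    print(pick_backend(130, is_bf16=True).value)   # Vulkan
    for pct in (112, 222):
        print(f"{pct}% faster = {percent_faster_to_multiplier(pct):.2f}x")
```

The helper breaks ties toward Vulkan for quantized models under 64GB. The summary treats the two backends as roughly equivalent in that range, and Vulkan is the more stable of the two.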