Updated performance summary

Donato Capitella
2025-07-29 18:25:12 +01:00
parent 41a0bc2229
commit 4478690f1c
+18 -4
@@ -120,16 +120,24 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w
**🏆 Vulkan Advantages:**
- Consistently stable across all model sizes
- Best performance on small models (Gemma3 12B) and very large models (235B+)
- Significantly better prompt processing on smaller quantized models (127% faster on Gemma3 12B)
- Only option that can handle >64GB models efficiently
- Moderate advantage on larger quantized models (3-14% better on Llama4 17B)
**🏆 ROCm 6.4.2 Advantages:**
- Superior performance on medium-sized MoE models (Qwen3 MoE 30B)
- Better text generation speeds on some models
- **Dramatically superior performance on BF16 models** (112% faster prompt processing, 222% faster text generation on Qwen3 MoE 30B)
- Optimized native floating-point operations through HIP compute
- Better suited for models using native precision formats
**📊 Performance by Model Type:**
- **BF16/Native Precision Models**: ROCm 6.4.2 is the clear winner with 2-3x better performance
- **Small Quantized Models**: Vulkan has significant advantages for prompt processing
- **Large Quantized Models**: Performance is similar between backends (differences are small, e.g. 3-14% in Vulkan's favor on Llama4 17B)
- **Large Models (>64GB)**: Vulkan is the only viable option due to ROCm's memory allocation issues
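
The "2-3x" figure for BF16 follows from the percentages quoted for Qwen3 MoE 30B: being N% faster corresponds to a speedup factor of 1 + N/100. A minimal sketch of that conversion; the helper name `speedup_factor` is hypothetical and not part of this repository:

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert an 'N% faster' figure into a multiplicative speedup."""
    return 1.0 + percent_faster / 100.0


# Percentages taken from the summary above.
for label, pct in [
    ("ROCm BF16 prompt processing (Qwen3 MoE 30B)", 112),
    ("ROCm BF16 text generation (Qwen3 MoE 30B)", 222),
    ("Vulkan prompt processing (Gemma3 12B)", 127),
]:
    print(f"{label}: {speedup_factor(pct):.2f}x")
```
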
**❌ ROCm 6.4.2 Limitations:**
- Model loading is extremely slow for models >64GB, making them unusable in practice
- Performance varies significantly by model type
- Performance advantage limited to BF16/native precision models
**❌ ROCm 7.0 Beta Issues:**
- GPU hangs/crashes on larger models (Llama4 17B causes "GPU Hang" and core dump)
@@ -137,6 +145,12 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w
- Performance similar to ROCm 6.4.2 when it works, but reliability is poor
- Uses [official AMD RPMs](https://repo.radeon.com/rocm/el9/7.0_beta/main) (beta quality)
**💡 Recommendation Strategy:**
- Use **ROCm 6.4.2** for BF16/native precision models under 64GB
- Use **Vulkan** for quantized models (especially smaller ones) and all models over 64GB
- For large quantized models under 64GB, either backend performs similarly
- Avoid ROCm 7.0 beta for production workloads
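
The recommendations above can be read as a small decision rule. The sketch below is illustrative only: the function name, its parameters, and the return strings are hypothetical, and whether a quantized model counts as "large" is left to the caller because the summary does not give a threshold. A model of exactly 64GB is treated as "under 64GB" here, which is also an assumption.

```python
def recommend_backend(size_gb: float, precision: str, is_large: bool = False) -> str:
    """Suggest a backend following the rules of thumb in the summary.

    size_gb   -- model size in GB
    precision -- "bf16" (native precision) or "quantized"
    is_large  -- caller's judgment that a quantized model is "large"
                 (the summary gives no numeric cutoff)
    """
    if precision not in ("bf16", "quantized"):
        raise ValueError(f"unknown precision: {precision!r}")

    # Over 64GB, ROCm 6.4.2 loading is reported as unusably slow.
    if size_gb > 64:
        return "vulkan"

    # BF16/native precision under 64GB: ROCm 6.4.2 is the reported winner.
    if precision == "bf16":
        return "rocm-6.4.2"

    # Large quantized models under 64GB: backends perform similarly.
    if is_large:
        return "either"

    # Smaller quantized models: Vulkan leads on prompt processing.
    return "vulkan"


print(recommend_backend(30, "bf16"))
print(recommend_backend(8, "quantized"))
print(recommend_backend(40, "quantized", is_large=True))
print(recommend_backend(120, "quantized"))
```

ROCm 7.0 beta is deliberately never returned, matching the advice to avoid it for production workloads.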
## Building Containers Locally (Optional)
If you prefer to build the containers yourself: