Updated performance summary
@@ -120,16 +120,24 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w
**🏆 Vulkan Advantages:**
- Consistently stable across all model sizes
- Best performance on small models (Gemma3 12B) and very large models (235B+)
- Significantly faster prompt processing on smaller quantized models (127% faster on Gemma3 12B)
- Only option that handles >64GB models efficiently
- Moderate advantage on larger quantized models (3-14% faster on Llama4 17B)

**🏆 ROCm 6.4.2 Advantages:**
- Superior performance on medium-sized MoE models (Qwen3 MoE 30B)
- Better text generation speeds on some models
- **Dramatically better performance on BF16 models** (112% faster prompt processing, 222% faster text generation on Qwen3 MoE 30B)
- Optimized native floating-point operations through HIP compute
- Better suited to models that use native precision formats

**📊 Performance by Model Type:**
- **BF16/Native Precision Models**: ROCm 6.4.2 is the clear winner, roughly 2-3x faster (the 112% and 222% gains on Qwen3 MoE 30B correspond to about 2.1x and 3.2x)
- **Small Quantized Models**: Vulkan has a significant advantage in prompt processing
- **Large Quantized Models**: Performance is similar between backends (differences within noise)
- **Large Models (>64GB)**: Vulkan is the only viable option due to ROCm's memory allocation issues

**❌ ROCm 6.4.2 Limitations:**
- Extremely slow loading into memory for models >64GB (effectively unusable)
- Performance varies significantly by model type
- Performance advantage is limited to BF16/native precision models

**❌ ROCm 7.0 Beta Issues:**
- GPU hangs/crashes on larger models (Llama4 17B causes "GPU Hang" and core dump)
@@ -137,6 +145,12 @@ All benchmarks performed on HP Z2 Mini G1a with 128GB RAM, using `llama-bench` w
- Performance similar to ROCm 6.4.2 when it works, but reliability is poor
- Uses [official AMD RPMs](https://repo.radeon.com/rocm/el9/7.0_beta/main) (beta quality)

**💡 Recommendation Strategy:**
- Use **ROCm 6.4.2** for BF16/native precision models under 64GB
- Use **Vulkan** for quantized models (especially smaller ones) and all models over 64GB
- For large quantized models under 64GB, either backend performs similarly
- Avoid ROCm 7.0 beta for production workloads
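
The recommendation bullets can be read as a small decision rule. The sketch below is only an illustration of that rule: `recommend_backend`, its parameters, and the returned labels are hypothetical names and are not part of this repository. It also assumes that a model of exactly 64GB is treated like a larger one, which the summary does not specify.

```python
def recommend_backend(model_size_gb: float, native_precision: bool) -> str:
    """Pick a backend label following the recommendation bullets.

    Hypothetical helper for illustration only. The returned strings are
    plain labels, not container names or tags.
    """
    # Models over 64GB: Vulkan is the only viable option.
    # Assumption: exactly 64GB is handled the same way.
    if model_size_gb >= 64:
        return "vulkan"

    # BF16/native precision models under 64GB: ROCm 6.4.2 is much faster.
    if native_precision:
        return "rocm-6.4.2"

    # Quantized models under 64GB: Vulkan leads on small models and is
    # roughly on par on large ones, so it is used as the default here.
    return "vulkan"


if __name__ == "__main__":
    print(recommend_backend(60, native_precision=True))    # rocm-6.4.2
    print(recommend_backend(8, native_precision=False))    # vulkan
    print(recommend_backend(130, native_precision=True))   # vulkan
```

ROCm 7.0 beta is deliberately never returned, in line with the advice to avoid it for production workloads.
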
## Building Containers Locally (Optional)
If you prefer to build the containers yourself: