added script to manage cluster

This commit is contained in:
Donato Capitella
2026-01-14 17:01:00 +00:00
parent 6d70dfc73b
commit 8da5395366
2 changed files with 568 additions and 4 deletions
+31 -4
View File
@@ -50,8 +50,9 @@ AMD has recalled this update, but if you have already installed it, you must dow
6.1 [Test Configuration](#61-test-configuration)
6.2 [Kernel Parameters (tested on Fedora 42)](#62-kernel-parameters-tested-on-fedora-42)
6.3 [Ubuntu 24.04](#63-ubuntu-2404)
7. [More Documentation](#7-more-documentation)
8. [References](#8-references)
7. [Distributed Inference on Strix Halo Clusters](#7-distributed-inference-on-strix-halo-clusters)
8. [More Documentation](#8-more-documentation)
9. [References](#9-references)
## Quick Answers (Read This First)
@@ -380,13 +381,39 @@ Follow this guide by TechnigmaAI for a working configuration on Ubuntu 24.04:
[https://github.com/technigmaai/technigmaai-wiki/wiki/AMD-Ryzen-AI-Max--395:-GTT--Memory-Step%E2%80%90by%E2%80%90Step-Instructions-(Ubuntu-24.04)](https://github.com/technigmaai/technigmaai-wiki/wiki/AMD-Ryzen-AI-Max--395:-GTT--Memory-Step%E2%80%90by%E2%80%90Step-Instructions-%28Ubuntu-24.04%29)
## 7. More Documentation
## 7. Distributed Inference on Strix Halo Clusters
You can use the included `run_distributed_llama.py` script to run models on a cluster of Strix Halo machines.
### Setup
1. **Install Toolboxes**: Install the required toolboxes on each node (main and workers).
2. **Download Models**: Download the model weights to the main node.
3. **SSH Configuration**: Configure SSH passwordless authentication from the main node to all other nodes. The script relies on being able to SSH into workers without user interaction.
```bash
ssh-copy-id user@worker-node-ip
```
### Execution
Run the script on the main node **outside of the toolbox**:
```bash
python3 run_distributed_llama.py
```
This will launch the TUI, where you can:
1. Configure the list of worker nodes (IPs).
2. Select the model and toolbox version.
3. Start the distributed inference.
The script automatically starts the necessary toolbox containers locally and on the remote nodes to handle the inference.
## 8. More Documentation
* [docs/benchmarks.md](docs/benchmarks.md): Full benchmark logs, model list, parsed results
* [docs/vram-estimator.md](docs/vram-estimator.md): Memory planning, practical example runs
* [docs/building.md](docs/building.md): Local build, toolbox customization, advanced use
## 8. References
## 9. References
* The main reference for AMD Ryzen AI MAX home labs, by deseven (there's also a Discord server): [https://strixhalo-homelab.d7.wtf/](https://strixhalo-homelab.d7.wtf/)
* Most comprehesive repostiry of test builds for Strix Halo by lhl -> [https://github.com/lhl/strix-halo-testing/tree/main](https://github.com/lhl/strix-halo-testing/tree/main)