---
github: https://github.com/exo-explore/exo
tags:
 - opensource
 - AI
 - LocalAI
 - cluster
---
Run your own AI cluster at home with everyday devices.

Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!

## Features

From the [README](https://github.com/exo-explore/exo#features)

### Wide Model Support

exo supports different models including LLaMA ([MLX](https://github.com/exo-explore/exo/blob/main/exo/inference/mlx/models/llama.py) and [tinygrad](https://github.com/exo-explore/exo/blob/main/exo/inference/tinygrad/models/llama.py)), Mistral, LLaVA, Qwen, and DeepSeek.

### Dynamic Model Partitioning

exo [optimally splits up models](https://github.com/exo-explore/exo/blob/main/exo/topology/ring_memory_weighted_partitioning_strategy.py) based on the current network topology and device resources available. This enables you to run larger models than you would be able to on any single device.

### Automatic Device Discovery

exo will [automatically discover](https://github.com/exo-explore/exo/blob/945f90f676182a751d2ad7bcf20987ab7fe0181e/exo/orchestration/node.py#L154) other devices using the best method available. Zero manual configuration.
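
To illustrate the general idea, here is a minimal Python sketch of zero-config discovery via UDP broadcast: each node announces itself to the local subnet and listens for announcements from peers. This is an illustration only, not exo's actual protocol (that lives in the `node.py` file linked above), and `DISCOVERY_PORT` is made up for the sketch.

```python
import json
import socket

DISCOVERY_PORT = 50505  # hypothetical port, chosen for this sketch only

def announce(node_id: str) -> None:
    """Broadcast this node's presence to the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    message = json.dumps({"node_id": node_id}).encode()
    sock.sendto(message, ("<broadcast>", DISCOVERY_PORT))
    sock.close()

def listen(timeout: float = 5.0) -> list[str]:
    """Collect peer announcements until the timeout expires."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", DISCOVERY_PORT))
    sock.settimeout(timeout)
    peers = []
    try:
        while True:
            data, _ = sock.recvfrom(1024)
            peers.append(json.loads(data)["node_id"])
    except socket.timeout:
        pass
    finally:
        sock.close()
    return peers
```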

### ChatGPT-compatible API

exo provides a [ChatGPT-compatible API](https://github.com/exo-explore/exo/blob/main/exo/api/chatgpt_api.py) for running models. It's a [one-line change](https://github.com/exo-explore/exo/blob/main/examples/chatgpt_api.sh) in your application to run models on your own hardware using exo.
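
In practice the one-line change is pointing your OpenAI-style client at the local exo node instead of api.openai.com. A hedged sketch using the `openai` Python package; the port and model id below are assumptions, so check your exo node's actual endpoint (the linked `chatgpt_api.sh` shows the real invocation):

```python
from openai import OpenAI

# The one-line change: point the client at the local exo node.
# "localhost:52415" and the model id are assumptions for this sketch --
# substitute whatever endpoint and model your exo node reports.
client = OpenAI(base_url="http://localhost:52415/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.2-3b",
    messages=[{"role": "user", "content": "What is distributed inference?"}],
)
print(response.choices[0].message.content)
```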

### Device Equality

Unlike other distributed inference frameworks, exo does not use a master-worker architecture. Instead, exo devices [connect p2p](https://github.com/exo-explore/exo/blob/945f90f676182a751d2ad7bcf20987ab7fe0181e/exo/orchestration/node.py#L161). As long as a device is connected somewhere in the network, it can be used to run models.

exo supports different [partitioning strategies](https://github.com/exo-explore/exo/blob/main/exo/topology/partitioning_strategy.py) to split up a model across devices. The default is [ring memory weighted partitioning](https://github.com/exo-explore/exo/blob/main/exo/topology/ring_memory_weighted_partitioning_strategy.py), which runs inference in a ring where each device handles a number of model layers proportional to its memory.
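
To make that concrete, here is a minimal Python sketch of memory-weighted partitioning. It is not exo's implementation (see the linked strategy file for that), and the device names and memory figures are made-up inputs:

```python
def partition_layers(devices: dict[str, int], num_layers: int) -> dict[str, range]:
    """Assign each device a contiguous slice of layers proportional
    to its share of the ring's total memory.

    devices maps device id -> memory (GB); returns device id -> layer range.
    """
    total_memory = sum(devices.values())
    partitions: dict[str, range] = {}
    start = 0
    items = list(devices.items())
    for i, (device_id, memory) in enumerate(items):
        if i == len(items) - 1:
            end = num_layers  # last device absorbs any rounding remainder
        else:
            end = start + round(num_layers * memory / total_memory)
        partitions[device_id] = range(start, end)
        start = end
    return partitions

# Example: three devices splitting a 32-layer model. The 64 GB machine
# takes the largest contiguous slice of layers.
print(partition_layers({"macbook": 16, "studio": 64, "pi": 8}, 32))
# -> {'macbook': range(0, 6), 'studio': range(6, 29), 'pi': range(29, 32)}
```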