MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)
This project is a fully native SwiftUI app for running local LLMs (e.g. Llama, Mistral) on Apple silicon in real time using MLX.
- Open the Xcode project.
- Go to Signing & Capabilities.
- Change the Team to your own team.
- Set the destination to My Mac.
- Click Run.
Support for iOS is coming next week.
- Click on Manage Models in the inspector view.
- Download and install a model (we recommend starting with `Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX`).
- Go back to the inspector and select the downloaded model from the model picker.
- Wait for the model to load; the status bar will flash "Ready" once it has loaded.
- Click the run button.
- Fix iOS builds
- Implement support for StableLM
- Implement basic support for automatically adding model-specific chat templates to the prompt
- Add support for stop sequences
- Add more model suggestions
- ... (many, many more items to be added soon pending sleep)
| Model | Status |
|---|---|
| Mistral | Supported |
| Llama | Supported |
| Phi | Supported |
| Gemma | Supported (may have issues) |
Models are downloaded from Hugging Face. To add a new model, visit the MLX Community on Hugging Face, search for the model you want, then add it via Manage Models → Add Model.
Important
Note that this project is still under active development and some models may require additional implementation to run correctly.
**Can I use this in production?** No. This project is not intended for production deployment.
**What are the system requirements?**
- Apple Silicon Mac (M1/M2/M3) with macOS 14.0 or newer.
- Any A-series chip (iPad, iPhone) with iOS 17.2 or newer.
**Does this send my data to the cloud?** No. Everything runs locally on device.
- **Temperature**: Controls randomness. Lowering the temperature results in less random completions; as it approaches zero, the model becomes deterministic and repetitive.
- **Top K**: Sorts predicted tokens by probability and discards those below the k-th one. A top-k value of 1 is equivalent to greedy search (always select the most probable token).
- **Maximum length**: The maximum number of tokens to generate. Requests can use up to 2,048 tokens shared between prompt and completion; the exact limit varies by model. (One token is roughly four characters of typical English text.)
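To make the first two settings concrete, here is a minimal, illustrative sketch in Swift of how temperature and top-k shape next-token selection. It is not the app's actual sampler; `sampleToken` and its inputs are hypothetical:

```swift
import Foundation

/// Draws one token index from `logits` (the model's raw per-token scores),
/// after applying temperature scaling and top-k filtering.
func sampleToken(logits: [Double], temperature: Double, topK: Int) -> Int {
    // Temperature divides the logits: values < 1 sharpen the distribution
    // (less random); as temperature approaches 0 this approaches argmax.
    let scaled = logits.map { $0 / max(temperature, 1e-6) }

    // Top-k: keep only the k highest-scoring token indices.
    let kept = scaled.indices.sorted { scaled[$0] > scaled[$1] }.prefix(topK)

    // Softmax over the surviving tokens (max-subtracted for stability).
    let maxLogit = kept.map { scaled[$0] }.max() ?? 0
    let exps = kept.map { exp(scaled[$0] - maxLogit) }
    let total = exps.reduce(0, +)
    let probs = exps.map { $0 / total }

    // Draw one token from the renormalized distribution.
    var r = Double.random(in: 0..<1)
    for (index, p) in zip(kept, probs) {
        r -= p
        if r <= 0 { return index }
    }
    return kept.last!
}
```

With `topK == 1` the draw is deterministic (greedy search), matching the description above; Maximum length simply caps how many times a generation loop invokes a step like this before stopping.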
Special thanks to Awni Hannun and David Koski for early testing and feedback.
Much ❤️ to all the folks who made MLX (especially mlx-swift) possible!