MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)
This project is a fully native SwiftUI app for running local LLMs (e.g. Llama, Mistral) on Apple silicon in real time using MLX.
- Open the Xcode project.
- Go to Signing & Capabilities.
- Change the Team to your own team.
- Set the destination to My Mac.
- Click Run.
Support for iOS is coming next week.
- Click on Manage Models in the inspector view.
- Download and install a model (we recommend starting with `Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX`).
- Go back to the inspector and select the downloaded model from the model picker.
- Wait for the model to load; the status bar will flash "Ready" once it has loaded.
- Click the run button.
- Fix iOS builds
- Implement support for StableLM
- Implement basic support for automatically adding model-specific chat templates to the prompt
- Add support for stop sequences
- Add more model suggestions
- ... (many, many more items to be added soon pending sleep)
| Model | Status |
|---|---|
| Mistral | Supported |
| Llama | Supported |
| Phi | Supported |
| Gemma | Supported (may have issues) |
Models are downloaded from Hugging Face. To add a new model, visit the MLX Community on Hugging Face, search for the model you want, then add it via Manage Models → Add Model.
Important
Note that this project is still under active development and some models may require additional implementation to run correctly.
**Can I use this in production?** No. This project is not intended for production deployment.
**What are the system requirements?**
- Apple Silicon Mac (M1/M2/M3) with macOS 14.0 or newer.
- Any A-series chip (iPad, iPhone) with iOS 17.2 or newer.
**Does this send my data to the cloud?** No. Everything runs locally on device.
- **Temperature**: Controls randomness. Lowering the temperature results in less random completions; as it approaches zero, the model becomes deterministic and repetitive.
- **Top K**: Sorts predicted tokens by probability and discards those below the k-th one. A top-k value of 1 is equivalent to greedy search (always select the most probable token).
- **Maximum length**: The maximum number of tokens to generate. Requests can use up to 2,048 tokens shared between prompt and completion; the exact limit varies by model. (One token is roughly four characters of typical English text.)
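To make the first two settings concrete, here is a minimal, illustrative sketch in Swift of how temperature and top-k shape next-token selection. It is not the app's actual sampler; `sampleToken` and its inputs are hypothetical:

```swift
import Foundation

/// Draws one token index from `logits` (the model's raw per-token scores),
/// after applying temperature scaling and top-k filtering.
func sampleToken(logits: [Double], temperature: Double, topK: Int) -> Int {
    // Temperature divides the logits: values < 1 sharpen the distribution
    // (less random); as temperature approaches 0 this approaches argmax.
    let scaled = logits.map { $0 / max(temperature, 1e-6) }

    // Top-k: keep only the k highest-scoring token indices.
    let kept = scaled.indices.sorted { scaled[$0] > scaled[$1] }.prefix(topK)

    // Softmax over the surviving tokens (max-subtracted for stability).
    let maxLogit = kept.map { scaled[$0] }.max() ?? 0
    let exps = kept.map { exp(scaled[$0] - maxLogit) }
    let total = exps.reduce(0, +)
    let probs = exps.map { $0 / total }

    // Draw one token from the renormalized distribution.
    var r = Double.random(in: 0..<1)
    for (index, p) in zip(kept, probs) {
        r -= p
        if r <= 0 { return index }
    }
    return kept.last!
}
```

With `topK == 1` the draw is deterministic (greedy search), matching the description above; Maximum length simply caps how many times a generation loop invokes a step like this before stopping.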
Special thanks to Awni Hannun and David Koski for early testing and feedback.
Much ❤️ to all the folks who made MLX (especially mlx-swift) possible!