Sparsity - The next performance frontier. #93
The process for converting a model into a SparseML-compatible model doesn't seem all that complicated, and sparsity has a lot to offer for inference. As I understand it, quantizing a model to the GGML format reduces its size and complexity, whereas making a model sparse involves both quantization and pruning away unimportant weights, is that right?
Here is a good explanation if anyone is interested: https://neuralmagic.com/blog/sparsegpt-remove-100-billion-parameters-for-free/
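To make the quantize-vs-prune distinction concrete, here is a minimal sketch of unstructured magnitude pruning (zeroing the smallest-magnitude weights), which is one common way the "irrelevant parts" get removed. This is an illustrative toy, not the SparseML or SparseGPT algorithm; the function name and the NumPy-based setup are my own assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Illustrative unstructured pruning: zero out the smallest-magnitude
    fraction (`sparsity`) of the entries in a weight matrix.

    Note: a toy example, not SparseML's actual implementation."""
    w = weights.copy()
    k = int(w.size * sparsity)  # number of entries to zero out
    if k == 0:
        return w
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
    w[np.abs(w) <= threshold] = 0.0
    return w

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
pruned = magnitude_prune(w, sparsity=0.5)
print(np.mean(pruned == 0.0))  # fraction of zeroed entries, roughly equal to `sparsity`
```

Runtimes like DeepSparse then exploit those zeros by skipping the corresponding multiply-accumulates, which is where the CPU speedup comes from; quantization to GGML shrinks each remaining weight instead of removing any.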
Great work going on with GGML. Bravo to so many contributors. You are champions!
Maybe more performance (on CPU) can be had by bringing sparsity into the workflow. Here is one of the many efforts out there at the moment:
https://github.com/neuralmagic/deepsparse
What are people's thoughts on this?