
How GGML is different from ONNX #3022

Closed
biswaroop1547 opened this issue Sep 5, 2023 · 7 comments

@biswaroop1547

I am looking to create an exhaustive pros-and-cons list for ONNX vs. GGML, and would like some help if someone can describe or give pointers on how GGML differs from ONNX.

Currently I am aware that GGML supports 4-bit quantization and follows a no-dependency approach (as mentioned here), and that the format in which it builds the computation graph and stores the weights (along with any optimizations) is different.

Apart from this, what are the differentiating factors?
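
For context on the 4-bit quantization point, here is a minimal sketch in the spirit of ggml's Q4_0 format: 32 weights per block share one scale, and each weight is stored as a 4-bit integer. The names, nibble packing, and rounding here are illustrative, not the exact ggml reference code (which, for instance, stores the scale as fp16 and packs the two block halves separately):

```c
#include <stdint.h>
#include <math.h>

#define QK 32  /* weights per block */

typedef struct {
    float   d;          /* per-block scale (ggml stores this as fp16) */
    uint8_t qs[QK / 2]; /* 32 x 4-bit values, two per byte */
} block_q4;

/* Quantize one block of QK floats to 4-bit integers plus one scale. */
static void quantize_block_q4(const float *x, block_q4 *out) {
    /* Find the value with the largest magnitude (sign preserved). */
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < QK; i++) {
        if (fabsf(x[i]) > amax) { amax = fabsf(x[i]); max = x[i]; }
    }

    const float d  = max / -8.0f;               /* extreme value maps to -8 */
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    out->d = d;

    for (int i = 0; i < QK / 2; i++) {
        /* Shift into [0,15] and pack two 4-bit values per byte.
           (Real ggml packs the block halves differently; illustrative.) */
        uint8_t q0 = (uint8_t) fminf(15.0f, x[2*i + 0] * id + 8.5f);
        uint8_t q1 = (uint8_t) fminf(15.0f, x[2*i + 1] * id + 8.5f);
        out->qs[i] = q0 | (q1 << 4);
    }
}
```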

@wangpy1204

I also want to ask this question.

@casperdcl

found some related issues as well:

@KerfuffleV2
Collaborator

I think the question is a little ambiguous. GGML could mean the machine learning library itself, the file format (now called GGUF), or maybe even an implementation based on GGML that can do things like run inference on models (llama.cpp).

On the GGML-as-a-library side, there isn't really a "format" for the graph; there's an API you use to construct the graph. Likewise for the weights: they don't have to come from a GGML/GGUF-format file at all. Just as an example, my little Rust RWKV implementation over here actually only loads models from PyTorch or SafeTensors format files and dynamically quantizes the tensors.
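
To illustrate the "API rather than format" point, here is a minimal sketch of building and running a graph with the ggml C API. Function names follow recent versions of the library and have shifted over time, so treat this as illustrative rather than canonical:

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // All tensors and graph metadata live in one pre-sized arena.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context *ctx = ggml_init(params);

    // Weights can come from anywhere: GGUF, PyTorch, SafeTensors, ...
    struct ggml_tensor *w = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor *x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_set_f32(w, 0.5f);
    ggml_set_f32(x, 2.0f);

    // Calling ops builds the graph; nothing is computed yet.
    struct ggml_tensor *y = ggml_mul_mat(ctx, w, x);

    struct ggml_cgraph *gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("y[0] = %f\n", ggml_get_f32_1d(y, 0));
    ggml_free(ctx);
    return 0;
}
```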

@staviq
Collaborator

staviq commented Sep 5, 2023

Glancing through the ONNX GitHub readme, from what I understand ONNX is just a "model container" format without any specific associated inference engine, whereas GGML/GGUF are part of an inference ecosystem together with ggml/llama.cpp.

So the difference would be roughly like the difference between a generic 3D model file and an Unreal Engine asset.

@biswaroop1547
Author

@staviq Sorry for not being clear, but for inference ONNX can use ONNX Runtime, which supports multiple backends and optimizations.
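
For concreteness, backend selection in ONNX Runtime happens through "execution providers" attached to the session options. A rough sketch with the ONNX Runtime C API (error handling omitted; every `ort->...` call actually returns an `OrtStatus*` that should be checked, and `model.onnx` is a placeholder path):

```c
#include <onnxruntime_c_api.h>

int main(void) {
    const OrtApi *ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

    OrtEnv *env = NULL;
    ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "demo", &env);

    OrtSessionOptions *opts = NULL;
    ort->CreateSessionOptions(&opts);

    // Optionally pick a backend ("execution provider"); the CPU
    // provider is always available as the default/fallback, e.g.:
    // OrtSessionOptionsAppendExecutionProvider_CUDA(opts, /*device_id=*/0);

    // On Windows the model path is wide (ORTCHAR_T); char elsewhere.
    OrtSession *session = NULL;
    ort->CreateSession(env, "model.onnx", opts, &session);

    // ... ort->Run(...) with input/output OrtValue tensors ...

    ort->ReleaseSession(session);
    ort->ReleaseSessionOptions(opts);
    ort->ReleaseEnv(env);
    return 0;
}
```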

@staviq
Collaborator

staviq commented Sep 5, 2023

I see, thank you for the clarification.

@github-actions github-actions bot added the stale label Mar 21, 2024
Contributor

github-actions bot commented Apr 5, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 5, 2024