
Ported quantize.cpp #84

Merged 22 commits · Apr 25, 2023

Conversation

@FloppyDisck (Contributor) commented Mar 27, 2023

I went ahead and ported the main quantize.cpp file, keeping the internal C++ function calls intact. I plan to port those function calls in a future PR to remove the ggml dependencies.

During the porting process I ran into some challenges because I was not familiar with how to use Context. As a result, I added the half library to handle the f16->f32 conversion. I could remove the dependency if needed, but I'll need some help working with Context. Note that if there are plans to move away from ggml, the half library will be necessary anyway.
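(For reference, a minimal sketch of the f16 -> f32 conversion the half crate provides; the helper name is hypothetical, and the actual call sites in the port may differ.)

use half::f16;

// Hypothetical helper: reinterpret raw f16 bit patterns as f32 values.
fn f16_bits_to_f32(raw: &[u16]) -> Vec<f32> {
    raw.iter()
        .map(|&bits| f16::from_bits(bits).to_f32())
        .collect()
}

fn main() {
    // 0x3C00 is 1.0 in IEEE 754 half precision.
    assert_eq!(f16_bits_to_f32(&[0x3C00]), vec![1.0_f32]);
}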

Additionally, I included some print statements inside the function to mimic the original behavior of quantize.cpp. I can remove those if needed.

Currently, there is no way to access the function since I did not implement a CLI function for it.

I am open to feedback and suggestions on how to improve this pull request.

Resolves #40

@FloppyDisck changed the title from "Implemented quantize.cpp" to "Ported quantize.cpp" on Mar 29, 2023
@philpax (Collaborator) commented Mar 29, 2023

Awesome work! I've left some feedback; it's not the most Rust-y code, but that's fine as it's a port and we can fix that up later on 🙂

Really appreciate you doing this, it's great to get one step closer to being completely standalone 🚀

@FloppyDisck (Contributor, Author)

Your comments aren't showing up in the PR.

And I fully agree; I'm going to go ahead and fix all the clippy issues, plus see if I can improve some of the logic.

Do you have any recommendations on data reading and writing? I use the same buffer multiple times since it's more efficient when working on Rust embedded systems, so I went ahead and did the same here.

@philpax (Collaborator) commented Mar 30, 2023

Weird, I can definitely see the comments here and in the diff. Not sure what's happening there.

For data read/write, not sure - reusing the same buffer seems reasonable if they're semantically similar, but I'd just use a new buffer if they're not. Do you have any examples of something you'd want advice on?
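(For illustration, one common shape of the buffer-reuse pattern being discussed; read_exact_into is a hypothetical helper, not code from this PR.)

use std::io::{self, Read};

// Reuse a single scratch buffer across reads: resize() keeps the existing
// allocation when it can, so repeated reads avoid fresh allocations.
fn read_exact_into(reader: &mut impl Read, buf: &mut Vec<u8>, n: usize) -> io::Result<()> {
    buf.resize(n, 0);
    reader.read_exact(buf)
}

fn main() -> io::Result<()> {
    let data = [1u8, 2, 3, 4, 5, 6];
    let mut cursor = io::Cursor::new(&data[..]);
    let mut scratch = Vec::new();
    read_exact_into(&mut cursor, &mut scratch, 4)?;
    assert_eq!(scratch, [1, 2, 3, 4]);
    read_exact_into(&mut cursor, &mut scratch, 2)?; // same allocation, reused
    assert_eq!(scratch, [5, 6]);
    Ok(())
}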

Review comments on ggml-raw/ggml/ggml.c, llama-rs/src/ggml.rs, and llama-rs/src/quantize.rs (all outdated/resolved).
@philpax (Collaborator) commented Apr 6, 2023

I've updated the PR, but the output seems to be incorrect:

thread 'main' panicked at 'Could not load model: TensorWrongSize { tensor_name: "tok_embeddings.weight", path: "../llama-models/v0/ggml-model-q4_0-rust.bin" }', llama-cli/src/cli_args.rs:319:14

Probably an assumption somewhere that I broke. Need to look into it further - any ideas?

I'd also like to support loading unversioned models and GGJT, so this is going to be a bit of a headache in general :(

@FloppyDisck (Contributor, Author)

Looking at your commits, I found a couple of places where it could've broken. I'll check them out and see if that fixes it.

@FloppyDisck (Contributor, Author)

@philpax Regarding supporting other types of models: if you can provide the relevant issues, I can research making those work.

Review comment on ggml/src/lib.rs (outdated/resolved).
@philpax added this to the 0.1 milestone Apr 10, 2023
@philpax (Collaborator) commented Apr 13, 2023

Ok, updated to the latest main, haven't tested if it works. Couple of notes:

  1. It would be really nice to support GGML and GGJT, not just GGMF. Standalone loader #125 should make this much easier, but could be a while off.
  2. I think f32 as the element type for dst of the ggml quantize functions is wrong. The resulting values will not be floats in any meaningful sense. u32 is still wrong, but makes it clearer that it's a different data type.
  3. It'd be a good idea to add these asserts to the ggml quantize functions (assuming they're correct; a checked-wrapper sketch follows after this list):
assert_eq!(src.len(), n as usize);
assert_eq!(dst.len(), n as usize);
assert!(hist.len() >= 16);
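(A sketch of how those asserts could sit in a bounds-checked wrapper. The extern block mirrors ggml's C declaration size_t ggml_quantize_q4_0(const float *src, void *dst, int n, int k, int64_t *hist); the exact binding used in llama-rs is an assumption here.)

use std::os::raw::{c_int, c_void};

extern "C" {
    // Assumed to match ggml.h at the time of this PR.
    fn ggml_quantize_q4_0(src: *const f32, dst: *mut c_void, n: c_int, k: c_int, hist: *mut i64) -> usize;
}

// Safe wrapper: validate slice lengths before crossing the FFI boundary.
pub fn quantize_q4_0(src: &[f32], dst: &mut [f32], n: i32, k: i32, hist: &mut [i64]) -> usize {
    assert_eq!(src.len(), n as usize);
    assert_eq!(dst.len(), n as usize);
    assert!(hist.len() >= 16); // ggml fills a 16-bucket histogram
    unsafe { ggml_quantize_q4_0(src.as_ptr(), dst.as_mut_ptr().cast(), n, k, hist.as_mut_ptr()) }
}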

@KerfuffleV2 (Contributor)

> u32 is still wrong, but makes it clearer that it's a different data type.

What about u8? When I see that, I think "just a bunch of bytes".

@philpax (Collaborator) commented Apr 13, 2023

> What about u8? When I see that, I think "just a bunch of bytes".

The issue is the size: it should be equivalent to the size of the original array in bytes. I guess we could shove 4*size onto the user; it's not that big of a deal.
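(Sketching that trade-off with a hypothetical byte-slice signature, not what the PR currently does:)

// If dst were &mut [u8], the caller would have to size it to the f32
// input's size in bytes, i.e. 4 * n ("shove 4*size onto the user").
pub fn quantize_q4_0_bytes(src: &[f32], dst: &mut [u8], n: i32, k: i32, hist: &mut [i64]) -> usize {
    assert_eq!(src.len(), n as usize);
    assert_eq!(dst.len(), 4 * n as usize);
    assert!(hist.len() >= 16);
    // ...forward to the raw ggml binding as in the earlier sketch...
    unimplemented!()
}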

@philpax (Collaborator) commented Apr 13, 2023

Merged in main again; the comments/questions from above still apply.

@FloppyDisck (Contributor, Author)

  1. I agree, but maybe this should be moved into a different issue I can tackle later on? Maybe one based on taking a more modular approach?
  2. In the original implementation both work and data are f32, although we could have work be a u8 and modify the code accordingly.
  3. I agree; the asserts should work with the current model.

@philpax (Collaborator) commented Apr 13, 2023

  1. Yeah, probably. If Standalone loader #125 gets done sooner rather than later we can give it a shot, but if not I'm happy to merge as-is and then we can patch it up.
  2. Looks like it's a llama_buffer in the original code (https://github.com/ggerganov/llama.cpp/blob/be87b6ed20a5f7528bf491a83e759a9fc6a24fea/llama.cpp#L1596), but that's fine; we could just use a Vec<u8> here.

Cool then, let's see if #125 happens soon and in the meantime we can fix 2/3.

@philpax (Collaborator) commented Apr 19, 2023

#125 is going to be merged soon if all goes well, but its ggml-format loader doesn't work in its current state. Given that, I think we're OK to merge this once that's in. Let's try to get support for the other formats as soon as possible, but I won't let that block merging.

@philpax mentioned this pull request Apr 22, 2023
@philpax (Collaborator) commented Apr 25, 2023

I implemented write support for the loader (now ggml-format), so quantize can now accept any single-file GGML/GGMF/GGJT model and quantize it to GGJT q4_0 or q4_1. It seems to work from my testing, but I don't have that many f16 models lying around.

@philpax merged commit 6e8aa79 into rustformers:main Apr 25, 2023
4 checks passed