This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Partially convert pth to ggml #83

Merged
merged 16 commits on Apr 6, 2023

Conversation

karelnagel
Contributor

@karelnagel karelnagel commented Mar 27, 2023

I'm still working on adding the weights to the file; right now it only adds the params and tokens (the md5 hash matches the llama.cpp-generated file without the weights), and there's no quantizing yet.
I've also added generate and convert subcommands to the CLI.
Let me know if anything needs changing 🙂

Partially resolves #21

@karelnagel
Contributor Author

I tried using serde_pickle, but it panics every time I try to load the file:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Eval(Unsupported('P'), 1)', llama-rs/src/convert.rs:152:79

serde-pickle might not work for Torch files, according to these issues: guillaume-be/rust-bert#12, LaurentMazare/tch-rs#171.
The other option would be to use https://github.com/LaurentMazare/tch-rs, but that would require installing the C++ libtorch library on the system 🥲. I'll try to research more tomorrow.
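
For context, a rough sketch of the kind of call that hits this panic (hypothetical path and setup, assuming serde_pickle 1.x; the actual call lives in llama-rs/src/convert.rs):

```rust
use std::{fs::File, io::BufReader};

fn main() {
    // Hypothetical model path.
    let file = File::open("models/7B/consolidated.00.pth").unwrap();
    let reader = BufReader::new(file);

    // serde_pickle 1.x: decode the stream into a generic pickle Value.
    // This is the unwrap that panics with `Eval(Unsupported('P'), 1)` when
    // the input isn't a plain pickle stream serde_pickle understands (e.g.
    // the whole ZIP container, or a pickle that uses persistent IDs).
    let value =
        serde_pickle::value_from_reader(reader, serde_pickle::DeOptions::new()).unwrap();
    println!("{value:?}");
}
```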

@philpax
Collaborator

philpax commented Mar 27, 2023

Nice one! Much appreciated - hope you can figure out what's going on with serde_pickle. Would really like to avoid having a dependency on Torch if possible.

@philpax philpax mentioned this pull request Mar 29, 2023
@karelnagel
Contributor Author

I had no luck with serde_pickle, running it like this:
(screenshot of the attempted serde_pickle invocation and the resulting error)

I also tried safetensors' deserialize, but it panics with HeaderTooLarge; I'm not sure I used it correctly, though.
I can't work on this for some time, so if anyone wants to continue from here, go ahead 🙂

@philpax
Collaborator

philpax commented Mar 30, 2023

Yeah, looks like we might have to implement our own parser. Will need to explore that at some point 😢

I'm tempted to clean this up and merge it in but with the functionality disabled, so that we can have the base functionality in and we can develop it as we go.

@setzer22
Collaborator

setzer22 commented Apr 1, 2023

I'm tempted to clean this up and merge it in but with the functionality disabled, so that we can have the base functionality in and we can develop it as we go.

I'd say this is reasonable 👍 There's a good amount of work in this PR, and leaving it open for too long means it will end up diverging too much.

@philpax
Collaborator

philpax commented Apr 2, 2023

I'm going to update this to the latest version, hide the CLI version for now, and merge it in - we can then work on our own parser for the tensors when we have some time 🚀

@philpax
Collaborator

philpax commented Apr 2, 2023

Things that I'll fix:

  • Remove get_n_parts (look at the directory like the load code does)
  • Split out the REPL as a separate CLI mode
  • Update the README
  • Make it clear what the f32 parameter does (enum? see the sketch after this list)
  • Use a Result type for the conversion process (do this when the actual conversion process is complete)
  • Make convert an optional feature, including its dependencies
  • Clean up main.rs in general
  • Change Vocabulary::from to a normal function (obscures the load operation)
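
A rough illustration of the "enum?" idea (hypothetical names, assuming clap with the derive feature; not the actual llama-rs CLI):

```rust
use clap::ValueEnum;

/// Hypothetical element-type flag, so the conversion target is explicit
/// instead of a bare `f32` boolean.
#[derive(Clone, Copy, Debug, ValueEnum)]
enum ElementType {
    /// Convert the weights to 32-bit floats.
    F32,
    /// Convert the weights to 16-bit floats.
    F16,
}
```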

@karelnagel
Contributor Author

Nice! All good for me

@KerfuffleV2
Contributor

KerfuffleV2 commented Apr 2, 2023

Actually, I just remembered something that might help you with your pickle problem: BlinkDL/ChatRWKV#40 (comment)

That's some example code for manually loading a .pth file (they're actually ZIP files with the data stored uncompressed, so it's contiguous and can be read directly). From that example, you can figure out which files and byte ranges in the .pth correspond to which tensors, without needing Torch or even having to load the entire thing.

Using this approach would still need a little helper Python script, but it could do something like scan through the tensors in the .pth file and collect the metadata. Only data.pkl is actually pickled; the tensors themselves are just raw data.

Edit: also, it just occurred to me... maybe you're trying to load the ZIP file with serde_pickle? That definitely wouldn't work.
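
A small sketch of what that structure looks like from Rust (hypothetical path; uses the zip crate to list the archive entries and their byte offsets):

```rust
use std::fs::File;
use zip::ZipArchive;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical model path; a .pth checkpoint is a ZIP archive whose
    // entries are stored uncompressed.
    let file = File::open("models/7B/consolidated.00.pth")?;
    let mut archive = ZipArchive::new(file)?;

    for i in 0..archive.len() {
        let entry = archive.by_index(i)?;
        // `<prefix>/data.pkl` is the pickled metadata; `<prefix>/data/<n>`
        // entries are the raw tensor storages, readable at fixed offsets.
        println!(
            "{:<40} starts at byte {} ({} bytes)",
            entry.name(),
            entry.data_start(),
            entry.size()
        );
    }
    Ok(())
}
```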

@KerfuffleV2
Contributor

I've been experimenting with this, and it's definitely not something as simple as accidentally loading the ZIP file. serde_pickle doesn't support the BINPERSID opcode, and it also can't handle anything using OrderedDict. I hacked support for both of those into it, and it can now load the pickle, but the results aren't correct. I can't tell if that's a problem specific to my changes or whether serde_pickle is just buggy.

Also, I'd say whoever invented the pickle format should be taken out back and shot, but that's far too clean an end for someone responsible for such heinous crimes.

@philpax philpax changed the title from "WIP: Convert pth to ggml" to "Convert pth to ggml" Apr 4, 2023
@philpax philpax changed the title from "Convert pth to ggml" to "Partially convert pth to ggml" Apr 4, 2023
@philpax philpax requested a review from setzer22 April 4, 2023 22:28
@philpax philpax mentioned this pull request Apr 5, 2023
Collaborator

@setzer22 setzer22 left a comment

Overall changes look good :)

There are a few comments about the CLI settings. I also haven't had time to download and test this to make sure all the subcommands work as expected. I want to do this later today, but if you've tested everything on your end, I'm okay with merging once the comments are addressed.

Review comments on llama-cli/src/cli_args.rs (all resolved)
@KerfuffleV2
Contributor

No idea why I put so much time into this, but: https://github.com/KerfuffleV2/repugnant-pickle

You can now (hopefully) parse tensor metadata from PyTorch model files in Rust.

See this part for an example of what you get: https://github.com/KerfuffleV2/repugnant-pickle#pytorch

@philpax
Collaborator

philpax commented Apr 6, 2023

That is impressive and horrifying. I hope we can make use of it at some point.

@KerfuffleV2
Contributor

That is impressive and horrifying. I hope we can make use of it at some point.

In an ideal world, such horrifying things wouldn't be needed. Unfortunately...

Anyway, I dogfooded it and used it to add support for loading PyTorch model files to my RWKV project. It was a pretty easy change, so if an example is helpful: https://github.com/KerfuffleV2/smolrsrwkv/pull/3/files

I think this should make it pretty easy to write Rust tools for interfacing with PyTorch models, as long as they don't have anything weird going on. I tried it on all the RWKV, LLaMA, and Alpaca files I have, and it was able to extract the tensor metadata without a problem.

I don't know if this is something llama-rs would want to depend on. If so though, one thing that could help me make it more reliable is if people could run the dump_torch example on their PyTorch model files and try to find a case where it fails to produce the correct result.

@philpax
Collaborator

philpax commented Apr 6, 2023

You are a madman.

OK - I think we should get this PR in, and then get to work on a repugnant-pickle implementation of this instead. That looks straightforward enough, and it would put us in the pretty enviable position of a no-Python solution for LLaMA.

If so though, one thing that could help me make it more reliable is if people could run the dump_torch example on their PyTorch model files and try to find a case where it fails to produce the correct result.

I figure this will happen naturally as people try it out - no rush on testing it ahead of time if it works on all the usual models we know and love.

@philpax philpax requested a review from setzer22 April 6, 2023 17:23
@setzer22
Collaborator

setzer22 commented Apr 6, 2023

I have no time to review things today, but YES PLEASE 😭

Being able to load the real weights directly would be so much nicer for users.

@KerfuffleV2
Contributor

@setzer22 Just to be clear, what I wrote only allows interfacing with the PyTorch files and discovering the tensor metadata (what tensors exist, their types, their dimensions, and where they are in the file). That will help facilitate writing something like a conversion utility without needing to involve Python or Torch.

However, if you wanted to do something like load the original non-GGML PyTorch model, that would be a much more difficult task, since the tensors aren't in the GGML format, may not be quantized, etc.

@philpax
Collaborator

philpax commented Apr 6, 2023

That's fine - the application here would be to convert them to GGML format. In future, we'll figure out a way to load them directly (but I suspect most people can't load the unquantised models anyway).

How hard would it be to load the f16 tensor data?

@KerfuffleV2
Contributor

Very easy: https://github.com/KerfuffleV2/smolrsrwkv/blob/182cd3205b7a7c95571a09bcfbb954b0041e4f90/smolrwkv/src/loader.rs#L87

That function expects a filename plus the entire mmapped .pth file (just as an example; you can do it other ways). Based on the absolute offset, tensor shape, and element size, it can calculate the range of data in the file that is associated with that tensor.
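
For reference, roughly the calculation being described (a hedged sketch with assumed names; assumes the whole .pth has been mmapped, e.g. with memmap2, and that the tensor stores little-endian f16 elements, decoded with the half crate):

```rust
use half::f16;

/// Hypothetical helper: given the mmapped .pth bytes and a tensor's absolute
/// data offset plus its shape, slice out and decode its f16 elements.
fn load_f16_tensor(mapped_pth: &[u8], data_offset: usize, shape: &[usize]) -> Vec<f16> {
    let n_elems: usize = shape.iter().product();
    let byte_len = n_elems * 2; // 2 bytes per f16 element
    let bytes = &mapped_pth[data_offset..data_offset + byte_len];

    // The tensor data is contiguous and uncompressed, so decoding is just
    // reinterpreting consecutive little-endian 2-byte chunks.
    bytes
        .chunks_exact(2)
        .map(|c| f16::from_le_bytes([c[0], c[1]]))
        .collect()
}
```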

Development

Successfully merging this pull request may close these issues.

Directly load pth/PyTorch tensor model files