This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Partially convert pth to ggml #83

Merged
merged 16 commits on Apr 6, 2023

Conversation

karelnagel
Contributor

@karelnagel karelnagel commented Mar 27, 2023

I'm still working on adding the weights to the file; right now it only adds the params and tokens (the md5 hash matches the llama.cpp-generated file without the weights), and there's no quantizing yet.
I've also added generate and convert subcommands to the CLI.
Let me know if anything needs changing 🙂

Partially resolves #21

@karelnagel
Contributor Author

I tried using serde_pickle, but it panics every time I try to load the file:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Eval(Unsupported('P'), 1)', llama-rs/src/convert.rs:152:79

serde-pickle might not work for Torch files, according to these issues: guillaume-be/rust-bert#12, LaurentMazare/tch-rs#171.
The other option would be to use https://github.com/LaurentMazare/tch-rs, but that would require installing the C++ libtorch library on the system 🥲. I'll try to research more tomorrow.
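
For context, a rough sketch of the kind of call that hits this panic (hypothetical path and setup, assuming serde_pickle 1.x; the actual call lives in llama-rs/src/convert.rs):

```rust
use std::{fs::File, io::BufReader};

fn main() {
    // Hypothetical model path.
    let file = File::open("models/7B/consolidated.00.pth").unwrap();
    let reader = BufReader::new(file);

    // serde_pickle 1.x: decode the stream into a generic pickle Value.
    // This is the unwrap that panics with `Eval(Unsupported('P'), 1)` when
    // the input isn't a plain pickle stream serde_pickle understands (e.g.
    // the whole ZIP container, or a pickle that uses persistent IDs).
    let value =
        serde_pickle::value_from_reader(reader, serde_pickle::DeOptions::new()).unwrap();
    println!("{value:?}");
}
```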

@philpax
Collaborator

philpax commented Mar 27, 2023

Nice one! Much appreciated - hope you can figure out what's going on with serde_pickle. Would really like to avoid having a dependency on Torch if possible.

@philpax philpax mentioned this pull request Mar 29, 2023
@karelnagel
Contributor Author

I had no luck with serde_pickle, running it like this:
(screenshot of the attempted serde_pickle invocation and the resulting error)

I also tried safetensors' deserialize, but it panics with HeaderTooLarge; I'm not sure I used it correctly, though.
I can't work on this for some time, so if anyone wants to continue from here, go ahead 🙂

@philpax
Collaborator

philpax commented Mar 30, 2023

Yeah, looks like we might have to implement our own parser. Will need to explore that at some point 😢

I'm tempted to clean this up and merge it in but with the functionality disabled, so that we can have the base functionality in and we can develop it as we go.

@setzer22
Collaborator

setzer22 commented Apr 1, 2023

I'm tempted to clean this up and merge it in but with the functionality disabled, so that we can have the base functionality in and we can develop it as we go.

I'd say this is reasonable 👍 There's a good amount of work in this PR, and leaving it open for too long means it will end up diverging too much.

@philpax
Collaborator

philpax commented Apr 2, 2023

I'm going to update this to the latest version, hide the CLI version for now, and merge it in - we can then work on our own parser for the tensors when we have some time 🚀

@philpax
Collaborator

philpax commented Apr 2, 2023

Things that I'll fix:

  • Remove get_n_parts (look at the directory like the load code does)
  • Split out the REPL as a separate CLI mode
  • Update the README
  • Make it clear what the f32 parameter does (enum? see the sketch after this list)
  • Use a Result type for the conversion process (do this when the actual conversion process is complete)
  • Make convert an optional feature, including its dependencies
  • Clean up main.rs in general
  • Change Vocabulary::from to a normal function (obscures the load operation)
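
A rough illustration of the "enum?" idea (hypothetical names, assuming clap with the derive feature; not the actual llama-rs CLI):

```rust
use clap::ValueEnum;

/// Hypothetical element-type flag, so the conversion target is explicit
/// instead of a bare `f32` boolean.
#[derive(Clone, Copy, Debug, ValueEnum)]
enum ElementType {
    /// Convert the weights to 32-bit floats.
    F32,
    /// Convert the weights to 16-bit floats.
    F16,
}
```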

@karelnagel
Contributor Author

Nice! All good for me

@KerfuffleV2
Contributor

KerfuffleV2 commented Apr 2, 2023

Actually, I just remembered something that might help you with your pickle problem: BlinkDL/ChatRWKV#40 (comment)

That's some example code for manually loading a .pth file (they're actually ZIP files with the data stored uncompressed, so it's contiguous and can be read directly). From that example, you can figure out which files and byte ranges in the .pth correspond to which tensors, without needing Torch or even having to load the entire thing.

Using this approach would still need a little helper Python script, but it could do something like scan through the tensors in the .pth file and collect the metadata. Only data.pkl is actually pickled; the tensors themselves are just raw data.

Edit: also, it just occurred to me... maybe you're trying to load the ZIP file with serde_pickle? That definitely wouldn't work.
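
A small sketch of what that structure looks like from Rust (hypothetical path; uses the zip crate to list the archive entries and their byte offsets):

```rust
use std::fs::File;
use zip::ZipArchive;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical model path; a .pth checkpoint is a ZIP archive whose
    // entries are stored uncompressed.
    let file = File::open("models/7B/consolidated.00.pth")?;
    let mut archive = ZipArchive::new(file)?;

    for i in 0..archive.len() {
        let entry = archive.by_index(i)?;
        // `<prefix>/data.pkl` is the pickled metadata; `<prefix>/data/<n>`
        // entries are the raw tensor storages, readable at fixed offsets.
        println!(
            "{:<40} starts at byte {} ({} bytes)",
            entry.name(),
            entry.data_start(),
            entry.size()
        );
    }
    Ok(())
}
```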

@KerfuffleV2
Contributor

I've been experimenting with this, and it's definitely not something as simple as accidentally loading the ZIP file. serde_pickle doesn't support the BINPERSID opcode, and it also can't handle anything using OrderedDict. I hacked support for both of those into it, and it can now load the pickle, but the results aren't correct. I can't tell if that's a problem specific to my changes or whether serde_pickle is just buggy.

Also, I'd say whoever invented the pickle format should be taken out back and shot, but that's far too clean an end for someone responsible for such heinous crimes.

@philpax philpax changed the title from "WIP: Convert pth to ggml" to "Convert pth to ggml" Apr 4, 2023
@philpax philpax changed the title from "Convert pth to ggml" to "Partially convert pth to ggml" Apr 4, 2023
@philpax philpax requested a review from setzer22 April 4, 2023 22:28
@philpax philpax mentioned this pull request Apr 5, 2023
Collaborator

@setzer22 setzer22 left a comment

Overall changes look good :)

There are a few comments about the CLI settings. I also haven't had time to download and test this to make sure all the subcommands work as expected. I want to do this later today, but if you've tested everything on your end, I'm okay with merging once the comments are addressed.

Review comments on llama-cli/src/cli_args.rs (all resolved)
@KerfuffleV2
Contributor

No idea why I put so much time into this, but: https://github.com/KerfuffleV2/repugnant-pickle

You can now (hopefully) parse tensor metadata from PyTorch model files in Rust.

See this part for an example of what you get: https://github.com/KerfuffleV2/repugnant-pickle#pytorch

@philpax
Collaborator

philpax commented Apr 6, 2023

That is impressive and horrifying. I hope we can make use of it at some point.

@KerfuffleV2
Contributor

That is impressive and horrifying. I hope we can make use of it at some point.

In an ideal world, such horrifying things wouldn't be needed. Unfortunately...

Anyway, I dogfooded it and used it to add support for loading PyTorch model files to my RWKV project. It was a pretty easy change, so if an example is helpful: https://github.com/KerfuffleV2/smolrsrwkv/pull/3/files

I think this should make it pretty easy to write Rust tools for interfacing with PyTorch models, as long as they don't have anything weird going on. I tried it on all the RWKV, LLaMA, and Alpaca files I have, and it was able to extract the tensor metadata without a problem.

I don't know if this is something llama-rs would want to depend on. If so though, one thing that could help me make it more reliable is if people could run the dump_torch example on their PyTorch model files and try to find a case where it fails to produce the correct result.

@philpax
Collaborator

philpax commented Apr 6, 2023

You are a madman.

OK - I think we should get this PR in, and then get to work on a repugnant-pickle implementation of this instead. That looks straightforward enough, and it would put us in the pretty enviable position of a no-Python solution for LLaMA.

If so though, one thing that could help me make it more reliable is if people could run the dump_torch example on their PyTorch model files and try to find a case where it fails to produce the correct result.

I figure this will happen naturally as people try it out - no rush on testing it ahead of time if it works on all the usual models we know and love.

@philpax philpax requested a review from setzer22 April 6, 2023 17:23
@setzer22
Collaborator

setzer22 commented Apr 6, 2023

I have no time to review things today, but YES PLEASE 😭

Being able to load the real weights directly would be so much nicer for users.

@KerfuffleV2
Contributor

@setzer22 Just to be clear, what I wrote only allows interfacing with the PyTorch files and discovering the tensor metadata (what tensors exist, their types, their dimensions, and where they are in the file). That will help facilitate writing something like a conversion utility without needing to involve Python or Torch.

However, if you wanted to do something like load the original non-GGML PyTorch model, that would be a much more difficult task, since the tensors aren't in the GGML format, may not be quantized, etc.

@philpax
Collaborator

philpax commented Apr 6, 2023

That's fine - the application here would be to convert them to GGML format. In future, we'll figure out a way to load them directly (but I suspect most people can't load the unquantised models anyway).

How hard would it be to load the f16 tensor data?

@KerfuffleV2
Contributor

Very easy: https://github.com/KerfuffleV2/smolrsrwkv/blob/182cd3205b7a7c95571a09bcfbb954b0041e4f90/smolrwkv/src/loader.rs#L87

That function expects a filename plus the entire mmapped .pth file (just as an example; you can do it other ways). Based on the absolute offset, tensor shape, and element size, it can calculate the range of data in the file that is associated with that tensor.
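
For reference, roughly the calculation being described (a hedged sketch with assumed names; assumes the whole .pth has been mmapped, e.g. with memmap2, and that the tensor stores little-endian f16 elements, decoded with the half crate):

```rust
use half::f16;

/// Hypothetical helper: given the mmapped .pth bytes and a tensor's absolute
/// data offset plus its shape, slice out and decode its f16 elements.
fn load_f16_tensor(mapped_pth: &[u8], data_offset: usize, shape: &[usize]) -> Vec<f16> {
    let n_elems: usize = shape.iter().product();
    let byte_len = n_elems * 2; // 2 bytes per f16 element
    let bytes = &mapped_pth[data_offset..data_offset + byte_len];

    // The tensor data is contiguous and uncompressed, so decoding is just
    // reinterpreting consecutive little-endian 2-byte chunks.
    bytes
        .chunks_exact(2)
        .map(|c| f16::from_le_bytes([c[0], c[1]]))
        .collect()
}
```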

Development

Successfully merging this pull request may close these issues.

Directly load pth/PyTorch tensor model files