I tried using
Nice one! Much appreciated - hope you can figure out what's going on with
Yeah, looks like we might have to implement our own parser. Will need to explore that at some point 😢 I'm tempted to clean this up and merge it in, but with the functionality disabled, so that the base functionality is in and we can develop it as we go.
I'd say this is reasonable 👍 There's some work in this PR and leaving it here for too long means it will end up diverging too much.
I'm going to update this to the latest version, hide the CLI version for now, and merge it in - we can then work on our own parser for the tensors when we have some time 🚀
Things that I'll fix:
Nice! All good for me
Actually, I just remembered something that might help you with your pickle problem: BlinkDL/ChatRWKV#40 (comment). That's some example code for manually loading a model file. Using this approach would still need a little helper Python script, but it could do something like just scan through the tensors in the file.

edit: Also, it just occurred to me... Maybe you're trying to load the zip file with
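For context, a PyTorch checkpoint saved with the default format is just a ZIP archive: a `data.pkl` holding the pickled metadata plus one raw storage blob per tensor. Here's a minimal Rust sketch that lists those entries; it assumes the `zip` crate, and the `model.pth` path is a placeholder:

```rust
// Sketch only: list what's inside a PyTorch checkpoint. A checkpoint saved
// with the default (zip-based) format contains `data.pkl` with the pickled
// metadata plus one raw storage blob per tensor.
use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("model.pth")?; // placeholder path
    let mut archive = zip::ZipArchive::new(BufReader::new(file))?;

    for i in 0..archive.len() {
        let entry = archive.by_index(i)?;
        // Typical entry names: `archive/data.pkl`, `archive/data/0`, `archive/data/1`, ...
        println!("{} ({} bytes)", entry.name(), entry.size());
    }
    Ok(())
}
```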
This also fixes the issue where `.tmp` files would be detected.
I've been experimenting with this and it's definitely not something as simple as accidentally loading the ZIP file. Also, I'd say whoever invented the pickle format should be taken out back and shot, but that's far too clean an end for someone responsible for such heinous crimes.
Overall changes look good :)
There are a few comments about the CLI settings. I also haven't had time to download and test this to make sure all the subcommands work as expected. I want to do this later today, but if you've tested everything on your end, I'm okay with merging after the comments are addressed.
No idea why I put so much time into this, but: https://github.com/KerfuffleV2/repugnant-pickle. You can now (hopefully) parse tensor metadata from PyTorch files in Rust. See this part for an example of what you get: https://github.com/KerfuffleV2/repugnant-pickle#pytorch
That is impressive and horrifying. I hope we can make use of it at some point.
In an ideal world, such horrifying things wouldn't be needed. Unfortunately... Anyway, I dogfooded it and used it to add support to my RWKV project. It was a pretty easy change, so if an example is helpful: https://github.com/KerfuffleV2/smolrsrwkv/pull/3/files

I think this should make it pretty easy to write Rust tools for interfacing with PyTorch models as long as they don't have anything weird going on. I tried it on all the RWKV, Llama and Alpaca files I have and it was able to extract the tensor metadata without a problem. I don't know if this is something llama-rs would want to depend on. If so, though, one thing that could help me make it more reliable is if people could run the
You are a madman. OK - I think we should get this PR in, and then get to work on a
I figure this will happen naturally as people try it out - no rush on testing it ahead of time if it works on all the usual models we know and love.
I have no time to review things today, but YES PLEASE 😭 Being able to load the real weights directly would be so much nicer for users.
@setzer22 Just to be clear, what I wrote just allows interfacing with the PyTorch files and discovering the tensor metadata (what tensors exist, what types, the dimensions, where they are in the file). That will help facilitate writing something like a conversion utility without needing to involve Python or Torch. However, if you wanted to do something like load the original non-GGML PyTorch model, that would be a much more difficult task since the tensors aren't in the GGML format, may not be quantized, etc.
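To make the "metadata only" distinction concrete, this is roughly the per-tensor record such a tool ends up with. The type below is illustrative, not repugnant-pickle's actual API:

```rust
// Illustrative only (not repugnant-pickle's actual types): the kind of
// per-tensor record you can recover from a PyTorch checkpoint without
// involving Python or Torch.
#[derive(Debug)]
struct TensorInfo {
    /// Key in the state dict, e.g. "layers.0.attention.wq.weight".
    name: String,
    /// Storage type recorded in the pickle, e.g. "FloatStorage" or "HalfStorage".
    storage_type: String,
    /// Dimensions, e.g. [4096, 4096].
    shape: Vec<usize>,
    /// Which storage blob inside the zip holds the data, e.g. "data/0".
    storage_key: String,
}

fn main() {
    // A converter would walk records like these, pull the raw bytes for each
    // storage blob out of the zip, convert/quantize, and write GGML output.
    let example = TensorInfo {
        name: "tok_embeddings.weight".into(),
        storage_type: "HalfStorage".into(),
        shape: vec![32000, 4096],
        storage_key: "data/0".into(),
    };
    println!("{example:?}");
}
```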
That's fine - the application here would be to convert them to GGML format. In future, we'll figure out a way to load them directly (but I suspect most people can't load the unquantised models anyway). How hard would it be to load the f16 tensor data?
That function expects to get a filename + the
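For the f16 question: once the metadata tells you where a tensor's data starts and how many elements it has, reading it is mostly a seek, a read, and a widening pass. A rough sketch, assuming the `half` crate and little-endian storage; the helper name and signature are hypothetical:

```rust
// Sketch: read n_elems little-endian f16 values starting at a byte offset
// and widen them to f32. Assumes the `half` crate; the helper name and
// signature are hypothetical, not a function from this PR.
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn read_f16_tensor(path: &str, offset: u64, n_elems: usize) -> std::io::Result<Vec<f32>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;

    let mut bytes = vec![0u8; n_elems * 2];
    file.read_exact(&mut bytes)?;

    Ok(bytes
        .chunks_exact(2)
        .map(|b| half::f16::from_le_bytes([b[0], b[1]]).to_f32())
        .collect())
}
```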
I'm still working on adding the weights to the file; right now it only adds the params and tokens (the md5 hash matches the llama.cpp-generated file without the weights). There's also no quantizing yet.
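For reference, since the output matches llama.cpp's generated file, the params + tokens portion presumably follows the original (unversioned) GGML layout: a magic number, seven little-endian i32 hyperparameters, then a length-prefixed byte string per vocab token. A rough sketch of writing that portion, with illustrative 7B-sized values rather than this PR's actual code:

```rust
// Rough sketch of the params + tokens portion of a llama.cpp-compatible GGML
// file (original, unversioned layout): a magic number, seven little-endian
// i32 hyperparameters, then a length-prefixed byte string per vocab token.
// Values shown are illustrative 7B-sized numbers, not this PR's actual code.
use std::io::Write;

fn write_params_and_tokens<W: Write>(out: &mut W, tokens: &[Vec<u8>]) -> std::io::Result<()> {
    out.write_all(&0x67676d6c_u32.to_le_bytes())?; // "ggml" magic

    // n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, f16 flag (1 = f16).
    for hparam in [32000_i32, 4096, 256, 32, 32, 128, 1] {
        out.write_all(&hparam.to_le_bytes())?;
    }

    // Vocab: i32 length followed by the raw token bytes, for each token.
    for token in tokens {
        out.write_all(&(token.len() as i32).to_le_bytes())?;
        out.write_all(token)?;
    }
    Ok(())
}
```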
Also added `generate` and `convert` subcommands to the CLI as well. Let me know if anything needs changing 🙂
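In case it helps review, the subcommand split might look roughly like this with clap 4's derive API (with the `derive` feature enabled); the argument names here are illustrative, not necessarily the PR's exact CLI:

```rust
// Illustrative sketch of a CLI split into `generate` and `convert`
// subcommands using clap 4's derive API. Argument names are hypothetical,
// not necessarily what this PR exposes.
use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Run inference on a GGML model.
    Generate {
        model: std::path::PathBuf,
        prompt: String,
    },
    /// Convert a PyTorch checkpoint to GGML format.
    Convert {
        source: std::path::PathBuf,
        dest: std::path::PathBuf,
    },
}

fn main() {
    match Cli::parse().command {
        Command::Generate { model, prompt } => println!("generate with {model:?}: {prompt}"),
        Command::Convert { source, dest } => println!("convert {source:?} -> {dest:?}"),
    }
}
```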
Partially resolves #21