This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

WIP: Bloom Inference #85

Closed
wants to merge 24 commits

Conversation

@hhamud (Contributor) commented Mar 29, 2023

Completes #45

Will refactor and remove most of the duplicate code before merging.

  • Adds a build flag for Apple Silicon.
  • Adds ggml_alibi, ggml_compute_forward_alibi_f32, ggml_compute_forward_alibi_f16, and ggml_compute_forward_alibi to ggml.c (see the sketch after this list).
  • Adds ggml_view_2d, ggml_alibi, and ggml_gelu to lib.rs in the ggml-raw crate.
  • Adds a BLOOM mode flag to the CLI.
  • Splits the original library into a commons folder, with the models moved into a models folder.
  • Moves some code into functions in main.rs in llama-cli.
  • Updates ggml.h.
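For context on what the new alibi ops compute: ALiBi replaces positional embeddings with a per-head linear bias on the attention scores. Below is a minimal, self-contained Rust sketch of that scheme as BLOOM uses it; it illustrates the math only, it is not the actual ggml.c kernels, and the function names are made up for the example.

```rust
/// ALiBi slopes for a power-of-two number of heads, as in BLOOM:
/// head h (0-based) gets slope m_h = 2^(-8 * (h + 1) / n_head).
fn alibi_slopes(n_head: usize) -> Vec<f32> {
    let base = 2f32.powf(-8.0 / n_head as f32);
    (0..n_head).map(|h| base.powi(h as i32 + 1)).collect()
}

/// Adds the bias m_h * j to the score for key position j, in place.
/// Each query row holds `n_ctx` key scores; under softmax this is
/// equivalent to the usual -m_h * (i - j) distance penalty.
fn apply_alibi(scores: &mut [f32], n_ctx: usize, slope: f32) {
    for row in scores.chunks_mut(n_ctx) {
        for (j, s) in row.iter_mut().enumerate() {
            *s += slope * j as f32;
        }
    }
}
```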

@hhamud hhamud marked this pull request as draft March 29, 2023 03:06
@philpax philpax mentioned this pull request Mar 30, 2023
@hhamud hhamud marked this pull request as ready for review April 1, 2023 17:39
@hhamud (Contributor, Author) commented Apr 1, 2023

Hitting a bit of an issue here when trying to read the converted BLOOM models from HF:

[2023-04-01T16:59:13Z INFO  llama_cli] Warning: Bad token in vocab at index 0
thread 'main' panicked at 'Could not load model: ReadExactFailed { source: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }, bytes: 4 }', llama-cli/src/main.rs:267:10
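For anyone hitting the same thing: bytes: 4 on a ReadExactFailed points at a fixed-size 4-byte read (a length or hyperparameter field) running off the end of the file, which usually means the converted file's layout doesn't match what the loader expects. As a rough illustration (a hypothetical sketch, not the actual llama-cli loader), a legacy-ggml-style vocab is read as length-prefixed strings like this:

```rust
use std::io::Read;

/// Hypothetical sketch of a legacy-ggml-style vocab read: n_vocab
/// entries, each a little-endian u32 byte length followed by the token
/// bytes. If the converter wrote a different layout, or the declared
/// n_vocab is wrong, one of the read_exact calls below runs past EOF,
/// which surfaces exactly as "failed to fill whole buffer".
fn read_vocab(reader: &mut impl Read, n_vocab: usize) -> std::io::Result<Vec<String>> {
    let mut tokens = Vec::with_capacity(n_vocab);
    for _ in 0..n_vocab {
        let mut len_bytes = [0u8; 4];
        reader.read_exact(&mut len_bytes)?; // the 4-byte read that can fail
        let len = u32::from_le_bytes(len_bytes) as usize;
        let mut token = vec![0u8; len];
        reader.read_exact(&mut token)?;
        tokens.push(String::from_utf8_lossy(&token).into_owned());
    }
    Ok(tokens)
}
```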

@philpax philpax mentioned this pull request Apr 6, 2023
@hhamud hhamud mentioned this pull request Apr 6, 2023

/// The weights for the BLOOM model. All the mutable state is split into a
/// separate struct `InferenceSession`.
pub struct BLOOM {
A Collaborator commented on the code above:

I'd go with Bloom:

> In UpperCamelCase, acronyms and contractions of compound words count as one word: use Uuid rather than UUID, Usize rather than USize or Stdin rather than StdIn. In snake_case, acronyms and contractions are lower-cased: is_xid_start.

https://rust-lang.github.io/api-guidelines/naming.html
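Concretely, the suggested rename applied to the struct above:

```rust
/// The weights for the BLOOM model. All the mutable state is split into a
/// separate struct `InferenceSession`.
pub struct Bloom {
    // ...fields unchanged; only the type name moves to UpperCamelCase.
}
```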

@philpax (Collaborator) commented Apr 7, 2023

Sorry for taking so long to look at this - it looks good! Once @setzer22 gives his OK (just so that he's happy with the overall structure of things), I'll get it ready for the PR merge chain. Don't update the PR yet - there are some other changes we'll likely need to land first, and it'll be easier for you to do them all at once.

@hhamud (Contributor, Author) commented Apr 7, 2023

> Sorry for taking so long to look at this - it looks good! Once @setzer22 gives his OK (just so that he's happy with the overall structure of things), I'll get it ready for the PR merge chain. Don't update the PR yet - there are some other changes we'll likely need to land first, and it'll be easier for you to do them all at once.

Sure. I decided against further restructuring the model load function to cut down on code duplication, since there are multiple PRs either making changes to it or using it.

I will re-open #74 and refactor it after completing this one (#85).

@iacore (Contributor) commented Apr 8, 2023

I hope this could be in another repo. Having the BLOOM model in llama-rs is strange.

@philpax (Collaborator) commented Apr 8, 2023

Yes - the reason we're keeping it here for now is that they share a lot of commonalities in architecture, and the base LLaMA library keeps changing. We will probably do one or more of the following over time:

  1. Put BLOOM support under an optional feature (sketched below)
  2. Rename the library to indicate its support for more than one kind of GPT-like model
  3. Work out how to share the implementation details, then split them all out into their own crates
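For option 1, a hypothetical sketch of what a Cargo feature gate could look like; the feature name and module layout here are assumptions for illustration, not anything from this PR:

```rust
// In Cargo.toml (assumed feature name):
//   [features]
//   bloom = []

// Compiled only when built with `cargo build --features bloom`;
// default builds stay LLaMA-only.
#[cfg(feature = "bloom")]
pub mod bloom {
    /// Hypothetical entry point; the real llama-rs API may differ.
    pub fn load(path: &str) {
        println!("loading BLOOM weights from {path}");
    }
}
```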

@philpax philpax added this to the 0.1 milestone Apr 10, 2023
@philpax philpax mentioned this pull request Apr 13, 2023
@danforbes (Contributor) commented
What's the status of this? Looks like it's not very up-to-date. Is there anything I can do to help?

@hhamud (Contributor, Author) commented Apr 13, 2023

> What's the status of this? Looks like it's not very up-to-date. Is there anything I can do to help?

It's on pause until most of the major changes have been merged in; see the discussion above.

@danforbes danforbes mentioned this pull request Apr 13, 2023
@philpax (Collaborator) commented Apr 13, 2023

There's also a chance that we investigate #137 before we tackle this again, just to reduce the rework if we go down that road, but I'm not sure yet. (Would appreciate your thoughts on it, btw!)

@setzer22 (Collaborator) commented Apr 14, 2023

> There's also a chance that we investigate #137 before we tackle this again, just to reduce the rework if we go down that road, but I'm not sure yet. (Would appreciate your thoughts on it, btw!)

I would not go that far. Making our own computation graphs is a significant undertaking, and support for Bloom has been here for a while now. I would prioritize merging this before making any other big refactors.

In my experience, abstractions always come out better when they aren't designed against a single use case. If we build the computation graph API and only make sure it works for LLaMA, chances are we'll have to rework it later anyway when adding support for other models. It would be better to have multiple models in first, to make sure the abstraction we come up with is more solid.

@danforbes danforbes mentioned this pull request Apr 16, 2023
@philpax philpax mentioned this pull request Apr 20, 2023
@hhamud (Contributor, Author) commented Apr 20, 2023

Closing this PR, as #141 has integrated the changes from this PR (#85).

@hhamud hhamud closed this Apr 20, 2023
@danforbes danforbes mentioned this pull request Apr 30, 2023