Add LoRa support to the txt2img and img2img pipelines #119

Open

stronk-dev wants to merge 24 commits into main from feature/loras
Conversation

@stronk-dev (Contributor) commented Jul 10, 2024:

Adds support for loading arbitrary embeddings, modules (like LCM), etc.

Fulfills livepeer/bounties#33

Still requires:
- Testing if it works
- Gracefully handling requests for non-existent LoRas
- Design decision: do we want to keep LoRas loaded, or always unload already-loaded weights like we do now?
- Design decision: use the current method of requesting LoRas, or explore other options?
- Design decision: abort inference if the loras parameter is invalid or one of the LoRas fails to load, or continue on like it does now?

LoRas can be loaded by passing a new loras parameter. In the current design this needs to be a string that is parseable as JSON. For example:

curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"light saber battle in the death star", "loras": "{ \"nerijs/pixel-art-xl\" : 1.2 }"}'
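
For reference, a minimal sketch of how such a parameter could be parsed and applied with diffusers, assuming a peft-backed diffusers install; the load_loras helper, the adapter-name derivation, and the pipeline setup are illustrative assumptions, not the PR's actual code:

import json

import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def load_loras(pipeline, loras_param: str):
    # The parameter arrives as a JSON string mapping repo id -> weight,
    # e.g. '{ "nerijs/pixel-art-xl" : 1.2 }'.
    loras = json.loads(loras_param)
    adapter_names, adapter_weights = [], []
    for repo_id, weight in loras.items():
        # Derive an adapter name from the repo id (illustrative convention;
        # peft adapter names should avoid "/" and ".").
        name = repo_id.replace("/", "_").replace(".", "_")
        pipeline.load_lora_weights(repo_id, adapter_name=name)
        adapter_names.append(name)
        adapter_weights.append(weight)
    # Activate all requested adapters with their requested weights.
    pipeline.set_adapters(adapter_names, adapter_weights=adapter_weights)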

@rickstaa force-pushed the main branch 3 times, most recently from cd1feb4 to 0d03040, on July 16, 2024 13:10
@stronk-dev (Contributor, Author) commented:
Did some testing:

2024-07-16 13:38:15,602 INFO:     Application startup complete.
2024-07-16 13:38:15,604 INFO:     Uvicorn running on https://0.0.0.0:8000 (Press CTRL+C to quit)
100%|██████████| 50/50 [00:06<00:00,  7.93it/s]
2024-07-16 13:38:23,625 INFO:     172.17.0.1:52384 - "POST /text-to-image HTTP/1.1" 200 OK
100%|██████████| 50/50 [00:08<00:00,  6.24it/s]
2024-07-16 13:39:10,578 INFO:     172.17.0.1:35758 - "POST /text-to-image HTTP/1.1" 200 OK
100%|██████████| 50/50 [00:06<00:00,  8.13it/s]
2024-07-16 13:39:22,707 INFO:     172.17.0.1:34084 - "POST /text-to-image HTTP/1.1" 200 OK
100%|██████████| 50/50 [00:08<00:00,  6.23it/s]
2024-07-16 13:39:36,599 INFO:     172.17.0.1:39376 - "POST /text-to-image HTTP/1.1" 200 OK

All images were generated on a 4090 with the SDXL base model, using the prompt "light saber battle in the death star". Requests 1 and 3 are without LoRas; requests 2 and 4 requested the nerijs/pixel-art-xl LoRa.

[img1-img4: generated outputs for the four requests above]

Interestingly, inference is a bit slower when using LoRas. I could have used better trigger words in the prompt, but it certainly seems like the LoRa was loaded.

One thing we might want to rethink is the way the loras parameter is passed:

curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"light saber battle in the death star", "loras": "{ \"nerijs/pixel-art-xl\" : 1.2 }"}'
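
A hypothetical alternative (not implemented in this PR) would be to accept a native JSON object rather than a JSON-encoded string, which avoids the escaped quotes:

curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"light saber battle in the death star", "loras": {"nerijs/pixel-art-xl": 1.2}}'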

@stronk-dev stronk-dev marked this pull request as ready for review July 16, 2024 13:51
@stronk-dev (Contributor, Author) commented:
If the user requests an invalid LoRa repo, it will print the error:

2024-07-16 14:14:03,062 - app.pipelines.util - WARNING - Unable to load LoRas for adapter 'nerijs/pixel-ar' (RepositoryNotFoundError)

We can make this more verbose by printing the entire exception. The runner will continue with inference, but without using the LoRa.
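
A rough sketch of that warn-and-continue pattern, catching the RepositoryNotFoundError raised by huggingface_hub (the helper name is illustrative, not the PR's actual code):

import logging

from huggingface_hub.utils import RepositoryNotFoundError

logger = logging.getLogger("app.pipelines.util")

def try_load_lora(pipeline, repo_id: str, adapter_name: str) -> bool:
    try:
        pipeline.load_lora_weights(repo_id, adapter_name=adapter_name)
        return True
    except RepositoryNotFoundError as e:
        # Log a warning and let inference continue without this LoRa.
        logger.warning(
            "Unable to load LoRas for adapter '%s' (%s)", repo_id, type(e).__name__
        )
        return False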

@stronk-dev (Contributor, Author) commented:
(As a side note: I think it would be useful if exceptions like that were collected and passed back. Make a best effort to complete inference and inform the user of any issues found during the job. Alternatively, we could abort inference.)

@eliteprox (Collaborator) commented:
> If the user requests an invalid LoRa repo, it will print the error:
>
> 2024-07-16 14:14:03,062 - app.pipelines.util - WARNING - Unable to load LoRas for adapter 'nerijs/pixel-ar' (RepositoryNotFoundError)
>
> We can make this more verbose by printing the entire exception. The runner will continue with inference, but without using the LoRa.

I like your LoRa input validation solution because it handles all incorrect values sufficiently. However, we might want to return these errors to the gateway later to inform the user. I think we should return the error from load_loras and return a bad request in the runner in case of an invalid LoRa or weight, so go-livepeer can return it. I like the error messages you have now; I don't think more detail is needed on them.

We could also hold off on making that change until we develop the go-livepeer side. @rickstaa, any thoughts on that approach?
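
A minimal sketch of that approach, assuming a FastAPI route in the runner; the wrapper and its wiring are assumptions, not the proposed diff:

from fastapi import HTTPException

def load_loras_or_abort(pipeline, loras_param: str):
    try:
        load_loras(pipeline, loras_param)  # hypothetical loader, as sketched above
    except Exception as e:
        # Abort inference and surface the reason to go-livepeer as a 400.
        raise HTTPException(status_code=400, detail=f"Unable to load LoRas: {e}")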

@eliteprox (Collaborator) commented Aug 7, 2024:

> Design decision: do we want to keep LoRas loaded, or always unload already-loaded weights like we do now?

VRAM usage looks great; I tested with multiple concurrent requests using different LoRas. I think this is working well as it is. If keeping LoRas loaded would improve inference time, I think we should backlog it as a pipeline improvement.
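
For reference, a sketch of the two options being weighed here, reusing the hypothetical adapter naming from the earlier sketch:

# Option A (current behavior): drop adapters loaded by the previous
# request before loading the ones for this request.
pipeline.unload_lora_weights()

# Option B (possible backlog item): keep downloaded weights loaded and
# only switch which adapters are active per request.
pipeline.set_adapters(["nerijs_pixel-art-xl"], adapter_weights=[1.2])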

> Design decision: use the current method of requesting LoRas, or explore other options?

This implementation appears to be working great; I tried a few LoRas from Hugging Face and they were downloaded automatically.

@eliteprox (Collaborator) left a review:

Thanks for the PR @stronk-dev! This is a nice addition to the image pipelines. I tested both text-to-image and image-to-image using the ByteDance/SDXL-Lightning model with two different LoRas and multiple images. The pipelines are working great with LoRa support.

See my comments above on the remaining design decisions. I think responding with an informative bad request response when invalid LoRa values are passed will help inform the user on the gateway side. If you can make that change (or we decide to do it during go-livepeer integration) and resolve conflicts then the PR looks good to me.

@eliteprox force-pushed the feature/loras branch 3 times, most recently from 6544ef0 to f6c7e94, on August 10, 2024 00:57
@rickstaa (Collaborator) commented:
Interesting 🤔! Not sure what went wrong during the OpenAPI spec configuration -> dcaa961. Maybe my generation code is no longer sufficient. I will revert dcaa961 and check on Monday.

@stronk-dev (Contributor, Author) commented:
> See my comments above on the remaining design decisions. I think responding with an informative bad request response when invalid LoRa values are passed will help inform the user on the gateway side. If you can make that change (or we decide to do it during go-livepeer integration) and resolve conflicts then the PR looks good to me.

Just to confirm: should we return a bad request error and abort inference for any of the exceptions in the load_loras function?

@eliteprox (Collaborator) left a review:

This change is ready to merge if there are no further optimizations. @stronk-dev, could you take a quick look? I've fully tested both the text-to-image and image-to-image pipelines with various LoRas.

@eliteprox (Collaborator) commented Aug 23, 2024:

> See my comments above on the remaining design decisions. I think responding with an informative bad request response when invalid LoRa values are passed will help inform the user on the gateway side. If you can make that change (or we decide to do it during go-livepeer integration) and resolve conflicts then the PR looks good to me.
>
> Just to confirm: should we return a bad request error and abort inference for any of the exceptions in the load_loras function?

Correct, and I'd like to send the specific message back to the gateway. I think this would help developers learn the API faster, but it's not required for this initial release of LoRa support in my opinion; open to suggestions.

@stronk-dev (Contributor, Author) commented:
Thanks for all the tweaks, @eliteprox! LGTM

@eliteprox (Collaborator) commented:
> Thanks for all the tweaks, @eliteprox! LGTM

@rickstaa I've resolved conflicts on runner.gen.go and regenerated the OpenAPI schema. This is ready for merge along with livepeer/go-livepeer#3154.
