
Support petals distributed model classes #205

Merged
merged 2 commits into main on Jul 21, 2023
Conversation

@gsarti (Member) commented Jul 21, 2023

Description

This PR adds preliminary support for the AutoDistributedModelForCausalLM class from the petals library. Concretely, it bypasses the use of attention_mask in model.generate and model.forward, since support for attention information in petals is still a work in progress (see #158 for reference).
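As a rough illustration of the bypass (a minimal sketch, not Inseq's actual implementation; the helper name is hypothetical), the idea is simply to drop attention_mask from the keyword arguments before delegating to the distributed model:

def forward_without_attention_mask(model, *args, **kwargs):
    # Petals models cannot consume attention information yet, so the mask
    # is removed rather than forwarded to the distributed backbone.
    kwargs.pop("attention_mask", None)
    return model(*args, **kwargs)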

Usage

The following code snippet shows an example of contrastive attribution using the Input X Gradient method and the LLaMA 65B model (tested on a machine with a 6GB RTX 3060):

import inseq
from petals import AutoDistributedModelForCausalLM

model_name = "enoch/llama-65b-hf"
# Load the distributed model: only the embeddings and LM head live locally,
# while the transformer blocks are served by the petals swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name).cuda()
inseq_model = inseq.load_model(model=model, tokenizer=model_name, attribution_method="input_x_gradient")

>>> [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
>>> [INFO] Using DHT prefix: llama-65b-hf
>>> [INFO] Loading model with input_x_gradient method...

txt = """Option 1: Take a 50 minute bus, then a half hour train, and finally a 10 minute bike ride.
Option 2: Take a 10 minute bus, then an hour train, and finally a 30 minute bike ride.
Which of the options above is faster to get to work?
The answer is Option """

out = inseq_model.attribute(
    txt,
    txt + "1",  # attributed target: answer "1"
    attributed_fn="contrast_prob_diff",
    contrast_targets=txt + "2",  # contrastive alternative: answer "2"
    step_scores=["contrast_prob_diff", "probability"],
)

>>> Attributing with input_x_gradient...: 100%|██████████| 80/80 [00:37<00:00, 37.55s/it]

out.show()

Notes

Methods requiring model internals (e.g. attention weights) are currently not supported and will raise an exception if used alongside AutoDistributedModelForCausalLM classes.
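For instance (a hedged sketch: the exact exception type Inseq raises here is an assumption), attempting to load an attention-based method with the distributed model above is expected to fail:

try:
    inseq.load_model(model=model, tokenizer=model_name, attribution_method="attention")
except Exception as err:  # exact exception type depends on Inseq internals
    print(f"Attention-based attribution is unavailable for petals models: {err}")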

@gsarti gsarti added the enhancement New feature or request label Jul 21, 2023
@gsarti gsarti added this to the v0.5 milestone Jul 21, 2023
@gsarti gsarti linked the "petals compatibility issue tracker" issue Jul 21, 2023, which may be closed by this pull request
@gsarti gsarti merged commit ea9d982 into main Jul 21, 2023
4 checks passed
@gsarti gsarti deleted the petals-compat branch July 21, 2023 15:05