Store final layer LM output and record AUROC and acc #165
Conversation
I tested it, and it seems that the LM outputs appear when I use elicit but not when I use eval -- not sure if I'm missing something.
Edit: never mind, this isn't necessary.
```python
    "Encoder-decoder model doesn't have expected get_encoder() method"
)

model = assert_type(PreTrainedModel, model.get_encoder())
```
So are we removing support for only using the encoder of encoder-decoder models?
Yes, because that complicates things and isn't how inference on encoder-decoder models is normally done.
```python
# TODO: Do something smarter than "rindex" here. Really we want to
# get the span of the answer directly from Jinja, but that doesn't
# seem possible. This approach may fail for complex templates.
answer_start = text.rindex(choice["answer"])
```
It seems this fails if the answer is the empty string, or if the answer appears in the question. I think the empty-string case is important, since it may be what makes the method work better. It would also fail if the answer string isn't exactly what's used in the template (are there other cases?).
I'm wondering if we should store the completion string (the second half of the jinja template) in the prompt dataset, and if it's empty and the user is attempting to use encoder-decoder models, we raise an exception.
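The failure modes described above are easy to see with a toy example (the strings here are invented for illustration, not the project's real templates):

```python
# Normal case: the answer appears once, at the end of the rendered prompt.
text = "Is the sky blue? Yes"
assert text.rindex("Yes") == 17

# Failure mode 1: an empty answer string. rindex("") returns len(text)
# rather than raising, silently producing a zero-length answer span.
assert text.rindex("") == len(text)

# Failure mode 2: the answer string also appears inside the question.
# rindex matches the *last* occurrence, which happens to be right here,
# but a complex template could place text after the answer and break this.
text2 = "Yes or no: is the sky blue? Yes"
assert text2.rindex("Yes") == 28
assert text2.index("Yes") == 0  # plain index would match the question

# Failure mode 3: the rendered template doesn't contain the raw answer
# string verbatim (e.g. the template changes its casing); rindex raises.
try:
    "Is the sky blue? YES".rindex("Yes")
except ValueError:
    pass  # substring not found
```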
I hope it's not necessary to include the answer in the question, because that would slow down inference (since if the answer only appears at the end you can cache the keys and values).
> I'm wondering if we should store the completion string (the second half of the jinja template) in the prompt dataset, and if it's empty and the user is attempting to use encoder-decoder models, we raise an exception.
Yes.
In general I'm like, yes, we should do the templates better, but I don't want that to hold up merging this PR
```python
# Only feed question, not the answer, to the encoder for enc-dec models
if model.config.is_encoder_decoder:
    # TODO: Maybe make this more generic for complex templates?
    text = text[:answer_start].rstrip()
```
Why do we strip on the right here? This kind of assumes the answer string starts with whitespace, correct?
No, we don't assume that. The concatenation of the encoder input and the decoder input doesn't have to make sense as a full sentence or anything. I'm sort of assuming the answer string does not start with whitespace, and the encoder isn't going to expect trailing whitespace in its input (although I doubt it actually matters).
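To make the whitespace handling concrete, a minimal sketch of the split (example strings are invented, not the project's real templates):

```python
# Full rendered prompt: question plus answer. For enc-dec models,
# only the part before the answer span goes to the encoder.
text = "Q: Is the sky blue? A: Yes"
answer = "Yes"

answer_start = text.rindex(answer)

# rstrip removes the separating space from the encoder input; the
# answer string itself does not need to start with whitespace.
encoder_text = text[:answer_start].rstrip()
decoder_text = answer

assert encoder_text == "Q: Is the sky blue? A:"
assert decoder_text == "Yes"
```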
I reviewed all the changes since my last review and they look good to me
Also refactors the logging to use Pandas, and drops Python 3.9 support because I was too lazy to refactor my cool span conversion function to work on 3.9.
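The PR title mentions recording AUROC and accuracy per layer; a hedged sketch of what Pandas-based logging of such records could look like (column names and values here are invented for illustration, not the project's actual schema):

```python
import pandas as pd

# Hypothetical per-layer evaluation records.
records = [
    {"layer": 0, "auroc": 0.61, "acc": 0.58},
    {"layer": 1, "auroc": 0.74, "acc": 0.70},
    {"layer": 2, "auroc": 0.83, "acc": 0.79},
]
df = pd.DataFrame(records)

# One row per layer; easy to persist with df.to_csv(...) and to query,
# e.g. finding the layer with the best AUROC.
best = df.loc[df["auroc"].idxmax()]
assert best["layer"] == 2
```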