Use train data for computing normalization #30

Merged
merged 6 commits into from
Feb 9, 2023

FabienRoger
Collaborator

No description provided.
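
Going only by the PR title, the change is presumably to compute normalization statistics on the training split and reuse them for the test split. The sketch below illustrates that general idea and is not the code in `elk/train.py`; the function names `fit_normalizer` and `normalize` are hypothetical, and plain Python lists stand in for hidden-state tensors to keep it dependency-free.

```python
from statistics import fmean, pstdev


def fit_normalizer(train_rows):
    """Compute per-feature mean and std from the training split only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Guard against constant features: a zero std would divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply previously fitted statistics to any split (train or test)."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy "hidden states" (assumed data): 2 features, 2 train rows, 2 test rows.
train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 20.0], [4.0, 40.0]]

means, stds = fit_normalizer(train)        # statistics come from train only
train_norm = normalize(train, means, stds)
test_norm = normalize(test, means, stds)   # test reuses the train statistics
```

With this setup the test split is no longer guaranteed to have zero mean and unit variance after normalization, which is one plausible reason for accuracy to shift when switching to train-derived statistics.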

@FabienRoger FabienRoger marked this pull request as ready for review February 3, 2023 13:20
Collaborator

@lauritowal lauritowal left a comment

@FabienRoger @norabelrose
Seems like deberta performs worse after these changes:

For boolq, accuracy was 70% before; now:

model,prefix,method,prompt_level,train,test,accuracy,std,language_model_type,layer,loss
deberta-v2-xxlarge-mnli,normal,ccs,all,boolq,boolq,0.551499992609024,0.051188381401737586,encoder,-1.0,0.37969053238630296
deberta-v2-xxlarge-mnli,normal,lr,all,boolq,boolq,0.6075,0.06677387213573883,encoder,-1.0,0.0

For imdb, accuracy was 89% before; now:

model,prefix,method,prompt_level,train,test,accuracy,std,language_model_type,layer,loss
deberta-v2-xxlarge-mnli,normal,ccs,all,imdb,imdb,0.8411666552225748,0.15101562370137744,encoder,-1.0,0.8821464618047078
deberta-v2-xxlarge-mnli,normal,lr,all,imdb,imdb,0.9106666666666665,0.07123591478710409,encoder,-1.0,0.0

We should probably check what is happening there.
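
One way to make this comparison concrete is to parse the result rows and express each change relative to the reported `std` column. This is only a sketch: the helper name `accuracy_deltas` is hypothetical, the baselines are the "before" figures quoted above and are assumed to apply to every method on that dataset, and reading `std` as run-to-run spread is also an assumption.

```python
import csv
import io

# Result rows copied from this comment (model column kept as-is).
RESULTS_CSV = """\
model,prefix,method,prompt_level,train,test,accuracy,std,language_model_type,layer,loss
deberta-v2-xxlarge-mnli,normal,ccs,all,boolq,boolq,0.551499992609024,0.051188381401737586,encoder,-1.0,0.37969053238630296
deberta-v2-xxlarge-mnli,normal,lr,all,boolq,boolq,0.6075,0.06677387213573883,encoder,-1.0,0.0
deberta-v2-xxlarge-mnli,normal,ccs,all,imdb,imdb,0.8411666552225748,0.15101562370137744,encoder,-1.0,0.8821464618047078
deberta-v2-xxlarge-mnli,normal,lr,all,imdb,imdb,0.9106666666666665,0.07123591478710409,encoder,-1.0,0.0
"""

# "Before" accuracies quoted in the comment (assumed to hold for each method).
BASELINE = {"boolq": 0.70, "imdb": 0.89}


def accuracy_deltas(csv_text, baseline):
    """Map (dataset, method) to (accuracy change, change in units of std)."""
    deltas = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        delta = float(row["accuracy"]) - baseline[row["test"]]
        std = float(row["std"])
        # Avoid dividing by zero if a row reports no spread.
        in_stds = delta / std if std > 0 else float("nan")
        deltas[(row["test"], row["method"])] = (delta, in_stds)
    return deltas


deltas = accuracy_deltas(RESULTS_CSV, BASELINE)
for (dataset, method), (delta, in_stds) in sorted(deltas.items()):
    print(f"{dataset:6s} {method:4s} delta={delta:+.3f} ({in_stds:+.2f} std)")
```

Under those assumptions, the boolq CCS drop is close to three reported standard deviations, while the imdb CCS change is well within one.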

Review comment on elk/train.py (outdated, resolved)
Review comment on elk/utils_evaluation/utils_evaluation.py (outdated, resolved)
@FabienRoger
Collaborator Author

FabienRoger commented Feb 6, 2023

> For boolq, accuracy was 70% before; now:

I'm not finding that. I got 59% for both.

> For imdb, accuracy was 89% before; now:

I'm not finding that. I got 0.88 for both runs.

@lauritowal
Collaborator

lauritowal commented Feb 6, 2023

@FabienRoger Thanks for running that too! Are you sure that you are not loading an older model? What model type are you using? I'll try it again later.

@FabienRoger
Collaborator Author

deberta-v2-xxlarge-mnli,normal,ccs,all,boolq,boolq,0.5864999830722809,0.0585042759828347,encoder,-1.0,0.9317224085330963

@FabienRoger
Collaborator Author

FabienRoger commented Feb 7, 2023

I fixed a dumb bug, and now the results are:
On boolq, accuracy drops from 65% to 62%.
On imdb, accuracy stays at 88%.

Member

@norabelrose norabelrose left a comment

LGTM
