
rescale and specify certain model #46

Closed

areejokaili opened this issue May 7, 2020 · 11 comments

@areejokaili

Hi,
Thank you for making your code available.
I used your score function before the last update (before multi-refs were possible and before the scorer was added). I used to record the hash of the model to make sure I always got the same results.
With the new update, I'm struggling to find out how to set a specific model and also rescale.

For example, I would like to do something like this:

out, hash_code = score(preds, golds, model_type="roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)", rescale_with_baseline=True, return_hash=True)

roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0) is the hash I got from my earlier runs a couple of months ago.

Appreciate your help
Areej

@Tiiiger
Owner

Tiiiger commented May 7, 2020

Hi @areejokaili, sorry for the confusion.

The code below should meet your use case.

out, hash_code = score(preds, golds, model_type="roberta-large", rescale_with_baseline=True, return_hash=True)

@areejokaili
Author

areejokaili commented May 7, 2020


Hi @Tiiiger, thanks for the quick reply.
I tried the code you provided, but it required lang='en'.

scorer = BERTScorer(model_type='roberta-large', lang='en', rescale_with_baseline=True)

It works now, but I'm getting different scores than before. I was doing my own multi-ref scoring before, so maybe that is why.
I'll investigate more.
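
For readers following the thread, here is a minimal sketch of how the scorer object configured above can be applied to candidates and references (the example strings are made up for illustration; scorer.score returning P, R, F tensors follows the bert-score README):

# Minimal sketch: reuse one BERTScorer so the model is loaded once and the
# configuration (model_type, lang, rescaling) stays fixed across runs.
from bert_score import BERTScorer

scorer = BERTScorer(model_type="roberta-large", lang="en", rescale_with_baseline=True)

# Made-up example strings, for illustration only.
cands = ["The server handles the requests."]
refs = ["The requests are handled in the cloud."]

P, R, F = scorer.score(cands, refs)  # one score per candidate sentence
print(P.mean().item(), R.mean().item(), F.mean().item())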

@Tiiiger
Owner

Tiiiger commented May 7, 2020

Were you using baseline rescaling before? According to the hash, it seems you were not.

@areejokaili
Author

areejokaili commented May 7, 2020

This is what I used before:
score([p], [g], lang="en", verbose=False, rescale_with_baseline=True)
and this is actually the hash:
roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled

@Tiiiger
Owner

Tiiiger commented May 7, 2020

Cool, that looks correct. Let me know if you have any further questions.

@Tiiiger Tiiiger closed this as completed May 7, 2020
@areejokaili
Author

Hi @Tiiiger again,

Sorry for asking again, but I ran a dummy test to compute the similarity between 'server' and 'cloud computing' in two different environments.

The first environment has bert-score 0.3.0 and transformers 2.5.0 and gives scores 0.379, 0.209, 0.289
hash --> roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled

The second environment has bert-score 0.3.2 and transformers 2.8.0 and gives scores -0.092, -0.167, -0.128
hash --> roberta-large_L17_no-idf_version=0.3.2(hug_trans=2.8.0)-rescaled

In both cases I used the following:

(P, R, F), hash_code = score(preds, golds, lang='en', rescale_with_baseline=True, return_hash=True)

I would like to use bert-score 0.3.2 for the multi-refs feature, but I would also like to keep the same scores I got before.
I would appreciate any insight into why I'm not getting the same scores.
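
One way to narrow down where the two environments diverge is to print the installed library versions next to the returned hash. The sketch below is a debugging aid added for illustration, not part of the thread; it reuses the dummy 'server' / 'cloud computing' test above.

# Debugging sketch: report library versions together with the bert-score hash
# so runs from the two environments can be matched exactly.
import bert_score
import transformers
from bert_score import score

preds, golds = ["server"], ["cloud computing"]  # the dummy test above
(P, R, F), hash_code = score(preds, golds, lang="en",
                             rescale_with_baseline=True, return_hash=True)
print("bert-score", bert_score.__version__, "| transformers", transformers.__version__)
print(hash_code, P.item(), R.item(), F.item())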

@Tiiiger
Owner

Tiiiger commented May 8, 2020

Hi @areejokaili, thank you for letting me know. I suspect that there could be some bugs in the newer version, and I would love to fix those.

I am looking into this.

@Tiiiger Tiiiger reopened this May 8, 2020
@Tiiiger
Owner

Tiiiger commented May 8, 2020

Hi, I quickly tried a couple of environments. Here are the results:

> score(['server'], ['cloud computing'],lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([-0.0919]), tensor([-0.1670]), tensor([-0.1279])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.8.0)-rescaled')
> score(['server'], ['cloud computing'],lang='en', rescale_with_baseline=True, return_hash=True)
((tensor([0.3699]), tensor([0.2090]), tensor([0.2893])),
 'roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.5.0)-rescaled')

I believe this is due to an update in the RoBERTa tokenizer.

Running transformers==2.5.0, I got this warning:

RobertaTokenizerFast has an issue when working on mask language modeling where it introduces an extra encoded space before the mask token. See https://github.com/huggingface/transformers/pull/2778 for more information.

I encourage you to check out huggingface/transformers#2778 to understand this change.

So, as I understand it, this is not a change in our software. If you want to keep the same results as before, you should downgrade to transformers==2.5.0. However, I believe the behavior in transformers==2.8.0 is more correct. It's your call, and it really depends on your use case.

Again, thank you for giving me the heads-up. I'll add a warning to our README.
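
For anyone who wants to inspect the tokenizer behavior directly, the hypothetical snippet below (not from the thread) encodes a short string and prints the token ids; running it under transformers 2.5.0 and 2.8.0 in separate environments shows whether the encoding changed. Whether the ids actually differ depends on which tokenizer class gets loaded in each version.

# Hypothetical check (not from the thread): print the token ids produced by the
# roberta-large tokenizer so the output can be compared between environments
# running transformers==2.5.0 and transformers==2.8.0.
import transformers
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large")
print(transformers.__version__, tok.encode("cloud computing"))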

@Tiiiger Tiiiger closed this as completed May 8, 2020
@areejokaili
Author

areejokaili commented May 11, 2020

Hi @Tiiiger
Thanks for letting me know. I have updated both libraries and will go with Transformers 2.8.0.
I have one more question and would appreciate clarification on what I'm missing here:

cands = ['I like lemons.']

refs = [['I am proud of you.', 'I love lemons.', 'Go go go.']]

(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=True, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

--- output ---

> 0.9023454785346985 0.9023522734642029 0.9025075435638428
manual F score: 0.9023488759866588

Do you know why the F score returned by the method is different from the one I compute manually?
Thanks again

@felixgwu
Collaborator

Hi @areejokaili,

The reason is that you are using rescale_with_baseline=True.
The raw F score is computed from the raw P and R and then rescaled based on the F baseline score; P and R are each rescaled independently based on their own baseline scores. After rescaling, F is therefore no longer exactly the harmonic mean of the rescaled P and R.
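
As a numerical illustration of this point, the sketch below applies a rescaling of the form (raw - baseline) / (1 - baseline) to each metric separately. The formula and all numbers are assumptions for illustration, not real bert-score baselines, but they show why the rescaled F is not the harmonic mean of the rescaled P and R.

# Illustration only: the rescaling formula and the numbers below are assumptions,
# not real bert-score baselines.
def rescale(raw, baseline):
    return (raw - baseline) / (1 - baseline)

P_raw, R_raw = 0.95, 0.94
F_raw = 2 * P_raw * R_raw / (P_raw + R_raw)   # harmonic mean of the *raw* scores
P_base, R_base, F_base = 0.83, 0.82, 0.825    # made-up per-metric baselines

P_s = rescale(P_raw, P_base)
R_s = rescale(R_raw, R_base)
F_s = rescale(F_raw, F_base)

print("rescaled F:", F_s)
print("harmonic mean of rescaled P and R:", 2 * P_s * R_s / (P_s + R_s))
# The two values are close but not equal, matching the observation above.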

@areejokaili
Author

areejokaili commented May 11, 2020

Thanks @felixgwu
Could you check this, please?

cands = ['I like lemons.', 'cloud computing']
refs = [['I am proud of you.', 'I love lemons.', 'Go go go.'],
        ['calculate this.', 'I love lemons.', 'Go go go.']]
print("number of cands and refs are", len(cands), len(refs))
(P, R, F), hash_code = score(cands, refs, lang="en", rescale_with_baseline=False, return_hash=True)
P, R, F = P.mean().item(), R.mean().item(), F.mean().item()

print(">", P, R, F)
print("manual F score:", (2 * P * R / (P + R)))

output

> 0.9152767062187195 0.9415446519851685 0.9280155897140503
manual F score: 0.9282248763666026

Appreciate the help,
