Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Add NLI models #3865

Merged
7 commits merged into from
Feb 27, 2020
Merged

Add NLI models #3865

7 commits merged into from
Feb 27, 2020

Conversation

ZhaofengWu
Copy link
Contributor

Copy link
Contributor

@matt-gardner matt-gardner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, though I would probably make the difference in fields explicit with an option.

fields["premise"] = TextField(premise_tokens, self._token_indexers)
fields["hypothesis"] = TextField(hypothesis_tokens, self._token_indexers)

if isinstance(self._tokenizer, PretrainedTransformerTokenizer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight preference for having this be a configuration option instead of relying on this type check.

@@ -29,14 +29,29 @@ class SnliReader(DatasetReader):
We use this `Tokenizer` for both the premise and the hypothesis. See :class:`Tokenizer`.
token_indexers : `Dict[str, TokenIndexer]`, optional (default=`{"tokens": SingleIdTokenIndexer()}`)
We similarly use this for both the premise and the hypothesis. See :class:`TokenIndexer`.
tokenize_separately : `bool`, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having the default be a best guess like this, though the name is a bit confusing to me. Maybe combine_fields or use_single_field, or something, with the default being False?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is this isn't necessarily about tokenization, it's more about how the data is represented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the default being None? Because if it's False we can no longer take a guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry, I guess I just meant negating the logic in the guess.

@@ -29,14 +29,27 @@ class SnliReader(DatasetReader):
We use this `Tokenizer` for both the premise and the hypothesis. See :class:`Tokenizer`.
token_indexers : `Dict[str, TokenIndexer]`, optional (default=`{"tokens": SingleIdTokenIndexer()}`)
We similarly use this for both the premise and the hypothesis. See :class:`TokenIndexer`.
combine_input_fields : `bool`, optional
(default=`not isinstance(tokenizer, PretrainedTransformerTokenizer)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is now wrong.

Suggested change
(default=`not isinstance(tokenizer, PretrainedTransformerTokenizer)`)
(default=`isinstance(tokenizer, PretrainedTransformerTokenizer)`)

After this, looks great!

@ghost ghost merged commit 8b9ce9c into allenai:master Feb 27, 2020
@ZhaofengWu ZhaofengWu deleted the classification_models branch February 27, 2020 23:38
},
"dropout": 0.1
},
"iterator": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice this earlier - this needs to be updated to match #3700. We can probably also just remove the sorting_keys key here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the catch. Will update this (and a new coref config as well).

"embedding_dim": transformer_dim,
"cls_is_last_token": cls_is_last_token
},
"feedforward": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just checking Anthony's results versus the numbers you're reporting. You're about 1.5 points below Anthony's MNLI result, and one point below the published number. I looked at Anthony's code, and it looks like the only difference is that he doesn't have this feedforward layer in there. What happens if you remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I tried w/ vs. w/o this on SST -- didn't make a difference.

"cut_frac": 0.06
},
"optimizer": {
"type": "huggingface_adamw",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another difference with Anthony's code, though I'm not sure how much difference it makes: https://github.com/anthonywchen/generative-nli/blob/b81d4e3e7f635fcc1e5ca09305707e5df983ec88/configs/roberta_large_config.json#L37-L43

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weight decay is different but I took the value from the RoBERTa paper (https://arxiv.org/pdf/1907.11692.pdf, last page last table last column). I doubt 1e-5 vs. 2e-5 LR is making 1.5 difference in F1? Anthony also didn't regularize bias which is probably the right thing to do.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in experiments I tried, the differences in different learning rates are basically negligible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other difference is in the beta parameters in the adamw optimizer. I set mine to be [0.9, 0.98], which is recommended on their examples.

https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants