Add NLI models #3865

ZhaofengWu · 2020-02-27T21:52:49Z

SNLI: dev 92.47, test 91.65, URL https://storage.googleapis.com/allennlp-public-models/snli-roberta-large-2020.02.27.tar.gz

MNLI: dev-matched 89.10, dev-mismatched 88.71, URL https://storage.googleapis.com/allennlp-public-models/mnli-roberta-large-2020.02.27.tar.gz

matt-gardner

Looks ok to me, though I would probably make the difference in fields explicit with an option.

matt-gardner · 2020-02-27T21:56:02Z

allennlp/data/dataset_readers/snli.py

-        fields["premise"] = TextField(premise_tokens, self._token_indexers)
-        fields["hypothesis"] = TextField(hypothesis_tokens, self._token_indexers)
+
+        if isinstance(self._tokenizer, PretrainedTransformerTokenizer):


Slight preference for having this be a configuration option instead of relying on this type check.

matt-gardner · 2020-02-27T22:23:30Z

allennlp/data/dataset_readers/snli.py

@@ -29,14 +29,29 @@ class SnliReader(DatasetReader):
        We use this `Tokenizer` for both the premise and the hypothesis.  See :class:`Tokenizer`.
    token_indexers : `Dict[str, TokenIndexer]`, optional (default=`{"tokens": SingleIdTokenIndexer()}`)
        We similarly use this for both the premise and the hypothesis.  See :class:`TokenIndexer`.
+    tokenize_separately : `bool`, optional


I like having the default be a best guess like this, though the name is a bit confusing to me. Maybe `combine_fields` or `use_single_field`, or something similar, with the default being `False`?

The issue is that this isn't necessarily about tokenization; it's more about how the data is represented.

Do you mean the default being `None`? If it's `False`, we can no longer take a guess.

Right, sorry, I guess I just meant negating the logic in the guess.
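To make the "how the data is represented" point concrete, here is a minimal sketch of the two layouts such an option switches between. It works on plain token lists; the helper `build_fields` and the combined field name `tokens` are hypothetical (only `premise` and `hypothesis` appear in the diff above). The `<s> A </s></s> B </s>` layout is RoBERTa's sentence-pair format, hard-coded here for illustration; in practice the pretrained tokenizer inserts its own special tokens.

```python
from typing import Dict, List

# RoBERTa-style special tokens, hard-coded only for this sketch; a real
# pretrained tokenizer supplies its own.
BOS, EOS = "<s>", "</s>"


def build_fields(
    premise: List[str], hypothesis: List[str], combine_input_fields: bool
) -> Dict[str, List[str]]:
    """Return token lists keyed by field name (hypothetical helper)."""
    if combine_input_fields:
        # One joint sequence, <s> premise </s></s> hypothesis </s>, so a
        # pretrained transformer can attend across both sentences at once.
        return {"tokens": [BOS] + premise + [EOS, EOS] + hypothesis + [EOS]}
    # Two independent sequences for models that encode each sentence separately.
    return {"premise": premise, "hypothesis": hypothesis}


print(build_fields(["A", "dog", "runs"], ["An", "animal", "moves"], True))
# {'tokens': ['<s>', 'A', 'dog', 'runs', '</s>', '</s>', 'An', 'animal', 'moves', '</s>']}
```

The tokenizer is the same in both branches; only the shape of the instance changes, which is why a name about fields reads better than one about tokenization.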

matt-gardner · 2020-02-27T22:42:35Z

allennlp/data/dataset_readers/snli.py

@@ -29,14 +29,27 @@ class SnliReader(DatasetReader):
        We use this `Tokenizer` for both the premise and the hypothesis.  See :class:`Tokenizer`.
    token_indexers : `Dict[str, TokenIndexer]`, optional (default=`{"tokens": SingleIdTokenIndexer()}`)
        We similarly use this for both the premise and the hypothesis.  See :class:`TokenIndexer`.
+    combine_input_fields : `bool`, optional
+            (default=`not isinstance(tokenizer, PretrainedTransformerTokenizer)`)


This line is now wrong.

Suggested change

(default=`not isinstance(tokenizer, PretrainedTransformerTokenizer)`)

(default=`isinstance(tokenizer, PretrainedTransformerTokenizer)`)

After this, looks great!
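The corrected docstring describes a `None` default that is resolved by a type-based guess. Below is a minimal sketch of that resolution, assuming the reader accepts `combine_input_fields: Optional[bool] = None`; the stub classes and the function `resolve_combine_input_fields` are hypothetical stand-ins so the snippet runs without AllenNLP.

```python
from typing import Optional


class Tokenizer:
    """Hypothetical stand-in for a tokenizer base class."""


class PretrainedTransformerTokenizer(Tokenizer):
    """Hypothetical stand-in named after the AllenNLP class in this PR."""


class SimpleTokenizer(Tokenizer):
    """Hypothetical stand-in for any non-transformer tokenizer."""


def resolve_combine_input_fields(
    tokenizer: Tokenizer, combine_input_fields: Optional[bool] = None
) -> bool:
    """Resolve a None default into a best guess (hypothetical helper)."""
    if combine_input_fields is not None:
        # An explicit True/False from the config always wins.
        return combine_input_fields
    # No explicit setting: combine the fields only for transformer tokenizers.
    return isinstance(tokenizer, PretrainedTransformerTokenizer)
```

Using `None` rather than `False` as the sentinel is what lets the reader tell "not set" apart from "explicitly off", which is the point raised in the earlier thread about the default.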

matt-gardner · 2020-02-27T23:48:03Z

training_config/mnli_roberta.jsonnet

+    },
+    "dropout": 0.1
+  },
+  "iterator": {


Sorry I didn't notice this earlier, but this needs to be updated to match #3700. We can probably also just remove the `sorting_keys` key here.

Ah, thanks for the catch. Will update this (and a new coref config as well).
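Dropping `sorting_keys` presumes the batching code can order instances by length without being told which field to measure. The sketch below is a generic illustration of that idea only, not AllenNLP's sampler and not the API introduced in #3700: instances are hypothetical dicts of field name to tokens, and the sort key is the total token count across all fields.

```python
from typing import Dict, List

# Hypothetical instance representation: field name -> list of tokens.
Instance = Dict[str, List[str]]


def bucket_batches(instances: List[Instance], batch_size: int) -> List[List[Instance]]:
    """Group instances of similar length without an explicit sorting key."""
    # Sort by total token count over every field, so no per-field key is needed.
    ordered = sorted(instances, key=lambda inst: sum(len(toks) for toks in inst.values()))
    # Slice the sorted list into consecutive batches to keep padding small.
    return [ordered[i : i + batch_size] for i in range(0, len(ordered), batch_size)]
```

A real bucketing sampler would typically also shuffle the resulting batches between epochs so the model does not always see short examples first.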

matt-gardner · 2020-02-27T23:49:23Z

training_config/mnli_roberta.jsonnet

+       "embedding_dim": transformer_dim,
+       "cls_is_last_token": cls_is_last_token
+    },
+    "feedforward": {


I was just checking Anthony's results versus the numbers you're reporting. You're about 1.5 points below Anthony's MNLI result, and one point below the published number. I looked at Anthony's code, and it looks like the only difference is that he doesn't have this feedforward layer in there. What happens if you remove this?

RoBERTa actually has this feedforward layer (it reuses the same code as BERT): https://github.com/huggingface/transformers/blob/908fa43b543cf52a3238129624f502240725a6a6/src/transformers/modeling_bert.py#L426-L438

And I tried with vs. without this layer on SST; it didn't make a difference.
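For readers weighing the two variants: BERT's pooler applies a dense layer with a tanh activation to the sentence-level token, and the classifier then projects to one logit per label. Here is a minimal pure-Python sketch of that head shape, including the `cls_is_last_token` switch from the config (first token for BERT/RoBERTa, last token for models such as XLNet). All function names are hypothetical, dropout is omitted, and weights are plain nested lists rather than trained parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    # y_i = sum_j W[i][j] * x[j] + b[i]
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify(
    token_vectors: List[Vector],
    cls_is_last_token: bool,
    ff_weights: Matrix,
    ff_bias: Vector,
    out_weights: Matrix,
    out_bias: Vector,
) -> Vector:
    # Pick the sentence-level token: first for BERT/RoBERTa, last for e.g. XLNet.
    cls = token_vectors[-1] if cls_is_last_token else token_vectors[0]
    # The extra feedforward layer under discussion: dense + tanh.
    hidden = [math.tanh(h) for h in linear(ff_weights, ff_bias, cls)]
    # Final projection to one logit per NLI label.
    return linear(out_weights, out_bias, hidden)
```

Removing the feedforward layer amounts to feeding `cls` straight into the final `linear` call.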

matt-gardner · 2020-02-27T23:49:54Z

training_config/mnli_roberta.jsonnet

+      "cut_frac": 0.06
+    },
+    "optimizer": {
+      "type": "huggingface_adamw",


This is another difference with Anthony's code, though I'm not sure how much difference it makes: https://github.com/anthonywchen/generative-nli/blob/b81d4e3e7f635fcc1e5ca09305707e5df983ec88/configs/roberta_large_config.json#L37-L43

Weight decay is different, but I took the value from the RoBERTa paper (https://arxiv.org/pdf/1907.11692.pdf, last column of the last table on the last page). I doubt 1e-5 vs. 2e-5 LR accounts for a 1.5-point difference in accuracy. Anthony also didn't apply weight decay to the bias terms, which is probably the right thing to do.

anthonywchen · 2020-04-01

Yeah, in the experiments I tried, the differences between learning rates were basically negligible.

One other difference is the beta parameters of the AdamW optimizer. I set mine to [0.9, 0.98], which is what the fairseq GLUE examples recommend:

https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md
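The two optimizer points in this thread (no weight decay on biases, betas of (0.9, 0.98)) can be sketched as parameter groups. This is a minimal illustration, not the config used in the PR: `adamw_param_groups` is a hypothetical helper, it groups parameter names rather than tensors so it runs without PyTorch, and the weight-decay value is left as an argument because the thread only cites where it comes from.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Mirrors the shape of a PyTorch optimizer parameter group, but holds
# parameter names instead of tensors so the sketch runs standalone.
ParamGroup = Dict[str, Union[List[str], float]]

# Adam betas quoted in the thread from fairseq's GLUE fine-tuning examples.
BETAS: Tuple[float, float] = (0.9, 0.98)


def adamw_param_groups(
    param_names: List[str],
    weight_decay: float,
    no_decay_patterns: Sequence[str] = ("bias",),
) -> List[ParamGroup]:
    """Apply weight decay to every parameter except those matching a pattern."""

    def exempt(name: str) -> bool:
        # Substring match, e.g. "classifier.bias" contains "bias".
        return any(pattern in name for pattern in no_decay_patterns)

    return [
        {"params": [n for n in param_names if not exempt(n)], "weight_decay": weight_decay},
        {"params": [n for n in param_names if exempt(n)], "weight_decay": 0.0},
    ]
```

With PyTorch, the same two groups holding tensors instead of names can be passed to `torch.optim.AdamW(groups, lr=1e-5, betas=BETAS)`; a per-group `weight_decay` entry overrides the optimizer-wide default.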

matt-gardner approved these changes Feb 27, 2020

ZhaofengWu added 3 commits February 27, 2020 14:05

Support PretrainedTransformerTokenizer in SNLI reader

7de8a82

Add SNLI and MNLI configs

9a0d085

Make tokenize_separately an explicit argument

e75f48e

ZhaofengWu force-pushed the classification_models branch from 9af93b7 to e75f48e February 27, 2020 22:05

ZhaofengWu added the merge when ready label Feb 27, 2020

mypy

fe59957

matt-gardner reviewed Feb 27, 2020

ZhaofengWu removed the merge when ready label Feb 27, 2020

tokenize_separately -> combine_input_fields

37425df

matt-gardner reviewed Feb 27, 2020

ZhaofengWu added 2 commits February 27, 2020 15:20

fix doc

f77f6a6

add test

fb1ed0e

ZhaofengWu added the merge when ready label Feb 27, 2020

ghost merged commit 8b9ce9c into allenai:master Feb 27, 2020

ZhaofengWu deleted the classification_models branch February 27, 2020 23:38

matt-gardner reviewed Feb 27, 2020

This pull request was closed.

anthonywchen commented Apr 1, 2020
