Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Run checklist suites in AllenNLP #5065

Merged
merged 38 commits into from
Apr 24, 2021
Merged

Run checklist suites in AllenNLP #5065

merged 38 commits into from
Apr 24, 2021

Conversation

AkshitaB
Copy link
Contributor

@AkshitaB AkshitaB commented Mar 22, 2021

  • TaskSuite base class and tests
  • Implement classes for 3 tasks
  • Command line functionality

Suites for 3 tasks (pkl files and scripts):

  • Sentiment Analysis
  • Squad QA
  • Textual Entailment

Run suites for models

  • Sentiment Analysis
  • Squad QA
  • Textual Entailment

Note:

At first, it may seem like a lot of this functionality can simply be a part of the Predictor class. But predictors can be more generic. Eg. text-classification-predictor can be used for sentiment analysis as well as textual-entailment. The tests in the test suites are supposed to be more task-oriented, however, and cannot be used interchangeably.

@AkshitaB AkshitaB marked this pull request as draft March 22, 2021 14:13
@AkshitaB AkshitaB marked this pull request as ready for review April 2, 2021 15:49
@AkshitaB AkshitaB requested review from epwalsh and dirkgr and removed request for epwalsh April 2, 2021 15:49
@AkshitaB
Copy link
Contributor Author

AkshitaB commented Apr 2, 2021

@dirkgr @epwalsh This is ready for a look. I'll be adding more default tests for the QA suite, as well as unit tests (they require models from allennlp-models, so perhaps that's where the suite pkl files should be tested?).

Copy link
Member

@dirkgr dirkgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked at the tests yet, but here is a first review. It seems like there is a lot of copy and paste code that's not following the guidelines (as opposed to enforced rules) of AllenNLP code quality, like missing type annotations etc.?

allennlp/commands/checklist.py Outdated Show resolved Hide resolved
Comment on lines +157 to +158
if capabilities:
self._print_summary_args["capabilities"] = capabilities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want this to print when capabilities == []?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_print_summary_args are just passed to the summary function. When capabilities == [], the summary for tests of all capabilities are printed.

allennlp/sanity_checks/task_checklists/utils.py Outdated Show resolved Hide resolved
allennlp/sanity_checks/task_checklists/utils.py Outdated Show resolved Hide resolved
allennlp/sanity_checks/task_checklists/utils.py Outdated Show resolved Hide resolved
Comment on lines +46 to +51
ret = []
if s != data:
ret.append(s)
if s + "." != data:
ret.append(s + ".")
return ret
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return [s] or return [s + "."]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what it's doing:

data = "This was great!"
Returns ["This was great", "This was great."]

data = "The movie was good"
Returns ["The movie was good."]

@AkshitaB AkshitaB requested a review from dirkgr April 12, 2021 08:57
Copy link
Member

@epwalsh epwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble getting this to work with a QA model.

I tried running

 allennlp checklist \
    https://storage.googleapis.com/allennlp-public-models/transformer-qa-2020-10-03.tar.gz \
    question-answering

But got no output (no errors, either). I also tried the above with the --checklist-suite parameter pointed to a download and extracted version of https://github.com/marcotcr/checklist/blob/master/release_suites/squad_suite.tar.gz, but I got errors from dill:

image

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
allennlp/commands/checklist.py Show resolved Hide resolved
allennlp/commands/checklist.py Outdated Show resolved Hide resolved

subparser.add_argument("task", type=str, help="the name of the task suite")

subparser.add_argument("--checklist-suite", type=str, help="the checklist suite path")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will add our default suites to cloud. And people can also write their own suites.

allennlp/sanity_checks/task_checklists/task_suite.py Outdated Show resolved Hide resolved
@AkshitaB
Copy link
Contributor Author

@epwalsh That was because the tests were not being added by default. I've now switched the flag, so you should be able to run the command now. (It will only run 2 tests right now; I didn't want this PR to have more lines of code).

@AkshitaB AkshitaB requested a review from epwalsh April 19, 2021 06:00
Copy link
Member

@epwalsh epwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few "cosmetic" comments. I'm working my way through some examples next so I'll followup in a bit with another review

allennlp/common/testing/checklist_test.py Show resolved Hide resolved
allennlp/sanity_checks/task_checklists/task_suite.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
Copy link
Member

@epwalsh epwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool!

The only other comment I have is about the format of the summary. It's just a little difficult to tell apart "capabilities" from specific test names. I guess if you know what you're looking at, it may be obvious. I just had a little trouble the first time I looked at it.

If it's difficult to override the default summary, I wouldn't worry about.

@AkshitaB
Copy link
Contributor Author

AkshitaB commented Apr 24, 2021

@epwalsh Thanks for reviewing a mountain of code!

I've added custom formatting functions now. Here's a preview of what that looks like (task-specific):

Screen Shot 2021-04-23 at 7 36 11 PM

@AkshitaB AkshitaB requested a review from epwalsh April 24, 2021 02:39
Copy link
Member

@epwalsh epwalsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that looks really great! Much easier to read!

I'm just curious, is there a built-in way to get a structured version of the output? Like in JSON format, for example? That would be a cool feature to have, especially for people doing continuous deployment of models, because they could programmatically run the checklists suites and a get a pass/fail result.

@AkshitaB AkshitaB merged commit 10400e0 into main Apr 24, 2021
@AkshitaB AkshitaB deleted the checklist branch April 24, 2021 23:37
dirkgr pushed a commit that referenced this pull request May 10, 2021
* run checklist suites from command line

* specify output file

* separate task from checklist suite

* qa task

* adding describe, misc updates

* fix docs, TE suite

* update changelog

* bug fix

* adding default tests

* qa defaults

* typing, docs, minor updates

* more updates

* set add_default_tests to True

* remove commented lines

* capitalizing help strings

* does this work

* adding start_method to test

* skipping test

* oops, actually fix

* temp fix to check memory issues

* Skip more memory hungry tests

* fix

* fixing professions

* Update setup.py

Co-authored-by: Pete <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Pete <[email protected]>

* Update allennlp/sanity_checks/task_checklists/task_suite.py

Co-authored-by: Pete <[email protected]>

* formatting functions

Co-authored-by: Evan Pete Walsh <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants