Commit

Update doc.md
Sohl-Dickstein authored Jan 29, 2021
1 parent 535c925 commit c5289ec
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/doc.md
@@ -51,7 +51,7 @@ For tasks submitted to the benchmark, the following criteria will be used to det
4. **Specificity:** Tasks should aim to cleanly capture some specific capability of language models. Strong task submissions should clearly justify their task construction choices in the `README.md` or `README.pdf`.
5. **Thoroughness:** While tasks are encouraged to target a single capability or flaw of large language models, they should attempt to quantify their target in a robust way. Tasks should attempt to control for, or average over, potential confounds and nuisance variables.
6. **Novelty:** It is OK to adapt your own past work to be part of BIG-bench. Task submissions cannot be replicas of tasks from existing language modeling benchmarks (such as [GLUE](https://gluebenchmark.com/) or [SuperGLUE](https://super.gluebenchmark.com/)) created by others. Previously proposed tasks from others that are not part of existing benchmarks will be considered on a case-by-case basis. Dual submission is also permitted -- tasks that are submitted to BIG-bench may be submitted to other venues.
-7. **Justification:** The `README.md` or `README.pdf` file should clearly explain what the task is attempting to measure, and give background motivating why this is an important capability of large language models to quantify. For more details on what should go in the `README`, see the template [`README.md`](../bigbench/benchmark_tasks/simple_arithmetic/README.md) file for the `simple_arithmetic` task.
+7. **Justification:** The `README.md` or `README.pdf` file should clearly explain what the task is attempting to measure, and give background motivating why this is an important capability of large language models to quantify. For more details on what should go in the `README`, see the [template `README.md`](../bigbench/benchmark_tasks/simple_arithmetic/README.md) file for the `simple_arithmetic` task.
8. **Compute resources:** Reviewers will consider whether the computational resources used by a task are large enough to make running the task practically challenging. Tasks may be rejected for this reason.

Exceptions to these criteria may be made on a case-by-case basis for unusual tasks.
