Update README to reflect version 1.0 finalization

norabelrose · Mar 11, 2022 · 8a172b0
1 parent 8c54c64
commit 8a172b0
Showing 1 changed file with 0 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,5 @@
 # Deduplicating Training Data Makes Language Models Better
 
-WARNING: This is a development branch. I am rewriting the code to be cleaner. Continue at your own risk.
-
 This repository contains code to deduplicate language model datasets as described in the paper ["Deduplicating Training Data Makes Language Models Better"](https://arxiv.org/abs/2107.06499) by Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch and Nicholas Carlini.
 We release the ExactSubstr deduplication implementation (written in Rust) along with the scripts we used in the paper to perform ExactSubstr deduplication and inspect the results (written in Python).
 We also release the document clusters resulting from running NearDup deduplication on C4, RealNews, LM1B, and Wiki-4B-en.