GPT3 -> Dataset -> NLI Model

This repository contains a dataset collected for NLI classification using GPT-3. Thanks to Greg Brockman at OpenAI for giving me access to the API.

Why was this dataset created? The goals of this project can be found in my blog post on Notion (https://www.notion.so/GPT3-Dataset-Task-Model-b97a267d6f5f44e688ba4f7ec85c00cc).

The dataset (data/dataset.jsonl) contains 30,000 examples in total. All of these examples are 'fake': they were generated by GPT-3. I used them to fine-tune a BERT model with moderate success, as described in my post.
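As a minimal sketch of how the file could be read, the snippet below parses a JSON Lines file into a list of Python objects. It assumes only that data/dataset.jsonl holds one JSON value per line, as the extension suggests; the field names inside each example are not documented here, so the code does not rely on any. The helper name `load_jsonl` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSON Lines file: one JSON value per non-empty line.

    Hypothetical helper for illustration; it makes no assumption
    about which keys each example contains.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

Calling `load_jsonl("data/dataset.jsonl")` from the repository root and printing the first element is a quick way to see which fields each example actually carries before writing any training code.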

I also included the output of each stage of my dataset creation process in data/.

Disclaimer: this dataset has not been filtered in any way. If you notice any text that is offensive, I'll be happy to remove it from the data. This repository is not owned by or associated with OpenAI.

If you use this dataset, please include a link to my blog post and this GitHub repository. Feel free to contact me at kgoel [at] cs [dot] stanford [dot] edu.
