Training a model without a dataset for natural language inference (NLI)

krandiash/gpt3-nli


GPT3 -> Dataset -> NLI Model

This repository contains a dataset for natural language inference (NLI) classification, generated with GPT-3. Thanks to Greg Brockman at OpenAI for giving me access to the API.

Why was this dataset created? The goals of this project can be found in my blog post on Notion (https://www.notion.so/GPT3-Dataset-Task-Model-b97a267d6f5f44e688ba4f7ec85c00cc).

The dataset (data/dataset.jsonl) contains 30,000 examples in total. All of them are 'fake': they were generated by GPT-3. I used them to fine-tune a BERT model with moderate success, as you'll see if you read my post.
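To take a quick look at the data, something like the following should work. It is a minimal sketch that assumes only what the `.jsonl` extension implies (one JSON object per line); `load_jsonl` is a hypothetical helper, not part of this repository, and the field names of each example are not documented here, so print a record before relying on any key.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, one parsed object per non-empty line."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                examples.append(json.loads(line))
    return examples


if __name__ == "__main__":
    dataset_path = Path("data/dataset.jsonl")  # path as given in this README
    if dataset_path.exists():
        examples = load_jsonl(dataset_path)
        print(f"Loaded {len(examples)} examples")
        # Field names are an unknown here: inspect one record first.
        print(examples[0])
```

Run it from the repository root so the relative path resolves.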

I also included the output of each stage of my dataset creation process in data/.

Disclaimer: this dataset has not been filtered in any way. If you notice any text that is offensive, I'll be happy to remove it from the data. This repository is not owned by or associated with OpenAI.

If you use this dataset, please include a link to my blog post and this GitHub repository. Feel free to contact me at kgoel [at] cs [dot] stanford [dot] edu.
