DALLE-tools is a GitHub repository with tools to categorize, annotate, or sanity-check your datasets.
Clone this repository into a folder of your choice and run one of the commands described below.
python annotator.py
Use the keyboard to switch to the next page or to change the annotation category, and click on an image to add it to the current category and save it in annotations.json. Please upload your annotations.json by opening a pull request that adds it to the community_annotations folder, inside the subfolder for the dataset you used (e.g. YFCC100m or LAION400m), so everyone can benefit from better dataset annotations. To continue annotating a dataset that someone else has already started, copy the annotations.json from that dataset's subfolder in community_annotations into the root directory and run the annotator.
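The copy step for resuming someone else's annotations could be scripted roughly as follows. This is only a sketch: the helper name `resume_annotations` is hypothetical, and the layout `community_annotations/<dataset>/annotations.json` is an assumption based on the description above, so adjust the paths if the repository is organized differently.

```python
import shutil
from pathlib import Path


def resume_annotations(repo_root, dataset):
    """Copy a dataset's community annotations into the repository root.

    Assumes the layout community_annotations/<dataset>/annotations.json,
    which is inferred from the README text, not verified against the repo.
    """
    root = Path(repo_root)
    source = root / "community_annotations" / dataset / "annotations.json"
    if not source.is_file():
        raise FileNotFoundError(f"no community annotations found at {source}")

    target = root / "annotations.json"
    if target.exists():
        # Keep a backup instead of silently overwriting local work.
        shutil.copy2(target, target.with_name("annotations.json.bak"))

    shutil.copy2(source, target)
    return target
```

For example, `resume_annotations(".", "YFCC100m")` run from the repository root would place the shared file next to annotator.py, after which `python annotator.py` can be started as usual.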
python aligner.py
This tool realigns shuffled keys so that the WebDataset module can read your datasets correctly. You only need to specify the keys you want to look for and keep in the new dataset.
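To illustrate what such an alignment involves, here is a minimal sketch that uses only the Python standard library. It is not the code in aligner.py. It assumes that the "keys" to keep are file extensions (such as jpg and txt) and that a sample's key is the part of the file name before the first dot of the base name, which is the usual WebDataset convention. The names `split_key` and `align_tar` are hypothetical.

```python
import io
import tarfile
from collections import defaultdict


def split_key(name):
    """Split a member name into (key, extension) at the first dot of the base name."""
    directory, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def align_tar(src_path, dst_path, wanted_exts):
    """Write a new tar in which files that share a key are stored next to each other.

    Only samples that contain every extension in wanted_exts are kept.
    The whole archive is held in memory, so this suits small shards only.
    Returns the number of samples written.
    """
    groups = defaultdict(dict)
    with tarfile.open(src_path) as src:
        for member in src:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if ext in wanted_exts:
                groups[key][ext] = src.extractfile(member).read()

    kept = 0
    with tarfile.open(dst_path, "w") as dst:
        for key in sorted(groups):
            sample = groups[key]
            if set(sample) != set(wanted_exts):
                continue  # skip samples with missing files
            for ext in wanted_exts:
                data = sample[ext]
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                dst.addfile(info, io.BytesIO(data))
            kept += 1
    return kept
```

Calling `align_tar("shuffled.tar", "aligned.tar", ["jpg", "txt"])` would then produce an archive in which each image is immediately followed by its caption, which is the grouping WebDataset relies on when it assembles samples.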
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.