Authors: Karsten Roth, Jae Myung Kim, Almut Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata
Table of Contents
This repository contains code to replicate key experiments from our paper Waffling around for Performance: Visual Classification with Random Words and Broad Concepts. It should also provide a good starting point for subsequent research studying improved (zero-shot) transfer performance of pretrained Vision Language Models (VLMs), and it extends the great repository associated with the original Visual Classification via Description from Large Language Models paper.
If you find this repository useful or use it as part of your research, please consider citing it.
**Set up environment:** To get started, set up the correct environment from `environment.yaml` by running

```
conda env create -f environment.yaml
```

and activate the environment via `conda activate waffle`.
**Ensure `clip` is up-to-date:** The above command should install all relevant libraries. If you are not able to utilize the `ViT-L/14` backbone, it is likely because your version of `clip` is not up-to-date. In this case, consider re-installing it:

```
pip install git+https://github.com/openai/CLIP.git
```
Downloading datasets