Large Scale Benchmarks for Multimodal Representation Learning
Correspondence to:
- Paul Pu Liang ([email protected])
- Yiwei Lyu ([email protected])
Datasets currently included, organized by research area:
- Affective computing: CMU-MOSI, CMU-MOSEI, POM, UR-FUNNY, Deception, MUStARD
- Healthcare: MIMIC
- Multimedia: AV-MNIST, MMIMDB, Kinetics (size issue?)
- Finance: Stocks-food, Stocks-tech, Stocks-healthcare
TODO: add HCI and Robotics datasets.
To add a new dataset:
- see datasets/
- add a new folder for the dataset if appropriate
- write a dataloader Python file following the existing examples (a minimal sketch follows this list)
- see examples/ and write an example training Python file following the existing examples
- check that calling the dataloader and running a simple training script both work end to end
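For reference, here is a minimal sketch of what a new dataloader file could look like. It assumes the dataset ships as pre-extracted feature tensors saved with torch.save, and the file path, class name, and `get_dataloader` signature are illustrative placeholders rather than the repo's exact API; the (train, valid, test) return convention is also assumed here, so follow the existing dataloaders for the real conventions.

```python
# datasets/mynewdata/get_data.py -- hypothetical example, not part of the repo
import torch
from torch.utils.data import Dataset, DataLoader


class MyMultimodalDataset(Dataset):
    """Toy two-modality dataset built from pre-extracted feature tensors."""

    def __init__(self, vision_feats, audio_feats, labels):
        assert len(vision_feats) == len(audio_feats) == len(labels)
        self.vision = vision_feats
        self.audio = audio_feats
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one tensor per modality plus the label; a list-of-modalities
        # convention is assumed here for illustration.
        return [self.vision[idx], self.audio[idx]], self.labels[idx]


def get_dataloader(path, batch_size=32, num_workers=2):
    """Load pre-split tensors from `path` and wrap each split in a DataLoader."""
    splits = torch.load(path)  # assumed: dict with 'train'/'valid'/'test' tuples
    loaders = {}
    for split in ("train", "valid", "test"):
        ds = MyMultimodalDataset(*splits[split])
        loaders[split] = DataLoader(
            ds,
            batch_size=batch_size,
            shuffle=(split == "train"),
            num_workers=num_workers,
        )
    return loaders["train"], loaders["valid"], loaders["test"]
```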
Algorithms currently included, organized by module:
- unimodals/: LSTM, Transformer, fully connected networks (FCN), random forest
- fusions/: early/late concatenation, attention, tensor fusion
- objective_functions/: VAE, contrastive learning, mutual information maximization, CCA
- training_structures/: balancing generalization, architecture search
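To illustrate how these pieces typically compose in an example training script, here is a minimal sketch in plain PyTorch: two unimodal encoders, a late-concatenation fusion, and one supervised training step. All shapes, names, and the `LateConcatFusion` class are hypothetical; the real examples/ scripts wire up the repo's own unimodals/, fusions/, and training_structures/ modules instead.

```python
import torch
import torch.nn as nn


class LateConcatFusion(nn.Module):
    """Encode each modality separately, then fuse by concatenating embeddings."""

    def __init__(self, encoders, fused_dim, num_classes):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)
        self.head = nn.Linear(fused_dim, num_classes)

    def forward(self, modalities):
        # One encoder per modality; late fusion = concatenate the embeddings.
        feats = [enc(x) for enc, x in zip(self.encoders, modalities)]
        return self.head(torch.cat(feats, dim=-1))


# Two toy unimodal encoders (vision: 35-dim features, audio: 74-dim features).
model = LateConcatFusion(
    encoders=[nn.Linear(35, 64), nn.Linear(74, 64)],
    fused_dim=128,
    num_classes=2,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One supervised training step on a random batch, just to show the loop shape.
vision = torch.randn(8, 35)
audio = torch.randn(8, 74)
labels = torch.randint(0, 2, (8,))
loss = criterion(model([vision, audio]), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```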
To add a new algorithm:
- Figure out which subfolder to add it to:
  - unimodals/: unimodal architectures
  - fusions/: multimodal fusion architectures (a minimal sketch of a new fusion module follows this list)
  - objective_functions/: objective functions used in addition to the supervised training loss (e.g., VAE loss, contrastive loss)
  - training_structures/: training algorithms other than objective functions (e.g., balancing generalization, the outer RL loop for architecture search)
- see examples/ and write an example training Python file following the existing examples
- check that calling the added functions and running a simple training script both work end to end
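As an illustration of the kind of module that would go under fusions/, below is a minimal sketch of an attention-style fusion. It is hypothetical, not taken from the repo: the module name is made up, and the convention of taking a list of per-modality embeddings and returning a single fused vector is assumed here for illustration, so match the existing fusion modules when adding a real one.

```python
# fusions/my_attention_fusion.py -- hypothetical example of a new fusion module
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse per-modality embeddings with a learned softmax attention weighting."""

    def __init__(self, embed_dim):
        super().__init__()
        # One scalar score per modality embedding, computed from the embedding itself.
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, modalities):
        # modalities: list of [batch, embed_dim] tensors, one per modality.
        stacked = torch.stack(modalities, dim=1)              # [batch, M, embed_dim]
        weights = torch.softmax(self.score(stacked), dim=1)   # [batch, M, 1]
        return (weights * stacked).sum(dim=1)                 # [batch, embed_dim]


# Quick shape check, analogous to what a simple example training script would exercise.
fusion = AttentionFusion(embed_dim=64)
fused = fusion([torch.randn(8, 64), torch.randn(8, 64)])
assert fused.shape == (8, 64)
```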