CA4LTR: Cross-Attention in Long-Tail Recognition

We developed a new network architecture that leverages both image and text modalities to enhance feature learning on long-tailed datasets. Our experiments show that combining pre-trained image and text models via cross-modal attention compensates for the individual limitations of each model and significantly improves long-tail recognition accuracy. Further experiments examined how text quality affects performance and identified key factors that influence the effectiveness of the multimodal model. The code will be released after the paper is accepted.
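As an illustration of the idea, the sketch below shows one way to fuse features from a frozen pre-trained image encoder and a frozen pre-trained text encoder with cross-modal attention in PyTorch. The module name, feature dimensions, and attention configuration are illustrative assumptions, not the exact architecture released with the paper.

```python
# Minimal sketch (not the released implementation) of fusing pre-trained
# image and text features with cross-modal attention. Dimensions and names
# are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, embed_dim=256, num_heads=4, num_classes=100):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Image features attend to text features (queries: image, keys/values: text).
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_img, img_dim) region/patch features from a frozen image model
        # txt_feats: (B, N_txt, txt_dim) token features from a frozen text model
        q = self.img_proj(img_feats)
        kv = self.txt_proj(txt_feats)
        attended, _ = self.cross_attn(q, kv, kv)
        fused = self.norm(q + attended)              # residual connection
        logits = self.classifier(fused.mean(dim=1))  # pool over tokens, then classify
        return logits

# Example with random features standing in for frozen encoder outputs.
model = CrossModalFusion()
logits = model(torch.randn(8, 49, 512), torch.randn(8, 32, 512))
print(logits.shape)  # torch.Size([8, 100])
```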

Effect of text content quality on the multimodal model

The table shows that image labels distill image content concisely and contribute the most value in the multimodal fusion process. Although the descriptive text contains redundancies, it still performs notably well. Including nonsensical text slightly degrades the multimodal model's performance.

(Table: accuracy of the multimodal model with image labels, descriptive text, and nonsensical text as the text input)
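For reference, the sketch below shows how the three text conditions compared in the table (concise class labels, BLIP-2 style descriptive captions, and nonsensical filler) could be encoded with a pre-trained text model. The CLIP checkpoint and example strings are assumptions for illustration only.

```python
# Minimal sketch of encoding the three text conditions with a pre-trained
# CLIP text encoder; the checkpoint and example strings are assumptions.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

text_variants = {
    "label":       "a photo of a cat",                       # concise class label
    "descriptive": "a small grey cat sitting on a wooden "
                   "table next to a cup of coffee",          # BLIP-2 style caption
    "nonsense":    "qwerty lorem ipsum zxcv",                 # meaningless filler
}

with torch.no_grad():
    for name, text in text_variants.items():
        inputs = tokenizer(text, padding=True, return_tensors="pt")
        token_feats = text_encoder(**inputs).last_hidden_state  # (1, seq_len, 512)
        print(name, token_feats.shape)
```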

CIFAR-10-LT and CIFAR-100-LT

(Table: accuracy on CIFAR-10-LT and CIFAR-100-LT compared with prior methods)

As the table shows, our method achieves strong results compared with different types of methods. Taking CIFAR-100-LT with imbalance factor (IF) 100 as an example, our method reaches an accuracy of 62.32%, well above the multimodal training approach CLIP2FL (37.56%). Our method also outperforms generative methods, including feature-based LDMLR (51.92%), label-based ProCo (52.80%), and sample-based DiffuLT (50.70%).
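For readers unfamiliar with the benchmark, the sketch below builds a long-tailed CIFAR-100 training split with imbalance factor 100 (the most frequent class has 100 times as many samples as the rarest one) using the common exponential-decay protocol. It is a generic illustration, not the repository's data pipeline.

```python
# Minimal sketch of constructing CIFAR-100-LT with imbalance factor 100.
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR100

def long_tail_indices(targets, num_classes=100, imb_factor=100, max_per_class=500):
    targets = np.asarray(targets)
    indices = []
    for cls in range(num_classes):
        # Exponentially decaying sample count per class.
        n_cls = int(max_per_class * (1.0 / imb_factor) ** (cls / (num_classes - 1)))
        cls_idx = np.where(targets == cls)[0]
        indices.extend(cls_idx[:n_cls].tolist())
    return indices

full_train = CIFAR100(root="./data", train=True, download=True)
lt_train = Subset(full_train, long_tail_indices(full_train.targets))
print(len(lt_train))  # roughly 10.8k images instead of 50k
```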

Tiny-ImageNet-LT

(Figure 2: per-class accuracy on Tiny-ImageNet-LT for the Pure baseline and our method)

The Pure model denotes an image-only ResNet-32 trained from scratch, and the blue bars represent our method. Since the class labels in this dataset are purely numerical, the textual input here consists of descriptive text generated by the BLIP-2 model. As Figure 2 shows, our method improves classification performance on the tail categories while remaining stable on the head categories.
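The sketch below shows one way to generate such descriptive text with BLIP-2 through the Hugging Face transformers interface; the checkpoint name, input file, and generation settings are assumptions and may differ from those used in the paper.

```python
# Minimal sketch of generating descriptive captions with BLIP-2 for classes
# whose labels are purely numerical; checkpoint and settings are assumptions.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda").eval()

image = Image.open("tiny_imagenet_sample.jpg").convert("RGB")  # hypothetical sample
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(generated_ids[0], skip_special_tokens=True).strip()
print(caption)  # e.g. "a small bird perched on a tree branch"
```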
