Stars
Regularly updated list of the latest text-related papers from top conferences
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model. IEEE Transactions on Image Processing (2018)
The official repo for [ECCV'22] "VSA: Learning Varied-Size Window Attention in Vision Transformers"
The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
[CVPR 2023] Egocentric Audio-Visual Object Localization
[TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
A curated list of awesome temporal action segmentation resources.
[ECCV 2022] Official PyTorch implementation of the paper "Zero-Shot Temporal Action Detection via Vision-Language Prompting"
Code for our IJCAI 2024 paper "DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition"
Unified Audio-Visual Perception for Multi-Task Video Localization
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Official PyTorch implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
A curated list of audio-visual learning methods and datasets.
This repository contains the PyTorch implementation of the CVPR'2024 paper (Highlight), IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection.