- University of Oxford
- Oxford
- bpiyush.github.io
- @bagad_piyush
Stars
Code repository for the paper: 'Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks'
PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning"
Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions"
Timo: Towards Better Temporal Reasoning for Language Models (COLM 2024)
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
Official implementation of "ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video"
🍽️ Annotations for the public release of the EPIC-KITCHENS-100 dataset
LAVIS - A One-stop Library for Language-Vision Intelligence
Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
The official Python library for the Google Gemini API
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Multimodal language model benchmark, featuring challenging examples
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
Schedule-Free Optimization in PyTorch
High-Resolution Image Synthesis with Latent Diffusion Models
Code for the paper "Jukebox: A Generative Model for Music"
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Fast Differentiable Sorting and Ranking
Rank-aware Attention Network from 'The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos'