Code repository for the paper: 'Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks'

Python 143 14 Updated Aug 25, 2023
Jupyter Notebook 45 1 Updated Nov 12, 2022

PyTorch code for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning"

Python 224 9 Updated Jan 20, 2023

Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions"

Jupyter Notebook 59 6 Updated Mar 3, 2022

Timo: Towards Better Temporal Reasoning for Language Models (COLM 2024)

Python 8 1 Updated Jul 3, 2024

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Python 1,481 205 Updated Apr 9, 2024

Official implementation of "ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video"

Python 15 Updated Jul 7, 2024

🍽️ Annotations for the public release of the EPIC-KITCHENS-100 dataset

Python 123 28 Updated Aug 1, 2022

Supercharged BLIP-2 that can handle videos

Python 107 6 Updated Dec 1, 2023

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 9,257 917 Updated Jul 17, 2024

Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Python 56 3 Updated Jul 8, 2024

The official Python library for the Google Gemini API

Python 1,221 233 Updated Jul 20, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,045 39 Updated Jul 14, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 546 33 Updated Jul 19, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 629 48 Updated Mar 25, 2024

Multimodal language model benchmark, featuring challenging examples

Python 139 6 Updated May 14, 2024

[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

Python 64 3 Updated Jun 11, 2024

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 247 21 Updated Jun 6, 2024

Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".

Python 76 5 Updated May 16, 2024

Schedule-Free Optimization in PyTorch

Python 1,708 55 Updated Jul 12, 2024

Learning to Count without Annotations

Python 18 Updated May 24, 2024

Composed Video Retrieval

Python 34 Updated May 2, 2024

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 11,159 1,457 Updated Feb 29, 2024

The official implementation of RAR

Python 56 Updated Mar 27, 2024

Code for the paper "Jukebox: A Generative Model for Music"

Python 7,704 1,393 Updated Jun 19, 2024

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Python 43 4 Updated Apr 17, 2024

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Python 100 4 Updated Jul 6, 2024

Fast Differentiable Sorting and Ranking

Python 562 45 Updated Feb 15, 2024

Rank-aware Attention Network from 'The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos'

Python 26 5 Updated Apr 16, 2021

DDSP: Differentiable Digital Signal Processing

Python 2,834 332 Updated Jun 17, 2024