TXH-mercury
🌴 On vacation

A progressive, highly extensible and developer-friendly framework for building deep learning projects based on PyTorch.

Python 9 Updated Apr 9, 2024

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Jupyter Notebook 215 12 Updated Mar 14, 2024

Official repository of the paper "Are current long-term video understanding datasets long-term?", published in CVEU 2023.

HTML 1 Updated May 24, 2024

Official implementation for the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"

Python 25 7 Updated Dec 8, 2023

An optimized re-implementation for 2D-TAN: Learning 2D Temporal Localization Networks for Moment Localization with Natural Language (AAAI'2020).

Python 123 15 Updated Apr 1, 2023

Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"

Python 13 1 Updated Aug 9, 2023

ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

Python 42 1 Updated Sep 4, 2023

Personal résumé in Chinese and English, generated by compiling LaTeX

TeX 196 66 Updated Jan 11, 2024
Python 153 12 Updated Feb 24, 2024

Tracking and collecting papers/projects/others related to Segment Anything.

1,438 126 Updated Mar 28, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,337 695 Updated Jul 1, 2024

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

262 16 Updated Jun 24, 2024

Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Python 37 1 Updated Aug 1, 2023

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Python 246 14 Updated May 28, 2024
Python 32 2 Updated Jun 6, 2023

Moved to https://github.com/nodejs/node

34,481 7,319 Updated Mar 28, 2023

Oscar and VinVL

Python 1,034 247 Updated Aug 28, 2023
Python 864 107 Updated May 24, 2024

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Python 6,379 1,183 Updated Jun 6, 2024
Python 119 18 Updated Nov 10, 2023

[ECCV 2020] PyTorch Implementation of some RGBD Semantic Segmentation models.

Python 285 42 Updated Aug 17, 2020

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Python 264 24 Updated Oct 18, 2019

Dual Attention Network for Scene Segmentation (CVPR2019)

Python 2 Updated Sep 3, 2020