- BiliBili University
- Shenzhen, China
- https://yulong.buzz
Stars
Generate high-definition short videos with one click using large AI models (LLMs).
All the resources you need to get to Senior Engineer and beyond
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
llama3 implementation one matrix multiplication at a time
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Research Code for Multimodal-Cognition Team in Ant Group
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
Code and pre-trained models for our paper "CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection".
Official implementation of "Harnessing Large Language Models for Training-free Video Anomaly Detection", CVPR 2024
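A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.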
[AAAI2023] TopicFM: Robust, Efficient, and Interpretable Topic-Assisted Feature Matching
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network (CVPR'24)
[CVPR2023] Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions
Code and trained models for our paper: K. Triaridis, V. Mezaris, "Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization", Proc. 30th Int. Conf. on MultiMedia Modeling (MMM …
The code for "A Single Simple Patch is All You Need for AI-generated Image Detection"
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
In this repository I demonstrate how you can perform multimodal (image+text) search to find similar images+texts given a test image+text from a multimodal (texts+images) database. I use the Kaggle …