Stars
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
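Book_1_《编程不难》 (Programming Is Not Hard) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; feedback and corrections are very welcome!
Book_2_《可视之美》 (The Beauty of Visualization) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; feedback and corrections are welcome
Book_3_《数学要素》 (Elements of Mathematics) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; now on sale; further corrections are welcome, and readers who submit many corrections will receive a free copy!
Book_4_《矩阵力量》 (The Power of Matrices) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; now on sale!
Book_5_《统计至简》 (Statistics Made Simple) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; now on sale!
Book_7_《机器学习》 (Machine Learning) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; feedback and corrections are welcome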
Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).
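Provides a practical interaction interface for large language models such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; supports project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; supports querying multiple LLMs in parallel; supports local models such as chatglm3. Integrates Tongyi Qianwen (通义千问), deepseekcoder, iFlytek Spark (讯飞星火), ERNIE Bot (文心一言), llama2, rwkv, claude2, m…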
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
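Continuously updated collection of cutting-edge papers on video moment localization, temporal language grounding, and video clip retrieval.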
Official code for "Learning Prompt-Enhanced Context features for Weakly-Supervised Video Anomaly Detection" (IEEE-TIP)
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Papers for Video Anomaly Detection, a collection of released code, and performance comparisons.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Fast and memory-efficient exact attention
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
EVA Series: Visual Representation Fantasies from BAAI
This repository contains all the papers accepted at top computer vision conferences, organized so that related papers are easy to search.
Multi Task Vision and Language
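A deep learning model for video captioning, implemented with the Transformer architecture on PyTorch. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".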