Starred repositories
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition, multi-speaker speech synthesis, multilingual support, high accuracy.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
An open source implementation of CLIP.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Code and data of our AAAI2021 paper "A Case Study of the Shortcut Effects in Visual Commonsense Reasoning"
LAVIS - A One-stop Library for Language-Vision Intelligence
[PyTorch] Easy-to-use, modular and extensible package of deep-learning based CTR models.
A repository compiled by the MLNLP community to help authors avoid small mistakes in paper submissions. Paper Writing Tips
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Awesome Knowledge Distillation
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Source code and data for "Things not Written in Text: Exploring Spatial Commonsense from Visual Signals" (ACL 2022 main conference paper).
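Admin back-office front-end template: a minimal, easy-to-use admin framework template built on layui. Providing a single API endpoint is enough to initialize the whole framework, with no complicated setup.
A Web UI component library that follows a native development style and uses its own lightweight modular specification. It is easy to pick up and makes building web interfaces simpler and faster.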
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
MERLOT: Multimodal Neural Script Knowledge Models
awesome grounding: A curated list of research papers in visual grounding
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image