Shanghai Artificial Intelligence Laboratory · Shanghai

Stars
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Agentic components of the Llama Stack APIs
[CVPR 2023 Award Candidate] OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
An open-source implementation of CLIP (see the minimal usage sketch after this list).
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A one-stop, open-source, high-quality data extraction tool; supports extraction from PDFs, web pages, and e-books in multiple formats.
AirLLM: 70B-model inference on a single 4GB GPU
LLM-based autonomous agent that performs comprehensive online research on any given topic
Rigorous evaluation of LLM-synthesized code | NeurIPS 2023
Cayman is a Jekyll theme for GitHub Pages
The Open-Source Data Annotation Platform
Making large AI models cheaper, faster, and more accessible
🤖 GPT Code Review for GitLab: an LLM-assisted code review tool for GitLab, with detailed project documentation
Open-source evaluation toolkit for large vision-language models (LVLMs); supports ~100 VLMs and 40+ benchmarks
Measuring Massive Multitask Language Understanding | ICLR 2021
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system; supports recognition of 80+ languages; provides data annotation and synthesis tools; supports training and…
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
GAOKAO-Bench is an evaluation framework that uses GAOKAO (Chinese college entrance examination) questions as a dataset to evaluate large language models.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
[NeurIPS 2024 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
🚀 A knowledge-base question-answering system built on large language models and RAG. Ready to use out of the box, model-neutral, and flexibly orchestrated; supports rapid embedding into third-party business systems.
[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
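Most entries above are papers or platforms, but a few are directly usable libraries. As one concrete example, below is a minimal zero-shot image classification sketch with the open_clip library listed above, following its documented create_model_and_transforms / get_tokenizer API; the image path and candidate captions are hypothetical placeholders, not part of the library.

```python
# Minimal zero-shot classification sketch with open_clip
# (assumes `pip install open_clip_torch`; image path and captions are placeholders).
import torch
from PIL import Image
import open_clip

# Load a ViT-B/32 model with LAION-2B pretrained weights and its matching preprocessing.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # (1, 3, 224, 224)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probabilities:", probs)
```

The model name and pretrained tag are one of several published weight combinations; `open_clip.list_pretrained()` enumerates the rest.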