Stars
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
[CoRL 2024] Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
[RSS 2024] Expressive Whole-Body Control for Humanoid Robots
A fast and flexible implementation of Rigid Body Dynamics algorithms and their analytical derivatives
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model
Text-to-image generation: CogView3-Plus and CogView3 (ECCV 2024)
Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
A high-performance runtime framework for modern robotics.
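Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in libraries for multiple languages.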
✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Official release of InternLM2.5 base and chat models. 1M context support
BlueLM(蓝心大模型): Open large language models developed by vivo AI Lab
LLaVA-HR: High-Resolution Large Language-Vision Assistant
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Code for "Temporal Difference Learning for Model Predictive Control"
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
Version 1.1 of the Alexander Koch low-cost robot arm, with some small changes.