Stars
Code for studying the super weight in LLMs
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
A suite of image and video neural tokenizers
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Robust recipes to align language models with human and AI preferences
✨✨ Latest Papers and Datasets on Mobile and PC Agents
A collection of papers on World Models for Autonomous Driving.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"
O1 Replication Journey: A Strategic Progress Report – Part I
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
A simple screen parsing tool towards a pure vision-based GUI agent
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Building blocks for foundation models.
A paper list of recent works on token compression for ViT and VLM
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture