Stars
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Python bindings for real-time control of Franka Emika robots.
The official repo for the paper "In-Context Imitation Learning via Next-Token Prediction"
This code corresponds to simulation environments used as part of the MimicGen project.
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
[CVPR'24 Highlight & Best Demo Award] Gaussian Splatting SLAM
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023
[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down
MuJoCo Models for Google's Scanned Objects Dataset
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
PyTorch code and models for the DINOv2 self-supervised learning method.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
🇫🇷 Oh my tmux! My self-contained, pretty & versatile tmux configuration made with ❤️
This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and proces…
Automated, hardware-independent Hand-Eye Calibration
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model