Stars
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Official Pytorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper
Official implementation of the NeurIPS 2023 paper "Self-supervised Object-Centric Learning for Videos"
evelinehong / 3D-CLR-Official
Forked from zsh2000/3D-CLR. [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images"
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
[ICCV 2023] ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking
[NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challen…
⛄ Possibly the smallest compiler ever
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Inference Vision Transformer (ViT) in plain C/C++ with ggml
A collection of learning resources for curious software engineers
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Master programming by recreating your favorite technologies from scratch.
NVIDIA's Deep Imagination Team's PyTorch Library
[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
Code for Diffusion Action Segmentation (ICCV 2023)
[ICCV 2023] Official PyTorch implementation of the paper "DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion"
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
Inference code for Persimmon-8B