National University of Singapore, Singapore
Starred repositories
The official code for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
Utilities intended for use with Llama models.
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
A latent text-to-image diffusion model
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Surgical Visual Question Answering: a transformer-based surgical VQA model. Official implementation of "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformers", MICCAI 2022.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Grounded Tracking for Streaming Videos
Official code of the paper "ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling", accepted at MICCAI 2024.
Papers on Computer Vision x Surgery
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
This project presents a Single Input Multiple Output (SIMO) deep convolutional neural network, a so-called ART-Net (Augmented Reality Tool Network) consisting of an encoder-decoder architecture to …
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
[Nature Biomedical Engineering 2023] Decoding surgical activity from videos with a vision transformer
This repository provides code for evaluating the SAR-RARP50 challenge categories, namely action recognition and segmentation, as well as the combined performance.