National University of Singapore, Singapore
Starred repositories
The official code for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
Utilities intended for use with Llama models.
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
A latent text-to-image diffusion model
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Surgical Visual Question Answering: a transformer-based surgical VQA model. Official implementation of "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformers", MICCAI 2022.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Grounded Tracking for Streaming Videos
Official code of the paper "ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling", accepted at MICCAI 2024.
Papers on Computer Vision x Surgery
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
This project presents a Single Input Multiple Output (SIMO) deep convolutional neural network, a so-called ART-Net (Augmented Reality Tool Network) consisting of an encoder-decoder architecture to …
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
[Nature Biomedical Engineering 2023] Decoding surgical activity from videos with a vision transformer
This repository provides code for evaluating the SAR-RARP50 challenge categories, namely action recognition and segmentation, as well as the combined performance.