FXLYZ
  • National University of Singapore
  • Singapore

Starred repositories

Next-Token Prediction is All You Need

Python 753 17 Updated Sep 30, 2024

The official code for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"

Python 585 52 Updated Jul 8, 2024

Utilities intended for use with Llama models.

Python 4,280 763 Updated Oct 2, 2024

Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python 38 Updated Sep 19, 2024

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

3,223 193 Updated Sep 21, 2024

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,831 727 Updated Oct 1, 2024

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

Python 4,416 421 Updated Sep 21, 2024

A latent text-to-image diffusion model

Jupyter Notebook 67,745 10,109 Updated Jun 18, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 603 22 Updated Oct 1, 2024

Surgical Visual Question Answering. A transformer-based surgical VQA model. Official implementation of "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformers", MICCAI 2022.

Python 46 10 Updated Mar 27, 2023

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 6,427 661 Updated Aug 12, 2024

Grounded Tracking for Streaming Videos

Jupyter Notebook 32 3 Updated Aug 15, 2024

Official code for the paper "ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling", accepted at MICCAI 2024.

Python 9 Updated May 27, 2024
Python 14 3 Updated Jul 5, 2021
Python 13 3 Updated Jun 26, 2022

Papers of ComputerVision x Surgery

89 21 Updated Jan 7, 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python 153 9 Updated Sep 29, 2024

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 28,085 3,188 Updated Sep 30, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,777 108 Updated Jul 29, 2024
Python 7,097 549 Updated Aug 12, 2024

This project presents a Single Input Multiple Output (SIMO) deep convolutional neural network, a so-called ART-Net (Augmented Reality Tool Network) consisting of an encoder-decoder architecture to …

Jupyter Notebook 18 4 Updated Dec 8, 2021
Jupyter Notebook 3 Updated Sep 25, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,433 134 Updated Sep 24, 2024
Python 2,527 188 Updated Sep 26, 2024

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 290 19 Updated Jul 19, 2024
Python 23 7 Updated Feb 7, 2024

[Nature Biomedical Engineering 2023] Decoding surgical activity from videos with a vision transformer

Python 15 2 Updated Jun 6, 2024
Python 9 Updated Sep 16, 2024
Python 8 Updated Oct 7, 2023

This repository provides code for evaluating the SAR-RARP50 challenge categories, namely action recognition and segmentation, as well as their combined performance.

Python 9 1 Updated Sep 30, 2022