lizhaoliu-Lec
🎯 Focusing
  • South China University of Technology
  • Guangzhou/China


Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 3,141 245 Updated Aug 14, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 10,238 776 Updated Aug 21, 2024

Enable macOS HiDPI and have a native setting.

Shell 8,598 985 Updated Jul 3, 2024

Official implementation of the ECCV 2024 paper "Prioritized Semantic Learning for Zero-shot Instance Navigation"

Python 5 1 Updated Jul 15, 2024
5 Updated Jul 16, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python 1,946 31 Updated Jun 6, 2024

This is the source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR 2024).

Python 38 3 Updated Aug 12, 2024

Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources

943 62 Updated Jul 4, 2024

Grok open release

Python 49,394 8,332 Updated Aug 30, 2024
Jupyter Notebook 351 45 Updated Dec 5, 2023

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,550 441 Updated May 3, 2024

SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Python 91 7 Updated Jan 12, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 678 43 Updated Jul 29, 2024

Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"

Python 176 9 Updated Sep 7, 2023

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 735 37 Updated Jun 2, 2024

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

Python 10,133 1,528 Updated Aug 29, 2024
1 Updated Jan 19, 2024

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

Python 239 48 Updated Aug 20, 2024

Dual Regression Compression for SR Models

Python 5 Updated Jan 8, 2024
Python 14 Updated Dec 13, 2023

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

Python 229 15 Updated Jun 7, 2023
Jupyter Notebook 750 71 Updated Aug 7, 2024

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Python 491 25 Updated Jun 11, 2024
Python 332 12 Updated Jul 29, 2024
Python 8,285 483 Updated Jan 27, 2024

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.

Python 524 103 Updated Oct 29, 2023

This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.

Jupyter Notebook 673 61 Updated Oct 17, 2023

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Python 470 29 Updated May 8, 2024

Open-Set Grounded Text-to-Image Generation

Python 1,960 147 Updated Mar 6, 2024

Grounded Language-Image Pre-training

Python 2,135 190 Updated Jan 24, 2024