Python 34 1 Updated Jul 8, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,473 89 Updated Jul 6, 2024

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Python 14 1 Updated Jul 2, 2024

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

257 7 Updated Jul 3, 2024
24 1 Updated Apr 3, 2024

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Python 62 7 Updated Apr 23, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 23 1 Updated Jul 18, 2023

Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning

Python 345 18 Updated May 17, 2024

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Python 1,622 140 Updated May 25, 2024

Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

126 4 Updated Dec 6, 2023

Awesome-LLM-RAG: a curated list of advanced retrieval augmented generation (RAG) in Large Language Models

704 44 Updated May 28, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 18,035 1,955 Updated Jul 3, 2024

[NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning

Python 27 Updated Jul 5, 2024

【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval

Python 54 3 Updated Apr 16, 2024

[CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"

Python 22 Updated Feb 2, 2024

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 242 21 Updated Jun 6, 2024

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Python 630 40 Updated Jan 10, 2024

torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.

C++ 14 Updated Mar 29, 2024

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

369 23 Updated Jun 12, 2024

FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…

TypeScript 15,352 4,011 Updated Jul 8, 2024

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 255 22 Updated Nov 3, 2023

Cross-modal few-shot adaptation with CLIP

Python 292 32 Updated Mar 13, 2024

SAM: Sharpness-Aware Minimization (PyTorch)

Python 1,700 192 Updated Feb 21, 2024

ImageBind: One Embedding Space to Bind Them All

Python 8,071 734 Updated Jul 5, 2024

Code for visualizing the loss landscape of neural nets

Python 2,721 388 Updated Apr 5, 2022

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Python 457 37 Updated Jun 16, 2024

[CVPR 2024] A framework to fine-tune LLaMAs on instruction-following tasks and get many Stitched LLaMAs with a customized number of parameters, e.g., Stitched LLaMA 8B, 9B, and 10B...

7 Updated Dec 1, 2023

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 5,014 317 Updated Jun 27, 2024

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 885 54 Updated Jun 27, 2024
Python 24 1 Updated Feb 29, 2024