View songkq's full-sized avatar

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,816 725 Updated Oct 1, 2024

MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs.

Python 58 3 Updated Sep 29, 2024
Python 2,526 187 Updated Sep 26, 2024

A curated list of awesome Multimodal studies.

HTML 73 5 Updated Sep 23, 2024

Official implementation of AnimateDiff.

Python 10,351 847 Updated Jul 31, 2024

AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI

Python 3,059 253 Updated Sep 22, 2024

📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

Python 1,207 91 Updated Aug 22, 2024

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥

23 2 Updated Sep 29, 2024

Next-Token Prediction is All You Need

Python 696 16 Updated Sep 30, 2024

Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"

Python 128 6 Updated Sep 26, 2024

Handwritten Text Recognition and Character Detection

Python 65 3 Updated Sep 26, 2024

A reading list of video generation

374 29 Updated Sep 30, 2024
Python 172 3 Updated Jul 15, 2024

Related works and background techniques for OpenAI o1

82 4 Updated Sep 24, 2024

The successful integration of Qwen2-VL-Instruct into the ComfyUI platform has enabled a smooth operation, supporting (but not limited to) text-based queries, video queries, single-image queries, an…

Python 55 6 Updated Sep 26, 2024

Recaption large (Web)Datasets with vllm and save the artifacts.

Python 29 2 Updated Sep 24, 2024

This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.

Python 38 8 Updated Sep 18, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

3,950 214 Updated Oct 1, 2024

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 347 18 Updated Sep 19, 2024

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 311 16 Updated Sep 25, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,599 260 Updated Oct 1, 2024

An ML Systems Onboarding list

516 20 Updated Jul 23, 2024

GPU programming related news and material links

1,140 69 Updated Sep 23, 2024

Efficient Triton Kernels for LLM Training

Python 3,103 159 Updated Sep 30, 2024

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

Python 19 5 Updated Sep 13, 2024

code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"

Python 1 Updated Mar 14, 2024

[ICML'2024] Can AI Assistants Know What They Don't Know?

Python 2 Updated Feb 5, 2024

An open-source RAG-based tool for chatting with your documents.

Python 13,094 978 Updated Oct 1, 2024

NLP Zero to Hero in just 10 Kernels

Jupyter Notebook 476 61 Updated Sep 22, 2024