- The University of Hong Kong
- Hong Kong
- ttengwang.com
Stars
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
SEED-Story: Multimodal Long Story Generation with Large Language Model
This is a repo to track the latest autoregressive visual generation papers.
mllm-npu: training multimodal large language models on Ascend NPUs
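MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, targeting the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.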
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
A generative speech model for daily dialogue.
Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to improve performance of numerous vision language performances for diverse capa…
Create LLM agents with long-term memory and custom tools 📚🦙
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Convert PDF to markdown quickly with high accuracy
Unified Audio-Visual Perception for Multi-Task Video Localization
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Sample LaTeX file for HKU PhD thesis.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
[ACL 2024] Progressive LLaMA with Block Expansion.
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.