ChuniHiro
🎯 Focusing


Let your Claude be able to think

JavaScript 4,101 480 Updated Nov 16, 2024

①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

Jupyter Notebook 247 12 Updated Aug 12, 2024

A comprehensive collection of IQA papers

TeX 1,004 67 Updated Nov 5, 2024

B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output.

Python 26 4 Updated Jun 3, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,221 81 Updated Aug 13, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 6,960 739 Updated Nov 15, 2024

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,482 107 Updated Jul 5, 2024

Contrastive Language-Audio Pretraining

Python 1,415 137 Updated Jul 9, 2024

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Python 192 14 Updated Oct 2, 2024
Python 10 1 Updated Sep 25, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,095 188 Updated Oct 4, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,596 122 Updated Sep 6, 2024
Python 2,872 239 Updated Oct 16, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 20,238 2,236 Updated Aug 12, 2024

A Python project that uses OpenCV to detect the gender and age of a person (face) in a picture or through a webcam.

Python 489 204 Updated May 16, 2024

Real-time estimation of gender and age

Python 156 39 Updated May 10, 2019

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 6,014 465 Updated Nov 15, 2024

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 161 19 Updated May 29, 2024
Python 19 1 Updated Apr 26, 2024

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Python 775 113 Updated Mar 26, 2024

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 6,279 619 Updated Nov 15, 2024

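Consistently sharing high-quality, interesting, and practical open-source technical tutorials, developer tools, programming websites, and tech news from GitHub. A list of cool, interesting projects on GitHub.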

32,496 3,562 Updated May 29, 2024

Generative models for conditional audio generation

Python 2,715 258 Updated Nov 5, 2024

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Python 107 11 Updated Apr 23, 2024

Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation

Python 104 9 Updated Jan 18, 2023

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 16,441 1,620 Updated Nov 12, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,595 886 Updated Oct 22, 2024

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Python 13,364 1,218 Updated Oct 30, 2024

Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA)

Jupyter Notebook 724 64 Updated Jul 18, 2023