- Shanghai, China
- https://ymzhang0319.github.io/
Stars
Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model.
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
[ICML 2024 Poster] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts"
[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user-provided control signals and generates natural, harmonious results from multiple controls.
Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
StyleShot: A SnapShot on Any Style. A model that transfers any style to any content and generates high-quality, personalized stylized images without per-image fine-tuning.
Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
Code for the AAAI-accepted paper "Similarity Distribution based Membership Inference Attack on Person Re-Identification".
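FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
downkyi (哔哩下载姬), a video download tool for the Bilibili website. Supports batch downloads, 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).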
The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
a research paper for generative cartoon interpolation
ChordNova is a powerful open-source chord progression analysis and generation tool with unprecedentedly detailed control over chord trait parameters, well beyond mainstream software. Ru…
Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
Guide diffusion on ImageBind embedding similarity
Lumina-T2X is a unified framework for Text to Any Modality Generation
Official Repo for CVPR 2024 Paper "FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Fully-Supervised Action Segmentation"
Robust Speech Recognition via Large-Scale Weak Supervision