Multimodal Content Generation

Review

Artificial intelligence in the creative industries: a review.
N Anantrasirichai, D Bull.
Artificial Intelligence Review, 2021.

A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
C Zhang, C Zhang, S Zheng, Y Qiao, C Li, et al.
arXiv, 2023.

State of the art on diffusion models for visual computing.
R Po, W Yifan, V Golyanik, K Aberman, JT Barron, AH Bermano, ER Chan, T Dekel, et al.
arXiv:2310.07204, 2023. [Paper]

Image

Generation

Layout

Image Generation from Layout.
B Zhao, L Meng, W Yin, L Sigal.
CVPR, 2019. [Paper] [Github]

Layout2image: Image Generation from Layout.
B Zhao, W Yin, L Meng, L Sigal.
IJCV, 2020.

Editing

In-domain gan inversion for real image editing.
J Zhu, Y Shen, D Zhao, B Zhou.
ECCV, 2020.

Anycost gans for interactive image synthesis and editing.
J Lin, R Zhang, F Ganz, S Han, et al.
CVPR, 2021.

EditGAN: High-Precision Semantic Image Editing.
H Ling, K Kreis, D Li, SW Kim, et al.
NeurIPS, 2021.

Controllable

Condition-Aware Neural Network for Controlled Image Generation.
H Cai, M Li, Q Zhang, MY Liu, S Han.
CVPR, 2024.

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation.
M Huang, Y Long, X Deng, R Chu, J Xiong, X Liang, H Cheng, Q Lu, W Liu.
arXiv:2403.08857, 2024. [Paper] [Github]

Diffusion

Layoutdiffusion: Controllable diffusion model for layout-to-image generation.
G Zheng, X Zhou, X Li, Z Qi, et al.
CVPR, 2023. [PDF]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
JT Hoe, X Jiang, CS Chan, et al.
CVPR, 2024. [PDF] [Github]

Composition

Making images real again: A comprehensive survey on deep image composition.
L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang, et al.
arXiv, 2021. [Paper]

Current advances and future perspectives of image fusion: A comprehensive review.
S Karim, G Tong, J Li, A Qadir, U Farooq, Y Yu.
Information Fusion, 2023. [Paper]

Applications

Intelligent design of multimedia content in Alibaba.
K Liu, et al.
Front Inform Technol Electron Eng, 2019, 20(12):1657-1664. [Paper] [Github]

Content-aware generative modeling of graphic design layouts.
X Zheng, X Qiao, Y Cao, RWH Lau.
TOG, 2019.

Advertisement

Automatic synthesis of advertising images according to a specified style.
W You, et al.
Front Inform Technol Electron Eng, 2020. [PDF] [Github]

Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce.
S Vempati, KT Malayil, V Sruthi, R Sandeep.
FRS, 2020.

NÜWA: Visual synthesis pre-training for neural visual world creation.
C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, et al.
ArXiv, 2021.

Vinci: An Intelligent Graphic Design System for Generating Advertising Posters.
S Guo, Z Jin, F Sun, J Li, Z Li, Y Shi, N Cao.
CHI, 2021.

Preparing for an era of deepfakes and AI-generated ads: A framework for understanding responses to manipulated advertising.
C Campbell, K Plangger, S Sands, et al.
Journal of Advertising, 2021.

Manipulation

Learning Rich Features for Image Manipulation Detection.
P Zhou, X Han, VI Morariu, et al.
CVPR, 2018. [PDF]

Faceforensics++: Learning to detect manipulated facial images.
A Rössler, D Cozzolino, L Verdoliva, et al.
ICCV, 2019.

Constrained R-CNN: A general image manipulation detection model.
C Yang, H Li, F Lin, B Jiang, et al.
ICME, 2020. [PDF]

Media Forensics and DeepFakes.
L Verdoliva.
IEEE Journal of Selected Topics in Signal Processing, 2020. [PDF]

The creation and detection of deepfakes: A survey.
Y Mirsky, W Lee.
ACM Computing Surveys (CSUR), 2021.

Multi-Modality Image Manipulation Detection.
C Yang, Z Wang, H Shen, H Li, et al.
ICME, 2021. [PDF]

Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples.
S Hussain, P Neekhara, M Jere, et al.
WACV, 2021. [PDF]

Exploiting deep generative prior for versatile image restoration and manipulation.
X Pan, X Zhan, B Dai, D Lin, CC Loy, et al.
TPAMI, 2021.

Video

Generation

Video to Video Synthesis.
TC Wang, MY Liu, JY Zhu, G Liu, A Tao, J Kautz, et al.
NIPS, 2018.

Mocogan: Decomposing motion and content for video generation.
S Tulyakov, MY Liu, X Yang, et al.
CVPR, 2018.

Playable Video Generation.
W Menapace, S Lathuilière, et al.
CVPR, 2021. [PDF]

A good image generator is what you need for high-resolution video synthesis.
Y Tian, J Ren, M Chai, K Olszewski, X Peng, et al.
ICLR, 2021. [PDF]

From Sora What We Can See: A Survey of Text-to-Video Generation.
R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li, H Duan, B Wei, R Ranjan.
arXiv:2405.10674, 2024. [PDF]

Sora as an agi world model? a complete survey on text-to-video generation.
J Cho, FD Puspitasari, S Zheng, J Zheng, LH Lee, TH Kim, CS Hong, C Zhang.
arXiv:2403.05131, 2024. [PDF]

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.
Y Zhang, Y Kang, Z Zhang, X Ding, S Zhao, X Yue.
arXiv:2402.03040, 2024. [PDF] [Github]

Manipulation

Deepfake Video Detection Using Recurrent Neural Networks.
D Güera, EJ Delp.
AVSS, 2018. [PDF]

Faceforensics: A large-scale video dataset for forgery detection in human faces.
A Rössler, D Cozzolino, L Verdoliva, C Riess, et al.
ArXiv, 2018. [PDF]

Mesonet: a compact facial video forgery detection network.
D Afchar, V Nozick, J Yamagishi, et al.
WIFS, 2018. [PDF]

Face Forensics in the Wild.
T Zhou, W Wang, Z Liang, et al.
CVPR, 2021.

Text

Generation

awesome-text-generation.
[Github]

NLP Text Generation.
[Github]

Manipulation

Online handwritten signature verification using feature weighting algorithm relief.
L Yang, Y Cheng, X Wang, Q Liu.
Soft Computing, 2018. [PDF]

Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification.
LG Hafemann, R Sabourin, et al.
IEEE Transactions on Information Forensics and Security, 2020. [PDF]

TextStyleBrush: Transfer of Text Aesthetics from a Single Example.
P Krishnan, R Kovvuri, G Pang, B Vassilev, et al.
ArXiv, 2021. [PDF]

Audio

Generation

Wavenet: A generative model for raw audio.
A Oord, S Dieleman, H Zen, K Simonyan, et al.
ArXiv, 2016.

Applications of Deep Learning to Audio Generation.
Y Zhao, X Xia, R Togneri.
ICSM, 2018.

Gansynth: Adversarial neural audio synthesis.
J Engel, KK Agrawal, S Chen, I Gulrajani, et al.
ICLR, 2019.

magenta
Magenta is a research project exploring the role of machine learning in the process of creating art and music.
[Github]

Manipulation

All your voices are belong to us: Stealing voices to fool humans and machines.
D Mukhopadhyay, M Shirvanian, N Saxena.
ESORICS, 2015. [PDF]

Deepsonar: Towards effective and robust detection of ai-synthesized fake voices.
R Wang, F Juefei-Xu, Y Huang, Q Guo, X Xie, et al.
MM, 2020. [PDF]

ASVspoof 2019: Future horizons in spoofed and fake audio detection.
M Todisco, X Wang, V Vestman, M Sahidullah, et al.
ArXiv, 2019. [PDF]

Deep4SNet: deep learning for fake speech classification.
DM Ballesteros, Y Rodriguez-Ortega, D Renza, et al.
ESWA, 2021. [PDF]
