Resources for Multimodal Content Generation & Manipulation Detection

MMDSPF/Multimodal-Content-Generation-Resources

Multimodal Content Generation

Review

Artificial intelligence in the creative industries: a review.
N Anantrasirichai, D Bull.
Artificial Intelligence Review, 2021.

A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
C Zhang, C Zhang, S Zheng, Y Qiao, C Li, et al.
arXiv, 2023.

State of the art on diffusion models for visual computing.
R Po, W Yifan, V Golyanik, K Aberman, JT Barron, AH Bermano, ER Chan, T Dekel, et al.
arXiv:2310.07204, 2023 [Paper]

Image

Generation

Layout

Image Generation from Layout.
B Zhao, L Meng, W Yin, L Sigal.
CVPR, 2019. [Paper] [Github]

Layout2image: Image Generation from Layout.
B Zhao, W Yin, L Meng, L Sigal.
IJCV, 2020.

Editing

In-domain GAN inversion for real image editing.
J Zhu, Y Shen, D Zhao, B Zhou.
ECCV, 2020.

Anycost GANs for interactive image synthesis and editing.
J Lin, R Zhang, F Ganz, S Han, et al.
CVPR, 2021.

EditGAN: High-Precision Semantic Image Editing.
H Ling, K Kreis, D Li, SW Kim, et al.
NeurIPS, 2021.

Controllable

Condition-Aware Neural Network for Controlled Image Generation.
H Cai, M Li, Q Zhang, MY Liu, S Han.
CVPR, 2024.

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation.
M Huang, Y Long, X Deng, R Chu, J Xiong, X Liang, H Cheng, Q Lu, W Liu.
arXiv:2403.08857, 2024. [Paper] [Github]

Diffusion

LayoutDiffusion: Controllable diffusion model for layout-to-image generation.
G Zheng, X Zhou, X Li, Z Qi, et al.
CVPR, 2023. [PDF]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
JT Hoe, X Jiang, CS Chan, et al.
CVPR, 2024. [PDF] [Github]

Composition

Making images real again: A comprehensive survey on deep image composition.
L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang, et al.
arXiv, 2021. [Paper]

Current advances and future perspectives of image fusion: A comprehensive review.
S Karim, G Tong, J Li, A Qadir, U Farooq, Y Yu.
Information Fusion, 2023. [Paper]

Applications

Intelligent design of multimedia content in Alibaba.
K Liu, et al.
Front Inform Technol Electron Eng, 2019, 20(12):1657-1664. [Paper] [Github]

Content-aware generative modeling of graphic design layouts.
X Zheng, X Qiao, Y Cao, RWH Lau.
TOG, 2019.

Advertisement

Automatic synthesis of advertising images according to a specified style.
W You, et al.
Front Inform Technol Electron Eng, 2020. [PDF] [Github]

Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce.
S Vempati, KT Malayil, V Sruthi, R Sandeep.
FRS, 2020.

Nüwa: Visual synthesis pre-training for neural visual world creation.
C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, et al.
arXiv, 2021.

Vinci: An Intelligent Graphic Design System for Generating Advertising Posters.
S Guo, Z Jin, F Sun, J Li, Z Li, Y Shi, N Cao.
CHI, 2021.

Preparing for an era of deepfakes and AI-generated ads: A framework for understanding responses to manipulated advertising.
C Campbell, K Plangger, S Sands, et al.
Journal of Advertising, 2021.

Manipulation

Learning Rich Features for Image Manipulation Detection.
P Zhou, X Han, VI Morariu, et al.
CVPR, 2018. [PDF]

FaceForensics++: Learning to detect manipulated facial images.
A Rössler, D Cozzolino, L Verdoliva, et al.
CVPR, 2019.

Constrained R-CNN: A general image manipulation detection model.
C Yang, H Li, F Lin, B Jiang, et al.
ICME, 2020. [PDF]

Media Forensics and DeepFakes.
L Verdoliva.
IEEE Journal of Selected Topics in Signal Processing, 2020. [PDF]

The creation and detection of deepfakes: A survey.
Y Mirsky, W Lee.
ACM Computing Surveys (CSUR), 2021.

Multi-Modality Image Manipulation Detection.
C Yang, Z Wang, H Shen, H Li, et al.
ICME, 2021. [PDF]

Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples.
S Hussain, P Neekhara, M Jere, et al.
WACV, 2021. [PDF]

Exploiting deep generative prior for versatile image restoration and manipulation.
X Pan, X Zhan, B Dai, D Lin, CC Loy, et al.
TPAMI, 2021.

Video

Generation

Video to Video Synthesis.
TC Wang, MY Liu, JY Zhu, G Liu, A Tao, J Kautz, et al.
NeurIPS, 2018.

MoCoGAN: Decomposing motion and content for video generation.
S Tulyakov, MY Liu, X Yang, et al.
CVPR, 2018.

Playable Video Generation.
W Menapace, S Lathuilière, et al.
CVPR, 2021. [PDF]

A good image generator is what you need for high-resolution video synthesis.
Y Tian, J Ren, M Chai, K Olszewski, X Peng, et al.
ICLR, 2021. [PDF]

From Sora What We Can See: A Survey of Text-to-Video Generation.
R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li, H Duan, B Wei, R Ranjan.
arXiv:2405.10674, 2024. [PDF]

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation.
J Cho, FD Puspitasari, S Zheng, J Zheng, LH Lee, TH Kim, CS Hong, C Zhang.
arXiv:2403.05131, 2024. [PDF]

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.
Y Zhang, Y Kang, Z Zhang, X Ding, S Zhao, X Yue.
arXiv:2402.03040, 2024. [PDF] [Github]

Manipulation

Deepfake Video Detection Using Recurrent Neural Networks.
D Güera, EJ Delp.
AVSS, 2018. [PDF]

FaceForensics: A large-scale video dataset for forgery detection in human faces.
A Rössler, D Cozzolino, L Verdoliva, C Riess, et al.
arXiv, 2018. [PDF]

MesoNet: a compact facial video forgery detection network.
D Afchar, V Nozick, J Yamagishi, et al.
WIFS, 2018. [PDF]

Face Forensics in the Wild.
T Zhou, W Wang, Z Liang, et al.
CVPR, 2021.

Text

Generation

awesome-text-generation.
[Github]

NLP Text Generation.
[Github]

Manipulation

Online handwritten signature verification using feature weighting algorithm relief.
L Yang, Y Cheng, X Wang, Q Liu.
Soft Computing, 2018. [PDF]

Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification.
LG Hafemann, R Sabourin, et al.
IEEE Transactions on Information Forensics and Security, 2020. [PDF]

TextStyleBrush: Transfer of Text Aesthetics from a Single Example.
P Krishnan, R Kovvuri, G Pang, B Vassilev, et al.
arXiv, 2021. [PDF]

Audio

Generation

WaveNet: A generative model for raw audio.
A van den Oord, S Dieleman, H Zen, K Simonyan, et al.
arXiv, 2016.

Applications of Deep Learning to Audio Generation.
Y Zhao, X Xia, R Togneri.
ICSM, 2018.

GANSynth: Adversarial neural audio synthesis.
J Engel, KK Agrawal, S Chen, I Gulrajani, et al.
ICLR, 2019.

magenta
Magenta is a research project exploring the role of machine learning in the process of creating art and music.
[Github]

Manipulation

All your voices are belong to us: Stealing voices to fool humans and machines.
D Mukhopadhyay, M Shirvanian, N Saxena.
ESORICS, 2015. [PDF]

DeepSonar: Towards effective and robust detection of AI-synthesized fake voices.
R Wang, F Juefei-Xu, Y Huang, Q Guo, X Xie, et al.
MM, 2020. [PDF]

ASVspoof 2019: Future horizons in spoofed and fake audio detection.
M Todisco, X Wang, V Vestman, M Sahidullah, et al.
arXiv, 2019. [PDF]

Deep4SNet: deep learning for fake speech classification.
DM Ballesteros, Y Rodriguez-Ortega, D Renza, et al.
ESWA, 2021. [PDF]
