😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.

liudaizong/Awesome-LVLM-Attack

Awesome-LVLM-Attack

A continually updated collection of papers related to attacks on Large Vision-Language Models (LVLMs).

Large vision-language models (LVLMs) have achieved significant success and demonstrated promising capabilities in various multimodal downstream tasks. Despite their remarkable capabilities, the increased complexity and deployment of LVLMs have also exposed them to various security threats and vulnerabilities, making the study of attacks on these models a critical area of research.

Here, we've summarized existing LVLM Attack methods in our survey paper👍.

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

If you notice that some important work is missing, it would be super helpful to let me know ([email protected]). Thanks!

If you find our survey useful for your research, please consider citing:

@article{liu2024attack,
  title={A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends},
  author={Liu, Daizong and Yang, Mingyu and Qu, Xiaoye and Zhou, Pan and Hu, Wei and Cheng, Yu},
  journal={arXiv preprint arXiv:2407.07403},
  year={2024}
}

Table of Contents

  • Adversarial-Attack
  • Jailbreak-Attack
  • Prompt-Injection
  • Data-Poisoning

Adversarial-Attack

  • On the Adversarial Robustness of Multi-Modal Foundation Models |
  • On Evaluating Adversarial Robustness of Large Vision-Language Models | Github
    • Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin
    • Singapore University of Technology and Design, Sea AI Lab, Tsinghua University, Renmin University of China
    • [NeurIPS2023] https://arxiv.org/abs/2305.16934
  • VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models | Github
    • Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma
    • The Pennsylvania State University, Zhejiang University, Xi’an Jiaotong University, Dalian University of Technology, Stony Brook University
    • [NeurIPS2023] https://arxiv.org/abs/2312.03777
  • Adversarial Illusions in Multi-Modal Embeddings | Github
  • Image Hijacks: Adversarial Images can Control Generative Models at Runtime | Github
  • How Robust is Google's Bard to Adversarial Image Attacks? | Github
    • Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu
    • Tsinghua University, RealAI
    • [Arxiv2023] https://arxiv.org/abs/2309.11751
  • Misusing Tools in Large Language Models With Visual Adversarial Examples |
    • Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes
    • University of California San Diego, University of Washington
    • [Arxiv2023] https://arxiv.org/abs/2310.03185
  • How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs | Github
    • Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie
    • UC Santa Cruz, UNC-Chapel Hill, University of Edinburgh, University of Oxford, AIWaves Inc.
    • [Arxiv2023] https://arxiv.org/abs/2311.16101
  • InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models |
    • Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang
    • The Hong Kong University of Science and Technology
    • [Arxiv2023] https://arxiv.org/abs/2312.01886
  • OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization |
    • Dongchen Han, Xiaojun Jia, Yang Bai, Jindong Gu, Yang Liu, Xiaochun Cao
    • Sun Yat-sen University, Nanyang Technological University, Tsinghua University, University of Oxford
    • [Arxiv2023] https://arxiv.org/abs/2312.04403
  • An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models | Github
  • Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images | Github
    • Kuofeng Gao, Yang Bai, Jindong Gu, Shu-Tao Xia, Philip Torr, Zhifeng Li, Wei Liu
    • Tsinghua University, Tencent Technology (Beijing), University of Oxford, Tencent Data Platform, Peng Cheng Laboratory
    • [ICLR2024] https://arxiv.org/abs/2401.11170
  • Adversarial Robustness for Visual Grounding of Multimodal Large Language Models |
    • Kuofeng Gao, Yang Bai, Jiawang Bai, Yong Yang, Shu-Tao Xia
    • Tsinghua University, Tencent Security Platform, Peng Cheng Laboratory
    • [ICLRworkshop2024] https://arxiv.org/abs/2405.09981
  • Transferable Multimodal Attack on Vision-Language Pre-training Models |
  • On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities |
    • Xiyang Wu, Ruiqi Xian, Tianrui Guan, Jing Liang, Souradip Chakraborty, Fuxiao Liu, Brian Sadler, Dinesh Manocha, Amrit Singh Bedi
    • University of Maryland, Army Research Laboratory, University of Central Florida
    • [Arxiv2024] https://arxiv.org/abs/2402.10340
  • The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative | Github
    • Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Yu Kong, Tianlong Chen, Huan Liu
    • Arizona State University, Michigan State University, Harvard University
    • [Arxiv2024] https://arxiv.org/abs/2402.14859
  • Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images |
    • Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
    • Technical University of Munich, Ludwig Maximilian University of Munich, Huawei Munich Research Center, University of Oxford
    • [Arxiv2024] https://arxiv.org/abs/2402.14899
  • AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions |
    • Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang
    • Xi’an Jiaotong University, Shanghai Artificial Intelligence Laboratory, Osaka University
    • [Arxiv2024] https://arxiv.org/abs/2403.09346
  • Efficiently Adversarial Examples Generation for Visual-Language Models under Targeted Transfer Scenarios using Diffusion Models |
    • Qi Guo, Shanmin Pang, Xiaojun Jia, Qing Guo
    • Xi’an Jiaotong University, Nanyang Technological University, Center for Frontier AI Research
    • [Arxiv2024] https://arxiv.org/abs/2404.10335
  • Adversarial Attacks on Multimodal Agents |
  • Refusing Safe Prompts for Multi-modal Large Language Models |
  • Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | #
    • Hao Cheng, Erjia Xiao, Chengyuan Yu, Zhao Yao, Jiahang Cao, Qiang Zhang, Jiaxu Wang, Mengshu Sun, Kaidi Xu, Jindong Gu, Renjing Xu
    • The Hong Kong University of Science and Technology, University of Oxford, Hohai University, Hunan University, Drexel University, Beijing University of Technology
    • [Arxiv2024] https://arxiv.org/abs/2409.13174
  • On the Robustness of Large Multimodal Models Against Image Adversarial Attacks |
  • Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Github #

Jailbreak-Attack

  • Are aligned neural networks adversarially aligned? |
    • Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt
    • Google DeepMind, Stanford, University of Washington, ETH Zurich
    • [NeurIPS2023] https://arxiv.org/abs/2306.15447
  • FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts | Github
    • Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
    • Tsinghua University, Shandong University, Carnegie Mellon University
    • [Arxiv2023] https://arxiv.org/abs/2311.05608
  • Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts |
    • Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun
    • Huazhong University of Science and Technology, Lehigh University
    • [Arxiv2023] https://arxiv.org/abs/2311.09127
  • MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | Github
    • Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao
    • Shanghai AI Laboratory, East China Normal University, Midea Group, University of Oxford
    • [Arxiv2023] https://arxiv.org/abs/2311.17600
  • Visual Adversarial Examples Jailbreak Aligned Large Language Models | Github
    • Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
    • Princeton University, Stanford University
    • [AAAI2024] https://arxiv.org/abs/2306.13213
  • Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models |
  • Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Github
    • Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
    • Sea AI Lab, National University of Singapore, Singapore Management University
    • [ICML2024] https://arxiv.org/abs/2402.08567
  • Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Github
  • Jailbreaking Attack against Multimodal Large Language Model |
  • ImgTrojan: Jailbreaking Vision-Language Models with ONE Image | Github
  • Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models | Github
    • Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
    • Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods
    • [Arxiv2024] https://arxiv.org/abs/2403.09792
  • Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? |
    • Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu
    • LMU Munich, University of Oxford, Siemens AG, Munich Center for Machine Learning, Wuhan University
    • [Arxiv2024] https://arxiv.org/abs/2404.03411
  • JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks | Github
    • Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, Chaowei Xiao
    • The Ohio State University, University of Wisconsin-Madison
    • [Arxiv2024] https://arxiv.org/abs/2404.03027
  • Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security |
    • Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li
    • Tongji University, Tsinghua University, Beijing University of Technology, Nanyang Technological University, Peng Cheng Laboratory
    • [Arxiv2024] https://arxiv.org/abs/2404.05264
  • White-box Multimodal Jailbreaks Against Large Vision-Language Models |
    • Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang
    • Fudan University, Shanghai Jiao Tong University, DataGrand Tech
    • [Arxiv2024] https://arxiv.org/abs/2405.17894
  • From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking |
    • Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei
    • University of Southern California, Fudan University, Alibaba Inc.
    • [Arxiv2024] https://arxiv.org/abs/2406.14859
  • Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks |
  • Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character |
    • Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu
    • University of Wisconsin–Madison, The Ohio State University, Peking University
    • [Arxiv2024] https://arxiv.org/abs/2405.20773
  • Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts |
    • Yi Liu, Chengjun Cai, Xiaoli Zhang, Xingliang Yuan, Cong Wang
    • City University of Hong Kong, University of Science and Technology, The University of Melbourne
    • [Arxiv2024] https://arxiv.org/abs/2407.15050
  • When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? |
    • Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
    • Stanford, Harvard, Anthropic, Constellation, MIT, UC Berkeley
    • [Arxiv2024] https://arxiv.org/abs/2407.15211
  • MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models | #

Prompt-Injection

  • Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs |
  • Can Language Models be Instructed to Protect Personal Information? |
    • Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
    • Georgia Institute of Technology, Carnegie Mellon University
    • [Arxiv2023] https://arxiv.org/abs/2310.02224
  • FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts | Github
    • Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
    • Tsinghua University, Shandong University, Carnegie Mellon University
    • [Arxiv2023] https://arxiv.org/abs/2311.05608
  • MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models | Github
    • Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao
    • Shanghai AI Laboratory, East China Normal University, Midea Group, University of Oxford
    • [Arxiv2023] https://arxiv.org/abs/2311.17600
  • MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | Github
    • Renjie Pi, Tianyang Han, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang
    • The Hong Kong University of Science and Technology, University of Illinois at Urbana-Champaign, The Hong Kong Polytechnic University
    • [Arxiv2024] https://arxiv.org/abs/2401.02906
  • Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks | Github
  • Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors |
    • Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao
    • University of Michigan Ann Arbor, University of Wisconsin-Madison, University of Science and Technology of China
    • [Arxiv2024] https://arxiv.org/abs/2405.10529
  • Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection | #
  • Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | #
    • Hao Cheng, Erjia Xiao, Chengyuan Yu, Zhao Yao, Jiahang Cao, Qiang Zhang, Jiaxu Wang, Mengshu Sun, Kaidi Xu, Jindong Gu, Renjing Xu
    • The Hong Kong University of Science and Technology, University of Oxford, Hohai University, Hunan University, Drexel University, Beijing University of Technology
    • [Arxiv2024] https://arxiv.org/abs/2409.13174
  • Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Github #

Data-Poisoning

  • Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models | Github
    • Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang
    • University of Maryland, College Park, JPMorgan AI Research, University of Waterloo, Salesforce Research
    • [Arxiv2024] https://arxiv.org/abs/2402.06659
  • PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models | Github
    • Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
    • Pennsylvania State University, Wuhan University, Illinois Institute of Technology
    • [Arxiv2024] https://arxiv.org/abs/2402.07867
  • Test-Time Backdoor Attacks on Multimodal Large Language Models | Github
    • Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
    • Southern University of Science and Technology, Sea AI Lab, University of California
    • [Arxiv2024] https://arxiv.org/abs/2402.08577
  • VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models |
    • Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao
    • Sun Yat-sen University
    • [Arxiv2024] https://arxiv.org/abs/2402.13851
  • Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models |
    • Zhenyang Ni, Rui Ye, Yuxi Wei, Zhen Xiang, Yanfeng Wang, Siheng Chen
    • Shanghai Jiao Tong University, University of Illinois Urbana-Champaign, Shanghai AI Laboratory, Multi-Agent Governance & Intelligence Crew
    • [Arxiv2024] https://arxiv.org/abs/2404.12916
  • Revisiting Backdoor Attacks against Large Vision-Language Models |
    • Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Ee-Chien Chang, Xiaochun Cao
    • National University of Singapore, Sun Yat-sen University, Beihang University
    • [Arxiv2024] https://arxiv.org/abs/2406.18844
