
WideOpenAI

THIS REPO IS FOR EDUCATIONAL PURPOSES ONLY!

This is a list of jailbreak prompts using indirect prompt injection based on SQL, Splunk, and other query-language syntax. Based on my testing, these types of prompts can get LLMs to behave outside their normal ethical boundaries, and any tool or service using the OpenAI API appears susceptible. These were inspired by elder-plinius's work here: https://github.com/elder-plinius/L1B3RT45

Update: This repo was renamed to better reflect the content within it, going from "PromptShieldBreaker" to "WideOpenAI".

Update: The prompts have so far been tested and confirmed to work on:

  • Custom Azure OpenAI applications (original research, as of June 7, 2024)
  • Stock Microsoft Copilot - Balanced (new, as of June 19, 2024)
  • Stock ChatGPT GPT-4o (new, as of June 19, 2024)

Azure OpenAI Test Environment Configuration

Note: The apps tested had the following configuration (a rough sketch of a comparable client call follows the list):

  • Deployment: GPT-4o
  • Data Source: Azure Blob Storage + Azure AI Search
    • CORS enabled
    • Results were not limited to the uploaded test data
  • Test Data:
    • 3 mock radiology reports (PHI)
    • 3 mock home improvement retail invoices (PCI)
    • 3 medical industry white papers (public)
  • Content Filters:
    • Default Prompt and Completion filters
    • Enabled additional content safety models:
      • Prompt Shield for jailbreak attacks enabled
      • Prompt Shield for indirect attacks enabled
      • Protected material text enabled
      • Protected material code enabled
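
For reference, here is a minimal sketch of how an app with this kind of setup might call the deployment. This is not the exact application tested; it assumes the `openai` Python SDK's `AzureOpenAI` client and the "On Your Data" Azure AI Search extension, and the endpoint URLs, index name, API version, and environment variable names are placeholders.

```python
import os
from openai import AzureOpenAI  # assumes openai>=1.x with Azure support

# Hypothetical client setup; endpoint, key, and api_version are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o-deployment",  # the Azure deployment name, not the model family
    messages=[{"role": "user", "content": "Summarize the uploaded documents."}],
    # "On Your Data" extension: ground completions in an Azure AI Search index
    # (e.g., one built from the Blob Storage test data).
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "https://<your-search>.search.windows.net",
                    "index_name": "test-data-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_SEARCH_KEY"],
                    },
                },
            }
        ]
    },
)

print(response.choices[0].message.content)
```

Note that the content filters and Prompt Shield models listed above are configured on the Azure OpenAI resource and deployment, not per request, so they do not appear in the client code.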

Querying Tips

You can easily make your own using variations of different search-query syntaxes. By far the most important elements to include are: a variable indicating a user prompt or query, instructions to the LLM, and a pointer to that user query within the new LLM instructions. If your initial query doesn't seem to work, note that simply adding or removing a search operator or character can be effective. The specific query guides that I used for this repo are below:

Original Azure OpenAI Examples

Normally, when unsuccessful, a prompt injection attempt receives a refusal like the following:

(screenshot of the refusal output)

Here are some examples of successful queries getting Azure OpenAI chat apps to leak mock PHI and PCI data (redacted in case of accidental likenesses to real persons or organizations):

(screenshots of successful data-leak responses)

The following example shows credit card information in the output: (screenshot)

New OpenAI Examples

Failed ChatGPT GPT-4o keylogger attempt: (screenshot)

Successful ChatGPT GPT-4o keylogger attempt using a Splunk-based query: (screenshot)

Failed Copilot keylogger attempt: (screenshot)

Successful Copilot keylogger attempt using a Splunk-based query: (screenshot)

THIS REPO IS FOR EDUCATIONAL PURPOSES ONLY!