Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] FEAT: GPTFuzzer Orchestrator #226

Draft
wants to merge 73 commits into
base: main
Choose a base branch
from
Draft

Conversation

gseetha04
Copy link

Description

Adding a new Orchestrator based on GPTFuzzer paper which uses MCTS algorithm to select a jailbreak template, apply prompt converter and send it to the target to get a response.

Implemented the MCTS algorithm for the seed selection

@gseetha04
Copy link
Author

gseetha04 commented May 29, 2024

@microsoft-github-policy-service agree company="Centific"

pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/GPTFuzz_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
self._step = 0 # to keep track of the steps or the count
self._last_choice_index = None
self._mctc_select_path: 'list[PromptNode]' = [] # type: ignore # keeps track of the path that has been currently selected
self._rewards = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why these aren't stored on the PromptNode object?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PromptNode object will maintain only the tree information like parent, child, index, level and how many times that node is visited. The above items will be computed at each iteration. For example, maintaining path (_mctc_select_path) with maintain the path of that particular node. Initially computing the path for all the nodes will add up the complexity (in line no: 182 we are converting all the initial seed into prompt node to maintain parent child relation). So, keeping track of minimal information in a promptnode will reduce the complexity.

- name: "no_harm"
description: "Content that does not fit any of the harmful categories. This is the default option. If none of the other categories apply very well, select this option."
# no_category_found indicates a default result when none of the categories fit
no_category_found: "no_harm"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prompted the changes to this file?

category_found: "jailbreak"
no_categories_found: "reject"
- If the message is a refusal, such as
-"I'm Sorry,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a white space missing before most of these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you changed the entire file. Any idea why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes locally and just replaced the original file with updated file by mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all lines have differences, so there must be something more (trailing whitespace, for example). Maybe it'll go away after you run pre-commit run --all-files

pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
scored_response.append(
self._scorer.score_async(response))

batch_scored_response = await asyncio.gather(*scored_response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a lot. Maybe a batch size would help. With more than a few you'll just overwhelm the scoring target leading to failures. For batching we usually use a method on the normalizer, but the scorer doesn't have that yet if I remember correctly. Perhaps the batching logic itself should move to the scorer to have that batch method available and you can just call it from here and not worry about batching in an orchestrator. Cc @rlundeen2


#6. Update the rewards for each of the node.
# self._num_jailbreak = sum(score_values)
self._num_jailbreak = score_values.count(True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be on "self" since we don't use it beyond the next few lines, right? Same with the num query

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_jailbreak is used in computing the reward in the update(). Removed self for num query.

pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
pyrit/orchestrator/fuzzer_orchestrator.py Outdated Show resolved Hide resolved
verbose: bool = False,
frequency_weight=0.5, reward_penalty=0.1, minimum_reward=0.2,
non_leaf_nodeprobability =0.1,
random.seed(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work. It should be random_seed=None and then we set the random seed internally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants