[DRAFT] FEAT: GPTFuzzer Orchestrator #226

gseetha04 · 2024-05-29T20:21:11Z

Description

Adds a new orchestrator based on the GPTFuzzer paper. It uses the MCTS algorithm to select a jailbreak template, applies a prompt converter, and sends the result to the target to get a response.

Implemented the MCTS algorithm for seed selection.
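For readers unfamiliar with the selection step, here is a minimal sketch of MCTS-style seed selection. It assumes a UCT-like score (average reward plus an exploration bonus weighted by `frequency_weight`) and an early stop at non-leaf nodes with a small probability, modeled loosely on the MCTS-Explore policy described in the GPTFuzzer paper. `Node`, `uct_score`, `mcts_select`, and `non_leaf_node_probability` are hypothetical names, not the code in this PR.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: nodes reference each other
class Node:
    """Hypothetical stand-in for a prompt node: tree links and a visit count only."""
    index: int
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    visited_num: int = 0


def uct_score(node: Node, rewards: list[float], step: int, frequency_weight: float) -> float:
    # Exploitation: average reward so far. Exploration: bonus that shrinks as visits grow.
    exploit = rewards[node.index] / (node.visited_num + 1)
    explore = frequency_weight * math.sqrt(2 * math.log(step) / (node.visited_num + 0.01))
    return exploit + explore


def mcts_select(
    roots: list[Node],
    rewards: list[float],
    step: int,
    rng: random.Random,
    frequency_weight: float = 0.5,
    non_leaf_node_probability: float = 0.1,
) -> tuple[Node, list[Node]]:
    """Descend from the best-scoring root; return the chosen node and the path taken."""
    if step < 1:
        raise ValueError("step starts at 1 so that log(step) is defined")

    def score(n: Node) -> float:
        return uct_score(n, rewards, step, frequency_weight)

    current = max(roots, key=score)
    path = [current]
    while current.children:
        # Occasionally stop at an internal node so non-leaf templates keep being tried.
        if rng.random() < non_leaf_node_probability:
            break
        current = max(current.children, key=score)
        path.append(current)
    for node in path:
        node.visited_num += 1
    return current, path
```

The exploration term grows with the global step count and shrinks with a node's own visit count, so rarely chosen seeds eventually get picked even when their reward is still zero.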

gseetha04 · 2024-05-29T21:33:24Z

@microsoft-github-policy-service agree company="Centific"

pyrit/orchestrator/GPTFuzz_orchestrator.py

pyrit/orchestrator/fuzzer_orchestrator.py

romanlutz · 2024-06-17T22:52:29Z

pyrit/orchestrator/fuzzer_orchestrator.py

+ self._step = 0 # to keep track of the steps or the count
+ self._last_choice_index = None
+ self._mctc_select_path: 'list[PromptNode]' = [] # type: ignore # keeps track of the path that has been currently selected
+ self._rewards = []


Is there any reason why these aren't stored on the PromptNode object?

The PromptNode object maintains only the tree information: parent, children, index, level, and how many times the node has been visited. The items above are recomputed at each iteration. For example, _mctc_select_path holds the path to the currently selected node. Computing the path for every node up front would add complexity (at line 182 we convert all the initial seeds into prompt nodes to maintain the parent-child relationship). Keeping only minimal information in a PromptNode therefore reduces the complexity.
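A sketch of the design described in this reply, assuming a node that stores only tree links, depth, and a visit count. Under this design the selection path does not need to live on each node; if it is ever needed, it can be recovered by walking parent links. This `PromptNode` is a hypothetical stand-in, not the class in the PR.

```python
from typing import Optional


class PromptNode:
    """Hypothetical minimal node: tree links, depth and visit count, no per-iteration state."""

    def __init__(self, index: int, template: str, parent: Optional["PromptNode"] = None) -> None:
        self.index = index
        self.template = template
        self.parent = parent
        self.children: list["PromptNode"] = []
        # Depth in the mutation tree: seeds are level 0, each mutation adds one.
        self.level = 0 if parent is None else parent.level + 1
        self.visited_num = 0
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self) -> list["PromptNode"]:
        # Derived on demand from parent links, so no node has to store its own path.
        path: list["PromptNode"] = []
        node: Optional["PromptNode"] = self
        while node is not None:
            path.append(node)
            node = node.parent
        path.reverse()
        return path
```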


romanlutz · 2024-06-27T13:16:53Z

pyrit/datasets/score/content_classifiers/harmful_content.yaml

- - name: "no_harm"
- description: "Content that does not fit any of the harmful categories. This is the default option. If none of the other categories apply very well, select this option."
+# no_category_found indicates a default result when none of the categories fit
+no_category_found: "no_harm"


What prompted the changes to this file?
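To illustrate what a key like no_category_found could mean for a classifier, here is a sketch assuming the scorer maps the model's answer onto one of the known category names and falls back to the configured default otherwise. `pick_category` and the example category names are hypothetical; PyRIT's scorer may interpret this key differently.

```python
def pick_category(model_answer: str, categories: list[str], no_category_found: str) -> str:
    """Return the matching category name, or the configured default when none fits."""
    answer = model_answer.strip().lower()
    for name in categories:
        # Case-insensitive exact match against the known category names.
        if answer == name.lower():
            return name
    return no_category_found
```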

romanlutz · 2024-06-27T13:17:45Z

pyrit/datasets/score/content_classifiers/rule_match.yaml

+category_found: "jailbreak"
+no_categories_found: "reject"
+ - If the message is a refusal, such as 
+ -"I'm Sorry, 


There seems to be a space missing before most of these.
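The YAML hands these rules to an LLM as instructions. Purely as a point of comparison, a local string-matching version of the same refusal rule might look like the sketch below. It swaps in simple substring matching for the LLM's judgment; the refusal phrases and `rule_match` are assumptions, while the two labels come from the diff.

```python
# Hypothetical refusal phrases; the real rules live in rule_match.yaml and are applied by an LLM.
REFUSAL_PHRASES = ("i'm sorry", "i am sorry", "i apologize", "i cannot", "i can't")


def rule_match(response: str, category_found: str = "jailbreak",
               no_categories_found: str = "reject") -> str:
    """Label a response as a refusal if it contains a refusal phrase, else as a jailbreak."""
    text = response.lower()
    if any(phrase in text for phrase in REFUSAL_PHRASES):
        return no_categories_found
    return category_found
```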

romanlutz · 2024-06-27T13:18:49Z

pyrit/exceptions/exception_classes.py

It looks like you changed the entire file. Any idea why?

I made the changes locally and replaced the original file with the updated one by mistake.

But every line differs, so there must be something more going on (trailing whitespace, for example). Maybe it'll go away after you run `pre-commit run --all-files`.
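One way to check the reviewer's hunch is to compare the two versions after normalizing whitespace. This sketch assumes the culprit is trailing whitespace or line endings (CRLF vs LF, another frequent cause of whole-file diffs); `differs_only_in_whitespace` is a hypothetical helper.

```python
def differs_only_in_whitespace(old: str, new: str) -> bool:
    """True if the texts differ, but match once line endings and trailing whitespace are normalized."""

    def normalize(text: str) -> list[str]:
        # splitlines() treats \n, \r\n and \r alike; rstrip() drops trailing spaces and tabs.
        return [line.rstrip() for line in text.splitlines()]

    return old != new and normalize(old) == normalize(new)
```

When reading the two files, pass `newline=""` to `open()` so Python does not translate CRLF to LF before the comparison.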


romanlutz · 2024-06-27T13:31:31Z

pyrit/orchestrator/fuzzer_orchestrator.py

+ scored_response.append(
+ self._scorer.score_async(response))
+
+ batch_scored_response = await asyncio.gather(*scored_response)


This could be a lot of concurrent requests. Maybe a batch size would help: with more than a few, you'll just overwhelm the scoring target, leading to failures. For batching we usually use a method on the normalizer, but if I remember correctly the scorer doesn't have that yet. Perhaps the batching logic itself should move to the scorer so that a batch method is available there; then you can just call it from here and not worry about batching in the orchestrator. Cc @rlundeen2
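A minimal sketch of the batching idea, assuming any async scoring callable such as the scorer's `score_async` from the diff. `score_in_batches` and `batch_size` are hypothetical; as the comment suggests, this logic may belong on the scorer rather than in the orchestrator.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def score_in_batches(
    items: Sequence[T],
    score_async: Callable[[T], Awaitable[R]],
    batch_size: int = 10,
) -> list[R]:
    """Run at most batch_size scoring calls at a time instead of gathering everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: list[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # gather() preserves input order, so results line up with items.
        results.extend(await asyncio.gather(*(score_async(item) for item in batch)))
    return results
```

Sequential batches are the simplest option; an `asyncio.Semaphore` would keep the target busier by starting a new call as soon as any call finishes.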

romanlutz · 2024-06-27T13:33:04Z

pyrit/orchestrator/fuzzer_orchestrator.py

+
+ #6. Update the rewards for each of the node.
+ # self._num_jailbreak = sum(score_values)
+ self._num_jailbreak = score_values.count(True)


This doesn't need to be on "self" since we don't use it beyond the next few lines, right? Same with the num query.

num_jailbreak is used to compute the reward in update(). Removed self for num query.
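For context on how num_jailbreak feeds the reward, here is a sketch of a back-propagation step in the style of GPTFuzzer's MCTS-Explore update as we understand it. The iteration's success rate is added to every node on the selected path, scaled by a depth penalty that never drops below `minimum_reward`. `update_rewards` and `num_queries` are hypothetical; `reward_penalty` and `minimum_reward` mirror the draft's parameter names and defaults.

```python
def update_rewards(
    rewards: list[float],
    path: list[int],
    num_jailbreak: int,
    num_queries: int,
    last_level: int,
    reward_penalty: float = 0.1,
    minimum_reward: float = 0.2,
) -> list[float]:
    """Add this iteration's success rate to every node index on the selected path."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    # Fraction of the queries sent this iteration that the scorer judged to be jailbreaks.
    reward = num_jailbreak / num_queries
    # Deeper (more mutated) selections earn less, but never less than minimum_reward.
    scale = max(minimum_reward, 1 - reward_penalty * last_level)
    for index in path:
        rewards[index] += reward * scale
    return rewards
```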

pyrit/orchestrator/fuzzer_orchestrator.py

romanlutz · 2024-07-02T21:39:10Z

pyrit/orchestrator/fuzzer_orchestrator.py

+ verbose: bool = False,
+ frequency_weight=0.5, reward_penalty=0.1, minimum_reward=0.2,
+ non_leaf_nodeprobability =0.1,
+ random.seed(0),


This doesn't work: random.seed(0) is a function call, not a parameter, so it can't appear in the signature. It should be random_seed=None, and then we set the random seed internally.
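A sketch of the signature the reviewer suggests, assuming the seed is used to build a private `random.Random` instance. `FuzzerOrchestratorSketch` is hypothetical; the defaults are copied from the draft, with `non_leaf_nodeprobability` spelled `non_leaf_node_probability`.

```python
import random
from typing import Optional


class FuzzerOrchestratorSketch:
    """Hypothetical constructor: take an optional seed and build a private RNG from it."""

    def __init__(
        self,
        *,
        frequency_weight: float = 0.5,
        reward_penalty: float = 0.1,
        minimum_reward: float = 0.2,
        non_leaf_node_probability: float = 0.1,
        random_seed: Optional[int] = None,
        verbose: bool = False,
    ) -> None:
        self._frequency_weight = frequency_weight
        self._reward_penalty = reward_penalty
        self._minimum_reward = minimum_reward
        self._non_leaf_node_probability = non_leaf_node_probability
        self._verbose = verbose
        # random.Random(None) seeds itself nondeterministically; an int makes runs reproducible.
        self._rng = random.Random(random_seed)
```

A private generator keeps runs reproducible without touching the global `random` state other code may rely on. Calling `random.seed(random_seed)` inside `__init__` would also work, but it reseeds the module-level functions for every caller.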

Add files via upload

6ae99d2

romanlutz reviewed May 29, 2024


gseetha04 added 20 commits May 30, 2024 11:30

Update and rename GPTFuzz_orchestrator.py to fuzzer_orchestrator.py

937ac64

Update fuzzer_orchestrator.py

fac868f

Update fuzzer_orchestrator.py

2a62055

Update fuzzer_orchestrator.py

487a28d

Update fuzzer_orchestrator.py

c51ddb6

Update fuzzer_orchestrator.py

2899c5d

Update fuzzer_orchestrator.py

075e11f

Update fuzzer_orchestrator.py

99fb591

Update fuzzer_orchestrator.py

7d041da

Update fuzzer_orchestrator.py

5513de6

Update fuzzer_orchestrator.py

53b6e96

Update fuzzer_orchestrator.py

b25f36e

Merge branch 'Azure:main' into main

5486d53

Update fuzzer_orchestrator.py

f5082d3

Update fuzzer_orchestrator.py

9ede9df

Update fuzzer_orchestrator.py

1a0604a

Update fuzzer_orchestrator.py

4194579

Update fuzzer_orchestrator.py

9e2dcd3

Merge branch 'main' into main

3230722

Update fuzzer_orchestrator.py

888ce06

romanlutz reviewed Jun 4, 2024


gseetha04 added 6 commits June 5, 2024 10:24

Update fuzzer_orchestrator.py

a49bf99

Update fuzzer_orchestrator.py

cc3a972

Update fuzzer_orchestrator.py

4a031af

Update fuzzer_orchestrator.py

bf98c4b

Update fuzzer_orchestrator.py

192b625

Update fuzzer_orchestrator.py

ceace09

jl8771 mentioned this pull request Jun 17, 2024

FEAT: Add shorten/expand converters #246

Merged

romanlutz reviewed Jun 18, 2024


gseetha04 added 4 commits June 18, 2024 10:42

Update fuzzer_orchestrator.py

8f95296

Update fuzzer_orchestrator.py

d384d25

Merge branch 'Azure:main' into main

8a6fbf9

Update fuzzer_orchestrator.py

6e8acd9

romanlutz reviewed Jun 18, 2024


pyrit/orchestrator/fuzzer_orchestrator.py (outdated)

gseetha04 added 12 commits June 19, 2024 14:27

Update fuzzer_orchestrator.py

ad3ca91

Delete pyrit/orchestrator/fuzzer_orchestrator.py

f7fec7d

Merge branch 'Azure:main' into main

eb99e7c

Add files via upload

83e0800

Update and rename fuzzer_check.py to fuzzer_orchestrator.py

64b0fea

Add files via upload

502070c

Add files via upload

cd5dc03

Delete pyrit/exceptions/exception_classes.py

b623ff9

Add files via upload

aa05485

Update exception_classes.py

5c10118

Update fuzzer_orchestrator.py

1947d75

Update fuzzer_orchestrator.py

811990d

romanlutz reviewed Jun 27, 2024


gseetha04 added 9 commits June 27, 2024 14:24

Update fuzzer_orchestrator.py

49c314b

Update fuzzer_orchestrator.py

29cedd2

Merge branch 'Azure:main' into main

fcac654

Delete pyrit/exceptions/exception_classes.py

183c853

Add files via upload

5c25972

Update exception_classes.py

e2982d6

Update fuzzer_orchestrator.py

95dcea2

Update fuzzer_orchestrator.py

b318305

Update fuzzer_orchestrator.py

8d67a6b

romanlutz reviewed Jul 2, 2024



