
MIPRO optimizer updates for paper release #1169

Merged: 9 commits into main, Jun 21, 2024
Conversation

XenonMolecule
Collaborator

Updates the MIPRO optimizer to add program-aware proposal and minibatching, and refactors it to the dspy.propose pattern, in line with the paper release: https://arxiv.org/abs/2406.11695

Also updates the MIPRO HotPotQA notebook and creates a ScoNe notebook. Note that neither notebook has a functioning cache yet.

Old MIPRO is preserved temporarily in mipro_optimizer_deprecated.py.

@XenonMolecule
Collaborator Author

Note that this PR makes breaking changes to the MIPRO optimizer and is not backwards compatible.

Also switch all the old test cases back
@XenonMolecule
Collaborator Author

Instead of outright breaking old MIPRO, I instead deprecated it. TODO: Make test cases compatible with new MIPRO.

New MIPRO has been renamed to MIPROv2.

@rawwerks

can't wait! this is going to be so awesome.

@mikeedjones

mikeedjones commented Jun 18, 2024

This PR includes a reversion of #920.

I made some other suggestions in #918 on how the "extend gen" logic might affect programs and how that could be better documented

Comment on lines +103 to +108
new_kwargs = {
**kwargs,
"max_tokens": max_tokens,
"n": 1,
"temperature": 0.0,
}


setting n=1 here causes a regression on issues #914 and #749
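For context, the regression stems from the retry kwargs overriding whatever `n` the caller requested. A minimal sketch of a version that preserves it (the helper name `build_retry_kwargs` is hypothetical, not part of dspy):

```python
def build_retry_kwargs(kwargs, max_tokens):
    # Hypothetical helper (not dspy API): keep the caller's requested
    # number of completions on the retry path instead of unconditionally
    # forcing n=1.
    n = kwargs.get("n", 1)
    return {
        **kwargs,
        "max_tokens": max_tokens,
        "n": n,
        # Greedy decoding only makes sense for a single completion;
        # keep the caller's temperature when n > 1 so samples differ.
        "temperature": 0.0 if n == 1 else kwargs.get("temperature", 0.7),
    }

retry_kwargs = build_retry_kwargs({"n": 5, "temperature": 0.9}, max_tokens=100)
```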

@XenonMolecule
Collaborator Author

Hey Mike, thanks for looking at this! I reverted predict.py back because it was causing a few test programs I ran to crash, not to mention the MIPRO optimizer itself. This was specifically due to the max_depth assertion. We’ve been developing off an old branch of the repo so when I brought everything up to date it seemed like an issue in the predictor logic. I’m happy to remove my changes to predict.py, but do you have suggestions for handling the case where the LM doesn’t generate all fields? Is it a matter of changing the kwargs to predict to increase the max_depth?

@mikeedjones

mikeedjones commented Jun 19, 2024

My guess is the crash would be caused by the LM skipping a field in one of the candidate generations and that skip not being picked up by the extend generation logic. Then, because the LM has moved on to another field in the signature, it would be very unlikely to go back and fill the earlier field, regardless of how many levels of recursion are allowed.

I'll do some investigation as to why - the extend generation logic should roll back to the first skipped field, so if the above is the behaviour, then it's a bug in #920's implementation.

As to why that is never seen in the old logic - I'm stumped. I tried to replicate as closely as possible the logic in there. Something else to figure out!

EDIT: Added some tests in #1172 to try and define how the extend generation logic ought to work - neither main's nor this branch's implementation passes - but I think those tests cover how I expected the logic to behave.

Looking into it a little more in #1172 (comment)
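For reference, the rollback rule described above can be sketched as a tiny selector (illustrative only, not the dspy implementation):

```python
def first_missing_field(field_names, completion):
    # Under the behaviour described above, extension should resume from
    # the FIRST field the LM skipped, not the last field it produced.
    for name in field_names:
        if not completion.get(name):
            return name
    return None  # nothing skipped; no extension needed

fields = ["rationale", "answer", "confidence"]
partial = {"rationale": "because...", "answer": "", "confidence": "high"}
resume_from = first_missing_field(fields, partial)
```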

Comment on lines 173 to 183
# if self.program_aware: # Use full program demo
# for pred_key in demo_candidates.keys(): # Go through each predictor
# for example in demo_candidates[pred_key][demo_set_i]:
# if "augmented" in example.keys():
# fields_to_use = get_signature(program.predictors()[pred_key]).fields
# example_string = create_example_string(fields_to_use, example)
# task_demos += f"{example_string}\n"
# curr_demos_num+=1
# if curr_demos_num>=max_demos:
# break
# else:


should this be left in?

Comment on lines 48 to 67
# def parse_list_of_instructions(instruction_string):
# # Convert the string representation of a list to an actual list
# try:
# instructions = json.loads(instruction_string)
# except json.JSONDecodeError as e:
# print(f"Error decoding JSON: {e}")
# return []

# return instructions
def parse_list_of_instructions(instruction_string):
# Try to convert the string representation of a list to an actual list using JSON
try:
instructions = json.loads(instruction_string)
return instructions
except json.JSONDecodeError:
pass

# If JSON decoding fails, extract strings within quotes
instructions = re.findall(r'"([^"]*)"', instruction_string)
return instructions


which of these is correct?
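For what it's worth, the kept implementation subsumes the commented-out one: it tries JSON first and only falls back to the quote-extracting regex when decoding fails. A self-contained sketch of that behaviour:

```python
import json
import re

def parse_list_of_instructions(instruction_string):
    # Mirrors the kept implementation above: JSON first, regex fallback.
    try:
        return json.loads(instruction_string)
    except json.JSONDecodeError:
        pass
    # If JSON decoding fails, extract any double-quoted strings instead.
    return re.findall(r'"([^"]*)"', instruction_string)

# Well-formed JSON list: parsed directly.
well_formed = parse_list_of_instructions('["step one", "step two"]')
# Chatty LM output that isn't valid JSON: the fallback still recovers
# the quoted instructions.
chatty = parse_list_of_instructions('Sure! Here you go: "step one", "step two"')
```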

Comment on lines 69 to 82
# def get_program_instruction_set_string(program):
# instruction_set_dict = {}
# for i, pred in enumerate(program.predictors()):
# pred_instructions = get_signature(pred).instructions
# instruction_set_dict[f"Module {i} Instruction"] = pred_instructions
# return '\n'.join(f"{key}: {value}" for key, value in instruction_set_dict.items())

def get_program_instruction_set_string(program):
instruction_list = []
for _, pred in enumerate(program.predictors()):
pred_instructions = get_signature(pred).instructions
instruction_list.append(f"\"{pred_instructions}\"")
# Joining the list into a single string that looks like a list
return f"[{', '.join(instruction_list)}]"


can one be removed?
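As context for why the kept version builds a list-shaped string: as long as no instruction itself contains a double quote, the output is valid JSON, so `parse_list_of_instructions` can round-trip it. A standalone sketch (the real function pulls instructions from `program.predictors()`):

```python
import json

def get_program_instruction_set_string(instructions):
    # Standalone mirror of the kept implementation: wrap each
    # instruction in double quotes and join into a list-shaped string.
    # Caveat: breaks if an instruction contains a double quote.
    return "[" + ", ".join(f'"{s}"' for s in instructions) + "]"

s = get_program_instruction_set_string(["Answer concisely.", "Cite sources."])
round_tripped = json.loads(s)
```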

break

# hard_examples.append(example)
#breakpoint()


remove debugging statements?

print("Bootstrapping complete!")
break

# hard_examples.append(example)


Hard examples is never appended to?


def _bootstrap_one_example(self, example, round_idx=0):
name2traces = self.name2traces
teacher = self.teacher #.deepcopy()


should this be deepcopied or not?
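The tradeoff being asked about can be shown in miniature (the `Teacher` class here is a stand-in, not dspy's): deepcopying isolates per-example mutations, while the aliased version lets bootstrap traces leak into the shared teacher.

```python
import copy

class Teacher:
    # Minimal stand-in for the teacher program; illustrative only.
    def __init__(self):
        self.demos = []

shared = Teacher()

# With deepcopy: per-example mutations stay isolated.
isolated = copy.deepcopy(shared)
isolated.demos.append("trace")

# Without deepcopy: the alias mutates shared state across examples.
aliased = shared
aliased.demos.append("trace")
```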

if total_successes > (total_attempts - min_failed_attempts):
break
# if this example failed the right number of times and is therefore 'hard', let's add it as one of our demo examples
# print(f"Total Successes: {total_successes} | Total Failures: {total_attempts-total_successes}")


I thought dspy had moved from print statements to logging?
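A logging-based replacement for the commented-out print might look like this (the logger name is hypothetical, not the one dspy actually uses):

```python
import logging

logger = logging.getLogger("dspy.teleprompt.bootstrap")

def report_progress(total_successes, total_attempts):
    # Lazy %-style formatting: the message is only rendered if a
    # handler at DEBUG level is attached.
    logger.debug(
        "Total Successes: %d | Total Failures: %d",
        total_successes,
        total_attempts - total_successes,
    )
```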

self.trainset = trainset
self.valset = valset

raise Exception("NOTE THAT THIS HAS BEEN EXPENSIVE TO RUN. CHECK IN WITH KRISTAOO BEFORE DOING A FULL RUN.")


KRISTA00 is gonna be busy!

# Joining all the field strings
return '\n'.join(output)

# def create_predictor_level_history_string(base_program, predictor_i, trial_logs, top_n):


can this be removed?


return instruction_set_history_string

# def parse_list_of_instructions(instruction_string):


can this be removed?

import sys

import numpy as np
from IPython.core.magics.code import extract_symbols
@mikeedjones commented Jun 19, 2024

Is IPython a dependency? Does it need to be added to requirements.txt?

Elsewhere, where IPython is used, it's wrapped in:

try:
    from IPython.display import display as ipython_display
except ImportError:
    ipython_display = print
from dsp.utils import EM

@XenonMolecule
Collaborator Author

XenonMolecule commented Jun 19, 2024

Thanks Mike, I addressed your comments and cleaned up the code left over from the dev setting. In particular, I also removed hard bootstrapping from this PR, since we aren't using it for this particular optimizer.

If you are looking for a specific test setting where the propose.py was failing, you can check out this notebook: 'examples/qa/hotpot/hotpotqa_with_MIPRO.ipynb'. I'll take a closer look at #1172.

@mikeedjones

Thanks! Based on what I saw with the tests I think it should be possible to inspect the cache of that notebook to see if the implemented extend gen logic is exhibiting the issue I saw in #1172.

That said, the extend generation logic from #920 doesn't pass the tests in #1172 either - and seems to be causing issues like #1108. It would be good to get the view of the author of the extend gen logic on its expected behaviour. The logic was in the initial commit, so probably @okhat?

Comment on lines 50 to 52
* prompt_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
* task_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
* teacher_settings: The settings used for the teacher model. When unspecified, defaults to the settings set in settings (i.e., dspy.settings.configure(lm=task_model)).
@mikeedjones commented Jun 20, 2024

It could be clearer which of prompt_model, task_model, and teacher_settings (which can include a model, and looks to be used only for generating few-shot demo sets?) drives which part of the optimisation. Note that the task_model description currently duplicates prompt_model's.

@XenonMolecule
Collaborator Author

Edited! Thanks for the suggestion!

@XenonMolecule XenonMolecule merged commit 015c649 into main Jun 21, 2024
4 checks passed
@XenonMolecule XenonMolecule deleted the mipro_v2 branch June 21, 2024 05:16
@mikeedjones

@XenonMolecule - did you manage to track down whether or not the reverted extend generation logic was leading to the prediction from the language model ending up in the wrong OutputField? Would be great to understand what I missed in #1172 Thank you!
