MIPRO optimizer updates for paper release #1169
Conversation
Note that this PR makes breaking changes to the MIPRO optimizer and is not backwards compatible.
Also switch all the old test cases back
Instead of outright breaking the old MIPRO, I deprecated it. TODO: Make test cases compatible with new MIPRO. New MIPRO has been renamed to MIPROv2.
can't wait! this is going to be so awesome.
new_kwargs = {
    **kwargs,
    "max_tokens": max_tokens,
    "n": 1,
    "temperature": 0.0,
}
Hey Mike, thanks for looking at this! I reverted predict.py back because it was causing a few test programs I ran to crash, not to mention the MIPRO optimizer itself. This was specifically due to the max_depth assertion. We've been developing off an old branch of the repo, so when I brought everything up to date it surfaced as an issue in the predictor logic. I'm happy to remove my changes to predict.py, but do you have suggestions for handling the case where the LM doesn't generate all fields? Is it a matter of changing the kwargs to predict to increase the max_depth?
My guess is the crash would be caused by the LM skipping a field in one of the candidate generations and that skip not being picked up by the extend-generation logic. Then, because the LM has moved on to another field in the signature, it would be very unlikely to go back and fill the earlier field, independently of how many levels of recursion. Will do some investigation as to why - the extend-generation logic should roll back to the first skipped field, so if the above is the behaviour, then it's a bug in #920's implementation. As to why that is never seen in the old logic - I'm stumped. I tried to replicate the logic in there as closely as possible. Something else to figure out! EDIT: Added some tests in #1172 to try and define how the extend-generation logic ought to work - neither main nor this branch's implementation passes - but I think those tests cover how I expected the logic to behave. Looking into it a little more in #1172 (comment)
dspy/propose/grounded_proposer.py
Outdated
# if self.program_aware: # Use full program demo
#     for pred_key in demo_candidates.keys(): # Go through each predictor
#         for example in demo_candidates[pred_key][demo_set_i]:
#             if "augmented" in example.keys():
#                 fields_to_use = get_signature(program.predictors()[pred_key]).fields
#                 example_string = create_example_string(fields_to_use, example)
#                 task_demos += f"{example_string}\n"
#                 curr_demos_num += 1
#                 if curr_demos_num >= max_demos:
#                     break
# else:
should this be left in?
dspy/propose/utils.py
Outdated
# def parse_list_of_instructions(instruction_string):
#     # Convert the string representation of a list to an actual list
#     try:
#         instructions = json.loads(instruction_string)
#     except json.JSONDecodeError as e:
#         print(f"Error decoding JSON: {e}")
#         return []
#     return instructions
def parse_list_of_instructions(instruction_string):
    # Try to convert the string representation of a list to an actual list using JSON
    try:
        instructions = json.loads(instruction_string)
        return instructions
    except json.JSONDecodeError:
        pass

    # If JSON decoding fails, extract strings within quotes
    instructions = re.findall(r'"([^"]*)"', instruction_string)
    return instructions
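For context on the two code paths, here is a quick standalone sanity check (the function body is copied from the uncommented version in the diff above):

```python
import json
import re

def parse_list_of_instructions(instruction_string):
    # First try to parse the string as a JSON list.
    try:
        return json.loads(instruction_string)
    except json.JSONDecodeError:
        pass
    # Fall back to extracting double-quoted substrings with a regex.
    return re.findall(r'"([^"]*)"', instruction_string)

# A valid JSON list is parsed directly.
print(parse_list_of_instructions('["Answer concisely.", "Cite sources."]'))
# A non-JSON string hits the regex fallback, which picks out the quoted parts.
print(parse_list_of_instructions('Instructions: "Answer concisely." and "Cite sources."'))
```

Both calls return `['Answer concisely.', 'Cite sources.']`, so the fallback recovers the same list when the LM's output isn't strictly valid JSON.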
which of these is correct?
dspy/propose/utils.py
Outdated
# def get_program_instruction_set_string(program):
#     instruction_set_dict = {}
#     for i, pred in enumerate(program.predictors()):
#         pred_instructions = get_signature(pred).instructions
#         instruction_set_dict[f"Module {i} Instruction"] = pred_instructions
#     return '\n'.join(f"{key}: {value}" for key, value in instruction_set_dict.items())

def get_program_instruction_set_string(program):
    instruction_list = []
    for _, pred in enumerate(program.predictors()):
        pred_instructions = get_signature(pred).instructions
        instruction_list.append(f"\"{pred_instructions}\"")
    # Joining the list into a single string that looks like a list
    return f"[{', '.join(instruction_list)}]"
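One point worth noting about the new version: its output round-trips through json.loads as long as no instruction contains an embedded double quote. A standalone sketch (with plain strings standing in for predictors, since get_signature isn't available here):

```python
import json

def join_instructions(instructions):
    # Same joining logic as the new function above, but over plain strings.
    instruction_list = [f"\"{text}\"" for text in instructions]
    return f"[{', '.join(instruction_list)}]"

s = join_instructions(["Answer the question.", "Think step by step."])
print(s)              # ["Answer the question.", "Think step by step."]
print(json.loads(s))  # parses back into the original Python list
# Caveat: an instruction containing a double quote would break this naive
# quoting; json.dumps(instructions) would be the escape-safe alternative.
```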
can one be removed?
dspy/teleprompt/hard_bootstrap.py
Outdated
break

# hard_examples.append(example)
# breakpoint()
remove debugging statements?
dspy/teleprompt/hard_bootstrap.py
Outdated
print("Bootstrapping complete!") | ||
break | ||
|
||
# hard_examples.append(example) |
hard_examples is never appended to?
dspy/teleprompt/hard_bootstrap.py
Outdated
def _bootstrap_one_example(self, example, round_idx=0):
    name2traces = self.name2traces
    teacher = self.teacher  # .deepcopy()
should this be deepcopied or not?
dspy/teleprompt/hard_bootstrap.py
Outdated
if total_successes > (total_attempts - min_failed_attempts):
    break
# if this example failed the right number of times and is therefore 'hard', let's add it as one of our demo examples
# print(f"Total Successes: {total_successes} | Total Failures: {total_attempts-total_successes}")
I thought dspy had moved from print statements to logging?
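For what it's worth, the break condition in the diff above has a simple reading: once the success count exceeds total_attempts minus min_failed_attempts, the example can no longer rack up min_failed_attempts failures, so it can't qualify as 'hard' and bootstrapping it stops early. A small illustrative sketch (names taken from the quoted code; can_still_be_hard is a hypothetical helper, not the actual hard_bootstrap.py logic):

```python
# Negation of the diff's break condition: True while the example could still
# accumulate min_failed_attempts failures across its remaining attempts.
def can_still_be_hard(total_successes, total_attempts, min_failed_attempts):
    return total_successes <= (total_attempts - min_failed_attempts)

# With 5 attempts and a threshold of 2 required failures:
print(can_still_be_hard(3, 5, 2))  # True: 2 failures so far, exactly at the threshold
print(can_still_be_hard(4, 5, 2))  # False: at most 1 failure possible, so the example is too easy
```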
dspy/teleprompt/hard_bootstrap.py
Outdated
self.trainset = trainset
self.valset = valset

raise Exception("NOTE THAT THIS HAS BEEN EXPENSIVE TO RUN. CHECK IN WITH KRISTAOO BEFORE DOING A FULL RUN.")
KRISTA00 is gonna be busy!
dspy/propose/utils.py
Outdated
# Joining all the field strings
return '\n'.join(output)

# def create_predictor_level_history_string(base_program, predictor_i, trial_logs, top_n):
can this be removed?
dspy/propose/utils.py
Outdated
return instruction_set_history_string

# def parse_list_of_instructions(instruction_string):
can this be removed?
dspy/teleprompt/utils.py
Outdated
import sys

import numpy as np
from IPython.core.magics.code import extract_symbols
IPython is a dependency? does it need to be added to requirements.txt?
Elsewhere, where IPython is used, it's wrapped in:
try:
from IPython.display import display as ipython_display
except ImportError:
ipython_display = print
from dsp.utils import EM
Thanks Mike, I addressed your comments and cleaned up the code from the dev setting! In particular, I also removed hard bootstrapping from this PR as we aren't using it for this particular optimizer. If you are looking for a specific test setting where the
Thanks! Based on what I saw with the tests, I think it should be possible to inspect the cache of that notebook to see if the implemented extend-gen logic is exhibiting the issue I saw in #1172. That said, the extend-generation logic from #920 doesn't pass the tests in #1172 either - and seems to be causing issues like #1108. Would be good to get the view from the author of the extend-gen logic on its expected behaviour. The logic was in the initial commit so probably @okhat?
* prompt_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
* task_model: The model used for running the task program itself. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
* teacher_settings: The settings used for the teacher model. When unspecified, defaults to the settings set in settings (i.e., dspy.settings.configure(lm=task_model)).
It could be clearer which of prompt_model, task_model, and teacher_settings (which can include a model, and looks to be only for generating few-shot demo sets?) does which part of the optimisation?
Edited! Thanks for the suggestion!
@XenonMolecule - did you manage to track down whether or not the reverted extend-generation logic was leading to the prediction from the language model ending up in the wrong OutputField? Would be great to understand what I missed in #1172. Thank you!
Updates the MIPRO optimizer to add in program aware proposal, minibatching, and refactor to dspy.propose pattern in line with the paper release: https://arxiv.org/abs/2406.11695
Also updates the MIPRO HotPotQA notebook and creates a ScoNe notebook. Note that neither notebook has a functioning cache yet.
Old MIPRO is preserved temporarily in mipro_optimizer_deprecated.py.