extend generation logic tests #1172

mikeedjones · 2024-06-19T07:24:46Z

Added tests to better define the behaviour of the extend generation logic in dsp/primitives/predict.py.

They currently don't pass with either version of the extend generation logic! I'm not sure what the intended behaviour should be. Could @XenonMolecule, @okhat or @arnavsinghvi11 please explain what these tests should look like?

Cheers!

mikeedjones · 2024-06-19T07:53:29Z

tests/predict/test_predict.py

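```python
# Imports assumed from the top of tests/predict/test_predict.py;
# SandwichIdea is a pytest fixture defined elsewhere in that file.
import dspy
from dspy import Predict
from dspy.utils.dummies import DummyLM


def test_extend_generation(SandwichIdea):
    # Two canned completions: the first stops after "Fat", the second
    # is expected to continue from "Garnish".
    lm = DummyLM(
        [
            " whole wheat\n\nProtein: turkey\n\nFat: avocado",
            " tomato\n\nSauce: mustard",
        ]
    )
    dspy.settings.configure(lm=lm)

    prediction = Predict(SandwichIdea)(meal="lunch", dietary_requiements="N/A")
    assert prediction.bread == "whole wheat"
    assert prediction.protein == "turkey"
    assert prediction.fat == "avocado"
    assert prediction.garnish == "tomato"
    assert prediction.sauce == "mustard"
```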


@okhat - I'm not sure about the other tests I've added, but I was definitely expecting this test to pass. However:

```
SandwichIdea = SandwichIdea(meal, dietary_requiements -> bread, protein, fat, garnish, sauce
    instructions='Based on the meal and ...notation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Sauce:', 'desc': '${sauce}'})
)

    def test_extend_generation(SandwichIdea):
        lm = DummyLM(
            [
                " whole wheat\n\nProtein: turkey\n\nFat: avocado",
                " tomato\n\nSauce: mustard",
            ]
        )
        dspy.settings.configure(lm=lm)

        prediction = Predict(SandwichIdea)(meal="lunch", dietary_requiements="N/A")
        assert prediction.bread == "whole wheat"
        assert prediction.protein == "turkey"
        assert prediction.fat == "avocado"
>       assert prediction.garnish == "tomato"
E       AssertionError: assert '' == 'tomato'
E         - tomato
```

I think that when "" is assigned as the value of the skipped field in

dspy/dsp/primitives/predict.py, line 98 in 34725d0:

completion[field_names[last_field_idx]] = ""

it is interpreted as that field's final value by the template's extract method, because its "has this field been completed?" check compares the field's value to None:

dspy/dsp/templates/template_v2.py, line 156 in 34725d0:

if self.fields[idx].input_variable not in example or example[self.fields[idx].input_variable] is None:

So the field is filled with an empty string, the model-generated value is not placed in the "correct" field of the output, and the program continues without raising the recursion error.
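A minimal sketch of the quoted completion check, assuming an example represented as a plain dict. The helper name `next_field_to_generate` is hypothetical, not DSPy's API; it only mirrors the `is None` condition above to show why an empty string counts as a completed field.

```python
from typing import Optional


def next_field_to_generate(
    fields: list[str], example: dict[str, Optional[str]]
) -> Optional[str]:
    """Return the first field still treated as missing (hypothetical helper)."""
    for name in fields:
        # Same shape as the quoted condition: missing key, or value `is None`.
        # An empty string "" passes this check, so it counts as already completed.
        if name not in example or example[name] is None:
            return name
    return None  # every field has a non-None value


FIELDS = ["bread", "protein", "fat", "garnish", "sauce"]
partial = {"bread": "whole wheat", "protein": "turkey", "fat": "avocado"}

print(next_field_to_generate(FIELDS, partial))                     # garnish
print(next_field_to_generate(FIELDS, {**partial, "garnish": ""}))  # sauce
```

Once `garnish` holds `""`, the next field the model is asked for is `sauce`, which is where the offset starts.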

The upshot is that the prediction ends up offset by one field, and the subsequent fields are not parsed correctly, i.e.:

```python
prediction.garnish  # ""
prediction.sauce    # " tomato\n\nSauce: mustard"
```
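To make the offset concrete, here is a toy model of prefix-based extraction. It is a simplification under stated assumptions, not DSPy's `Template.extract`: the hypothetical `toy_extract` continues from the first field whose value is `None`, splits the generated text on the `\n\nPrefix:` markers of the fields after it, and strips surrounding whitespace (so values may differ in whitespace from the real output). The prefixes are assumed to be the capitalised field names, as with `Sauce:` in the signature above.

```python
from typing import Optional

FIELDS = ["bread", "protein", "fat", "garnish", "sauce"]
PREFIXES = {name: name.capitalize() + ":" for name in FIELDS}  # "Bread:", "Protein:", ...


def toy_extract(example: dict[str, Optional[str]], text: str) -> dict[str, Optional[str]]:
    """Assign generated text to fields, continuing from the first field that is None."""
    result = dict(example)
    remaining = [name for name in FIELDS if result.get(name) is None]
    if not remaining:
        return result
    current = remaining[0]
    for nxt in remaining[1:]:
        marker = "\n\n" + PREFIXES[nxt]
        if marker not in text:
            break  # the model stopped before reaching this field
        value, text = text.split(marker, 1)
        result[current] = value.strip()
        current = nxt
    # Whatever text is left belongs to the field currently being filled.
    result[current] = text.strip()
    return result


first = toy_extract({}, " whole wheat\n\nProtein: turkey\n\nFat: avocado")
# Quiet-fail path: garnish pre-filled with "", so everything lands in sauce.
offset = toy_extract({**first, "garnish": ""}, " tomato\n\nSauce: mustard")
# garnish left as None: the continuation splits the way the test expects.
aligned = toy_extract(first, " tomato\n\nSauce: mustard")
```

With `garnish` pre-filled, `sauce` is the only remaining field, so the whole second generation (including the literal `Sauce:` prefix) is assigned to it; leaving `garnish` as `None` yields `garnish == "tomato"` and `sauce == "mustard"`.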

The smallest possible fix would be to delete

dspy/dsp/primitives/predict.py, line 98 in 34725d0:

completion[field_names[last_field_idx]] = ""

But I think that might cause quite a few issues with existing examples that rely on the extend generation logic: the "quiet fail" that currently occurs would be replaced by a recursion-depth exception whenever the model keeps failing to generate the field.
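A toy sketch of that trade-off, assuming a fake model that produces `bread` once and never produces `garnish`. `toy_extend_generation` and its `fill_skipped_with_empty` flag are hypothetical: the real logic recurses with a depth limit rather than looping, and the `RuntimeError` here only stands in for whatever failure that limit produces.

```python
from typing import Callable, Optional

Completion = dict[str, Optional[str]]


def toy_extend_generation(
    fields: list[str],
    generate: Callable[[Completion], Completion],
    max_depth: int = 3,
    fill_skipped_with_empty: bool = True,
) -> Completion:
    """Toy model of the extend-generation loop (hypothetical, not DSPy's code)."""
    completion: Completion = {name: None for name in fields}
    for _ in range(max_depth):
        # Ask the (fake) model to continue; it returns whichever fields it produced.
        completion.update(generate(completion))
        missing = [name for name in fields if completion[name] is None]
        if not missing:
            return completion
        if fill_skipped_with_empty:
            # Current behaviour: give up on the first missing field and move on.
            completion[missing[0]] = ""
    missing = [name for name in fields if completion[name] is None]
    if missing:
        # Reached when fields are still None after max_depth rounds, e.g. a
        # model that never emits a field while the "" fill is disabled.
        raise RuntimeError(f"still missing after {max_depth} attempts: {missing}")
    return completion


def stubborn_model(completion: Completion) -> Completion:
    # Produces bread on the first call and never produces garnish.
    return {"bread": "whole wheat"} if completion["bread"] is None else {}


print(toy_extend_generation(["bread", "garnish"], stubborn_model))
# {'bread': 'whole wheat', 'garnish': ''}
```

With the fill, the run "succeeds" with an empty `garnish` (the quiet fail); with `fill_skipped_with_empty=False`, the same model raises instead.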

Depending on the program, the deserialisation of a prompt + completion further down the line might unpick the offset caused by

dspy/dsp/primitives/predict.py, line 98 in 34725d0:

completion[field_names[last_field_idx]] = ""

okhat · 2024-06-19T21:49:47Z

Hey @mikeedjones, thanks so much for the deep dive! The changes do need to be reverted for now, because they break more fundamental things than the regressions you mentioned, although those regressions are very important too.

In the longer run, I like the direction of this PR overall. Let's think of what the right long-term behavior is for parsing.

The original DSPy behavior, which strikes a very good compromise IMO but needs to be better documented, is: when you ask for one completion (n=1), you will always get it back. If you request n>1 completions, you get at least one, but there is no guarantee you get all n. If you need guaranteed n>1 behavior, create multiple modules, the i-th one with temperature=0.7 + i*0.001.
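The comment does not say why the temperatures differ by 0.001. One plausible reading, assumed here, is that LM calls are cached on their arguments, so identical settings would keep returning the same completion. This toy sketch uses `functools.lru_cache` as a stand-in for that cache; `cached_lm`, `sample_n` and `sample_n_same_temperature` are hypothetical names, not DSPy APIs.

```python
import itertools
from functools import lru_cache

_fresh_ids = itertools.count()


@lru_cache(maxsize=None)
def cached_lm(prompt: str, temperature: float) -> str:
    # Stand-in for a cached LM call: only cache misses produce a new completion.
    return f"completion #{next(_fresh_ids)}"


def sample_n_same_temperature(prompt: str, n: int, temperature: float = 0.7) -> list[str]:
    # n identical calls: after the first, every call is a cache hit.
    return [cached_lm(prompt, temperature) for _ in range(n)]


def sample_n(prompt: str, n: int, base: float = 0.7, step: float = 0.001) -> list[str]:
    # One "module" per sample with temperature base + i * step,
    # so each call has its own cache key.
    return [cached_lm(prompt, base + i * step) for i in range(n)]


print(len(set(sample_n_same_temperature("lunch", 3))))  # 1
print(len(set(sample_n("dinner", 3))))                 # 3
```

The tiny offsets barely change sampling behaviour, which is presumably why such a small step is suggested.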

mikeedjones · 2024-06-19T21:53:33Z

Hi @okhat - I think what I show above is that the reverted logic isn't currently working as I expected?

The original logic fills the missed field with an empty string and the completion continues from there, so the model would be prompted to start from Sauce rather than Garnish, and the user has to accept the unfilled field? Is that the intended behavior?

Given that #920 was merged, I think it makes sense to revert and add some tests so that nobody else inadvertently breaks something further down the line!

mikeedjones · 2024-06-19T22:02:23Z

tests/predict/test_predict.py

```diff
+    lm = DummyLM(
+        [
+            " whole wheat\n\nProtein: turkey\n\nFat: avocado",
+            " tomato\n\nSauce: mustard",
```


@okhat So this generation would actually look like

" mustard"

because the only remaining None field in the signature would be sauce?

mikeedjones · 2024-06-20T07:26:50Z

Updated the tests and added a comment to each, including the logged LM calls. This branch has been pointing at https://github.com/stanfordnlp/dspy/tree/mipro_v2, and the tests pass with the reverted logic. I think the dummy LM generations are representative, but I'm not sure the behaviour the tests demonstrate is the desired one.

JONEMI19 added 4 commits June 19, 2024 06:33

feat(dspy): in extend_generation, compare key values to None, not to

6cc6a37

feat(dspy): add tests for extend generation

2b55ca3

feat(dspy): revert changes to extend generation logic

d12e0f2

feat(dspy): rm redundant comments

ea29442

mikeedjones changed the title ~~Mipro v2 extend generation~~ extend generation logic tests Jun 19, 2024

feat(dspy): revert changes to test file writing

15aef81

mikeedjones mentioned this pull request Jun 19, 2024

MIPRO optimizer updates for paper release #1169

Merged


arnavsinghvi11 mentioned this pull request Jun 19, 2024

MultiChainComparison input from ChainOfThought #1162

Closed

feat(dspy): changed assertions so tests pass and added conversation logs

4771535

feat(dsp): clarify what is generated in each generation

bcab220

XenonMolecule deleted the branch stanfordnlp:mipro_v2 June 21, 2024 05:16

XenonMolecule closed this Jun 21, 2024
