
"Backing off" rate limiting error message reports incorrect model kwargs #1079

Closed

clayms opened this issue May 30, 2024 · 5 comments

clayms commented May 30, 2024

When running an optimizer and hitting a rate limit, the "Backing off" message reports incorrect model kwargs: n and temperature.

Setting up the lm:

turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=500, temperature=0.0)
dspy.settings.configure(lm=turbo)

and the optimizer:

config_bsfsrs = dict(
    metric=measure_count,       # measure_count defined elsewhere, not needed to understand this problem
    teacher_settings=dict(lm=turbo),
    max_bootstrapped_demos=4,
    max_labeled_demos=16,
    max_rounds=1,
    num_candidate_programs=4,
    num_threads=2,
)

optimizer = BootstrapFewShotWithRandomSearch(**config_bsfsrs)

Show the lm kwargs:

optimizer.teacher_settings.get('lm').kwargs
{'temperature': 0.0,
 'max_tokens': 500,
 'top_p': 1,
 'frequency_penalty': 0,
 'presence_penalty': 0,
 'n': 1,
 'model': 'gpt-3.5-turbo-instruct'}

Run the optimizer:

optimized_prog = optimizer.compile(program, trainset=trainset)

When I hit a rate limit, I get:

Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at ##########> 
    with kwargs {'n': 5, 'temperature': 0.7}

Why is it reporting a different n and temperature?
Is the optimizer using these instead?
How do I change them if it is?

clayms (Author) commented May 30, 2024

Changing the Program from:

class get_count(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(PredictCount, n=1, temperature=0.0)

    def forward(self, title, text):
        return self.prog(title=title, text=text)
    
program_count = get_count()

To:

class get_count(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(PredictCount)

    def forward(self, title, text):
        return self.prog(title=title, text=text)
    
program_count = get_count()

seems to have changed what gets reported. Now when I hit the rate limit, I get:

Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at ############> 
   with kwargs {}

Still misreports the kwargs.

Is it using the correct kwargs that I specified?
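For intuition, here is a hypothetical sketch (an editorial illustration, not a verified trace of DSPy's request path; ToyLM and log_backoff are made-up names) of how a backoff wrapper can end up reporting only the call-site kwargs: if the configured defaults are merged inside the wrapped method, the logger never sees them, so it prints {} when the caller relied entirely on defaults.

import functools

def log_backoff(fn):
    # Sketch of a backoff-style decorator: it can only log what the caller passed.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"Backing off ... calling function {fn} with kwargs {kwargs}")
        return fn(*args, **kwargs)
    return wrapper

class ToyLM:
    def __init__(self, **default_kwargs):
        self.kwargs = default_kwargs            # e.g. temperature=0.0, n=1

    @log_backoff
    def request(self, prompt, **kwargs):
        merged = {**self.kwargs, **kwargs}      # defaults merged here, after logging
        return merged

lm = ToyLM(temperature=0.0, n=1)
lm.request("hi")                                # logs: ... with kwargs {}
lm.request("hi", temperature=0.7)               # logs: ... with kwargs {'temperature': 0.7}

Under that (assumed) structure, the logged kwargs reflect per-call overrides injected by the caller, not the values you configured on the LM.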

clayms (Author) commented May 30, 2024

Even adding n or temperature separately to self.prog leads back to the originally reported issue.

...
        self.prog = dspy.ChainOfThought(PredictCount, n=3)
...
## OR
        self.prog = dspy.ChainOfThought(PredictCount, temperature=0.0)

When it hits a rate limit:

Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at ############> 
    with kwargs {'n': 3, 'temperature': 0.7}

tom-doerr (Contributor) commented
The temperature value is expected; it gets set during bootstrapping:

lm = lm.copy(temperature=0.7 + 0.001 * round_idx) if round_idx > 0 else lm

It's not obvious to me that forcing the temperature to zero would make sense when using BootstrapFewShotWithRandomSearch, since the point of using it is to randomly generate new bootstrapping demos.
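For illustration, a minimal sketch of what the quoted line implies across rounds (the 0.7 baseline and the round_idx scaling come from the snippet above; base_temperature standing in for the LM's configured value is an assumption):

# Per-round temperature implied by the quoted line.
base_temperature = 0.0  # assumption: the temperature configured on the LM

for round_idx in range(3):
    temperature = 0.7 + 0.001 * round_idx if round_idx > 0 else base_temperature
    print(round_idx, temperature)
# round 0 keeps the configured value (0.0);
# later rounds use ~0.701, ~0.702, ...

So the optimizer deliberately requests temperatures you never set, which is one way the backoff message can report values that differ from your LM configuration.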

clayms (Author) commented Jun 2, 2024

I took "bootstrap" to mean that my provided labeled training examples would be randomly sampled with replacement similar to how bootstrapping is used in Random Forests, or other "bagging" (bootstrap aggregating) models.

I did not understand that "bootstrap" in this library was redefined to mean allowing the LLM more creative freedom when generating entirely new demos to train on.

tom-doerr (Contributor) commented

The way it works is that it samples from your training examples and generates outputs/labels for some of them; those are then used together with examples whose labels are the ones you provided. So it's a mix of both. It is not generating completely new demos, although my previous comment made it sound like that.
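In rough code, that mixing looks something like the following (a conceptual sketch only, not DSPy's actual implementation; the toy trainset, teacher, and metric below are placeholders for the real program internals):

import random

# Conceptual sketch of BootstrapFewShot-style demo selection.
trainset = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
max_bootstrapped_demos, max_labeled_demos = 4, 16

def teacher(question):
    return "a" + question[1:]                       # toy stand-in for the teacher LM

def metric(example, prediction):
    return prediction == example["answer"]          # keep only outputs your metric accepts

bootstrapped_demos = []
for example in random.sample(trainset, len(trainset)):
    prediction = teacher(example["question"])       # teacher LM proposes a label
    if metric(example, prediction):                 # validated against your metric
        bootstrapped_demos.append({**example, "answer": prediction})
    if len(bootstrapped_demos) >= max_bootstrapped_demos:
        break

labeled_demos = trainset[:max_labeled_demos]        # labels you provided
demos = bootstrapped_demos + labeled_demos          # the prompt mixes both kinds

The key point matching the comment above: every demo starts from one of your training examples; only the outputs of the bootstrapped subset are model-generated, and they are kept only when the metric passes.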
