Best way to check for breaches of rate limit using the Assistants API? #66

Jaffacakes82 · 2024-06-17T13:10:56Z

Hi,

What is the recommended approach to leveraging the Assistants API via this SDK and appropriately handling breaches of the TPM rate limits?

I'm using RAG with both GPT-3.5-turbo and GPT-4o models, and given context tokens count towards the rate limit, I'm hitting these semi-frequently. How should I handle this?

Thanks!

KrzysztofCwalina · 2024-06-17T17:57:53Z

This is a good issue. Thanks for bringing it to our attention. We will think about possibly offering a built-in solution for this, but as a workaround I wonder if you could not handle this using a custom policy:

1. Create a subclass of PipelinePolicy that handles the error you get when you exceed the rate limit, in which case the policy would throttle (i.e. Task.Delay).
2. Create an instance of OpenAIClientOptions and call AddPolicy(yourCustomPolicy, PipelinePosition.PerTry) on it.
3. Inject the policy into the AssistantClient by passing the instance of OpenAIClientOptions to the client's constructor.
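The steps above could be sketched roughly as follows. This is a minimal illustration, not an official implementation: the names RateLimitThrottlePolicy and ThrottledClientOptions are hypothetical, the fixed delay is an arbitrary choice, and the sketch assumes the service signals a rate-limit breach with HTTP status 429.

```csharp
using System;
using System.ClientModel.Primitives;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using OpenAI;

// Hypothetical policy (not part of the SDK): waits after a 429 response
// so that the next attempt is not sent too early.
public sealed class RateLimitThrottlePolicy : PipelinePolicy
{
    private const int TooManyRequests = 429; // assumption: rate-limit breaches surface as HTTP 429
    private readonly TimeSpan _delay;

    public RateLimitThrottlePolicy(TimeSpan delay) => _delay = delay;

    public override void Process(PipelineMessage message, IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        // Let the rest of the pipeline (and the transport) run first.
        ProcessNext(message, pipeline, currentIndex);

        if (message.Response?.Status == TooManyRequests)
        {
            Thread.Sleep(_delay);
        }
    }

    public override async ValueTask ProcessAsync(PipelineMessage message, IReadOnlyList<PipelinePolicy> pipeline, int currentIndex)
    {
        await ProcessNextAsync(message, pipeline, currentIndex).ConfigureAwait(false);

        if (message.Response?.Status == TooManyRequests)
        {
            await Task.Delay(_delay).ConfigureAwait(false);
        }
    }
}

// Hypothetical helper that wires the policy into the client options.
public static class ThrottledClientOptions
{
    public static OpenAIClientOptions Create()
    {
        OpenAIClientOptions options = new OpenAIClientOptions();
        // The 20-second delay is an illustrative value only.
        options.AddPolicy(new RateLimitThrottlePolicy(TimeSpan.FromSeconds(20)), PipelinePosition.PerTry);
        return options; // pass this instance to the AssistantClient constructor
    }
}
```

A fixed delay keeps the sketch short; a real policy would more likely derive the wait from the rate-limit response headers.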

trrwilson · 2024-06-18T01:59:02Z

To add to this: if you'd like to examine the values of the documented rate limit response headers, you can also do that without a custom policy by retrieving the raw response from the formal response wrapper and then checking its header values:

// Non-streaming: read the remaining token budget from the raw response headers.
ClientResult<ThreadRun> runResult = client.CreateRun("threadId", "assistantId");
if (runResult.GetRawResponse().Headers.TryGetValue("x-ratelimit-remaining-tokens", out string remainingTokenText))
{
    // remainingTokenText has a value like: "150000"
}

// Streaming: the same raw response access is available on the result collection.
ResultCollection<StreamingUpdate> streamingUpdates = client.CreateRunStreaming("threadId", "assistantId");
if (streamingUpdates.GetRawResponse().Headers.TryGetValue("x-ratelimit-reset-tokens", out string resetTimeText))
{
    // resetTimeText has a value like: "6m0s"
}

As @KrzysztofCwalina mentioned, we'll look into providing a more direct and typed mechanism to retrieve this; ideally, when the keys are well-known, you shouldn't need to provide them explicitly like this.
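Until a typed mechanism exists, a reset value such as "6m0s" has to be parsed by hand before it can drive a delay. The following is a minimal sketch under the assumption that the header uses concatenated number/unit pairs with the units h, m, s and ms; the RateLimitHeaders class and its method are hypothetical helpers, not SDK APIs.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the SDK) for values like "6m0s" or "250ms".
public static class RateLimitHeaders
{
    // One number/unit pair; "ms" is listed first so it is not read as "m".
    private static readonly Regex Part = new Regex(@"(\d+(?:\.\d+)?)(ms|h|m|s)", RegexOptions.Compiled);

    public static bool TryParseResetDuration(string text, out TimeSpan duration)
    {
        duration = TimeSpan.Zero;
        if (string.IsNullOrWhiteSpace(text)) return false;

        TimeSpan total = TimeSpan.Zero;
        int consumed = 0;
        foreach (Match match in Part.Matches(text))
        {
            // Reject input with unrecognized text between the pairs.
            if (match.Index != consumed) return false;

            double value = double.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
            switch (match.Groups[2].Value)
            {
                case "h": total += TimeSpan.FromHours(value); break;
                case "m": total += TimeSpan.FromMinutes(value); break;
                case "s": total += TimeSpan.FromSeconds(value); break;
                default: total += TimeSpan.FromMilliseconds(value); break;
            }
            consumed = match.Index + match.Length;
        }

        // Reject trailing text that was not part of any pair.
        if (consumed != text.Length) return false;

        duration = total;
        return true;
    }
}
```

The parsed TimeSpan can then be passed to Task.Delay before the next request is sent.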

Jaffacakes82 · 2024-06-18T18:10:29Z

Thanks @KrzysztofCwalina @trrwilson. I've implemented a basic solution using exponential backoff for now.
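For readers who want to see what such a backoff might look like, here is a minimal sketch of the delay calculation only. It is not the code used in this thread: the Backoff class, the doubling factor and the cap are all illustrative assumptions.

```csharp
using System;

// Hypothetical helper: computes the wait before a given retry attempt.
public static class Backoff
{
    // Delay for a 0-based attempt: baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan GetDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so the multiplication cannot overflow to infinity.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double milliseconds = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

A caller would await Task.Delay(Backoff.GetDelay(attempt, baseDelay, maxDelay)) between attempts; adding random jitter is a common refinement when several clients share one rate limit.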

It might make sense to make this available in the Usage property of the ThreadRun.

KrzysztofCwalina · 2024-06-18T20:49:47Z

@Jaffacakes82, be aware that the clients already implement retry logic (with exponential backoff). The retries happen on any error, and apparently the delay is not long enough for your scenario. Because of that retry logic, it's very important that you add your custom policy to the client at the "PerTry" position. Otherwise, the built-in retry logic will kick in first and the client will still retry too early.

trrwilson · 2024-06-19T23:59:42Z

@KrzysztofCwalina, I believe this is the related System.ClientModel issue:
Azure/azure-sdk-for-net#44222

Without the built-in DelayStrategy, I think retries are -- without the custom policy -- happening immediately, irrespective of hints like retry-after headers.
