Best way to check for breaches of rate limit using the Assistants API? #66
Comments
This is a good issue; thanks for bringing it to our attention. We will consider offering a built-in solution for this. In the meantime, as a workaround, I wonder whether you could handle this with a custom policy.
To add to this: if you'd like to examine the values of the documented rate limit response headers, you can also do that without a custom policy by retrieving the raw response from the response wrapper and checking its header values:

ClientResult<ThreadRun> runResult = client.CreateRun("threadId", "assistantId");
if (runResult.GetRawResponse().Headers.TryGetValue("x-ratelimit-limit-tokens", out string limitTokenText))
{
    // limitTokenText has a value like: "150000"
}

ResultCollection<StreamingUpdate> streamingUpdates = client.CreateRunStreaming("threadId", "assistantId");
if (streamingUpdates.GetRawResponse().Headers.TryGetValue("x-ratelimit-reset-tokens", out string resetTimeText))
{
    // resetTimeText has a value like: "6m0s"
}

As @KrzysztofCwalina mentioned, we'll look into providing a more direct and typed mechanism to retrieve this; ideally, when the keys are well-known, you shouldn't need to provide them explicitly like this.
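The reset header value shown above (for example "6m0s") is a compact duration string, so it has to be converted before it can be used as a wait time. Below is a minimal sketch of one way to do that. The `RateLimitReset` class and its `TryParse` method are hypothetical helpers, not part of the SDK, and the sketch assumes the value is a sequence of number/unit pairs using only the units `h`, `m`, `s` and `ms`.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the SDK): converts a reset header value
// such as "6m0s" into a TimeSpan.
public static class RateLimitReset
{
    // One or more <number><unit> pairs, e.g. "6m0s", "1h2m3s", "250ms", "7.5s".
    // "ms" is listed before "m" so milliseconds are not misread as minutes.
    private static readonly Regex Pattern = new Regex(
        @"^(?:(?<value>[0-9]+(?:\.[0-9]+)?)(?<unit>ms|h|m|s))+$",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string text, out TimeSpan delay)
    {
        delay = TimeSpan.Zero;
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        Match match = Pattern.Match(text.Trim());
        if (!match.Success)
        {
            return false;
        }

        // Each repetition of the group adds one capture to "value" and "unit".
        CaptureCollection values = match.Groups["value"].Captures;
        CaptureCollection units = match.Groups["unit"].Captures;

        for (int i = 0; i < values.Count; i++)
        {
            double amount = double.Parse(values[i].Value, CultureInfo.InvariantCulture);
            switch (units[i].Value)
            {
                case "h": delay += TimeSpan.FromHours(amount); break;
                case "m": delay += TimeSpan.FromMinutes(amount); break;
                case "s": delay += TimeSpan.FromSeconds(amount); break;
                case "ms": delay += TimeSpan.FromMilliseconds(amount); break;
            }
        }

        return true;
    }
}
```

With something like this in place, `RateLimitReset.TryParse(resetTimeText, out TimeSpan wait)` could feed a `Task.Delay(wait)` before the next request. Returning `false` on unrecognized input lets the caller fall back to a fixed delay instead of throwing.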
Thanks @KrzysztofCwalina @trrwilson. I've implemented a basic solution using exponential backoff for now. It might make sense to make this available in the |
@Jaffacakes82, be aware that the clients already implement retry logic (with exponential backoff). The retries happen on any error, and apparently the delay is not long enough for your scenarios. Because of that retry logic, it's very important that when you add your custom policy to the client, you add it at the "PerTry" position. Otherwise, the built-in retry logic will kick in first and the client will still retry too early.
@KrzysztofCwalina, I believe this is the related System.ClientModel issue: Without the built-in |
Hi,
What is the recommended approach to leveraging the Assistants API via this SDK and appropriately handling breaches of the TPM rate limits?
I'm using RAG with both GPT-3.5-turbo and GPT-4o models, and given context tokens count towards the rate limit, I'm hitting these semi-frequently. How should I handle this?
Thanks!