[ML] Inference API Anthropic integration #109893
Conversation
Hi @jonathan-buttner, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
LGTM! Great addition here, awesome work Jonathan!
static final String REQUESTS_LIMIT = "anthropic-ratelimit-requests-limit";
// The number of requests remaining within the current rate limit window.
static final String REMAINING_REQUESTS = "anthropic-ratelimit-requests-remaining";
// The time when the request rate limit window will reset, provided in RFC 3339 format.
static final String REQUEST_RESET = "anthropic-ratelimit-requests-reset";
// The maximum number of tokens allowed within the rate limit window.
static final String TOKENS_LIMIT = "anthropic-ratelimit-tokens-limit";
// The number of tokens remaining, rounded to the nearest thousand, within the current rate limit window.
static final String REMAINING_TOKENS = "anthropic-ratelimit-tokens-remaining";
// The time when the token rate limit window will reset, provided in RFC 3339 format.
static final String TOKENS_RESET = "anthropic-ratelimit-tokens-reset";
// The number of seconds until the rate limit window resets.
static final String RETRY_AFTER = "retry-after";

static final String SERVER_BUSY = "Received an Anthropic server is temporarily overloaded status code";
Very interesting that Anthropic provides these.
Yeah, seems pretty similar to OpenAI with a few additions.
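For reference, a response from Anthropic carrying these headers might look like the following (values are illustrative only, not from a real response):

anthropic-ratelimit-requests-limit: 50
anthropic-ratelimit-requests-remaining: 49
anthropic-ratelimit-requests-reset: 2024-06-14T20:00:00Z
anthropic-ratelimit-tokens-limit: 40000
anthropic-ratelimit-tokens-remaining: 39000
anthropic-ratelimit-tokens-reset: 2024-06-14T20:00:00Z
retry-after: 30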
var response = result.response();
var tokenLimit = getFirstHeaderOrUnknown(response, TOKENS_LIMIT);
var remainingTokens = getFirstHeaderOrUnknown(response, REMAINING_TOKENS);
var requestLimit = getFirstHeaderOrUnknown(response, REQUESTS_LIMIT);
var remainingRequests = getFirstHeaderOrUnknown(response, REMAINING_REQUESTS);
var requestReset = getFirstHeaderOrUnknown(response, REQUEST_RESET);
var tokensReset = getFirstHeaderOrUnknown(response, TOKENS_RESET);
var retryAfter = getFirstHeaderOrUnknown(response, RETRY_AFTER);
so clean!
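getFirstHeaderOrUnknown isn't shown in this hunk; a minimal sketch of what such a helper could look like, assuming org.apache.http.HttpResponse and a hypothetical "unknown" fallback value:

// Sketch only: returns the first value of the named header, or a placeholder
// when the header is absent. The actual helper in this PR may differ.
private static String getFirstHeaderOrUnknown(org.apache.http.HttpResponse response, String name) {
    var header = response.getFirstHeader(name);
    return header == null ? "unknown" : header.getValue();
}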
builder.startObject();
{
    builder.field(ROLE_FIELD, USER_FIELD);
is this right? the field value is a field string?
Yeah, I'm hardcoding the role to `user` here. In theory this could be specified, but for all our other chat completion implementations we've hardcoded it, so I figured it's probably best to keep it consistent for now until we decide to provide that capability, and then change it for all of them.
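For context, with the role hardcoded, each serialized message comes out roughly as follows (the content value is whatever input the inference request supplied):

{ "role": "user", "content": "<input text>" }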
That makes sense. Definitely a nit, but I would prefer if we used a different variable name for `user` then, since this isn't actually a field name in this context. Even just creating a constant `String USER_VALUE = USER_FIELD` would be clearer, I feel? Up to you though.
Oh I see, yeah you're right the name doesn't make sense, I'll update 👍
ValidationException validationException = new ValidationException();

Integer maxTokens = extractOptionalPositiveInteger(map, MAX_TOKENS, ModelConfigurations.SERVICE_SETTINGS, validationException);
// At the time of writing the allowed values are -1, and range 0-1. I'm intentionally not validating the values here, we'll let
Which setting is this referring to? `temperature`? Do you mind adding a line break?
Good idea, I'll make it more explicit.
LGTM
This PR adds support for Anthropic as a third-party service in the inference API for chat completion: https://docs.anthropic.com/en/api/messages
Pass-through settings
Since Anthropic allows a sophisticated `tool` field in their requests, I thought it'd be helpful if we allowed passing through an unvalidated portion of the input. The `task_settings` allows specifying an `optional_settings` (happy to change the name) field, which can contain anything and is not validated. The contents will be persisted in the inference endpoint configuration and sent to Anthropic when inference requests are made. The implementation still validates the required fields (`model` and `max_tokens`).

We decided to hold off on implementing pass-through settings for the rest of the services until a later time.
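A sketch of what that could look like in practice (the `optional_settings` contents below are illustrative, not a validated schema):

"task_settings": {
    "max_tokens": 1024,
    "optional_settings": {
        "temperature": 0.7,
        "top_k": 10
    }
}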
Examples
Create the inference endpoint
Request
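A representative request, based on the fields described above (the endpoint name, model value, and exact schema are illustrative):

PUT _inference/completion/anthropic_completion
{
    "service": "anthropic",
    "service_settings": {
        "api_key": "<api key>",
        "model": "claude-3-5-sonnet-20240620"
    },
    "task_settings": {
        "max_tokens": 1024
    }
}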
Response
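A response along these lines, echoing the persisted endpoint configuration without the secret settings (exact fields may differ):

{
    "inference_id": "anthropic_completion",
    "task_type": "completion",
    "service": "anthropic",
    "service_settings": {
        "model": "claude-3-5-sonnet-20240620"
    },
    "task_settings": {
        "max_tokens": 1024
    }
}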
Perform a completion request
Request
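A representative completion call against the endpoint created above (the input text is illustrative):

POST _inference/completion/anthropic_completion
{
    "input": "What is Elastic?"
}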
Response
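And a response in the inference API's completion shape (the result text is illustrative):

{
    "completion": [
        {
            "result": "Elastic is the company behind the Elasticsearch search and analytics engine..."
        }
    ]
}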