
[ML] Inference API Anthropic integration #109893

Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Jun 18, 2024

This PR adds support for Anthropic as a 3rd party service to the inference API for chat completion. https://docs.anthropic.com/en/api/messages

Pass through settings

Since Anthropic allows a sophisticated tools field in their requests, I thought it'd be helpful if we allowed passing through an unvalidated portion of the input. The task_settings allows specifying an optional_settings (happy to change the name) field which can contain anything and is not validated. The contents will be persisted in the inference endpoint configuration and sent to Anthropic when inference requests are made.

The implementation still validates the required fields (model and max_tokens).

We decided to defer implementing pass-through settings for the rest of the services until a later time.
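As an illustrative sketch (the actual implementation is Java inside Elasticsearch; the function name here is hypothetical), the pass-through behavior described above amounts to: set the validated required fields explicitly, then merge whatever the user supplied under optional_settings into the request body without validation.

```python
def build_request_body(input_text, model_id, max_tokens, optional_settings=None):
    """Build an Anthropic messages-style request body.

    model_id and max_tokens are the validated required fields; the contents
    of optional_settings (if any) are passed through without validation.
    """
    body = {
        "model": model_id,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": input_text}],
    }
    # Merge the unvalidated pass-through settings; explicit fields win.
    for key, value in (optional_settings or {}).items():
        body.setdefault(key, value)
    return body
```

With this shape, a request carrying `{"metadata": {"user_id": "hello"}, "tools": [...]}` in optional_settings ends up with those keys alongside the validated ones in the outgoing body.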

Examples

Create the inference endpoint

Request

PUT http://localhost:9200/_inference/completion/test
{
    "service": "anthropic",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "claude-3-opus-20240229"
    },
    "task_settings": {
        "max_tokens": 1024
    }
}

Response

{
    "model_id": "test",
    "task_type": "completion",
    "service": "anthropic",
    "service_settings": {
        "model_id": "claude-3-opus-20240229",
        "rate_limit": {
            "requests_per_minute": 50
        }
    },
    "task_settings": {
        "max_tokens": 1024
    }
}
Perform a completion request

Request

POST http://localhost:9200/_inference/completion/test
{
    "input": "What is the weather like in San Francisco?",
    "task_settings": {
        "optional_settings": {
            "metadata": {
                "user_id": "hello"
            },
            "tools": [
                {
                    "name": "get_weather",
                    "description": "Get the current weather in a given location",
                    "input_schema": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA"
                            }
                        },
                        "required": ["location"]
                    }
                }
            ]
        }
    }
}

Response

{
    "completion": [
        {
            "result": "<thinking>\nThe get_weather tool appears to be relevant for answering this query, as it can provide the current weather for a given location. \n\nThe required parameters for get_weather are:\n- location (string): The city and state, e.g. \"San Francisco, CA\"\n\nThe user has directly provided the location in their query - they are asking about the weather in San Francisco. So we have the necessary information to make the API call.\n\n</thinking>"
        }
    ]
}

@jonathan-buttner jonathan-buttner added >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.15.0 labels Jun 18, 2024
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review June 24, 2024 12:47
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Member

@maxhniebergall maxhniebergall left a comment


LGTM! Great addition here, awesome work Jonathan!

Comment on lines +28 to +42
static final String REQUESTS_LIMIT = "anthropic-ratelimit-requests-limit";
// The number of requests remaining within the current rate limit window.
static final String REMAINING_REQUESTS = "anthropic-ratelimit-requests-remaining";
// The time when the request rate limit window will reset, provided in RFC 3339 format.
static final String REQUEST_RESET = "anthropic-ratelimit-requests-reset";
// The maximum number of tokens allowed within the rate limit window.
static final String TOKENS_LIMIT = "anthropic-ratelimit-tokens-limit";
// The number of tokens remaining, rounded to the nearest thousand, within the current rate limit window.
static final String REMAINING_TOKENS = "anthropic-ratelimit-tokens-remaining";
// The time when the token rate limit window will reset, provided in RFC 3339 format.
static final String TOKENS_RESET = "anthropic-ratelimit-tokens-reset";
// The number of seconds until the rate limit window resets.
static final String RETRY_AFTER = "retry-after";

static final String SERVER_BUSY = "Received an Anthropic server is temporarily overloaded status code";
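The headers above can be read back straightforwardly. As an illustrative Python sketch (the PR does this in Java via a getFirstHeaderOrUnknown helper; the Python names here are assumptions), each rate-limit header is looked up case-insensitively and defaults to "unknown" when absent:

```python
RATE_LIMIT_HEADERS = (
    "anthropic-ratelimit-requests-limit",
    "anthropic-ratelimit-requests-remaining",
    "anthropic-ratelimit-requests-reset",
    "anthropic-ratelimit-tokens-limit",
    "anthropic-ratelimit-tokens-remaining",
    "anthropic-ratelimit-tokens-reset",
    "retry-after",
)

def first_header_or_unknown(headers, name):
    """Return the first value for a header (case-insensitive), or 'unknown'."""
    for key, value in headers:
        if key.lower() == name.lower():
            return value
    return "unknown"

def rate_limit_info(headers):
    """Collect all Anthropic rate-limit headers into a dict, e.g. for logging."""
    return {name: first_header_or_unknown(headers, name) for name in RATE_LIMIT_HEADERS}
```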
Member

Very interesting that anthropic provides these

Contributor Author

Yeah, seems pretty similar to OpenAI with a few additions.

Comment on lines +90 to +97
var response = result.response();
var tokenLimit = getFirstHeaderOrUnknown(response, TOKENS_LIMIT);
var remainingTokens = getFirstHeaderOrUnknown(response, REMAINING_TOKENS);
var requestLimit = getFirstHeaderOrUnknown(response, REQUESTS_LIMIT);
var remainingRequests = getFirstHeaderOrUnknown(response, REMAINING_REQUESTS);
var requestReset = getFirstHeaderOrUnknown(response, REQUEST_RESET);
var tokensReset = getFirstHeaderOrUnknown(response, TOKENS_RESET);
var retryAfter = getFirstHeaderOrUnknown(response, RETRY_AFTER);
Member

so clean!

builder.startObject();

{
builder.field(ROLE_FIELD, USER_FIELD);
Member

is this right? the field value is a field string?

Contributor Author

Yeah, I'm hardcoding the role to user here. In theory this could be specified, but we've hardcoded it for all our other chat completion implementations, so I figured it's best to keep it consistent for now; once we decide to provide that capability we can change it for all of them.

Member

That makes sense. Definitely a nit, but I would prefer if we used a different variable name for user then, since this isn't actually a field name in this context. Even just creating a constant String USER_VALUE = USER_FIELD I feel like would be more clear? Up to you though.

Contributor Author

Oh I see, yeah you're right the name doesn't make sense, I'll update 👍

ValidationException validationException = new ValidationException();

Integer maxTokens = extractOptionalPositiveInteger(map, MAX_TOKENS, ModelConfigurations.SERVICE_SETTINGS, validationException);
// At the time of writing the allowed values are -1, and range 0-1. I'm intentionally not validating the values here, we'll let
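For context, here is a minimal Python analogue of the extractOptionalPositiveInteger helper used in the snippet above (the name and error-message format are illustrative; the real implementation is Java and accumulates errors in a ValidationException rather than a list):

```python
def extract_optional_positive_integer(settings, field, scope, validation_errors):
    """Pop an optional field from settings and require it to be a positive int.

    Appends a message to validation_errors instead of raising, mirroring
    the accumulate-then-report validation style used in the PR.
    """
    value = settings.pop(field, None)
    if value is None:
        return None
    # Reject bools explicitly since bool is a subclass of int in Python.
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        validation_errors.append(
            f"[{scope}] Invalid value [{value}]. [{field}] must be a positive integer"
        )
        return None
    return value
```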
Member

which setting is this referring to? temperature? do you mind adding a linebreak?

Contributor Author

Good idea, I'll make it more explicit.

Member

@davidkyle davidkyle left a comment


LGTM

@jonathan-buttner jonathan-buttner merged commit e6150de into elastic:main Jun 25, 2024
15 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-anthropic-chat-completion branch June 25, 2024 14:10