[ML] Inference API Anthropic integration #109893

jonathan-buttner · 2024-06-18T21:49:43Z

This PR adds support for Anthropic as a 3rd party service to the inference API for chat completion. https://docs.anthropic.com/en/api/messages

Pass through settings

Since Anthropic allows a sophisticated tool field in their requests I thought it'd be helpful if we allows passing through an unvalidated portion of input. The task_settings allows specifying a optional_settings (happy to change the name) field which can contain anything and is not validated. The contents will be persisted in inference endpoint configuration and sent to Anthropic when inference requests are made.

~~The implementation still validates the required fields (model and max_tokens).~~

We decided to wait to implement the pass through settings for all services at a later time.

Examples

Create the inference endpoint

Request

PUT http:https://localhost:9200/_inference/completion/test
{
    "service": "anthropic",
    "service_settings": {
        "api_key": "<api key>",
        "model_id": "claude-3-opus-20240229"
    },
    "task_settings": {
        "max_tokens": 1024
    }
}

Response

{
    "model_id": "test",
    "task_type": "completion",
    "service": "anthropic",
    "service_settings": {
        "model_id": "claude-3-opus-20240229",
        "rate_limit": {
            "requests_per_minute": 50
        }
    },
    "task_settings": {
        "max_tokens": 1024
    }
}

Perform a completion request

Request

POST http:https://localhost:9200/_inference/completion/test
{
    "input": "What is the weather like in San Francisco?",
    "task_settings": {
        "optional_settings": {
            "metadata": {
                "user_id": "hello"
            },
            "tools": [
                {
                    "name": "get_weather",
                    "description": "Get the current weather in a given location",
                    "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                    }
                }
            ]
        }
    }
}

Response

{
    "completion": [
        {
            "result": "<thinking>\nThe get_weather tool appears to be relevant for answering this query, as it can provide the current weather for a given location. \n\nThe required parameters for get_weather are:\n- location (string): The city and state, e.g. \"San Francisco, CA\"\n\nThe user has directly provided the location in their query - they are asking about the weather in San Francisco. So we have the necessary information to make the API call.\n\n</thinking>"
        }
    ]
}

elasticsearchmachine · 2024-06-18T21:50:07Z

Hi @jonathan-buttner, I've created a changelog YAML for you.

…ttner/elasticsearch into ml-anthropic-chat-completion

…opic-chat-completion

elasticsearchmachine · 2024-06-24T12:47:54Z

Pinging @elastic/ml-core (Team:ML)

maxhniebergall

LGTM! Great addition here, awesome work Jonathan!

maxhniebergall · 2024-06-24T13:59:45Z

...main/java/org/elasticsearch/xpack/inference/external/anthropic/AnthropicResponseHandler.java

+ static final String REQUESTS_LIMIT = "anthropic-ratelimit-requests-limit";
+ // The number of requests remaining within the current rate limit window.
+ static final String REMAINING_REQUESTS = "anthropic-ratelimit-requests-remaining";
+ // The time when the request rate limit window will reset, provided in RFC 3339 format.
+ static final String REQUEST_RESET = "anthropic-ratelimit-requests-reset";
+ // The maximum number of tokens allowed within the rate limit window.
+ static final String TOKENS_LIMIT = "anthropic-ratelimit-tokens-limit";
+ // The number of tokens remaining, rounded to the nearest thousand, within the current rate limit window.
+ static final String REMAINING_TOKENS = "anthropic-ratelimit-tokens-remaining";
+ // The time when the token rate limit window will reset, provided in RFC 3339 format.
+ static final String TOKENS_RESET = "anthropic-ratelimit-tokens-reset";
+ // The number of seconds until the rate limit window resets.
+ static final String RETRY_AFTER = "retry-after";
+
+ static final String SERVER_BUSY = "Received an Anthropic server is temporarily overloaded status code";


Very interesting that anthropic provides these

Yeah, seems pretty similar to OpenAI with a few additions.

maxhniebergall · 2024-06-24T14:03:08Z

...main/java/org/elasticsearch/xpack/inference/external/anthropic/AnthropicResponseHandler.java

+ var response = result.response();
+ var tokenLimit = getFirstHeaderOrUnknown(response, TOKENS_LIMIT);
+ var remainingTokens = getFirstHeaderOrUnknown(response, REMAINING_TOKENS);
+ var requestLimit = getFirstHeaderOrUnknown(response, REQUESTS_LIMIT);
+ var remainingRequests = getFirstHeaderOrUnknown(response, REMAINING_REQUESTS);
+ var requestReset = getFirstHeaderOrUnknown(response, REQUEST_RESET);
+ var tokensReset = getFirstHeaderOrUnknown(response, TOKENS_RESET);
+ var retryAfter = getFirstHeaderOrUnknown(response, RETRY_AFTER);


maxhniebergall · 2024-06-24T14:12:44Z

...csearch/xpack/inference/external/request/anthropic/AnthropicChatCompletionRequestEntity.java

+ builder.startObject();
+
+ {
+ builder.field(ROLE_FIELD, USER_FIELD);


is this right? the field value is a field string?

Yeah I'm hardcoding the role to user here. In theory this could be specified but for all our other chat completion implementations we've hard coded it so I figured probably best to keep it consistent for now until we decided to provide that capability and then change it for all of them.

That makes sense. Definitely a nit, but I would prefer if we used a different variable name for user then, since this isn't actually a field name in this context. Even just creating a constant String USER_VALUE = USER_FIELD I feel like would be more clear? Up to you though.

Oh I see, yeah you're right the name doesn't make sense, I'll update 👍

maxhniebergall · 2024-06-24T14:21:37Z

...pack/inference/services/anthropic/completion/AnthropicChatCompletionRequestTaskSettings.java

+ ValidationException validationException = new ValidationException();
+
+ Integer maxTokens = extractOptionalPositiveInteger(map, MAX_TOKENS, ModelConfigurations.SERVICE_SETTINGS, validationException);
+ // At the time of writing the allowed values are -1, and range 0-1. I'm intentionally not validating the values here, we'll let


which setting is this referring to? temperature? do you mind adding a linebreak?

Good idea, I'll make it more explicit.

davidkyle

LGTM

Working anthropic with pass through settings

b5144c1

jonathan-buttner added >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.15.0 labels Jun 18, 2024

jonathan-buttner and others added 8 commits June 18, 2024 17:50

Update docs/changelog/109893.yaml

8222b62

Starting tests, removed pass through settings

02d2869

Merge branch 'ml-anthropic-chat-completion' of github.com:jonathan-bu…

d4f4ca2

…ttner/elasticsearch into ml-anthropic-chat-completion

Adding tests

b02d9f0

Updating changelog

46fab77

Merge branch 'main' of github.com:elastic/elasticsearch into ml-anthr…

9877310

…opic-chat-completion

Removing openai references

9276c49

Adding service tests

dada3df

jonathan-buttner marked this pull request as ready for review June 24, 2024 12:47

jonathan-buttner requested review from davidkyle and maxhniebergall June 24, 2024 12:47

maxhniebergall approved these changes Jun 24, 2024

View reviewed changes

jonathan-buttner added 2 commits June 24, 2024 10:49

Adding more comments

80bdbc2

Updating user constant name

d1646f2

davidkyle approved these changes Jun 25, 2024

View reviewed changes

jonathan-buttner merged commit e6150de into elastic:main Jun 25, 2024
15 checks passed

jonathan-buttner deleted the ml-anthropic-chat-completion branch June 25, 2024 14:10

jonathan-buttner mentioned this pull request Jul 12, 2024

[ML] Inference API Anthropic docs #110850

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ML] Inference API Anthropic integration #109893

[ML] Inference API Anthropic integration #109893

jonathan-buttner commented Jun 18, 2024 •

edited

Loading

elasticsearchmachine commented Jun 18, 2024

elasticsearchmachine commented Jun 24, 2024

maxhniebergall left a comment

maxhniebergall Jun 24, 2024

jonathan-buttner Jun 24, 2024

maxhniebergall Jun 24, 2024

maxhniebergall Jun 24, 2024

jonathan-buttner Jun 24, 2024

maxhniebergall Jun 24, 2024

jonathan-buttner Jun 24, 2024

maxhniebergall Jun 24, 2024

jonathan-buttner Jun 24, 2024

davidkyle left a comment

[ML] Inference API Anthropic integration #109893

[ML] Inference API Anthropic integration #109893

Conversation

jonathan-buttner commented Jun 18, 2024 • edited Loading

Pass through settings

Examples

elasticsearchmachine commented Jun 18, 2024

elasticsearchmachine commented Jun 24, 2024

maxhniebergall left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidkyle left a comment

Choose a reason for hiding this comment

jonathan-buttner commented Jun 18, 2024 •

edited

Loading