-
Notifications
You must be signed in to change notification settings - Fork 24.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inference API] Add Google Vertex AI as provider for text_embedding task type #110090
[Inference API] Add Google Vertex AI as provider for text_embedding task type #110090
Conversation
Pinging @elastic/ml-core (Team:ML) |
x-pack/plugin/inference/build.gradle
Outdated
@@ -38,6 +38,169 @@ dependencies { | |||
clusterPlugins project(':x-pack:plugin:inference:qa:test-service-plugin') | |||
|
|||
api "com.ibm.icu:icu4j:${versions.icu4j}" | |||
|
|||
runtimeOnly 'com.google.guava:guava:32.0.1-jre' | |||
api 'com.google.code.gson:gson:2.10' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do those dependencies all need to be api
? It seems implementation
would be more appropriate? I wrote about this here https://groups.google.com/a/elastic.co/g/es/c/Do0hGs7ooJI/m/6nHB9r5NAwAJ a while ago
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@breskeby Thank you for sharing! Adjusted it and seems to work fine with Use implementation instead of api command for 3rd party libraries :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, don't forget to add the new service, task, and secret settings to the registry.
List<String> AUTH_SCOPE = Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"); | ||
|
||
static void decorateWithBearerToken(HttpPost httpPost, GoogleVertexAiSecretSettings secretSettings) { | ||
AccessController.doPrivileged((PrivilegedAction<Void>) () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could use the SocketAccess
wrapper here? https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/HttpClient.java#L109
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point! Adjusted with Use SocketAccess instead of AccessController directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi: I've adjusted the change to use AccessController
again and add an explicit SpecialPermission.check()
with Use AccessController and add special permission check.
For some reason I get a permission denied error again using SocketAccess
, which is confusing as now the code is basically the same and works? Looking through the codebase I see a lot of replications of the SocketAccess
class, maybe it has something to do with some module magic I'm not aware of? Appreciating any hints!
I suggest we keep it like this for now, if we don't find a solution soonish. I'll create an issue and revisit it afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be the case, because the x-pack core
plugin misses permission java.lang.RuntimePermission "accessDeclaredMembers"
, which GoogleCredentials
needs and the security manager uses the intersection of the two permission sets of the core
and inference
plugin?
Update: I think that's the actually the case. So we have two options:
- Keep it like this for now
- Create our own
SocketAccess
class, which will implicitly also grant access forGoogleCredentials
through our security policy (which feels weird as it has nothing to do with SocketAccess, but reflection?)
I would keep it like this for now, otherwise we need a duplicated class, which also kinda hides the reflection detail. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, leaving it like you have is fine 👍
* or null if the response does not contain the `error.message` field | ||
*/ | ||
@SuppressWarnings("unchecked") | ||
public static GoogleVertexAiErrorResponseEntity fromResponse(HttpResult response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're only using the message
field, how do you feel about using the implementation here: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/response/AzureMistralOpenAiErrorResponseEntity.java#L54
I'm planning on renaming it in my PR: https://github.com/elastic/elasticsearch/pull/109893/files#diff-669128ea579af4136c4ff6995ee69accdc1c7963c1540f1b3efa8185bbf36d1a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about this (specifically code duplication vs resiliency trade-off). It could be a little bit brittle IMHO to rely on the APIs from very different providers never changing and with separate classes I guess we're in a better spot to adapt more easily. WDYT? No super strong opinion here, just a gut feeling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah leaving it in is fine 👍
.../org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiSecretSettings.java
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiServiceFields.java
Show resolved
Hide resolved
|
||
public static final String NAME = "google_vertex_ai_embeddings_service_settings"; | ||
|
||
public static final String DIMENSIONS_SET_BY_USER = "dimensions_set_by_user"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this field? This would be needed if the user can set the dimensions in the request to google. I could have missed it but I didn't see that in the google rest docs here: https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newer models, which are currently pre-GA will allow it, which you actually don't find in the english docs, but in the german docs 🤦♂️ . I've an additional issue to add the outputDimensionality
to the request, but wanted to keep it separate first to check, whether we already want to support this for pre-GA models.
It's also announced here https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-announces-new-text-embedding-models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidkyle what do you think? I suppose we could just not document it as a supported option until we need to pass it through the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that GoogleVertexAiService::updateModelWithEmbeddingDetails
will error if the user set dimensions does not match the embedding size I think we should allow this knowing that new models which support the option are coming.
validationException | ||
); | ||
SimilarityMeasure similarityMeasure = extractSimilarity(map, ModelConfigurations.SERVICE_SETTINGS, validationException); | ||
Integer dims = extractOptionalPositiveInteger(map, DIMENSIONS, ModelConfigurations.SERVICE_SETTINGS, validationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not consistent about how we do this but moving forward we should only allow parsing the dimensions
field from the persistent parse context. The reason is that we shouldn't allow users to set the value (unless they can actually pass it through to the request like OpenAI). Instead we'll determine the dimensions
by doing a request when the inference endpoint is first created.
So if this is the request context let's just default the dimensions
to null.
Here's an example of how we did it recently for eland: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elasticsearch/CustomElandInternalTextEmbeddingServiceSettings.java#L51-L74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering https://github.com/elastic/elasticsearch/pull/110090/files/ef907ae635f3ba0d47ee82f2f6605a58dc49b23d#r1651260492 we can keep it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's see what Dave thinks 👍
ModelConfigurations.SERVICE_SETTINGS, | ||
validationException | ||
); | ||
SimilarityMeasure similarityMeasure = extractSimilarity(map, ModelConfigurations.SERVICE_SETTINGS, validationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're going to need to default this, it probably makes sense to do it in this file (I know previous we've done that in service file after we do an embeddings request to determine the dimensions). It's probably more straightforward to do it here because we'll always have a similarity and element type defined and we can avoid the readOptional*
writeOptional*
calls.
Here's an example from eland: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elasticsearch/CustomElandInternalTextEmbeddingServiceSettings.java#L100-L101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it from GoogleVertexAiService
and added it here: Set default similarity measure in service settings
...in/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiService.java
Outdated
Show resolved
Hide resolved
webServer.close(); | ||
} | ||
|
||
// Successful case tested via end-to-end notebook tests in AppEx repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Thanks! Added via Add Google Vertex AI named writeables to InferenceNamedWriteablesProv… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
return null; | ||
}); | ||
} catch (Exception e) { | ||
ValidationException validationException = new ValidationException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm should this be validation error or an ElasticsearchException? Or maybe an status exception 🤔 with a permissions denied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thanks!
@elasticmachine update branch |
This PR adds support for Google Vertex AI embeddings (single embedding and batch embeddings + chunked inference support) through the Google Vertex AI Get Text Embeddings API.
Creating a
text_embedding
inference endpoint usinggooglevertexai
as service provider:Creating a single embedding using a single string as input:
Creating a single embedding using a list with a single string as input:
Creating multiple embeddings with a list of multiple strings as input: