Is it possible to allow changes to the encoding_format of the embedding options? #51

Closed
JadynWong opened this issue Jun 12, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation



JadynWong commented Jun 12, 2024

Currently, EmbeddingClient fixes the encoding_format value to base64 for better performance:

// CUSTOM: Made internal. We always request the embedding as a base64-encoded string for better performance.
/// <summary>
/// The format to return the embeddings in. Can be either `float` or
/// [`base64`](https://pypi.org/project/pybase64/).
/// </summary>
internal InternalEmbeddingGenerationOptionsEncodingFormat? EncodingFormat { get; set; }

internal Embedding(int index, BinaryData embeddingProperty, InternalEmbeddingObject @object, IDictionary<string, BinaryData> serializedAdditionalRawData)
{
    Index = (int)index;
    EmbeddingProperty = embeddingProperty;
    Object = @object;
    _serializedAdditionalRawData = serializedAdditionalRawData;
    // Handle additional custom properties.
    Vector = ConvertToVectorOfFloats(embeddingProperty);
}

It can't be changed, even when I want the float format. I want to use this client with text-embeddings-inference, which currently does not support the encoding_format parameter.
This results in the following error:

The input is not a valid Base64 string of encoded floats.
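For context, here is a minimal C# sketch of what decoding a base64 embedding involves. It assumes the base64 string packs the vector as consecutive little-endian 32-bit floats; it is not the SDK's actual ConvertToVectorOfFloats, and DecodeBase64Floats is a hypothetical helper. A server that ignores encoding_format sends a JSON array of numbers rather than a string, and that text is not valid base64, so decoding fails in a similar way.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: decodes a base64 string into float32 values,
// assuming little-endian IEEE 754 floats packed back to back.
static float[] DecodeBase64Floats(string base64)
{
    byte[] bytes = Convert.FromBase64String(base64); // throws FormatException on non-base64 input
    if (bytes.Length % sizeof(float) != 0)
    {
        throw new FormatException("Decoded byte length is not a multiple of 4.");
    }

    float[] vector = new float[bytes.Length / sizeof(float)];
    for (int i = 0; i < vector.Length; i++)
    {
        vector[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.AsSpan(i * sizeof(float), sizeof(float)));
    }
    return vector;
}

// Round-trip a small vector through the assumed base64 layout.
float[] original = { 0.5f, -1.25f, 3.0f };
byte[] packed = new byte[original.Length * sizeof(float)];
for (int j = 0; j < original.Length; j++)
{
    BinaryPrimitives.WriteSingleLittleEndian(packed.AsSpan(j * sizeof(float)), original[j]);
}
string encoded = Convert.ToBase64String(packed);
float[] decoded = DecodeBase64Floats(encoded);
Console.WriteLine($"{encoded} -> [{string.Join(", ", decoded)}]");

// A float-format response carries a JSON array such as [0.5,-1.25,3.0] instead,
// which is not valid base64, so decoding it throws.
try
{
    DecodeBase64Floats("[0.5,-1.25,3.0]");
}
catch (FormatException ex)
{
    Console.WriteLine($"Decoding a JSON array fails: {ex.Message}");
}
```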

I know that adding encoding_format support to those other projects would be the better fix, but many OpenAI-compatible APIs don't keep up with the official API as quickly as they should.

Is it possible to let users change the encoding_format value?
Of course, since this is OpenAI's official SDK, I would understand if it were only meant to be compatible with OpenAI.

For now, I can work around this by serializing the request and parsing the response myself with the protocol methods.

@joseharriaga
Collaborator

Thank you for reaching out, @JadynWong! We presently have no plans to expose the encoding_format property publicly, but we will definitely take your feedback into consideration as we continue to evolve this API.

As you have correctly pointed out, you can use the EmbeddingClient's GenerateEmbeddings protocol method to take full control of the request sent to the OpenAI service. By the sound of it, you have it all figured out already, but I am including a small example below in case others find it useful:

using System;
using System.ClientModel;
using System.Text.Json;
using OpenAI.Embeddings;

EmbeddingClient client = new("text-embedding-3-small", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

string description = "Best hotel in town if you like luxury hotels. They have an amazing infinity pool, a spa,"
    + " and a really helpful concierge. The location is perfect -- right downtown, close to all the tourist"
    + " attractions. We highly recommend this hotel.";

// Build the raw request body ourselves so that encoding_format can be chosen freely.
BinaryData input = BinaryData.FromObjectAsJson(new
{
    model = "text-embedding-3-small",
    input = description,
    encoding_format = "float"
});

using BinaryContent content = BinaryContent.Create(input);
ClientResult result = await client.GenerateEmbeddingsAsync(content);
BinaryData output = result.GetRawResponse().Content;

using JsonDocument outputAsJson = JsonDocument.Parse(output.ToString());
JsonElement vector = outputAsJson.RootElement
    .GetProperty("data"u8)[0]
    .GetProperty("embedding"u8);

Console.WriteLine($"Dimension: {vector.GetArrayLength()}");
Console.WriteLine("Floats:");
int i = 0;
foreach (JsonElement element in vector.EnumerateArray())
{
    Console.WriteLine($"  [{i++,4}] = {element.GetDouble()}");
}

Note that, when using protocol methods, the input and output (represented as BinaryData) map directly to the REST API as described here: https://platform.openai.com/docs/api-reference/embeddings/create
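Because protocol-method responses follow the REST API, the `embedding` field is a JSON array of numbers when encoding_format is "float" and a base64 string when it is "base64". When targeting several OpenAI-compatible servers, some of which may ignore encoding_format, a reader that accepts both shapes can be handy. The sketch below is a hypothetical helper (ReadEmbedding is not part of the SDK) built on System.Text.Json, and it assumes the base64 form packs little-endian float32 values.

```csharp
using System;
using System.Buffers.Binary;
using System.Text.Json;

// Hypothetical helper: reads data[index].embedding from a raw embeddings response
// and accepts either representation the REST API can return.
static float[] ReadEmbedding(string responseJson, int index = 0)
{
    using JsonDocument doc = JsonDocument.Parse(responseJson);
    JsonElement embedding = doc.RootElement.GetProperty("data")[index].GetProperty("embedding");

    switch (embedding.ValueKind)
    {
        case JsonValueKind.Array:
        {
            // "float" format (or a server that ignores encoding_format): a JSON array of numbers.
            float[] vector = new float[embedding.GetArrayLength()];
            int i = 0;
            foreach (JsonElement element in embedding.EnumerateArray())
            {
                vector[i++] = element.GetSingle();
            }
            return vector;
        }
        case JsonValueKind.String:
        {
            // "base64" format: assumed to be little-endian float32 values packed back to back.
            byte[] bytes = embedding.GetBytesFromBase64();
            float[] vector = new float[bytes.Length / sizeof(float)];
            for (int i = 0; i < vector.Length; i++)
            {
                vector[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.AsSpan(i * sizeof(float), sizeof(float)));
            }
            return vector;
        }
        default:
            throw new FormatException($"Unexpected embedding value kind: {embedding.ValueKind}");
    }
}

// Two hand-written responses carrying the same vector [0.5, -1.25, 3.0].
string floatResponse = "{\"data\":[{\"index\":0,\"embedding\":[0.5,-1.25,3.0]}]}";
string base64Response = "{\"data\":[{\"index\":0,\"embedding\":\"AAAAPwAAoL8AAEBA\"}]}";

Console.WriteLine(string.Join(", ", ReadEmbedding(floatResponse)));
Console.WriteLine(string.Join(", ", ReadEmbedding(base64Response)));
```

Switching on the JSON value kind, rather than on the encoding_format that was requested, is what makes this tolerant of servers that silently ignore the parameter.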

I have also created the following PR to add this example to our suite: #122

@joseharriaga joseharriaga self-assigned this Jul 13, 2024
@joseharriaga joseharriaga added the documentation Improvements or additions to documentation label Jul 13, 2024