Add support for euclidean_distance, dot_product, cosine_distance functions #22397

Open
wants to merge 1 commit into
base: master

Member

ebyhr commented Jun 15, 2024

Description

Adds three vector distance functions:

  • euclidean_distance
  • dot_product
  • cosine_distance
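
For reference, these names usually denote the standard definitions below for two vectors a and b of equal length n. This is an assumption about the intended semantics, not a statement of the PR's exact behavior (for example, handling of NULL elements or zero vectors is not covered here):

```latex
\begin{aligned}
\operatorname{euclidean\_distance}(a, b) &= \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
\operatorname{dot\_product}(a, b) &= \sum_{i=1}^{n} a_i b_i \\
\operatorname{cosine\_distance}(a, b) &= 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
\end{aligned}
```

With a = (1, 2) and b = (3, 4), these give a Euclidean distance of √8 ≈ 2.828, a dot product of 11, and a cosine distance of 1 − 11/(5√5) ≈ 0.0161.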

Release notes

# General
* Add support for `euclidean_distance`, `dot_product` and `cosine_distance` functions. ({issue}`22397`)

void testVectorWrite()
{
    try (TestTable table = new TestTable(onRemoteDatabase(), "test_vector_writes", "(v vector(1))")) {
        assertUpdate("INSERT INTO " + table.getName() + " VALUES ARRAY[REAL '1.0'], NULL", 2);

Contributor

Do we want to allow writing vector types?
I'm concerned about the unforeseen complexity and potential bugs that come with this ability.

import static org.assertj.core.api.Assertions.assertThatThrownBy;

final class TestPostgreSqlVectorType
        extends AbstractTestQueryFramework

Contributor

What happens if we push down pgvector operators (<->, <#>, <=>) to PostgreSQL databases that don't have the extension installed?

Please add a corresponding test case for this scenario in a separate class.
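
To make the concern concrete, here is a minimal sketch of one possible guard, not the PR's actual implementation. The class name, method name, and the `extensionInstalled` flag are hypothetical. It assumes the connector has already run the catalog query shown in the constant. In pgvector, `<->` is L2 distance, `<=>` is cosine distance, and `<#>` returns the negative inner product, so the dot product case is negated.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: decide whether a distance function may be pushed down
// as a pgvector operator. None of these names come from the PR.
final class PgvectorPushdownSketch
{
    // Catalog query a connector could run once to detect the extension.
    static final String EXTENSION_CHECK_SQL =
            "SELECT 1 FROM pg_extension WHERE extname = 'vector'";

    // Function name to pgvector operator.
    private static final Map<String, String> OPERATORS = Map.of(
            "euclidean_distance", "<->",
            "dot_product", "<#>",
            "cosine_distance", "<=>");

    private PgvectorPushdownSketch() {}

    // Returns the remote SQL expression, or empty when pushdown must be skipped
    // so that the engine evaluates the function itself.
    static Optional<String> rewrite(String functionName, String left, String right, boolean extensionInstalled)
    {
        String operator = OPERATORS.get(functionName);
        if (!extensionInstalled || operator == null) {
            return Optional.empty();
        }
        String expression = "(" + left + " " + operator + " " + right + ")";
        if (functionName.equals("dot_product")) {
            // <#> yields the negative inner product, so flip the sign.
            expression = "-" + expression;
        }
        return Optional.of(expression);
    }
}
```

With a guard like this, a database without the extension would simply never receive the operators. Whether the PR should behave this way, or fail with a clear error instead, is the open question the requested test case would pin down.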

        return 1.0 - cosineSimilarity;
    }

    private static double dotProduct(double[] first, double[] second)

Member

It looks like the callers all have floats, so this could take float arguments and use a double only for the computation. This would keep temporary memory down.

Also, as @raunaqmorarka mentioned, we could likely just work directly with the blocks, or pull out the underlying arrays (see IntArrayBlock getRawValues and getRawValuesOffset). For the moment these are int[], but I expect we will move to either float[] or MemorySegment soon.
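
A minimal sketch of the first suggestion, assuming the callers can hand over `float[]` directly. The class name and the length check are illustrative and not taken from the PR. The inputs are never widened into temporary `double[]` copies; only the accumulator is a `double`.

```java
// Hypothetical sketch: float inputs, double accumulation.
final class VectorDistanceSketch
{
    private VectorDistanceSketch() {}

    static double dotProduct(float[] first, float[] second)
    {
        checkSameLength(first, second);
        double sum = 0.0;
        for (int i = 0; i < first.length; i++) {
            // Widen per element so the multiplication happens in double precision.
            sum += (double) first[i] * second[i];
        }
        return sum;
    }

    static double euclideanDistance(float[] first, float[] second)
    {
        checkSameLength(first, second);
        double sum = 0.0;
        for (int i = 0; i < first.length; i++) {
            double difference = (double) first[i] - second[i];
            sum += difference * difference;
        }
        return Math.sqrt(sum);
    }

    static double cosineDistance(float[] first, float[] second)
    {
        double norms = Math.sqrt(dotProduct(first, first) * dotProduct(second, second));
        // A zero vector makes this 0.0 / 0.0, which is NaN; real code would need
        // to decide how to handle that case.
        double cosineSimilarity = dotProduct(first, second) / norms;
        return 1.0 - cosineSimilarity;
    }

    private static void checkSameLength(float[] first, float[] second)
    {
        if (first.length != second.length) {
            throw new IllegalArgumentException("The arguments must have the same length");
        }
    }
}
```

Reading straight from the block's raw values, as the second paragraph suggests, would avoid even the `float[]` copy, but that depends on block internals that the comment itself expects to change.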

Contributor

wendigo commented Jun 19, 2024

I think that eventually we should add support for vector type to the engine as a first class citizen (along with float16 type). Cc @martint

Member

dain commented Jun 21, 2024

> I think that eventually we should add support for vector type to the engine as a first class citizen (along with float16 type). Cc @martint

I think we should look at it. Vector is basically a fixed-length array, and knowing the fixed length could help speed up some operations. We talked about float16 and float8 last year. I believe they are coming to the JVM, but there were competing standards for them, and I'm not sure what happened.

ebyhr marked this pull request as ready for review June 26, 2024 23:10
ebyhr requested a review from martint June 26, 2024 23:10
ebyhr force-pushed the ebi/postgresql-vector branch 2 times, most recently from 7dbb05a to e4b33c9 July 4, 2024 04:25
ebyhr requested a review from martint July 4, 2024 04:35
Contributor

wendigo commented Jul 4, 2024

Member

martint left a comment

First commit looks good.

@Execution(CONCURRENT)
final class TestArrayVectorFunctions
{
    private QueryAssertions assertions;

Member

Make this final and initialize it at the declaration site.
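
A minimal sketch of the requested pattern, using a hypothetical `Helper` class as a stand-in for a closeable test helper such as `QueryAssertions`. The teardown method and all other names here are assumptions, not code from the PR.

```java
// Hypothetical sketch of "final and initialized at the declaration site".
final class FinalFieldSketch
{
    private FinalFieldSketch() {}

    // Stand-in for a closeable test helper; not a real Trino class.
    static final class Helper
            implements AutoCloseable
    {
        private boolean closed;

        boolean isClosed()
        {
            return closed;
        }

        @Override
        public void close()
        {
            closed = true;
        }
    }

    static final class ExampleTest
    {
        // Final and assigned where it is declared: the field can never be
        // observed as null and no test method can reassign it.
        private final Helper helper = new Helper();

        Helper helper()
        {
            return helper;
        }

        // Cleanup still happens explicitly once the tests are done.
        void teardown()
        {
            helper.close();
        }
    }
}
```

This removes the need for a separate setup method whose only job is to assign the field.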

Member Author

Updated.

ebyhr changed the title from "Add support for vector type and distance functions in PostgreSQL" to "Add support for euclidean_distance, dot_product, cosine_distance functions" Jul 9, 2024