MDEV-33408 Introduce session variables to manage HNSW index parameters #3226

Open
wants to merge 28 commits into
base: bb-11.4-vec-vicentiu-hugo
7b87e01
MDEV-32885 VEC_DISTANCE() function
vuvova Nov 25, 2023
ed8b6da
make INFORMATION_SCHEMA.STATISTICS.COMMENT not nullable
vuvova Jan 19, 2024
4bea056
fix main.plugin_vars test to cleanup after itself
vuvova Feb 5, 2024
62e6031
cleanup: spaces, casts, comments
vuvova Jan 8, 2024
e409868
cleanup: pass TABLE_SHARE to store_key_options()
vuvova Jan 26, 2024
bc8a7a3
reject invalid spatial key declarations in the parser
vuvova Jan 8, 2024
16846b5
cleanup: remove unconditional #ifdef's
vuvova Jan 10, 2024
2cd2320
cleanup: lex_string_set3()
vuvova Jan 27, 2024
8ddb99a
cleanup: Queue and Bounded_queue
vuvova Feb 6, 2024
ef14f42
cleanup: key algorithm vs key flags
vuvova Jan 14, 2024
022bc34
cleanup: make_long_hash_field_name() and add_hash_field()
vuvova Jan 18, 2024
38e84a8
cleanup: generalize ER_SPATIAL_CANT_HAVE_NULL
vuvova Jan 17, 2024
f1d0352
cleanup: generalize ER_INNODB_NO_FT_TEMP_TABLE
vuvova Jan 25, 2024
0cbd050
cleanup: extract ha_create_table_from_share()
vuvova Jan 25, 2024
cdcf739
open frm for DROP TABLE
vuvova Jan 26, 2024
3fa8be1
cleanup: unused function argument
vuvova Jan 26, 2024
cc82b35
Revert "MDEV-15458 Segfault in heap_scan() upon UPDATE after ADD SYST…
vuvova Feb 9, 2024
a857b68
initial support for vector indexes
vuvova Jan 17, 2024
9ca6554
Initial fixup
cvicentiu Feb 17, 2024
1568677
Wip
cvicentiu Feb 21, 2024
e944876
Graph insert possibly working ok
cvicentiu Feb 22, 2024
b34a896
Search is now working, but layer unaware
cvicentiu Feb 22, 2024
8aa7c1e
wip
cvicentiu Feb 22, 2024
3d0e4ea
Vec insert and search working on a multi-layer
cvicentiu Feb 23, 2024
437e214
MDEV-33408 Alter HNSW graph storage and fix memory leak
HugoWenTD Apr 12, 2024
ee2cc47
Support files for ann-workspace
cvicentiu Apr 15, 2024
7829259
Bug fixes - on top of Hugo's patch
cvicentiu Apr 18, 2024
5899540
Introduce session variables to manage HNSW index parameters
HugoWenTD Apr 26, 2024
initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

Limitations:
* v must be a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported
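
The test data in this commit is generated with Perl's pack(f5, ...), i.e. five native 32-bit floats per binary string. A minimal sketch of decoding such a value and comparing two of them, assuming little-endian IEEE 754 floats; decode_hex_vector() and squared_euclidean() are hypothetical helpers, not functions from this patch. For rows 7 and 10 of the test data it yields about 0.12397, in line with the value recorded in vector.result, which suggests (the commit message does not state it) that VEC_DISTANCE() returns the squared Euclidean distance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): decode a hex string such as
// the x'...' literals in vector.test into 32-bit floats.
// Assumption: the blob stores little-endian IEEE 754 single-precision values.
std::vector<float> decode_hex_vector(const std::string &hex)
{
  assert(hex.size() % 8 == 0);            // 8 hex digits per float
  auto nibble= [](char c) -> uint32_t {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
  };
  std::vector<float> out;
  for (size_t i= 0; i < hex.size(); i+= 8)
  {
    uint32_t bits= 0;
    for (int b= 0; b < 4; b++)
    {
      uint32_t byte= (nibble(hex[i + 2 * b]) << 4) | nibble(hex[i + 2 * b + 1]);
      bits|= byte << (8 * b);             // byte 0 is the least significant
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);     // reinterpret the bit pattern
    out.push_back(f);
  }
  return out;
}

// Sum of squared component differences, accumulated in double.
double squared_euclidean(const std::vector<float> &a,
                         const std::vector<float> &b)
{
  assert(a.size() == b.size());
  double sum= 0;
  for (size_t i= 0; i < a.size(); i++)
  {
    double d= double(a[i]) - double(b[i]);
    sum+= d * d;
  }
  return sum;
}
```

Assembling each float from individual bytes keeps the sketch independent of the host byte order; only the assumed byte order of the stored data matters.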

MDEV-33404 Engine-independent indexes: subtable method

Added support for so-called "high-level indexes". They are not visible
to the storage engine and are implemented at the SQL level. For every
such index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure and no frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.
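
The naming scheme can be sketched as follows. This is an illustration only: the format string "#i#%02u" and the buffer length of 16 are assumptions inferred from the t1#i#05 example, since the actual HLINDEX_TEMPLATE and HLINDEX_BUF_LEN definitions are not shown in this excerpt, and hlindex_name() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumed value; the real HLINDEX_BUF_LEN is defined elsewhere in the patch.
static const size_t HYPOTHETICAL_BUF_LEN= 16;

// Build the companion table path for index number `index_no` of `table_path`.
// Returns an empty string when the suffix might not fit into `max_len` bytes,
// similar in spirit to the length check made in ha_create_table().
std::string hlindex_name(const std::string &table_path, unsigned index_no,
                         size_t max_len)
{
  assert(max_len >= HYPOTHETICAL_BUF_LEN);
  if (table_path.size() > max_len - HYPOTHETICAL_BUF_LEN)
    return std::string();
  char suffix[HYPOTHETICAL_BUF_LEN];
  // Assumed template: "#i#" followed by a zero-padded two-digit index number.
  std::snprintf(suffix, sizeof suffix, "#i#%02u", index_no);
  return table_path + suffix;
}
```

With these assumptions, hlindex_name("t1", 5, 512) gives "t1#i#05", matching the example in the description above.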

MDEV-33406 basic optimizer support for k-NN searches

For a query like SELECT ... ORDER BY func(), the optimizer uses
item_func->part_of_sortkey() to decide which keys can be used
to resolve the ORDER BY.
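
For orientation, the result such a query asks for is "order rows by distance to a constant vector and keep the first k". A brute-force sketch of that result, which a vector index is meant to approximate without scanning every row; Row, squared_distance() and knn_bruteforce() are hypothetical names, and this is not optimizer code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Row
{
  int id;
  std::vector<float> v;
};

// Squared Euclidean distance; assumed here as the distance function.
static double squared_distance(const std::vector<float> &a,
                               const std::vector<float> &b)
{
  assert(a.size() == b.size());
  double sum= 0;
  for (size_t i= 0; i < a.size(); i++)
  {
    double d= double(a[i]) - double(b[i]);
    sum+= d * d;
  }
  return sum;
}

// Exact k-NN: score every row, then partially sort to get the k closest ids.
std::vector<int> knn_bruteforce(const std::vector<Row> &rows,
                                const std::vector<float> &query, size_t k)
{
  std::vector<std::pair<double, int>> scored;
  for (const Row &r : rows)
    scored.emplace_back(squared_distance(r.v, query), r.id);
  k= std::min(k, scored.size());
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
  std::vector<int> ids;
  for (size_t i= 0; i < k; i++)
    ids.push_back(scored[i].second);
  return ids;
}
```

Every row is scored once, so the cost grows linearly with the table size; that is the baseline a k-NN index search is measured against.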
vuvova authored and cvicentiu committed Apr 4, 2024
commit a857b68c53632a7ccf47a180eac2d306f839b400
3 changes: 2 additions & 1 deletion include/my_base.h
@@ -107,7 +107,8 @@ enum ha_key_alg {
 HA_KEY_ALG_HASH= 3, /* HASH keys (HEAP tables) */
 HA_KEY_ALG_FULLTEXT= 4, /* FULLTEXT */
 HA_KEY_ALG_LONG_HASH= 5, /* long BLOB keys */
-HA_KEY_ALG_UNIQUE_HASH= 6 /* Internal UNIQUE hash (Aria) */
+HA_KEY_ALG_UNIQUE_HASH= 6, /* Internal UNIQUE hash (Aria) */
+HA_KEY_ALG_MHNSW= 7 /* HNSW for k-ANN vector search */
 };

 /* Storage media types */
186 changes: 186 additions & 0 deletions mysql-test/main/vector.result
@@ -0,0 +1,186 @@
create temporary table t1 (id int auto_increment primary key, v blob not null, vector index (v));
ERROR HY000: Cannot create VECTOR index on temporary MyISAM table
create table t1 (id int auto_increment primary key,
u blob not null, vector index (u),
v blob not null, vector index (v));
ERROR 42000: This version of MariaDB doesn't yet support 'multiple VECTOR indexes'
create table t1 (id int auto_increment primary key, v blob not null, vector index (v));
show create table t1;
Table Create Table
t1 CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`v` blob NOT NULL,
PRIMARY KEY (`id`),
VECTOR KEY `v` (`v`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
show keys from t1;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment Ignored
t1 0 PRIMARY 1 id A 0 NULL NULL BTREE NO
t1 1 v 1 v A NULL 1 NULL VECTOR NO
select * from information_schema.statistics where table_name='t1';
TABLE_CATALOG def
TABLE_SCHEMA test
TABLE_NAME t1
NON_UNIQUE 0
INDEX_SCHEMA test
INDEX_NAME PRIMARY
SEQ_IN_INDEX 1
COLUMN_NAME id
COLLATION A
CARDINALITY 0
SUB_PART NULL
PACKED NULL
NULLABLE
INDEX_TYPE BTREE
COMMENT
INDEX_COMMENT
IGNORED NO
TABLE_CATALOG def
TABLE_SCHEMA test
TABLE_NAME t1
NON_UNIQUE 1
INDEX_SCHEMA test
INDEX_NAME v
SEQ_IN_INDEX 1
COLUMN_NAME v
COLLATION A
CARDINALITY NULL
SUB_PART 1
PACKED NULL
NULLABLE
INDEX_TYPE VECTOR
COMMENT
INDEX_COMMENT
IGNORED NO
insert t1 (v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
select id, hex(v) from t1;
id hex(v)
1 E360D63EBE554F3FCDBC523F4522193F5236083D
2 F511303F72224A3FDD05FE3EB22A133FFAE86A3F
3 F09BAA3EA172763F123DEF3E0C7FE53E288BF33E
4 B97A523F2A193E3EB4F62E3F2D23583E9DD60D3F
5 F7C5DF3E984B2B3E65E59D3D7376DB3EAC63773E
6 DE01453FFA486D3F10AA4D3FDD66813C71CB163F
7 76EDFC3E4B57243F10F8423FB158713F020BDA3E
8 56926C3FDF098D3E2C8C5E3D1AD4953DAA9D0B3E
9 7B713F3E5258323F80D1113D673B2B3F66E3583F
10 6CA1D43E9DF91B3FE580DA3E1C247D3F147CF33E
flush tables;
select id,vec_distance(v, x'b047263c9f87233fcfd27e3eae493e3f0329f43e') d from t1 order by d limit 3;
id d
9 0.22278176178224385
10 0.256948729687565
3 0.344061212052452
select t1.id as id1, t2.id as id2, vec_distance(t1.v, t2.v) from t1, t1 as t2 order by 3,1,2;
id1 id2 vec_distance(t1.v, t2.v)
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
7 7 0
8 8 0
9 9 0
10 10 0
7 10 0.12396744079887867
10 7 0.12396744079887867
1 7 0.31054688012227416
7 1 0.31054688012227416
2 3 0.367857878212817
3 2 0.367857878212817
1 3 0.37555301235988736
3 1 0.37555301235988736
5 8 0.3868834706954658
8 5 0.3868834706954658
3 10 0.4255195118806938
10 3 0.4255195118806938
9 10 0.45328998332843184
10 9 0.45328998332843184
3 7 0.4623853687662631
7 3 0.4623853687662631
3 9 0.4652266185730696
9 3 0.4652266185730696
2 10 0.4783527944236994
10 2 0.4783527944236994
2 9 0.48534219339489937
9 2 0.48534219339489937
3 6 0.5045010282192379
6 3 0.5045010282192379
2 7 0.5069749839603901
7 2 0.5069749839603901
2 6 0.5404628878459334
6 2 0.5404628878459334
1 10 0.54565767017084
10 1 0.54565767017084
4 6 0.6059622673783451
6 4 0.6059622673783451
4 8 0.6077508088201284
8 4 0.6077508088201284
4 5 0.6612954348674975
5 4 0.6612954348674975
2 4 0.6824288554489613
4 2 0.6824288554489613
5 10 0.6866589883284178
10 5 0.6866589883284178
5 9 0.7690152280265465
9 5 0.7690152280265465
1 6 0.7852460269641597
6 1 0.7852460269641597
3 5 0.850858983467333
5 3 0.850858983467333
4 7 0.8738353815861046
7 4 0.8738353815861046
7 9 0.8768924188334495
9 7 0.8768924188334495
3 4 0.9520111442543566
4 3 0.9520111442543566
1 2 0.9624144533590879
2 1 0.9624144533590879
1 4 0.9931070283055305
4 1 0.9931070283055305
5 7 0.9953781084623188
7 5 0.9953781084623188
4 10 1.0219887541607022
10 4 1.0219887541607022
1 5 1.0421060165972449
5 1 1.0421060165972449
6 7 1.0447564153000712
7 6 1.0447564153000712
2 5 1.1041161566972733
5 2 1.1041161566972733
6 8 1.2175365379080176
8 6 1.2175365379080176
3 8 1.2477562054991722
8 3 1.2477562054991722
6 10 1.327899457886815
10 6 1.327899457886815
1 9 1.3543723821640015
9 1 1.3543723821640015
2 8 1.3774709925055504
8 2 1.3774709925055504
4 9 1.3798951730132103
9 4 1.3798951730132103
1 8 1.418471465818584
8 1 1.418471465818584
8 10 1.4625506848096848
10 8 1.4625506848096848
6 9 1.475082814693451
9 6 1.475082814693451
5 6 1.5062125325202942
6 5 1.5062125325202942
8 9 1.5813712995150127
9 8 1.5813712995150127
7 8 1.6595615148544312
8 7 1.6595615148544312
drop table t1;
db.opt
31 changes: 31 additions & 0 deletions mysql-test/main/vector.test
@@ -0,0 +1,31 @@
error ER_NO_INDEX_ON_TEMPORARY;
create temporary table t1 (id int auto_increment primary key, v blob not null, vector index (v));

error ER_NOT_SUPPORTED_YET;
create table t1 (id int auto_increment primary key,
u blob not null, vector index (u),
v blob not null, vector index (v));

create table t1 (id int auto_increment primary key, v blob not null, vector index (v));
show create table t1;
show keys from t1;
query_vertical select * from information_schema.statistics where table_name='t1';
# print unpack(H40,pack(f5,map{rand}1..5))
insert t1 (v) values (x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');

select id, hex(v) from t1;
flush tables;
select id,vec_distance(v, x'b047263c9f87233fcfd27e3eae493e3f0329f43e') d from t1 order by d limit 3;
select t1.id as id1, t2.id as id2, vec_distance(t1.v, t2.v) from t1, t1 as t2 order by 3,1,2;
drop table t1;
let $datadir=`select @@datadir`;
list_files $datadir/test;
2 changes: 1 addition & 1 deletion sql/CMakeLists.txt
@@ -113,7 +113,7 @@ SET (SQL_SOURCE
 mf_iocache.cc my_decimal.cc
 mysqld.cc net_serv.cc keycaches.cc
 ../sql-common/client_plugin.c
-opt_range.cc
+opt_range.cc vector_mhnsw.cc
 opt_rewrite_date_cmp.cc
 opt_rewrite_remove_casefold.cc
 opt_sum.cc
7 changes: 3 additions & 4 deletions sql/filesort_utils.cc
@@ -413,10 +413,10 @@ void Filesort_buffer::sort_buffer(const Sort_param *param, uint count)


 static
-size_t get_sort_length(THD *thd, Item_field *item)
+size_t get_sort_length(THD *thd, Item *item)
 {
 SORT_FIELD_ATTR sort_attr;
-sort_attr.type= ((item->field)->is_packable() ?
+sort_attr.type= (item->type_handler()->is_packable() ?
 SORT_FIELD_ATTR::VARIABLE_SIZE :
 SORT_FIELD_ATTR::FIXED_SIZE);
 item->type_handler()->sort_length(thd, item, &sort_attr);
@@ -452,8 +452,7 @@ double cost_of_filesort(TABLE *table, ORDER *order_by, ha_rows rows_to_read,

 for (ORDER *ptr= order_by; ptr ; ptr= ptr->next)
 {
-Item_field *field= (Item_field*) (*ptr->item)->real_item();
-size_t length= get_sort_length(thd, field);
+size_t length= get_sort_length(thd, *ptr->item);
 set_if_smaller(length, thd->variables.max_sort_length);
 sort_len+= (uint) length;
 }
45 changes: 41 additions & 4 deletions sql/handler.cc
@@ -48,6 +48,7 @@
 #include "rowid_filter.h"
 #include "mysys_err.h"
 #include "optimizer_defaults.h"
+#include "vector_mhnsw.h"

 #ifdef WITH_PARTITION_STORAGE_ENGINE
 #include "ha_partition.h"
@@ -3495,7 +3496,7 @@ PSI_table_share *handler::ha_table_share_psi() const
 const char *handler::index_type(uint key_number)
 {
 static const char* alg2str[]= { "???", "BTREE", "SPATIAL", "HASH",
-"FULLTEXT", "HASH", "HASH" };
+"FULLTEXT", "HASH", "HASH", "VECTOR" };
 enum ha_key_alg alg= table_share->key_info[key_number].algorithm;
 if (!alg)
 {
@@ -6205,7 +6206,37 @@ int ha_create_table(THD *thd, const char *path, const char *db,
 goto err;
 }

-error= ha_create_table_from_share(thd, &share, create_info);
+if ((error= ha_create_table_from_share(thd, &share, create_info)))
+goto err;
+
+/* create secondary tables for high level indexes */
+if (share.total_keys > share.keys)
+{
+/* as of now: only one vector index can be here */
+DBUG_ASSERT(share.total_keys == share.keys + 1);
+DBUG_ASSERT(share.key_info[share.keys].algorithm == HA_KEY_ALG_MHNSW);
+TABLE_SHARE index_share;
+char file_name[FN_REFLEN+1];
+HA_CREATE_INFO index_cinfo;
+char *path_end= strmov(file_name, path);
+
+if ((error= share.path.length > sizeof(file_name) - HLINDEX_BUF_LEN))
+goto err;
+
+for (uint i= share.keys; i < share.total_keys; i++)
+{
+my_snprintf(path_end, HLINDEX_BUF_LEN, HLINDEX_TEMPLATE, i);
+init_tmp_table_share(thd, &index_share, db, 0, table_name, file_name);
+index_share.db_plugin= share.db_plugin;
+if ((error= index_share.init_from_sql_statement_string(thd, false,
+mhnsw_hlindex_table.str, mhnsw_hlindex_table.length)))
+break;
+
+if ((error= ha_create_table_from_share(thd, &index_share, &index_cinfo)))
+break;
+}
+free_table_share(&index_share);
+}

 err:
 free_table_share(&share);
@@ -7453,6 +7484,8 @@ int handler::ha_reset()
 delete lookup_handler;
 lookup_handler= this;
 }
+if (table->reset_hlindexes())
+return 1;
 DBUG_RETURN(reset());
 }

@@ -7846,8 +7879,12 @@ bool handler::prepare_for_row_logging()

 int handler::prepare_for_insert(bool do_create)
 {
+if (table->open_hlindexes_for_write())
+return 1;
+
 /* Preparation for unique of blob's */
-if (table->s->long_unique_table || table->s->period.unique_keys)
+if (table->s->long_unique_table || table->s->period.unique_keys ||
+table->hlindex)
 {
 if (do_create && create_lookup_handler())
 return 1;
@@ -7888,7 +7925,7 @@ int handler::ha_write_row(const uchar *buf)
 { error= write_row(buf); })

 MYSQL_INSERT_ROW_DONE(error);
-if (likely(!error))
+if (!error && !((error= table->update_hlindexes())))
 {
 rows_changed++;
 Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
6 changes: 3 additions & 3 deletions sql/handler.h
@@ -3206,7 +3206,6 @@ class handler :public Sql_alloc
 Table_flags cached_table_flags; /* Set on init() and open() */

 ha_rows estimation_rows_to_insert;
-handler *lookup_handler;
 /* Statistics for the query. Updated if handler_stats.in_use is set */
 ha_handler_stats active_handler_stats;
 void set_handler_stats();
@@ -3215,6 +3214,7 @@ class handler :public Sql_alloc
 OPTIMIZER_COSTS *costs; /* Points to table->share->costs */
 uchar *ref; /* Pointer to current row */
 uchar *dup_ref; /* Pointer to duplicate row */
+handler *lookup_handler;
 uchar *lookup_buffer;

 /* General statistics for the table like number of row, file sizes etc */
@@ -3421,8 +3421,8 @@ class handler :public Sql_alloc
 handler(handlerton *ht_arg, TABLE_SHARE *share_arg)
 :table_share(share_arg), table(0),
 estimation_rows_to_insert(0),
-lookup_handler(this),
-ht(ht_arg), costs(0), ref(0), lookup_buffer(NULL), handler_stats(NULL),
+ht(ht_arg), costs(0), ref(0), lookup_handler(this),
+lookup_buffer(NULL), handler_stats(NULL),
 end_range(NULL), implicit_emptied(0),
 mark_trx_read_write_done(0),
 check_table_binlog_row_based_done(0),