-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for aliases in elasticsearch document store #2448
Conversation
# bad embedding field | ||
if mappings["properties"][self.embedding_field]["type"] != "knn_vector": | ||
raise Exception( | ||
f"The '{index}' index in OpenSearch already has a field called '{self.embedding_field}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ZeJ0hn index
is not defined here as I found out from the failing test. I think it makes sense to use index_id
and maybe also index_name
here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed. Wow. 🤩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first pull request in your project so I need to be reactive! 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look very good to me. 👍 What I would like to see added as a nice bonus before we merge this pull request is a small test in test_document_store.py
. You can have a look at test_elasticsearch_search_field_mapping
in there as a simple example. The test could create an alias of several indices and then initialize an ElasticsearchDocumentStore
with that alias as index name. As create_index
is True by default, it will internally call self._create_document_index(index)
and then we can check whether all indices in the alias fulfill the requirements or raise an Exception. @ZeJ0hn What do you think?
Sounds good. I'm going to do that this afternoon |
Unit test OK:
|
From the test log I see that the Exception wasn't raised as expected:
|
Hi @tstadel, In my case, I don't want all using *. We have many indices with different versions of them (different mappings for an incremental tests). And we use an alias to always point to the last one. That's why we need alias support. |
@ZeJ0hn Yep, I almost guessed that. I bet that no one uses the grouping function of aliases.But speaking about complexity, because it is technically possible, it's better to support it. If we did not, we at least would need to raise an error or show a warning which again increases complexity. So let's keep it the way it is. # Technically it is possible to group multiple indexes to a single alias. As all of them will be searched we have to ensure that all of them meet our requirements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍 It's ready to be merged. I just added the comment about the for loop and all the tests look good to me. The two failing tests have nothing to do with this pull request. So I will go ahead and merge. @ZeJ0hn Thank you so much for this pull request and congratulations to your first contribution to Haystack! 🥇
@ZeJ0hn There is one final thing you could do, which is enable all GitHub actions on this pull request. We describe how to do that in detail here: https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md#forks |
Thank you! |
Proposed changes:
Status (please check what you already did):