Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for aliases in elasticsearch document store #2448

Merged
merged 10 commits into from
Apr 28, 2022
Merged

Add support for aliases in elasticsearch document store #2448

merged 10 commits into from
Apr 28, 2022

Conversation

ZeJ0hn
Copy link
Contributor

@ZeJ0hn ZeJ0hn commented Apr 22, 2022

Proposed changes:

  • Add the support for alias in elasticsearch. indices.get() method from elasticsearch python library doesn't return alias name as key, but return indexes grouped by the alias, so we have to loop over all keys to check that every indices are correct

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

@ZeJ0hn ZeJ0hn marked this pull request as draft April 22, 2022 08:40
@CLAassistant
Copy link

CLAassistant commented Apr 23, 2022

CLA assistant check
All committers have signed the CLA.

# bad embedding field
if mappings["properties"][self.embedding_field]["type"] != "knn_vector":
raise Exception(
f"The '{index}' index in OpenSearch already has a field called '{self.embedding_field}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZeJ0hn index is not defined here as I found out from the failing test. I think it makes sense to use index_id and maybe also index_name here, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed. Wow. 🤩

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first pull request in your project so I need to be reactive! 🤓

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look very good to me. 👍 What I would like to see added as a nice bonus before we merge this pull request is a small test in test_document_store.py. You can have a look at test_elasticsearch_search_field_mapping in there as a simple example. The test could create an alias of several indices and then initialize an ElasticsearchDocumentStore with that alias as index name. As create_index is True by default, it will internally call self._create_document_index(index) and then we can check whether all indices in the alias fulfill the requirements or raise an Exception. @ZeJ0hn What do you think?

@ZeJ0hn
Copy link
Contributor Author

ZeJ0hn commented Apr 27, 2022

Sounds good. I'm going to do that this afternoon

@ZeJ0hn
Copy link
Contributor Author

ZeJ0hn commented Apr 27, 2022

Unit test OK:

  • one creates a store using 2 valid indices and 1 alias
  • second creates a store using 1 valid index, 1 invalid index, and 1 alias: Exception must be raised

@julian-risch
Copy link
Member

julian-risch commented Apr 27, 2022

From the test log I see that the Exception wasn't raised as expected:

        with pytest.raises(Exception):
            # missing field "content" in index "haystack_existing_alias_1"
            ElasticsearchDocumentStore(
>               index="haystack_existing_alias", search_fields=["content"], content_field="title"
            )
E           Failed: DID NOT RAISE <class 'Exception'>

@ZeJ0hn
Copy link
Contributor Author

ZeJ0hn commented Apr 27, 2022

Hi @tstadel,

In my case, I don't want all using *. We have many indices with different versions of them (different mappings for an incremental tests). And we use an alias to always point to the last one. That's why we need alias support.

@tstadel
Copy link
Member

tstadel commented Apr 27, 2022

Hi @tstadel,

In my case, I don't want all using *. We have many indices with different versions of them (different mappings for an incremental tests). And we use an alias to always point to the last one. That's why we need alias support.

@ZeJ0hn Yep, I almost guessed that. I bet that no one uses the grouping function of aliases.But speaking about complexity, because it is technically possible, it's better to support it. If we did not, we at least would need to raise an error or show a warning which again increases complexity. So let's keep it the way it is.
You can however insert a comment in front of the loop like

# Technically it is possible to group multiple indexes to a single alias. As all of them will be searched we have to ensure that all of them meet our requirements.

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍 It's ready to be merged. I just added the comment about the for loop and all the tests look good to me. The two failing tests have nothing to do with this pull request. So I will go ahead and merge. @ZeJ0hn Thank you so much for this pull request and congratulations to your first contribution to Haystack! 🥇

@julian-risch
Copy link
Member

@ZeJ0hn There is one final thing you could do, which is enable all GitHub actions on this pull request. We describe how to do that in detail here: https://github.com/deepset-ai/haystack/blob/master/CONTRIBUTING.md#forks
That would allow automatic code style and documentation updates on your fork.

@ZeJ0hn
Copy link
Contributor Author

ZeJ0hn commented Apr 28, 2022

Thank you!

@julian-risch julian-risch merged commit 25b87e8 into deepset-ai:master Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants