Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests #5113

Merged
merged 6 commits into from
Jul 7, 2023

Conversation

tstadel
Copy link
Member

@tstadel tstadel commented Jun 9, 2023

Related Issues

Proposed Changes:

  • explicitly set the doc attributes into "_source" field of the indexing message

How did you test it?

  • existing unit tests
  • added integration tests

Notes for the reviewer

  • That _source works for both OpenSearch and ElasticSearch can be inferred by inspecting the code here for OS and here for ES

Checklist

@tstadel tstadel requested a review from a team as a code owner June 9, 2023 10:57
@tstadel tstadel requested review from vblagoje and removed request for a team June 9, 2023 10:57
@tstadel tstadel changed the title fix: use _source on opensearch bulk requests fix: avoid conflicts with opensearch / elasticsearch magic attributes during bulk requests Jun 9, 2023
@coveralls
Copy link
Collaborator

coveralls commented Jun 9, 2023

Pull Request Test Coverage Report for Build 5486440683

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 192 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.06%) to 44.789%

Files with Coverage Reduction New Missed Lines %
preview/pipeline.py 3 91.18%
nodes/retriever/_openai_encoder.py 8 78.02%
nodes/prompt/invocation_layer/open_ai.py 10 80.85%
nodes/answer_generator/openai.py 14 85.0%
nodes/retriever/dense.py 53 26.45%
document_stores/search_engine.py 104 62.05%
Totals Coverage Status
Change from base Build 5484430606: 0.06%
Covered Lines: 10328
Relevant Lines: 23059

💛 - Coveralls

@masci masci assigned vblagoje and unassigned vblagoje Jun 19, 2023
Copy link
Member

@vblagoje vblagoje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes. Let's note that methods in this class need to be broken down, simplified and better unit tested. We can do that in a new PR dedicated to simplification and better unit tests for this class.

if _doc[self.embedding_field] is not None:
if type(_doc[self.embedding_field]) == np.ndarray:
_doc[self.embedding_field] = _doc[self.embedding_field].tolist()
if _source[self.embedding_field] is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like:

if isinstance(_source.get(self.embedding_field), np.ndarray):
    _source[self.embedding_field] = _source[self.embedding_field].tolist()

Would that work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change that.

_doc[k] = v
_doc.pop("meta")
documents_to_index.append(_doc)
if "meta" in _source.keys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

_source.update(_source.pop("meta", {}))

Would that work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to take care of _source["meta] == None. Besides that it would work.

@github-actions github-actions bot added the type:documentation Improvements on the docs label Jul 7, 2023
@tstadel tstadel requested a review from vblagoje July 7, 2023 12:36
Copy link
Member

@vblagoje vblagoje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tstadel tstadel merged commit 9acb275 into main Jul 7, 2023
59 checks passed
@tstadel tstadel deleted the avoid_opensearch_magic_attrs branch July 7, 2023 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot add metadata "version" to OpenSearchDocumentStore
3 participants