Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify tests & allow running on individual doc stores #1487

Merged
merged 15 commits into from
Sep 27, 2021
Merged

Conversation

tholor
Copy link
Member

@tholor tholor commented Sep 22, 2021

Proposed changes:
Running the tests locally is currently quite cumbersome as you need to start various external docker containers for elasticsearch, milvus etc.

Let's simplify this and extend the option that was introduced in #1202, to run tests only on a subset or single documentstore.
This means, when calling pytest . --doc_store_type="memory" we will run all tests that can be run just with the InMemoryDocumentStore, i.e.:

  • all the tests that we typically run on the whole "document store grid" will only be run for InMemoryDocumentStore
  • any test that is specific to other document stores (e.g. elasticsearch) and is not supported by the chosen document store will be skipped (and marked in the logs accordingly)

New conventions for creating new tests:

Test should run on all document stores / those supplied in the CLI arg --doc_store_type:

Use one of the fixtures document_store or document_store_with_docs or document_store_type.
Do not parameterize it yourself.

Example:

def test_write_with_duplicate_doc_ids(document_store):
        ....

Test is only compatible with certain document stores:

Some tests you don't want to run on all possible document stores. Either because the test is specific to one/few doc store(s) or the test is not really document store related and it's enough to test it on one document store and speed up the execution time.

Example:

# Currently update_document_meta() is not implemented for InMemoryDocStore so it's not listed here as an option

@pytest.mark.parametrize("document_store", ["elasticsearch", "faiss"], indirect=True)
def test_update_meta(document_store):
    ....

Test is not using a document_store/ fixture, but still has a hard requirement for a certain document store:

Example:

@pytest.mark.elasticsearch
def test_elasticsearch_custom_fields(elasticsearch_fixture):
    client = Elasticsearch()
    client.indices.delete(index='haystack_test_custom', ignore=[404])
    document_store = ElasticsearchDocumentStore(index="haystack_test_custom", text_field="custom_text_field",
                                                embedding_field="custom_embedding_field")

Limitations / future work:

  • As of now, you will still need to launch the external document stores yourself, in a separate PR we could also automate this. We can probably get rid of the document stores fixtures in conftest.py and move the launch logic there into get_document_store()
  • We should make weaviate fully compliant so that we can get rid of the "special tests" in test_weaviate.py and rather add it as another parametrization value in test_document_store.py (see comment)
  • Revisit which tests really need the option to run on the whole document store grid, and where we can maybe just reduce it to InMemoryDocumentStore to speed up the CI
  • We currently have quite many integration tests, more unit tests would be helpful and could speed things up
  • There was recently a MockRetriever introduced in test_faiss_cosine_similarity. We can check if we can use this one in more places and in general mock more of the expensive components.

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants