Pylint (import related warnings) and REST API improvements (deepset-ai#2326)

* remove duplicate imports

* fix ungrouped-imports

* Fix wrong-import-position

* Fix unused-import

* pyproject.toml

* Working on wrong-import-order

* Solve wrong-import-order

* fix Pool import

* Move open_search_index_to_document_store and elasticsearch_index_to_document_store into elasticsearch.py

* remove Converter from modeling

* Fix mypy issues on adaptive_model.py

* create es_converter.py

* remove converter import

* change import path in tests

* Restructure REST API to not rely on global vars from search.py and improve tests

* Fix openapi generator

* Move variable initialization

* Change type of FilterRequest.filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ZanSara and github-actions[bot] committed Apr 12, 2022
1 parent 75dcfd3 commit 96a538b
Showing 98 changed files with 1,291 additions and 1,227 deletions.
11 changes: 8 additions & 3 deletions .github/utils/generate_openapi_specs.py

@@ -4,6 +4,10 @@
 import sys
 import shutil

+sys.path.append(".")
+from rest_api.utils import get_openapi_specs, get_app, get_pipelines  # pylint: disable=wrong-import-position
+from haystack import __version__  # pylint: disable=wrong-import-position
+
 REST_PATH = Path("./rest_api").absolute()
 PIPELINE_PATH = str(REST_PATH / "pipeline" / "pipeline_empty.haystack-pipeline.yml")
 APP_PATH = str(REST_PATH / "application.py")
@@ -13,8 +17,9 @@

 print(f"Loading OpenAPI specs from {APP_PATH} with pipeline at {PIPELINE_PATH}")

-sys.path.append(".")
-from rest_api.application import get_openapi_specs, haystack_version
+# To initialize the app and the pipelines
+get_app()
+get_pipelines()

 # Generate the openapi specs
 specs = get_openapi_specs()
@@ -29,4 +34,4 @@
 os.remove(specs_file)

 # Add versioned copy
-shutil.copy(DOCS_PATH / "openapi.json", DOCS_PATH / f"openapi-{haystack_version}.json")
+shutil.copy(DOCS_PATH / "openapi.json", DOCS_PATH / f"openapi-{__version__}.json")
2 changes: 1 addition & 1 deletion Dockerfile

@@ -46,4 +46,4 @@ EXPOSE 8000
 ENV HAYSTACK_DOCKER_CONTAINER="HAYSTACK_CPU_CONTAINER"

 # cmd for running the API
-CMD ["gunicorn", "rest_api.application:app", "-b", "0.0.0.0", "-k", "uvicorn.workers.UvicornWorker", "--workers", "1", "--timeout", "180"]
\ No newline at end of file
+CMD ["gunicorn", "rest_api.application:app", "-b", "0.0.0.0", "-k", "uvicorn.workers.UvicornWorker", "--workers", "1", "--timeout", "180"]
2 changes: 1 addition & 1 deletion docker-compose-gpu.yml

@@ -21,7 +21,7 @@ services:
       - 8000:8000
     restart: on-failure
     environment:
-      # See rest_api/pipelines.yaml for configurations of Search & Indexing Pipeline.
+      # See rest_api/pipeline/pipelines.haystack-pipeline.yml for configurations of Search & Indexing Pipeline.
       - DOCUMENTSTORE_PARAMS_HOST=elasticsearch
       - PIPELINE_YAML_PATH=/home/user/rest_api/pipeline/pipelines_dpr.haystack-pipeline.yml
       - CONCURRENT_REQUEST_PER_WORKER
2 changes: 1 addition & 1 deletion docker-compose.yml

@@ -12,7 +12,7 @@ services:
       - 8000:8000
     restart: on-failure
    environment:
-      # See rest_api/pipelines.yaml for configurations of Search & Indexing Pipeline.
+      # See rest_api/pipeline/pipelines.haystack-pipeline.yml for configurations of Search & Indexing Pipeline.
       - DOCUMENTSTORE_PARAMS_HOST=elasticsearch
       - PIPELINE_YAML_PATH=/home/user/rest_api/pipeline/pipelines.haystack-pipeline.yml
       - CONCURRENT_REQUEST_PER_WORKER
112 changes: 0 additions & 112 deletions docs/_src/api/api/document_store.md

@@ -4752,115 +4752,3 @@ and UTC as default time zone.
This method cannot be part of WeaviateDocumentStore, as this would result in a circular import between weaviate.py
and filter_utils.py.

<a id="utils.open_search_index_to_document_store"></a>

#### open\_search\_index\_to\_document\_store

```python
def open_search_index_to_document_store(document_store: "BaseDocumentStore", original_index_name: str, original_content_field: str, original_name_field: Optional[str] = None, included_metadata_fields: Optional[List[str]] = None, excluded_metadata_fields: Optional[List[str]] = None, store_original_ids: bool = True, index: Optional[str] = None, preprocessor: Optional[PreProcessor] = None, id_hash_keys: Optional[List[str]] = None, batch_size: int = 10_000, host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "admin", password: str = "admin", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, scheme: str = "https", ca_certs: Optional[str] = None, verify_certs: bool = False, timeout: int = 30, use_system_proxy: bool = False) -> "BaseDocumentStore"
```

This function provides brownfield support of existing OpenSearch indexes by converting each of the records in
the provided index to haystack `Document` objects and writing them to the specified `DocumentStore`. It can be
run on a regular basis in order to add new records of the OpenSearch index to the `DocumentStore`.

**Arguments**:

- `document_store`: The haystack `DocumentStore` to write the converted `Document` objects to.
- `original_index_name`: OpenSearch index containing the records to be converted.
- `original_content_field`: OpenSearch field containing the text to be put in the `content` field of the
resulting haystack `Document` objects.
- `original_name_field`: Optional OpenSearch field containing the title of the Document.
- `included_metadata_fields`: List of OpenSearch fields that shall be stored in the `meta` field of the
resulting haystack `Document` objects. If `included_metadata_fields` and `excluded_metadata_fields` are `None`,
all the fields found in the OpenSearch records will be kept as metadata. You can specify only one of the
`included_metadata_fields` and `excluded_metadata_fields` parameters.
- `excluded_metadata_fields`: List of OpenSearch fields that shall be excluded from the `meta` field of the
resulting haystack `Document` objects. If `included_metadata_fields` and `excluded_metadata_fields` are `None`,
all the fields found in the OpenSearch records will be kept as metadata. You can specify only one of the
`included_metadata_fields` and `excluded_metadata_fields` parameters.
- `store_original_ids`: Whether to store the ID a record had in the original OpenSearch index at the
`"_original_es_id"` metadata field of the resulting haystack `Document` objects. This should be set to `True`
if you want to continuously update the `DocumentStore` with new records inside your OpenSearch index. If this
parameter was set to `False` on the first call of `open_search_index_to_document_store`,
all the indexed Documents in the `DocumentStore` will be overwritten in the second call.
- `index`: Name of index in `document_store` to use to store the resulting haystack `Document` objects.
- `preprocessor`: Optional PreProcessor that will be applied on the content field of the original OpenSearch
record.
- `id_hash_keys`: Generate the document id from a custom list of strings that refer to the document's
attributes. If you want to ensure you don't have duplicate documents in your DocumentStore but texts are
not unique, you can modify the metadata and pass e.g. `["content", "meta"]` to this field.
In this case the id will be generated by using both the content and the defined metadata.
- `batch_size`: Number of records to process at once.
- `host`: URL(s) of OpenSearch nodes.
- `port`: Port(s) of OpenSearch nodes.
- `username`: Username (standard authentication via http_auth).
- `password`: Password (standard authentication via http_auth).
- `api_key_id`: ID of the API key (alternative authentication mode to the above http_auth).
- `api_key`: Secret value of the API key (alternative authentication mode to the above http_auth).
- `aws4auth`: Authentication for usage with AWS OpenSearch
(can be generated with the requests-aws4auth package).
- `scheme`: `"https"` or `"http"`, protocol used to connect to your OpenSearch instance.
- `ca_certs`: Root certificates for SSL: it is a path to the certificate authority (CA) certs on disk.
You can use the certifi package with `certifi.where()` to find where the CA certs file is located on your machine.
- `verify_certs`: Whether to be strict about ca certificates.
- `timeout`: Number of seconds after which an OpenSearch request times out.
- `use_system_proxy`: Whether to use system proxy.
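
The argument semantics above (content field, optional name field, included/excluded metadata filters, and `store_original_ids`) can be illustrated with a simplified, self-contained sketch. `convert_record` below is a hypothetical helper for illustration only, not Haystack's actual implementation:

```python
from typing import Dict, List, Optional

def convert_record(
    record: Dict,
    content_field: str,
    name_field: Optional[str] = None,
    included_metadata_fields: Optional[List[str]] = None,
    excluded_metadata_fields: Optional[List[str]] = None,
    store_original_ids: bool = True,
) -> Dict:
    """Turn one OpenSearch/Elasticsearch hit into a Document-like dict (simplified sketch)."""
    if included_metadata_fields and excluded_metadata_fields:
        raise ValueError("Specify only one of included_metadata_fields and excluded_metadata_fields")
    source = record["_source"]
    # Every field except the content field is a metadata candidate
    meta = {k: v for k, v in source.items() if k != content_field}
    if included_metadata_fields is not None:
        meta = {k: v for k, v in meta.items() if k in included_metadata_fields}
    elif excluded_metadata_fields is not None:
        meta = {k: v for k, v in meta.items() if k not in excluded_metadata_fields}
    if name_field and name_field in source:
        meta["name"] = source[name_field]
    if store_original_ids:
        # Keeping the original id lets repeated runs update existing Documents instead of duplicating them
        meta["_original_es_id"] = record["_id"]
    return {"content": source[content_field], "meta": meta}

hit = {"_id": "42", "_source": {"text": "hello", "title": "greeting", "lang": "en"}}
doc = convert_record(hit, content_field="text", name_field="title",
                     excluded_metadata_fields=["lang"])
# doc["content"] is "hello"; doc["meta"] keeps "title", drops "lang",
# and records the original id under "_original_es_id"
```

The real function additionally scrolls through the index in batches of `batch_size` and applies the optional `preprocessor` to each record's content before writing.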

<a id="utils.elasticsearch_index_to_document_store"></a>

#### elasticsearch\_index\_to\_document\_store

```python
def elasticsearch_index_to_document_store(document_store: "BaseDocumentStore", original_index_name: str, original_content_field: str, original_name_field: Optional[str] = None, included_metadata_fields: Optional[List[str]] = None, excluded_metadata_fields: Optional[List[str]] = None, store_original_ids: bool = True, index: Optional[str] = None, preprocessor: Optional[PreProcessor] = None, id_hash_keys: Optional[List[str]] = None, batch_size: int = 10_000, host: Union[str, List[str]] = "localhost", port: Union[int, List[int]] = 9200, username: str = "", password: str = "", api_key_id: Optional[str] = None, api_key: Optional[str] = None, aws4auth=None, scheme: str = "http", ca_certs: Optional[str] = None, verify_certs: bool = True, timeout: int = 30, use_system_proxy: bool = False) -> "BaseDocumentStore"
```

This function provides brownfield support of existing Elasticsearch indexes by converting each of the records in
the provided index to haystack `Document` objects and writing them to the specified `DocumentStore`. It can be
run on a regular basis in order to add new records of the Elasticsearch index to the `DocumentStore`.

**Arguments**:

- `document_store`: The haystack `DocumentStore` to write the converted `Document` objects to.
- `original_index_name`: Elasticsearch index containing the records to be converted.
- `original_content_field`: Elasticsearch field containing the text to be put in the `content` field of the
resulting haystack `Document` objects.
- `original_name_field`: Optional Elasticsearch field containing the title of the Document.
- `included_metadata_fields`: List of Elasticsearch fields that shall be stored in the `meta` field of the
resulting haystack `Document` objects. If `included_metadata_fields` and `excluded_metadata_fields` are `None`,
all the fields found in the Elasticsearch records will be kept as metadata. You can specify only one of the
`included_metadata_fields` and `excluded_metadata_fields` parameters.
- `excluded_metadata_fields`: List of Elasticsearch fields that shall be excluded from the `meta` field of the
resulting haystack `Document` objects. If `included_metadata_fields` and `excluded_metadata_fields` are `None`,
all the fields found in the Elasticsearch records will be kept as metadata. You can specify only one of the
`included_metadata_fields` and `excluded_metadata_fields` parameters.
- `store_original_ids`: Whether to store the ID a record had in the original Elasticsearch index at the
`"_original_es_id"` metadata field of the resulting haystack `Document` objects. This should be set to `True`
if you want to continuously update the `DocumentStore` with new records inside your Elasticsearch index. If this
parameter was set to `False` on the first call of `elasticsearch_index_to_document_store`,
all the indexed Documents in the `DocumentStore` will be overwritten in the second call.
- `index`: Name of index in `document_store` to use to store the resulting haystack `Document` objects.
- `preprocessor`: Optional PreProcessor that will be applied on the content field of the original Elasticsearch
record.
- `id_hash_keys`: Generate the document id from a custom list of strings that refer to the document's
attributes. If you want to ensure you don't have duplicate documents in your DocumentStore but texts are
not unique, you can modify the metadata and pass e.g. `["content", "meta"]` to this field.
In this case the id will be generated by using both the content and the defined metadata.
- `batch_size`: Number of records to process at once.
- `host`: URL(s) of Elasticsearch nodes.
- `port`: Port(s) of Elasticsearch nodes.
- `username`: Username (standard authentication via http_auth).
- `password`: Password (standard authentication via http_auth).
- `api_key_id`: ID of the API key (alternative authentication mode to the above http_auth).
- `api_key`: Secret value of the API key (alternative authentication mode to the above http_auth).
- `aws4auth`: Authentication for usage with AWS Elasticsearch
(can be generated with the requests-aws4auth package).
- `scheme`: `"https"` or `"http"`, protocol used to connect to your Elasticsearch instance.
- `ca_certs`: Root certificates for SSL: it is a path to the certificate authority (CA) certs on disk.
You can use the certifi package with `certifi.where()` to find where the CA certs file is located on your machine.
- `verify_certs`: Whether to be strict about ca certificates.
- `timeout`: Number of seconds after which an Elasticsearch request times out.
- `use_system_proxy`: Whether to use system proxy.
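
The `id_hash_keys` behaviour described above, deriving the document id from the content plus selected metadata so that identical texts with different metadata are not collapsed into one Document, can be sketched as follows. `make_id` is a hypothetical helper for illustration, not the hashing scheme Haystack actually uses:

```python
import hashlib
import json
from typing import Dict, List, Optional

def make_id(content: str, meta: Dict, id_hash_keys: Optional[List[str]] = None) -> str:
    """Derive a deterministic document id from the chosen attributes (simplified sketch)."""
    keys = id_hash_keys or ["content"]
    parts = []
    for key in keys:
        if key == "content":
            parts.append(content)
        elif key == "meta":
            # Serialize metadata deterministically so equal dicts hash equally
            parts.append(json.dumps(meta, sort_keys=True))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

# Same text, different metadata:
a = make_id("hello", {"source": "es-index-1"})
b = make_id("hello", {"source": "es-index-2"})
assert a == b  # content-only hashing treats these as duplicates

a2 = make_id("hello", {"source": "es-index-1"}, id_hash_keys=["content", "meta"])
b2 = make_id("hello", {"source": "es-index-2"}, id_hash_keys=["content", "meta"])
assert a2 != b2  # including "meta" keeps them distinct
```

Passing `["content", "meta"]` therefore trades automatic deduplication of equal texts for the ability to keep per-index variants apart.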
