Commit afd670d: deploy current version
PiffPaffM committed Jun 11, 2021 (1 parent: 791e637)
Showing 9 changed files with 283 additions and 13 deletions.
@@ -116,6 +116,27 @@ from haystack.document_store import SQLDocumentStore
document_store = SQLDocumentStore()
```

</div>
</div>

<div class="tab">
<input type="radio" id="tab-1-6" name="tab-group-1">
<label class="labelouter" for="tab-1-6">Weaviate</label>
<div class="tabcontent">

The `WeaviateDocumentStore` requires a running Weaviate Server.
You can start a basic instance like this (see Weaviate docs for details):
```bash
docker run -d -p 8080:8080 --env AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED='true' --env PERSISTENCE_DATA_PATH='/var/lib/weaviate' semitechnologies/weaviate:1.4.0
```

Afterwards, you can use it in Haystack:
```python
from haystack.document_store import WeaviateDocumentStore

document_store = WeaviateDocumentStore()
```

</div>
</div>

@@ -264,6 +285,24 @@ The Document Stores have different characteristics. You should choose one depend
</div>
</div>


<div class="tab">
<input type="radio" id="tab-2-6" name="tab-group-2">
<label class="labelouter" for="tab-2-6">Weaviate</label>
<div class="tabcontent">

**Pros:**
- Simple vector search
- Stores everything in one place: documents, metadata and vectors - so less network overhead when scaling up
- Allows combining vector search with scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset

**Cons:**
- Fewer options for ANN algorithms than FAISS or Milvus
- No BM25 / TF-IDF retrieval

</div>
</div>

</div>

<div class="recommendation">
@@ -276,4 +315,4 @@ The Document Stores have different characteristics. You should choose one depend

**Vector Specialist:** Use the `MilvusDocumentStore` if you want to focus on dense retrieval and possibly deal with larger datasets

</div>
90 changes: 90 additions & 0 deletions src/pages/docs/versions/master/latest/site/en/usage/usage/faq.md
@@ -0,0 +1,90 @@
---
title: "Frequently Asked Questions"
metaTitle: "Frequently Asked Questions"
metaDescription: ""
slug: "/docs/faq"
date: "2020-09-03"
id: "faqmd"
---

# Frequently Asked Questions

## Why am I seeing duplicate answers being returned?

The ElasticsearchDocumentStore and MilvusDocumentStore rely on Elasticsearch and Milvus backend services which
persist after your Python script has finished running.
If you rerun your script without deleting documents, you could end up with duplicate
copies of your documents in your database.
The easiest way to avoid this is to call `DocumentStore.delete_documents()` after initialization
to ensure that you are working with an empty DocumentStore.

DocumentStores also have a `duplicate_documents` argument in their `__init__()` and `write_documents()` methods
where you can define whether you'd like to skip writing duplicates, overwrite existing duplicates, or raise an error when duplicates occur.
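
For instance, a minimal sketch assuming a locally running Elasticsearch instance (the `"skip"` option value is an assumption; check your version's API reference for the exact accepted values):
```python
from haystack.document_store import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore()

# Option 1: start from a clean slate so reruns don't accumulate duplicates
document_store.delete_documents()

docs = [{"text": "Arya Stark is a character in Game of Thrones.", "meta": {"name": "got.txt"}}]

# Option 2: keep existing documents but skip duplicates on write
document_store.write_documents(docs, duplicate_documents="skip")
```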

## How can I make sure that my GPU is being engaged when I use Haystack?

You will want to ensure that a CUDA-enabled GPU is being engaged when Haystack is running (you can check by running `nvidia-smi -l` on your command line).
Components that can be sped up by a GPU have a `use_gpu` argument in their constructor, which you will want to set to `True`.
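
For example, a minimal sketch (the model name is just an example):
```python
from haystack.reader import FARMReader

# use_gpu=True runs inference on the CUDA device if one is available
reader = FARMReader(model_name_or_path="deepset/minilm-uncased-squad2", use_gpu=True)
```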

## How do I speed up my predictions?

There are many different ways to speed up the performance of your Haystack system.

The Reader is usually the most computationally expensive component in a pipeline
and you can often speed up your system by using a smaller model, like `deepset/minilm-uncased-squad2` (see [benchmarks](https://huggingface.co/deepset/minilm-uncased-squad2)). This usually comes with a small trade-off in accuracy.

You can reduce the workload on the Reader by instructing the Retriever to pass on fewer documents.
This is done by setting the `top_k_retriever` parameter to a lower value.

Making sure that your documents are shorter can also increase the speed of your system. You can split
your documents into smaller chunks by using the `PreProcessor` (see [tutorial](https://haystack.deepset.ai/docs/latest/tutorial11md)).
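
Putting these together, a rough sketch (assuming an already initialized `reader` and `retriever`; import paths and split values may differ across versions):
```python
from haystack.preprocessor import PreProcessor
from haystack.pipeline import ExtractiveQAPipeline

# Split long documents into smaller chunks before indexing
processor = PreProcessor(split_by="word", split_length=200, split_overlap=0)
small_docs = processor.process(raw_doc)  # raw_doc: a dict with a "text" field

# At query time, pass fewer candidate documents from the Retriever to the Reader
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
prediction = pipeline.run(
    query="Who is the father of Arya Stark?",
    top_k_retriever=5,
    top_k_reader=3,
)
```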

For more optimization suggestions, have a look at our [optimization page](https://haystack.deepset.ai/docs/latest/optimizationmd)
and also our [blogs](https://medium.com/deepset-ai).

## How do I use Haystack for my language?

The components in Haystack, such as the `Retriever` or the `Reader`, are designed in a language-agnostic way. However, you may
have to set certain parameters or load models pretrained for your language in order to get good performance out of Haystack.
See our [languages page](https://haystack.deepset.ai/docs/latest/languagesmd) for more details.
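
For example, a sketch for German (the `analyzer` parameter and the model name are assumptions; see the languages page for what your version supports):
```python
from haystack.document_store import ElasticsearchDocumentStore
from haystack.reader import FARMReader

# Use a language-specific Elasticsearch analyzer for keyword-based retrieval
document_store = ElasticsearchDocumentStore(analyzer="german")

# Load a reader model pretrained on German QA data
reader = FARMReader(model_name_or_path="deepset/gelectra-base-germanquad")
```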

## How can I add metadata to my documents so that I can apply filters?

When providing your documents in the input format (see [here](https://haystack.deepset.ai/docs/latest/documentstoremd#Input-Format)),
you can provide metadata information as a dictionary under the `meta` key. At query time, you can provide a `filters` argument
(most likely through `Pipeline.run()`) that specifies the accepted values for a certain metadata field
(for an example of what a `filters` dictionary might look like, please refer to [this example](https://haystack.deepset.ai/docs/latest/apiretrievermd#__init__)).
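
A minimal sketch, assuming an initialized `document_store` and an extractive QA `pipeline` (field names and values are illustrative):
```python
docs = [
    {"text": "Berlin is the capital of Germany.", "meta": {"category": "geography"}},
    {"text": "The Bundestag is the federal parliament.", "meta": {"category": "politics"}},
]
document_store.write_documents(docs)

# Only documents whose "category" field matches an accepted value are retrieved
prediction = pipeline.run(
    query="What is the capital of Germany?",
    filters={"category": ["geography"]},
)
```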

## How can I see predictions during evaluation?

To see predictions during evaluation, initialize the `EvalDocuments` or `EvalAnswers` nodes with `debug=True`.
This causes their `EvalDocuments.log` or `EvalAnswers.log` attributes to be populated with a record of each prediction made.
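
A rough sketch (pipeline wiring is omitted; the import path reflects older Haystack versions and may differ in yours):
```python
from haystack.eval import EvalDocuments, EvalAnswers

# debug=True makes the nodes keep a record of every prediction they see
eval_retriever = EvalDocuments(debug=True)
eval_reader = EvalAnswers(debug=True)

# ... run your evaluation pipeline with these nodes plugged in ...

for record in eval_reader.log:  # one entry per prediction made during evaluation
    print(record)
```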

## How can I serve my Haystack model?

Haystack models can be wrapped in a REST API. For basic details on how to set this up, please refer to this section
on our [GitHub page](https://github.com/deepset-ai/haystack/blob/master/README.md#7-rest-api).
More comprehensive documentation is coming soon!
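
Purely as an illustrative sketch (the endpoint path, port, and payload shape below are assumptions; check the README section linked above for the actual API of your version):
```python
import requests

# Assumed endpoint and payload; verify against the REST API docs for your version
response = requests.post(
    "http://localhost:8000/query",
    json={"query": "Who is the father of Arya Stark?"},
)
print(response.json())
```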

## How can I interpret the confidence scores being returned by the Reader?

The confidence scores are in the range of 0 to 1 and reflect how confident the model is in each prediction that it makes.
Having a confidence score is particularly useful in cases where you need Haystack to work with a certain accuracy threshold.
Many of our users have built systems where predictions below a certain confidence value are routed on to a fallback system.
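
For example, a minimal sketch of such fallback routing, assuming an initialized extractive QA `pipeline` (the threshold value and the fallback handler are illustrative):
```python
THRESHOLD = 0.8  # tune against your accuracy requirements

prediction = pipeline.run(query="Who is the father of Arya Stark?")
best_answer = prediction["answers"][0]

if best_answer["confidence"] >= THRESHOLD:
    print(best_answer["answer"])
else:
    route_to_fallback(prediction)  # hypothetical fallback handler
```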

For more information on model confidence and how to tune it, please refer to [this section](https://haystack.deepset.ai/docs/latest/readermd#Confidence-Scores).

## My documents aren't showing up in my DocumentStore even though I've called `DocumentStore.write_documents()`

When indexing, retrieving or querying for documents from a DocumentStore, you can specify an `index` on which to perform this action.
This can be specified in almost all methods of `DocumentStore` as well as `Retriever.retrieve()`.
Ensure that you are performing all of these operations on the same index!
Note that this also applies during evaluation, where labels are written into their own separate DocumentStore index.
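
A minimal sketch, assuming an initialized `document_store` and `retriever` (index names are illustrative):
```python
# Write and retrieve against the same index, or the documents won't be found
document_store.write_documents(docs, index="document")
results = retriever.retrieve(query="Who is the father of Arya Stark?", index="document")
```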

## What is the difference between the FARMReader and the TransformersReader?

In short, the FARMReader uses a QA pipeline implementation that comes from our own
[FARM framework](https://github.com/deepset-ai/FARM), which we can more easily update and optimize for performance.
By contrast, the TransformersReader uses a QA pipeline implementation that comes from HuggingFace's [Transformers](https://github.com/huggingface/transformers).
See [this section](https://haystack.deepset.ai/docs/latest/readermd#Deeper-Dive-FARM-vs-Transformers)
for more details about their differences!
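
Both readers expose a similar interface, so switching between them is typically a one-line change. A minimal sketch (the model name is an example; parameter names may vary slightly between versions):
```python
from haystack.reader import FARMReader, TransformersReader

farm_reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
transformers_reader = TransformersReader(model_name_or_path="deepset/roberta-base-squad2")
```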
@@ -30,6 +30,7 @@ Alternatively, [this example](https://github.com/deepset-ai/FARM/blob/master/exa
### Description

The FARMRanker consists of a Transformer-based model for document re-ranking using the TextPairClassifier of [FARM](https://github.com/deepset-ai/FARM).
Given a text pair of query and passage, the TextPairClassifier predicts label "1" if the pair is similar or label "0" if they are dissimilar (accompanied by a probability).
While the underlying model can vary (BERT, Roberta, DistilBERT, ...), the interface remains the same.
With a FARMRanker, you can:
* Directly get predictions (a re-ranked version of the supplied list of Documents) via `predict()` if supplying a pre-trained model
@@ -247,7 +247,7 @@ When printing the full results of a Reader,
you will see that each prediction is accompanied
by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.

- In the output of `print_answers()`, you will find the model confidence in a dictionary key called `probability`.
+ In the output of `print_answers()`, you will find the model confidence in a dictionary key called `confidence`.

```python
from haystack.utils import print_answers
@@ -263,17 +263,22 @@ print_answers(prediction, details="all")
'She travels with her father, Eddard, to '
"King's Landing when he is made Hand of the "
'King. Before she leaves,',
-            'probability': 0.9899835586547852,
+            'confidence': 0.9899835586547852,
...
},
]
}
```

In order to align this probability score with the model's accuracy, finetuning needs to be performed
-on a specific dataset. Have a look at this [FARM tutorial](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering_confidence.py)
-to see how this is done.
-Note that a finetuned confidence score is specific to the domain that its finetuned on.
+on a specific dataset.
+To this end, the reader has a method `calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin)`.
+The parameters of this method are the same as for the `eval()` method because the calibration of confidence scores is performed on a dataset that comes with gold labels.
+The calibration calls the `eval()` method internally and therefore needs a DocumentStore containing labeled questions and evaluation documents.
+
+Have a look at this [FARM tutorial](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering_confidence.py)
+to see how to compare calibrated confidence scores with uncalibrated confidence scores within FARM.
+Note that a finetuned confidence score is specific to the domain that it is finetuned on.
There is no guarantee that this performance can transfer to a new domain.
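
A minimal sketch of a calibration call, assuming a `reader` and a `document_store` that already contains evaluation documents and gold labels (index names and device are illustrative):
```python
reader.calibrate_confidence_scores(
    document_store=document_store,
    device="cuda",
    label_index="label",
    doc_index="eval_document",
)
```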

Having a confidence score is particularly useful in cases where you need Haystack to work with a certain accuracy threshold.