refactor tutorial 06 to make it testable (deepset-ai#11)
masci committed Sep 16, 2022
1 parent 868c7de commit a708730
Showing 3 changed files with 82 additions and 61 deletions.
1 change: 1 addition & 0 deletions .github/workflows/nightly.yml
@@ -26,6 +26,7 @@ jobs:
- 03_Basic_QA_Pipeline_without_Elasticsearch
- 04_FAQ_style_QA
- 05_Evaluation
- 06_Better_Retrieval_via_Embedding_Retrieval

env:
ELASTICSEARCH_HOST: "elasticsearch"
37 changes: 21 additions & 16 deletions markdowns/6.md
@@ -59,20 +59,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]
```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]
```

## Logging
@@ -90,12 +93,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

### Document Store

#### Option 1: FAISS
@@ -143,6 +140,9 @@ Similarly to the previous tutorials, we download, convert and index some Game of


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first get some files that we want to use
doc_dir = "data/tutorial6"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt6.zip"
@@ -196,9 +196,11 @@ Here we use a FARMReader with the *deepset/roberta-base-squad2* model (see: http


```python
from haystack.nodes import FARMReader


# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```

@@ -229,6 +231,9 @@ prediction = pipe.run(


```python
from haystack.utils import print_answers


print_answers(prediction, details="minimum")
```

105 changes: 60 additions & 45 deletions tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb
@@ -62,82 +62,84 @@
"Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n",
"**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg\">"
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg\">\n",
"\n",
"You can double check whether the GPU runtime is enabled with the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JlZgP8q1A6NW"
"id": "JlZgP8q1A6NW",
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# Make sure you have a GPU running\n",
"!nvidia-smi"
"%%bash\n",
"\n",
"nvidia-smi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To start, install the latest release of Haystack with `pip`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NM36kbRFA6Nc"
"id": "NM36kbRFA6Nc",
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# Install the latest release of Haystack in your own environment\n",
"#! pip install farm-haystack\n",
"%%bash\n",
"\n",
"# Install the latest main of Haystack\n",
"!pip install --upgrade pip\n",
"!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]"
"pip install --upgrade pip\n",
"pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"id": "GbM2ml-ozqLX",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Logging\n",
"\n",
"We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n",
"Example log message:\n",
"INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt\n",
"Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
},
"id": "GbM2ml-ozqLX"
}
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import logging\n",
"\n",
"logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n",
"logging.getLogger(\"haystack\").setLevel(logging.INFO)"
],
"metadata": {
"id": "kQWEUUMnzqLX",
"pycharm": {
"name": "#%%\n"
},
"id": "kQWEUUMnzqLX"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xmRuhTQ7A6Nh"
}
},
"outputs": [],
"source": [
"from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers\n",
"from haystack.nodes import FARMReader, TransformersReader"
"import logging\n",
"\n",
"logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n",
"logging.getLogger(\"haystack\").setLevel(logging.INFO)"
]
},
{
@@ -179,10 +181,10 @@
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"id": "s4HK5l0qzqLZ",
"pycharm": {
"name": "#%% md\n"
},
"id": "s4HK5l0qzqLZ"
}
},
"source": [
"#### Option 2: Milvus\n",
@@ -197,10 +199,10 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2Ur4h-E3zqLZ",
"pycharm": {
"name": "#%%\n"
},
"id": "2Ur4h-E3zqLZ"
}
},
"outputs": [],
"source": [
@@ -242,6 +244,9 @@
},
"outputs": [],
"source": [
"from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http\n",
"\n",
"\n",
"# Let's first get some files that we want to use\n",
"doc_dir = \"data/tutorial6\"\n",
"s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt6.zip\"\n",
@@ -324,9 +329,11 @@
},
"outputs": [],
"source": [
"from haystack.nodes import FARMReader\n",
"\n",
"\n",
"# Load a local model or any of the QA models on\n",
"# Hugging Face's model hub (https://huggingface.co/models)\n",
"\n",
"reader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\", use_gpu=True)"
]
},
@@ -389,6 +396,9 @@
},
"outputs": [],
"source": [
"from haystack.utils import print_answers\n",
"\n",
"\n",
"print_answers(prediction, details=\"minimum\")"
]
},
@@ -426,8 +436,9 @@
"name": "Tutorial6_Better_Retrieval_via_Embedding_Retrieval.ipynb",
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.10.6 64-bit",
"language": "python",
"name": "python3"
},
@@ -441,10 +452,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
"version": "3.10.6"
},
"gpuClass": "standard"
"vscode": {
"interpreter": {
"hash": "bda33b16be7e844498c7c2d368d72665b4f1d165582b9547ed22a0249a29ca2e"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}
}
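
Note on the notebook changes: replacing `!`-prefixed lines with `%%bash` cells means each code cell is either entirely shell or entirely Python, and moving the imports next to their first use keeps each Python cell closer to self-contained. The diff does not show how the nightly workflow executes the notebook, so the following is only a minimal sketch of how a test harness could separate the two kinds of cells. The helper name `split_code_cells` and the in-memory example notebook are hypothetical and not part of this commit or of Haystack.

```python
import json


def split_code_cells(notebook_json):
    """Split a notebook's code cells into (python_cells, bash_cells).

    Hypothetical helper for illustration only; not part of this commit.
    """
    notebook = json.loads(notebook_json)
    python_cells, bash_cells = [], []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # nbformat 4 stores source either as a list of lines or as one string
        text = "".join(source) if isinstance(source, list) else source
        if text.lstrip().startswith("%%bash"):
            bash_cells.append(text)
        else:
            python_cells.append(text)
    return python_cells, bash_cells


# Tiny in-memory notebook mirroring the cell shapes that appear in the diff
example_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["## Logging\n"]},
            {"cell_type": "code", "source": ["%%bash\n", "\n", "nvidia-smi"]},
            {
                "cell_type": "code",
                "source": [
                    "import logging\n",
                    "\n",
                    "logging.basicConfig(level=logging.WARNING)",
                ],
            },
        ],
        "nbformat": 4,
        "nbformat_minor": 0,
    }
)

python_cells, bash_cells = split_code_cells(example_notebook)
print(len(python_cells), "Python cell(s),", len(bash_cells), "bash cell(s)")
```

With the old `!nvidia-smi` style, the shell call sat inside a Python cell, so a split like this would not be possible without parsing individual lines.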
