refactor tutorial 07 to make it testable (deepset-ai#13)
* refactor tutorial 07 to make it testable

* remove type checking
masci committed Sep 16, 2022
1 parent a708730 commit 823cd74
Showing 3 changed files with 72 additions and 65 deletions.
1 change: 1 addition & 0 deletions .github/workflows/nightly.yml
@@ -27,6 +27,7 @@ jobs:
- 04_FAQ_style_QA
- 05_Evaluation
- 06_Better_Retrieval_via_Embedding_Retrieval
- 07_RAG_Generator

env:
ELASTICSEARCH_HOST: "elasticsearch"
44 changes: 23 additions & 21 deletions markdowns/7.md
@@ -25,22 +25,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi

```bash
%%bash

nvidia-smi
```

Here are the packages and imports that we'll need:
To start, install the latest release of Haystack with `pip`:


```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack
```bash
%%bash

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]
pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]
```

## Logging
@@ -58,22 +59,16 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```
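
The two logging calls above set different thresholds for different parts of the logger hierarchy. The following is a minimal sketch of that behaviour using only the standard library. It attaches an in-memory handler instead of calling `basicConfig`, so the result can be inspected; the logger name `other.library` and the message texts are made up for illustration.

```python
import io
import logging

# Capture log output in memory so the effect of each level is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(name)s - %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.WARNING)  # same threshold as basicConfig(level=logging.WARNING)

# Lower the threshold only for the "haystack" logger and its children.
logging.getLogger("haystack").setLevel(logging.INFO)

# "haystack.utils.preprocessing" inherits INFO from its "haystack" ancestor.
logging.getLogger("haystack.utils.preprocessing").info("Converting example.txt")

# "other.library" is a hypothetical name; it inherits WARNING from the root logger.
logging.getLogger("other.library").info("this message is filtered out")

captured = stream.getvalue()
print(captured)
```

Only the message from the `haystack.*` logger reaches the handler, because child loggers without an explicit level take the level of their nearest configured ancestor.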

Let's download a csv containing some sample text and preprocess the data.



```python
from typing import List
import requests
import pandas as pd
from haystack import Document
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import RAGenerator, DensePassageRetriever
from haystack.utils import fetch_archive_from_http
```

Let's download a csv containing some sample text and preprocess the data.

from haystack.utils import fetch_archive_from_http


```python
# Download sample
doc_dir = "data/tutorial7/"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/small_generator_dataset.csv.zip"
@@ -92,10 +87,13 @@ Alternatively, we can also just use dictionaries with "text" and "meta" fields


```python
from haystack import Document


# Use data to initialize Document objects
titles = list(df["title"].values)
texts = list(df["text"].values)
documents: List[Document] = []
documents = []
for title, text in zip(titles, texts):
documents.append(Document(content=text, meta={"name": title or ""}))
```
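
The hunk context mentions plain dictionaries as an alternative to `Document` objects. A rough sketch of that shape, using only the standard library, might look like the following. The sample CSV is invented, and the keys `content` and `meta` are an assumption that mirrors the `Document(content=..., meta=...)` call above (the surrounding text says "text"); which keys a document store accepts depends on the Haystack version.

```python
import csv
import io

# Hypothetical two-row sample standing in for the downloaded CSV.
sample_csv = (
    "title,text\n"
    "Berlin,Berlin is the capital of Germany.\n"
    ",An untitled passage.\n"
)

# Build one plain dictionary per row, falling back to "" for a missing title,
# just like `title or ""` in the loop above.
documents = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    documents.append({"content": row["text"], "meta": {"name": row["title"] or ""}})

for doc in documents:
    print(doc)
```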
@@ -105,6 +103,10 @@ FAISS is chosen here since it is optimized vector storage.


```python
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import RAGenerator, DensePassageRetriever


# Initialize FAISS document store.
# Set `return_embedding` to `True`, so generator doesn't have to perform re-embedding
document_store = FAISSDocumentStore(faiss_index_factory_str="Flat", return_embedding=True)
92 changes: 48 additions & 44 deletions tutorials/07_RAG_Generator.ipynb
@@ -29,7 +29,9 @@
"Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n",
"**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg\">"
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg\">\n",
"\n",
"You can double check whether the GPU runtime is enabled with the following command:"
]
},
{
@@ -39,12 +41,16 @@
"collapsed": false,
"pycharm": {
"name": "#%%\n"
},
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# Make sure you have a GPU running\n",
"!nvidia-smi"
"%%bash\n",
"\n",
"nvidia-smi"
]
},
{
@@ -53,7 +59,7 @@
"collapsed": false
},
"source": [
"Here are the packages and imports that we'll need:"
"To start, install the latest release of Haystack with `pip`:"
]
},
{
@@ -63,51 +69,35 @@
"collapsed": false,
"pycharm": {
"name": "#%%\n"
},
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# Install the latest release of Haystack in your own environment\n",
"#! pip install farm-haystack\n",
"%%bash\n",
"\n",
"# Install the latest main of Haystack\n",
"!pip install --upgrade pip\n",
"!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]"
"pip install --upgrade pip\n",
"pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,faiss]"
]
},
{
"cell_type": "markdown",
"source": [
"## Logging\n",
"\n",
"We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n",
"Example log message:\n",
"INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt\n",
"Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
},
"source": [
"import logging\n",
"## Logging\n",
"\n",
"logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n",
"logging.getLogger(\"haystack\").setLevel(logging.INFO)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
"We configure how logging messages should be displayed and which log level should be used before importing Haystack.\n",
"Example log message:\n",
"INFO - haystack.utils.preprocessing - Converting data/tutorial1/218_Olenna_Tyrell.txt\n",
"Default log level in basicConfig is WARNING so the explicit parameter is not necessary but can be changed easily:"
]
},
{
"cell_type": "code",
@@ -120,13 +110,10 @@
},
"outputs": [],
"source": [
"from typing import List\n",
"import requests\n",
"import pandas as pd\n",
"from haystack import Document\n",
"from haystack.document_stores import FAISSDocumentStore\n",
"from haystack.nodes import RAGenerator, DensePassageRetriever\n",
"from haystack.utils import fetch_archive_from_http"
"import logging\n",
"\n",
"logging.basicConfig(format=\"%(levelname)s - %(name)s - %(message)s\", level=logging.WARNING)\n",
"logging.getLogger(\"haystack\").setLevel(logging.INFO)"
]
},
{
@@ -149,6 +136,11 @@
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from haystack.utils import fetch_archive_from_http\n",
"\n",
"\n",
"# Download sample\n",
"doc_dir = \"data/tutorial7/\"\n",
"s3_url = \"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/small_generator_dataset.csv.zip\"\n",
@@ -183,10 +175,13 @@
},
"outputs": [],
"source": [
"from haystack import Document\n",
"\n",
"\n",
"# Use data to initialize Document objects\n",
"titles = list(df[\"title\"].values)\n",
"texts = list(df[\"text\"].values)\n",
"documents: List[Document] = []\n",
"documents = []\n",
"for title, text in zip(titles, texts):\n",
" documents.append(Document(content=text, meta={\"name\": title or \"\"}))"
]
@@ -212,6 +207,10 @@
},
"outputs": [],
"source": [
"from haystack.document_stores import FAISSDocumentStore\n",
"from haystack.nodes import RAGenerator, DensePassageRetriever\n",
"\n",
"\n",
"# Initialize FAISS document store.\n",
"# Set `return_embedding` to `True`, so generator doesn't have to perform re-embedding\n",
"document_store = FAISSDocumentStore(faiss_index_factory_str=\"Flat\", return_embedding=True)\n",
@@ -367,7 +366,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.10.6 64-bit",
"language": "python",
"name": "python3"
},
@@ -381,9 +380,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "bda33b16be7e844498c7c2d368d72665b4f1d165582b9547ed22a0249a29ca2e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
}
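
The notebook hunks above move shell commands out of `!`-prefixed lines and into dedicated `%%bash` cells. The sketch below is not the repository's test harness; it only illustrates, under that assumption, how a tool could use this layout to classify whole cells by their first line. The embedded notebook and the helper `split_code_cells` are hypothetical.

```python
import json

# Minimal, hypothetical notebook with one shell cell, one markdown cell and
# one Python cell, shaped like the cells in the diff above.
notebook_json = """
{
  "cells": [
    {"cell_type": "code", "source": ["%%bash\\n", "\\n", "nvidia-smi"]},
    {"cell_type": "markdown", "source": ["## Logging\\n"]},
    {"cell_type": "code", "source": ["import logging\\n", "logging.basicConfig(level=logging.WARNING)"]}
  ],
  "nbformat": 4,
  "nbformat_minor": 2
}
"""


def split_code_cells(notebook):
    """Return (shell_cells, python_cells) as lists of source strings."""
    shell_cells, python_cells = [], []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells carry no executable code
        source = "".join(cell["source"])
        if source.lstrip().startswith("%%bash"):
            shell_cells.append(source)
        else:
            python_cells.append(source)
    return shell_cells, python_cells


shell_cells, python_cells = split_code_cells(json.loads(notebook_json))
print(len(shell_cells), "shell cell(s),", len(python_cells), "Python cell(s)")
```

With shell commands isolated this way, the remaining cells are plain Python and can be compiled or executed without an IPython-specific preprocessing step.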
