Refactor Tutorial 03 to make it testable (deepset-ai#7)
* refactor Tutorial 03 to make it testable

* pyzmq 24 is broken, temp workaround

* update markdown version
masci committed Sep 16, 2022
1 parent fd6fc0d commit f84091f
Showing 4 changed files with 107 additions and 159 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/nightly.yml
@@ -23,6 +23,7 @@ jobs:
notebook:
# Note: use only the name of the file without the extension
- 01_Basic_QA_Pipeline
- 03_Basic_QA_Pipeline_without_Elasticsearch

env:
ELASTICSEARCH_HOST: "elasticsearch"
@@ -32,7 +33,9 @@ jobs:
uses: actions/checkout@v3

- name: Install jupyter
# remove pip install pyzmq when this is resolved https://github.com/zeromq/pyzmq/issues/1764
run: |
pip install pyzmq==23.2.1
pip install nbconvert
- name: Convert notebook to Python
3 changes: 3 additions & 0 deletions .github/workflows/run_tutorials.yml
@@ -26,7 +26,9 @@ jobs:
uses: actions/checkout@v3

- name: Install jupyter
# remove pip install pyzmq when this is resolved https://github.com/zeromq/pyzmq/issues/1764
run: |
pip install pyzmq==23.2.1
pip install nbconvert
- name: Files changed
@@ -37,6 +39,7 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}

- name: Convert notebooks to Python
shell: bash
run: |
for changed_file in ${{ steps.files.outputs.all }}; do
if [[ $changed_file == *".ipynb" ]]; then
50 changes: 30 additions & 20 deletions markdowns/3.md
@@ -25,20 +25,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi
```

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

## Logging
@@ -56,12 +59,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

## Document Store


@@ -75,7 +72,8 @@ document_store = InMemoryDocumentStore()


```python
# Alternatively, uncomment the following to use the SQLite Document Store:

# from haystack.document_stores import SQLDocumentStore
# document_store = SQLDocumentStore(url="sqlite:///qa.db")
```
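The `InMemoryDocumentStore` keeps everything in process memory, with no external service to run. As a rough mental model only (a toy sketch, not Haystack's actual implementation), a document store maps ids to documents that carry `content` and `meta`:

```python
# Toy sketch of an in-memory document store: a dict keyed by document id.
# Illustration only -- this is NOT Haystack's InMemoryDocumentStore.
import uuid


class ToyInMemoryStore:
    def __init__(self):
        self._docs = {}

    def write_documents(self, docs):
        # Accepts dicts with "content" and optional "meta"; assigns ids as needed.
        for doc in docs:
            doc_id = doc.get("id") or str(uuid.uuid4())
            self._docs[doc_id] = {
                "id": doc_id,
                "content": doc["content"],
                "meta": doc.get("meta", {}),
            }

    def get_all_documents(self):
        return list(self._docs.values())


store = ToyInMemoryStore()
store.write_documents([{"content": "Ned Stark is Lord of Winterfell.", "meta": {"name": "ned.txt"}}])
print(len(store.get_all_documents()))  # 1
```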
@@ -92,6 +90,9 @@ In this tutorial, we download Wikipedia articles on Game of Thrones, apply a bas


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first get some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial3"
@@ -109,6 +110,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split

# Let's have a look at the first 3 entries:
print(docs[:3])

# Now, let's write the docs to our DB.
document_store.write_documents(docs)
```
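`convert_files_to_docs` runs the `clean_func` you pass (here `clean_wiki_text`) over each file's text before splitting it into documents. As an illustration of what such a cleaning hook does, here is a simplified stand-in (the real `clean_wiki_text` also strips wiki-specific markup):

```python
import re


def toy_clean_text(text):
    """Simplified cleaning hook: drop empty lines and collapse runs of spaces.

    Illustration only -- Haystack's clean_wiki_text does more than this.
    """
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    return "\n".join(re.sub(r"[ \t]+", " ", line) for line in lines)


raw = "Winterfell   is the seat\n\n\nof House   Stark.\n"
print(toy_clean_text(raw))
```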
@@ -124,7 +126,6 @@ With InMemoryDocumentStore or SQLDocumentStore, you can use the TfidfRetriever.

```python
# An in-memory TfidfRetriever based on Pandas dataframes

from haystack.nodes import TfidfRetriever

retriever = TfidfRetriever(document_store=document_store)
```
@@ -150,17 +151,21 @@ With both you can either load a local model or one from Hugging Face's model hub
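The `TfidfRetriever` above ranks documents by TF-IDF similarity between the query terms and each document. A bare-bones pure-Python sketch of the scoring idea (Haystack's implementation differs; as its comment notes, it builds on Pandas dataframes):

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents against a query with a bare-bones TF-IDF score.

    Toy sketch only; not Haystack's TfidfRetriever.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    # Inverse document frequency: rarer terms weigh more.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    scores = []
    for i, doc in enumerate(tokenized):
        tf = Counter(doc)
        score = sum(tf[t] / len(doc) * idf.get(t, 0.0) for t in query.lower().split())
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = ["Arya Stark is the sister of Sansa Stark", "Daenerys rides a dragon"]
print(tfidf_rank("who is the sister of Sansa", docs)[0][0])  # 0
```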


```python
from haystack.nodes import FARMReader


# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```
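Under the hood, an extractive reader like `FARMReader` scores candidate start and end token positions in each passage and returns the best-scoring answer span. A toy illustration of that span-selection step (not FARM's actual code):

```python
def best_span(start_scores, end_scores, max_len=15):
    """Pick the (start, end) token span maximizing start_score + end_score.

    Toy version of the span selection an extractive QA reader performs.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens starting at s.
        for e in range(s, min(s + max_len, len(end_scores))):
            total = s_score + end_scores[e]
            if total > best[2]:
                best = (s, e, total)
    return best


tokens = ["Jon", "Snow", "is", "the", "father"]
start = [0.1, 0.05, 0.0, 0.9, 0.2]
end = [0.0, 0.1, 0.0, 0.1, 0.8]
s, e, score = best_span(start, end)
print(tokens[s : e + 1])  # ['the', 'father']
```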

#### TransformersReader

Alternatively, we can use a Transformers reader:


```python
# Alternative:
# from haystack.nodes import FARMReader, TransformersReader
# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)
```

@@ -191,6 +196,8 @@ prediction = pipe.run(


```python
# You can try asking more questions:

# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}})
# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}})
```
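Whichever question you ask, `pipe.run` returns a dictionary whose `"answers"` entry holds the scored answers. A sketch of pulling out the top answer, using plain dicts as an illustrative stand-in for Haystack's `Answer` objects (the field values shown are made up for the sketch):

```python
# Illustrative stand-in for a pipeline prediction; the real object holds
# haystack Answer instances rather than plain dicts, and scores come from
# the reader.
prediction = {
    "query": "Who is the father of Arya Stark?",
    "answers": [
        {"answer": "Eddard", "score": 0.99},
        {"answer": "Ned", "score": 0.95},
    ],
}

top = max(prediction["answers"], key=lambda a: a["score"])
print(top["answer"])  # Eddard
```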
@@ -223,7 +230,10 @@ pprint(prediction)

```python
# ...or use a util to simplify the output
from haystack.utils import print_answers


# Change `minimum` to `medium` or `all` to control the level of detail
print_answers(prediction, details="minimum")
```
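`print_answers` trims each answer down to the requested level of detail. A toy version of the idea behind the `details` switch (not the real utility):

```python
def toy_print_answers(prediction, details="minimum"):
    """Toy take on a detail switch: choose which answer fields to show."""
    fields = {"minimum": ["answer"], "medium": ["answer", "score"]}.get(details)
    for ans in prediction["answers"]:
        if fields is None:  # any other value: show everything
            print(ans)
        else:
            print({k: ans[k] for k in fields})


toy_print_answers(
    {"answers": [{"answer": "Eddard", "score": 0.99, "context": "..."}]},
    details="medium",
)
```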
