Refactor Tutorial 03 to make it testable (deepset-ai#7)
* refactor Tutorial 03 to make it testable

* pyzmq 24 is broken, temp workaround

* update markdown version
masci committed Sep 16, 2022
1 parent fd6fc0d commit f84091f
Showing 4 changed files with 107 additions and 159 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/nightly.yml
@@ -23,6 +23,7 @@ jobs:
notebook:
# Note: use only the name of the file without the extension
- 01_Basic_QA_Pipeline
- 03_Basic_QA_Pipeline_without_Elasticsearch

env:
ELASTICSEARCH_HOST: "elasticsearch"
@@ -32,7 +33,9 @@ jobs:
uses: actions/checkout@v3

- name: Install jupyter
# remove pip install pyzmq when this is resolved https://github.com/zeromq/pyzmq/issues/1764
run: |
pip install pyzmq==23.2.1
pip install nbconvert
- name: Convert notebook to Python
3 changes: 3 additions & 0 deletions .github/workflows/run_tutorials.yml
@@ -26,7 +26,9 @@ jobs:
uses: actions/checkout@v3

- name: Install jupyter
# remove pip install pyzmq when this is resolved https://github.com/zeromq/pyzmq/issues/1764
run: |
pip install pyzmq==23.2.1
pip install nbconvert
- name: Files changed
@@ -37,6 +39,7 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}

- name: Convert notebooks to Python
shell: bash
run: |
for changed_file in ${{ steps.files.outputs.all }}; do
if [[ $changed_file == *".ipynb" ]]; then
50 changes: 30 additions & 20 deletions markdowns/3.md
@@ -25,20 +25,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi
```

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

## Logging
@@ -56,12 +59,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

## Document Store


@@ -75,7 +72,8 @@ document_store = InMemoryDocumentStore()


```python
# Alternatively, uncomment the following to use the SQLite Document Store:

# from haystack.document_stores import SQLDocumentStore
# document_store = SQLDocumentStore(url="sqlite:///qa.db")
```
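The `InMemoryDocumentStore` keeps everything in process memory, with no external service to run. As a rough mental model only (a toy sketch, not Haystack's actual implementation), a document store maps ids to documents that carry `content` and `meta`:

```python
# Toy sketch of an in-memory document store: a dict keyed by document id.
# Illustration only -- this is NOT Haystack's InMemoryDocumentStore.
import uuid


class ToyInMemoryStore:
    def __init__(self):
        self._docs = {}

    def write_documents(self, docs):
        # Accepts dicts with "content" and optional "meta"; assigns ids as needed.
        for doc in docs:
            doc_id = doc.get("id") or str(uuid.uuid4())
            self._docs[doc_id] = {
                "id": doc_id,
                "content": doc["content"],
                "meta": doc.get("meta", {}),
            }

    def get_all_documents(self):
        return list(self._docs.values())


store = ToyInMemoryStore()
store.write_documents([{"content": "Ned Stark is Lord of Winterfell.", "meta": {"name": "ned.txt"}}])
print(len(store.get_all_documents()))  # 1
```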
@@ -92,6 +90,9 @@ In this tutorial, we download Wikipedia articles on Game of Thrones, apply a bas


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first get some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial3"
@@ -109,6 +110,7 @@ docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split

# Let's have a look at the first 3 entries:
print(docs[:3])

# Now, let's write the docs to our DB.
document_store.write_documents(docs)
```
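`convert_files_to_docs` runs the `clean_func` you pass (here `clean_wiki_text`) over each file's text before splitting it into documents. As an illustration of what such a cleaning hook does, here is a simplified stand-in (the real `clean_wiki_text` also strips wiki-specific markup):

```python
import re


def toy_clean_text(text):
    """Simplified cleaning hook: drop empty lines and collapse runs of spaces.

    Illustration only -- Haystack's clean_wiki_text does more than this.
    """
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    return "\n".join(re.sub(r"[ \t]+", " ", line) for line in lines)


raw = "Winterfell   is the seat\n\n\nof House   Stark.\n"
print(toy_clean_text(raw))
```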
@@ -124,7 +126,6 @@ With InMemoryDocumentStore or SQLDocumentStore, you can use the TfidfRetriever.

```python
# An in-memory TfidfRetriever based on Pandas dataframes

from haystack.nodes import TfidfRetriever

retriever = TfidfRetriever(document_store=document_store)
```
@@ -150,17 +151,21 @@ With both you can either load a local model or one from Hugging Face's model hub
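The `TfidfRetriever` above ranks documents by TF-IDF similarity between the query terms and each document. A bare-bones pure-Python sketch of the scoring idea (Haystack's implementation differs; as its comment notes, it builds on Pandas dataframes):

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents against a query with a bare-bones TF-IDF score.

    Toy sketch only; not Haystack's TfidfRetriever.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    # Inverse document frequency: rarer terms weigh more.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    scores = []
    for i, doc in enumerate(tokenized):
        tf = Counter(doc)
        score = sum(tf[t] / len(doc) * idf.get(t, 0.0) for t in query.lower().split())
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = ["Arya Stark is the sister of Sansa Stark", "Daenerys rides a dragon"]
print(tfidf_rank("who is the sister of Sansa", docs)[0][0])  # 0
```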


```python
from haystack.nodes import FARMReader


# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```
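Under the hood, an extractive reader like `FARMReader` scores candidate start and end token positions in each passage and returns the best-scoring answer span. A toy illustration of that span-selection step (not FARM's actual code):

```python
def best_span(start_scores, end_scores, max_len=15):
    """Pick the (start, end) token span maximizing start_score + end_score.

    Toy version of the span selection an extractive QA reader performs.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens starting at s.
        for e in range(s, min(s + max_len, len(end_scores))):
            total = s_score + end_scores[e]
            if total > best[2]:
                best = (s, e, total)
    return best


tokens = ["Jon", "Snow", "is", "the", "father"]
start = [0.1, 0.05, 0.0, 0.9, 0.2]
end = [0.0, 0.1, 0.0, 0.1, 0.8]
s, e, score = best_span(start, end)
print(tokens[s : e + 1])  # ['the', 'father']
```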

#### TransformersReader

Alternatively, we can use a Transformers reader:


```python
# Alternative:
# from haystack.nodes import FARMReader, TransformersReader
# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)
```

@@ -191,6 +196,8 @@ prediction = pipe.run(


```python
# You can try asking more questions:

# prediction = pipe.run(query="Who created the Dothraki vocabulary?", params={"Reader": {"top_k": 5}})
# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}})
```
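Whichever question you ask, `pipe.run` returns a dictionary whose `"answers"` entry holds the scored answers. A sketch of pulling out the top answer, using plain dicts as an illustrative stand-in for Haystack's `Answer` objects (the field values shown are made up for the sketch):

```python
# Illustrative stand-in for a pipeline prediction; the real object holds
# haystack Answer instances rather than plain dicts, and scores come from
# the reader.
prediction = {
    "query": "Who is the father of Arya Stark?",
    "answers": [
        {"answer": "Eddard", "score": 0.99},
        {"answer": "Ned", "score": 0.95},
    ],
}

top = max(prediction["answers"], key=lambda a: a["score"])
print(top["answer"])  # Eddard
```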
@@ -223,7 +230,10 @@ pprint(prediction)

```python
# ...or use a util to simplify the output
from haystack.utils import print_answers


# Change `minimum` to `medium` or `all` to control the level of detail
print_answers(prediction, details="minimum")
```
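`print_answers` trims each answer down to the requested level of detail. A toy version of the idea behind the `details` switch (not the real utility):

```python
def toy_print_answers(prediction, details="minimum"):
    """Toy take on a detail switch: choose which answer fields to show."""
    fields = {"minimum": ["answer"], "medium": ["answer", "score"]}.get(details)
    for ans in prediction["answers"]:
        if fields is None:  # any other value: show everything
            print(ans)
        else:
            print({k: ans[k] for k in fields})


toy_print_answers(
    {"answers": [{"answer": "Eddard", "score": 0.99, "context": "..."}]},
    details="medium",
)
```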
