Introduce testing (deepset-ai#3)
* first commit

* first try

* fix cell

* remove test clause
masci committed Sep 13, 2022
1 parent b54703a commit fd6fc0d
Showing 5 changed files with 281 additions and 90 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/nightly.yml
@@ -0,0 +1,44 @@
name: Run Tutorials Nightly

on:
  workflow_dispatch: # Activate this workflow manually
  schedule:
    - cron: '0 0 * * *'

jobs:
  run-tutorials:
    runs-on: ubuntu-latest
    container: deepset/haystack:base-massi-docker

    services:
      elasticsearch:
        image: elasticsearch:7.9.2
        env:
          discovery.type: "single-node"
          ES_JAVA_OPTS: "-Xms128m -Xmx256m"

    strategy:
      max-parallel: 2
      matrix:
        notebook:
          # Note: use only the name of the file without the extension
          - 01_Basic_QA_Pipeline

    env:
      ELASTICSEARCH_HOST: "elasticsearch"

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install jupyter
        run: |
          pip install nbconvert

      - name: Convert notebook to Python
        run: |
          jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ./tutorials/${{ matrix.notebook }}.ipynb

      - name: Run the converted notebook
        run: |
          python ./tutorials/${{ matrix.notebook }}.py
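The `--RegexRemovePreprocessor.patterns '%%bash'` flag makes nbconvert drop any cell matching that pattern before export, so shell-magic cells never reach the generated Python script. The same idea can be sketched with the standard library alone, since a notebook is just JSON (the tiny in-memory notebook below is illustrative, not part of the repo):

```python
import json
import re


def strip_matching_cells(notebook_json: str, pattern: str) -> str:
    """Remove notebook cells whose source matches `pattern`,
    similar to what nbconvert's RegexRemovePreprocessor does before export."""
    nb = json.loads(notebook_json)
    regex = re.compile(pattern)
    nb["cells"] = [
        cell for cell in nb["cells"]
        if not regex.search("".join(cell.get("source", [])))
    ]
    return json.dumps(nb)


# Tiny in-memory notebook with one %%bash cell and one plain Python cell
nb = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["%%bash\n", "nvidia-smi\n"]},
        {"cell_type": "code", "source": ["print('hello')\n"]},
    ]
})
cleaned = json.loads(strip_matching_cells(nb, "%%bash"))
print(len(cleaned["cells"]))  # → 1
```

Only the Python cell survives, which is exactly why the converted scripts can run headless in CI without hitting shell magics.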
50 changes: 50 additions & 0 deletions .github/workflows/run_tutorials.yml
@@ -0,0 +1,50 @@
name: Run Tutorials

on:
  workflow_dispatch: # Activate this workflow manually
  pull_request:
    paths:
      - 'tutorials/*.ipynb'

jobs:
  run-tutorials:
    runs-on: ubuntu-latest
    container: deepset/haystack:base-massi-docker

    services:
      elasticsearch:
        image: elasticsearch:7.9.2
        env:
          discovery.type: "single-node"
          ES_JAVA_OPTS: "-Xms128m -Xmx256m"

    env:
      ELASTICSEARCH_HOST: "elasticsearch"

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install jupyter
        run: |
          pip install nbconvert

      - name: Files changed
        uses: jitterbit/get-changed-files@v1
        id: files
        with:
          format: space-delimited
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Convert notebooks to Python
        run: |
          for changed_file in ${{ steps.files.outputs.all }}; do
            if [[ $changed_file == *".ipynb" ]]; then
              echo $changed_file
              jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ${changed_file}
            fi
          done

      - name: Run the converted notebooks
        run: |
          find ./tutorials -name "*.py" -execdir python {} \;
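The convert step above takes the space-delimited file list emitted by the changed-files action and keeps only notebooks before handing them to nbconvert. That filter can be sketched in plain Python (the file names are made up for illustration):

```python
def notebooks_to_convert(changed_files: str) -> list:
    """Given a space-delimited list of changed files, keep only
    Jupyter notebooks, mirroring the *.ipynb check in the shell loop."""
    return [f for f in changed_files.split() if f.endswith(".ipynb")]


changed = "README.md tutorials/01_Basic_QA_Pipeline.ipynb tutorials/data.csv"
print(notebooks_to_convert(changed))  # → ['tutorials/01_Basic_QA_Pipeline.ipynb']
```

Filtering on the PR's changed files keeps the workflow cheap: only the notebooks touched by a pull request are converted and executed, rather than the whole tutorials folder.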
91 changes: 58 additions & 33 deletions markdowns/1.md
@@ -30,20 +30,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi
```

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

## Logging
Expand All @@ -61,12 +64,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

## Document Store

Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`.
Expand All @@ -77,7 +74,7 @@ Haystack finds answers to queries within the documents stored in a `DocumentStor

**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.

### Start an Elasticsearch server locally
You can start Elasticsearch on your local machine using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), you can manually download and run Elasticsearch from source instead.


Expand All @@ -88,30 +85,47 @@ from haystack.utils import launch_es
launch_es()
```

### Start an Elasticsearch server in Colab

If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source.

```python
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30
```

```bash
%%bash

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
chown -R daemon:daemon elasticsearch-7.9.2
sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -d
```

```bash
%%bash --bg

sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch
```

### Create the Document Store

The `ElasticsearchDocumentStore` class tries to open a connection in its constructor, so we wait 30 seconds here to make sure Elasticsearch is ready before continuing:


```python
# Connect to Elasticsearch
import time
time.sleep(30)
```
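A fixed sleep works, but a polling loop is more robust: retry a readiness probe until it succeeds or a timeout expires. Below is a stdlib-only sketch; the HTTP probe against port 9200 is an assumption for illustration, not part of the tutorial:

```python
import time
import urllib.request


def wait_until(probe, timeout=60.0, interval=2.0):
    """Call `probe()` repeatedly until it returns True or `timeout` elapses.

    Returns True on success, False if the deadline passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def es_is_up(host="localhost", port=9200):
    # Hypothetical readiness check: Elasticsearch answers HTTP on port 9200
    try:
        with urllib.request.urlopen(f"http://{host}:{port}", timeout=2):
            return True
    except OSError:
        return False


# Usage sketch, in place of the fixed sleep:
# wait_until(es_is_up)
```

This returns as soon as the server is reachable instead of always paying the full 30 seconds, and fails fast with `False` when the server never comes up.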

Finally, we create the Document Store instance:


```python
import os
from haystack.document_stores import ElasticsearchDocumentStore

# Get the host where Elasticsearch is running, default to localhost
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")
document_store = ElasticsearchDocumentStore(host=host, username="", password="", index="document")
```

## Preprocessing of documents
Expand All @@ -126,6 +140,9 @@ In this tutorial, we download Wikipedia articles about Game of Thrones, apply a


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first fetch some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial1"
@@ -205,6 +222,8 @@ With both you can either load a local model or one from Hugging Face's model hub


```python
from haystack.nodes import FARMReader

# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```

#### TransformersReader

Alternative:


```python
from haystack.nodes import TransformersReader

# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)
```

@@ -250,9 +271,10 @@ prediction = pipe.run(
# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}})
```

Now you can either print the object directly:


```python
from pprint import pprint

pprint(prediction)
Expand All @@ -275,9 +297,12 @@ pprint(prediction)
# }
```

Or use a util to simplify the output:


```python
from haystack.utils import print_answers

# Change `minimum` to `medium` or `all` to raise the level of detail
print_answers(prediction, details="minimum")
```
3 changes: 3 additions & 0 deletions tutorials/.gitignore
@@ -0,0 +1,3 @@
# Avoid checking in Python files by mistake
# in the tutorials folder.
*.py
