Introduce testing (deepset-ai#3)
* first commit

* first try

* fix cell

* remove test clause
masci committed Sep 13, 2022
1 parent b54703a commit fd6fc0d
Showing 5 changed files with 281 additions and 90 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/nightly.yml
@@ -0,0 +1,44 @@
name: Run Tutorials Nightly

on:
  workflow_dispatch: # Activate this workflow manually
  schedule:
    - cron: '0 0 * * *'

jobs:
  run-tutorials:
    runs-on: ubuntu-latest
    container: deepset/haystack:base-massi-docker

    services:
      elasticsearch:
        image: elasticsearch:7.9.2
        env:
          discovery.type: "single-node"
          ES_JAVA_OPTS: "-Xms128m -Xmx256m"

    strategy:
      max-parallel: 2
      matrix:
        notebook:
          # Note: use only the name of the file without the extension
          - 01_Basic_QA_Pipeline

    env:
      ELASTICSEARCH_HOST: "elasticsearch"

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install jupyter
        run: |
          pip install nbconvert

      - name: Convert notebook to Python
        run: |
          jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ./tutorials/${{ matrix.notebook }}.ipynb

      - name: Run the converted notebook
        run: |
          python ./tutorials/${{ matrix.notebook }}.py
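The `--RegexRemovePreprocessor.patterns '%%bash'` flag makes nbconvert drop any cell matching that pattern before export, so shell-magic cells never reach the generated Python script. The same idea can be sketched with the standard library alone, since a notebook is just JSON (the tiny in-memory notebook below is illustrative, not part of the repo):

```python
import json
import re


def strip_matching_cells(notebook_json: str, pattern: str) -> str:
    """Remove notebook cells whose source matches `pattern`,
    similar to what nbconvert's RegexRemovePreprocessor does before export."""
    nb = json.loads(notebook_json)
    regex = re.compile(pattern)
    nb["cells"] = [
        cell for cell in nb["cells"]
        if not regex.search("".join(cell.get("source", [])))
    ]
    return json.dumps(nb)


# Tiny in-memory notebook with one %%bash cell and one plain Python cell
nb = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["%%bash\n", "nvidia-smi\n"]},
        {"cell_type": "code", "source": ["print('hello')\n"]},
    ]
})
cleaned = json.loads(strip_matching_cells(nb, "%%bash"))
print(len(cleaned["cells"]))  # → 1
```

Only the Python cell survives, which is exactly why the converted scripts can run headless in CI without hitting shell magics.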
50 changes: 50 additions & 0 deletions .github/workflows/run_tutorials.yml
@@ -0,0 +1,50 @@
name: Run Tutorials

on:
  workflow_dispatch: # Activate this workflow manually
  pull_request:
    paths:
      - 'tutorials/*.ipynb'

jobs:
  run-tutorials:
    runs-on: ubuntu-latest
    container: deepset/haystack:base-massi-docker

    services:
      elasticsearch:
        image: elasticsearch:7.9.2
        env:
          discovery.type: "single-node"
          ES_JAVA_OPTS: "-Xms128m -Xmx256m"

    env:
      ELASTICSEARCH_HOST: "elasticsearch"

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Install jupyter
        run: |
          pip install nbconvert

      - name: Files changed
        uses: jitterbit/get-changed-files@v1
        id: files
        with:
          format: space-delimited
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Convert notebooks to Python
        run: |
          for changed_file in ${{ steps.files.outputs.all }}; do
            if [[ $changed_file == *".ipynb" ]]; then
              echo $changed_file
              jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ${changed_file}
            fi
          done

      - name: Run the converted notebooks
        run: |
          find ./tutorials -name "*.py" -execdir python {} \;
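The convert step above takes the space-delimited file list emitted by the changed-files action and keeps only notebooks before handing them to nbconvert. That filter can be sketched in plain Python (the file names are made up for illustration):

```python
def notebooks_to_convert(changed_files: str) -> list:
    """Given a space-delimited list of changed files, keep only
    Jupyter notebooks, mirroring the *.ipynb check in the shell loop."""
    return [f for f in changed_files.split() if f.endswith(".ipynb")]


changed = "README.md tutorials/01_Basic_QA_Pipeline.ipynb tutorials/data.csv"
print(notebooks_to_convert(changed))  # → ['tutorials/01_Basic_QA_Pipeline.ipynb']
```

Filtering on the PR's changed files keeps the workflow cheap: only the notebooks touched by a pull request are converted and executed, rather than the whole tutorials folder.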
91 changes: 58 additions & 33 deletions markdowns/1.md
@@ -30,20 +30,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi
```

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

## Logging
Expand All @@ -61,12 +64,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

## Document Store

Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`.
Expand All @@ -77,7 +74,7 @@ Haystack finds answers to queries within the documents stored in a `DocumentStor

**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.

### Start an Elasticsearch server locally
You can start Elasticsearch on your local machine using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), you can manually download and run Elasticsearch from source instead.


Expand All @@ -88,30 +85,47 @@ from haystack.utils import launch_es
launch_es()
```

### Start an Elasticsearch server in Colab

If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source.

```python
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30
```

```bash
%%bash

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
chown -R daemon:daemon elasticsearch-7.9.2
sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -d
```

```bash
%%bash --bg

sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch
```

### Create the Document Store

The `ElasticsearchDocumentStore` class tries to open a connection in its constructor, so we wait 30 seconds here to make sure Elasticsearch is ready before continuing:


```python
# Connect to Elasticsearch
import time
time.sleep(30)
```
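A fixed sleep works, but a polling loop is more robust: retry a readiness probe until it succeeds or a timeout expires. Below is a stdlib-only sketch; the HTTP probe against port 9200 is an assumption for illustration, not part of the tutorial:

```python
import time
import urllib.request


def wait_until(probe, timeout=60.0, interval=2.0):
    """Call `probe()` repeatedly until it returns True or `timeout` elapses.

    Returns True on success, False if the deadline passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def es_is_up(host="localhost", port=9200):
    # Hypothetical readiness check: Elasticsearch answers HTTP on port 9200
    try:
        with urllib.request.urlopen(f"http://{host}:{port}", timeout=2):
            return True
    except OSError:
        return False


# Usage sketch, in place of the fixed sleep:
# wait_until(es_is_up)
```

This returns as soon as the server is reachable instead of always paying the full 30 seconds, and fails fast with `False` when the server never comes up.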

Finally, we create the Document Store instance:


```python
import os
from haystack.document_stores import ElasticsearchDocumentStore

# Get the host where Elasticsearch is running, default to localhost
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")
document_store = ElasticsearchDocumentStore(host=host, username="", password="", index="document")
```

## Preprocessing of documents
Expand All @@ -126,6 +140,9 @@ In this tutorial, we download Wikipedia articles about Game of Thrones, apply a


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first fetch some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial1"
@@ -205,6 +222,8 @@ With both you can either load a local model or one from Hugging Face's model hub


```python
from haystack.nodes import FARMReader

# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```

#### TransformersReader

Alternative:


```python
from haystack.nodes import TransformersReader

# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)
```

@@ -250,9 +271,10 @@ prediction = pipe.run(
# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}})
```

Now you can either print the object directly:


```python
from pprint import pprint

pprint(prediction)
Expand All @@ -275,9 +297,12 @@ pprint(prediction)
# }
```

Or use a util to simplify the output:


```python
from haystack.utils import print_answers

# Change `minimum` to `medium` or `all` to raise the level of detail
print_answers(prediction, details="minimum")
```
3 changes: 3 additions & 0 deletions tutorials/.gitignore
@@ -0,0 +1,3 @@
# Avoid checking in Python files by mistake
# in the tutorials folder.
*.py
