
Introduce testing #3

Merged 4 commits on Sep 13, 2022
44 changes: 44 additions & 0 deletions .github/workflows/nightly.yml
@@ -0,0 +1,44 @@
name: Run Tutorials Nightly

on:
workflow_dispatch: # Activate this workflow manually
schedule:
- cron: '0 0 * * *'

jobs:
run-tutorials:
runs-on: ubuntu-latest
container: deepset/haystack:base-massi-docker

services:
elasticsearch:
image: elasticsearch:7.9.2
env:
discovery.type: "single-node"
ES_JAVA_OPTS: "-Xms128m -Xmx256m"

strategy:
max-parallel: 2
matrix:
notebook:
# Note: use only the name of the file without the extension
- 01_Basic_QA_Pipeline

env:
ELASTICSEARCH_HOST: "elasticsearch"

steps:
- name: Checkout
uses: actions/checkout@v3

- name: Install jupyter
run: |
pip install nbconvert
- name: Convert notebook to Python
run: |
jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ./tutorials/${{ matrix.notebook }}.ipynb
- name: Run the converted notebook
run: |
python ./tutorials/${{ matrix.notebook }}.py
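The `--RegexRemovePreprocessor.patterns '%%bash'` flag makes nbconvert drop notebook cells matching that regex before exporting to Python, so shell-magic cells never reach the generated script. As a rough, stdlib-only sketch of what that filtering does to the notebook JSON (this is an illustration, not nbconvert's actual implementation; `remove_matching_cells` is a hypothetical helper):

```python
import re

# A minimal stand-in for a notebook: nbformat v4 stores cells as a JSON list.
notebook = {
    "cells": [
        {"cell_type": "code", "source": "%%bash\npip install farm-haystack"},
        {"cell_type": "code", "source": "from haystack.nodes import FARMReader"},
        {"cell_type": "markdown", "source": "## Document Store"},
    ]
}

def remove_matching_cells(nb: dict, patterns: list) -> dict:
    """Drop cells whose source matches any given regex, mimicking
    what nbconvert's RegexRemovePreprocessor achieves for this workflow."""
    compiled = [re.compile(p) for p in patterns]
    nb["cells"] = [
        cell for cell in nb["cells"]
        if not any(rx.match(cell["source"]) for rx in compiled)
    ]
    return nb

filtered = remove_matching_cells(notebook, [r"%%bash"])
print(len(filtered["cells"]))  # -> 2: the %%bash cell has been removed
```

This is why the tutorial diff below moves shell commands out of `!`-prefixed Python cells into `%%bash` cells: the whole cell can then be stripped in one pass before the converted script is executed.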
50 changes: 50 additions & 0 deletions .github/workflows/run_tutorials.yml
@@ -0,0 +1,50 @@
name: Run Tutorials

on:
workflow_dispatch: # Activate this workflow manually
pull_request:
paths:
- 'tutorials/*.ipynb'

jobs:
run-tutorials:
runs-on: ubuntu-latest
container: deepset/haystack:base-massi-docker

services:
elasticsearch:
image: elasticsearch:7.9.2
env:
discovery.type: "single-node"
ES_JAVA_OPTS: "-Xms128m -Xmx256m"

env:
ELASTICSEARCH_HOST: "elasticsearch"

steps:
- name: Checkout
uses: actions/checkout@v3

- name: Install jupyter
run: |
pip install nbconvert
- name: Files changed
uses: jitterbit/get-changed-files@v1
id: files
with:
format: space-delimited
token: ${{ secrets.GITHUB_TOKEN }}

- name: Convert notebooks to Python
run: |
for changed_file in ${{ steps.files.outputs.all }}; do
if [[ $changed_file == *".ipynb" ]]; then
echo $changed_file
jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ${changed_file}
fi
done
- name: Run the converted notebooks
run: |
find ./tutorials -name "*.py" -execdir python {} \;
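The shell loop above iterates over the space-delimited output of the changed-files action and converts only paths ending in `.ipynb`. The selection logic can be sketched in a few lines of Python (`select_notebooks` is a hypothetical helper, shown only to make the filtering explicit):

```python
def select_notebooks(changed_files: str) -> list:
    """Filter a space-delimited file list down to notebooks,
    mirroring the `*.ipynb` check in the workflow's shell loop."""
    return [f for f in changed_files.split() if f.endswith(".ipynb")]

changed = "README.md tutorials/01_Basic_QA_Pipeline.ipynb markdowns/1.md"
print(select_notebooks(changed))  # -> ['tutorials/01_Basic_QA_Pipeline.ipynb']
```

Note that the final step then runs every `.py` file found under `./tutorials`, which is why the PR also adds a `.gitignore` entry to keep stray Python files out of that folder.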
91 changes: 58 additions & 33 deletions markdowns/1.md
@@ -30,20 +30,23 @@ Make sure you enable the GPU runtime to experience decent speed in this tutorial

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/colab_gpu_runtime.jpg">

You can double-check whether the GPU runtime is enabled with the following command:

```python
# Make sure you have a GPU running
!nvidia-smi

```bash
%%bash

nvidia-smi
```

To start, install the latest release of Haystack with `pip`:

```python
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```bash
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]
```

## Logging
@@ -61,12 +64,6 @@ logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logg
logging.getLogger("haystack").setLevel(logging.INFO)
```


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.nodes import FARMReader, TransformersReader
```

## Document Store

Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`, `SQLDocumentStore`, and `InMemoryDocumentStore`.
Expand All @@ -77,7 +74,7 @@ Haystack finds answers to queries within the documents stored in a `DocumentStor

**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.

### Start an Elasticsearch server
### Start an Elasticsearch server locally
You can start Elasticsearch on your local machine using Docker. If Docker is not readily available in your environment (e.g. in Colab notebooks), you can manually download and run Elasticsearch from source.


@@ -88,30 +85,47 @@ from haystack.utils import launch_es
launch_es()
```

### Start an Elasticsearch server in Colab

```python
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2
If Docker is not readily available in your environment (e.g. in Colab notebooks), then you can manually download and execute Elasticsearch from source.

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1) # as daemon
)
# wait until ES has started
! sleep 30
```bash
%%bash

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
chown -R daemon:daemon elasticsearch-7.9.2
sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch -d
```


```bash
%%bash --bg

sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch
```

### Create the Document Store

The `ElasticsearchDocumentStore` class tries to open a connection in its constructor, so here we wait 30 seconds to make sure Elasticsearch is ready before continuing:


```python
# Connect to Elasticsearch
import time
time.sleep(30)
```
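A fixed sleep works but wastes time when Elasticsearch comes up faster, and fails silently when it is slower. An alternative (not what the tutorial does) is to poll the Elasticsearch root endpoint until it responds; a stdlib-only sketch, assuming the default HTTP port 9200 and a hypothetical helper name:

```python
import time
import urllib.error
import urllib.request

def wait_for_elasticsearch(url: str = "http://localhost:9200", timeout: float = 30.0) -> bool:
    """Poll the Elasticsearch root endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True  # got an HTTP response: the cluster is reachable
        except (urllib.error.URLError, OSError):
            time.sleep(1)  # not up yet, retry until the deadline
    return False
```

Calling `wait_for_elasticsearch()` in place of the fixed `time.sleep(30)` would return as soon as the server is reachable, and give a clear failure signal otherwise.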

Finally, we create the Document Store instance:


```python
import os
from haystack.document_stores import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
# Get the host where Elasticsearch is running, default to localhost
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")
document_store = ElasticsearchDocumentStore(host=host, username="", password="", index="document")
```

## Preprocessing of documents
@@ -126,6 +140,9 @@ In this tutorial, we download Wikipedia articles about Game of Thrones, apply a


```python
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http


# Let's first fetch some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial1"
@@ -205,6 +222,8 @@ With both you can either load a local model or one from Hugging Face's model hub


```python
from haystack.nodes import FARMReader

# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

@@ -213,9 +232,11 @@

#### TransformersReader

Alternative:


```python
# Alternative:
from haystack.nodes import TransformersReader
# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)
```

@@ -250,9 +271,10 @@ prediction = pipe.run(
# prediction = pipe.run(query="Who is the sister of Sansa?", params={"Reader": {"top_k": 5}})
```

Now you can either print the object directly:


```python
# Now you can either print the object directly...
from pprint import pprint

pprint(prediction)
@@ -275,9 +297,12 @@
# }
```

Or use a util to simplify the output:


```python
# ...or use a util to simplify the output
from haystack.utils import print_answers

# Change `minimum` to `medium` or `all` to raise the level of detail
print_answers(prediction, details="minimum")
```
3 changes: 3 additions & 0 deletions tutorials/.gitignore
@@ -0,0 +1,3 @@
# Avoid checking in Python files by mistake
# in the tutorials folder.
*.py