New front matter (deepset-ai#40)
* first version of frontmatter creation

* Generating new markdowns with new generate_markdowns script

* up to date levels and descriptions

* updated generate_markdowns to take a --notebooks argument instead

* finalized frontmatter and aliases

* removing 'open in colab' button as it's already done on hugo

* generated new markdowns with no colab button

* some minor updates and adding tomli to requirements

* attempting to fix markdowns workflow

* generate markdowns selectively

* switch to better action

* facepalm

* try with all

* fix id name

* updated dates and for loop

Co-authored-by: Massimiliano Pippi <[email protected]>
TuanaCelik and masci authored Oct 20, 2022
1 parent 4adbb7c commit 9d3f5e5
Showing 40 changed files with 453 additions and 420 deletions.
17 changes: 12 additions & 5 deletions .github/workflows/markdowns.yml
@@ -20,10 +20,17 @@ jobs:
python-version: '3.9'
cache: 'pip' # caching pip dependencies

- name: Get changed notebooks
id: changed-files
uses: tj-actions/changed-files@v32
with:
files: |
tutorials/*.ipynb
- name: Install Dependencies and Generate Markdown
run: |
pip install -r requirements.txt
python scripts/generate_markdowns.py
python scripts/generate_markdowns.py --index index.toml --output markdowns --notebooks ${{ steps.changed-files.outputs.all_changed_files }}
- name: Status
run: |
@@ -33,13 +40,13 @@
echo "#"
echo "# CHECK FAILED! You need to update the static version of the tutorials."
echo "#"
echo "# Please run the tutorials documentation update script:"
echo "# Please run the tutorials markdown update script:"
echo "#"
echo "# python .github/utils/convert_notebooks_into_webpages.py"
echo "# python scripts/generate_markdowns.py --index index.toml --output markdowns --notebooks ..."
echo "#"
echo "# or see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md for help."
echo "# or see https://github.com/deepset-ai/haystack-tutorials/blob/main/CONTRIBUTING.md for help."
echo "#"
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack-tutorials/issues"
echo "#"
echo "##################################################################################################"
exit 1
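The updated step above passes the list emitted by tj-actions/changed-files straight to the script, so the script has to accept an arbitrary number of notebook paths. Below is a minimal sketch of what that CLI surface could look like with argparse; it is an illustrative assumption, not the actual contents of scripts/generate_markdowns.py.

```python
# Hypothetical sketch of the flags used in the workflow step above
# (--index, --output, --notebooks); the real scripts/generate_markdowns.py
# in this commit may be structured differently.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate tutorial markdowns from notebooks")
    parser.add_argument("--index", default="index.toml", help="TOML index with per-tutorial metadata")
    parser.add_argument("--output", default="markdowns", help="Directory the generated .md files are written to")
    # tj-actions/changed-files outputs a space-separated list, which the shell
    # splits into separate arguments, so the flag accepts zero or more paths.
    parser.add_argument("--notebooks", nargs="*", default=[], help="Only regenerate these notebooks")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    targets = args.notebooks or ["<all notebooks in tutorials/>"]
    print(f"Would generate markdowns for: {', '.join(targets)}")
```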
148 changes: 148 additions & 0 deletions index.toml
@@ -0,0 +1,148 @@
[config]
layout = "tutorial"
toc = true
colab = "https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/"

[[tutorial]]
title = "Build Your First QA System"
description = "Get Started by creating a Retriever Reader pipeline."
level = "beginner"
weight = 10
notebook = "01_Basic_QA_Pipeline.ipynb"
aliases = ["first-qa-system"]

[[tutorial]]
title = "Fine-Tuning a Model on Your Own Data"
description = "Improve the performance of your Reader by performing fine-tuning."
level = "intermediate"
weight = 50
notebook = "02_Finetune_a_model_on_your_data.ipynb"
aliases = ["fine-tuning-a-model"]

[[tutorial]]
title = "Build a QA System Without Elasticsearch"
description = "Create a Retriever Reader pipeline that requires no external database dependencies."
level = "beginner"
weight = 15
notebook = "03_Basic_QA_Pipeline_without_Elasticsearch.ipynb"
aliases = ["without-elasticsearch"]

[[tutorial]]
title = "Utilizing Existing FAQs for Question Answering"
description = "Create a smarter way to answer new questions using your existing FAQ documents."
level = "beginner"
weight = 20
notebook = "04_FAQ_style_QA.ipynb"
aliases = ["existing-faqs"]

[[tutorial]]
title = "Evaluation of a QA System"
description = "Learn how to evaluate the performance of individual nodes as well as entire pipelines."
level = "advanced"
weight = 100
notebook = "05_Evaluation.ipynb"
aliases = ["evaluation"]

[[tutorial]]
title = "Better Retrieval with Embedding Retrieval"
description = "Use Transformer based dense Retrievers to improve your system’s performance."
level = "intermediate"
weight = 55
notebook = "06_Better_Retrieval_via_Embedding_Retrieval.ipynb"
aliases = ["embedding-retrieval"]

[[tutorial]]
title = "Generative QA with Retrieval-Augmented Generation"
description = "Try out a generative model in place of the extractive Reader."
level = "intermediate"
weight = 60
notebook = "07_RAG_Generator.ipynb"
aliases = ["retrieval-augmented-generation"]

[[tutorial]]
title = "Preprocessing Your Documents"
description = "Start converting, cleaning, and splitting Documents using Haystack’s preprocessing capabilities."
level = "beginner"
weight = 25
notebook = "08_Preprocessing.ipynb"
aliases = ["preprocessing"]

[[tutorial]]
title = "Training Your Own Dense Passage Retrieval Model"
description = "Learn about training a Dense Passage Retrieval model and the data needed to do so."
level = "advanced"
weight = 110
notebook = "09_DPR_training.ipynb"
aliases = ["train-dpr"]

[[tutorial]]
title = "Question Answering on a Knowledge Graph"
description = "Experiment with a question answering system that draws upon knowledge graph.h"
level = "advanced"
weight = 120
notebook = "10_Knowledge_Graph.ipynb"
aliases = ["knowledge-graph"]

[[tutorial]]
title = "How to Use Pipelines"
description = "Learn about the many ways which you can route queries through the nodes in a pipeline."
level = "intermediate"
weight = 65
notebook = "11_Pipelines.ipynb"
aliases = ["pipelines"]

[[tutorial]]
title = "Generatice QA with LFQA"
description = "Try out a generative model in place of the extractive Reader."
level = "intermediate"
weight = 70
notebook = "12_LFQA.ipynb"
aliases = ["lfqa"]

[[tutorial]]
title = "Question Generation"
description = "Generate a set of questions that can be answered by a given Document."
level = "intermediate"
weight = 75
notebook = "13_Question_generation.ipynb"
aliases = ["question-generation"]

[[tutorial]]
title = "Query Classifier"
description = "Classify incoming queries so that they can be routed to the nodes that are best at handling them."
level = "intermediate"
weight = 80
notebook = "14_Query_Classifier.ipynb"
aliases = ["query-classifier"]

[[tutorial]]
title = "Open-Domain QA on Tables"
description = "Perform question answering on tabular data."
level = "advanced"
weight = 130
notebook = "15_TableQA.ipynb"
aliases = ["table-qa"]

[[tutorial]]
title = "Document Classification at Index Time"
description = "Generate and attach classification labels to your Documents when indexing."
level = "intermediate"
weight = 85
notebook = "16_Document_Classifier_at_Index_Time.ipynb"
aliases = ["doc-class-index"]

[[tutorial]]
title = "Make Your QA Pipelines Talk!"
description = "Convert text Answers into speech."
level = "intermediate"
weight = 90
notebook = "17_Audio.ipynb"
aliases = ["audio"]

[[tutorial]]
title = "Generative Pseudo Labeling for Domain Adaptation"
description = "Use a Retriever and a query generator to perform unsupervised domain adaptation."
level = "advanced"
weight = 140
notebook = "18_GPL.ipynb"
aliases = ["gpl"]
23 changes: 13 additions & 10 deletions markdowns/1.md → markdowns/01_Basic_QA_Pipeline.md
@@ -1,18 +1,21 @@
<!---
title: "Tutorial 1"
metaTitle: "Build Your First QA System"
metaDescription: ""
slug: "/docs/tutorial1"
date: "2020-09-03"
id: "tutorial1md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb
toc: True
title: "Build Your First QA System"
last_updated: 2022-10-12
level: "beginner"
weight: 10
description: Get started by creating a Retriever Reader pipeline.
category: "QA"
aliases: ['/tutorials/first-qa-system']
---


# Build Your First QA System

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/en/d/d8/Game_of_Thrones_title_card.jpg">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb)

Question Answering can be used in a variety of use cases. A very common one: Using it to navigate through complex knowledge bases or long documents ("search setting").

A "knowledge base" could for example be your website, an internal wiki or a collection of financial reports.
23 changes: 13 additions & 10 deletions markdowns/2.md → ...downs/02_Finetune_a_model_on_your_data.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 2"
metaTitle: "Fine-tuning a model on your own data"
metaDescription: ""
slug: "/docs/tutorial2"
date: "2020-09-03"
id: "tutorial2md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb
toc: True
title: "Fine-Tuning a Model on Your Own Data"
last_updated: 2022-10-12
level: "intermediate"
weight: 50
description: Improve the performance of your Reader by performing fine-tuning.
category: "QA"
aliases: ['/tutorials/fine-tuning-a-model']
---


# Fine-tuning a Model on Your Own Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb)

For many use cases it is sufficient to just use one of the existing public models that were trained on SQuAD or other public QA datasets (e.g. Natural Questions).
However, if you have domain-specific questions, fine-tuning your model on custom examples will very likely boost your performance.
While this varies by domain, we saw that ~ 2000 examples can easily increase performance by +5-20%.
23 changes: 13 additions & 10 deletions markdowns/3.md → ...asic_QA_Pipeline_without_Elasticsearch.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 3"
metaTitle: "Build a QA System Without Elasticsearch"
metaDescription: ""
slug: "/docs/tutorial3"
date: "2020-09-03"
id: "tutorial3md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb
toc: True
title: "Build a QA System Without Elasticsearch"
last_updated: 2022-10-12
level: "beginner"
weight: 15
description: Create a Retriever Reader pipeline that requires no external database dependencies.
category: "QA"
aliases: ['/tutorials/without-elasticsearch']
---


# Build a QA System Without Elasticsearch

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb)

Haystack provides alternatives to Elasticsearch for developing quick prototypes.

You can use an `InMemoryDocumentStore` or a `SQLDocumentStore`(with SQLite) as the document store.
23 changes: 13 additions & 10 deletions markdowns/4.md → markdowns/04_FAQ_style_QA.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 4"
metaTitle: "Utilizing existing FAQs for Question Answering"
metaDescription: ""
slug: "/docs/tutorial4"
date: "2020-09-03"
id: "tutorial4md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/04_FAQ_style_QA.ipynb
toc: True
title: "Utilizing Existing FAQs for Question Answering"
last_updated: 2022-10-12
level: "beginner"
weight: 20
description: Create a smarter way to answer new questions using your existing FAQ documents.
category: "QA"
aliases: ['/tutorials/existing-faqs']
---


# Utilizing existing FAQs for Question Answering

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/04_FAQ_style_QA.ipynb)

While *extractive Question Answering* works on pure texts and is therefore more generalizable, there's also a common alternative that utilizes existing FAQ data.

**Pros**:
23 changes: 13 additions & 10 deletions markdowns/5.md → markdowns/05_Evaluation.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 5"
metaTitle: "Evaluation of a QA System"
metaDescription: ""
slug: "/docs/tutorial5"
date: "2020-09-03"
id: "tutorial5md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb
toc: True
title: "Evaluation of a QA System"
last_updated: 2022-10-12
level: "advanced"
weight: 100
description: Learn how to evaluate the performance of individual nodes as well as entire pipelines.
category: "QA"
aliases: ['/tutorials/evaluation']
---


# Evaluation of a Pipeline and its Components

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb)

To make a statement about the quality of the results that a question-answering pipeline, or any other pipeline in Haystack, produces, it is important to evaluate it. Furthermore, evaluation allows you to determine which components of the pipeline can be improved.
The results of the evaluation can be saved as CSV files, which contain all the information to calculate additional metrics later on or inspect individual predictions.

23 changes: 13 additions & 10 deletions markdowns/6.md → ...tter_Retrieval_via_Embedding_Retrieval.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 6"
metaTitle: "Better retrieval via Dense Passage Retrieval"
metaDescription: ""
slug: "/docs/tutorial6"
date: "2020-09-03"
id: "tutorial6md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb
toc: True
title: "Better Retrieval with Embedding Retrieval"
last_updated: 2022-10-12
level: "intermediate"
weight: 55
description: Use Transformer-based dense Retrievers to improve your system’s performance.
category: "QA"
aliases: ['/tutorials/embedding-retrieval']
---


# Better Retrieval via "Embedding Retrieval"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb)

### Importance of Retrievers

The Retriever has a huge impact on the performance of our overall search pipeline.
22 changes: 13 additions & 9 deletions markdowns/7.md → markdowns/07_RAG_Generator.md
@@ -1,15 +1,19 @@
<!---
title: "Tutorial 7"
metaTitle: "Generative QA with RAG"
metaDescription: ""
slug: "/docs/tutorial7"
date: "2020-11-12"
id: "tutorial7md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/07_RAG_Generator.ipynb
toc: True
title: "Generative QA with Retrieval-Augmented Generation"
last_updated: 2022-10-12
level: "intermediate"
weight: 60
description: Try out a generative model in place of the extractive Reader.
category: "QA"
aliases: ['/tutorials/retrieval-augmented-generation']
---


# Generative QA with "Retrieval-Augmented Generation"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/07_RAG_Generator.ipynb)

While extractive QA highlights the span of text that answers a query,
generative QA can return a novel text answer that it has composed.