New front matter (deepset-ai#40)
* first version of frontmatter creation

* Generating new markdowns with new generate_markdowns script

* up to date levels and descriptions

* updated generate_markdowns to take a --notebooks argument instead

* finalized frontmatter and aliases

* removing 'open in colab' button as it's already done on hugo

* generated new markdowns with no colab button

* some minor updates and adding tomli to requirements

* attempting to fix markdowns workflow

* generate markdowns selectively

* switch to better action

* facepalm

* try with all

* fix id name

* updated dates and for loop

Co-authored-by: Massimiliano Pippi <[email protected]>
TuanaCelik and masci authored Oct 20, 2022
1 parent 4adbb7c commit 9d3f5e5
Showing 40 changed files with 453 additions and 420 deletions.
17 changes: 12 additions & 5 deletions .github/workflows/markdowns.yml
@@ -20,10 +20,17 @@ jobs:
python-version: '3.9'
cache: 'pip' # caching pip dependencies

- name: Get changed notebooks
id: changed-files
uses: tj-actions/changed-files@v32
with:
files: |
tutorials/*.ipynb
- name: Install Dependencies and Generate Markdown
run: |
pip install -r requirements.txt
python scripts/generate_markdowns.py
python scripts/generate_markdowns.py --index index.toml --output markdowns --notebooks ${{ steps.changed-files.outputs.all_changed_files }}
- name: Status
run: |
@@ -33,13 +40,13 @@
echo "#"
echo "# CHECK FAILED! You need to update the static version of the tutorials."
echo "#"
echo "# Please run the tutorials documentation update script:"
echo "# Please run the tutorials markdown update script:"
echo "#"
echo "# python .github/utils/convert_notebooks_into_webpages.py"
echo "# python scripts/generate_markdowns.py --index index.toml --output markdowns --notebooks ..."
echo "#"
echo "# or see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md for help."
echo "# or see https://github.com/deepset-ai/haystack-tutorials/blob/main/CONTRIBUTING.md for help."
echo "#"
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack/issues"
echo "# If you have further problems, please open an issue: https://github.com/deepset-ai/haystack-tutorials/issues"
echo "#"
echo "##################################################################################################"
exit 1
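The updated step above passes the list emitted by tj-actions/changed-files straight to the script, so the script has to accept an arbitrary number of notebook paths. Below is a minimal sketch of what that CLI surface could look like with argparse; it is an illustrative assumption, not the actual contents of scripts/generate_markdowns.py.

```python
# Hypothetical sketch of the flags used in the workflow step above
# (--index, --output, --notebooks); the real scripts/generate_markdowns.py
# in this commit may be structured differently.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate tutorial markdowns from notebooks")
    parser.add_argument("--index", default="index.toml", help="TOML index with per-tutorial metadata")
    parser.add_argument("--output", default="markdowns", help="Directory the generated .md files are written to")
    # tj-actions/changed-files outputs a space-separated list, which the shell
    # splits into separate arguments, so the flag accepts zero or more paths.
    parser.add_argument("--notebooks", nargs="*", default=[], help="Only regenerate these notebooks")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    targets = args.notebooks or ["<all notebooks in tutorials/>"]
    print(f"Would generate markdowns for: {', '.join(targets)}")
```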
148 changes: 148 additions & 0 deletions index.toml
@@ -0,0 +1,148 @@
[config]
layout = "tutorial"
toc = true
colab = "https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/"

[[tutorial]]
title = "Build Your First QA System"
description = "Get Started by creating a Retriever Reader pipeline."
level = "beginner"
weight = 10
notebook = "01_Basic_QA_Pipeline.ipynb"
aliases = ["first-qa-system"]

[[tutorial]]
title = "Fine-Tuning a Model on Your Own Data"
description = "Improve the performance of your Reader by performing fine-tuning."
level = "intermediate"
weight = 50
notebook = "02_Finetune_a_model_on_your_data.ipynb"
aliases = ["fine-tuning-a-model"]

[[tutorial]]
title = "Build a QA System Without Elasticsearch"
description = "Create a Retriever Reader pipeline that requires no external database dependencies."
level = "beginner"
weight = 15
notebook = "03_Basic_QA_Pipeline_without_Elasticsearch.ipynb"
aliases = ["without-elasticsearch"]

[[tutorial]]
title = "Utilizing Existing FAQs for Question Answering"
description = "Create a smarter way to answer new questions using your existing FAQ documents."
level = "beginner"
weight = 20
notebook = "04_FAQ_style_QA.ipynb"
aliases = ["existing-faqs"]

[[tutorial]]
title = "Evaluation of a QA System"
description = "Learn how to evaluate the performance of individual nodes as well as entire pipelines."
level = "advanced"
weight = 100
notebook = "05_Evaluation.ipynb"
aliases = ["evaluation"]

[[tutorial]]
title = "Better Retrieval with Embedding Retrieval"
description = "Use Transformer based dense Retrievers to improve your system’s performance."
level = "intermediate"
weight = 55
notebook = "06_Better_Retrieval_via_Embedding_Retrieval.ipynb"
aliases = ["embedding-retrieval"]

[[tutorial]]
title = "Generative QA with Retrieval-Augmented Generation"
description = "Try out a generative model in place of the extractive Reader."
level = "intermediate"
weight = 60
notebook = "07_RAG_Generator.ipynb"
aliases = ["retrieval-augmented-generation"]

[[tutorial]]
title = "Preprocessing Your Documents"
description = "Start converting, cleaning, and splitting Documents using Haystack’s preprocessing capabilities."
level = "beginner"
weight = 25
notebook = "08_Preprocessing.ipynb"
aliases = ["preprocessing"]

[[tutorial]]
title = "Training Your Own Dense Passage Retrieval Model"
description = "Learn about training a Dense Passage Retrieval model and the data needed to do so."
level = "advanced"
weight = 110
notebook = "09_DPR_training.ipynb"
aliases = ["train-dpr"]

[[tutorial]]
title = "Question Answering on a Knowledge Graph"
description = "Experiment with a question answering system that draws upon knowledge graph.h"
level = "advanced"
weight = 120
notebook = "10_Knowledge_Graph.ipynb"
aliases = ["knowledge-graph"]

[[tutorial]]
title = "How to Use Pipelines"
description = "Learn about the many ways which you can route queries through the nodes in a pipeline."
level = "intermediate"
weight = 65
notebook = "11_Pipelines.ipynb"
aliases = ["pipelines"]

[[tutorial]]
title = "Generatice QA with LFQA"
description = "Try out a generative model in place of the extractive Reader."
level = "intermediate"
weight = 70
notebook = "12_LFQA.ipynb"
aliases = ["lfqa"]

[[tutorial]]
title = "Question Generation"
description = "Generate a set of questions that can be answered by a given Document."
level = "intermediate"
weight = 75
notebook = "13_Question_generation.ipynb"
aliases = ["question-generation"]

[[tutorial]]
title = "Query Classifier"
description = "Classify incoming queries so that they can be routed to the nodes that are best at handling them."
level = "intermediate"
weight = 80
notebook = "14_Query_Classifier.ipynb"
aliases = ["query-classifier"]

[[tutorial]]
title = "Open-Domain QA on Tables"
description = "Perform question answering on tabular data."
level = "advanced"
weight = 130
notebook = "15_TableQA.ipynb"
aliases = ["table-qa"]

[[tutorial]]
title = "Document Classification at Index Time"
description = "Generate and attach classification labels to your Documents when indexing."
level = "intermediate"
weight = 85
notebook = "16_Document_Classifier_at_Index_Time.ipynb"
aliases = ["doc-class-index"]

[[tutorial]]
title = "Make Your QA Pipelines Talk!"
description = "Convert text Answers into speech."
level = "intermediate"
weight = 90
notebook = "17_Audio.ipynb"
aliases = ["audio"]

[[tutorial]]
title = "Generative Pseudo Labeling for Domain Adaptation"
description = "Use a Retriever and a query generator to perform unsupervised domain adaptation."
level = "advanced"
weight = 140
notebook = "18_GPL.ipynb"
aliases = ["gpl"]
23 changes: 13 additions & 10 deletions markdowns/1.md → markdowns/01_Basic_QA_Pipeline.md
@@ -1,18 +1,21 @@
<!---
title: "Tutorial 1"
metaTitle: "Build Your First QA System"
metaDescription: ""
slug: "/docs/tutorial1"
date: "2020-09-03"
id: "tutorial1md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb
toc: True
title: "Build Your First QA System"
last_updated: 2022-10-12
level: "beginner"
weight: 10
description: Get started by creating a Retriever Reader pipeline.
category: "QA"
aliases: ['/tutorials/first-qa-system']
---


# Build Your First QA System

<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/en/d/d8/Game_of_Thrones_title_card.jpg">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/01_Basic_QA_Pipeline.ipynb)

Question Answering can be used in a variety of use cases. A very common one: Using it to navigate through complex knowledge bases or long documents ("search setting").

A "knowledge base" could for example be your website, an internal wiki or a collection of financial reports.
23 changes: 13 additions & 10 deletions markdowns/2.md → ...downs/02_Finetune_a_model_on_your_data.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 2"
metaTitle: "Fine-tuning a model on your own data"
metaDescription: ""
slug: "/docs/tutorial2"
date: "2020-09-03"
id: "tutorial2md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb
toc: True
title: "Fine-Tuning a Model on Your Own Data"
last_updated: 2022-10-12
level: "intermediate"
weight: 50
description: Improve the performance of your Reader by performing fine-tuning.
category: "QA"
aliases: ['/tutorials/fine-tuning-a-model']
---


# Fine-tuning a Model on Your Own Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/02_Finetune_a_model_on_your_data.ipynb)

For many use cases it is sufficient to just use one of the existing public models that were trained on SQuAD or other public QA datasets (e.g. Natural Questions).
However, if you have domain-specific questions, fine-tuning your model on custom examples will very likely boost your performance.
While this varies by domain, we saw that ~ 2000 examples can easily increase performance by +5-20%.
23 changes: 13 additions & 10 deletions markdowns/3.md → ...asic_QA_Pipeline_without_Elasticsearch.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 3"
metaTitle: "Build a QA System Without Elasticsearch"
metaDescription: ""
slug: "/docs/tutorial3"
date: "2020-09-03"
id: "tutorial3md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb
toc: True
title: "Build a QA System Without Elasticsearch"
last_updated: 2022-10-12
level: "beginner"
weight: 15
description: Create a Retriever Reader pipeline that requires no external database dependencies.
category: "QA"
aliases: ['/tutorials/without-elasticsearch']
---


# Build a QA System Without Elasticsearch

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/03_Basic_QA_Pipeline_without_Elasticsearch.ipynb)

Haystack provides alternatives to Elasticsearch for developing quick prototypes.

You can use an `InMemoryDocumentStore` or a `SQLDocumentStore`(with SQLite) as the document store.
23 changes: 13 additions & 10 deletions markdowns/4.md → markdowns/04_FAQ_style_QA.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 4"
metaTitle: "Utilizing existing FAQs for Question Answering"
metaDescription: ""
slug: "/docs/tutorial4"
date: "2020-09-03"
id: "tutorial4md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/04_FAQ_style_QA.ipynb
toc: True
title: "Utilizing Existing FAQs for Question Answering"
last_updated: 2022-10-12
level: "beginner"
weight: 20
description: Create a smarter way to answer new questions using your existing FAQ documents.
category: "QA"
aliases: ['/tutorials/existing-faqs']
---


# Utilizing existing FAQs for Question Answering

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/04_FAQ_style_QA.ipynb)

While *extractive Question Answering* works on pure texts and is therefore more generalizable, there's also a common alternative that utilizes existing FAQ data.

**Pros**:
23 changes: 13 additions & 10 deletions markdowns/5.md → markdowns/05_Evaluation.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 5"
metaTitle: "Evaluation of a QA System"
metaDescription: ""
slug: "/docs/tutorial5"
date: "2020-09-03"
id: "tutorial5md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb
toc: True
title: "Evaluation of a QA System"
last_updated: 2022-10-12
level: "advanced"
weight: 100
description: Learn how to evaluate the performance of individual nodes as well as entire pipelines.
category: "QA"
aliases: ['/tutorials/evaluation']
---


# Evaluation of a Pipeline and its Components

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb)

To make a statement about the quality of the results that a question-answering pipeline, or any other pipeline in Haystack, produces, it is important to evaluate it. Furthermore, evaluation allows you to determine which components of the pipeline can be improved.
The results of the evaluation can be saved as CSV files, which contain all the information to calculate additional metrics later on or inspect individual predictions.

23 changes: 13 additions & 10 deletions markdowns/6.md → ...tter_Retrieval_via_Embedding_Retrieval.md
@@ -1,16 +1,19 @@
<!---
title: "Tutorial 6"
metaTitle: "Better retrieval via Dense Passage Retrieval"
metaDescription: ""
slug: "/docs/tutorial6"
date: "2020-09-03"
id: "tutorial6md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb
toc: True
title: "Better Retrieval with Embedding Retrieval"
last_updated: 2022-10-12
level: "intermediate"
weight: 55
description: Use Transformer-based dense Retrievers to improve your system’s performance.
category: "QA"
aliases: ['/tutorials/embedding-retrieval']
---


# Better Retrieval via "Embedding Retrieval"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/06_Better_Retrieval_via_Embedding_Retrieval.ipynb)

### Importance of Retrievers

The Retriever has a huge impact on the performance of our overall search pipeline.
22 changes: 13 additions & 9 deletions markdowns/7.md → markdowns/07_RAG_Generator.md
@@ -1,15 +1,19 @@
<!---
title: "Tutorial 7"
metaTitle: "Generative QA with RAG"
metaDescription: ""
slug: "/docs/tutorial7"
date: "2020-11-12"
id: "tutorial7md"
--->
---
layout: tutorial
colab: https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/07_RAG_Generator.ipynb
toc: True
title: "Generative QA with Retrieval-Augmented Generation"
last_updated: 2022-10-12
level: "intermediate"
weight: 60
description: Try out a generative model in place of the extractive Reader.
category: "QA"
aliases: ['/tutorials/retrieval-augmented-generation']
---


# Generative QA with "Retrieval-Augmented Generation"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/07_RAG_Generator.ipynb)

While extractive QA highlights the span of text that answers a query,
generative QA can return a novel text answer that it has composed.