Merge pull request #2 from TuanaCelik/main
Adding 3 more notebooks
TuanaCelik committed Jan 2, 2024
2 parents fa64c59 + 87ccc3d commit aba14ad
Showing 4 changed files with 658 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -4,7 +4,9 @@ A collection of example notebooks using Haystack 👇
| Name | Colab|
| ---- | ---- |
| Use Gemini Models with Vertex AI| <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/vertexai-gemini-examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>|
| Gradient AI Embedders and Generators for RAG | <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/gradient-embeders-and-generators-for-notion-rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>|
| Hacker News RAG with Custom Component | <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/hackernews-custom-component-rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>|
| Cohere for Multilingual QA (Haystack 1.x)| <a href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/haystack-1.x/cohere-for-multilingual-qa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>|
## How to Contribute to this repository

If you have an example that uses Haystack, you can add it to this repository by opening a PR. You can also do this from Colab: fork this repository, select "Save a Copy to GitHub" to add your example to your fork, and then open a PR against this repository.
265 changes: 265 additions & 0 deletions gradient-embeders-and-generators-for-notion-rag.ipynb
@@ -0,0 +1,265 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyP6v9IBjf1C1AJvH3peU6XG"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Use Gradient Models for Notion RAG\n",
"\n",
"In this Colab, we will:\n",
"- Creating a custom Haystack component called `NotionExporter`\n",
"- Building an indexing pipeline to write our Notion pages into an `InMemoryDocumentStore` with embeddings\n",
"- Build a custom RAG pipeline to do question answering on our Notion pages"
],
"metadata": {
"id": "_coq_qCuItbN"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "prmChq6T_T4l"
},
"outputs": [],
"source": [
"!pip install haystack-ai gradient-haystack notion-haystack\n",
"!pip install nest-asyncio\n",
"\n",
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"source": [
"import getpass\n",
"import os\n",
"\n",
"notion_api_key = getpass.getpass(\"Enter Notion API key:\")\n",
"gradient_access_token = getpass.getpass(\"Gradient Access token:\")"
],
"metadata": {
"id": "JRogtDXaAMIF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Test the NotionExporter\n",
"\n",
"- You can follow the steps outlined in the Notion [documentation](https://developers.notion.com/docs/create-a-notion-integration#create-your-integration-in-notion) to create a new Notion integration, connect it to your pages, and obtain your API token.\n",
"- Page IDs in Notion are the tailing numbers at the end of the page URL, separated by a '-' at 8-4-4-4-12 digits"
],
"metadata": {
"id": "aPXd4RjEKzBG"
}
},
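{
"cell_type": "markdown",
"source": [
"Optional helper (a sketch added for illustration, not part of the original notebook): the exporter expects the hyphenated form of the page ID, so the cell below converts a page URL into that format. The example URL is hypothetical; the extracted ID matches the page used later in this Colab."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import re\n",
"\n",
"def notion_page_id(url: str) -> str:\n",
"    # The page ID is the trailing run of 32 hexadecimal characters in the URL;\n",
"    # re-insert hyphens in 8-4-4-4-12 groups, which is the format NotionExporter expects\n",
"    raw = re.findall(r\"[0-9a-fA-F]{32}\", url)[-1]\n",
"    return f\"{raw[:8]}-{raw[8:12]}-{raw[12:16]}-{raw[16:20]}-{raw[20:]}\"\n",
"\n",
"# Hypothetical URL for illustration; returns '6f98e9a6-a880-40e9-b191-1c4f41efec87'\n",
"notion_page_id(\"https://www.notion.so/Example-Page-6f98e9a6a88040e9b1911c4f41efec87\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},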
{
"cell_type": "code",
"source": [
"from notion_haystack import NotionExporter\n",
"\n",
"exporter = NotionExporter(api_token=notion_api_key)"
],
"metadata": {
"id": "3_blVmFsAGSf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"exporter.run(page_ids=[\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"])"
],
"metadata": {
"id": "cK8VTkJlAfLs"
},
"execution_count": null,
"outputs": []
},
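{
"cell_type": "markdown",
"source": [
"Optional check (a sketch added for clarity, not part of the original notebook): `NotionExporter.run` returns a dictionary with a `documents` list of Haystack `Document` objects, so we can capture the output and preview what was exported before indexing it."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Optional: capture the exporter output and preview the first exported page\n",
"# (assumes the export succeeds and returns at least one document)\n",
"exported = exporter.run(page_ids=[\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"])\n",
"print(len(exported[\"documents\"]), \"document(s) exported\")\n",
"print(exported[\"documents\"][0].content[:300])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},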
{
"cell_type": "markdown",
"source": [
"## Build an Indexing Pipeline to Write Notion Pages to a Document Store\n",
"\n",
"- Documentation on [`GradientDocumentEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n",
"- Documentation on [`DocumentSplitter`](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter)\n",
"- Documentation on [`DocumentWriter`](https://docs.haystack.deepset.ai/v2.0/docs/documentwriter)"
],
"metadata": {
"id": "KlMkLSVjJVoW"
}
},
{
"cell_type": "code",
"source": [
"from haystack.components.preprocessors import DocumentSplitter\n",
"from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder\n",
"from haystack.components.writers import DocumentWriter\n",
"from haystack.document_stores import InMemoryDocumentStore\n",
"\n",
"\n",
"document_store = InMemoryDocumentStore()\n",
"exporter = NotionExporter(api_token=notion_api_key)\n",
"splitter = DocumentSplitter()\n",
"document_embedder = GradientDocumentEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n",
"writer = DocumentWriter(document_store=document_store)\n"
],
"metadata": {
"id": "bZefXI2cBRME"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from haystack import Pipeline\n",
"\n",
"indexing_pipeline = Pipeline()\n",
"indexing_pipeline.add_component(instance=exporter, name=\"exporter\")\n",
"indexing_pipeline.add_component(instance=splitter, name=\"splitter\")\n",
"indexing_pipeline.add_component(instance=document_embedder, name=\"document_embedder\")\n",
"indexing_pipeline.add_component(instance=writer, name=\"writer\")"
],
"metadata": {
"id": "JZVzXyHdDxwg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"indexing_pipeline.connect(\"exporter.documents\", \"splitter.documents\")\n",
"indexing_pipeline.connect(\"splitter.documents\", \"document_embedder.documents\")\n",
"indexing_pipeline.connect(\"document_embedder.documents\", \"writer.documents\")"
],
"metadata": {
"id": "ht3U0JNJEFW6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"indexing_pipeline.run(data={\"exporter\":{\"page_ids\": [\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"]}})"
],
"metadata": {
"id": "mdn9N0XkEZBH"
},
"execution_count": null,
"outputs": []
},
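{
"cell_type": "markdown",
"source": [
"As a quick optional check (not in the original notebook), `InMemoryDocumentStore.count_documents()` reports how many split, embedded documents the indexing pipeline wrote to the store."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Optional: confirm that the indexing pipeline wrote documents to the store\n",
"print(document_store.count_documents(), \"documents indexed\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},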
{
"cell_type": "markdown",
"source": [
"## Build a RAG Pipeline with Cohere\n",
"\n",
"- Documentation on [`GradientTextEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n",
"- Documentation on [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder)\n",
"- Documentation on [`GradientGenerator`](GradientTextEmbedder)"
],
"metadata": {
"id": "A7iIQrtJJ1-6"
}
},
{
"cell_type": "code",
"source": [
"import torch\n",
"from haystack.components.retrievers import InMemoryEmbeddingRetriever\n",
"from haystack.components.builders import PromptBuilder\n",
"from gradient_haystack.embedders.gradient_text_embedder import GradientTextEmbedder\n",
"from gradient_haystack.generator.base import GradientGenerator\n",
"\n",
"prompt = \"\"\" Answer the query, based on the\n",
"content in the documents.\n",
"\n",
"Documents:\n",
"{% for doc in documents %}\n",
" {{doc.content}}\n",
"{% endfor %}\n",
"\n",
"Query: {{query}}\n",
"\"\"\"\n",
"text_embedder = GradientTextEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n",
"retriever = InMemoryEmbeddingRetriever(document_store=document_store)\n",
"prompt_builder = PromptBuilder(template=prompt)\n",
"generator = GradientGenerator(access_token=gradient_access_token,\n",
" workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\",\n",
" model_adapter_id=\"905db818-d031-4378-bd67-ac9804cb0961_model_adapter\",\n",
" max_generated_token_count=350)\n"
],
"metadata": {
"id": "tbX35iA3_ImP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"rag_pipeline = Pipeline()\n",
"\n",
"rag_pipeline.add_component(instance=text_embedder, name=\"text_embedder\")\n",
"rag_pipeline.add_component(instance=retriever, name=\"retriever\")\n",
"rag_pipeline.add_component(instance=prompt_builder, name=\"prompt_builder\")\n",
"rag_pipeline.add_component(instance=generator, name=\"generator\")\n",
"\n",
"rag_pipeline.connect(\"text_embedder\", \"retriever\")\n",
"rag_pipeline.connect(\"retriever.documents\", \"prompt_builder.documents\")\n",
"rag_pipeline.connect(\"prompt_builder\", \"generator\")"
],
"metadata": {
"id": "71PMDYsUBOY8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"question = \"What are the steps for creating a custom component?\"\n",
"result = rag_pipeline.run(data={\"text_embedder\":{\"text\": question},\n",
" \"prompt_builder\":{\"query\": question}})"
],
"metadata": {
"id": "zszY4bgzB2M0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(result['generator']['replies'][0])"
],
"metadata": {
"id": "6EbRQ3OPCDs4"
},
"execution_count": null,
"outputs": []
}
]
}