-
Notifications
You must be signed in to change notification settings - Fork 42
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2 from TuanaCelik/main
Adding 3 more notebooks
- Loading branch information
Showing
4 changed files
with
658 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,265 @@ | ||
{ | ||
"nbformat": 4, | ||
"nbformat_minor": 0, | ||
"metadata": { | ||
"colab": { | ||
"provenance": [], | ||
"authorship_tag": "ABX9TyP6v9IBjf1C1AJvH3peU6XG" | ||
}, | ||
"kernelspec": { | ||
"name": "python3", | ||
"display_name": "Python 3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"# Use Gradient Models for Notion RAG\n", | ||
"\n", | ||
"In this Colab, we will:\n", | ||
"- Creating a custom Haystack component called `NotionExporter`\n", | ||
"- Building an indexing pipeline to write our Notion pages into an `InMemoryDocumentStore` with embeddings\n", | ||
"- Build a custom RAG pipeline to do question answering on our Notion pages" | ||
], | ||
"metadata": { | ||
"id": "_coq_qCuItbN" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "prmChq6T_T4l" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install haystack-ai gradient-haystack notion-haystack\n", | ||
"!pip install nest-asyncio\n", | ||
"\n", | ||
"import nest_asyncio\n", | ||
"\n", | ||
"nest_asyncio.apply()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"import getpass\n", | ||
"import os\n", | ||
"\n", | ||
"notion_api_key = getpass.getpass(\"Enter Notion API key:\")\n", | ||
"gradient_access_token = getpass.getpass(\"Gradient Access token:\")" | ||
], | ||
"metadata": { | ||
"id": "JRogtDXaAMIF" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"### Test the NotionExporter\n", | ||
"\n", | ||
"- You can follow the steps outlined in the Notion [documentation](https://developers.notion.com/docs/create-a-notion-integration#create-your-integration-in-notion) to create a new Notion integration, connect it to your pages, and obtain your API token.\n", | ||
"- Page IDs in Notion are the tailing numbers at the end of the page URL, separated by a '-' at 8-4-4-4-12 digits" | ||
], | ||
"metadata": { | ||
"id": "aPXd4RjEKzBG" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"from notion_haystack import NotionExporter\n", | ||
"\n", | ||
"exporter = NotionExporter(api_token=notion_api_key)" | ||
], | ||
"metadata": { | ||
"id": "3_blVmFsAGSf" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"exporter.run(page_ids=[\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"])" | ||
], | ||
"metadata": { | ||
"id": "cK8VTkJlAfLs" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Build an Indexing Pipeline to Write Notion Pages to a Document Store\n", | ||
"\n", | ||
"- Documentation on [`GradientDocumentEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n", | ||
"- Documentation on [`DocumentSplitter`](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter)\n", | ||
"- Documentation on [`DocumentWriter`](https://docs.haystack.deepset.ai/v2.0/docs/documentwriter)" | ||
], | ||
"metadata": { | ||
"id": "KlMkLSVjJVoW" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"from haystack.components.preprocessors import DocumentSplitter\n", | ||
"from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder\n", | ||
"from haystack.components.writers import DocumentWriter\n", | ||
"from haystack.document_stores import InMemoryDocumentStore\n", | ||
"\n", | ||
"\n", | ||
"document_store = InMemoryDocumentStore()\n", | ||
"exporter = NotionExporter(api_token=notion_api_key)\n", | ||
"splitter = DocumentSplitter()\n", | ||
"document_embedder = GradientDocumentEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n", | ||
"writer = DocumentWriter(document_store=document_store)\n" | ||
], | ||
"metadata": { | ||
"id": "bZefXI2cBRME" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"from haystack import Pipeline\n", | ||
"\n", | ||
"indexing_pipeline = Pipeline()\n", | ||
"indexing_pipeline.add_component(instance=exporter, name=\"exporter\")\n", | ||
"indexing_pipeline.add_component(instance=splitter, name=\"splitter\")\n", | ||
"indexing_pipeline.add_component(instance=document_embedder, name=\"document_embedder\")\n", | ||
"indexing_pipeline.add_component(instance=writer, name=\"writer\")" | ||
], | ||
"metadata": { | ||
"id": "JZVzXyHdDxwg" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"indexing_pipeline.connect(\"exporter.documents\", \"splitter.documents\")\n", | ||
"indexing_pipeline.connect(\"splitter.documents\", \"document_embedder.documents\")\n", | ||
"indexing_pipeline.connect(\"document_embedder.documents\", \"writer.documents\")" | ||
], | ||
"metadata": { | ||
"id": "ht3U0JNJEFW6" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"indexing_pipeline.run(data={\"exporter\":{\"page_ids\": [\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"]}})" | ||
], | ||
"metadata": { | ||
"id": "mdn9N0XkEZBH" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Build a RAG Pipeline with Cohere\n", | ||
"\n", | ||
"- Documentation on [`GradientTextEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n", | ||
"- Documentation on [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder)\n", | ||
"- Documentation on [`GradientGenerator`](GradientTextEmbedder)" | ||
], | ||
"metadata": { | ||
"id": "A7iIQrtJJ1-6" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"import torch\n", | ||
"from haystack.components.retrievers import InMemoryEmbeddingRetriever\n", | ||
"from haystack.components.builders import PromptBuilder\n", | ||
"from gradient_haystack.embedders.gradient_text_embedder import GradientTextEmbedder\n", | ||
"from gradient_haystack.generator.base import GradientGenerator\n", | ||
"\n", | ||
"prompt = \"\"\" Answer the query, based on the\n", | ||
"content in the documents.\n", | ||
"\n", | ||
"Documents:\n", | ||
"{% for doc in documents %}\n", | ||
" {{doc.content}}\n", | ||
"{% endfor %}\n", | ||
"\n", | ||
"Query: {{query}}\n", | ||
"\"\"\"\n", | ||
"text_embedder = GradientTextEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n", | ||
"retriever = InMemoryEmbeddingRetriever(document_store=document_store)\n", | ||
"prompt_builder = PromptBuilder(template=prompt)\n", | ||
"generator = GradientGenerator(access_token=gradient_access_token,\n", | ||
" workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\",\n", | ||
" model_adapter_id=\"905db818-d031-4378-bd67-ac9804cb0961_model_adapter\",\n", | ||
" max_generated_token_count=350)\n" | ||
], | ||
"metadata": { | ||
"id": "tbX35iA3_ImP" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"rag_pipeline = Pipeline()\n", | ||
"\n", | ||
"rag_pipeline.add_component(instance=text_embedder, name=\"text_embedder\")\n", | ||
"rag_pipeline.add_component(instance=retriever, name=\"retriever\")\n", | ||
"rag_pipeline.add_component(instance=prompt_builder, name=\"prompt_builder\")\n", | ||
"rag_pipeline.add_component(instance=generator, name=\"generator\")\n", | ||
"\n", | ||
"rag_pipeline.connect(\"text_embedder\", \"retriever\")\n", | ||
"rag_pipeline.connect(\"retriever.documents\", \"prompt_builder.documents\")\n", | ||
"rag_pipeline.connect(\"prompt_builder\", \"generator\")" | ||
], | ||
"metadata": { | ||
"id": "71PMDYsUBOY8" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"question = \"What are the steps for creating a custom component?\"\n", | ||
"result = rag_pipeline.run(data={\"text_embedder\":{\"text\": question},\n", | ||
" \"prompt_builder\":{\"query\": question}})" | ||
], | ||
"metadata": { | ||
"id": "zszY4bgzB2M0" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"print(result['generator']['replies'][0])" | ||
], | ||
"metadata": { | ||
"id": "6EbRQ3OPCDs4" | ||
}, | ||
"execution_count": null, | ||
"outputs": [] | ||
} | ||
] | ||
} |
Oops, something went wrong.