Adding 3 more notebooks #2

Merged
merged 4 commits into from
Jan 2, 2024
Created using Colaboratory
TuanaCelik committed Jan 2, 2024
commit 13e351eef0b791f783600e71d46ba32ea82700ac
265 changes: 265 additions & 0 deletions gradient-embeders-and-generators-for-notion-rag.ipynb
@@ -0,0 +1,265 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyP6v9IBjf1C1AJvH3peU6XG"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Use Gradient Models for Notion RAG\n",
"\n",
"In this Colab, we will:\n",
"- Use the `NotionExporter` component from the `notion-haystack` integration to export Notion pages\n",
"- Build an indexing pipeline that writes our Notion pages into an `InMemoryDocumentStore` with embeddings\n",
"- Build a custom RAG pipeline to do question answering on our Notion pages"
],
"metadata": {
"id": "_coq_qCuItbN"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "prmChq6T_T4l"
},
"outputs": [],
"source": [
"!pip install haystack-ai gradient-haystack notion-haystack\n",
"!pip install nest-asyncio\n",
"\n",
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"source": [
"import getpass\n",
"\n",
"notion_api_key = getpass.getpass(\"Enter Notion API key:\")\n",
"gradient_access_token = getpass.getpass(\"Gradient Access token:\")"
],
"metadata": {
"id": "JRogtDXaAMIF"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Test the NotionExporter\n",
"\n",
"- Follow the steps in the Notion [documentation](https://developers.notion.com/docs/create-a-notion-integration#create-your-integration-in-notion) to create a new Notion integration, connect it to your pages, and obtain your API token.\n",
"- A Notion page ID is the trailing 32-character string at the end of the page URL, written with hyphens in an 8-4-4-4-12 grouping (as in the example below)."
],
"metadata": {
"id": "aPXd4RjEKzBG"
}
},
{
"cell_type": "code",
"source": [
"from notion_haystack import NotionExporter\n",
"\n",
"exporter = NotionExporter(api_token=notion_api_key)"
],
"metadata": {
"id": "3_blVmFsAGSf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"exporter.run(page_ids=[\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"])"
],
"metadata": {
"id": "cK8VTkJlAfLs"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Build an Indexing Pipeline to Write Notion Pages to a Document Store\n",
"\n",
"- Documentation on [`GradientDocumentEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n",
"- Documentation on [`DocumentSplitter`](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter)\n",
"- Documentation on [`DocumentWriter`](https://docs.haystack.deepset.ai/v2.0/docs/documentwriter)"
],
"metadata": {
"id": "KlMkLSVjJVoW"
}
},
{
"cell_type": "code",
"source": [
"from haystack.components.preprocessors import DocumentSplitter\n",
"from gradient_haystack.embedders.gradient_document_embedder import GradientDocumentEmbedder\n",
"from haystack.components.writers import DocumentWriter\n",
"from haystack.document_stores import InMemoryDocumentStore\n",
"\n",
"\n",
"document_store = InMemoryDocumentStore()\n",
"exporter = NotionExporter(api_token=notion_api_key)\n",
"splitter = DocumentSplitter()\n",
"document_embedder = GradientDocumentEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n",
"writer = DocumentWriter(document_store=document_store)\n"
],
"metadata": {
"id": "bZefXI2cBRME"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from haystack import Pipeline\n",
"\n",
"indexing_pipeline = Pipeline()\n",
"indexing_pipeline.add_component(instance=exporter, name=\"exporter\")\n",
"indexing_pipeline.add_component(instance=splitter, name=\"splitter\")\n",
"indexing_pipeline.add_component(instance=document_embedder, name=\"document_embedder\")\n",
"indexing_pipeline.add_component(instance=writer, name=\"writer\")"
],
"metadata": {
"id": "JZVzXyHdDxwg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"indexing_pipeline.connect(\"exporter.documents\", \"splitter.documents\")\n",
"indexing_pipeline.connect(\"splitter.documents\", \"document_embedder.documents\")\n",
"indexing_pipeline.connect(\"document_embedder.documents\", \"writer.documents\")"
],
"metadata": {
"id": "ht3U0JNJEFW6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"indexing_pipeline.run(data={\"exporter\":{\"page_ids\": [\"6f98e9a6-a880-40e9-b191-1c4f41efec87\"]}})"
],
"metadata": {
"id": "mdn9N0XkEZBH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Build a RAG Pipeline with Gradient\n",
"\n",
"- Documentation on [`GradientTextEmbedder`](https://haystack.deepset.ai/integrations/gradient#usage)\n",
"- Documentation on [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder)\n",
"- Documentation on [`GradientGenerator`](https://haystack.deepset.ai/integrations/gradient#usage)"
],
"metadata": {
"id": "A7iIQrtJJ1-6"
}
},
{
"cell_type": "code",
"source": [
"from haystack.components.retrievers import InMemoryEmbeddingRetriever\n",
"from haystack.components.builders import PromptBuilder\n",
"from gradient_haystack.embedders.gradient_text_embedder import GradientTextEmbedder\n",
"from gradient_haystack.generator.base import GradientGenerator\n",
"\n",
"prompt = \"\"\"Answer the query based on the\n",
"content in the documents.\n",
"\n",
"Documents:\n",
"{% for doc in documents %}\n",
" {{doc.content}}\n",
"{% endfor %}\n",
"\n",
"Query: {{query}}\n",
"\"\"\"\n",
"text_embedder = GradientTextEmbedder(access_token=gradient_access_token, workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\")\n",
"retriever = InMemoryEmbeddingRetriever(document_store=document_store)\n",
"prompt_builder = PromptBuilder(template=prompt)\n",
"generator = GradientGenerator(access_token=gradient_access_token,\n",
" workspace_id=\"9ee7071c-2fa9-4155-8edd-94ed371f1750_workspace\",\n",
" model_adapter_id=\"905db818-d031-4378-bd67-ac9804cb0961_model_adapter\",\n",
" max_generated_token_count=350)\n"
],
"metadata": {
"id": "tbX35iA3_ImP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"rag_pipeline = Pipeline()\n",
"\n",
"rag_pipeline.add_component(instance=text_embedder, name=\"text_embedder\")\n",
"rag_pipeline.add_component(instance=retriever, name=\"retriever\")\n",
"rag_pipeline.add_component(instance=prompt_builder, name=\"prompt_builder\")\n",
"rag_pipeline.add_component(instance=generator, name=\"generator\")\n",
"\n",
"rag_pipeline.connect(\"text_embedder\", \"retriever\")\n",
"rag_pipeline.connect(\"retriever.documents\", \"prompt_builder.documents\")\n",
"rag_pipeline.connect(\"prompt_builder\", \"generator\")"
],
"metadata": {
"id": "71PMDYsUBOY8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"question = \"What are the steps for creating a custom component?\"\n",
"result = rag_pipeline.run(data={\"text_embedder\":{\"text\": question},\n",
" \"prompt_builder\":{\"query\": question}})"
],
"metadata": {
"id": "zszY4bgzB2M0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(result['generator']['replies'][0])"
],
"metadata": {
"id": "6EbRQ3OPCDs4"
},
"execution_count": null,
"outputs": []
}
]
}
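
Note on the page-ID bullet in the "Test the NotionExporter" cell: the notebook passes an already hyphenated ID to `exporter.run(page_ids=[...])`. The following is a minimal sketch of how the 8-4-4-4-12 form could be derived from a page URL, assuming the URL ends in a 32-character hexadecimal ID. The helper name `notion_page_id_from_url` and the URL in the usage note are hypothetical and are not part of `notion-haystack`.

```python
import re
import uuid


def notion_page_id_from_url(url: str) -> str:
    """Return the hyphenated 8-4-4-4-12 page ID found at the end of a Notion URL.

    Hypothetical helper for illustration only; not part of notion-haystack.
    """
    # Drop any query string or fragment, then keep the last path segment.
    path = url.split("?", 1)[0].split("#", 1)[0].rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    # Remove hyphens so both "Title-<32 hex>" and already hyphenated IDs match.
    compact = last_segment.replace("-", "")
    match = re.search(r"[0-9a-fA-F]{32}$", compact)
    if match is None:
        raise ValueError(f"No 32-character page ID found in: {url}")
    # uuid.UUID renders 32 hex characters in the 8-4-4-4-12 grouping.
    return str(uuid.UUID(match.group(0)))
```

With a made-up URL such as `https://www.notion.so/My-Page-6f98e9a6a88040e9b1911c4f41efec87`, the result could be passed straight to the exporter, for example `exporter.run(page_ids=[notion_page_id_from_url(url)])`.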