{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## DSPy: Compiling chains from `LangChain`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most powerful features in **DSPy** is optimizers. **DSPy optimizers** can take any LM system and tune the prompts (or the LM weights) to maximize any objective.\n", "\n", "Optimizers can improve the quality of your LM systems and make your code adaptive to new LMs or new data. This is meant to bring structure and modularity in place of hacky things like (i) manual prompt engineering, (ii) designing complex pipelines for generating synthetic data, or (iii) designing complex pipelines for finetuning." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Install the dependencies if needed.\n", "# %pip install -U dspy-ai\n", "# %pip install -U openai jinja2\n", "# %pip install -U langchain langchain-community langchain-openai langchain-core" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typically, we use DSPy optimizers with DSPy modules. But here, we've worked with [Harrison Chase](https://twitter.com/hwchase17) to make sure DSPy can also optimize chains built with the `LangChain` library.\n", "\n", "This short tutorial demonstrates how this proof-of-concept feature works. _This will **not** give you the full power of DSPy or LangChain yet, but we will expand it if there's high demand._\n", "\n", "If we convert this into a fuller integration, all users stand to benefit. LangChain users will gain the ability to optimize any chain with any DSPy optimizer. DSPy users will gain the ability to _export_ any DSPy program into an LCEL chain that supports streaming, tracing, and other rich production-targeted features in LangChain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1) Setting Up\n", "\n", "First, let's import `dspy` and configure the default language model and retrieval model in it." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import dspy\n", "\n", "from dspy.evaluate.evaluate import Evaluate\n", "from dspy.teleprompt import BootstrapFewShotWithRandomSearch\n", "\n", "colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')\n", "\n", "dspy.configure(rm=colbertv2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's import `langchain` and the DSPy modules for interacting with LangChain runnables, namely, `LangChainPredict` and `LangChainModule`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from langchain_openai import OpenAI\n", "from langchain.globals import set_llm_cache\n", "from langchain.cache import SQLiteCache\n", "\n", "set_llm_cache(SQLiteCache(database_path=\"cache.db\"))\n", "\n", "llm = OpenAI(model_name=\"gpt-3.5-turbo-instruct\", temperature=0)\n", "retrieve = lambda x: dspy.Retrieve(k=5)(x[\"question\"]).passages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If it's useful, we can set up some caches so you can run this whole notebook in Google Colab without any API keys. Let us know." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2) Defining a chain as a `LangChain` expression\n", "\n", "For illustration, let's tackle the following task.\n", "\n", "**Task:** Build a RAG system for generating informative tweets.\n", "- **Input:** A factual **question**, which may be fairly complex.\n", "- **Output:** An engaging **tweet** that correctly answers the question from the retrieved info.\n", "\n", "Let's use LangChain's expression language (LCEL) to illustrate this. Any prompt here will do; we will optimize the final prompt with DSPy.\n", "\n", "Considering that, let's just keep it to the bare bones: **Given {context}, answer the question {question} as a tweet.**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# From LangChain, import standard modules for prompting.\n", "from langchain_core.prompts import PromptTemplate\n", "from langchain_core.output_parsers import StrOutputParser\n", "from langchain_core.runnables import RunnablePassthrough\n", "\n", "# Just a simple prompt for this task. It's fine if it's complex too.\n", "prompt = PromptTemplate.from_template(\"Given {context}, answer the question `{question}` as a tweet.\")\n", "\n", "# This is how you'd normally build a chain with LCEL. This chain does retrieval then generation (RAG).\n", "vanilla_chain = RunnablePassthrough.assign(context=retrieve) | prompt | llm | StrOutputParser()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3) Converting the chain into a **DSPy module**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our goal is to optimize this prompt so we have a better tweet generator. DSPy optimizers can help, but they only work with DSPy modules!\n", "\n", "For this reason, we created two new modules in DSPy: `LangChainPredict` and `LangChainModule`." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# From DSPy, import the modules that know how to interact with LangChain LCEL.\n", "from dspy.predict.langchain import LangChainPredict, LangChainModule\n", "\n", "# This is how to wrap it so it behaves like a DSPy program.\n", "# Just replace every pattern like `prompt | llm` with `LangChainPredict(prompt, llm)`.\n", "zeroshot_chain = RunnablePassthrough.assign(context=retrieve) | LangChainPredict(prompt, llm) | StrOutputParser()\n", "zeroshot_chain = LangChainModule(zeroshot_chain) # then wrap the chain in a DSPy module." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4) Trying the module\n", "\n", "How good is our `LangChainModule` at this task? Well, we can ask it to generate a tweet for the following question." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Eddy Mazzoleni, Italian professional cyclist, was born in Bergamo, Italy on July 29, 1973. #cyclist #Italy #Bergamo'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"In what region was Eddy Mazzoleni born?\"\n", "\n", "zeroshot_chain.invoke({\"question\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ah, that sounds about right! (It's technically not perfect: we asked for the _region_, not the city. We can do better below.)\n", "\n", "Inspecting questions and answers manually is very important to get a sense of your system. However, a good system designer always looks to iteratively **benchmark** their work to quantify progress!\n", "\n", "To do this, we need two things: the **metric** we want to maximize and a (tiny) **dataset** of examples for our system.\n", "\n", "Are there pre-defined metrics for good tweets? Should I label 100,000 tweets by hand? Probably not. 
We can easily do something reasonable, though, until you start getting data in production!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5) Evaluating the module\n", "\n", "To get started, we'll define our own simple metric and we'll borrow a bunch of questions from a QA dataset and use them here for tuning.\n", "\n", "**What makes a good tweet?** I don't know, but in the spirit of iterative development, let's start simple!\n", "\n", "Let's define a good tweet as having three properties: it should be (1) factually correct, (2) based on real sources, and (3) engaging for people." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.\n", " table = cls._concat_blocks(blocks, axis=0)\n" ] }, { "data": { "text/plain": [ "(200, 50, 150)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We took the liberty of defining this metric and loading a few examples from a standard QA dataset.\n", "# Let's import them from `tweet_metric.py` in the same directory that contains this notebook.\n", "from tweet_metric import metric, trainset, valset, devset\n", "\n", "# We loaded 200, 50, and 150 examples for training, validation (tuning), and development (evaluation), respectively.\n", "# You could load fewer (or more) and, chances are, the right DSPy optimizers will work well for many problems.\n", "len(trainset), len(valset), len(devset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is this the right metric or the most representative set of questions? Not necessarily. But they get us started in a way we can iterate on systematically!\n", "\n", "**Note:** Notice that our dataset doesn't actually include any tweets! It only has questions and answers. 
That's OK, our metric will take care of evaluating outputs in tweet form.\n", "\n", "Okay, let's evaluate the unoptimized \"zero-shot\" version of our chain, converted from our `LangChain` LCEL object." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 63.999999999999986 / 150 (42.7): 100%|██████████| 150/150 [00:02<00:00, 66.08it/s]\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 63.999999999999986 / 150 (42.7%)\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>
  <thead>
    <tr><th></th><th>question</th><th>answer</th><th>gold_titles</th><th>output</th><th>tweet_response</th><th>metric</th></tr>
  </thead>
  <tbody>
    <tr><td>0</td><td>Who was a producer who produced albums for both rock bands Juke Karten and Thirty Seconds to Mars?</td><td>Brian Virtue</td><td>{'Thirty Seconds to Mars', 'Levolution (album)'}</td><td>Brian Virtue, who has worked with bands like Jane's Addiction and Velvet Revolver, produced albums for both Juke Kartel and Thirty Seconds to Mars, showcasing...</td><td>Brian Virtue, who has worked with bands like Jane's Addiction and Velvet Revolver, produced albums for both Juke Kartel and Thirty Seconds to Mars, showcasing...</td><td>1.0</td></tr>
    <tr><td>1</td><td>Are both the University of Chicago and Syracuse University public universities?</td><td>no</td><td>{'Syracuse University', 'University of Chicago'}</td><td>No, only Syracuse University is a public university. The University of Chicago is a private research university. #Syracuse #University #Chicago #Public #Private</td><td>No, only Syracuse University is a public university. The University of Chicago is a private research university. #Syracuse #University #Chicago #Public #Private</td><td>0.3333333333333333</td></tr>
    <tr><td>2</td><td>In what region was Eddy Mazzoleni born?</td><td>Lombardy, northern Italy</td><td>{'Eddy Mazzoleni', 'Bergamo'}</td><td>Eddy Mazzoleni, Italian professional cyclist, was born in Bergamo, Italy on July 29, 1973. #cyclist #Italy #Bergamo</td><td>Eddy Mazzoleni, Italian professional cyclist, was born in Bergamo, Italy on July 29, 1973. #cyclist #Italy #Bergamo</td><td>0.0</td></tr>
    <tr><td>3</td><td>Who edited the 1990 American romantic comedy film directed by Garry Marshall?</td><td>Raja Raymond Gosnell</td><td>{'Raja Gosnell', 'Pretty Woman'}</td><td>J. F. Lawton edited the 1990 American romantic comedy film directed by Garry Marshall. #PrettyWoman #GarryMarshall #JFLawton</td><td>J. F. Lawton edited the 1990 American romantic comedy film directed by Garry Marshall. #PrettyWoman #GarryMarshall #JFLawton</td><td>0.0</td></tr>
    <tr><td>4</td><td>Burrs Country Park railway station is what stop on the railway line that runs between Heywood and Rawtenstall</td><td>seventh</td><td>{'East Lancashire Railway', 'Burrs Country Park railway station'}</td><td>Burrs Country Park railway station is the seventh stop on the East Lancashire Railway line that runs between Heywood and Rawtenstall.</td><td>Burrs Country Park railway station is the seventh stop on the East Lancashire Railway line that runs between Heywood and Rawtenstall.</td><td>1.0</td></tr>
  </tbody>
</table>
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " ... 145 more rows not displayed ...\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "42.67" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluate = Evaluate(metric=metric, devset=devset, num_threads=8, display_progress=True, display_table=5)\n", "evaluate(zeroshot_chain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, cool. Our `zeroshot_chain` gets about **43%** on the 150 questions from the devset.\n", "\n", "The table above shows some examples. For instance:\n", "\n", "- **Question**: Who was a producer who produced albums for both rock bands Juke Karten and Thirty Seconds to Mars?\t\n", "- **Tweet**: Brian Virtue, who has worked with bands like Jane's Addiction and Velvet Revolver, produced albums for both Juke Kartel and Thirty Seconds to Mars, showcasing... [truncated]\n", "- **Metric**: 1.0 (A tweet that is correct, faithful, and engaging!*)\n", "\n", "footnote: * At least according to our metric, which is just a DSPy program, so _it too_ can be optimized if you'd like! Topic for another notebook, though." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6) Optimizing the module\n", "\n", "DSPy has many optimizers, but the de-facto default one currently is `BootstrapFewShotWithRandomSearch`.\n", "\n", "**If you're curious how it works:** This optimizer works by running your program (in this case, `zeroshot_chain`) on `trainset` questions. Each time it runs, DSPy will remember the input and output of each LM call. These are called traces, and this particular optimizer will keep track of \"good\" traces (i.e., ones that the metric likes). Then, this optimizer will try to find good ways to leverage these traces as automatic few-shot examples. It will try them out, seeking to maximize the average metric on `valset`. There are many ways to self-generate (bootstrap) examples. 
There are also many ways to optimize their selection (here, with random search). That's why there are several other optimizers in DSPy." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Going to sample between 1 and 3 traces per predictor.\n", "Will attempt to train 3 candidate sets.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 22.333333333333336 / 50 (44.7): 100%|██████████| 50/50 [00:00<00:00, 55.47it/s]\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 22.333333333333336 / 50 (44.7%)\n", "Score: 44.67 for set: [0]\n", "New best score: 44.67 for seed -3\n", "Scores so far: [44.67]\n", "Best score: 44.67\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 22.333333333333336 / 50 (44.7): 100%|██████████| 50/50 [00:00<00:00, 166.70it/s]\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. 
Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 22.333333333333336 / 50 (44.7%)\n", "Score: 44.67 for set: [16]\n", "Scores so far: [44.67, 44.67]\n", "Best score: 44.67\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 2%|▎ | 5/200 [00:00<00:07, 26.88it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Bootstrapped 3 full traces after 6 examples in round 0.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 27.000000000000004 / 50 (54.0): 100%|██████████| 50/50 [00:00<00:00, 72.21it/s]\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 27.000000000000004 / 50 (54.0%)\n", "Score: 54.0 for set: [16]\n", "New best score: 54.0 for seed -1\n", "Scores so far: [44.67, 44.67, 54.0]\n", "Best score: 54.0\n", "Average of max per entry across top 1 scores: 0.54\n", "Average of max per entry across top 2 scores: 0.5933333333333334\n", "Average of max per entry across top 3 scores: 0.5933333333333334\n", "Average of max per entry across top 5 scores: 0.5933333333333334\n", "Average of max per entry across top 8 scores: 0.5933333333333334\n", "Average of max per entry across top 9999 scores: 0.5933333333333334\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 4%|▍ | 9/200 [00:00<00:06, 28.04it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Bootstrapped 2 full traces after 10 examples in round 0.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 25.000000000000007 / 50 (50.0): 100%|██████████| 50/50 [00:00<00:00, 70.71it/s]\n", 
"/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 25.000000000000007 / 50 (50.0%)\n", "Score: 50.0 for set: [16]\n", "Scores so far: [44.67, 44.67, 54.0, 50.0]\n", "Best score: 54.0\n", "Average of max per entry across top 1 scores: 0.54\n", "Average of max per entry across top 2 scores: 0.5933333333333334\n", "Average of max per entry across top 3 scores: 0.6066666666666667\n", "Average of max per entry across top 5 scores: 0.6066666666666667\n", "Average of max per entry across top 8 scores: 0.6066666666666667\n", "Average of max per entry across top 9999 scores: 0.6066666666666667\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 1/200 [00:00<00:07, 28.24it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Bootstrapped 1 full traces after 2 examples in round 0.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 25.666666666666664 / 50 (51.3): 100%|██████████| 50/50 [00:00<00:00, 75.37it/s]\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. 
Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 25.666666666666664 / 50 (51.3%)\n", "Score: 51.33 for set: [16]\n", "Scores so far: [44.67, 44.67, 54.0, 50.0, 51.33]\n", "Best score: 54.0\n", "Average of max per entry across top 1 scores: 0.54\n", "Average of max per entry across top 2 scores: 0.5800000000000001\n", "Average of max per entry across top 3 scores: 0.6133333333333334\n", "Average of max per entry across top 5 scores: 0.6266666666666667\n", "Average of max per entry across top 8 scores: 0.6266666666666667\n", "Average of max per entry across top 9999 scores: 0.6266666666666667\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 1%| | 2/200 [00:00<00:07, 27.81it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Bootstrapped 1 full traces after 3 examples in round 0.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 26.0 / 50 (52.0): 100%|██████████| 50/50 [00:00<00:00, 73.67it/s] \n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 26.0 / 50 (52.0%)\n", "Score: 52.0 for set: [16]\n", "Scores so far: [44.67, 44.67, 54.0, 50.0, 51.33, 52.0]\n", "Best score: 54.0\n", "Average of max per entry across top 1 scores: 0.54\n", "Average of max per entry across top 2 scores: 0.5733333333333335\n", "Average of max per entry across top 3 scores: 0.6133333333333334\n", "Average of max per entry across top 5 scores: 0.64\n", "Average of max per entry across top 8 scores: 0.64\n", "Average of max per entry across top 9999 scores: 0.64\n", "6 candidate programs found.\n" ] } ], "source": [ "# Set up the optimizer. 
We'll use very minimal hyperparameters for this example.\n", "# Just do random search with ~3 attempts, and in each attempt, bootstrap <= 3 traces.\n", "optimizer = BootstrapFewShotWithRandomSearch(metric=metric, max_bootstrapped_demos=3, num_candidate_programs=3)\n", "\n", "# Now use the optimizer to *compile* the chain. This could take 5-10 minutes, unless it's cached.\n", "optimized_chain = optimizer.compile(zeroshot_chain, trainset=trainset, valset=valset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7) Evaluating the optimized chain\n", "\n", "Well, how good is this? _Not every optimization run will magically result in improvement on unseen examples!_ So let's check!\n", "\n", "First let's ask that question from above." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' Eddy Mazzoleni was born in Bergamo, a city in the Lombardy region of Italy. #EddyMazzoleni #Italy #Lombardy'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "question = \"In what region was Eddy Mazzoleni born?\"\n", "\n", "optimized_chain.invoke({\"question\": question})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice, anecdotally, it appears a bit more precise than the answer with `zeroshot_chain`. But now let's do some proper evals!" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Average Metric: 78.66666666666667 / 150 (52.4): 100%|██████████| 150/150 [00:02<00:00, 72.64it/s] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Average Metric: 78.66666666666667 / 150 (52.4%)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "/scr-ssd/okhattab/miniconda3/envs/py39_jan2024_01/lib/python3.9/site-packages/dspy/evaluate/evaluate.py:126: FutureWarning: DataFrame.applymap has been deprecated. 
Use DataFrame.map instead.\n", " df = df.applymap(truncate_cell)\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>
  <thead>
    <tr><th></th><th>question</th><th>answer</th><th>gold_titles</th><th>output</th><th>tweet_response</th><th>metric</th></tr>
  </thead>
  <tbody>
    <tr><td>0</td><td>Who was a producer who produced albums for both rock bands Juke Karten and Thirty Seconds to Mars?</td><td>Brian Virtue</td><td>{'Thirty Seconds to Mars', 'Levolution (album)'}</td><td>Brian Virtue is a producer who has worked with both Juke Kartel and Thirty Seconds to Mars, helping to create their unique sounds. #BrianVirtue #producer...</td><td>Brian Virtue is a producer who has worked with both Juke Kartel and Thirty Seconds to Mars, helping to create their unique sounds. #BrianVirtue #producer...</td><td>1.0</td></tr>
    <tr><td>1</td><td>Are both the University of Chicago and Syracuse University public universities?</td><td>no</td><td>{'Syracuse University', 'University of Chicago'}</td><td>Yes, both Northeastern Illinois University and Syracuse University are public universities. #publicuniversity #Chicago #Syracuse</td><td>Yes, both Northeastern Illinois University and Syracuse University are public universities. #publicuniversity #Chicago #Syracuse</td><td>0.0</td></tr>
    <tr><td>2</td><td>In what region was Eddy Mazzoleni born?</td><td>Lombardy, northern Italy</td><td>{'Eddy Mazzoleni', 'Bergamo'}</td><td>Eddy Mazzoleni was born in Bergamo, a city in the Lombardy region of Italy. #EddyMazzoleni #Italy #Lombardy</td><td>Eddy Mazzoleni was born in Bergamo, a city in the Lombardy region of Italy. #EddyMazzoleni #Italy #Lombardy</td><td>1.0</td></tr>
    <tr><td>3</td><td>Who edited the 1990 American romantic comedy film directed by Garry Marshall?</td><td>Raja Raymond Gosnell</td><td>{'Raja Gosnell', 'Pretty Woman'}</td><td>Garry Marshall directed and edited the 1990 American romantic comedy film \"Pretty Woman\", starring Richard Gere and Julia Roberts. #PrettyWoman #GarryMarshall #RomanticComedy</td><td>Garry Marshall directed and edited the 1990 American romantic comedy film \"Pretty Woman\", starring Richard Gere and Julia Roberts. #PrettyWoman #GarryMarshall #RomanticComedy</td><td>0.0</td></tr>
    <tr><td>4</td><td>Burrs Country Park railway station is what stop on the railway line that runs between Heywood and Rawtenstall</td><td>seventh</td><td>{'East Lancashire Railway', 'Burrs Country Park railway station'}</td><td>Burrs Country Park railway station is the seventh stop on the East Lancashire Railway line, which runs between Heywood and Rawtenstall. #EastLancashireRailway #BurrsCountryPark #railwaystation</td><td>Burrs Country Park railway station is the seventh stop on the East Lancashire Railway line, which runs between Heywood and Rawtenstall. #EastLancashireRailway #BurrsCountryPark #railwaystation</td><td>1.0</td></tr>
  </tbody>
</table>
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " ... 145 more rows not displayed ...\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "52.44" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluate(optimized_chain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We started with `zeroshot_chain` at **43%** and now we have **52%**. That's roughly a **23%** relative improvement (42.67 to 52.44). Not bad!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8) Inspecting the optimized chain in action" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROMPT:\n", "\n", " Essential Instructions: Respond to the provided question based on the given context in the style of a tweet, which typically requires a concise and engaging answer within the character limit of a tweet (280 characters).\n", "\n", "---\n", "\n", "Follow the following format.\n", "\n", "Context: ${context}\n", "Question: ${question}\n", "Tweet Response: ${tweet_response}\n", "\n", "---\n", "\n", "Context:\n", "[1] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie \"Stealth Hunters\". Kita's first recurring television role was in Fox's \"Masked Rider\", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, \"Waydown\" in 1995. In 1996, Kita also appeared in the film \"Barb Wire\" (1996) and guest starred on \"The Wayans Bros.\". She also guest starred in \"Miriam Teitelbaum: Homicide\" with \"Saturday Night Live\" alumni Nora Dunn, \"Wall To Wall Records\" with Jordan Bridges, \"Even Stevens\", \"Felicity\" with Keri Russell, \"V.I.P.\" with Pamela Anderson, \"Girlfriends\", \"The Sweet Spot\" with Bill Murray, and \"Movies at Our House\". 
She also had recurring roles on the FX spoof, \"Son of the Beach\" from 2001 to 2002, ABC-Family's \"Dance Fever\" and Oxygen Network's \"Running with Scissors\". Kita also appeared in the films \"Little Heroes\" (2002) and \"Rennie's Landing\" (2001).»\n", "[2] «Jilly Kitzinger | Jilly Kitzinger is a fictional character in the science fiction series \"Torchwood\", portrayed by American actress Lauren Ambrose. The character was promoted as one of five new main characters to join \"Torchwood\" in its fourth series, \"\" (2011), as part of a new co-production between \"Torchwood\"' s British network, BBC One, and its American financiers on US premium television network Starz. Ambrose appears in seven of the ten episodes, and is credited as a \"special guest star\" throughout. Whilst reaction to the serial was mixed, Ambrose' portrayal was often singled out by critics for particular praise and in 2012 she received a Saturn Award nomination for Best Supporting Actress on Television.»\n", "[3] «Candace Brown | Candace June Brown (born June 15, 1980) is an American actress and comedian best known for her work on shows such as \"Grey's Anatomy\", \"Desperate Housewives\", \"Head Case\", The \"Wizards Of Waverly Place\". In 2011, she joined the guest cast for \"Torchwood\"' s fourth series' \"\", airing on BBC One in the United Kingdom and premium television network Starz.»\n", "[4] «Candace Elaine | Candace Elaine is a Canadian actress who has become a naturalized American citizen. Born 1972 in Edmonton, Alberta, Canada, Elaine is an accomplished dancer, fashionista, and stage and film actor. She most recently appeared opposite Stone Cold Steve Austin, Michael Shanks, and Michael Jai White in the action feature \"Tactical Force\", playing the role of Ilya Kalashnikova.»\n", "[5] «Amy Steel | Amy Steel (born Alice Amy Steel; May 3, 1960) is an American film and television actress. 
She is best known for her roles as Ginny Field in \"Friday the 13th Part 2\" (1981) and Kit Graham in \"April Fool's Day\" (1986). She has starred in films such as \"Exposed\" (1983), \"Walk Like a Man\" (1987), \"What Ever Happened to Baby Jane? \" (1991), and \"Tales of Poe\" (2014). Steel has had numerous guest appearances on several television series, such as \"Family Ties\" (1983), \"The A-Team\" (1983), \"Quantum Leap\" (1990), and \"China Beach\" (1991), as well as a starring role in \"The Powers of Matthew Star\" (1982–83).»\n", "Question: which American actor was Candace Kita guest starred with\n", "Tweet Response: Candace Kita has guest starred with many American actors, including Nora Dunn, Jordan Bridges, Keri Russell, Pamela Anderson, and Bill Murray. #CandaceKita #gueststar #Americanactors\n", "\n", "---\n", "\n", "Context:\n", "[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»\n", "[2] «Victorian (comics) | The Victorian is a 25-issue comic book series published by Penny-Farthing Press and starting in 1999. The brainchild of creator Trainor Houghton, the series included a number of notable script writers and illustrators, including Len Wein, Glen Orbik and Howard Chaykin.»\n", "[3] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. 
When he awakes he finds that he can no longer distinguish the dream from reality.»\n", "[4] «Victorian People | Victorian People: A Reassessment of Persons and Themes, 1851-1867 is a book by the historian Asa Briggs originally published in 1955. It is part of a trilogy that also incorporates \"Victorian Cities\" and \"Victorian Things\".»\n", "[5] «The Caxtons | The Caxtons: A Family Picture is an 1849 Victorian novel by Edward Bulwer-Lytton that was popular in its time.»\n", "Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?\n", "Tweet Response: The Victorians - Their Story In Pictures is a 2009 British documentary series written and presented by Jeremy Paxman, who was born in 1950. #Victorian #documentary #JeremyPaxman\n", "\n", "---\n", "\n", "Context:\n", "[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. \"Tae Kwon Do Times\" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»\n", "[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»\n", "[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th \"dan\" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. 
Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including \"Fight to Win\", \"Best of the Best\", \"Bloodsport II\", and \"Bloodsport III\". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both \"Black Belt\" magazine's Hall of Fame and \"Tae Kwon Do Times\" magazine's Hall of Fame.»\n", "[4] «West Coast Magazine | West Coast Magazine (1987–1998). was a three times a year Scottish literary publication consisting of poetry, short fiction, articles, essays and reviews. Founding editors were Gordon Giles, Kenny MacKenzie and Joe Murray. The proof issue appeared in October 1987 and contained some articles and poems that did not appear in official issues. West Coast Magazine (WCM) was initially funded by East Glasgow Gear Project and Glasgow City Council; ultimately funded by the Scottish Arts Council.»\n", "[5] «Southwest Art | Southwest Art is a magazine published by F+W that specializes in fine art depicting artwork of the American Southwest.»\n", "Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?\n", "Tweet Response: Tae Kwon Do Times has published articles by Scott Shaw, along with other notable authors in the martial arts world. #TaeKwonDo #MartialArts #Magazine\n", "\n", "---\n", "\n", "Context:\n", "[1] «Scott Lowell | Scott Lowell (born February 22, 1965 in Denver, Colorado) is an American actor best known for his role as Ted Schmidt on the Showtime drama \"Queer as Folk\".»\n", "[2] «Ted Schmidt | Theodore \"Ted\" Schmidt is a fictional character from the American Showtime television drama series \"Queer as Folk\", played by Scott Lowell. Fellow show cast member Peter Paige, who plays Emmett Honeycutt originally auditioned for the role. Lowell was cast and he stated that he had an instant connection with the character. 
\"Queer as Folk\" is based on the British show of the same name and Ted is loosely based on the character Phil Delaney, played by Jason Merrells. Phil was killed off in that series, whereas show creator Daniel Lipman decided to develop the character into a full-time role for the US version.»\n", "[3] «Chris Lowell | Christopher Lowell (born October 17, 1984) is an American television actor. He played the role of Stosh \"Piz\" Piznarski in the CW noir drama \"Veronica Mars\" and the character William \"Dell\" Parker in the ABC \"Grey's Anatomy\" spin-off \"Private Practice\".»\n", "[4] «Kevin Schmidt | Kevin Gerard Schmidt (born August 16, 1988) is an American actor, known best for his role as Henry in \"Cheaper by the Dozen\" and its sequel and as Noah Newman in \"The Young and the Restless\". Schmidt also starred on Cartoon Network's first live-action scripted television series, \"Unnatural History\". Schmidt also co-created, starred in, produced, and directed a cult web-series, \"Poor Paul\". Schmidt continues to write, direct, and act, and has also participated in humanitarian organizations. Schmidt is president of the Conscious Human Initiative, a non-profit entity that intends to alleviate malnutrition worldwide. He played Ryan in .»\n", "[5] «Frederick Koehler | Frederick Koehler (born June 16, 1975) is an American actor best known for his role as Chip Lowell on \"Kate & Allie\" as well as Andrew Schillinger on the HBO drama \"Oz\". He is distinguished for appearing much younger than his chronological age (e.g., appearing about 20 years old when he was actually 38).»\n", "Question: What show is an American-Canadian drama starring Scott Lowell playing Ted Schmidt?\n", "Tweet Response:\n", "\n", "\n", "OUTPUT:\n", "\n", " Prediction(\n", " tweet_response=' Scott Lowell played Ted Schmidt on the American-Canadian drama \"Queer as Folk\". 
#ScottLowell #TedSchmidt #QueerAsFolk'\n", ")\n" ] } ], "source": [ "# Inspect one (prompt, output) pair from the LangChain call history recorded in dspy.settings.\n", "prompt, output = dspy.settings.langchain_history[-4]\n", "\n", "print('PROMPT:\\n\\n', prompt)\n", "print('\\n\\nOUTPUT:\\n\\n', output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Acknowledgements:\n", "\n", "Thanks to [Harrison Chase](https://twitter.com/hwchase17) for co-leading this new integration. Thanks to our own [Arnav Singhvi](https://arnavsinghvi11.github.io/) for helping cook up this tweet generation task and for the insight about how to get the data used here." ] } ], "metadata": { "kernelspec": { "display_name": "py39_dec_2023", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 2 }