Merge from main #10

Merged
merged 112 commits into from
Jun 26, 2024
Merged
Changes from 1 commit
Commits
Show all changes
112 commits
Select commit Hold shift + click to select a range
9a8026c feat(tests): add requirements-dev.txt (May 21, 2024)
35dcaed feat(tests): BaseIntegrationTestWithCache (May 21, 2024)
a1c8491 feat(tests): ignore assert in ruff (May 21, 2024)
6f1c120 feat(tests): TestIntroIntegration (May 21, 2024)
b773796 feat(tests): TestIntroIntegration (May 21, 2024)
32b5470 feat(tests): TestIntroIntegration (May 21, 2024)
2e10d67 feat(tests): add integration test to CI. (May 21, 2024)
6df93fc feat(tests): add integration test to CI. (May 21, 2024)
195cd03 feat(tests): add integration test to CI. (May 21, 2024)
965fbf3 feat(tests): add integration test to CI. (May 21, 2024)
ff075fe feat(tests): add integration test to CI. (May 21, 2024)
56b00ad feat(tests): add integration test to CI. (May 21, 2024)
742eb56 feat(tests): add pytest.ini for integration (May 25, 2024)
598f831 feat(tests): add pytest.ini for integration (May 25, 2024)
73ff317 feat(tests): change it to ci way -_- (May 25, 2024)
f6a64e3 feat(tests): change it to ci way -_- (May 25, 2024)
e593154 feat(tests): change it to ci way -_- (May 25, 2024)
35c39ad feat(tests): change it to ci way -_- (May 25, 2024)
065f2d1 feat(tests): change it to ci way -_- (May 25, 2024)
81851d1 feat(tests): change it to ci way -_- (May 25, 2024)
f7fa698 feat(tests): change it to ci way -_- (May 25, 2024)
3b7da8d feat(tests): change it to ci way -_- (May 25, 2024)
11c84bb feat(tests): change it to ci way -_- (May 25, 2024)
d96c0ce feat(tests): change it to ci way -_- (May 25, 2024)
b4f3391 feat(tests): change it to ci way -_- (May 25, 2024)
0098e4e feat(tests): cache file. (May 25, 2024)
8f81e6c feat(tests): cache file. (May 25, 2024)
b2e51bb feat(tests): cache file. (May 25, 2024)
cac66f8 feat(tests): cache file. (May 25, 2024)
a7c229e feat(tests): cache file. (May 25, 2024)
4ad4b9f feat(tests): cache file. (May 25, 2024)
29ec21f feat(tests): add print for start file. (May 25, 2024)
9de0989 feat(tests): fix versions for cache. (May 25, 2024)
abdb933 feat(tests): fix versions for cache. (May 25, 2024)
c5e1e28 feat(tests): fix cache in the ci. (May 25, 2024)
87dd340 feat(tests): fix cache in the ci. (May 25, 2024)
f14c91d feat(tests): fix cache in the ci. (May 25, 2024)
9811198 feat(tests): remove top import. (May 25, 2024)
9c2652d feat(tests): add new pytest-ci.ini (May 25, 2024)
f5573c5 feat(tests): fix run_tests.yml (May 25, 2024)
d6a36f9 feat(tests): fix run_tests.yml (May 25, 2024)
5d994d4 feat(tests): fix run_tests.yml (May 25, 2024)
a6f1395 feat(tests): remove pytest-ci.ini (May 25, 2024)
5f31aa2 feat(tests): split requirements and requirements-dev (May 25, 2024)
53fb2a5 feat(tests): revert cache (May 25, 2024)
9a847f5 feat(tests): revert cache (May 25, 2024)
a11e80e feat(tests): revert pre-commit. (May 25, 2024)
c1aa768 feat(tests): clean up. (May 25, 2024)
f0bd17d feat(tests): show ruff errors. (May 25, 2024)
c1d49c0 feat(tests): revert requirements. (May 25, 2024)
66309ee feat(tests): fix ruff. (May 25, 2024)
1be9a20 feat(tests): fix ruff. (May 25, 2024)
01b671d feat(tests): fix ruff. (May 25, 2024)
26f2ffd feat(tests): fix ruff. (May 25, 2024)
b42e8bf feat(tests): fix ruff. (May 25, 2024)
c9e687e feat(tests): cleanup. (May 25, 2024)
1c65f3c feat(tests): remove second ruff. (May 25, 2024)
e06883c feat(tests): arnav comments on intro integration (May 26, 2024)
beca119 feat(tests): revert requirements.txt (May 27, 2024)
481e0f1 feat(tests): remove base.py (May 27, 2024)
38371f8 MIPRO optimizer updates for paper release (XenonMolecule, Jun 18, 2024)
8e945e7 Update test_mipro_optimizer.py (XenonMolecule, Jun 18, 2024)
4774de2 Ruff fixes (XenonMolecule, Jun 18, 2024)
ea1b328 Switch new MIPRO to MIPROv2 (XenonMolecule, Jun 18, 2024)
34725d0 Update mipro_optimizer.py (XenonMolecule, Jun 18, 2024)
d1975d3 cr (jerryjliu, Jun 18, 2024)
0f85bc3 cr (jerryjliu, Jun 18, 2024)
07d8e1d Update llamaindex.py (arnavsinghvi11, Jun 19, 2024)
2ba0abb Reduced comments/debugging statements from MIPRO dev (XenonMolecule, Jun 19, 2024)
95547f3 cr (jerryjliu, Jun 20, 2024)
70c6f21 feat(dspy): added support for managed identity in AzureOpenAI (utsavtulsyan, Jun 20, 2024)
00c7b65 fix(dspy): remove dummy csv file (utsavtulsyan, Jun 20, 2024)
c7e0116 Update README.md (okhat, Jun 20, 2024)
e6373de avoid llama_index dependency breaking tests (arnavsinghvi11, Jun 21, 2024)
01c8de0 Merge pull request #1170 from jerryjliu/jerry/add_llamaindex_integration (arnavsinghvi11, Jun 21, 2024)
4059328 Added caching to notebooks (XenonMolecule, Jun 21, 2024)
f04d03c Updated notebook docs (XenonMolecule, Jun 21, 2024)
1ee5479 Explained teacher settings better (XenonMolecule, Jun 21, 2024)
015c649 Merge pull request #1169 from stanfordnlp/mipro_v2 (XenonMolecule, Jun 21, 2024)
de44089 Fixed notebook caches (XenonMolecule, Jun 21, 2024)
8909e06 Merge pull request #1185 from stanfordnlp/mipro_v2 (XenonMolecule, Jun 21, 2024)
52ac406 Ensuring that candidate count for Gemini is always 1 (Jun 21, 2024)
bff6d45 Update pyproject.toml (okhat, Jun 21, 2024)
c65445f Update setup.py (okhat, Jun 21, 2024)
e3ba07b Fix linting issues with Ruff (Jun 22, 2024)
eb6db37 Added trust_remote_code=True to hotpot_qa dataset to skip prompt for … (hmoazam, Jun 22, 2024)
46a3c73 update default valset (arnavsinghvi11, Jun 22, 2024)
66f7f5b Merge pull request #1194 from stanfordnlp/miprov2_patch (XenonMolecule, Jun 22, 2024)
d1a0d93 Merge pull request #1048 from stanfordnlp/integration-tests (okhat, Jun 22, 2024)
32b1060 Merge pull request #1192 from hmoazam/add_hf_param (hmoazam, Jun 23, 2024)
6efb0eb Automation to release to pypi, including an intermediate deployment t… (hmoazam, Jun 22, 2024)
2c0305a deleted unused file (hmoazam, Jun 23, 2024)
01e5066 Merge pull request #919 from hmoazam/automate-pypi-deployment (okhat, Jun 23, 2024)
42fb1b1 Bug - Dataset summary generation failing due to log file set to None (shubham-skr, Jun 24, 2024)
21d776f Add initial template (erika-cardenas, Jun 24, 2024)
265a2f9 Merge pull request #1200 from shubham-skr/patch-1 (XenonMolecule, Jun 24, 2024)
4f3e346 Update link to Weaviate recipes (erika-cardenas, Jun 24, 2024)
1a83af9 Add the documentation (erika-cardenas, Jun 24, 2024)
22bd7e9 Add link to Weaviate integration page (erika-cardenas, Jun 24, 2024)
d81402c Update setup.py -- thanks Shukri! (CShorten, Jun 24, 2024)
0293d78 Fix Weaviate recipes link (erika-cardenas, Jun 24, 2024)
8d73d53 Merge pull request #1204 from erika-cardenas/add-weaviate-docs (CShorten, Jun 24, 2024)
d38cbf5 Update broken build for docs (krypticmouse, Jun 24, 2024)
14f1674 update links (krypticmouse, Jun 24, 2024)
7e1af2b Use comments to mark location to replace instead of placeholders (hmoazam, Jun 25, 2024)
2874855 fix spaces (hmoazam, Jun 25, 2024)
3ade209 fix typo (hmoazam, Jun 25, 2024)
189f3e4 Updated to actual dspy-ai package post testing (hmoazam, Jun 25, 2024)
57a69b8 Merge pull request #1205 from hmoazam/fix/pkg-name-and-version (arnavsinghvi11, Jun 25, 2024)
5ceb906 Update README.md (okhat, Jun 25, 2024)
39cda49 Merge pull request #1182 from utsavtulsyan/feature/azureopenai-manage… (arnavsinghvi11, Jun 25, 2024)
c6219ca Merge pull request #1188 from marshmellow77/fix-vertexai-gemini-candi… (arnavsinghvi11, Jun 25, 2024)
feat(tests): TestIntroIntegration
Amir Mehr committed May 21, 2024
commit 6f1c1209dac538f71cb36a1fd06e501fb8e04e94
299 changes: 299 additions & 0 deletions tests_integration/test_intro.py
@@ -0,0 +1,299 @@
from typing import Any

from tests_integration.base import BaseIntegrationTestWithCache


class TestIntroIntegration(BaseIntegrationTestWithCache):
    def test_dspy_workflow(self) -> None:
        dspy = self.setup_dspy()

        dev_example, dev_set, training_set = self.assert_dataset_loading()

        self.assert_basic_qa(dev_example, dspy)

        self.assert_retrieval(dev_example, dspy)

        self.assert_compilation(dev_set, dspy, training_set)

    def assert_compilation(self, devset, dspy, trainset) -> None:
        class GenerateAnswer(dspy.Signature):
            """Answer questions with short factoid answers."""

            context = dspy.InputField(desc="may contain relevant facts")
            question = dspy.InputField()
            answer = dspy.OutputField(desc="often between 1 and 5 words")

        class RAG(dspy.Module):
            def __init__(self, num_passages=3):
                super().__init__()

                self.retrieve = dspy.Retrieve(k=num_passages)
                self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

            def forward(self, question):
                context = self.retrieve(question).passages
                prediction = self.generate_answer(context=context, question=question)
                return dspy.Prediction(context=context, answer=prediction.answer)

        from dspy.teleprompt import BootstrapFewShot

        # Validation logic: check that the predicted answer is correct.
        # Also check that the retrieved context actually contains that answer.
        def validate_context_and_answer(example, pred, trace=None):  # noqa
            answer_em = dspy.evaluate.answer_exact_match(example, pred)
            answer_pm = dspy.evaluate.answer_passage_match(example, pred)
            return answer_em and answer_pm

        # Set up a basic teleprompter, which will compile our RAG program.
        teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
        # Compile the RAG model
        compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

        # Test the compiled RAG model with a question
        my_question = "What castle did David Gregory inherit?"
        pred = compiled_rag(my_question)

        # Assertions to verify the compiled RAG model
        assert f"Question: {my_question}" == "Question: What castle did David Gregory inherit?"
        assert f"Predicted Answer: {pred.answer}" == "Predicted Answer: Kinnairdy Castle"
        assert f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}" == (
            "Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) "
            "was a Scottish physician and inventor. "
            "His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', "
            "'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: \"Gregorio "
            'Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 '
            "t...', "
            "'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October "
            "1708) was a Scottish mathematician and astronomer. "
            "He was professor of mathematics at the University ...']"
        )

        # Verify the compiled model's parameters
        for name, parameter in compiled_rag.named_predictors():
            assert name is not None
            assert parameter.demos[0] is not None

        from dspy.evaluate.evaluate import Evaluate

        # Set up the evaluation function
        evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)
        # Evaluate the compiled RAG program with the exact match metric
        metric = dspy.evaluate.answer_exact_match
        evaluate_on_hotpotqa(compiled_rag, metric=metric)

        def gold_passages_retrieved(example, pred, trace=None):  # noqa
            gold_titles = set(map(dspy.evaluate.normalize_text, example["gold_titles"]))
            found_titles = set(map(dspy.evaluate.normalize_text, [c.split(" | ")[0] for c in pred.context]))
            return gold_titles.issubset(found_titles)

        compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)

        class GenerateSearchQuery(dspy.Signature):
            """Write a simple search query that will help answer a complex question."""

            context = dspy.InputField(desc="may contain relevant facts")
            question = dspy.InputField()
            query = dspy.OutputField()

        from dsp.utils import deduplicate

        class SimplifiedBaleen(dspy.Module):
            def __init__(self, passages_per_hop=3, max_hops=2):
                super().__init__()

                self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
                self.retrieve = dspy.Retrieve(k=passages_per_hop)
                self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
                self.max_hops = max_hops

            def forward(self, question):
                context = []

                for hop in range(self.max_hops):
                    query = self.generate_query[hop](context=context, question=question).query
                    passages = self.retrieve(query).passages
                    context = deduplicate(context + passages)

                pred = self.generate_answer(context=context, question=question)
                return dspy.Prediction(context=context, answer=pred.answer)

        # Test the SimplifiedBaleen model with a question
        my_question = "How many storeys are in the castle that David Gregory inherited?"
        uncompiled_baleen = SimplifiedBaleen()
        pred = uncompiled_baleen(my_question)

        # Assertions to verify the SimplifiedBaleen model
        assert (
            f"Question: {my_question}" == "Question: How many storeys are in the castle that David Gregory inherited?"
        )
        assert f"Predicted Answer: {pred.answer}" == "Predicted Answer: five"
        assert f"Retrieved Contexts (truncated): {[c[:20] + '...' for c in pred.context]}" == (
            "Retrieved Contexts (truncated): ['David Gregory (physi...', 'The Boleyn Inheritan...', 'Gregory of Gaeta "
            "| G...', "
            "'Kinnairdy Castle | K...', 'Kinnaird Head | Kinn...', 'Kinnaird Castle, Bre...']"
        )

        def validate_context_and_answer_and_hops(example, pred, trace=None):
            if not dspy.evaluate.answer_exact_match(example, pred):
                return False
            if not dspy.evaluate.answer_passage_match(example, pred):
                return False

            hops = [example.question] + [outputs.query for *_, outputs in trace if "query" in outputs]

            if max([len(h) for h in hops]) > 100:
                return False
            if any(
                dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))
            ):
                return False

            return True

        teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)
        compiled_baleen = teleprompter.compile(
            SimplifiedBaleen(),
            teacher=SimplifiedBaleen(passages_per_hop=2),
            trainset=trainset,
        )
        uncompiled_baleen_retrieval_score = evaluate_on_hotpotqa(uncompiled_baleen, metric=gold_passages_retrieved)
        compiled_baleen_retrieval_score = evaluate_on_hotpotqa(compiled_baleen, metric=gold_passages_retrieved)

        # Assertions for the retrieval scores
        assert f"## Retrieval Score for RAG: {compiled_rag_retrieval_score}" == "## Retrieval Score for RAG: 26.0"
        assert (
            f"## Retrieval Score for uncompiled Baleen: {uncompiled_baleen_retrieval_score}"
            == "## Retrieval Score for uncompiled Baleen: 36.0"
        )
        assert (
            f"## Retrieval Score for compiled Baleen: {compiled_baleen_retrieval_score}"
            == "## Retrieval Score for compiled Baleen: 60.0"
        )
        assert compiled_baleen("How many storeys are in the castle that David Gregory inherited?") is not None

    def assert_retrieval(self, dev_example, dspy) -> None:
        retrieve = dspy.Retrieve(k=3)
        top_k_passages = retrieve(dev_example.question).passages

        # Assertions to verify the retrieval functionality
        assert retrieve.k == 3
        assert (
            dev_example.question
            == "What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?"
        )
        assert (
            top_k_passages[0]
            == "Restaurant: Impossible | Restaurant: Impossible is an American reality television series, featuring "
            "chef and restaurateur Robert Irvine, that aired on Food Network from 2011 to 2016."
        )
        assert (
            top_k_passages[1]
            == "Jean Joho | Jean Joho is a French-American chef and restaurateur. He is chef/proprietor of Everest in "
            "Chicago (founded in 1986), Paris Club Bistro & Bar and Studio Paris in Chicago, The Eiffel Tower "
            "Restaurant in Las Vegas, and Brasserie JO in Boston."
        )
        assert top_k_passages[2] == (
            "List of Restaurant: Impossible episodes | This is the list of the episodes for the American cooking and "
            'reality television series "Restaurant Impossible", '
            "produced by Food Network. The premise of the series is that within two days and on a budget of $10,000, "
            "celebrity chef Robert Irvine renovates a failing American restaurant with the goal of helping to restore "
            "it to profitability and prominence. "
            "Irvine is assisted by a designer (usually Taniya Nayak, Cheryl Torrenueva, or Lynn Keagan, but sometimes "
            "Vanessa De Leon, Krista Watterworth, Yvette Irene, or Nicole Faccuito), along with general contractor "
            "Tom Bury, who sometimes does double duty as both general contractor and designer. "
            "After assessing the problems with the restaurant, Robert Irvine typically creates a plan for the new "
            "decor, oversees the cleaning of the restaurant, reduces the size of the menu and improves the food, "
            "develops a promotional activity, educates the restaurant's owners, or trains the staff, as needed by "
            "each restaurant."
        )

        retrieved_value = retrieve("When was the first FIFA World Cup held?").passages[0]
        assert retrieved_value == (
            "History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules "
            "Rimet decided to stage an international football tournament. "
            "The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited "
            "by the organization. "
            "Since then, the World Cup has experienced successive expansions and format remodeling to its current "
            "32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around "
            "the world."
        )

    def assert_basic_qa(self, dev_example, dspy) -> None:
        class BasicQA(dspy.Signature):
            """Answer questions with short factoid answers."""

            question = dspy.InputField()
            answer = dspy.OutputField(desc="often between 1 and 5 words")

        # Define the predictor
        generate_answer = dspy.Predict(BasicQA)
        # Call the predictor on a particular input
        pred = generate_answer(question=dev_example.question)

        # Assertions to verify the basic QA functionality
        assert (
            f"Question: {dev_example.question}"
            == "Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?"
        )
        assert f"Predicted Answer: {pred.answer}" == "Predicted Answer: American"

        # Define the predictor with chain of thought
        generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)
        # Call the predictor on the same input
        pred = generate_answer_with_chain_of_thought(question=dev_example.question)

        # Assertions to verify the chain of thought functionality
        assert (
            f"Question: {dev_example.question}"
            == "Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?"
        )
        assert (
            f"Thought: {pred.rationale.split('.', 1)[1].strip()}"
            == "Thought: We know that the chef and restaurateur featured in Restaurant: Impossible is Robert Irvine."
        )
        assert f"Predicted Answer: {pred.answer}" == "Predicted Answer: British"

    def assert_dataset_loading(self) -> tuple[Any, list, list]:
        from dspy.datasets import HotPotQA

        # Load the dataset
        dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

        # Prepare the datasets for training and development
        trainset = [x.with_inputs("question") for x in dataset.train]
        devset = [x.with_inputs("question") for x in dataset.dev]
        train_example = trainset[0]

        # Assertions to verify the dataset loading
        assert (
            f"Question: {train_example.question}"
            == "Question: At My Window was released by which American singer-songwriter?"
        )
        assert f"Answer: {train_example.answer}" == "Answer: John Townes Van Zandt"
        dev_example = devset[18]
        assert (
            f"Question: {dev_example.question}"
            == "Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?"
        )
        assert f"Answer: {dev_example.answer}" == "Answer: English"
        assert "Restaurant: Impossible" in list(dev_example.gold_titles)
        assert "Robert Irvine" in list(dev_example.gold_titles)
        assert (
            f"For this dataset, training examples have input keys {train_example.inputs().keys()} and label keys "
            f"{train_example.labels().keys()}"
            == "For this dataset, training examples have input keys ['question'] and label keys ['answer']"
        )
        assert (
            f"For this dataset, dev examples have input keys {dev_example.inputs().keys()} and label keys "
            f"{dev_example.labels().keys()}"
            == "For this dataset, dev examples have input keys ['question'] and label keys ['answer', 'gold_titles']"
        )
        return dev_example, devset, trainset

    def setup_dspy(self) -> Any:
        import dspy

        turbo = dspy.OpenAI(model="gpt-3.5-turbo")
        colbertv2_wiki17_abstracts = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
        dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)
        return dspy
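The `validate_context_and_answer` metric in this test combines two checks: the predicted answer must exactly match a gold answer, and at least one retrieved passage must contain that answer. The plain-Python sketch below re-implements the idea so it runs without DSPy. It is a simplified stand-in, not DSPy's code: `normalize_text` here assumes SQuAD-style normalization (lowercase, strip punctuation and English articles, collapse whitespace), the passage check is a plain substring test on normalized text (DSPy's `answer_passage_match` may tokenize differently), and the three-argument signature of `validate_context_and_answer` is hypothetical.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_text(text: str) -> str:
    """SQuAD-style normalization (assumed to approximate dspy.evaluate.normalize_text)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_exact_match(gold_answers: list[str], predicted: str) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_text(predicted)
    return any(normalize_text(gold) == pred for gold in gold_answers)


def answer_passage_match(gold_answers: list[str], passages: list[str]) -> bool:
    """True if any normalized gold answer is a substring of any normalized passage."""
    normalized_passages = [normalize_text(p) for p in passages]
    return any(normalize_text(gold) in p for gold in gold_answers for p in normalized_passages)


def validate_context_and_answer(gold_answers: list[str], predicted: str, passages: list[str]) -> bool:
    """Hypothetical signature: the answer is correct AND supported by the retrieved context."""
    return answer_exact_match(gold_answers, predicted) and answer_passage_match(gold_answers, passages)


toy_passages = ["Toy passage | A placeholder passage mentioning Kinnairdy Castle."]
print(validate_context_and_answer(["Kinnairdy Castle"], "Kinnairdy castle.", toy_passages))  # True
print(validate_context_and_answer(["English"], "British", toy_passages))  # False
```

Requiring both checks is what makes the metric useful for bootstrapping: a demonstration is only kept when the answer is right for a reason the retriever actually surfaced.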
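`gold_passages_retrieved` scores retrieval on its own: it takes the title before the first `" | "` in each passage (the `Title | text` layout of the wiki17 abstracts passages asserted in this test) and checks that every gold title was found. The sketch below mirrors that check in plain Python. `normalize_title` is a simplified stand-in for `dspy.evaluate.normalize_text`, and `retrieval_score` is a hypothetical helper that reports the percentage of passing examples, on the assumption that this is how `Evaluate` produces numbers such as `26.0`.

```python
def normalize_title(title: str) -> str:
    # Simplified stand-in for dspy.evaluate.normalize_text: lowercase and collapse whitespace.
    return " ".join(title.lower().split())


def passage_title(passage: str) -> str:
    # Passages are formatted as "Title | body"; keep the part before the first separator.
    return passage.split(" | ")[0]


def gold_passages_retrieved(gold_titles: list[str], passages: list[str]) -> bool:
    gold = {normalize_title(t) for t in gold_titles}
    found = {normalize_title(passage_title(p)) for p in passages}
    return gold.issubset(found)


def retrieval_score(examples: list[tuple[list[str], list[str]]]) -> float:
    # Hypothetical helper: percentage of (gold_titles, passages) pairs that pass, rounded to 2 places.
    hits = sum(gold_passages_retrieved(gold, passages) for gold, passages in examples)
    return round(100 * hits / len(examples), 2)


examples = [
    (
        ["Restaurant: Impossible", "Robert Irvine"],
        ["Restaurant: Impossible | placeholder text", "Robert Irvine | placeholder text"],
    ),
    (
        ["Restaurant: Impossible", "Robert Irvine"],
        ["Restaurant: Impossible | placeholder text", "Jean Joho | placeholder text"],
    ),
]
print(retrieval_score(examples))  # 50.0
```

Because every gold title must appear, a multi-hop question counts as a hit only when all of its supporting passages were retrieved, which is why the multi-hop `SimplifiedBaleen` program can score higher than single-hop `RAG` on this metric.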
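`SimplifiedBaleen.forward` grows its context hop by hop: each hop generates a query from the context so far, retrieves passages for it, and merges them with `deduplicate` so a passage found twice is kept once. The plain-Python sketch below shows that loop with the language model and retriever replaced by hypothetical callables (`generate_query`, `retrieve`). Its `deduplicate` assumes first-occurrence, order-preserving behavior, which matches how `dsp.utils.deduplicate` is used here but has not been checked against that function's source.

```python
from typing import Callable


def deduplicate(items: list[str]) -> list[str]:
    # Keep the first occurrence of each item, preserving order (dicts keep insertion order).
    return list(dict.fromkeys(items))


def multi_hop_context(
    question: str,
    generate_query: Callable[[int, list[str], str], str],
    retrieve: Callable[[str], list[str]],
    max_hops: int = 2,
) -> list[str]:
    """Hypothetical stand-in for SimplifiedBaleen.forward, minus the final answer step."""
    context: list[str] = []
    for hop in range(max_hops):
        query = generate_query(hop, context, question)
        context = deduplicate(context + retrieve(query))
    return context


# Toy components: a fixed query per hop and a lookup-table retriever.
toy_queries = ["who is david gregory", "castle david gregory inherited"]
toy_index = {
    "who is david gregory": ["A | first", "B | second"],
    "castle david gregory inherited": ["B | second", "C | third"],
}
context = multi_hop_context(
    "How many storeys are in the castle that David Gregory inherited?",
    generate_query=lambda hop, ctx, q: toy_queries[hop],
    retrieve=lambda query: toy_index[query],
)
print(context)  # ['A | first', 'B | second', 'C | third']
```

Preserving order matters because the answer generator sees passages in the order they were found, so earlier hops stay at the front of the context.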
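`validate_context_and_answer_and_hops` adds two constraints on the search queries collected from the trace: no hop may be longer than 100 characters, and from index 2 onward (index 0 is the question itself) a query must not be a near-duplicate of any earlier hop. The test delegates the near-duplicate check to `dspy.evaluate.answer_exact_match_str(..., frac=0.8)`. The sketch below assumes that a fractional `frac` means "token-level F1 of at least 0.8" and implements it with a SQuAD-style F1 over lowercased whitespace tokens (no punctuation stripping); `token_f1`, `near_duplicate`, and `hops_are_valid` are hypothetical names.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token F1 over lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def near_duplicate(query: str, previous: list[str], frac: float = 0.8) -> bool:
    # Assumed semantics of answer_exact_match_str(query, previous, frac=0.8).
    return any(token_f1(query, p) >= frac for p in previous)


def hops_are_valid(question: str, queries: list[str], max_len: int = 100, frac: float = 0.8) -> bool:
    hops = [question] + queries
    if max(len(h) for h in hops) > max_len:
        return False
    # Like the test, start at index 2: the first query is not compared against the question.
    if any(near_duplicate(hops[idx], hops[:idx], frac) for idx in range(2, len(hops))):
        return False
    return True


question = "How many storeys are in the castle that David Gregory inherited?"
print(hops_are_valid(question, ["David Gregory inherited castle", "Kinnairdy Castle storeys"]))  # True
print(hops_are_valid(question, ["David Gregory castle", "David Gregory castle"]))  # False
```

Used as a bootstrapping metric, this rejects traces where the model wrote an overlong query or simply repeated an earlier search, so the few-shot demonstrations that survive show genuinely distinct hops.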
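Many expected values in this test are long strings split across adjacent string literals. Python joins adjacent literals with nothing in between, so a fragment that ends a sentence must carry its own trailing space, or the expected text reads `inventor.His` instead of `inventor. His`. The truncated-context assertions compare against the `str()` of a list, so list elements must also be separated by `', '`. The short sketch below demonstrates both points; `truncated_repr` is a hypothetical helper that wraps the expression the test uses, and the passages are placeholders.

```python
def truncated_repr(passages: list[str], n: int) -> str:
    # The expression used in the test: first n characters of each passage plus "...", rendered as a list.
    return f"{[c[:n] + '...' for c in passages]}"


missing_space = (
    "was a Scottish physician and inventor."
    "His surname is sometimes spelt as Gregorie."
)
with_space = (
    "was a Scottish physician and inventor. "
    "His surname is sometimes spelt as Gregorie."
)
print(missing_space)  # was a Scottish physician and inventor.His surname is sometimes spelt as Gregorie.
print(with_space)  # was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie.
print(truncated_repr(["Kinnairdy Castle | Kinnairdy Castle (placeholder)", "Kinnaird Head | placeholder"], 20))
# ['Kinnairdy Castle | K...', 'Kinnaird Head | plac...']
```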