Colbert local mode support both as retriever and reranker. #797

Merged on Jun 15, 2024 (32 commits)

Commits
9632e5e
return metadata changes
Athe-kunal Apr 4, 2024
e415f39
Merge branch 'main' of https://github.com/Athe-kunal/dspy
Athe-kunal Apr 4, 2024
a4b3844
add metadata changes
Athe-kunal Apr 4, 2024
321a768
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 5, 2024
6cd1d56
add support for returning metadata and reranking
Athe-kunal Apr 6, 2024
eeafacb
colbert integration
Athe-kunal Apr 8, 2024
1639bd2
colbert local modifications
Athe-kunal Apr 8, 2024
ec062b6
kwargs filtered ids
Athe-kunal Apr 8, 2024
987d923
colbert return
Athe-kunal Apr 8, 2024
9ff5b28
colbert retriever and reranker
Athe-kunal Apr 9, 2024
825a272
colbert retriever error fixes
Athe-kunal Apr 9, 2024
c25e9c4
colbert config changes in __init__
Athe-kunal Apr 10, 2024
ab5b12e
colbert notebook
Athe-kunal Apr 10, 2024
63dd534
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 10, 2024
f6a9293
import errors for colbert
Athe-kunal Apr 10, 2024
197a2c2
improt dspy fixes and linting fixes
Athe-kunal Apr 10, 2024
4698b00
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 13, 2024
81d142f
PR fixes for colbert
Athe-kunal Apr 13, 2024
b73753c
making the linting gods happy
Athe-kunal Apr 13, 2024
0ec1ded
remove unnecessary outputs
Athe-kunal Apr 14, 2024
567d5c4
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 17, 2024
685df2a
colbertv2 docs
Athe-kunal Apr 17, 2024
fa2bc20
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 19, 2024
509b36c
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 20, 2024
34328fd
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 22, 2024
146ec7b
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 26, 2024
f0437e3
Merge branch 'stanfordnlp:main' into main
Athe-kunal Apr 29, 2024
9cb522b
Colbert PR fixes
Athe-kunal Apr 29, 2024
ec4b9b3
linting fixes
Athe-kunal Apr 29, 2024
326ce01
more linting fixes
Athe-kunal Apr 29, 2024
b5913fc
fixing previous cache breaks with separate funcs
Athe-kunal Jun 8, 2024
c60fadc
Merge branch 'main' into main
arnavsinghvi11 Jun 15, 2024
colbert integration
Athe-kunal committed Apr 8, 2024
commit eeafacb27ecf7d9f052b96bd3105c0fcd42f3292
2 changes: 1 addition & 1 deletion dsp/modules/__init__.py
@@ -4,7 +4,7 @@
 from .cache_utils import *
 from .clarifai import *
 from .cohere import *
-from .colbertv2 import ColBERTv2
+from .colbertv2 import ColBERTv2, ColBERTv2Local
 from .databricks import *
 from .google import *
 from .gpt3 import *
41 changes: 39 additions & 2 deletions dsp/modules/colbertv2.py
@@ -1,10 +1,14 @@
 import functools
-from typing import Any, Optional, Union
+from typing import Any, Optional, Union, List

 import requests

+import colbert
+from colbert import Indexer, Searcher
+from colbert.infra import Run, RunConfig, ColBERTConfig
+from colbert.data import Queries, Collection
 from dsp.modules.cache_utils import CacheMemory, NotebookCacheMemory
 from dsp.utils import dotdict
+import os

 # TODO: Ideally, this takes the name of the index and looks up its port.

@@ -74,3 +78,36 @@ def colbertv2_post_request_v2_wrapped(*args, **kwargs):


 colbertv2_post_request = colbertv2_post_request_v2_wrapped
+os.environ['COLBERT_LOAD_TORCH_EXTENSION_VERBOSE'] = "True"

+class ColBERTv2Local:
+    def __init__(self,checkpoint:str='colbert-ir/colbertv2.0'):

+        self.checkpoint = checkpoint


+    def build_index(self,passages:List[str],nranks:int=1,index_name_or_path:str = "Colbert-RM-",nbits:int=2,DOC_MAXLEN:int=300,INDEX_BSIZE:int=256,KMEANS_ITER:int=8,experiment_name:str="Colbert-Experiment"):

+        with Run().context(RunConfig(nranks=nranks, experiment=experiment_name)):
+            config = ColBERTConfig(doc_maxlen=DOC_MAXLEN, nbits=nbits, kmeans_niters=KMEANS_ITER,index_bsize=INDEX_BSIZE)


+            indexer = Indexer(checkpoint=self.checkpoint, config=config)
+            indexer.index(name=index_name_or_path, collection=passages, overwrite=True)

+    def get_index(self,index_name_or_path:str = "Colbert-RM-",experiment_name:str="Colbert-Experiment",passages:List[str] = []):
+        with Run().context(RunConfig(experiment=experiment_name)):
+            searcher = Searcher(index=index_name_or_path, collection=passages)
+        self.searcher = searcher
+        return searcher

+    def get_docs(self,searcher:Searcher,query:str,k:int=7):

+        results = searcher.search(
+            query,
+            #Number of passages to receive
+            k=k)
+            #Passing the filter function of relevant
+            # filter_fn=lambda pids: torch.tensor(
+            # [pid for pid in pids if pid in relevant_ids],dtype=torch.int32).to(device))
+        return results
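
One detail in `get_index` worth a look: `passages: List[str] = []` uses a mutable default. Python evaluates default values once, when the function is defined, so every call that omits the argument shares the same list object. `get_index` only hands the list to `Searcher`, so nothing visibly breaks in this commit; the sketch below just illustrates the pitfall and the usual `None` idiom. The function names are hypothetical and not part of the PR.

```python
from typing import List, Optional


def get_index_shared(passages: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, so every call
    # that omits `passages` appends to the same object.
    passages.append("extra")
    return passages


def get_index_safe(passages: Optional[List[str]] = None) -> List[str]:
    # A fresh list is built on each call, whether or not an argument is given.
    passages = [] if passages is None else list(passages)
    passages.append("extra")
    return passages


first = get_index_shared()
second = get_index_shared()
print(first is second, len(second))  # True 2

print(len(get_index_safe()), len(get_index_safe()))  # 1 1
```

Switching the signature to `Optional[List[str]] = None` would cost one extra line and remove the shared-state risk if the method ever starts modifying its argument.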
1 change: 1 addition & 0 deletions dspy/__init__.py
@@ -18,6 +18,7 @@
 Databricks = dsp.Databricks
 Cohere = dsp.Cohere
 ColBERTv2 = dsp.ColBERTv2
+ColBERTv2Local = dsp.ColBERTv2Local
 Pyserini = dsp.PyseriniRetriever
 Clarifai = dsp.ClarifaiLLM
 Google = dsp.Google
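
A side effect of this export: `dsp/modules/colbertv2.py` now runs `import colbert` at module level, and `dsp/modules/__init__.py` imports that module, so `import dspy` would presumably fail wherever the `colbert` package is not installed. The later commit titled "import errors for colbert" suggests this was revisited. One common way to keep such a dependency optional is to defer the import until the class is actually used. The sketch below shows that pattern in isolation; `optional_import` and `LocalRetrieverSketch` are hypothetical names, not code from this PR.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class LocalRetrieverSketch:
    """Hypothetical stand-in for a class that needs an optional backend."""

    def __init__(self, backend_name: str = "colbert"):
        # The backend is resolved only when an instance is created.
        self.backend = optional_import(backend_name)
        if self.backend is None:
            raise ImportError(
                f"'{backend_name}' is required for local mode; install it first."
            )


# Importing this file never fails; only constructing the class can.
missing = optional_import("surely_not_an_installed_module_xyz")
print(missing is None)
```

With this layout the package stays importable for users who only need the hosted `ColBERTv2` client, and the error message appears at the point where local mode is requested.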
143 changes: 133 additions & 10 deletions rm_test.py
@@ -25,18 +25,141 @@
 # print(retriever(["Software Internet"],by_prob=False,where={"table_name":"capexIndia"}))
 # print("-"*100)
 # print(retriever(["Software Internet","Packaging"],by_prob=False,where={"table_name":"capexIndia"}))
-import dspy
+# import dspy

+# colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
+# dspy.settings.configure(rm=colbertv2_wiki17_abstracts,reranker=colbertv2_wiki17_abstracts)

-colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
-dspy.settings.configure(rm=colbertv2_wiki17_abstracts,reranker=colbertv2_wiki17_abstracts)
+# #Define Retrieve Module
+# retriever = dspy.RetrieveThenRerank(k=3)

-#Define Retrieve Module
-retriever = dspy.RetrieveThenRerank(k=3)
+# query='When was the first FIFA World Cup held?'

-query='When was the first FIFA World Cup held?'
+# # Call the retriever on a particular query.
+# topK_passages = retriever([query])

-# Call the retriever on a particular query.
-topK_passages = retriever([query])
+# for idx, passage in enumerate(topK_passages):
+# print(f'{idx+1}]', passage, '\n')

+import os
+import dspy
+os.environ['COLBERT_LOAD_TORCH_EXTENSION_VERBOSE'] = "True"
+if __name__ == "__main__":
+    passages = [
+        "The quick brown fox jumps over the lazy dog.",
+        "She sells seashells by the seashore.",
+        "I am the master of my fate, I am the captain of my soul.",
+        "To be or not to be, that is the question.",
+        "All's fair in love and war.",
+        "A journey of a thousand miles begins with a single step.",
+        "Two wrongs don't make a right.",
+        "The pen is mightier than the sword.",
+        "Actions speak louder than words.",
+        "Beauty is in the eye of the beholder.",
+        "Practice makes perfect.",
+        "Where there's a will, there's a way.",
+        "When in Rome, do as the Romans do.",
+        "The early bird catches the worm.",
+        "You can't judge a book by its cover.",
+        "A picture is worth a thousand words.",
+        "Honesty is the best policy.",
+        "Don't count your chickens before they hatch.",
+        "Every cloud has a silver lining.",
+        "If at first you don't succeed, try, try again.",
+        "Look before you leap.",
+        "Rome wasn't built in a day.",
+        "The grass is always greener on the other side.",
+        "Absence makes the heart grow fonder.",
+        "Actions speak louder than words.",
+        "Ask and you shall receive.",
+        "Better late than never.",
+        "Don't bite the hand that feeds you.",
+        "Don't put all your eggs in one basket.",
+        "Easy come, easy go.",
+        "Every dog has its day.",
+        "Good things come to those who wait.",
+        "It's a piece of cake.",
+        "It's raining cats and dogs.",
+        "Kill two birds with one stone.",
+        "Let sleeping dogs lie.",
+        "Like father, like son.",
+        "Make hay while the sun shines.",
+        "Necessity is the mother of invention.",
+        "Out of sight, out of mind.",
+        "Patience is a virtue.",
+        "Practice what you preach.",
+        "The best things in life are free.",
+        "The squeaky wheel gets the grease.",
+        "There's no place like home.",
+        "Too many cooks spoil the broth.",
+        "When the going gets tough, the tough get going.",
+        "You reap what you sow.",
+        "A watched pot never boils.",
+        "Actions speak louder than words.",
+        "An apple a day keeps the doctor away.",
+        "Beggars can't be choosers.",
+        "Curiosity killed the cat.",
+        "Don't cry over spilled milk.",
+        "Don't put off until tomorrow what you can do today.",
+        "Every cloud has a silver lining.",
+        "Fortune favors the bold.",
+        "If the shoe fits, wear it.",
+        "It takes two to tango.",
+        "Keep your friends close and your enemies closer.",
+        "Let bygones be bygones.",
+        "No pain, no gain.",
+        "Once bitten, twice shy.",
+        "Practice makes perfect.",
+        "The apple doesn't fall far from the tree.",
+        "The early bird catches the worm.",
+        "The grass is always greener on the other side.",
+        "The more, the merrier.",
+        "There's no such thing as a free lunch.",
+        "To kill two birds with one stone.",
+        "When in Rome, do as the Romans do.",
+        "You can't have your cake and eat it too.",
+        "You can't make an omelet without breaking eggs.",
+        "A friend in need is a friend indeed.",
+        "A penny saved is a penny earned.",
+        "Actions speak louder than words.",
+        "Beauty is in the eye of the beholder.",
+        "Better late than never.",
+        "Don't count your chickens before they hatch.",
+        "Don't put all your eggs in one basket.",
+        "Every cloud has a silver lining.",
+        "If at first you don't succeed, try, try again.",
+        "If you can't beat them, join them.",
+        "Necessity is the mother of invention.",
+        "One man's trash is another man's treasure.",
+        "Practice makes perfect.",
+        "The early bird catches the worm.",
+        "The grass is always greener on the other side.",
+        "There's no place like home.",
+        "Too many cooks spoil the broth.",
+        "When in Rome, do as the Romans do.",
+        "You can't judge a book by its cover.",
+        "You reap what you sow.",
+        "A bird in the hand is worth two in the bush.",
+        "A penny for your thoughts.",
+        "Actions speak louder than words.",
+        "All good things must come to an end.",
+        "Beauty is only skin deep.",
+        "Don't bite the hand that feeds you.",
+        "Don't put off until tomorrow what you can do today.",
+        "Every dog has its day.",
+        "Fortune favors the bold.",
+        "If you want something done right, do it yourself.",
+        "It's better to be safe than sorry.",
+        "Make hay while the sun shines.",
+        "Necessity is the mother of invention.",
+        "Out of sight, out of mind.",
+        "Practice what you preach.",
+        "The best things in life are free.",
+        "The early bird catches the worm."
+    ]

-for idx, passage in enumerate(topK_passages):
-    print(f'{idx+1}]', passage, '\n')
+    col = dspy.ColBERTv2Local()
+    col.build_index(passages=passages)
+    searcher = col.get_index(passages=passages[:10])
+    res = searcher.get_docs(searcher,query="Software",k=5)
+    print(res)
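
Two things stand out in the script's last lines. First, `get_docs` is defined on `ColBERTv2Local` in this commit, and as far as I can tell ColBERT's `Searcher` has no method of that name, so the call would probably need to go through `col` rather than `searcher`. Second, the index is built from the full `passages` list while the searcher is given only `passages[:10]`, so returned passage ids may point past the end of the collection it holds. The sketch below assumes the result shape shown in the ColBERT README, a tuple of three parallel lists (passage ids, ranks, scores), and turns it into readable rows while guarding against out-of-range ids. `format_results` and the sample values are hypothetical and are not real ColBERT output.

```python
from typing import List, Sequence, Tuple

# Assumed result shape: (passage_ids, ranks, scores), three parallel lists.
SearchResult = Tuple[Sequence[int], Sequence[int], Sequence[float]]


def format_results(results: SearchResult, collection: Sequence[str]) -> List[dict]:
    """Pair each returned id with its rank, score, and passage text."""
    rows = []
    for pid, rank, score in zip(*results):
        # Ids outside the collection in hand get no text instead of raising.
        text = collection[pid] if 0 <= pid < len(collection) else None
        rows.append({"pid": pid, "rank": rank, "score": score, "text": text})
    return rows


# Made-up values for illustration only.
fake_results = ([3, 12], [1, 2], [21.5, 18.0])
fake_collection = [f"passage {i}" for i in range(10)]

for row in format_results(fake_results, fake_collection):
    print(row)
```

Passing the same passage list to both `build_index` and `get_index` would avoid the out-of-range case altogether.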