Feature: add pgvector retriever #465

Merged: 8 commits merged on Feb 27, 2024

poppingtonic (Contributor)

Implements a retriever that (as the name suggests) uses pgvector to retrieve passages,
using a raw SQL query and a PostgreSQL connection managed by psycopg2.

The pgvector extension must be registered with the psycopg2 connection.

Returns a list of dspy.Example objects.

Args:
    db_url (str): A PostgreSQL database URL in psycopg2's DSN format
    pg_table_name (Optional[str]): name of the table containing passages
    openai_client (openai.OpenAI): OpenAI client to use for computing query embeddings
    k (Optional[int]): Default number of top passages to retrieve. Defaults to 20.
    embedding_field (str = "embedding"): Field containing passage embeddings. Defaults to "embedding".
    fields (List[str] = ['text']): Fields to retrieve from the table. Defaults to ['text'].
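
The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names (`build_similarity_query`, `to_vector_literal`, `retrieve`) are hypothetical, and ordering by pgvector's cosine-distance operator `<=>` is an assumption about how passages are ranked. `register_vector` comes from the `pgvector` Python package and is imported lazily so the query-building helpers can be tried without a database.

```python
import re
from typing import Dict, List, Sequence

# Plain SQL identifiers only; table and column names cannot be bound as
# query parameters, so they are validated before being interpolated.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name: str) -> str:
    """Return name unchanged, or raise if it is not a plain identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_similarity_query(
    table: str, fields: Sequence[str], embedding_field: str = "embedding"
) -> str:
    """Build a nearest-neighbour query (assumed: cosine distance via <=>)."""
    cols = ", ".join(_check_ident(f) for f in fields)
    return (
        f"SELECT {cols} FROM {_check_ident(table)} "
        f"ORDER BY {_check_ident(embedding_field)} <=> %s::vector LIMIT %s"
    )


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def retrieve(
    conn,
    query_embedding: Sequence[float],
    table: str,
    fields: Sequence[str],
    k: int = 20,
    embedding_field: str = "embedding",
) -> List[Dict[str, object]]:
    """Run the query on an open psycopg2 connection and return row dicts."""
    # Lazy import: needs the pgvector package and the extension in the database.
    from pgvector.psycopg2 import register_vector

    register_vector(conn)
    sql = build_similarity_query(table, fields, embedding_field)
    with conn.cursor() as cur:
        cur.execute(sql, (to_vector_literal(query_embedding), k))
        rows = cur.fetchall()
    return [dict(zip(fields, row)) for row in rows]
```

In this sketch the query embedding would come from the OpenAI client, and each returned dict would then be wrapped in a dspy.Example; both steps are left out to keep the example free of network calls.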

Examples:
    Below is a code snippet that shows how to use PgVector as the default retriever

    ```python
    import os

    import dspy
    import openai
    # Assumed module path for the retriever added in this PR.
    from dspy.retrieve.pgvector_rm import PgVectorRM

    openai_client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    llm = dspy.OpenAI(model="gpt-3.5-turbo")

    # DATABASE_URL should be in the format postgresql://user:password@host/database
    db_url = os.getenv("DATABASE_URL")

    # Positional arguments (db_url, table name) must precede keyword arguments.
    retriever_model = PgVectorRM(db_url, "paragraphs", openai_client=openai_client, fields=["text", "document_id"], k=20)
    dspy.settings.configure(lm=llm, rm=retriever_model)
    ```

    Below is a code snippet that shows how to set up PgVector in a module so that its forward() function can call self.retrieve:
    ```python
    self.retrieve = PgVectorRM(db_url, "paragraphs", openai_client=openai_client, fields=["text", "document_id"], k=20)
    ```

insop (Contributor) commented Feb 26, 2024

It would be great if you could update this document.

https://github.com/stanfordnlp/dspy/blob/main/docs/retrieval_models_client.md

poppingtonic (Contributor, Author) commented Feb 26, 2024

Hi @insop, I'll write this. Thank you!

okhat merged commit 1cc15d7 into stanfordnlp:main on Feb 27, 2024
1 check passed