
add llamaindex integration #1170

Merged

Conversation

jerryjliu
Contributor

add integrations + a full cookbook showing llamaindex + dspy:

  1. optimize a query pipeline with dspy
  2. port over a llamaindex prompt to a dspy component
  3. extract dspy component back into llamaindex prompt template

@okhat
Collaborator

okhat commented Jun 19, 2024

Thank you Jerry. We should merge this, but we're failing on Ruff checks, so that's the only thing we gotta resolve.


def _output_keys_from_template(template: dsp.Template) -> InputKeys:
"""Get input keys from template."""
# get only fields that are marked OldInputField and NOT OldOutputField
Collaborator

typo on L53 and L54 - these should likely read "output keys" and "OldOutputField"

Contributor Author

oops yes will get to this soon! (probably later in afternoon/tonight)

self.predict_modules = []
for module in query_pipeline.module_dict.values():
    if isinstance(module, DSPyComponent):
        self.predict_modules.append(module.predict_module)
Collaborator

are these predict_modules used later on?

Contributor Author

not explicitly, but isn't this needed for the optimizer to work? i.e., to explicitly register predict_modules as a field on the module

Collaborator

@arnavsinghvi11 arnavsinghvi11 Jun 19, 2024

since LlamaIndexModule inherits from dspy.Module, these are already registered under predictors, named_predictors which are then used in the optimizers. I tested the notebook locally and it works without the predict_modules (since the codebase doesn't recognize it 😄 ). Let me know if that makes sense and feel free to test it out on your end to confirm the expected behavior without it!

ex: predict_modules (not being used but aligns with what's done in named_predictors) ->

[ChainOfThought(GenerateAnswer(context_str, query_str -> answer
     instructions='Answer questions with short factoid answers.'
     context_str = Field(annotation=str required=True json_schema_extra={'desc': 'contains relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context Str:'})
     query_str = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Query Str:', 'desc': '${query_str}'})
     answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
 ))]

named_predictors (used in optimizers - already registered with dspy.Module)

<bound method Module.named_predictors of predict_modules[0] = ChainOfThought(GenerateAnswer(context_str, query_str -> answer
    instructions='Answer questions with short factoid answers.'
    context_str = Field(annotation=str required=True json_schema_extra={'desc': 'contains relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context Str:'})
    query_str = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Query Str:', 'desc': '${query_str}'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
))>

Contributor Author

tbh i'm not sure how this works, since these are buried in the query_pipeline.module_dict - i took a brief skim at the named_parameters code, but if you're saying that these are autoregistered because it recursively traverses all subfields then sure i trust you :)
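The auto-registration being described can be illustrated with a small, hypothetical sketch (this is not dspy's actual code): a Module base class that recursively walks instance attributes will find predict modules even when they sit inside a list field like `predict_modules`.

```python
# Hypothetical sketch of recursive predictor discovery, mimicking (not
# reproducing) how a dspy-style Module base class could auto-register
# submodules assigned as attributes, including list fields.
class Predict:
    """Stand-in for a dspy predict module."""
    def __init__(self, name):
        self.name = name

class Module:
    def named_predictors(self, prefix=""):
        found = []
        for attr, value in vars(self).items():
            path = f"{prefix}{attr}"
            if isinstance(value, Predict):
                found.append((path, value))
            elif isinstance(value, Module):
                # recurse into nested modules
                found.extend(value.named_predictors(prefix=path + "."))
            elif isinstance(value, (list, tuple)):
                # list fields like predict_modules are traversed too
                for i, item in enumerate(value):
                    if isinstance(item, Predict):
                        found.append((f"{path}[{i}]", item))
        return found

class LlamaIndexModuleSketch(Module):  # hypothetical class name
    def __init__(self):
        self.predict_modules = [Predict("generate_answer")]

names = [name for name, _ in LlamaIndexModuleSketch().named_predictors()]
print(names)  # ['predict_modules[0]']
```

This mirrors the naming in the output above (`predict_modules[0] = ChainOfThought(...)`): the explicit list assignment is what makes the traversal find the predictor, so no extra registration call is needed.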

@arnavsinghvi11
Collaborator

Thanks @jerryjliu for this PR! I patched the ruff fix directly.

left a minor comment about a typo, and also wanted to confirm the PR is ready to merge from your end, since there were some in-line TODOs.

will merge once you're ready!

"outputs": [],
"source": [
"from llama_index.core.query_pipeline import QueryPipeline as QP, InputComponent, FnComponent\n",
"from llamaindex import DSPyComponent, LlamaIndexModule\n",
Collaborator

ran into import errors while running the notebook - should be dspy.predict.llamaindex (or you can add llamaindex imports here and then call dspy.LlamaIndexModule directly!)

Contributor Author

oops that's embarrassing, yes need to migrate imports

"outputs": [],
"source": [
"from llama_index.core.query_pipeline import QueryPipeline as QP, InputComponent, FnComponent\n",
"from llamaindex import DSPyComponent, LlamaIndexModule\n",
Collaborator

just flagging from above

},
"outputs": [],
"source": [
"from llamaindex import DSPyPromptTemplate\n",
Collaborator

just flagging from above

@arnavsinghvi11
Collaborator

arnavsinghvi11 commented Jun 20, 2024

ah so with the updated imports, this is failing the tests since llama_index is not a supported dependency. The llama_index import statements need to be wrapped in a try/except block to avoid triggering the errors (as done here), or moved into the various initialization methods where the imports are needed (as done here).

I can take a look at this soon to resolve this locally as well.

[update - feel free to disregard above. llamaindex needs to be a required dependency to import all its functions but updating poetry.lock within this PR breaks local DSPy tests. integration for libraries to avoid this import patch will be added via comprehensive integration tests in upcoming PRs!]
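The try/except import pattern mentioned above is a common way to keep an optional dependency from breaking import of the whole package. An illustrative sketch (not the exact patch applied in this PR; the class name below is hypothetical):

```python
# Guarded import: the ImportError is deferred until the integration is
# actually instantiated, so `import dspy` works without llama-index.
try:
    from llama_index.core.query_pipeline import QueryPipeline
except ImportError:
    QueryPipeline = None

class LlamaIndexIntegrationSketch:  # hypothetical class name for illustration
    def __init__(self):
        if QueryPipeline is None:
            raise ImportError(
                "llama-index is required for this integration; "
                "install it with `pip install llama-index`."
            )
```

The tradeoff discussed in the update still applies: with this guard the package imports cleanly everywhere, but users only discover the missing dependency at instantiation time rather than at install time.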

@arnavsinghvi11 arnavsinghvi11 force-pushed the jerry/add_llamaindex_integration branch from f576bbf to e6373de Compare June 21, 2024 01:58
@arnavsinghvi11 arnavsinghvi11 merged commit 01c8de0 into stanfordnlp:main Jun 21, 2024
4 checks passed
@arnavsinghvi11
Collaborator

Thanks again @jerryjliu ! Exciting integration!

@jerryjliu
Contributor Author

woot oops sorry just seeing this. @arnavsinghvi11 thanks for landing!!!

(btw feel free to DM me your twitter/linkedin handles, would be happy to promote on socials when this goes out)

3 participants