Component names should be parsed into component serialization methods during pipeline serialization #7763

Closed
LastRemote opened this issue May 30, 2024 · 2 comments
Labels
2.x Related to Haystack v2.0

Comments

@LastRemote
Contributor

LastRemote commented May 30, 2024

Is your feature request related to a problem? Please describe.

Hello, I was experimenting with pipeline serialization in one of my projects that includes multiple prompt builders and LLM calls. I decided to save the prompt templates in separate Jinja files alongside the main pipeline YAML during serialization, so that templates can be versioned without making significant changes to the pipeline YAML. However, I realized that a component has no way to identify its name in the pipeline while it is being serialized. In my case, I have to generate random UUIDs for the template file names, which hurts readability and also produces a new set of files every time serialization runs during testing. I believe there should be a way for components to obtain their names in the pipeline.

Expected behavior (for this particular problem):

├── pipeline.yaml
├── prompt_templates
│   ├── my_first_prompt_builder.jinja2 # These should be identical to the component name for PromptBuilder or something like component.template_1/2/3.jinja for ChatPromptBuilder
│   ├── my_second_prompt_builder.jinja2

Describe the solution you'd like

I haven't come up with a perfect solution yet. My initial thought is to introduce an optional parameter, like component_name, in the component's to_dict() method. However, this would be a breaking change as all components would need to update their method definitions even if they do not use the component name. A less invasive but somewhat hacky approach is to update the behavior of haystack.core.serialization.component_to_dict(obj), so it either checks the method definition of component.to_dict() or simply calls another method to achieve the desired functionality. Both options seem a bit too hacky to me, and I am not sure if there is a better solution.
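The second option above (inspecting the signature of to_dict()) could look roughly like the sketch below. It is only an illustration: to_dict_with_optional_name, LegacyComponent and NameAwareComponent are hypothetical names, and the helper stands in for component_to_dict rather than reproducing Haystack's actual implementation.

```python
import inspect
from typing import Any, Dict


def to_dict_with_optional_name(obj: Any, name: str) -> Dict[str, Any]:
    """Call obj.to_dict(), passing component_name only if the method accepts it."""
    params = inspect.signature(obj.to_dict).parameters
    accepts_name = "component_name" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_name:
        return obj.to_dict(component_name=name)
    # Existing components keep their current signature and are called unchanged.
    return obj.to_dict()


class LegacyComponent:
    """Hypothetical component that ignores its pipeline name."""

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "LegacyComponent", "init_parameters": {}}


class NameAwareComponent:
    """Hypothetical component that derives a template file name from its pipeline name."""

    def to_dict(self, component_name: str = "") -> Dict[str, Any]:
        return {
            "type": "NameAwareComponent",
            "init_parameters": {"template_file": f"{component_name}.jinja2"},
        }


legacy = to_dict_with_optional_name(LegacyComponent(), "retriever")
aware = to_dict_with_optional_name(NameAwareComponent(), "my_first_prompt_builder")
```

With this shape, components that do not declare component_name would not need to change, which avoids the breaking change of the first option at the cost of signature introspection.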

Describe alternatives you've considered

I need to customize my PromptBuilder components anyway, so I think I could enforce a name parameter when initializing the instance. This is not ideal but still doable.
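A minimal sketch of this alternative, assuming a hand-written component that receives its name at construction time. NamedPromptBuilder, template_dir and template_file are hypothetical names, and the class is a plain Python stand-in rather than a real Haystack component.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


class NamedPromptBuilder:
    """Hypothetical stand-in for a customized prompt builder; not a Haystack class."""

    def __init__(self, name: str, template: str, template_dir: str = "prompt_templates"):
        self.name = name
        self.template = template
        self.template_dir = template_dir

    def to_dict(self) -> Dict[str, Any]:
        # Write the template to <template_dir>/<name>.jinja2 and store only its path.
        directory = Path(self.template_dir)
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"{self.name}.jinja2"
        path.write_text(self.template, encoding="utf-8")
        return {
            "type": "NamedPromptBuilder",
            "init_parameters": {
                "name": self.name,
                "template_file": str(path),
                "template_dir": self.template_dir,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NamedPromptBuilder":
        params = data["init_parameters"]
        template = Path(params["template_file"]).read_text(encoding="utf-8")
        return cls(name=params["name"], template=template, template_dir=params["template_dir"])


# Round trip in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    builder = NamedPromptBuilder("my_first_prompt_builder", "Hello {{ name }}!", template_dir=tmp)
    data = builder.to_dict()
    restored = NamedPromptBuilder.from_dict(data)
```

The drawback is that the name has to be given twice (once to the component, once when adding it to the pipeline), and nothing keeps the two in sync.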

edited: fix grammar and improve overall writing

@shadeMe shadeMe added the 2.x Related to Haystack v2.0 label Jun 25, 2024
@shadeMe
Collaborator

shadeMe commented Jun 25, 2024

Unless I'm misunderstanding your problem, you can use a DeserializationCallback to achieve your goals: the pre-init callback receives the name of the component being deserialized.
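A rough sketch of what such a pre-init callback might look like. The argument order (component name, component class, init parameters) and the in-place mutation of the init parameters are assumptions about the callback contract, and make_template_loader is a hypothetical helper. The sketch calls the callback directly instead of going through Haystack, and it only covers the loading side.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Type


def make_template_loader(template_dir: str) -> Callable[[str, Type, Dict[str, Any]], None]:
    """Build a callback that fills in `template` from <template_dir>/<component name>.jinja2."""

    def load_template(component_name: str, component_cls: Type, init_params: Dict[str, Any]) -> None:
        path = Path(template_dir) / f"{component_name}.jinja2"
        if path.exists():
            # Mutate init_params in place so the component is built with the file's content.
            init_params["template"] = path.read_text(encoding="utf-8")

    return load_template


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_first_prompt_builder.jinja2").write_text("Answer: {{ query }}", encoding="utf-8")
    callback = make_template_loader(tmp)

    params: Dict[str, Any] = {}
    callback("my_first_prompt_builder", object, params)

    # A component without a matching file keeps its inline parameters.
    untouched: Dict[str, Any] = {"template": "inline"}
    callback("other_component", object, untouched)
```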

@shadeMe shadeMe closed this as completed Jun 25, 2024
@LastRemote
Contributor Author

LastRemote commented Jul 10, 2024

Hello @shadeMe, thanks for your reply and sorry for the delay.

My goal is to create a customized MyChatPromptBuilder that saves the prompts as Jinja files when the pipeline is serialized, instead of saving the entire strings in pipeline.yaml. The motivation is that the prompts are likely to change during the testing phase, and in my current use case they are getting long and include non-ASCII characters.

My ideal pattern for saving these prompts would be something like {component-name-in-the-pipeline}.{system-prompt/user-prompt}.jinja2. However, I am unable to retrieve the component name in the pipeline when the pipeline gets serialized:

    for name, instance in self.graph.nodes(data="instance"):  # type:ignore
        components[name] = component_to_dict(instance)

I am wondering if there is an elegant way to get this piece of information. Thank you!
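One possible workaround, sketched here under assumptions, is to post-process the serialized pipeline dictionary instead of the components themselves, since that dictionary is keyed by component name. The sketch assumes a "components" mapping whose entries carry "init_parameters" with a string "template". externalize_templates and the template_file key are hypothetical, and a custom component or loader would still have to resolve template_file when the pipeline is loaded.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


def externalize_templates(pipeline_dict: Dict[str, Any], template_dir: str) -> Dict[str, Any]:
    """Move each component's `template` string into <template_dir>/<component name>.jinja2."""
    directory = Path(template_dir)
    directory.mkdir(parents=True, exist_ok=True)
    for name, component in pipeline_dict.get("components", {}).items():
        params = component.get("init_parameters", {})
        template = params.get("template")
        if isinstance(template, str):
            path = directory / f"{name}.jinja2"
            path.write_text(template, encoding="utf-8")
            # Hypothetical key: replaces the inline string with a file reference.
            params["template_file"] = path.name
            del params["template"]
    return pipeline_dict


# Hand-written stand-in for a serialized pipeline; not produced by Haystack here.
sample = {
    "components": {
        "my_first_prompt_builder": {
            "type": "PromptBuilder",
            "init_parameters": {"template": "Question: {{ query }}"},
        },
        "llm": {"type": "SomeGenerator", "init_parameters": {"model": "some-model"}},
    },
    "connections": [],
}

with tempfile.TemporaryDirectory() as tmp:
    result = externalize_templates(sample, tmp)
    written = (Path(tmp) / "my_first_prompt_builder.jinja2").read_text(encoding="utf-8")
```

Because the file name comes from the dictionary key, repeated serialization overwrites the same file instead of creating new UUID-named ones.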
