Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add a new PipelineWrapper component #9

Closed
wants to merge 4 commits into from
Closed

Conversation

masci
Copy link
Contributor

@masci masci commented May 28, 2024

Related Issues

  • fixes #issue-number

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

@coveralls
Copy link

coveralls commented May 28, 2024

Pull Request Test Coverage Report for Build 9367752599

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.764%

Totals Coverage Status
Change from base Build 9352626696: 0.0%
Covered Lines: 612
Relevant Lines: 626

💛 - Coveralls

@mikebellerU
Copy link

Hi @masci -- i tried it out but am confused about how it would work. Here I have an example code -- I create two WrappedPipelines and try to connect them together.

from haystack import Pipeline
from haystack_experimental.components.wrappers.pipeline import PipelineWrapper
from haystack.components.converters import OutputAdapter

p1 = Pipeline()
p1.add_component("adap", OutputAdapter(
    template="Hello {{inp}}", output_type=str))
p2 = Pipeline()
p2.add_component("adap", OutputAdapter(
    template="Goodbye {{inp}}", output_type=str))
p = Pipeline()
p.add_component("pipeline1", PipelineWrapper(p1))
p.add_component("pipeline2", PipelineWrapper(p2))
p.connect("pipeline1.adap:output", "pipeline2.adap:inp")
print(p.run(data={"pipeline1": {"adap:inp": "Paris"}}))

I expect the output to be: {'pipeline2': {'adap': {'output': 'Goodbye Hello Paris'}}} but the actual output is: {'pipeline1': {'adap': {'output': 'Hello Paris'}}}

Is this because the 'connect' function is not working yet?

But, taking a step back, I also am concerned that this whole concept of SuperPipelines being implemented using Components might get very complicated. I don't know how a user will keep track of all the levels of data needed. Trying to reuse a Pipeline will require deep knowledge of its internals, which is kind of "code smell". Thoughts?

@masci
Copy link
Contributor Author

masci commented Jun 4, 2024

@mikebellerU I fixed the wrap/unwrap of the data passed from the wrapping component to the wrapped pipeline, now the output of your snippet should be {'pipeline2': {'adap:output': 'Goodbye Hello Paris'}} as expected.

Trying to reuse a Pipeline will require deep knowledge of its internals, which is kind of "code smell". Thoughts?

The goal of this component is to treat certain pipelines as "blackboxes" that you can invoke like it was a regular component. I don't think you need to necessarily know the internals of the wrapped pipeline to be able to use it, for example from your snippet if you print(p.inputs()) you would get {'pipeline1': {'adap:inp': {'type': typing.Any, 'is_mandatory': True}}}, which is all you need to use the wrapped pipeline: connect an upstream output with type Any to adap:inp and you're good to go.

(Any consideration about how complex is passing data to run() in Haystack in general, that holds true)

@mikebellerU
Copy link

Ah! The patched version works. I think I also understand your point about input / output. I really hadn't used these methods enough or really understood their value until you just explained it now -- so thank you for that! I wonder though whether the PipelineWrapper still isn't quite right. Here is an example where we I have wrapped a pipeline in a pipeline, in a pipeline. The pipeline.inputs() call seems to give the correct answer, but when I try to run the pipeline, something is going wrong:

from haystack import Pipeline
from haystack_experimental.components.wrappers.pipeline import PipelineWrapper
from haystack.components.converters import OutputAdapter


pg = Pipeline()
pg.add_component("adap", OutputAdapter(
    template="Goodbye {{inp1}}", output_type=str))
print("PG IN:", pg.inputs(), "OUT:", pg.outputs())

ph = Pipeline()
ph.add_component("pg", PipelineWrapper(pg))
ph.add_component("adap", OutputAdapter(
    template="Hello {{inp2}}", output_type=str))
ph.connect("pg.adap:output", "adap.inp2")
print("PH IN:", ph.inputs(), "OUT:", ph.outputs())

pp = Pipeline()
pp.add_component("goodbye", PipelineWrapper(ph))
print("PP IN:", pp.inputs(), "OUT:", pp.outputs())

print(pp.run(data={"goodbye.pg:adap:inp1": "world"}))

PP's inputs are described (in the print output) as: PP IN: {'goodbye': {'pg:adap:inp1': {'type': typing.Any, 'is_mandatory': True}}}
Even though I believe I have aligned my inputs with this information, this program produces:

Inputs ['goodbye.pg:adap:inp1'] were not matched to any component inputs, please check your run parameters.
Traceback (most recent call last):
  File "/Users/mike/tmp/haystack-experimental/mystuff.py", line 22, in <module>
    print(pp.run(data={"goodbye.pg:adap:inp1": "world"}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mike/tmp/haystack/.venv/lib/python3.11/site-packages/haystack/core/pipeline/pipeline.py", line 110, in run
    self._validate_input(data)
  File "/Users/mike/tmp/haystack/.venv/lib/python3.11/site-packages/haystack/core/pipeline/base.py", line 638, in _validate_input
    raise ValueError(f"Missing input for component {component_name}: {socket_name}")
ValueError: Missing input for component goodbye: pg:adap:inp1

Thoughts?

@masci
Copy link
Contributor Author

masci commented Jun 12, 2024

After an internal sync we decided not to pursue this component further. The solution doesn't feel quite right and the use cases that would benefit the most (like agentic pipelines) still have unclear requirements.

Currently the problem can be addressed by breaking down pipelines into smaller ones when possible, or considering to have bigger components.

If anyone from the community wants to take this on, this could make a good integration.

@masci masci closed this Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants