Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create Argilla integration for haystacknodes #25

Open
davidberenstein1957 opened this issue Jul 13, 2023 · 27 comments · Fixed by deepset-ai/haystack#5736
Open

create Argilla integration for haystacknodes #25

davidberenstein1957 opened this issue Jul 13, 2023 · 27 comments · Fixed by deepset-ai/haystack#5736

Comments

@davidberenstein1957
Copy link

I think it would be cool to be able to monitor Haystack pipelines and nodes directly to Argilla.

https://docs.haystack.deepset.ai/docs/prompts_and_llms could be similar to https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html

However, I can also see support for:

@davidberenstein1957
Copy link
Author

  1. Does it make sense from your perspective?
  2. In terms of usage, what should be the prioritization for each one of these supported tasks?

@davidberenstein1957
Copy link
Author

@TuanaCelik, what do you think about this issue?

@TuanaCelik
Copy link
Member

@davidberenstein1957 this to me sounds like an amazing idea. @masci @anakin87 wdyt?

In terms of usage, I think the most interesting one might be PromptNode as you mentioned. But I can see it extending to the rest as well

@anakin87
Copy link
Member

It seems great to me!

I haven't tried Argilla Feedback and Callbacks but I think that by setting up an ArgillaCallbackHandler,
you get some "functional" observability:

  • graphical monitoring of your LLM activity
  • some data to score/annotate if you then want to fine-tune an LLM.

@TuanaCelik
Copy link
Member

@anakin87 would you see that as it's own component, or would you see it as a functionality of a component.
@davidberenstein1957 are you interested in contributing such an integration. Having a package that Haystack users can install and use alongside Haystack to log to Argilla would be amazing

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Jul 19, 2023

I am interested in working on this. I am not sure if a separate package makes sense or we should install it through extras? But let me know what makes sense for your part of the codebase.

@TuanaCelik
Copy link
Member

Hey @davidberenstein1957 - we're going to do some investigation with @anakin87 on Monday to get a better understanding how/if the source code should change, or whether this would be possible to implement as a lone standing integration. We will comment back here with our findings 👍

@davidberenstein1957
Copy link
Author

No problem👍

@TuanaCelik
Copy link
Member

Hey @davidberenstein1957 we've done some thinking and we see 2 options for you. And it's largely your choice of what type of integration you would like to build:

  1. The Agent in Haystack aldready has callback functionality. Similar to what you did with LangChain, you could make use of this callback and implement an ArgillaCallbackHandler, similar to what Stefano suggested. If you decide to do this, and find out that our callback functionality in Agent isn't enough, we'd be happy to extend it.

  2. Since the rest of our nodes such as PromptNode doesn't have callbacks, you could opt to create a custom node that can go before and after a node of which you would like to record input and output to. You could design this node(s) as something I could add into my pipeline, before and after a node, and it records the data to Argilla.

If you decide to go with option 2, @anakin87 had the idea that it could even be a custom node that can wrap another node like the PromptNode

We're currently desigining Haystack v2, whatever option you prefer to go for, we would be happy to update you if there's any major change that would make your integration better. E.g., if we do end up implementing callbacks for PromptNode etc.

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Aug 12, 2023

Hi @TuanaCelik @anakin87 ,

I think starting off with the callback functionality would be best in terms of effort vs gain. However, I think an ArgillaNode encompassing a lot of additional tasks would be a more robust solution for the future but we work on this, whenever the first integration proves successful.

What do you think?

@anakin87
Copy link
Member

Thanks, @davidberenstein1957!
That sounds like a good idea to me...

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Aug 14, 2023

@anakin87 Great! So, from your perspective, should we add the ArgillaCallbackHandler as an integration or as an addition to the main repo? Within LangChain the integration is managed in the main repository but I am not sure what you prefer.

@anakin87
Copy link
Member

Sorry! I missed this one...
We would like to have ArgillaCallbackHandler as an integration.

@TuanaCelik
Copy link
Member

@anakin87 - correct me if I'm wrong, but I think currently this would be a good integration and if @davidberenstein1957 is interested in it, we can involve you in Haystack v2 discussions when we start looking into evaluation and observability. You can see our current status on the Discussions page.

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Aug 21, 2023

@TuanaCelik @anakin87, I noticed that you do not directly allow for customization of the callback_manager.

For me, it would make sense to change the haystack/agents/base.py.

And introduces a base CallbackManager class and allow for passing a custom implement of this class to the Agent.init().

from abc import ABC, abstractmethod

class BaseCallbackManager(ABC):
    @abstractmethod
    def abstract_func(self, *args, **kwargs):
        pass

class CallbackManager(BaseCallbackManager):
    def __init__(self, agent_color: Color = Color.GREEN) -> None:
        self.agent_color = agent_color

    def on_tool_finish(
        self,
        tool_output: str,
        color: Optional[Color] = None,
        observation_prefix: Optional[str] = None,
        llm_prefix: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        print_text(observation_prefix)  # type: ignore
        print_text(tool_output, color=color)
        print_text(f"\n{llm_prefix}")

    def on_new_token(self, token: str, **kwargs: Any) -> None:
        print_text(token, color=self.agent_color)

    def on_agent_start(self, **kwargs: Any) -> None:
        agent_name = kwargs.pop("name", "react")
        print_text(f"\nAgent {agent_name} started with {kwargs}\n", color=self.agent_color)

    def on_agent_step(self, agent_step, **kwargs: Any) -> None:
        print_text(agent_step.prompt_node_response, end="\n", color=self.agent_color)

I, and others, would then be able to do:

class ArgillaCallbackManager(CallbackManager):
    def __init__(self, *args, **kwargs):
        pass

argilla_callbackmanager = ArgillaCallbackManager()

prompt_node = ...
agent = Agent(
    prompt_node=...,
    callback_manager=argilla_callbackmanager
)

Let me know if I can pick this up in a PR to the core package.

@davidberenstein1957
Copy link
Author

I also noticed that callback_manager.on_agent_finish() is being used within Agents but that it is not being defined as a callable. Not sure what the intended behavior would have been for this but it takes agent_step as a param.

@anakin87
Copy link
Member

@davidberenstein1957 your observations make sense to me and I would say that introducing a base CallbackManager class is reasonable.
@masci @vblagoje any thoughts on this?

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Aug 22, 2023

@masci @vblagoje

For me, it makes sense to also add a couple of additional callback points and use the same callback manager for both agents and tools from the ToolManager,i.e., also include on_tool_* methods to the CallbackManager. Not sure how fixed you are on the usage of Events but perhaps we can move CallbackManager to make the function usage more specific and maintainable. Also, this would allow any user to fully customize their Callbacks and re-use base logic where needed.

@davidberenstein1957
Copy link
Author

@masci @vblagoje reminder

@vblagoje
Copy link
Member

vblagoje commented Sep 4, 2023

@davidberenstein1957 apologies for the late response; I never got these notifications you sent. I understand your intentions regarding the custom Argilla callback manager. In striving for simplicity and prevention of a class explosion, we used the Events class from events Python library for agent callbacks. Is there a way you can adapt to this approach for the 1.x branch, and then for 2.x, we can talk more about trade-offs of a more explicit callback mechanism and which one to use? Let us know your thoughts.

@davidberenstein1957
Copy link
Author

davidberenstein1957 commented Sep 6, 2023

@vblagoje I think this normally would be a problem but the hardcoded event-call are still limited. At the moment it is not possible to get the final answer from the Events because there is no on_agent_finish(final step) method hardcoded in the Agent class so it remains difficult to add monitoring options similar to the LangChain integration.

In order to do the 1.x integration, we would need to mimic the methods that have been defined here, which I can handle in a PR if you want to.

I propose altering this line https://github.com/deepset-ai/haystack/blob/d048bb53523b11426d3c39f7cdc0aa8d15f0cf67/haystack/agents/base.py#L362

@vblagoje
Copy link
Member

vblagoje commented Sep 6, 2023

That seems reasonable @davidberenstein1957 Would you please follow up on this idea @anakin87 ?

@davidberenstein1957
Copy link
Author

@anakin87 I would like to handle the PR. Do you think it makes sense to also include other steps outlined by langchain or to just include the on_agent_final_answer?

@anakin87
Copy link
Member

anakin87 commented Sep 7, 2023

@davidberenstein1957 As this part will be rethought and redesigned in Haystack 2.0, I would say open a PR and include the minimum set of events needed for the Argilla implementation.
(thanks for your patience 💟)

@davidberenstein1957
Copy link
Author

@anakin87 I outlined the PR. Not sure if it requires doc changes or tests?
https://github.com/deepset-ai/haystack/pull/5736/files

@davidberenstein1957
Copy link
Author

@anakin87 @TuanaCelik , do you have any pointers w.r.t. this PR? We would love to log some more info into Argilla but maybe the Agent implementation is still limited?

We hope to align further with components/node and pipeline/chain monitoring functionality in a follow-up PR :)

@sdiazlor
Copy link

Hi, @anakin87 @TuanaCelik, finally we decided to add the integration for Haystack 1.x with Argilla. But we're excited about the Haystack 2.0 possibilities and we plan a refactoring when agents or callbacks become available (I think that it's planned for the upcoming months). So, we are looking forward to the new updates! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants