Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Create TransformersTextRouter based on HF TextClassification pipeline #7753

Closed
sjrl opened this issue May 28, 2024 · 5 comments · Fixed by #7801
Closed

feat: Create TransformersTextRouter based on HF TextClassification pipeline #7753

sjrl opened this issue May 28, 2024 · 5 comments · Fixed by #7801
Assignees
Labels
type:feature New feature or request

Comments

@sjrl
Copy link
Contributor

sjrl commented May 28, 2024

Is your feature request related to a problem? Please describe.
Recently we half ported over this component from v1 with this PR. By this I mean previously the TransformerQueryClassifier worked both the the ZeroShotClassification pipeline from HF and the TextClassification pipeline from HF.

However, to follow the philosophy of Haystack v2 we decided to split up this old component into two new ones, the TransformersZeroShotTextRouter (based on the ZeroShotClassification pipeline) and the yet to be made TransformersTextRouter (based on the TextClassification pipeline).

Describe the solution you'd like
A new Router called TransformersTextRouter based on the TextClassification pipeline.

Additional context
This would allow a query routing based on text classification which the Sol team actively uses to build pipelines in Haystack V1 for clients (e.g. prompt injection classification). Therefore, to ease the transition to v2 for clients it would be great to add this component.

@mrm1001
Copy link

mrm1001 commented May 28, 2024

Hi, there is some demand for a DocumentClassifier component (#7669). So instead of having a TransformersTextRouter, another approach with a slightly longer pipeline would be having a DocumentClassifier connected to a ConditionalRouter. Did I understand correctly?

@sjrl
Copy link
Contributor Author

sjrl commented May 28, 2024

Not quite, since the goal of the DocumentClassifier is to add classification labels to Documents in their metadata. And then index the Documents with that metadata into to the DocStore to then be used later by a query pipeline (e.g. through filters).

While technically it would be possible to use the DocumentClassifier with the ConditionalRouter as you outlined you would also have to convert the query into a Document (so some DocumentBuilder component), then pass that to the DocumentClassifier and then the ConditionalRouter, which is somewhat clunky.

I think it makes sense to have a component that can directly work on Text to perform this routing so we can skip the Document intermediate. I think this is similar reasoning as to why we have separate TextEmbedder and DocumentEmbedder components that use the same underlying models.

What do you think?

@mrm1001
Copy link

mrm1001 commented May 28, 2024

Makes sense! Thanks @sjrl

@masci masci added the type:feature New feature or request label May 31, 2024
@julian-risch
Copy link
Member

Adding a new component TransformersTextRouter based on the TextClassification pipeline is a good idea! 👍
@sjrl We won't have capacity to work on it in the next sprint. Would you like to add the component in a PR? Sounds like you already have a clear understanding of what the implementation should look like and you could use the TransformersZeroShotTextRouter as a blue print.

@sjrl
Copy link
Contributor Author

sjrl commented May 31, 2024

No worries, this isn't something we need urgently, but within the next quarter. I'd be happy to open PR once I have some free time to work on this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants