
Input parameter meta in FileTypeRouter #6392

Closed
bilgeyucel opened this issue Nov 23, 2023 · 1 comment · Fixed by #6702
Labels
2.x Related to Haystack v2.0 P2 Medium priority, add to the next sprint if no P1 available type:feature New feature or request

Comments

@bilgeyucel
Contributor

Is your feature request related to a problem? Please describe.
When a preprocessing pipeline starts with FileTypeRouter, which is usually the case when we use multiple converters, it's not possible to provide metadata for the files.

Describe the solution you'd like
An extra meta input parameter that accepts a dictionary and adds this information to the files.
FileTypeRouter would probably pass this information directly to the converters as it routes the files.
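A minimal, framework-free sketch of the requested behaviour might look like this. It assumes the router groups file paths by MIME type (guessed from the extension with the standard-library mimetypes module) and forwards a copy of one meta dictionary with every group, so each converter receives it. route_with_meta and the "unclassified" key are hypothetical names for illustration, not Haystack's FileTypeRouter API.

```python
import mimetypes
from typing import Any, Dict, List, Optional


def route_with_meta(
    sources: List[str],
    mime_types: List[str],
    meta: Optional[Dict[str, Any]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical router: group paths by MIME type and attach metadata to each group.

    Illustrative only; this is not Haystack's FileTypeRouter.
    """
    meta = meta or {}
    routed: Dict[str, Dict[str, Any]] = {}
    for source in sources:
        # guess_type returns (mime, encoding); mime is None for unknown extensions
        mime, _ = mimetypes.guess_type(source)
        key = mime if mime in mime_types else "unclassified"
        # Each group gets its own copy of meta so converters can't mutate a shared dict
        group = routed.setdefault(key, {"sources": [], "meta": dict(meta)})
        group["sources"].append(source)
    return routed


routed = route_with_meta(
    ["a.txt", "b.pdf", "data.unknownext"],
    mime_types=["text/plain", "application/pdf"],
    meta={"project": "demo"},
)
print(routed["text/plain"])
```

Copying the dictionary per group is a design choice for the sketch: a converter that adds its own keys to the metadata then can't leak them into another converter's output.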

Describe alternatives you've considered

  1. Running the converters as standalone components, adding the metadata, and then running the rest of the preprocessing pipeline
  2. Passing the metadata to each converter in .run(). This solution is messy, though.

Additional context
Something to think about: why is the input parameter named "sources" when the component is called FileTypeRouter? Does it make sense to rename it to "files"?

@bilgeyucel bilgeyucel added type:feature New feature or request 2.x Related to Haystack v2.0 labels Nov 23, 2023
@masci masci added the P2 Medium priority, add to the next sprint if no P1 available label Dec 11, 2023
@ZanSara ZanSara self-assigned this Dec 19, 2023
@ZanSara
Contributor

ZanSara commented Dec 20, 2023

After a sync with @silvanocerza and @TuanaCelik, we agreed to update the converters to accept a single metadata dictionary. This way, users can use a Multiplexer to provide this dictionary directly to all converters, which covers this use case.

I will also make an example script to explain the solution.
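The agreed approach, with one metadata dictionary fanned out to every converter, can be sketched without Haystack as follows. Everything here is a hypothetical stand-in: Document, text_converter, markdown_converter and run_converters are illustrative names, the "converters" treat each source string as already-extracted text, and run_converters only mimics what a Multiplexer would do in a real pipeline. This is not Haystack's converter or Multiplexer API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Document:
    # Hypothetical minimal document type, not haystack.Document
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


# A converter takes a list of sources plus ONE metadata dict shared by all of them
Converter = Callable[[List[str], Dict[str, Any]], List[Document]]


def text_converter(sources: List[str], meta: Dict[str, Any]) -> List[Document]:
    # Hypothetical: each source string is already plain text
    return [Document(content=s, meta={**meta, "converter": "text"}) for s in sources]


def markdown_converter(sources: List[str], meta: Dict[str, Any]) -> List[Document]:
    # Hypothetical: strip leading '#' heading markers and spaces
    return [Document(content=s.lstrip("# "), meta={**meta, "converter": "markdown"}) for s in sources]


def run_converters(
    routed: Dict[str, List[str]],
    converters: Dict[str, Converter],
    meta: Dict[str, Any],
) -> List[Document]:
    """Multiplexer-like step: the same meta dict is passed to every converter."""
    documents: List[Document] = []
    for mime, sources in routed.items():
        converter = converters.get(mime)
        if converter is None:
            continue  # no converter registered for this MIME type
        documents.extend(converter(sources, meta))
    return documents


docs = run_converters(
    {"text/plain": ["hello"], "text/markdown": ["# Title"], "application/pdf": ["%PDF-1.4"]},
    {"text/plain": text_converter, "text/markdown": markdown_converter},
    {"source": "upload-42"},
)
for doc in docs:
    print(doc)
```

Because each converter only needs a single dictionary rather than per-file metadata, one upstream value can feed all branches of the pipeline.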

TODO:

3 participants