Multimodal transcribers (v2) #5366

Open
3 tasks
Tracked by #5265
ZanSara opened this issue Jul 14, 2023 · 0 comments
Labels
2.x Related to Haystack v2.0 epic P3 Low priority, leave it in the backlog

Comments

Contributor

ZanSara commented Jul 14, 2023

Multimodal transcribers convert image, audio, and video documents into text documents.

The main open question is which input and output types these components should use so that they work in both indexing and query scenarios.

  • input path, output document: works well for indexing, but is clumsy for queries (the document has to be converted back to a string)
  • input document, output document: same trade-off as above
  • input path, output string: works for queries, but not for indexing (metadata, such as Whisper timestamps, is likely to be lost)

Currently, the WhisperTranscribers for v2 follow the path --> document pattern, which makes them awkward to use in query pipelines.
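
To make the trade-off concrete, here is a rough sketch of the path --> document and path --> string patterns side by side. Everything in it is an assumption made for illustration: `Document` is a hypothetical stand-in rather than the Haystack class, `fake_speech_to_text` returns canned output instead of calling Whisper, and the function names are made up.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict


@dataclass
class Document:
    """Hypothetical stand-in for a text document (not the Haystack class)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def fake_speech_to_text(path: Path) -> Dict[str, Any]:
    """Placeholder for a real speech-to-text call; returns canned output."""
    return {
        "text": f"transcript of {path.name}",
        "segments": [{"start": 0.0, "end": 1.5}],
    }


def transcribe_to_document(path: Path) -> Document:
    """path --> document: the timestamps survive in the metadata."""
    result = fake_speech_to_text(path)
    return Document(
        content=result["text"],
        meta={"source": str(path), "segments": result["segments"]},
    )


def transcribe_to_string(path: Path) -> str:
    """path --> string: only the text survives, the timestamps are dropped."""
    return fake_speech_to_text(path)["text"]


# Indexing: the document, with its metadata, is what gets stored.
doc = transcribe_to_document(Path("interview.wav"))

# Query: with path --> document, the caller has to unwrap the text again.
query = transcribe_to_document(Path("question.wav")).content

# Query: with path --> string, the result can be used directly.
query_direct = transcribe_to_string(Path("question.wav"))
```

In this sketch both query variants produce the same string; the only difference is the extra unwrapping step, which in a real pipeline would probably need its own adapter component between the transcriber and the retriever.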

Once we decide on a strategy, all transcribers should work similarly:

Tasks

Existing work:

ZanSara changed the title from "Multi modal transcribers (v2)" to "Multimodal transcribers (v2)" Jul 14, 2023
ZanSara added the "2.x" (Related to Haystack v2.0) label Aug 25, 2023
Timoeller added the "P3" (Low priority, leave it in the backlog) label Sep 29, 2023
masci added the "epic" label Oct 2, 2023
3 participants