Multimodal transcribers convert image, audio, and video documents into text documents.
The main open question about these components is what input they should accept so that they work in both indexing and query scenarios.
input path, output document: Works well for indexing, clumsy for query (the document needs to be converted back to a string)
input document, output document: Same trade-off as input path, output document
input path, output string: Works for query, doesn't work for indexing (metadata is likely lost, for example Whisper timestamps)
Currently, the WhisperTranscribers for v2 follow the path --> document pattern, which makes them awkward to use in query pipelines.
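To make the trade-off concrete, here is a minimal sketch of the path --> document pattern and the extra conversion step it forces on a query pipeline. Everything in it is hypothetical: `Document`, `PathToDocumentTranscriber`, and `documents_to_query` are illustrative stand-ins, not the actual v2 classes, and the transcription itself is faked with a stub.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a text document that carries metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class PathToDocumentTranscriber:
    """Hypothetical transcriber following the path --> document pattern."""

    def run(self, paths: List[Path]) -> List[Document]:
        # A real component would call a model here; this stub fakes the
        # transcript but keeps metadata (source, timestamps) on the document.
        return [
            Document(
                content=f"transcript of {path.name}",
                meta={"source": str(path), "timestamps": [(0.0, 1.5)]},
            )
            for path in paths
        ]


def documents_to_query(documents: List[Document]) -> str:
    """Extra step a query pipeline needs: flatten documents back to a string."""
    return " ".join(doc.content for doc in documents)


transcriber = PathToDocumentTranscriber()

# Indexing: the documents can be stored as they are, metadata included.
docs = transcriber.run([Path("question.wav")])

# Query: the same output has to be converted back into a plain string,
# and the metadata is dropped along the way.
query = documents_to_query(docs)
```

A path --> string transcriber would remove the `documents_to_query` step for queries, but it would have nowhere to put the `meta` dictionary that indexing relies on.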
Once we decide on a strategy, all transcribers should work similarly:
Tasks
Existing work:
LocalWhisperTranscriber (v2) #4909
RemoteWhisperTranscriber (v2) #4910