create ready-made pipelines #5992

Closed
Timoeller opened this issue Oct 6, 2023 · 2 comments · Fixed by #6424 or #7001
Labels: 2.x (Related to Haystack v2.0), P1 (High priority, add to the next sprint), type:documentation (Improvements on the docs), type:feature (New feature or request)

Comments

@Timoeller
Contributor

Timoeller commented Oct 6, 2023

Ready made pipelines

Similar to https://github.com/deepset-ai/haystack/blob/main/haystack/pipelines/standard_pipelines.py, we want to have predefined pipelines in Haystack 2.0. We start with:

RAG pipeline

We want a simple RAG pipeline: one retriever plus one generator. The embedding model is optional; if it is None, BM25 retrieval is used instead.

  1. Construction params: Embedding model, prompt="default_rag_prompt", generation model.
  2. Run params: query
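To make the optional embedding model concrete, here is a minimal sketch of how the construction parameters could select the retriever. It uses only the standard library; `RAGPipelineConfig`, `retriever_kind`, `describe_components`, and the component names are hypothetical and are not part of the Haystack API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder taken from the issue text; the real default prompt is not defined here.
DEFAULT_RAG_PROMPT = "default_rag_prompt"


@dataclass
class RAGPipelineConfig:
    """Hypothetical container for the construction params listed above."""

    generation_model: str
    embedding_model: Optional[str] = None  # None means: fall back to BM25
    prompt: str = DEFAULT_RAG_PROMPT

    @property
    def retriever_kind(self) -> str:
        # No embedding model given -> keyword retrieval with BM25.
        return "bm25" if self.embedding_model is None else "embedding"


def describe_components(config: RAGPipelineConfig) -> List[str]:
    """List the (hypothetical) component names the pipeline would wire up."""
    components: List[str] = []
    if config.embedding_model is not None:
        # Embedding retrieval needs the query to be embedded first.
        components.append("text_embedder")
    components += [f"{config.retriever_kind}_retriever", "prompt_builder", "generator"]
    return components
```

With only `generation_model` set, the sketch yields a BM25 retriever; passing `embedding_model` adds an embedder step in front of an embedding retriever.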

Indexing pipeline

Indexing is done with the help of native + OSS Haystack converters. We want the set of supported file formats to be configurable, so that installation only needs the dependencies for the file types we actually want to convert. E.g. we can showcase an indexing pipeline that converts only TXT and needs no additional dependencies.
Also, we want this pipeline to convert a list of file_paths (already supported through the FileTypeRouter) and ideally all files present in a folder (I believe this needs a new component). It should log a warning for each file it cannot convert.

  1. Construction params: supported_file_types=["PDF", "TXT", "markdown", "HTML"], embedding model.
  2. Run params: either [list of files] or "folder".
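The run-time behaviour described above (accept a list of files or a folder, route by type, warn on anything unsupported) could be sketched as follows. This is a standard-library illustration only: `collect_files`, `route_by_type`, and the `EXTENSIONS` mapping are hypothetical names, and a real implementation would presumably build on the existing router component rather than on file suffixes.

```python
import logging
from pathlib import Path
from typing import Dict, Iterable, List, Union

logger = logging.getLogger(__name__)

# Assumed mapping from the type names in the issue to file suffixes.
EXTENSIONS = {
    "TXT": {".txt"},
    "PDF": {".pdf"},
    "MARKDOWN": {".md", ".markdown"},
    "HTML": {".html", ".htm"},
}

PathLike = Union[str, Path]


def collect_files(source: Union[PathLike, Iterable[PathLike]]) -> List[Path]:
    """Treat a single str/Path as a folder to scan, anything else as a list of files."""
    if isinstance(source, (str, Path)):
        return sorted(p for p in Path(source).rglob("*") if p.is_file())
    return [Path(p) for p in source]


def route_by_type(paths: Iterable[Path], supported_file_types: List[str]) -> Dict[str, List[Path]]:
    """Group paths by supported type; log a warning for every file that is skipped."""
    routed: Dict[str, List[Path]] = {t.upper(): [] for t in supported_file_types}
    for path in paths:
        suffix = path.suffix.lower()
        for file_type in routed:
            if suffix in EXTENSIONS.get(file_type, set()):
                routed[file_type].append(path)
                break
        else:
            # No supported type matched this suffix.
            logger.warning("Skipping %s: unsupported file type", path)
    return routed
```

A TXT-only pipeline would then call `route_by_type(collect_files("folder"), ["TXT"])` and hand each group to the matching converter, so only the converters for the requested types need to be installed.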

We want to think about the most important parameters for these 2 pipeline types. The choice of LLM and its API key are two of those parameters, but you should define and implement the 1-4 most important parameters in total.

Closing the gap from Simple to Complex

Please think about ways to bridge the gap between the simple representation and a more complex + customizable one. That means we find ways to transition from using the ready-made RAGPipeline() to the underlying components in an easy-to-use and understandable way. This should be done via documentation AND inside the code.
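One possible way to make that transition visible inside the code is to keep the underlying steps inspectable and replaceable on the ready-made object. The sketch below is purely illustrative: `ReadyMadePipeline`, `replace`, and `build_rag` are hypothetical names, and the steps are plain callables standing in for real components.

```python
from typing import Any, Callable, Dict

Step = Callable[[Any], Any]


class ReadyMadePipeline:
    """Hypothetical wrapper: simple to run, with its steps left open for inspection."""

    def __init__(self, steps: Dict[str, Step]) -> None:
        # Copy the mapping; dicts preserve insertion order, which is the run order.
        self.steps = dict(steps)

    def run(self, query: str) -> Any:
        data: Any = query
        for step in self.steps.values():
            data = step(data)
        return data

    def replace(self, name: str, step: Step) -> None:
        """Swap one step for a custom one without rebuilding the whole pipeline."""
        if name not in self.steps:
            raise KeyError(f"unknown step: {name}")
        self.steps[name] = step  # assigning to an existing key keeps its position


def build_rag() -> ReadyMadePipeline:
    # Stand-in callables; real components would go here.
    return ReadyMadePipeline({
        "retriever": lambda query: {"query": query, "documents": ["doc about " + query]},
        "prompt_builder": lambda d: f"Context: {d['documents']}\nQuestion: {d['query']}",
        "generator": lambda prompt: f"answer based on <{prompt}>",
    })
```

A user could start with `build_rag().run("some question")`, then list `pipeline.steps` to see what is inside and call `replace("generator", ...)` once they need something custom.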

@Timoeller Timoeller added type:feature New feature or request 2.x Related to Haystack v2.0 labels Oct 6, 2023
@Timoeller Timoeller added this to the 2.0-beta milestone Oct 6, 2023
@CrypticRevenger

Please assign me, I want to do it.

@Timoeller
Contributor Author

hey thanks for working on this @CrypticRevenger
I wrote a response in the PR you opened: #5996

@Timoeller Timoeller added P2 Medium priority, add to the next sprint if no P1 available type:documentation Improvements on the docs labels Oct 9, 2023
@ZanSara ZanSara self-assigned this Nov 23, 2023
@Timoeller Timoeller added P0 Highest priority, add to the current sprint and removed P2 Medium priority, add to the next sprint if no P1 available labels Nov 23, 2023
@ZanSara ZanSara linked a pull request Nov 23, 2023 that will close this issue
@masci masci removed the P0 Highest priority, add to the current sprint label Feb 12, 2024
@masci masci modified the milestones: 2.0-beta, 2.0.0 Feb 12, 2024
@masci masci reopened this Feb 12, 2024
@masci masci added the P2 Medium priority, add to the next sprint if no P1 available label Feb 12, 2024
@masci masci added P1 High priority, add to the next sprint and removed P2 Medium priority, add to the next sprint if no P1 available labels Feb 12, 2024