docs: Add indexing example #6412

Merged
merged 3 commits into from
Nov 27, 2023

@julian-risch (Member) commented Nov 27, 2023

Related Issues

Proposed Changes:

  • Add an example of an indexing pipeline that converts txt and pdf files to documents, joins the resulting lists, cleans and splits the documents, creates embeddings, and then writes them to a document store.
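The stages listed above can be sketched in plain Python. This is only an illustration of the data flow and does not use the Haystack API: `Document`, `convert_txt`, `convert_pdf`, `clean`, `split`, `embed`, and `index_directory` are hypothetical stand-ins, the PDF converter is a placeholder that extracts no text, the embedding is a toy hash rather than a real model, and the document store is an ordinary list.

```python
import hashlib
import re
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for a document object (not the Haystack class)."""
    content: str
    meta: dict = field(default_factory=dict)
    embedding: list = field(default_factory=list)


def convert_txt(paths):
    """Turn each txt file into one Document."""
    return [Document(p.read_text(encoding="utf-8"), {"file_path": str(p)}) for p in paths]


def convert_pdf(paths):
    """Placeholder: a real converter would extract text with a PDF library."""
    return [Document("", {"file_path": str(p)}) for p in paths]


def clean(docs):
    """Collapse runs of whitespace and drop documents that end up empty."""
    cleaned = []
    for doc in docs:
        text = re.sub(r"\s+", " ", doc.content).strip()
        if text:
            cleaned.append(Document(text, dict(doc.meta)))
    return cleaned


def split(docs, words_per_chunk=5):
    """Split each document into chunks of at most `words_per_chunk` words."""
    chunks = []
    for doc in docs:
        words = doc.content.split()
        for i in range(0, len(words), words_per_chunk):
            chunks.append(Document(" ".join(words[i:i + words_per_chunk]), dict(doc.meta)))
    return chunks


def embed(docs, dim=8):
    """Toy embedding: the first `dim` bytes of a SHA-256 digest, scaled to [0, 1]."""
    for doc in docs:
        digest = hashlib.sha256(doc.content.encode("utf-8")).digest()
        doc.embedding = [byte / 255 for byte in digest[:dim]]
    return docs


def index_directory(directory, store):
    """Convert, join, clean, split, embed, and write to `store` (a list here)."""
    directory = Path(directory)
    txt_docs = convert_txt(sorted(directory.glob("*.txt")))
    pdf_docs = convert_pdf(sorted(directory.glob("*.pdf")))
    joined = txt_docs + pdf_docs  # join the two converter outputs
    store.extend(embed(split(clean(joined))))
    return len(store)


# Small demonstration on a temporary directory with a single txt file.
store = []
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("one two  three four\nfive six seven", encoding="utf-8")
    index_directory(tmp, store)
```

In the actual example each stage would be a pipeline component rather than a free function; the sketch only shows the order in which the data moves through them.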

How did you test it?

Manually with and without a directory containing txt and pdf files.
This code is also a subset of the code used in one of our e2e tests: https://github.com/deepset-ai/haystack/blob/main/e2e/pipelines/test_preprocessing_pipeline.py

Notes for the reviewer

Checklist

@julian-risch added the ignore-for-release-notes label (PRs with this flag won't be included in the release notes.) Nov 27, 2023
@julian-risch requested a review from a team as a code owner November 27, 2023 08:10
@julian-risch requested review from vblagoje and removed request for a team November 27, 2023 08:10
@vblagoje (Member) left a comment

@julian-risch imho it makes sense to add the main function that accepts file/dirs from the terminal to make it easier to use on various user-provided inputs.
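A minimal sketch of what such a `main` function could look like, using only the standard library. The names `collect_files`, `main`, and `SUPPORTED_SUFFIXES` are hypothetical and not part of the merged example; the function only gathers the txt and pdf paths and leaves the actual indexing out.

```python
import argparse
from pathlib import Path

# Assumption: the example only handles txt and pdf files.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def collect_files(inputs):
    """Expand directories recursively and keep only supported file types."""
    files = []
    for raw in inputs:
        path = Path(raw)
        if path.is_dir():
            candidates = sorted(p for p in path.rglob("*") if p.is_file())
        else:
            candidates = [path]
        files.extend(p for p in candidates if p.suffix.lower() in SUPPORTED_SUFFIXES)
    return files


def main(argv=None):
    """Parse files/dirs from the terminal and return the files to index."""
    parser = argparse.ArgumentParser(description="Index txt and pdf files.")
    parser.add_argument("inputs", nargs="+", help="files or directories to index")
    args = parser.parse_args(argv)
    files = collect_files(args.inputs)
    print(f"Would index {len(files)} file(s)")
    return files
```

To use it from the terminal, `main()` would be called under the usual `if __name__ == "__main__":` guard at the end of the script.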

@julian-risch (Member, Author) commented

> @julian-risch imho it makes sense to add the main function that accepts file/dirs from the terminal to make it easier to use on various user-provided inputs.

Thank you for your review! Let's talk about this in a discussion with @TuanaCelik and/or @mathislucka. It's about the purpose and structure of these examples. If the examples are meant for quick copy-and-paste, I'd leave out a main function and any console input to keep the example as short as possible. In this particular example, one could make the input directory a variable and put it at the beginning of the file to increase visibility.
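A minimal sketch of that alternative, assuming a hypothetical `INPUT_DIR` constant and path (neither is taken from the merged example):

```python
from pathlib import Path

# Hypothetical location; change this to the directory holding your files.
INPUT_DIR = Path("data/documents")

# Everything below reads from INPUT_DIR, so the path is edited in one place.
txt_files = sorted(INPUT_DIR.glob("*.txt"))
pdf_files = sorted(INPUT_DIR.glob("*.pdf"))
print(f"Found {len(txt_files)} txt and {len(pdf_files)} pdf files in {INPUT_DIR}")
```

This keeps the example free of argument parsing while still making the one value a reader has to change easy to spot.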

@julian-risch merged commit c3a5d0d into main Nov 27, 2023
7 checks passed
@julian-risch deleted the indexing-example branch November 27, 2023 17:44
Development

Successfully merging this pull request may close these issues.

Create minimal indexing pipeline for Haystack 2.0
2 participants