Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added split by page to DocumentSplitter #6753

Merged

Conversation

sahusiddharth
Copy link
Contributor

  • fixes feat: Add split by page to DocumentSplitter #6707

  • The new split_by value, 'page', has been introduced in the DocumentSplitter to enable users to split documents based on the "\f" character, facilitating the preservation of context and allowing for more flexible chunking options.

@sahusiddharth sahusiddharth requested review from a team as code owners January 17, 2024 09:18
@sahusiddharth sahusiddharth requested review from dfokina and shadeMe and removed request for a team January 17, 2024 09:18
@github-actions github-actions bot added 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Jan 17, 2024
@coveralls
Copy link
Collaborator

coveralls commented Jan 17, 2024

Pull Request Test Coverage Report for Build 7556521537

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 88.015%

Totals Coverage Status
Change from base Build 7543114073: 0.1%
Covered Lines: 4531
Relevant Lines: 5148

💛 - Coveralls

@shadeMe shadeMe requested review from sjrl and removed request for shadeMe January 17, 2024 09:33
@sjrl
Copy link
Contributor

sjrl commented Jan 17, 2024

Thanks @sahusiddharth this is looking good! I left a few minor change suggestions.

Otherwise, it would be great if you could add a unit test for this new split_by option. So please make an similar test as this one


in that same file.

Then this PR should be good to go!

@sjrl
Copy link
Contributor

sjrl commented Jan 17, 2024

Looking good! Almost there, could you also add this suggested change?

@shadeMe shadeMe changed the title feat-added-split-by-page-to-DocumentSplitter feat: added split by page to DocumentSplitter Jan 17, 2024
@sahusiddharth
Copy link
Contributor Author

sahusiddharth commented Jan 17, 2024

@sjrl, I'm sorry for wasting so much of your time. I didn't understood what needs to be done and was barking on the wrong tree.

@sjrl
Copy link
Contributor

sjrl commented Jan 17, 2024

@sjrl, I'm sorry for wasting so much of your time. I didn't understood what needs to be done and was barking on the wrong tree.

Sorry, I didn't mean for it to come off that way! I just thought I could make a quick change to the PR to finish it up :)

@sahusiddharth
Copy link
Contributor Author

@sjrl It is not your fault, it's just that it is a very small issue, I should have completed it in one go.

@sjrl
Copy link
Contributor

sjrl commented Jan 17, 2024

No problem, we really appreciate the contribution!

Copy link
Contributor

@sjrl sjrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sjrl sjrl merged commit a7ac4ed into deepset-ai:main Jan 17, 2024
22 checks passed
@sahusiddharth sahusiddharth deleted the feat/add-split-by-page-to-DocumentSplitter branch January 18, 2024 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Add split by page to DocumentSplitter
3 participants