Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add page_number to metadata in DocumentSplitter #7599

Merged
merged 17 commits into from
Apr 29, 2024

Conversation

CarlosFerLo
Copy link
Contributor

@CarlosFerLo CarlosFerLo commented Apr 25, 2024

Related Issues

Proposed Changes:

I updated the DocumentSplitter methods so that it adds the "page_number" field to the metadata of output documents. This field contains the page number where you can find the document on the original document. The implementation is the same as the one on the v1.25.x.

How did you test it?

I added some new unit test for testing this behaviour, but testing was mainly functional as it was based on a previously functioning code.

Notes for the reviewer

This is my first contribution!!! The .gitignore change is to counter a VSCode extension I have that I am not able to eliminate the commit.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes ✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ✅
  • I ran pre-commit hooks and fixed any issue ✅

@github-actions github-actions bot added topic:tests topic:DX Developer Experience 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Apr 25, 2024
@CarlosFerLo CarlosFerLo marked this pull request as ready for review April 25, 2024 19:51
@CarlosFerLo CarlosFerLo requested review from a team as code owners April 25, 2024 19:51
@CarlosFerLo CarlosFerLo requested review from dfokina and vblagoje and removed request for a team April 25, 2024 19:51
@davidsbatista davidsbatista self-requested a review April 26, 2024 13:03
.gitignore Outdated Show resolved Hide resolved
@davidsbatista
Copy link
Contributor

@CarlosFerLo thank you very much for your contribution, your PR looks good - I just left some small comments to improve things a bit - let me know if it's not clear or you need help with something.

@CarlosFerLo
Copy link
Contributor Author

@CarlosFerLo thank you very much for your contribution, your PR looks good - I just left some small comments to improve things a bit - let me know if it's not clear or you need help with something.

Thanks, really appreciate it. Excited to be able to collaborate.

@CLAassistant
Copy link

CLAassistant commented Apr 29, 2024

CLA assistant check
All committers have signed the CLA.

@coveralls
Copy link
Collaborator

Pull Request Test Coverage Report for Build 8876963566

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 90.12%

Files with Coverage Reduction New Missed Lines %
components/preprocessors/document_splitter.py 1 98.57%
Totals Coverage Status
Change from base Build 8849558850: 0.02%
Covered Lines: 6330
Relevant Lines: 7024

💛 - Coveralls

@davidsbatista davidsbatista enabled auto-merge (squash) April 29, 2024 10:50
@davidsbatista davidsbatista merged commit d2c87b2 into deepset-ai:main Apr 29, 2024
23 checks passed
@CarlosFerLo CarlosFerLo deleted the issue-6705 branch April 29, 2024 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 topic:DX Developer Experience topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Add page_number to meta of Documents in DocumentSplitter
4 participants