Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add keep-id to DocumentCleaner #7703

Merged
merged 2 commits into from
May 16, 2024

Conversation

CarlosFerLo
Copy link
Contributor

Related Issues

Proposed Changes:

The DocumentCleaner has now an optional property called keep_id that keeps the original id of all the input documents.

How did you test it?

Added one extra unit test and edited the one that checks correct initialisation of the object.

Notes for the reviewer

This PR is a clone of the #7618 original PR that cannot be reopened because I messed up with git.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes ✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ✅
  • I ran pre-commit hooks and fixed any issue ✅

@CarlosFerLo CarlosFerLo requested review from a team as code owners May 16, 2024 09:17
@CarlosFerLo CarlosFerLo requested review from dfokina and vblagoje and removed request for a team May 16, 2024 09:17
@coveralls
Copy link
Collaborator

coveralls commented May 16, 2024

Pull Request Test Coverage Report for Build 9109722913

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.002%) to 90.587%

Files with Coverage Reduction New Missed Lines %
components/preprocessors/document_cleaner.py 1 98.82%
Totals Coverage Status
Change from base Build 9103206687: 0.002%
Covered Lines: 6592
Relevant Lines: 7277

💛 - Coveralls

@vblagoje
Copy link
Member

Thank you @CarlosFerLo I'll let @silvanocerza approve this one and remove myself from PR review. Thanks for your contribution 🙏

@vblagoje vblagoje requested review from silvanocerza and removed request for vblagoje May 16, 2024 09:38
Copy link
Contributor

@silvanocerza silvanocerza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the contribution!

@silvanocerza silvanocerza merged commit 57af95d into deepset-ai:main May 16, 2024
25 checks passed
@CarlosFerLo CarlosFerLo deleted the issue-7557b branch June 26, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preprocessing should allow keeping Document ids unchanged
4 participants