feat: Add document page number of ExtractedAnswer to meta #7572

Merged: 9 commits merged into main on May 2, 2024


julian-risch (Member) commented Apr 22, 2024

Related Issues

Proposed Changes:

  • Calculate document page number of ExtractedAnswer as in Haystack 1.x
  • Add page number to ExtractedAnswer's meta data under the key "answer_page_number"
  • Updated existing unit test
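
The calculation described in the bullets above can be illustrated with a minimal, self-contained sketch. It assumes that pages inside a document's content are separated by form feed characters ("\f") and that the document's meta stores the page number on which its content starts. The `Doc` dataclass and the `answer_page_number` function are hypothetical stand-ins for illustration, not Haystack's actual classes or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document with content and meta."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def answer_page_number(doc: Doc, answer_start: int) -> Optional[int]:
    """Return the page on which the answer starts, or None if it is unknown."""
    if "page_number" not in doc.meta:
        # Without a starting page in the meta there is nothing to offset from.
        return None
    # Each form feed before the answer start moves the answer one page further.
    return doc.meta["page_number"] + doc.content[:answer_start].count("\f")


# The document starts on page 3 and the answer sits after one page break.
doc = Doc(content="First page.\fSecond page with the answer.", meta={"page_number": 3})
start = doc.content.index("answer")
meta_to_add = {"answer_page_number": answer_page_number(doc, start)}
print(meta_to_add)
```

Returning None when the key is absent keeps the sketch usable for documents whose meta carries no page information.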

How did you test it?

  • Ran all ExtractiveReader tests locally (this replaces an earlier note that tests were still missing).
  • #7599 adds page numbers to the document meta (this replaces an earlier note that we needed to check whether the document's meta contains a page number).

Notes for the reviewer

Checklist

@github-actions github-actions bot added the 2.x Related to Haystack v2.0 label Apr 22, 2024
coveralls (Collaborator) commented Apr 22, 2024

Pull Request Test Coverage Report for Build 8923598800

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 6 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.07%) to 90.203%

Files with Coverage Reduction | New Missed Lines | %
components/readers/extractive.py | 6 | 95.58%

Totals
Change from base Build 8881259484: 0.07%
Covered Lines: 6399
Relevant Lines: 7094

💛 - Coveralls

meta_to_add = {}
if answer.document and "page_number" in answer.document.meta:
    ans_start = answer.document_offset.start
    answer_page_number = answer.document.meta["page_number"] + answer.document.content[:ans_start].count("\f")
Contributor

As a heads-up, @julian-risch: no component in Haystack v2 adds the page_number key to document.meta. Please see issue #6705.

Contributor

I just finished resolving issue #6705; see pull request #7599, which has already been merged.
The document splitter will now add the page_number field to the documents' metadata, as it did in Haystack 1.x.
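
To show how a splitter could attach a page_number to each chunk, here is a minimal sketch. It is not the implementation from #7599: the `split_with_page_numbers` function, the fixed-size character chunking, and the 1-indexed page convention are all assumptions made for illustration. Page breaks are assumed to be form feed characters ("\f").

```python
from typing import Any, Dict, List


def split_with_page_numbers(text: str, chunk_size: int) -> List[Dict[str, Any]]:
    """Split text into fixed-size character chunks and record each chunk's starting page."""
    chunks: List[Dict[str, Any]] = []
    page = 1  # assumption: pages are 1-indexed
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        chunks.append({"content": chunk, "meta": {"page_number": page}})
        # Form feeds inside this chunk advance the page for the next chunk.
        page += chunk.count("\f")
    return chunks


chunks = split_with_page_numbers("aaaa\fbbbb\fcccc", chunk_size=5)
for chunk in chunks:
    print(repr(chunk["content"]), chunk["meta"]["page_number"])
```

With a starting page stored on every chunk, a reader only needs to count the form feeds between the chunk start and the answer start to recover the answer's page.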

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Apr 30, 2024
@julian-risch julian-risch marked this pull request as ready for review April 30, 2024 12:56
@julian-risch julian-risch requested review from a team as code owners April 30, 2024 12:56
@julian-risch julian-risch requested review from dfokina and masci and removed request for a team April 30, 2024 12:56
@julian-risch julian-risch requested a review from masci May 2, 2024 12:21
Contributor

@masci masci left a comment

Sweet!

@julian-risch julian-risch merged commit b028497 into main May 2, 2024
23 checks passed
@julian-risch julian-risch deleted the add-page-number-to-answer branch May 2, 2024 12:48
Development

Successfully merging this pull request may close these issues.

ExtractedAnswer missing page_number meta
5 participants