docs: Add comment about the generation of no-answer samples in FARMReader training #3404

Merged (7 commits merged on Oct 18, 2022)


brandenchan (Contributor)

Related Issues

Proposed Changes:

  • Add a docstring comment explaining that, during training, FARMReader splits long passages into chunks and treats any chunk that doesn't contain the answer as a no-answer sample

Checklist

brandenchan requested a review from a team as a code owner October 17, 2022 14:29
brandenchan requested review from ZanSara and removed request for a team October 17, 2022 14:29
sjrl (Contributor) commented Oct 17, 2022

Hey @brandenchan, could you rebase onto main to fix the schema check error? We just merged a PR that should fix it.

haystack/nodes/reader/farm.py (outdated, resolved)
brandenchan and others added 2 commits October 17, 2022 17:37
sjrl (Contributor) commented Oct 17, 2022

Hey @brandenchan, just a heads-up: the additional lines changed in the JSON schema that GitHub shows under Files Changed are erroneous (those changes are already in main). This is an issue on GitHub's side that can happen after rebasing a branch. This Stack Overflow post explains the situation and a workaround.

brandenchan changed the base branch from main to add_big_bird October 17, 2022 15:49
brandenchan requested a review from a team as a code owner October 17, 2022 15:49
brandenchan changed the base branch from add_big_bird to main October 17, 2022 15:49
agnieszka-m (Contributor) left a comment

Just a minor change

@@ -377,6 +377,9 @@ def train(
Checkpoints can be stored via setting `checkpoint_every` to a custom number of steps.
If any checkpoints are stored, a subsequent run of train() will resume training from the latest available checkpoint.

Note that when performing training with this function, long documents are split into chunks.
If a chunk does not contain the answer to the question, it is treated as a no-answer sample.

Suggested change:
- If a chunk does not contain the answer to the question, it is treated as a no-answer sample.
+ If a chunk doesn't contain the answer to the question, it is treated as a no-answer sample.
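The docstring note describes how training handles long documents: they are split into chunks, and a chunk without the answer becomes a no-answer sample. The following minimal Python sketch illustrates that idea. It is not FARMReader's or FARM's actual implementation. The `Chunk` class, the `split_into_chunks` function and the `max_len`/`stride` parameters are hypothetical, plain token offsets stand in for real tokenization, and labeling a chunk that only partly covers the answer as no-answer is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    start: int  # token offset where the chunk begins in the document
    end: int  # exclusive token offset where the chunk ends
    # Answer span relative to the chunk, or None if this is a no-answer sample
    answer: Optional[Tuple[int, int]]


def split_into_chunks(
    num_tokens: int, answer_span: Tuple[int, int], max_len: int, stride: int
) -> List[Chunk]:
    """Hypothetical sliding-window splitter (not the Haystack/FARM API).

    answer_span is (start, end) with an exclusive end; stride is the step
    between the starts of consecutive windows (an assumption of this sketch).
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    ans_start, ans_end = answer_span
    chunks: List[Chunk] = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        if start <= ans_start and ans_end <= end:
            # The whole answer fits in this window: keep it as a labeled sample
            label: Optional[Tuple[int, int]] = (ans_start - start, ans_end - start)
        else:
            # The answer is missing (or only partly present): no-answer sample
            label = None
        chunks.append(Chunk(start, end, label))
        if end >= num_tokens:
            break
        start += stride
    return chunks
```

For example, a 10-token document with the answer at tokens 2–3 (span `(2, 4)`), `max_len=4` and `stride=3` yields windows `(0, 4)`, `(3, 7)` and `(6, 10)`. Only the first window contains the answer; the other two become no-answer samples, which is why long passages produce extra no-answer training examples.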

ZanSara removed their request for review October 18, 2022 07:29
brandenchan merged commit 3bf5d43 into main Oct 18, 2022
brandenchan deleted the reader_train_no_answer branch October 18, 2022 12:37
Labels: None yet
Projects: None yet
Development: no linked issues
3 participants