
Post naacl fixups #1

Merged
merged 17 commits into from
Jun 6, 2019

Conversation

jayded
Owner

@jayded jayded commented Jun 5, 2019

  • Includes plain text documents for user convenience
  • minor code cleanup
  • update READMEs
  • update instructions for experiment reproduction

jayded and others added 16 commits June 5, 2019 13:21
…to the path were made in the python version of the file as well.
This patch makes two changes:
1. Change the text-extraction method so that the abstract is included
only once.
2. Update the corresponding start/end spans to reflect the new text
extraction, so that article_text[start:end] equals the span shown in
the annotations_merged file.
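The invariant described above (that slicing the article text with an annotation's offsets reproduces the annotated span) can be checked mechanically. The following is a minimal sketch, not the repository's code: the function names and the `(start, end, text)` tuple layout are hypothetical, and the real annotations_merged file may use different columns.

```python
def span_matches(article_text, start, end, expected):
    """Return True when article_text[start:end] reproduces the annotated span."""
    return article_text[start:end] == expected


def find_mismatches(article_text, annotations):
    """Collect annotations whose offsets no longer point at their text.

    `annotations` is a hypothetical list of (start, end, text) tuples.
    """
    return [
        (start, end, text)
        for start, end, text in annotations
        if not span_matches(article_text, start, end, text)
    ]


# Toy data for illustration only; not taken from the dataset.
article = "Abstract: aspirin reduced pain. Methods: randomized trial."
annotations = [(10, 30, "aspirin reduced pain"), (0, 8, "Methods")]
mismatches = find_mismatches(article, annotations)
```

Running a check like this over every article after changing the extraction method would flag any span whose offsets were not updated.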
This commit contains three major fixes:
1. Add a try/except around an outside library used for the heuristic,
which was crashing for no apparent reason.
2. Change the Python script that launches the code so that it runs the
correct experiment: scan-net-ico was being run instead of scan-net.
3. Temporarily disable the GPU code for the LR. Some GPUs are able to
load the data, while others are not. It stays disabled until batching
is implemented.
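The first fix, guarding a call into a library that crashes unpredictably, might look roughly like the sketch below. The names `safe_heuristic`, `heuristic`, and `default` are hypothetical and do not come from the repository; the broad `except Exception` is an assumption that follows from the failure mode being unknown.

```python
import logging


def safe_heuristic(heuristic, document, default=None):
    """Call heuristic(document); fall back to `default` if it raises."""
    try:
        return heuristic(document)
    except Exception as err:  # broad on purpose: the cause of the crash is unknown
        logging.warning("heuristic failed, using default: %r", err)
        return default
```

Logging the exception rather than swallowing it silently keeps the failures visible, which helps if the underlying cause is tracked down later.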
This commit updates all of the files listed in the additional file
section, which still had old spans. None of the changes in this commit
affect modeling or preprocessing steps; they are for data distribution
purposes only.
This commit removes a file that is not useful; its sole purpose was to
compile statistics and do simple calculations on the data.
In this commit, we add an option that lets users set a variable in
order to extract plaintext versions of the XML files. The spans are not
yet correct for these plaintext versions; this will be fixed in a
future commit.
The previous commit enabled users to extract plaintext versions of the
XML files. However, there was no way to extract section information
from the plaintext article. In this commit, the function that extracts
the plaintext also returns a dictionary mapping section titles to
start/end coordinates.
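A function of the shape described above could be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function name, the `sec`/`title`/`p` tag names, and the flat (non-nested) section layout are all hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_text_and_sections(xml_string):
    """Return (plaintext, {section title: (start, end)}) for a flat article.

    Assumes top-level <sec> elements that each hold one <title> and some
    <p> children; real article XML is likely more deeply nested.
    """
    root = ET.fromstring(xml_string)
    parts = []
    sections = {}
    offset = 0
    for sec in root.findall("sec"):
        title = sec.findtext("title", default="").strip()
        body = " ".join(p.text.strip() for p in sec.findall("p") if p.text)
        chunk = f"{title}\n{body}\n"
        # Offsets index into the final concatenated plaintext.
        sections[title] = (offset, offset + len(chunk))
        parts.append(chunk)
        offset += len(chunk)
    return "".join(parts), sections


# Toy input for illustration only.
sample = (
    "<article>"
    "<sec><title>Abstract</title><p>Aspirin helps.</p></sec>"
    "<sec><title>Methods</title><p>We ran a trial.</p></sec>"
    "</article>"
)
text, sections = extract_text_and_sections(sample)
```

Because the offsets are computed while the plaintext is assembled, `text[start:end]` for any entry returns that section's title and body. Note that in this sketch a repeated section title would overwrite the earlier entry.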
In this commit, we do 3 things:
1. Update the readme to reflect the rest of the changes in the commit.
2. Add plain-text versions of all the XML files. These have similar
names, but with a different extension.
3. Update the annotations_merged.csv to have the offsets of evidence
spans into these plaintext files.

We do realize that this makes the additional_file/*.csv files out of
date. However, we will be working to add the changes to those files as
soon as possible.
@jayded jayded self-assigned this Jun 5, 2019
With uneven (malformed) XML files, the code previously broke when
attempting to parse section offsets. With this change, parsing works
even when the XML is malformed.
@jayded jayded merged commit ea1769d into master Jun 6, 2019
@jayded jayded deleted the post-naacl-fixups branch June 6, 2019 11:20
2 participants