Two initial bugs: something with missing "author" and mistakes from CrossRef #1

Open
dwsideriusNIST opened this issue Jun 28, 2018 · 4 comments


@dwsideriusNIST

Hello, I'm associated with @davidlmobley, who started the Twitter question that led to your fixbibtex project.

I've tested your code and found some opportunities for improvement. First, use this BibTeX file to reproduce my issues:
https://github.com/dmzuckerman/Sampling-Uncertainty/blob/e98db8be56fb89bf0b7261bbb20d6f5b3945dff6/refs.bib

  1. The first bug is that some BibTeX entries cause the script to crash if the "author" key is empty. The problematic key for the "refs.bib" example is "NIST_SRSW"

  2. The script is making some erroneous substitutions. E.g., "Kabsch1976" and "Leimkuhler" are replaced by the wrong article or book.

I suspect that the problem is that CrossRef's search is returning the wrong reference. Maybe incorporate some basic error checking to avoid really erroneous substitutions?
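For the first bug, a minimal sketch of a defensive lookup might look like the following. It assumes each entry is available as a plain dict of BibTeX fields; the function name `build_query` and the sample entries are hypothetical and not taken from fixbibtex.

```python
def build_query(entry: dict) -> str:
    """Build a free-text search query from a BibTeX entry.

    Assumption: `entry` is a dict of BibTeX fields. Missing or empty
    "title" / "author" fields are simply left out of the query
    instead of raising an exception.
    """
    title = (entry.get("title") or "").strip()
    author = (entry.get("author") or "").strip()
    # Keep only the non-empty pieces so an empty author never breaks the query.
    return " ".join(part for part in (title, author) if part)


# Hypothetical entries: one without an author, one with an empty author.
print(build_query({"ID": "no_author", "title": "Some Title"}))
print(build_query({"ID": "empty_author", "title": "Some Title", "author": ""}))
```

With this kind of guard, an entry without authors can still be searched by title alone.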

@davidlmobley

BTW, thanks for this tool!!

@jaimergp
Owner

Hi! Thanks for the feedback!

I have added two commits addressing (or trying to) these issues.

  • The author bug is fixed.
  • A simple string-distance check has been implemented. If the new title is very different from the original (similarity score < 0.75), a DOI search is attempted instead (when a DOI is available). This part still runs synchronously, but once we settle on a proper threshold and distance criterion, we can improve the performance.
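As a rough illustration of such a check, here is a sketch using `difflib.SequenceMatcher` from the Python standard library. The thread does not say which distance measure is actually used, so the choice of measure, the normalization step, and the names `title_similarity` and `accept_match` are all assumptions; only the 0.75 threshold comes from the comment above.

```python
from difflib import SequenceMatcher


def title_similarity(old_title: str, new_title: str) -> float:
    """Return a similarity ratio in [0, 1] between two titles.

    Titles are lowercased and whitespace-normalized first (an assumption,
    chosen so that trivial formatting differences do not lower the score).
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return SequenceMatcher(None, normalize(old_title), normalize(new_title)).ratio()


def accept_match(old_title: str, new_title: str, threshold: float = 0.75) -> bool:
    """Accept the search result only if the titles are similar enough."""
    return title_similarity(old_title, new_title) >= threshold


# Identical titles up to case and spacing are accepted.
print(accept_match("An Example  Title", "an example title"))
# A much shorter, unrelated title falls below the threshold.
print(accept_match("An example title that is fairly long and specific", "Other work"))
```

When `accept_match` returns False, the caller could fall back to a DOI lookup or leave the entry untouched.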

Let me know if this helps. This is not a definitive solution by any means, so please keep reporting issues!

@dwsideriusNIST
Author

@jaimergp Thanks for working on those fixes! I'm amazed at how many fixes your code suggests.

A few remaining issues I run into (you may want to re-download https://github.com/dmzuckerman/Sampling-Uncertainty/blob/master/refs.bib):

  1. Entries using @book seem to really challenge the script. Most of the time, it's identifying a journal article similar to the book title, so I suspect the problem is that CrossRef can't distinguish between the two. Maybe it should just skip books?

  2. Here's a really interesting case: the key "Bergonzo2014" has citation year=2014, but the publication date was 2013; fixbibtex is opting for the 2013 date.
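The "skip books" idea from item 1 could be sketched as a simple filter on the entry type. This assumes each parsed entry is a dict carrying its type under an `"ENTRYTYPE"` key and its citation key under `"ID"`; both key names, the function `split_searchable`, and the sample data are assumptions for illustration only.

```python
def split_searchable(entries, skip_types=("book",)):
    """Split entries into those to look up and those to leave untouched.

    Assumption: each entry is a dict whose "ENTRYTYPE" value holds the
    BibTeX type (e.g. "article", "book"). The comparison is
    case-insensitive because BibTeX entry types are.
    """
    skip = {t.lower() for t in skip_types}
    to_search, skipped = [], []
    for entry in entries:
        entry_type = (entry.get("ENTRYTYPE") or "").lower()
        (skipped if entry_type in skip else to_search).append(entry)
    return to_search, skipped


# Hypothetical parsed entries.
entries = [
    {"ID": "SomeArticle", "ENTRYTYPE": "article"},
    {"ID": "SomeBook", "ENTRYTYPE": "Book"},
]
to_search, skipped = split_searchable(entries)
print([e["ID"] for e in to_search], [e["ID"] for e in skipped])
```

Returning the skipped entries separately makes it easy to report which keys were left unchanged.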

Again, many, many thanks!

@jaimergp
Owner

jaimergp commented Jun 30, 2018

Hi again @dwsideriusNIST!

Please check the new changes! I have (hopefully) addressed both points.

Regarding dates: Crossref provides both online and print publication dates. Until now we were choosing the online one for all cases, but now the script will use the print date if available, and the online one as a fallback.
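The fallback described above can be sketched as follows. It assumes the record is the JSON metadata for one work as returned by the Crossref REST API, where `published-print` and `published-online` each hold a `date-parts` list such as `[[2014, 1, 14]]`; the function name `pick_year` and the sample records are hypothetical.

```python
def pick_year(record: dict):
    """Return the print year if present, else the online year, else None.

    Assumption: `record` follows the Crossref works schema, in which a
    date field looks like {"date-parts": [[year, month, day]]} and the
    month and day may be absent.
    """
    for key in ("published-print", "published-online"):
        parts = (record.get(key) or {}).get("date-parts") or []
        if parts and parts[0] and parts[0][0] is not None:
            return int(parts[0][0])
    return None


# Hypothetical record: published online one year before the print issue.
record = {
    "published-online": {"date-parts": [[2013, 12, 3]]},
    "published-print": {"date-parts": [[2014, 1, 14]]},
}
print(pick_year(record))  # prefers the print date
print(pick_year({"published-online": {"date-parts": [[2013]]}}))  # falls back to online
```

Preferring the print date keeps the year consistent with the volume and issue that citations usually refer to.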
