Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backend to IACR Cryptoeprint #40

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

lzmartinico
Copy link

This PR adds a new backend to the IACR Cryptology ePrint. This is my first non-trivial elisp, so any stylistic feedback would be appreciated! This eprint service does not have a nice API like some of the other backends, so the HTML code is quite brittle for now.

@cpitclaudel
Copy link
Owner

🎉 👍 Thanks for the contribution! A quick thought: if here's no clean API from IACR, maybe this would be best maintained as a separate MELPA package? In any case I will have a detailed look.

Copy link
Owner

@cpitclaudel cpitclaudel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks excellent! And much simpler than I expected, since the HTML is very regular.
My only high-level comment is that I think some parts might be more readable with pcase.

(Btw, you have a few indentation issues throughout the file.)

biblio-iacr.el Outdated Show resolved Hide resolved
biblio-iacr.el Outdated Show resolved Hide resolved
(defun biblio-iacr--extract-interesting-fields (header body)
"Prepare a IACR search result, composed of HEADER and BODY, for display.
HEADER contains the technical report id, which includes the year of publication.
BODY includes the Title and Authors."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a more robust way to write this than using nth and caddr; it might actually read nicer with pcase?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at 6c6a175, using pcase is much more expressive, but I don't know if it adds anything in terms of robustness (if the IACR DOM changes, it will still fail in unexpected ways - but maybe it will be easier to fix now)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like pcase a lot better — it makes it clear what structure is expected, which should make it clearer if we need a fix later.

biblio-iacr.el Outdated Show resolved Hide resolved
biblio-iacr.el Outdated Show resolved Hide resolved
biblio-iacr.el Outdated Show resolved Hide resolved
biblio-iacr.el Outdated Show resolved Hide resolved
Uses pcase backquote style pattern to parse each bibliographic entry.
While functionally equivalent to using nth and caddr, this syntax more
clearly shows the structure of the HTML document

Signed-off-by: Lorenzo Martinico <[email protected]>
When the only contents of the dl node is a h3 node (containing text "No
reports found"), an error is displayed and no results are returned
@lzmartinico
Copy link
Author

Thanks for all the comments! I have pushed some more commits to address these

pcase-let is greedy and will match variable even if h3 tag is not
present in the header.
Author string should be part of a list
@lzmartinico
Copy link
Author

While testing, I have also noted that the IACR search page fails if the year is one of the search terms (spaces are counted as ANDs, but the search string is never checked against the year). Would it make sense to correct this, or should the library try not to be as opinionated and keep to the original API? For me it's pretty normal to search for "Author year" as a pattern

Rather than having bib key with format cryptoeprint:year/issue, use
cryptoeprint:year:issue, the format used by IACR for eprint entries.
@cpitclaudel
Copy link
Owner

While testing, I have also noted that the IACR search page fails if the year is one of the search terms (spaces are counted as ANDs, but the search string is never checked against the year). Would it make sense to correct this, or should the library try not to be as opinionated and keep to the original API? For me it's pretty normal to search for "Author year" as a pattern

What would be the normal way to search for an author and year on the website?

Stores the metadata authors entry for an IACR bibliography entry as a
list rather than and AND-separated string, and recomposes it when
forwarding the bibtex entry.
@lzmartinico
Copy link
Author

lzmartinico commented Dec 12, 2020

While testing, I have also noted that the IACR search page fails if the year is one of the search terms (spaces are counted as ANDs, but the search string is never checked against the year). Would it make sense to correct this, or should the library try not to be as opinionated and keep to the original API? For me it's pretty normal to search for "Author year" as a pattern

What would be the normal way to search for an author and year on the website?

There is only an optional lastmonths parameter to limit search results to the last N months, but it doesn't seem to work, as it still returns all the results for the matching keywords.

@cpitclaudel
Copy link
Owner

There is only an optional lastmonths parameter to limit search results to the last N months, but it doesn't seem to work, as it still returns all the results for the matching keywords.

I see. I think we should stick with exposing the raw website's functionality. But it definitely wouldn't hurt to write the maintainers of the archive to ask them for a proper API, for the long run :)

Copy link
Owner

@cpitclaudel cpitclaudel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great; I'm very happy with how succinct the code is.

Do you want to publish it as an independent package (MELPA makes this very easy, see https://github.com/melpa/melpa/blob/master/CONTRIBUTING.org — and you get extra feedback on the code, too, which is nice), or do you want to merge it into biblio itself (if the latter, I think we should add tests, since when I originally wrote biblio I tried to have 100% coverage, so the former may be more straightforward)

@lzmartinico
Copy link
Author

Thanks! I think it makes more sense to include this within biblio, I think it's a bit too small to be meaningful as a full package.
I'll have a look at the test file to see how the other backends are being tested - or is there any further specific test you'd like to see?

@cpitclaudel
Copy link
Owner

Nope, same style as the others is perfect. Let me know if you need help (in the past the testing framework moved quickly and that caused breakage; if you see similar breakage let me know and I'll fix the tests ^^)

This commit extends the Unit and Feature tests with examples related to
the IACR module. Feature tests require the existence of 'sym-lookup
function, so the biblio-iacr-lookup function and iacr-lookup alias are added
@lzmartinico
Copy link
Author

Hi, I just pushed some tests for the new module, let me know if you think anything is missing!

(title . "Universally Composable Security: A New Paradigm for Cryptographic Protocols")
(authors "Ran Canetti")
(containter . "Cryptology ePrint Archive, Report 2000/067") (type . "eprint")
(url . "https://eprint.iacr.org/2000/067") (doi . nil))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into a weird bug with the test suite, where if this sample item was inserted after the dblp samples, some tests in the -get-url section would fail. I don't understand buttercup well enough to debug this though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants