Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggestions for improving elx_dowload_xml and make query #26

Open
michelemaroni opened this issue Feb 11, 2022 · 1 comment
Open

suggestions for improving elx_dowload_xml and make query #26

michelemaroni opened this issue Feb 11, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@michelemaroni
Copy link

Hi Michal,

Thanks for releasing v0.4.0, I updated R and eurlex and i am using it.
I recently used elx_dowload_xml and I wanted to suggest some improvements:

  1. line 28 should likely be : notice type must be correctly specified" = notice %in% c("tree", "branch", "object")) (this is more of an issue)
  2. file = basename(url) could be file = paste(basename(url), ".xml)"
  3. With the current settings when object is passed to notice the object expression notice is retrieved (p 44 of cellar), however this does not contain metadata. I'd suggest to drop the language header and use ?language= a the end of the url when object is passed (p 42 of cellar), so that the object notice with the object metadata is retrieved.
  4. elx_dowload_xml could encapsulate a function that returns the xml notice as a string. So a user could decide wether to directly dowload the xml notice, or to get the xml notice as a string an parse it to get other fields and complement the make_query and run_query functions.
  5. About elx_make_query, you remember that there was the issue of the 10e6 limit? A workaraound/improvement could be to group together multiple items of the same property of a work. e.g. if i pass include_authors = TRUE, it could help to use (group_concat(distinct ?author_;separator=", ") as ?author) in the select statement and OPTIONAL{?work cdm:work_created_by_agent ?author_.} in the where statement of the sparql query. The uri would still be inside, but i see this less of an issue to clean it afterwards. This would help in not having duplicated works when running queries.

What do you think about theese?

All the best

@michalovadek michalovadek added the enhancement New feature or request label Feb 11, 2022
@michalovadek
Copy link
Owner

michalovadek commented Feb 15, 2022

hi, thanks for these suggestions again, it's much appreciated that you are testing the features and giving feedback.

  1. fixed in 0.4.1
  2. fixed in 0.4.1
  3. fixed in 0.4.1
  4. I don't recommend doing this, as the notices are usually large and can be a drag on Eur-Lex servers, so it's better to first download the notice and then work with it. Having said that, you can now access the different notices in R through elx_fetch_data(). If you want the data as a string just wrap the output in as.character()
  5. this sounds like a good suggestion but it might require a major rewrite of the make_query architecture which is admittedly pretty clumsy. I don't have time for this right now, but if you create a pull request I can have a look at it

You can get the provisional fixes for 1-4 by updating to the github version of the package. I haven't tested them properly yet so feedback would be welcome again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants