Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to add number of downloads in the past year #60

Merged
merged 5 commits into from
Dec 20, 2019

Conversation

jmanitz
Copy link
Contributor

@jmanitz jmanitz commented Oct 17, 2019

check fails because its not able to build the vignettes:

E  creating vignettes (18.5s)
   --- re-building 'extending-riskmetric.Rmd' using rmarkdown
   Quitting from lines 211-215 (extending-riskmetric.Rmd) 
   Error: processing vignette 'extending-riskmetric.Rmd' failed with diagnostics:
   non-numeric argument to binary operator
   --- failed re-building 'extending-riskmetric.Rmd'
   
   --- re-building 'riskmetric.Rmd' using rmarkdown
   Quitting from lines 154-157 (riskmetric.Rmd) 
   Error: processing vignette 'riskmetric.Rmd' failed with diagnostics:
   non-numeric argument to binary operator
   --- failed re-building 'riskmetric.Rmd'
   
   SUMMARY: processing the following files failed:
     'extending-riskmetric.Rmd' 'riskmetric.Rmd'
   
   Error: Vignette re-building failed.
   Execution halted
Error in (function (command = NULL, args = character(), error_on_status = TRUE,  : 
  System command error

Looking into it in more detail, I am not quite sure how to resolve the following error. The vignette with that code runs through if I run it directly from Rstudio. The specific lines where the vignette fails calls the following, not sure where the gzfile is called..

> pkg_ref(c("riskmetric", "utils", "tools")) %>% 
+   as_tibble() %>% assess()
# A tibble: 3 x 13
  package version pkg_ref             assess_family_r~ assess_news_cur~ news_current
  <chr>   <chr>   <list<pkg_ref>>     <list<pkg_metri> <list<pkg_metri> <list<pkg_m>
1 riskme~ 0.1.0.~ riskmetric<install> <error>          <lgl [0]>        <lgl [0]>   
2 utils   3.6.1   utils<install>      <error>          <lgl [0]>        <lgl [0]>   
3 tools   3.6.1   tools<install>      <error>          <lgl [0]>        <lgl [0]>   
# ... with 7 more variables: assess_export_help.pkg_install <list<pkg_metric>>,
#   assess_downloads_1yr.pkg_ref <list<pkg_metric>>,
#   assess_has_news.pkg_ref <list<pkg_metric>>, downloads_1yr <list<pkg_metric>>,
#   export_help <list<pkg_metric>>,
#   assess_news_current.pkg_remote <list<pkg_metric>>, has_news <list<pkg_metric>>
Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'C:/.../Rpkgs/riskmetric/help/aliases.rds', probable reason 'No such file or directory'

@dgkf dgkf self-requested a review October 17, 2019 21:52
@dgkf
Copy link
Collaborator

dgkf commented Oct 17, 2019

Thanks @jmanitz - off to a great start.

recap

paraphrasing our email earlier for posterity:

When running R CMD check, you're getting errors when building vignettes. Specifically when knitting L211-215 of extending-riskmetric.Rmd and L154-157 of riskmetric.Rmd.

It seems it might be related to a warning when running

pkg_ref(c("riskmetric", "utils", "tools")) %>% as_tibble() %>% assess()

which gives a "no such file or directory" warning when trying to read the aliases.rds file for the riskmetric package.

possible issues

aliases.rds missing

I think that aliases.rds issue, though something I'd like to solve, might be unrelated. This is looking for a documentation metadata file, so make sure that the settings for generating documentation are checked and that the file has been created before trying to knit the vignettes.

pkg_ref_cache return type

I think that the culprit is the return type of pkg_ref_cache.downloads_1yr, which currently returns a data.frame, yet in the score.pkg_metric_downloads_1yr function it is treated like a numeric. This will throw an error when trying to score this metric. I'll review in-line to suggest some changes that should fix this part.


If you continue being unable to knit the vignettes after making those changes, please follow up and we can explore further.

Copy link
Collaborator

@dgkf dgkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that changing the cached data type such that it will work with the scoring function should resolve the vignette problems (might still have that warning, though).

attr(assess_downloads_1yr,"label") <- "number of downloads in the past year"

pkg_ref_cache.downloads_1yr <- function(x, name, ...) {
cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) %>% group_by(package) %>% summarize(downloads_1yr = sum(count))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to cache the whole data.frame that is returned by cran_downloads since it might be useful for other download-related metrics - for example, if we have a "downloads_1yr" and "downloads_1mo", both can use the same data without downloading it twice.

pkg_ref_cache.downloads_1yr <- function(x, ...) {
  cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) 
}

Comment on lines 23 to 25
assess_downloads_1yr.pkg_ref <- function(x, ...) {
pkg_metric(x$downloads_1yr, class = "pkg_metric_downloads_1yr")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pkg_ref_cache.downloads_1yr returns the whole data.frame, we could consider the assessment as pulling the sum of the count column.

assess_downloads_1yr.pkg_ref <- function(x, ...) {
  downloads <- sum(x$downloads_1yr$count)
  pkg_metric(downloads, class = "pkg_metric_downloads_1yr")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not catching this earlier but fields in x will need to be named the same as the pkg_ref_cache method that is used (e.g. x$downloads when a pkg_ref_cache.downloads is defined)

pkg_ref_cache.downloads <- function(x, ...) {
  cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) 
}

#' @import cranlogs
#' @export
assess_downloads_1yr.pkg_ref <- function(x, ...) {
  pkg_metric(x$downloads, class = "pkg_metric_downloads_1yr")
}

I think this change should resolve the build error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry but I guess I have to ask very stupid questions to get the stuff into the right format.

If I understand correctly you want the downloads and download_1yr to be separated, which makes sense so that we could have other assess functions to get 6month, rates etc.
Does that mean you want the summary of counts (sum(x$downloads_1yr$count)) to happen later in the score function? I would have expected it in the assess function. Does that mean I need another assess function?

Copy link
Collaborator

@dgkf dgkf Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all - this is a tricky package, and all work-in-progress code is hard to come back to after time away from the project.

This is how the code looks now (jmanitz/riskmetric R/assess_downloads.r#L17-L25)

pkg_ref_cache.downloads <- function(x, ...) {
  cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) 
}


#' @import cranlogs
#' @export
assess_downloads_1yr.pkg_ref <- function(x, ...) {
  pkg_metric(x$downloads_1yr, class = "pkg_metric_downloads_1yr")
}

When a field in x is accessed (like x$downloads_1yr), it uses the function pkg_ref_cache to try to retrieve it based on the name. The field that you are trying to access has to match the defined pkg_ref_cache.* method.

The problem here is that we're accessing a field called "downloads_1yr", but have defined it as pkg_ref_cache.downloads. We just need to change x$downloads_1yr to x$downloads to match the pkg_ref_cache method and we should be good to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks so much for your reply. That part makes sense to me and is an easy fix. However, I think we also need to accumulate the counts in the assess function, right?

pkg_ref_cache.downloads <- function(x, ...) {
  cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) 
}


#' @import cranlogs
#' @export
assess_downloads_1yr.pkg_ref <- function(x, ...) {
  downloads_1yr <- sum(x$downloads$count)
  pkg_metric(downloads_1yr, class = "pkg_metric_downloads_1yr")
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right - sorry for the oversight.

Copy link
Contributor Author

@jmanitz jmanitz Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i committed the changes as discussed, but its still buggy for list_of_pkg_ref - can you help whats going wrong?

> pkg_ref(c("dplyr","ggplot2")) %>% assess_downloads_1yr()
Error in UseMethod("assess_downloads_1yr") : 
  no applicable method for 'assess_downloads_1yr' applied to an object of class "c('list_of_pkg_ref', 'vctrs_list_of', 'vctrs_vctr')"

dgkf
dgkf previously requested changes Nov 4, 2019
Copy link
Collaborator

@dgkf dgkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmanitz, sorry for the delay in reviewing the changes. I think that there is just one lingering issue that needs to be resolved before we pull in the code.

Comment on lines 23 to 25
assess_downloads_1yr.pkg_ref <- function(x, ...) {
pkg_metric(x$downloads_1yr, class = "pkg_metric_downloads_1yr")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not catching this earlier but fields in x will need to be named the same as the pkg_ref_cache method that is used (e.g. x$downloads when a pkg_ref_cache.downloads is defined)

pkg_ref_cache.downloads <- function(x, ...) {
  cran_downloads(x$name, from=Sys.Date()-365, to=Sys.Date()) 
}

#' @import cranlogs
#' @export
assess_downloads_1yr.pkg_ref <- function(x, ...) {
  pkg_metric(x$downloads, class = "pkg_metric_downloads_1yr")
}

I think this change should resolve the build error

@dgkf dgkf dismissed their stale review December 20, 2019 18:09

I'm going to go ahead and pull this in. I'll make any necessary changes to get it working.

@dgkf dgkf self-requested a review December 20, 2019 18:09
@dgkf dgkf merged commit 2bc0d0b into pharmaR:master Dec 20, 2019
@dgkf
Copy link
Collaborator

dgkf commented Dec 20, 2019

I must have missed the most recent commits. When I went to make any last changes I think everything was in good shape. Apologies for not recognizing that sooner.

Thanks again @emdserono for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants