Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve obs_to_matrix with data.table #210

Open
wants to merge 12 commits into
base: devel
Choose a base branch
from
Open

Improve obs_to_matrix with data.table #210

wants to merge 12 commits into from

Conversation

andrewrech
Copy link

Replace reshape2::dcast with data.table::dcast for speed gain and to avoid issue with long sample names causing vector return error in reshape2 but not data.table:

Using reshape2:

obs_counts <- dcast(obj$obs_norm, target_id ~ sample, value.var = value_name)
Aggregation function missing: defaulting to length

Error during wrapup: dims [product 868328] do not match the length of object [41674225]

But no error using data.table.

obj_norm to reproduce

Linux i-094bb338984186ac0 4.15.0-1031-aws #33-Ubuntu SMP Fri Dec 7 09:32:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sleuth_0.30.0     data.table_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0            bindr_0.1.1           magrittr_1.5.0.9000
 [4] tidyselect_0.2.5.9000 munsell_0.5.0.9000    colorspace_1.4-0
 [7] R6_2.3.0              rlang_0.3.1.9000      dplyr_0.7.8
[10] tools_3.5.2           parallel_3.5.2        grid_3.5.2
[13] rhdf5_2.24.0          gtable_0.2.0          lazyeval_0.2.1.9000
[16] assertthat_0.2.0.9000 tibble_2.0.1          crayon_1.3.4
[19] bindrcpp_0.2.2        purrr_0.3.0           Rhdf5lib_1.2.1
[22] ggplot2_3.1.0.9000    glue_1.3.0.9000       compiler_3.5.2
[25] pillar_1.3.1          scales_1.0.0.9000     pkgconfig_2.0.2

warrenmcg and others added 6 commits June 27, 2018 14:23
Replace reshape2::dcast with data.table::dcadt for speed gain and to avoid issue with long sample names causing vector return error in reshape2 but not data.table:

Using reshape2:

```
 obs_counts <- dcast(obj$obs_norm, target_id ~ sample, value.var = value_name)
Aggregation function missing: defaulting to length
Error during wrapup: dims [product 868328] do not match the length of object [41674225]
```

But no error using data.table.

[`obj_norm`](https://s3.amazonaws.com/rech-ul/obj_norm.RDS?AWSAccessKeyId=AKIAI6SXE4VOIPIZJI6Q&Expires=1601162547&Signature=3lPIt6yZ2UPFhMioUwwRopzo8eM%3D) to reproduce:
@andrewrech
Copy link
Author

Additionally fixed use of data.table::dcast on data.frame (this function does not implicitly convert)

@warrenmcg
Copy link
Collaborator

Hi @andrewrech,

Thank you for your work here! Two suggestions:

  1. A cleaner way to handle issue Unstated namespace dependency on data.table::head #135 is to add an #' @importFrom utils head statement to the documentation for sleuth_gene_table (between lines 1092 and 1093 of sleuth.R here), as well as sleuth_to_matrix(between lines 32 and 33 of matrix.R here). Once you've made the change, open an R session in the sleuth directory and use devtools::document() to update the NAMESPACE file.
  2. There are three calls to reshape2 in the plot_transcript_heatmap function in plots.R. If you convert those to data.table::dcast and then add a line after the if statements to convert tab_df to a data.frame, we can remove a dependency to reshape2 altogether (just deleting it from the "depends" list in the DESCRIPTION file). You should also run devtools::document() to make sure references to reshape2 are gone from the NAMESPACE file.

In both cases, you can run devtools::check() to see if there are any other issues.

@warrenmcg warrenmcg changed the base branch from master to devel March 14, 2019 14:52
@warrenmcg
Copy link
Collaborator

Also, I've switched to the devel branch for merging, as it allows us to combine this with other changes for the next stable release of sleuth.

@andrewrech
Copy link
Author

@warrenmcg Thank you for your help

  • added importFrom
  • update NAMESPACE
  • check that no further namespace issues exist with devtools::check
  • convert plots.R reshape2 calls
  • remove reshape2 import from DESCRIPTION

then add a line after the if statements to convert tab_df to a data.frame

data.table::dcast returns a data frame if given a data frame; no need to convert back

data.table::dcast(test_df, time ~ variable, fun=mean) %>% str
'data.frame':   12 obs. of  2 variables:
 $ time  : num  0 2 4 6 8 10 12 14 16 18 ...
 $ weight: num  41.1 49.2 60 74.3 91.2 ...
  • confirm reshape2 is not required using devtools::check

Copy link
Collaborator

@warrenmcg warrenmcg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing reshape2::dcast to data.table::dcast means that tabd_df is now a data.table object rather than a data.frame. I think it would be best if there was a line after the if statements to convert tabd_df to a data.frame (at line 1049, for example).

@andrewrech
Copy link
Author

@warrenmcg see my note above

@warrenmcg warrenmcg dismissed their stale review March 16, 2019 18:23

I missed the note above

@warrenmcg
Copy link
Collaborator

Hi @andrewrech, you are correct. Sorry about that! We will be reviewing everything next week to merge and hopefully have a new release soon! Thanks again for your help and effort! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants