Ensure gene expression values are numeric #7
I have implemented a solution in re: #5 (see recent comment). My motivation for this update is so we can use TDM with R version 4 in another project. Initially we were alarmed by much worse results coming from R-4, so we stuck with developing the project with R-3, but now we want to make results consistent across R versions. So after some digging...
The issue is rooted in changes from R version 3 to R version 4, especially in how `data.matrix()` works. The change is described in R News, in particular in the updates under R 4.0.0, and also in the `data.matrix()` documentation.
Previously, gene expression values were characters in many intermediate steps of TDM functions. This was fine in R-3 because they could be handled as numeric values in later steps. However, with R-4 changes, those character values were converted into factors and then integers, entirely changing the values (notice how the summary values in #5 are integers in R-4 results).
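To make the failure mode concrete, here is a minimal sketch of the mechanism using made-up toy values (not taken from TDM or from #5). It compares converting character strings directly to numeric with converting them through a factor first, which is the route described above for R-4.

```r
# Hypothetical toy expression values stored as character strings.
gex_chr <- c("300", "10.5", "2.1")

# Direct conversion keeps the measured values.
as.numeric(gex_chr)
# [1] 300.0  10.5   2.1

# Going through a factor first returns the integer level codes instead.
# Levels are sorted as strings ("10.5" < "2.1" < "300"), so the codes are 3, 1, 2.
as.integer(factor(gex_chr))
# [1] 3 1 2

# On R >= 4.0.0, data.matrix() takes the factor route for character columns,
# so the matrix below is expected to hold level codes rather than the values.
toy <- data.frame(sample1 = gex_chr, stringsAsFactors = FALSE)
data.matrix(toy)
```

The integer codes carry only the alphabetical rank of each string, which is why the summaries change so drastically.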
My solution here is to explicitly ensure gene expression values are numeric in the intermediate steps of each TDM function. I've written a new function, `ensure_numeric_gex()` in `R/tdm.R`, that takes in a `data.table` and returns a `data.table` with the gene column treated `as.character()` and the gene expression columns explicitly treated `as.numeric()`. That `ensure_numeric_gex()` function now shows up in transformation functions like `inv_log_transform()` to make sure the gene expression values are numeric at key stages. The gene name values now also get treated `as.character()` as needed throughout.

The updated code has undergone toy data testing (see #5) as well as larger scale functional testing with our other project.
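For readers who want the idea without opening the diff, the following is a rough sketch of what a helper like `ensure_numeric_gex()` could look like. It is not the code in `R/tdm.R`: the name `ensure_numeric_gex_sketch()` is hypothetical, and it assumes the first column holds gene identifiers and every remaining column holds expression values.

```r
library(data.table)

# Hypothetical stand-in for ensure_numeric_gex(); NOT the implementation in R/tdm.R.
# Assumption: column 1 is the gene identifier, all other columns are expression values.
ensure_numeric_gex_sketch <- function(dt) {
  dt <- copy(as.data.table(dt))  # copy so the caller's table is not modified by reference
  gene_col <- names(dt)[1]
  gex_cols <- names(dt)[-1]

  # Gene identifiers become plain character strings.
  set(dt, j = gene_col, value = as.character(dt[[gene_col]]))

  # Expression columns become numeric. as.character() comes first so that a
  # factor column is converted by its labels, not by its integer level codes.
  for (col in gex_cols) {
    set(dt, j = col, value = as.numeric(as.character(dt[[col]])))
  }
  dt
}

# Usage with made-up toy data: a factor gene column and character expression columns.
toy_dt <- data.table(
  gene = factor(c("g1", "g2")),
  s1 = c("300", "10.5"),
  s2 = c("2.1", "7")
)
clean <- ensure_numeric_gex_sketch(toy_dt)
sapply(clean, class)
```

Calling a helper of this kind at the start and end of each transformation keeps the column types fixed regardless of how the R version in use treats character columns.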
We hope these additions can continue to make TDM a useful package for many R versions to come! Thank you!