Ensure gene expression values are numeric #7

Merged
merged 5 commits into from
Apr 4, 2022


Contributor

@envest envest commented Mar 28, 2022

I have implemented a solution in re: #5 (see recent comment). My motivation for this update is so we can use TDM with R version 4 in another project. Initially we were alarmed by much worse results coming from R-4, so we stuck with developing the project with R-3, but now we want to make results consistent across R versions. So after some digging...

The issue is rooted in changes from R version 3 to R version 4, especially in how data.matrix() works.

The R News entry for R 4.0.0 states:

data.matrix() now converts character columns to factors and from this to integers.

The same behavior is described in the data.matrix() documentation.

Previously, gene expression values were characters in many intermediate steps of TDM functions. This was fine in R-3 because they could be handled as numeric values in later steps. However, with the R-4 changes, those character values are converted into factors and then integers. The resulting integers are factor level codes rather than the original expression values, so the data are changed entirely (notice how the summary values in #5 are integers in the R-4 results).
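To make the difference concrete, here is a small sketch with toy data. The data frame, its column names, and the variable names are made up for illustration and are not taken from TDM. It compares data.matrix() on character columns with an explicit as.numeric() conversion of the same columns.

```r
# Toy expression table where the values were read in as character strings.
# All names here are hypothetical and only serve the illustration.
expr <- data.frame(
  sample_1 = c("10.5", "2.25", "7"),
  sample_2 = c("0.5", "100", "33.3"),
  row.names = c("gene_a", "gene_b", "gene_c"),
  stringsAsFactors = FALSE
)

# On R >= 4.0.0, each character column becomes a factor and then its integer
# level codes, so the matrix no longer holds the expression values.
via_data_matrix <- data.matrix(expr)

# Converting each column explicitly keeps the values that the strings encode.
explicit <- vapply(expr, as.numeric, numeric(nrow(expr)))
rownames(explicit) <- rownames(expr)

print(via_data_matrix)
print(explicit)
```

Running this under both R-3 and R-4 should show the two matrices agreeing in one case and disagreeing in the other, which is the inconsistency described above.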

My solution here is to explicitly ensure gene expression values are numeric in the intermediate steps of each TDM function. I've written a new function: ensure_numeric_gex() in R/tdm.R that takes in a data.table and returns a data.table with the gene column treated as.character() and the gene expression columns explicitly treated as.numeric().
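For readers who want a feel for the idea without opening R/tdm.R, the following is a minimal sketch of such a helper, not the code merged in this PR. It assumes the gene identifiers sit in the first column and every other column holds expression values; the name ensure_numeric_gex_sketch is hypothetical.

```r
library(data.table)

# Hypothetical helper, NOT the function from R/tdm.R.
# Assumption: column 1 holds gene identifiers, all other columns hold
# gene expression values.
ensure_numeric_gex_sketch <- function(input_data) {
  # Work on a copy so the caller's table is not modified by reference.
  dt <- data.table::copy(data.table::as.data.table(input_data))

  # Gene identifiers are kept as plain character strings.
  data.table::set(dt, j = 1L, value = as.character(dt[[1L]]))

  # Going through as.character() first means factor columns are parsed by
  # their labels rather than by their integer level codes.
  for (col in seq_along(dt)[-1L]) {
    data.table::set(dt, j = col, value = as.numeric(as.character(dt[[col]])))
  }

  dt
}

# Usage with toy data: character expression values come back as numeric.
toy <- data.table::data.table(
  gene = factor(c("gene_a", "gene_b")),
  sample_1 = c("10.5", "2.25"),
  sample_2 = c("3", "4")
)
fixed <- ensure_numeric_gex_sketch(toy)
str(fixed)
```

Calling a helper like this at the start of each transformation means later steps never see character or factor expression columns, whichever R version is in use.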

ensure_numeric_gex() is now called in transformation functions like inv_log_transform() to make sure the gene expression values are numeric at key stages. The gene name values are also treated as.character() as needed throughout.

The updated code has undergone toy data testing (see #5) as well as larger scale functional testing with our other project.
We hope these additions can continue to make TDM a useful package for many R versions to come! Thank you!

@envest envest mentioned this pull request Mar 28, 2022
@jeffreyat jeffreyat merged commit b041807 into greenelab:master Apr 4, 2022
@jeffreyat
Collaborator

Nice work - thank you!
