diff --git a/.Rbuildignore b/.Rbuildignore new file mode 100644 index 0000000..91114bf --- /dev/null +++ b/.Rbuildignore @@ -0,0 +1,2 @@ +^.*\.Rproj$ +^\.Rproj\.user$ diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..5b6a065 --- /dev/null +++ b/.gitignore @@ -0,0 +1,4 @@ +.Rproj.user +.Rhistory +.RData +.Ruserdata diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml new file mode 100644 index 0000000..c3d6ebc --- /dev/null +++ b/.gitlab-ci.yml @@ -0,0 +1,22 @@ +image: continuumio/miniconda3:latest + +before_script: + - conda config --set always_yes yes --set changeps1 no + - conda update -n base conda + - conda info -a + - conda config --add channels defaults + - conda config --add channels bioconda + - conda config --add channels conda-forge + - conda install mamba + - mamba install boa + - cd .tests/ + - conda create -y -q -n beyondcell_test + +conda_build: + stage: build + only: + - master + - merge_requests + script: + - source activate beyondcell_test + - bash build_conda_package.sh ${PYTHON_VERSION} \ No newline at end of file diff --git a/.tests/build_conda_package.sh b/.tests/build_conda_package.sh new file mode 100644 index 0000000..44006e0 --- /dev/null +++ b/.tests/build_conda_package.sh @@ -0,0 +1,11 @@ + +if [ -z "$1" ] +then + PYTHON_VERSION=3.6 +else + PYTHON_VERSION=$1 +fi + +git clone https://gitlab.com/bu_cnio/conda-recipes +conda mambabuild conda-recipes/r-beyondcell --output-folder ./ +mamba install --use-local --update-deps linux-64/r-beyondcell-*.tar.bz2 diff --git a/DESCRIPTION b/DESCRIPTION index 1b2f1e3..c2cf9f2 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,7 +2,7 @@ Package: beyondcell Type: Package Title: A Tool for the Analysis of Tumour Therapeutic Heterogeneity in single-cell RNA-seq -Version: 0.99.6 +Version: 1.0.0 Authors@R: c(person("Coral", "Fustero-Torre", email = "cfustero@cnio.es", role = "aut"), person("Maria Jose", "Jimenez-Santos", email = "mjjimenez@cnio.es", role = c("aut", "cre")), person("Santiago", diff --git a/LICENSE b/LICENSE index 82c5387..6dc088e 100644 --- a/LICENSE +++ b/LICENSE @@ -1,53 +1,53 @@ License -The following license governs the use of Beyondcell in academic and educational -environments. Commercial use requires a commercial license from the Spanish -National Cancer Research Center (CNIO), www.cnio.es. +The following license governs the use of Beyondcell in academic and educational +environments. Commercial use requires a commercial license from the Spanish +National Cancer Research Centre (CNIO), www.cnio.es. ACADEMIC PUBLIC LICENSE -Copyright (C) 2021 Coral Fustero-Torre, María José Jiménez-Santos, -Santiago García-Martín, Carlos Carretero-Puche, Luis García-Jimeno, +Copyright (C) 2021 Coral Fustero-Torre, María José Jiménez-Santos, +Santiago García-Martín, Carlos Carretero-Puche, Luis García-Jimeno, Tomás Di Domenico, Gonzalo Gómez-López and Fátima Al-Shahrour Preamble -This license contains the terms and conditions of using Beyondcell in -noncommercial settings: at academic institutions for teaching and research use, -and at non-profit research organizations. You will find that this license -provides noncommercial users of Beyondcell with rights that are similar to the -well-known GNU General Public License, yet it retains the possibility for -Beyondcell authors to financially support the development by selling commercial -licenses. In fact, if you intend to use Beyondcell in a "for-profit" -environment, where research is conducted to develop or enhance a product, is -used in a commercial service offering, or when a commercial company uses -Beyondcell to participate in a research project (for example government-funded -or EU-funded research projects), then you need to obtain a commercial license -for Beyondcell. In that case, please contact the Author (falshahrour@cnio.es) +This license contains the terms and conditions of using Beyondcell in +noncommercial settings: at academic institutions for teaching and research use, +and at non-profit research organizations. You will find that this license +provides noncommercial users of Beyondcell with rights that are similar to the +well-known GNU General Public License, yet it retains the possibility for +Beyondcell authors to financially support the development by selling commercial +licenses. In fact, if you intend to use Beyondcell in a "for-profit" +environment, where research is conducted to develop or enhance a product, is +used in a commercial service offering, or when a commercial company uses +Beyondcell to participate in a research project (for example government-funded +or EU-funded research projects), then you need to obtain a commercial license +for Beyondcell. In that case, please contact the Author (falshahrour@cnio.es) to inquire about commercial licenses. -What are the rights given to noncommercial users? Similarly to GPL, you have -the right to use the software, to distribute copies, to receive source code, to -change the software and distribute your modifications or the modified software. -Also similarly to the GPL, if you distribute verbatim or modified copies of +What are the rights given to noncommercial users? Similarly to GPL, you have +the right to use the software, to distribute copies, to receive source code, to +change the software and distribute your modifications or the modified software. +Also similarly to the GPL, if you distribute verbatim or modified copies of this software, they must be distributed under this license. -By modeling the GPL, this license guarantees that you're safe when using -Beyondcell in your work, for teaching or research. This license guarantees that -Beyondcell will remain available free of charge for nonprofit use. You can -modify Beyondcell to your purposes, and you can also share your modifications. -Even in the unlikely case of the authors abandoning Beyondcell entirely, this -license permits anyone to continue developing it from the last release, and to +By modeling the GPL, this license guarantees that you're safe when using +Beyondcell in your work, for teaching or research. This license guarantees that +Beyondcell will remain available free of charge for nonprofit use. You can +modify Beyondcell to your purposes, and you can also share your modifications. +Even in the unlikely case of the authors abandoning Beyondcell entirely, this +license permits anyone to continue developing it from the last release, and to create further releases under this license. -We believe that the combination of noncommercial open-source and commercial -licensing will be beneficial for the whole user community, because income from -commercial licenses will enable faster development and a higher level of -software quality, while further enjoying the informal, open communication and +We believe that the combination of noncommercial open-source and commercial +licensing will be beneficial for the whole user community, because income from +commercial licenses will enable faster development and a higher level of +software quality, while further enjoying the informal, open communication and collaboration channels of open source development. -The precise terms and conditions for using, copying, distribution and +The precise terms and conditions for using, copying, distribution and modification follow. ACADEMIC PUBLIC LICENSE @@ -56,176 +56,176 @@ TERMS AND CONDITIONS FOR USE, COPYING, DISTRIBUTION AND MODIFICATION 0. Definitions -"Program" means a copy of Beyondcell, which is said to be distributed under +"Program" means a copy of Beyondcell, which is said to be distributed under this Academic Public License. -"Work based on the Program" means either the Program or any derivative work -under copyright law: that is to say, a work containing the Program or a portion -of it, either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in the term +"Work based on the Program" means either the Program or any derivative work +under copyright law: that is to say, a work containing the Program or a portion +of it, either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in the term "modification".) -"Using the Program" means any act of creating executables that contain or -directly use libraries that are part of the Program, running any of the tools +"Using the Program" means any act of creating executables that contain or +directly use libraries that are part of the Program, running any of the tools that are part of the Program, or creating works based on the Program. Each licensee is addressed as "you". -1. Permission is hereby granted to use the Program free of charge for any -noncommercial purpose, including teaching and research at universities, -colleges and other educational institutions, research at non-profit research -institutions, and personal non-profit purposes. For using the Program for -commercial purposes, including but not restricted to consulting activities, -design of commercial hardware or software networking products, and a commercial -entity participating in research projects, you have to contact the Author for -an appropriate license. Permission is also granted to use the Program for a -reasonably limited period of time for the purpose of evaluating its usefulness +1. Permission is hereby granted to use the Program free of charge for any +noncommercial purpose, including teaching and research at universities, +colleges and other educational institutions, research at non-profit research +institutions, and personal non-profit purposes. For using the Program for +commercial purposes, including but not restricted to consulting activities, +design of commercial hardware or software networking products, and a commercial +entity participating in research projects, you have to contact the Author for +an appropriate license. Permission is also granted to use the Program for a +reasonably limited period of time for the purpose of evaluating its usefulness for a particular purpose. -2. You may copy and distribute verbatim copies of the Program's source code as -you receive it, in any medium, provided that you conspicuously and -appropriately publish on each copy an appropriate copyright notice and -disclaimer of warranty; keep intact all the notices that refer to this License -and to the absence of any warranty; and give any other recipients of the +2. You may copy and distribute verbatim copies of the Program's source code as +you receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice and +disclaimer of warranty; keep intact all the notices that refer to this License +and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. -3. You may modify your copy or copies of the Program or any portion of it, thus -forming a work based on the Program, and copy and distribute such modifications -or work under the terms of Section 2 above, provided that you also meet all of +3. You may modify your copy or copies of the Program or any portion of it, thus +forming a work based on the Program, and copy and distribute such modifications +or work under the terms of Section 2 above, provided that you also meet all of these conditions: - a) You must cause the modified files to carry prominent notices stating + a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. - b) You must cause any work that you distribute or publish, that in whole or -in part contains or is derived from the Program or any part thereof, to be -licensed as a whole at no charge to all third parties under the terms of this + b) You must cause any work that you distribute or publish, that in whole or +in part contains or is derived from the Program or any part thereof, to be +licensed as a whole at no charge to all third parties under the terms of this License. -These requirements apply to the modified work as a whole. If identifiable -sections of that work are not derived from the Program, and can be reasonably -considered independent and separate works in themselves, then this License, and -its terms, do not apply to those sections when you distribute them as separate -works. But when you distribute the same sections as part of a whole which is a -work based on the Program, the distribution of the whole must be on the terms -of this License, whose regulations for other licensees extend to the entire -whole, and thus to each and every part regardless of who wrote it. (If the -same, independent sections are distributed as part of a package that is -otherwise reliant on, or is based on the Program, then the distribution of the -whole package, including but not restricted to the independent section, must be -on the unmodified terms of this License, regardless of who the author of the +These requirements apply to the modified work as a whole. If identifiable +sections of that work are not derived from the Program, and can be reasonably +considered independent and separate works in themselves, then this License, and +its terms, do not apply to those sections when you distribute them as separate +works. But when you distribute the same sections as part of a whole which is a +work based on the Program, the distribution of the whole must be on the terms +of this License, whose regulations for other licensees extend to the entire +whole, and thus to each and every part regardless of who wrote it. (If the +same, independent sections are distributed as part of a package that is +otherwise reliant on, or is based on the Program, then the distribution of the +whole package, including but not restricted to the independent section, must be +on the unmodified terms of this License, regardless of who the author of the included sections was.) -Thus, it is not the intent of this section to claim rights or contest your -rights to work written entirely by you; rather, the intent is to exercise the -right to control the distribution of derivative or collective works based or +Thus, it is not the intent of this section to claim rights or contest your +rights to work written entirely by you; rather, the intent is to exercise the +right to control the distribution of derivative or collective works based or reliant on the Program. -In addition, mere aggregation of another work not based on the Program with the -Program (or with a work based on the Program) on a volume of storage or -distribution medium does not bring the other work under the scope of this +In addition, mere aggregation of another work not based on the Program with the +Program (or with a work based on the Program) on a volume of storage or +distribution medium does not bring the other work under the scope of this License. -4. You may copy and distribute the Program (or a work based on it, under -Section 3) in object code or executable form under the terms of Sections 2 and +4. You may copy and distribute the Program (or a work based on it, under +Section 3) in object code or executable form under the terms of Sections 2 and 3 above provided that you also do one of the following: - a) Accompany it with the complete corresponding machine-readable source -code, which must be distributed under the terms of Sections 2 and 3 above on a + a) Accompany it with the complete corresponding machine-readable source +code, which must be distributed under the terms of Sections 2 and 3 above on a medium customarily used for software interchange; or, - b) Accompany it with a written offer, valid for at least three years, to -give any third party, for a charge no more than your cost of physically -performing source distribution, a complete machine-readable copy of the -corresponding source code, to be distributed under the terms of Sections 2 and + b) Accompany it with a written offer, valid for at least three years, to +give any third party, for a charge no more than your cost of physically +performing source distribution, a complete machine-readable copy of the +corresponding source code, to be distributed under the terms of Sections 2 and 3 above on a medium customarily used for software interchange; or, - c) Accompany it with the information you received as to the offer to -distribute corresponding source code. (This alternative is allowed only for -noncommercial distribution and only if you received the program in object code + c) Accompany it with the information you received as to the offer to +distribute corresponding source code. (This alternative is allowed only for +noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b) above.) -The source code for a work means the preferred form of the work for making -modifications to it. For an executable work, complete source code means all the -source code for all modules it contains, plus any associated interface -definition files, plus the scripts used to control compilation and installation -of the executable. However, as a special exception, the source code distributed -need not include anything that is normally distributed (in either source or -binary form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component itself +The source code for a work means the preferred form of the work for making +modifications to it. For an executable work, complete source code means all the +source code for all modules it contains, plus any associated interface +definition files, plus the scripts used to control compilation and installation +of the executable. However, as a special exception, the source code distributed +need not include anything that is normally distributed (in either source or +binary form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component itself accompanies the executable. -If distribution of executable or object code is made by offering access to copy -from a designated place, then offering equivalent access to copy the source -code from the same place counts as distribution of the source code, even though +If distribution of executable or object code is made by offering access to copy +from a designated place, then offering equivalent access to copy the source +code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. -5. You may not copy, modify, sublicense, or distribute the Program except as -expressly provided under this License. Any attempt otherwise to copy, modify, -sublicense or distribute the Program is void, and will automatically terminate -your rights under this License. However, parties who have received copies, or -rights, from you under this License will not have their licenses terminated so +5. You may not copy, modify, sublicense, or distribute the Program except as +expressly provided under this License. Any attempt otherwise to copy, modify, +sublicense or distribute the Program is void, and will automatically terminate +your rights under this License. However, parties who have received copies, or +rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. -6. You are not required to accept this License, since you have not signed it. -Nothing else grants you permission to modify or distribute the Program or its -derivative works; law prohibits these actions if you do not accept this -License. Therefore, by modifying or distributing the Program (or any work based -on the Program), you indicate your acceptance of this License and all its terms -and conditions for copying, distributing or modifying the Program or works +6. You are not required to accept this License, since you have not signed it. +Nothing else grants you permission to modify or distribute the Program or its +derivative works; law prohibits these actions if you do not accept this +License. Therefore, by modifying or distributing the Program (or any work based +on the Program), you indicate your acceptance of this License and all its terms +and conditions for copying, distributing or modifying the Program or works based on it, to do so. -7. Each time you redistribute the Program (or any work based on the Program), -the recipient automatically receives a license from the original licensor to -copy, distribute or modify the Program subject to these terms and conditions. -You may not impose any further restrictions on the recipients' exercise of the -rights granted herein. You are not responsible for enforcing compliance by +7. Each time you redistribute the Program (or any work based on the Program), +the recipient automatically receives a license from the original licensor to +copy, distribute or modify the Program subject to these terms and conditions. +You may not impose any further restrictions on the recipients' exercise of the +rights granted herein. You are not responsible for enforcing compliance by third parties to this License. -8. If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), conditions -are imposed on you (whether by court order, agreement or otherwise) that -contradict the conditions of this License, they do not excuse you from the -conditions of this License. If you cannot distribute so as to satisfy -simultaneously your obligations under this License and any other pertinent -obligations, then as a consequence you may not distribute the Program at all. -For example, if a patent license would not permit royalty-free redistribution -of the Program by all those who receive copies directly or indirectly through -you, then the only way you could satisfy both it and this License would be to +8. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), conditions +are imposed on you (whether by court order, agreement or otherwise) that +contradict the conditions of this License, they do not excuse you from the +conditions of this License. If you cannot distribute so as to satisfy +simultaneously your obligations under this License and any other pertinent +obligations, then as a consequence you may not distribute the Program at all. +For example, if a patent license would not permit royalty-free redistribution +of the Program by all those who receive copies directly or indirectly through +you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. -If any portion of this section is held invalid or unenforceable under any -particular circumstance, the balance of the section is intended to apply and +If any portion of this section is held invalid or unenforceable under any +particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. -9. If the distribution and/or use of the Program are restricted in certain -countries either by patents or by copyrighted interfaces, the original -copyright holder who places the Program under this License may add an explicit -geographical distribution limitation excluding those countries, so that -distribution is permitted only in or among countries not thus excluded. In such -case, this License incorporates the limitation as if written in the body of +9. If the distribution and/or use of the Program are restricted in certain +countries either by patents or by copyrighted interfaces, the original +copyright holder who places the Program under this License may add an explicit +geographical distribution limitation excluding those countries, so that +distribution is permitted only in or among countries not thus excluded. In such +case, this License incorporates the limitation as if written in the body of this License. NO WARRANTY -10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR -THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE -STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE -PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, -INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND -FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND -PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU +10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR +THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE +STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE +PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, +INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND +FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND +PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. -11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED ON IN WRITING WILL -ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE -PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY -GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR -INABILITY TO USE THE PROGRAM INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA -BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A -FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER +11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED ON IN WRITING WILL +ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE +PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR +INABILITY TO USE THE PROGRAM INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA +BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A +FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS -NOTE In case the above text differs from the license file in the source +NOTE In case the above text differs from the license file in the source distribution, the latter is the valid one. diff --git a/NAMESPACE b/NAMESPACE index ad46796..1d85627 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,17 +1,17 @@ # Generated by roxygen2: do not edit by hand export(BreakString) +export(CaseFraction) export(CreatebcObject) -export(FilteredIDS) export(FindDrugs) -export(GeneCase) export(GenerateGenesets) +export(GetIDs) export(GetStatistics) export(ListFilters) export(Mean.Med.SD) export(SwitchPoint) export(bc4Squares) -export(bcAddMetatada) +export(bcAddMetadata) export(bcCellCycle) export(bcClusters) export(bcHistogram) @@ -25,7 +25,7 @@ export(bcSubset) export(bcUMAP) export(center_scale_colour_stepsn) export(colMinus) -export(get_colour_stepsn) +export(get_colour_steps) export(minus) export(rankSigs) import(Seurat) diff --git a/R/Basics.R b/R/Basics.R index c50c75e..0057225 100644 --- a/R/Basics.R +++ b/R/Basics.R @@ -1,9 +1,9 @@ #' @title Substracts to the first element of a vector the rest of elements -#' @description This function substracts to the first element of a numerical +#' @description This function substracts to the first element of a numeric #' vector (\code{x[1]}) the rest of elements of the same vector. #' (\code{x[2:length(x)]}). #' @name minus -#' @param x Numerical vector +#' @param x Numeric vector. #' @param na.rm (From \code{base}) Logical. Should missing values (including #' \code{NaN}) be removed? #' @return The result of the substraction. @@ -26,14 +26,14 @@ minus <- function(x, na.rm = FALSE) { return(out) } -#' @title Substracts to the first row of a rectangular object the rest of rows -#' @description This function substracts to the first row of a numerical -#' rectangular object (\code{x[1, ]}) the rest of rows of the same rectangular -#' object (\code{x[2:length(x), ]}). +#' @title Computes the column substraction +#' @description This function substracts to the first element of each column of +#' a rectangular object (\code{x[1, n]}) the rest of elements of the same column +#' (\code{x[2:length(x), n]}). #' @name colMinus -#' @param x A matrix or a data frame. +#' @param x A matrix or a dataframe. #' @param na.rm (From \code{base}) Logical. Should missing values (including -#' \code{NaN}) be omitted from the calculations? +#' \code{NaN}) from rows \code{2:length(x)} be omitted from the calculations? #' @return A numeric rectangular object with the result of the substraction. #' @examples #' @export @@ -50,20 +50,20 @@ colMinus <- function(x, na.rm = FALSE) { } # --- Code --- # If x has a single row, append a row of zeros so we can run the next step. - if (dim(x)[1] == 1) x <- rbind(x, rep(0, ncol(x))) - # If na.rm == TRUE, remove NAs from the first.row. - if (na.rm) first.row <- na.omit(x[1, , drop = FALSE]) - else first.row <- x[1, , drop = FALSE] + if (dim(x)[1] == 1) x <- rbind(x, rep(0, times = ncol(x))) # Substract to the first row of x the rest of rows. + first.row <- x[1, , drop = FALSE] out <- first.row - colSums(x[2:nrow(x), , drop = FALSE], na.rm = na.rm) return(out) } #' @title Computes the mean, the median and the sd of a vector #' @description This function computes the mean, the median and the sd of a -#' vector removing \code{NA} values. +#' vector. #' @name Mean.Med.SD -#' @param x A numeric vector for which to compute the statistics. +#' @param x Numeric vector. +#' @param na.rm (From base) Logical. Should missing values (including NaN) be +#' removed? #' @return A named numeric vector with the mean, median and sd of \code{x}. #' @examples #' @export @@ -82,23 +82,23 @@ Mean.Med.SD <- function(x) { c("mean", "median", "sd"))) } -#' @title Computes bcRanks' statistics and ranks -#' @description This function computes the bcscores' statistics and ranks -#' returned by \code{\link[bcRnaks]{bcRanks}}. +#' @title Computes the BCS' statistics and ranks +#' @description This function computes the beyondcell scores' (BCS) statistics +#' and ranks returned by \code{\link[beyondcell]{bcRanks}}. #' @name GetStatistics #' @param bc \code{\link[beyondcell]{beyondcell}} object. #' @param signatures Vector with the names of the signatures of interest. #' @param cells Vector with the names of the cells of interest. -#' @param pb Progress bar. -#' @param total Number of iterations to complete the progress bar. -#' @param i Iteration number; used for increasing the progress bar. -#' @param n.rows Number of signatures; used for increasing the progress bar. +#' @param pb \code{\link[utils]{txtProgressBar}}. +#' @param total Number of iterations to complete the \code{pb}. +#' @param i Iteration number. Used to increase the \code{pb}. +#' @param n.rows Number of signatures. Used to increase the \code{pb}. #' @param extended If \code{extended = TRUE}, this function returns the switch #' point, mean, median, sd, variance, min, max, proportion of \code{NaN} and #' residuals' mean per signature. If \code{extended = FALSE}, this function #' returns only the switch point, mean and residuals' mean. #' -#' @return A \code{data.frame} with the statistics and ranks per signature. +#' @return A \code{data.frame} with the BCS' statistics and ranks. #' @examples #' @export @@ -108,7 +108,7 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check signatures. - in.signatures <- signatures %in% rownames(bc@normalized) + in.signatures <- !is.null(signatures) & signatures %in% rownames(bc@normalized) if (all(!in.signatures)) { stop('None of the specified signatures were found.') } else if (any(!in.signatures)) { @@ -116,7 +116,7 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, paste0(signatures[!in.signatures], collapse = ", "), '.')) } # Check cells. - in.cells <- cells %in% colnames(bc@normalized) + in.cells <- !is.null(cells) & cells %in% colnames(bc@normalized) if (all(!in.cells)) { stop('None of the specified cells were found.') } else if (any(!in.cells)) { @@ -154,11 +154,11 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, data <- cbind(seq_len(n.rows) + (n.rows * 6) * (i - 1), bc@data[signatures, cells]) mean.med.sd <- as.data.frame(t(apply(data, 1, function(u) { - mms <- round(Mean.Med.SD(u[-1]), 2) + mms <- round(Mean.Med.SD(u[-1]), digits = 2) ### Update the progress bar. if (u[1]%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, u[1]) + setTxtProgressBar(pb, value = u[1]) } return(mms) }))) @@ -169,40 +169,40 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, ### Update the progress bar. if (v[1]%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, v[1]) + setTxtProgressBar(pb, value = v[1]) } return(variance) }) - # Min normalized bcscore per signature. + # Min normalized BCS per signature. data[, 1] <- data[, 1] + n.rows min.bcscore <- apply(data, 1, function(w) { min.bcs <- min(w[-1], na.rm = TRUE) ### Update the progress bar. if (w[1]%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, w[1]) + setTxtProgressBar(pb, value = w[1]) } return(min.bcs) }) - # Max normalized bcscore per signature. + # Max normalized BCS per signature. data[, 1] <- data[, 1] + n.rows max.bcscore <- apply(data, 1, function(x) { max.bcs <- max(x[-1], na.rm = TRUE) ### Update the progress bar. if (x[1]%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, x[1]) + setTxtProgressBar(pb, value = x[1]) } return(max.bcs) }) # NA proportion per signature. data[, 1] <- data[, 1] + n.rows prop.na <- apply(data, 1, function(y) { - nas <- round(sum(is.na(y[-1]))/length(y[-1]), 2) + nas <- round(sum(is.na(y[-1]))/length(y[-1]), digits = 2) ### Update the progress bar. if (y[1]%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, y[1]) + setTxtProgressBar(pb, value = y[1]) } return(nas) }) @@ -216,8 +216,9 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, max = max.bcscore, prop.na = prop.na, row.names = signatures) } else { - # Mean bcscores per signature. - mean.bc <- round(rowMeans(bc@data[signatures, cells], na.rm = TRUE), 2) + # Mean BCS per signature. + mean.bc <- round(rowMeans(bc@data[signatures, cells], na.rm = TRUE), + digits = 2) # Residuals. normalized <- cbind(seq_len(n.rows) + (n.rows * (i - 1)), bc@normalized[signatures, cells]) @@ -225,17 +226,168 @@ GetStatistics <- function(bc, signatures, cells, pb, total, i, n.rows, stats <- data.frame(switch.point = switch.p, mean = mean.bc, row.names = signatures) } - # Regression residuals. + # Residuals' mean. resid <- apply(normalized, 1, function(z) { - res <- round(mean(z[-1], na.rm = TRUE), 2) + res <- round(mean(z[-1], na.rm = TRUE), digits = 2) ### Update the progress bar. if (z[1]%%bins == 0 | z[1] == total) { Sys.sleep(0.1) - setTxtProgressBar(pb, z[1]) + setTxtProgressBar(pb, value = z[1]) } return(res) }) # Update dataframe. - stats <- cbind(stats, data.frame(residuals = resid, row.names = signatures)) + stats <- cbind(stats, data.frame(residuals.mean = resid, row.names = signatures)) return(stats) } + +#' @title Returns the IDs that match the specified filter's values +#' @description This function subsets \code{df} to select only the entries that +#' match the specified \code{filter}'s \code{values} and returns the +#' corresponding IDs. +#' @name GetIDs +#' @param values User-supplied filtering vector. +#' @param filter Column name or number to subset by. +#' @param df \code{data.frame} with drug information. It must contain, at least, +#' two columns: \code{"IDs"} and \code{filter}. +#' @return A vector with the IDs that match the \code{filter}'s values. +#' @export + +GetIDs <- function(values, filter, df = drugInfo) { + # --- Checks --- + # Check values. + if (length(values) < 1 | !is.character(values)) { + stop('values must be a character vector.') + } + # Check filter. + if (length(filter) != 1) { + stop('You must specify a single filter.') + } + if (is.character(filter) & !(filter %in% colnames(df))) { + stop(paste('filter =', filter, 'is not a column of df.')) + } + if (is.numeric(filter) & (filter < 1 | filter > ncol(df))) { + stop(paste('filter = ', filter, 'is out of range.')) + } + # Check df. + if (class(df) != "data.frame") { + stop('df must be a data.frame') + } + if (!("IDs" %in% colnames(df))) { + stop('df must contain an "IDs" column.') + } + # --- Code --- + upper.values <- toupper(values) + selected <- subset(df, subset = toupper(df[[filter]]) %in% upper.values) + if (filter == "drugs" & "preferred.drug.names" %in% colnames(df)) { + synonyms <- subset(df, subset = toupper(df[["preferred.drug.names"]]) %in% + unique(toupper(selected[["preferred.drug.names"]]))) + selected <- unique(rbind(selected, synonyms)) + } + ids <- unique(selected$IDs) + not.found <- values[!(upper.values %in% toupper(df[[filter]]))] + if (all(values %in% not.found)) { + stop('No sig ID was found for any of the elements in values.') + } else if (length(not.found) > 0) { + filtername <- gsub(pattern = '"', replacement = '', + x = deparse(substitute(filter))) + warning(paste0('sig IDs were not found for ', length(not.found), ' out of ', + length(values), " ", filtername, ': ', + paste0(not.found, collapse = ", "), ".")) + } + return(ids) +} + +#' @title Returns a dataframe with information about the input drugs +#' @description This function searches the input drugs in the pre-loaded +#' \code{beyondcell} matrices and returns a dataframe with drug information, +#' including drug synonyms and MoAs. +#' @name FindDrugs +#' @param bc \code{\link[beyondcell]{beyondcell}} object. +#' @param x A character vector with drug names and/or sig IDs. +#' @param na.rm Logical. Should \code{x} entries with no available information +#' be removed from the final output? +#' @details The output \code{data.frame} has the following columns: +#' \itemize{ +#' \item{\code{original.names}}: Input drug names. +#' \item{\code{bc.names}}: Drug names used in \code{bc}. +#' \item{\code{preferred.drug.names}}: Standard drug names. +#' \item{\code{drugs}}: Other drug names. +#' \item{\code{IDs}}: Signature IDs. +#' \item{\code{preferred.and.sigs}}: \code{preferred.drug.names} (or alternatively +#' \code{bc.names}) and \code{IDs}. Used as title in \code{beyondcell} plots. +#' \item{\code{MoAs}}: Mechanism(s) of action. +#' } +#' @return A \code{data.frame}. +#' @examples +#' @export + +FindDrugs <- function(bc, x, na.rm = TRUE) { + # --- Checks --- + # Check that bc is a beyondcell object. + if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') + # Check x. + if (!is.character(x)) stop('x must be a character vector.') + # Check na.rm. + if (length(na.rm) != 1 | !is.logical(na.rm)) { + stop('na.rm must be TRUE or FALSE.') + } + # --- Code --- + # bc signatures. + sigs <- rownames(bc@normalized) + # Match x with bc signatures and get the indices of matching elements. + indices <- lapply(x, function(y) { + idx <- match(toupper(y), table = toupper(sigs), nomatch = 0) + if (idx == 0) { + idx <- unique(match(drugInfo$IDs[drugInfo$drugs == toupper(y)], + table = sigs)) + } + return(idx[!is.na(idx)]) + }) + # Original names (x) and bc names (sigs). + df <- data.frame(original.names = unlist(sapply(seq_along(x), function(i) { + rep(x[i], times = length(indices[[i]])) + })), IDs = unlist(sapply(indices, function(z) sigs[z]))) + df.not.found <- !(x %in% df$original.names) + if (any (df.not.found)) { + empty.df <- data.frame(original.names = x[df.not.found], + IDs = rep(NA, sum(df.not.found))) + df <- rbind(df, empty.df) + } + # Get the names and pathways of the selected signatures. + info <- subset(drugInfo, subset = IDs %in% df$IDs) + if (all(dim(info) != 0)) { + info <- aggregate(.~ IDs, data = info, na.action = NULL, FUN = function(w) { + paste(na.omit(unique(w)), collapse = ", ") + }) + } + info.not.found <- !(df$IDs %in% drugInfo$IDs) + if (any(info.not.found)) { + empty.info <- matrix(rep(NA, times = sum(info.not.found)*6), ncol = 6, + dimnames = list(1:sum(info.not.found), + colnames(drugInfo))) + info <- rbind(info, as.data.frame(empty.info)) + } + # Merge df and info. + df <- unique(merge(df, info[, c("IDs", "drugs", "preferred.drug.names", + "MoAs")], by = "IDs", all.x = TRUE)) + # Add bc.names column and remove names that are not sig IDs from sig_id + # column. + df$bc.names <- df$IDs + df$IDs[!startsWith(df$IDs, prefix = "sig_")] <- NA + # Create preferred.and.sigs column: Preferred name and sig_id. + df$preferred.and.sigs <- sapply(1:nrow(df), function(j) { + return(ifelse(test = !is.na(df$preferred.drug.names[j]), + yes = paste0(df$preferred.drug.names[j], + paste0(" (", df$IDs[j], ")")), + no = df$bc.names[j])) + }) + # Reorder df. + rows <- unlist(lapply(x, function(entry) which(df$original.names == entry))) + cols <- c("original.names", "bc.names", "preferred.drug.names", "drugs", "IDs", + "preferred.and.sigs", "MoAs") + df <- df[rows, cols] + # If na.rm = TRUE, remove rows with NAs in all fields but original.names. + if (na.rm) df <- df[rowSums(is.na(df[, -1])) != ncol(df) - 1, ] + return(df) +} diff --git a/R/Format.R b/R/Format.R index 22b4f6f..a42705a 100644 --- a/R/Format.R +++ b/R/Format.R @@ -1,18 +1,18 @@ -#' @title Returns a vector of colors to use in \code{center_scale_colour_stepsn} -#' @description This function gets a color palette and returns a vector with 5 -#' color values to be used in -#' \code{\link[center_scale_colour_stepsn]{center_scale_colour_stepsn}}. -#' @name get_colour_stepsn +#' @title Returns a vector of 5 colours +#' @description This function gets a colour palette and returns a vector with 5 +#' colour values to be used in +#' \code{\link[beyondcell]{center_scale_colour_stepsn}}. +#' @name get_colour_steps #' @importFrom viridis viridis #' @importrom RColorBrewer brewer.pal #' @param colorscale Either a \code{viridis}, \code{RColorBrewer} or a custom -#' palette of 3 colors (low, medium and high). If \code{colorscale = NULL} +#' palette of 3 colours (low, medium and high). If \code{colorscale = NULL} #' (default), the function returns \code{beyondcell}'s own palette. -#' @return A vector with 5 color values. +#' @return A vector with 5 colour values. #' @examples #' @export -get_colour_stepsn <- function(colorscale = NULL) { +get_colour_steps <- function(colorscale = NULL) { # --- Checks and Code --- default <- c("#1D61F2", "#98B9FF", "#F7F7F7", "#FF9CBB", "#DA0078") # Check colorscale and get its value. @@ -26,10 +26,10 @@ get_colour_stepsn <- function(colorscale = NULL) { warning = function(cond) cond) if (inherits(guess.colors, "error") | inherits(guess.colors, "warning")) { ### Is colorscale an RColorBrewer palette? - guess.colors <- tryCatch(RColorBrewer::brewer.pal(11, colorscale), - error = function(cond) cond, - warning = function(cond) cond) - if (inherits(guess.colors, "error") | inherits(guess.colors, "warning")) { + guess.colors <- tryCatch(suppressWarnings( + RColorBrewer::brewer.pal(12, name = colorscale)), + error = function(cond) cond) + if (inherits(guess.colors, "error")) { ### Is colorscale any other palette? guess.colors <- tryCatch(scale_colour_stepsn(colours = colorscale), error = function(cond) cond) @@ -41,23 +41,23 @@ get_colour_stepsn <- function(colorscale = NULL) { ### If colorscale contains less than 3 values, set default colorscale. len.colors <- length(colorscale) if (len.colors < 3) { - warning(paste('Colorscale too short. It must contain 3 colors:', + warning(paste('Colorscale too short. It must contain 3 colours:', 'high, medium and low. Settig default colorscale...')) colors <- default } else { - ### Else, construct a scale with 5 colors: the first, middle and - ### last values in colorscale and 2 intermediate colors between them - ### (color.low and color.high, computed with colorRampPalette). + ### Else, construct a scale with 5 colours: the first, middle and + ### last values in colorscale and 2 intermediate colours between + ### them (color.low and color.high, computed with colorRampPalette). color.middle <- colorscale[ceiling(len.colors/2)] - color.low <- colorRampPalette(c(colorscale[1], color.middle), + color.low <- colorRampPalette(colors = c(colorscale[1], color.middle), space = "Lab")(3)[2] - color.high <- colorRampPalette(c(colorscale[3], color.middle), + color.high <- colorRampPalette(colors = c(colorscale[3], color.middle), space = "Lab")(3)[2] colors <- c(colorscale[1], color.low, color.middle, color.high, colorscale[len.colors]) if (len.colors > 3) { - warning(paste('Custom colorscale too long. It must contain 3', - 'colors: high, medium and low. Colors chosen:', + warning(paste('Colorscale too long. It must contain 3 colours:', + 'high, medium and low. Colours chosen:', paste0(colors[c(1, 3, 5)], collapse = ", "))) } } @@ -65,62 +65,73 @@ get_colour_stepsn <- function(colorscale = NULL) { ### If colorscale is an RColorBrewer palette, subset 5 values to create ### the final palette. } else { - colors <- c(guess.colors[c(1, 5, 6, 7, 11)]) + len.guess <- length(guess.colors) + idx.middle <- ceiling(len.guess/2) + colors <- guess.colors[c(1, idx.middle - 1, idx.middle, + idx.middle + 1, len.guess)] } ### If colorscale is a viridis palette, subset 5 values to create the final ### palette. } else { - colors <- c(guess.colors[c(1, 5, 6, 7, 11)]) + colors <- guess.colors[c(1, 5, 6, 7, 11)] } } return(colors) } -#' @title Creates a centered diverging binned colour gradient -#' @description This function creates a diverging binned colour gradient -#' (low-mid-high) centered around \code{center}. +#' @title Creates a centred sequential binned colour gradient +#' @description This function creates a sequential binned colour gradient +#' (low-mid-high) centred around \code{center}. #' @name center_scale_colour_stepsn #' @import ggplot2 #' @import scales -#' @param x A numeric vector. Can contain \code{NA}s. +#' @param x A numeric vector. It can contain \code{NA}s. +#' @param colorscale A vector with 5 colours that can be obtained using +#' \code{\link[beyondcell]{get_colour_steps}}. +#' @param alpha Transparency level between 1 (not transparent) and 0 (fully +#' transparent). +#' @param na.value Colour to use for missing values. #' @param limits Vector with the desired limits. -#' @param center A single number indicating the center of the \code{colorscale}. -#' Alternatively, the \code{center} can be a vector of two numbers. In this -#' case, the \code{center} of the \code{colorscale} is the middle point between -#' these two numbers and the \code{breaks} are computed using the difference -#' between them. If \code{center = NULL} (default), the center is set to the -#' middle point of \code{x}. +#' @param center A single number indicating the centre of the \code{colorscale}. +#' If \code{center = NULL} (default), the centre is set to the middle point of +#' \code{x}. #' @param breaks A single number indicating the break size of the #' \code{colorscale}. Alternatively, it can be a vector with the desired breaks -#' (which don't have to be symmetric or equally distributed). If \code{center} -#' is a vector of two numbers, \code{breaks} are computed using the difference -#' between them. -#' @param colorscale A vector with 5 colors which can be obtained using -#' \code{\link[get_colour_stepsn]{get_colour_stepsn}}. -#' @param alpha Transparency level between 0 (not transparent) and 1 (fully -#' transparent). -#' @param na.value Color to use for missing values. -#' @return A centered diverging binned colour gradient that can be use to color -#' \code{\link[ggplot2]{ggplot2}} objects. +#' (which don't have to be symmetric or equally distributed). +#' @return A centred sequential binned colour gradient that can be used to +#' colour \code{\link[ggplot2]{ggplot2}} objects. #' @examples #' @export -center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, - breaks = 0.1, colorscale = NULL, - alpha = 0.7, na.value = "grey50") { +center_scale_colour_stepsn <- function(x, colorscale, alpha = 0.7, + na.value = "grey50", limits = c(NA, NA), + center = NULL, breaks = 0.1) { # --- Checks --- # Check x. if (!is.numeric(x)) { stop('x must be a numeric vector.') } range.values <- pretty(x) + # Check colorscale. + if (length(colorscale) != 5 | + !tryCatch(is.matrix(col2rgb(colorscale)), error = function(cond) FALSE)) { + stop('colorscale must contain exactly 5 colours.') + } + # Check alpha. + if (length(alpha) != 1 | alpha[1] < 0 | alpha[1] > 1) { + stop('alpha must be a positive number between 0 and 1.') + } + # Check na.value. + if (!tryCatch(is.matrix(col2rgb(na.value)), error = function(cond) FALSE)) { + stop('na.value is not a colour.') + } # Check limits. if (length(limits) != 2) { - stop('Limits must be a vector of length 2.') + stop('limits must be a vector of length 2.') } na.limits <- is.na(limits) if (length(limits[!na.limits]) > 0 & !is.numeric(limits[!na.limits])) { - stop('Limits must be numeric or NAs.') + stop('limits must be numeric or NAs.') } # If some limits are NAs, compute them. if (any(na.limits)) { @@ -130,17 +141,16 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, if (limits[2] < limits[1]) { warning(paste('Upper limit is smaller than lower limit.', 'Sorting limits in increasing order.')) - limits <- sort(limits) + limits <- sort(limits, decreasing = FALSE) } # Check center. - len.center <- length(center) if (!is.null(center)) { - if (len.center < 1 | len.center > 2 | !is.numeric(center)) { - stop('Center must be a vector of 1 or 2 numbers.') + if (length(center)!= 1| !is.numeric(center)) { + stop('center must be a single number.') } - if (center[1] < limits[1] | center[len.center] > limits[2]) { - stop(paste('center =', paste0(center, collapse = ", "), - 'outside of limits =', paste0(limits, collapse = ", "))) + if (center < limits[1] | center > limits[2]) { + stop(paste('center =', center, 'outside of limits =', + paste0(limits, collapse = ", "))) } # If center = NULL, set center to middle point in range.values. } else { @@ -148,58 +158,28 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, ### If len.range is odd, get the middle point. if (len.range%%2 == 1) { center <- range.values[ceiling(len.range/2)] - ### If len.range is pair, get the two middle points and do the mean. + ### If len.range is even, get the two middle points and do the mean. } else if (len.range%%2 == 0) { - center <- round(sum(range.values[(len.range/2):((len.range/2)+1)])/2, 2) + center <- round(sum(range.values[(len.range/2):((len.range/2)+1)])/2, + digits = 2) } } # Check breaks. - # If center is a vector... - if (length(center) == 2) { - if (!is.null(breaks)) { - warning(paste('Center is a numeric vector of length 2. Breaks will be', - 'constructed according to this range (breaks is deprecated).')) + if (!is.numeric(breaks)) { + stop('breaks must be numeric.') + } + # If breaks is a single number... + if (length(breaks) == 1) { + if (breaks > abs(limits[1] - limits[2])) { + stop('breaks is bigger than the difference between limits.') } - breaks <- abs(center[1] - center[2]) - center <- center[1] - # Else, if center is not a vector... + # Else, if breaks is a vector... } else { - if (!is.null(breaks)) { - if (!is.numeric(breaks)) { - stop('breaks must be numeric.') - } - ### If breaks is not a vector... - if (length(breaks) == 1) { - if (breaks > abs(limits[1] - limits[2])) { - stop('breaks is bigger than the difference between limits.') - } - ### Else, if breaks is a vector... - } else { - if (any(breaks < limits[1]) | any(breaks > limits[2])) { - warning('Removing breaks outside the specified limits.') - breaks <- breaks[which(breaks >= limits[1] & breaks <= limits[2])] - } - } - ### If breaks = NULL and center is not a vector, raise an error. - } else { - stop('Incorrect breaks.') + if (any(breaks < limits[1]) | any(breaks > limits[2])) { + warning('Removing breaks outside the specified limits.') + breaks <- breaks[which(breaks >= limits[1] & breaks <= limits[2])] } } - # Check colorscale. - if (is.null(colorscale)) { - stop('Incorrect colorscale.') - } - # Check alpha. - if (length(alpha) != 1 | !is.numeric(alpha)) { - stop('alpha must be a single number.') - } - if (alpha < 0 | alpha > 1) { - stop('alpha must be a positive number between 0 and 1.') - } - # Check na.value. - if (!tryCatch(is.matrix(col2rgb(na.value)), error = function(e) FALSE)) { - stop('na.value is not a color.') - } # --- Code --- # If breaks is not a vector... if (length(breaks) == 1) { @@ -216,22 +196,25 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, center <- center - (breaks/2) ### Compute brk.low (from the lower limit to the new center, by breaks). if (limits[1] < center) { - brk.low <- c(limits[1], seq(center, limits[1], by = -breaks)) - brk.low <- sort(unique(brk.low[which(brk.low >= limits[1])])) + brk.low <- c(limits[1], seq(from = center, to = limits[1], by = -breaks)) + brk.low <- sort(unique(brk.low[which(brk.low >= limits[1])]), + decreasing = FALSE) } else { brk.low <- center } ### Compute brk.high (from the new center to the upper limit, by breaks). if (limits[2] > center) { - brk.high <- c(limits[2], seq(center, limits[2], by = breaks)) - brk.high <- sort(unique(brk.high[which(brk.high <= limits[2])])) + brk.high <- c(limits[2], seq(from = center, to = limits[2], by = breaks)) + brk.high <- sort(unique(brk.high[which(brk.high <= limits[2])]), + decreasing = FALSE) } else { brk.high <- center } - ### Pseudocenter: the new center + breaks/2. - pseudo.center <- tail(brk.low, 1) + breaks/2 + ### pseudo.center: the new center + breaks/2. + pseudo.center <- tail(brk.low, n = 1) + breaks/2 ### Final breaks. - final.breaks <- brk.labels <- sort(unique(c(brk.low, pseudo.center, brk.high))) + final.breaks <- brk.labels <- sort(unique(c(brk.low, pseudo.center, brk.high)), + decreasing = FALSE) ### Remove all labels but the limits and the pseudo.center. brk.labels[which(!(brk.labels %in% c(pseudo.center, limits)))] <- "" idx.pseudo.center <- which(brk.labels == pseudo.center) @@ -245,8 +228,8 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, ### If breaks is a vector... } else { ### Add limits to breaks. - breaks <- sort(unique(c(limits, breaks))) - ### The pseudocenter = original center. + breaks <- sort(unique(c(limits, breaks)), decreasing = FALSE) + ### The pseudo.center = center. pseudo.center <- center ### Check which breaks element is the minimum value that is >= center. idx.bigger.than.center <- which(cumsum(breaks >= center) == 1) @@ -259,35 +242,36 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, ### brk.high (from the new center + 1 to the upper limit). brk.high <- breaks[(which(breaks == center) + 1):length(breaks)] ### Final breaks and labels. - final.breaks <- brk.labels <- sort(unique(c(brk.low, pseudo.center, brk.high))) + final.breaks <- brk.labels <- sort(unique(c(brk.low, pseudo.center, brk.high)), + decreasing = FALSE) } - # Colors. - # The color of the center (last break of brk.low and first break of brk.high) - # and the pseudo.center is the same (so these three values form a single color - # break). - rampcol.mid <- rep(colorscale[3], 3) - # If brk.low is more than just the center, get a different color for each + # Colours. + # The colour of the center (last break of brk.low and first break of brk.high) + # and the pseudo.center is the same (so these three values form a single + # colour break). + rampcol.mid <- rep(colorscale[3], times = 3) + # If brk.low is more than just the center, get a different colour for each # break. if (length(brk.low) > 1) { rampcol.low <- colorRampPalette(colors = colorscale[1:2], space = "Lab")(length(brk.low)-1) } else rampcol.low <- character(0) # If brk.high is more than just the center and the limits[2], get a different - # color for each break. + # colour for each break. if (length(brk.high) > 2) { rampcol.high <- colorRampPalette(colors = colorscale[4:5], space = "Lab")(length(brk.high)-1) } else rampcol.high <- character(0) - # Rampcolors is the vector with the final colors. + # Rampcolors is the vector with the final colours. rampcolors <- c(rampcol.low, rampcol.mid, rampcol.high) # Guide argument. guide <- ggplot2::guide_coloursteps(even.steps = FALSE, show.limits = FALSE, title = "Beyondcell") - # Output: ggplot2 color scale. - out <- scale_colour_stepsn(colours = scales::alpha(rampcolors, alpha), + # Output: ggplot2 colour scale. + out <- scale_colour_stepsn(colours = scales::alpha(rampcolors, alpha = alpha), breaks = final.breaks, labels = brk.labels, values = scales::rescale(final.breaks, to = c(0, 1)), - na.value = scales::alpha(na.value, alpha), + na.value = scales::alpha(na.value, alpha = alpha), limits = limits, guide = guide) return(out) } @@ -296,10 +280,10 @@ center_scale_colour_stepsn <- function(x, limits = c(NA, NA), center = NULL, #' @description This function breaks a string \code{x} formed by elements #' separated by \code{split} into lines of length \code{line.length}. #' @name BreakString -#' @param x String to be breaked, formed by elements separated by \code{split}. +#' @param x String to be broken, formed by elements separated by \code{split}. #' @param split Character that separates the elements of \code{x}. -#' @param line.length Length of the lines into which \code{x} will be breaked. -#' @return A string with the same contents that \code{x} breaked in lines of +#' @param line.length Length of the lines into which \code{x} will be broken. +#' @return A string with the same content as \code{x} broken in lines of #' length \code{line.length}. #' @examples #' @export @@ -315,11 +299,8 @@ BreakString <- function(x, split = ", ", line.length = 50) { stop('split must be a single string.') } # Check line.length. - if (length(line.length) != 1 | !is.numeric(line.length)) { - stop('line.length must be a single number.') - } - if (line.length < 1 | line.length%%1 != 0) { - stop('line.length must be an integer > 0.') + if (length(line.length) != 1 | line.length[1] < 1 | line.length[1]%%1 != 0) { + stop('line.length must be a single integer > 0.') } # --- Code --- if (nchar(x) <= line.length) final.x <- x @@ -333,73 +314,8 @@ BreakString <- function(x, split = ", ", line.length = 50) { final.x <- paste0(sapply(1:max(n.line), function(i) { sub.x <- paste0(names(which(n.line == i)), collapse = split) return(sub.x) - }), collapse = paste0(gsub("^\\s+|\\s+$", "", split), "\n")) # Trim + }), collapse = paste0(gsub(pattern = "^\\s+|\\s+$", replacement = "", + x = split), "\n")) # Trim } return(final.x) } - -#' @title Returns a dataframe with information about the input drugs -#' @description This function searches information about the inputted drugs in -#' the pre-loaded \code{\link[beyondcell]{beyondcell}} matrices and returns a -#' \code{data.frame} with their drug synonyms and MoAs. -#' @name FindDrugs -#' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param x A character vector with drug names and/or sig IDs. -#' @details The output \code{data.frame} has the following columns: -#' \itemize{ -#' \item{\code{Original_Name}}: Inputted drug name. -#' \item{\code{bc_Name}}: Drug name used in \code{bc}. -#' \item{\code{Preferred_Name}}: Drug name used in \code{beyondcell} plots. -#' \item{\code{Name}}: Other drug names. -#' \item{\code{sig_id}}: Signature ID. -#' \item{\code{Preferred_and_sig}}: \code{Preferred_Name} (or alternatively -#' \code{bc_Name}) and \code{sig_id}. -#' \item{\code{MoA}}: Mechanism(s) of action. -#' } -#' @return A \code{data.frame}. -#' @examples -#' @export - -FindDrugs <- function(bc, x) { - # --- Checks --- - # Check that bc is a beyondcell object. - if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') - # Check x. - if (!is.character(x)) stop('x must be a character vector.') - # --- Code --- - # bc signatures. - sigs <- rownames(bc@normalized) - # Match x with bc signatures and get the indices of matching elements. - indices <- lapply(x, function(y) { - idx <- match(toupper(y), toupper(sigs), nomatch = 0) - if (idx == 0) { - idx <- unique(match(drugInfo$sig_id[drugInfo$Name == toupper(y)], sigs)) - } - return(idx[!is.na(idx)]) - }) - # Original names (x) and bc names (sigs). - df <- data.frame(Original_Name = unlist(sapply(seq_along(x), function(i) { - rep(x[i], length(indices[[i]])) - })), sig_id = unlist(sapply(indices, function(z) sigs[z]))) - # Get the names and pathways of the selected signatures. - info <- subset(drugInfo, sig_id %in% df$sig_id) - info <- aggregate(.~ sig_id, data = info, na.action = NULL, FUN = function(y) { - paste(na.omit(unique(y)), collapse = ", ") - }) - # Merge df and info. - df <- merge(df, info[, c("sig_id", "Name", "Preferred_Name", "MoA")], - by = "sig_id", all.x = TRUE) - # Add bc_Name column and remove names that are not sig IDs from sig_id column. - df$bc_Name <- df$sig_id - df$sig_id[!startsWith(df$sig_id, "sig_")] <- NA - # Create Preferred_and_sig column: Preferred name and sig_id. - df$Preferred_and_sig <- sapply(1:nrow(df), function(i) { - name <- ifelse(!is.na(df$Preferred_Name[i]), df$Preferred[i], df$bc_Name[i]) - sig <- ifelse(!is.na(df$sig_id[i]), paste0(" (", df$sig_id[i], ")"), "") - return(paste0(name, sig)) - }) - # Reorder df. - df <- df[c("Original_Name", "bc_Name", "Preferred_Name", "Name", - "sig_id", "Preferred_and_sig", "MoA")] - return(df) -} diff --git a/R/Genesets.R b/R/Genesets.R index cfb9071..3169cc6 100644 --- a/R/Genesets.R +++ b/R/Genesets.R @@ -1,64 +1,68 @@ #' @title Creates a geneset object -#' @description This function generates up and/or down genelists for each drug -#' and pathway. +#' @description This function creates a \code{\link[beyondcell]{geneset}} +#' object. #' @name GenerateGenesets #' @importFrom qusage read.gmt #' @importFrom gdata trim -#' @param x A pre-loaded matrix, a numeric matrix or a path to a GMT file with -#' custom genesets. -#' @param n.genes Number of top and/or down genes in the desired genelists. -#' @param mode \code{"up"}, \code{"down"} or \code{c("up", "down")}. See Details -#' for more information. +#' @param x A pre-loaded matrix, a ranked matrix or a path to a GMT file with +#' custom gene sets. See Details for more information. +#' @param n.genes Number of up and/or down-regulated genes used to compute each +#' signature. +#' @param mode Whether the output \code{geneset} must contain up and/or +#' down-regulated genes. See Details for more information. #' @param filters If \code{x} is a pre-loaded matrix, you can provide a list of -#' filters to subset these matrices. You can specify which drug names, sig IDs, -#' mechanisms of action, target genes and sources you are interested in (cap -#' insensitive). You can call \code{\link[ListFilters]{ListFilters}} function to -#' check all the available values for these filters. The signatures that pass -#' \strong{ANY} of these filters are included in the output. +#' filters to subset it. You can specify which \code{drugs}, sig \code{IDs}, +#' mechanisms of action (\code{MoAs}), \code{targets} and/or \code{sources} you +#' are interested in (cap insensitive). You can call +#' \code{\link[beyondcell]{ListFilters}} to check all the available values for +#' these filters. The signatures that pass \strong{ANY} of them are included in +#' the output. #' @param comparison \code{"treated_vs_control"} or #' \code{"sensitive_vs_resistant"}. See Details for more information. -#' @param include.pathways \code{TRUE} (default) or \code{FALSE}. Whether or -#' not return \code{beyoncell}'s pre-computed genesets for functional pathways. +#' @param include.pathways Logical. Return \code{beyoncell}'s pre-computed +#' signatures for functional pathways? #' @details \code{x} can be: #' \itemize{ -#' \item{A pre-loaded matrix:} {Either \code{\link[PSc]{PSc}}, -#' \code{\link[SSc]{SSc}} or \code{\link[DSS]{DSS}}.} -#' \item{A numeric matrix:} {A matrix with genes as rows and signatures as -#' columns that contains some type of numerical value such a t-stat or a LFC to +#' \item{A pre-loaded matrix:} {Either \code{PSc}, \code{SSc} or \code{DSS}.} +#' \item{A ranked matrix:} {A matrix with genes as rows and signatures as +#' columns that contains some type of numeric value such a t-stat or a LFC to #' rank the genes accordingly.} -#' \item{A path to a GMT file:} {A file that contains custom genesets. Each -#' geneset must have an "_UP" or "_DOWN" suffix.} +#' \item{A path to a GMT file:} {A file that contains custom gene sets. Each +#' gene set must have an "_UP" or "_DOWN" suffix.} #' } #' In addition, \code{mode} can be: #' \itemize{ -#' \item{\code{"up"}:} {To return over-expressed genes in the signatures.} -#' \item{\code{"down"}:} {To return under-expressed genes in the signatures.} +#' \item{\code{"up"}:} {To compute the signatures using only up-regulated +#' genes.} +#' \item{\code{"down"}:} {To compute the signatures using only down-regulated +#' genes.} +#' \item{\code{c("up", "down")} :} {To compute the signatures using both up and +#' down-regulated genes.} #' } #' If \code{x} is a path to a GMT file, \code{mode} is deprecated and the names -#' of all genesets must end in "_UP" or "_DOWN" to indicate the mode of each -#' one. +#' of all gene sets must end in "_UP" or "_DOWN" to indicate the \code{mode} of +#' each one. #' #' Finally, \code{comparison} can be: #' \itemize{ #' \item{\code{"treated_vs_control"}:} {(\code{PSc} and \code{DSS} like) When -#' the numeric values or the genesets in the GMT file were obtained from a +#' the numeric values or the gene sets in the GMT file were obtained from a #' comparison between drug treated and untreated cells.} #' \item{\code{"sensitive_vs_resistant"}:} {(\code{SSc} like) When the numeric -#' values or the genesets in the GMT file were obtained from a comparison +#' values or the gene sets in the GMT file were obtained from a comparison #' between drug sensitive and resistant cells.} #' } #' When \code{x} is a pre-loaded matrix, \code{comparison} is set automatically. -#' @return A \code{\link[geneset]{geneset}} object with up and/or down genes for -#' each drug and pathway. +#' @return A \code{geneset} object. #' @examples #' @export GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), - filters = list(drugs = NULL, IDs = NULL, MoA = NULL, - targets = NULL, source = NULL), + filters = list(drugs = NULL, IDs = NULL, MoAs = NULL, + targets = NULL, sources = NULL), comparison = NULL, include.pathways = TRUE) { # --- Global Checks --- - # Check if x is a pre-loaded matrix, a numeric matrix or a path to a GMT file. + # Check if x is a pre-loaded matrix, a ranked matrix or a path to a GMT file. is.D <- c(identical(x, PSc), identical(x, SSc), identical(x, DSS)) if (any(is.D)) { type <- "pre-loaded matrix" @@ -72,7 +76,7 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), x <- as.matrix(x) ### Check if there are missing values. if (sum(is.na(x)) > 0) { - warning('x contains NAs that will not be used to compute the genesets.') + warning('x contains NAs that will not be used to compute the geneset object.') } ### Check if there are duplicated values in the same column. dup.values <- apply(x, 2, FUN = function(y) any(duplicated(na.omit(y)))) @@ -89,30 +93,28 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), message('Reading gmt file...') gmt.file <- tryCatch(qusage::read.gmt(x), error = function(cond) { message(paste0('The GMT file does not seem to exist: ', x, ' .')) - stop(paste('x must be either a pre-loaded matrix (PSc, SSc or DSS) or', - 'the path to a GMT file.')) + stop(paste('x must be either a pre-loaded matrix (PSc, SSc or DSS), a + ranked matrix or the path to a GMT file.')) }, warning = function(cond) { message(paste('Warning when reading the GMT file. Here is the', 'original warning message:')) warning(paste0(cond)) return(suppressWarnings(qusage::read.gmt(x))) }) - ### Check for duplicated genesets. - if (anyDuplicated(names(gmt.file)) != 0) { - duplicated_genesets <- unique(names(gmt.file)[duplicated(names(gmt.file))]) - stop(paste0('The GMT file contains duplicated geneset\'s names: ', - paste0(duplicated_genesets, collapse = ", "), '.')) + ### Check for duplicated gene sets. + upper.gmt.names <- toupper(names(gmt.file)) + if (anyDuplicated(upper.gmt.names) != 0) { + duplicated.gene.sets <- unique(names(gmt.file)[duplicated(upper.gmt.names)]) + stop(paste0('The GMT file contains duplicated gene set\'s: ', + paste0(duplicated.gene.sets, collapse = ", "), '.')) } } # Check n.genes and mode. - if (!all(mode %in% c("up", "down"))) stop('Incorrect mode.') + if (any(!(mode %in% c("up", "down")))) stop('Incorrect mode.') mode <- sort(unique(mode), decreasing = TRUE) if (type != "gmt") { ### Number of genes. - if (length(n.genes) != 1 | !is.numeric(n.genes)) { - stop('n.genes must be a single number.') - } - if (n.genes%%1 != 0 | n.genes < 1) { + if (length(n.genes) != 1 | n.genes[1]%%1 != 0 | n.genes[1] < 1) { stop('n.genes must be a positive integer.') } else if (n.genes > n.max) { stop(paste0('n.genes exceeds the maximum number of genes in signature (', @@ -122,34 +124,34 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), ### Number of genes. if (!identical(n.genes, 250)) warning('x is a GMT file, n.genes is deprecated.') ### Mode in GMT files. - n.up <- length(unique(grep("_UP$", names(gmt.file), ignore.case = TRUE))) - n.down <- length(unique(grep("_DOWN$", names(gmt.file), ignore.case = TRUE))) + n.up <- length(unique(grep(pattern = "_UP$", x = upper.gmt.names))) + n.down <- length(unique(grep(pattern = "_DOWN$", x = upper.gmt.names))) if (n.up + n.down != length(names(gmt.file))) { - stop('All geneset in the GMT file names must be finished in _UP or _DOWN.') + stop('All gene sets\' names in the GMT file must end in "_UP" or "_DOWN".') } else { if (n.up > 0 & n.down > 0) { if (!identical(mode, c("up", "down"))) { mode <- c("up", "down") - warning(paste('The GMT file includes UP and DOWN genesets. mode', + warning(paste('The GMT file includes UP and DOWN gene sets. mode', 'changed to c("up", "down").')) } } else if (n.up > 0) { if (mode != "up") { mode <- "up" - warning('The GMT file only includes UP genesets. mode changed to "up".') + warning('The GMT file only includes UP gene sets. mode changed to "up".') } } else if (n.down > 0) { if (mode != "down") { mode <- "down" - warning('The GMT file only includes DOWN genesets. mode changed to "down".') + warning('The GMT file only includes DOWN gene sets. mode changed to "down".') } } } } # Check filters and comparison. - filters_names <- c("drugs", "IDs", "MoA", "targets", "source") - selected_filters <- names(filters) - if (!all(selected_filters %in% filters_names)) stop('Invalid names in filters.') + filters.names <- c("drugs", "IDs", "MoAs", "targets", "sources") + selected.filters <- names(filters) + if (any(!(selected.filters %in% filters.names))) stop('Invalid names in filters.') if (type != "pre-loaded matrix") { ### Filters. filters_class <- sapply(filters, is.null) @@ -159,33 +161,33 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), ### Comparison. if (is.null(comparison)) { stop(paste('Comparison must be either "treated_vs_control" or', - '"sensitive_vs_resistant".')) + '"control_vs_treated".')) } else if (length(comparison) != 1 | - !(comparison[1] %in% c("treated_vs_control", "sensitive_vs_resistant"))) { + !(comparison[1] %in% c("treated_vs_control", "control_vs_treated"))) { stop(paste('Comparison must be either "treated_vs_control" or', - '"sensitive_vs_resistant".')) + '"control_vs_treated".')) } } else { ### Filters. filters_class <- sapply(filters, is.null) | sapply(filters, is.character) if (any(!filters_class)) { stop(paste0('Incorrect value for filter\'s entry: "', - paste0(selected_filters[!filters_class], collapse = ", "), + paste0(selected.filters[!filters_class], collapse = ", "), '". You must provide a character vector.')) } - selected_filters <- selected_filters[!sapply(filters, is.null)] + selected.filters <- selected.filters[!sapply(filters, is.null)] ### Comparison. if (is.null(comparison)) { - if(is.D[2]) comparison <- "sensitive_vs_resistant" + if(is.D[2]) comparison <- "control_vs_treated" else comparison <- "treated_vs_control" } else { if (length(comparison) != 1 | - !(comparison[1] %in% c("treated_vs_control", "sensitive_vs_resistant"))) { + !(comparison[1] %in% c("treated_vs_control", "control_vs_treated"))) { stop('Incorrect comparison.') } - if (is.D[2] & comparison != "sensitive_vs_resistant") { - comparison <- "sensitive_vs_resistant" - warning('x = SSc, comparison changed to "sensitive_vs_resistant".') + if (is.D[2] & comparison != "control_vs_treated") { + comparison <- "control_vs_treated" + warning('x = SSc, comparison changed to "control_vs_treated".') } else if (!is.D[2] & comparison != "treated_vs_control") { comparison <- "treated_vs_control" warning(paste0('x = ', c("PSc", "SSc", "DSS")[is.D], ', comparison ', @@ -202,51 +204,40 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), if (type == "pre-loaded matrix") { ### sig IDs. if (is.D[1]) { - info <- subset(drugInfo, drugInfo$Source == "LINCS") + info <- subset(drugInfo, subset = drugInfo$sources == "LINCS") } else if (is.D[2]) { - info <- subset(drugInfo, drugInfo$Source != "LINCS") + info <- subset(drugInfo, subset = drugInfo$sources != "LINCS") } else if (is.D[3]) { - info <- subset(drugInfo, drugInfo$sig_id %in% DSS[[1]]$sig_id) + info <- subset(drugInfo, subset = drugInfo$IDs %in% DSS[[1]]$sig_id) x <- PSc # DSS is a subset of PSc } - if (length(selected_filters) == 0) { - ids <- unique(info$sig_id) + if (length(selected.filters) == 0) { + ids <- unique(info$IDs) } else { - ids <- c() - warn <- c() # Warnings - ### Filters. - if("drugs" %in% selected_filters) { - out <- FilteredIDS(info, "Name", gdata::trim(filters$drugs), "drugs") - ids <- c(ids, out[[1]]) - warn <- c(warn, out[[2]]) - } - if("IDs" %in% selected_filters) { - out <- FilteredIDS(info, "sig_id", gdata::trim(filters$IDs), "IDs") - ids <- c(ids, out[[1]]) - warn <- c(warn, out[[2]]) - } - if ("MoA" %in% selected_filters) { - out <- FilteredIDS(info, "MoA", gdata::trim(filters$MoA), "MoAs") - ids <- c(ids, out[[1]]) - warn <- c(warn, out[[2]]) - } - if ("targets" %in% selected_filters) { - out <- FilteredIDS(info, "Target", gdata::trim(filters$targets), - "target genes") - ids <- c(ids, out[[1]]) - warn <- c(warn, out[[2]]) - } - if ("source" %in% selected_filters) { - out <- FilteredIDS(info, "Source", gdata::trim(filters$source), - "source databases") - ids <- c(ids, out[[1]]) - warn <- c(warn, out[[2]]) - } - ids <- unique(ids) + ids <- unique(unlist(lapply(selected.filters, function(y) { + tryCatch(suppressWarnings(GetIDs(values = filters[[y]], filter = y, + df = info)), + error = function(cond) character()) + }))) + warnings <- unlist(lapply(selected.filters, function(z) { + tryCatch(GetIDs(values = filters[[z]], filter = z, df = info), + error = function(cond) { + err <- paste0(z, ": ", paste0(filters[[z]], + collapse = ", "), ".\n") + return(err) + }, warning = function(cond) { + warn <- as.character(cond) + warn.values <- strsplit(sapply(strsplit(warn, split = ": "), + `[[`, 3), split = ", ") + return(paste0(z, ": ", warn.values)) + }) + })) + warnings <- warnings[!startsWith(warnings, prefix = "sig_")] if (length(ids) == 0) { - stop('Couldn\'t find drugs that matched any of the filters.') - } else { - if (!all(is.null(warn))) sapply(warn[!is.null(warn)], function(w) warning(w)) + stop('Couldn\'t find signatures that matched any of the filters.') + } else if (length(warnings) > 0) { + warning(paste('The following filters\' values yielded no results:\n', + paste0(" - ", warnings, " ", collapse = ""))) } } genes <- lapply(ids, function(sig) { @@ -254,7 +245,7 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), return(l) }) names(genes) <- ids - # Else if x is a numeric matrix... + # Else if x is a ranked matrix... } else if (type == "matrix") { genes <- apply(x, 2, FUN = function(sig) { l <- list() @@ -264,26 +255,36 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), l <- c(l, list(up = up)) } if ("down" %in% mode) { - down <- na.omit(rownames(x)[order(sig, na.last = NA)[1:n.genes]]) + down <- na.omit(rownames(x)[order(sig, decreasing = FALSE, + na.last = NA)[1:n.genes]]) l <- c(l, list(down = down)) } return(l) }) - # Else if x is a GMT file. + # Else if x is a GMT file. } else if (type == "gmt") { - unique_genesets <- unique(gsub("_UP$|_DOWN$", "", names(gmt.file))) - genes <- setNames(lapply(unique_genesets, function(sig) { - entry <- gmt.file[grep(paste0("^", sig), names(gmt.file), value = TRUE)] - return(Genelist(entry, mode = mode)) - }), unique_genesets) + unique.gene.sets <- unique(gsub(pattern = "_UP$|_DOWN$", replacement = "", + x = names(gmt.file), ignore.case = TRUE)) + genes <- setNames(lapply(unique.gene.sets, function(sig) { + l <- list() + if (toupper(paste0(sig, "_UP")) %in% upper.gmt.names) { + l <- c(l, list(up = gmt.file[[match(toupper(paste0(sig, "_UP")), + table = upper.gmt.names)]])) + } + if (toupper(paste0(sig, "_DOWN")) %in% upper.gmt.names) { + l <- c(l, list(down = gmt.file[[match(toupper(paste0(sig, "_DOWN")), + table = upper.gmt.names)]])) + } + return(l) + }), unique.gene.sets) } # Drug IDs. if (type == "pre-loaded matrix") { - info <- subset(info, info$sig_id %in% ids) - info <- aggregate(.~ sig_id, data = info, na.action = NULL, FUN = function(rw) { + info <- subset(info, subset = info$IDs %in% ids) + info <- aggregate(.~ IDs, data = info, na.action = NULL, FUN = function(rw) { paste(na.omit(unique(rw)), collapse = ", ") }) - info <- info[order(info$sig_id), ] + info <- info[order(info$IDs, decreasing = FALSE), ] } else { info <- data.frame() } @@ -300,89 +301,28 @@ GenerateGenesets <- function(x, n.genes = 250, mode = c("up", "down"), #' @title Returns all the possible values for the specified filter #' @description This function returns all the available values for \code{drugs}, -#' \code{sig_ids}, \code{MoAs}, \code{targets} or \code{source} filters in -#' \code{\link[GenerateGenesets]{GenerateGenesets}} function. +#' \code{IDs}, \code{MoAs}, \code{targets} or \code{sources} filters in +#' \code{\link[beyondcell]{GenerateGenesets}} function. #' @name ListFilters -#' @param entry Either \code{"drugs"}, \code{"IDs"}, \code{"MoA"}, -#' \code{"targets"} or \code{"source"}. +#' @param entry Either \code{"drugs"}, \code{"IDs"}, \code{"MoAs"}, +#' \code{"targets"} or \code{"sources"}. #' @return All the possible values for the specified \code{entry}. #' @export ListFilters <- function(entry) { # --- Checks and Code --- if (entry == "drugs") { - out <- sort(unique(drugInfo$Name)) + out <- sort(unique(drugInfo$drugs), decreasing = FALSE) } else if (entry == "IDs") { - out <- sort(unique(drugInfo$sig_id)) - } else if (entry == "MoA") { - out <- sort(unique(drugInfo$MoA)) + out <- sort(unique(drugInfo$IDs), decreasing = FALSE) + } else if (entry == "MoAs") { + out <- sort(unique(drugInfo$MoAs), decreasing = FALSE) } else if (entry == "targets") { - out <- sort(unique(drugInfo$Target)) - } else if (entry == "source") { - out <- sort(unique(drugInfo$Source)) + out <- sort(unique(drugInfo$targets), decreasing = FALSE) + } else if (entry == "sources") { + out <- sort(unique(drugInfo$sources), decreasing = FALSE) } else { stop("Incorrect entry.") } return(out) } - -#' @title Returns the \code{sig_ids} that match the specified filters -#' @description This function is meant to be used inside -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. It subsets \code{infodf} -#' to select only the entries that match the specified \code{filter} and returns -#' the corresponding \code{sig_ids}. -#' @name FilteredIDS -#' @param infodf \code{data.frame} with all drug information. -#' @param col Column name to subset by. -#' @param filter User-supplied filtering vector for either drugs, MoA, target -#' genes or source database. -#' @param filtername String to be printed with the warning (in case some of the -#' \code{filter} elements are not found in \code{infodf}). -#' @return A vector with the \code{sig_ids} that match the \code{filter} -#' elements. -#' @export - -FilteredIDS <- function(infodf, col, filter, filtername) { - # --- Checks --- - # Check infodf. - if (class(infodf) != "data.frame") { - stop('infodf must be a data.frame') - } - if (!("sig_id" %in% colnames(infodf))) { - stop('infodf must contain a "sig_id" column.') - } - # Check col. - if (length(col) != 1) { - stop('You must specify a single col.') - } - if (is.character(col) & !(col %in% colnames(infodf))) { - stop(paste('col =', col, 'is not a column of infodf.')) - } - if (is.numeric(col) & (col < 1 | col > ncol(infodf))) { - stop(paste('col = ', col, 'is out of range.')) - } - # Check filter. - if (length(filter) < 1 | !is.character(filter)) { - stop('filter must be a character vector.') - } - # Check filtername. - if (length(filtername) != 1 | !is.character(filtername)) { - stop('filtername must be a string.') - } - # --- Code --- - upper.filter <- toupper(filter) - selected <- subset(infodf, toupper(infodf[[col]]) %in% upper.filter) - if (col == "Name") { - synonyms <- subset(infodf, toupper(infodf[["Preferred_Name"]]) %in% - unique(toupper(selected[["Preferred_Name"]]))) - selected <- unique(rbind(selected, synonyms)) - } - ids <- unique(selected$sig_id) - not_found <- filter[!(upper.filter %in% toupper(infodf[[col]]))] - if (length(not_found) > 0) { - w <- paste(length(not_found), 'out of', length(filter), - filtername, 'were not found in the signature:', - paste0(not_found, collapse = ", ")) - } else w <- NULL - return(list(ids, w)) -} diff --git a/R/Manipulation.R b/R/Manipulation.R index 7d2ccd4..2457c85 100644 --- a/R/Manipulation.R +++ b/R/Manipulation.R @@ -1,27 +1,27 @@ #' @title Subsets a beyondcell object #' @description This function subsets a \code{\link[beyondcell]{beyondcell}} -#' object based on the names of the signatures, cells and/or the maximum desired -#' proportion of \code{NaN} values in each signature and/or cell. See Details -#' for more information. +#' object based on signature names, cells and/or the maximum proportion of +#' \code{NaN} values desired in each signature and/or cell. See Details for more +#' information. #' @name bcSubset #' @param bc \code{beyondcell} object. #' @param signatures Vector with the names of the signatures to subset by. If #' \code{signatures = NULL}, all signatures will be kept. -#' @param bg.signatures Vector with the names of the \code{@@background} -#' signatures to subset by. If \code{bg.signatures = NULL}, all background -#' signatures will be kept. +#' @param bg.signatures Vector with the names of the background signatures to +#' subset by. If \code{bg.signatures = NULL}, all background signatures will be +#' kept. #' @param cells Vector with the names of the cells to subset by. If #' \code{cells = NULL}, all cells will be kept. -#' @param nan.sigs Maximum desired proportion of \code{NaN} values per signature -#' in the output \code{beyondcell} object. All signatures with a proportion of -#' \code{NaN >= nan.sig} will be removed. -#' @param nan.cells Maximum desired proportion of \code{NaN} values per cell -#' in the output \code{beyondcell} object. All cells with a proportion of -#' \code{NaN >= nan.cells} will be removed. +#' @param nan.sigs Maximum proportion of \code{NaN} values per signature in the +#' output \code{beyondcell} object. All signatures with a proportion of +#' \code{NaN > nan.sig} will be removed. +#' @param nan.cells Maximum proportion of \code{NaN} values per cell in the +#' output \code{beyondcell} object. All cells with a proportion of +#' \code{NaN > nan.cells} will be removed. #' @details This function can subset a \code{beyondcell} object using its 5 #' parameters alone or in combination. #' -#' So, for example if you specify \code{signatures} and \code{cells}, the +#' So, for example, if you specify \code{signatures} and \code{cells}, the #' resulting \code{beyondcell} object (except the \code{@@background} slot) will #' be subsetted according to those vectors. The slot \code{@@background} will be #' only subsetted according to \code{cells}. If you want to subset it by @@ -29,20 +29,20 @@ #' #' On the other hand, if you specify \code{cells} and \code{nan.sigs}, the #' output \code{beyondcell} object will keep the selected cells and those -#' signatures with aproportion of \code{NaN} values above \code{nan.sigs}. Note -#' that \code{nan.sigs} and \code{nan.cells} arguments also subset the -#' signatures and cells that do not meet the criteria in \code{@@background} -#' slot. +#' signatures with a proportion of \code{NaN} values below or equal to +#' \code{nan.sigs}. Note that \code{nan.sigs} and \code{nan.cells} arguments +#' also subset the signatures and cells that meet the criteria in +#' \code{@@background} slot. #' #' Finally, if you specify all parameters, the result will keep those signatures -#' and cells of interest with a proportion of \code{NaN} above \code{nan.sigs} -#' and \code{nan.cells} respectively. +#' and cells of interest with a proportion of \code{NaN} below or equal to +#' \code{nan.sigs} and \code{nan.cells}, respectively. #' @return A subsetted \code{beyondcell} object. #' @examples #' @export bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, - nan.sigs = NULL, nan.cells = NULL) { + nan.sigs = 1, nan.cells = 1) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') @@ -88,38 +88,23 @@ bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, pass.cells <- unique(cells[in.cells]) } # Check nan.sigs. - if (is.null(nan.sigs)) { - pass.nan.sigs <- rownames(bc@scaled) - pass.nan.sigs.bg <- rownames(bc@background) - } else { - if (length(nan.sigs) != 1 | !is.numeric(nan.sigs)) { - stop('nan.sigs must be a single number.') - } - if (nan.sigs < 0 | nan.sigs > 1) { - stop('nan.sigs must be a positive number between 0 and 1.') - } - pass.nan.sigs <- rownames(bc@scaled)[apply(bc@scaled, 1, function(x) { - sum(is.na(x)) < ncol(bc@scaled) * nan.sigs - })] - pass.nan.sigs.bg <- rownames(bc@background)[apply(bc@background, 1, - function(y) { - sum(is.na(y)) < ncol(bc@background) * nan.sigs - })] + if (length(nan.sigs) != 1 | nan.sigs[1] < 0 | nan.sigs[1] > 1) { + stop('nan.sigs must be a positive number between 0 and 1.') } + pass.nan.sigs <- rownames(bc@scaled)[apply(bc@scaled, 1, function(x) { + sum(is.na(x)) < ncol(bc@scaled) * nan.sigs + })] + pass.nan.sigs.bg <- rownames(bc@background)[apply(bc@background, 1, + function(y) { + sum(is.na(y)) < ncol(bc@background) * nan.sigs + })] # Check nan.cells. - if (is.null(nan.cells)) { - pass.nan.cells <- colnames(bc@scaled) - } else { - if (length(nan.cells) != 1 | !is.numeric(nan.cells)) { - stop('nan.cells must be a single number.') - } - if (nan.cells < 0 | nan.cells > 1) { - stop('nan.cells must be a positive number between 0 and 1.') - } - pass.nan.cells <- colnames(bc@scaled)[apply(bc@scaled, 2, function(y) { - sum(is.na(y)) < nrow(bc@scaled) * nan.cells - })] + if (length(nan.cells) != 1 | nan.cells[1] < 0 | nan.cells[1] > 1) { + stop('nan.cells must be a positive number between 0 and 1.') } + pass.nan.cells <- colnames(bc@scaled)[apply(bc@scaled, 2, function(y) { + sum(is.na(y)) < nrow(bc@scaled) * nan.cells + })] # Check regression and subset order. reg.order <- bc@regression$order reg.order.bg <- bc@regression$order.background @@ -132,7 +117,7 @@ bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, bc <- bcRecompute(bc, slot = "data") bc@background <- matrix(ncol = 0, nrow = 0) bc@regression <- list(order = c("subset", ""), vars = NULL, - order.background = rep("", 2)) + order.background = rep("", times = 2)) reg.order <- rep("", 2) } else { ### If bc was previously subsetted and then regressed, raise an error. @@ -140,14 +125,14 @@ bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, stop(paste('bc was previously subsetted and then regressed. Please run', 'bcSubset() on a new beyondcell object created with', 'CreatebcObject(bc).')) - ### Else if reg.order == c("", "") or reg.order == c("regression", ""), add - ### "subset" to bc@regression$order. + ### Else if reg.order == c("", "") or reg.order == c("regression", ""), add + ### "subset" to bc@regression$order. } else if (identical(reg.order, rep("", 2)) | identical(reg.order, c("regression", ""))) { - bc@regression$order[match("", reg.order)] <- "subset" - ### Else if the last step in reg.order is "subset", raise a warning. + bc@regression$order[match("", table = reg.order)] <- "subset" + ### Else if the last step in reg.order is "subset", raise a warning. } else if (tail(reg.order[which(reg.order != "")], n = 1) == "subset") { - warning('bc is already a subsetted object.') + warning('bc is an already subsetted object.') } } # --- Code --- @@ -162,12 +147,16 @@ bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, # final.sigs.bg == rownames(bc@background) and # final.cells == colnames(bc@scaled) == colnames(bc@background), # the output is equal to the input. Else, subset. - if (!identical(sort(final.sigs), sort(rownames(bc@normalized))) | - !identical(sort(final.sigs.bg), sort(rownames(bc@background))) | - !identical(sort(final.cells), sort(colnames(bc@normalized))) | - !identical(sort(final.cells), sort(colnames(bc@background)))) { + if (!identical(sort(final.sigs, decreasing = FALSE), + sort(rownames(bc@normalized), decreasing = FALSE)) | + !identical(sort(final.sigs.bg, decreasing = FALSE), + sort(rownames(bc@background), decreasing = FALSE)) | + !identical(sort(final.cells, decreasing = FALSE), + sort(colnames(bc@normalized), decreasing = FALSE)) | + !identical(sort(final.cells, decreasing = FALSE), + sort(colnames(bc@background), decreasing = FALSE))) { bc@normalized <- round(bc@normalized[final.sigs, final.cells, - drop = FALSE], 2) + drop = FALSE], digits = 2) bc <- bcRecompute(bc, slot = "normalized") if (any(dim(bc@background) != 0)) { bc@background <- bc@background[final.sigs.bg, final.cells, drop = FALSE] @@ -180,33 +169,31 @@ bcSubset <- function(bc, signatures = NULL, bg.signatures = NULL, cells = NULL, return(bc) } -#' @title Regress out unwanted effects from beyondcell scores +#' @title Regresses out unwanted effects from BCS #' @description This function regresses out unwanted effects from normalized -#' beyondcell scores. +#' beyondcell scores (BCS). #' @name bcRegressOut #' @importFrom bnstruct knn.impute #' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param vars.to.regress Vector of metadata columns to regress out the -#' beyondcell scores. -#' @return Returns a \code{beyondcell} object with regressed normalized scores, -#' regressed scaled scores and regressed switch points. +#' @param vars.to.regress Vector of metadata columns to regress out the BCS. +#' @return Returns a \code{beyondcell} object with regressed normalized BCS, +#' regressed scaled BCS and regressed switch points. #' @examples #' @export -bcRegressOut <- function(bc, vars.to.regress = NULL) { +bcRegressOut <- function(bc, vars.to.regress) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check vars.to.regress. - if (!is.null(vars.to.regress)) { - in.vars.to.regress <- vars.to.regress %in% colnames(bc@meta.data) - if (!all(in.vars.to.regress)) { - stop(paste0('Some vars.to.regress not found: ', - paste0(vars.to.regress[!in.vars.to.regress], collapse = ", "), - '.')) - } - } else { - stop('No variables to regress out.') + in.vars.to.regress <- !is.null(vars.to.regress) & + vars.to.regress %in% colnames(bc@meta.data) + if (all(!in.vars.to.regress)) { + stop('vars.to.regress not found.') + } else if (any(!in.vars.to.regress)) { + stop(paste0('Some vars.to.regress not found: ', + paste0(vars.to.regress[!in.vars.to.regress], collapse = ", "), + '.')) } vars <- unique(vars.to.regress[in.vars.to.regress]) # Check regression and subset order. @@ -228,14 +215,14 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { stop(paste('bc was previously regressed and then subsetted. Please run', 'bcSubset() on a new beyondcell object created with', 'CreatebcObject(bc).')) - ### Else if reg.order == c("", "") or reg.order == c("subset", ""), add - ### "regression" to bc@regression$order. + ### Else if reg.order == c("", "") or reg.order == c("subset", ""), add + ### "regression" to bc@regression$order. } else if (identical(reg.order, rep("", 2)) | - identical(reg.order, c("subset", ""))) { - bc@regression$order[match("", reg.order)] <- "regression" - ### Else if the last step in reg.order is "subset", raise a warning. + identical(reg.order, c("subset", ""))) { + bc@regression$order[match("", reg.order)] <- "regression" + ### Else if the last step in reg.order is "subset", raise a warning. } else if (tail(reg.order[which(reg.order != "")], n = 1) == "regression") { - warning('bc is already a regressed object.') + warning('bc is an already regressed object.') vars <- unique(c(vars, bc@regression$vars)) sigs <- rownames(bc@scaled) bg.sigs <- rownames(bc@background) @@ -249,7 +236,7 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { GenerateGenesets(DSS, n.genes = bc@n.genes, mode = bc@mode, include.pathways = FALSE)) background <- suppressWarnings( - bcScore(bc@expr.matrix, gs.background, expr.thres = bc@thres)) + bcScore(bc@expr.matrix, gs = gs.background, expr.thres = bc@thres)) bc@background <- background@normalized } if ("subset" %in% reg.order) { @@ -258,13 +245,13 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { cells = cells)) } bc@regression <- list(order = reg.order, order.background = reg.order) + } } -} # --- Code --- # Latent data. latent.data <- bc@meta.data[colnames(bc@normalized), vars, drop = FALSE] - # Impute normalized scores matrix - message('Imputing normalized scores...') + # Impute normalized BCS matrix + message('Imputing normalized BCS...') bc@normalized <- bnstruct::knn.impute(bc@normalized) # Limma formula. fmla <- as.formula(object = paste('bcscore ~', paste(vars, collapse = '+'))) @@ -275,19 +262,19 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { bins <- ceiling(total / 100) normalized.regressed <- t(apply(cbind(seq_len(nrow(bc@normalized)), bc@normalized), 1, function(x) { - regression.mat <- cbind(latent.data, x[-1]) - colnames(regression.mat) <- c(colnames(latent.data), "bcscore") - qr <- lm(fmla, data = regression.mat, qr = TRUE, na.action = na.exclude)$qr - resid <- qr.resid(qr = qr, y = x[-1]) - ### Update the progress bar. - if (x[1]%%bins == 0 | x[1] == total) { - Sys.sleep(0.1) - setTxtProgressBar(pb, x[1]) - } - ### Return residues. - return(resid) - })) - bc@normalized <- round(normalized.regressed, 2) + regression.mat <- cbind(latent.data, x[-1]) + colnames(regression.mat) <- c(colnames(latent.data), "bcscore") + qr <- lm(fmla, data = regression.mat, qr = TRUE, na.action = na.exclude)$qr + resid <- qr.resid(qr = qr, y = x[-1]) + ### Update the progress bar. + if (x[1]%%bins == 0 | x[1] == total) { + Sys.sleep(0.1) + setTxtProgressBar(pb, value = x[1]) + } + ### Return residues. + return(resid) + })) + bc@normalized <- round(normalized.regressed, digits = 2) # Close the progress bar. Sys.sleep(0.1) close(pb) @@ -297,26 +284,26 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { bc@regression$vars <- vars # Regress the background, if needed. if (any(dim(bc@background) != 0)) { - message('Imputing background scores...') + message('Imputing background BCS...') bc@background <- bnstruct::knn.impute(bc@background) - message('Regressing background scores...') + message('Regressing background BCS...') total.bg <- nrow(bc@background) pb.bg <- txtProgressBar(min = 0, max = total.bg, style = 3) bins.bg <- ceiling(total.bg / 100) background.regressed <- t(apply(cbind(seq_len(nrow(bc@background)), bc@background), 1, function(y) { - regression.mat <- cbind(latent.data, y[-1]) - colnames(regression.mat) <- c(colnames(latent.data), "bcscore") - qr <- lm(fmla, data = regression.mat, qr = TRUE, na.action = na.exclude)$qr - resid <- qr.resid(qr = qr, y = y[-1]) - ### Update the progress bar. - if (y[1]%%bins == 0 | y[1] == total.bg) { - Sys.sleep(0.1) - setTxtProgressBar(pb.bg, y[1]) - } - ### Return residues. - return(resid) - })) + regression.mat <- cbind(latent.data, y[-1]) + colnames(regression.mat) <- c(colnames(latent.data), "bcscore") + qr <- lm(fmla, data = regression.mat, qr = TRUE, na.action = na.exclude)$qr + resid <- qr.resid(qr = qr, y = y[-1]) + ### Update the progress bar. + if (y[1]%%bins == 0 | y[1] == total.bg) { + Sys.sleep(0.1) + setTxtProgressBar(pb.bg, value = y[1]) + } + ### Return residues. + return(resid) + })) bc@background <- background.regressed bc@regression$order.background <- bc@regression$order # Close the background progress bar. @@ -328,18 +315,17 @@ bcRegressOut <- function(bc, vars.to.regress = NULL) { } #' @title Recomputes a beyondcell object -#' @description This function recomputes a beyondcell object using the matrix -#' stored in the slot \code{@@data} (original scores) or \code{@@normalized} -#' (which can contain subsetted and/or regressed bcscores). Columns added with -#' \code{\link[bcAddMetadata]{bcAddMetadata}} are preserved, except if they -#' define therapeutic clusters. Important: \code{bc@reductions} and -#' \code{bc@background} remain the same, while \code{bc@ranks} and -#' \code{bc@reductions} are removed. +#' @description This function recomputes a \code{\link[beyondcell]{beyondcell}} +#' object using the matrix stored in the slot \code{@@data} (original scores) or +#' \code{@@normalized} (which can contain subsetted and/or regressed scores). +#' Columns added with \code{\link[beyondcell]{bcAddMetadata}} are preserved, +#' except if they define therapeutic clusters. Important: \code{bc@background} +#' remains the same, while \code{bc@ranks} and \code{bc@reductions} are removed. #' @name bcRecompute #' @import scales -#' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param slot Matrix to recompute the beyondcell object. Either \code{"data"} -#' or \code{"normalized"}. +#' @param bc \code{beyondcell} object. +#' @param slot Score matrix to recompute the \code{beyondcell} object. Either +#' \code{"data"} or \code{"normalized"}. #' @return A recomputed \code{beyondcell} object. #' @examples #' @export @@ -357,10 +343,11 @@ bcRecompute <- function(bc, slot = "data") { # bc@normalized = bc@data. bc@normalized <- bc@data } else if (slot == "normalized") { - bc@normalized <- round(bc@normalized, 2) + bc@normalized <- round(bc@normalized, digits = 2) } - # Recompute scaled scores. - scaled <- round(t(apply(bc@normalized, 1, scales::rescale, to = c(0, 1))), 2) + # Recompute scaled BCS. + scaled <- round(t(apply(bc@normalized, 1, scales::rescale, to = c(0, 1))), + digits = 2) rownames(scaled) <- rownames(bc@normalized) colnames(scaled) <- colnames(bc@normalized) bc@scaled <- scaled @@ -371,7 +358,8 @@ bcRecompute <- function(bc, slot = "data") { if (!is.null(names(bc@reductions))) message('Removing @reductions slot...') bc@ranks <- bc@reductions <- vector(mode = "list", length = 0) # Remove therapeutic clusters from bc@meta.data. - therapeutic.clusters <- grep("bc_clusters_res.", colnames(bc@meta.data)) + therapeutic.clusters <- grep(pattern = "bc_clusters_res.", + x = colnames(bc@meta.data)) if (length(therapeutic.clusters) > 0) { message('Removing therapeutic clusters...') bc@meta.data <- bc@meta.data[, -c(therapeutic.clusters), drop = FALSE] @@ -380,27 +368,29 @@ bcRecompute <- function(bc, slot = "data") { return(bc) } -#' @title Add new metadata columns to an existing beyondcell object -#' @description This function adds new metadata columns to an existing +#' @title Add new metadata to an existing beyondcell object +#' @description This function adds new metadata to an existing #' \code{\link[beyondcell]{beyondcell}} object. -#' @name bcAddMetatada +#' @name bcAddMetadata #' @param bc \code{beyondcell} object. -#' @param metadata Dataframe with metadata to add. Rownames should be cell names -#' and colnames should not be already present in \code{beyondcell@@meta.data}. -#' @return Returns a \code{beyondcell} object with new added metadata columns. +#' @param metadata Matrix or dataframe with metadata to add. Rownames should be +#' cell names and colnames should not be already present in +#' \code{bc@@meta.data}. +#' @return Returns a \code{beyondcell} object with updated metadata. #' @examples #' @export -bcAddMetatada<- function(bc, metadata = NULL) { +bcAddMetadata <- function(bc, metadata) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check metadata. - if (class(metadata) != "data.frame") { - stop('metadata must be an object of class data.frame.') + if(!is.matrix(metadata) & !is.data.frame(metadata)) { + stop('metadata must be must be a matrix or a data.frame.') } # Check that the rownames of bc@meta.data and the new metadata are the same. - if (!identical(sort(rownames(bc@meta.data)), sort(rownames(metadata)))) { + if (!identical(sort(rownames(bc@meta.data), decreasing = FALSE), + sort(rownames(metadata), decreasing = FALSE))) { stop('metadata and bc@meta.data rownames are not the same.') } # Check that columns in metadata are different from the existing columns in @@ -417,8 +407,8 @@ bcAddMetatada<- function(bc, metadata = NULL) { #' @title Merges two beyondcell objects #' @description This function merges two \code{\link[beyondcell]{beyondcell}} #' objects obtained from the same single-cell matrix using the same -#' \code{expr.thres} (See \code{\link[bcScore]{bcScore}} for more information). -#' It binds signatures, not cells. +#' \code{expr.thres} (see \code{\link[beyondcell]{bcScore}} for more +#' information). It binds signatures, not cells. #' @name bcMerge #' @importFrom plyr join #' @param bc1 First \code{beyondcell} object to merge. @@ -447,10 +437,10 @@ bcMerge <- function(bc1, bc2) { identical.sigs <- sapply(duplicated.sigs, function(x) { identical(bc1@data[x, ], bc2@data[x, ]) }) - if (!all(identical.sigs)) { + if (any(!identical.sigs)) { stop(paste0('Duplicated signatures: ', paste0(duplicated.sigs[!identical.sigs], collapse = ", "), - ' without matching beyondcell scores in slot @data.')) + ' without matching BCS in slot @data.')) } } # Check regression steps. @@ -470,9 +460,9 @@ bcMerge <- function(bc1, bc2) { regression = bc1@regression, n.genes = unique(c(bc1@n.genes, bc2@n.genes)), mode = unique(c(bc1@mode, bc2@mode)), thres = bc1@thres) - # rbind scaled scores. + # rbind scaled BCS. bc@scaled <- unique(rbind(bc1@scaled, bc2@scaled[, cells]))[, cells] - # rbind normalized scores. + # rbind normalized BCS. bc@normalized <- unique(rbind(bc1@normalized, bc2@normalized[, cells]))[, cells] # rbind data. bc@data <- unique(rbind(bc1@data, bc2@data[, colnames(bc1@data)]))[, colnames(bc1@data)] @@ -482,7 +472,7 @@ bcMerge <- function(bc1, bc2) { bc@meta.data <- suppressMessages(plyr::join(bc1@meta.data, bc2@meta.data)) rownames(bc@meta.data) <- rownames(bc1@meta.data) # Remove therapeutic clusters from bc@meta.data. - therapeutic.clusters <- grep("bc_clusters_res.", colnames(bc@meta.data)) + therapeutic.clusters <- grep(pattern = "bc_clusters_res.", x = colnames(bc@meta.data)) if (length(therapeutic.clusters) > 0) { bc@meta.data <- bc@meta.data[, -c(therapeutic.clusters), drop = FALSE] } @@ -498,10 +488,10 @@ bcMerge <- function(bc1, bc2) { #' @import scales #' @param bc \code{beyondcell} object. #' @details This function creates a new \code{beyondcell} object by using the -#' \code{@@normalized} scores as the original \code{@@data}. Switch points are -#' recomputed and \code{@@regression} is restarted. The expression and -#' metadata matrices are subsetted to keep only those cells present in the new -#' \code{@@data} slot. +#' normalized BCS as the original \code{@@data}. Switch points are +#' recomputed and \code{@@regression} is restarted. The \code{@@expr.matrix} and +#' \code{@@meta.data} slots are subsetted to keep only those cells present in +#' the new \code{@@data} slot. #' @return A new \code{beyondcell} object. #' @examples #' @export diff --git a/R/Ranks.R b/R/Ranks.R index 88dd1a0..9b4f806 100644 --- a/R/Ranks.R +++ b/R/Ranks.R @@ -1,6 +1,6 @@ #' @title Ranks the signatures from most sensitive to least sensitive -#' @description This function computes the bcscores statistics of each -#' signature and ranks them according to the switch point and mean. +#' @description This function computes the beyondcell score's (BCS) statistics +#' of each signature and ranks them according to the switch point and mean. #' @name bcRanks #' @param bc \code{\link[beyondcell]{beyondcell}} object. #' @param idents Name of the metadata column of interest. If @@ -8,12 +8,12 @@ #' \code{idents != NULL}, the signatures' ranks are computed for each level in #' \code{idents}. #' @param extended If \code{extended = TRUE}, this function returns the switch -#' point, mean, median, sd, variance, min, max, proportion of \code{NaN} and -#' residuals' mean per signature. If \code{extended = FALSE}, this function -#' returns only the switch point, mean and residuals' mean. -#' @return A \code{beyondcell object} with the results in a new entry of -#' \code{bc@@ranks} (\code{bc@@ranks$general} if \code{idents = NULL} or -#' \code{bc@@ranks$idents} if \code{idents != NULL}). +#' point, mean, median, standard deviation, variance, min, max, proportion of +#' \code{NaN} and residuals' mean per signature. If \code{extended = FALSE}, +#' this function returns only the switch point, mean and residuals' mean. +#' @return A \code{beyondcell} object with the results in a new entry of +#' \code{bc@@ranks}: \code{bc@@ranks[["general"]]} (if \code{idents = NULL}) or +#' \code{bc@@ranks[[idents]]} (if \code{idents != NULL}). #' @examples #' @export @@ -52,7 +52,7 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { # Extended. if (extended) n <- 6 else n <- 1 - # Statistics for all normalized beyondcell scores. + # Statistics for all normalized BCS. if (is.null(idents)) { # Progress bar. total <- n.rows * n @@ -63,7 +63,8 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { order.col <- "rank" # Get Statistics dataframe. final.stats <- GetStatistics(bc = bc, signatures = sigs, cells = cells, - pb, total, i = 1, n.rows, extended) + pb = pb, total = total, i = 1, n.rows = n.rows, + extended = extended) # Add rank. sig.order <- order(-1 * final.stats$switch.point, final.stats$mean, decreasing = TRUE) @@ -72,7 +73,7 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { final.stats <- final.stats[c("rank", colnames(final.stats)[-ncol(final.stats)])] } else { # Metadata levels. - lvls <- sort(unique(as.factor(meta))) + lvls <- sort(unique(as.factor(meta)), decreasing = FALSE) # Progress bar. total <- n.rows * n * length(lvls) pb <- txtProgressBar(min = 0, max = total, style = 3) @@ -93,7 +94,8 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { bcSubset(sub.bc, cells = group.cells))) ### Get Statistics dataframe. out <- GetStatistics(bc = sub.bc, signatures = sigs, cells = group.cells, - pb, total, i, n.rows, extended) + pb = pb, total = total, i = i, n.rows = n.rows, + extended = extended) ### Add rank. sig.order <- order(-1 * out$switch.point, out$mean, decreasing = TRUE) out$rank[sig.order] <- 1:nrow(out) @@ -104,24 +106,24 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { return(out) }) # Merge dataframes of all levels. - final.stats <- do.call(cbind.data.frame, stats) + final.stats <- do.call(what = cbind.data.frame, args = stats) } # Add Drug name and MoA to final.stats. cols <- colnames(final.stats) - info <- subset(drugInfo, drugInfo$sig_id %in% rownames(final.stats)) + info <- subset(drugInfo, subset = drugInfo$IDs %in% rownames(final.stats)) if (dim(info)[1] > 0) { - info <- aggregate(.~sig_id, data = info, na.action = NULL, FUN = function(x) { + info <- aggregate(.~ IDs, data = info, na.action = NULL, FUN = function(x) { paste(na.omit(unique(x)), collapse = "; ") }) } - rownames(info) <- info$sig_id - info <- info[, c("Name", "Preferred_Name", "MoA", "Target", "Source")] + rownames(info) <- info$IDs + info <- info[, c("drugs", "preferred.drug.names", "MoAs", "targets", "sources")] final.stats <- transform(merge(final.stats, info, by = 0, all.x = TRUE), row.names = Row.names, Row.names = NULL) # Order by rank and reorder columns. - final.stats <- final.stats[order(final.stats[, order.col]), - c("Name", "Preferred_Name", "MoA", "Target", - "Source", cols)] + final.stats <- final.stats[order(final.stats[, order.col], decreasing = FALSE), + c("drugs", "preferred.drug.names", "MoAs", + "targets", "sources", cols)] # Add to beyondcell object. bc@ranks[[idents]] <- final.stats # Close the progress bar. @@ -132,17 +134,18 @@ bcRanks <- function(bc, idents = NULL, extended = TRUE) { #' @title Returns the first/last n ranked signatures #' @description This function returns the top/bottom \code{n} signatures ranked -#' according to the bcscores of cells that satisfy -#' \code{bc@@meta.data$idents == cond}. +#' by \code{\link[beyondcell]{bcRanks}}. If the rank has not been previously +#' computed, \code{rankSigs} performs the ranking itself. #' @name rankSigs #' @param bc \code{\link[beyondcell]{beyondcell}} object. #' @param idents Name of the metadata column of interest. If -#' \code{idents = NULL}, the function computes the ranks using all cells. +#' \code{idents = NULL}, the function uses the general rank computed with all +#' cells. #' @param cond Level of \code{idents} to rank by the output vector. If #' \code{idents = NULL}, this parameter is deprecated. #' @param n Number of signatures to return in the output vector. -#' @param decreasing Return the top \code{n} signatures (default) or the bottom -#' \code{n} signatures (\code{decreasing = FALSE}). +#' @param decreasing Logical. Return the top \code{n} signatures (default) or +#' the bottom \code{n} signatures (\code{decreasing = FALSE})?. #' @return An ordered vector with the signature's names. #' @examples #' @export @@ -160,13 +163,13 @@ rankSigs <- function(bc, idents = NULL, cond = NULL, n = 10, if (!idents %in% colnames(bc@meta.data)) { stop('Idents not found.') } - if (is.null(cond)) { + if (is.null(cond[1])) { stop('Invalid cond.') } meta <- idents } else { meta <- "general" - if (!is.null(cond)) { + if (!is.null(cond[1])) { warning('idents not specified, cond is deprecated.') cond <- NULL } @@ -184,26 +187,20 @@ rankSigs <- function(bc, idents = NULL, cond = NULL, n = 10, if (length(n) != 1 | (!is.numeric(n) & !is.character(n))) { stop('n must be a single number or "all".') } - if (is.numeric(n)) { - if (n < 1 | n%%1 != 0) { - stop('n must be an integer > 0.') - } - } else if (is.character(n)) { - if (n == "all") { - n <- nrow(bc@normalized) - } else { - stop('To select all signatures, please set n = "all".') - } + if (is.numeric(n) & (n < 1 | n%%1 != 0)) stop('n must be an integer > 0.') + else if (is.character(n)) { + if (n == "all") n <- nrow(bc@normalized) + else stop('To select all signatures, please set n = "all".') } # Check decreasing. if (length(decreasing) != 1 | !is.logical(decreasing)) { - stop('decreasimg must be a logical argument of length 1.') + stop('decreasimg must be TRUE or FALSE.') } # --- Code --- # If ranks have not been computed, compute them now. if (!meta %in% names(bc@ranks)) { message('Computing ranks...') - bc <- bcRanks(bc, idents, extended = FALSE) + bc <- bcRanks(bc, idents = idents, extended = FALSE) } # Get ranks for the specified idents. df <- bc@ranks[[meta]] @@ -214,8 +211,9 @@ rankSigs <- function(bc, idents = NULL, cond = NULL, n = 10, idx <- nrow(bc@normalized):(nrow(bc@normalized) - n + 1) } # Return signatures whose rank == idx. - order.col <- ifelse(meta == "general", "rank", paste0("rank.", cond)) - ordered.df <- df[order(df[, order.col]), ] + order.col <- ifelse(test = meta == "general", yes = "rank", + no = paste0("rank.", cond)) + ordered.df <- df[order(df[, order.col], decreasing = FALSE), ] sigs <- ordered.df[idx, "Name"] return(sigs) } diff --git a/R/Reductions.R b/R/Reductions.R index 7ca182f..07ad7ac 100644 --- a/R/Reductions.R +++ b/R/Reductions.R @@ -1,51 +1,50 @@ -#' @title Computes the UMAP projection and the therapeutic clusters using the -#' beyondcell scores -#' @description This function uses the beyondcell scores to compute an UMAP -#' projection of the data and clusterizes cells according to their sensitivity -#' to the tested drugs (therapeutic clusters). +#' @title Computes a UMAP projection and therapeutic clusters using the BCS +#' @description This function uses the beyondcell scores (BCS) to compute a +#' UMAP projection of the data and to cluster cells according to their +#' sensitivity to the tested drugs (therapeutic clusters). #' @name bcUMAP #' @import scales #' @import Seurat #' @import ggplot2 #' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param pc Number of principal components to use (estimated from the Elbow -#' plot). If \code{pc = NULL} (default), the function will stop prior to compute -#' the UMAP projection and the therapeutic clusters. See Details for more -#' information. -#' @param k.neighbors (\code{Seurat}'s \code{k.param}). Defines \emph{k} for the -#' k-nearest neighbor algorithm. -#' @param res (\code{Seurat}'s \code{resolution}) Value of the resolution -#' parameter, use a value above (below) 1.0 if you want to obtain a larger -#' (smaller) number of communities. Can be a single number or a numeric vector. -#' @param add.DSS Use background beyondcell scores computed with the -#' \code{\link[DSS]{DSS}}) signature (\code{add.DSS = TRUE}) or just use the -#' drugs included in the \code{bc} object (\code{add.DSS = FALSE}) to compute -#' the UMAP projection and the therapeutic clusters. If the number of drugs in -#' \code{bc} (excluding pathways) is < 20, it is recomended to set -#' \code{add.DSS = TRUE}. Note that if \code{add.DSS = TRUE}, the regression and -#' subset steps that have been applied to \code{bc} will also be applied to the -#' background beyondcell scores. -#' @param elbow.path Path to save the Elbow plot. If \code{elbow.path = NULL} +#' @param pc Number of principal components to use (which can be estimated from +#' an elbow plot). If \code{pc = NULL} (default), the function will stop prior +#' to compute the UMAP projection and the therapeutic clusters. See Details for +#' more information. +#' @param k.neighbors (\code{\link[Seurat]{FindNeighbors}}' \code{k.param}) +#' Defines \emph{k} for the k-Nearest Neighbour algorithm. +#' @param res (\code{\link[Seurat]{FindClusters}}' \code{resolution}) Value of +#' the resolution parameter, use a value above/below 1.0 if you want to obtain +#' a larger/smaller number of communities. Can be a single number or a numeric +#' vector. +#' @param add.DSS Use background BCS computed with \code{DSS} signatures +#' (\code{add.DSS = TRUE}) or just use the signatures included in the \code{bc} +#' object (\code{add.DSS = FALSE}) to compute the UMAP projection and the +#' therapeutic clusters. If the number of drugs in \code{bc} +#' (excluding pathways) is <= 20, it is recomended to set \code{add.DSS = TRUE}. +#' Note that if \code{add.DSS = TRUE}, the regression and subset steps that have +#' been applied on \code{bc} will also be applied on the background BCS. +#' @param elbow.path Path to save the elbow plot. If \code{elbow.path = NULL} #' (default), the plot will be printed. -#' @details This function performs all the steps required to obtain an UMAP -#' reduction of the data and clusterize the cells according to the beyondcell -#' scores. You will normally require to run the function twice: +#' @details This function performs all the steps required to obtain a UMAP +#' reduction of the data and cluster the cells according to the BCS. +#' +#' You will normally require to run the function twice: #' \enumerate{ -#' \item Using \code{pc = NULL} to obtain the Elbow plot. +#' \item Using \code{pc = NULL} to obtain the elbow plot. #' \item Specifying a value for the \code{pc} parameter according to this plot. -#' This second time, the UMAP reduction and the beyondcell clusters will be -#' computed. -#' } +#' This second time, the UMAP reduction and the therapeutic clusters will be +#' computed.} #' -#' Note that \code{add.DSS} must be the same in both runs, so the Elbow plot +#' Note that \code{add.DSS} must be the same in both runs, so the elbow plot #' obtained in 1 is still valid in 2. If \code{add.DSS = TRUE}, the background -#' beyondcell scores will be stored in the \code{bc} object and the function -#' will skip this step the second time. +#' BCS will be stored in the \code{bc} object and the function will skip this +#' step the second time. #' @return A \code{beyondcell} object with the UMAP reduction in -#' \code{bc@@reductions} slot and the therapeutic clusters for each \code{res} -#' in \code{bc@@meta.data}. Also, an Elbow plot (\code{\link[ggplot]{ggplot}} +#' \code{@@reductions} slot and the therapeutic clusters for each \code{res} +#' in \code{bc@@meta.data}. Also, an elbow plot (\code{\link[ggplot2]{ggplot2}} #' object) is printed (if \code{elbow.path = NULL}) or saved (if -#' \code{elbow.path} is not \code{NULL}). +#' \code{elbow.path != NULL}). #' @examples #' @export @@ -56,16 +55,15 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check pc. if (!is.null(pc)) { - if (length(pc) != 1 | !is.numeric(pc) | pc[1] < 2) { - stop('pc must be an integer >= 2.') - } + if (length(pc) != 1 | pc[1] < 2) stop('pc must be an integer >= 2.') } # Check k.neighbors. - if (length(k.neighbors) != 1 | !is.numeric(k.neighbors) | k.neighbors < 1) { + if (length(k.neighbors) != 1 | k.neighbors < 1) { stop('k.neighbors must be a positive integer.') } # Check res. - if (!all(sapply(res, is.numeric)) | any(sapply(res, function(x) x < 0))) { + if (any(sapply(res, function(x) !is.numeric(x))) | + any(sapply(res, function(y) y < 0))) { stop('res must be a vector of numbers >= 0.') } # Check add.DSS. @@ -74,22 +72,23 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, if (length(add.DSS) != 1 | !is.logical(add.DSS)) { stop('add.DSS must be TRUE or FALSE.') } else if (!add.DSS) { - if (n.drugs < 3) { + if (n.drugs <= 10) { stop(paste('Only', n.drugs, 'drug signatures (excluding pathways) are', - 'present in bc object, please set add.DSS = TRUE.')) - } else if (n.drugs < 20) { + 'present in the bc object, please set add.DSS = TRUE.')) + } else if (n.drugs <= 20) { warning(paste('Computing an UMAP reduction for', n.drugs, 'drugs. We recommend to set add.DSS = TRUE when the number', - 'of signatures (excluding pathways) is below 20.')) + 'of signatures (excluding pathways) is below or equal to 20.')) } } # Check elbow.path. if (!is.null(elbow.path)) { if (length(elbow.path) != 1 | !is.character(elbow.path)) { - stop(paste('To save the Elbow plot, you must specify a character string', + stop(paste('To save the elbow plot, you must specify a character string', 'with the desired destination.')) } else { - parent.dir <- sub("(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", "\\1", elbow.path) + parent.dir <- sub(pattern = "(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", + replacement = "\\1", x = elbow.path) if (!dir.exists(parent.dir)) { stop(paste(parent.dir, 'path not found.')) } @@ -99,33 +98,35 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, # Cells in bc. cells <- colnames(bc@normalized) if (add.DSS) { - ### DSS (background) beyondcell scores. - if (!identical(sort(rownames(bc@background)), sort(DSS[[1]]$sig_id)) | - !identical(sort(colnames(bc@background)), sort(cells)) | + ### DSS (background) BCS. + if (!identical(sort(rownames(bc@background), decreasing = FALSE), + sort(DSS[[1]]$sig_id, decreasing = FALSE)) | + !identical(sort(colnames(bc@background), decreasing = FALSE), + sort(cells, decreasing = FALSE)) | !identical(bc@regression$order, bc@regression$order.background)) { - message('Computing background beyondcell scores using DSS signatures...') + message('Computing background BCS using DSS signatures...') ### Genesets. gs.background <- suppressMessages( GenerateGenesets(DSS, n.genes = bc@n.genes, mode = bc@mode, include.pathways = FALSE)) - ### Beyondcell score. + ### BCS. background <- suppressWarnings( - bcScore(bc@expr.matrix, gs.background, expr.thres = bc@thres)) + bcScore(bc@expr.matrix, gs = gs.background, expr.thres = bc@thres)) ### Add metadata. background@meta.data <- background@meta.data[, -c(1:ncol(background@meta.data))] - background <- bcAddMetatada(background, metadata = bc@meta.data) + background <- bcAddMetadata(background, metadata = bc@meta.data) ### Subset and regress (if needed). if (bc@regression$order[1] == "subset") { background <- bcSubset(background, cells = cells) } else if (bc@regression$order[1] == "regression") { - message('Regressing background scores...') + message('Regressing background BCS...') background <- suppressMessages( bcRegressOut(background, vars.to.regress = bc@regression[["vars"]])) } if (bc@regression$order[2] == "subset") { background <- bcSubset(background, cells = cells) } else if (bc@regression$order[2] == "regression") { - message('Regressing background scores...') + message('Regressing background BCS...') background <- suppressMessages( bcRegressOut(background, vars.to.regress = bc@regression[["vars"]])) } @@ -134,7 +135,7 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, ### Add order.background to bc@regression. bc@regression[["order.background"]] <- bc@regression[["order"]] } else { - message('Background beyondcell scores already computed. Skipping this step.') + message('Background BCS already computed. Skipping this step.') } ### Add background to bc. all.rows <- unique(c(rownames(bc@normalized), rownames(bc@background))) @@ -143,7 +144,7 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, merged.score <- t(apply(merged.score, 1, scales::rescale, to = c(0, 1))) bc.merged <- beyondcell(scaled = merged.score) } else { - ### No background DSS beyondcell scores. + ### No background BCS. message(paste('DSS background not computed. UMAP will be created just with', 'the drugs (not pathways) in bc object.')) bc.merged <- bc @@ -156,10 +157,10 @@ bcUMAP <- function(bc, pc = NULL, k.neighbors = 20, res = 0.2, # Elbow plot. elbowplot <- Seurat::ElbowPlot(sc, ndims = 50) + ggplot2::theme(legend.position = "bottom") if (is.null(elbow.path)) { - message('Printing Elbow plot...') + message('Printing elbow plot...') print(elbowplot) } else { - message(paste('Saving Elbow plot in', elbow.path)) + message(paste('Saving elbow plot in', elbow.path)) ggplot2::ggsave(elbow.path, plot = elbowplot) } if (!is.null(pc)) { diff --git a/R/Score.R b/R/Score.R index bfccdf4..03eb7a1 100644 --- a/R/Score.R +++ b/R/Score.R @@ -1,15 +1,14 @@ -#' @title Computes the beyondcell score -#' @description This function computes the beyondcell score and returns an +#' @title Computes the BCS +#' @description This function computes the beyondcell score (BCS) and returns an #' object of class \code{\link[beyondcell]{beyondcell}}. #' @name bcScore #' @import Seurat #' @import scales #' @param sc \code{\link[Seurat]{Seurat}} object or expression matrix. -#' @param gs \code{\link[geneset]{geneset}} object. +#' @param gs \code{\link[beyondcell]{geneset}} object. #' @param expr.thres Minimum fraction of signature genes that must be -#' expressed in a cell to compute its beyondcell score. Cells with a number of -#' expressed genes below this fraction will have a \code{NaN} beyondcell -#' score. +#' expressed in a cell to compute its BCS. Cells with a number of expressed +#' genes below this fraction will have a \code{NaN} BCS. #' @return A \code{beyondcell} object. #' @examples #' @export @@ -29,32 +28,29 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { stop('Default assay must include a normalized data (@data) slot.') } } else { - stop(paste('Seurat default assay must be either RNA or SCT.', - 'Integrated data is not accepted as input.')) + stop('Seurat default assay must be either RNA or SCT.') } - } else if ("matrix"%in% class(sc) & is.numeric(sc)) { + } else if ("matrix" %in% class(sc) & is.numeric(sc)) { input <- "expression matrix" warning(paste('Using count matrix as input. Please, check that this matrix', 'is normalized and unscaled.')) expr.matrix <- sc sc <- Seurat::CreateSeuratObject(expr.matrix) - } else stop('sc must be either a Seurat object or an expression matrix.') + } else stop(paste('sc must be either a Seurat object or a single-cell', + 'expression matrix.')) # Check if gs is a geneset object. if (class(gs) != "geneset") stop('gs must be a geneset object.') # Check expr.thres. - if (length(expr.thres) != 1 | !is.numeric(expr.thres)) { - stop('expr.thres must be a single number.') - } - if (expr.thres < 0 | expr.thres > 1) { + if (length(expr.thres) != 1 | expr.thres[1] < 0 | expr.thres[1] > 1) { stop('expr.thres must be a positive number between 0 and 1.') } # Check that gene names are in the same format. - sc.gene.case <- names(which.max(GeneCase(rownames(expr.matrix)))) - gs.gene.case <- names(which.max(GeneCase(unique(unlist(gs@genelist))))) + sc.gene.case <- names(which.max(CaseFraction(rownames(expr.matrix)))) + gs.gene.case <- names(which.max(CaseFraction(unique(unlist(gs@genelist))))) if (sc.gene.case != gs.gene.case) { warning(paste0('gs genes are ', sc.gene.case, ' and sc genes are ', - gs.gene.case, '. Please check your ', input, - ' and translate the genes if necessary.')) + gs.gene.case, '. Please check your ', input, + ' and translate the genes if necessary.')) } # --- Code --- # Create beyondcell object. @@ -90,14 +86,14 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { ### Update the progress bar. if (i%%bins == 0) { Sys.sleep(0.1) - setTxtProgressBar(pb, i) + setTxtProgressBar(pb, value = i) } ### Is n.expr.genes < all.genes * expr.thres? return(n.expr.genes < (length(all.genes) * expr.thres)) })) rownames(below.thres) <- names(gs@genelist) below.thres <- below.thres[, colnames(expr.matrix)] - # Beyondcell scores. + # BCS. bcs <- lapply(seq_along(gs@mode), function(j) { score <- t(sapply(1:len.gs, function(k) { ### Common genes between the expr.marix and each signature. @@ -108,15 +104,15 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { sum.expr <- colSums(sub.expr.matrix) ### Raw score (mean). raw <- colMeans(sub.expr.matrix) - ### Stdev (for bc score normalization). + ### Stdev (for BCS normalization). sig.stdev <- apply(sub.expr.matrix, 2, sd) - ### Normalized score. + ### Normalized BCS. norm.score <- raw * ((sum.expr - sig.stdev)/(raw + sig.stdev)) ### Update the progress bar. step <- len.gs + (j - 1) * len.gs + k if (step%%bins == 0 | step == total) { Sys.sleep(0.1) - setTxtProgressBar(pb, step) + setTxtProgressBar(pb, value = step) } return(norm.score) })) @@ -144,7 +140,7 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { scoring.matrix <- -1 * scoring.matrix[, , drop = FALSE] } } - # If genesets were obtained in a TREATED vs CONTROL comparison, invert bcscore + # If genesets were obtained in a TREATED vs CONTROL comparison, invert BCS # sign (exclude pathways). if (gs@comparison == "treated_vs_control") { not.paths <- which(!(rownames(scoring.matrix) %in% names(pathways))) @@ -156,10 +152,10 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { scoring.matrix <- (-1) * scoring.matrix[, , drop = FALSE] } } - slot(bc, "normalized") <- slot(bc, "data") <- round(scoring.matrix, 2) + slot(bc, "normalized") <- slot(bc, "data") <- round(scoring.matrix, digits = 2) # Scale the final matrix [0, 1] for each signature (for visualization). scaled.matrix <- t(apply(bc@normalized, 1, scales::rescale, to = c(0, 1))) - slot(bc, "scaled") <- round(scaled.matrix, 2) + slot(bc, "scaled") <- round(scaled.matrix, digits = 2) # Compute the switch point. switch.point <- SwitchPoint(bc) slot(bc, "switch.point") <- switch.point @@ -168,19 +164,19 @@ bcScore <- function(sc, gs, expr.thres = 0.1) { return(bc) } -#' @title Returns the fraction of each case type in the input vector +#' @title Returns the fraction of each case type in the input #' @description This function computes the fraction of of each case type #' (uppercase, lowercase or capitalized) in a character vector. -#' @name GeneCase +#' @name CaseFraction #' @import useful #' @param x Character vector. -#' @return A named numeric vector with the fractions of each case type. +#' @return A named numeric vector with the fraction of each case type. #' @examples #' @export -GeneCase <- function(x) { +CaseFraction <- function(x) { # --- Checks --- - if(!is.character(x)) stop('x must be of class character.') + if(!is.character(x)) stop('x must be a character vector.') # --- Code --- perc <- sapply(c("upper", "lower", "capitalized"), function(case) { if (case == "upper" | case == "lower") { @@ -193,7 +189,7 @@ GeneCase <- function(x) { p <- sum(useful::upper.case(first.letter) & useful::lower.case(rest.letters))/length(x) } - return(round(p, 2)) + return(round(p, digits = 2)) }) names(perc)[1:2] <- paste0("in ", names(perc)[1:2], "case") return(perc) @@ -202,10 +198,10 @@ GeneCase <- function(x) { #' @title Computes the switch point #' @description This function computes the switch point of the signatures of a #' given \code{\link[beyondcell]{beyondcell}} object. The switch point is the -#' (subsetted and/or regressed) scaled bcscore that corresponds to the point in -#' which the normalized bcscores in \code{beyondcell@@data} switch from -#' negative (insensitive) to positive (sensitive) values. The closer to 0, the -#' more sensitive are the cells to a given drug. +#' (subsetted and/or regressed) scaled beyondcell score (BCS) that corresponds +#' to the point in which the normalized BCS in \code{beyondcell@@data} switchs +#' from negative (insensitive) to positive (sensitive) values. The closer to 0, +#' the more sensitive are the cells to a given drug. #' @name SwitchPoint #' @param bc \code{beyondcell} object. #' @return A named vector with the swich points of the signatures in \code{bc}. @@ -217,7 +213,7 @@ SwitchPoint <- function(bc) { # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check bc@mode. - if (!all(bc@mode %in% c("up", "down")) | all(is.null(bc@mode))) { + if (any(!(bc@mode %in% c("up", "down"))) | all(is.null(bc@mode))) { stop('Incorrect mode.') } bc@mode <- unique(bc@mode) @@ -226,36 +222,36 @@ SwitchPoint <- function(bc) { sigs <- rownames(bc@normalized) # Cells in bc. cells <- colnames(bc@normalized) - # If bc@mode == "up", all normalized bcscores will be positive and all - # switch points will be 0. + # If bc@mode == "up", all normalized BCS will be positive and all switch + # points will be 0. if (all(bc@mode == "up")) { - switch.point <- rep(0, length(sigs)) - # If bc@mode == "down", all normalized bcscores will be negative and all - # switch points will be 1. + switch.point <- rep(0, times = length(sigs)) + # If bc@mode == "down", all normalized BCS will be negative and all switch + # points will be 1. } else if (all(bc@mode == "down")) { - switch.point <- rep(1, length(sigs)) - # If bc@mode == c("up", "down")... + switch.point <- rep(1, times = length(sigs)) + # If bc@mode == c("up", "down")... } else if(length(bc@mode) == 2) { ### Create a list with an entry for each signature. If the entry has ### length == 1, it is the final switch point. If it has length == 2, it - ### corresponds to the indexes of the normalized bcscores closest to 0. + ### corresponds to the indexes of the normalized BCS closest to 0. indexes <- lapply(sigs, FUN = function(x) { - ### Subset the normalized bcscores in @data. + ### Subset the normalized BCS in @data. m <- bc@data[x, cells] ### If all values are NaN, return NaN. if (all(is.na(m))) return(NaN) ### Else, remove NaN values. m.nona <- na.omit(m) - ### If all normalized bcscores are positive, the switch point is 0. + ### If all normalized BCS are positive, the switch point is 0. if (all(m.nona >= 0)) return(0) - ### If all normalized bcscores are negative, the switch point is 1. + ### If all normalized BCS are negative, the switch point is 1. else if (all(m.nona <= 0)) return(1) - ### If there are positive and negative normalized bcscores... + ### If there are positive and negative normalized BCS... else { exact.0 <- m.nona == 0 - ### If any of the normalized bcscores == 0, return its index (we repeat + ### If any of the normalized BCS == 0, return its index (we repeat ### it twice to indicate it's an index and not the final switch point). - if (any(exact.0)) return(rep(which(m == 0)[1], 2)) + if (any(exact.0)) return(rep(which(m == 0)[1], times = 2)) ### Else, get the indexes of the values closer to 0. else { lower.bound <- which(m == max(m.nona[m.nona <= 0]))[1] @@ -270,7 +266,7 @@ SwitchPoint <- function(bc) { ### If length(entry) == 2, the values are indexes. We subset those ### indexes in the scaled bcsores and take the arithmetic mean. if (length(indexes[[y]]) == 2) { - return(round(sum(bc@scaled[y, indexes[[y]]])/2, 2)) + return(round(sum(bc@scaled[y, indexes[[y]]])/2, digits = 2)) ### If length(entry) == 1, the value is the final switch point. } else { return(indexes[[y]]) diff --git a/R/Visualization.R b/R/Visualization.R index d2c9355..1d4b7c2 100644 --- a/R/Visualization.R +++ b/R/Visualization.R @@ -1,31 +1,38 @@ -#' @title Plots the UMAP reduction colored by metadata information -#' @description This function returns a {\link[ggplot]{ggplot}} object with the -#' UMAP reduction (either \code{beyondcell}'s or \code{Seurat}'s) colored by the -#' specified metadata column. +#' @title Plots a UMAP reduction coloured by metadata information +#' @description This function returns a {\link[ggplot2]{ggplot2}} object with +#' a UMAP reduction coloured by the specified metadata column. #' @name bcClusters #' @import Seurat #' @import ggplot2 #' @import scales #' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param UMAP UMAP reduction to plot. Either "beyondcell" (computed using -#' \code{\link[bcUMAP]{bcUMAP}}) or "Seurat" computed using \code{Seurat}'s -#' functions. -#' @param idents Name of the metadata column to color by. +#' @param idents Name of the metadata column to colour by. +#' @param UMAP UMAP reduction to plot. Either \code{"beyondcell"}, computed +#' using \code{\link[beyondcell]{bcUMAP}}, or \code{"Seurat"}, obtained using +#' \code{Seurat}'s functions. #' @param factor.col Logical indicating if \code{idents} column is a factor or -#' not. If \code{idents} is a numerical column (such as \code{percent.mt} or -#' \code{nFeature_RNA}, \code{factor.col} must be \code{FALSE}). -#' @param ... Other arguments passed to \code{Seurat}'s -#' \code{\link[DimPlot]{DimPlot}}. -#' @return A \code{\link[ggplot]{ggplot}} object with the UMAP reduction colored -#' by \code{idents}. +#' not. Set \code{factor.col = FALSE} if \code{idents} is a numeric column (such +#' as \code{percent.mt} or \code{nFeature_RNA}). +#' @param ... Other arguments passed to \code{\link[Seurat]{DimPlot}}. +#' @return A \code{ggplot2} object with the UMAP reduction coloured by +#' \code{idents}. #' @examples #' @export -bcClusters <- function(bc, UMAP = "beyondcell", idents = NULL, - factor.col = TRUE, ...) { +bcClusters <- function(bc, idents, UMAP = "beyondcell", factor.col = TRUE, + ...) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') + # Check idents. + if (length(idents) != 1) { + stop('Idents must be a single metadata column.') + } + if (idents %in% colnames(bc@meta.data)) { + meta <- bc@meta.data[colnames(bc@scaled), idents, drop = FALSE] + } else { + stop('Idents not found.') + } # Check UMAP. if (UMAP == "beyondcell") { if (length(bc@reductions) == 0) { @@ -42,19 +49,13 @@ bcClusters <- function(bc, UMAP = "beyondcell", idents = NULL, } else { stop('Incorrect UMAP argument. Please use either "Seurat" or "beyondcell".') } - # Check idents. - if (idents %in% colnames(bc@meta.data)) { - meta <- bc@meta.data[colnames(bc@scaled), idents, drop = FALSE] - } else { - stop('Idents not found.') - } # Check factor.col. if (length(factor.col) != 1 | !is.logical(factor.col)) { stop('factor.col must be TRUE or FALSE.') } # --- Code --- # Add metadata. - sc <- Seurat::AddMetaData(sc, meta) + sc <- Seurat::AddMetaData(sc, metadata = meta) # Add reductions. sc@reductions <- reduction # Plot. @@ -64,48 +65,45 @@ bcClusters <- function(bc, UMAP = "beyondcell", idents = NULL, p <- Seurat::DimPlot(sc, reduction = "umap", ...) + ggplot2::theme_minimal() } else { p <- Seurat::FeaturePlot(sc, reduction = "umap", features = idents, ...) + - ggplot2::theme_minimal() + ggplot2::theme_minimal() + ggplot2::labs(title = NULL) } return(p) } -#' @title Plots a histogram with the beyondcell scores of the signature of -#' interest -#' @description This function drawns a histogram plot of bcscores for each -#' signature of interest. The plot can be a single histogram (if -#' \code{idents = NULL}) or a histogram for each level found in \code{idents}. +#' @title Plots a histogram with the BCS of the signature of interest +#' @description This function drawns a histogram of beyondcell scores (BCS) for +#' each signature of interest. The plot can be a single histogram or a histogram +#' for each level found in \code{idents}. #' @name bcHistogram #' @import ggplot2 #' @param bc \code{\link[beyondcell]{beyondcell}} object. #' @param signatures Vector with the names of the signatures of interest. If -#' \code{signatures == "all"}, all signatures are selected. +#' \code{signatures = "all"}, all signatures are selected. #' @param idents Name of the metadata column of interest. If -#' \code{idents = NULL}, a single histogram with all bcscores is drawn. On the -#' other hand, if \code{idents != NULL} a histogram for each level found in +#' \code{idents = NULL}, a single histogram with all BCS is drawn. On the other +#' hand, if \code{idents != NULL} a histogram for each level found in #' \code{idents} will be drawn. -#' @return A list of \code{\link[ggplot]{ggplot}} histograms, one for each +#' @return A list of \code{\link[ggplot2]{ggplot2}} histograms, one for each #' signature of interest. In each histogram, the median, mean and sd are #' reported. Also, the mean is indicated with a black dashed line and the median #' with a red dashed line. #' @examples #' @export -bcHistogram <- function(bc, signatures = NULL, idents = NULL) { +bcHistogram <- function(bc, signatures, idents = NULL) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check signatures. - if (is.null(signatures)) { - stop('You must specify the signatures of interest.') - } if (!is.character(signatures)) { stop("Signatures must be a character vector.") } if (length(signatures) == 1 & signatures[1] == "all") { signatures <- rownames(bc@normalized) - in.signatures <- rep(TRUE, nrow(bc@normalized)) + in.signatures <- rep(TRUE, times = nrow(bc@normalized)) } else { - in.signatures <- signatures %in% rownames(bc@normalized) + in.signatures <- !is.null(signatures) & + signatures %in% rownames(bc@normalized) if (all(!in.signatures)) { stop('None of the specified signatures were found.') } else if (any(!in.signatures)) { @@ -126,34 +124,29 @@ bcHistogram <- function(bc, signatures = NULL, idents = NULL) { } } else { ### If idents = NULL, all cells have the same metadata. - meta <- rep("", ncol(bc@normalized)) + meta <- rep("", times = ncol(bc@normalized)) } # --- Code --- # Metadata levels. lvls <- levels(as.factor(meta)) # Subset bc to the selected signatures. sub.bc <- bc@data[signatures[in.signatures], , drop = FALSE] - # Get maximum and minimum normalized bcscores (for common x axis in all - # plots). + # Get maximum and minimum normalized BCS (for common x axis in all plots). limits <- c(min(as.vector(sub.bc), na.rm = TRUE), max(as.vector(sub.bc), na.rm = TRUE)) - # Get the names and pathways of the selected signatures. - info <- subset(drugInfo, sig_id %in% signatures[in.signatures]) - if (nrow(info) > 0) { - info <- aggregate(.~ sig_id, data = info, na.action = NULL, FUN = function(m) { - paste(na.omit(unique(m)), collapse = ", ") - }) - } + # Get info about drugs (their corresponding name in bc, the preferred name + # used by beyondcell and the MoA). + info <- FindDrugs(bc, x = signatures[in.signatures]) # For each signature. p <- lapply(signatures[in.signatures], function(x) { - ### Data frame of normalized bcscores and metadata. + ### Data frame of normalized BCS and metadata. sub.df <- na.omit(data.frame(bcscore = sub.bc[x, ], condition = as.factor(meta), row.names = colnames(sub.bc))) ### Statistics by metadata (mean, median and sd). stats.cond <- sapply(lvls, function(y) { - stats <- round(Mean.Med.SD(subset(sub.df$bcscore, - sub.df$condition == y)), 2) + stats <- round(Mean.Med.SD(subset(sub.df$bcscore, subset = sub.df$condition == y)), + digits = 2) return(stats) }) stats.labels <- data.frame(label = apply(stats.cond, 2, function(z) { @@ -161,9 +154,10 @@ bcHistogram <- function(bc, signatures = NULL, idents = NULL) { }), mean = stats.cond["mean", ], median = stats.cond["median", ], condition = colnames(stats.cond)) ### Drug name and MoA - if (x %in% info$sig_id) { - drug.and.MoA <- info[which(info$sig_id == x), c("Name", "MoA")] - drug.and.MoA[2] <- ifelse(drug.and.MoA[2] == "NA", "", drug.and.MoA[2]) + if (x %in% info$IDs) { + drug.and.MoA <- info[which(info$IDs == x), c("preferred.and.sigs", "MoAs")] + drug.and.MoA[2] <- ifelse(test = drug.and.MoA[2] == "NA", yes = "", + no = drug.and.MoA[2]) } else { drug.and.MoA <- c(x, "") } @@ -188,88 +182,81 @@ bcHistogram <- function(bc, signatures = NULL, idents = NULL) { return(p) } -#' @title Plots the UMAP reduction colored by bcscores or gene expression values +#' @title Plots a UMAP reduction coloured by BCS or gene expression values #' @description This function returns a list of -#' \code{\link[patchwork]{patchwork}}s or \code{\link[ggplot2]{ggplot2}}s with -#' the desired UMAP reduction (either \code{beyondcell}'s or \code{Seurat}'s) -#' colored by bcscores or gene expression values. +#' \code{\link[patchwork]{patchwork}} or \code{\link[ggplot2]{ggplot2}} objects +#' with the desired UMAP reduction coloured by beyondcell scores (BCS) or gene +#' expression values. #' @name bcSignatures #' @import Seurat #' @import ggplot2 #' @importFrom patchwork wrap_plots #' @param bc \code{\link[beyondcell]{beyondcell}} object. -#' @param UMAP UMAP reduction to plot. Either \code{"beyondcell"} (computed -#' using \code{\link[bcUMAP]{bcUMAP}}) or \code{"Seurat"} computed using -#' \code{\link[Seurat]{Seurat}}'s functions. -#' @param signatures List with parameters to color the UMAP by bcscores: +#' @param UMAP UMAP reduction to plot. Either \code{"beyondcell"}, computed +#' using \code{\link[beyondcell]{bcUMAP}}, or \code{"Seurat"}, obtained using +#' \code{Seurat}'s functions. +#' @param signatures List with plot parameters to colour the UMAP by BCS: #' \itemize{ #' \item{\code{values}:} {Vector with the names of the signatures of interest. #' If \code{signatures[["values"]] = "all"}, all signatures are selected.} -#' \item{\code{limits}:} {Vector with the desired limits for all signatures' -#' plots. If \code{limits = c(NA, NA)} (default), the \code{limits} are computed -#' for each signature independently.} -#' \item{\code{center}:} {A single number indicating the center of the -#' \code{colorscale} for all signatures' plots. Alternatively, the \code{center} -#' can be a vector of two numbers. In this case, the \code{center} of the -#' \code{colorscale} is the middle point between these two numbers and the -#' \code{breaks} are computed using the difference between them. If -#' \code{center = NULL} (default), the \code{center} of each signature is its -#' switch point.} -#' \item{\code{breaks}:} {A single number indicating the break size of the -#' \code{colorscale}. Alternatively, it can be a vector with the desired breaks -#' (which don't have to be symmetric or equally distributed). If \code{center} -#' is a vector of two numbers, \code{breaks} are computed using the difference -#' between them.} -#' \item{\code{share.limits}:} {Logical argument. If \code{share.limits = TRUE} -#' (default), all signatures' plots will have the same \code{limits = c(0, 1)}. -#' If \code{share.limits = FALSE}, each signature plot will have its own -#' \code{limits}. Note that if \code{limits != c(NA, NA)}, -#' \code{share.limits = TRUE}.} #' \item{\code{colorscale}:} {Either a \code{viridis}, \code{RColorBrewer} or a -#' custom palette of 3 colors (low, medium and high) to color all signatures' -#' plots. If \code{colorscale = NULL} (default), the plots are colored using +#' custom palette of 3 colours (low, medium and high) to colour all signatures' +#' plots. If \code{colorscale = NULL} (default), the plots are coloured using #' \code{beyondcell}'s own palette.} -#' \item{\code{alpha}:} {Transparency level between 0 (not transparent) and 1 +#' \item{\code{alpha}:} {Transparency level between 1 (not transparent) and 0 #' (fully transparent).} -#' \item{\code{na.value}:} {Color to use for missing values (\code{NA}s).}} -#' @param genes List with parameters to color the UMAP by gene expression +#' \item{\code{na.value}:} {Colour to use for missing values (\code{NA}s).} +#' \item{\code{limits}:} {Vector with the desired limits for all signatures' +#' plots.} +#' \item{\code{center}:} {A single number indicating the centre of the +#' \code{colorscale} for all signatures' plots. If \code{center = NULL} +#' (default), the \code{center} for each signature is its switch point.} +#' \item{\code{breaks}:} {A single number indicating the break size of the +#' \code{colorscale}. Alternatively, it can be a vector with the desired breaks +#' (which don't have to be symmetric or equally distributed).}} +#' @param genes List with plot parameters to colour the UMAP by gene expression #' values: #' \itemize{ #' \item{\code{values}:} {Vector with the names of the genes of interest. If #' \code{genes[["values"]] = "all"}, all genes are selected.} #' \item{\code{limits}:} {Vector with the desired limits for all genes' plots. -#' If \code{limits = c(NA, NA)} (default), the \code{limits} are computed for -#' each gene independently.} +#' If \code{limits = c(NA, NA)} (default), the \code{limits} are computed +#' automatically. See Details for more information.} #' \item{\code{share.limits}:} {Logical argument. If \code{share.limits = TRUE}, #' all genes' plots will have the same \code{limits}. If #' \code{share.limits = FALSE} (default), each gene plot will have its own -#' \code{limits}. Note that if \code{limits != c(NA, NA)}, -#' \code{share.limits = TRUE}.}} +#' \code{limits}. See Details for more information.}} #' @param merged If \code{merged != NULL}, two signatures will be superposed in #' the same plot. If \code{merged = "direct"}, the signatures are assumed to -#' have a direct relationship and the bcscores will be added (+). On the other -#' hand, if \code{merged = "indirect"}, the signatures are assumed to have an -#' indirect relationship and their bcscores will be substracted (-). -#' @param blend (From \code{Seurat}) Scale and blend expression values to -#' visualize coexpression of two genes. +#' have a direct relationship and the BCS will be added. On the other hand, if +#' \code{merged = "indirect"}, the signatures are assumed to have an indirect +#' relationship and their BCS will be substracted. +#' @param blend (From \code{\link[Seurat]{FeaturePlot}}) Scale and blend +#' expression values to visualise co-expression of two genes. #' @param mfrow Numeric vector of the form \code{c(nr, nc)}. \code{nr} #' corresponds to the number of rows and \code{nc} to the number of columns of -#' the arrays in which the plots will be drawn. If you want to draw the plots +#' the grid in which the plots will be drawn. If you want to draw the plots #' individually, set \code{mfrow = c(1, 1)}. -#' @param ... Other arguments passed to \code{Seurat}'s -#' \code{\link[FeaturePlot]{FeaturePlot}}. -#' @return A list of \code{patchwork}s (if \code{mfrow != c(1, 1)}) or -#' \code{ggplot2}s (if \code{mfrow = c(1, 1)}) of the desired UMAP reduction -#' colored by the beyondcell scores (for signatures) or gene expression values -#' (for genes). +#' @param ... Other arguments passed to \code{FeaturePlot}. +#' @details When \code{genes[["limits"]] = c(NA, NA)}, \code{bcSignatures} +#' computes the limits automatically. You can make all plots share the same +#' limits by specifying \code{genes[["share.limits"]] = TRUE}, or make the +#' function to compute the limits individually for each gene with +#' \code{genes[["share.limits"]] = FALSE}. Moreover, if you specify a value for +#' \code{genes[["limits"]]}, \code{genes[["share.limits"]]} will automatically +#' set to \code{TRUE} and all plots will share those limits. +#' @return A list of \code{patchwork} (if \code{mfrow != c(1, 1)}) or +#' \code{ggplot2} objects (if \code{mfrow = c(1, 1)}) of the desired UMAP +#' reduction coloured by the BCS (for signatures) or gene expression values (for +#' genes). #' @examples #' @export bcSignatures <- function(bc, UMAP = "beyondcell", - signatures = list(values = NULL, limits = c(0, 1), - center = NULL, breaks = 0.1, - share.limits = TRUE, colorscale = NULL, - alpha = 0.7, na.value = "grey50"), + signatures = list(values = NULL, colorscale = NULL, + alpha = 0.7, na.value = "grey50", + limits = c(0, 1), center = NULL, + breaks = 0.1), genes = list(values = NULL, limits = c(NA, NA), share.limits = FALSE), merged = NULL, blend = FALSE, mfrow = c(1, 1), ...) { @@ -283,7 +270,7 @@ bcSignatures <- function(bc, UMAP = "beyondcell", } reduction <- bc@reductions cells <- subset(rownames(bc@meta.data), - rownames(bc@meta.data) %in% colnames(bc@normalized)) + subset = rownames(bc@meta.data) %in% colnames(bc@normalized)) } else if (UMAP == "Seurat") { if (length(bc@SeuratInfo$reductions) == 0) { stop('No UMAP projection available for your Seurat\'s object.') @@ -294,9 +281,9 @@ bcSignatures <- function(bc, UMAP = "beyondcell", stop('Incorrect UMAP argument. Please use either "Seurat" or "beyondcell".') } # Check signatures' list values. - default.sigs <- list(values = NULL, limits = c(0, 1), center = NULL, - breaks = 0.1, share.limits = TRUE, colorscale = NULL, - alpha = 0.7, na.value = "grey") + default.sigs <- list(values = NULL, colorscale = NULL, alpha = 0.7, + na.value = "grey", limits = c(0, 1), center = NULL, + breaks = 0.1) selected.sigs <- names(signatures) %in% names(default.sigs) if (any(!selected.sigs)) { warning(paste0('Incorrect entries in signatures: ', @@ -320,7 +307,7 @@ bcSignatures <- function(bc, UMAP = "beyondcell", if (!is.null(signatures[["values"]])) { if (length(signatures[["values"]]) == 1 & signatures[["values"]][1] == "all") { signatures[["values"]] <- rownames(bc@normalized) - in.signatures <- rep(TRUE, nrow(bc@normalized)) + in.signatures <- rep(TRUE, times = nrow(bc@normalized)) } else { in.signatures <- signatures[["values"]] %in% rownames(bc@normalized) if (all(!in.signatures)) { @@ -338,9 +325,9 @@ bcSignatures <- function(bc, UMAP = "beyondcell", if (!is.null(genes[["values"]])) { if (length(genes[["values"]]) == 1 & genes[["values"]][1] == "all") { genes[["values"]] <- rownames(bc@expr.matrix) - in.genes <- rep(TRUE, nrow(bc@expr.matrix)) + in.genes <- rep(TRUE, times = nrow(bc@expr.matrix)) } else { - in.genes <- tolower(genes[["values"]]) %in% tolower(rownames(bc@expr.matrix)) + in.genes <- toupper(genes[["values"]]) %in% toupper(rownames(bc@expr.matrix)) if (all(!in.genes)) { stop('None of the specified genes were found.') } else if (any(!in.genes)) { @@ -355,21 +342,27 @@ bcSignatures <- function(bc, UMAP = "beyondcell", sigs <- unique(signatures[["values"]][in.signatures]) gene <- unique(genes[["values"]][in.genes]) features <- c(sigs, gene) + # Check signature's colorscale. + signatures[["colorscale"]] <- get_colour_steps(signatures[["colorscale"]]) + # Check signatures' alpha, na.value and breaks -> inside center_scale_colour_stepsn(). # Check signatures' limits. if (length(signatures[["limits"]]) != 2) { stop('Signatures\' limits must be a vector of length 2.') } - na.limits.sigs <- is.na(signatures[["limits"]]) - if (length(signatures[["limits"]][!na.limits.sigs]) > 0 & - (!is.numeric(signatures[["limits"]][!na.limits.sigs]) | - any(signatures[["limits"]][!na.limits.sigs] < 0))) { - stop('Signatures\' limits must be numeric (>= 0) or NAs.') + if (!is.numeric(signatures[["limits"]]) | any(signatures[["limits"]] < 0)) { + stop('Signatures\' limits must be numeric (>= 0).') } - if (all(!na.limits.sigs) & - signatures[["limits"]][2] < signatures[["limits"]][1]) { + if (signatures[["limits"]][2] < signatures[["limits"]][1]) { warning(paste('Signatures\' upper limit is smaller than lower limit.', 'Sorting limits in increasing order.')) - signatures[["limits"]] <- sort(signatures[["limits"]]) + signatures[["limits"]] <- sort(signatures[["limits"]], decreasing = FALSE) + } + # Check signature's center. + if (!is.null(signatures[["center"]])) { + if (length(signatures[["center"]]) != 1 | + !is.numeric(signatures[["center"]])) { + stop('Signatures\' center must be a single number or NULL.') + } } # Check genes' limits. if (length(genes[["limits"]]) != 2) { @@ -384,17 +377,7 @@ bcSignatures <- function(bc, UMAP = "beyondcell", if (all(!na.limits.genes) & genes[["limits"]][2] < genes[["limits"]][1]) { warning(paste('Genes\' upper limit is smaller than lower limit.', 'Sorting limits in increasing order.')) - genes[["limits"]] <- sort(genes[["limits"]]) - } - # Check signatures' share.limits. - if (length(signatures[["share.limits"]]) != 1 | - !is.logical(signatures[["share.limits"]])) { - stop('signatures$share.limits must be TRUE or FALSE.') - } - if (!signatures[["share.limits"]] & - !identical(signatures[["limits"]], c(NA, NA))) { - warning(paste('Signatures\' limits were specified, setting', - 'signatures[["share.limits"]] = TRUE.')) + genes[["limits"]] <- sort(genes[["limits"]], decreasing = FALSE) } # Check genes' share.limits. if (length(genes[["share.limits"]]) != 1 | !is.logical(genes[["share.limits"]])) { @@ -405,34 +388,19 @@ bcSignatures <- function(bc, UMAP = "beyondcell", warning(paste('Genes\' limits were specified, setting', 'genes[["share.limits"]] = TRUE.')) } - # Check signature's center. - len.center <- length(signatures[["center"]]) - if (!is.null(signatures[["center"]])) { - if (len.center < 1 | len.center > 2 | !is.numeric(signatures[["center"]])) { - stop('Signatures\' center must be a vector of 1 or 2 numbers.') - } - } - # Check signatures' breaks, alpha and na.value -> inside center_scale_colour_stepsn(). - # Check signature's colorscale. - signatures[["colorscale"]] <- get_colour_stepsn(signatures[["colorscale"]]) - # Check if merged and blend values are both specified. - if (!is.null(merged) & blend) { - stop(paste('You can\'t specify simultaneously a value for merged and', - 'blend = TRUE.')) - } # Check merged. if (!is.null(merged)) { if (length(merged) != 1 | !(merged %in% c("direct", "indirect"))) { stop(paste('Incorrect merged value: It must be either NULL, "direct"', 'or "indirect".')) } - if (length(features) != 2) { + if (length(features) != 2 | length(signatures[["values"]]) != 2) { stop('When merged != NULL, the number of signatures must be exactly 2.') - } else if (!(all(features %in% sigs))) { + } else if (any(!(features %in% sigs))) { stop(paste('The merged features must be signatures. For blending genes,', 'please use blend = TRUE.')) } - merged.symbol <- ifelse(merged == "direct", " + ", " - ") + merged.symbol <- ifelse(test = merged == "direct", yes = " + ", no = " - ") merged.sigs <- paste0(sigs, collapse = merged.symbol) } else { merged.symbol <- "" @@ -443,19 +411,16 @@ bcSignatures <- function(bc, UMAP = "beyondcell", stop('blend must be TRUE or FALSE.') } if (blend) { - if (length(features) != 2) { + if (length(features) != 2 | length(genes[["values"]]) != 2) { stop('When blend = TRUE, the number of genes must be exactly 2.') - } else if (!(all(features %in% gene))) { + } else if (any(!(features %in% gene))) { stop(paste('The blended features must be genes. For merging signatures,', 'please use merged argument.')) } } # Check mfrow. - if (length(mfrow) != 2 | !is.numeric(mfrow)) { - stop('mfrow must be a vector of two numbers.') - } - if (any(mfrow < 0) | any(mfrow%%1 != 0)) { - stop('mfrow must contain two integers > 0.') + if (length(mfrow) != 2 | any(mfrow < 0) | any(mfrow%%1 != 0)) { + stop('mfrow must be a vector of two integers > 0.') } # --- Code --- # If blend = TRUE, plot a Seurat::FeaturePlot. @@ -465,19 +430,16 @@ bcSignatures <- function(bc, UMAP = "beyondcell", ### Add reductions. sc@reductions <- reduction ### Plot. - p <- list(suppressMessages( - Seurat::FeaturePlot(sc, features = gene, blend = TRUE))) + p <- suppressMessages( + Seurat::FeaturePlot(sc, features = gene, blend = TRUE, combine = FALSE)) # Else... } else { - ### Get the names and pathways of the selected signatures. - info <- subset(drugInfo, sig_id %in% sigs) - if (dim(info)[1] > 0) { - info <- aggregate(.~ sig_id, data = info, na.action = NULL, FUN = function(x) { - paste(na.omit(unique(x)), collapse = ", ") - }) - } + # Get info about drugs (their corresponding name in bc, the preferred name + # used by beyondcell and the MoA). + info <- tryCatch(suppressWarnings(FindDrugs(bc, x = sigs)), + error = function(cond) data.frame()) ### If we want to merge signatures, we must recompute the bc object using - ### the added or substracted bc@normalized scores. + ### the added or substracted bc@normalized BCS. if (!is.null(merged)) { if (merged == "direct") { merged.bcscores <- colSums(bc@normalized[sigs, cells, drop = FALSE], @@ -487,25 +449,21 @@ bcSignatures <- function(bc, UMAP = "beyondcell", na.rm = TRUE) } bc@data <- matrix(merged.bcscores, nrow = 1, dimnames = list(merged.sigs, cells)) - bc <- bcRecompute(bc, slot = "data") + bc <- suppressMessages(bcRecompute(bc, slot = "data")) features <- merged.sigs } - ### Join scaled bcscores and gene expression values for selected features. + ### Join scaled BCS and gene expression values for selected features. full.matrix <- rbind(bc@scaled[merged.sigs, cells, drop = FALSE], - bc@expr.matrix[gene, cells, drop = FALSE])[features, , - drop = FALSE] + bc@expr.matrix[gene, cells, + drop = FALSE])[features, , drop = FALSE] ### Signature's center. If center == NULL, set center to switch.points. if (is.null(signatures[["center"]])) { center.sigs <- bc@switch.point[merged.sigs] } else { - center.sigs <- setNames(rep(signatures[["center"]], length(merged.sigs)), - merged.sigs) - } - ### Signature's limits. - if (signatures[["share.limits"]] & any(na.limits.sigs)) { - signatures[["limits"]][na.limits.sigs] <- c(0, 1)[na.limits.sigs] + center.sigs <- setNames(rep(signatures[["center"]], + times = length(merged.sigs)), merged.sigs) } - ### Gene's colors. + ### Gene's colours. if ((genes[["share.limits"]] | !identical(genes[["limits"]], c(NA, NA))) & !is.null(genes[["values"]])) { if (any(na.limits.genes)) { @@ -530,34 +488,42 @@ bcSignatures <- function(bc, UMAP = "beyondcell", ids <- unlist(strsplit(y, split = merged.symbol, fixed = TRUE)) } else ids <- y ### Drug name and MoA. - if (any(ids %in% info$sig_id)) { - drug.and.MoA <- info[which(info$sig_id %in% ids), c("Name", "MoA")] - if (nrow(drug.and.MoA) > 1) { ### When merged != NULL, convert "" to "NA". - drug.and.MoA[drug.and.MoA[, 2] == "", 2] <- "NA" - drug.and.MoA <- t(as.data.frame(apply(drug.and.MoA, 2, paste0, - collapse = merged.symbol))) + if (any(ids %in% info$IDs)) { + drug.and.MoA <- info[which(info$IDs %in% ids), + c("preferred.and.sigs", "MoAs")] + ### When merged != NULL... + if (nrow(drug.and.MoA) > 1) { + ### Paste both drug names and MoAs. If MoAs are the same, just print + ### them one time + drug.and.MoA <- as.data.frame(t(apply(drug.and.MoA, 2, FUN = function(z) { + paste0(unique(z), collapse = merged.symbol) + }))) } drug.and.MoA[, 2] <- BreakString(drug.and.MoA[, 2]) ### Format subtitle. } else { ### Empty subtitle. drug.and.MoA <- c(paste0(ids, collapse = merged.symbol), "") } - ### Colors (depending on wether y is a signature or a gene). + ### Colours (depending on wether y is a signature or a gene). if (all(ids %in% sigs)) { - ### Binned colorscale centered around the switch point or the - ### specified center. - colors <- center_scale_colour_stepsn(full.matrix[y, ], signatures[["limits"]], - center.sigs[y], signatures[["breaks"]], - signatures[["colorscale"]], signatures[["alpha"]]) + ### Binned colorscale centred around the switch point or the specified + ### center. + colors <- center_scale_colour_stepsn(full.matrix[y, ], + colorscale = signatures[["colorscale"]], + alpha = signatures[["alpha"]], + na.value = signatures[["na.value"]], + limits = signatures[["limits"]], + center = center.sigs[y], + breaks = signatures[["breaks"]]) } else { - ### Continuous colorscale with default seurat colors. + ### Continuous colorscale with default Seurat colours. colors <- colors.genes } ### Plot. fp <- suppressMessages( - Seurat::FeaturePlot(sc, features = gsub("_", "-", y), - combine = FALSE, ...)[[1]] + colors + - ggplot2::labs(title = drug.and.MoA[1], subtitle = drug.and.MoA[2])) + Seurat::FeaturePlot(sc, features = gsub(pattern = "_", replacement = "-", + x = y), combine = FALSE, ...)[[1]] + + colors + ggplot2::labs(title = drug.and.MoA[1], subtitle = drug.and.MoA[2])) return(fp) }) } @@ -579,26 +545,25 @@ bcSignatures <- function(bc, UMAP = "beyondcell", return(final.p) } -#' @title Plots a violindot of the beyondcell scores grouped by the cell cycle -#' phase +#' @title Plots a violindot of the BCS grouped by cell cycle phase #' @description This function drawns, for each signature of interest, a -#' \code{\link[violindot]{violindot}} plot of the bcscores grouped by the cell -#' cycle phase (G1, G2M or S). Note that this information must be present in -#' \code{bc@@meta.data} and can be obtained using \code{\link[Seurat]{Seurat}}'s -#' function \code{\link[CellCycleScoring]{CellCycleScoring}}. +#' \code{\link[see]{geom_violindot}} plot of the beyondcell scores (BCS) grouped +#' by cell cycle phase (G1, G2M or S). Note that this information must be +#' present in \code{bc@@meta.data} and can be obtained using +#' \code{\link[Seurat]{CellCycleScoring}}. #' @name bcCellCycle #' @import ggplot2 #' @importFrom see geom_violindot #' @param bc \code{\link[beyondcell]{beyondcell}} object. #' @param signatures Vector with the names of the signatures of interest. If -#' \code{signatures == "all"}, all signatures are selected. +#' \code{signatures = "all"}, all signatures are selected. #' @return A list of \code{\link[ggplot2]{ggplot2}} violindots, one for each -#' signature of interest. In each violindot, the bcscores are grouped in G1, G2M -#' or S phase groups. +#' signature of interest. In each violindot, the BCS are grouped in G1, G2M or S +#' phase groups. #' @examples #' @export -bcCellCycle <- function(bc, signatures = NULL) { +bcCellCycle <- function(bc, signatures) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') @@ -607,17 +572,14 @@ bcCellCycle <- function(bc, signatures = NULL) { stop("Cell cycle information not present in bc@meta.data.") } # Check signatures. - if (is.null(signatures)) { - stop('You must specify the signatures of interest.') - } if (!is.character(signatures)) { stop("Signatures must be a character vector.") } if (length(signatures) == 1 & signatures[1] == "all") { signatures <- rownames(bc@normalized) - in.signatures <- rep(TRUE, nrow(bc@normalized)) + in.signatures <- rep(TRUE, times = nrow(bc@normalized)) } else { - in.signatures <- signatures %in% rownames(bc@normalized) + in.signatures <- !is.null(signatures) & signatures %in% rownames(bc@normalized) if (all(!in.signatures)) { stop('None of the specified signatures were found.') } else if (any(!in.signatures)) { @@ -628,21 +590,20 @@ bcCellCycle <- function(bc, signatures = NULL) { # --- Code --- # Cells in beyondcell object. cells <- subset(rownames(bc@meta.data), - rownames(bc@meta.data) %in% colnames(bc@normalized)) - # Get the names and pathways of the selected signatures. - info <- subset(drugInfo, sig_id %in% signatures[in.signatures]) - info <- aggregate(.~ sig_id, data = info, na.action = NULL, FUN = function(y) { - paste(na.omit(unique(y)), collapse = ", ") - }) + subset = rownames(bc@meta.data) %in% colnames(bc@normalized)) + # Get info about drugs (their corresponding name in bc, the preferred name + # used by beyondcell and the MoA). + info <- FindDrugs(bc, x = signatures[in.signatures]) # For each signature... p <- lapply(signatures[in.signatures], function(x) { - ### Data frame of normalized bcscores and phase metadata. + ### Data frame of normalized BCS and phase metadata. sub.df <- na.omit(data.frame(bcscore = bc@data[x, cells], phase = bc@meta.data[cells, "Phase"], row.names = cells)) ### Drug name and MoA. - if (x %in% info$sig_id) { - drug.and.MoA <- info[which(info$sig_id == x), c("Name", "MoA")] + if (x %in% info$IDs) { + drug.and.MoA <- info[which(info$IDs == x), + c("preferred.and.sigs", "MoAs")] drug.and.MoA[, 2] <- BreakString(drug.and.MoA[, 2]) ### Format subtitle. } else { drug.and.MoA <- c(x, "") @@ -656,14 +617,16 @@ bcCellCycle <- function(bc, signatures = NULL) { return(p) } -#' @title Plots a 4 squares plot +#' @title Drawns a 4 squares plot #' @description This function drawns a 4 square plot of the drug signatures #' present in a \code{\link[beyondcell]{beyondcell}} object. A 4 squares plot -#' consists in a scatter plot of the bcscores' residuals (x axis) vs the switch -#' points (y axis). 4 quadrants are highlighted: the top-left and bottom-right -#' corners contain the drugs to which all selected cells are most/least -#' sensistive, respectively. The centre quadrants show the drugs to which half -#' of the selected cells are sensitive and the other half insensitive. +#' consists in a scatter plot of the residuals' means (x axis) vs the switch +#' points (y axis) of a specified cluster (either a therapeutic cluster or a +#' group defined by experimental condition or phenotype). 4 quadrants are +#' highlighted: the top-left and bottom-right corners contain the drugs to which +#' all selected cells are least/most sensistive, respectively. The centre +#' quadrants show the drugs to which these cells are differentially insensitive +#' or sensitive when compared to the other clusters. #' #' x cut-offs: first and last deciles; y cut-offs: 0.1, 0.4, 0.6 and 0.9. #' @name bc4Squares @@ -672,44 +635,40 @@ bcCellCycle <- function(bc, signatures = NULL) { #' @importFrom cowplot theme_cowplot #' @param bc \code{beyondcell} object. #' @param idents Name of the metadata column of interest. -#' @param lvl Character vector with the \code{idents}' level of interest. If +#' @param lvl Character vector with the \code{idents}' level(s) of interest. If #' \code{lvl = NULL}, all levels will be plotted. -#' @param top Number of top drugs per group to be labelled. +#' @param top Number of top drugs per quadrant to be labelled. #' @param topnames Character vector with additional interesting drugs to be -#' labelled (either their names and/or sig IDs). -#' @param force (From \code{ggrepel}) Force of repulsion between overlapping -#' text labels. Defaults to 1. -#' @param alpha Transparency level between 0 (not transparent) and 1 (fully +#' labelled (either their names or sig IDs). +#' @param force (From \code{\link[ggrepel]{ggrepel}}) Force of repulsion between +#' overlapping text labels. Defaults to 1. +#' @param alpha Transparency level between 1 (not transparent) and 0 (fully #' transparent). -#' @pt.size Point size. +#' @param pt.size Point size. #' @details This function returns a list of \code{\link[ggplot2]{ggplot2}} -#' objects, one per each \code{lvl}. Note that residuals are different for each -#' level while swicth points are signature-specific. So, x axis will vary and y -#' axis will remain constant accross all plots. +#' objects, one per each \code{lvl}. Note that residuals' means are different +#' for each level while swicth points are signature-specific. So, x axis will +#' vary and y axis will remain constant accross all plots. #' @return A list of \code{ggplot2} objects, one per \code{lvl}. #' @examples #' @export -bc4Squares <- function(bc, idents = NULL, lvl = NULL, top = 3, +bc4Squares <- function(bc, idents, lvl = NULL, top = 3, topnames = NULL, force = 1, alpha = 0.7, pt.size = 3) { # --- Checks --- # Check that bc is a beyondcell object. if (class(bc) != "beyondcell") stop('bc must be a beyondcell object.') # Check idents. - if (is.null(idents)) { - stop("You must specify an idents value.") - } else { - if (length(idents) != 1) stop('Idents must be a single metadata column.') - if (!(idents %in% names(bc@ranks))) { - stop(paste0('$', idents, ' not found in bc@ranks.')) - } else if (idents == "general") { - stop('General rank cant be used in bc4Squares(). All residuals are 0.') - } + if (length(idents) != 1) stop('Idents must be a single metadata column.') + if (!(idents %in% names(bc@ranks))) { + stop(paste0('$', idents, ' not found in bc@ranks.')) + } else if (idents == "general") { + stop('General rank can\'t be used in bc4Squares(). All residuals are 0.') } # Check lvl. if (is.null(lvl)) { lvl <- unique(bc@meta.data[, idents]) - in.lvl <- rep(TRUE, length(lvl)) + in.lvl <- rep(TRUE, times = length(lvl)) } else { in.lvl <- lvl %in% unique(bc@meta.data[, idents]) if (all(!in.lvl)) { @@ -720,22 +679,21 @@ bc4Squares <- function(bc, idents = NULL, lvl = NULL, top = 3, } } # Check top. - if (length(top) != 1 | !is.numeric(top) | top[1]%%1 != 0 | top[1] < 0) { + if (length(top) != 1 | top[1]%%1 != 0 | top[1] < 0) { stop('top must be a single integer >= 0.') } # Check topnames. if (!is.null(topnames)) { not.paths <- which(!(toupper(rownames(bc@normalized)) %in% toupper(names(pathways)))) - in.topnames <- toupper(topnames) %in% drugInfo$Name | tolower(topnames) %in% drugInfo$sig_id | + in.topnames <- toupper(topnames) %in% drugInfo$drugs | + tolower(topnames) %in% drugInfo$IDs | toupper(topnames) %in% toupper(rownames(bc@normalized)[not.paths]) if (all(!in.topnames)) { - warning(paste('None of the specified topname drugs were found in the', - 'beyondcell object.')) + warning('None of the specified topname drugs were found in bc.') } else if (any(!in.topnames)) { - warning(paste0('The following topname drugs were not found in the' , - 'beyondcell object: ', paste0(topnames[!in.topnames], - collapse = ", "), '.')) + warning(paste0('The following topname drugs were not found in bc: ' , + paste0(topnames[!in.topnames], collapse = ", "), '.')) } } else { in.topnames <- NULL @@ -755,68 +713,73 @@ bc4Squares <- function(bc, idents = NULL, lvl = NULL, top = 3, # --- Code --- # Get info about drugs (their corresponding name in bc, the preferred name # used by beyondcell and the MoA). - drug <- FindDrugs(bc, rownames(bc@scaled)) + info <- FindDrugs(bc, x = rownames(bc@scaled)) # Switch points. - sp <- data.frame(switch.point = bc@switch.point[drug$bc_Name], - row.names = drug$bc_Name) + sp <- data.frame(switch.point = bc@switch.point[info$bc.names], + row.names = info$bc.names) # One plot per level. p4s <- lapply(lvl[in.lvl], function(l) { - ### Subset residuals and switch points. - res <- bc@ranks[[idents]][drug$bc_Name, paste0("residuals.", l), drop = FALSE] - colnames(res) <- "residuals" + ### Subset residuals' means and switch points. + res <- bc@ranks[[idents]][info$bc.names, + paste0("residuals.mean.", l), drop = FALSE] + colnames(res) <- "residuals.mean" df <- transform(merge(res, sp, by = 0), row.names = Row.names, Row.names = NULL) ### Residual's deciles. - res.decil <- quantile(as.numeric(res$residuals), prob = seq(0, 1, length = 11)) + res.decil <- quantile(as.numeric(res$residuals.mean), + prob = seq(from = 0, to = 1, length = 11)) ### Drug annotation. sp_lower_01 <- as.numeric(df$switch.point) < 0.1 sp_lower_06 <- as.numeric(df$switch.point) < 0.6 sp_higher_04 <- as.numeric(df$switch.point) > 0.4 sp_higher_09 <- as.numeric(df$switch.point) > 0.9 - res_lower_10 <- as.numeric(df$residuals) < res.decil[["10%"]] - res_higher_90 <- as.numeric(df$residuals) > res.decil[["90%"]] - df$annotation <- rep("no", nrow(df)) + res_lower_10 <- as.numeric(df$residuals.mean) < res.decil[["10%"]] + res_higher_90 <- as.numeric(df$residuals.mean) > res.decil[["90%"]] + df$annotation <- rep("no", times = nrow(df)) df$annotation[sp_lower_01 & res_higher_90] <- "TOP-HighSensitivityDrugs" df$annotation[sp_higher_09 & res_lower_10] <- "TOP-LowSensitivityDrugs" df$annotation[sp_higher_04 & sp_lower_06 & - res_lower_10] <- "TOPDifferential-LowSensitivityDrugs" + res_lower_10] <- "TOP-Differential-LowSensitivityDrugs" df$annotation[sp_higher_04 & sp_lower_06 & - res_higher_90] <- "TOPDifferential-HighSensitivityDrugs" + res_higher_90] <- "TOP-Differential-HighSensitivityDrugs" ### Drug labels. - df$labels <- rep("", nrow(df)) - decreasing_order <- c("TOPDifferential-HighSensitivityDrugs", + df$labels <- rep("", times = nrow(df)) + decreasing_order <- c("TOP-Differential-HighSensitivityDrugs", "TOP-HighSensitivityDrugs") unique.annotations <- unique(df$annotation[df$annotation != "no"]) sel.labels <- unlist(sapply(unique.annotations, function(x) { - sub.df <- subset(df, df$annotation == x) + sub.df <- subset(df, subset = df$annotation == x) if (x %in% decreasing_order) { - sub.df <- sub.df[order(sub.df$residuals, sub.df$switch.point, + sub.df <- sub.df[order(sub.df$residuals.mean, sub.df$switch.point, decreasing = TRUE), ] } else { - sub.df <- sub.df[order(sub.df$residuals, sub.df$switch.point), ] + sub.df <- sub.df[order(sub.df$residuals.mean, sub.df$switch.point, + decreasing = FALSE), ] } return(rownames(sub.df)[1:min(top, nrow(sub.df))]) })) - df[sel.labels, "labels"] <- drug$Preferred_and_sig[match(sel.labels, - drug$bc_Name)] + df[sel.labels, "labels"] <- info$preferred.and.sigs[match(sel.labels, + info$bc.names)] ### Topnames. if(length(topnames[in.topnames]) > 0) { - topnames <- FindDrugs(bc, topnames[in.topnames]) - df[match(topnames$bc_Name, rownames(df)), "labels"] <- topnames$Preferred_and_sig + topnames <- FindDrugs(bc, x = topnames[in.topnames]) + df[match(topnames$bc.names, + table = rownames(df)), "labels"] <- topnames$preferred.and.sigs } - ### Colors and names. + ### Colours and names. colors <- c("#1D61F2", "#DA0078", "orange", "#C7A2F5", "grey80", "black") names <- c("TOP-LowSensitivityDrugs", "TOP-HighSensitivityDrugs", - "TOPDifferential-HighSensitivityDrugs", - "TOPDifferential-LowSensitivityDrugs", "no", "black") - ### Circle's borders color. + "TOP-Differential-HighSensitivityDrugs", + "TOP-Differential-LowSensitivityDrugs", "no", "black") + ### Circle's borders colour. df$borders <- df$annotation df$borders[df$labels != ""] <- "black" ### Reorder df so labeled points are plotted on top. - df <- rbind(subset(df, df$borders != "black"), - subset(df, df$borders == "black")) + df <- rbind(subset(df, subset = df$borders != "black"), + subset(df, subset = df$borders == "black")) ### Plot. - p <- ggplot(df, aes(x = as.numeric(residuals), y = as.numeric(switch.point), - color = borders, fill = annotation)) + + p <- ggplot(df, aes(x = as.numeric(residuals.mean), + y = as.numeric(switch.point), color = borders, + fill = annotation)) + geom_point(shape = 21, alpha = alpha, size = pt.size) + scale_color_manual(values = setNames(colors, names)) + scale_fill_manual(values = setNames(colors, names), breaks = names[1:4], @@ -829,7 +792,7 @@ bc4Squares <- function(bc, idents = NULL, lvl = NULL, top = 3, geom_hline(yintercept = 0.6, linetype = "dotted") + ylim(0, 1) + labs(title = paste(idents, "=", lvl), caption = paste0("x cut-offs: first and last deciles; y cut-offs:", - " 0.1, 0.4, 0.6 and 0.9")) + xlab("Residuals") + + " 0.1, 0.4, 0.6 and 0.9")) + xlab("Residuals' Mean") + ylab("Switch Point") + ggrepel::geom_text_repel(label = df$labels, force = force) + guides(fill = guide_legend(title = "Drug Annotation"), color = FALSE) + diff --git a/R/objects.R b/R/objects.R index 66bdf6f..65ce6e2 100644 --- a/R/objects.R +++ b/R/objects.R @@ -1,18 +1,17 @@ -#' An S4 class to represent the genesets of each drug -#' -#' @slot genelist A list of drug and functional signatures (\code{pathways}) -#' with up and/or down-regulated genes. -#' @slot n.genes Argument passed to -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. -#' @slot mode Argument passed to -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. -#' @slot info If \code{GenerateGenesets}' input is a pre-loaded matrix, -#' \code{info} will be a \code{data.frame} with the correspondences between -#' \code{sig_ids} and drug names, as well as the source of the pre-loaded -#' matrices (LINCS, CTRP, GDSC or CCLE). If \code{GenerateGenesets}' input is a -#' path to a GMT file, \code{info} will be empty. -#' @slot comparison Argument passed to -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. +#' @title Geneset class +#' @description An object to represent signatures. +#' @slot genelist A list of drug signatures and functional \code{pathways} (if +#' specified) with up and/or down-regulated genes. +#' @slot n.genes Argument passed to \code{\link[beyondcell]{GenerateGenesets}}. +#' Number of up and/or down-regulated genes per signature. +#' @slot mode Argument passed to \code{GenerateGenesets}. Whether the +#' \code{geneset} contains up and/or down-regulated genes. +#' @slot info Dataframe with drug signatures information, including sig IDs, +#' drug names, MoAs, target genes and data sources (LINCS, CTRP, GDSC or CCLE). +#' This slot is only filled if \code{GenerateGenesets}' input is a pre-loaded +#' matrix. +#' @slot comparison Argument passed to \code{GenerateGenesets}. Either +#' \code{"treated_vs_control"} or \code{"control_vs_treated"}. geneset <- setClass("geneset", slots = list(genelist = "list", n.genes = "numeric", @@ -20,36 +19,36 @@ geneset <- setClass("geneset", slots = list(genelist = "list", info = "data.frame", comparison = "character")) -#' An S4 class to represent the beyondcell scores for each cell and signature. -#' -#' @slot scaled (Subsetted and/or regressed) scaled beyondcell scores. -#' @slot normalized (Subsetted and/or regressed) normalized beyondcell scores. -#' @slot data Original normalized beyondcell scores, without subsetting or -#' regression. -#' @slot switch.point (Subsetted and/or regressed) scaled beyondcell score for -#' which the normalized score in \code{@@data} is 0 (one switch point per -#' signature). -#' @slot ranks List of dataframes with the statistics (switch point, -#' mean, median, sd, variance, min, max, proportion of \code{NaN}, residuals' -#' mean and ranks of each signature in the beyondcell object. This slot is -#' filled using the function \code{\link[bcRanks]{bcRanks}}. -#' @slot expr.matrix Expression matrix used to compute the beyondcell scores. -#' @slot meta.data Contains information about each cell (including the -#' therapeutic clusters and \code{Seurat}'s \code{@@metadata}). -#' @SeuratInfo List with information about the input \code{Seurat} object, +#' @title Beyondcell class +#' @description An object to represent the beyondcell scores (BCS) for each cell and +#' signature. +#' @slot scaled (Subsetted and/or regressed) scaled BCS. +#' @slot normalized (Subsetted and/or regressed) normalized BCS. +#' @slot data Original normalized BCS, without subsetting or regression. +#' @slot switch.point (Subsetted and/or regressed) scaled BCS for which the +#' normalized score in \code{@@data} is 0 (one switch point per signature). +#' @slot ranks List of dataframes with the BCS' statistics and ranks returned +#' by \code{\link[beyondcell]{bcRanks}}. +#' @slot expr.matrix Single-cell expression matrix used to compute the BCS. +#' @slot meta.data Dataframe that contains information about each cell +#' (including the therapeutic clusters and \code{\link[Seurat]{Seurat}}'s +#' \code{@@meta.data}). +#' @slot SeuratInfo List with information about the input \code{Seurat} object, #' including the \code{@@reductions}. -#' @slot background (Subsetted and/or regressed) normalized beyondcell scores -#' obtained using DSS signatures. Useful to compute the UMAP reduction and the +#' @slot background (Subsetted and/or regressed) normalized BCS obtained using +#' DSS signatures. Useful to compute \code{beyondcell}'s UMAP reduction and the #' therapeutic clusters when the number of drug signatures is low. #' @slot reductions A list of dimensional reductions for this object. #' @slot regression A list with the order of subset and regression steps -#' performed on the beyondcell object and the variables used for regression. -#' @slot n.genes Argument passed to -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. -#' @slot mode Argument passed to -#' \code{\link[GenerateGenesets]{GenerateGenesets}}. +#' performed on the \code{beyondcell} object and the variables used for +#' regression. +#' @slot n.genes Argument passed to \code{\link[beyondcell]{GenerateGenesets}}. +#' Number of up and/or down-regulated genes per signature. +#' @slot mode Argument passed to \code{GenerateGenesets}. Whether the +#' \code{geneset} contains up and/or down-regulated genes. #' @slot thres Argument \code{expr.thres} passed to -#' \code{\link[bcScore]{bcScore}}. +#' \code{\link[beyondcell]{bcScore}}. Minimum fraction of signature genes that +#' must be expressed in a cell to compute its BCS. beyondcell <- setClass("beyondcell", slots = list(scaled = "matrix", normalized = "matrix", diff --git a/README.md b/README.md index 5d37324..fb47462 100644 --- a/README.md +++ b/README.md @@ -3,38 +3,67 @@ [Package status](https://gitlab.com/bu_cnio/Beyondcell/commits/master) ## Introduction -**Beyondcell** is a methodology for the identification of drug vulnerabilities in single cell RNA-seq data. To this end, Beyondcell focuses on the analysis of drug-related commonalities between cells by classifying them into distinct therapeutic clusters. +**Beyondcell** is a methodology for the identification of drug vulnerabilities +in single-cell RNA-seq (scRNA-seq) data. To this end, Beyondcell focuses on the +analysis of drug-related commonalities between cells by classifying them into +distinct Therapeutic Clusters (TCs). ## Workflow overview -**Beyondcell workflow.** Given two inputs, the scRNA-seq expression matrix and a collection of drug signatures, the methodology calculates a Beyondcell score (BCS) for each drug-cell pair. The BCS ranges from 0 to 1 and measures the susceptibility of each cell to a given drug. The resulting BCS matrix can be used to determine the sample’s therapeutic clusters. Furthermore, drugs are prioritized in a table and each individual drug score can be visualized in a UMAP. +**Beyondcell workflow.** Given two inputs, the scRNA-seq expression matrix and a +collection of drug signatures, the methodology calculates a Beyondcell Score +(BCS) for each drug-cell pair. The BCS ranges from 0 to 1 and measures the +susceptibility of each cell to a given drug. The resulting BCS matrix can be +used to determine the sample’s TCs. Furthermore, drugs are prioritized in a +table and each individual drug score can be visualized in a UMAP. ![Beyondcell workflow](./.img/workflow_tutorial.png) -Depending on the evaluated signatures, the BCS represents the cell perturbation susceptibility (PSc) and the sensitivity to the drug effect (SSc). BCS can also be estimated from functional signatures to evaluate each cell functional status. +Depending on the evaluated signatures, the BCS represents the cell perturbation +susceptibility (PSc) or the sensitivity to the drug effect (SSc). BCS can also +be estimated from functional signatures to evaluate each cell functional +status. ![drug signatures](./.img/drug_signatures.png) -## Beyoncell's key applications - * Analyze the intratumoural heterogeneity of your experiment - * Classify your cells into therapeutic clusters +## Beyondcell's key applications + * Analyse the intratumoural heterogeneity (ITH) of your experiment + * Classify your cells into TCs * Prioritize cancer treatments - * If time points are available, identify the changes in drug tolerance of your samples + * If time points are available, identify the changes in drug tolerance * Identify mechanisms of resistance ## Installing Beyondcell -The Beyondcell algorithm is implemented in R (v. 4.0.0 or greater). We recommend running the installation via conda: +The Beyondcell algorithm is implemented in R (v. 4.0.0 or greater). We recommend +running the installation via conda: ```r -# Create a conda environment +# Create a conda environment. conda create -n beyondcell -# Install Beyondcell package and dependencies -conda install -c bu_cnio beyondcell +# Activate the environment. +conda activate beyondcell +# Install Beyondcell package and dependencies. +conda install -c bu_cnio r-beyondcell +# Install RStudio. +conda install -c r rstudio +# Launch RStudio. +rstudio ``` -> TIP: Are you having any trouble running bcUMAP? Your umap-learn version might not be recognized by R. r-reticulate generates a new environment that can bypass conda's umap-learn installation. To solve this, try erasing the environment created by r-reticulate on your computer. +> TIP: Are you having any trouble running `bcUMAP`? Your umap-learn version +might not be recognized by R. r-reticulate generates a new environment that can +bypass conda's umap-learn installation. To solve this, try erasing the +environment created by r-reticulate on your computer. ## Results -We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four conditions using the Seurat pipeline (left). After calculating the BCS for each cell, a clustering analysis was applied. Beyondcell was able to cluster the cells based on their treatment time point, to separate untreated cells from treated cells (center) and to recapitulate the changes arisen by the treatment with bortezomib (right). +We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM +of bortezomib and collected at different time points: t0 (before treatment), +t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) +obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four +conditions using the Seurat pipeline (left). After calculating the BCS for each +cell using PSc, a clustering analysis was applied. Beyondcell was able to +cluster the cells based on their treatment time point, to separate untreated +cells from treated cells (center) and to recapitulate the changes arisen by the +treatment with bortezomib (right). ![results_golub](./.img/integrated_bendavid.png) @@ -58,4 +87,4 @@ For general instructions on running Beyondcell, check out the [analysis workflow ## Citation ## Support -If you have any question regarding the use of Beyoncell, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues). +If you have any question regarding the use of Beyondcell, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues). diff --git a/data/drugInfo.RData b/data/drugInfo.RData index 70d73f5..da6b5c4 100644 Binary files a/data/drugInfo.RData and b/data/drugInfo.RData differ diff --git a/man/BreakString.Rd b/man/BreakString.Rd index 80c3c6a..0b18f94 100644 --- a/man/BreakString.Rd +++ b/man/BreakString.Rd @@ -7,14 +7,14 @@ BreakString(x, split = ", ", line.length = 50) } \arguments{ -\item{x}{String to be breaked, formed by elements separated by \code{split}.} +\item{x}{String to be broken, formed by elements separated by \code{split}.} \item{split}{Character that separates the elements of \code{x}.} -\item{line.length}{Length of the lines into which \code{x} will be breaked.} +\item{line.length}{Length of the lines into which \code{x} will be broken.} } \value{ -A string with the same contents that \code{x} breaked in lines of +A string with the same content as \code{x} broken in lines of length \code{line.length}. } \description{ diff --git a/man/GeneCase.Rd b/man/CaseFraction.Rd similarity index 61% rename from man/GeneCase.Rd rename to man/CaseFraction.Rd index 81d5f3c..5469041 100644 --- a/man/GeneCase.Rd +++ b/man/CaseFraction.Rd @@ -1,16 +1,16 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/Score.R -\name{GeneCase} -\alias{GeneCase} -\title{Returns the fraction of each case type in the input vector} +\name{CaseFraction} +\alias{CaseFraction} +\title{Returns the fraction of each case type in the input} \usage{ -GeneCase(x) +CaseFraction(x) } \arguments{ \item{x}{Character vector.} } \value{ -A named numeric vector with the fractions of each case type. +A named numeric vector with the fraction of each case type. } \description{ This function computes the fraction of of each case type diff --git a/man/CreatebcObject.Rd b/man/CreatebcObject.Rd index c90f987..cd2e68b 100644 --- a/man/CreatebcObject.Rd +++ b/man/CreatebcObject.Rd @@ -18,8 +18,8 @@ object. } \details{ This function creates a new \code{beyondcell} object by using the -\code{@normalized} scores as the original \code{@data}. Switch points are -recomputed and \code{@regression} is restarted. The expression and -metadata matrices are subsetted to keep only those cells present in the new -\code{@data} slot. +normalized BCS as the original \code{@data}. Switch points are +recomputed and \code{@regression} is restarted. The \code{@expr.matrix} and +\code{@meta.data} slots are subsetted to keep only those cells present in +the new \code{@data} slot. } diff --git a/man/FilteredIDS.Rd b/man/FilteredIDS.Rd deleted file mode 100644 index 1113b7a..0000000 --- a/man/FilteredIDS.Rd +++ /dev/null @@ -1,29 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/Genesets.R -\name{FilteredIDS} -\alias{FilteredIDS} -\title{Returns the \code{sig_ids} that match the specified filters} -\usage{ -FilteredIDS(infodf, col, filter, filtername) -} -\arguments{ -\item{infodf}{\code{data.frame} with all drug information.} - -\item{col}{Column name to subset by.} - -\item{filter}{User-supplied filtering vector for either drugs, MoA, target -genes or source database.} - -\item{filtername}{String to be printed with the warning (in case some of the -\code{filter} elements are not found in \code{infodf}).} -} -\value{ -A vector with the \code{sig_ids} that match the \code{filter} -elements. -} -\description{ -This function is meant to be used inside -\code{\link[GenerateGenesets]{GenerateGenesets}}. It subsets \code{infodf} -to select only the entries that match the specified \code{filter} and returns -the corresponding \code{sig_ids}. -} diff --git a/man/FindDrugs.Rd b/man/FindDrugs.Rd index 9c241e2..3cc0ed7 100644 --- a/man/FindDrugs.Rd +++ b/man/FindDrugs.Rd @@ -1,34 +1,37 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/Format.R +% Please edit documentation in R/Basics.R \name{FindDrugs} \alias{FindDrugs} \title{Returns a dataframe with information about the input drugs} \usage{ -FindDrugs(bc, x) +FindDrugs(bc, x, na.rm = TRUE) } \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} \item{x}{A character vector with drug names and/or sig IDs.} + +\item{na.rm}{Logical. Should \code{x} entries with no available information +be removed from the final output?} } \value{ A \code{data.frame}. } \description{ -This function searches information about the inputted drugs in -the pre-loaded \code{\link[beyondcell]{beyondcell}} matrices and returns a -\code{data.frame} with their drug synonyms and MoAs. +This function searches the input drugs in the pre-loaded +\code{beyondcell} matrices and returns a dataframe with drug information, +including drug synonyms and MoAs. } \details{ The output \code{data.frame} has the following columns: \itemize{ -\item{\code{Original_Name}}: Inputted drug name. -\item{\code{bc_Name}}: Drug name used in \code{bc}. -\item{\code{Preferred_Name}}: Drug name used in \code{beyondcell} plots. -\item{\code{Name}}: Other drug names. -\item{\code{sig_id}}: Signature ID. -\item{\code{Preferred_and_sig}}: \code{Preferred_Name} (or alternatively -\code{bc_Name}) and \code{sig_id}. -\item{\code{MoA}}: Mechanism(s) of action. +\item{\code{original.names}}: Input drug names. +\item{\code{bc.names}}: Drug names used in \code{bc}. +\item{\code{preferred.drug.names}}: Standard drug names. +\item{\code{drugs}}: Other drug names. +\item{\code{IDs}}: Signature IDs. +\item{\code{preferred.and.sigs}}: \code{preferred.drug.names} (or alternatively +\code{bc.names}) and \code{IDs}. Used as title in \code{beyondcell} plots. +\item{\code{MoAs}}: Mechanism(s) of action. } } diff --git a/man/GenerateGenesets.Rd b/man/GenerateGenesets.Rd index 33be85c..fa623ef 100644 --- a/man/GenerateGenesets.Rd +++ b/man/GenerateGenesets.Rd @@ -8,68 +8,72 @@ GenerateGenesets( x, n.genes = 250, mode = c("up", "down"), - filters = list(drugs = NULL, IDs = NULL, MoA = NULL, targets = NULL, source = NULL), + filters = list(drugs = NULL, IDs = NULL, MoAs = NULL, targets = NULL, sources = NULL), comparison = NULL, include.pathways = TRUE ) } \arguments{ -\item{x}{A pre-loaded matrix, a numeric matrix or a path to a GMT file with -custom genesets.} +\item{x}{A pre-loaded matrix, a ranked matrix or a path to a GMT file with +custom gene sets. See Details for more information.} -\item{n.genes}{Number of top and/or down genes in the desired genelists.} +\item{n.genes}{Number of up and/or down-regulated genes used to compute each +signature.} -\item{mode}{\code{"up"}, \code{"down"} or \code{c("up", "down")}. See Details -for more information.} +\item{mode}{Whether the output \code{geneset} must contain up and/or +down-regulated genes. See Details for more information.} \item{filters}{If \code{x} is a pre-loaded matrix, you can provide a list of -filters to subset these matrices. You can specify which drug names, sig IDs, -mechanisms of action, target genes and sources you are interested in (cap -insensitive). You can call \code{\link[ListFilters]{ListFilters}} function to -check all the available values for these filters. The signatures that pass -\strong{ANY} of these filters are included in the output.} +filters to subset it. You can specify which \code{drugs}, sig \code{IDs}, +mechanisms of action (\code{MoAs}), \code{targets} and/or \code{sources} you +are interested in (cap insensitive). You can call +\code{\link[beyondcell]{ListFilters}} to check all the available values for +these filters. The signatures that pass \strong{ANY} of them are included in +the output.} \item{comparison}{\code{"treated_vs_control"} or \code{"sensitive_vs_resistant"}. See Details for more information.} -\item{include.pathways}{\code{TRUE} (default) or \code{FALSE}. Whether or -not return \code{beyoncell}'s pre-computed genesets for functional pathways.} +\item{include.pathways}{Logical. Return \code{beyoncell}'s pre-computed +signatures for functional pathways?} } \value{ -A \code{\link[geneset]{geneset}} object with up and/or down genes for -each drug and pathway. +A \code{geneset} object. } \description{ -This function generates up and/or down genelists for each drug -and pathway. +This function creates a \code{\link[beyondcell]{geneset}} +object. } \details{ \code{x} can be: \itemize{ -\item{A pre-loaded matrix:} {Either \code{\link[PSc]{PSc}}, -\code{\link[SSc]{SSc}} or \code{\link[DSS]{DSS}}.} -\item{A numeric matrix:} {A matrix with genes as rows and signatures as -columns that contains some type of numerical value such a t-stat or a LFC to +\item{A pre-loaded matrix:} {Either \code{PSc}, \code{SSc} or \code{DSS}.} +\item{A ranked matrix:} {A matrix with genes as rows and signatures as +columns that contains some type of numeric value such a t-stat or a LFC to rank the genes accordingly.} -\item{A path to a GMT file:} {A file that contains custom genesets. Each -geneset must have an "_UP" or "_DOWN" suffix.} +\item{A path to a GMT file:} {A file that contains custom gene sets. Each +gene set must have an "_UP" or "_DOWN" suffix.} } In addition, \code{mode} can be: \itemize{ -\item{\code{"up"}:} {To return over-expressed genes in the signatures.} -\item{\code{"down"}:} {To return under-expressed genes in the signatures.} +\item{\code{"up"}:} {To compute the signatures using only up-regulated +genes.} +\item{\code{"down"}:} {To compute the signatures using only down-regulated +genes.} +\item{\code{c("up", "down")} :} {To compute the signatures using both up and +down-regulated genes.} } If \code{x} is a path to a GMT file, \code{mode} is deprecated and the names -of all genesets must end in "_UP" or "_DOWN" to indicate the mode of each -one. +of all gene sets must end in "_UP" or "_DOWN" to indicate the \code{mode} of +each one. Finally, \code{comparison} can be: \itemize{ \item{\code{"treated_vs_control"}:} {(\code{PSc} and \code{DSS} like) When -the numeric values or the genesets in the GMT file were obtained from a +the numeric values or the gene sets in the GMT file were obtained from a comparison between drug treated and untreated cells.} \item{\code{"sensitive_vs_resistant"}:} {(\code{SSc} like) When the numeric -values or the genesets in the GMT file were obtained from a comparison +values or the gene sets in the GMT file were obtained from a comparison between drug sensitive and resistant cells.} } When \code{x} is a pre-loaded matrix, \code{comparison} is set automatically. diff --git a/man/GetIDs.Rd b/man/GetIDs.Rd new file mode 100644 index 0000000..369fc40 --- /dev/null +++ b/man/GetIDs.Rd @@ -0,0 +1,24 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/Basics.R +\name{GetIDs} +\alias{GetIDs} +\title{Returns the IDs that match the specified filter's values} +\usage{ +GetIDs(values, filter, df = drugInfo) +} +\arguments{ +\item{values}{User-supplied filtering vector.} + +\item{filter}{Column name or number to subset by.} + +\item{df}{\code{data.frame} with drug information. It must contain, at least, +two columns: \code{"IDs"} and \code{filter}.} +} +\value{ +A vector with the IDs that match the \code{filter}'s values. +} +\description{ +This function subsets \code{df} to select only the entries that +match the specified \code{filter}'s \code{values} and returns the +corresponding IDs. +} diff --git a/man/GetStatistics.Rd b/man/GetStatistics.Rd index 020d3e0..4afd150 100644 --- a/man/GetStatistics.Rd +++ b/man/GetStatistics.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/Basics.R \name{GetStatistics} \alias{GetStatistics} -\title{Computes bcRanks' statistics and ranks} +\title{Computes the BCS' statistics and ranks} \usage{ GetStatistics(bc, signatures, cells, pb, total, i, n.rows, extended) } @@ -13,13 +13,13 @@ GetStatistics(bc, signatures, cells, pb, total, i, n.rows, extended) \item{cells}{Vector with the names of the cells of interest.} -\item{pb}{Progress bar.} +\item{pb}{\code{\link[utils]{txtProgressBar}}.} -\item{total}{Number of iterations to complete the progress bar.} +\item{total}{Number of iterations to complete the \code{pb}.} -\item{i}{Iteration number; used for increasing the progress bar.} +\item{i}{Iteration number. Used to increase the \code{pb}.} -\item{n.rows}{Number of signatures; used for increasing the progress bar.} +\item{n.rows}{Number of signatures. Used to increase the \code{pb}.} \item{extended}{If \code{extended = TRUE}, this function returns the switch point, mean, median, sd, variance, min, max, proportion of \code{NaN} and @@ -27,9 +27,9 @@ residuals' mean per signature. If \code{extended = FALSE}, this function returns only the switch point, mean and residuals' mean.} } \value{ -A \code{data.frame} with the statistics and ranks per signature. +A \code{data.frame} with the BCS' statistics and ranks. } \description{ -This function computes the bcscores' statistics and ranks -returned by \code{\link[bcRnaks]{bcRanks}}. +This function computes the beyondcell scores' (BCS) statistics +and ranks returned by \code{\link[beyondcell]{bcRanks}}. } diff --git a/man/ListFilters.Rd b/man/ListFilters.Rd index 009512f..d082604 100644 --- a/man/ListFilters.Rd +++ b/man/ListFilters.Rd @@ -7,14 +7,14 @@ ListFilters(entry) } \arguments{ -\item{entry}{Either \code{"drugs"}, \code{"IDs"}, \code{"MoA"}, -\code{"targets"} or \code{"source"}.} +\item{entry}{Either \code{"drugs"}, \code{"IDs"}, \code{"MoAs"}, +\code{"targets"} or \code{"sources"}.} } \value{ All the possible values for the specified \code{entry}. } \description{ This function returns all the available values for \code{drugs}, -\code{sig_ids}, \code{MoAs}, \code{targets} or \code{source} filters in -\code{\link[GenerateGenesets]{GenerateGenesets}} function. +\code{IDs}, \code{MoAs}, \code{targets} or \code{sources} filters in +\code{\link[beyondcell]{GenerateGenesets}} function. } diff --git a/man/Mean.Med.SD.Rd b/man/Mean.Med.SD.Rd index bc7abc9..58ffc66 100644 --- a/man/Mean.Med.SD.Rd +++ b/man/Mean.Med.SD.Rd @@ -7,12 +7,15 @@ Mean.Med.SD(x) } \arguments{ -\item{x}{A numeric vector for which to compute the statistics.} +\item{x}{Numeric vector.} + +\item{na.rm}{(From base) Logical. Should missing values (including NaN) be +removed?} } \value{ A named numeric vector with the mean, median and sd of \code{x}. } \description{ This function computes the mean, the median and the sd of a -vector removing \code{NA} values. +vector. } diff --git a/man/SwitchPoint.Rd b/man/SwitchPoint.Rd index 1827279..6801127 100644 --- a/man/SwitchPoint.Rd +++ b/man/SwitchPoint.Rd @@ -15,8 +15,8 @@ A named vector with the swich points of the signatures in \code{bc}. \description{ This function computes the switch point of the signatures of a given \code{\link[beyondcell]{beyondcell}} object. The switch point is the -(subsetted and/or regressed) scaled bcscore that corresponds to the point in -which the normalized bcscores in \code{beyondcell@data} switch from -negative (insensitive) to positive (sensitive) values. The closer to 0, the -more sensitive are the cells to a given drug. +(subsetted and/or regressed) scaled beyondcell score (BCS) that corresponds +to the point in which the normalized BCS in \code{beyondcell@data} switchs +from negative (insensitive) to positive (sensitive) values. The closer to 0, +the more sensitive are the cells to a given drug. } diff --git a/man/bc4Squares.Rd b/man/bc4Squares.Rd index 793717a..0f09d91 100644 --- a/man/bc4Squares.Rd +++ b/man/bc4Squares.Rd @@ -2,11 +2,11 @@ % Please edit documentation in R/Visualization.R \name{bc4Squares} \alias{bc4Squares} -\title{Plots a 4 squares plot} +\title{Drawns a 4 squares plot} \usage{ bc4Squares( bc, - idents = NULL, + idents, lvl = NULL, top = 3, topnames = NULL, @@ -20,19 +20,21 @@ bc4Squares( \item{idents}{Name of the metadata column of interest.} -\item{lvl}{Character vector with the \code{idents}' level of interest. If +\item{lvl}{Character vector with the \code{idents}' level(s) of interest. If \code{lvl = NULL}, all levels will be plotted.} -\item{top}{Number of top drugs per group to be labelled.} +\item{top}{Number of top drugs per quadrant to be labelled.} \item{topnames}{Character vector with additional interesting drugs to be -labelled (either their names and/or sig IDs).} +labelled (either their names or sig IDs).} -\item{force}{(From \code{ggrepel}) Force of repulsion between overlapping -text labels. Defaults to 1.} +\item{force}{(From \code{\link[ggrepel]{ggrepel}}) Force of repulsion between +overlapping text labels. Defaults to 1.} -\item{alpha}{Transparency level between 0 (not transparent) and 1 (fully +\item{alpha}{Transparency level between 1 (not transparent) and 0 (fully transparent).} + +\item{pt.size}{Point size.} } \value{ A list of \code{ggplot2} objects, one per \code{lvl}. @@ -40,17 +42,19 @@ A list of \code{ggplot2} objects, one per \code{lvl}. \description{ This function drawns a 4 square plot of the drug signatures present in a \code{\link[beyondcell]{beyondcell}} object. A 4 squares plot -consists in a scatter plot of the bcscores' residuals (x axis) vs the switch -points (y axis). 4 quadrants are highlighted: the top-left and bottom-right -corners contain the drugs to which all selected cells are most/least -sensistive, respectively. The centre quadrants show the drugs to which half -of the selected cells are sensitive and the other half insensitive. +consists in a scatter plot of the residuals' means (x axis) vs the switch +points (y axis) of a specified cluster (either a therapeutic cluster or a +group defined by experimental condition or phenotype). 4 quadrants are +highlighted: the top-left and bottom-right corners contain the drugs to which +all selected cells are least/most sensistive, respectively. The centre +quadrants show the drugs to which these cells are differentially insensitive +or sensitive when compared to the other clusters. x cut-offs: first and last deciles; y cut-offs: 0.1, 0.4, 0.6 and 0.9. } \details{ This function returns a list of \code{\link[ggplot2]{ggplot2}} -objects, one per each \code{lvl}. Note that residuals are different for each -level while swicth points are signature-specific. So, x axis will vary and y -axis will remain constant accross all plots. +objects, one per each \code{lvl}. Note that residuals' means are different +for each level while swicth points are signature-specific. So, x axis will +vary and y axis will remain constant accross all plots. } diff --git a/man/bcAddMetadata.Rd b/man/bcAddMetadata.Rd new file mode 100644 index 0000000..7b3d0df --- /dev/null +++ b/man/bcAddMetadata.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/Manipulation.R +\name{bcAddMetadata} +\alias{bcAddMetadata} +\title{Add new metadata to an existing beyondcell object} +\usage{ +bcAddMetadata(bc, metadata) +} +\arguments{ +\item{bc}{\code{beyondcell} object.} + +\item{metadata}{Matrix or dataframe with metadata to add. Rownames should be +cell names and colnames should not be already present in +\code{bc@meta.data}.} +} +\value{ +Returns a \code{beyondcell} object with updated metadata. +} +\description{ +This function adds new metadata to an existing +\code{\link[beyondcell]{beyondcell}} object. +} diff --git a/man/bcAddMetatada.Rd b/man/bcAddMetatada.Rd deleted file mode 100644 index d28e326..0000000 --- a/man/bcAddMetatada.Rd +++ /dev/null @@ -1,21 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/Manipulation.R -\name{bcAddMetatada} -\alias{bcAddMetatada} -\title{Add new metadata columns to an existing beyondcell object} -\usage{ -bcAddMetatada(bc, metadata = NULL) -} -\arguments{ -\item{bc}{\code{beyondcell} object.} - -\item{metadata}{Dataframe with metadata to add. Rownames should be cell names -and colnames should not be already present in \code{beyondcell@meta.data}.} -} -\value{ -Returns a \code{beyondcell} object with new added metadata columns. -} -\description{ -This function adds new metadata columns to an existing -\code{\link[beyondcell]{beyondcell}} object. -} diff --git a/man/bcCellCycle.Rd b/man/bcCellCycle.Rd index 6e065db..64f7e97 100644 --- a/man/bcCellCycle.Rd +++ b/man/bcCellCycle.Rd @@ -2,26 +2,25 @@ % Please edit documentation in R/Visualization.R \name{bcCellCycle} \alias{bcCellCycle} -\title{Plots a violindot of the beyondcell scores grouped by the cell cycle -phase} +\title{Plots a violindot of the BCS grouped by cell cycle phase} \usage{ -bcCellCycle(bc, signatures = NULL) +bcCellCycle(bc, signatures) } \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} \item{signatures}{Vector with the names of the signatures of interest. If -\code{signatures == "all"}, all signatures are selected.} +\code{signatures = "all"}, all signatures are selected.} } \value{ A list of \code{\link[ggplot2]{ggplot2}} violindots, one for each -signature of interest. In each violindot, the bcscores are grouped in G1, G2M -or S phase groups. +signature of interest. In each violindot, the BCS are grouped in G1, G2M or S +phase groups. } \description{ This function drawns, for each signature of interest, a -\code{\link[violindot]{violindot}} plot of the bcscores grouped by the cell -cycle phase (G1, G2M or S). Note that this information must be present in -\code{bc@meta.data} and can be obtained using \code{\link[Seurat]{Seurat}}'s -function \code{\link[CellCycleScoring]{CellCycleScoring}}. +\code{\link[see]{geom_violindot}} plot of the beyondcell scores (BCS) grouped +by cell cycle phase (G1, G2M or S). Note that this information must be +present in \code{bc@meta.data} and can be obtained using +\code{\link[Seurat]{CellCycleScoring}}. } diff --git a/man/bcClusters.Rd b/man/bcClusters.Rd index bb3fee4..74700e9 100644 --- a/man/bcClusters.Rd +++ b/man/bcClusters.Rd @@ -2,32 +2,30 @@ % Please edit documentation in R/Visualization.R \name{bcClusters} \alias{bcClusters} -\title{Plots the UMAP reduction colored by metadata information} +\title{Plots a UMAP reduction coloured by metadata information} \usage{ -bcClusters(bc, UMAP = "beyondcell", idents = NULL, factor.col = TRUE, ...) +bcClusters(bc, idents, UMAP = "beyondcell", factor.col = TRUE, ...) } \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} -\item{UMAP}{UMAP reduction to plot. Either "beyondcell" (computed using -\code{\link[bcUMAP]{bcUMAP}}) or "Seurat" computed using \code{Seurat}'s -functions.} +\item{idents}{Name of the metadata column to colour by.} -\item{idents}{Name of the metadata column to color by.} +\item{UMAP}{UMAP reduction to plot. Either \code{"beyondcell"}, computed +using \code{\link[beyondcell]{bcUMAP}}, or \code{"Seurat"}, obtained using +\code{Seurat}'s functions.} \item{factor.col}{Logical indicating if \code{idents} column is a factor or -not. If \code{idents} is a numerical column (such as \code{percent.mt} or -\code{nFeature_RNA}, \code{factor.col} must be \code{FALSE}).} +not. Set \code{factor.col = FALSE} if \code{idents} is a numeric column (such +as \code{percent.mt} or \code{nFeature_RNA}).} -\item{...}{Other arguments passed to \code{Seurat}'s -\code{\link[DimPlot]{DimPlot}}.} +\item{...}{Other arguments passed to \code{\link[Seurat]{DimPlot}}.} } \value{ -A \code{\link[ggplot]{ggplot}} object with the UMAP reduction colored -by \code{idents}. +A \code{ggplot2} object with the UMAP reduction coloured by +\code{idents}. } \description{ -This function returns a {\link[ggplot]{ggplot}} object with the -UMAP reduction (either \code{beyondcell}'s or \code{Seurat}'s) colored by the -specified metadata column. +This function returns a {\link[ggplot2]{ggplot2}} object with +a UMAP reduction coloured by the specified metadata column. } diff --git a/man/bcHistogram.Rd b/man/bcHistogram.Rd index 9e96ad1..05218aa 100644 --- a/man/bcHistogram.Rd +++ b/man/bcHistogram.Rd @@ -2,30 +2,29 @@ % Please edit documentation in R/Visualization.R \name{bcHistogram} \alias{bcHistogram} -\title{Plots a histogram with the beyondcell scores of the signature of -interest} +\title{Plots a histogram with the BCS of the signature of interest} \usage{ -bcHistogram(bc, signatures = NULL, idents = NULL) +bcHistogram(bc, signatures, idents = NULL) } \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} \item{signatures}{Vector with the names of the signatures of interest. If -\code{signatures == "all"}, all signatures are selected.} +\code{signatures = "all"}, all signatures are selected.} \item{idents}{Name of the metadata column of interest. If -\code{idents = NULL}, a single histogram with all bcscores is drawn. On the -other hand, if \code{idents != NULL} a histogram for each level found in +\code{idents = NULL}, a single histogram with all BCS is drawn. On the other +hand, if \code{idents != NULL} a histogram for each level found in \code{idents} will be drawn.} } \value{ -A list of \code{\link[ggplot]{ggplot}} histograms, one for each +A list of \code{\link[ggplot2]{ggplot2}} histograms, one for each signature of interest. In each histogram, the median, mean and sd are reported. Also, the mean is indicated with a black dashed line and the median with a red dashed line. } \description{ -This function drawns a histogram plot of bcscores for each -signature of interest. The plot can be a single histogram (if -\code{idents = NULL}) or a histogram for each level found in \code{idents}. +This function drawns a histogram of beyondcell scores (BCS) for +each signature of interest. The plot can be a single histogram or a histogram +for each level found in \code{idents}. } diff --git a/man/bcMerge.Rd b/man/bcMerge.Rd index ea82610..4ba382e 100644 --- a/man/bcMerge.Rd +++ b/man/bcMerge.Rd @@ -17,6 +17,6 @@ A merged \code{beyondcell} object. \description{ This function merges two \code{\link[beyondcell]{beyondcell}} objects obtained from the same single-cell matrix using the same -\code{expr.thres} (See \code{\link[bcScore]{bcScore}} for more information). -It binds signatures, not cells. +\code{expr.thres} (see \code{\link[beyondcell]{bcScore}} for more +information). It binds signatures, not cells. } diff --git a/man/bcRanks.Rd b/man/bcRanks.Rd index 279dd09..b1da20a 100644 --- a/man/bcRanks.Rd +++ b/man/bcRanks.Rd @@ -15,16 +15,16 @@ bcRanks(bc, idents = NULL, extended = TRUE) \code{idents}.} \item{extended}{If \code{extended = TRUE}, this function returns the switch -point, mean, median, sd, variance, min, max, proportion of \code{NaN} and -residuals' mean per signature. If \code{extended = FALSE}, this function -returns only the switch point, mean and residuals' mean.} +point, mean, median, standard deviation, variance, min, max, proportion of +\code{NaN} and residuals' mean per signature. If \code{extended = FALSE}, +this function returns only the switch point, mean and residuals' mean.} } \value{ -A \code{beyondcell object} with the results in a new entry of -\code{bc@ranks} (\code{bc@ranks$general} if \code{idents = NULL} or -\code{bc@ranks$idents} if \code{idents != NULL}). +A \code{beyondcell} object with the results in a new entry of +\code{bc@ranks}: \code{bc@ranks[["general"]]} (if \code{idents = NULL}) or +\code{bc@ranks[[idents]]} (if \code{idents != NULL}). } \description{ -This function computes the bcscores statistics of each -signature and ranks them according to the switch point and mean. +This function computes the beyondcell score's (BCS) statistics +of each signature and ranks them according to the switch point and mean. } diff --git a/man/bcRecompute.Rd b/man/bcRecompute.Rd index ee657b2..e9bfd0a 100644 --- a/man/bcRecompute.Rd +++ b/man/bcRecompute.Rd @@ -7,20 +7,19 @@ bcRecompute(bc, slot = "data") } \arguments{ -\item{bc}{\code{\link[beyondcell]{beyondcell}} object.} +\item{bc}{\code{beyondcell} object.} -\item{slot}{Matrix to recompute the beyondcell object. Either \code{"data"} -or \code{"normalized"}.} +\item{slot}{Score matrix to recompute the \code{beyondcell} object. Either +\code{"data"} or \code{"normalized"}.} } \value{ A recomputed \code{beyondcell} object. } \description{ -This function recomputes a beyondcell object using the matrix -stored in the slot \code{@data} (original scores) or \code{@normalized} -(which can contain subsetted and/or regressed bcscores). Columns added with -\code{\link[bcAddMetadata]{bcAddMetadata}} are preserved, except if they -define therapeutic clusters. Important: \code{bc@reductions} and -\code{bc@background} remain the same, while \code{bc@ranks} and -\code{bc@reductions} are removed. +This function recomputes a \code{\link[beyondcell]{beyondcell}} +object using the matrix stored in the slot \code{@data} (original scores) or +\code{@normalized} (which can contain subsetted and/or regressed scores). +Columns added with \code{\link[beyondcell]{bcAddMetadata}} are preserved, +except if they define therapeutic clusters. Important: \code{bc@background} +remains the same, while \code{bc@ranks} and \code{bc@reductions} are removed. } diff --git a/man/bcRegressOut.Rd b/man/bcRegressOut.Rd index 3b24166..a5da70a 100644 --- a/man/bcRegressOut.Rd +++ b/man/bcRegressOut.Rd @@ -2,21 +2,20 @@ % Please edit documentation in R/Manipulation.R \name{bcRegressOut} \alias{bcRegressOut} -\title{Regress out unwanted effects from beyondcell scores} +\title{Regresses out unwanted effects from BCS} \usage{ -bcRegressOut(bc, vars.to.regress = NULL) +bcRegressOut(bc, vars.to.regress) } \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} -\item{vars.to.regress}{Vector of metadata columns to regress out the -beyondcell scores.} +\item{vars.to.regress}{Vector of metadata columns to regress out the BCS.} } \value{ -Returns a \code{beyondcell} object with regressed normalized scores, -regressed scaled scores and regressed switch points. +Returns a \code{beyondcell} object with regressed normalized BCS, +regressed scaled BCS and regressed switch points. } \description{ This function regresses out unwanted effects from normalized -beyondcell scores. +beyondcell scores (BCS). } diff --git a/man/bcScore.Rd b/man/bcScore.Rd index f178ee2..c5c4bd5 100644 --- a/man/bcScore.Rd +++ b/man/bcScore.Rd @@ -2,24 +2,23 @@ % Please edit documentation in R/Score.R \name{bcScore} \alias{bcScore} -\title{Computes the beyondcell score} +\title{Computes the BCS} \usage{ bcScore(sc, gs, expr.thres = 0.1) } \arguments{ \item{sc}{\code{\link[Seurat]{Seurat}} object or expression matrix.} -\item{gs}{\code{\link[geneset]{geneset}} object.} +\item{gs}{\code{\link[beyondcell]{geneset}} object.} \item{expr.thres}{Minimum fraction of signature genes that must be -expressed in a cell to compute its beyondcell score. Cells with a number of -expressed genes below this fraction will have a \code{NaN} beyondcell -score.} +expressed in a cell to compute its BCS. Cells with a number of expressed +genes below this fraction will have a \code{NaN} BCS.} } \value{ A \code{beyondcell} object. } \description{ -This function computes the beyondcell score and returns an +This function computes the beyondcell score (BCS) and returns an object of class \code{\link[beyondcell]{beyondcell}}. } diff --git a/man/bcSignatures.Rd b/man/bcSignatures.Rd index e3ece49..86144fb 100644 --- a/man/bcSignatures.Rd +++ b/man/bcSignatures.Rd @@ -2,13 +2,13 @@ % Please edit documentation in R/Visualization.R \name{bcSignatures} \alias{bcSignatures} -\title{Plots the UMAP reduction colored by bcscores or gene expression values} +\title{Plots a UMAP reduction coloured by BCS or gene expression values} \usage{ bcSignatures( bc, UMAP = "beyondcell", - signatures = list(values = NULL, limits = c(0, 1), center = NULL, breaks = 0.1, - share.limits = TRUE, colorscale = NULL, alpha = 0.7, na.value = "grey50"), + signatures = list(values = NULL, colorscale = NULL, alpha = 0.7, na.value = "grey50", + limits = c(0, 1), center = NULL, breaks = 0.1), genes = list(values = NULL, limits = c(NA, NA), share.limits = FALSE), merged = NULL, blend = FALSE, @@ -19,82 +19,77 @@ bcSignatures( \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} -\item{UMAP}{UMAP reduction to plot. Either \code{"beyondcell"} (computed -using \code{\link[bcUMAP]{bcUMAP}}) or \code{"Seurat"} computed using -\code{\link[Seurat]{Seurat}}'s functions.} +\item{UMAP}{UMAP reduction to plot. Either \code{"beyondcell"}, computed +using \code{\link[beyondcell]{bcUMAP}}, or \code{"Seurat"}, obtained using +\code{Seurat}'s functions.} -\item{signatures}{List with parameters to color the UMAP by bcscores: +\item{signatures}{List with plot parameters to colour the UMAP by BCS: \itemize{ \item{\code{values}:} {Vector with the names of the signatures of interest. If \code{signatures[["values"]] = "all"}, all signatures are selected.} -\item{\code{limits}:} {Vector with the desired limits for all signatures' -plots. If \code{limits = c(NA, NA)} (default), the \code{limits} are computed -for each signature independently.} -\item{\code{center}:} {A single number indicating the center of the -\code{colorscale} for all signatures' plots. Alternatively, the \code{center} -can be a vector of two numbers. In this case, the \code{center} of the -\code{colorscale} is the middle point between these two numbers and the -\code{breaks} are computed using the difference between them. If -\code{center = NULL} (default), the \code{center} of each signature is its -switch point.} -\item{\code{breaks}:} {A single number indicating the break size of the -\code{colorscale}. Alternatively, it can be a vector with the desired breaks -(which don't have to be symmetric or equally distributed). If \code{center} -is a vector of two numbers, \code{breaks} are computed using the difference -between them.} -\item{\code{share.limits}:} {Logical argument. If \code{share.limits = TRUE} -(default), all signatures' plots will have the same \code{limits = c(0, 1)}. -If \code{share.limits = FALSE}, each signature plot will have its own -\code{limits}. Note that if \code{limits != c(NA, NA)}, -\code{share.limits = TRUE}.} \item{\code{colorscale}:} {Either a \code{viridis}, \code{RColorBrewer} or a -custom palette of 3 colors (low, medium and high) to color all signatures' -plots. If \code{colorscale = NULL} (default), the plots are colored using +custom palette of 3 colours (low, medium and high) to colour all signatures' +plots. If \code{colorscale = NULL} (default), the plots are coloured using \code{beyondcell}'s own palette.} -\item{\code{alpha}:} {Transparency level between 0 (not transparent) and 1 +\item{\code{alpha}:} {Transparency level between 1 (not transparent) and 0 (fully transparent).} -\item{\code{na.value}:} {Color to use for missing values (\code{NA}s).}}} +\item{\code{na.value}:} {Colour to use for missing values (\code{NA}s).} +\item{\code{limits}:} {Vector with the desired limits for all signatures' +plots.} +\item{\code{center}:} {A single number indicating the centre of the +\code{colorscale} for all signatures' plots. If \code{center = NULL} +(default), the \code{center} for each signature is its switch point.} +\item{\code{breaks}:} {A single number indicating the break size of the +\code{colorscale}. Alternatively, it can be a vector with the desired breaks +(which don't have to be symmetric or equally distributed).}}} -\item{genes}{List with parameters to color the UMAP by gene expression +\item{genes}{List with plot parameters to colour the UMAP by gene expression values: \itemize{ \item{\code{values}:} {Vector with the names of the genes of interest. If \code{genes[["values"]] = "all"}, all genes are selected.} \item{\code{limits}:} {Vector with the desired limits for all genes' plots. -If \code{limits = c(NA, NA)} (default), the \code{limits} are computed for -each gene independently.} +If \code{limits = c(NA, NA)} (default), the \code{limits} are computed +automatically. See Details for more information.} \item{\code{share.limits}:} {Logical argument. If \code{share.limits = TRUE}, all genes' plots will have the same \code{limits}. If \code{share.limits = FALSE} (default), each gene plot will have its own -\code{limits}. Note that if \code{limits != c(NA, NA)}, -\code{share.limits = TRUE}.}}} +\code{limits}. See Details for more information.}}} \item{merged}{If \code{merged != NULL}, two signatures will be superposed in the same plot. If \code{merged = "direct"}, the signatures are assumed to -have a direct relationship and the bcscores will be added (+). On the other -hand, if \code{merged = "indirect"}, the signatures are assumed to have an -indirect relationship and their bcscores will be substracted (-).} +have a direct relationship and the BCS will be added. On the other hand, if +\code{merged = "indirect"}, the signatures are assumed to have an indirect +relationship and their BCS will be substracted.} -\item{blend}{(From \code{Seurat}) Scale and blend expression values to -visualize coexpression of two genes.} +\item{blend}{(From \code{\link[Seurat]{FeaturePlot}}) Scale and blend +expression values to visualise co-expression of two genes.} \item{mfrow}{Numeric vector of the form \code{c(nr, nc)}. \code{nr} corresponds to the number of rows and \code{nc} to the number of columns of -the arrays in which the plots will be drawn. If you want to draw the plots +the grid in which the plots will be drawn. If you want to draw the plots individually, set \code{mfrow = c(1, 1)}.} -\item{...}{Other arguments passed to \code{Seurat}'s -\code{\link[FeaturePlot]{FeaturePlot}}.} +\item{...}{Other arguments passed to \code{FeaturePlot}.} } \value{ -A list of \code{patchwork}s (if \code{mfrow != c(1, 1)}) or -\code{ggplot2}s (if \code{mfrow = c(1, 1)}) of the desired UMAP reduction -colored by the beyondcell scores (for signatures) or gene expression values -(for genes). +A list of \code{patchwork} (if \code{mfrow != c(1, 1)}) or +\code{ggplot2} objects (if \code{mfrow = c(1, 1)}) of the desired UMAP +reduction coloured by the BCS (for signatures) or gene expression values (for +genes). } \description{ This function returns a list of -\code{\link[patchwork]{patchwork}}s or \code{\link[ggplot2]{ggplot2}}s with -the desired UMAP reduction (either \code{beyondcell}'s or \code{Seurat}'s) -colored by bcscores or gene expression values. +\code{\link[patchwork]{patchwork}} or \code{\link[ggplot2]{ggplot2}} objects +with the desired UMAP reduction coloured by beyondcell scores (BCS) or gene +expression values. +} +\details{ +When \code{genes[["limits"]] = c(NA, NA)}, \code{bcSignatures} +computes the limits automatically. You can make all plots share the same +limits by specifying \code{genes[["share.limits"]] = TRUE}, or make the +function to compute the limits individually for each gene with +\code{genes[["share.limits"]] = FALSE}. Moreover, if you specify a value for +\code{genes[["limits"]]}, \code{genes[["share.limits"]]} will automatically +set to \code{TRUE} and all plots will share those limits. } diff --git a/man/bcSubset.Rd b/man/bcSubset.Rd index d2528ed..dd68ca4 100644 --- a/man/bcSubset.Rd +++ b/man/bcSubset.Rd @@ -9,8 +9,8 @@ bcSubset( signatures = NULL, bg.signatures = NULL, cells = NULL, - nan.sigs = NULL, - nan.cells = NULL + nan.sigs = 1, + nan.cells = 1 ) } \arguments{ @@ -19,35 +19,35 @@ bcSubset( \item{signatures}{Vector with the names of the signatures to subset by. If \code{signatures = NULL}, all signatures will be kept.} -\item{bg.signatures}{Vector with the names of the \code{@background} -signatures to subset by. If \code{bg.signatures = NULL}, all background -signatures will be kept.} +\item{bg.signatures}{Vector with the names of the background signatures to +subset by. If \code{bg.signatures = NULL}, all background signatures will be +kept.} \item{cells}{Vector with the names of the cells to subset by. If \code{cells = NULL}, all cells will be kept.} -\item{nan.sigs}{Maximum desired proportion of \code{NaN} values per signature -in the output \code{beyondcell} object. All signatures with a proportion of -\code{NaN >= nan.sig} will be removed.} +\item{nan.sigs}{Maximum proportion of \code{NaN} values per signature in the +output \code{beyondcell} object. All signatures with a proportion of +\code{NaN > nan.sig} will be removed.} -\item{nan.cells}{Maximum desired proportion of \code{NaN} values per cell -in the output \code{beyondcell} object. All cells with a proportion of -\code{NaN >= nan.cells} will be removed.} +\item{nan.cells}{Maximum proportion of \code{NaN} values per cell in the +output \code{beyondcell} object. All cells with a proportion of +\code{NaN > nan.cells} will be removed.} } \value{ A subsetted \code{beyondcell} object. } \description{ This function subsets a \code{\link[beyondcell]{beyondcell}} -object based on the names of the signatures, cells and/or the maximum desired -proportion of \code{NaN} values in each signature and/or cell. See Details -for more information. +object based on signature names, cells and/or the maximum proportion of +\code{NaN} values desired in each signature and/or cell. See Details for more +information. } \details{ This function can subset a \code{beyondcell} object using its 5 parameters alone or in combination. -So, for example if you specify \code{signatures} and \code{cells}, the +So, for example, if you specify \code{signatures} and \code{cells}, the resulting \code{beyondcell} object (except the \code{@background} slot) will be subsetted according to those vectors. The slot \code{@background} will be only subsetted according to \code{cells}. If you want to subset it by @@ -55,12 +55,12 @@ signatures as well, you must specify a value for \code{bg.signatures}. On the other hand, if you specify \code{cells} and \code{nan.sigs}, the output \code{beyondcell} object will keep the selected cells and those -signatures with aproportion of \code{NaN} values above \code{nan.sigs}. Note -that \code{nan.sigs} and \code{nan.cells} arguments also subset the -signatures and cells that do not meet the criteria in \code{@background} -slot. +signatures with a proportion of \code{NaN} values below or equal to +\code{nan.sigs}. Note that \code{nan.sigs} and \code{nan.cells} arguments +also subset the signatures and cells that meet the criteria in +\code{@background} slot. Finally, if you specify all parameters, the result will keep those signatures -and cells of interest with a proportion of \code{NaN} above \code{nan.sigs} -and \code{nan.cells} respectively. +and cells of interest with a proportion of \code{NaN} below or equal to +\code{nan.sigs} and \code{nan.cells}, respectively. } diff --git a/man/bcUMAP.Rd b/man/bcUMAP.Rd index 3a3f3f0..02b6aa9 100644 --- a/man/bcUMAP.Rd +++ b/man/bcUMAP.Rd @@ -2,8 +2,7 @@ % Please edit documentation in R/Reductions.R \name{bcUMAP} \alias{bcUMAP} -\title{Computes the UMAP projection and the therapeutic clusters using the -beyondcell scores} +\title{Computes a UMAP projection and therapeutic clusters using the BCS} \usage{ bcUMAP( bc, @@ -17,55 +16,55 @@ bcUMAP( \arguments{ \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} -\item{pc}{Number of principal components to use (estimated from the Elbow -plot). If \code{pc = NULL} (default), the function will stop prior to compute -the UMAP projection and the therapeutic clusters. See Details for more -information.} +\item{pc}{Number of principal components to use (which can be estimated from +an elbow plot). If \code{pc = NULL} (default), the function will stop prior +to compute the UMAP projection and the therapeutic clusters. See Details for +more information.} -\item{k.neighbors}{(\code{Seurat}'s \code{k.param}). Defines \emph{k} for the -k-nearest neighbor algorithm.} +\item{k.neighbors}{(\code{\link[Seurat]{FindNeighbors}}' \code{k.param}) +Defines \emph{k} for the k-Nearest Neighbour algorithm.} -\item{res}{(\code{Seurat}'s \code{resolution}) Value of the resolution -parameter, use a value above (below) 1.0 if you want to obtain a larger -(smaller) number of communities. Can be a single number or a numeric vector.} +\item{res}{(\code{\link[Seurat]{FindClusters}}' \code{resolution}) Value of +the resolution parameter, use a value above/below 1.0 if you want to obtain +a larger/smaller number of communities. Can be a single number or a numeric +vector.} -\item{add.DSS}{Use background beyondcell scores computed with the -\code{\link[DSS]{DSS}}) signature (\code{add.DSS = TRUE}) or just use the -drugs included in the \code{bc} object (\code{add.DSS = FALSE}) to compute -the UMAP projection and the therapeutic clusters. If the number of drugs in -\code{bc} (excluding pathways) is < 20, it is recomended to set -\code{add.DSS = TRUE}. Note that if \code{add.DSS = TRUE}, the regression and -subset steps that have been applied to \code{bc} will also be applied to the -background beyondcell scores.} +\item{add.DSS}{Use background BCS computed with \code{DSS} signatures +(\code{add.DSS = TRUE}) or just use the signatures included in the \code{bc} +object (\code{add.DSS = FALSE}) to compute the UMAP projection and the +therapeutic clusters. If the number of drugs in \code{bc} +(excluding pathways) is <= 20, it is recomended to set \code{add.DSS = TRUE}. +Note that if \code{add.DSS = TRUE}, the regression and subset steps that have +been applied on \code{bc} will also be applied on the background BCS.} -\item{elbow.path}{Path to save the Elbow plot. If \code{elbow.path = NULL} +\item{elbow.path}{Path to save the elbow plot. If \code{elbow.path = NULL} (default), the plot will be printed.} } \value{ A \code{beyondcell} object with the UMAP reduction in -\code{bc@reductions} slot and the therapeutic clusters for each \code{res} -in \code{bc@meta.data}. Also, an Elbow plot (\code{\link[ggplot]{ggplot}} +\code{@reductions} slot and the therapeutic clusters for each \code{res} +in \code{bc@meta.data}. Also, an elbow plot (\code{\link[ggplot2]{ggplot2}} object) is printed (if \code{elbow.path = NULL}) or saved (if -\code{elbow.path} is not \code{NULL}). +\code{elbow.path != NULL}). } \description{ -This function uses the beyondcell scores to compute an UMAP -projection of the data and clusterizes cells according to their sensitivity -to the tested drugs (therapeutic clusters). +This function uses the beyondcell scores (BCS) to compute a +UMAP projection of the data and to cluster cells according to their +sensitivity to the tested drugs (therapeutic clusters). } \details{ -This function performs all the steps required to obtain an UMAP -reduction of the data and clusterize the cells according to the beyondcell -scores. You will normally require to run the function twice: +This function performs all the steps required to obtain a UMAP +reduction of the data and cluster the cells according to the BCS. + +You will normally require to run the function twice: \enumerate{ -\item Using \code{pc = NULL} to obtain the Elbow plot. +\item Using \code{pc = NULL} to obtain the elbow plot. \item Specifying a value for the \code{pc} parameter according to this plot. -This second time, the UMAP reduction and the beyondcell clusters will be -computed. -} +This second time, the UMAP reduction and the therapeutic clusters will be +computed.} -Note that \code{add.DSS} must be the same in both runs, so the Elbow plot +Note that \code{add.DSS} must be the same in both runs, so the elbow plot obtained in 1 is still valid in 2. If \code{add.DSS = TRUE}, the background -beyondcell scores will be stored in the \code{bc} object and the function -will skip this step the second time. +BCS will be stored in the \code{bc} object and the function will skip this +step the second time. } diff --git a/man/beyondcell-class.Rd b/man/beyondcell-class.Rd index 5f7f208..5fc9eb7 100644 --- a/man/beyondcell-class.Rd +++ b/man/beyondcell-class.Rd @@ -4,50 +4,53 @@ \name{beyondcell-class} \alias{beyondcell-class} \alias{beyondcell} -\title{An S4 class to represent the beyondcell scores for each cell and signature.} +\title{Beyondcell class} \description{ -An S4 class to represent the beyondcell scores for each cell and signature. +An object to represent the beyondcell scores (BCS) for each cell and +signature. } \section{Slots}{ \describe{ -\item{\code{scaled}}{(Subsetted and/or regressed) scaled beyondcell scores.} +\item{\code{scaled}}{(Subsetted and/or regressed) scaled BCS.} -\item{\code{normalized}}{(Subsetted and/or regressed) normalized beyondcell scores.} +\item{\code{normalized}}{(Subsetted and/or regressed) normalized BCS.} -\item{\code{data}}{Original normalized beyondcell scores, without subsetting or -regression.} +\item{\code{data}}{Original normalized BCS, without subsetting or regression.} + +\item{\code{switch.point}}{(Subsetted and/or regressed) scaled BCS for which the +normalized score in \code{@data} is 0 (one switch point per signature).} -\item{\code{switch.point}}{(Subsetted and/or regressed) scaled beyondcell score for -which the normalized score in \code{@data} is 0 (one switch point per -signature).} +\item{\code{ranks}}{List of dataframes with the BCS' statistics and ranks returned +by \code{\link[beyondcell]{bcRanks}}.} -\item{\code{ranks}}{List of dataframes with the statistics (switch point, -mean, median, sd, variance, min, max, proportion of \code{NaN}, residuals' -mean and ranks of each signature in the beyondcell object. This slot is -filled using the function \code{\link[bcRanks]{bcRanks}}.} +\item{\code{expr.matrix}}{Single-cell expression matrix used to compute the BCS.} -\item{\code{expr.matrix}}{Expression matrix used to compute the beyondcell scores.} +\item{\code{meta.data}}{Dataframe that contains information about each cell +(including the therapeutic clusters and \code{\link[Seurat]{Seurat}}'s +\code{@meta.data}).} -\item{\code{meta.data}}{Contains information about each cell (including the -therapeutic clusters and \code{Seurat}'s \code{@metadata}).} +\item{\code{SeuratInfo}}{List with information about the input \code{Seurat} object, +including the \code{@reductions}.} -\item{\code{background}}{(Subsetted and/or regressed) normalized beyondcell scores -obtained using DSS signatures. Useful to compute the UMAP reduction and the +\item{\code{background}}{(Subsetted and/or regressed) normalized BCS obtained using +DSS signatures. Useful to compute \code{beyondcell}'s UMAP reduction and the therapeutic clusters when the number of drug signatures is low.} \item{\code{reductions}}{A list of dimensional reductions for this object.} \item{\code{regression}}{A list with the order of subset and regression steps -performed on the beyondcell object and the variables used for regression.} +performed on the \code{beyondcell} object and the variables used for +regression.} -\item{\code{n.genes}}{Argument passed to -\code{\link[GenerateGenesets]{GenerateGenesets}}.} +\item{\code{n.genes}}{Argument passed to \code{\link[beyondcell]{GenerateGenesets}}. +Number of up and/or down-regulated genes per signature.} -\item{\code{mode}}{Argument passed to -\code{\link[GenerateGenesets]{GenerateGenesets}}.} +\item{\code{mode}}{Argument passed to \code{GenerateGenesets}. Whether the +\code{geneset} contains up and/or down-regulated genes.} \item{\code{thres}}{Argument \code{expr.thres} passed to -\code{\link[bcScore]{bcScore}}.} +\code{\link[beyondcell]{bcScore}}. Minimum fraction of signature genes that +must be expressed in a cell to compute its BCS.} }} diff --git a/man/center_scale_colour_stepsn.Rd b/man/center_scale_colour_stepsn.Rd index 667cf5d..b0fdf84 100644 --- a/man/center_scale_colour_stepsn.Rd +++ b/man/center_scale_colour_stepsn.Rd @@ -2,49 +2,44 @@ % Please edit documentation in R/Format.R \name{center_scale_colour_stepsn} \alias{center_scale_colour_stepsn} -\title{Creates a centered diverging binned colour gradient} +\title{Creates a centred sequential binned colour gradient} \usage{ center_scale_colour_stepsn( x, + colorscale, + alpha = 0.7, + na.value = "grey50", limits = c(NA, NA), center = NULL, - breaks = 0.1, - colorscale = NULL, - alpha = 0.7, - na.value = "grey50" + breaks = 0.1 ) } \arguments{ -\item{x}{A numeric vector. Can contain \code{NA}s.} +\item{x}{A numeric vector. It can contain \code{NA}s.} -\item{limits}{Vector with the desired limits.} +\item{colorscale}{A vector with 5 colours that can be obtained using +\code{\link[beyondcell]{get_colour_steps}}.} -\item{center}{A single number indicating the center of the \code{colorscale}. -Alternatively, the \code{center} can be a vector of two numbers. In this -case, the \code{center} of the \code{colorscale} is the middle point between -these two numbers and the \code{breaks} are computed using the difference -between them. If \code{center = NULL} (default), the center is set to the -middle point of \code{x}.} +\item{alpha}{Transparency level between 1 (not transparent) and 0 (fully +transparent).} -\item{breaks}{A single number indicating the break size of the -\code{colorscale}. Alternatively, it can be a vector with the desired breaks -(which don't have to be symmetric or equally distributed). If \code{center} -is a vector of two numbers, \code{breaks} are computed using the difference -between them.} +\item{na.value}{Colour to use for missing values.} -\item{colorscale}{A vector with 5 colors which can be obtained using -\code{\link[get_colour_stepsn]{get_colour_stepsn}}.} +\item{limits}{Vector with the desired limits.} -\item{alpha}{Transparency level between 0 (not transparent) and 1 (fully -transparent).} +\item{center}{A single number indicating the centre of the \code{colorscale}. +If \code{center = NULL} (default), the centre is set to the middle point of +\code{x}.} -\item{na.value}{Color to use for missing values.} +\item{breaks}{A single number indicating the break size of the +\code{colorscale}. Alternatively, it can be a vector with the desired breaks +(which don't have to be symmetric or equally distributed).} } \value{ -A centered diverging binned colour gradient that can be use to color -\code{\link[ggplot2]{ggplot2}} objects. +A centred sequential binned colour gradient that can be used to +colour \code{\link[ggplot2]{ggplot2}} objects. } \description{ -This function creates a diverging binned colour gradient -(low-mid-high) centered around \code{center}. +This function creates a sequential binned colour gradient +(low-mid-high) centred around \code{center}. } diff --git a/man/colMinus.Rd b/man/colMinus.Rd index 5f66971..0ab4ae1 100644 --- a/man/colMinus.Rd +++ b/man/colMinus.Rd @@ -2,21 +2,21 @@ % Please edit documentation in R/Basics.R \name{colMinus} \alias{colMinus} -\title{Substracts to the first row of a rectangular object the rest of rows} +\title{Computes the column substraction} \usage{ colMinus(x, na.rm = FALSE) } \arguments{ -\item{x}{A matrix or a data frame.} +\item{x}{A matrix or a dataframe.} \item{na.rm}{(From \code{base}) Logical. Should missing values (including -\code{NaN}) be omitted from the calculations?} +\code{NaN}) from rows \code{2:length(x)} be omitted from the calculations?} } \value{ A numeric rectangular object with the result of the substraction. } \description{ -This function substracts to the first row of a numerical -rectangular object (\code{x[1, ]}) the rest of rows of the same rectangular -object (\code{x[2:length(x), ]}). +This function substracts to the first element of each column of +a rectangular object (\code{x[1, n]}) the rest of elements of the same column +(\code{x[2:length(x), n]}). } diff --git a/man/geneset-class.Rd b/man/geneset-class.Rd index 509ba87..00772cd 100644 --- a/man/geneset-class.Rd +++ b/man/geneset-class.Rd @@ -4,29 +4,28 @@ \name{geneset-class} \alias{geneset-class} \alias{geneset} -\title{An S4 class to represent the genesets of each drug} +\title{Geneset class} \description{ -An S4 class to represent the genesets of each drug +An object to represent signatures. } \section{Slots}{ \describe{ -\item{\code{genelist}}{A list of drug and functional signatures (\code{pathways}) -with up and/or down-regulated genes.} +\item{\code{genelist}}{A list of drug signatures and functional \code{pathways} (if +specified) with up and/or down-regulated genes.} -\item{\code{n.genes}}{Argument passed to -\code{\link[GenerateGenesets]{GenerateGenesets}}.} +\item{\code{n.genes}}{Argument passed to \code{\link[beyondcell]{GenerateGenesets}}. +Number of up and/or down-regulated genes per signature.} -\item{\code{mode}}{Argument passed to -\code{\link[GenerateGenesets]{GenerateGenesets}}.} +\item{\code{mode}}{Argument passed to \code{GenerateGenesets}. Whether the +\code{geneset} contains up and/or down-regulated genes.} -\item{\code{info}}{If \code{GenerateGenesets}' input is a pre-loaded matrix, -\code{info} will be a \code{data.frame} with the correspondences between -\code{sig_ids} and drug names, as well as the source of the pre-loaded -matrices (LINCS, CTRP, GDSC or CCLE). If \code{GenerateGenesets}' input is a -path to a GMT file, \code{info} will be empty.} +\item{\code{info}}{Dataframe with drug signatures information, including sig IDs, +drug names, MoAs, target genes and data sources (LINCS, CTRP, GDSC or CCLE). +This slot is only filled if \code{GenerateGenesets}' input is a pre-loaded +matrix.} -\item{\code{comparison}}{Argument passed to -\code{\link[GenerateGenesets]{GenerateGenesets}}.} +\item{\code{comparison}}{Argument passed to \code{GenerateGenesets}. Either +\code{"treated_vs_control"} or \code{"control_vs_treated"}.} }} diff --git a/man/get_colour_steps.Rd b/man/get_colour_steps.Rd new file mode 100644 index 0000000..9af8649 --- /dev/null +++ b/man/get_colour_steps.Rd @@ -0,0 +1,21 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/Format.R +\name{get_colour_steps} +\alias{get_colour_steps} +\title{Returns a vector of 5 colours} +\usage{ +get_colour_steps(colorscale = NULL) +} +\arguments{ +\item{colorscale}{Either a \code{viridis}, \code{RColorBrewer} or a custom +palette of 3 colours (low, medium and high). If \code{colorscale = NULL} +(default), the function returns \code{beyondcell}'s own palette.} +} +\value{ +A vector with 5 colour values. +} +\description{ +This function gets a colour palette and returns a vector with 5 +colour values to be used in +\code{\link[beyondcell]{center_scale_colour_stepsn}}. +} diff --git a/man/get_colour_stepsn.Rd b/man/get_colour_stepsn.Rd deleted file mode 100644 index e4edec1..0000000 --- a/man/get_colour_stepsn.Rd +++ /dev/null @@ -1,21 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/Format.R -\name{get_colour_stepsn} -\alias{get_colour_stepsn} -\title{Returns a vector of colors to use in \code{center_scale_colour_stepsn}} -\usage{ -get_colour_stepsn(colorscale = NULL) -} -\arguments{ -\item{colorscale}{Either a \code{viridis}, \code{RColorBrewer} or a custom -palette of 3 colors (low, medium and high). If \code{colorscale = NULL} -(default), the function returns \code{beyondcell}'s own palette.} -} -\value{ -A vector with 5 color values. -} -\description{ -This function gets a color palette and returns a vector with 5 -color values to be used in -\code{\link[center_scale_colour_stepsn]{center_scale_colour_stepsn}}. -} diff --git a/man/minus.Rd b/man/minus.Rd index 5183650..47730aa 100644 --- a/man/minus.Rd +++ b/man/minus.Rd @@ -7,7 +7,7 @@ minus(x, na.rm = FALSE) } \arguments{ -\item{x}{Numerical vector} +\item{x}{Numeric vector.} \item{na.rm}{(From \code{base}) Logical. Should missing values (including \code{NaN}) be removed?} @@ -16,7 +16,7 @@ minus(x, na.rm = FALSE) The result of the substraction. } \description{ -This function substracts to the first element of a numerical +This function substracts to the first element of a numeric vector (\code{x[1]}) the rest of elements of the same vector. (\code{x[2:length(x)]}). } diff --git a/man/rankSigs.Rd b/man/rankSigs.Rd index 13412d7..346c41c 100644 --- a/man/rankSigs.Rd +++ b/man/rankSigs.Rd @@ -10,21 +10,22 @@ rankSigs(bc, idents = NULL, cond = NULL, n = 10, decreasing = TRUE) \item{bc}{\code{\link[beyondcell]{beyondcell}} object.} \item{idents}{Name of the metadata column of interest. If -\code{idents = NULL}, the function computes the ranks using all cells.} +\code{idents = NULL}, the function uses the general rank computed with all +cells.} \item{cond}{Level of \code{idents} to rank by the output vector. If \code{idents = NULL}, this parameter is deprecated.} \item{n}{Number of signatures to return in the output vector.} -\item{decreasing}{Return the top \code{n} signatures (default) or the bottom -\code{n} signatures (\code{decreasing = FALSE}).} +\item{decreasing}{Logical. Return the top \code{n} signatures (default) or +the bottom \code{n} signatures (\code{decreasing = FALSE})?.} } \value{ An ordered vector with the signature's names. } \description{ This function returns the top/bottom \code{n} signatures ranked -according to the bcscores of cells that satisfy -\code{bc@meta.data$idents == cond}. +by \code{\link[beyondcell]{bcRanks}}. If the rank has not been previously +computed, \code{rankSigs} performs the ranking itself. } diff --git a/tutorial/GenerateGenesets/README.md b/tutorial/GenerateGenesets/README.md deleted file mode 100644 index 952c76c..0000000 --- a/tutorial/GenerateGenesets/README.md +++ /dev/null @@ -1,73 +0,0 @@ -# GenerateGenesets function - - - -By default, `GenerateGenesets` returns a `geneset` with the `250` most upregulated and downregulated genes in each drug signature. You can change this behaviour by providing new values to `n.genes` and `mode`. Moreover, a small collection of functional pathways will be included in your `geneset` object. These pathways are related to the regulation of the epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence and apoptosis. Note that `n.genes` and `mode` arguments do not affect to functional pathways. - -```r -# Generate geneset object with one of the ready to use signature collections. -gset <- GenerateGenesets(PSc) -# Retrieve only the top 100 most upregulated genes in drug signatures (functional pathways remain unchanged) -up100 <- GenerateGenesets(PSc, n.genes = 100, mode = "up") -# You can deactivate the functional pathways option if you are not interested in evaluating them -nopath <- GenerateGenesets(PSc, include.pathways = FALSE) -``` - -Additionaly, you can computed a `geneset` from a pre-loaded PSc subset called DSS. - -```r -# Generate geneset object with one of the ready to use signature collections -dss <- GenerateGenesets(DSS, include.pathways = FALSE) -``` - -Also, you can filter PSc, SSc and DDS objects by several fields (cap insensitive): - - * `drugs`: Drug name of interest (i.e sirolimus). - * `IDs`: `sig_id` of the signature(s) of interest. - * `MoA`: Desired mechanism of action of interest (i.e. MTOR INHIBITOR). - * `targets`: Target gene of interest (i.e. MTOR). - * `source`: `"LINCS"` (for PSc) or `"GDSC"`, `"CCLE"` and/or `"CTRP"` (for SSc) - -```r -# Return a `geneset` with all sirolimus signatures, as well as signatures of sirolimus synonyms such as -# rapamycin or BRD-K84937637 -sirolimus <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(drugs = "sirolimus")) -# Return just a subset of sirolimus signatures -my_sigs <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(IDs = c("sig_2349", "sig_7409")) -# Return all MTOR INHIBITORS -MTORi <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(MoA = "MTOR INHIBITOR") -# Return all drugs targetting MTOR -mtor_targets <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(targets = "MTOR") -# Return only signatures derived from GDSC and CCLE -my_sources <- GenerateGenesets(SSc, include.pathways = FALSE, filters = list(source = c("GDSC", "CCLE")) -``` - -By calling `ListFilters` function, you can retrieve all the available values for a given field. The signatures that pass **ANY** of these filters are included in the final `geneset`. - -```r -# Values for targets -ListFilters(entry = "targets") -# Geneset with all drugs taht target MTOR and sirolimus signatures -filter_combination <- GenerateGenesets(SSc, include.pathways = FALSE, - filters = list(drugs = "sirolimus", targets = "MTOR")) -``` -You can check information about the pre-loaded signatures calling the object `drugInfo`. Also, each `geneset` object obtained using pre-loaded matrices contains a subset of `drugInfo` for the selected drugs. - -```r -# drugInfo of the signatures of interest -gset@info -``` - -Finally, Beyondcell allows the user to input a GMT file containing the functional pathways/signatures of interest or a numeric matrix (containing a ranking criteria such as the t-statistic or logFoldChange). - - * **In case your input is a GMT file:** You must supply the path to the file. Take into account that the names of each gene set must end in `"_UP"` or `"_DOWN"` to specify its mode. In this case, `n.genes` and `mode` are deprecated. - * **In case your input is a numeric matrix:** Make sure that rows correspond to genes and columns to signatures. - -In both cases, `filters` argument is deprecated but you must indicate if the `comparison` that yielded your input was `"treated_vs_control"` or `"sensitive_vs_resistant"`. - -```r -# Mock numeric matrix -m <- matrix(rnorm(500 * 25), ncol = 25, dimnames = list(rownames(PSc[[1]])[1:500], colnames(PSc[[1]])[1:25])) -num_matrix <- GenerateGenesets(m, n.genes = 100, mode = c("up", "down"), - comparison = "treated_vs_control", include.pathways = TRUE) -``` diff --git a/tutorial/analysis_workflow/.img/Phase_variation.png b/tutorial/analysis_workflow/.img/Phase_variation.png index 08d0de2..d63bb1d 100644 Binary files a/tutorial/analysis_workflow/.img/Phase_variation.png and b/tutorial/analysis_workflow/.img/Phase_variation.png differ diff --git a/tutorial/analysis_workflow/.img/bcClusters_corrected.png b/tutorial/analysis_workflow/.img/bcClusters_corrected.png index 068c171..4ba593c 100644 Binary files a/tutorial/analysis_workflow/.img/bcClusters_corrected.png and b/tutorial/analysis_workflow/.img/bcClusters_corrected.png differ diff --git a/tutorial/analysis_workflow/.img/nFeature_corrected.png b/tutorial/analysis_workflow/.img/nFeature_corrected.png index 66e5c10..c8ef1af 100644 Binary files a/tutorial/analysis_workflow/.img/nFeature_corrected.png and b/tutorial/analysis_workflow/.img/nFeature_corrected.png differ diff --git a/tutorial/analysis_workflow/.img/nFeature_variation.png b/tutorial/analysis_workflow/.img/nFeature_variation.png index c923828..949b28a 100644 Binary files a/tutorial/analysis_workflow/.img/nFeature_variation.png and b/tutorial/analysis_workflow/.img/nFeature_variation.png differ diff --git a/tutorial/analysis_workflow/README.md b/tutorial/analysis_workflow/README.md index f5074c5..09441ad 100644 --- a/tutorial/analysis_workflow/README.md +++ b/tutorial/analysis_workflow/README.md @@ -3,91 +3,128 @@ # Analysis workflow ## Data access: -We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four conditions using the Seurat pipeline. To follow this tutorial, the resulting seurat object can be accessed through the following [link](https://zenodo.org/record/4438620). - +We have validated Beyondcell in a population of MCF7-AA cells exposed to 500nM +of bortezomib and collected at different time points: t0 (before treatment), +t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) +obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four +conditions using the Seurat pipeline. To follow this tutorial, the resulting +`Seurat` object can be accessed through the following +[link](https://zenodo.org/record/4438620). ## Using Beyondcell For a correct analysis with **Beyondcell**, users should follow these steps: 1. Read a single-cell expression object - 2. Compute the Beyondcell scores (BCS) + 2. Compute the Beyondcell Scores (BCS) 3. Compute the Therapeutic Clusters (TCs) - * Check clustering and look for unwanted sources of variation + * Check the clustering and look for unwanted sources of variation * Regress out unwanted sources of variation - * Recompute UMAP rduction + * Recompute the UMAP reduction 4. Compute ranks - 5. [**Visualize**](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/visualization) the results - + 5. [Visualize](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/visualization) + the results ### 1. Read a single-cell expression object -Beyondcell can accept both a single-cell matrix or a Seurat object. In order to correctly compute the scores, the transcriptomics data needs to be pre-processed. This means that proper cell-based quality control filters, as well as normalization and scaling of the data, should be applied prior to the analysis with Beyondcell. +Beyondcell can accept both a single-cell matrix or a `Seurat` object. In order +to correctly compute the scores, the transcriptomics data needs to be +pre-processed. This means that proper cell-based quality control filters, as +well as normalization and scaling of the data, should be applied prior to the +analysis with Beyondcell. -> Note: We recommend using a Seurat object. +> Note: We recommend using a `Seurat` object. ```r library("beyondcell") library("Seurat") -# Read single cell experiment -sc = readRDS(path_to_sc) +set.seed(1) +# Read single-cell experiment. +sc <- readRDS(path_to_sc) ``` -Note that if you are using a Seurat object, the `DefaultAssay` must be specified. Both `SCT` and `RNA` assays are accepted. +Note that if you are using a `Seurat` object, the `DefaultAssay` must be +specified. Both `SCT` and `RNA` assays are accepted. ```r -# Set Assay +# Set Assay. DefaultAssay(sc) <- "RNA" ``` ### 2. Compute the BCS We need to perform two steps: -#### Get a geneset object with signatures of interest -In order to compute the BCS, we also need a `geneset` object containing the drug or functional signatures we are interested in evaluating. To create this object, the `GenerateGenesets` function needs to be called. Beyondcell includes two drug signature collections that are ready to use: +#### Get a geneset object with the signatures of interest +In order to compute the BCS, we also need a `geneset` object containing the drug +(and optionally the functional) signatures we are interested in evaluating. To +create this object, the `GenerateGenesets` function needs to be called. +Beyondcell includes two drug signature collections that are ready to use: - * **Drug Perturbation Signatures collection (PSc):** Captures the transcriptional changes induced by a drug. - * **Drug Sensitivity Signatures collection (SSc):** Captures the drug sensitivity to a given drug. + * **Drug Perturbation Signatures collection (PSc):** Captures the + transcriptional changes induced by a drug. + * **Drug Sensitivity Signatures collection (SSc):** Captures the drug + sensitivity to a given drug. -A small collection of functional pathways will be included by default in your gene signatures object. These pathways are related to the regulation of the epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence and apoptosis. +A small collection of functional pathways will be included by default in your +`geneset` object. These pathways are related to the regulation of the +epithelial-mesenchymal transition (EMT), cell cycle, proliferation, senescence +and apoptosis. ```r -# Generate geneset object with one of the ready to use signature collections -gset <- GenerateGenesets(PSc) -# You can deactivate the functional pathways option if you are not interested in evaluating them +# Generate geneset object with one of the ready to use signature collections. +gs <- GenerateGenesets(PSc) +# You can deactivate the functional pathways option if you are not interested in evaluating them. nopath <- GenerateGenesets(PSc, include.pathways = FALSE) ``` -PSc and SSc signatures can also be filtered according to several values. Moreover, Beyondcell allows the user to input a GMT file containing the functional pathways/signatures of interest, or a numeric matrix (containing a ranking criteria such as the t-statistic or logFoldChange). For further information please check [GenerateGenesets](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/GenerateGenesets) tutorial. +PSc and SSc can also be filtered according to several values. Moreover, +Beyondcell allows the user to input a GMT file containing the functional +pathways/signatures of interest or a numeric matrix (containing a ranking +criteria such as the t-statistic or logFoldChange). + #### Compute the BCS ```r # Compute score for the PSc. This might take a few minutes depending on the size of your dataset. -bc <- bcScore(sc, gset, expr.thres = 0.1) +bc <- bcScore(sc, gs, expr.thres = 0.1) ``` -> TIP: we recommend to input cells with at least 1000-1500 genes detected. +> TIP: We recommend to input cells with at least 1000-1500 genes detected. -### 3. Compute Therapeutic clusters -The ouput of the `bcScore` computation is a `bc object`. The object contains the normalized and scaled **Beyondcell** scores and switch point, as well as information concerning the parameters used for the analysis. The bc object can be used as an input for a dimensionality reduction and clustering analysis, using the `bcUMAP`function. With this analysis, cells can be classified into distinct **therapeutic clusters**, that represent sets of cells sharing a common response to a particular drug exposition. The Uniform Manifold Approximation and Projection (UMAP) will allow the visualization of the identified clusters. +### 3. Compute the TCs +The ouput of the `bcScore` computation is a `beyondcell` object. This object +contains the normalized and scaled **Beyondcell Scores** and the +**Switch Points** (SPs), as well as information concerning the parameters used +for the analysis. The `beyondcell` object can be used as an input for a +dimensionality reduction and clustering analysis using the `bcUMAP`function. +With this analysis, cells can be classified into distinct +**Therapeutic Clusters**, that represent sets of cells sharing a common response +to a particular drug exposition. The Uniform Manifold Approximation and +Projection (UMAP) will allow the visualization of the identified clusters. -> TIP: If pc = NULL (default), the function will stop prior to compute the UMAP projection and the therapeutic clusters. This first step will print and elbow plot in your screen and will help you chose the number of components needed for the UMAP computation. +> TIP: If `pc = NULL` (default), the function will stop prior to compute the +UMAP projection and the TCs. This first step will print and elbow plot in your +screen and will help you chose the number of components needed for the UMAP +computation. ```r # Run the UMAP reduction. -bc <- bcUMAP(bc) +bc <- bcUMAP(bc, k.neighbors = 4, res = 0.2) # Run the bcUMAP function again, specifying the number of principal components you want to use. -bc <- bcUMAP(bc, pc = 5, res = 0.2) +bc <- bcUMAP(bc, pc = 10, k.neighbors = 4, res = 0.2) ``` **Check clustering**\ -It is important to check whether any unwanted source of variation is guiding the clustering analysis. The `bcClusters` function allows us to colour the UMAP based on the metadata variables that migth be influencing the clustering. We recommend checking these sources of variation among others: - - * Number of detected genes per cell - * Number of detected counts - * Cell cycle phase +It is important to check whether any unwanted source of variation is guiding the +clustering analysis. The `bcClusters` function allows us to colour the UMAP +based on the metadata variables that migth be influencing this clustering. We +recommend checking these sources of variation among others: + + * Number of detected genes per cell (`nFeature_RNA`) + * Number of detected counts (`nCount_RNA`) + * Cell cycle phase (`Phase`) * Batch ```r -# Visualize whether cells are clustered based on the number of genes detecter per each cell +# Visualize whether cells are clustered based on the number of genes detecter per each cell. bcClusters(bc, UMAP = "beyondcell", idents = "nFeature_RNA", factor.col = FALSE) ``` @@ -98,26 +135,33 @@ bcClusters(bc, UMAP = "beyondcell", idents = "Phase", factor.col = TRUE) ``` -> TIP: the cell cycle information must be present in bc@meta.data and can be obtained using Seurat's function `CellCycleScoring` +> TIP: The cell cycle information must be present in `bc@meta.data` and can be +obtained using Seurat's function `CellCycleScoring` **Regress out unwanted sources of variation**\ -The `bcRegressOut` function will allow us to correct existing sources of variation. Have in mind that the number of detected genes per cell will *always* have an inpact in the final score. +The `bcRegressOut` function will allow us to correct existing sources of +variation. Have in mind that the number of detected genes per cell will +*always* have an inpact in the final score. ```r -bc <- bcRegressOut(bc, vars.to.regress = c("nFeature_RNA")) +bc <- bcRegressOut(bc, vars.to.regress = "nFeature_RNA") ``` -> TIP: is the regression step taking too long? Check the amount of NAs per cell of your bc@normalized matrix. You migth need to refine the filtering of your single cell experiment based on the amount of detected features. +> TIP: Is the regression step taking too long? Check the amount of NAs per cell +of your `bc@normalized matrix`. You migth need to refine the filtering of your +single-cell experiment based on the amount of detected features. -**Recompute Therapeutic clusters**\ -Once corrected, you will need to recompute the dimensionality reduction and clustering, in order to find the *true* **therapeutic clusters** present in your sample. +**Recompute the TCs**\ +Once corrected, you will need to recompute the dimensionality reduction and +clustering, in order to find the *true* **Therapeutic Clusters** present in your +sample. ```r -# Recompute UMAP -bc <- bcUMAP(bc, pc = 5, res = 0.2, add.DSS = FALSE, k.neighbors = 20) -# Visualize UMAP -bcClusters(bc, UMAP = "beyondcell", idents = "nFeature_RNA", factor.col = FALSE, pt.size = 1) -# Visualize Therapeutic clusters -bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2", pt.size = 1) +# Recompute UMAP. +bc <- bcUMAP(bc, pc = 10, k.neighbors = 20, res = 0.2) +# Visualize UMAP. +bcClusters(bc, UMAP = "beyondcell", idents = "nFeature_RNA", factor.col = FALSE) +# Visualize therapeutic clusters. +bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2") ```

@@ -126,24 +170,49 @@ bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2", pt.size = 1)

### 4. Compute ranks -A summary table can be obtained using the `bcRanks`function. This table includes summary metrics such as: the Switch Point (SP), mean, median, sd, variance, min, max, proportion of NaN and residuals' mean of each signature. This table aims to help you in the prioritization of drug candidates. +A summary table can be obtained using the `bcRanks`function. This table includes +metrics such as the SP, mean, median, sd, variance, min, max, proportion of NaN +and residuals' mean of each signature. Also, a signature rank is computed taking +into account the SP and the mean. This table aims to help you in the +prioritization of drug candidates. ```r -# Obtain general statistics +# Obtain general statistics. bc <- bcRanks(bc) -# Obtain condition-based statistics +# Obtain condition-based statistics. bc <- bcRanks(bc, idents = "condition") -head(bc@ranks$condition) -# Obtain therapeutic cluster-based statistics -bc <- bcRanks(bc, idents = "bc_clusters_res.0.2") +# Obtain unextended therapeutic cluster-based statistics. +bc <- bcRanks(bc, idents = "bc_clusters_res.0.2", extended = FALSE) ``` -The summary tables are saved in the slot `ranks` as a data frame. You can access them as follows: +The summary tables are saved in the slot `@ranks` as a list of dataframes. You +can access them as follows: ```r -# Explore the statistics table +# Explore the statistics table. head(bc@ranks$general) +head(bc@ranks$condition) +head(bc@ranks$bc_clusters_res.0.2) ``` -> TIP: We recommend prioritizing drugs taking into account both the switch point and residuals. - -> Concerning the SP: It determines the value where negative scores switch to positive scores and vice versa. As an example, an all-sensitive dataset will have SP equal to zero, as there won’t be any negative scores for that specific drug in the whole population and a dataset insensitive to a certain drug, will be expected to have a SP close to 1. Intermediate SP, as a contrast, will reflect that the dataset contains both susceptible and non-susceptible cells. +> TIP: The ranking returned by `bcRanks` orders the drug signatures from most to +least sensitive. This kind of rank might be useful if you want to find a drug to +treat all your cells simultaneously. However, as Beyondcell allows to inspect +the intratumoural heterogeneity (ITH) in a single-cell RNA-seq experiment, you +may be interested in the specific drugs that are most/least effective against a +particular cluster. In the later scenario, we recommend prioritizing drugs +taking into account both the SP and residuals' mean. In order to facilitate the +computation and visualization of this kind of rank, we have included the +function `bc4Squares` in Beyondcell (see [visualization](https://gitlab.com/bu_cnio/Beyondcell/-/tree/master/tutorial/visualization) +for more information). + +> Concerning the SP: It is the scaled value where normalized negative scores +switch to positive scores and vice versa. As an example, an all-sensitive +dataset will have *SP = 0*, as there won’t be any negative normalized scores for +that specific drug in the whole population. On the other hand, a dataset +insensitive to a certain drug will be expected to have a *SP $\approx$ 1*. +Intermediate SPs, as a contrast, will reflect that the dataset contains both +susceptible and non-susceptible cells. + +> **Note that the SP is not equivalent to the proportion of insensitive cells** +**to a given drug, although there's a positive correlation between these two** +**magnitudes. ** diff --git a/tutorial/visualization/.img/bc4squares_t0.png b/tutorial/visualization/.img/bc4squares_t0.png index 424c05f..79d62ce 100644 Binary files a/tutorial/visualization/.img/bc4squares_t0.png and b/tutorial/visualization/.img/bc4squares_t0.png differ diff --git a/tutorial/visualization/.img/bc_clusters.png b/tutorial/visualization/.img/bc_clusters.png index 6965de2..c7f926d 100644 Binary files a/tutorial/visualization/.img/bc_clusters.png and b/tutorial/visualization/.img/bc_clusters.png differ diff --git a/tutorial/visualization/.img/bc_condition.png b/tutorial/visualization/.img/bc_condition.png index 83d341c..05cbec3 100644 Binary files a/tutorial/visualization/.img/bc_condition.png and b/tutorial/visualization/.img/bc_condition.png differ diff --git a/tutorial/visualization/.img/bortezomib_all_plots.png b/tutorial/visualization/.img/bortezomib_all_plots.png new file mode 100644 index 0000000..52b0ec1 Binary files /dev/null and b/tutorial/visualization/.img/bortezomib_all_plots.png differ diff --git a/tutorial/visualization/.img/bortezomib_histogram.png b/tutorial/visualization/.img/bortezomib_histogram.png index c7400a0..22012f8 100644 Binary files a/tutorial/visualization/.img/bortezomib_histogram.png and b/tutorial/visualization/.img/bortezomib_histogram.png differ diff --git a/tutorial/visualization/.img/bortezomib_histogram_gral.png b/tutorial/visualization/.img/bortezomib_histogram_gral.png index a3ea3cc..f2f984c 100644 Binary files a/tutorial/visualization/.img/bortezomib_histogram_gral.png and b/tutorial/visualization/.img/bortezomib_histogram_gral.png differ diff --git a/tutorial/visualization/.img/bortezomib_signature_18868.png b/tutorial/visualization/.img/bortezomib_signature_18868.png index b2eebd6..887699f 100644 Binary files a/tutorial/visualization/.img/bortezomib_signature_18868.png and b/tutorial/visualization/.img/bortezomib_signature_18868.png differ diff --git a/tutorial/visualization/.img/cellcycle.png b/tutorial/visualization/.img/cellcycle.png index 5e7f753..8126d1a 100644 Binary files a/tutorial/visualization/.img/cellcycle.png and b/tutorial/visualization/.img/cellcycle.png differ diff --git a/tutorial/visualization/.img/seurat_clusters.png b/tutorial/visualization/.img/seurat_clusters.png index 6eb13c0..d9470bf 100644 Binary files a/tutorial/visualization/.img/seurat_clusters.png and b/tutorial/visualization/.img/seurat_clusters.png differ diff --git a/tutorial/visualization/README.md b/tutorial/visualization/README.md index 7ca4814..7b03456 100644 --- a/tutorial/visualization/README.md +++ b/tutorial/visualization/README.md @@ -1,75 +1,121 @@ # Visualization of the results -**beyondcell** provides several visualization functions to help you better understand the results. +**Beyondcell** provides several visualization functions to help you better understand the results. -|**Function** | **Description** | - |--------------|--------------| - |**`bcClusters()`** |Returns a ggplot object with the UMAP reduction (either beyondcell's or Seurat's) colored by the specified metadata column.| - |**`bcSignatures()`** |Returns a list of patchwork objects containing ggplot2s with the desired UMAP reduction (either beyondcell's or Seurat's) colored by bcscores or gene expression values.| - |**`bcHistogram()`** |Drawns a histogram plot of bcscores for each signature of interest. The plot can be a single histogram (if idents = NULL) or a histogram for each level found in idents.| - |**`bcCellCycle()`** |Drawns, for each signature of interest, a violindot plot of the bcscores grouped by the cell cycle phase (G1, G2M or S). Note that this information must be present in `bc@meta.data` and can be obtained using Seurat's function `CellCycleScoring`.| - |**`bc4Squares()`** |Returns a 4 squares plot of the drug signatures present in a beyondcell object. | +| **Function** | **Description** | +| -------------- | -------------- | +| **`bcClusters()`** | Returns a `ggplot2` object with a UMAP reduction (either beyondcell's or Seurat's) coloured by the specified metadata column. | +| **`bcSignatures()`**|Returns a list of `patchwork` or `ggplot2` objects with the desired UMAP reduction (either beyondcell's or Seurat's) coloured by Beyondcell Scores (BCS) or gene expression values. | +| **`bcHistogram()`** | Drawns a histogram of BCS for each signature of interest. The plot can be a single histogram (if `idents = NULL`) or a histogram for each level found in `idents`. | +| **`bcCellCycle()`** | Drawns, for each signature of interest, a `geom_violindot` plot of the BCS grouped by cell cycle phase (G1, G2M or S). Note that this information must be present in `bc@meta.data` and can be obtained using Seurat's function `CellCycleScoring`.| +| **`bc4Squares()`** | Drawns a 4 squares plot of the drug signatures present in a `beyondcell` object. | ## Data -In this tutorial, we are analyzing a population of MCF7-AA cells exposed to 500nM of bortezomib and collected at different time points: t0 (before treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all four conditions using the Seurat pipeline. After calculating the beyondcell scores (BCS) for each cell and regressing unwanted sources of variation, a clustering analysis was applied. +In this tutorial, we are analysing a population of MCF7-AA cells exposed to +500nM of bortezomib and collected at different time points: t0 (before +treatment), t12, t48 and t96 (72h treatment followed by drug wash and 24h of +recovery) obtained from *Ben-David U, et al., Nature, 2018*. We integrated all +four conditions using the Seurat pipeline. After calculating the BCS for each +cell and regressing unwanted sources of variation, a clustering analysis was +applied. ## Metadata visualization -Once the BCS and the UMAP reductions are computed, we can check out how the different available metadata variables behave. General quality control variables, such as the number of features per cell or the cell cycle phase can be analyzed with the `bcClusters` function (previously shown). We can visualize the **therapeutic clusters**: +Once the BCS and the UMAP reductions are computed, we can check out how the +different metadata variables behave. General quality control variables such as +the number of features per cell or the cell cycle phase can be analyzed with +`bcClusters` function (previously shown). Moreover, we can visualize the +**Therapeutic Clusters** (TCs) using this same function: ```r -# Beyondcell UMAP -bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2", pt.size = 1) +# Beyondcell UMAP. +bcClusters(bc, UMAP = "beyondcell", idents = "bc_clusters_res.0.2") ``` -Or condition-based metadata: +In addition, we can visualize condition-based metadata: ```r -bcClusters(bc, UMAP = "beyondcell", idents = "condition", pt.size = 1) +# UMAP with bigger point size. +bcClusters(bc, UMAP = "beyondcell", idents = "condition", pt.size = 1.5) ``` -Also, when available, the Seurat reduction can be plotted. This will allow us to detect the location of the therapeutic clusters in the *original* expression UMAP. +Also, when available, the Seurat reduction can be plotted. This will allow us to +detect the location of the TCs in the *original* expression UMAP. ```r -# Expression UMAP -bcClusters(bc, UMAP = "Seurat", idents = "seurat_clusters", pt.size = 1) +# Expression UMAP. +bcClusters(bc, UMAP = "Seurat", idents = "seurat_clusters") ``` - ## Visualize drug signatures and markers -In this example, we have analyzed the Ben-David *et* al. dataset using the drug perturbation signature collection or **PSc**. As these cells have been previously treated with bortezomib, a Protease inhibitor drug, we expect to identify a differential susceptibility pattern between the available conditions. +In this example we have analyzed the *Ben-David et al.* dataset using the drug +Perturbation Signatures collection (**PSc**). As these cells had been previously +treated with bortezomib, a proteasome inhibitor, we expect to identify a +differential susceptibility pattern between the different time points. -First, we look for Bortezomib's presence in the PSc collection. We can do this using the `FindDrugs` function implemented in the package. +First, we look for bortezomib's information in the `beyondcell` object computed +using PSc. We can do this using `FindDrugs`. ```r FindDrugs(bc, "BORTEZOMIB") - ``` -|Original_Name | bc_Name | Preferred_Name | Name | sig_id | Preferred_and_sig | MoA | -|---------------|---------|-----------------|------|--------|-------------------|-----| -|BORTEZOMIB |sig_1866 | BORTEZOMIB | BORTEZOMIB | sig_1866 | BORTEZOMIB (sig_1866) | PROTEASOME INHIBITOR | -|BORTEZOMIB | sig_18868| BORTEZOMIB | BORTEZOMIB | sig_18868 | BORTEZOMIB (sig_18868) | PROTEASOME INHIBITOR | -|BORTEZOMIB | sig_20842 | BORTEZOMIB | BORTEZOMIB | sig_20842 | BORTEZOMIB (sig_20842) | PROTEASOME INHIBITOR | -Then, we run the `bcSignatures` function using the *sig_id* of the drug. If you input more than one sig_id, the result will be a patchwork object containing all the signature plots. +|original.names|bc.names|preferred.drug.names|drugs|IDs|preferred.and.sigs|MoAs| +|---------------|---------|-------------|------|--------|-----------|---------------| +|BORTEZOMIB|sig_1866|BORTEZOMIB|BORTEZOMIB|sig_1866|BORTEZOMIB (sig_1866)|NFKB PATHWAY INHIBITOR, PROTEASOME INHIBITOR| +|BORTEZOMIB|sig_18868|BORTEZOMIB|BORTEZOMIB|sig_18868|BORTEZOMIB (sig_18868)|NFKB PATHWAY INHIBITOR, PROTEASOME INHIBITOR| +|BORTEZOMIB|sig_20842|BORTEZOMIB|BORTEZOMIB|sig_20842|BORTEZOMIB (sig_20842)|NFKB PATHWAY INHIBITOR, PROTEASOME INHIBITOR| + +Then, we run `bcSignatures` using the `IDs` of the drug. ```r -bcSignatures(bc, UMAP = "beyondcell", signatures = list(values = "sig_18868"), pt.size = 1) +bcSignatures(bc, UMAP = "beyondcell", signatures = list(values = "sig_18868"), pt.size = 1.5) ``` -We can also take a look at the behaviour of specific gene expression markers, such a PSMA5, a gene targeted by bortezomib. +If you input more than one `ID`, the result will be a `patchwork` object +containing all the signature plots. + +```r +library("patchwork") +# Get all IDs correspondig to bortezomib. +bortezomib_IDs <- FindDrugs(bc, "bortezomib")$IDs +# Patchwork object with all bortezomib plots. +bortezomib <- bcSignatures(bc, UMAP = "beyondcell", + signatures = list(values = bortezomib_IDs), pt.size = 1.5) +# Plot all three plots in one image. +wrap_plots(bortezomib, ncol = 3) +``` + + +We can also take a look at the behaviour of specific gene expression markers, +such a *PSMA5*, a gene targeted by bortezomib. ```r -bcSignatures(bc, UMAP = "beyondcell", genes = list(values = "PSMA5"), pt.size = 1) +bcSignatures(bc, UMAP = "beyondcell", genes = list(values = "PSMA5")) ``` ## Ranking visualization -We can summarize the ranking results using the `bc4Squares` function. This function summarizes the top hits obtained for each of the specified condition levels. The residuals are represented in the x axis, the switch point is represented in the y axis. The top-left and bottom-right corners contain the drugs to which all selected cells are least/most sensitive, respectively. The centre quadrants show the drugs with an heterogeneous response. In this case, we can clearly see how the tool predicts an heterogeneous response to bortezomib. +We can compute a drug rank and summarize the results using the `bc4Squares` +function. A 4 squares plot consists in a scatter plot of the residuals' means +(x axis) vs the Switch Points (y axis) of a specified cluster (either a TC or a +group defined by experimental condition or phenotype). Four quadrants are +highlighted: the top-left and bottom-right corners contain the drugs to which +all selected cells are least/most sensistive, respectively. The centre quadrants +show the drugs to which these cells are differentially insensitive or sensitive +when compared to the other clusters. + +This function displays the top hits obtained for each of the specified condition +levels. Note that residuals' means are different for each level while +Swicth Points (SPs) are signature-specific. So, x axis will vary and y axis will +remain constant accross all plots. + +In this case, we can clearly see how the tool predicts an heterogeneous response +of bortezomib-naive cells. ```r bc4Squares(bc, idents = "condition", lvl = "t0", top = 5) @@ -78,22 +124,25 @@ bc4Squares(bc, idents = "condition", lvl = "t0", top = 5) ## Visualize BCS distribution -The `bcHistogram` function can help us analyze the differences in the distribution of the BCS for specific signatures. +`bcHistogram` can help us analyse the differences in the distribution of the BCS +for specific signatures. ```r -# General view +# General view. bcHistogram(bc, signatures = "sig_18868", idents = NULL) ``` ```r -# Condition-based histograms +# Condition-based histograms. bcHistogram(bc, signatures = "sig_18868", idents = "condition") ``` -## Visualize Cell cycle phases -For each drug of interest, we can also take a look at the differences of BCS depending on the cell cycle phase of the cells. This aims to help the user understand the effect the cell cycle is having on the predicted drug response. +## Visualize cell cycle phases +For each drug of interest, we can also take a look at the differences of BCS +depending on the cell cycle phase. This aims to help the user understand the +effect that the cell cycle is having on the predicted drug response. ```r bcCellCycle(bc, signatures = "sig_18868") @@ -101,5 +150,6 @@ bcCellCycle(bc, signatures = "sig_18868") ## Support -Additional information can be found in the package's R documentation. If you have any question regarding the use of **Beyoncell**, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues). +Additional information can be found in the package's documentation. If you have +any question regarding the use of **Beyoncell**, feel free to submit an [issue](https://gitlab.com/bu_cnio/Beyondcell/issues).