UCSCXenaTools is an R package for downloading and exploring data from UCSC Xena data hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
Install the stable release from CRAN with:
install.packages("UCSCXenaTools")
You can also install the development version of UCSCXenaTools from GitHub with:
# install.packages("remotes")
remotes::install_github("ShixiangWang/UCSCXenaTools", build_vignettes = TRUE)
All datasets are available at https://xenabrowser.net/datapages/.
Currently, UCSCXenaTools supports the following 10 UCSC Xena data hubs:
- UCSC Public Hub: https://ucscpublic.xenahubs.net
- TCGA Hub: https://tcga.xenahubs.net
- GDC Xena Hub: https://gdc.xenahubs.net
- ICGC Xena Hub: https://icgc.xenahubs.net
- Pan-Cancer Atlas Hub: https://pancanatlas.xenahubs.net
- GA4GH (TOIL) Hub: https://toil.xenahubs.net
- Treehouse Hub: https://xena.treehouse.gi.ucsc.edu
- PCAWG Hub: https://pcawg.xenahubs.net
- ATAC-seq Hub: https://atacseq.xenahubs.net
- Single Cell Xena Hub: https://singlecell.xenahubs.net
If the URL of any data hub changes or a new data hub comes online, please let me know by emailing [email protected] or opening an issue on GitHub.
Downloading UCSC Xena datasets and loading them into R with UCSCXenaTools is a workflow of five steps: generate, filter, query, download and prepare. These are implemented as the XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively.
They are straightforward to use and combine well with other packages such as dplyr.
To show the basic usage of UCSCXenaTools, we will download clinical data for the LUNG, LUAD and LUSC cohorts from the TCGA (hg19 version) data hub.
UCSCXenaTools uses a built-in data.frame object, XenaData, to generate an instance of the XenaHub class. XenaData records information about all datasets on the UCSC Xena data hubs.
You can load XenaData after loading UCSCXenaTools into R.
library(UCSCXenaTools)
#> =========================================================================
#> UCSCXenaTools version 1.2.2.9000
#> Github page: https://github.com/ShixiangWang/UCSCXenaTools
#> Documentation: https://shixiangwang.github.io/UCSCXenaTools/
#>
#> If you use it in published research, please cite:
#> Wang, Shixiang, et al. "The predictive power of tumor mutational burden
#> in lung cancer immunotherapy response is influenced by patients' sex."
#> International journal of cancer (2019).
#> =========================================================================
#>
data(XenaData)
head(XenaData)
#> # A tibble: 6 x 17
#> XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 https://… publicHub Acute lymp… mullighan20… 30 copy number
#> 2 https://… publicHub Acute lymp… mullighan20… 159 phenotype
#> 3 https://… publicHub Acute lymp… mullighan20… 129 copy number
#> 4 https://… publicHub Breast Can… Caldas2007/… 242 phenotype
#> 5 https://… publicHub Breast Can… Caldas2007/… 220 copy number
#> 6 https://… publicHub Breast Can… Caldas2007/… 135 gene expre…
#> # … with 11 more variables: Label <chr>, Type <chr>,
#> # AnatomicalOrigin <chr>, SampleType <chr>, Tags <chr>, ProbeMap <chr>,
#> # LongTitle <chr>, Citation <chr>, Version <chr>, Unit <chr>,
#> # Platform <chr>
Select datasets.
# The filter options of XenaFilter accept regular expressions
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
XenaFilter(filterDatasets = "clinical") %>%
XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
df_todo
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (3 total):
#> TCGA Lung Adenocarcinoma (LUAD)
#> TCGA Lung Cancer (LUNG)
#> TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#> TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#> TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
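To make the selection logic above concrete, here is a minimal base R sketch of the same idea on a toy table. It is an illustration under stated assumptions, not the implementation of XenaGenerate or XenaFilter: the data frame `toy` only mimics two XenaData columns, and the last two dataset names are hypothetical. It shows that chaining two filters behaves like a logical "and", while `|` inside one pattern behaves like "or".

```r
# Toy stand-in for XenaData: only two of its columns, a handful of rows.
# The three lung clinical names come from the output above; the last two
# dataset names are hypothetical and exist only for this illustration.
toy <- data.frame(
  XenaHostNames = c("tcgaHub", "tcgaHub", "tcgaHub", "tcgaHub", "publicHub"),
  XenaDatasets = c(
    "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
    "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
    "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
    "TCGA.LUAD.sampleMap/example_expression",
    "example_study/example_clinical_data"
  ),
  stringsAsFactors = FALSE
)

# Step 1: keep rows from one hub (comparable to the `subset =` condition).
tcga <- subset(toy, XenaHostNames == "tcgaHub")

# Step 2: keep dataset names that match "clinical".
clinical <- tcga[grepl("clinical", tcga$XenaDatasets), ]

# Step 3: keep names matching any of the lung codes; "|" means "or".
selected <- clinical[grepl("LUAD|LUSC|LUNG", clinical$XenaDatasets), ]

print(selected$XenaDatasets)
```

Because each step only narrows the previous result, the order of the two pattern filters does not change the final selection.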
Query and download.
XenaQuery(df_todo) %>%
XenaDownload() -> xe_download
#> This will check url status, please be patient.
#> All downloaded files will under directory /var/folders/mx/rfkl27z90c96wbmn3_kjk8c80000gn/T//RtmpauJGeg.
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena.
#> Creating directories for datasets...
#> Downloading TCGA.LUAD.sampleMap/LUAD_clinicalMatrix.gz
#> Downloading TCGA.LUNG.sampleMap/LUNG_clinicalMatrix.gz
#> Downloading TCGA.LUSC.sampleMap/LUSC_clinicalMatrix.gz
Load the data into R for analysis.
cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "LUAD_clinicalMatrix.gz" "LUNG_clinicalMatrix.gz"
#> [3] "LUSC_clinicalMatrix.gz"
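Since the result here is a named list with one element per downloaded file, a natural next step is to stack the tables on the columns they share. The sketch below does this on a toy list, assuming each element is a data frame. The list `cli_toy`, its column names (sampleID, gender, luad_only, lusc_only) and its values are hypothetical and are not taken from the real clinical matrices.

```r
# Toy list shaped like the result above: one data frame per file name.
# All column names and values are made up for illustration.
cli_toy <- list(
  "LUAD_clinicalMatrix.gz" = data.frame(
    sampleID = c("S1", "S2"),
    gender = c("MALE", "FEMALE"),
    luad_only = c(1, 2),
    stringsAsFactors = FALSE
  ),
  "LUSC_clinicalMatrix.gz" = data.frame(
    sampleID = "S3",
    gender = "MALE",
    lusc_only = 3,
    stringsAsFactors = FALSE
  )
)

# Columns present in every table.
shared <- Reduce(intersect, lapply(cli_toy, colnames))

# Stack the shared columns, recording which file each row came from.
combined <- do.call(rbind, lapply(names(cli_toy), function(nm) {
  data.frame(
    source = sub("_clinicalMatrix\\.gz$", "", nm),
    cli_toy[[nm]][, shared, drop = FALSE],
    stringsAsFactors = FALSE
  )
}))

print(combined)
```

Restricting to shared columns avoids filling cohort-specific fields with missing values; whether that trade-off suits an analysis depends on which variables are needed.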
Create two XenaHub objects:
- to_browse: a XenaHub object containing one cohort and one dataset.
- to_browse2: a XenaHub object containing 2 cohorts and 2 datasets.
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
XenaFilter(filterDatasets = "clinical") %>%
XenaFilter(filterDatasets = "LUAD") -> to_browse
to_browse
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (1 total):
#> TCGA Lung Adenocarcinoma (LUAD)
#> datasets() (1 total):
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
XenaFilter(filterDatasets = "clinical") %>%
XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2
to_browse2
#> class: XenaHub
#> hosts():
#> https://tcga.xenahubs.net
#> cohorts() (2 total):
#> TCGA Lung Adenocarcinoma (LUAD)
#> TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (2 total):
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#> TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
The XenaBrowse() function can be used to browse dataset/cohort links in your default web browser. By default, this function is limited to one dataset/cohort to prevent you from opening too many links at once.
# This will open your web browser
XenaBrowse(to_browse)
XenaBrowse(to_browse, type = "cohort")
# This will throw an error
XenaBrowse(to_browse2)
#> Error in XenaBrowse(to_browse2): This function limite 1 dataset to browse.
#> Set multiple to TRUE if you want to browse multiple links.
XenaBrowse(to_browse2, type = "cohort")
#> Error in XenaBrowse(to_browse2, type = "cohort"): This function limite 1 cohort to browse.
#> Set multiple to TRUE if you want to browse multiple links.
If you are sure you want to open multiple links, set the multiple option to TRUE.
XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
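The one-link-by-default behaviour described above is a simple guard pattern. The following sketch shows the idea in base R; `browse_links` and its `opener` argument are hypothetical names introduced here, and this is not the source of XenaBrowse. The opener defaults to utils::browseURL, but the example passes a recording function so that nothing is actually opened.

```r
# Hypothetical helper: refuse to open several links unless explicitly allowed.
browse_links <- function(urls, multiple = FALSE, opener = utils::browseURL) {
  if (length(urls) > 1 && !multiple) {
    stop("More than one link requested; set multiple = TRUE to open them all.")
  }
  for (u in urls) opener(u)
  invisible(urls)
}

# Record the links instead of opening a browser (placeholder URLs).
opened <- character(0)
record <- function(u) opened <<- c(opened, u)
links <- c("https://example.org/a", "https://example.org/b")

# Without multiple = TRUE the guard stops before any link is opened.
result <- tryCatch(
  browse_links(links, opener = record),
  error = function(e) conditionMessage(e)
)

# With multiple = TRUE every link is passed to the opener.
browse_links(links, multiple = TRUE, opener = record)
print(opened)
```

Making the caller opt in to opening many links keeps an overly broad filter from flooding the browser with tabs.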
For more features and usage, please read the online documentation.
API functions can be used to query specified data (e.g. expression of a few genes for a few samples) or information instead of downloading the entire dataset.
If you want to use APIs provided by UCSCXenaTools to access Xena Hubs, please read this vignette.
Wang, Shixiang, et al. "The predictive power of tumor mutational burden
in lung cancer immunotherapy response is influenced by patients' sex."
International journal of cancer (2019).
This package is based on XenaR; thanks to Martin Morgan for his work.
GPL-3
Please note that code from the XenaR package is under the Apache 2.0 license.