Welcome to the GEO Accession repository! This README provides an overview of how to access gene expression data from the Gene Expression Omnibus (GEO) using the GEOquery package in R.
The GEOquery package is a valuable tool for extracting high-throughput experimental data from the National Center for Biotechnology Information (NCBI) GEO. It allows you to access a wide range of data, including samples, series, and platforms, which are submitted to the GEO Datasets database by data owners.
- To access the metadata associated with GSM (GSMxxx) sample IDs, follow our tutorial on Accessing Metadata with GSM Sample IDs.
- To access metadata by GSE (GSExxx) IDs, refer to the documentation within the repository.
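
The two lookups above might be sketched as follows. This is a minimal sketch, assuming GEOquery and Biobase are installed from Bioconductor and that NCBI is reachable over the network. The function names `gsm_metadata` and `gse_sample_metadata` are hypothetical helpers, not part of this repository or of GEOquery, and the accession IDs in the usage comments are only examples.

```r
# Hypothetical helper: fetch one sample record and return its metadata
# as a named list (title, organism, characteristics, ...).
gsm_metadata <- function(gsm_id) {
  gsm <- GEOquery::getGEO(gsm_id)   # downloads the GSM record from NCBI
  GEOquery::Meta(gsm)               # metadata header of the record
}

# Hypothetical helper: fetch a series and return the per-sample metadata
# table for each element of the returned list.
gse_sample_metadata <- function(gse_id) {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects.
  gse_list <- GEOquery::getGEO(gse_id, GSEMatrix = TRUE)
  lapply(gse_list, Biobase::pData)  # one data frame of sample annotations each
}

# Example usage (requires network access; IDs are illustrative):
# meta  <- gsm_metadata("GSM11805")
# names(meta)
# pheno <- gse_sample_metadata("GSE781")
```

Wrapping the calls in functions keeps the download step explicit, so nothing is fetched until a helper is actually called with an accession ID.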
The Gene Expression Omnibus (GEO) is a public repository that hosts data submitted from array-based and high-throughput sequencing experiments. It holds both raw and processed data covering a wide range of experiment types, such as microarrays, RNA sequencing, ChIP experiments, and DNA sequencing. GEO classifies data into four main categories: platforms, samples, series, and datasets.
- Platforms (GPLxxx): These identify the experimental platforms and the elements used in experiments.
- Samples (GSMxxx): Each sample record provides information about the sample conditions used in experiments.
- Series (GSExxx): Series records group samples and provide details about the experimental design, including the array platform used.
- Datasets: Datasets contain curated sets of sample data for data display and analysis. They often share a common probe design.
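
As an illustration of the accession prefixes listed above, the following sketch maps an ID to its record type using base R only. The helper name `geo_record_type` is hypothetical, and the `GDS` prefix for curated datasets is an addition not stated in the list above. The check looks only at the format of the ID, not at whether the record exists in GEO.

```r
# Hypothetical helper: map GEO accession IDs to their record type by prefix.
geo_record_type <- function(accession) {
  prefixes <- c(GPL = "platform", GSM = "sample", GSE = "series", GDS = "dataset")
  # Accept only a known three-letter prefix followed by digits, e.g. "GSE781".
  valid <- grepl("^(GPL|GSM|GSE|GDS)[0-9]+$", accession)
  type <- unname(prefixes[substr(accession, 1, 3)])
  type[!valid] <- NA_character_   # anything else is reported as NA
  type
}

geo_record_type(c("GPL96", "GSM11805", "GSE781", "GDS507", "foo"))
```

A check like this can catch a mistyped ID before any download is attempted.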
The GEOquery package is an R tool that simplifies interactions with the NCBI GEO database, making it easy to access gene expression data. You can download data as a list of experiments, including platform information.
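- To check the list of experiments associated with the data, use the following code, where `x` is a GEO accession ID:

      library(GEOquery)
      gsm_list <- getGEO(x)               # for a GSE ID, typically one list element per platform
      total_platform <- length(gsm_list)  # number of elements returned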
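
To see what each element of the returned list contains, one option is a small summary table. This is a sketch, assuming the list holds Biobase `ExpressionSet` objects, as `getGEO()` returns for a GSE ID with its default settings. The helper name `summarise_series` is hypothetical.

```r
# Hypothetical helper: one row per element of the list returned by getGEO().
summarise_series <- function(gse_list) {
  data.frame(
    file     = names(gse_list),  # series matrix file each element came from
    platform = vapply(gse_list, Biobase::annotation, character(1)),  # GPL ID
    samples  = vapply(gse_list, function(e) length(Biobase::sampleNames(e)), integer(1)),
    features = vapply(gse_list, function(e) length(Biobase::featureNames(e)), integer(1)),
    row.names = NULL
  )
}

# Example usage (requires network access; the ID is illustrative):
# gse_list <- GEOquery::getGEO("GSE781")
# summarise_series(gse_list)
```

The number of rows in the table matches `total_platform` from the snippet above.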