
Frequently Asked Questions

Submission

What is GEO?

The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the studies and gene expression patterns stored in GEO. For more information about various aspects of GEO, please see our documentation listings and publications.

Why should I submit my data to GEO?

There are several good reasons for submitting your data to us. The most likely reason is that the funder of your research or the journal in which you are publishing your research requires deposit of microarray or sequence data to a MIAME- or MINSEQE-compliant public repository like GEO. In addition to satisfying funder and journal requirements for publication, there are other significant benefits to depositing data with GEO. Your data receive long-term archiving at a centralized repository, and are integrated with other NCBI resources which afford increased usability and visibility. You may also include links back to your own project websites within your submission, again increasing visibility of your research. Journal publication is not a requirement for data submission to GEO.

How do I submit my data to GEO?

Submitters should first log in through their NCBI account. If you don’t have an NCBI account, you can create one here. Submitters are then asked to complete a My GEO Profile form that provides the contact information to be used by GEO curators to communicate about the submission and to be displayed on the GEO records. All submitters are asked to supply raw data, processed data, and descriptive information about the samples, protocols and overall study in a supported deposit format. Follow the relevant link for your data type on the Submitting data page to find submission instructions. Submitters of high throughput sequencing data can watch a tutorial video on "How to submit to GEO". We endeavor to make data deposit procedures as straightforward as possible and will provide as much assistance as you require to get your data submitted to GEO. If you have problems or questions about the submission procedures, e-mail us and one of our curators will quickly get back to you.

When do I submit my data to GEO?

Many journals require accession numbers for microarray or sequence data before acceptance of a paper for publication. Also, reviewers and editors may need access to your data during the review process. Thus, data should be deposited in GEO before a manuscript describing the data is sent to a journal for review. GEO processing time is approximately 5 business days after completion of submission, but may be longer around federal holidays, so it is important to make your submission well in advance of when you require the accession numbers for your manuscript. Your records may remain private until your manuscript (or preprint) is publicly available. Once your submissions have been approved by GEO staff, you can cite the GEO accession number(s) in your manuscript and generate a reviewer access token by which editors and reviewers can access your private GEO records.

When will my data receive GEO accession numbers?

Processing normally takes approximately 5 business days after completion of submission, but may take longer around federal holidays. After you complete the submission, your data are put into a queue to await review by a curator. Please understand that we receive hundreds of study submissions per week, and processing times can vary depending on submission volume. Thus, it is important to make your submission well in advance of when you require the accession numbers for your manuscript. If format or content problems are identified with your submission, a curator will contact you by e-mail explaining how to address the issue. Please address the issues raised by curators; failure to do so may result in processing delays or removal of the records. Once your records pass review, the curator will send you an e-mail confirming your GEO accession numbers and their release dates. If you do not receive an e-mail from us within 5 business days of your submission, please first check your spam or junk e-mail folders, because some systems recognize GEO e-mail correspondence as spam, and then e-mail us to inquire about your submission. Do not quote GEO accession numbers in manuscripts until you have received an approval e-mail notice from a GEO curator.

What kinds of data will GEO accept?

GEO was designed around the common features of most of the high-throughput and parallel molecular abundance-measuring technologies in use today. These include data generated from microarray and high-throughput sequence technologies, for example:

  • Gene expression profiling by microarray or next-generation sequencing (see examples)
  • Non-coding RNA profiling by microarray or next-generation sequencing (see examples)
  • Chromatin immunoprecipitation (ChIP) profiling by microarray or next-generation sequencing (see examples)
  • Genome methylation profiling by microarray or next-generation sequencing (see examples)
  • High-throughput RT-PCR (see examples)
  • Genome variation profiling by array (arrayCGH) (see examples)
  • SNP arrays (see examples) (see human subject FAQ)
  • Serial Analysis of Gene Expression (SAGE) (see examples)
  • Protein arrays (see examples)

The GEO database has a flexible and open design that is responsive to developing trends. If you have questions about whether GEO can accept your data type, please do not hesitate to contact us.

Does GEO store raw data?

Yes. GEO requires raw data, processed data and metadata. Raw data facilitates the unambiguous interpretation of the data and potential verification of conclusions. For microarray data, raw data may be supplied either within the Sample record data tables or as external supplementary data files, e.g., Affymetrix CEL. For high-throughput sequencing, GEO brokers the complete set of raw data files, e.g., FASTQ, to the SRA database on your behalf.

Can I submit an extracted or summary subset of data?

No. Complete, unfiltered data sets should be supplied. This includes full hybridization tables, genome-wide sequence results, fully annotated samples, and meaningful, trackable sequence identifier information in Platform records and processed sequence data files. The principal reason we maintain this archive, and the rationale behind many journals' requirement for data deposit into GEO, is so that the community can access and comprehensively re-examine data that form the basis of scientific reporting. Therefore, we do not accept partial or heavily filtered data sets. We do understand the various reasons and difficulties some researchers have with sharing data. However, the demand from users and journal editors, together with our need to maintain a useful and transparent database, has led to our policy of only accepting complete data sets.

How do I create a GEO account?

You will need both an NCBI account and an accompanying My GEO Profile to submit data. First, log in through your NCBI account. If you don’t have an NCBI account, you can create one here. Submitters are then asked to complete a My GEO Profile form that provides the contact information to be used by GEO curators to communicate about the submission and to be displayed on the GEO records. The NCBI account can be used to submit additional data in the future without re-entering contact information, as well as to authenticate the submitter when updating or editing an existing GEO record.

How can I make edits to my contact information?

After logging in to your NCBI account, follow the My GEO Profile link on the home page. Edits to contact information will be applied immediately to all existing records submitted under that account. If you need the contact information to remain unedited on existing records, but different contact details to appear on new records, it is necessary to open a separate account and submit new data under that account.

I run a facility and need to submit data for multiple investigators. What account should I use?

You have three choices when submitting data on behalf of others:

  1. Create a separate GEO Profile for each investigator for whom you will be submitting data. Each Profile will need a separate NCBI account. When you create each GEO Profile, you can add both the investigator's e-mail address and your own. In this case, both addresses will receive e-mail correspondence from GEO, but only the e-mail address of the investigator will be displayed on the GEO records.
  2. Submit the data under your own GEO Profile. When the submission is approved, you can ask us to transfer the submission to the investigator's GEO Profile (you must first ask them to create their own GEO Profile and to provide you with their GEO username). In this case, you will receive e-mail correspondence from GEO up until the time the data are moved to the investigator’s Profile.
  3. Maintain one 'Facility' account and include the investigator names as 'Contributors' on their records. For example, see this record submitted by the Stanford Microarray Database on behalf of one of their investigators. In this case, only the facility will receive e-mail correspondence from GEO.

Can I keep my data private while my manuscript is being prepared or under review?

Yes. GEO records may remain private until a manuscript (including preprint) quoting the GEO accession number is made available to the public (journal publication is not a requirement for data submission to GEO). During the submission process, you are prompted to specify a release date for your records. The release date is the date on which your data are made public and will be available for anyone to access, download and re-use. Therefore, it is very important that all your collaborators agree on the release date. Although the maximum allowable limit is four years, this date may be brought forward or pushed back at any time; see Change the release date of your private records for instructions on how to change the release date. This feature allows a submitter to deposit data and receive a GEO accession number to quote in a manuscript before the data become public. We will send you an e-mail reminder 10 days before the scheduled release date, inviting you to postpone the release date as necessary. It is important to inform us as soon as your manuscript or preprint is published so that we can release your records and link them with PubMed. Submitters also have the opportunity to create a reviewer token that allows collaborators or reviewers confidential, read-only access to private data before manuscript publication.

Can I keep my data private after my manuscript is published?

No. If GEO accession numbers are quoted in a manuscript, including publicly posted unpublished preprints through servers like bioRxiv, the records must be released so that the data are accessible to the scientific community. Even if the preprint is intended to be temporary, if the accession is cited, the data must be released. If GEO accession numbers are found to be quoted in any publication or preprint before the scheduled release date, GEO staff are obligated to release those records, even if a second manuscript describing the same data is pending.

How can I allow reviewers access to my private records?

After your records have been approved, use the Reviewer access link near the top of your Series (GSExxx) record to create a reviewer token which provides anonymous, read-only access to your private submissions. The token can be sent to the journal editor who will circulate it to reviewers requiring access to your private data. This method provides access to all private data except sequence files submitted to SRA. SRA does not currently support access to private sequence data, but if necessary, you can e-mail SRA to request a reviewer metadata link.

How can I make corrections to data that I already submitted?

You may perform updates and edits at any time to any of your submissions. Please refer to the Updating your GEO records page for instructions. Be aware that updates can take several business days to complete, and may take longer around federal holidays, so it is important to make your update well in advance of when you require it to be implemented. Also, for sequence data, note that the corresponding raw data records in SRA follow the NLM GenBank and SRA data Processing procedures for status changes.

How can I delete my records?

Only GEO staff can remove records from the database; it is necessary to e-mail us to request deletion of specific accession numbers. Please keep in mind that updating records is preferable to deleting records, if appropriate. If the accession numbers in question have been published in a manuscript, including a preprint, we cannot delete the records. Rather, a comment will be added to the record indicating the reason the submitter requested withdrawal of the data, and the record content will be adjusted/deleted accordingly. Also, for sequence data, note that the corresponding raw data records in SRA follow the NLM GenBank and SRA data Processing procedures for status changes.

I'm a reviewer, how do I access and evaluate pre-publication data?

Reviewers should expect to receive a reviewer token with the manuscript. This token allows anonymous, read-only access to the private GEO records cited in the manuscript. Detailed information is provided in these Guidelines for reviewers and journal editors.

Does GEO support MIAME and MINSEQE?

Yes. GEO encourages submitters to supply MIAME- and MINSEQE-compliant data. GEO submission procedures are designed to closely follow the MIAME and MINSEQE checklists; if you provide all requested information, your submission will be compliant. Note that MIAME and MINSEQE compliance is determined by the content provided, not by the submission format or route.

Human Subject Guidelines: Can I submit data derived from human subjects?

If your data need controlled access, deposit your data with NCBI's dbGaP database.

GEO is an unrestricted-access database. Please read the following guidelines for Human Genomic Data Submitted to Unrestricted-Access Repositories.

For NIH-funded studies: If you plan to submit large-scale human genomic data, as defined by the NIH Genomic Data Sharing (GDS) Policy, to be maintained in an unrestricted-access NCBI database, NIH expects you to 1) have an Institutional Certification to assure that the data submission and expectations defined in the NIH GDS Policy have been met (this Certification does not need to be submitted to GEO), and 2) register the study in NCBI BioProject regardless of where the data will ultimately reside (e.g., GenBank, SRA, or GEO; if submitting to GEO, we will register a BioProject on your behalf). If you have any questions about whether your research is subject to the NIH GDS Policy, please contact the relevant NIH Program Official and/or the Genomic Program Administrator. If you plan to submit genomic data from human specimens that would not be considered large-scale, it is your responsibility to ensure that the submitted information does not compromise participant privacy, and is in accord with the original consent, in addition to all applicable laws, regulations, and institutional policies. GEO is not able to help interpret your consent forms; please consult your institutional review board (IRB) instead.

For non-NIH-funded studies: If your study is not NIH-funded, you are not required to comply with the GDS Policy, but you must have the appropriate consent/permission to submit the data to a public database like GEO. GEO is not able to help interpret your consent forms; please consult your institutional review board (IRB) instead. It is your responsibility to ensure that the submitted information does not compromise participant privacy and is in accord with the original consent in addition to all applicable laws, regulations, and institutional policies. If you do not have consent to make the data fully public in a database like GEO, you can apply to the NIH Office of Science Policy to find an NIH Institute that will sponsor your study in NCBI's dbGaP database. dbGaP has controlled-access mechanisms and is an appropriate resource for hosting sensitive patient data. The sponsor would create a Data Access Request and Use Certification and define use restrictions for use in approving data access requests.

Query and search

Who can use GEO data?

Anybody can access and download public GEO data. There are no login requirements. For more information, please read these copyright and data disclaimers.

What kinds of retrievals are possible in GEO?

There are several ways to retrieve GEO data; please see the Query and analysis overview and the Download GEO data instructions for details. These methods include performing simple or sophisticated queries of the GEO DataSets and GEO Profiles databases, entering a valid GEO accession number in the Accession Display bar, browsing the list of current GEO repository contents, and downloading data from the GEO FTP site.
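
As a rough illustration of two of these routes, the Python sketch below builds an Accession Display link and a likely FTP directory for a Series accession. The `acc.cgi` address, the `GSEnnn`-style directory layout, and the helper names `accession_url` and `series_ftp_dir` are assumptions made for this example, not something stated in this FAQ; please confirm the current layout in the Download GEO data instructions before relying on it.

```python
import re

# Assumed base addresses; verify against the Download GEO data instructions.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo"


def accession_url(accession: str) -> str:
    """Build a link equivalent to typing an accession into the Accession Display bar."""
    return f"{GEO_ACC_URL}?acc={accession}"


def series_ftp_dir(accession: str) -> str:
    """Guess the FTP directory for a Series (GSExxx) accession.

    Assumption: Series are grouped into range directories whose name is the
    accession with its last three digits replaced by 'nnn'.
    """
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a Series accession: {accession!r}")
    digits = match.group(1)
    range_dir = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_FTP_ROOT}/series/{range_dir}/{accession}/"
```

For example, `series_ftp_dir("GSE1000")` yields a path ending in `/series/GSE1nnn/GSE1000/` under this assumed layout.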

How can I query and analyze GEO data?

Once you have found a curated DataSet or Series of interest, there are several features available that help identify interesting gene expression profiles within that study. Some RNA-seq studies and most microarray studies can be analyzed with GEO2R. GEO2R is a web application that can be used to compare 2 or more groups of Samples, and identify and plot differentially expressed genes. All records analyzable with GEO2R can be retrieved by searching with "geo2r"[Filter]. Alternatively, there are some curated DataSets that include a find genes feature, cluster heatmaps and a t-test sample comparison tool. Once you have identified gene expression profile charts of interest, there are several types of neighbors links on the Profile records that help identify related genes of interest. Alternatively, if you prefer to perform your own analysis using your favorite software package, the value matrix tables within the DataSet full SOFT files available from the DataSet records, or the Series Matrix File or supplementary files linked at the foot of Series records, may prove suitable. Finally, thousands of GEO data tracks have been uploaded for viewing on NCBI’s Genome Data Viewer. All records with tracks can be retrieved by searching with track[filter]; the 'See on Genome Data Viewer' button on those records links to corresponding tracks on NCBI’s Genome Data Viewer (see example tracks).

Can GEO data be accessed programmatically?

Yes. Users can take advantage of NCBI's Entrez programming utilities to access data stored in GEO DataSets and GEO Profiles. The Construct a URL feature is a popular mechanism to download complete metadata records in bulk. Additionally, BioConductor users may be interested in the GEOquery package which parses GEO SOFT files for integration with BioConductor 'R' analysis resources; see publication.
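
For readers who want a starting point, here is a minimal Python sketch of an Entrez ESearch request against GEO DataSets. It assumes the standard E-utilities `esearch.fcgi` endpoint, the database name `gds`, and the JSON response layout (`esearchresult` / `idlist`); the function names are hypothetical. Please check the Entrez programming utilities documentation for current parameters and usage limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed E-utilities endpoint for ESearch.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for GEO DataSets (assumed database name: 'gds')."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_geo_datasets(term: str, retmax: int = 20) -> list:
    """Run the search and return the matching Entrez UIDs (needs network access)."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        payload = json.load(response)
    # Assumed response layout: {"esearchresult": {"idlist": [...]}}
    return payload["esearchresult"]["idlist"]
```

Calling `search_geo_datasets("GPL96[GEO Accession]")` would return UIDs for records matching that search term; the UIDs can then be passed to other E-utilities to fetch record summaries.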

Can I get notified when new data are available?

Yes. This can be accomplished using an NCBI account. Once you are logged in to NCBI, construct a search for data relevant to your interests in GEO DataSets. For example, if you are only interested in studies performed on Platform GPL96, search with GPL96[GEO Accession]; to see any apoptosis studies, search with apoptosis; or if you want to see all new studies, search with all[filter]. Next to the search box, you should see a Save Search option. You will be presented with the option to receive e-mail alerts when new data matching your search criteria have been added to the database. This database is updated daily.

Can I cite data I find in GEO as evidence to support my own research?

Yes. Users often cite data they find in GEO to support their own studies; please see the list of third-party usage citations and guidelines for Citing data you find in GEO.

What is the difference between a Series and a DataSet?

A GEO Series (GSExxx) is an original submitter-supplied record that summarizes a study. These data are reassembled by GEO staff into curated GEO DataSets (GDSxxx). A DataSet represents a collection of biologically- and statistically-comparable Samples processed using the same Platform. Information reflecting experimental variables is provided through DataSet subsets. Both Series and DataSets are searchable using the GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools including gene expression profile charts and DataSet clusters; see the Data organization document for more information. Not all submitted data are suitable for DataSet assembly and we are experiencing a backlog in DataSet creation, so not all Series have a corresponding DataSet record. When a curated DataSet is not available, it may be appropriate to analyze the Series using GEO2R, which compares groups of Samples and identifies differentially expressed genes.

Why can't I find gene profile charts or DataSet clusters for my study of interest?

As explained in the What is the difference between a Series and a DataSet? FAQ above, suitable submitter-supplied GEO Series records are reassembled by GEO staff into curated DataSets. At periodic intervals, these DataSets are then indexed and loaded into GEO Profiles and GEO DataSets, which allows users to query gene names, visualize charts and clusters, and more. If your Series of interest has not yet been assembled into a DataSet these features will not be available, but it may be appropriate to analyze the Series using GEO2R, which compares groups of Samples and identifies differentially expressed genes.

What do the red bars and blue squares represent in GEO profile charts?

In GEO Profile charts, the red bars represent values extracted from original GEO Sample records as supplied by submitters. For single channel data, values are assumed to be submitted as normalized signal count data, reflecting the relative measure of abundance of each transcript. For Affymetrix data, the "detection call" (A=absent, P=present, M=marginal) data are taken into consideration, if supplied (absent calls faded out). For dual channel experiments values are normalized log ratios, and SAGE values reflect "tags per million" counts. The blue squares represent the percentile ranked value of a spot compared to all other spots within that Sample. That is, all values within each Sample are rank ordered and placed into rank percentile 'bins'. This gives an indication of the relative expression level of that gene compared to all other genes on the array. Value profiles are plotted on a scale that fits each individual gene, whereas rank data are always plotted on a scale of 0-100%.
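
The rank calculation can be pictured with a small Python sketch. This is only an illustration of percentile ranking within one Sample, under the assumption that a value's rank is the percentage of values in the Sample that are less than or equal to it; GEO's exact ranking and binning rules are not specified here, and the function names and the `bin_width` parameter are hypothetical.

```python
import math
from bisect import bisect_right


def percentile_ranks(values):
    """Rank each value against all values in the same Sample, as a percentage.

    Assumption: rank = share of values less than or equal to this value.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def to_bins(percentiles, bin_width=1):
    """Round each percentile up to the top of its bin, capped at 100."""
    return [min(100, math.ceil(p / bin_width) * bin_width) for p in percentiles]
```

For example, `percentile_ranks([5.0, 1.0, 3.0, 9.0])` gives `[75.0, 25.0, 50.0, 100.0]`: the value 5.0 is at or above three of the four values in that Sample, regardless of the scale on which the values themselves are plotted.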

What data types are provided with next-generation sequence submissions?

Processed sequence data files: GEO hosts submitter-supplied processed sequence data files, which are linked at the bottom of Sample and/or Series records as supplementary files. Requirements for processed data files are not yet fully standardized and will depend on the nature of the study, but data typically include genome tracks or expression counts.

Raw sequence data files: Submitter-supplied raw data are loaded to NCBI's Sequence Read Archive (SRA) database. Use the SRA Run Selector to list and select Runs to be downloaded or analyzed with the SRA Toolkit. If you have questions about SRA format or the SRA toolkit, please e-mail SRA directly.

NCBI-generated RNA-seq count data: For some RNA-seq data, NCBI precomputes RNA-seq gene expression counts and delivers them as count matrices that may be incorporated into commonly used differential expression analysis and visualization software. For more information, see NCBI-generated RNA-seq count data.

Last modified: September 5, 2024