
🧬 gnomAD Python API is used to obtain data from gnomAD (the Genome Aggregation Database).


furkanmtorun/gnomad_python_api


🧬 gnomAD Python API (Batch Script)

#️⃣ What is gnomAD and the purpose of this script?

gnomAD (the Genome Aggregation Database) is an aggregation of thousands of human exomes and genomes from sequencing studies. The gnomAD consortium also annotates the variants with their allele frequencies in genomes and exomes. This batch script searches for the genes or transcripts you are interested in and retrieves their variant data from the database via the gnomAD backend API, which is based on the GraphQL query language.
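As a rough illustration of what a GraphQL request body for such a lookup might look like, the sketch below builds a JSON payload from the three values the script takes. The helper `build_payload`, the `ROOT_FIELDS` mapping, and the GraphQL field and argument names (`gene`, `gene_symbol`, `transcript`, `variants`, `variant_id`) are assumptions made for illustration; they are not taken from the script, and the real gnomAD schema may require additional arguments.

```python
import json

# Hypothetical mapping from the script's -filter_by values to a GraphQL
# root field and its argument name (assumed, not confirmed by the script).
ROOT_FIELDS = {
    "gene_name": ("gene", "gene_symbol"),
    "gene_id": ("gene", "gene_id"),
    "transcript_id": ("transcript", "transcript_id"),
}


def build_payload(filter_by, search_by, dataset):
    """Return a JSON-serializable GraphQL request body (illustrative only)."""
    root, arg = ROOT_FIELDS[filter_by]
    query = (
        "{\n"
        # json.dumps quotes the identifier as a GraphQL string literal.
        f"  {root}({arg}: {json.dumps(search_by)}) {{\n"
        f"    variants(dataset: {dataset}) {{\n"
        "      variant_id\n"
        "    }\n"
        "  }\n"
        "}"
    )
    return {"query": query}


payload = build_payload("gene_name", "TP53", "gnomad_r2_1")
print(payload["query"])
```

A payload like this would typically be sent to the API endpoint as the JSON body of an HTTP POST request; the sketch stops before any network call.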

#️⃣ Requirements and Installation

  • Create a directory and download the "gnomad_python_api.py" and "requirements.txt" files, or clone the repository via Git using the following command: git clone https://github.com/furkanmtorun/gnomad_python_api.git

  • Install the required packages if you have not already done so: pip3 install -r requirements.txt

  • It's ready to use now!

If you have not installed pip yet, please follow the instructions here.

#️⃣ Usage & Options

| Option | Description | Parameters |
|---|---|---|
| -filter_by | Defines the input type | gene_name, gene_id, transcript_id |
| -search_by | Defines the input | A gene/transcript identifier (e.g. TP53, ENSG00000169174, ENST00000544455) or the name of a file containing your inputs (e.g. myGenes.txt) |
| -dataset | Defines the dataset | exac, gnomad_r2_1, gnomad_r3, gnomad_r2_1_controls, gnomad_r2_1_non_neuro, gnomad_r2_1_non_cancer, gnomad_r2_1_non_topmed |
| -h | Displays the parameters | To get help via the script: python gnomad_python_api.py -h |
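For readers who want to see how options of this shape can be parsed, here is a minimal sketch using Python's standard `argparse` module. It is not the script's actual implementation: the function name `build_parser`, the help strings, and the choice to mark every option as required are assumptions.

```python
import argparse

# Allowed values as listed in the options table above.
FILTERS = ["gene_name", "gene_id", "transcript_id"]
DATASETS = [
    "exac", "gnomad_r2_1", "gnomad_r3", "gnomad_r2_1_controls",
    "gnomad_r2_1_non_neuro", "gnomad_r2_1_non_cancer",
    "gnomad_r2_1_non_topmed",
]


def build_parser():
    """Build a parser that mirrors the documented options (assumed layout)."""
    parser = argparse.ArgumentParser(
        description="Retrieve variant data from gnomAD."
    )
    # argparse accepts single-dash long options such as -filter_by,
    # and adds -h automatically.
    parser.add_argument("-filter_by", required=True, choices=FILTERS,
                        help="type of the input")
    parser.add_argument("-search_by", required=True,
                        help="an identifier, or a file of identifiers")
    parser.add_argument("-dataset", required=True, choices=DATASETS,
                        help="gnomAD dataset to query")
    return parser


args = build_parser().parse_args(
    ["-filter_by=gene_name", "-search_by=TP53", "-dataset=gnomad_r2_1"]
)
print(args.filter_by, args.search_by, args.dataset)
```

Using `choices` lets the parser reject an unknown dataset or filter type before any request is made.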

Example Usages

  • How to list the variants by gene name or gene id?

python gnomad_python_api.py -filter_by="gene_name" -search_by="TP53" -dataset="gnomad_r2_1"

To search by an Ensembl gene ID instead of a gene name, set -filter_by to "gene_id" and pass the Ensembl gene ID to -search_by.

  • How to list the variants by transcript ID?

python gnomad_python_api.py -filter_by="transcript_id" -search_by="ENST00000544455" -dataset="gnomad_r3"

  • How to list the variants using a file containing genes/transcripts?

    • Prepare a file that contains gene names, Ensembl gene IDs, or Ensembl transcript IDs, one per line.

      ENSG00000169174
      ENSG00000171862
      ENSG00000170445

    • Then, run the following command:

    python gnomad_python_api.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="exac"

Please use only one type of identifier per file.

  • The variants will then be written to the "outputs" folder, in files named according to their identifier (gene name, gene ID, or transcript ID).
  • That's all!
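The file-based workflow above can be sketched as follows. This is an illustrative approach, not the script's own code: the helpers `load_identifiers`, `identifier_type`, and `check_single_type` are hypothetical, and telling the identifier types apart by the Ensembl `ENSG`/`ENST` prefixes is an assumption.

```python
import os


def load_identifiers(search_by):
    """Read one identifier per line if search_by is a file, else wrap it."""
    if os.path.isfile(search_by):
        with open(search_by, encoding="utf-8") as handle:
            # Skip blank lines and strip surrounding whitespace.
            return [line.strip() for line in handle if line.strip()]
    return [search_by]


def identifier_type(identifier):
    """Guess the identifier type from its prefix (assumed heuristic)."""
    if identifier.startswith("ENSG"):
        return "gene_id"
    if identifier.startswith("ENST"):
        return "transcript_id"
    return "gene_name"


def check_single_type(identifiers):
    """Return the shared identifier type, or raise if types are mixed."""
    if not identifiers:
        raise ValueError("no identifiers given")
    types = {identifier_type(item) for item in identifiers}
    if len(types) > 1:
        raise ValueError(f"mixed identifier types: {sorted(types)}")
    return types.pop()


ids = load_identifiers("TP53")  # not a file here, so treated as an identifier
print(ids, check_single_type(ids))
```

Checking the identifier type up front makes it possible to report a mixed file before any queries are sent.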

#️⃣ Contributing & Feedback

I would be very happy to receive any feedback on, or contributions to, the script.

Furkan Torun | [email protected] | Web site: furkanmtorun.github.io