Reference data: BED files, genes, transcripts, variations.

AstraZeneca-NGS/reference_data

Reference data

Capture region BED files

This directory collects commonly used capture region BED files, which are installed and available for use in bcbio analyses. Files are provided for both hg19 (chr1, chr2, chr3... style naming) and GRCh37 (1, 2, 3... style naming).
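The two naming styles differ only in the chromosome column of each BED record. As a minimal sketch of that difference (not a tool shipped in this repository), the hypothetical helpers below rewrite hg19-style names to GRCh37-style names for the primary chromosomes 1-22, X and Y. Mitochondrial and unplaced contigs are deliberately left untouched, because their naming and sequence can differ between the two builds; tab-separated columns are assumed.

```python
# Hypothetical helpers for illustration; names are not part of this repository.
# Only the primary chromosomes are renamed; everything else passes through.
PRIMARY_HG19 = {f"chr{n}" for n in list(range(1, 23)) + ["X", "Y"]}


def to_grch37_name(chrom: str) -> str:
    """Map an hg19-style name (chr1) to GRCh37 style (1) for primary chromosomes."""
    return chrom[3:] if chrom in PRIMARY_HG19 else chrom


def convert_bed_line(line: str) -> str:
    """Rewrite the chromosome column of one tab-separated BED line."""
    # Leave blank lines and BED header lines unchanged.
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[0] = to_grch37_name(fields[0])
    return "\t".join(fields) + newline


converted = convert_bed_line("chr1\t100\t200\tEXAMPLE\n")
print(converted, end="")
```

The coordinates are not changed, only the contig label, so the sketch applies only when both files describe the same underlying assembly coordinates.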

Canonical transcripts

Files matching transcripts/cancer_transcripts_*_ensembl.txt contain the IDs of the canonical (longest) transcripts that the SnpEff variant prediction tool uses when it is run with the -canon flag (only in the Ensembl-based versions of the reference databases, GRCh37.** and GRCh38.** in SnpEff notation). Because not all IDs in the list represent the most cancer-relevant isoforms, transcripts/canon_cancer_replacement.txt provides a map of replacement transcripts that can be supplied with the -canonList option:

java -jar snpEff.jar GRCh37.75 test.vcf -canon -canonList transcripts/canon_cancer_replacement.txt
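When the annotation step is driven from a script, the same invocation can be assembled as an argument list. The sketch below is only an illustration: the function name `snpeff_canon_command` and its parameters are hypothetical, and it simply mirrors the argument order of the command above without running SnpEff.

```python
import shlex


def snpeff_canon_command(genome, vcf, canon_list=None, jar="snpEff.jar"):
    """Build the SnpEff argument list, mirroring the command shown above."""
    cmd = ["java", "-jar", jar, genome, vcf, "-canon"]
    if canon_list:
        # Supply the replacement map only when a file is given.
        cmd += ["-canonList", canon_list]
    return cmd


cmd = snpeff_canon_command(
    "GRCh37.75", "test.vcf", "transcripts/canon_cancer_replacement.txt"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems in file paths.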

To use the canonical transcripts for variant annotation in bcbio, add the following to your YAML configuration file:

algorithm:
  effects_transcripts: canonical

To use the cancer-relevant replacement transcripts instead, set:

algorithm:
  effects_transcripts: canonical_cancer

The full list of genes with replaced transcripts:

AKT1     ENST00000555528
BRCA1    ENST00000357654
CD79B    ENST00000006750
CDKN2A   ENST00000304494
CHEK1    ENST00000534070
CHEK2    ENST00000328354
ESR1     ENST00000206249
FANCL    ENST00000233741
FGFR1    ENST00000447712
FGFR2    ENST00000457416
FGFR3    ENST00000440486
MET      ENST00000397752
MYD88    ENST00000396334
PPP2R2A  ENST00000380737
RAD51D   ENST00000345365
RAD54L   ENST00000371975
GNAS     ENST00000371085
TP53     ENST00000269305
ARID1B   ENST00000350026
TET2     ENST00000380013
CEBPA    ENST00000498907
PIK3C2G  ENST00000538779
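For scripted checks, the listing above can be loaded into a gene-to-transcript mapping. This is a minimal sketch that assumes the two whitespace-separated columns shown in the listing; the exact layout of transcripts/canon_cancer_replacement.txt is not shown here and may differ. The function name `parse_replacements` is hypothetical, and only three rows are embedded to keep the example short.

```python
def parse_replacements(text):
    """Parse 'GENE  ENST...' lines into a dict of gene -> transcript ID."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        gene, transcript = parts
        if not transcript.startswith("ENST"):
            raise ValueError(f"unexpected transcript ID for {gene}: {transcript}")
        mapping[gene] = transcript
    return mapping


# A few rows copied from the listing above.
listing = """\
AKT1     ENST00000555528
BRCA1    ENST00000357654
TP53     ENST00000269305
"""

replacements = parse_replacements(listing)
print(replacements["TP53"])
```

A mapping like this makes it straightforward to confirm which transcript an annotation reports for a given gene.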
