xihaoli/MACIE

MACIE (Multi-dimensional Annotation Class Integrative Estimation)

Description

Thank you for your interest in MACIE. MACIE (Multi-dimensional Annotation Class Integrative Estimation) is an unsupervised multivariate mixed model framework to assess multi-dimensional functional impacts for both coding and non-coding variants in the human genome. MACIE integrates a variety of functional annotations, including protein function scores, evolutionary conservation scores, and epigenetic annotations from ENCODE and Roadmap Epigenomics, and estimates the joint posterior probabilities of each genetic variant being functional.

Data Availability and Code Reproducibility

The MACIE scores (and other integrative scores) used in all benchmarking examples are available for download here. Precomputed MACIE scores for every possible variant in the human genome are available for download via Zenodo: Part 1 (Chr1 - Chr3), Part 2 (Chr4 - Chr7), Part 3 (Chr8 - Chr13), Part 4 (Chr14 - Chr22). The files are compressed with the bgzip utility and indexed with tabix, both of which are part of the Samtools software suite (distributed with HTSlib). tabix also provides an efficient way to extract subsets of the data defined by genomic regions. For example, the command line

tabix MACIE_hg19_noncoding_chr1.tab.bgz 1:20000-30000 > Subset.txt

extracts all variants on chromosome 1 from position 20,000 through 30,000 and writes them to the file Subset.txt. In this example, the tabix index file, MACIE_hg19_noncoding_chr1.tab.bgz.tbi, needs to be in the same directory as the main data file, MACIE_hg19_noncoding_chr1.tab.bgz. Samtools, including bgzip and tabix, is available here.
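
The same extraction can be scripted when many regions are needed. The following is a minimal Python sketch, not part of MACIE: the helper names (tabix_region, tabix_command, extract_subset) are hypothetical, and it assumes that tabix is installed and on the PATH and that the .tbi index sits next to the data file as described above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def tabix_region(chrom: str, start: int, end: int) -> str:
    """Format a tabix region string such as '1:20000-30000' (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}:{start}-{end}"


def tabix_command(data_file: str, chrom: str, start: int, end: int) -> List[str]:
    """Build the argument list equivalent to: tabix <data_file> <chrom>:<start>-<end>."""
    return ["tabix", data_file, tabix_region(chrom, start, end)]


def extract_subset(data_file: str, chrom: str, start: int, end: int, out_file: str) -> None:
    """Run tabix and write the matching rows to out_file (hypothetical helper)."""
    if shutil.which("tabix") is None:
        raise RuntimeError("tabix was not found on PATH")
    # tabix expects the index next to the data file, named <data_file>.tbi.
    index_file = Path(data_file + ".tbi")
    if not index_file.exists():
        raise FileNotFoundError(f"missing tabix index: {index_file}")
    with open(out_file, "w") as handle:
        subprocess.run(
            tabix_command(data_file, chrom, start, end),
            stdout=handle,
            check=True,  # raise CalledProcessError if tabix exits with an error
        )


# Example call, mirroring the command line above:
# extract_subset("MACIE_hg19_noncoding_chr1.tab.bgz", "1", 20000, 30000, "Subset.txt")
```

The chromosome is passed as "1" rather than "chr1" to match the naming used in the command above. Positions are in hg19 coordinates.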

The code used for training MACIE models is available here.

All genomic coordinates are given in NCBI Build 37/UCSC hg19.

Reference

Xihao Li*, Godwin Yung*, Hufeng Zhou, Ryan Sun, Zilin Li, Kangcheng Hou, Martin Jinye Zhang, Yaowu Liu, Theodore Arapoglou, Chen Wang, Iuliana Ionita-Laza, and Xihong Lin (2021+) "A Multi-Dimensional Integrative Scoring Framework for Predicting Functional Variants in the Human Genome". Submitted.
