This repository contains the software implementation of a TwoSampleMR (two-sample Mendelian randomization) pipeline. The pipeline uses GWAS and QTL summary statistics to estimate the causal effect of a molecular QTL in a given tissue (the exposure), such as an expression or splicing QTL, on a trait (the outcome). This pipeline was run in the study: Integrating genetic regulation and single-cell expression with GWAS prioritizes causal genes and cell types for glaucoma. Hamel AR, et al. medRxiv 2023 (https://www.medrxiv.org/content/10.1101/2022.05.14.22275022v2), accepted in principle at Nature Communications 2023.
This pipeline runs the TwoSampleMR and MendelianRandomization packages in R (version 4.1.2). MR estimates are generated by calculating the Wald ratio for each variant. Where multiple variants constitute the instrument for a candidate gene, the inverse-variance weighted (IVW) method is used as the primary method for pooling the variant-specific estimates. For sensitivity analysis, the pipeline also runs the simple-median, weighted-median, MR-Egger, and MR-PRESSO methods. Horizontal pleiotropy is tested with the MR-Egger intercept test and the MR-PRESSO global heterogeneity test in cases with 3 or more instrumental variants; P<0.05 in these tests indicates the presence of horizontal pleiotropy. To correct for multiple hypothesis testing, a Bonferroni correction or a Benjamini-Hochberg (BH) FDR<0.05 threshold can be applied to the primary IVW/Wald ratio test to identify statistically significant MR results.
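As a rough illustration of the two primary estimators named above, the following base R sketch computes a Wald ratio for one variant and a fixed-effect IVW estimate across several variants. It is not the pipeline's code: the function names `wald_ratio` and `ivw_fixed` and all numbers are made up for illustration, and the TwoSampleMR and MendelianRandomization packages may differ in detail (for example in how they handle heterogeneity between variants).

```r
# Wald ratio for a single variant: outcome effect divided by exposure effect.
# The standard error uses the first-order delta-method approximation,
# which ignores uncertainty in the exposure (QTL) effect estimate.
wald_ratio <- function(beta_exposure, beta_outcome, se_outcome) {
  list(
    beta = beta_outcome / beta_exposure,
    se   = se_outcome / abs(beta_exposure)
  )
}

# Fixed-effect IVW estimate: a weighted regression of outcome effects on
# exposure effects through the origin, with weights 1 / se_outcome^2.
ivw_fixed <- function(beta_exposure, beta_outcome, se_outcome) {
  w    <- 1 / se_outcome^2
  beta <- sum(w * beta_exposure * beta_outcome) / sum(w * beta_exposure^2)
  se   <- sqrt(1 / sum(w * beta_exposure^2))
  list(beta = beta, se = se, pval = 2 * pnorm(-abs(beta / se)))
}

# Hypothetical summary statistics for three independent variants.
bx  <- c(0.50, 0.40, 0.60)  # QTL (exposure) slopes
by  <- c(0.10, 0.08, 0.12)  # GWAS (outcome) effects
sey <- c(0.02, 0.02, 0.03)  # GWAS standard errors

single <- wald_ratio(bx[1], by[1], sey[1])
pooled <- ivw_fixed(bx, by, sey)
print(single)
print(pooled)
```

In this toy input every variant has the same ratio of outcome to exposure effect, so the pooled IVW estimate coincides with each single-variant Wald ratio; with real data the variant-specific ratios differ and IVW weights them by precision.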
Authors:
Puja Mehta[1,2,3], Skanda Rajasundaram[1,2,3,4]
Segre lab, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA
Affiliations:
[1] Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
[2] Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
[3] Broad Institute of Harvard and MIT, Cambridge, MA, USA
[4] Centre for Evidence-Based Medicine, University of Oxford, Oxford, UK; Faculty of Medicine, Imperial College London, London, UK
For questions or comments regarding this tool, please contact Puja Mehta at pamehta [at] meei [dot] harvard [dot] edu, and Ayellet Segre at ayellet_segre [at] meei [dot] harvard [dot] edu.
Date: November 16, 2023
src
: contains the scripts for the software pipeline and for generating results
data
: contains the input files required to run TwoSampleMR and MendelianRandomization for GTEx v8 QTLs. These files must be downloaded by the user. Each type of molecular QTL has a separate sub-directory (e.g. GTEx_v8_eQTL, GTEx_v8_sQTL)
tmp_data
: contains the temporary files generated by the pipeline
results
: contains the results files
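The directory layout above could be created with a few lines of R. This is only a sketch: the helper name `create_layout` and the demo location under `tempdir()` are assumptions for illustration, and the two QTL sub-directory names are simply the examples given above.

```r
# Create the four top-level directories described above.
# `root` is a placeholder; point it at your local copy of the repository.
create_layout <- function(root = ".") {
  dirs <- file.path(root, c("src", "data", "tmp_data", "results"))
  # One sub-directory per molecular QTL type, using the example names above.
  qtl_dirs <- file.path(root, "data", c("GTEx_v8_eQTL", "GTEx_v8_sQTL"))
  for (d in c(dirs, qtl_dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(c(dirs, qtl_dirs))
}

# Demo in a temporary location so nothing in the working directory is touched.
layout_root <- file.path(tempdir(), "mr_pipeline_demo")
created <- create_layout(layout_root)
print(basename(created))
```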
This TwoSampleMR pipeline was written in R (version 3.5 or later) and requires the following libraries:
library("data.table")
library("dplyr")
library("tidyr")
library("foreign")
library("tibble")
library("metafor")
library("meta")
library("survival")
library("ggplot2")
library("plyr")
library("gridExtra")
library("gtable")
library("grid")
library("tidyverse")
library("stringr")
library("coloc")
library("devtools")
library("glmnet")
library("MendelianRandomization")
library("TwoSampleMR")
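Before running the pipeline it can help to check which of these libraries are not yet installed. The sketch below is one way to do that in base R; the helper name `missing_packages` is hypothetical and not part of this repository. It only reports missing packages and does not install anything, since the installation source differs by package (for example, TwoSampleMR has typically been installed from its GitHub repository, MRCIEU/TwoSampleMR, rather than from CRAN).

```r
# The libraries listed above, as a character vector.
required_pkgs <- c(
  "data.table", "dplyr", "tidyr", "foreign", "tibble", "metafor", "meta",
  "survival", "ggplot2", "plyr", "gridExtra", "gtable", "grid", "tidyverse",
  "stringr", "coloc", "devtools", "glmnet", "MendelianRandomization",
  "TwoSampleMR"
)

# Return the packages that cannot be found in the current library paths.
# system.file() returns "" for a package that is not installed, so this
# check does not load any namespaces.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                  logical(1))
  pkgs[!found]
}

not_installed <- missing_packages(required_pkgs)
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```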
Guide to running our TwoSampleMR pipeline, including preprocessing of input files and generating results. The instructions assume GTEx v8 data as input, but can be applied to other (non-GTEx) QTL datasets.
- Create the repository structure
- Format the GWAS summary statistics file. Required columns: chr, pos, SNP, effect_allele, Other_allele, effect, StdErr, gwas_p_value
- Download and format the molecular QTL files. Note: The current pipeline is built to work with the GTEx v8 expression and splicing QTL data output format (GTEx Download https://www.gtexportal.org/home/downloads/adult-gtex#qtl). Required columns: variant_id, gene_id, slope, slope_se, pval_nominal
- Generate a manifest file (space separated) with Trait name, Gene ID, Gene Symbol, Tissue, QTL type, p-value cutoff, File name of the trait, path to the QTL file and QTL file extension. An example manifest file can be found in: manifest.sh
- Run TwoSampleMR (wrap_Manifest.R). An example shell script that runs the manifest file and launches the MR jobs: wrap_manifest.sh. One output results file is written per GWAS/trait, gene, QTL type, tissue combination and contains results from all two-sample MR tests and horizontal pleiotropy tests.
- Concatenate all results files across all GWAS/trait, gene, QTL type, tissue combinations into a single file/table (concatenate_results.R).
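The formatting and concatenation steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code in concatenate_results.R: the helper `check_columns`, the example rows, and the column names of the per-combination result tables (`trait`, `gene`, `tissue`, `qtl_type`, `method`, `pval`) are all hypothetical. Only the required GWAS and QTL column names are taken from the steps above.

```r
# Required input columns, as listed in the steps above.
gwas_cols <- c("chr", "pos", "SNP", "effect_allele", "Other_allele",
               "effect", "StdErr", "gwas_p_value")
qtl_cols  <- c("variant_id", "gene_id", "slope", "slope_se", "pval_nominal")

# Stop with an informative message if any required column is absent.
check_columns <- function(df, required, label) {
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop(label, " is missing columns: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical one-row inputs with made-up identifiers and values.
gwas <- data.frame(chr = 1, pos = 64764, SNP = "rs0000001",
                   effect_allele = "T", Other_allele = "C",
                   effect = 0.1, StdErr = 0.02, gwas_p_value = 5.7e-7)
qtl <- data.frame(variant_id = "chr1_64764_C_T_b38",
                  gene_id = "ENSG00000000000.1",
                  slope = 0.5, slope_se = 0.05, pval_nominal = 1e-20)
check_columns(gwas, gwas_cols, "GWAS file")
check_columns(qtl, qtl_cols, "QTL file")

# Hypothetical per-combination result tables, one per MR job.
res_list <- list(
  data.frame(trait = "traitA", gene = "GENE1", tissue = "Tissue1",
             qtl_type = "eQTL", method = "IVW", pval = 0.0004),
  data.frame(trait = "traitA", gene = "GENE2", tissue = "Tissue1",
             qtl_type = "eQTL", method = "Wald ratio", pval = 0.03),
  data.frame(trait = "traitA", gene = "GENE3", tissue = "Tissue2",
             qtl_type = "sQTL", method = "IVW", pval = 0.20),
  data.frame(trait = "traitA", gene = "GENE4", tissue = "Tissue2",
             qtl_type = "sQTL", method = "Wald ratio", pval = 0.01)
)

# Stack all tables into one and correct the primary test for multiple testing.
all_res <- do.call(rbind, res_list)
all_res$p_bonferroni <- p.adjust(all_res$pval, method = "bonferroni")
all_res$fdr_bh       <- p.adjust(all_res$pval, method = "BH")
all_res$significant  <- all_res$fdr_bh < 0.05
print(all_res)
```

Applying the correction after concatenation, rather than within each results file, means the number of tests reflects every trait, gene, QTL type, and tissue combination that was analyzed.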
Our code is distributed under the terms of the BSD 3-Clause License. See LICENSE.txt file for more details.
- Hamel et al., "Integrating genetic regulation and single-cell expression with GWAS prioritizes causal genes and cell types for glaucoma", medRxiv 2023 (https://www.medrxiv.org/content/10.1101/2022.05.14.22275022v2). Accepted in principle at Nature Communications 2023.
- GTEx Consortium, "The GTEx Consortium atlas of genetic regulatory effects across human tissues", Science 369, 1318–1330 (2020).
- Hemani et al., "The MR-Base platform supports systematic causal inference across the human phenome", eLife 2018 (https://elifesciences.org/articles/34408)
- Hemani et al., "Orienting the causal relationship between imprecisely measured traits using GWAS summary data", PLOS Genetics 2017 (https://doi.org/10.1371/journal.pgen.1007081)
- Yavorska and Burgess, "MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data", Int. J. Epidemiol. 46, 1734–1739 (2017).
- Bowden et al., "Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression", Int. J. Epidemiol. 44, 512–525 (2015).
- Verbanck et al., "Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases", Nat. Genet. 50, 693–698 (2018).
Last updated: November 16, 2023