Predicting regional mutation burden in cancer genomes using chromatin accessibility (CA) and replication timing (RT)

This repository includes source code, tutorials, and processed datasets for the study:

Chromatin accessibility of primary human cancers ties regional mutational processes and signatures with tissues of origin.

Oliver Ocsenas and Jüri Reimand (2022), in revision.

Tutorials - Jupyter notebooks

  • 1_BigWigtoWindow.ipynb - mapping chromatin signals to megabase-scale windows
  • 2_MAFtoWindow.ipynb - mapping cancer mutations to megabase-scale windows
  • 3_CA2M_RF.ipynb - random forest models predicting megabase-scale mutation burden from chromatin accessibility and replication timing
  • 4_CA2M_RF_FeatureSelection_Tutorial.ipynb - selecting significant features predicting mutation rates
  • 5_CA2M_RF_SHAPscores.ipynb - computing feature importance scores (SHAP)
  • 6_CA2M_RF_EnrichedMutations_Tutorial.ipynb - detecting genomic regions with enriched mutations that are not explained by chromatin and replication timing alone
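
The window-mapping step described in the tutorials above can be illustrated with a minimal sketch. It is not the code from the notebooks: the function name `bin_mutations`, the input layout (pairs of chromosome and 1-based position), and the fixed 1-Mb window size are assumptions made for illustration only.

```python
from collections import Counter

# Assumed window size for illustration: 1 Mb (megabase-scale windows).
WINDOW_SIZE = 1_000_000


def bin_mutations(mutations, window_size=WINDOW_SIZE):
    """Count mutations per (chromosome, window_start).

    `mutations` is an iterable of (chromosome, position) pairs with
    1-based positions. Windows cover [1, window_size],
    [window_size + 1, 2 * window_size], and so on.
    This is a hypothetical helper, not a function from the repository.
    """
    counts = Counter()
    for chromosome, position in mutations:
        # Shift to 0-based before integer division, then back to 1-based.
        window_start = ((position - 1) // window_size) * window_size + 1
        counts[(chromosome, window_start)] += 1
    return dict(counts)


# Made-up positions, used only to show the binning behaviour.
example_mutations = [
    ("chr1", 100),
    ("chr1", 999_999),
    ("chr1", 1_000_001),
    ("chr2", 5_500_000),
]
binned = bin_mutations(example_mutations)
for (chromosome, window_start), n_mutations in sorted(binned.items()):
    print(chromosome, window_start, n_mutations)
```

The resulting per-window counts are the kind of regional mutation burden that the later tutorials model with chromatin accessibility and replication timing; see the notebooks for the actual procedure.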

Tutorials/data - files needed for tutorials

  • All_CA_RT_100KB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 100-kb resolution
  • All_CA_RT_1MB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 1-Mb resolution
  • NormalCA_RT_MBscale.csv.gz - CA and RT tracks for normal tissues and cell lines, 1-Mb resolution
  • PCAWG_SNVbinned_100KB_scale.csv.gz - mutation burden in whole cancer genomes, 100-kb resolution
  • PCAWG_SNVbinned_MBscale.csv.gz - mutation burden in whole cancer genomes, 1-Mb resolution
  • PCAWG_breastcancer_SNV.MAF.gz - example file of somatic mutations in breast cancer for creating files above
  • SHAP_plot.pdf - example plot of feature importance scores (SHAP)
  • TCGA_BRCA_ATACSeq_chr1_2.bw - example file of chromatin accessibility in breast cancer for creating files above (chrs 1-2 only)
  • TumorCA_RT_MBscale.csv.gz - CA and RT tracks for cancer samples, 1-Mb resolution
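
The `.csv.gz` files listed above are gzip-compressed CSV tables, so they can be inspected without decompressing them first. The sketch below uses only the Python standard library; the helper name `read_gz_csv` is hypothetical, and the column layout of each file is not documented here, so the code simply returns whatever header and rows it finds.

```python
import csv
import gzip
import os


def read_gz_csv(path, max_rows=None):
    """Return (header, rows) from a gzip-compressed CSV file.

    Hypothetical helper for quick inspection. `max_rows` limits how many
    data rows are read, which is useful for large tables.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = []
        for index, row in enumerate(reader):
            if max_rows is not None and index >= max_rows:
                break
            rows.append(row)
    return header, rows


# Assumed location, relative to the repository root.
example_path = os.path.join("Tutorials", "data", "PCAWG_SNVbinned_MBscale.csv.gz")
if os.path.exists(example_path):
    header, rows = read_gz_csv(example_path, max_rows=5)
    print(header)
    for row in rows:
        print(row)
```

All values are returned as strings; convert them to numbers as needed once the column layout has been checked.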

All_code - entire code repository for the project; use at your own risk

Contact: oocsenas [@] oicr.on.ca ; juri.reimand [@] utoronto.ca