PSYCHIC

Code for finding putative enhancers using Hi-C data



python <config file>

To run the example, use python examples/himr90.chr20.conf from the repo directory.

Config file format

  • res resolution of the Hi-C file, in bases (40000)
  • win interaction distance cutoff in bases, usually 2000000
  • chrname chromosome name, used for genes, size and output; should be one of 'chr1'..'chrX'
  • chrsize path to bed file of chromosome lengths (examples/hg19.size.bed)
  • output_prefix output prefix for this conf file (hIMR90)
  • output_dir path in which to store the output files (examples/output)
  • input_matrix path to input Hi-C matrix for the chromosome (see format specifications below)
  • genes_file path to bed file describing genes (examples/hg19.genes.bed)

For a functioning example config file consult examples/himr90.chr20.conf
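As a rough illustration of how the settings above relate to each other, the sketch below holds them in a plain Python dict and checks that every documented key is present. The dict representation and the helper names missing_keys and bins_in_window are assumptions made for this example and are not part of PSYCHIC; the on-disk syntax of the config file is not shown here, so consult examples/himr90.chr20.conf for the real format. The chr20 value is taken from the example file name.

```python
# Hypothetical helpers: the key names come from the list above, but the
# dict representation and the function names are assumptions for illustration.
REQUIRED_KEYS = [
    "res", "win", "chrname", "chrsize",
    "output_prefix", "output_dir", "input_matrix", "genes_file",
]


def missing_keys(conf):
    """Return the documented keys that are absent from a config dict."""
    return [key for key in REQUIRED_KEYS if key not in conf]


def bins_in_window(res, win):
    """Number of whole Hi-C bins spanned by the interaction distance cutoff."""
    return win // res


conf = {
    "res": 40000,
    "win": 2000000,
    "chrname": "chr20",
    "chrsize": "examples/hg19.size.bed",
    "output_prefix": "hIMR90",
    "output_dir": "examples/output",
    "input_matrix": "examples/hIMR90.chr20.matrix.txt",
    "genes_file": "examples/hg19.genes.bed",
}

print(missing_keys(conf))                        # []
print(bins_in_window(conf["res"], conf["win"]))  # 50
```

With the typical values listed above (res of 40000 and win of 2000000), the cutoff spans 50 bins on either side of a locus.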


The code was run on a Linux machine. It includes scripts in MATLAB, Python and Perl, so the minimal requirements are:

  • Matlab
  • python2.7
  • Perl
  • Unix tools - cut, sed, pushd, popd (typically installed by default)


  • Main command line interface for PSYCHIC
  • matlab/ Main matlab files
  • domaincall_software/ Slightly adapted files from Hi-C DomainCaller
  • insulation/ Insulation Score domain caller from Crane et al., with additional scripts
  • examples/ Example files: config, Hi-C matrix, chromosome sizes and gene list

Input matrix

input_matrix should be a CSV or tab-delimited file specifying the Hi-C data for a given chromosome. The first column may be either empty or contain the name of each genomic segment (at a fixed stride, as specified by the res variable). The rest of the matrix should be symmetric, with each cell (i, j) containing the number of contacts between the matching segments of the chromosome. The data may have been previously normalized to account for various Hi-C biases, and are assumed to be symmetric. See the example under examples/hIMR90.chr20.matrix.txt.
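The layout described above can be checked before running the pipeline. The following is a minimal sketch and not part of PSYCHIC: read_hic_matrix and is_symmetric are hypothetical helper names, and the code assumes a tab-delimited file with no header row, in which a row holding one more field than the number of rows starts with a segment name or an empty cell. It is written for Python 3.

```python
import csv
import io


def read_hic_matrix(handle, delimiter="\t"):
    """Read a square contact matrix, dropping an optional leading label column."""
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    n = len(rows)
    matrix = []
    for row in rows:
        # A row with n + 1 fields starts with a segment name or an empty cell.
        if len(row) == n + 1:
            row = row[1:]
        if len(row) != n:
            raise ValueError("matrix is not square")
        matrix.append([float(value) for value in row])
    return matrix


def is_symmetric(matrix, tol=1e-9):
    """Check that cell (i, j) equals cell (j, i) within a tolerance."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy 3x3 matrix with made-up counts and segment names in the first column.
text = "bin0\t5\t2\t0\nbin1\t2\t7\t3\nbin2\t0\t3\t4\n"
matrix = read_hic_matrix(io.StringIO(text))
print(is_symmetric(matrix))  # True
```

For a real file, pass an open file handle instead of the io.StringIO object, and use delimiter="," for CSV input.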

Output files

The program outputs multiple intermediate files and the final enhancer files.

  • .enh_p.bed bed file of over-represented pairs with FDR value < p; each line has the format [chr start end], [gene, distance to enhancer, FDR, p-value, expected # of interactions, observed # of interactions]
  • .model.estimated.params.bed the power-law parameters for the model; each line represents a TAD / merge / Sky. Specifically, the fields are [ chr start end ], class { TAD / merge / Sky }, parameters for the first (or only) power-law segment [ slope, intercept, RMSE ], the same for the second power-law segment (or NaN), and the transition point between the two segments, in log2(bp).
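For downstream use, a line of the .enh_p.bed file can be split into named fields. This is a sketch only: it assumes the nine fields are tab-separated and appear in the order listed above, and the names EnhancerPair and parse_enh_line are hypothetical. The sample line contains invented values for illustration and is not real output.

```python
from collections import namedtuple

# Field order follows the .enh_p.bed description above (assumed tab-separated).
EnhancerPair = namedtuple(
    "EnhancerPair",
    ["chrom", "start", "end", "gene", "distance",
     "fdr", "p_value", "expected", "observed"],
)


def parse_enh_line(line):
    """Split one .enh_p.bed line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated fields, got %d" % len(fields))
    return EnhancerPair(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        gene=fields[3],
        distance=float(fields[4]),
        fdr=float(fields[5]),
        p_value=float(fields[6]),
        expected=float(fields[7]),
        observed=float(fields[8]),
    )


# Invented example line, for illustration only.
sample = "chr20\t1000000\t1040000\tGENE1\t200000\t0.001\t0.00001\t3.5\t12\n"
pair = parse_enh_line(sample)
print(pair.gene, pair.fdr)
```

Once parsed, pairs can be filtered further, for example by keeping only those whose fdr falls below a stricter threshold.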

Temporary files

  • .enh_rand.bed set of random interactions with promoters, used as control for near-promoter enrichment values
  • .domains, domains.txt the domains found by the specified TAD-calling algorithm (DI/Insulation Score)
  • refined domains found by the program
  • .7col,.DI,.HMM used by Directionality Index domain caller
  • .prob.{bg,tad}.matrix.txt,supersum.txt the probabilistic parameters, based on the given domains
  • .model.estimated.matrix.txt the model, estimated using the data, in tab-delimited format
  • .llr.txt the log-likelihood ratio of observed counts and the model
  • .hierarchy.bed the constructed hierarchy of the domains
  • .mdump* files are the outputs of executed matlab functions, for debugging.
  • .fixed_matrix the input matrix converted to the desired format

The file names above are prefixed with the output_prefix and chrname values specified in the configuration file.

