List of notebooks for a 4-day course on Hi-C data handling and 3D modeling of chromatin

3DGenomes/3DAROC

3DAROC

Course description

3C-based methods, such as Hi-C, produce a huge amount of raw data as pairs of DNA reads that are spatially close in the cell nucleus. The interaction matrices derived from these read pairs have been used to study how the genome folds within the nucleus, which is one of the most fascinating problems in modern biology. Rigorous analysis of the paired reads with computational tools has been essential to fully exploit the experimental technique and to study how the genome folds in space. The amount of data on genome structure is currently expanding rapidly, with many Hi-C datasets available at resolutions down to 1 kb (see for example: https://hic.umassmed.edu/welcome/welcome.php ; https://promoter.bx.psu.edu/hi-c/view.php or https://www.aidenlab.org/data.html ). In this course, participants will learn to use TADbit, a software designed and developed to manage all the dimensionalities of the Hi-C data:

  • 1D - Map paired-end sequences to generate Hi-C interaction matrices
  • 2D - Normalize matrices and identify constitutive domains (compartments, TADs)
  • 3D - Generate populations of model structures which reproduce the Hi-C interaction matrices
  • 4D - Compare samples at different time points
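
As an illustration of the 1D and 2D steps listed above, the following is a minimal sketch in plain Python of how mapped read pairs can be binned into a symmetric interaction matrix and then corrected for bin coverage. It is not the TADbit API: the function names (`bin_read_pairs`, `coverage_normalize`), the toy coordinates and the simple one-step coverage correction are assumptions made for this example only.

```python
def bin_read_pairs(read_pairs, resolution, n_bins):
    """Count read pairs per pair of genomic bins (hypothetical helper).

    read_pairs: iterable of (pos1, pos2) positions on a single chromosome.
    resolution: bin size in base pairs.
    n_bins:     number of bins covering the chromosome.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i = pos1 // resolution
        j = pos2 // resolution
        matrix[i][j] += 1
        if i != j:
            # keep the matrix symmetric: an interaction has no direction
            matrix[j][i] += 1
    return matrix


def coverage_normalize(matrix):
    """One-step coverage correction (an assumption for illustration).

    Each cell is divided by the product of the sums of its row and its
    column, then rescaled by the total number of counts in the matrix.
    """
    sums = [sum(row) for row in matrix]
    total = float(sum(sums))
    size = len(matrix)
    norm = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if sums[i] and sums[j]:
                norm[i][j] = matrix[i][j] * total / (sums[i] * sums[j])
    return norm


# Toy data: six read pairs on a 400 kb chromosome, binned at 100 kb.
toy_pairs = [
    (10000, 20000),
    (50000, 150000),
    (120000, 130000),
    (250000, 390000),
    (30000, 310000),
    (160000, 140000),
]
raw = bin_read_pairs(toy_pairs, resolution=100000, n_bins=4)
normalized = coverage_normalize(raw)

for row in raw:
    print(row)
```

Real Hi-C matrices are much larger and are usually stored in sparse form and normalized iteratively; the dense lists here are only meant to make the bookkeeping visible.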

Participants can bring specific biological questions and/or their own 3C data to analyze during the course. At the end of the course, participants will be familiar with the TADbit software and will be able to fully analyze Hi-C data. Note: Although the TADbit software is central to this course, alternative software will be discussed for each part of the analysis.

Instructors

Marc A. Marti-Renom obtained a Ph.D. in Biophysics from the Universidad Autonoma de Barcelona, where he worked on protein folding under the supervision of B. Oliva, F.X. Aviles and M. Karplus. After that, he went to the US for postdoctoral training on protein structure modeling at the Sali Lab (Rockefeller University) as the recipient of the Burroughs Wellcome Fund fellowship. Later on, Marc was appointed Assistant Adjunct Professor at UCSF. Between 2006 and 2011, he headed the Structural Genomics Group at the CIPF in Valencia (Spain). Currently, Marc is an ICREA research professor and leads the Structural Genomics Group at the National Center for Genomic Analysis - Centre for Genomic Regulation (CNAG-CRG) in Barcelona. His group is broadly interested in how RNA, proteins and genomes organize and regulate cell fate. Finally, Marc is an Associate Editor of the PLoS Computational Biology journal and has published over 90 articles in international peer-reviewed journals.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

François Serra obtained his Degree in Biology, specialized in Physiology and Neurophysiology, his Master's Degree in Structural Genomics and Bioinformatics (Strasbourg I University, France) and his PhD in Evolutionary Genomics in the Department of Bioinformatics at the CIPF (Valencia). He is now part of the Structural Genomics team of Marc Marti-Renom at CNAG and at CRG (Barcelona). His main research interests are grounded in comparative genomics and evolution, with a special focus on the effect of evolution on the structural arrangement of genomes. He has taught MEPA and 3DMOG for GTPB, as well as similar courses at CIPF (Valencia, ES) and the Department of Genetics of the University of Cambridge (UK).

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

David Castillo obtained his MSc in Photonics from the Universitat Politècnica de Catalunya in Barcelona (Spain), where he worked on super-resolution microscopy. He has a background in Physics and Engineering. He works as a technician in the Structural Genomics team of Marc A. Martí-Renom at CNAG-CRG (Barcelona), developing tools for the analysis, modelling and visualization of Hi-C data. He is also interested in the integration of microscopy into the modeling of genomic 3D structures.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

Target Audience

The course is designed for experimental researchers and bioinformaticians at the graduate and post-graduate levels who are interested in studying the spatial organization of the genome.

Participants in this course are likely either to aim at generating Hi-C data for chromosome structure determination, or simply to want to correctly interpret and analyse publicly available data.

Course Pre-requisites

Linux and basic Python programming skills are recommended, along with a graduate-level background in Life Sciences. All hands-on sessions will be given at 3 levels of computational expertise (web platform, command-line tool and Python scripting).

TADbit API

This tutorial is associated with a specific version of TADbit. If you wish to reproduce exactly the results in the notebooks, you should use the version of TADbit tagged 3DAROC_2018.

To install this version do:

git clone https://github.com/3dgenomes/tadbit
cd tadbit
git checkout tags/3DAROC_2018
sudo python setup.py install
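
Once the installation has finished, a quick check from Python can confirm that the package is importable. This is a small sketch, not part of the course material: it relies on TADbit being importable as `pytadbit`, and it reads a `__version__` attribute only if one is present, which is an assumption rather than a documented guarantee.

```python
# Try to import TADbit; fall back gracefully if it is not installed.
try:
    import pytadbit
except ImportError:
    pytadbit = None

if pytadbit is None:
    status = "pytadbit not found: check the installation steps"
else:
    # __version__ may or may not be defined, so use a safe default
    version = getattr(pytadbit, "__version__", "(version unknown)")
    status = "pytadbit %s imported successfully" % version

print(status)
```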

TADbit tools

Most of the tasks of the "core pipeline" can be run directly from the command line (without any Python), using the TADbit tools. Have a look at the commands and at the metadata of the results.

For now, the TADbit tools are not included in the general documentation, as they are still "under construction". Use them carefully, and don't hesitate to report any weird behaviour you observe.

Virtual research environment

With small datasets, the TADbit core pipeline can be run through a new Virtual Research Environment (VRE), hosted by the MuG project.

This might also be the best place to try TADkit for visualizing genomes in 3D together with interaction matrices and any other genomic track.

Course material

Lectures (pdf) Core pipeline (notebooks) Annex (notebooks)
Day1
Day2
Day3
Day4
Day5

Course Timetable

(provisional)

Day #1 Monday, Sep 17th
09:30 - 10:00 Welcome and introductions
10:00 - 11:00 Overview on structure determination
11:00 - 11:30 Coffee Break
11:30 - 12:30 3D modeling of genomes and genomic domains
12:30 - 14:00 Lunch Break
14:00 - 15:00 Introduction to Linux and Python: the Jupyter notebook
15:00 - 16:00 Next Generation Sequencing (NGS) and data handling
16:00 - 16:30 Tea Break
16:30 - 18:00 From raw data to Hi-C contact matrices
Day #2 Tuesday, Sep 18th
09:30 - 10:00 Morning wrap-up: what have we done so far?
10:00 - 11:00 Chromatin structure and Hi-C data
11:00 - 11:30 Coffee Break
11:30 - 12:30 Integrative modeling applied to chromatin
12:30 - 14:00 Lunch Break
14:00 - 16:00 Biological applications (I)
16:00 - 16:30 Tea Break
16:30 - 18:00 Hi-C contact matrices: filtering and normalization
Day #3 Wednesday, Sep 19th
09:30 - 10:00 Morning wrap-up: what have we done so far?
10:00 - 11:00 Biological applications (II)
11:00 - 11:30 Coffee Break
11:30 - 12:30 Compartment detection and analysis
12:30 - 14:00 Lunch Break
14:00 - 16:00 Topologically Associating Domains detection and analysis
16:00 - 16:30 Tea Break
16:30 - 18:00 Comparison between experiments
Day #4 Thursday, Sep 20th
09:30 - 10:00 Morning wrap-up: what have we done so far?
10:00 - 11:00 Biological applications (III)
11:00 - 11:30 Coffee Break
11:30 - 12:30 3D Modeling of real Hi-C data with TADbit (I)
12:30 - 14:00 Lunch Break
14:00 - 16:00 3D Modeling of real Hi-C data with TADbit (II)
16:00 - 16:30 Tea Break
16:30 - 18:00 Final wrap-up session
Day #5 Friday, Sep 21st
09:30 - 10:00 Morning wrap-up: what have we done so far?
10:00 - 11:00 Multiscale Genomics: From genomes to structures
11:00 - 11:30 Coffee Break
11:30 - 12:30 Nucleosome positioning and Nucleosome Dynamics
12:30 - 14:00 Lunch Break
14:00 - 16:00 Coarse-Grained DNA
16:00 - 16:30 Tea Break
16:30 - 18:00 Chromatin Dynamics

Feedback

Feedback (0: not clear; 5: very clear)
Day1
- UNIX/Python: 4.3
- FASTQ/Hi-C quality before mapping: 3.8
- Iterative vs Fragment-based mapping: 4.4
- Stats for quality measure of Hi-C experiment: 3.3
- Applied filters in reads: 3.1
- Reading TADbit Hi-C map: 3.6
Day2
- What 3C-based modelling methods told us about genome: 4.5
- 3 levels of organization (A/B, TADs, loops): 4.7
- Modelling 3D genomes: 3.2
- Limits of 3D modelling: 2.9
- TADbit filtering/normalization: 4.0
Day3
- Using TADbit tools: 3.4
- A/B compartment calling: 3.3
- TAD calling & comparison: 3.3
Day4
- Modeling genomes: 2.7
- Analyzing the 3D genomes: 2.6
- Reading the output of the model analysis: 3.1
Day5

Thanks!
