An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data

Oluwadare, Oluwatosin; Highsmith, Max; Cheng, Jianlin

doi:10.1186/s12575-019-0094-0

Biological Procedures Online

Table 1 A comparison of the methods for reconstructing 3-D chromosome and genome structure from Hi-C data

From: An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data

Algorithms	Year	Software Availability	Language	Structure Representation	In–Built Normalization	Interaction Frequency (IF) Model Based	Methodology Based	Sampling Algorithm	Structure Based	Species, Coverage, and Resolution of Test 3C Data	Input
5C3D [45]	2009	No		Points	No	Distance	Optimization	Gradient Descent	Ensemble	Human: 5C HoxA gene cluster region.	Hi–C contact matrix
Duan et al. [66]	2010	No		Spheres	No	Distance	Optimization	IPOPT [71]– Interior–point gradient–based	Consensus	Budding yeast: Whole genome (10kb)	Hi–C contact matrix
Tanizawa et al. [67]	2010	No		Spheres	Yes	Distance	Optimization	IPOPT– Interior–point gradient–based	Consensus	Fission yeast: Whole genome (20kb)	Hi–C contact matrix
Bau et al. [72]	2011	No		Points	Yes	Distance	Optimization	IMP [73] – Monte Carlo(MC) sampling and simulated annealing with Metropolis criteria	Ensemble	Human: Chromosome 16 – 500–kb ENm008 domain (500kb)	Hi–C contact matrix
MCMC5C [48]	2011	No	Java	Points	No	Probability	Probabilistic Modeling	Markov chain Monte Carlo (MCMC) sampling using the Metropolis–Hastings algorithm	Ensemble	Human: 5C 142kb genomic region and Hi–C Chromosome 16 – 88.4 Mb region (1Mb)	Hi–C contact matrix
Meluzzi and Arya [74]	2012	No		Polymer	No	Contact	Optimization	Modified conjugate gradient algorithm and Brownian Dynamics simulation	Ensemble	Synthetic: 75kb – 270kb (3kb – 6kb)	Hi–C contact matrix
Kalhor et al. [68]	2013	No		Spheres	Yes	Contact	Optimization	Conjugate gradients and molecular dynamics with simulated annealing	Population	Human: Whole genome (1Mb)	Hi–C contact matrix
BACH [75]	2013	Yes	R	Points	Yes	Probability	Probabilistic Modeling	Gibbs sampler with hybrid MC, and adaptive rejection sampling (ARS)	Consensus	Mouse: All chromosomes (40kb)	Hi–C contact matrix and local genomic features (restriction enzyme cutting frequencies, GC content and sequence uniqueness) as input
ChromSDE [76]	2013		Matlab	Points	No	Distance	Optimization	Linear and Quadratic Semi–definite programming (SDP)	Consensus	Mouse and Human: Chromosome 13 (200kb – 1Mb; 40kb for chr13:21Mb–25Mb)	Hi–C contact matrix
AutoChrom3D [77]	2013	Yes	Perl	Points	Yes	Distance	Optimization	Non–linear constrained optimization	Consensus	Human: 500kb – 1MB (8kb)	Hi–C contact matrix
PASTIS [78]	2014	Yes	Python	Points	No	Distance and Probability	Optimization(MDS1, MDS2) and Probabilistic Modeling (PM1,PM2)	IPOPT – interior point filter algorithm	Consensus	Mouse: All chromosomes (100kb – 1Mb, 20kb –chr1-19)	Hi–C contact matrix
ShRec3D [79]	2014	Yes	Matlab	Points	No	Distance	Optimization	Shortest-path Floyd-Warshall algorithm	Consensus	Human: Chromosome 1 – 30Mbp region (3kb - 150kb)	Hi–C contact matrix
MOGEN [49, 80]	2014	Yes	Java	Points	No	Contact	Optimization	Gradient descent	Ensemble	Human: All chromosomes and whole genome (200kb - 1Mb)	Hi–C contact matrix
FisHiCal [81]	2014	Yes	R	Points	Yes	Distance	Optimization	SMACOF algorithm [82]	Consensus	Human: Whole genome (1Mb)	Hi–C contact matrix
InfMod3DGen [64]	2015	Yes	Matlab	Polymer	No	Distance	Probabilistic Modeling	Gradient ascent	Ensemble	Yeast: All chromosomes –12.1Mb genome (10kb)	Hi–C contact matrix
Gen3D [83]	2015	Yes	C++	Points	No	Contact	Optimization	Adaptation, Simulated annealing and Genetic algorithm	Consensus	Human: All chromosomes (1Mb)	Hi–C contact matrix
MBO [84]	2015	Yes	Matlab	Points	No	Distance	Optimization	Manopt – manifold optimization	Consensus	Mouse: Chromosome X (50kb - 600kb)	Hi–C contact matrix
TADbit [85]	2016	Yes	Python	Spheres	Yes	Distance	Optimization	Simulated Annealing and Monte Carlo Sampling	Ensemble	Drosophila Fly: 52Mb region (10kb)	Hi–C contact matrix
HSA [47]	2016	Yes	R	Points	Yes	Distance	Optimization	GLM framework with Hamiltonian dynamics with simulated annealing	Consensus	Human and Mouse: All chromosomes (25kb - 1Mb)	One or more raw contact maps or normalized Hi–C contact matrix.
Chromosome3D [46]	2016	Yes	Perl	Points	No	Distance	Optimization	Distance Geometry Simulated Annealing	Ensemble	Human: All chromosomes (500kb - 1Mb)	Hi–C contact matrix
PGS [65, 86]	2016	Yes	Python	Spheres	Yes	Probability	Probabilistic Modeling	Simulated annealing/molecular dynamics	Population	Human: Whole genome (50kb - 1Mb)	Raw Hi–C contact matrix and a TAD file in bed format
tRex [87]	2016	Yes	R	Points	Yes	Probability	Probabilistic Modeling	MCMC sampling using the Metropolis–Hastings algorithm/Gibbs sampler, Hamiltonian MCMC	Ensemble	Human: Chromosome 14 and 22 (1Mb)	Hi–C contact matrix and a vector of covariates (e.g. fragment length, GC content, and mappability score)
3D–GNOME [88, 89]	2016	Web server	C++, Javascript, PHP, Python, R	Polymer	Yes	Distance	Optimization	Monte Carlo-based simulated annealing	Consensus	Human: All chromosomes (Multiscale 1-2Mb, PET (1–10kb))	A seven or eight columns bedpe (paired–end BED format) file containing the locations and strengths of long range contact points. Use of ChIA-PET data is recommended
LorDG [69]	2016	Yes	Java	Points	No	Distance	Optimization	Gradient ascent	Ensemble	Human: All chromosomes and whole genome (500kb –1Mb)	Hi–C contact matrix
ISDHiC [90]	2016	No	C, C++, Python	Spheres	No	Distance	Probabilistic Modeling	MCMC sampling using Hamiltonian MC	Ensemble	Mouse: Chromosome X (50kb, 500kb)	Hi–C contact matrix
Chrom3D [91]	2017	Yes	Perl	Spheres	No	Contact	Optimization	Monte Carlo-Optimization using the Metropolis–Hastings algorithm with simulated annealing	Ensemble	Human: Whole genome (TAD)	Hi–C contact matrix and LAD information
miniMDS [92]	2017	Yes	Python	Points	No	Distance	Optimization	MDS approximation algorithms and Kabsch algorithm	Consensus	Human: Whole genome (10kb-100kb)	Hi–C contact matrix
3DMax [70]	2018	Yes	Java, Matlab	Points	No	Distance	Optimization	Gradient ascent	Ensemble	Human: All chromosomes (1Mb)	Hi–C contact matrix
GEM [93]	2018	Yes	Matlab	Polymer	No	Contact	Optimization	Adaptive gradient descent method	Ensemble	Human: Chromosome 13 and 14 (1Mb), Chromosome 1 (250kb: 130Mb-180Mb region); Yeast: Chromosome 6 (10kb)	Hi–C contact matrix
GEM–FISH [94]	2018	Yes	Matlab	Polymer	No	Contact	Optimization	Gradient descent	Consensus	Human: Chromosomes 20, 21, 22, and X (TAD)	Hi–C contact matrix and FISH data
SIMBA3D [95]	2018	Yes	Python	Points	No	Probability	Probabilistic Modeling	BFGS method with analytical gradient	Ensemble	Mouse: All chromosomes (100kb)	Hi–C contact matrix
ShRec3D+ [96]	2018	No		Points	No	Distance	Optimization	Floyd-Warshall algorithm	Consensus	Human and Mouse: All chromosomes (1Mb)	Hi–C contact matrix
EVR [97]	2018	Yes	C, Python	Points	No	Distance	Optimization	Error-Vector Resultant algorithm	Consensus	Bacteria: All chromosomes (10kb)	Hi–C contact matrix
Hierarchical3DGenome [98]	2019	Yes	Java	Points	No	Distance	Optimization	Gradient ascent and hierarchical modeling	Ensemble	Human: All chromosomes (1kb - 5kb)	Hi–C contact matrix and File containing identified TADs

Each column denotes a key property of each method. The Algorithms column gives the name or acronym of the 3-D structure reconstruction method. The Year column gives the publication year. The Software Availability column indicates whether open-source software is available for the method, and the Language column gives the programming language in which that software was implemented. The Structure Representation column gives the structural representation used by the method, which is polymer, spheres, or points; each is explained in detail in the main text of the article. The In–Built Normalization column indicates whether a normalization step is built into the method’s algorithm. The Interaction Frequency (IF) Model Based column gives the method’s class according to how it models the input IF, and the Methodology Based column gives its class according to its chromosome and genome 3-D reconstruction methodology. The Sampling Algorithm column gives the sampling algorithm, or one of the sampling algorithms, used by the method. The Structure Based column gives the method’s class according to the structures it generates: consensus-based methods generate a single representative structure for the entire Hi-C dataset or for single-cell Hi-C data; ensemble-based methods generate a variety of 3-D structures from the same input Hi-C data to adequately simulate the heterogeneity of the Hi-C data; and population-based methods generate a population of individual 3-D genome structures that, as a whole, is statistically consistent with the true configuration of the input Hi-C data. The Species, Coverage, and Resolution of Test 3C Data column gives the species, size, and resolution of the Hi-C data used in the method’s manuscript, and the Input column gives the method’s input data format. Although these methods may have been tested only on Hi-C data from some species, most, if not all, of them should be applicable to Hi-C contact data from any species, provided the data are formatted according to the method’s input requirements.
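
The three classification columns (IF model, methodology, and structure) can be treated as independent labels on each method. The following is a minimal sketch of that idea in Python, using a handful of rows copied from Table 1 rather than the full table. The `Method` record, its field names, and the `select` helper are hypothetical names introduced for illustration only; they are not part of the article or of any of the listed tools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One row of Table 1, reduced to its classification columns."""
    name: str
    year: int
    if_model: str     # "Distance", "Contact", or "Probability"
    methodology: str  # "Optimization" or "Probabilistic Modeling"
    structure: str    # "Consensus", "Ensemble", or "Population"


# A small subset of Table 1, copied by hand (not the complete table).
METHODS = [
    Method("5C3D", 2009, "Distance", "Optimization", "Ensemble"),
    Method("BACH", 2013, "Probability", "Probabilistic Modeling", "Consensus"),
    Method("Kalhor et al.", 2013, "Contact", "Optimization", "Population"),
    Method("ShRec3D", 2014, "Distance", "Optimization", "Consensus"),
    Method("MOGEN", 2014, "Contact", "Optimization", "Ensemble"),
    Method("PGS", 2016, "Probability", "Probabilistic Modeling", "Population"),
    Method("3DMax", 2018, "Distance", "Optimization", "Ensemble"),
]


def select(methods, **criteria):
    """Return the methods whose fields equal every keyword criterion."""
    return [
        m for m in methods
        if all(getattr(m, field) == value for field, value in criteria.items())
    ]


# Count how many of the sampled methods fall into each structure class.
structure_counts = Counter(m.structure for m in METHODS)

# Names of the sampled distance-based methods that produce an ensemble.
distance_ensemble = [
    m.name for m in select(METHODS, if_model="Distance", structure="Ensemble")
]
```

Because the labels are independent, criteria can be combined freely, for example `select(METHODS, methodology="Probabilistic Modeling", structure="Population")` to find probabilistic methods that output a population of structures.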


ISSN: 1480-9222
