Table 1. A comparison of the methods for reconstructing 3-D chromosome and genome structure from Hi-C data

From: An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data

| Algorithms | Year | Software Availability | Language | Structure Representation | In-built Normalization | IF Model Based | Methodology Based | Sampling Algorithm | Structure Based | Species, Coverage, and Resolution of Test 3C Data | Input |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5C3D [45] | 2009 | No | | Points | No | Distance | Optimization | Gradient descent | Ensemble | Human: 5C, HoxA gene cluster region | Hi-C contact matrix |
| Duan et al. [66] | 2010 | No | | Spheres | No | Distance | Optimization | IPOPT [71]: interior-point, gradient-based | Consensus | Budding yeast: Whole genome (10kb) | Hi-C contact matrix |
| Tanizawa et al. [67] | 2010 | No | | Spheres | Yes | Distance | Optimization | IPOPT: interior-point, gradient-based | Consensus | Fission yeast: Whole genome (20kb) | Hi-C contact matrix |
| Bau et al. [72] | 2011 | No | | Points | Yes | Distance | Optimization | IMP [73]: Monte Carlo (MC) sampling and simulated annealing with Metropolis criteria | Ensemble | Human: Chromosome 16, 500-kb ENm008 domain (500kb) | Hi-C contact matrix |
| MCMC5C [48] | 2011 | No | Java | Points | No | Probability | Probabilistic Modeling | Markov chain Monte Carlo (MCMC) sampling using the Metropolis-Hastings algorithm | Ensemble | Human: 5C 142kb genomic region and Hi-C Chromosome 16, 88.4Mb region (1Mb) | Hi-C contact matrix |
| Meluzzi and Arya [74] | 2012 | No | | Polymer | No | Contact | Optimization | Modified conjugate gradient algorithm and Brownian dynamics simulation | Ensemble | Synthetic: 75kb–270kb (3kb–6kb) | Hi-C contact matrix |
| Kalhor et al. [68] | 2013 | No | | Spheres | Yes | Contact | Optimization | Conjugate gradients and molecular dynamics with simulated annealing | Population | Human: Whole genome (1Mb) | Hi-C contact matrix |
| BACH [75] | 2013 | Yes | R | Points | Yes | Probability | Probabilistic Modeling | Gibbs sampler with hybrid MC, and adaptive rejection sampling (ARS) | Consensus | Mouse: All chromosomes (40kb) | Hi-C contact matrix and local genomic features (restriction enzyme cutting frequencies, GC content, and sequence uniqueness) |
| ChromSDE [76] | 2013 | | Matlab | Points | No | Distance | Optimization | Linear and quadratic semi-definite programming (SDP) | Consensus | Mouse and Human: Chromosome 13 (200kb–1Mb; 40kb for chr13:21Mb–25Mb) | Hi-C contact matrix |
| AutoChrom3D [77] | 2013 | Yes | Perl | Points | Yes | Distance | Optimization | Non-linear constrained optimization | Consensus | Human: 500kb–1Mb (8kb) | Hi-C contact matrix |
| PASTIS [78] | 2014 | Yes | Python | Points | No | Distance and Probability | Optimization (MDS1, MDS2) and Probabilistic Modeling (PM1, PM2) | IPOPT: interior-point filter algorithm | Consensus | Mouse: All chromosomes (100kb–1Mb; 20kb: chr1–19) | Hi-C contact matrix |
| ShRec3D [79] | 2014 | Yes | Matlab | Points | No | Distance | Optimization | Shortest-path Floyd-Warshall algorithm | Consensus | Human: Chromosome 1, 30Mbp region (3kb–150kb) | Hi-C contact matrix |
| MOGEN [49, 80] | 2014 | Yes | Java | Points | No | Contact | Optimization | Gradient descent | Ensemble | Human: All chromosomes and whole genome (200kb–1Mb) | Hi-C contact matrix |
| FisHiCal [81] | 2014 | Yes | R | Points | Yes | Distance | Optimization | SMACOF algorithm [82] | Consensus | Human: Whole genome (1Mb) | Hi-C contact matrix |
| InfMod3DGen [64] | 2015 | Yes | Matlab | Polymer | No | Distance | Probabilistic Modeling | Gradient ascent | Ensemble | Yeast: All chromosomes, 12.1Mb genome (10kb) | Hi-C contact matrix |
| Gen3D [83] | 2015 | Yes | C++ | Points | No | Contact | Optimization | Adaptation, simulated annealing, and genetic algorithm | Consensus | Human: All chromosomes (1Mb) | Hi-C contact matrix |
| MBO [84] | 2015 | Yes | Matlab | Points | No | Distance | Optimization | Manopt: manifold optimization | Consensus | Mouse: Chromosome X (50kb–600kb) | Hi-C contact matrix |
| TADbit [85] | 2016 | Yes | Python | Spheres | Yes | Distance | Optimization | Simulated annealing and Monte Carlo sampling | Ensemble | Drosophila (fruit fly): 52Mb region (10kb) | Hi-C contact matrix |
| HSA [47] | 2016 | Yes | R | Points | Yes | Distance | Optimization | GLM framework with Hamiltonian dynamics and simulated annealing | Consensus | Human and Mouse: All chromosomes (25kb–1Mb) | One or more raw contact maps or a normalized Hi-C contact matrix |
| Chromosome3D [46] | 2016 | Yes | Perl | Points | No | Distance | Optimization | Distance geometry simulated annealing | Ensemble | Human: All chromosomes (500kb–1Mb) | Hi-C contact matrix |
| PGS [65, 86] | 2016 | Yes | Python | Spheres | Yes | Probability | Probabilistic Modeling | Simulated annealing/molecular dynamics | Population | Human: Whole genome (50kb–1Mb) | Raw Hi-C contact matrix and a TAD file in BED format |
| tRex [87] | 2016 | Yes | R | Points | Yes | Probability | Probabilistic Modeling | MCMC sampling using the Metropolis-Hastings algorithm/Gibbs sampler, Hamiltonian MCMC | Ensemble | Human: Chromosomes 14 and 22 (1Mb) | Hi-C contact matrix and a vector of covariates (e.g. fragment length, GC content, and mappability score) |
| 3D-GNOME [88, 89] | 2016 | Web server | C++, JavaScript, PHP, Python, R | Polymer | Yes | Distance | Optimization | Monte Carlo-based simulated annealing | Consensus | Human: All chromosomes (multiscale: 1–2Mb; PET: 1–10kb) | A seven- or eight-column BEDPE (paired-end BED format) file containing the locations and strengths of long-range contact points; use of ChIA-PET data is recommended |
| LorDG [69] | 2016 | Yes | Java | Points | No | Distance | Optimization | Gradient ascent | Ensemble | Human: All chromosomes and whole genome (500kb–1Mb) | Hi-C contact matrix |
| ISDHiC [90] | 2016 | No | C, C++, Python | Spheres | No | Distance | Probabilistic Modeling | MCMC sampling using Hamiltonian MC | Ensemble | Mouse: Chromosome X (50kb, 500kb) | Hi-C contact matrix |
| Chrom3D [91] | 2017 | Yes | Perl | Spheres | No | Contact | Optimization | Monte Carlo optimization using the Metropolis-Hastings algorithm with simulated annealing | Ensemble | Human: Whole genome (TAD) | Hi-C contact matrix and LAD information |
| miniMDS [92] | 2017 | Yes | Python | Points | No | Distance | Optimization | MDS approximation algorithms and Kabsch algorithm | Consensus | Human: Whole genome (10kb–100kb) | Hi-C contact matrix |
| 3DMax [70] | 2018 | Yes | Java, Matlab | Points | No | Distance | Optimization | Gradient ascent | Ensemble | Human: All chromosomes (1Mb) | Hi-C contact matrix |
| GEM [93] | 2018 | Yes | Matlab | Polymer | No | Contact | Optimization | Adaptive gradient descent method | Ensemble | Human: Chromosomes 13 and 14 (1Mb), Chromosome 1 (250kb: 130Mb–180Mb region); Yeast: Chromosome 6 (10kb) | Hi-C contact matrix |
| GEM-FISH [94] | 2018 | Yes | Matlab | Polymer | No | Contact | Optimization | Gradient descent | Consensus | Human: Chromosomes 20, 21, 22, and X (TAD) | Hi-C contact matrix and FISH data |
| SIMBA3D [95] | 2018 | Yes | Python | Points | No | Probability | Probabilistic Modeling | BFGS method with analytical gradient | Ensemble | Mouse: All chromosomes (100kb) | Hi-C contact matrix |
| ShRec3D+ [96] | 2018 | No | | Points | No | Distance | Optimization | Floyd-Warshall algorithm | Consensus | Human and Mouse: All chromosomes (1Mb) | Hi-C contact matrix |
| EVR [97] | 2018 | Yes | C, Python | Points | No | Distance | Optimization | Error-Vector Resultant algorithm | Consensus | Bacteria: All chromosomes (10kb) | Hi-C contact matrix |
| Hierarchical3DGenome [98] | 2019 | Yes | Java | Points | No | Distance | Optimization | Gradient ascent and hierarchical modeling | Ensemble | Human: All chromosomes (1kb–5kb) | Hi-C contact matrix and a file containing identified TADs |
Each column describes a key property of each method:
- Algorithms: the name or acronym of the 3-D structure reconstruction method.
- Year: the publication year.
- Software Availability: whether open-source software is available for the method.
- Language: the programming language(s) in which the software is implemented.
- Structure Representation: the structural representation the method uses, which is polymer, spheres, or points (each is explained in the article).
- In-built Normalization: whether a normalization step is built into the method's algorithm.
- IF Model Based: the method's class according to how it models the input interaction frequencies (IF): distance, contact, or probability.
- Methodology Based: the method's class according to its chromosome and genome 3-D reconstruction methodology (optimization or probabilistic modeling).
- Sampling Algorithm: the sampling algorithm, or one of the sampling algorithms, used by the method.
- Structure Based: the method's class according to the structures it generates. Consensus-based methods generate a single representative structure for the entire Hi-C dataset or for single-cell Hi-C data; ensemble-based methods generate a variety of 3-D structures from the same input Hi-C data to adequately capture the heterogeneity of the data; population-based methods generate a population of individual 3-D genome structures that, as a whole, is statistically consistent with the true configuration underlying the input Hi-C data.
- Species, Coverage, and Resolution of Test 3C Data: the species, size, and resolution of the Hi-C data used in the method's publication.
- Input: the method's input data format.

Although these methods may have been tested only on Hi-C data from certain species, most, if not all, of them should be applicable to Hi-C contact data from any species, provided the data are formatted according to the method's input requirements.
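Several distance-based consensus methods in the table follow the same broad pipeline: convert interaction frequencies (IF) into target distances, fill in distances for bin pairs with no observed contacts, then embed the bins in 3-D. The sketch below illustrates that idea in the spirit of the ShRec3D (shortest-path Floyd-Warshall) and miniMDS (MDS) entries, under stated assumptions: the conversion d = IF^(-α) with α = 1, Floyd-Warshall shortest paths to complete missing distances, and classical (Torgerson) multidimensional scaling. It is not the published implementation of any listed tool; the function names are hypothetical and NumPy is assumed to be installed.

```python
import numpy as np


def contacts_to_distances(contacts, alpha=1.0):
    """Convert an IF matrix to target distances (hypothetical helper).

    Assumes the power-law conversion d = IF ** (-alpha); bin pairs with
    zero contacts are treated as missing (infinite) distances.
    """
    contacts = np.asarray(contacts, dtype=float)
    with np.errstate(divide="ignore"):
        dist = np.where(contacts > 0, contacts ** (-alpha), np.inf)
    np.fill_diagonal(dist, 0.0)
    return dist


def floyd_warshall(dist):
    """All-pairs shortest paths; missing distances become path lengths."""
    d = np.array(dist, dtype=float)
    for k in range(d.shape[0]):
        # Relax every pair (i, j) through intermediate bin k.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d


def classical_mds(dist, dim=3):
    """Embed a complete distance matrix in `dim` dimensions (Torgerson MDS)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)                   # eigenvalues, ascending
    top = np.argsort(vals)[::-1][:dim]                  # indices of the `dim` largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def reconstruct(contacts, alpha=1.0, dim=3):
    """Hypothetical pipeline: IF -> distances -> shortest paths -> coordinates.

    Assumes the contact graph is connected, so no distance stays infinite.
    """
    return classical_mds(floyd_warshall(contacts_to_distances(contacts, alpha)), dim)


# Toy check: derive contacts from a known random-walk "chromosome" of 20 bins.
rng = np.random.default_rng(0)
true_xyz = np.cumsum(rng.normal(size=(20, 3)), axis=0)
true_d = np.linalg.norm(true_xyz[:, None, :] - true_xyz[None, :, :], axis=-1)
with np.errstate(divide="ignore"):
    contacts = np.where(true_d > 0, 1.0 / true_d, 0.0)

xyz = reconstruct(contacts)
recovered_d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
# Structure is recovered up to rotation, reflection, and translation.
print(np.allclose(recovered_d, true_d))  # prints True
```

The toy check uses a fully observed contact matrix so the recovered pairwise distances can be compared directly with the true ones. With real Hi-C data many bin pairs have zero contacts; the shortest-path step then supplies path lengths that can only overestimate the true spatial distances, so the resulting consensus structure is an approximation rather than an exact recovery.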