Table 1 A comparison of the methods for reconstructing 3-D chromosome and genome structure from Hi-C data

From: An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data

| Algorithms | Year | Software Availability | Language | Structure Representation | In-built Normalization | IF Model Based | Methodology Based | Sampling Algorithm | Structure Based | Species, Coverage, and Resolution of Test 3C Data | Input |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5C3D [45] | 2009 | No |  | Points | No | Distance | Optimization | Gradient descent | Ensemble | Human: 5C HoxA gene cluster region | Hi-C contact matrix |
| Duan et al. [66] | 2010 | No |  | Spheres | No | Distance | Optimization | IPOPT [71] – interior-point gradient-based | Consensus | Budding yeast: Whole genome (10kb) | Hi-C contact matrix |
| Tanizawa et al. [67] | 2010 | No |  | Spheres | Yes | Distance | Optimization | IPOPT – interior-point gradient-based | Consensus | Fission yeast: Whole genome (20kb) | Hi-C contact matrix |
| Bau et al. [72] | 2011 | No |  | Points | Yes | Distance | Optimization | IMP [73] – Monte Carlo (MC) sampling and simulated annealing with Metropolis criteria | Ensemble | Human: Chromosome 16 – 500-kb ENm008 domain (500kb) | Hi-C contact matrix |
| MCMC5C [48] | 2011 | No | Java | Points | No | Probability | Probabilistic Modeling | Markov chain Monte Carlo (MCMC) sampling using the Metropolis–Hastings algorithm | Ensemble | Human: 5C 142kb genomic region and Hi-C Chromosome 16 – 88.4Mb region (1Mb) | Hi-C contact matrix |
| Meluzzi and Arya [74] | 2012 | No |  | Polymer | No | Contact | Optimization | Modified conjugate gradient algorithm and Brownian dynamics simulation | Ensemble | Synthetic: 75kb – 270kb (3kb – 6kb) | Hi-C contact matrix |
| Kalhor et al. [68] | 2013 | No |  | Spheres | Yes | Contact | Optimization | Conjugate gradients and molecular dynamics with simulated annealing | Population | Human: Whole genome (1Mb) | Hi-C contact matrix |
| BACH [75] | 2013 | Yes | R | Points | Yes | Probability | Probabilistic Modeling | Gibbs sampler with hybrid MC, and adaptive rejection sampling (ARS) | Consensus | Mouse: All chromosomes (40kb) | Hi-C contact matrix and local genomic features (restriction enzyme cutting frequencies, GC content, and sequence uniqueness) |
| ChromSDE [76] | 2013 |  | Matlab | Points | No | Distance | Optimization | Linear and quadratic semidefinite programming (SDP) | Consensus | Mouse and Human: Chromosome 13 (200kb – 1Mb; 40kb for chr13:21Mb–25Mb) | Hi-C contact matrix |
| AutoChrom3D [77] | 2013 | Yes | Perl | Points | Yes | Distance | Optimization | Non-linear constrained optimization | Consensus | Human: 500kb – 1Mb (8kb) | Hi-C contact matrix |
| PASTIS [78] | 2014 | Yes | Python | Points | No | Distance and Probability | Optimization (MDS1, MDS2) and Probabilistic Modeling (PM1, PM2) | IPOPT – interior-point filter algorithm | Consensus | Mouse: All chromosomes (100kb – 1Mb, 20kb – chr1-19) | Hi-C contact matrix |
| ShRec3D [79] | 2014 | Yes | Matlab | Points | No | Distance | Optimization | Shortest-path Floyd-Warshall algorithm | Consensus | Human: Chromosome 1 – 30Mbp region (3kb – 150kb) | Hi-C contact matrix |
| MOGEN [49, 80] | 2014 | Yes | Java | Points | No | Contact | Optimization | Gradient descent | Ensemble | Human: All chromosomes and whole genome (200kb – 1Mb) | Hi-C contact matrix |
| FisHiCal [81] | 2014 | Yes | R | Points | Yes | Distance | Optimization | SMACOF algorithm [82] | Consensus | Human: Whole genome (1Mb) | Hi-C contact matrix |
| InfMod3DGen [64] | 2015 | Yes | Matlab | Polymer | No | Distance | Probabilistic Modeling | Gradient ascent | Ensemble | Yeast: All chromosomes – 12.1Mb genome (10kb) | Hi-C contact matrix |
| Gen3D [83] | 2015 | Yes | C++ | Points | No | Contact | Optimization | Adaptation, simulated annealing, and genetic algorithm | Consensus | Human: All chromosomes (1Mb) | Hi-C contact matrix |
| MBO [84] | 2015 | Yes | Matlab | Points | No | Distance | Optimization | Manopt – manifold optimization | Consensus | Mouse: Chromosome X (50kb – 600kb) | Hi-C contact matrix |
| TADbit [85] | 2016 | Yes | Python | Spheres | Yes | Distance | Optimization | Simulated annealing and Monte Carlo sampling | Ensemble | Drosophila Fly: 52Mb region (10kb) | Hi-C contact matrix |
| HSA [47] | 2016 | Yes | R | Points | Yes | Distance | Optimization | GLM framework with Hamiltonian dynamics with simulated annealing | Consensus | Human and Mouse: All chromosomes (25kb – 1Mb) | One or more raw contact maps or a normalized Hi-C contact matrix |
| Chromosome3D [46] | 2016 | Yes | Perl | Points | No | Distance | Optimization | Distance geometry simulated annealing | Ensemble | Human: All chromosomes (500kb – 1Mb) | Hi-C contact matrix |
| PGS [65, 86] | 2016 | Yes | Python | Spheres | Yes | Probability | Probabilistic Modeling | Simulated annealing/molecular dynamics | Population | Human: Whole genome (50kb – 1Mb) | Raw Hi-C contact matrix and a TAD file in BED format |
| tRex [87] | 2016 | Yes | R | Points | Yes | Probability | Probabilistic Modeling | MCMC sampling using the Metropolis–Hastings algorithm/Gibbs sampler, Hamiltonian MCMC | Ensemble | Human: Chromosome 14 and 22 (1Mb) | Hi-C contact matrix and a vector of covariates (e.g. fragment length, GC content, and mappability score) |
| 3D-GNOME [88, 89] | 2016 | Web server | C++, Javascript, PHP, Python, R | Polymer | Yes | Distance | Optimization | Monte Carlo-based simulated annealing | Consensus | Human: All chromosomes (Multiscale 1–2Mb, PET (1–10kb)) | A seven- or eight-column BEDPE (paired-end BED format) file containing the locations and strengths of long-range contact points. Use of ChIA-PET data is recommended |
| LorDG [69] | 2016 | Yes | Java | Points | No | Distance | Optimization | Gradient ascent | Ensemble | Human: All chromosomes and whole genome (500kb – 1Mb) | Hi-C contact matrix |
| ISDHiC [90] | 2016 | No | C, C++, Python | Spheres | No | Distance | Probabilistic Modeling | MCMC sampling using Hamiltonian MC | Ensemble | Mouse: Chromosome X (50kb, 500kb) | Hi-C contact matrix |
| Chrom3D [91] | 2017 | Yes | Perl | Spheres | No | Contact | Optimization | Monte Carlo optimization using the Metropolis–Hastings algorithm with simulated annealing | Ensemble | Human: Whole genome (TAD) | Hi-C contact matrix and LAD information |
| miniMDS [92] | 2017 | Yes | Python | Points | No | Distance | Optimization | MDS approximation algorithms and Kabsch algorithm | Consensus | Human: Whole genome (10kb – 100kb) | Hi-C contact matrix |
| 3DMax [70] | 2018 | Yes | Java, Matlab | Points | No | Distance | Optimization | Gradient ascent | Ensemble | Human: All chromosomes (1Mb) | Hi-C contact matrix |
| GEM [93] | 2018 | Yes | Matlab | Polymer | No | Contact | Optimization | Adaptive gradient descent method | Ensemble | Human: Chromosome 13 and 14 (1Mb), Chromosome 1 (250kb: 130Mb – 180Mb region); Yeast: Chromosome 6 (10kb) | Hi-C contact matrix |
| GEM-FISH [94] | 2018 | Yes | Matlab | Polymer | No | Contact | Optimization | Gradient descent | Consensus | Human: Chromosomes 20, 21, 22, and X (TAD) | Hi-C contact matrix and FISH data |
| SIMBA3D [95] | 2018 | Yes | Python | Points | No | Probability | Probabilistic Modeling | BFGS method with analytical gradient | Ensemble | Mouse: All chromosomes (100kb) | Hi-C contact matrix |
| ShRec3D+ [96] | 2018 | No |  | Points | No | Distance | Optimization | Floyd-Warshall algorithm | Consensus | Human and Mouse: All chromosomes (1Mb) | Hi-C contact matrix |
| EVR [97] | 2018 | Yes | C, Python | Points | No | Distance | Optimization | Error-Vector Resultant algorithm | Consensus | Bacteria: All chromosomes (10kb) | Hi-C contact matrix |
| Hierarchical3DGenome [98] | 2019 | Yes | Java | Points | No | Distance | Optimization | Gradient ascent and hierarchical modeling | Ensemble | Human: All chromosomes (1kb – 5kb) | Hi-C contact matrix and a file containing identified TADs |

Each column denotes a key property of the methods. Algorithms: the name or acronym of the 3-D structure reconstruction method. Year: the publication year. Software Availability: whether open-source software is available for the method. Language: the programming language in which the software was implemented. Structure Representation: the structural representation used by the method, which can be polymer, spheres, or points; each is explained in detail in the main text of the article. In-built Normalization: whether a normalization step is built into the method's algorithm. IF Model Based: the method's class based on how it models the input interaction frequencies (IF). Methodology Based: the method's class based on its chromosome and genome 3-D reconstruction methodology. Sampling Algorithm: the sampling algorithm, or one of the sampling algorithms, used by the method. Structure Based: the method's class based on the structures it generates. Consensus-based methods generate a single representative structure for the entire Hi-C dataset or for single-cell Hi-C data; ensemble-based methods generate a variety of 3-D structures from the same input Hi-C data to adequately simulate the heterogeneity of the Hi-C data; population-based methods generate a population of individual 3-D genome structures that is, as a whole, statistically consistent with the true configuration of the input Hi-C data. Species, Coverage, and Resolution of Test 3C Data: the species, size, and resolution of the Hi-C data used in the method's manuscript. Input: the method's input data format. Even though these methods may have been tested only on the Hi-C data of some species, most, if not all, of them should be applicable to Hi-C contact data of any species formatted according to the input requirement.
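To make the "Distance", "Optimization", and "Consensus" classes in the table concrete, the following is a minimal sketch of a generic distance-based, consensus-style reconstruction. It is not the implementation of any method listed above. It assumes the commonly used inverse power-law conversion d = 1 / IF^alpha (with alpha as a free parameter) and uses classical multidimensional scaling as a stand-in optimizer. All function names, the toy data, and the parameter values are hypothetical.

```python
import numpy as np

def if_to_distance(if_matrix, alpha=1.0):
    """Convert interaction frequencies to target distances, d = 1 / IF**alpha.

    Assumption: an inverse power law relates IF and distance. Pairs with
    zero IF get distance 0 here, which is only acceptable for toy data.
    """
    if_matrix = np.asarray(if_matrix, dtype=float)
    dist = np.zeros_like(if_matrix)
    mask = if_matrix > 0
    dist[mask] = 1.0 / if_matrix[mask] ** alpha
    np.fill_diagonal(dist, 0.0)
    return dist

def classical_mds(dist, dim=3):
    """Embed a distance matrix in `dim` dimensions by eigen-decomposition."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by floating-point noise.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data: 20 random 3-D points stand in for genomic bins, and a
# noise-free "contact matrix" is simulated as IF = 1 / distance.
rng = np.random.default_rng(0)
true_coords = rng.normal(size=(20, 3))
true_dist = pairwise_distances(true_coords)
contacts = np.zeros_like(true_dist)
off_diag = ~np.eye(20, dtype=bool)
contacts[off_diag] = 1.0 / true_dist[off_diag]

# One consensus structure: a single set of 3-D coordinates.
target_dist = if_to_distance(contacts, alpha=1.0)
consensus = classical_mds(target_dist, dim=3)
max_error = np.abs(pairwise_distances(consensus) - true_dist).max()
print(consensus.shape, max_error)
```

The recovered structure matches the original only up to rotation, reflection, and translation, so the comparison is made on pairwise distances rather than raw coordinates. Real Hi-C data are noisy and sparse, which is why the methods in the table rely on normalization, iterative optimization, or sampling instead of a single eigen-decomposition.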