
Table 1: Four distance metrics used in pairwise comparisons with MOTIFSIM

From: Performance evaluation for MOTIFSIM

| Metric | Formula | Description | Ref. |
|---|---|---|---|
| Average Kullback-Leibler (AKL) | \( AKL(X,Y)=10-\frac{\sum_{b\in\{A,C,G,T\}} f_x(b)\log\frac{f_x(b)}{f_y(b)}+\sum_{b\in\{A,C,G,T\}} f_y(b)\log\frac{f_y(b)}{f_x(b)}}{2} \) | X and Y are two aligned columns of the two matrices being compared. \( f_x(b) \) is the frequency of base b ∈ {A, C, G, T} in column X, and likewise \( f_y(b) \) in column Y. AKL(X, Y) is the similarity score for columns X and Y at one alignment position. | 21 |
| Average Log-likelihood Ratio (ALLR) | \( ALLR=\frac{\sum_{b\in\{A,C,G,T\}} n_{bX}\log\frac{f_{bY}}{p_b}+\sum_{b\in\{A,C,G,T\}} n_{bY}\log\frac{f_{bX}}{p_b}}{\sum_{b\in\{A,C,G,T\}}\left(n_{bX}+n_{bY}\right)} \) | \( n_{bX} \) is the count of base b ∈ {A, C, G, T} in column X, and likewise \( n_{bY} \) in column Y. \( f_b=n_b/N \) is the frequency of base b, where N is the total count of all bases in the column. \( p_b \) is the prior probability of base b. | 24 |
| Pearson Correlation Coefficient (PCC) | \( PCC(X,Y)=\frac{\sum_{b\in\{A,C,G,T\}}\left(X_b-\overline{X}\right)\left(Y_b-\overline{Y}\right)}{\sqrt{\sum_{b\in\{A,C,G,T\}}\left(X_b-\overline{X}\right)^2\sum_{b\in\{A,C,G,T\}}\left(Y_b-\overline{Y}\right)^2}} \) | \( X_b \) is the count of base b ∈ {A, C, G, T} in column X, and likewise \( Y_b \) in column Y. \( \overline{X} \) is the mean of the four base counts in column X, and likewise \( \overline{Y} \) in column Y. | 24 |
| χ² distance | \( \chi^2=\sum_{b\in\{A,C,G,T\}}\frac{\left(N_{g,i}f_{b,i}-N_{f,i}g_{b,i}\right)^2}{N_{f,i}N_{g,i}\left(f_{b,i}+g_{b,i}\right)} \) | \( f_{b,i} \) is the entry for base b at position i in the overlapping part of matrix f, where f and g are the two matrices being compared; \( g_{b,i} \) is the corresponding entry in matrix g. \( N_{f,i}=\sum_b f_{b,i} \) and \( N_{g,i}=\sum_b g_{b,i} \). | 16 |
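As a rough illustration of how the four column scores in Table 1 are evaluated, the Python sketch below scores one pair of aligned columns given as base counts in A, C, G, T order. It is not MOTIFSIM's implementation: the function names (`akl`, `allr`, `pcc`, `chi2`), the natural logarithm, the pseudocount of 0.5 used to keep the logarithms finite, the uniform priors \( p_b = 0.25 \) for ALLR, and the handling of zero denominators are all assumptions made here. Comparing whole motifs would also require an alignment step and a rule for combining column scores, which the table does not specify.

```python
"""Sketch of the four column-comparison scores in Table 1.

Assumptions (not stated in the table): columns are base counts in A, C, G, T
order; logarithms are natural; log-based scores add a pseudocount of 0.5 per
base; ALLR uses uniform priors of 0.25. Function names are hypothetical.
"""
import math
from typing import Sequence

UNIFORM_PRIORS = (0.25, 0.25, 0.25, 0.25)  # assumed background p_b


def _freqs(counts: Sequence[float], pseudo: float = 0.0) -> list:
    """Turn a column of base counts into frequencies, optionally smoothed."""
    smoothed = [c + pseudo for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def akl(x: Sequence[float], y: Sequence[float], pseudo: float = 0.5) -> float:
    """AKL = 10 - (KL(fx || fy) + KL(fy || fx)) / 2; larger means more similar."""
    fx, fy = _freqs(x, pseudo), _freqs(y, pseudo)
    kl_xy = sum(a * math.log(a / b) for a, b in zip(fx, fy))
    kl_yx = sum(b * math.log(b / a) for a, b in zip(fx, fy))
    return 10 - (kl_xy + kl_yx) / 2


def allr(x: Sequence[float], y: Sequence[float],
         priors: Sequence[float] = UNIFORM_PRIORS, pseudo: float = 0.5) -> float:
    """ALLR: raw counts of each column weighted by log(f / p) of the other column.

    The pseudocount smooths only the frequencies f, not the counts n (assumption).
    """
    fx, fy = _freqs(x, pseudo), _freqs(y, pseudo)
    num = sum(n * math.log(f / p) for n, f, p in zip(x, fy, priors))
    num += sum(n * math.log(f / p) for n, f, p in zip(y, fx, priors))
    return num / (sum(x) + sum(y))


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the four base counts of two columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A column with four equal counts has zero variance; returning 0.0 is an
    # assumed convention, not something the table defines.
    return cov / denom if denom > 0 else 0.0


def chi2(f: Sequence[float], g: Sequence[float]) -> float:
    """Chi-squared statistic for one position i of the overlapping part of f and g."""
    nf, ng = sum(f), sum(g)
    total = 0.0
    for fb, gb in zip(f, g):
        if fb + gb == 0:
            continue  # base absent from both columns: skip the 0/0 term (assumption)
        total += (ng * fb - nf * gb) ** 2 / (nf * ng * (fb + gb))
    return total


if __name__ == "__main__":
    a, b = [10, 0, 0, 0], [0, 10, 0, 0]
    print(akl(a, b), allr(a, b), pcc(a, b), chi2(a, b))
```

For example, with a = [10, 0, 0, 0] and b = [0, 10, 0, 0], `chi2` returns 20.0 and `pcc` returns about -0.333, while two identical columns give `akl` = 10 and `chi2` = 0. Note the differing directions: AKL, ALLR and PCC grow as the columns become more alike, whereas χ² grows as they diverge, so any code that ranks matches has to treat χ² in the opposite sense.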