Table 2 Sixteen benchmark sequence datasets [29], grouped by species. Each sequence dataset contains embedded binding sites for one transcription factor.

From: Performance evaluation for MOTIFSIM

Sequence Dataset  Dataset Type  Species                   Transcription Factor  Number of Sequences  Sequence Length (bp)
hm01g             Generic       Homo sapiens              AP-1                  18                   2000
hm04g             Generic       Homo sapiens              c-Jun                 13                   2000
hm08m             Markov        Homo sapiens              CREB                  15                   500
hm15g             Generic       Homo sapiens              NF-1                  4                    2000
hm17g             Generic       Homo sapiens              NF-kappaB             11                   500
hm19g             Generic       Homo sapiens              Sp1                   5                    500
hm22g             Generic       Homo sapiens              USF1                  6                    500
hm22m             Markov        Homo sapiens              USF1                  6                    500
mus04m            Markov        Mus musculus              C/EBPalpha            7                    1000
mus06g            Generic       Mus musculus              GATA-1                3                    500
mus10g            Generic       Mus musculus              Sp1                   13                   1000
mus11m            Markov        Mus musculus              Sp1                   12                   500
yst02g            Generic       Saccharomyces cerevisiae  GAL4                  4                    500
yst03m            Markov        Saccharomyces cerevisiae  GCN4                  8                    500
yst06g            Generic       Saccharomyces cerevisiae  MCM1                  7                    500
yst09g            Generic       Saccharomyces cerevisiae  CAR1                  16                   1000
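As a quick cross-check of the table, the following Python sketch transcribes its rows and groups them by species, as the caption describes. The transcription-factor column is omitted for brevity. The `BenchmarkDataset` type and the `summarize_by_species` helper are hypothetical names introduced here for illustration only. They are not part of MOTIFSIM or of the benchmark distribution.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class BenchmarkDataset(NamedTuple):
    """One row of Table 2 (transcription-factor column omitted)."""
    name: str
    dataset_type: str     # "Generic" or "Markov"
    species: str
    num_sequences: int
    sequence_length: int  # sequence length in bp, as listed in the table


# Rows transcribed from Table 2, in table order.
DATASETS: List[BenchmarkDataset] = [
    BenchmarkDataset("hm01g", "Generic", "Homo sapiens", 18, 2000),
    BenchmarkDataset("hm04g", "Generic", "Homo sapiens", 13, 2000),
    BenchmarkDataset("hm08m", "Markov", "Homo sapiens", 15, 500),
    BenchmarkDataset("hm15g", "Generic", "Homo sapiens", 4, 2000),
    BenchmarkDataset("hm17g", "Generic", "Homo sapiens", 11, 500),
    BenchmarkDataset("hm19g", "Generic", "Homo sapiens", 5, 500),
    BenchmarkDataset("hm22g", "Generic", "Homo sapiens", 6, 500),
    BenchmarkDataset("hm22m", "Markov", "Homo sapiens", 6, 500),
    BenchmarkDataset("mus04m", "Markov", "Mus musculus", 7, 1000),
    BenchmarkDataset("mus06g", "Generic", "Mus musculus", 3, 500),
    BenchmarkDataset("mus10g", "Generic", "Mus musculus", 13, 1000),
    BenchmarkDataset("mus11m", "Markov", "Mus musculus", 12, 500),
    BenchmarkDataset("yst02g", "Generic", "Saccharomyces cerevisiae", 4, 500),
    BenchmarkDataset("yst03m", "Markov", "Saccharomyces cerevisiae", 8, 500),
    BenchmarkDataset("yst06g", "Generic", "Saccharomyces cerevisiae", 7, 500),
    BenchmarkDataset("yst09g", "Generic", "Saccharomyces cerevisiae", 16, 1000),
]


def summarize_by_species(
    datasets: List[BenchmarkDataset],
) -> Dict[str, Tuple[int, int]]:
    """Map each species to (number of datasets, total number of sequences)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for d in datasets:
        counts[d.species][0] += 1
        counts[d.species][1] += d.num_sequences
    # Dicts preserve insertion order, so species appear in table order.
    return {species: (n, total) for species, (n, total) in counts.items()}


if __name__ == "__main__":
    for species, (n, total) in summarize_by_species(DATASETS).items():
        print(f"{species}: {n} datasets, {total} sequences")
```

Running it prints `Homo sapiens: 8 datasets, 78 sequences`, `Mus musculus: 4 datasets, 35 sequences` and `Saccharomyces cerevisiae: 4 datasets, 35 sequences`. This confirms that the 16 datasets split 8/4/4 across the three species.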