Table 1 

Predictor performance 

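| Genome / category | GB | -IS | +IS | Manual |
|---|---|---|---|---|
| A. dehalogenans 2CP-C (NC_007760) | | | | |
| Total IS ORF | 1 | 4 | 4 | 2 |
| Complete ORF | | 0 | 0 | 0 |
| Partial ORF | | 1 | 1 | 1 |
| Pseudogene | 1 | 2 | 2 | 1 |
| Unknown ORF | | 1 | 1 | 0 |
| Total IS | | 4 | 4 | 2 |
| Different IS | | 4 | 4 | 2 |
| Anaeromyxobacter sp. Fw109-5 (NC_009675) | | | | |
| Total IS ORF | 15 | 22 | 24 | 19 |
| Complete ORF | | 4 | 12 | 12 |
| Partial ORF | | 1 | 2 | 6 |
| Pseudogene | 1 | 4 | 4 | 1 |
| Unknown ORF | | 13 | 6 | 0 |
| Total IS | | 20 | 21 | 16 |
| Different IS | | 16 | 17 | 12 |
| Anaeromyxobacter sp. K (NC_011145) | | | | |
| Total IS ORF | 14 | 25 | 28 | 27 |
| Complete ORF | | 12 | 26 | 26 |
| Partial ORF | | 2 | 0 | 0 |
| Pseudogene | | 1 | 1 | 1 |
| Unknown ORF | | 10 | 1 | 0 |
| Total IS | | 19 | 19 | 18 |
| Different IS | | 10 | 10 | 9 |
| A. dehalogenans 2CP-1 (NC_011891) | | | | |
| Total IS ORF | 15 | 33 | 35 | 35 |
| Complete ORF | | 18 | 24 | 27 |
| Partial ORF | | 4 | 2 | 3 |
| Pseudogene | | 8 | 8 | 5 |
| Unknown ORF | | 3 | 1 | 0 |
| Total IS | | 25 | 25 | 23 |
| Different IS | | 12 | 12 | 14 |
| A. aeolicus VF5 (NC_000918) | | | | |
| Total IS ORF | | 7 | 7 | 3 |
| Complete ORF | | 0 | 2 | 2 |
| Partial ORF | | 1 | 1 | 1 |
| Pseudogene | | 0 | 0 | 0 |
| Unknown ORF | | 6 | 4 | 0 |
| Total IS | | 7 | 7 | 3 |
| Different IS | | 6 | 6 | 2 |
| C. thermocellum 27405 (NC_009012) | | | | |
| Total IS ORF | 75 | 143 | 144 | 160 |
| Complete ORF | | 81 | 123 | 125 |
| Partial ORF | | 43 | 11 | 27 |
| Pseudogene | | 7 | 7 | 8 |
| Unknown ORF | | 12 | 3 | 0 |
| Total IS | | 115 | 115 | 119 |
| Different IS | | 27 | 27 | 26 |
| S. maltophilia R551-3 (NC_011071) | | | | |
| Total IS ORF | 11 | 21 | 22 | 20 |
| Complete ORF | | 13 | 19 | 19 |
| Partial ORF | | 7 | 1 | 1 |
| Pseudogene | | 1 | 1 | 0 |
| Unknown ORF | | 0 | 1 | 0 |
| Total IS | | 18 | 19 | 16 |
| Different IS | | 6 | 7 | 4 |
| S. maltophilia K279a (NC_010943) | | | | |
| Total IS ORF | 49 | 53 | 54 | 57 |
| Complete ORF | | 18 | 45 | 47 |
| Partial ORF | | 27 | 5 | 9 |
| Pseudogene | | 3 | 3 | 1 |
| Unknown ORF | 3 | 5 | 1 | 0 |
| Total IS | | 38 | 39 | 36 |
| Different IS | | 18 | 19 | 18 |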


The table compares the IS annotations of eight bacterial genomes contained in the corresponding GenBank files (GB) with those obtained by manual annotation (Manual) and with the ISsaga predictor using two different IS reference databases. In one database (-IS) the reference ISs contained in the genome under test were removed, while in the other (+IS) they were included. The total number of IS-associated ORFs (Total IS ORF) is divided into four categories: Complete ORF, Partial ORF, Pseudogene and Unknown ORF. The 'Unknown' category includes all ORFs that the predictor cannot classify as complete or partial because the reference database contains too few closely related examples. The categories 'Total IS' and 'Different IS' are based on nucleotide predictions, which take into account the number of ORFs carried by each IS. For example, an IS that includes two ORFs is counted as two entries in 'Complete ORF' but as a single IS in 'Total IS'.
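The counting rule described in the legend can be illustrated with a minimal sketch. The record layout, the IS names and the `summarize` helper below are hypothetical and are not part of ISsaga; the values are invented for illustration and do not come from Table 1. The sketch also assumes that 'Different IS' means the number of distinct IS names among the annotated copies.

```python
from collections import Counter

# Hypothetical annotation records: (IS copy id, IS name, ORF categories).
# These values are invented; they only illustrate the counting rule.
annotations = [
    ("copy1", "ISexample1", ["Complete", "Complete"]),  # one IS carrying two ORFs
    ("copy2", "ISexample1", ["Partial"]),
    ("copy3", "ISexample2", ["Pseudogene"]),
]


def summarize(records):
    """Count ORFs per category, IS copies, and distinct IS names."""
    # Every ORF is counted once in its own category.
    orf_counts = Counter(cat for _, _, cats in records for cat in cats)
    return {
        "Total IS ORF": sum(orf_counts.values()),
        "Complete ORF": orf_counts["Complete"],
        "Partial ORF": orf_counts["Partial"],
        "Pseudogene": orf_counts["Pseudogene"],
        "Unknown ORF": orf_counts["Unknown"],
        # Each IS copy is counted once, however many ORFs it carries.
        "Total IS": len(records),
        # Assumption: 'Different IS' is the number of distinct IS names.
        "Different IS": len({name for _, name, _ in records}),
    }


summary = summarize(annotations)
for row, value in summary.items():
    print(f"{row}: {value}")
```

Here the two-ORF IS contributes two entries to 'Complete ORF' but only one to 'Total IS', so the ORF total (4) exceeds the IS total (3).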

Varani et al. Genome Biology 2011, 12:R30. doi:10.1186/gb-2011-12-3-r30