Alignments download:
No.  Subgroup  Number of introns  Download
1    IA1       80                 HTML | sali.gz | sali.zip | IA1.sto
2    IA2       15                 HTML | sali.gz | sali.zip | IA2.sto
3    IA3       56                 HTML | sali.gz | sali.zip | IA3.sto
4    IB1       42                 HTML | sali.gz | sali.zip | IB1.sto
5    IB2       18                 HTML | sali.gz | sali.zip | IB2.sto
6    IB3       7                  HTML | sali.gz | sali.zip | IB3.sto
7    IB4       89                 HTML | sali.gz | sali.zip | IB4.sto
8    IC1       837                HTML | sali.gz | sali.zip | IC1.sto
9    IC2       32                 HTML | sali.gz | sali.zip | IC2.sto
10   IC3       328                HTML | sali.gz | sali.zip | IC3.sto
11   ID        17                 HTML | sali.gz | sali.zip | ID.sto
12   IE1       38                 HTML | sali.gz | sali.zip | IE1.sto
13   IE2       58                 HTML | sali.gz | sali.zip | IE2.sto
14   IE3       111                HTML | sali.gz | sali.zip | IE3.sto
15   IE        61                 HTML | sali.gz | sali.zip
The alignments of 211 introns belonging to the IE major subgroup, from Li and Zhang (2005), Nucleic Acids Res., 33, 2118-2128, can be downloaded HERE.
.sto files: alignments in Stockholm format, the format used by the INFERNAL package.
sali alignment format:

The Group I intron structural alignment format, ".sali", is defined as follows:

  1. Each entry occupies two lines: one for its sequence and one for its structure;
  2. Non-pseudoknot base-pairings are marked by '<' and '>';
  3. Non-paired bases are marked by '.';
  4. For the two pseudoknots, P3-P7 and P1-P10, P7 and P10 are marked by '[' and ']';
  5. Gaps in the sequence and structure are both marked by '-';
  6. The insertion sequences are marked by '(number of nt)' in the sequence, and by '~' in the structure;
  7. A base that is competed for by two different bases, one in P10 and one in P1', is marked by '^'.
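
The symbol rules above can be illustrated with a minimal Python sketch that reads one structure line and recovers the paired columns. It is an illustration only, not the database's own parser. The function name `pair_table` and the toy structure string are hypothetical, and the sketch assumes that '<'/'>' pairs nest among themselves and that '['/']' pairs nest among themselves. Columns marked '.', '-', '~' or '^' are simply treated as not opening or closing a pair.

```python
def pair_table(structure):
    """Return (nested_pairs, pseudoknot_pairs) for one .sali structure line.

    Each pair is a tuple (i, j) of 0-based column indices with i < j.
    Assumption: '<'/'>' pairs nest among themselves, and so do '['/']'.
    """
    closers = {">": "<", "]": "["}
    stacks = {"<": [], "[": []}   # open positions waiting for a partner
    pairs = {"<": [], "[": []}    # completed pairs per bracket type

    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at column {i}")
            pairs[opener].append((stacks[opener].pop(), i))
        elif ch not in ".-~^":
            raise ValueError(f"unexpected symbol {ch!r} at column {i}")

    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at column {stack[-1]}")

    return sorted(pairs["<"]), sorted(pairs["["])


# Toy structure line (made up for illustration, not taken from a real intron).
nested, pseudoknot = pair_table("<<[..>>..]")
print(nested)      # columns paired by '<' and '>'
print(pseudoknot)  # columns paired by '[' and ']'
```

Keeping a separate stack per bracket type is what lets the '[' ... ']' pairs cross the '<' ... '>' pairs, which a single stack could not represent.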

e.g. Figure

This format was designed to keep as much structural information as possible in the alignment. The pseudoknot structures and overlapping base pairs are all kept. Each sequence keeps its own specific structure, although the sequences in the alignment share a consensus, which can be inferred from the alignment. For example, we have converted these alignments to Stockholm format, which is the format accepted by the INFERNAL package. During conversion, repeated or nearly identical sequences were eliminated, and long insertions present in only a small fraction of the sequences were not kept.
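
As a rough illustration of what a Stockholm file looks like, the Python sketch below writes a few aligned sequences and one consensus structure line in Stockholm 1.0 layout, skipping exact duplicate sequences. It is not the conversion procedure used for the .sto files on this page. The function name `to_stockholm`, the sequence names and the toy data are hypothetical. The sketch assumes the caller already has aligned sequences of equal length and a consensus structure string. It does not detect nearly identical sequences, remove long insertions, or translate the structure symbols into the notation INFERNAL expects.

```python
def to_stockholm(entries, ss_cons):
    """Format (name, aligned_sequence) pairs plus a consensus structure
    as a Stockholm 1.0 alignment, dropping exact duplicate sequences."""
    seen = set()
    kept = []
    for name, seq in entries:
        if len(seq) != len(ss_cons):
            raise ValueError(f"{name}: length differs from consensus structure")
        if seq in seen:          # exact duplicates only; near-duplicates are kept
            continue
        seen.add(seq)
        kept.append((name, seq))

    tag = "#=GC SS_cons"
    width = max([len(tag)] + [len(name) for name, _ in kept])

    lines = ["# STOCKHOLM 1.0"]
    for name, seq in kept:
        lines.append(f"{name:<{width}} {seq}")
    lines.append(f"{tag:<{width}} {ss_cons}")
    lines.append("//")
    return "\n".join(lines) + "\n"


# Toy input (made up for illustration); seq3 duplicates seq1 and is skipped.
toy = [("seq1", "GGA-CC"), ("seq2", "GGAUCC"), ("seq3", "GGA-CC")]
print(to_stockholm(toy, "<<..>>"), end="")
```

Note that a single `#=GC SS_cons` line holds only one consensus structure, whereas the .sali format stores a structure for every sequence.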