Group I intron Sequence and Structure Database

GISSD introduction:

The Group I Intron Sequence and Structure Database (GISSD) is a specialized and comprehensive database for group I introns that focuses on integrating useful group I intron information from all available databases. GISSD also aims to provide de novo data essential for understanding group I introns at a systematic level. Specifically, it aims to provide a consensus structure for each subgroup of group I introns based on high-quality alignments, to judge the confidence of the group I introns annotated by Rfam (http://www.sanger.ac.uk/Software/Rfam/), to classify Rfam group I introns into subgroups based on the consensus structures, and to provide a taxonomy tree annotated with intron counts, based on the taxonomy information of the host organisms of all group I introns.

Currently, GISSD presents 1789 complete intron records, including the nucleotide sequence of each annotated intron plus 15 nt of the upstream and downstream exons, as well as the pseudoknot-containing secondary structure predicted by integrating comparative sequence analysis and minimal free energy algorithms. These introns represent all 13 known minor subgroups and an undifferentiated major subgroup, and their structure-based alignments are provided separately. Both structure predictions and alignments were done manually and adjusted iteratively, which yielded a reliable consensus structure for each subgroup. These consensus structures allowed us to judge the confidence of 20,085 group I introns previously predicted by the INFERNAL program (http://infernal.janelia.org/) and to classify this large number of introns into subgroups automatically. The database provides the intron-associated taxonomy information from GenBank, allowing one to view the distribution of all group I introns in detail. CDSs residing in introns and 3-D structure information are also integrated where available. A total of 16,914 group I introns were validated, with 95.5% of them classified into the IC3 subgroup and 96.4% residing in Viridiplantae, suggesting that the major reservoir of group I introns in nature is the chloroplast tRNALeu genes.

The advantages of GISSD:

Providing group I intron sequences
In the CRW site (Cannone et al., 2002), the GenBank accession number of the gene containing each intron is given, but no intron sequence is provided. Sometimes the intron annotation in the corresponding GenBank record is unavailable or unclear. Systematic analysis of the phylogeny and structure of group I introns requires that each intron sequence be immediately available. The 5' and 3' exon sequences adjacent to the intron are also required to fold the P1 and P10 structures of the intron. To meet these requirements, GISSD provides the sequence of each intron together with 15 nt of the adjacent upstream and downstream exons.
Providing reliable structures and alignments of group I introns
Group I introns are highly diverse in size, ranging from less than 200 nt to over 4000 nt. They also show weak sequence similarity and contain two long-range pseudoknots. These features make it difficult for current computational programs to automatically generate reliable secondary structures and alignments of group I introns. CITRON (Lisacek et al., 1994) and Infernal (Griffiths-Jones et al., 2003) have been used successfully to identify group I introns in genome sequences. However, the sensitivity and specificity of both programs are unsatisfactory for introns whose characteristics are not built into the program or not represented in the training dataset. Furthermore, identifying an intron does not mean that its structure is well determined. We have used manual methods to provide reliable, detailed secondary structures and alignments for nearly 1800 group I introns belonging to 13 subgroups, and have made each of them available on GISSD. These data could be used to build better models, or as a more complete training dataset, to improve future searches for new group I introns and predictions of intron structures.
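
As an illustration of the kind of record described above, the following minimal Python sketch cuts an intron and up to 15 nt of each flanking exon out of a gene sequence. The function name, the 1-based inclusive coordinate convention and the toy sequence are assumptions made for this example; they are not taken from the GISSD pipeline.

```python
def extract_intron_with_flanks(gene_seq, intron_start, intron_end, flank=15):
    """Return (upstream_exon, intron, downstream_exon).

    intron_start and intron_end are 1-based, inclusive positions in gene_seq
    (an assumed convention, similar to GenBank feature locations).
    Flanks are shortened when the gene ends less than `flank` nt away.
    """
    if not (1 <= intron_start <= intron_end <= len(gene_seq)):
        raise ValueError("intron coordinates fall outside the gene sequence")
    start = intron_start - 1  # convert to 0-based slicing
    upstream = gene_seq[max(0, start - flank):start]
    intron = gene_seq[start:intron_end]
    downstream = gene_seq[intron_end:intron_end + flank]
    return upstream, intron, downstream


# Toy gene: 20 nt exon, 9 nt "intron", 20 nt exon (made-up sequence).
gene = "A" * 20 + "GGGTTTCCC" + "C" * 20
up, intron, down = extract_intron_with_flanks(gene, 21, 29)
# Exons in lower case, intron in upper case.
print(up.lower() + intron + down.lower())
```

Keeping the flanking exons with the intron matters because, as noted above, they are needed to fold P1 and P10.
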

In general, phylogenetic trees of group I introns are inferred by feeding core-region alignments into phylogeny programs, so the reliability of the alignment affects the results and conclusions of the phylogenetic analysis. In addition, structure comparison can reveal non-canonical structural interactions, and reliable secondary structures for all the subgroups make it possible to describe the structural function of peripheral elements.

We deduced a consensus structure for each of the 14 subgroups from the alignments of the 1789 group I introns. The consensus structures were then processed by the program 'cmbuild' in the INFERNAL package, yielding 14 covariance models (CMs; a CM is a type of profile stochastic context-free grammar, or profile SCFG) (Eddy, 2002). These CMs can be used to search for group I introns automatically, to predict secondary structures (the rough structures still need to be processed and validated), and to classify introns into subgroups.
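
One simple way to turn per-subgroup CM scores into a subgroup assignment is to pick the best-scoring model and reject weak hits. The Python sketch below illustrates that idea only: the text does not state the decision rule GISSD used, and the function name, the score threshold and the example scores are all hypothetical.

```python
def classify_intron(scores, min_score=20.0):
    """Assign an intron to the subgroup whose CM gave the highest score.

    scores: dict mapping subgroup name -> score of the intron against that
            subgroup's covariance model (values here are invented).
    min_score: hypothetical cutoff below which the intron is left unclassified.
    Returns the subgroup name, or None if no model scores high enough.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Made-up scores for two candidate introns against three subgroup models.
hits = {
    "intron_1": {"IC3": 85.2, "IC1": 40.1, "IE": 12.3},
    "intron_2": {"IC3": 9.0, "IE": 11.5},
}
for name, scores in hits.items():
    print(name, classify_intron(scores))
```

In this toy run the first intron is assigned to IC3 and the second is left unclassified, which mirrors the two uses named above: classifying introns and judging the confidence of a prediction.
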
Providing a taxonomy tree annotated with intron counts
On the Distribution page, a taxonomy tree annotated with intron counts lets the user display taxonomy nodes down to a chosen level. Each taxonomy node shows its level, taxonomy name and rank, and the number of introns it contains, so the user can easily view the distribution of introns.
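
A taxonomy tree with intron counts can be derived from the host lineage of each intron by counting every intron once at each node along its lineage. The Python sketch below shows this under stated assumptions: the lineages are short invented examples rather than real GenBank lineages, and the function names and output layout are hypothetical, not those of the GISSD site.

```python
from collections import Counter


def count_introns_per_node(lineages):
    """Count introns at or below every taxonomy node.

    lineages: iterable of tuples of taxon names ordered from root to leaf,
              one tuple per intron (the host organism's lineage).
    Returns a Counter keyed by lineage prefix (a tuple identifying a node).
    """
    counts = Counter()
    for lineage in lineages:
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


def format_tree(counts, max_level):
    """Render nodes down to max_level as indented 'name (count)' lines."""
    lines = []
    # Sorting the tuple keys lists each parent directly before its children.
    for node in sorted(counts):
        level = len(node)
        if level <= max_level:
            lines.append("  " * (level - 1) + f"{node[-1]} ({counts[node]})")
    return lines


# Invented host lineages, one per intron.
lineages = [
    ("Eukaryota", "Viridiplantae"),
    ("Eukaryota", "Viridiplantae"),
    ("Eukaryota", "Fungi"),
    ("Bacteria",),
]
print("\n".join(format_tree(count_introns_per_node(lineages), max_level=2)))
```

Changing `max_level` corresponds to choosing how deep a level of taxonomy nodes to display.
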
Providing ORF information
Homing endonuclease genes (HEGs) that invade non-critical regions (i.e. terminal loops) of group I introns promote intron mobility by encoding highly site-specific homing endonucleases (HEs) (Haugen et al., 2005). HEGs residing in introns are related to intron mobility, insertion, splicing and spread. Annotation of the ORFs in introns therefore benefits the study of the history and origin of group I introns.

Intron structure:

Group I introns are widespread non-coding RNA sequences found in the nuclear, chloroplast, and mitochondrial genomes of eukaryotes, in archaebacterial and eubacterial genomes, and even in some viral genomes. They are well known for splicing themselves out of the host precursor RNA via two transesterification reactions, and are therefore called group I ribozymes (Cech, 1990). Comparative sequence analysis reveals a common secondary structure for all group I introns, consisting of conserved base-paired elements designated P1-P9 (Burke et al., 1987; Michel and Westhof, 1990; Li and Zhang, 2005). However, P2 was later found to be absent from some group I introns, and P10, which contains the 3' splice site, was found to be a well-conserved structure (Michel and Westhof, 1990; Li and Zhang, 2005).

Structural and biochemical studies have revealed that the active structure of the group I ribozyme is assembled from two separable domains: the P4-P6 domain, containing P4, P5 and P6, and the P3-P9 domain, containing P3, P7, P8 and P9 (Michel and Westhof, 1990; Tanner et al., 1997a and 1997b; Golden et al., 1998; Woodson, 2005). In addition to these conserved core helices, group I introns have at least one additional peripheral base-paired structure, such as P2.1, P5abc, P9.1 or P9.2 (Michel and Westhof, 1990). Peripheral elements establish a variety of tertiary interactions that play important, and sometimes essential, roles in organizing the P4-P6 and P3-P9 domains into the compact active structure (Doherty and Doudna, 2001; Xiao et al., 2005).

The atomic structures of three group I introns have been solved recently (Adams et al., 2004; Guo et al., 2004; Golden et al., 2005), providing a wealth of information for understanding the catalytic mechanisms of group I introns and the roles of metal ions (Stahley and Strobel, 2005; Stahley and Strobel, 2006). The crystal structures also provide insight into how the intron-specific sets of long-range interactions established by peripheral elements help to stabilize the core structure (Vicens and Cech, 2006).

Catalysis:

Splicing of group I introns proceeds through two sequential ester-transfer reactions. The exogenous guanosine or guanosine nucleotide (exoG) first docks onto the active G-binding site located in P7, and its 3'-OH is aligned to attack the phosphodiester bond at the 5' splice site located in P1, resulting in a free 3'-OH group on the upstream exon and attachment of the exoG to the 5' end of the intron. The terminal G (omega G) of the intron then replaces the exoG in the G-binding site to set up the second ester-transfer reaction: the 3'-OH group of the upstream exon in P1 is aligned to attack the 3' splice site in P10, leading to ligation of the adjacent upstream and downstream exons and release of the catalytic intron. Following its excision from the pre-rRNA, the group I intron undergoes an intramolecular cyclization reaction. This reaction is also self-catalyzed by transesterification, with the 3'-terminal G-OH of the RNA attacking a phosphorus atom located near the 5' end of the molecule. Both the 5' and 3' splice-site phosphodiester bonds of a group I intron precursor are unusually susceptible to slow hydrolysis, which produces 5'-phosphate and 3'-hydroxyl termini. Site-specific hydrolysis is thought to reflect the ability of the folded RNA structure to activate the splice-site phosphates or the incoming nucleophile, using a catalytic mechanism similar to that of self-splicing (Cech, 1990).
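
The bookkeeping of the two ester-transfer steps can be followed with a toy string model, sketched below in Python. It tracks only which pieces are joined or released at each step; it says nothing about structure, docking or chemistry. The function name, the index convention for the splice sites and the short example sequence are assumptions made for illustration.

```python
def self_splice(precursor, five_ss, three_ss, exo_g="G"):
    """Toy model of group I self-splicing on a sequence string.

    five_ss:  length of the upstream exon (the 5' splice site follows it).
    three_ss: 0-based index at which the downstream exon begins.
    Returns (ligated_exons, excised_intron).
    """
    if not 0 < five_ss < three_ss < len(precursor):
        raise ValueError("splice sites must lie inside the precursor, 5' before 3'")
    upstream_exon = precursor[:five_ss]
    intron = precursor[five_ss:three_ss]
    downstream_exon = precursor[three_ss:]
    if not intron.upper().endswith("G"):
        raise ValueError("the intron must end with the terminal (omega) G")

    # Step 1: the 3'-OH of exoG attacks the 5' splice site. The upstream exon
    # is released with a free 3'-OH and exoG is joined to the intron's 5' end.
    free_upstream_exon = upstream_exon
    intron_plus_downstream = exo_g + intron + downstream_exon

    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site.
    # The exons are ligated and the intron (still carrying exoG) is released.
    cut = len(exo_g) + len(intron)
    excised_intron = intron_plus_downstream[:cut]
    ligated_exons = free_upstream_exon + intron_plus_downstream[cut:]
    return ligated_exons, excised_intron


# Made-up precursor: exons in lower case, intron in upper case.
precursor = "cucucu" + "AAAUAGCG" + "uaaggu"
exons, intron = self_splice(precursor, five_ss=6, three_ss=14)
print(exons)   # cucucuuaaggu
print(intron)  # GAAAUAGCG
```

Note that the excised intron is one nucleotide longer than the intron in the precursor, because it carries the exoG acquired in the first step.
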

A two-metal-ion mechanism, as seen in protein polymerases and phosphatases, was proposed to be used by group I and group II introns to carry out the phosphoryl transfer reactions (Steitz and Steitz, 1993). This proposal was unambiguously confirmed by a recently solved high-resolution structure of the Azoarcus group I intron (Stahley and Strobel, 2006).

Intron folding:

Since the early 1990s, the community has studied how group I introns achieve their native structure in vitro, and several mechanisms of RNA folding have been recognized thus far. It is agreed that the tertiary structure folds after the secondary structure has formed (Brion and Westhof, 1997). During folding, RNA molecules rapidly partition into different folding intermediates. Intermediates containing native interactions fold further into the native structure through a fast folding pathway, whereas those containing non-native interactions are trapped in metastable or stable non-native conformations, from which conversion to the native structure occurs very slowly (Thirumalai et al., 2001). It is evident that group I introns differing in their sets of peripheral elements display different potentials for entering the fast folding pathway. Meanwhile, cooperative assembly of the tertiary structure is important for folding into the native structure (Treiber and Williamson, 2001; Xiao et al., 2003; Rangan et al., 2003; Chauhan et al., 2005). Nevertheless, folding of group I introns in vitro encounters both thermodynamic and kinetic challenges (Treiber and Williamson, 1999; 2001). A few RNA-binding proteins and chaperones have been shown to promote the folding of group I introns in vitro and in bacteria by stabilizing the native intermediates or structure, and by destabilizing the non-native structures, respectively (reviewed by Schroeder et al., 2004).

Distribution, phylogeny and mobility:

Group I introns are distributed in bacteria, lower eukaryotes and higher plants. However, their occurrence in bacteria seems to be more sporadic than in lower eukaryotes, and they become prevalent in higher plants. The genes that group I introns interrupt differ significantly: they interrupt rRNA, mRNA and tRNA genes in bacterial genomes, as well as in the mitochondrial and chloroplast genomes of lower eukaryotes, but invade only rRNA genes in the nuclear genomes of lower eukaryotes. In higher plants, these introns seem to be restricted to a few tRNA and mRNA genes of the chloroplasts and mitochondria. Both the intron-early and intron-late theories have found supporting evidence in explaining the origin of group I introns (Haugen et al., 2005). Some group I introns carry homing endonuclease genes (HEGs), whose protein products catalyze intron mobility. It is proposed that HEGs move the intron from one location to another and from one organism to another, and thus account for the wide spread of the selfish group I introns. Indeed, no biological role has been identified for group I introns thus far, other than splicing themselves out of the precursor to prevent the death of the host in which they reside. A small number of group I introns have also been found to encode a class of proteins called maturases, which facilitate intron splicing (Lambowitz et al., 1999).

Selected references:

Adams P.L., Stahley M.R., Kosek A.B., Wang J., Strobel S.A. (2004) Crystal structure of a self-splicing group I intron with both exons. Nature, 430, 45-50.
Brion P., Westhof E. (1997) Hierarchy and dynamics of RNA folding. Annu. Rev. Biophys. Biomol. Struct., 26, 113-37.
Burke, J.M., Belfort, M., Cech, T.R., Davies, R.W., Schweyen, R.J., Shub, D.A., Szostak, J.W., Tabak, H.F. (1987) Structural conventions for group I introns. Nucleic Acids Res., 15, 7217-7221.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., Feng, B., Lin, N., Madabusi, L.V., Muller, K.M., Pande N., Shang Z., Yu N., Gutell R.R. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics, 3, 2.
Cech, T.R. (1990) Self-splicing of group I introns. Annu. Rev. Biochem., 59, 543-568.
Chauhan S., Caliskan G., Briber R.M., Perez-Salas U., Rangan P., Thirumalai D., Woodson S.A. (2005) RNA tertiary interactions mediate native collapse of a bacterial group I ribozyme. J. Mol. Biol., 353, 1199-1209.
Doherty E.A., Doudna J.A. (2001) Ribozyme structures and mechanisms. Annu. Rev. Biophys. Biomol. Struct., 30, 457-475.
Eddy, S. R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.
Golden B.L., Gooding A.R., Podell E.R., Cech T.R. (1998) A preorganized active site in the crystal structure of the Tetrahymena ribozyme. Science, 282, 259-264.
Golden B.L., Kim H., Chase E. (2005) Crystal structure of a phage Twort group I ribozyme-product complex. Nat. Struct. Mol. Biol., 12, 82-89.
Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. (2003) Rfam: an RNA family database. Nucleic Acids Res., 31(1), 439-441.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res., 33, D121-124.
Guo F., Gooding A.R., Cech T.R. (2004) Structure of the Tetrahymena ribozyme: base triple sandwich and metal ion at the active site. Mol. Cell, 16, 351-362.
Haugen P., Simon D.M., Bhattacharya D. (2005) The natural history of group I introns. Trends Genet., 21, 111-119.
Johansen S. and Haugen P. (2001) A new nomenclature of group I introns in ribosomal DNA. RNA, 7, 935-936.
Lambowitz A.M., Caprara M.G., Zimmerly S., Perlman P.S. (1999) Group I and II ribozymes as RNPs: Clues from the past and guides to the future. In: Gesteland R, Cech TR, Atkins J, eds. The RNA World (2nd ed.). Cold Spring Harbor Laboratory Press. pp. 451-485.
Lang B.F., Laforest M.J., Burger G. (2007) Mitochondrial introns: a critical view. Trends Genet., 23, 119-125.
Li, Z.J. and Zhang Y. (2005) Predicting the secondary structures and tertiary interactions of 211 group I introns in IE subgroup. Nucleic Acids Res., 33, 2118-2128.
Lisacek F., Diaz Y., Michel F. (1994) Automatic identification of group I intron cores in genomic DNA sequences. J. Mol. Biol., 235, 1206-1217.
Michel, F. and Westhof, E. (1990) Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J. Mol. Biol., 216, 585-610.
Rangan, P., Masquida, B., Westhof E., Woodson S.A. (2003) Assembly of core helices and rapid tertiary folding of a small bacterial group I ribozyme. Proc. Natl. Acad. Sci. USA, 100, 1574-1579.
Schroeder R., Barta A., Semrad K. (2004) Strategies for RNA folding and assembly. Nat. Rev. Mol. Cell Biol., 5, 908-919.
Stahley M.R., Strobel S.A. (2006) RNA splicing: group I intron crystal structures reveal the basis of splice site selection and metal ion catalysis. Curr. Opin. Struct. Biol., 16, 319-326.
Stahley M.R., Strobel S.A. (2005) Structural evidence for a two-metal-ion mechanism of group I intron splicing. Science, 309, 1587-1590.
Steitz T.A., Steitz J.A. (1993) A general two-metal-ion mechanism for catalytic RNA. Proc. Natl. Acad. Sci. USA, 90, 6498-6502.
Tanner M.A., Cech T.R. (1997) Joining the two domains of a group I ribozyme to form the catalytic core. Science, 275, 847-849.
Tanner M.A., Anderson E.M., Gutell R.R., Cech T.R. (1997) Mutagenesis and comparative sequence analysis of a base triple joining the two domains of group I ribozymes. RNA, 3, 1037-1051.
Thirumalai D., Lee N., Woodson S.A., Klimov D. (2001) Early events in RNA folding. Annu. Rev. Phys. Chem., 52, 751-762.
Treiber D.K., Williamson J.R. (2001) Beyond kinetic traps in RNA folding. Curr. Opin. Struct. Biol., 11, 309-314.
Treiber D.K., Williamson J.R. (1999) Exposing the kinetic traps in RNA folding. Curr. Opin. Struct. Biol., 9, 339-345.
Vicens Q., Cech T.R. (2006) Atomic level architecture of group I introns revealed. Trends Biochem. Sci., 31, 41-51.
Woodson S.A. (2005) Metal ions and RNA folding: a highly charged topic with a dynamic future. Curr. Opin. Chem. Biol., 9, 104-109.
Xiao M., Leibowitz M.J., Zhang Y. (2003) Concerted folding of a Candida ribozyme into the catalytically active structure posterior to a rapid RNA compaction. Nucleic Acids Res., 31, 3901-3908.
Xiao M, Li T, Yuan X, Shang Y, Wang F, Chen S, Zhang Y. (2005) A peripheral element assembles the compact core structure essential for group I intron self-splicing. Nucleic Acids Res., 33, 4602-4611.

Labs and links:

The Comparative RNA Website	Gutell Lab CRW Site
The RNA Families database	Group I Intron
Thomas Cech (catalysis and structure)	Lab homepage
Eric Westhof (structure)	Lab homepage
Scott Strobel (structure)	Lab homepage
Barbara Golden (structure)	Lab homepage
Jennifer A. Doudna (structure)	Lab homepage
Sarah Woodson (folding)	Lab homepage
Daniel Herschlag (folding)	Lab homepage
Yi Zhang (folding and structure)	Lab homepage
Steinar Johansen (phylogeny and distribution)	Lab homepage
Debashish Bhattacharya (intron origin and evolution)	Lab homepage