Group I Intron Sequence and Structure Database

GISSD is the Group I Intron Sequence and Structure Database. This "Help" page explains how to use the web interface; brief instructions also appear on each page.

Home page:

The home page gives an overview of GISSD and states the purpose of the project. It also presents basic information on group I introns, including intron structure, catalysis, folding, distribution, phylogeny and mobility, together with a selection of papers on group I introns. Labs whose research is related to group I introns are listed at the bottom. If you know of a lab that is missing from the list, please contact us.

Search page:

The search page lets users view or search for specific introns. The left part of the page is a hierarchical tree showing the current classification of group I introns. When a major subgroup is chosen, all of its minor groups are selected automatically. The user can view all the introns in the chosen groups directly, or search within them by filling in the query form. Introns can be searched by intron name, organism, accession number, phylogeny, insertion position, intron length, cellular location of the host gene, host gene type, host gene name, whether the intron contains a CDS, whether a 3D structure is available, or any combination of these criteria.

On the result page, each hit is shown on its own line. The sort order of the introns can be chosen in the search form. The information for each intron includes the intron name, subgroup, organism name, insertion position, intron length and accession number, plus links to the intron sequence page (described in the next section), to its secondary structure, and to its GenBank record. The secondary structure figure was generated automatically by RnaViz 2.0 and then adjusted manually to follow the three-domain presentation of a group I intron structure. Structure files are in PDF format, one structure per file, and both pseudoknots (P3-P7 and P1-P10) are shown.

For very long sequences, inserted regions are not drawn in the structure file. Each omitted region is replaced by a number in brackets that gives the length of the insertion removed from the original sequence; these numbers should be read from 5' to 3'.

Sequence page:

On the 'Search page', you can get sequences by browsing or querying. The sequence page provides a quicker way to get them in bulk. For each subgroup, gzipped and zipped FASTA files containing all entries in that subgroup are available for download. Each entry also includes 15 nt of upstream and 15 nt of downstream exon sequence in lowercase, while the intron sequence is in uppercase. The files are easy to parse and process with a Perl script or with the Bio::SeqIO module of the BioPerl package (Bioperl.org).
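The case convention described above can also be used without BioPerl. The following is a minimal Python sketch, not the GISSD authors' own script: it reads FASTA records and separates the lowercase exon flanks from the uppercase intron. The function names and the demo record are hypothetical, and the sketch assumes each record is one lowercase run, one uppercase run and one lowercase run, with nothing else in the sequence.

```python
import io
import re
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_exons_intron(seq: str) -> Tuple[str, str, str]:
    """Split a sequence into (upstream exon, intron, downstream exon).

    Assumes exon flanks are lowercase and the intron is uppercase,
    as described for the per-subgroup FASTA files.
    """
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", seq)
    if match is None:
        raise ValueError("sequence does not follow the lower/UPPER/lower layout")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Hypothetical record, made up for illustration only.
    demo = io.StringIO(">demo_intron\nacgtACGU\nNNGGacg\n")
    for name, seq in read_fasta(demo):
        up, intron, down = split_exons_intron(seq)
        print(name, len(up), len(intron), len(down))
```

For a downloaded gzipped file, the handle could be opened with `gzip.open(path, "rt")` from the Python standard library, where `path` is whatever local file name the download was saved under.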

Structure page:

This page provides bulk downloads of structures, which are also available by browsing or querying on the 'Search page'. The structures are packaged by subgroup, and for each subgroup one gzipped and one zipped file can be downloaded. The structures are in PDF format and can be converted to JPG or other formats with a program such as Adobe Acrobat. An example is shown at the bottom of the page.

Alignment page:

On the alignment page, alignment files for 15 subgroups (IA1, IA2, IA3, IB1, IB2, IB3, IB4, IC1, IC2, IC3, ID, IE1, IE2, IE3 and IE) can be downloaded. The number of introns in each alignment is also shown. Except for the alignments of Li and Zhang (2005), the alignments use a new structural alignment format defined for group I introns, designed to keep as much structural information as possible in the alignment. Its rules are described at the bottom of that page.

Because the rules are clearly defined, the alignments can easily be converted to other formats accepted by existing programs for further analysis. For example, the structure files fed into RnaViz 2.0 are in .ct format and were generated automatically with a simple script written by the authors.

Distribution page:

The distribution page shows how introns are distributed in nature and offers another way to view or search the introns in GISSD. The representation is based on the NCBI Taxonomy database. Each node in the tree is a classification unit and shows its name, its rank and the number of GISSD introns classified under it. The user can set the number of levels shown on one page (4 by default) and can follow the link on each node to reach deeper nodes. Each node also shows its level relative to the root, indicating its depth in the taxonomic tree.

The lineage from the root to the current top node of the tree is shown above the tree. Clicking the last node, shown in red, displays the introns belonging to it; clicking any other node returns to the corresponding upper level.

gIRfam page:

On this page, the user can get both the original and our processed data for the group I intron family in Rfam. Three tab-delimited flat files containing all the information on each intron can be downloaded. Three different taxonomy trees annotated with intron numbers can be browsed, showing the distribution of the introns at a glance.

The 16914 reliable group I introns were classified into subgroups using Infernal with subgroup-specific covariance models (CMs) built from our human-curated alignments. A tab-delimited flat file containing the subgroup assignment for each of them is available.
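A tab-delimited file of this kind can be summarized with a few lines of Python. The sketch below counts how many rows fall into each subgroup. The column layout of the GISSD file is not described on this page, so the column index, the presence of a header row and the demo data are all assumptions to be adjusted after inspecting the real file.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: int, has_header: bool = True) -> Counter:
    """Count rows of a tab-delimited file by the value in one column.

    `column` is a zero-based index; which column holds the subgroup
    is an assumption that must be checked against the actual file.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if there is one
    counts = Counter()
    for row in reader:
        if len(row) > column:  # ignore blank or short lines
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical three-row table, made up for illustration only.
    demo = io.StringIO("intron_id\tsubgroup\nA\tIC1\nB\tIC1\nC\tIE\n")
    for subgroup, n in count_by_column(demo, column=1).most_common():
        print(subgroup, n)
```

The same function works on any of the downloadable flat files once the relevant column has been identified.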

Submission page:

The "Submission" page has a form for uploading sequences that are candidate group I introns, or introns that lack secondary structures. Our researchers will analyze each sequence to check whether it contains an intron, locate its position, predict its secondary structure and classify it. Any information about the intron that the user enters in the form will support our analysis.

When a user submits the form, the submitted information is sent by e-mail to both the researchers and the submitter. The researchers will contact the submitter by e-mail as soon as possible. The final results will be loaded into the database together with the contributors' details.

Any questions and suggestions are welcome!

RNA Research Group at Wuhan University
College of Life Sciences
Wuhan University
Wuhan, 430072
China

Phone: 86-27-68756207
Fax: 86-27-68751945
Email: Prof. Yi ZHANG
Web: http://www.rna.whu.edu.cn

Email to GISSD project group: rna@whu.edu.cn