January 12-16, 2008
Town & Country Convention Center
San Diego, CA
Heidrun Gundlach¹, Georg Haberer¹, Manuel Spannagl¹, Rémy Bruggmann², Klaus F.X. Mayer¹
The structural annotation of large, repeat-rich genomes still poses many bioinformatic challenges: automated and validated annotation of genetic elements, computing time, cross-species data integration and visualization. We have established plant-specific, high-throughput gene and repeat detection pipelines which, together with our generic database system plantsDB (http://mips.gsf.de/projects/plants), provide a comprehensive data resource for a wide range of comparative studies.
Our gene annotation pipeline combines several complementary gene detection programs, each adapted to the individual species with high-quality gene training sets. The primary annotation is enriched by protein domain detection, functional assignment, expression data, and the identification of tandem and segmental duplications and of regions syntenic to related species.
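One simple way to combine the output of several gene finders is majority voting at the exon level. The Python sketch below is illustrative only: the predictor names, the rule that exons must match exactly in coordinates, and the `min_support` threshold are all assumptions, not the documented method of this pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end); coordinate convention is an assumption


def consensus_exons(predictions: Dict[str, List[Exon]], min_support: int = 2) -> List[Exon]:
    """Keep exons predicted with identical coordinates by >= min_support programs."""
    counts: Counter = Counter()
    for exons in predictions.values():
        # Count each exon at most once per predictor, even if listed twice.
        counts.update(set(exons))
    return sorted(exon for exon, n in counts.items() if n >= min_support)


# Hypothetical output of three gene finders for one locus.
predictions = {
    "predictor_a": [(100, 200), (300, 400)],
    "predictor_b": [(100, 200), (310, 400)],
    "predictor_c": [(100, 200), (300, 400), (500, 600)],
}
print(consensus_exons(predictions))  # [(100, 200), (300, 400)]
```

Raising `min_support` trades sensitivity for specificity; a production combiner would typically also weight predictors by their accuracy on the training set and tolerate small boundary differences.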
The predominant components of large plant genomes are repetitive elements, mainly retrotransposons, which cover 30 to more than 80 percent of the sequence. Mobile elements create diversity and are regarded as major players in genome evolution. They degenerate rapidly and tend to insert into other elements, leading to nested structures. Our repeat annotation concept is based on mips-REcat, a general repeat classification catalog, and mips-REdat, an exhaustive database of plant repeat elements. The detection layer of our ANGELA pipeline (Automated Nested Genetic ELement Analysis) combines intrinsic repeat detection approaches with homology-based methods. The processing layer integrates the gene annotations and resolves overlapping element hits; it then identifies nested structures and estimates the insertion times of LTR retrotransposons. The presentation will discuss our gene and repeat detection approaches along with recent results on syntenic regions of grass genomes.
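LTR retrotransposon insertions are commonly dated from the divergence K between the 5' and 3' LTRs, which are identical at the moment of insertion, using T = K / (2r). The sketch below assumes the two LTRs are already aligned to equal length, uses the Kimura two-parameter correction, and defaults to r = 1.3 × 10⁻⁸ substitutions per site per year, a rate often applied to grasses. All of these are assumptions; ANGELA's actual procedure and parameters may differ.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def ltr_insertion_age(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Estimated years since insertion, T = K / (2 * rate); rate is an assumed default."""
    return kimura_2p_distance(ltr5, ltr3) / (2 * rate)


# Hypothetical aligned LTR pair differing by one transition in ten sites.
print(ltr_insertion_age("AAAAAAAAAA", "GAAAAAAAAA"))
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion; the resulting age scales inversely with the chosen rate, so estimates are only as reliable as that calibration.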