PAG-XVI  Plant & Animal Genomes XVI Conference

January 12-16, 2008
Town & Country Convention Center
San Diego, CA



W57 : Bioinformatics


How Syntenic Regions Evolve: An Integrated Gene And Repeat Annotation Approach For Comparative Genomics

Heidrun Gundlach1, Georg Haberer1, Manuel Spannagl1, Rémy Bruggmann2, Klaus F.X. Mayer1

1  Munich Information Center for Protein Sequences (MIPS), Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany
2  Plant Genome Initiative at Rutgers (PGIR), Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA

The structural annotation of large, repeat-rich genomes still poses manifold bioinformatic challenges in terms of automated and validated annotation of genetic elements, computing time, cross-species data integration and visualization. We have established plant-specific, high-throughput gene and repeat detection pipelines which, together with our generic database system plantsDB (http://mips.gsf.de/projects/plants), provide a comprehensive data resource for a wide range of comparative studies.
Our gene annotation pipeline combines several complementary gene detection programs, which are adapted to individual species using high-quality gene training sets. The primary annotation is enriched by protein domain detection, functional assignment, expression data, and the identification of tandem and segmental duplications and of regions syntenic to related species.
The predominant components of large plant genomes are repetitive elements, mainly retrotransposons, covering from 30 to more than 80 percent of the sequence. Mobile elements create diversity and are regarded as major players in genome evolution. They degenerate rapidly and tend to insert into other elements, leading to nested structures. Our repeat annotation concept is based on mips-REcat, a general repeat classification catalog, and mips-REdat, an exhaustive database of plant repeat elements. The detection layer of our ANGELA pipeline (Automated Nested Genetic ELement Analysis) combines intrinsic repeat detection approaches with homology-based methods. The processing layer integrates gene annotations and resolves element overlaps, then identifies nested structures and estimates the timing of LTR retrotransposon insertions. The presentation will discuss our gene and repeat detection approaches along with recent results on syntenic regions of grass genomes.
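As an illustration only, the two processing steps named above (identifying nested structures and timing LTR retrotransposon insertions) can be sketched in Python. This is not the ANGELA implementation, which the abstract does not describe. The sketch assumes that nesting can be read off from full containment of annotation coordinates, and it swaps in a commonly used dating approach: the divergence between the two LTRs of one element, corrected with the Jukes-Cantor model and divided by twice a substitution rate. The names Element, find_nested and ltr_insertion_age are hypothetical, and the default rate of 1.3e-8 substitutions per site per year is an assumed illustrative value that would have to be chosen per species.

```python
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class Element:
    """One annotated repeat element (hypothetical minimal record)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def find_nested(elements):
    """Return (inner, host) name pairs for elements fully contained in another.

    Assumption: an element is 'nested' if its coordinates lie entirely
    within a longer element; the smallest such container is its host.
    """
    pairs = []
    for inner in elements:
        hosts = [
            e for e in elements
            if e is not inner
            and e.start <= inner.start
            and inner.end <= e.end
            and e.length > inner.length
        ]
        if hosts:
            host = min(hosts, key=lambda e: e.length)
            pairs.append((inner.name, host.name))
    return pairs


def ltr_insertion_age(ltr5, ltr3, rate=1.3e-8):
    """Estimate insertion age in years from two aligned LTR sequences.

    ltr5 and ltr3 must be pre-aligned and of equal length; columns that
    contain a gap ('-') are skipped. The rate is an assumed value.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for Jukes-Cantor correction")
    k = -0.75 * log(1.0 - 4.0 * p / 3.0)  # Jukes-Cantor distance
    return k / (2.0 * rate)               # both LTRs diverge independently


if __name__ == "__main__":
    annotation = [
        Element("A", 1, 10000),
        Element("B", 2000, 5000),
        Element("C", 3000, 4000),
    ]
    print(find_nested(annotation))

    five_prime = "ACGT" * 25
    three_prime = "T" + five_prime[1:50] + "A" + five_prime[51:]
    print(round(ltr_insertion_age(five_prime, three_prime)))
```

The dating step relies on the two LTRs of an element being identical at the time of insertion, so their present divergence reflects the time elapsed since then. Choosing the smallest container as host keeps multi-level nesting readable as a chain (C in B, B in A) rather than listing every enclosing element.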