PAG-XV  Plant & Animal Genomes XV Conference

January 13-17, 2007
Town & Country Convention Center
San Diego, CA



W42 : Bioinformatics


Automated Eukaryotic Gene Structure Annotation By Combining Diverse Evidences And Leveraging Transcript Alignments

Brian J. Haas, Wei Zhu, Jennifer R. Wortman, C. Robin Buell

  The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, MD 20850, USA

Comprehensive gene identification in complete eukaryotic genomes relies on a combination of bioinformatics methods, primarily sequence homology detection and ab initio gene prediction. The gene structures inferred from the various evidence sources often disagree, and methods are needed to resolve the discrepancies and provide the single best gene structure at each locus. Two software tools, PASA and EVM, were developed at TIGR to facilitate accurate, automated gene structure annotation by evaluating the evidence for gene structures and modeling genes accordingly. EVM (Evidence Modeler) constructs a weighted consensus gene structure from introns and exons derived from both ab initio predictions and sequence alignments. PASA (Program to Assemble Spliced Alignments, http://pasa.sf.net) incorporates nearly perfect alignments of cognate cDNAs into gene structures to add untranslated regions (UTRs), modify exons, and model alternatively spliced isoforms. We describe the algorithms underlying EVM and PASA and illustrate their application with our automated annotation of the complete rice genome.
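
The idea of a weighted consensus over conflicting evidence can be illustrated with a minimal sketch. This is not EVM's algorithm, which the abstract does not detail; it is a simple weighted vote over candidate introns. The function name, the evidence source labels, the weights, and the `min_fraction` cutoff are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def weighted_intron_consensus(evidence, weights, min_fraction=0.5):
    """Keep introns whose summed supporting weight reaches a fraction of the total.

    evidence: maps a source name to a list of (start, end) intron coordinates.
    weights:  maps a source name to a numeric weight (hypothetical values).
    """
    scores = defaultdict(float)
    for source, introns in evidence.items():
        # Count each source at most once per intron.
        for intron in set(introns):
            scores[intron] += weights.get(source, 0.0)
    total = sum(weights.get(source, 0.0) for source in evidence)
    threshold = min_fraction * total
    return sorted(i for i, s in scores.items() if s >= threshold)

# Hypothetical evidence at one locus: the sources disagree on the second intron.
evidence = {
    "ab_initio": [(100, 200), (300, 400)],
    "protein_alignment": [(100, 200)],
    "transcript_alignment": [(100, 200), (310, 400)],
}
# Assumed weights that trust transcript alignments most.
weights = {"ab_initio": 1.0, "protein_alignment": 5.0, "transcript_alignment": 10.0}

consensus = weighted_intron_consensus(evidence, weights)
print(consensus)
```

With these assumed weights, the intron supported only by the ab initio prediction falls below the cutoff, while the one supported by the transcript alignment is retained.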