PAG-XV  Plant & Animal Genomes XV Conference

January 13-17, 2007
Town & Country Convention Center
San Diego, CA



W46: Bioinformatics


Gnomon – NCBI Gene Prediction Tool For Eukaryotic Genomes

Alexander Souvorov

National Center for Biotechnology Information, NIH, 45 Center Drive, Bethesda, MD 20892-6510

A new gene prediction tool for eukaryotic genome annotation, Gnomon, has been developed at NCBI; it aims to make use of all available experimental data. The core algorithm is based on GENSCAN, which uses a 3-periodic fifth-order Markov model for the coding propensity score within a generalized Hidden Markov Model, and incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as the length distributions and compositional features of exons, introns and intergenic regions.

The NCBI gene prediction method combines homology searching with ab initio modeling. Ab initio modeling is used in three ways: a) ab initio scores are used to evaluate precomputed alignments and to locate the optimal coding regions within them; b) when an alignment is partial, ab initio prediction is used to extend it; and c) when no experimental information is available, pure ab initio prediction is used.

The genome annotation pipeline starts by collecting the homology search data set, which consists of all transcript data from the organism of interest (and sometimes from closely related organisms) together with a protein target set. The protein target set usually includes all available protein sequences from the analyzed organism and several sets of known proteins from well-studied complete genomes. The collected sequences are aligned to the genomic sequences using BLAST programs. Several other programs, also developed at NCBI, are involved in the process. A program called Compart analyzes the BLAST hits to find the approximate positions of the target sequences on the genome, called compartments. For each compartment, a global spliced alignment is generated using Splign for transcript sequences and ProSplign for proteins. The alignments are fed into Chainer, a program that combines partial alignments into longer chains, ideally full-length ones. Finally, Gnomon decides whether the chains represent full-length models and extends them if needed.
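
The coding propensity score mentioned above can be illustrated with a minimal sketch of a 3-periodic fifth-order Markov model scored against a homogeneous fifth-order background model as a log-likelihood ratio. This is a toy illustration of the idea, not GENSCAN's or Gnomon's implementation. Assumptions not taken from the abstract: training sequences start at the first base of a codon, add-one pseudocounts are used, the training data are invented, and the names `train`, `log_prob` and `coding_score` are hypothetical.

```python
import math
from collections import defaultdict

ORDER = 5  # fifth order: each base is conditioned on the 5 preceding bases


def train(seqs, periodic):
    """Count base occurrences per (phase, 5-base context).

    periodic=True gives a 3-periodic model whose phase is the codon position
    (sequences are assumed to start at the first base of a codon);
    periodic=False gives a homogeneous model with a single phase 0."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(ORDER, len(s)):
            phase = i % 3 if periodic else 0
            counts[(phase, s[i - ORDER:i])][s[i]] += 1
    return counts


def log_prob(counts, seq, periodic, frame=0, pseudo=1.0):
    """Log-likelihood of seq (first base at codon position `frame`), with pseudocounts."""
    total = 0.0
    for i in range(ORDER, len(seq)):
        phase = (i + frame) % 3 if periodic else 0
        ctx = counts.get((phase, seq[i - ORDER:i]), {})  # unseen context -> uniform 1/4
        n = sum(ctx.values())
        total += math.log((ctx.get(seq[i], 0) + pseudo) / (n + 4 * pseudo))
    return total


def coding_score(coding, noncoding, seq, frame=0):
    """Log-likelihood ratio: > 0 means seq looks more coding than noncoding."""
    return (log_prob(coding, seq, periodic=True, frame=frame)
            - log_prob(noncoding, seq, periodic=False))


# Toy training data, invented for illustration only.
coding = train(["GCCGCG" * 10], periodic=True)
noncoding = train(["ATTATA" * 10], periodic=False)
print(coding_score(coding, noncoding, "GCCGCG" * 3))  # positive
print(coding_score(coding, noncoding, "ATTATA" * 3))  # negative
```

The phase index is what makes the model 3-periodic: the same 5-base context can predict different next bases depending on the codon position, which is how such a model captures codon structure that a homogeneous model cannot.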

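The abstract does not describe how Compart groups BLAST hits into compartments, so the following is only a hypothetical sketch of the general idea: hits are grouped when they lie on the same genomic sequence and strand, are separated by gaps no longer than a plausible intron, and advance colinearly along the query. The `Hit` fields, the `compartments` function and the `max_intron` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One BLAST hit of a query against a genomic sequence (hypothetical fields).

    Coordinates are 0-based and end-exclusive; strand is +1 or -1."""
    subject: str
    strand: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def compartments(hits: List[Hit], max_intron: int = 50_000) -> List[List[Hit]]:
    """Greedily split hits into groups that could be one copy of the query."""
    result: List[List[Hit]] = []
    current: List[Hit] = []
    for h in sorted(hits, key=lambda h: (h.subject, h.strand, h.s_start)):
        if current:
            prev = current[-1]
            same_locus = (prev.subject == h.subject and prev.strand == h.strand
                          and h.s_start - prev.s_end <= max_intron)
            # Plus strand: the query advances with the genome; minus strand: it retreats.
            if h.strand > 0:
                colinear = h.q_start >= prev.q_end
            else:
                colinear = h.q_end <= prev.q_start
            if not (same_locus and colinear):
                result.append(current)
                current = []
        current.append(h)
    if current:
        result.append(current)
    return result


hits = [
    Hit("chr1", +1, 0, 100, 1_000, 1_100),      # exon 1 of one copy
    Hit("chr1", +1, 100, 250, 3_000, 3_150),    # exon 2 of the same copy
    Hit("chr1", +1, 0, 100, 900_000, 900_100),  # a second, distant copy
]
print([len(c) for c in compartments(hits)])  # [2, 1]
```

Requiring strictly non-overlapping query ranges is a simplification; real hits often overlap slightly at exon boundaries, so a practical version would allow some tolerance before each compartment is passed on to spliced alignment.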
