PAG-XV  Plant & Animal Genomes XV Conference

January 13-17, 2007
Town & Country Convention Center
San Diego, CA



W45 : Bioinformatics


Self-Training Algorithms For Gene Prediction In Novel Plant And Animal Genomes

Mark Borodovsky

  School of Biology and Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA

Finding new protein-coding genes is one of the most important goals of plant and animal genome sequencing projects. However, this goal is hard to reach because the organization of novel genomes is diverse, and gene finding tools tuned for previously studied genomes are rarely suitable for accurate gene identification in a new eukaryotic genome.
Current ab initio methods require large training sets of validated genes to estimate gene model parameters. Alternative gene finding methods based on cDNA and EST mapping rely on the existence of abundant cDNA and EST data. In practice, these types of data may become available in sufficient amounts only in the rather late stages of a novel genome sequencing project, or even much later.
We have developed a new method that runs gene finding in parallel with estimation of the gene model parameters from unannotated genomic DNA. Combining gene prediction with model parameter estimation in this way follows the path of iterative Viterbi training. Dynamically changing restrictions on the range of model parameters are added to filter out fluctuations that may occur in the initial steps of the algorithm and could otherwise redirect the iteration process away from the biologically relevant point in the parameter space.
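
As a rough illustration of iterative Viterbi training with restricted parameter updates, here is a minimal sketch on a toy two-state (noncoding/coding) hidden Markov model. It is not the gene model or the restriction scheme used in this work: the state set, the initial parameters, the function names, and the `max_step` schedule that limits how far each parameter may move per iteration are all assumptions made for the example.

```python
import math

STATES = ("noncoding", "coding")  # toy state set, assumed for illustration
ALPHABET = "ACGT"

def viterbi(seq, start, trans, emit):
    """Return the most likely state path for seq (log-space Viterbi)."""
    n = len(STATES)
    score = [math.log(start[s]) + math.log(emit[s][seq[0]]) for s in range(n)]
    back = []
    for ch in seq[1:]:
        ptr, new = [], []
        for t in range(n):
            best = max(range(n), key=lambda s: score[s] + math.log(trans[s][t]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][t]) + math.log(emit[t][ch]))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]

def reestimate(seq, path, pseudo=1.0):
    """Re-estimate parameters by counting along the decoded path."""
    n = len(STATES)
    emit_counts = [{c: pseudo for c in ALPHABET} for _ in range(n)]
    trans_counts = [[pseudo] * n for _ in range(n)]
    for i, (ch, s) in enumerate(zip(seq, path)):
        emit_counts[s][ch] += 1
        if i + 1 < len(path):
            trans_counts[s][path[i + 1]] += 1
    emit = [{c: v / sum(row.values()) for c, v in row.items()} for row in emit_counts]
    trans = [[v / sum(row) for v in row] for row in trans_counts]
    return trans, emit

def clamp_row(old, new, max_step):
    """Limit each probability to old +/- max_step, then renormalize."""
    clamped = [min(max(v, o - max_step), o + max_step) for o, v in zip(old, new)]
    clamped = [max(v, 1e-6) for v in clamped]  # keep logs finite
    total = sum(clamped)
    return [v / total for v in clamped]

def viterbi_training(seq, trans, emit, iterations=10, start=(0.5, 0.5)):
    """Alternate decoding and restricted re-estimation until the path is stable."""
    path = None
    for it in range(iterations):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:  # decoding no longer changes: stop
            break
        path = new_path
        new_trans, new_emit = reestimate(seq, path)
        max_step = 0.05 * (it + 1)  # hypothetical schedule: loosen over time
        trans = [clamp_row(o, n, max_step) for o, n in zip(trans, new_trans)]
        emit = [
            dict(zip(ALPHABET, clamp_row([o[c] for c in ALPHABET],
                                         [n[c] for c in ALPHABET], max_step)))
            for o, n in zip(emit, new_emit)
        ]
    return path, trans, emit

# Toy input: an AT-rich stretch, a GC-rich stretch, and another AT-rich stretch.
seq = "ATATATTTAATA" * 3 + "GCGGCGCCGCGG" * 3 + "ATTATAATTTAT" * 3
init_trans = [[0.9, 0.1], [0.1, 0.9]]
init_emit = [
    {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # noncoding guess
    {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # coding guess
]
path, trans, emit = viterbi_training(seq, init_trans, init_emit)
```

The clamp keeps early, noisy re-estimates from moving the parameters too far in one step. A realistic gene finder would use a much richer model (exons, introns, splice sites, state durations) and restrictions designed for that model.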
Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training is carried out separately from gene prediction. The new method was used to predict genes in several novel plant and animal genomes.

