PAG-XVI  Plant & Animal Genomes XVI Conference

January 12-16, 2008
Town & Country Convention Center
San Diego, CA



P6 : Genome Sequencing & ESTs


Eukaryotic Ultra Conserved Orthologs And Estimation Of Gene Capture In EST Libraries

Alexander Kozik1, Marta Matvienko1, Ivan Kozik1, Hans van Leeuwen1, Allen Van Deynze2, Richard Michelmore1

1  The Genome Center, University of California Davis, CA 95616
2  Seed Biotechnology Center, University of California Davis, CA 95616

As the number of ESTs (expressed sequence tags) for a particular species increases, estimates of gene capture based on the number of assembled unigenes can be misleading for several reasons: multiple splice variants per gene, allelic variation within and between genotypes, sequencing errors, and partial coverage of gene length. The current number of unigenes for Arabidopsis thaliana and Oryza sativa ESTs exceeds 150,000 sequences per species. We developed a new approach to estimate gene coverage in EST libraries, based on the fraction of Ultra Conserved Orthologs (UCOs) represented in a library. UCOs are single-copy genes that are common to eukaryotic organisms. We have identified about 300 UCOs in the sequenced genomes of model plants, mammals, insects and nematodes. UCOs represent multiple functional categories and a broad spectrum of expression levels, and UCO expression profiles are similar between species. Thus, this set of UCOs, which covers approximately 1% of a eukaryotic transcriptome, can serve as a control set of sequences for estimating gene coverage. Estimating the UCO fraction from a BLAST search against a set of ESTs is a reliable method for determining the depth of a particular library. Using this approach, we analyzed EST libraries from 18 species generated by the Compositae Genome Project (http://compgenomics.ucdavis.edu/). EST library normalization by duplex-specific nuclease (Evrogen) dramatically increased gene discovery and reduced the sequencing depth required to capture up to 75% of the transcriptome to 40,000 reads.
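
The UCO-fraction estimate described above can be illustrated with a minimal sketch. It assumes the UCO sequences were used as BLAST queries against the EST set and that results are in standard tab-separated tabular format (12 columns, query ID first, E-value in column 11). The function name `uco_fraction`, the E-value cutoff of 1e-10, and the toy identifiers are assumptions made for illustration and are not taken from the abstract.

```python
def uco_fraction(blast_lines, uco_ids, max_evalue=1e-10):
    """Return the fraction of UCOs with at least one significant hit.

    blast_lines: iterable of lines in BLAST tabular format
                 (qseqid, sseqid, ..., evalue, bitscore).
    uco_ids:     identifiers of the UCO query sequences.
    max_evalue:  hypothetical significance cutoff (an assumption).
    """
    uco_ids = set(uco_ids)
    if not uco_ids:
        raise ValueError("uco_ids must not be empty")

    found = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        query_id = fields[0]
        evalue = float(fields[10])
        if query_id in uco_ids and evalue <= max_evalue:
            found.add(query_id)

    return len(found) / len(uco_ids)


# Toy data: four hypothetical UCOs, two of which have a significant EST hit.
demo_ucos = ["UCO_001", "UCO_002", "UCO_003", "UCO_004"]
demo_hits = [
    "UCO_001\tEST_17\t92.5\t400\t30\t0\t1\t400\t5\t404\t1e-50\t350",
    "UCO_001\tEST_88\t90.0\t380\t38\t0\t1\t380\t9\t388\t1e-40\t300",
    "UCO_002\tEST_05\t88.0\t350\t42\t0\t10\t359\t1\t350\t3e-30\t250",
    "UCO_003\tEST_42\t60.0\t50\t20\t0\t1\t50\t1\t50\t2e-3\t40",
]
print(uco_fraction(demo_hits, demo_ucos))
```

In this toy case the weak hit for UCO_003 falls above the cutoff and UCO_004 has no hit, so two of four UCOs are counted as captured. A real analysis would need a cutoff and hit criteria chosen for the BLAST program and sequence types actually used.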