PAG-X  Plant, Animal & Microbe Genomes X Conference

January 12-16, 2002
Town & Country Convention Center
San Diego, CA


Poster: Genome Sequencing & ESTs
            


EXTRACTING VALUE FROM A SORGHUM EST DATABASE - DATA MINING

Marie-Michele Cordonnier-Pratt1, Alan R. Gingle2, Chun Liang1, Marc Sudman1, Victoria Wentzel1, St. Patrick Reid1, Manish Shah1, Dmitri Kolychev1, Dipinder Keer1, Lee H. Pratt1

1 Department of Botany, University of Georgia, Athens, Georgia, 30602, USA
2 Office of the Vice President for Research, University of Georgia, Athens, Georgia, 30602, USA

Using newly developed wet-lab and bioinformatics pipelines, we have assembled an expressed sequence tag (EST) database containing more than 105,000 high-quality entries. Clones to be sequenced were picked at random from ten unamplified cDNA libraries prepared in lambda ZAPII. All cDNAs were sequenced at both the 3' and 5' ends. After removal of vector sequence and of regions with a PHRED quality score below 16, all sequences longer than 100 nt were deposited in GenBank. The average length of sequences as deposited is ~450 nt; current useful read lengths exceed 500 nt. All data have been assembled in a custom Oracle 8i relational database, permitting rapid and versatile mining. Routine mining is done via a web-accessible interface, while more specific queries can be executed via SQL scripts. Initial clustering of the 3' sequences provides a unigene set approaching 15,000 members. These same data yield an estimate of about 27,000 for the total number of genes in sorghum. Each of the ten individual libraries is predicted to contain cDNAs derived from no more than about 25% of that total. Both the tools used to mine this database and representative results will be presented. Support from the National Science Foundation (DBI-9872649, DBI-0110140) and the University of Georgia's Office of the Vice President for Research is gratefully acknowledged.
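
The quality filter described above (drop regions below PHRED 16, keep reads longer than 100 nt) can be illustrated with a minimal sketch. The abstract does not say how low-quality regions were located, so this example assumes the simplest rule: keep the longest contiguous run of bases that each score at least 16. The function names are hypothetical, and vector removal is not shown.

```python
from typing import List, Optional, Tuple

MIN_PHRED = 16    # quality threshold stated in the abstract
MIN_LENGTH = 100  # only reads longer than this many nt are kept


def longest_high_quality_span(quals: List[int],
                              min_phred: int = MIN_PHRED) -> Tuple[int, int]:
    """Return (start, end) of the longest run of scores >= min_phred.

    The end index is exclusive. Assumption: a per-base threshold is used;
    the original pipeline may have used a different trimming rule.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_phred:
            if run_start is None:
                run_start = i
            # Keep the first run found when two runs tie in length.
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def trim_read(seq: str, quals: List[int]) -> Optional[str]:
    """Trim a read to its best span; return None if it is too short to keep."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    start, end = longest_high_quality_span(quals)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) > MIN_LENGTH else None
```

For example, a read with 10 low-quality bases, then 150 bases scoring 30, then 10 more low-quality bases would be trimmed to the middle 150 nt and retained, whereas a read whose best span is exactly 100 nt would be discarded.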

