PAG-X  Plant, Animal & Microbe Genomes X Conference

January 12-16, 2002
Town & Country Convention Center
San Diego, CA


Bioinformatics: Algorithms
             


CLUSTERING SUCEST: DEFINING AND VALIDATING EST CLUSTERING PIPELINES

Felipe R da Silva1, Guilherme P Telles2, Paulo Arruda5

1 CBMEG - UNICAMP, CxP 6010, 13083-970, Campinas - SP. Brazil
2 LBI - IC - UNICAMP, CxP 6176, 13083-970, Campinas - SP. Brazil

The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has proved to be a very fast and cost-effective method to sample a large number of genes from an organism. To identify the genes represented by these data and to provide additional functional, structural and evolutionary information about those genes, it is necessary to condense single-read sequences by means of clustering or assembly. We present here a clustering pipeline suitable for EST projects of any size. Both the analysis performed to define the clustering parameters and a thorough evaluation of under- and over-clustering, reliability and accuracy of the methodology are based on the 300,000+ reads of the Brazilian Sugarcane EST Project (SUCEST; http://sucest.lbi.ic.unicamp.br). Supported by FAPESP.
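
The abstract does not describe the pipeline's internals, so the following is only a minimal sketch of what clustering single reads and checking for under- and over-clustering can look like. It assumes single-linkage (transitive) clustering over pairwise hits, which may differ from the method actually used for SUCEST. The function names, the thresholds (min_identity, min_overlap), the toy reads and the gene labels are all hypothetical and are not taken from the SUCEST project.

```python
from collections import defaultdict


def cluster_reads(read_ids, hits, min_identity=0.95, min_overlap=100):
    """Single-linkage clustering: reads linked by any qualifying hit share a cluster.

    The thresholds are illustrative assumptions, not the SUCEST parameters.
    """
    parent = {r: r for r in read_ids}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b, identity, overlap in hits:
        if identity >= min_identity and overlap >= min_overlap:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for r in read_ids:
        clusters[find(r)].add(r)
    return sorted(sorted(members) for members in clusters.values())


def clustering_errors(clusters, gene_of):
    """Return (over, under): clusters mixing several genes, genes split across clusters."""
    over = sum(1 for c in clusters if len({gene_of[r] for r in c}) > 1)
    clusters_per_gene = defaultdict(set)
    for index, members in enumerate(clusters):
        for r in members:
            clusters_per_gene[gene_of[r]].add(index)
    under = sum(1 for ids in clusters_per_gene.values() if len(ids) > 1)
    return over, under


# Hypothetical toy data: (read_a, read_b, fractional identity, overlap length in bases).
reads = ["r1", "r2", "r3", "r4", "r5"]
hits = [
    ("r1", "r2", 0.98, 150),
    ("r2", "r3", 0.97, 120),
    ("r4", "r5", 0.90, 200),  # rejected: identity below threshold
    ("r3", "r4", 0.99, 40),   # rejected: overlap too short
]
# Hypothetical reference labels, e.g. from reads of known origin.
gene_of = {"r1": "A", "r2": "A", "r3": "B", "r4": "C", "r5": "C"}

clusters = cluster_reads(reads, hits)
over, under = clustering_errors(clusters, gene_of)
print(clusters, over, under)
```

In this toy case, reads from genes A and B end up in one cluster (over-clustering), while the two reads of gene C stay apart (under-clustering). Tightening or relaxing the two thresholds trades one kind of error against the other, which is why parameter choice needs validation against reads of known origin.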

