January 10-14, 2009
Town & Country Convention Center
San Diego, CA
Alex V Barbosa³, Raphael MOB Sales³, Alan C Andrade¹, Felipe R da Silva¹,²
EST genome projects are a relatively inexpensive way to describe genes. Effectively finding genes of interest, however, is complicated by the overwhelming amount of partial and redundant sequence data generated in such projects. We present here the Web interface to the EST sequence database maintained at Embrapa Recursos Genéticos e Biotecnologia. The database and the interface were originally developed to support the Brazilian Coffee Genome EST project, and later incorporated Coffea canephora EST data contributed by Cornell University (59,721 raw EST sequences, Lin et al., 2005) and the Institut de Recherche pour le Développement, IRD (8,782 raw EST sequences, Poncet et al., 2006). The database is updated as more data become publicly available and can be freely accessed at https://alanine.cenargen.embrapa.br/CoffEST. CoffEST is built by processing the raw chromatogram data and assembling highly similar sequences into UniGenes, as described by Telles and da Silva (2001). UniGene sequences are then compared with all the nucleotide and protein sequences available at GenBank. All data are stored in a PostgreSQL relational database. Searches on this database can be based on homology or on origin. Homology searches include (i) sequence BLAST, (ii) Boolean keyword queries on pre-computed BLAST results, or (iii) browsing the phylogenetic tree of similarities found in GenBank. Origin searches can be as simple as retrieving a sequence by its name (read or UniGene) or by the library it came from. More elaborate searches allow one to find UniGenes exclusively, preferentially or differentially expressed in one library or set of libraries.
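To make the exclusive and preferential expression searches concrete, the following is a minimal sketch in Python, not the CoffEST implementation. It assumes each EST read is available as a (UniGene, library) pair; the function name `classify_unigenes`, the `min_fraction` cutoff and the sample data are hypothetical and chosen only for illustration. Differential expression, which would need a statistical test on read counts, is not covered.

```python
from collections import Counter


def classify_unigenes(reads, target_libraries, min_fraction=0.8):
    """Label UniGenes by how their reads are distributed across libraries.

    reads: iterable of (unigene, library) pairs, one pair per EST read.
    target_libraries: the library or set of libraries of interest.
    min_fraction: hypothetical cutoff for calling a UniGene "preferential".
    """
    target = set(target_libraries)
    total = Counter()   # reads per UniGene, all libraries
    inside = Counter()  # reads per UniGene, target libraries only
    for unigene, library in reads:
        total[unigene] += 1
        if library in target:
            inside[unigene] += 1

    labels = {}
    for unigene, n in total.items():
        k = inside[unigene]
        if k == n:
            # Every read of this UniGene comes from the target libraries.
            labels[unigene] = "exclusive"
        elif k / n >= min_fraction:
            # Most, but not all, reads come from the target libraries.
            labels[unigene] = "preferential"
    return labels


# Illustrative data only; these are not real CoffEST reads or libraries.
reads = (
    [("UG1", "leaf")] * 2
    + [("UG2", "leaf")] * 5 + [("UG2", "root")]
    + [("UG3", "leaf"), ("UG3", "root")]
)
labels = classify_unigenes(reads, ["leaf"])
print(labels)
```

Here UG1 is labeled exclusive to the leaf library, UG2 preferential (5 of 6 reads), and UG3 receives no label. In a relational database the same counts could be obtained with a grouped query over a read table, but the table layout would be an assumption and is not described in this abstract.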