January 9-13, 2010
Town & Country Convention Center
San Diego, CA
Marco Cristancho1, Edgar F. Salcedo1, Alvaro Gaitan1, Jaime Arcila1, Herb Aldwinckle2, Marcela Yepes2
The exponential growth of genomic data from sequencing projects makes bioinformatics tools for visualization and data analysis essential. To characterize the main gene families in the coffee genome, we compared predicted coffee protein sequences with those of the model organisms Arabidopsis thaliana and Populus trichocarpa. Proteins were predicted from 58,343 public EST sequences from four Coffea species, including 41,985 sequences from C. arabica. We standardized a pipeline for identifying coffee protein families that uses bioinformatics tools including ESTScan, BLASTP, and the clustering algorithms TRIBE-MCL and OrthoMCL. With this pipeline, we identified 8,588 coffee gene families shared with Arabidopsis and Populus, and 4,289 families unique to the genus Coffea, which we are currently analyzing. We also investigated in detail 417 families identified as resistance genes against various pathogens. The assignment of each family member was validated by comparing its annotation with InterProScan and BLAST results. The identified families should also aid the search for agronomically important candidate genes in genomic regions of interest, through the identification of orthologous regions and comparative genomic analysis between coffee and better-characterized model species. We expect that identifying and comparing orthologous and paralogous genes in related species will facilitate evolutionary studies of the coffee gene families of greatest interest. Details of the bioinformatics pipeline and visualization of the gene families are available at http://bioinformatics.cenicafe.org/.
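The abstract does not state the clustering parameters, so the following is only a minimal illustrative sketch of the family-clustering step in the spirit of TRIBE-MCL. The sketch turns BLASTP hits into -log10(E) edge weights and runs Markov clustering (MCL) on a small dense matrix. The (query, subject, evalue) hit format, the weight cap `MAX_WEIGHT`, the inflation value, the pruning threshold, and the toy protein IDs are all hypothetical choices, not the pipeline's actual settings. A real run would use the `mcl` program on sparse all-against-all BLASTP output.

```python
import math
from collections import defaultdict

MAX_WEIGHT = 200.0  # hypothetical cap used when BLASTP reports E = 0


def similarity_weights(hits):
    """Convert BLASTP hits (query, subject, evalue) into symmetric -log10(E) weights."""
    w = defaultdict(dict)
    for q, s, e in hits:
        if q == s:
            continue  # self-hits carry no family information
        score = MAX_WEIGHT if e == 0 else min(MAX_WEIGHT, -math.log10(e))
        if score <= 0:
            continue  # E >= 1: not treated as a meaningful similarity
        best = max(score, w[q].get(s, 0.0))  # keep the strongest hit per pair
        w[q][s] = w[s][q] = best
    return w


def _normalize_columns(m):
    """Make every nonzero column of the square matrix m sum to 1, in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def mcl(nodes, weights, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-9):
    """Markov clustering on a dense matrix; returns sorted lists of node names."""
    nodes = sorted(nodes)
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, nbrs in weights.items():
        for b, x in nbrs.items():
            m[idx[a]][idx[b]] = x
    for j in range(n):  # self-loops weighted by the column maximum (or 1.0)
        m[j][j] = max([m[i][j] for i in range(n)] + [1.0])
    _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: M <- M * M lets flow spread along longer paths.
        sq = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # Inflation: raise entries to a power, renormalize, prune tiny flow.
        new = [[x ** inflation for x in row] for row in sq]
        _normalize_columns(new)
        for row in new:
            for j, x in enumerate(row):
                if x < prune:
                    row[j] = 0.0
        _normalize_columns(new)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Interpretation: nodes linked by any remaining flow share a cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(n):
            if m[i][j] > 0:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, name in enumerate(nodes):
        groups[find(i)].append(name)
    return sorted(groups.values())
```

For example, two tightly related groups joined by one weak hit (E = 1e-2) are expected to separate into two families:

```python
hits = [("A1", "A2", 1e-50), ("A1", "A3", 1e-50), ("A2", "A3", 1e-50),
        ("B1", "B2", 1e-40), ("A3", "B1", 1e-2)]
nodes = {p for q, s, _ in hits for p in (q, s)}
families = mcl(nodes, similarity_weights(hits))
```

The inflation parameter controls granularity. Higher values yield smaller, tighter families. Weighting edges by -log10(E) lets strong homology dominate the flow while weak, possibly spurious hits fade out.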