January 15-19, 2005
Town & Country Convention Center
San Diego, CA
Stephane M. Rombauts¹, Lieven Sterck¹, Steven Robbens¹, Sven Degroeve¹, Thomas Schiex³, Pierre Rouzé², Yves Van de Peer¹
With the completion of the human genome, considerable resources in hardware, time and personnel have become available for sequencing the genomes of other model organisms of scientific interest. As a result, a wealth of genome sequence data is in the pipeline and will be at the disposal of the scientific community.
Although raw data will be generated at a relatively high pace, annotation platforms, which are the first step towards exploring genomes, need to adapt their strategies to profit from the ever-increasing amount of genome data. To this end, we have developed a series of programs that allows us to exploit the already available genome data to a large extent. These programs are based on our gene prediction platform EuGene, jointly developed at the University of Ghent and INRA-Toulouse. This platform has already been applied successfully to different plant genomes, from green algae to trees.
Currently, we are involved in the annotation of the genome of the green alga Ostreococcus tauri. Although this genome is small, it required a high degree of expert-driven annotation because of its many peculiarities and the lack of close relatives. The poplar annotation, on the other hand, could be achieved in roughly 8 months, including the building of training sets, the training of the software, and the gene prediction itself. When a final assembly and good-quality ESTs or full-length cDNAs are available, we should be able to further reduce the time needed for complete genome annotation to approximately 2.5 months.