Workshop: Bioinformatics
Scientists are confronted with a rapid increase in the amount of sequence information. Across all sequenced species, nearly half of the potential genes cannot be assigned specific roles. As a result, the gap between data collection and interpretation is growing: genome databases contain a wealth of information that is accessible but whose meaning remains hidden. Intelligent systems are needed to bridge this gap. Much can be learned from comparing different genomes, as the genomes of even distantly related organisms encode proteins with high sequence similarity. The order of genes in genomes may also be conserved to some extent. We have used both of these observations to create a multi-functional computational analysis system (genomeSCOUT), which allows rapid identification and functional characterization of genes and proteins through genome comparison. The application is based on the well-established data integration system SRS. Information about different levels of protein homology (e.g. paralogs, orthologs, and clusters of orthologous groups, COGs) and about gene order is collected and stored in five discrete databases. These databases are then queried interactively for genome comparisons. Key benefits: 1. fast handling of large genomic data sets, 2. straightforward access to a multitude of biological databases, 3. unique linking functions between these databases, 4. highly efficient collection of information on genes and proteins, and 5. fully integrated and user-friendly graphical representations of search results. genomeSCOUT can be used for projects as diverse as the correct annotation of genomes, the optimization of microorganisms for production, and the identification of drug targets.
http://www.lionbioscience.com
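
To illustrate the two observations the abstract relies on (protein homology across genomes and partially conserved gene order), the following minimal Python sketch finds runs of neighbouring genes in one genome whose orthologs are also neighbours, in the same order, in a second genome. It is not genomeSCOUT's implementation and does not use SRS; the function name `conserved_blocks`, the gene identifiers, and the ortholog table are hypothetical, and the ortholog assignments are assumed to have been computed beforehand.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (gene in genome A, its ortholog in genome B)


def conserved_blocks(
    genome_a: List[str],
    genome_b: List[str],
    orthologs: Dict[str, str],
    min_len: int = 2,
) -> List[List[Pair]]:
    """Return runs of adjacent genes in genome_a whose orthologs are
    adjacent and in the same order in genome_b.

    Hypothetical helper for illustration only: genomes are plain ordered
    lists of gene IDs, and `orthologs` maps genome A genes to genome B genes.
    """
    # Position of every gene along genome B, for constant-time lookups.
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    blocks: List[List[Pair]] = []
    current: List[Pair] = []

    for gene in genome_a:
        partner = orthologs.get(gene)
        if partner is not None and partner in pos_b:
            # Extend the run only if the ortholog directly follows the
            # previous ortholog in genome B.
            if current and pos_b[partner] == pos_b[current[-1][1]] + 1:
                current.append((gene, partner))
                continue
            new_run = [(gene, partner)]
        else:
            # A gene without a known ortholog interrupts any run.
            new_run = []
        if len(current) >= min_len:
            blocks.append(current)
        current = new_run

    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Made-up example data: a1..a3 and b1..b3 form one conserved block.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b9", "b1", "b2", "b3", "b5"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a5": "b5"}
print(conserved_blocks(genome_a, genome_b, orthologs))
```

This sketch only detects blocks that run in the same direction in both genomes and tolerates no gaps; a practical comparison would presumably also handle inverted blocks and inserted genes.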