PAG-X  Plant, Animal & Microbe Genomes X Conference

January 12-16, 2002
Town & Country Convention Center
San Diego, CA


Bioinformatics: Databases
             


MIPS ARABIDOPSIS THALIANA DATABASE (MAtDB): ADVANCING FROM GENOME DATA REPOSITORY TO INTEGRATED BIOLOGICAL KNOWLEDGE BASE

Heiko Schoof1, Heidrun Gundlach1, Stephen Rudd1, Paolo Zaccaria1, H. Werner Mewes1, Klaus F.X. Mayer1

1 Institute for Bioinformatics, GSF National Research Center for Environment and Health, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany

The publication of the Arabidopsis genome marks a milestone in plant research: a complete set of genomic sequences with high-quality annotation is now available. This has already facilitated various in-depth analyses. The next step in the post-genomic era is to link functions to genes and to integrate diverse biological data into a database model that permits computational processing. With this goal in mind, we will present the methods of curation, data integration, exploration and visualisation developed for MAtDB as our prototype for an organism-based, integrated biological knowledge base.

The first challenge for any genome database is keeping up with new data. At the time of publication, over 90% of Arabidopsis genes were predicted rather than experimentally characterized, and for 40% no EST data were available. This situation is changing rapidly due to large-scale EST and cDNA sequencing projects. MIPS has developed a semi-automatic updating mechanism that will use this extrinsic data to correct gene predictions. Additionally, interfaces have been built that allow external experts to edit and comment on information in MAtDB.
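The abstract does not describe how the updating mechanism compares transcript evidence with predictions. A minimal sketch of one possible first step, assuming that predicted gene models and spliced EST/cDNA alignments are both reduced to sets of intron coordinates: each model is sorted into a category, and conflicting models are flagged for a curator rather than changed automatically. The names `GeneModel` and `classify_model` and the category labels are hypothetical and are not part of MAtDB.

```python
from dataclasses import dataclass, field

# Hypothetical representation: an intron as (start, end) genomic coordinates.
Intron = tuple[int, int]


@dataclass
class GeneModel:
    gene_id: str
    introns: set[Intron] = field(default_factory=set)


def classify_model(model: GeneModel, est_introns: set[Intron]) -> str:
    """Classify a predicted gene model by its support from spliced EST/cDNA alignments.

    est_introns holds the introns observed in transcripts aligned to this locus.
    This sketch only compares splice junctions; unspliced ESTs are ignored.
    """
    if not est_introns:
        return "no_evidence"          # nothing to compare against
    if est_introns - model.introns:
        return "needs_review"         # transcripts splice where the model does not
    if model.introns <= est_introns:
        return "confirmed"            # every predicted intron is observed
    return "partially_supported"      # consistent, but some introns lack coverage


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    model = GeneModel("example_gene", {(100, 200), (300, 400)})
    print(classify_model(model, {(100, 200)}))   # partially_supported
    print(classify_model(model, {(100, 250)}))   # needs_review
```

Routing conflicts to "needs_review" instead of rewriting the model keeps the process semi-automatic, which matches the role the abstract gives to external experts and curators.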

MAtDB aims to compile as much information as possible for each individual gene. This relies partly on the automatic protein annotation tool Pedant, developed at MIPS, which provides in-depth analysis using all currently available and relevant bioinformatics tools. In addition, experimental data and manual annotations are integrated. In collaboration with partners worldwide, specialized information, e.g. on specific gene families, is included. Recently, a phenotype database has been set up to catalog mutant data. Tools for handling and interpreting expression profile data are also available.

Integration of various databases is also necessary at the query level. This is achieved using BioRS, a biological indexing, search and retrieval system developed by Biomax Informatics AG that allows queries across multiple databases. Complex queries for uncovering indirect data relationships (data mining) require a sophisticated data structure that can represent connections and relationships between easily accessible data objects. MIPS has developed GAMS, the generic annotation management system, which provides an object-oriented data structure and a powerful interface for applications. This system will, for example, be used to map Arabidopsis genes onto metabolic pathways, allowing automated pathway elucidation.
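To illustrate why indirect relationships call for an explicit relationship structure, here is a minimal sketch of a typed graph. Annotation objects are nodes and named relations are edges, and a path query follows relations in order, for example from a gene to the pathways its product takes part in. This is not the GAMS data model or interface. The class `AnnotationGraph`, its methods, the relation names and the object identifiers are all hypothetical placeholders.

```python
from collections import defaultdict


class AnnotationGraph:
    """Minimal typed graph: nodes are annotation objects, edges are named relations.

    Hypothetical illustration only; not the GAMS API.
    """

    def __init__(self) -> None:
        # Maps (source object, relation name) to the set of target objects.
        self._edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def link(self, source: str, relation: str, target: str) -> None:
        """Record that `source` is connected to `target` via `relation`."""
        self._edges[(source, relation)].add(target)

    def follow(self, start: str, path: list[str]) -> set[str]:
        """Return every object reachable from `start` by following `path` relation by relation."""
        frontier = {start}
        for relation in path:
            # .get() avoids inserting empty entries into the defaultdict.
            frontier = {t for node in frontier for t in self._edges.get((node, relation), ())}
        return frontier


if __name__ == "__main__":
    g = AnnotationGraph()
    # Placeholder objects: gene -> protein -> enzyme activity -> pathway.
    g.link("gene_1", "encodes", "protein_1")
    g.link("protein_1", "has_activity", "activity_X")
    g.link("activity_X", "part_of", "pathway_A")
    print(g.follow("gene_1", ["encodes", "has_activity", "part_of"]))  # {'pathway_A'}
```

No single record links the gene to the pathway directly. The connection only emerges by chaining relations across objects, and that is the kind of indirect relationship the abstract associates with data mining over an object-oriented structure.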

Displaying the information in easy-to-grasp, interactive graphical applications is a further challenge. MAtDB offers viewers for gene predictions, proteins, DNA sequences/clones and duplications.

