January 12-16, 2002
Town & Country Convention Center
San Diego, CA
Bioinformatics: Databases
The publication of the Arabidopsis genome was a milestone in plant
research: a complete set of genomic sequences with quality annotation
is now available. This has already
facilitated various in-depth analyses. The next step in the
post-genomic era is linking functions to genes, and integrating
various biological data into a database model that permits
computational processing. With this perspective, methods of curation,
data integration, exploration and visualisation as developed for
MAtDB will be presented as our prototype for an organism-based,
integrated biological knowledge base.
The first challenge for any genome database is keeping up with new
data. At the time of publication, over 90% of Arabidopsis genes were
predicted rather than experimentally characterized, and for 40% no EST
data were available. This situation is changing rapidly due to
large-scale EST and cDNA sequencing projects. MIPS has developed a
semi-automatic updating mechanism that will utilize this extrinsic
data to correct gene predictions. Additionally, interfaces have been
built that allow external experts to edit and comment on information
in MAtDB.
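
As an illustration of how extrinsic data can be used to check a gene prediction, the following is a minimal sketch and not the MIPS updating mechanism itself. It assumes that a predicted gene model and a spliced EST alignment are both available as lists of exon coordinates, and it flags the model for curator review when the introns disagree within the region the EST covers. The names GeneModel, introns and review_status are hypothetical.

```python
from dataclasses import dataclass

# (start, end) in genomic coordinates, 1-based and inclusive (assumed convention)
Interval = tuple[int, int]


@dataclass
class GeneModel:
    """A predicted gene model; hypothetical structure for illustration."""
    gene_id: str
    exons: list[Interval]


def introns(exons: list[Interval]) -> set[Interval]:
    """Return the gaps between consecutive exons as a set of intervals."""
    ordered = sorted(exons)
    return {
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    }


def review_status(model: GeneModel, est_exons: list[Interval]) -> str:
    """Classify a predicted model against one spliced EST alignment."""
    est_introns = introns(est_exons)
    if not est_introns:
        # An unspliced EST says nothing about exon-intron boundaries.
        return "no_splice_evidence"
    est_start = min(start for start, _ in est_exons)
    est_end = max(end for _, end in est_exons)
    # Compare only the predicted introns that lie inside the EST-covered region.
    predicted = {
        intron for intron in introns(model.exons)
        if intron[0] >= est_start and intron[1] <= est_end
    }
    return "confirmed" if predicted == est_introns else "needs_review"
```

In a semi-automatic setting, only the models returned as "needs_review" would be passed on to a curator, while confirmed models could be accepted without manual inspection.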
MAtDB aims to compile as much information for each individual gene as
possible. This relies partly on Pedant, the automatic protein
annotation tool developed at MIPS, which provides in-depth
analysis using all currently available and relevant bioinformatics
tools. In addition, experimental data and manual annotation are
integrated. In collaboration with partners worldwide, specialized
information, e.g. on specific gene families, is included. Recently, a
phenotype database has been set up to catalog mutant data. Tools for
handling and interpreting expression profile data are available.
Integration of various databases is also necessary at the query
level. This is achieved by using BioRS, a biological indexing, search
and retrieval system that allows queries across multiple databases,
developed by Biomax Informatics AG. Complex queries for the
uncovering of indirect data relationships (data mining) require a
sophisticated data structure that can represent connections and
relationships between easily accessible data objects. MIPS has
developed GAMS, the generic annotation management system, which
provides an object-oriented data structure and a powerful interface
for applications. This system will, for example, be used to map
Arabidopsis genes onto metabolic pathways, allowing automated pathway
elucidation.
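
The idea of mapping genes onto pathways through an object-oriented data structure can be sketched as follows. This is a hedged illustration only, not the GAMS data model or its interface. It assumes that genes and pathways are both annotated with enzyme classification (EC) numbers and that a shared EC number links a gene to a pathway. The classes Gene and Pathway and the functions map_genes_to_pathways and coverage are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Gene:
    """A gene with the EC numbers assigned to its product (hypothetical)."""
    gene_id: str
    ec_numbers: set[str] = field(default_factory=set)


@dataclass
class Pathway:
    """A metabolic pathway described by the EC numbers of its reactions."""
    name: str
    ec_numbers: set[str]


def map_genes_to_pathways(genes: list[Gene],
                          pathways: list[Pathway]) -> dict[str, set[str]]:
    """Link each pathway to the genes that share at least one EC number."""
    hits: dict[str, set[str]] = defaultdict(set)
    for gene in genes:
        for pathway in pathways:
            if gene.ec_numbers & pathway.ec_numbers:
                hits[pathway.name].add(gene.gene_id)
    return dict(hits)


def coverage(pathway: Pathway, genes: list[Gene]) -> float:
    """Fraction of a pathway's EC numbers covered by at least one gene."""
    annotated = set().union(*(gene.ec_numbers for gene in genes))
    return len(pathway.ec_numbers & annotated) / len(pathway.ec_numbers)
```

A pathway with low coverage points to reactions for which no gene has yet been annotated, which is one kind of indirect relationship that such a structure makes easy to query.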
Display of the information in easy-to-grasp, interactive graphical
applications is a further challenge. MAtDB offers viewers for gene
predictions, proteins, DNA sequences/clones and duplications.