Workshop: Arabidopsis
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study for many researchers around the world. In addition to the large body of genetic, physiological, and biochemical data gathered for this plant, its genome will be the first higher plant genome to be completely sequenced, at the end of 2000. The sequencing effort is coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for the 250,000 species of higher plants. To maximize the use of knowledge gathered for this plant, there is a need for a comprehensive database and an information retrieval and analysis system that provide user-friendly access to all information about the organism. This paper describes the initial steps we have taken toward these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org). The basic structure of the database covers three things: relationships among data objects (clones, genes, sequences, genetic markers, polymorphisms, transcripts, etc.), annotation of the data objects (function, map position, expression, etc.), and attribution of the data objects (source of data, update history, and references). The underlying database uses an industry-standard relational database management system (Sybase). The basic data access functions are searching, browsing, updating, and downloading the data. In addition, we have developed tools, built with Java Servlet technology, to graphically visualize comprehensive map information for Arabidopsis (TAIR MapViewer: www.arabidopsis.org/servlets/mapper) and sequence annotation of the completed genome (TAIR SequenceViewer).
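The database structure described above (data objects, relationships among them, annotation, and attribution) can be illustrated with a minimal relational sketch. This is not the TAIR schema: every table and column name, the helper functions build_demo_db and describe, and the placeholder records GENE_A and CLONE_1 are assumptions made up for illustration, and Python's built-in sqlite3 module stands in for Sybase so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical four-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per data object (gene, clone, marker, ...).
        CREATE TABLE data_object (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            name TEXT NOT NULL UNIQUE
        );
        -- Directed relationships between two data objects.
        CREATE TABLE relationship (
            subject_id INTEGER NOT NULL REFERENCES data_object(id),
            object_id  INTEGER NOT NULL REFERENCES data_object(id),
            relation   TEXT NOT NULL
        );
        -- Annotation: function, map position, expression, ...
        CREATE TABLE annotation (
            object_id INTEGER NOT NULL REFERENCES data_object(id),
            category  TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        -- Attribution: source of data, update history, reference.
        CREATE TABLE attribution (
            object_id INTEGER NOT NULL REFERENCES data_object(id),
            source    TEXT NOT NULL,
            updated   TEXT NOT NULL,
            reference TEXT
        );
    """)
    # Placeholder records only; none of these values are real data.
    conn.executemany(
        "INSERT INTO data_object (id, kind, name) VALUES (?, ?, ?)",
        [(1, "gene", "GENE_A"), (2, "clone", "CLONE_1")],
    )
    conn.execute(
        "INSERT INTO relationship VALUES (?, ?, ?)",
        (1, 2, "is_contained_in"),
    )
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        (1, "function", "placeholder function"),
    )
    conn.execute(
        "INSERT INTO attribution VALUES (?, ?, ?, ?)",
        (1, "placeholder lab", "2000-01-01", None),
    )
    conn.commit()
    return conn


def describe(conn, name):
    """Collect annotation, attribution, and related objects for one name."""
    row = conn.execute(
        "SELECT id, kind FROM data_object WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    obj_id, kind = row
    annotations = dict(conn.execute(
        "SELECT category, value FROM annotation WHERE object_id = ?",
        (obj_id,),
    ).fetchall())
    sources = [r[0] for r in conn.execute(
        "SELECT source FROM attribution WHERE object_id = ? ORDER BY source",
        (obj_id,),
    )]
    related = conn.execute(
        "SELECT r.relation, o.name FROM relationship r "
        "JOIN data_object o ON o.id = r.object_id "
        "WHERE r.subject_id = ?",
        (obj_id,),
    ).fetchall()
    return {"kind": kind, "annotations": annotations,
            "sources": sources, "related": related}
```

Calling describe(build_demo_db(), "GENE_A") gathers the three kinds of information for a single object in one place. Keeping annotation and attribution in separate tables keyed by object is one plausible way to let every data type share the same provenance and update-history mechanism; the actual design may differ.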
The major goals for the upcoming year are to: 1) enrich the database by populating it with curated data, which currently exist in widely varying formats and come from many sources; 2) iterate on the database and user interface design to include the new types of data coming from functional genomics projects; and 3) improve ways of accessing, visualizing, and analyzing increasingly complex data types in more integrated ways. Toward these ends, we are developing guidelines for nomenclature and a controlled vocabulary to name and describe data objects. In this respect, we are working closely with other databases and groups, including the Arabidopsis Functional Genomics Consortium (AFGC), the Arabidopsis Biological Resource Center (ABRC), The Institute for Genomic Research (TIGR), the Gene Ontology (GO) consortium, and MaizeDB.