1 Department of Plant Pathology and Microbiology and Crop Biotechnology Center, Texas A&M University, College Station, Texas 77843
2 Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas 77843
3 W. M. Keck Center for Informatics, Institute of Biosciences and Technology, Texas A&M University System Health Science Center, Houston, Texas 77030
The NSF Plant Genome Research Project, 'Medicago truncatula as the nodal species for comparative and functional legume genomics', is a distributed collaboration among scientists at multiple laboratories and institutions. A major goal is the generation of substantial new DNA sequence data for M. truncatula, both genomic (e.g., BAC-end, PCR-derived) and cDNA (e.g., an estimated 30,000 new EST sequences). The initial focus of the informatics group has been to provide the software tools needed to support the flow of DNA sequence data, from chromatograms generated in individual laboratories to annotated sequences that reside in publicly available national (NCBI) and project databases. An important subsequent step in data analysis is to explore relationships between individual M. truncatula DNA sequences (and their deduced coding regions) and sequences derived from the study of other species (e.g., Arabidopsis, soybean). To this end, we work with end-users on the design of Web-based interfaces to information that is derived from sequence comparisons (e.g., from BLAST analysis) and stored in a relational database developed for the project in Oracle8 (MtDB). In this poster, we review the overall data management architecture and provide examples of the data flow and of Web-accessible views of the annotated data. This work is supported by the National Science Foundation (award #9872664) and the W. M. Keck Foundation (to LE).
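
The step of storing sequence-comparison results in a relational database and querying them for a Web view can be illustrated with a minimal sketch. It is not the MtDB schema: SQLite (via Python's standard sqlite3 module) stands in for Oracle8, and the table name blast_hit, its columns, the six-field tab-separated input layout, and the sequence identifiers are all hypothetical, chosen only to show the idea of loading BLAST hits and retrieving the best-scoring hit for each EST.

```python
import sqlite3

# Hypothetical tab-separated BLAST hits (not real project data):
# query, subject, percent identity, alignment length, e-value, bit score
SAMPLE_HITS = (
    "MtEST0001\tAt_gene_A\t82.5\t240\t1e-60\t220.0\n"
    "MtEST0001\tGm_gene_B\t91.0\t250\t1e-90\t310.0\n"
    "MtEST0002\tAt_gene_C\t75.0\t180\t1e-30\t120.0\n"
)


def load_hits(conn, text):
    """Create the (hypothetical) blast_hit table and insert one row per hit."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blast_hit ("
        " query_id TEXT, subject_id TEXT, pct_identity REAL,"
        " align_len INTEGER, evalue REAL, bit_score REAL)"
    )
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query, subject, pct, length, evalue, bits = line.split("\t")
        rows.append(
            (query, subject, float(pct), int(length), float(evalue), float(bits))
        )
    conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def best_hits(conn):
    """Return (query, subject, bit score) for the top-scoring hit of each query."""
    sql = (
        "SELECT h.query_id, h.subject_id, h.bit_score"
        " FROM blast_hit AS h"
        " WHERE h.bit_score = ("
        "   SELECT MAX(i.bit_score) FROM blast_hit AS i"
        "   WHERE i.query_id = h.query_id)"
        " ORDER BY h.query_id"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the project database
    load_hits(conn, SAMPLE_HITS)
    for row in best_hits(conn):
        print(row)
```

A Web-based view of the kind described above would sit on top of a query like best_hits, formatting each row for display; a production schema would also need keys, indexes, and links to sequence annotation, which are omitted here.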