1 Department of Plant Pathology, 495 Borlaug Hall, University of Minnesota, St. Paul, MN, USA
2 National Center for Genome Resources, 1800-A Old Pecos Trail, Santa Fe, NM 87505, USA
More than 150,000 plant nucleotide sequences had been deposited in public databases as of September 1998. Using these sequences as a starting point, we were able to examine a large number of plant resistance genes (R-genes) and R-gene-like sequences computationally, without waiting for the laboratory isolation and phenotypic characterization of additional R-genes. All public sequences showing similarity to known plant R-genes could be harvested electronically, and a thematic R-gene database could be created and updated automatically. These sequences could then be quantified, queried, and visualized to reveal novel features of R-genes. We carried out a computational analysis of more than 570 publicly available R-gene and R-gene-like sequences using a combination of physical genomic, motif structure, and phylogenetic tools. In the process, we discovered that R-genes contain two distinct types of nucleotide binding site (NBS) domain and that the relative abundance of different R-gene classes differs among plant taxa, including Arabidopsis and Oryza. The analysis also allowed us to predict the number of R-gene-like sequences and R-gene clusters that will eventually be discovered in Arabidopsis. Finally, we used the results of this computational analysis to develop testable hypotheses that can now be pursued in the laboratory.