P2
In this report we present an analysis of our progress thus
far in sequencing the Arabidopsis thaliana genome. These
efforts are directed toward the short arm of chromosome IV
and a central region of chromosome V; a comprehensive
report is accessible on the WWW. The genome is
relatively gene-rich, with one gene every 5.2 kbp on average.
Approximately 90% of the genes identified are newly described
for Arabidopsis, and of these, about 55% are newly described
for plants. In addition, about 35% of the genes constructed
with gene prediction algorithms have EST or cDNA matches.
Between 60 and 65% of the identified genes exhibit matches to
known sequences in the public databases, while 35-40% of the
genes are labeled as hypothetical: they either have no
database match or exhibit similarity to hypothetical proteins
from other genome sequencing efforts (notably, S. cerevisiae
and C. elegans). Genome sequencing has also revealed a
clustering of genes such as disease resistance genes (of both
the LRR and MLO types) and protein kinases. Once the genome
sequence is complete, it will become necessary to systematically
study each of the identified genes. Transposon-directed
insertional mutagenesis is a powerful tool for these
functional genomic studies. As a prelude to this post-genome
research, we are working to identify putative
transposable elements and other genetic repeats, such as LINE
elements and solo LTRs. The results to date indicate that the
Arabidopsis genome is more complex than originally
thought.