S14
Putative proteins encoded by contigs that have been sequenced in the course of
the Arabidopsis genome project were analysed using iterative sequence similarity
searches, feature prediction, and domain dissection. Arabidopsis proteins are
highly conserved in evolution, with ~70% of proteins matching sequences from
taxa other than higher plants. The annotation framework based on the natural
taxonomy of proteins was applied to Arabidopsis proteins, allowing completely
automated and highly accurate assignment of function for ~35% of these matches.
For an almost equal number of matches, at least a general biochemical function
could be predicted by interactive analysis. A small number of exon misassemblies
were detected, as well as a relatively large number of misannotations (errors of
filtering, errors of domain assignment, and errors of ortholog definition).
The taxonomic distribution of the plant-related sequences in the database was
analysed. Of particular interest is the comparison to Synechocystis sp. Among
the Arabidopsis proteins that have a bacterial ortholog, about 50% have a
Synechocystis sequence as the best bacterial match. A fraction of these have no
established relation to chloroplast function and may represent displacement
of pivotal cellular functions by endosymbiont-derived genes. Interestingly,
the highly similar sequences shared by plants and Synechocystis include such
quintessential signal transduction domains as serine-threonine protein kinases and Toll-like
motifs. Protein families that distinguish plants from animals and fungi, including those
truly unique to plants as well as those dramatically expanded in plants compared
to other phyla, were analysed and will be discussed in more detail.
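
The taxonomic tally described above (the share of proteins whose best bacterial match comes from a given taxon) can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function name, the input layout, and the example records are all hypothetical, and having a best bacterial hit is used here only as a rough stand-in for having a bacterial ortholog.

```python
from collections import Counter


def best_bacterial_match_shares(best_hits):
    """Return the fraction of proteins per taxon of best bacterial match.

    best_hits maps a protein identifier to the taxon name of its best
    bacterial database match, or None if it has no bacterial match.
    (Hypothetical input layout, assumed for illustration only.)
    """
    # Keep only proteins that have some bacterial match.
    taxa = [taxon for taxon in best_hits.values() if taxon is not None]
    if not taxa:
        return {}
    counts = Counter(taxa)
    return {taxon: n / len(taxa) for taxon, n in counts.items()}


# Made-up example records, not data from the study.
example_hits = {
    "protein_1": "Synechocystis sp.",
    "protein_2": "Escherichia coli",
    "protein_3": "Synechocystis sp.",
    "protein_4": None,
    "protein_5": "Bacillus subtilis",
    "protein_6": "Synechocystis sp.",
}

shares = best_bacterial_match_shares(example_hits)
```

Proteins without any bacterial match are excluded from the denominator, so each share is relative to the proteins that have a bacterial match, mirroring how the ~50% figure is phrased.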