P3
Sequana Therapeutics, Inc., 11099 N Torrey Pines Rd., La Jolla, CA 92037
The Arabidopsis genome consortium has sequenced >5% of the genome and annotated
1500 proteins. The first 1001 proteins from the contigs submitted by three
North American groups and one European group were reanalyzed using tools from Sequana's
Genome Dragoman™ suite of strategies for high-throughput sequence
analysis and function prediction. Bearing in mind that the sequence data are
incomplete, we estimated how well protein sequences in Arabidopsis are currently
understood. Arabidopsis proteins are highly conserved in evolution, with
>65% of predicted proteins matching entries in public databases from taxa
other than higher plants. For >70%, a cellular role or a general biochemical
function could be predicted based on sequence similarity. Half of the proteins
belong to families of paralogs, and about one-third of those, or 15% of the
whole protein set, belong to plant-specific families, even though some share conserved
motifs with non-plant proteins. These ratios are emerging as invariants in proteome
analysis, as they are consistent with similar numbers obtained for complete
bacterial proteomes. They are unlikely to change as more plant sequence becomes
available, although the repertoire of paralogous families will grow. Of particular
interest are the results of a comparison with the completely sequenced genome of a
blue-green alga, Synechocystis sp. Among the Arabidopsis proteins that
have a likely ortholog in bacteria, about 50% have a protein from
Synechocystis as the best bacterial match. A fraction of these proteins
does not have an established relation to chloroplast function and may represent
displacements of general cellular functions by endosymbiont-derived genes. This
fraction will be characterized in more detail.
Similarity analysis suggested a small number of gene mispredictions, resulting
in artifactual truncations or mergers of protein domains, as well as a relatively
large number of misannotations. Several simple ways to significantly improve
annotation quality will be suggested. High-quality functional annotation is
essential for a better understanding of plant biochemistry and cellular
signalling, as will be demonstrated using epicuticular wax biosynthesis
and other pathways as examples.