P4
Recent plant genome sequencing initiatives, such as the one for Arabidopsis,
involve large-scale sequencing of many overlapping genomic clones, and
identifying the coding regions among these millions of base pairs will be
a formidable task. Information about the consensus sequences
flanking the translation start codon in plants could be very useful in
identifying the putative coding regions of unidentified genes. We have
therefore designed new computer programs that retrieve information on the
translation start codons of thousands of plant genes, and we have
performed a systematic analysis of nucleotide frequencies in the region
flanking the AUG codons. In this survey of the AUG context sequences of
5074 plant genes, purines are present at the -3 and +4 positions in about
80% of the sequences. Compared with the values reported for vertebrate
mRNAs, the proportion of plant mRNAs with a purine at the -3 position is
significantly lower and the proportion with a purine at the +4 position is
remarkably higher. Higher plants have an AC-rich consensus
sequence, caA(A/C)aAUGGCg, as the context of the translation initiation
codon, which is similar to an earlier observation (Joshi CP, NAR 15:
6643-6653, 1987).
Of the two major groups of angiosperms, dicot mRNAs have the AUG context
aaA(A/C)aAUGGCu, which is similar to the higher plant consensus, whereas
monocot mRNAs have the consensus c(a/c)(A/G)(A/C)cAUGGCG, which shows an
overall similarity to the vertebrate consensus. The
experimental evidence regarding the functional importance of the AUG
context in plants will also be discussed.
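
The kind of position-by-position tally described above can be sketched as
follows. This is a minimal illustration, not the programs used in this
survey: the function names, the window size (-5 to +6, with the A of AUG
as +1), the 50% threshold for writing a base in upper case, and the three
toy sequences are all assumptions made for the example.

```python
from collections import Counter

PURINES = {"A", "G"}
UPSTREAM = 5      # assumed window: positions -5 .. -1
DOWNSTREAM = 3    # assumed window: positions +4 .. +6 (AUG itself is +1 .. +3)


def context_window(mrna, start):
    """Return the bases around the AUG beginning at index `start`, or None
    if the window runs off the sequence or `start` is not an AUG."""
    begin, end = start - UPSTREAM, start + 3 + DOWNSTREAM
    if begin < 0 or end > len(mrna) or mrna[start:start + 3] != "AUG":
        return None
    return mrna[begin:end]


def position_counts(windows):
    """Count the bases column by column across equal-length windows."""
    return [Counter(column) for column in zip(*windows)]


def purine_fraction(windows, position):
    """Fraction of windows with A or G at `position` (-3, +4, ...).
    There is no position 0: -1 is followed directly by +1."""
    index = UPSTREAM + position if position < 0 else UPSTREAM + position - 1
    return sum(w[index] in PURINES for w in windows) / len(windows)


def consensus(counts, strong=0.5):
    """Most frequent base per column; upper case if its frequency reaches
    `strong` (an assumed cutoff), lower case otherwise."""
    letters = []
    for column in counts:
        base, n = column.most_common(1)[0]
        frequent = n / sum(column.values()) >= strong
        letters.append(base if frequent else base.lower())
    return "".join(letters)


# Toy (made-up) mRNA fragments with the index of the annotated start codon.
toy = [
    ("CCAAAAAUGGCGUU", 6),
    ("GAACAAUGGCUAA", 5),
    ("UUCACCAUGACGA", 6),
    ("AAUGG", 1),  # too short for the window, so it is skipped
]

windows = [w for seq, start in toy if (w := context_window(seq, start))]
counts = position_counts(windows)

print(len(windows), "usable sequences")
print("purine at -3:", purine_fraction(windows, -3))
print("purine at +4:", purine_fraction(windows, +4))
print("consensus:", consensus(counts))
```

Skipping sequences whose annotated start is too close to the 5' end keeps
every window the same length, so each column always refers to the same
position relative to the AUG.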