January 12-16, 2008
Town & Country Convention Center
San Diego, CA
Takeshi Itoh, Tsuyoshi Tanaka
In 2004 the International Rice Genome Sequencing Project (IRGSP) determined the complete genome sequence of Oryza sativa L. ssp. japonica cultivar Nipponbare, and an updated assembly, build 4, has recently been released. In addition, 581,446 5'- or 3'-end sequences of rice full-length cDNA (FLcDNA) clones were produced. Using these new data, the Rice Annotation Project (RAP) updated the annotation of the rice genome. The new annotation data set is freely available through the RAP-DB (http://rapdb.dna.affrc.go.jp/). Because a large number of 5'-end sequences were also available for Arabidopsis FLcDNAs, we conducted comparative sequence analyses of transcription start sites (TSSs) between the two species. The rice end sequences obtained from full-length clones were aligned to the IRGSP build 4 assembly with BLAST and est2genome. The Arabidopsis end sequences were mapped to the Arabidopsis genome in the same way. As a result, 45,917 rice TSSs were identified in 24,211 loci, and 35,313 Arabidopsis TSSs in 16,964 loci. Our analysis of nucleotide features in the 5'-upstream regions revealed that both rice and Arabidopsis genes had a clear TATA-like signal at the most upstream TSS, whereas the downstream TSSs displayed somewhat unusual patterns. We also examined the relationship between the number of TSSs and protein sequence conservation. The number of TSSs increased with sequence conservation: the more TSSs a gene had, the more conserved its protein was. Expression of rapidly evolving genes appears to be regulated only in limited cases, whereas conserved genes might be expressed in diverse conditions and tissues through different promoters. The evolutionary significance of TSS diversity between rice and Arabidopsis is discussed.
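As an illustration of how TSS counts per locus, such as those reported above, could be derived from mapped 5' ends, the following minimal Python sketch groups 5'-end coordinates into TSSs for each locus. The abstract does not state how neighboring 5' ends were merged, so the `window` threshold, the function names, the input tuples, and the locus names are all assumptions made for this example and are not the RAP procedure.

```python
from collections import defaultdict


def cluster_tss(positions, window=10):
    """Group 5'-end coordinates into candidate TSSs.

    A new cluster starts whenever the gap to the previous coordinate
    exceeds `window` (a hypothetical threshold, not taken from the abstract).
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def tss_per_locus(mapped_ends, window=10):
    """Map each locus to its list of TSS clusters.

    `mapped_ends` is an iterable of (locus_id, coordinate) pairs, i.e. the
    genomic positions at which FLcDNA 5' ends were aligned (assumed format).
    """
    by_locus = defaultdict(list)
    for locus, pos in mapped_ends:
        by_locus[locus].append(pos)
    return {locus: cluster_tss(p, window) for locus, p in by_locus.items()}


def most_upstream(clusters, strand):
    """Return the most upstream cluster: lowest coordinates on the '+'
    strand, highest coordinates on the '-' strand."""
    return clusters[0] if strand == "+" else clusters[-1]
```

Counting the clusters per locus then gives the number of TSSs for each gene, which could in turn be compared with a protein conservation measure.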