January 12-16, 2002
Town & Country Convention Center
San Diego, CA
Workshop: Rice
The rice genome holds fundamental information about rice biology,
including its physiology, development, genetics, and molecular evolution.
To accelerate rice genomic and genetic research and broaden its scope, and
to identify quantitative trait loci underlying the vegetative growth and
vigor of Chinese hybrid cultivars derived from indica cultivars, we set out
to sequence the rice genome.
Using a whole-genome shotgun approach, we produced a draft genome sequence
of Oryza sativa L. ssp. indica, the major rice subspecies cultivated in
China. Using a custom-designed software package, RePS (Repeat-masked Phrap
with Scaffolding), we analyzed 3.6 million successful sequencing reads from
93-11, a typical indica cultivar and the paternal cultivar of the vigorous
hybrid rice Liang-Yu-Pei-Jiu (LYP9); masked the repetitive sequences; and
assembled 127,550 contigs with an N50 contig size of 6,688 bp. These
contigs were linked into 102,444 scaffolds with an N50 scaffold size of
11,764 bp. Both N50 sizes exceed the size of an average rice gene (~4.5 kb).
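The N50 statistic quoted above summarizes assembly contiguity: it is the
length L such that contigs (or scaffolds) of length L or longer together
contain at least half of all assembled bases. The Python sketch below
computes it from a list of sequence lengths. It illustrates the standard
definition only and is not taken from RePS, whose internals the abstract
does not describe; the function name `n50` is our own.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; N50 is the
    length at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
```

For contigs of 100, 200, 300, 400, and 500 bp (1,500 bp in total), the two
longest already cover 900 bp, more than half, so
`n50([100, 200, 300, 400, 500])` returns 400.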
The rice genome is estimated to be 464 Mb. The assembled contigs and
scaffolds total 359 Mb and 360 Mb, respectively. The unassembled reads have
an assembled-equivalent size of 104 Mb, 89% of which is repetitive DNA.
Based on STSs, ESTs, and full-length cDNAs, functional coverage by the
assembled contigs alone is estimated at 92%.
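The size figures above are mutually consistent: the scaffold total plus
the assembled-equivalent size of the unassembled reads equals the 464 Mb
estimate, and the repeat share of the unassembled reads corresponds to
roughly 93 Mb. This is only a consistency check on the reported numbers;
the abstract does not state how the 464 Mb estimate itself was derived.

```latex
\begin{aligned}
\underbrace{360\ \text{Mb}}_{\text{scaffolds}}
  + \underbrace{104\ \text{Mb}}_{\text{unassembled reads}} &= 464\ \text{Mb} \\
0.89 \times 104\ \text{Mb} &\approx 92.6\ \text{Mb of repetitive DNA in unassembled reads}
\end{aligned}
```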
We focused our initial analyses on the dynamics of genome composition and
found that: (1) Unlike the A. thaliana genome, the rice genome has
undergone a major compositional transition, resulting in genes (and some
introns) that are remarkably GC-rich. (2) In the majority of rice genes,
the protein-coding region of the transcript shows a negative GC content
gradient, declining from the 5' end toward the 3' end; this pattern was
also confirmed in other Gramineae. (3) 42% of the rice genome was masked
as repetitive using exact 20-mer repeats identified by RePS. The high
proportion of identifiable repetitive sequences and their high copy
numbers suggest that a significant fraction of the repeats in the rice
genome are of recent origin or biologically active. (4) The rice genome
landscape resembles that of A. thaliana and other plants: a significant
portion of the genome is intergenic and crammed with transposable elements
(TEs), in contrast to animal genomes, in which most of the sequence is
transcribed.
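Finding (2) can be made concrete by measuring GC content in consecutive
windows along a coding sequence and fitting a least-squares line to the
window values; a negative slope indicates GC content falling toward the
3' end. The Python sketch below does exactly that. The default window size
(90 bp) and the function names are illustrative assumptions, not
parameters from this study, and the example sequence is synthetic.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc_profile(cds: str, window: int = 90) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows from 5' to 3'.

    Trailing bases that do not fill a complete window are ignored.
    """
    return [gc_fraction(cds[i:i + window])
            for i in range(0, len(cds) - window + 1, window)]


def gc_slope(cds: str, window: int = 90) -> float:
    """Least-squares slope of window GC fraction against window index.

    A negative value means GC content declines toward the 3' end.
    """
    ys = gc_profile(cds, window)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two full windows")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

For the synthetic 12-bp sequence `GGGCGGATATAT` with 4-bp windows, the
window GC fractions are 1.0, 0.5, and 0.0, and `gc_slope` returns -0.5.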
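Finding (3) rests on masking exact 20-mers that recur unusually often. A
minimal Python sketch of that idea counts every 20-mer across a set of
reads, then masks each base of a sequence that is covered by a 20-mer
whose count reaches a threshold. The threshold of 10 and all function
names are hypothetical: the abstract does not give the cutoff RePS uses,
and a real cutoff would presumably need to scale with sequencing depth,
since even a single-copy 20-mer appears in many reads at high coverage.

```python
from collections import Counter


def kmer_counts(reads, k=20):
    """Count every exact k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(seq, counts, k=20, threshold=10, mask_char="N"):
    """Replace bases covered by any k-mer seen >= threshold times with mask_char.

    threshold=10 is a hypothetical cutoff chosen for illustration.
    """
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        # Counter returns 0 for unseen k-mers without inserting them.
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(mask_char if c else base for base, c in zip(seq, covered))


def masked_fraction(masked_seq, mask_char="N"):
    """Fraction of positions equal to mask_char (pre-existing Ns also count)."""
    return masked_seq.count(mask_char) / len(masked_seq) if masked_seq else 0.0
```

Applied to a synthetic 30-bp sequence whose middle 20 bp occur in 12
reads, the middle 20 bases are replaced with N and `masked_fraction`
returns 2/3; if those 20 bp occur in only 5 reads, nothing is masked.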
This unconditional release of the draft sequence, as both raw data and
assembled contigs, will be a fundamental resource for the international
scientific community, facilitating biological and genetic studies of rice.