PAG-X  Plant, Animal & Microbe Genomes X Conference

January 12-16, 2002
Town & Country Convention Center
San Diego, CA


Workshop: Rice
            


A DRAFT SEQUENCE ASSEMBLY OF THE RICE GENOME (ORYZA SATIVA L. SSP. INDICA)

Jun Yu1, Songnian Hu1, Jun Wang1, Gane Ka-Shu Wong1, Siqi Liu1, Guyang Matthew Huang1, Ming Tao1, Jian Wang1, Lihuang Zhu1, Longping Yuan1, Huanming Yang1

1 Beijing Genomics Institute/Center of Genomics & Bioinformatics, Chinese Academy of Sciences, Beijing 101300, China; Hangzhou Genomics Institute/Institute of Bioinformatics of Zhejiang University/Key Laboratory of Bioinformatics of Zhejiang Province, Hangzhou 310007, China
2 Institute of Genetics, Chinese Academy of Sciences, Beijing 100101, China
3 National Hybrid Rice R & D Center, Changsha 410125, China
4 University of Washington Genome Center, Department of Medicine, Seattle, WA 98195, U.S.A.

The rice genome holds fundamental information about rice biology, including physiology, development, genetics, and molecular evolution. To accelerate and broaden the scope of rice genomic and genetic research, and to determine quantitative trait loci for the vegetative growth or vigor of the Chinese hybrid cultivars from indica cultivars, we set out to sequence the rice genome. Using a whole-genome shotgun approach, we produced a draft genome sequence of Oryza sativa L. ssp. indica, the major crop rice subspecies in China. Using a custom-designed software package (RePS, Repeat-masked Phrap with Scaffolding), we analyzed 3.6 million successful sequencing reads (from 93-11, a typical indica cultivar and the paternal cultivar of a vigorous hybrid rice, Liang-You-Pei-Jiu, LYP9), masked the repetitive sequences, and assembled 127,550 contigs with an N50 contig size of 6,688 bp. These contigs were linked into 102,444 scaffolds with an N50 scaffold size of 11,764 bp. Both N50 sizes are larger than the average rice gene (~4.5 kb). The rice genome is estimated to be 464 Mb. Assembled contigs and scaffolds total 359 Mb and 360 Mb, respectively. Unassembled reads have an assembled-equivalent size of 104 Mb, but 89% of that is repetitive DNA. Functional coverage in the assembled contigs alone, based on STSs, ESTs, and full-length cDNAs, is estimated to be 92%. We focused our initial analyses on genome composition dynamics and found the following: (1) Unlike the A. thaliana genome, the rice genome has undergone a major compositional transition, resulting in genes (and some introns) that are remarkably GC-rich. (2) A negative GC-content gradient was found in the protein-coding region of the transcripts of the majority of rice genes (also confirmed in Gramineae). (3) 42% of the rice genome was masked as repetitive using exact 20-mer repeats identified by RePS. The high proportion of identifiable repetitive sequences and their high copy numbers suggest that a significant fraction of the repeats in the rice genome are recent in origin or biologically active. (4) The rice genome landscape is similar to that of A. thaliana and other plants in that a significant portion of the genome is intergenic and crammed with transposable elements (TEs), in contrast to animal genomes, in which most of the sequence is transcribed. This unconditional release of the draft sequence, as both raw data and assembled contigs, will be a fundamental resource for the international scientific community, facilitating biological and genetic studies on rice.
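
For readers unfamiliar with two of the terms used in the abstract, the following Python sketch illustrates how an N50 size is conventionally computed and what masking exact repeated 20-mers could look like in principle. It is not the RePS implementation, which the abstract does not describe in detail. The function names, the `min_count` threshold, the `X` mask character, and the choice to ignore reverse-complement strands are all assumptions made for illustration.

```python
from collections import Counter


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mask_repeated_kmers(reads, k=20, min_count=2):
    """Mask every base covered by a k-mer that occurs min_count or more times.

    Hypothetical helper: min_count and the 'X' mask character are
    illustrative assumptions, not RePS parameters. Only the forward
    strand is counted, which a real tool would not do.
    """
    # Count every exact k-mer across all reads.
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    masked_reads = []
    for read in reads:
        masked = list(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] >= min_count:
                masked[i:i + k] = "X" * k  # overwrite the whole k-mer window
        masked_reads.append("".join(masked))
    return masked_reads
```

Applied to the full list of contig lengths, `n50` would return the kind of figure reported above (6,688 bp for contigs, 11,764 bp for scaffolds); the toy masking function uses `k=20` by default to mirror the exact 20-mer criterion mentioned in point (3).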

