1 The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850. 2 Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850.
We report the sequence of chromosome 2 (19.6 Mb) from the top-arm Nucleolar Organizing Region (NOR) to the bottom-arm telomere, omitting approximately 1 Mb of highly repetitive centromeric sequence. The sequence is assembled in two contigs of 3.6 Mb (top arm) and 16 Mb (bottom arm); the latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism. Chromosome 2 is 45% larger than originally predicted and encodes 4037 genes, 48% of which have no known function. Many tandem gene duplications were found; the largest consists of nine copies of a putative short-chain dehydrogenase/reductase. Large-scale duplications of approximately 0.5 Mb and 4.5 Mb were discovered between chromosome 2 and chromosome 1 and between chromosome 2 and chromosome 4, respectively. The larger of these two regions encompasses over 1000 genes, approximately 30% of which are still closely related to their partners in the duplicated region. Mapping data permitted the identification and subsequent sequencing of bacterial artificial chromosomes (BACs) constituting nearly 2 Mb of sequence within the genetically defined centromere. This region contains a low density of recognizable genes and a high density of diverse, vestigial and presumably inactive mobile elements. More surprising is what appears to be a recent insertion of a continuous stretch covering a large portion of the mitochondrial genome. These features suggest that completing the remaining chromosomes will provide further insights into the origins and organization of a higher plant genome.