PAG-XIII  Plant & Animal Genomes XIII Conference

January 15-19, 2005
Town & Country Convention Center
San Diego, CA



P020 : Genome Sequencing & ESTs


Modeling Assembly Of The Maize Genome

Sarah Towey1, Steve Rounsley1, Christina Raymond1, David Jaffe1, Arvind K. Bharti2, Heidrun Gundlach3, Georg Haberer3, Heiko Schoof3, Cari Soderlund4, Rod A. Wing5, Klaus F.X. Mayer3, Joachim Messing2, Bruce Birren1, Chad Nusbaum1

1  Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
2  The Plant Genome Initiative at Rutgers (PGIR), Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
3  Munich Information Center for Protein Sequences (MIPS), Institute for Bioinformatics, GSF Research Center for Environment and Health, Neuherberg, Germany
4  Arizona Genomics Computational Laboratory (AGCoL), University of Arizona, Tucson, AZ 85721, USA
5  Arizona Genomics Institute (AGI), 303 University of Arizona, Tucson, AZ 85721, USA

The repetitive nature of the maize genome poses challenges to genome sequencing. Over half of the genome consists of class I retroelements, which can be up to 14 kb in size. As a result, assembling sequences from even BAC-sized regions of the genome can be difficult. In addition, the landscape of the maize genome is not well defined. It has been suggested that genes are clustered in islands amid larger oceans of repeat sequences. If true, this too would have consequences for genome sequencing strategies. Our goal is to explore and evaluate cost-effective strategies for generating a high-quality maize genome sequence in light of these challenges. We have sampled the maize genome by sequencing 100 randomly selected BACs, representing ~0.6% of the genome (~14.5 Mb). These BACs were sequenced to very high coverage, and the resulting assemblies were manually edited to establish a high-confidence set of BAC assemblies to serve as gold standards for subsequent analyses. Various assembly strategies using subsets of reads from these BACs were then evaluated. Evaluation was performed relative to the gold-standard assemblies and included examination of raw sequence coverage, scaffold coverage, gene and exon coverage, and cost efficiency. Using the same approach, it is also possible to study the effect of combining reads from different types of libraries, including whole-genome shotgun and various reduced-representation libraries. These experiments serve as the basis for designing a cost-efficient strategy for producing a high-quality draft sequence of the maize genome.
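The abstract does not say how coverage against the gold standards was computed. The following is only a minimal sketch of one common way to do it, assuming that contigs from a test assembly (built from a random subset of reads) have already been aligned to a gold-standard BAC and reduced to half-open `[start, end)` intervals on that sequence. All function names (`subsample_reads`, `merge_intervals`, `coverage_fraction`, `exon_coverage`) and the example coordinates are hypothetical and for illustration only.

```python
"""Hypothetical sketch: score a test assembly against a gold-standard BAC.

Assumption: test-assembly contigs were already aligned to the gold standard
and are given as half-open [start, end) intervals on its coordinates.
"""
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the gold-standard BAC


def subsample_reads(read_ids: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Draw a reproducible random subset of reads to emulate lower coverage."""
    rng = random.Random(seed)
    k = round(len(read_ids) * fraction)
    return rng.sample(list(read_ids), k)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(region: Interval, aligned: List[Interval]) -> int:
    """Count bases of `region` covered by at least one aligned contig."""
    r_start, r_end = region
    total = 0
    for start, end in merge_intervals(aligned):
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            total += overlap
    return total


def coverage_fraction(region: Interval, aligned: List[Interval]) -> float:
    """Fraction of `region` (a whole BAC, gene, or exon) that is covered."""
    length = region[1] - region[0]
    return covered_bases(region, aligned) / length if length > 0 else 0.0


def exon_coverage(exons: List[Interval], aligned: List[Interval],
                  min_fraction: float = 1.0) -> float:
    """Fraction of exons covered over at least `min_fraction` of their length."""
    if not exons:
        return 0.0
    hits = sum(1 for exon in exons if coverage_fraction(exon, aligned) >= min_fraction)
    return hits / len(exons)


if __name__ == "__main__":
    # Hypothetical 150 kb gold-standard BAC with three aligned contigs
    # leaving a gap at [100000, 110000).
    bac = (0, 150_000)
    contigs = [(0, 60_000), (55_000, 100_000), (110_000, 150_000)]
    exons = [(1_000, 1_500), (104_000, 104_500), (120_000, 121_000)]
    print(coverage_fraction(bac, contigs))  # 140000 / 150000 of the BAC
    print(exon_coverage(exons, contigs))    # 2 of 3 exons fully covered
```

Merging intervals first keeps overlapping contigs from being counted twice. Requiring full-length coverage (`min_fraction=1.0`) for an exon is one possible threshold; a looser cutoff would be an equally plausible choice.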