PAG-XVI  Plant & Animal Genomes XVI Conference

January 12-16, 2008
Town & Country Convention Center
San Diego, CA



P3 : Genome Sequencing & ESTs


The Long And The Short Of It: Uniqueness Of Short DNA Sequences In Plants

Joann Mudge, Andrew D. Farmer, Gregory D. May

  National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505 USA

Next-generation DNA sequencing technologies, including Solexa (Illumina), 454-FLX (Roche) and SOLiD (AB), generate relatively short (25-250nt) sequence reads. We examined the uniqueness of short DNA sequences in the genomes and transcriptomes of model and crop plants to determine the utility of next-generation sequencing technologies for variant discovery and de novo assembly. To examine the ability to align short reads uniquely to a reference, oligomers of various lengths (15-250nt) were examined in 20 JCVI Plant Transcript Assemblies (TAs) and five plant genomes, assuming a perfect match. Next-generation sequencing technologies provide the added capability to generate paired-end sequence. Pairs of oligomers 20-36nt in length separated by 200-6000nt were examined in the 20 TAs and five genomes, assuming exact matches. Further analyses were performed in M. truncatula to examine the effect of nucleotide variability (i.e., sequencing errors or SNPs) and inexact pair spacing. One percent nucleotide variation and 10% variation in pair spacing were modeled. Finally, de novo assembly was examined in two paralogous soybean BACs, both together and individually. In general, oligomer uniqueness depended on the size and complexity of the genome or transcriptome. Paired 36mers increased oligomer uniqueness by 5-19% over singletons in the genomes, with larger increases in more complex genomes. Uniqueness improved with increasing pair spacing. Similar results were seen in the transcriptomes. In de novo assembly, contig N50 decreased greatly in the presence of duplicated regions. Next-generation sequencing technologies are valuable for plant genomic applications; however, paired reads may be necessary given recent polyploidy or segmental duplication events.
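
The uniqueness measure described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the single-strand exact-match counting, and the definition of a pair as two oligomers at one fixed spacing are all assumptions made for illustration, and the toy sequence is invented.

```python
from collections import Counter


def unique_fraction(seq, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once.

    Assumption: exact matches on the forward strand only.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    kmers = [seq[i:i + k] for i in range(n)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / n


def unique_pair_fraction(seq, k, spacing):
    """Fraction of oligomer pairs that occur exactly once.

    Assumption: a pair is two k-mers separated by exactly `spacing`
    nucleotides, and both mates must match exactly.
    """
    n = len(seq) - (2 * k + spacing) + 1
    if n <= 0:
        return 0.0
    pairs = [
        (seq[i:i + k], seq[i + k + spacing:i + 2 * k + spacing])
        for i in range(n)
    ]
    counts = Counter(pairs)
    return sum(1 for pair in pairs if counts[pair] == 1) / n


# Invented toy sequence in which the 3-mer "ACA" is repeated.
toy = "ACAGTTACATCC"
single = unique_fraction(toy, 2)
paired = unique_pair_fraction(toy, 2, 1)
print(f"single 2-mers unique: {single:.3f}")
print(f"paired 2-mers unique: {paired:.3f}")
```

In this toy case the repeated "AC" and "CA" oligomers are ambiguous on their own but become unique once the mate at a fixed distance is taken into account, which mirrors the direction of the paired-read result reported in the abstract. Modeling sequencing errors or variable pair spacing would require inexact matching, which this sketch does not attempt.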