January 12-16, 2008
Town & Country Convention Center
San Diego, CA
Joann Mudge, Andrew D. Farmer, Gregory D. May
Next-generation DNA sequencing technologies, including Solexa (Illumina), 454-FLX (Roche) and SOLiD (AB), generate relatively short (25-250 nt) sequence reads. We examined the uniqueness of short DNA sequences in the genomes and transcriptomes of model and crop plants to determine the utility of next-generation sequencing technologies for variant discovery and de novo assembly. To assess how well short reads can be aligned uniquely to a reference, oligomers of various lengths (15-250 nt) were examined in 20 JCVI Plant Transcript Assemblies (TAs) and five plant genomes, assuming a perfect match. Next-generation sequencing technologies also provide the capability to generate paired-end sequence, so pairs of oligomers 20-36 nt in length separated by 200-6000 nt were examined in the same 20 TAs and five genomes, again assuming exact matches. Further analyses were performed in M. truncatula to examine the effect of nucleotide variability (i.e., sequencing errors or SNPs) and inexact pair spacing; 1% nucleotide variation and 10% variation in pair spacing were modeled. Finally, de novo assembly was examined for two paralogous soybean BACs, both together and individually. In general, oligomer uniqueness depended on the size and complexity of the genome or transcriptome. Paired 36mers increased oligomer uniqueness by 5-19% over singletons in the genomes, with larger increases in more complex genomes. Uniqueness improved with increasing pair spacing. Similar results were seen in the transcriptomes. In de novo assembly, contig N50 decreased greatly in the presence of duplicated regions. Next-generation sequencing technologies are valuable for plant genomic applications; however, paired reads may be necessary given recent polyploidy or segmental duplication events.
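As an illustration of the kind of measurement described above, the following is a minimal sketch of how single and paired oligomer uniqueness could be computed under exact matching. It is not the authors' pipeline: the function names `unique_fraction` and `unique_pair_fraction` are hypothetical, only the forward strand of one sequence is considered, pair spacing is treated as a fixed gap between the two oligomers, and nucleotide variation is not modeled.

```python
from collections import Counter


def unique_fraction(seq, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once in seq.

    Assumption: exact matches on the forward strand only.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[seq[i:i + k]] == 1)
    return unique / n


def unique_pair_fraction(seq, k, spacing):
    """Fraction of start positions whose pair of k-mers is unique in seq.

    Assumption: the two k-mers are separated by exactly `spacing`
    nucleotides, and a pair counts as unique when no other position
    yields the same two k-mers at that same spacing.
    """
    n = len(seq) - (2 * k + spacing) + 1
    if n <= 0:
        return 0.0

    def pair(i):
        j = i + k + spacing  # start of the second k-mer
        return (seq[i:i + k], seq[j:j + k])

    counts = Counter(pair(i) for i in range(n))
    unique = sum(1 for i in range(n) if counts[pair(i)] == 1)
    return unique / n


if __name__ == "__main__":
    toy = "ACGTTTACGTGG"  # toy sequence with one repeated 4-mer (ACGT)
    print(unique_fraction(toy, 4))
    print(unique_pair_fraction(toy, 4, 0))
```

In the toy sequence the repeated 4-mer lowers single-oligomer uniqueness, while every pair remains distinct, mirroring the reported gain from paired reads. A genome-scale analysis would need a memory-efficient k-mer index rather than an in-memory `Counter`.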