January 12-16, 2008
Town & Country Convention Center
San Diego, CA
Nikki Appleby1,2, Chris Duran1,2, Michael Imelfort1,2, David LA Wood3, David Edwards1,2, Jacqueline Batley1,2,4
Two forms of sequence-based marker, Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs), now predominate in modern genetic analysis. Mining SNPs and SSRs from the large quantities of publicly available sequence data is the most cost-effective method for marker discovery. We present the identification of candidate polymorphic SNPs and SSRs from expressed sequence data. The greatest challenge of in silico SNP discovery is differentiating true polymorphisms from sequence errors. We have applied two methods to measure the confidence that SNPs represent true genetic variation: the redundancy of the polymorphism in an alignment, and the co-segregation of SNPs with haplotype. We have linked this SNP discovery pipeline to a relational database hosting information on the polymorphisms, plant cultivars and gene annotations, to enable efficient mining and interrogation of these data. Users may search for SNPs in genes with a specific annotation, or for SNPs that differentiate between plant cultivars. Polymorphic SSRs are distinguished from monomorphic SSRs by the presence of varying motif lengths within an alignment of sequence reads. We have applied this approach to the discovery of SNPs and SSRs from 350,000 public barley expressed sequences, leading to the identification of over 17,000 candidate SNPs and over 10,000 unique SSRs. This resource will help accelerate genetic mapping and association studies by making characterised genetic variations available.
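The redundancy measure can be illustrated with a minimal sketch. It is not the pipeline described here: the function name `candidate_snps`, the `min_redundancy` threshold of 2, and the toy reads are all assumptions made for illustration. The idea is that a base difference seen in only one read is more likely a sequencing error, whereas a difference supported by several reads is more likely a true polymorphism.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return (column, allele_counts) for each alignment column in which
    at least two different bases are each seen in >= min_redundancy reads.

    aligned_reads: equal-length strings taken from one alignment
    (gaps or ambiguous bases may be written as '-' or 'N' and are ignored).
    min_redundancy: hypothetical threshold, not a value from the abstract.
    """
    snps = []
    length = min(len(read) for read in aligned_reads)
    for col in range(length):
        # Count only unambiguous bases in this column.
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        # Keep alleles supported by enough reads to be unlikely errors.
        supported = {base: n for base, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            snps.append((col, supported))
    return snps


# Toy alignment: column 3 has T in three reads and A in two reads;
# column 6 has a single-read difference (C), treated as a likely error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACCT",
]

for column, alleles in candidate_snps(reads):
    print(column, alleles)
```

With the default threshold only column 3 is reported; lowering `min_redundancy` to 1 would also report column 6, which shows how the redundancy requirement filters out differences supported by a single read.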