January 12-16, 2008
Town & Country Convention Center
San Diego, CA
Danielle M. Bowen, Zhi-Liang Hu, Zhi-Qiang Du, Max F. Rothschild
A new computer program, SNPidentifier, was developed to detect single nucleotide polymorphisms (SNPs) in expressed sequence tags (ESTs) when no chromatogram files are available for assessing sequence quality. Of 25,937 Litopenaeus vannamei (Pacific white shrimp) EST sequences, 22,684 were deemed usable, masked for tandem repeats, and submitted to Cap3 for clustering. Cap3 created 3,532 contigs incorporating 8,628 sequences with 5,096 singlets remaining unclustered. The contigs were passed to SNPidentifier, which computed the minor allele frequency, the number of reliable sequences at each base position, and a corresponding score for each predicted SNP. SNPidentifier performed quality control checks before including EST sequences in SNP calculations: the frequency of poor-quality nucleotides (Ns) in a neighboring region could not exceed 0.1; the 15 bases on either side of the SNP had to match the consensus sequence exactly; at least 4 contributing sequences had to contain the minor SNP allele, whose frequency had to be at least 0.1; and the first 10 bases of any sequence were excluded from SNP prediction. These parameters are simple to alter if a particular minimum minor allele frequency is desired instead of the default score threshold. Using a conservative set of parameters, 505 SNPs were predicted from 141 contigs. To date, 17 of 35 tested SNPs (49%) have been confirmed in 18 individuals from 3 lines. For this sample, SNP prediction accuracy did not depend on minor allele frequency or on the number of contributing sequences.
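The quality control checks described above can be illustrated with a minimal Python sketch. This is not the SNPidentifier code. The names `AlignedRead`, `reliable_base`, and `call_snp` are hypothetical, reads are assumed to be aligned to the consensus without gaps, and the size of the "neighboring region" used for the N check (`N_WINDOW`) is an assumed value, since the abstract does not state it. The scoring step is omitted because its formula is not given.

```python
from collections import Counter
from dataclasses import dataclass

FLANK = 15             # bases on each side that must match the consensus
SKIP_START = 10        # leading bases of each read that are ignored
MAX_N_FRACTION = 0.1   # maximum fraction of Ns in the neighboring region
MIN_MINOR_COUNT = 4    # minimum number of reads carrying the minor allele
MIN_MAF = 0.1          # minimum minor allele frequency
N_WINDOW = 20          # ASSUMPTION: half-width of the "neighboring region"


@dataclass
class AlignedRead:
    start: int  # 0-based consensus coordinate of the read's first base
    seq: str    # read sequence (ASSUMPTION: ungapped alignment)


def reliable_base(read, consensus, pos):
    """Return the read's base at consensus position pos, or None if filtered."""
    i = pos - read.start  # index of the candidate site within the read
    # Skip the first bases of the read and require a full flank inside it.
    if i < SKIP_START or i - FLANK < 0 or i + FLANK >= len(read.seq):
        return None
    if pos - FLANK < 0 or pos + FLANK >= len(consensus):
        return None
    base = read.seq[i]
    if base == "N":
        return None
    # Reject reads with too many Ns near the candidate site.
    window = read.seq[max(0, i - N_WINDOW): i + N_WINDOW + 1]
    if window.count("N") / len(window) > MAX_N_FRACTION:
        return None
    # Both flanks must match the consensus exactly.
    if read.seq[i - FLANK:i] != consensus[pos - FLANK:pos]:
        return None
    if read.seq[i + 1:i + FLANK + 1] != consensus[pos + 1:pos + FLANK + 1]:
        return None
    return base


def call_snp(consensus, reads, pos):
    """Return allele statistics for pos, or None if no SNP passes the filters."""
    bases = (reliable_base(r, consensus, pos) for r in reads)
    counts = Counter(b for b in bases if b is not None)
    if len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts.most_common(2)
    n_reliable = sum(counts.values())
    maf = n_minor / n_reliable
    if n_minor < MIN_MINOR_COUNT or maf < MIN_MAF:
        return None
    return {"pos": pos, "major": major, "minor": minor,
            "maf": maf, "n_reliable": n_reliable}
```

Raising `MIN_MAF` or `MIN_MINOR_COUNT` in this sketch corresponds to the more conservative parameter choices mentioned above.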
Danielle Bowen is supported by a USDA-CSREES National Needs fellowship under Grant no. 2007-38420-17767.