January 15-19, 2011
Town & Country Convention Center
San Diego, CA
Jill L Wegrzyn¹, Ben Figueroa¹, John Yu¹, Minyoung Choi¹, Andrew J Eckert², David B Neale¹
The Dendrome project provides custom informatics tools and databases to manage the flood of information resulting from high-throughput genomics projects in forest trees, from sample collection to downstream analysis. This resource is further enhanced by systems that connect to federated databases and provide automated data flows, machine learning analysis, standardized annotations and quality control processes. A sample tracking system now sits at the forefront of most large-scale projects. Barcode identifiers assigned to the trees during sample collection are maintained in the database to identify an individual through DNA extraction, resequencing, genotyping and phenotyping. Emerging technologies have been integrated into a solution for high-throughput SNP discovery in non-model organisms. The Pine Sequence Alignment and SNP Identification Pipeline (PineSAP) identifies SNPs that reflect true genetic variation from both Sanger and 454 sequencing data. The supporting TreeGenes database contains ten curated modules that store the data and provide the foundation for web-based searches and visualization tools. DiversiTree, an extensive, user-friendly desktop-style interface, queries the TreeGenes database and is designed for bulk data retrieval. It provides the community with access to a multitude of data types including ESTs, primers, tracefiles, SNPs, individual tree data, genotypes and phenotypes. Recent developments have focused on the storage, annotation, and distribution of next-generation transcriptome experiments underway in the community. The variety of outputs available allows users to perform high-resolution dissection of traits and relate molecular diversity to functional variation. For downstream analysis, DNA Sequence Analysis and Manipulation (DnaSAM) was developed to address the challenges of data manipulation, summary statistic estimation and statistical hypothesis testing.
This program performs a large number of standard and newly designed tests of neutrality on multiple sequence alignments of resequenced loci. The combined resources of the Dendrome project serve as a powerful knowledge environment for genotype-phenotype information resulting from a multitude of large-scale genomics projects.
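As an illustration of what a standard test of neutrality on a multiple sequence alignment involves, the sketch below computes Tajima's D, one widely used statistic of this kind. It is not DnaSAM code and does not reflect DnaSAM's interface. The function name `tajimas_d`, the plain-string input format, and the decision to drop any alignment column containing a gap or ambiguous base are assumptions made for this example only.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(alignment):
    """Return Tajima's D for equal-length aligned DNA strings.

    Returns None when no site is polymorphic (D is undefined).
    Hypothetical helper for illustration; not part of DnaSAM.
    """
    n = len(alignment)
    if n < 4:
        # The variance term is zero for n = 3, so D is undefined below n = 4.
        raise ValueError("need at least 4 sequences")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Assumption: keep only columns made entirely of unambiguous bases,
    # which drops any site with a gap or an N.
    columns = [
        col
        for col in zip(*(seq.upper() for seq in alignment))
        if set(col) <= set("ACGT")
    ]
    segregating = [col for col in columns if len(set(col)) > 1]
    s = len(segregating)
    if s == 0:
        return None

    # pi: mean number of pairwise differences between sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in segregating
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimate S / a1, scaled by the
    # estimated standard deviation of their difference.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment: both polymorphic sites are singletons, so D is negative.
toy_alignment = ["ACGTA", "ACGTA", "TCGTA", "ACGTC"]
print(tajimas_d(toy_alignment))
```

An excess of rare variants, as in the toy alignment, gives a negative D, while an excess of intermediate-frequency variants gives a positive one. Judging significance requires comparison against a null distribution, which this sketch does not attempt.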