PAG-XIV  Plant & Animal Genomes XIV Conference

January 14-18, 2006
Town & Country Convention Center
San Diego, CA



Poster: General Comparative


P239

A Shotgun Sequence Analysis Pipeline (SSAP) For Rapid Characterization Of Higher Plant Genomes

Philippe Chouvarine, Surya Saha, Daniel G. Peterson

  Mississippi Genome Exploration Laboratory, Mississippi State University, 117 Dorman Hall, Box 9555, Mississippi State, MS 39762

We have developed an automated shotgun sequence analysis pipeline (SSAP) for rapidly characterizing the sequence content of a genome and for estimating the gene enrichment and repeat reduction afforded by reduced-representation sequencing techniques. The SSAP uses BLAST algorithms, ab initio sequence discovery programs, and multiple Perl scripts to place high-quality, unassembled shotgun reads (< 800 bp) into descriptive categories. Query sequences are first compared via BLAST to locally maintained databases that contain spermatophyte (angiosperm and gymnosperm) repeat, organelle, and gene sequences. Queries with significant hits to one or more of these databases are classified according to their most significant hit. Queries without significant similarity to these databases are compared to a “pre-filtered” spermatophyte EST database, and those producing significant hits are classified as “ESTs.” The remaining queries are aligned against plant genomic sequences. If a query produces a significant hit to an annotated portion of a genomic sequence, it is classified according to the corresponding annotation. Queries that match only unannotated sequence, or that show no significant hit to the genomic sequences, are analyzed by programs designed to recognize tandem repeats or gene-like elements. Queries that contain tandem repeats are classified as such, those with gene-like characteristics are classified as “potential genes,” and those not classified by any of the preceding methods are deemed “sequences of unknown character.” Further information on the SSAP and other pipelines used in our work can be found at http://www.mgel.msstate.edu/seq_analysis.htm.
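
The classification cascade described above can be illustrated with a minimal sketch. It is written in Python for readability and is not the authors' Perl code. It assumes the BLAST searches and the tandem-repeat and gene-finding programs have already been run, so it only shows the order of the decisions. The names Hit, classify_read, and E_VALUE_CUTOFF, the cutoff value of 1e-5, the use of the lowest E-value as "most significant," and the precedence of tandem repeats over gene-like features are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical significance threshold; the abstract does not state one.
E_VALUE_CUTOFF = 1e-5


@dataclass
class Hit:
    """Best alignment of a read against one database (illustrative only)."""
    database: str                      # e.g. "repeat", "organelle", "gene"
    e_value: float
    annotation: Optional[str] = None   # set only for annotated genomic regions


def is_significant(hit: Optional[Hit]) -> bool:
    """A hit counts as significant if it exists and meets the assumed cutoff."""
    return hit is not None and hit.e_value <= E_VALUE_CUTOFF


def classify_read(
    primary_hits: List[Hit],
    est_hit: Optional[Hit],
    genomic_hit: Optional[Hit],
    has_tandem_repeat: bool,
    is_gene_like: bool,
) -> str:
    """Assign one descriptive category to a read, following the cascade."""
    # Step 1: repeat, organelle, and gene databases; keep the best hit.
    significant = [h for h in primary_hits if is_significant(h)]
    if significant:
        # Assumption: "most significant" means the lowest E-value.
        return min(significant, key=lambda h: h.e_value).database

    # Step 2: the pre-filtered EST database.
    if is_significant(est_hit):
        return "EST"

    # Step 3: plant genomic sequences; only annotated regions classify a read.
    if is_significant(genomic_hit) and genomic_hit.annotation:
        return genomic_hit.annotation

    # Step 4: ab initio results for reads that are still unclassified.
    # Assumption: tandem repeats take precedence over gene-like features.
    if has_tandem_repeat:
        return "tandem repeat"
    if is_gene_like:
        return "potential gene"
    return "sequence of unknown character"
```

For example, a read whose only significant hit is to an unannotated genomic region, and which contains a tandem repeat, would be passed to step 4 and labeled "tandem repeat" under these assumptions.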