January 15-19, 2005
Town & Country Convention Center
San Diego, CA
Nathan J Olson1, Christopher Besemann1, Anne Denton1, Phillip Mcclean2, Shahryar Kianian2
A robust definition of a plant-unique gene space is of central importance to plant genomic research. We apply complete-linkage hierarchical clustering to the results of an all-against-all WU-BLAST comparison of plant protein sequences to define this gene space. We use complete linkage rather than single linkage to minimize the negative influence of transitive (friend-of-a-friend) sequence similarity and to eliminate the need for ad hoc criteria such as a minimum percent identity for sequence alignments. A relational database (PostgreSQL) and a database-centric algorithm allow the system to scale to multiple species. The resulting clusters of paralogs and orthologs can be queried through a web interface.
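The difference between the two linkage criteria can be illustrated with a minimal in-memory sketch. It is not the database-centric PostgreSQL implementation described above; the protein identifiers, the scores, and the function name `cluster` are hypothetical, and the stopping rule (merge only while every cross-cluster pair has a hit) is an assumption made for illustration.

```python
from itertools import combinations


def cluster(items, hits, linkage):
    """Greedy agglomerative clustering over pairwise similarity scores.

    `hits` maps (id1, id2) pairs to a similarity score; a missing pair
    means no hit was reported and is scored 0. `linkage` is `min` for
    complete linkage or `max` for single linkage.
    """
    def score(a, b):
        return hits.get((a, b), hits.get((b, a), 0.0))

    clusters = [frozenset([x]) for x in items]
    while len(clusters) > 1:
        best_score, best_pair = 0.0, None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: weakest cross pair; single: strongest.
            s = linkage(score(a, b) for a in c1 for b in c2)
            if s > best_score:
                best_score, best_pair = s, (c1, c2)
        if best_pair is None:
            break  # no remaining pair of clusters is linked
        c1, c2 = best_pair
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Hypothetical hits: A~B and B~C are similar, but A and C share no hit.
hits = {("A", "B"): 200.0, ("B", "C"): 150.0}
proteins = ["A", "B", "C"]

complete = cluster(proteins, hits, min)
single = cluster(proteins, hits, max)
print("complete linkage:", complete)
print("single linkage:  ", single)
```

In this toy input, single linkage chains A, B, and C into one cluster through B, while complete linkage keeps C apart because A and C have no direct hit.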