PAG-XIII  Plant & Animal Genomes XIII Conference

January 15-19, 2005
Town & Country Convention Center
San Diego, CA



P847 : Software


Identifying Plant Paralogs And Orthologs Using Complete Linkage Hierarchical Clustering

Nathan J Olson1, Christopher Besemann1, Anne Denton1, Phillip McClean2, Shahryar Kianian2

1  North Dakota State University, Department of Computer Science, 1301 12th Avenue North, Fargo, ND 58105
2  North Dakota State University, AES Plant Sciences, 1301 12th Avenue North, Fargo, ND 58105

A robust definition of a plant-unique gene space is of central importance to plant genomic research. We apply complete linkage hierarchical clustering to the results of an all-against-all WU-BLAST comparison of plant protein sequences to define that gene space. We use complete linkage rather than single linkage hierarchical clustering to minimize the negative influence of transitive (friend-of-a-friend) sequence similarity and to eliminate the need for ad hoc criteria such as a minimum percent identity for sequence alignments. The use of a relational database (PostgreSQL) and a database-centric algorithm allows the system to scale to multiple species. The resulting clusters of paralogs and orthologs can be queried through a web interface.
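
The difference between the two linkage rules can be illustrated with a minimal in-memory sketch. It is not the database-centric implementation described above. The sequence names, the pairwise distances, and the `cutoff` value are all hypothetical, chosen only to show how a friend-of-a-friend chain (A is similar to B, B is similar to C, but A is not similar to C) is merged under single linkage and kept apart under complete linkage.

```python
from itertools import combinations


def cluster(names, dist, cutoff, linkage="complete"):
    """Agglomerative clustering that stops when the closest pair of
    clusters is farther apart than `cutoff` (an illustrative parameter)."""
    # Complete linkage uses the largest member-to-member distance,
    # single linkage uses the smallest.
    agg = max if linkage == "complete" else min
    clusters = [frozenset([n]) for n in names]

    def d(a, b):
        # Pairs with no recorded similarity are treated as infinitely distant.
        return agg(dist.get(frozenset((x, y)), float("inf"))
                   for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d(*p))
        if d(a, b) > cutoff:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical distances: A-B and B-C are close, A-C is not.
pairs = [(("A", "B"), 0.1), (("B", "C"), 0.1), (("A", "C"), 0.9)]
dist = {frozenset(k): v for k, v in pairs}
names = ["A", "B", "C"]

single = cluster(names, dist, cutoff=0.5, linkage="single")
complete = cluster(names, dist, cutoff=0.5, linkage="complete")
print(single)    # [['A', 'B', 'C']]
print(complete)  # [['A', 'B'], ['C']]
```

Under single linkage, C joins the cluster because it is close to B alone. Under complete linkage, C stays separate because every member of a merged cluster must be close to every other member.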