PAG-XV  Plant & Animal Genomes XV Conference

January 13-17, 2007
Town & Country Convention Center
San Diego, CA



P870 : Software


Gene Ontology (GO) Terms Classifications Counter

Zhi-liang Hu1, Jie Bao2, James M. Reecy1

1  Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University
2  Department of Computer Science, Iowa State University

Gene Ontology (GO) uses controlled vocabularies to describe the attributes of genes and gene products in any organism, and to define the basic term categories and their relationships in a biological context. Since its inception in the late 1990s, GO has been widely used in gene identification and characterization, and in large-scale sequence analyses such as expressed sequence tag (EST) and microarray studies. One seemingly straightforward but often frustrating problem is properly counting the occurrences of child terms within each parent category under a given classification method such as GO Slim. The frustration arises partly because multiple paths can exist between a child term and a parent term in the directed acyclic graph (DAG) structure of GO. Such “categorizing counts” of GO terms can also be time consuming with a traditional structured query language (SQL) based approach. To address these problems, we have developed a GO Terms Classification Counter based on pre-computed transitive closure paths, and have implemented a number of features in the Counter for ease of data analysis. For example, users can choose an existing “GO slim” classification method or a custom-developed one, and can choose to count “single occurrences” or “all occurrences”, among other options. Our approach significantly reduced the counting time for a couple of thousand terms from a few hours to a few minutes. The Counter is web based and easy to use. However, users are cautioned to use their best judgment in interpreting the results, and to choose a classification and counting strategy appropriate to their objective.
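
The counting idea can be illustrated with a minimal Python sketch. It is not the Counter's actual implementation, which the abstract does not describe. The toy DAG, the term names, and the functions `build_closure` and `count_by_category` are all hypothetical. The sketch also assumes one possible reading of the two counting modes: "single" counts a term once per category no matter how many paths reach it, while "all" counts once per distinct path.

```python
from collections import defaultdict

# Hypothetical toy DAG (not real GO identifiers): child -> direct parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],   # C reaches root by two paths
    "D": ["C", "A"],   # D reaches A by two paths
}


def build_closure(parents):
    """Pre-compute {term: {ancestor: number of distinct paths}}.

    Each term is included as its own ancestor with one (empty) path.
    """
    closure = {}

    def visit(term):
        if term in closure:
            return closure[term]
        paths = defaultdict(int)
        paths[term] = 1
        for parent in parents[term]:
            # Every path from the parent upward extends to a path from term.
            for ancestor, n in visit(parent).items():
                paths[ancestor] += n
        closure[term] = dict(paths)
        return closure[term]

    for term in parents:
        visit(term)
    return closure


def count_by_category(terms, slim, closure, mode="single"):
    """Count input terms under each slim category using the closure.

    mode="single": one count per (term, category) pair (assumed meaning).
    mode="all":    one count per distinct path (assumed meaning).
    """
    counts = {category: 0 for category in slim}
    for term in terms:
        for category in slim:
            n_paths = closure[term].get(category, 0)
            if n_paths:
                counts[category] += 1 if mode == "single" else n_paths
    return counts


closure = build_closure(PARENTS)
single = count_by_category(["C", "D"], ["A", "B"], closure, mode="single")
every = count_by_category(["C", "D"], ["A", "B"], closure, mode="all")
print(single, every)
```

Because the closure is built once, each count becomes a dictionary lookup rather than a repeated graph traversal or recursive query. This is the general reason a pre-computed closure can be faster, although the sketch says nothing about the timings reported above.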