1 Department of Genetics, Stanford University Medical Center, Stanford, California 94306-5120 USA
2 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724-2209 USA
3 Department of Biochemistry, Stanford University Medical Center, Stanford, California 94306-5428 USA
4 Howard Hughes Medical Institute, Stanford, California 94305-5428 USA
To identify genes whose RNA levels vary periodically during the yeast cell cycle, we obtained microarray data from synchronized cells and suitable controls, yielding nearly 500,000 measurements. Manual analysis of such a large collection of data is not feasible. We therefore assigned each gene a score based on a Fourier transform (which assesses periodicity) and a correlation measurement (which compares our data to those of previously identified cell cycle regulated genes), giving an objective assessment of whether a gene is periodically regulated. This identified 800 periodically regulated genes. Displaying this much information as line plots is impractical. We therefore use a graphical scheme in which each data point is represented by a color that quantitatively reflects the experimental observation. This lets us display many gene expression patterns simultaneously, in a manner that allows intuitive interpretation. The key to the success of this method, however, is the ability to order the expression profiles of the genes so that like profiles are grouped and patterns of expression can be discerned. I will discuss two methods that we have employed to this end. A third problem, beyond analysis and display, is how to organize and store large datasets so that they are readily accessible to those involved in the project, and eventually to the scientific community as a whole, in an easy-to-query format. I will demonstrate the Website that we have authored, which allows people to query and visualize the data in an intuitive and biologically relevant manner.
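The scoring idea described above can be illustrated with a minimal Python sketch. It is not the implementation used for the yeast dataset: the function names, the single assumed cell-cycle period, and especially the rule for combining the two scores (multiplying the Fourier magnitude by the best non-negative correlation) are assumptions made for this example only.

```python
import numpy as np


def fourier_score(values, times, period):
    """Magnitude of the Fourier component at one assumed period.

    `period` is an illustrative parameter (same units as `times`);
    the abstract does not state how the period was chosen.
    """
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    omega = 2.0 * np.pi / period
    a = np.sum(values * np.sin(omega * times))
    b = np.sum(values * np.cos(omega * times))
    return float(np.hypot(a, b))


def correlation_score(values, reference):
    """Pearson correlation between a gene profile and a reference profile."""
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # A constant profile has no defined correlation; treat it as 0.
    if values.std() == 0.0 or reference.std() == 0.0:
        return 0.0
    return float(np.corrcoef(values, reference)[0, 1])


def combined_score(values, times, period, references):
    """Hypothetical combination rule (an assumption, not from the source):
    Fourier magnitude weighted by the best non-negative correlation
    to any reference profile of a known cell cycle regulated gene."""
    peak_corr = max(correlation_score(values, ref) for ref in references)
    return fourier_score(values, times, period) * max(peak_corr, 0.0)


if __name__ == "__main__":
    # Synthetic time course: 12 samples, 10 time units apart, period 60.
    times = np.arange(0, 120, 10)
    known_gene = np.sin(2 * np.pi * times / 60.0)
    candidate = 0.8 * np.sin(2 * np.pi * times / 60.0)
    flat = np.ones_like(times, dtype=float)
    for name, profile in [("candidate", candidate), ("flat", flat)]:
        print(name, combined_score(profile, times, 60.0, [known_gene]))
```

Genes would then be ranked by the combined score and a threshold chosen to separate periodic from non-periodic profiles; how that threshold is set is outside the scope of this sketch.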