PAG-VII: ANALYSIS, DISPLAY AND MANAGEMENT OF LARGE DATASETS: ANALYSIS OF THE YEAST CELL CYCLE USING DNA MICROARRAYS.

PAG-VII   Plant & Animal Genome VII Conference

Town & Country Hotel, San Diego, CA, January 17-21, 1999.


W52

ANALYSIS, DISPLAY AND MANAGEMENT OF LARGE DATASETS: ANALYSIS OF THE YEAST CELL CYCLE USING DNA MICROARRAYS.

GAVIN SHERLOCK1,2, Paul T. Spellman1, Michael B. Eisen1, Vishwanath R. Iyer3, Patrick O. Brown3,4, David Botstein1, Bruce Futcher2

1 Department of Genetics, Stanford University Medical Center, Stanford, California 94306-5120 USA
2 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724-2209 USA
3 Department of Biochemistry, Stanford University Medical Center, Stanford, California 94306-5428 USA
4 Howard Hughes Medical Institute, Stanford, CA 94305-5428, USA

To identify genes whose RNA levels vary periodically during the yeast cell cycle, we obtained microarray data from synchronized cells and suitable controls, yielding nearly 500,000 measurements. Manual analysis of such a large dataset is not feasible. We therefore assigned each gene a score based on a Fourier transform (which assesses periodicity) and a correlation measurement (which compares its expression profile with those of previously identified cell cycle regulated genes), giving an objective assessment of whether the gene is periodically regulated. This analysis identified 800 periodically regulated genes. Displaying this much information as line plots is impractical. We therefore use a graphical scheme in which each data point is represented by a color that quantitatively reflects the experimental observation. This allows many gene expression patterns to be displayed simultaneously and interpreted intuitively. The key to the success of this method, however, is the ability to order the expression profiles so that similar profiles are grouped and patterns of expression can be discerned. I will discuss two methods that we have employed to this end. A third problem is how to organize and store large datasets in an easy-to-query format, so that they are readily accessible to those involved in the project and, eventually, to the scientific community as a whole. I will demonstrate the website that we have authored, which allows people to query and visualize the data in an intuitive and biologically relevant manner.
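The scoring idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy profiles, the 60-minute period, the magnitude threshold, and the choice to order genes by Fourier phase are all assumptions made for illustration. The abstract does not say how the two scores were combined or which two ordering methods were used.

```python
import math

def fourier_score(values, times, period):
    """Return (magnitude, phase) of the Fourier component at `period`.

    A large magnitude suggests the profile oscillates with that period;
    the phase indicates where in the cycle expression peaks.
    """
    omega = 2 * math.pi / period
    mean = sum(values) / len(values)
    a = sum((v - mean) * math.sin(omega * t) for v, t in zip(values, times))
    b = sum((v - mean) * math.cos(omega * t) for v, t in zip(values, times))
    return math.hypot(a, b), math.atan2(a, b)

def pearson(x, y):
    """Pearson correlation between two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def order_by_phase(profiles, times, period, min_magnitude):
    """Keep profiles whose Fourier magnitude passes the threshold, ordered by peak phase.

    Ordering by phase is one hypothetical way to group like profiles;
    it is an assumption, not a method named in the abstract.
    """
    scored = []
    for name, values in profiles.items():
        magnitude, phase = fourier_score(values, times, period)
        if magnitude >= min_magnitude:
            scored.append((phase % (2 * math.pi), name))
    return [name for _, name in sorted(scored)]

# Toy data (assumed): samples every 10 minutes over two 60-minute cycles.
period = 60.0
times = [float(t) for t in range(0, 120, 10)]
omega = 2 * math.pi / period
profiles = {
    "late": [math.cos(omega * (t - 40)) for t in times],   # peaks at 40 min
    "flat": [1.0 for _ in times],                          # not periodic
    "early": [math.cos(omega * (t - 10)) for t in times],  # peaks at 10 min
}

# Correlation of each profile with a hypothetical known cell cycle gene.
known_profile = profiles["early"]
correlations = {name: pearson(v, known_profile) for name, v in profiles.items()}

order = order_by_phase(profiles, times, period, min_magnitude=1.0)
print(order)
print(correlations)
```

In this sketch the flat profile is dropped because its Fourier magnitude is zero, and the remaining profiles are listed in order of when they peak within the cycle. With real data the threshold would need to be chosen against suitable controls.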

