1 National Center for Genome Resources, 1800-A Old Pecos Trail, Santa Fe, New Mexico 87505
2 tacg Informatics, 1 Whistler Ct, Irvine, CA 92612
Gene expression data provide snapshots of the cellular biology and molecular function of living organisms. However, unlike sequence data, gene expression data have meaning only within an experimental context: organism, tissue, developmental state, treatment, stress or pathology, time course, and the technical details of the experimental procedures and underlying technology. Representing that context in a compact, logical, intuitive, searchable way is the challenge we address here. Whole-genome expression experiments produce massive numeric and image output; the data are expensive both to generate and to store. Further, most of a dataset is typically ignored because the investigator's interests are narrowly targeted. Establishing an Internet-accessible repository would provide the storage, computing power, and tools to enable other researchers to use these 'ignored' data to answer complex questions about gene expression, and would thereby allow more researchers to perform better science more cheaply. This resource of pooled data would also encourage distributed development of objective statistical approaches to handling differences within and between technologies, such as equalization between different samples, background subtraction, and correction of edge effects. It would also help to establish baseline levels of gene expression under many conditions. Our goal is to create and make publicly available, via an Open Source model, the integrated resource described here, including all its components: the data, the interface software, and the analytical tools. In this workshop, we will describe some of the challenges involved in doing so and how we, in collaboration with others, are addressing them. Current information on GeneX.