PAG-II Plant Genome II Conference

Town & Country Conference Center, San Diego, CA, January, 1994.


PG-II: PROGRESS TOWARDS A COMPLETE SET OF GENES

J. Craig Venter, The Institute for Genomic Research, Gaithersburg, MD 20878


Discovery of all human genes is the major goal of the worldwide human genome project. A new approach to gene discovery, Expressed Sequence Tags (ESTs), developed by TIGR scientists, has increased the rate of discovery by two to three orders of magnitude. The EST strategy starts with mRNA isolated from specific tissues and cells after the cell has transcribed and edited the RNA, so that 99.5% of the genome is selected out, leaving only the transcripts that encode the proteins required for specific cell functions. Complementary DNAs (cDNAs) are constructed from the mRNAs and partially sequenced using automated methods. At TIGR, this approach currently allows sequencing of approximately 500,000 bp of DNA from up to 1200 expressed genes per day, and it should complete the discovery and partial sequencing of over 50% of expressed human genes in the next few months.

TIGR is sequencing ESTs from over 156 human cDNA libraries made from single cells, fetal and embryonic tissues, adult organs and tissues, and cancerous tissues. Over 70,000 human sequences have been obtained from cDNA clones isolated from these libraries, including over 15,000 from brain and 2,000 from major tissues/organs. We hope to approach 100,000 human cDNA sequences by the spring of 1994 and to publish them the same year, after initial analysis.
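As a rough check on the figures quoted above, the short Python sketch below derives the average sequence length implied by the daily throughput and the number of sequencing days needed to go from 70,000 to 100,000 sequences. It assumes, purely for illustration, that each expressed gene sampled per day yields one sequence and that the peak rate of 1200 per day is sustained; the function and constant names are hypothetical and not part of any TIGR software.

```python
import math

# Figures quoted in the abstract.
BP_PER_DAY = 500_000          # approximate bases sequenced per day
MAX_ESTS_PER_DAY = 1_200      # "up to 1200 expressed genes per day"
SEQUENCES_SO_FAR = 70_000     # "over 70,000 human sequences"
TARGET_SEQUENCES = 100_000    # hoped-for total by spring 1994


def mean_read_length(bp_per_day: float, ests_per_day: float) -> float:
    """Average bases per sequence implied by the daily totals."""
    return bp_per_day / ests_per_day


def days_to_target(current: int, target: int, per_day: int) -> int:
    """Whole sequencing days needed to reach the target.

    Assumes (illustratively) one new sequence per gene sampled
    and a constant daily rate.
    """
    remaining = max(target - current, 0)
    return math.ceil(remaining / per_day)


if __name__ == "__main__":
    length = mean_read_length(BP_PER_DAY, MAX_ESTS_PER_DAY)
    days = days_to_target(SEQUENCES_SO_FAR, TARGET_SEQUENCES, MAX_ESTS_PER_DAY)
    print(f"Implied mean read length: {length:.0f} bp")
    print(f"Days at peak rate to reach target: {days}")
```

Under these assumptions the partial sequences average a little over 400 bp each, and the remaining 30,000 sequences represent about a month of sequencing at the peak rate. Because 1200 per day is stated as an upper bound, the real figure would be longer.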

TIGR has developed the Expressed Gene Anatomy Database (EGAD), which integrates cDNA, genomic sequence, and mapping data with the biological information needed to assign putative functions to new genes. EGAD integrates data on gene expression, isology (sequence similarity), gene family, biochemical function, and cellular role, and supports both complex queries that link these types of data and open-ended browsing. EGAD is linked directly to the TIGR EST database (ESTB) and to TIGR's new Sequences, Sources, TAXA (SST) database, which maintains taxonomic and specimen information. EGAD and SST are both implemented in SYBASE and will be fully interoperable with the Genome Sequence Database and GDB. TIGR plans to make both databases publicly accessible in 1994.

