January 12-16, 2002
Town & Country Convention Center
San Diego, CA
Workshop: Bioinformatics/Computers
Much of what we know about the genome is biased by a tendency to extrapolate from one or two highly studied examples. Bioinformaticists amplify the problem when they do not question received wisdom and merely automate the propagation of inherently flawed assumptions. With the advent of large-scale genome sequencing, the era of extrapolation is over. It is time to re-open old questions, because much of what we have long taken for granted will turn out to be wrong. I will discuss our recent discoveries on the differences in genome landscape between monocots and dicots and between plants and animals, the consequences for sequence annotation tools, and the implications for molecular evolution. MONOCOTS-DICOTS: GC-content distributions for monocots exhibit a notably GC-rich component that is not present in dicots. We will demonstrate that this can be attributed to variation in GC-content within the genes, as opposed to between the genes. In particular, there is a 5'-to-3' gradient along the direction of transcription, in GC-content and in codon usage statistics. For dicots, no such gradient is observed. Gene-prediction programs that rely on codon usage do not account for these gradients and produce demonstrably flawed annotations, particularly in rice. PLANTS-ANIMALS: Although plant and animal genomes both have large numbers of transposons, we can show that, in plants, the transposons insert between the genes, but in animals, the transposons insert inside the introns. In fact, contrary to popular perception, most of the human genome is transcribed. Furthermore, it is quite possible that rice will have more genes than humans. We will explain how gene complexity, in animal genomes, is largely derived from alternative transcription, whereas, in plant genomes, it is largely derived from gene duplications.
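As a rough illustration of what a within-gene GC-content gradient means in practice, the sketch below computes GC-content in successive windows along a coding sequence, ordered 5' to 3', and summarizes the trend as a least-squares slope. It is not the method used in the work described here. The function names, the window and step sizes, and the toy sequence are all assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_profile(cds, window=60, step=30):
    """GC fraction in successive windows, ordered 5' to 3'.

    window and step are illustrative defaults, not values from the abstract.
    """
    last_start = len(cds) - window
    return [gc_fraction(cds[i:i + window]) for i in range(0, last_start + 1, step)]


def slope(values):
    """Least-squares slope of values against their window index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Synthetic toy sequence (not real data): GC-rich 5' half, AT-rich 3' half.
toy_cds = "GC" * 30 + "AT" * 30
profile = gc_profile(toy_cds)
print(profile)         # GC fraction per window, 5' to 3'
print(slope(profile))  # nonzero slope indicates a gradient along the gene
```

A slope near zero across many genes would correspond to the dicot case described above, where no gradient is observed. A consistently nonzero slope would correspond to the monocot case. A codon-usage version of the same idea would tabulate codon frequencies per window instead of GC fraction.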