PAG-XVI  Plant & Animal Genomes XVI Conference

January 12-16, 2008
Town & Country Convention Center
San Diego, CA



W59 : Bioinformatics


Problems With The Annotation Of Low Quality Eukaryotic Genomes

Alexander Souvorov

  National Center for Biotechnology Information, Bethesda MD 20892, USA

Recent advances in sequencing technologies have produced an unprecedented amount of genomic sequence data. Whole-genome shotgun sequencing covers a genome with random sequence reads at increasingly high depth. It is commonly believed that 3 to 5X coverage of genomic DNA can yield a sufficient amount of biologically meaningful data, provided that appropriate analysis methods are applied. However, draft genomes pose special problems for the annotation process. We have come to rely on the output of annotation pipelines and to assume that the data are “correct”. We present some examples that illustrate the danger of this assumption.
The NCBI approach to eukaryotic genome annotation combines homology searching with ab initio modeling. To assess the quality of genome annotation, we compared several vertebrate genomes with various depths of coverage and levels of assembly. We analyzed frameshifted genes in the context of the genome assembly and show that some annotation problems indicate potential assembly issues.