January 15-19, 2005
Town & Country Convention Center
San Diego, CA
Zhihua Jiang, Xiao-Lin Wu, Galen A. Williams
The release of large numbers of ESTs (expressed sequence tags) into the public domain has been revolutionizing many aspects of genomic study. Here we report the development of a bioinformatics tool, ELF-Walking, which facilitates large-scale in silico cloning of full-length cDNA sequences by mining sequence databases. ELF-Walking starts with ortholog identification and selects a set of non-overlapping best hits (seed sequences) for each gene in the target species. These seed sequences are then extended through electronic flanking walking: significantly overlapping sequences are added, and the query sequences are repeatedly BLASTed against all sequences of the target species until no sequence grows any further. Finally, full-length cDNA sequences are identified using a hidden Markov model. Applying this tool to the NCBI (National Center for Biotechnology Information) EST databases, with 21,775 human coding genes as references, we generated unique full-length cDNA sequences for 3,881 genes and partial cDNA sequences for 10,358 genes in pigs, and unique full-length cDNA sequences for 4,308 genes and partial cDNA sequences for 10,785 genes in cattle. This work gives the pig and cattle genome mapping communities a clear picture of the status of orthologous genes in these two species, providing a solid basis for annotation, mapping and functional analysis of the porcine and bovine genomes. The programs are written in Java and are freely available upon request.
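The iterative extension step (grow a seed with overlapping sequences until nothing more can be added) can be illustrated with a minimal Java sketch. This is not the ELF-Walking code. It assumes exact suffix/prefix string matching as a stand-in for BLAST hits, ignores reverse complements and sequencing errors, and uses a hypothetical class `FlankWalkSketch`, hypothetical methods `overlap` and `walk`, and a hypothetical `minOverlap` threshold in place of the tool's "significant overlap" criterion.

```java
import java.util.ArrayList;
import java.util.List;

public class FlankWalkSketch {

    // Hypothetical helper: length of the longest suffix of `left` that equals a
    // prefix of `right`, counting only overlaps of at least minOverlap bases.
    // Returns 0 if there is none. Exact matching stands in for a BLAST hit here.
    static int overlap(String left, String right, int minOverlap) {
        int max = Math.min(left.length(), right.length());
        for (int len = max; len >= minOverlap; len--) {
            if (left.endsWith(right.substring(0, len))) {
                return len;
            }
        }
        return 0;
    }

    // Hypothetical walking loop: repeatedly extends the contig with any read that
    // overlaps one of its ends, and stops when a full pass adds no new bases.
    static String walk(String seed, List<String> reads, int minOverlap) {
        String contig = seed;
        List<String> pool = new ArrayList<>(reads);
        boolean grew = true;
        while (grew) {
            grew = false;
            for (int i = 0; i < pool.size(); i++) {
                String read = pool.get(i);
                // 3' extension: contig suffix matches read prefix, read runs past the end
                int right = overlap(contig, read, minOverlap);
                if (right > 0 && right < read.length()) {
                    contig = contig + read.substring(right);
                    pool.remove(i);
                    grew = true;
                    break; // rescan the pool against the longer contig
                }
                // 5' extension: read suffix matches contig prefix, read runs past the start
                int left = overlap(read, contig, minOverlap);
                if (left > 0 && left < read.length()) {
                    contig = read.substring(0, read.length() - left) + contig;
                    pool.remove(i);
                    grew = true;
                    break;
                }
            }
        }
        return contig;
    }

    public static void main(String[] args) {
        String seed = "ATGGCCAAGT";
        List<String> reads = List.of("CAAGTTTGACCG", "GGATCCATGGCC", "GACCGTAA");
        System.out.println(walk(seed, reads, 5));
    }
}
```

Running `main` prints `GGATCCATGGCCAAGTTTGACCGTAA`: the seed is extended at its 3' end by the first read, at its 5' end by the second, and at its 3' end again by the third, after which no read adds bases and the loop stops. Removing each read once it is used guarantees termination, which mirrors the abstract's stopping rule of walking until no sequence grows.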