ForBio Research Pty Ltd, 50 Meiers Road, Indooroopilly, Queensland 4068, Australia
EST sequencing has become a method of choice for 'discovering' genes. We are applying 5' tag EST sequencing to eucalyptus of the subgenus Symphyomyrtus. Plasmid cDNA libraries are constructed from specific tissues, and from the same tissues at different ages and under different growth conditions. We currently use directionally cloned pBluescript libraries for tissues from which we can obtain large amounts of good-quality RNA. For more intractable or smaller tissues, we use Clontech SMART-synthesised cDNA cloned into pGEM-T. Five-prime sequence data is obtained from these libraries using a sequencing primer based on the 5' PCR primer. The sequence data is trimmed automatically and analysed with BLASTX. The top BLASTX hit for each sequence is stored in a SQL server-based database. Each BLASTX hit is classified functionally as a basis for text-based searching. FASTA searches of each new sequence against our current database provide redundancy information. Currently, sequences are classified as 'identical' when FASTA reports greater than 90% nucleotide identity over an overlap of more than 100 bases. These identical sequences are used to construct multiple alignments, from which we identify SNPs and polymorphic microsatellite loci between species. Their utility in genetic mapping is under investigation. Analysis of between-library redundancy also provides expression data for different growth conditions. We ultimately hope to tag the majority of eucalyptus genes, and comparison of this data with sequence data from model plants should reveal tree-specific genes.
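The redundancy rule stated above (more than 90% nucleotide identity over more than 100 bases of overlap) can be illustrated with a minimal Python sketch. The `Hit` record layout, the function names, and the single-linkage grouping of 'identical' sequences are assumptions made for illustration; the abstract does not say how FASTA output is parsed or how groups are assembled before multiple alignment.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract: >90% identity over >100 bases of overlap.
MIN_IDENTITY_PCT = 90.0
MIN_OVERLAP_BASES = 100


@dataclass(frozen=True)
class Hit:
    """One pairwise FASTA result (hypothetical record layout)."""
    query: str
    subject: str
    identity_pct: float
    overlap: int


def is_identical(hit: Hit) -> bool:
    """Both thresholds must be strictly exceeded."""
    return hit.identity_pct > MIN_IDENTITY_PCT and hit.overlap > MIN_OVERLAP_BASES


def group_identical(hits):
    """Group sequence ids linked by 'identical' hits.

    Assumption: single-linkage grouping via union-find; the abstract does
    not specify the grouping strategy.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        a, b = find(hit.query), find(hit.subject)  # registers both ids
        if is_identical(hit) and a != b:
            parent[b] = a

    groups = {}
    for seq_id in list(parent):
        groups.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(sorted(members) for members in groups.values())
```

Under these assumptions, each group with more than one member would be a candidate for a multiple alignment, and the number of members per library would give a rough redundancy count. Sequences that fail either threshold remain singletons.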