As the Arabidopsis cDNA sequencing project progresses, a relatively large dataset has begun to accumulate. More than 22,000 ESTs have been generated, processed and analyzed. We have moved beyond archiving sequences and their respective similarity results. Specifically, we have begun to extract derived information from these sequences and to analyze them by comparative genome analysis, by cluster analysis, and by complex queries on finely structured data loaded into an object-relational database management system (RDBMS). The RDBMS includes information from the public databases, specifically GenBank, GenInfo and PIR. Among the information being developed are those sequences which are uniquely represented in the dataset, and those which presently appear to be unique to a variety of species. An important caveat is that the comparative species datasets are not yet large enough to support biological conclusions; however, the results are more than enticing.