January 12-16, 2002
Town & Country Convention Center
San Diego, CA
Bioinformatics: Databases
We report preliminary data from our effort to collect all rice repetitive sequences and to develop a new rice repeat database, tentatively named the Northern Illinois University Rice Repeat Database (NRDB). All sequences in NRDB are assigned numeric serial names for convenient organization and retrieval. The naming system records the organism, GenBank information (accession number, gi number), repeat type and copy number (if available). Repetitive sequences from the TIGR Rice Repeats and representative sequences from BLAST-generated clusters of CUGI BAC end sequences are also included in the database. In addition to the annotated repetitive sequences, an all-to-all BLAST search was performed on all genomic sequences from the International Rice Genome Sequencing Project, using query sequence lengths of 0.5 kb, 2.5 kb and 5 kb, to search for new repetitive sequences. Several query lengths were used so that repeats of different sizes could be collected as completely as possible. Core sequences from retrotransposons, such as the DNA sequences corresponding to the RNaseH and reverse transcriptase (RT) regions, were searched separately, because the rice genome contains a large number of highly degraded Gypsy-like retrotransposons in which the RNaseH and RT regions remain conserved. Genomic segments with high copy numbers were further clustered to eliminate redundancy and overlap, and their identities, if any, were assigned. A web page is under development through which NRDB will be made available to the community for rice sequence annotation and for studies of genome organization and evolution.
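The query preparation and redundancy removal described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `split_into_queries` and `merge_overlaps` are hypothetical, the use of non-overlapping windows is an assumption (the abstract does not say how query fragments were cut), and merging overlapping coordinates is only a simple stand-in for the clustering step. The BLAST search itself is not shown.

```python
from typing import Iterator, List, Tuple

# Query lengths in base pairs: 0.5 kb, 2.5 kb and 5 kb, as stated in the abstract.
QUERY_LENGTHS_BP = (500, 2500, 5000)


def split_into_queries(seq: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs for consecutive windows of `length` bp.

    Assumption: windows do not overlap; the final fragment may be shorter.
    """
    for start in range(0, len(seq), length):
        yield start, seq[start:start + length]


def merge_overlaps(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open intervals [start, end).

    A simple stand-in for removing redundant, overlapping high-copy segments.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of storing a duplicate region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    genome = "ACGT" * 3000  # placeholder sequence, 12,000 bp
    for length in QUERY_LENGTHS_BP:
        n_queries = sum(1 for _ in split_into_queries(genome, length))
        print(f"{length} bp queries: {n_queries}")
    print(merge_overlaps([(0, 500), (400, 900), (2000, 2500)]))
```

Cutting the same genomic sequence at several window sizes yields query sets in which short and long repeats are each likely to fall within a single fragment, which is the rationale the abstract gives for using more than one query length.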