January 15-19, 2005
Town & Country Convention Center
San Diego, CA
Sandra E. Orchard, Rolf Apweiler
UniProt is a comprehensive, annotated protein sequence database with extensive cross-references and a query interface. The database has two sections: Swiss-Prot (manually curated entries) and TrEMBL (automated classification, annotation and cross-references). UniProt also provides several non-redundant sequence databases: the UniRef sequence subsets, designed for efficient searching, and UniParc, which includes predicted and synthetic sequences that are inappropriate for inclusion in UniProt. Databases for the analysis of genomic data have been built on the foundation provided by UniProt. InterPro allows the functional classification of genes by assigning unknown sequences to protein families and identifying domains, motifs, active and binding sites, and potential sites of post-translational modification. Each InterPro entry consists of one or more member database signatures, derived from regular expressions and profiles (PROSITE), motifs (PRINTS), HMMs (Pfam, SMART, TIGRFAM, PIRSF, SUPERFAMILY and CATHSF) or sequence clustering (ProDom). The Proteome Analysis database provides statistical analyses of the predicted proteomes of fully sequenced organisms, compiled using InterPro, the CluSTr database and GO. Higher organisms are included using non-redundant protein sets generated from UniProt, RefSeq and Ensembl by the International Protein Index. IntAct is a public repository for protein interaction data, including data manually curated from the literature. Finally, the Integr8 browser unites information on genes and proteins into a single entry, drawing on data contained in Genome Reviews (a standardised view of the genomic sequence of organisms with completely deciphered genomes) and the UniProt proteome sets. It provides species descriptions, literature, and statistical and summary information about each complete proteome, and integrates data from a variety of sources, including InterPro, CluSTr and GO.
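To make the idea of a regular-expression signature concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence for matches. It uses the well-known N-glycosylation site pattern `N-{P}-[ST]-{P}` as an example of a post-translational modification signature. This is a minimal illustration, not the PROSITE or InterPro scanning software: it assumes a simplified subset of the syntax (`x`, `[...]`, `{...}`, `(n)` / `(n,m)` repeats, and `<` / `>` terminal anchors on the whole pattern), and the function names `prosite_to_regex` and `find_sites` and the toy sequences are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern to a Python regex string.

    Hypothetical helper covering only a subset of the syntax:
    x = any residue, [..] = allowed residues, {..} = forbidden residues,
    (n) or (n,m) = repetition, leading '<' / trailing '>' = N-/C-terminus.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    anchor_start = pattern.startswith("<")
    anchor_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")

    pieces = []
    for element in pattern.split("-"):
        # Residue specification followed by an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unrecognised element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":
            core = "."                            # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            core = "[^" + residue[1:-1] + "]"     # any residue except these
        else:
            core = residue                        # literal residue or [..] set
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(core)

    regex = "".join(pieces)
    return ("^" if anchor_start else "") + regex + ("$" if anchor_end else "")


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every, possibly overlapping, hit."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    scanner = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))           # N[^P][ST][^P]
    print(find_sites("N-{P}-[ST]-{P}", "MNGTANPSA"))     # [(2, 'NGTA')]
```

In the toy sequence, the `N` at position 6 is not reported because it is followed by `P`, which the `{P}` element forbids. Real signature databases such as those integrated in InterPro rely on richer pattern syntax, profiles and HMMs, so a plain regex scan like this only approximates the simplest kind of signature.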