EMBL/EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
The scientific community faces the challenge of rapidly growing amounts of sequence data whose biological function has not been determined experimentally. This creates a need for new methods to characterise proteins on a large scale. To address this need, the EBI has developed a new resource, InterPro. InterPro is an integrated documentation resource for protein families, domains and functional sites, developed initially as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom databases. Merged annotation from PRINTS, PROSITE and Pfam forms the InterPro core. Each InterPro entry includes functional descriptions and literature references, and links back to the relevant parent database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Both merged entries and individual entries (i.e., those that have no counterpart in the companion resources) are assigned unique accession numbers. The first release of InterPro (November 1999) contains around 3000 entries, representing families, domains, repeats and post-translational modification (PTM) sites, encoded by nearly 5000 different regular expressions, profiles, fingerprints and Hidden Markov Models (HMMs). Each InterPro entry lists all of its matches against SWISS-PROT and TrEMBL (more than 650,000 hits in total). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. InterPro changes the way we incorporate data into SWISS-PROT and TrEMBL. To enhance the annotation of TrEMBL, we use InterPro to reliably group SWISS-PROT and TrEMBL entries, extract the annotation shared by all SWISS-PROT entries of a group, and assign this common annotation to the unannotated TrEMBL proteins of the same group. This procedure enables us to avoid overpredictions and to standardise annotation and nomenclature.
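The TrEMBL annotation procedure described above can be illustrated with a minimal sketch. It assumes, for illustration only, that proteins are grouped by the exact set of InterPro entries they match, and that annotation is represented as a set of strings; the abstract does not specify the grouping criterion or the annotation format, and the `Protein` class, the `propagate_annotation` function and all accessions and keywords below are hypothetical. Taking the intersection over every SWISS-PROT member of a group is what guards against overprediction in this sketch: a TrEMBL protein only inherits annotation that all curated members agree on.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Protein:
    accession: str                  # hypothetical accession, e.g. "SP1"
    database: str                   # "SWISS-PROT" or "TrEMBL"
    interpro_matches: frozenset     # InterPro entries matched by this protein
    annotation: set = field(default_factory=set)  # curated annotation, if any


def propagate_annotation(proteins):
    """Return the annotation to add to each unannotated TrEMBL entry.

    Assumption: a "group" is the set of proteins with identical InterPro matches.
    """
    # Group proteins by the exact set of InterPro entries they match.
    groups = defaultdict(list)
    for p in proteins:
        if p.interpro_matches:
            groups[p.interpro_matches].append(p)

    assignments = {}
    for members in groups.values():
        curated = [p for p in members if p.database == "SWISS-PROT"]
        if not curated:
            continue  # no curated evidence for this group: assign nothing
        # Keep only annotation shared by ALL SWISS-PROT members of the group.
        common = set.intersection(*(p.annotation for p in curated))
        if not common:
            continue
        # Transfer the common annotation to unannotated TrEMBL members.
        for p in members:
            if p.database == "TrEMBL" and not p.annotation:
                assignments[p.accession] = set(common)
    return assignments


if __name__ == "__main__":
    # Hypothetical example data.
    proteins = [
        Protein("SP1", "SWISS-PROT", frozenset({"IPR_A", "IPR_B"}),
                {"KW:Kinase", "KW:ATP-binding"}),
        Protein("SP2", "SWISS-PROT", frozenset({"IPR_A", "IPR_B"}),
                {"KW:Kinase", "KW:Transmembrane"}),
        Protein("TR1", "TrEMBL", frozenset({"IPR_A", "IPR_B"})),
        Protein("TR2", "TrEMBL", frozenset({"IPR_C"})),
    ]
    print(propagate_annotation(proteins))  # {'TR1': {'KW:Kinase'}}
```

Here TR1 receives only "KW:Kinase", the one keyword shared by both curated members of its group, while TR2 receives nothing because no SWISS-PROT entry shares its InterPro matches.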