EMBL Outstation - The European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
The databases PROSITE, Pfam, PRINTS, ProDom, SWISS-PROT and TrEMBL have joined forces to launch InterPro, a new Integrated Resource of Protein Domains and Functional Sites. As the amount of sequence data without any functional characterization grows, conventional similarity searches such as BLAST and FASTA yield fewer and fewer meaningful hits. The usual approach to achieving good coverage is to analyze unknown sequences with as many motif recognition tools as possible. Combining tools would also increase statistical significance, provided we know which prediction of tool A is equivalent to which prediction of tool B. InterPro aims to solve this problem. For each set of equivalent patterns, profiles, fingerprints, and hidden Markov models, we created a dedicated InterPro entry. Each entry contains links to the parent databases, descriptions of the specific methods, functional annotation, literature references, and a list of all matching SWISS-PROT and TrEMBL protein sequences. Entries are classified as protein family, domain, repeat, or post-translational modification site. We also maintain links between related InterPro entries where appropriate; for instance, the histone family (IPR000166) points to the histone domains H2A (IPR002119), H2B (IPR000558), H3 (IPR000164), and H4 (IPR001951). Initially, we computed a list of corresponding signatures. However, this list contained too many errors, mainly caused by the different biological concepts underlying the parent databases. We therefore decided to have the list checked manually by a group of, by now, sixteen scientists, who also merged and rewrote the functional annotation. InterPro is accessible on the web at http://www.ebi.ac.uk/interpro and http://srs.ebi.ac.uk. The XML flat file can be downloaded via anonymous FTP from ftp.ebi.ac.uk/pub/databases/interpro. Questions and comments are welcome at interpro@ebi.ac.uk.
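The central idea, treating signatures from different member databases as equivalent when they belong to the same entry, can be illustrated with a minimal Python sketch. All accessions below (IPR_HYPO_1, PS_HYPO_1, and so on) and the functions `build_index` and `integrate_hits` are hypothetical. This is not InterPro's actual data model, file format, or software; it only shows how per-tool hits on one protein might be collapsed into entries once the equivalences are known.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical InterPro-like entry grouping equivalent signatures."""
    accession: str
    entry_type: str  # family, domain, repeat, or PTM site
    # (member database, signature accession) pairs judged equivalent
    signatures: set = field(default_factory=set)


def build_index(entries):
    """Map each (database, signature) pair to the entry it belongs to."""
    index = {}
    for entry in entries:
        for sig in entry.signatures:
            index[sig] = entry.accession
    return index


def integrate_hits(hits, index):
    """Group one protein's (database, signature) hits by entry.

    Returns a dict {entry accession: set of supporting databases}
    and a list of hits that map to no entry.
    """
    merged = defaultdict(set)
    unmapped = []
    for db, sig in hits:
        acc = index.get((db, sig))
        if acc is None:
            unmapped.append((db, sig))
        else:
            merged[acc].add(db)
    return dict(merged), unmapped


# Hypothetical example: two entries and one protein's hits from three tools.
entries = [
    Entry("IPR_HYPO_1", "domain",
          {("PROSITE", "PS_HYPO_1"), ("Pfam", "PF_HYPO_1"), ("PRINTS", "PR_HYPO_1")}),
    Entry("IPR_HYPO_2", "family", {("Pfam", "PF_HYPO_2")}),
]
hits = [("PROSITE", "PS_HYPO_1"), ("Pfam", "PF_HYPO_1"), ("ProDom", "PD_HYPO_9")]
merged, unmapped = integrate_hits(hits, build_index(entries))
print(merged)    # {'IPR_HYPO_1': {'PROSITE', 'Pfam'}} (set order may vary)
print(unmapped)  # [('ProDom', 'PD_HYPO_9')]
```

Here the PROSITE and Pfam hits collapse into a single entry supported by two databases, while the ProDom hit, lacking a curated equivalence, stays unmapped. How InterPro itself weighs agreement between member databases is not described in this abstract.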