Organizers: Kimmen Sjolander, University of California - Berkeley, and Ruchira Datta, University of California - Berkeley
Tutorial Level: Beginner to Intermediate. We recommend that participants be familiar with using BLAST, but no background in phylogenomics or evolutionary analysis is required.
Intended Audience: Anyone who is interested in any of the following tasks: improving gene/protein function prediction, ortholog identification, predicting protein-protein interaction, and predicting functional residues in proteins.
Outline
Part 1: Introduction to the PhyloFacts Phylogenomic Encyclopedia and toolkit (20 minutes).
We will present two primary uses: submitting sequences for HMM-based classification to
functional families and subfamilies in the PhyloFacts resource, and using the PhyloBuilder
tool to construct protein family phylogenies. Workshop participants will be encouraged to
come prepared with sequences of interest, and we will assist users personally with using
these tools. (We can provide example sequences for participants without selected sequences.)
While user-submitted sequences are being analyzed in PhyloFacts, we will proceed through the subsequent stages. Results for HMM-based classification to existing PhyloFacts families and subfamilies typically take only a few minutes to return, but PhyloBuilder jobs can take an hour or so to complete (since these involve gathering homologs and constructing a phylogenetic tree). We will review the results of these programs in the last 30 minutes or so of the workshop.
Part 2: Fundamentals of protein family evolution and the sources of annotation error (20 minutes).
We will present an overview of the biological processes that underlie gene family evolution
and motivate the use of phylogenomic methods to predict gene function. We will discuss how
gene families evolve novel functions and structures through point mutations, gene duplication,
speciation, gene fusion and fission, and other domain rearrangements. We will also review the
main sources of annotation error, with specific examples of mis-annotated plant and animal
proteins.
Part 3: A standard phylogenomic pipeline (20 minutes).
In this section, we will review the tasks in a phylogenomic pipeline -- clustering homologs,
multiple sequence alignment, masking the alignment, phylogenetic tree construction, ortholog
identification, retrieving experimental data and GO annotations, and inferring function on
the basis of the phylogenetic tree -- and present recommended protocols for each step.
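As an illustration of one pipeline step, alignment masking can be sketched in a few lines of Python. This is a minimal sketch only, not the PhyloFacts protocol: the function name `mask_alignment`, the 50% gap threshold, and the treatment of `-` and `.` as gap characters are all assumptions made for the example. It drops alignment columns in which too many sequences have a gap, so that poorly aligned regions contribute less to tree construction.

```python
def mask_alignment(seqs, max_gap_fraction=0.5, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    seqs: list of equal-length aligned sequences (strings).
    The threshold and gap characters are illustrative assumptions,
    not values recommended by PhyloFacts.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    n = len(seqs)
    # Indices of the columns that are kept (gap fraction at or below threshold).
    keep = [
        i for i in range(length)
        if sum(s[i] in gap_chars for s in seqs) / n <= max_gap_fraction
    ]
    # Rebuild each sequence from the kept columns only.
    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    alignment = ["MK-LV", "MKALV", "M--LV", "MK--V"]
    for row in mask_alignment(alignment):
        print(row)
```

In the toy alignment, only the third column is gapped in more than half of the sequences, so it is the only column removed. Real pipelines often use more refined masking criteria than a single gap threshold.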
Part 4: Advanced phylogenomic methods available in PhyloFacts (30 minutes).
A detailed look at the software tools and web servers available through PhyloFacts,
including FlowerPower (subfamily HMM-based clustering of homologs), SCI-PHY (subfamily
identification), INTREPID (INformation-theoretic tree Traversal for Protein functional site
Identification; active site identification), and SATCHMO (simultaneous alignment and tree
construction using hidden Markov models).
Part 5: Reviewing the results of workshop participant submissions to PhyloFacts (20 minutes).
In this last section, we will review the results of the jobs submitted to the PhyloFacts
servers at the beginning of the workshop. Individual cases of particular interest will be
projected on the screen for all workshop participants to review jointly.
Part 6: Summary statements and planned expansions of the PhyloFacts suite of phylogenomic analysis tools (20 minutes).
In this section, we will ask participants which aspects of the resource are most useful,
which are difficult to use, and which of the planned expansions would best meet their
research needs. We will use this feedback to improve PhyloFacts and prioritize new
developments.
This page last updated Monday, 08-Sep-2008 20:26:44 EDT