Phylogenomics Intl - PAG

The Phylogenomics Workshop

Tuesday Morning, January 13, 2009 -- 10:20 am - 12:30 pm

California Room


Organizers: Kimmen Sjolander, University of California - Berkeley

(kimmen@berkeley.edu)

and

Ruchira Datta, University of California - Berkeley

(datta@math.berkeley.edu)

Tutorial Level: Beginner to Intermediate. We recommend that participants be familiar with using BLAST, but no background in phylogenomics or evolutionary analysis is required.

Intended Audience: Anyone who is interested in any of the following tasks: improving gene/protein function prediction, ortholog identification, predicting protein-protein interaction, and predicting functional residues in proteins.

Outline

Part 1: Introduction to the PhyloFacts Phylogenomic Encyclopedia and toolkit (20 minutes).
We will present two primary uses: submitting sequences for HMM-based classification to functional families and subfamilies in the PhyloFacts resource, and using the PhyloBuilder tool to construct protein family phylogenies. Workshop participants will be encouraged to come prepared with sequences of interest, and we will assist users personally with using these tools. (We can provide example sequences for participants without selected sequences.)

While user-submitted sequences are being analyzed in PhyloFacts, we will proceed through the subsequent stages. Results for HMM-based classification to existing PhyloFacts families and subfamilies typically take only a few minutes to return, but PhyloBuilder jobs can take an hour or so to complete (since these involve gathering homologs and constructing a phylogenetic tree). We will review the results of these programs in the last 30 minutes or so of the workshop.

Part 2: Fundamentals of protein family evolution and the sources of annotation error (20 minutes).
We will present an overview of the biological processes that underlie gene family evolution and that motivate the use of phylogenomic methods to predict gene function. We will discuss how gene families evolve novel functions and structures through point mutations, gene duplication, speciation, gene fusion and fission, and other domain rearrangements. We will also review the main sources of annotation error, with specific examples of mis-annotated plant and animal proteins.

Part 3: A standard phylogenomic pipeline (20 minutes).
In this section, we will review the tasks in a phylogenomic pipeline -- clustering homologs, multiple sequence alignment, masking the alignment, phylogenetic tree construction, ortholog identification, retrieving experimental data and GO annotations, and inferring function on the basis of the phylogenetic tree -- and present recommended protocols for each step.
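As a small illustration of one step in the list above, masking the alignment, the sketch below drops alignment columns that are mostly gaps before tree construction. It is a generic example and not the protocol recommended in the workshop: the function name `mask_alignment`, the `max_gap_fraction` threshold of 0.5, and the toy sequences are all assumptions made for illustration.

```python
def mask_alignment(seqs, max_gap_fraction=0.5, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    seqs maps sequence names to aligned sequences of equal length.
    The threshold is an illustrative assumption, not a recommended value.
    """
    if not seqs:
        return {}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    rows = list(seqs.values())
    n_rows = len(rows)
    # Keep only the column indices where gaps are not too frequent.
    keep = [
        i
        for i in range(lengths.pop())
        if sum(row[i] in gap_chars for row in rows) / n_rows <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


# Hypothetical toy alignment: the third column is a gap in two of three rows.
toy = {
    "seqA": "MK-LV",
    "seqB": "MR-LV",
    "seqC": "MKALI",
}
masked = mask_alignment(toy)
```

Here the third column is removed because two of its three characters are gaps, so each masked sequence is four residues long. Real pipelines often also mask on alignment quality or lowercase (unaligned) columns, which this sketch does not attempt.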

Part 4: Advanced phylogenomic methods available in PhyloFacts (30 minutes).
We will take a detailed look at the software tools and web servers available through PhyloFacts, including FlowerPower for subfamily HMM-based clustering of homologs, SCI-PHY for subfamily identification, INTREPID (INformation-theoretic tree Traversal for Protein functional site Identification) for active site identification, and SATCHMO (simultaneous alignment and tree construction using hidden Markov models).
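To give a feel for what an information-theoretic score over an alignment looks like, the sketch below computes the Shannon entropy of each alignment column, where low entropy indicates a conserved position. This is a deliberately simplified stand-in and is not the INTREPID algorithm, which also traverses the phylogenetic tree; the function names and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters are ignored; an all-gap column scores 0.
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(rows):
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*rows)]


# Hypothetical toy alignment: column 1 is invariant, column 4 is nearly so.
profile = conservation_profile(["MKAL", "MRAL", "MKGL", "MRGI"])
```

In this toy case the first column is fully conserved and scores zero, while the second and third columns each split evenly between two residues and score one bit.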

Part 5: Reviewing the results of workshop participant submissions to PhyloFacts (20 minutes).
In this last section, we will review the results of the jobs submitted to the PhyloFacts servers at the beginning of the workshop. Individual cases of particular interest will be projected on the screen for all workshop participants to review jointly.

Part 6: Summary statements and planned expansions of the PhyloFacts suite of phylogenomic analysis tools (20 minutes).
In this section, we will ask participants which aspects of the resource are most useful, which aspects are difficult to use, and which of the planned expansions would best meet their research needs. We will use this feedback to improve PhyloFacts and prioritize new developments.


This page last updated Monday, 08-Sep-2008 20:26:44 EDT