PPI phylogenetics tutorial

Protocol for Phylogenetic Analysis

This page describes tools and guidelines for assigning newly identified Hops to existing families and subgroups. These protocols are written for researchers whose expertise in phylogenetic analyses does not extend beyond basic BLAST analyses. The general approach involves alignment of the sequences in a given Hop family using ClustalW, followed by use of MEGA 2.1 to evaluate clustering and calculate genetic distance.

NOTE 2-12-10: Since publication of Lindeberg et al., 2005, MEGA 2.1 has been replaced with MEGA 4. The following tutorial uses MEGA 4.

Guidelines for selecting the Hop names themselves can be found on the Name structure and Selection page. These cover names for novel Hops, for Hops belonging to novel or preexisting subgroups in previously identified families, and for chimeric, truncated, or non-expressed Hops.

Outline of procedures for Hop phylogenetic analysis:

I. Confirm similarity to one or more previously characterized Hops using BLAST analysis
II. Sequence alignment in ClustalW

A. Obtain a file listing all sequences in the Hop family of interest
B. Generate alignment file

III. Clustering analysis and genetic distance calculation in MEGA

A. Download and install MEGA
B. Convert Clustal alignment file to MEGA format
C. Perform clustering analysis in MEGA
D. Calculate genetic distance between the new Hop and established subgroups

I. Confirm similarity to one or more previously characterized Hops using BLAST analysis

Conduct BLASTP analyses to determine whether a given protein is similar to any previously characterized Hops.

If there IS NO significant similarity to one or more previously characterized Hops, and the newly identified Hop has been confirmed by criteria other than sequence similarity (see Criteria for Hop name assignment), go to Name structure and Selection: How to name a Hop for guidelines on naming novel Hop proteins.

If there IS significant similarity to one or more previously characterized Hops (roughly defined as a BLAST expect value of less than 10^-5, with the alignment extending over 60% of the length of the protein), follow the steps below to assign a subgroup classification. Alternatively, contact the PPI site administrator, who can generate a subgroup classification for you.
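The rough significance criteria above can be expressed as a small check. This is a minimal sketch, not part of the published protocol: the function name, the candidate names, and all numeric values are hypothetical, and it assumes that "60% of the length of the protein" means the alignment length divided by the length of the query protein, with exactly 60% counted as passing.

```python
def is_significant_hit(evalue, alignment_length, protein_length,
                       max_evalue=1e-5, min_coverage=0.60):
    """Return True if a BLASTP hit meets both rough criteria from the text.

    Assumption: coverage is alignment length / query protein length.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    coverage = alignment_length / protein_length
    return evalue < max_evalue and coverage >= min_coverage


# Hypothetical hits for a 380-residue query: (name, E-value, alignment length)
query_length = 380
hits = [
    ("candidate_1", 2e-40, 350),  # low E-value, long alignment
    ("candidate_2", 3e-7, 90),    # low E-value, but short alignment
    ("candidate_3", 0.02, 300),   # long alignment, but weak E-value
]

similar = [name for name, evalue, aln_len in hits
           if is_significant_hit(evalue, aln_len, query_length)]
print(similar)
```

Only hits that pass both tests would be carried forward to the alignment and clustering steps.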

II. Sequence alignment in ClustalW

A. Obtain a file listing all sequences in the Hop family of interest

Go to the list of assigned Hop names and obtain the sequences for all members of the appropriate family by clicking on the family name.

Save the list of sequences as a text file. Note that the sequences are listed in FASTA format.

Add the sequence of the newly identified Hop to the list (also in FASTA format) and save the file.
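For readers who prefer to script this step, the following sketch appends a new sequence to the family list in FASTA format. The helper names and the toy sequences are hypothetical; editing the text file by hand, as described above, gives the same result.

```python
def format_fasta_record(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def add_to_family(family_fasta, name, sequence):
    """Return the family FASTA text with the new record appended."""
    if family_fasta and not family_fasta.endswith("\n"):
        family_fasta += "\n"
    return family_fasta + format_fasta_record(name, sequence)


# Toy family list (hypothetical names and sequences)
family = ">member_1\nMKTAYIAKQR\n>member_2\nMKTAYLAKQR"
combined = add_to_family(family, "new_hop", "mktay iakqk")
print(combined)
```

The combined text can then be saved to a file and pasted or uploaded in the alignment step.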

B. Generate alignment file

ClustalW is a general-purpose tool for alignment of multiple sequences. It is available through numerous websites, including the EMBL-EBI site described here. (The HopA family is used to illustrate the various procedures.)

Go to ClustalW at EMBL-EBI (now Clustal Omega):

Paste or upload your protein sequences in FASTA format into the specified window. Parameters can stay at their default settings. Click on the link to download the alignment file. Save it as a text file, renaming it if desired.

III. Clustering analysis and genetic distance calculation in MEGA

MEGA (Molecular Evolutionary Genetics Analysis) is a free software package for comparative sequence analysis.

A. Download and install MEGA

Go to the MEGA 4 download site, provide the requested information, and download the program. An .exe file will appear on your desktop.

Open the .exe file and follow the instructions for installation of MEGA 4.

B. Convert Clustal alignment file to MEGA format

1. Open the installed MEGA program. The following window will appear:

2. Under File on the menu select "Convert to MEGA format". The following window will appear:

3. Select the output file you saved from ClustalW as "Data file to convert:" and for "Data format" select ".aln (CLUSTAL)". Click OK. The converted file will appear in the "Text File Editor and Format Converter" window.

4. Scroll through the converted file to check format.
   If line numbers are present, either manually remove them or return to ClustalW and generate a new .aln file without line numbers.
   If any extraneous symbols are present following the last sequence, delete them.

5. Save the output file with a .meg extension.
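As an illustration of what the conversion does, here is a rough sketch that reads CLUSTAL-format text and writes a simple MEGA-style file, dropping any trailing residue counts along the way. It is not a substitute for MEGA's own converter: the function name is hypothetical, the header lines (`#mega`, `!Title`, `!Format DataType=Protein;`) reflect one common layout of the MEGA format and should be compared against a file produced by MEGA itself, and the toy alignment is invented.

```python
def clustal_to_mega(clustal_text, title="Hop family alignment"):
    """Convert CLUSTAL-format alignment text to a simple MEGA-format string."""
    sequences = {}  # dicts keep insertion order, so sequence order is preserved
    for line in clustal_text.splitlines():
        # Skip blank lines, the CLUSTAL header, and conservation lines,
        # which begin with whitespace.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the name, fields[1] the aligned chunk; a third field,
        # if present, is a running residue count and is ignored.
        name, chunk = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + chunk

    if len({len(seq) for seq in sequences.values()}) > 1:
        raise ValueError("aligned sequences differ in length")

    out = ["#mega", "!Title " + title + ";", "!Format DataType=Protein;", ""]
    for name, seq in sequences.items():
        out.append("#" + name)
        out.append(seq)
    return "\n".join(out) + "\n"


# Toy two-block alignment with residue counts at the line ends
aln = """CLUSTAL W (1.83) multiple sequence alignment

seq_a      MKT-AYIA 7
seq_b      MKTLAYIA 8
           *** ****

seq_a      KQR 10
seq_b      KQR 11
           ***
"""
mega = clustal_to_mega(aln)
print(mega)
```

Whichever route is used, the checks in step 4 still apply: the saved .meg file should contain only the header, the sequence names, and the aligned sequences.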

C. Perform clustering analysis in MEGA

1. Return to the initial MEGA window (shown above in III.B.1) and select "click me to activate a data file".

2. Select the .meg file that you saved in step III.B.5. An "Input Data" window will appear. Under "Data Type" select "Protein Sequences" and click OK.

3. The window shown below will appear, with the open data file indicated at the bottom.

4. Generate a phylogenetic tree from the active data file using Phylogeny > Bootstrap Test of Phylogeny.
   At this point, a number of options can be selected, including UPGMA Tree, Neighbor-Joining Tree, Minimum Evolution, and Maximum Parsimony. Similarly, in the "Analysis Preferences" window for UPGMA, Neighbor-Joining, or Minimum Evolution, models can be selected under Models > Amino Acid. Users are encouraged to generate trees using a variety of these options. The general clustering patterns should be similar, regardless of method.

   The method that best approximates those described in Lindeberg et al., 2005 (using MEGA 4 rather than MEGA 2.1) involves Neighbor-Joining using the Amino Acid: p-distance model (the "Analysis Preferences" window for Neighbor-Joining is shown below).

5. The Bootstrap consensus tree resulting from Neighbor-Joining analysis for the HopA family is shown below.

D. Calculate genetic distance between the new Hop and established subgroups

The level of amino acid diversity within and between subgroups was used as the basis for dividing Hop families (Lindeberg et al., 2005). As described there, homology families were subdivided if within-group amino acid diversity was less than 0.75 and between-group amino acid diversity was greater than 0.75, when using a gamma parameter of 2.25. For MEGA 4, a cutoff of 0.475 is consistent with the previously established subfamily divisions when using the Neighbor-Joining Amino Acid: p-distance model.

To calculate amino acid diversity among the sequences in the active data file, return to the MEGA window shown in III.C.3. and go to Distances>Compute Pairwise.

An "Analysis Preferences" window will appear similar to that shown in III.C.4. Select Model > Amino Acid: p-distance if not already selected. Click OK.

The resulting table of pairwise distances for the HopA family is shown below.
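To make the numbers in such a table less opaque: the p-distance between two aligned protein sequences is the proportion of compared sites at which the amino acids differ. The sketch below computes it for a toy alignment. The function names and sequences are hypothetical, and the sketch assumes gap positions are skipped pair by pair; MEGA's gap-handling setting may differ, so values may not match MEGA's output exactly.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def pairwise_table(alignment):
    """Lower-triangle table of p-distances, keyed by (later name, earlier name)."""
    names = list(alignment)
    return {
        (names[i], names[j]): p_distance(alignment[names[i]], alignment[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


# Toy alignment (hypothetical sequences)
toy = {
    "seq_a": "MKTAYIAKQR",
    "seq_b": "MKTAYLAKQR",
    "seq_c": "MRT-YLSKQK",
}
table = pairwise_table(toy)
for pair, distance in table.items():
    print(pair, round(distance, 3))
```

For example, seq_a and seq_b differ at 1 of 10 sites, giving a p-distance of 0.1.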

Although the HopA1 subgroup has a higher level of internal diversity than HopA2, the pairwise distance table shows between-group diversity > 0.475 and within-group diversity < 0.475, consistent with the recommendations for subgroup division described above.
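The same comparison can be scripted once the pairwise distances are available. The sketch below is one possible way to test a proposed subgroup assignment against the 0.475 cutoff: it lists every pair whose distance contradicts the assignment. The function name, member names, subgroup labels, and distances are all hypothetical and are not taken from the HopA table.

```python
from itertools import combinations

CUTOFF = 0.475  # MEGA 4 p-distance cutoff quoted in the text


def check_subgroups(distances, subgroup_of, cutoff=CUTOFF):
    """Return (name, name, distance) for pairs that contradict the assignment.

    A same-subgroup pair conflicts if its distance is not below the cutoff;
    a different-subgroup pair conflicts if its distance is not above it.
    """
    conflicts = []
    for a, b in combinations(sorted(subgroup_of), 2):
        d = distances[frozenset((a, b))]
        same = subgroup_of[a] == subgroup_of[b]
        if (same and d >= cutoff) or (not same and d <= cutoff):
            conflicts.append((a, b, d))
    return conflicts


# Hypothetical distances between three established members and a new Hop
distances = {
    frozenset(("m1", "m2")): 0.30,
    frozenset(("m1", "m3")): 0.60,
    frozenset(("m1", "new")): 0.58,
    frozenset(("m2", "m3")): 0.62,
    frozenset(("m2", "new")): 0.55,
    frozenset(("m3", "new")): 0.20,
}
subgroup_of = {"m1": "sub1", "m2": "sub1", "m3": "sub2", "new": "sub2"}

print(check_subgroups(distances, subgroup_of))
```

An empty list means the proposed assignment is consistent with the cutoff; placing the new Hop in the other subgroup in this toy data would produce conflicts.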

Magdalen Lindeberg
PPI Project Coordinator
Plant Pathology and Plant-Microbe Biology
Cornell University
Email: ML16@cornell.edu