Protocol for Phylogenetic Analysis



This page describes tools and guidelines for assigning newly identified Hops to existing families and subgroups. These protocols are written for researchers whose expertise in phylogenetic analysis does not extend beyond basic BLAST searches. The general approach involves aligning the sequences in a given Hop family using ClustalW, then using MEGA 2.1 to evaluate clustering and calculate genetic distance.

NOTE 3-24-05: Since publication of Lindeberg et al., 2005 and the writing of this tutorial, MEGA 3.0 has been released. However, MEGA 2.1 is still available for downloading at the site listed below. Instructions for calculating genetic distance using MEGA 3.0 are currently being developed.

Guidelines for selecting the Hop names themselves, including names for novel Hops, for Hops belonging to novel or preexisting subgroups in previously identified families, and for chimeric, truncated, or non-expressed Hops, can be found on the Name structure and Selection page.

Outline of procedures for Hop phylogenetic analysis: (download pdf of this protocol)

I. Confirm similarity to one or more previously characterized Hops using BLAST analysis
II. Sequence alignment in ClustalW

A. Obtain a file listing all sequences in the Hop family of interest
B. Generate alignment file

III. Clustering analysis and genetic distance calculation in MEGA

A. Download and install MEGA
B. Convert Clustal alignment file to MEGA format
C. Perform clustering analysis in MEGA
D. Calculate genetic distance between the new Hop and established subgroups

I. Confirm similarity to one or more previously characterized Hops using BLAST analysis.

    1. Conduct BLASTP analyses to determine whether a given protein is similar to any previously characterized Hops.
    2. If there IS NO significant similarity to one or more previously characterized Hops, and the newly identified Hop has been confirmed by criteria other than sequence similarity (see Criteria for Hop name assignment), go to Name structure and Selection: How to name a Hop for guidelines on naming novel Hop proteins.
    3. If there IS significant similarity to one or more previously characterized Hops (roughly defined as a BLAST expect value of less than 10^-5, with the alignment extending over 60% of the length of the protein), follow the steps below to assign a subgroup classification, or contact the PPI site administrator, who can generate a subgroup classification for you.
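The rough similarity criteria in step 3 can be sketched as a small check. This is only an illustration: the function name and arguments are hypothetical, coverage is assumed to be measured against the length of the query protein, and "over 60%" is read here as "at least 60%".

```python
def is_significant_hit(evalue, alignment_length, query_length,
                       max_evalue=1e-5, min_coverage=0.60):
    """Return True if a BLASTP hit meets the rough similarity criteria.

    Assumptions (not stated in the protocol): coverage is the aligned
    length divided by the query protein length, and a coverage of
    exactly 60% counts as passing.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    coverage = alignment_length / query_length
    return evalue < max_evalue and coverage >= min_coverage


if __name__ == "__main__":
    # Hypothetical hit: E-value 1e-20, 300 aligned residues of a 400-residue query.
    print(is_significant_hit(1e-20, 300, 400))
```

A hit that fails either condition would be treated as "no significant similarity" under step 2.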

II. Sequence alignment in ClustalW

A. Obtain a file listing all sequences in the Hop family of interest

    1. Go to the list of assigned Hop names and obtain the sequences for all members of the appropriate family by clicking on the family name.
    2. Save the list of sequences as a text file. Note that the sequences are listed in FASTA format.
    3. Add the sequence of the newly identified Hop to the list (also in FASTA format) and save the file.
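Steps 2 and 3 can also be done programmatically. The sketch below appends one FASTA record to the text of an existing family file. The helper names, the record name "HopX_new", and the 60-character line width are illustrative choices, not part of the protocol.

```python
def format_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def add_to_family(family_text, name, sequence, width=60):
    """Return the family FASTA text with the new record appended."""
    if family_text and not family_text.endswith("\n"):
        family_text += "\n"  # make sure the new header starts on its own line
    return family_text + format_fasta(name, sequence, width)


if __name__ == "__main__":
    # Hypothetical family file content and new sequence, for illustration only.
    family = ">member_1\nMKTLSA\n>member_2\nMKTALSA"
    print(add_to_family(family, "HopX_new", "mkt lsv"))
```

The returned text can then be written back to the text file that is pasted or uploaded to ClustalW.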

B. Generate alignment file

ClustalW is a general-purpose tool for alignment of multiple sequences. It is available through numerous websites, including the EMBL-EBI site described here. The HopA family is used to illustrate the various procedures.

  1. Go to ClustalW at EMBL-EBI. In the submission window that appears:

    a. Under OUTPUT FORMAT, select "aln w/o numbers". (All other settings can stay at their default values.)

    b. Paste or upload the list of FASTA-formatted sequences into the window.

    c. Click "Run".

  2. ClustalW will generate several output files. Click on the link to the sequence alignment file (designated by an .aln extension) and save it as a text file, renaming it if desired.

III. Clustering analysis and genetic distance calculation in MEGA.
MEGA (Molecular Evolutionary Genetics Analysis) is a free software package for comparative sequence analysis.

    A. Download and install MEGA

    1. Go to the MEGA 2.1 download site, provide the requested information, and download the program. An .exe file will appear on your desktop.
    2. Open the .exe file and follow the instructions for installation of MEGA 2.1.

    B. Convert Clustal alignment file to MEGA format

    1. Open the installed MEGA program. The following window will appear



    2. Under File on the menu select "Convert to MEGA format". The following window will appear:



    3. Select the output file from ClustalW as "Data file to convert:" and for "Data format" select ".aln (CLUSTAL)". Click OK.
      The converted file will appear in the "Text File Editor and Format Converter" window.
    4. Scroll through the converted file to check its format.
      If line numbers are present, either remove them manually or return to ClustalW and generate a new .aln file without line numbers.
      If any extraneous symbols are present after the last sequence, delete them.
    5. Save the output file with a .meg extension.
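If the .aln file was generated with numbers, the manual cleanup in step 4 could instead be applied to the .aln file before conversion. The sketch below assumes that the "line numbers" are the cumulative residue counts that ClustalW appends to the end of each alignment row. The function name is hypothetical, and the output should still be inspected by eye.

```python
import re

# A sequence row with a trailing count: name, aligned block, then an integer.
_ROW_WITH_COUNT = re.compile(r"^(\S+\s+\S+)\s+\d+\s*$")


def strip_residue_counts(aln_text):
    """Remove trailing residue counts from Clustal alignment rows.

    The header line, blank lines, and conservation lines (which start
    with whitespace) are left unchanged.
    """
    cleaned = []
    for line in aln_text.splitlines():
        match = None if line.startswith("CLUSTAL") else _ROW_WITH_COUNT.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Hypothetical two-sequence alignment, for illustration only.
    example = (
        "CLUSTAL W (1.83) multiple sequence alignment\n"
        "\n"
        "seq_1      MKT-LSA 6\n"
        "seq_2      MKTALSA 7\n"
        "           *** ***\n"
    )
    print(strip_residue_counts(example))
```

Regenerating the alignment with "aln w/o numbers", as described in II.B, remains the simpler route when it is available.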

    C. Perform clustering analysis in MEGA

    1. Return to the initial MEGA window (shown above in III.B.1) and select "click me to activate a data file".
    2. Select the .meg file that you saved in step III.B.5. An "Input Data" window will appear. Under "Data Type" select "Protein Sequences" and click OK.
    3. The window shown below will appear, with an expanded menu and the open data file indicated at the bottom



    4. Generate a phylogenetic tree from the active data file using Tests> Bootstrap Test of Phylogeny.

      At this point, a number of options can be selected, including UPGMA Tree, Neighbor-Joining Tree, Minimum Evolution and Maximum Parsimony. Similarly, in the "Analysis Preferences" window for either UPGMA or Neighbor-Joining, several models can be selected under Models>Amino Acid. Users are encouraged to generate trees using a variety of these options. The general clustering patterns should be similar, regardless of method.

      The method that best approximates the one described in Lindeberg et al., 2005 is Neighbor-Joining using the gamma model with the gamma parameter set at 2.25. (The "Analysis Preferences" window for Neighbor-Joining is shown below.)


    The Bootstrap consensus tree for the HopA family, resulting from Neighbor-Joining analysis with the gamma model, is shown below. (Click on the "Original tree" tab in the "Tree Explorer" window for additional information on clustering patterns.)

    D. Calculate genetic distance between the new Hop and established subgroups

    The level of amino acid diversity within and between subgroups was used as the basis for dividing Hop families (Lindeberg et al., 2005). As described there, homology families were subdivided if within-group amino acid diversity was less than 0.75 and between-group amino acid diversity was greater than 0.75, when using a gamma parameter of 2.25.
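For readers who want to see what a gamma-corrected amino acid distance involves, the sketch below uses the standard formula d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing sites and a is the gamma parameter. It assumes that sites with a gap in either sequence are skipped. MEGA's own gap-handling options may differ, so the values need not match MEGA's output exactly. The function names are illustrative.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def gamma_distance(p, shape=2.25):
    """Gamma-corrected distance d = a * ((1 - p) ** (-1 / a) - 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    p = p_distance("MKT-LSA", "MKTALSV")
    print(p, gamma_distance(p))
```

The correction always gives a distance at least as large as p, and the gap between the two widens as sequences become more divergent.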

    1. To calculate amino acid diversity among the sequences in the active data file, return to the MEGA window shown in III.C.3. and go to Distances>Compute Pairwise.
    2. An "Analysis Preferences" window will appear, similar to that shown in III.C.4. Select Model>Amino Acid>Gamma Distance if it is not already selected, and set the Gamma parameter to 2.25. Click OK.
    3. The resulting table of pairwise distances for the HopA family is shown below.

      Although the HopA1 subgroup has a higher level of internal diversity than HopA2, the pairwise distance table shows between-group diversity > 0.75 and within-group diversity < 0.75, consistent with the recommendations for subgroup division described above.
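Once a pairwise distance table is available, the 0.75 criterion can be checked with a short script. The sketch below assumes that "diversity" means the average of the pairwise distances within a subgroup or between two subgroups. The sequence labels, subgroup labels, and distance values are invented for illustration and are not HopA data.

```python
from itertools import combinations


def group_diversity(distances, groups):
    """Average pairwise distances within each group and between group pairs.

    distances: dict mapping frozenset({name_a, name_b}) -> distance
    groups:    dict mapping sequence name -> subgroup label
    """
    within = {}
    between = {}
    for a, b in combinations(sorted(groups), 2):
        d = distances[frozenset((a, b))]
        if groups[a] == groups[b]:
            within.setdefault(groups[a], []).append(d)
        else:
            between.setdefault(frozenset((groups[a], groups[b])), []).append(d)

    def mean(values):
        return sum(values) / len(values)

    return ({g: mean(v) for g, v in within.items()},
            {g: mean(v) for g, v in between.items()})


def supports_division(within, between, threshold=0.75):
    """True if every within-group mean is below and every between-group mean above the threshold."""
    return (all(v < threshold for v in within.values())
            and all(v > threshold for v in between.values()))


if __name__ == "__main__":
    # Hypothetical subgroup assignments and distances, for illustration only.
    groups = {"s1": "X1", "s2": "X1", "s3": "X2", "s4": "X2"}
    distances = {
        frozenset(("s1", "s2")): 0.40, frozenset(("s3", "s4")): 0.10,
        frozenset(("s1", "s3")): 1.20, frozenset(("s1", "s4")): 1.30,
        frozenset(("s2", "s3")): 1.10, frozenset(("s2", "s4")): 1.20,
    }
    within, between = group_diversity(distances, groups)
    print(within, between, supports_division(within, between))
```

To test a newly identified Hop, it can be assigned provisionally to each existing subgroup in turn and the check repeated. If no assignment satisfies the criterion, the new Hop may warrant a subgroup of its own.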

 

Magdalen Lindeberg
PPI Project Coordinator
Dept Plant Pathology
Cornell University
Email: ML16@cornell.edu