Module 3: Web Exercise
Sequence Analysis on the Web and a quiz!

Glossary of Terms
(for CAPITALIZED words)

What can you do with a gene or protein sequence?

Characterization of virulence genes commonly involves cloning, described in Module 3 - Lab Exercise, and determining the sequence of the protein which the gene encodes. This exercise shows you how you can analyze a protein's sequence to learn more about its function using resources available on the internet. 

Part I: Sequence Analysis

Imagine that you have cloned and sequenced a gene from a pathovar of Pseudomonas syringae that plays an important role in plant-pathogen interactions.

What can you do next?

  • You can do more work in the laboratory to understand its role in the biology of the organism.
  • You can use online databases to get clues about the function of the protein it encodes.

In this exercise, you will:

  • Be introduced to biological sequence databases
  • Compare your protein sequence with all those in the Genbank database to see if similar sequences are present
  • Record the names of any similar sequences and use them to search the PubMed literature database

In the Module 4 - Web Exercise, you will have a chance to do additional work with BLAST.

1. Introduction to biological sequence databases

When a scientist sequences a segment of DNA, be it a single gene, a gene operon or an entire chromosome or GENOME, the sequence is deposited in an online database.

The most commonly used database in the United States is NCBI Genbank.
Other databases include:

  • EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute)
  • DDBJ (DNA Data Bank of Japan)

These databases contain millions of DNA sequences and exchange information with one another daily.

Each sequence record in the NCBI sequence databases is organized into three sections:

  • Header – general information about the sequence including the organism it came from and the paper in which it was first published.
  • Features – information about the role of the sequence in the biology of the organism and any changes that have been made to the sequence. This section also includes information like the length of the sequence, the molecular weight of the protein, and any notes that the depositors wished to add. The start of this section is indicated with the label FEATURES on the left. Terms beginning with a / are referred to as "qualifiers". Examples of qualifiers include /product, /gene, /locus_tag, and /note.
  • Sequence – nucleotides listed in order and numbered. The start of this section is indicated with the label ORIGIN on the left.

See an example of a sequence record in NCBI Genbank file format here
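
The three-part layout described above can also be seen by splitting a record with a short script. The following is a minimal sketch in Python, assuming the record is plain text in which the labels FEATURES and ORIGIN each start a line at the left margin. The function name `split_record` and the tiny sample record are made up for illustration; the sample is not a real Genbank entry.

```python
def split_record(text):
    """Split a Genbank-style record into header, features, and sequence lines.

    Assumes the labels FEATURES and ORIGIN each begin a line at the left margin.
    """
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # "//" marks the end of a record
        sections[current].append(line)
    return sections


# Hypothetical, heavily shortened record used only to exercise the function.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1      12 bp    DNA
DEFINITION  Made-up example record.
FEATURES             Location/Qualifiers
     gene            1..12
                     /gene="exampleGene"
                     /note="illustration only"
ORIGIN
        1 atggcatcgt aa
//
"""

parts = split_record(SAMPLE_RECORD)
print(len(parts["header"]), "header lines")
print(len(parts["features"]), "feature lines")
print(len(parts["sequence"]), "sequence lines")
```

Each label line is kept as the first line of its own section, so the FEATURES and ORIGIN markers stay visible in the output.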

NCBI also contains millions of protein sequences. These come either from specialized databases such as SwissProt or from translations of the gene sequences in Genbank records.

NCBI allows users to search the databases and perform analyses in various ways.

  • You can search by name for nucleotide sequences (genes) or amino acid sequences (proteins).
  • You can search by name for publications about the sequence (recorded in the life sciences literature database called PUBMED).
  • You can search for similar sequences using the feature called BLAST (by inputting all or part of a DNA or amino acid sequence) and compare two or more sequences.

2. Compare your sequence with those in the Genbank database

BLAST (Basic Local Alignment Search Tool) is a program that lets you rapidly search protein and nucleotide databases for regions similar to your sequence of interest. NCBI offers many options for comparing protein or DNA sequences against different databases. Since your sequence is a protein sequence, you will compare it with the database of protein sequences at NCBI. For this purpose, your sequence will be called the QUERY SEQUENCE:

>query_sequence
MTIVSGHIGKHPSLTTVQAGSSASVENQMPDPAQFSDGRWKKLPTQLSSITLARFDQDICTNNHGISQRAMCF GLSLSWINMIHAGKDHVTPYASAERMRFLGSFEGVVHARTVHNFYRTEHKFLMEQASANPGVSSGAMAGTESLL QAAELKGLKLQPVLEDKSNSGLPFLIACKQSGRQVSTDEAALSSLCDAIVENKRGVMVIYSQEIAHALGFSVSSDG KRATLFDPNLGEFHTHSKALADTIENISSADGLPLIGVQVFASKIH
  1. Using your mouse, highlight the query sequence, including the first line beginning with >, and copy it.
  2. Go to the protein-protein BLAST site at NCBI.
  3. Near the top of the page you will see a box with the word "search" to its left. Click your mouse on the box and paste the copied query sequence.
  4. We will not choose any special parameters, so once you have pasted in the sequence, you can click the blue button labeled "BLAST".
  5. A new window will open. In the new window, click on the button labeled "FORMAT". The page will automatically refresh itself as soon as the comparison is completed, yielding a page entitled "Results of BLAST". (This could take a couple of minutes.)
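
The query sequence above is written in FASTA format: a header line that begins with > followed by the sequence itself. As an optional aside, here is a minimal Python sketch that reads one FASTA entry, removes spaces and line breaks from the sequence, and counts the amino acids. The function name `read_fasta_entry` is an illustrative choice, and the sample uses only the first 30 amino acids of the query sequence to keep the example short.

```python
def read_fasta_entry(text):
    """Return (name, sequence) for a single FASTA entry given as text."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA entry must start with a '>' header line")
    name = lines[0][1:]  # drop the leading ">"
    # Join all sequence lines and remove any spaces inside them.
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return name, sequence.upper()


# First 30 amino acids of the query sequence, with spaces and a line break
# left in on purpose to show that they are removed.
sample = """>query_sequence
MTIVSGHIGK HPSLTTVQAG
SSASVENQMP
"""

name, sequence = read_fasta_entry(sample)
print(name, len(sequence))
```

BLAST accepts the pasted FASTA text directly, so this step is not required for the exercise. It only shows what the format contains.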

Results:

The "Results of BLAST" page can be divided into three sections, each showing largely the same data in a different format:

  1. Graphical Overview - this is a box showing a series of colored bars aligned under a bar representing the query sequence. Each bar represents a sequence with a high level of similarity to the query sequence. Bars are color coded by their degree of similarity to the query sequence, with red being the highest and black the lowest.
  2. A list of the names of the sequences showing the highest degree of similarity to the query sequence. Those listed first are the closest matches.
  3. A list of the actual alignments between the query sequence and highly similar sequences in the database.
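
Each alignment in the third section pairs the query sequence with a database sequence, position by position. One simple way to put a number on similarity is percent identity: the share of aligned positions where both sequences have the same letter. The Python sketch below assumes the two strings are already aligned and equally long, with - marking a gap. The function name and the example strings are invented for illustration, and BLAST's own scoring is more involved than this.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions where both sequences share a letter."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # Count positions with the same letter; a gap ("-") never counts as a match.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Made-up aligned fragments: one substitution (I/L) and one gap.
query_part = "MTIVSGHIGK"
subject_part = "MTLVSG-IGK"
print(percent_identity(query_part, subject_part))
```

Two identical sequences would give 100, which is a useful check when answering the question about identical sequences below.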

Questions:

Look at your "Results of BLAST" page:

  1. Are there sequences in the database that are similar to your query sequence?
  2. Are there any sequences that are identical to the query sequence?
  3. What is the name of the most similar sequence? What organism is it from?
  4. Now look at the other sequences most similar to your query sequence. Record the names given for these other sequences. Are they from organisms in the same genus and species, or different?

To the left of the sequence names on the "Results of BLAST" page, you will see hyperlinks made up of a series of letters and numbers. Click on the hyperlink for the sequence with highest similarity. This will take you to the NCBI Genbank record for that sequence. NCBI Genbank sequence records are divided into three sections:

HEADER - information about the whole record, including the organism it came from and the paper in which it was first published.
FEATURES - information about the role of the sequence in the biology of the organism and any changes that have been made to the sequence. This section also includes information like the length of the sequence, the molecular weight, and any notes that the depositors wished to add. The start of this section is indicated with the label FEATURES on the left. Terms beginning with a / are referred to as QUALIFIERS. Examples of qualifiers include /product, /gene, /locus_tag, and /note.
SEQUENCE - numbered list of amino acids (or nucleotides if DNA) in the sequence. The start of this section is indicated with the label ORIGIN on the left.

  1. What is the title and first author of the publication associated with deposit of this sequence?
  2. Check the /note qualifiers in the FEATURES section. Does it say anything about other names for this protein?
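
Qualifiers follow a regular pattern: a slash, the qualifier name, an equals sign, and a value in double quotes. A minimal Python sketch for pulling them out of the FEATURES text is shown here. The sample text, including the gene, locus tag, and alternate name, is made up for illustration. Long values that wrap across several lines would be returned with their line breaks and indentation left in, which this sketch does not clean up.

```python
import re

# A qualifier looks like /name="value"; capture the name and the value.
QUALIFIER_PATTERN = re.compile(r'/(\w+)="([^"]*)"')


def find_qualifiers(features_text):
    """Return a list of (name, value) pairs found in FEATURES text."""
    return QUALIFIER_PATTERN.findall(features_text)


# Hypothetical FEATURES fragment, not taken from a real record.
sample_features = '''
     CDS             1..252
                     /gene="exampleGene"
                     /locus_tag="EX_0001"
                     /product="example protein"
                     /note="also called ExP; illustration only"
'''

qualifiers = find_qualifiers(sample_features)
for name, value in qualifiers:
    print(name, "=", value)

# Keep only the /note values, where alternate names are often mentioned.
notes = [value for name, value in qualifiers if name == "note"]
```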

3. Learning more about your protein with PubMed

PubMed is an online database of scientific literature that you can search with terms of interest. Using PubMed, you can see what researchers have discovered about the biology of proteins that are similar to your query sequence.

  1. Go to PubMed
  2. Near the top of the page you will see a line that says: "Search PubMed for" followed by a box for entering search terms.
  3. Click on the box and type the name of the protein most similar to your query sequence. In order to limit your search to proteins from Pseudomonas syringae, type in "syringae" after the protein name, separating the two words with a single space. Click on "Go".
  4. A new page will appear that shows a list of articles in the PubMed database that contain your two search terms.
  5. Click on the hyperlinked authors' names to view the abstract for each article.
  6. To get a more complete list of publications, try searching PubMed with any alternate names you found in the NCBI Genbank sequence record. You can also search using the names of sequences that were not the most similar to your query sequence.
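
The same two-word search can also be written as a web address. The Python sketch below builds one, assuming the PubMed search page at pubmed.ncbi.nlm.nih.gov accepts the search words in a `term` parameter. The function name `build_pubmed_url` is an illustrative choice, and "ExampleProtein" is a placeholder: replace it with the protein name you recorded from your BLAST results.

```python
from urllib.parse import urlencode

# Assumed address of the PubMed search page.
PUBMED_SEARCH_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_pubmed_url(*terms):
    """Join search terms with single spaces and encode them into a URL."""
    query = " ".join(term.strip() for term in terms if term.strip())
    if not query:
        raise ValueError("at least one search term is required")
    return PUBMED_SEARCH_URL + "?" + urlencode({"term": query})


# "ExampleProtein" is a placeholder, not a real protein name.
url = build_pubmed_url("ExampleProtein", "syringae")
print(url)
```

The script only builds the address and does not contact PubMed. Pasting the printed address into a browser should show the same list of articles as typing the two words into the search box.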

Part II: QUIZ

What did you learn during this module?  Find out with this online Quiz!