Module 4: Web Exercise
BLAST and a quiz!

Glossary of Terms
(for CAPITALIZED words)

Compare gene sequences with BLAST and take a quiz!

In the Module 3 - Web Exercise, you learned how to analyze a protein's sequence using resources available on the internet. In this exercise, you will use some of these same tools to analyze a protein sequence related to the topic of plant defense. You can also take a quiz to see what you've learned in this module.

Part I: BLAST

You learned in the chapter Process of Science that plant immunity has some features in common with human innate immunity. This chapter will allow you to see for yourself the similarities between plant RESISTANCE PROTEINS and human PRRs or other proteins involved in human innate immunity. You will use the BIOINFORMATICS tools that scientists use every day to find data about the genes and proteins they are working on. These tools include biological sequence databases, scientific literature databases, and programs designed for sequence analysis. For these exercises you will use the free online resources of NCBI (National Center for Biotechnology Information).

1. Introduction to biological sequence Databases
(for those who have not done the Web exercise in Module 3)

When a scientist sequences a segment of DNA, be it a single gene, a gene operon or an entire chromosome or GENOME, the sequence is deposited in an online database.

The most commonly used database in this country is NCBI GenBank.
Other databases include:

  • EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute)
  • DDBJ (DNA Data Bank of Japan)

These databases contain millions of DNA sequences and exchange information daily.

Each sequence record in the NCBI sequence databases is organized into three sections:

  • Header – general information about the sequence including the organism it came from and the paper in which it was first published.
  • Features - information about the role of the sequence in the biology of the organism and any changes that have been made to the sequence. This section also includes information like the length of the sequence, the molecular weight of the protein, and any notes that the depositors wished to add. The start of this section is indicated with the label FEATURES on the left. Terms beginning with a / are referred to as ‘qualifiers’. Examples of qualifiers include /product, /gene, /locus_tag, and /note.
  • Sequence – nucleotides listed in order and numbered. The start of this section is indicated with the label ORIGIN on the left.
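The three-part layout described above can be sketched in a few lines of Python. This is only an illustration, not an NCBI tool: the record below is made up, the function names are hypothetical, and the sketch assumes each qualifier fits on one line (real records often wrap long qualifiers over several lines).

```python
def split_genbank_record(text):
    """Split a GenBank-style record into header, features and sequence lines."""
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"      # FEATURES label starts the second section
        elif line.startswith("ORIGIN"):
            current = "sequence"      # ORIGIN label starts the third section
        elif line.startswith("//"):
            break                     # end-of-record marker
        sections[current].append(line)
    return sections


def qualifiers(feature_lines):
    """Collect single-line /key="value" qualifiers from the FEATURES lines."""
    found = {}
    for line in feature_lines:
        stripped = line.strip()
        if stripped.startswith("/") and "=" in stripped:
            key, value = stripped[1:].split("=", 1)
            found[key] = value.strip('"')
    return found


# Made-up record used only to illustrate the layout (not a real database entry).
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Made-up example record.
FEATURES             Location/Qualifiers
     gene            1..12
                     /gene="exampleGene"
                     /note="illustration only"
ORIGIN
        1 atggcttcta gc
//
"""

parts = split_genbank_record(EXAMPLE_RECORD)
quals = qualifiers(parts["features"])
# Keep only the letters of the sequence lines (drop position numbers and spaces).
dna = "".join(ch for line in parts["sequence"][1:] for ch in line if ch.isalpha())
print(quals)
print(dna)
```

For everyday work, an established parser is a better choice than a hand-written one; the sketch is only meant to show where each section begins.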

See an example of a sequence record in NCBI GenBank file format here

NCBI also contains millions of protein sequences that come from specialized databases such as Swiss-Prot or from translations of the DNA sequences in GenBank gene records.

NCBI allows users to search the databases and perform analyses in various ways.

  • You can search by name for nucleotide sequences (genes) or amino acid sequences (proteins).
  • You can search by name for publications about the sequence (recorded in the life science literature database called PUBMED).
  • You can search for similar sequences using the feature called BLAST (by inputting all or part of a DNA or amino acid sequence) and compare two or more sequences.

2. Using NCBI to investigate similarities between plant immunity and human innate immunity

BLAST (Basic Local Alignment Search Tool) is a program that lets you rapidly search protein and nucleotide databases for regions similar to your sequence of interest. NCBI offers many options for comparing protein or DNA sequences against different databases, and you will use several of them in the following exercises.
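To give a feel for what "searching for similar regions" means, here is a toy Python sketch that finds short words shared by two sequences. It is not the BLAST algorithm: the real program also scores near-matches, extends hits into alignments and evaluates them statistically. The function name and both sequences are made up for illustration.

```python
def shared_words(query, subject, word_size=3):
    """Return (query_pos, subject_pos, word) for every exact word shared by both sequences."""
    # Index every word of the subject sequence by its start position.
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)
    # Look up every word of the query in that index.
    hits = []
    for q in range(len(query) - word_size + 1):
        word = query[q:q + word_size]
        for s in index.get(word, []):
            hits.append((q, s, word))
    return hits


# Made-up sequences that share the region FLSFRG.
query = "MKTFLSFRG"
subject = "AAFLSFRGEDAA"
hits = shared_words(query, subject)
for q_pos, s_pos, word in hits:
    print(word, "query position", q_pos + 1, "subject position", s_pos + 1)
```

Neighbouring hits that keep the same offset between the two sequences point to one shared region, which is the kind of local similarity a BLAST search reports.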

The sequence you want to compare with those in the GenBank database is generally called the ‘Query sequence’. In the case of a protein, it can be called the ‘Protein Query’.

You will also use PUBMED, the NCBI database of scientific literature, which you can search with terms of interest to see what researchers have discovered about the biology of proteins that are similar to your query sequence.
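In these exercises you will search PUBMED through the web pages, but NCBI also offers a programmatic interface called E-utilities. The sketch below only builds a search address for its esearch service and does not contact NCBI; the function name and the search term are examples chosen for illustration, not part of the exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service (esearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, max_results=20):
    """Build an esearch URL that asks PubMed for records matching the search term."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# Example search term (an assumption, pick your own terms of interest).
url = pubmed_search_url("TIR domain plant immunity", max_results=5)
print(url)
```

Opening such an address returns a list of matching PubMed record identifiers rather than the formatted page you see in the browser.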

Exercise 1

In this exercise, your Protein Query is a resistance protein from tobacco called the N protein (this protein confers resistance to tobacco mosaic virus).  The amino acid sequence of the N protein is the following:

MASSSSSSRWSYDVFLSFRGEDTRKTFTSHLYEVLNDKGIKTFQDDKRLEYGATIPGELCKAIEESQFAIV
VFSENYATSRWCLNELVKIMECKTRFKQTVIPIFYDVDPSHVRNQKESFAKAFEEHETKYKDDVEGIQRW
RIALNEAANLKGSCDNRDKTDADCIRQIVDQISSKLCKISLSYLQNIVGIDTHLEKIESLLEIGINGVRIMG
IWGMGGVGKTTIARAIFDTLLGRMDSSYQFDGACFLKDIKENKRGMHSLQNALLSELLREKANYNNEED
GKHQMASRLRSKKVLIVLDDIDNKDHYLEYLAGDLDWFGNGSRIIITTRDKHLIEKNDIIYEVTALPDHES
IQLFKQHAFGKEVPNENFEKLSLEVVNYAKGLPLALKVWGSLLHNLRLTEWKSAIEHMKNNSYSGIIDKLK
ISYDGLEPKQQEMFLDIACFLRGEEKDYILQILESCHIGAEYGLRILIDKSLVFISEYNQVQMHDLIQDMG
KYIVNFQKDPGERSRLWLAKEVEEVMSNNTGTMAMEAIWVSSYSSTLRFSNQAVKNMKRLRVFNMGRS
STHYAIDYLPNNLRCFVCTNYPWESFPSTFELKMLVHLQLRHNSLRHLWTETKHLPSLRRIDLSWSKRLTR
TPDFTGMPNLEYVNLYQCSNLEEVHHSLGCCSKVIGLYLNDCKSLKRFPCVNVESLEYLGLRSCDSLEKLPEI
YGRMKPEIQIHMQGSGIRELPSSIFQYKTHVTKLLLWNMKNLVALPSSICRLKSLVSLSVSGCSKLESLPEE
IGDLDNLRVFDASDTLILRPPSSIIRLNKLIILMFRGFKDGVHFEFPPVAEGLHSLEYLNLSYCNLIDGGLPEE
IGSLSSLKKLDLSRNNFEHLPSSIAQLGALQSLDLKDCQRLTQLPELPPELNELHVDCHMALKFIHYLVTKR
KKLHRVKLDDAHNDTMYNLFAYTMFQNISSMRHDISASDSLSLTVFTGQPYPEKIPSWFHHQGWDSSV
SVNLPENWYIPDKFLGFAVCYSRSLIDTTAHLIPVCDDKMSRMTQKLALSECDTESSNYSEWDIHFFFVPF
AGLWDTSKANGKTPNDYGIIRLSFSGEEKMYGLRLLYKEGPEVNALLQMRENSNEPTEHSTGIRRTQYNN
RTSFYELING
First, you will look for conserved domains in your sequence. Conserved domains are stretches of sequence that recur in many of the proteins contained in the database.

1. *Using your mouse, highlight the Protein Query and copy it

2. *Click here to go to the BLAST page at NCBI

3. *Find the section called ‘Specialized BLAST’ and choose ‘Find conserved domains in your sequence (cds)’ by clicking on ‘conserved domains’

4. *Paste your sequence in the empty frame and click on ‘Submit Query’

After a few seconds, a page appears with your protein represented as a grey bar and two boxes under it. Those boxes correspond to conserved motifs found in your protein sequence.

5. *Click on the red box labeled ‘TIR’

You can read a description of the protein domain named TIR.

6. *Click on the + sign at the left of ‘LINKS’

7. *Then, click on ‘links’ at the right of ‘PUBMED’

This page shows you titles of scientific publications about proteins with a TIR domain. You can click on them to see a summary of the study.

8. *Note the information that appears to be most relevant to the goal of this chapter: 
What types of proteins contain TIR domains? 
In which organisms are these proteins found?

Now, you will look for proteins that have a TIR domain similar to the one found in the N protein from tobacco.
The amino acid sequence of the TIR domain of the N protein is the following:

VFLSFRGEDTRKTFTSHLYEVLNDKGIKTFQDDKRLEYGATIPGELCKAIEESQFAIVVFSENYATSRWC
LNELVKIMECKTRFKQTVIPIFYDVDPSHVRNQKESFAKAFEEHETK
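If you would like to check that this TIR sequence really is a piece of the N protein, the Python sketch below joins the wrapped lines into one string, checks that only the 20 standard amino acid letters are present, and locates the domain within the beginning of the N protein. Only the first two lines of the N protein sequence are used, and the function names are made up for illustration.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def join_sequence(wrapped_text):
    """Remove line breaks and spaces so the sequence is one continuous string."""
    return "".join(wrapped_text.split()).upper()


def unexpected_letters(sequence):
    """List any characters that are not standard amino acid letters."""
    return sorted(set(sequence) - VALID_AMINO_ACIDS)


# First two lines of the N protein sequence given in Exercise 1.
n_protein_start = join_sequence("""
MASSSSSSRWSYDVFLSFRGEDTRKTFTSHLYEVLNDKGIKTFQDDKRLEYGATIPGELCKAIEESQFAIV
VFSENYATSRWCLNELVKIMECKTRFKQTVIPIFYDVDPSHVRNQKESFAKAFEEHETKYKDDVEGIQRW
""")

# TIR domain sequence given above.
tir_domain = join_sequence("""
VFLSFRGEDTRKTFTSHLYEVLNDKGIKTFQDDKRLEYGATIPGELCKAIEESQFAIVVFSENYATSRWC
LNELVKIMECKTRFKQTVIPIFYDVDPSHVRNQKESFAKAFEEHETK
""")

start = n_protein_start.find(tir_domain)  # -1 would mean "not found"
print("Unexpected letters:", unexpected_letters(tir_domain))
print("TIR domain covers residues", start + 1, "to", start + len(tir_domain))
```

Pasting a sequence with stray characters or missing residues is a common cause of puzzling BLAST results, so a quick check like this can save time.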

9. *Using your mouse, highlight the query sequence and copy it

10. *Click here to go to the BLAST page at NCBI

11. *Find the section called ‘Basic BLAST’ and click on ‘Protein BLAST’

12. *Paste the copied query sequence in the empty frame under ‘Enter Query Sequence’

You want to see if there are human proteins with a TIR domain similar to the one of the N protein from tobacco in the NCBI protein database, so you will restrict your BLAST to human proteins.

13. *Find the section ‘Choose Search Set’ and write [Human] in the ‘Organism’ line

14. *Select ‘blastp’ in the ‘Program selection’ section

15. *Click the blue button labeled ‘BLAST’

A new window will open. The page will automatically refresh itself as soon as the comparison is completed, yielding a page showing the BLAST results (this could take a couple of minutes).

The BLAST results page begins with references about the program used to do the BLAST and information about the database searched.

16. *Look at the information next to the word ‘database’: Can you see how many protein sequences the database contains?

Below this set of information, the BLAST results page can be divided into three sections, each showing roughly the same data in different formats:

  • The Graphical Overview, which consists of a rectangle containing a series of colored bars aligned under a bar representing your query sequence. Bars represent sequences with high levels of similarity to the query sequence. Those sequences are called ‘Hits’ and are color coded by the degree of similarity with the query sequence, with red being the highest and black the lowest.
  • The list of the sequences having the highest degree of similarity to your query sequence. Those listed first are the closest match.
  • The actual alignments between your query sequence and highly similar sequences in the database.
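Each alignment on the results page places the query above the matching database sequence, with dashes marking gaps. As a rough illustration of how such a pair can be summarized, the Python sketch below counts identical positions and gaps. The two aligned sequences are made up, the function name is hypothetical, and BLAST's own scores involve more than this simple count.

```python
def alignment_summary(aligned_query, aligned_subject):
    """Count identical positions and gaps in two aligned sequences of equal length."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_query, aligned_subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length,
    }


# Made-up alignment: "-" marks a gap in the subject sequence.
summary = alignment_summary("VFLSFRGEDT", "VFISFR-EDS")
print(summary)
```

A higher percentage of identical positions over a longer alignment generally corresponds to a closer match in the list of hits.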

17. *Look at your BLAST results:
Are there human sequences in the database that are similar to your query sequence?
Can you see proteins named ‘Toll-like receptors’?

Those proteins are PRRs involved in human innate immunity.

NOTE: To the left of the sequence names on the BLAST results page, you will see hyperlinks made up of a series of letters and numbers. Click on one hyperlink corresponding to a Toll-like receptor. This will take you to the corresponding NCBI sequence record. From this page, you will have access to more information about the sequence (you can find explanations on the sequence record’s format in the introduction of this chapter).

Exercise 2


In this exercise, your Protein Query is a resistance protein from tomato called Pto (this protein confers resistance to the bacterial pathogen, Pseudomonas syringae pv. tomato). The amino acid sequence of Pto is the following:

MGSKYSKATNSISDASNSFESYRFPLEDLEEATNNFDDKFFIGEGAFGKVYKGVLRDGTKVALKRQNRDSR
QGIEEFGTEIGILSRRSHPHLVSLIGYCDERNEMVLIYDYMENGNLKSHLTGSDLPSMSWEQRLEICIGAAR
GLHYLHTNGVMHRDVKSSNILLDENFVPKITDFGLSKTRPQLYQTTDVKGTFGYIDPEYFIKGRLTEKSDVYS
FGVVLFEVLCARSAMVQSLPREMVNLAEWAVESHNNGQLEQIVDPNLADKIRPESLRKFGETAVKCLALSS
EDRPSMGDVLWKLEYALRLQESVI

1. *Using your mouse, highlight the query sequence and copy it.

2. *Click here to connect to the BLAST page at NCBI

3. *Find the section called ‘Protein BLAST’ and click on it.

4. *Paste the copied query sequence in the empty frame under ‘Enter Query Sequence’

You want to see if there are human proteins similar to the tomato resistance protein Pto in the NCBI protein database, so you will restrict your BLAST to human proteins.

5. *Find the section ‘Choose Search Set’ and write [Human] in the ‘Organism’ line.

6. *Select ‘blastp’ in the ‘Program selection’ section

7. *Click the ‘BLAST’ blue button and wait for the page showing your BLAST results.

8. *Look at your BLAST results:
Are there human sequences in the database that are similar to your query sequence?
Do you see sequences named ‘interleukin-1 receptor-associated kinase 4’ (IRAK4)?


This protein is involved in a signaling pathway of human innate immunity.

Part II: QUIZ

What did you learn during this module?  Find out with this online Quiz!
(You can do it even if you haven’t done the experiment).