2. Compare your sequence with those in the Genbank database
BLAST (Basic Local Alignment Search
Tool) is a program that allows users to rapidly search protein and nucleotide
databases for regions similar to your sequence of interest. At NCBI
there are many options for comparing protein or DNA sequences against
different databases, but since your sequence is a protein sequence,
we will be comparing it with the database of protein sequences at NCBI.
For this purpose, your sequence will be called the QUERY
SEQUENCE:
>query_sequence
MTIVSGHIGKHPSLTTVQAGSSASVENQMPDPAQFSDGRWKKLPTQLSSITLARFDQDICTNNHGISQRAMCF
GLSLSWINMIHAGKDHVTPYASAERMRFLGSFEGVVHARTVHNFYRTEHKFLMEQASANPGVSSGAMAGTESLL
QAAELKGLKLQPVLEDKSNSGLPFLIACKQSGRQVSTDEAALSSLCDAIVENKRGVMVIYSQEIAHALGFSVSSDG
KRATLFDPNLGEFHTHSKALADTIENISSADGLPLIGVQVFASKIH
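The query above is written in FASTA format: one header line beginning with >, followed by the sequence itself, split over as many lines as needed. This is why the > line must be copied along with the letters. The minimal Python sketch below (the function name `parse_fasta` is our own, not an NCBI tool) shows how a program reads such text: the header becomes the sequence name and the remaining lines are joined into one continuous sequence.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs.

    A header line starts with '>'; every following line up to the next
    header belongs to that record and is joined without line breaks.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records


# The query sequence from this module, exactly as it should be pasted.
QUERY = """>query_sequence
MTIVSGHIGKHPSLTTVQAGSSASVENQMPDPAQFSDGRWKKLPTQLSSITLARFDQDICTNNHGISQRAMCF
GLSLSWINMIHAGKDHVTPYASAERMRFLGSFEGVVHARTVHNFYRTEHKFLMEQASANPGVSSGAMAGTESLL
QAAELKGLKLQPVLEDKSNSGLPFLIACKQSGRQVSTDEAALSSLCDAIVENKRGVMVIYSQEIAHALGFSVSSDG
KRATLFDPNLGEFHTHSKALADTIENISSADGLPLIGVQVFASKIH
"""

name, seq = parse_fasta(QUERY)[0]
print(name, len(seq), "residues")
```

If the > line is left out, there is no header and the sketch returns no records at all, which mirrors why the header line should be included when you copy the sequence.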
- Using your mouse, highlight
the query sequence, including the first line beginning with >,
and copy it.
- Go to the protein-protein
BLAST site at NCBI.
- Near the top of the page you
will see a box with the word "search" to its left. Click
on the box and paste in the copied query sequence.
- We will not choose any special
parameters, so once you have pasted in the sequence, you can click
the blue button labeled "BLAST".
- A new window will open. In the
new window, click on the button labeled "FORMAT". The page
will automatically refresh itself as soon as the comparison is completed,
yielding a page entitled "Results of BLAST" (this could
take a couple of minutes).
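The same search can also be submitted without the web form. The sketch below assumes NCBI's BLAST Common URL API at `https://blast.ncbi.nlm.nih.gov/Blast.cgi`, where `CMD=Put` submits a query and the reply contains a request ID (RID), and `CMD=Get` with that RID fetches the finished results. The choice of `nr` (non-redundant protein sequences) as the database is our assumption about the default, and the RID shown is hypothetical. The functions only build URLs, so nothing is sent to NCBI; check NCBI's usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's BLAST Common URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query_fasta, program="blastp", database="nr"):
    """Build the URL that submits a protein-protein search (CMD=Put).

    blastp compares a protein query against a protein database; "nr" is
    an assumed default database.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query_fasta,
    }
    return BLAST_URL + "?" + urlencode(params)


def blast_get_url(rid, format_type="Text"):
    """Build the URL that retrieves the results of a finished search."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_URL + "?" + urlencode(params)


print(blast_put_url(">query_sequence\nMTIVSGHIGKHPSLTTVQAG"))
print(blast_get_url("HYPOTHETICAL_RID"))
```

For a full-length query the URL can get long; the same parameters can instead be sent as a POST body with `urllib.request.urlopen(BLAST_URL, data=urlencode(params).encode())`.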
Results:
The "Results of BLAST"
page can be divided into three sections, each showing roughly
the same data in a different format:
- Graphical
Overview - this is a box showing a series of colored bars
aligned under a bar representing the query sequence. Each bar represents
a database sequence with a high level of similarity to the query sequence,
and the bars are color coded by alignment score, a measure of how similar
each sequence is to the query sequence, with red being the highest and
black the lowest.
- A list of the names of the
sequences showing the highest degree of similarity to the query sequence.
Those listed first are the closest matches.
- A list of the actual alignments between
the query sequence and the highly similar sequences in the database.
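The color coding and the ordering of the list both follow the alignment score. The sketch below assumes the thresholds printed in NCBI's "Color key for alignment scores" (below 40 black, 40 to 50 blue, 50 to 80 green, 80 to 200 magenta, 200 and above red); compare them with the legend on your own results page. The hit names and scores are made up for illustration.

```python
# Assumed color thresholds from the NCBI graphical overview legend,
# checked from the highest score downward.
SCORE_COLORS = [
    (200, "red"),
    (80, "magenta"),
    (50, "green"),
    (40, "blue"),
]


def score_color(score):
    """Return the overview bar color for an alignment score."""
    for threshold, color in SCORE_COLORS:
        if score >= threshold:
            return color
    return "black"  # scores below 40


def rank_hits(hits):
    """Order (name, score) hits the way the results list does: best first."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical hits; real names and scores come from your results page.
hits = [("hit_B", 62.0), ("hit_A", 310.5), ("hit_C", 38.1)]
for name, score in rank_hits(hits):
    print(f"{name}\t{score}\t{score_color(score)}")
```

Here hit_A is listed first and drawn in red, hit_B is green, and hit_C falls below 40 and is drawn in black.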
Questions:
Look at your "Results of BLAST"
page:
- Are there sequences in the database
that are similar to your query sequence?
- Are there any sequences that are
identical to the query sequence?
- What is the name of the most similar
sequence? What organism is it from?
- Now look at the other sequences
most similar to your query sequence. Record the names given for these
other sequences. Are they from organisms in the same genus and species,
or different?
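To judge whether a database sequence is truly identical to the query, look at both the identity line of its alignment and how much of the query the alignment covers. The sketch below computes percent identity the way BLAST reports "Identities": identical columns divided by alignment length, with gap columns (shown as -) counted as mismatches. `query_coverage` is a simplification for a single alignment, and the alignment strings are hypothetical.

```python
def percent_identity(query_aln, subject_aln):
    """Percent of aligned columns in which both residues are identical.

    Both strings come from one pairwise alignment, so they have the same
    length and use '-' for gaps; gap columns count as non-identical.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    if not query_aln:
        raise ValueError("alignment is empty")
    matches = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    return 100.0 * matches / len(query_aln)


def query_coverage(q_start, q_end, query_length):
    """Percent of the query spanned by one alignment (1-based, inclusive)."""
    return 100.0 * (q_end - q_start + 1) / query_length


# Hypothetical alignment fragment with a one-residue gap in the query.
print(percent_identity("MKV-L", "MKVAL"))
print(query_coverage(1, 50, 100))
```

A sequence is identical to the query only if identity is 100% and the alignment also covers the whole query; 100% identity over a short stretch is not enough.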
To the left of the sequence names
on the "Results of BLAST" page, you will see hyperlinks made
up of a series of letters and numbers. Click on the hyperlink for the
sequence with the highest similarity. This will take you to the NCBI Genbank
record for that sequence. NCBI Genbank sequence records are divided into
three sections:
HEADER - information about the whole
record, including the organism it came from and the paper in which it
was first published.
FEATURES - information about the
role of the sequence in the biology of the organism and any changes
that have been made to the sequence. This section also includes information
such as the length of the sequence, its molecular weight, and any notes
that the depositors wished to add. The start of this section is indicated
with the label FEATURES on the left. Terms beginning with a / are referred
to as QUALIFIERS. Examples of qualifiers
include /product, /gene, /locus_tag, and /note.
SEQUENCE - a numbered list of the amino acids (or nucleotides
if DNA) in the sequence. The start of this section is indicated with
the label ORIGIN on the left.
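Because every qualifier starts with / and the sequence always follows the ORIGIN label, a record's layout can be read mechanically. The sketch below pulls the qualifiers out of the FEATURES section (joining notes that wrap onto a second line) and collects the residues listed after ORIGIN. The record `SAMPLE` is invented for illustration and is not a real Genbank entry; real records contain many more lines, and the function names are our own.

```python
def feature_qualifiers(flatfile):
    """Return (qualifier, value) pairs from the FEATURES section."""
    quals = []
    current = None  # index of the qualifier being read, if any
    in_features = False
    for line in flatfile.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            break  # the sequence section ends FEATURES
        elif in_features:
            text = line.strip()
            if text.startswith("/"):
                name, _, value = text[1:].partition("=")
                quals.append([name, value])
                current = len(quals) - 1
            elif line[5:6].strip():
                current = None  # a new feature key such as "Protein" or "CDS"
            elif current is not None and text:
                quals[current][1] += " " + text  # value wrapped onto this line
    return [(name, value.strip('"')) for name, value in quals]


def origin_sequence(flatfile):
    """Join the residues listed after ORIGIN, dropping numbers and spaces."""
    residues = []
    in_origin = False
    for line in flatfile.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_origin:
            residues.extend(ch for ch in line if ch.isalpha())
    return "".join(residues).upper()


# Made-up record for illustration only.
SAMPLE = """\
LOCUS       XX000001                  20 aa            linear   BCT
FEATURES             Location/Qualifiers
     Protein         1..20
                     /product="hypothetical example protein"
                     /note="made-up record for illustration; not a real
                     GenBank entry"
ORIGIN
        1 mtivsghigk hpslttvqag
//
"""

for name, value in feature_qualifiers(SAMPLE):
    print(f"/{name}: {value}")
print(origin_sequence(SAMPLE))
```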
- What is the title
and first author of the publication associated with deposit of this
sequence?
- Check the /note
qualifiers in the FEATURES section.
Do they say anything about other names for this protein?
3. Learning more
about your protein with PubMed
PubMed is an online
database of scientific literature that you can search with terms of interest.
Using PubMed, we can see what researchers have discovered about the biology
of proteins that are similar to your query sequence.
- Go to PubMed.
- Near the top of the page you will
see a line that says: "Search PubMed for" followed by a box
for entering search terms.
- Click on the box and type the
name of the protein most similar to your query sequence. To
limit your search to articles about this protein in Pseudomonas syringae, type
"syringae" after the protein name, separating the two words
with a single space. Click on "Go".
- A new page will appear that shows
a list of articles in the PubMed database that contain both of your search
terms.
- Click on the hyperlinked author
names to view the abstract for each article.
- To get a more complete list of
publications, try searching PubMed with any alternate names you found
in the NCBI Genbank sequence record. You can also search using the names
of sequences that were not the most similar to your query sequence.
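When you have several names to try, each search can also be written as a URL for NCBI's E-utilities. The sketch below assumes the ESearch endpoint `esearch.fcgi` with its `db`, `term` and `retmax` parameters, and builds one search per name by adding "syringae" after a single space, as in the steps above. The protein names in the list are placeholders, not the answer to the exercise, and the code only prints URLs without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumed NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(protein_name, extra_term="syringae", retmax=20):
    """Build an ESearch URL for '<protein name> <extra term>' in PubMed.

    As in the web form, the two terms are separated by a single space;
    retmax caps how many PubMed IDs the reply lists.
    """
    params = {
        "db": "pubmed",
        "term": f"{protein_name} {extra_term}",
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Placeholder names: use the names and alternate names from your own record.
names = ["example protein", "alternate name"]
for name in names:
    print(pubmed_search_url(name))
```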
Part II: QUIZ
What did you learn during this module? Find out with this online Quiz!