Introduction to BLAST

Introduction

The Basic Local Alignment Search Tool (BLAST) compares a query sequence against a database of sequences and reports those which share regions of similarity with it. This can be useful for several reasons:

  • Identifying an unknown sequence by finding annotated (or known) sequences which are similar
  • Finding similar sequences in other species (e.g. orthologs)
  • Predicting function by identifying similar regions in other sequences which already have a known function

In this tutorial, we are going to use a version of BLAST called BLAST+.

BLAST+ is split into several applications. Which one you use depends on the type of sequence you provide as the query and the type of sequences in the database being searched. There are three things you will need each time you want to run a BLAST search:

  • A query sequence (can be nucleotide or protein)
  • A sequence database (can be nucleotide or protein)
  • A BLAST application (this will depend on your query sequence and database - more on this later!)
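
As a rough preview of how these three ingredients fit together, the sketch below picks a BLAST+ application from the query and database types and prints the command you might run. The function name `pick_blast_app` and the file names `query.fasta`, `my_db` and `results.txt` are hypothetical placeholders, not files from this tutorial. The `-query`, `-db` and `-out` options are taken from the BLAST+ usage message.

```shell
# Hypothetical helper: choose a BLAST+ application from the query type ($1)
# and the database type ($2), each given as "nucl" or "prot".
pick_blast_app() {
    case "$1:$2" in
        nucl:nucl) echo "blastn"  ;;  # nucleotide query vs nucleotide database
        prot:prot) echo "blastp"  ;;  # protein query vs protein database
        nucl:prot) echo "blastx"  ;;  # translated nucleotide query vs protein database
        prot:nucl) echo "tblastn" ;;  # protein query vs translated nucleotide database
        *) echo "unknown sequence type: $1 / $2" >&2; return 1 ;;
    esac
}

# Example: a nucleotide query searched against a protein database.
app=$(pick_blast_app nucl prot)

# Print (rather than run) the command, so the sketch works without BLAST+ installed.
# query.fasta, my_db and results.txt are placeholder names.
echo "$app -query query.fasta -db my_db -out results.txt"
```

A fifth application, tblastx, compares a translated nucleotide query against a translated nucleotide database. It is left out of the sketch to keep the mapping one-to-one.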

Why do you need this tutorial, you may ask? Well, running BLAST+ is like running a lab experiment. To get meaningful results, you must first optimize the conditions you are using. After this tutorial you will not only be able to run BLAST, but also be able to tailor your search to your specific biological question.

Learning outcomes

By the end of this tutorial you can expect to be able to:

  • Create a BLAST database from your own sequences
  • Describe the difference between BLAST programs and when to use them
  • Run BLAST locally
  • Generate tailored BLAST output files

Tutorial sections

This tutorial is split into two sections:

Authors

This tutorial was created by Victoria Offord.

Running the commands from this tutorial

You can run the commands in this tutorial either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.

Running commands on Jupyter

If you are using Jupyter, command cells (like the ones below) can be run by selecting the cell and either clicking Cell -> Run in the menu above or pressing Ctrl+Enter. Let's give this a try by printing our working directory with the pwd command and then listing the files within it with ls -l. Run the commands in the two cells below.


In [ ]:
pwd

In [ ]:
ls -l

Running commands in the terminal

You can also follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, select the cell below with the mouse and then either press Ctrl+Enter or choose Cell -> Run in the menu at the top of the page.


In [ ]:
echo cd $PWD

Now open a new terminal on your computer, type the command printed by the previous cell and press Enter. The command will look similar to this:

cd /home/manager/pathogen-informatics-training/Notebooks/BLAST/

Now you can follow the instructions in the tutorial from here.

Let's get started!

This tutorial assumes that you have BLAST+ installed on your computer. For download and installation instructions, please see ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/.

To check that you have installed the software correctly, you can run the following command:


In [ ]:
blastn -h

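This should return a help message similar to the following (the version number and the list of options may differ depending on your BLAST+ release):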

USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-perc_identity float_value]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.7.0+

Use '-help' to print detailed descriptions of command line arguments
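
If you would like a quicker check than reading the full help message, the sketch below reports whether each of the main BLAST+ programs can be found on your PATH. The function name `check_program` is a hypothetical helper, and the list of programs is an assumption about what a standard BLAST+ installation provides. The `-version` option appears in the usage message above.

```shell
# Hypothetical helper: report whether a program ($1) is available on the PATH.
check_program() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# Check the BLAST+ search applications and the database-building tool.
for prog in blastn blastp blastx tblastn tblastx makeblastdb; do
    check_program "$prog"
done

# Print the installed version, but only if blastn is actually available.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
fi
```

If any program is reported as MISSING, check that the directory containing the BLAST+ executables has been added to your PATH.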

For the first part of this tutorial, we are going to look at how to create a BLAST database from a file containing your own sequences. Answers to all of the questions can be found here. To get started with the tutorial, head to the first section: Part 1: Creating a BLAST database