Introduction

ChIP-Seq is the combination of chromatin immunoprecipitation (ChIP) assays with high-throughput sequencing (Seq) and can be used to identify DNA binding sites for transcription factors and other proteins. The goal of this hands-on session is to perform the basic steps of the analysis of ChIP-Seq data, as well as some downstream analysis. Throughout this practical we will try to identify potential transcription factor binding sites of PAX5 in human lymphoblastoid cells.

Learning outcomes

By the end of this tutorial you can expect to be able to:

  • generate an unspliced alignment by aligning raw sequencing data to the human genome using Bowtie2
  • manipulate the SAM output in order to visualise the alignment in IGV
  • find immuno-enriched areas in the aligned reads using the peak caller MACS2
  • perform functional annotation and motif analysis on the predicted binding regions

Tutorial sections

This tutorial comprises the following sections:

  1. Introducing the tutorial dataset
  2. Aligning the PAX5 sample to the genome
  3. Manipulating SAM output
  4. Visualising alignments in IGV
  5. Aligning the control sample to the genome
  6. Identifying enriched areas using MACS
  7. File formats
  8. Inspecting genomic regions using bedtools
  9. Motif analysis

Authors

This tutorial was converted into a Jupyter notebook by Victoria Offord based on materials developed by Angela Goncalves, Myrto Kostadima, Steven Wilder and Maria Xenophontos.

Running the commands from this tutorial

You can run the commands in this tutorial either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.


In [ ]:
pwd

In [ ]:
ls -l

Running commands in the terminal

You can also follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, select the cell below with the mouse and then either press Ctrl+Enter or choose Cell -> Run in the menu at the top of the page.


In [ ]:
echo cd $PWD

Open a new terminal on your computer, type the command printed by the previous cell and press Enter. The command will look similar to this:


In [ ]:
cd /home/manager/pathogen-informatics-training/Notebooks/ChIP-Seq/

Now you can follow the instructions in the tutorial from here.

Prerequisites

This tutorial assumes that you have the following software or packages and their dependencies installed on your computer. The software or packages used in this tutorial may be updated from time to time, so we have also given the version that was used when writing the tutorial.
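
As a quick check before starting, a minimal sketch like the one below reports whether the command-line tools named in this tutorial are available on your PATH. The helper name check_tools is hypothetical, and the executable names bowtie2, macs2 and bedtools are assumptions about how these programs are installed on your system; IGV is left out because it is normally started as a desktop application.

```shell
# Report whether each named command-line tool is available on the PATH.
# check_tools is an illustrative helper, not part of the tutorial materials.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    # The exit status is the number of tools that were not found.
    return "$missing"
}

# Assumed executable names; adjust them to match your installation.
check_tools bowtie2 macs2 bedtools || echo "Install the missing tools before continuing."
```

To compare your installed versions with those used when the tutorial was written, each tool can usually be asked for its version directly, for example with bowtie2 --version.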

Let's get started!

To get started with the tutorial, head to the first section: introducing the tutorial dataset.

The answers to all questions in the tutorial can be found in answers.ipynb.