Installation

pip install https://github.com/khramts/assocplots/archive/master.zip

This tutorial provides examples of code for interactive Manhattan and QQ pltos. Please note that the interactive figure will not be generated in this notebook, but will be saved as an html output, which should be viewed in a browser.

Data for the tutorial

There are two data options presented in this tutorial: mock and real data. One option is to generate mock data, but this is commented out, as we will use real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium.


In [1]:
# from assocplots.misc import mock_data_generation
# data_m, data_w = mock_data_generation(M=100000, seed=42)
# data_m['pval'] /= 500000.*np.exp(-(data_m['pos']-10000.)**2/50000.0) * (data_m['chr']=='4') * np.random.rand(len(data_m)) + 1.

The actual data that we will use comes from the GIANT consortium: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files

Result are described in Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, Monda KL, et al. (2013) Sex-stratified Genome-wide Association Studies Including 270,000 Individuals Show Sexual Dimorphism in Genetic Loci for Anthropometric Traits. PLoS Genet 9(6): e1003500. doi:10.1371/journal.pgen.1003500 http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003500

In this tutorial we will be using one trait (hip circumference) measured in two groups: males and females. These are the files listed under Sex Stratified Anthropometrics subsection. For example, here is one of the files called GIANT_Randall2013PlosGenet_stage1_publicrelease_HapMapCeuFreq_HIP_WOMEN_N.txt and the first couple of lines looks like this:

MarkerName A1 A2 Freq.Hapmap.Ceu BETA SE.2gc P.2gc N rs4747841 a g 0.55 0.0054 0.0080 0.50 40354.8 rs4749917 t c 0.45 -0.0054 0.0080 0.50 40354.8 rs737656 a g 0.3667 0.0035 0.0083 0.67 40354.7 rs737657 a g 0.3583 0.0020 0.0083 0.81 40351.8

The P.2gc column is the p-value of the association test. For the Manhattan plot, besides the p-value, we also need to know SNPs chromosome and genomic position. To obtain the chromosome number and position for each SNP I used a python script called LiftRsNumber.py from this Goncalo Abecasis’ group http://genome.sph.umich.edu/wiki/LiftOver

Since we only need to know the SNP's chromosome, position, and p-value, I generated the following file out of the one above: HIP_WOMEN_chr_pos_rs_pval.txt, where column 1 = chromosome, 2=position, 3=SNP rs number, 4=p-value

10 9918166 rs4747841 0.5 10 9918296 rs4749917 0.5 10 98252982 rs737656 0.67 10 98253133 rs737657 0.81

Alternatively, you can download reduced data from https://gitlab.com/khramts/assocplots_data

Running the code

We will first load all the necessary libaries for the interative plots.


In [2]:
# Load standard libraries
import numpy as np
from pandas import DataFrame
from bokeh.plotting import figure, output_notebook, show, gridplot
from bokeh.models import ColumnDataSource, widgets, CustomJS
from bokeh.models.glyphs import Circle, Square
from bokeh.models import HoverTool

from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, DateFormatter, TableColumn
from bokeh.io import output_file, show, vform, vplot, hplot


from assocplots.misc import mock_data_generation
data_m, data_w = mock_data_generation(M=100000, seed=42)
data_m['pval'] /= 500000.*np.exp(-(data_m['pos']-10000.)**2/50000.0) * (data_m['chr']=='4') * np.random.rand(len(data_m)) + 1.

Next, we will read in the data for two groups.


In [3]:
# # Reading data
# data_m = np.genfromtxt('HIP_MEN_chr_pos_rs_pval.txt', dtype=None, names=['chr', 'pos', 'snp', 'pval'])
# data_w = np.genfromtxt('HIP_WOMEN_chr_pos_rs_pval.txt', dtype=None, names=['chr', 'pos', 'snp', 'pval'])

Lastly, we run the code to generate an interactive plot and save it as an html file, which can be opened in a browser.

Note in the code below, that we can specify how many points to show in the plot. In general, interactive visualization made through web browsers are limited by the number of objects they can smoothly display. In the cur-rent example we have selected the top (most significant) N=1,000 SNPs in each of the two groups and are visualizing at most 2,000 dots if there is no overlap between those SNPs.


In [5]:
from assocplots.interactive import *
import matplotlib.pyplot as plt
from matplotlib.colors import rgb2hex

# cut1, cut2, data = data_reduce(data_m, data_w, N=5000)
cut1, cut2, data = data_reduce_fast(data_m, data_w, N=1000)

# You can assign any color sequence:
colors = ['#1b9e77', "#d95f02", '#7570b3', '#e7298a']

# Or choose a matplotlib colormap and extract colors from it:
cmap = plt.get_cmap('viridis')
# Select points of the colormap
colors = [cmap(i) for i in [0.0,0.33,0.67,0.90]]
# Converting RGB color format into HEX
colors = [rgb2hex(colors[i]) for i in range(len(colors))]

p1,p2,p3,p4,pq1 = mann_only_interactive(data, cut1, cut2, 
                                        chrs_plot=[str(i) for i in range(1,23)],
                                        ms=6, # size of markers
                                        color_sequence = colors, # any number of colors for M-plot
                                       )
# show(vplot(p1,p2))
# show(hplot(pq1,p4))
# show(p4)


from assocplots.htmloutput import *
write_to_html([p1,p2,pq1,p4], filename='output.html', title='Title')


1990
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22']
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22']
840700
825100
671600
649200
611800
580300
541500
489400
468500
460600
443600
453700
317800
293200
273700
304800
271500
261500
199100
212200
111000
113400
[      0.  840700.  825100.  671600.  649200.  611800.  580300.  541500.
  489400.  468500.  460600.  443600.  453700.  317800.  293200.  273700.
  304800.  271500.  261500.  199100.  212200.  111000.  113400.]
[       0.   840700.  1665800.  2337400.  2986600.  3598400.  4178700.
  4720200.  5209600.  5678100.  6138700.  6582300.  7036000.  7353800.
  7647000.  7920700.  8225500.  8497000.  8758500.  8957600.  9169800.
  9280800.  9394200.]
[       0.   840700.  1665800.  2337400.  2986600.  3598400.  4178700.
  4720200.  5209600.  5678100.  6138700.  6582300.  7036000.  7353800.
  7647000.  7920700.  8225500.  8497000.  8758500.  8957600.  9169800.
  9280800.  9394200.]
[  420350.  1253250.  2001600.  2662000.  3292500.  3888550.  4449450.
  4964900.  5443850.  5908400.  6360500.  6809150.  7194900.  7500400.
  7783850.  8073100.  8361250.  8627750.  8858050.  9063700.  9225300.
  9337500.]
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22']
E:\katya\PycharmProjects\assocplots\assocplots\interactive.py:235: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  ts['abspos'][ts['chr'] == chrs[i]] += xtixks_pos[i]
E:\katya\PycharmProjects\assocplots\assocplots\interactive.py:250: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  ts['color'][ts['chr'] == chrs[i]] = color_sequence[i % len(color_sequence)]
E:\katya\PycharmProjects\assocplots\assocplots\interactive.py:377: BokehDeprecationWarning: bokeh.io.vform was deprecated in Bokeh 0.12.0; please use bokeh.models.layouts.WidgetBox instead
  p3 = vform(data_table)
C:\Anaconda3\lib\site-packages\bokeh\io.py:613: BokehDeprecationWarning: bokeh.models.layouts.VBoxForm was deprecated in Bokeh 0.12.0; please use bokeh.models.layouts.WidgetBox instead
  return VBoxForm(*children, **kwargs)
E:\katya\PycharmProjects\assocplots\assocplots\interactive.py:380: BokehDeprecationWarning: bokeh.io.vform was deprecated in Bokeh 0.12.0; please use bokeh.models.layouts.WidgetBox instead
  p4 = vform(data_table_filt)
C:\Anaconda3\lib\site-packages\bokeh\io.py:613: BokehDeprecationWarning: bokeh.models.layouts.VBoxForm was deprecated in Bokeh 0.12.0; please use bokeh.models.layouts.WidgetBox instead
  return VBoxForm(*children, **kwargs)

In [ ]: