Nyaplot Tutorial 2: Interaction with DataFrame

I already introduced Nyaplot::DataFrame in tutorial 1, but that was not enough to show how useful it is. This notebook walks through two use cases built on DataFrame.


In [1]:
require 'nyaplot'


Out[1]:
true

Case 1: Scatter with original tooltips

First, prepare sample data and put it into a DataFrame. Then build a scatter plot from it.


In [2]:
# Ten sample names: cat0 .. cat9
samples = Array.new(10) { |i| "cat#{i}" }
# Random coordinates in [0, 5)
x = []; y = []
10.times do
  x.push(5 * rand)
  y.push(5 * rand)
end
df = Nyaplot::DataFrame.new({x: x, y: y, name: samples})
df


Out[2]:
x                    y                   name
4.484045721238493    1.4236755629461113  cat0
0.09192346825461384  3.859486471834251   cat1
2.7691211926797203   0.2724593604033648  cat2
3.790577538193376    4.3849875032095715  cat3
3.075663771695602    1.4208854855110642  cat4
1.0768103258740274   1.3575402862086254  cat5
1.7266282564027458   2.061477073499303   cat6
3.0954512252218382   0.3593608483818478  cat7
2.7941521911455696   0.4127745317214565  cat8
4.523075237278958    2.2397417984244443  cat9
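
To make the column-oriented layout concrete, here is a minimal plain-Ruby sketch (it does not use Nyaplot) that turns a hash of column arrays into one hash per row, which is roughly what the table above displays. The helper `columns_to_rows` and the constant `SAMPLE_COLUMNS` are hypothetical names introduced only for this illustration, and the numbers are rounded from the first three rows of the table.

```ruby
# Hypothetical helper, not part of Nyaplot: convert column arrays to row hashes.
def columns_to_rows(columns)
  names = columns.keys
  length = columns.values.first.length
  unless columns.values.all? { |col| col.length == length }
    raise ArgumentError, 'all columns must have the same length'
  end
  # Row i collects the i-th element of every column.
  Array.new(length) { |i| names.map { |name| [name, columns[name][i]] }.to_h }
end

# Values rounded from the first three rows of the table above.
SAMPLE_COLUMNS = {
  x: [4.48, 0.09, 2.77],
  y: [1.42, 3.86, 0.27],
  name: %w[cat0 cat1 cat2]
}.freeze

columns_to_rows(SAMPLE_COLUMNS).each { |row| puts row.inspect }
```

Thinking in rows like this is handy later, when a block such as the one passed to filter! receives one row at a time.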

In [3]:
plot = Nyaplot::Plot.new
plot.x_label("weight [kg]")
plot.y_label("height [m]")
sc = plot.add_with_df(df, :scatter, :x, :y)


Out[3]:
#<Nyaplot::Diagram:0xb930de14 @properties={:type=>:scatter, :options=>{:x=>:x, :y=>:y}, :data=>"d9b05d19-7b1b-411b-bbc9-eb0e2324b8f6"}, @xrange=[0.09192346825461384, 4.523075237278958], @yrange=[0.2724593604033648, 4.3849875032095715]>

In [4]:
plot.show


Out[4]:

The plot above does not show the name information, so add it to the tooltip. Use tooltip_contents to choose which columns appear in the tooltip.


In [5]:
sc.tooltip_contents([:name])
plot.show


Out[5]:

A tooltip can include multiple lines, but the DataFrame has only three columns, which leaves little to add. Let's add a home column to it.


In [6]:
address = ['London', 'Kyoto', 'Los Angeles', 'Puretoria']
# Pick a random home for each of the ten samples
home = Array.new(10) { address.sample }
df.home = home
df


Out[6]:
x                    y                   name  home
4.484045721238493    1.4236755629461113  cat0  Puretoria
0.09192346825461384  3.859486471834251   cat1  London
2.7691211926797203   0.2724593604033648  cat2  London
3.790577538193376    4.3849875032095715  cat3  London
3.075663771695602    1.4208854855110642  cat4  Kyoto
1.0768103258740274   1.3575402862086254  cat5  Kyoto
1.7266282564027458   2.061477073499303   cat6  Puretoria
3.0954512252218382   0.3593608483818478  cat7  Puretoria
2.7941521911455696   0.4127745317214565  cat8  Los Angeles
4.523075237278958    2.2397417984244443  cat9  Los Angeles

In [7]:
sc.tooltip_contents([:name, :home])
plot.show


Out[7]:
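
As a rough mental model of what tooltip_contents selects, the plain-Ruby sketch below builds one tooltip line per requested column for a single row. `tooltip_lines` and `SAMPLE_ROW` are hypothetical names used only here, and the "column: value" text format is an assumption; the text Nyaplot actually renders may differ.

```ruby
# Hypothetical helper, not Nyaplot's implementation:
# one "column: value" line per requested column.
def tooltip_lines(row, columns)
  columns.map { |col| "#{col}: #{row.fetch(col)}" }
end

# First row of the table above, with x and y rounded.
SAMPLE_ROW = { x: 4.48, y: 1.42, name: 'cat0', home: 'Puretoria' }.freeze

puts tooltip_lines(SAMPLE_ROW, [:name])
puts tooltip_lines(SAMPLE_ROW, [:name, :home])
```

Passing more column names simply yields more lines, which mirrors going from `[:name]` to `[:name, :home]` in the cells above.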

Next, color the points of the scatter plot according to the 'home' column. To do so, pass the column name to the fill_by method.


In [8]:
colors = Nyaplot::Colors.qual


Out[8]:
rgb(102,194,165) rgb(252,141,98) rgb(141,160,203) rgb(231,138,195) rgb(166,216,84) rgb(255,217,47) rgb(229,196,148) rgb(179,179,179)

In [9]:
sc.color(colors)
sc.fill_by(:home)
plot.show


Out[9]:
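
The idea behind coloring by a column can be sketched in plain Ruby: each distinct value gets one palette entry, and every row looks up the color of its value. This is only an illustration under assumptions. `colors_by_category`, `QUAL_PALETTE` and `HOME_VALUES` are hypothetical names, and the order in which Nyaplot itself assigns colors to categories is not taken from the library.

```ruby
# First four colors of the palette printed in Out[8].
QUAL_PALETTE = [
  'rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)', 'rgb(231,138,195)'
].freeze

# Hypothetical helper: assign colors in order of first appearance,
# cycling through the palette if there are more categories than colors.
def colors_by_category(values, palette)
  lookup = values.uniq.each_with_index.map do |category, i|
    [category, palette[i % palette.length]]
  end.to_h
  values.map { |value| lookup.fetch(value) }
end

HOME_VALUES = ['Puretoria', 'London', 'London', 'Kyoto', 'Los Angeles'].freeze

HOME_VALUES.zip(colors_by_category(HOME_VALUES, QUAL_PALETTE)).each do |home, color|
  puts "#{home} -> #{color}"
end
```

The same lookup idea applies to shapes: swap the palette for a list of shape names.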

Use the shape_by method to change the point shape according to the values in a column.


In [10]:
sc.color(colors)
sc.shape_by(:home)
plot.show


Out[10]:

Case 2: Multiple panes

DataFrame is also useful for visualizing data in multiple panes. Let's create plots from mutation data.
First, load the data from a tab-separated file with from_csv. (All data used in this tutorial is included in Nyaplot's repository: /examples/notebook/data/*)


In [11]:
path = File.expand_path("../data/first.tab", __FILE__)
# The file is tab-separated, so pass "\t" as the separator argument.
df = Nyaplot::DataFrame.from_csv(path, "\t")


Out[11]:
mutation  blood  set1                  set2                 set3                  set12                set21                set31
G>A       0.0    0.019230769230769232  0.0                  0.48214285714285715   0.0                  0.0                  0.4782608695652174
C>T       0.0    0.42592592592592593   0.0                  0.0                   0.375                0.0                  0.0
C>G       0.0    0.0                   0.0                  0.0                   0.0                  0.525                0.0
C>A       0.0    0.0                   0.1935483870967742   0.0                   0.0                  0.4666666666666667   0.0
C>A       0.0    0.0                   0.08333333333333333  0.0                   0.0                  0.5161290322580645   0.0
G>T       0.0    0.0                   0.0                  0.0                   0.4444444444444444   0.0                  0.0
C>G       0.0    0.0                   0.0                  0.0                   0.0                  0.0                  0.4
C>A       0.0    0.0                   0.03333333333333333  0.0                   0.0                  0.42857142857142855  0.0
A>C       0.0    0.6153846153846154    0.0                  0.0                   0.5925925925925926   0.0                  0.0
C>A       0.0    0.0                   0.0                  0.0                   0.0                  0.32                 0.0
C>A       0.0    0.0                   0.0                  0.0                   0.0                  0.4230769230769231   0.0
T>A       0.0    0.525                 0.0                  0.0                   0.5277777777777778   0.0                  0.0
C>T       0.0    0.42857142857142855   0.0                  0.0                   0.6666666666666666   0.0                  0.0
G>A       0.0    0.0                   0.0                  0.0                   0.0                  0.5                  0.03225806451612903
T>C       0.0    0.0                   0.0                  0.0                   0.0                  0.0                  0.3793103448275862
C>T       0.0    0.0                   0.0                  0.45714285714285713   0.0                  0.0                  0.46875
...       ...    ...                   ...                  ...                   ...                  ...                  ...
A>T       0.0    0.4716981132075472    0.0                  0.014492753623188406  0.3142857142857143   0.0                  0.0

Now I want to plot the set1 column, but many of its cells are zero, so filter those rows out.


In [12]:
df.filter! {|row| row[:set1] != 0.0}
df


Out[12]:
mutation  blood  set1                  set2                 set3                  set12                set21                set31
G>A       0.0    0.019230769230769232  0.0                  0.48214285714285715   0.0                  0.0                  0.4782608695652174
C>T       0.0    0.42592592592592593   0.0                  0.0                   0.375                0.0                  0.0
A>C       0.0    0.6153846153846154    0.0                  0.0                   0.5925925925925926   0.0                  0.0
T>A       0.0    0.525                 0.0                  0.0                   0.5277777777777778   0.0                  0.0
C>T       0.0    0.42857142857142855   0.0                  0.0                   0.6666666666666666   0.0                  0.0
G>A       0.0    0.45652173913043476   0.0                  0.0                   0.43478260869565216  0.0                  0.0
C>T       0.0    0.09803921568627451   0.0                  0.0                   0.37142857142857144  0.0                  0.0
T>A       0.0    0.5769230769230769    0.0                  0.0                   0.4782608695652174   0.0                  0.0
A>G       0.0    0.43859649122807015   0.0                  0.0                   0.5142857142857142   0.0                  0.0
T>C       0.0    0.5806451612903226    0.0                  0.0                   0.6341463414634146   0.0                  0.0
G>A       0.0    0.014705882352941176  0.0                  0.0                   0.0                  0.0                  0.41935483870967744
G>A       0.0    0.6666666666666666    0.0                  0.0                   0.5                  0.0                  0.0
C>T       0.0    0.532258064516129     0.0                  0.0                   0.5227272727272727   0.0                  0.0
G>A       0.0    0.5652173913043478    0.0                  0.0                   0.3870967741935484   0.0                  0.0
G>A       0.0    0.42857142857142855   0.0                  0.0                   0.3333333333333333   0.0                  0.0
C>A       0.0    0.4888888888888889    0.0                  0.0                   0.43243243243243246  0.0                  0.0
...       ...    ...                   ...                  ...                   ...                  ...                  ...
A>T       0.0    0.4716981132075472    0.0                  0.014492753623188406  0.3142857142857143   0.0                  0.0
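
The row filter used above can be mimicked in plain Ruby to see exactly which rows survive. This is a sketch without Nyaplot: `drop_zero_rows` and `MUTATION_ROWS` are hypothetical names, and the rows are the first four of the unfiltered table, cut down to two columns.

```ruby
# First four rows of the unfiltered table, reduced to two columns.
MUTATION_ROWS = [
  { mutation: 'G>A', set1: 0.019230769230769232 },
  { mutation: 'C>T', set1: 0.42592592592592593 },
  { mutation: 'C>G', set1: 0.0 },
  { mutation: 'C>A', set1: 0.0 }
].freeze

# Hypothetical helper: keep only rows whose value in `column` is non-zero,
# the same condition as the block given to filter! above.
def drop_zero_rows(rows, column)
  rows.reject { |row| row[column] == 0.0 }
end

drop_zero_rows(MUTATION_ROWS, :set1).each { |row| puts row.inspect }
```

Unlike filter!, this helper returns a new array and leaves its input untouched.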

Next, prepare instances of Nyaplot::Plot as usual. Nyaplot::Plot#filter adds a 'filter box' to the plot; here it targets the x axis.


In [13]:
plot4=Nyaplot::Plot.new
plot4.add_with_df(df, :histogram, :set1)
plot4.configure do
  height(400)
  x_label('PNR')
  y_label('Frequency')
  filter({target:'x'})
  yrange([0,130])
end

plot5=Nyaplot::Plot.new
plot5.add_with_df(df, :bar, :mutation)
plot5.configure do
  height(400)
  x_label('Mutation types')
  y_label('Frequency')
  yrange([0,100])
end


Out[13]:

Then create an instance of Nyaplot::Frame. It can hold multiple plots and lets them interact with each other.


In [14]:
frame = Nyaplot::Frame.new
frame.add(plot4)
frame.add(plot5)
frame.show


Out[14]:
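
Conceptually, selecting a range in the histogram's filter box narrows the rows that the bar chart counts. The plain-Ruby sketch below imitates that interaction. It is an assumption about the behavior, not Nyaplot's code: `mutation_counts` and `FILTERED_ROWS` are hypothetical names, and the set1 values are rounded from the first six rows of the filtered table.

```ruby
# First six rows of the filtered table, set1 rounded to three decimals.
FILTERED_ROWS = [
  { mutation: 'G>A', set1: 0.019 },
  { mutation: 'C>T', set1: 0.426 },
  { mutation: 'A>C', set1: 0.615 },
  { mutation: 'T>A', set1: 0.525 },
  { mutation: 'C>T', set1: 0.429 },
  { mutation: 'G>A', set1: 0.457 }
].freeze

# Hypothetical helper: count mutation types among rows whose set1 value
# falls inside the selected range.
def mutation_counts(rows, set1_range)
  rows.select { |row| set1_range.cover?(row[:set1]) }
      .each_with_object(Hash.new(0)) { |row, counts| counts[row[:mutation]] += 1 }
end

puts mutation_counts(FILTERED_ROWS, 0.0..1.0).inspect
puts mutation_counts(FILTERED_ROWS, 0.4..0.5).inspect
```

Because both plots were built from the same DataFrame, a selection made in one can be applied to the rows behind the other.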