In [22]:
%%HTML
<style>
.container { width:100% !important; }
.input{ width:60% !important;
align: center;
}
.text_cell{ width:70% !important;
font-size: 16px;}
.title {align:center !important;}
</style>
Now that we are familiar with the framework's basics, we can start showing the full capabilities of Shaolin. To do that, I will review one of the most "simple" and widely used plots in data science: the scatter plot. The dashboards section of the Shaolin framework provides several Dashboards suited for complex data processing, and this tutorial centers on the Bokeh ScatterPlot. All the individual components of this Dashboard will be explained in depth in further tutorials.
A scatter plot is a kind of plot in which we represent two data vectors (x and y) against each other and assign a marker (for now, just a circle) to each pair of data points. Although the x and y coordinates of the marker are the only two compulsory parameters, it is also possible to customize the following parameters for a circle marker:
It is possible to fully customize which data from the data structure we want to plot will be mapped to each marker parameter, and how that mapping will be performed. In order to assign values to a marker parameter we have to follow this process:
This means that we could theoretically plot 8-dimensional data by mapping each parameter to one coordinate of a data point. In practice, it is sometimes more useful to map the same vector of data points to more than one parameter in order to emphasize some feature of the data we are plotting. For example, we could map both the fill_color parameter and the fill_alpha parameter to the same feature, which makes the higher values of the plotted vector easy to spot.
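To make the idea of mapping one feature to two marker parameters concrete, here is a minimal sketch in plain Python. It is not Shaolin's implementation: the helper names `min_max_scale` and `to_hex_color`, the color endpoints, and the sample values are all assumptions chosen for illustration.

```python
def min_max_scale(values, out_min=0.0, out_max=1.0):
    """Linearly rescale values into [out_min, out_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; use the midpoint.
        return [(out_min + out_max) / 2.0 for _ in values]
    span = hi - lo
    return [out_min + (v - lo) / span * (out_max - out_min) for v in values]


def to_hex_color(t, start=(255, 255, 178), end=(189, 0, 38)):
    """Interpolate between two RGB colors for t in [0, 1]; return '#rrggbb'."""
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical feature values (assumption, not taken from the tutorial data).
feature = [1.4, 4.7, 6.0, 5.1, 1.3]

# The same feature drives both parameters, so high values are
# both darker and more opaque.
unit = min_max_scale(feature)                    # scaled to [0, 1]
fill_alpha = min_max_scale(feature, 0.2, 1.0)    # low values stay faintly visible
fill_color = [to_hex_color(t) for t in unit]
```

The two resulting lists have one entry per data point, which is the shape a per-marker parameter needs. Keeping the lower bound of the alpha range above zero is a design choice so that the smallest values do not disappear from the plot.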
The scatter plot is a Dashboard with the following attributes:
In the following diagram you can see how data is mapped into visual information.
For this example we will use the classic Iris dataset imported from the bokeh data samples.
In [1]:
from IPython.display import Image  # used to display the widgets in the web version of the notebook
from shaolin.dashboards.bokeh import ScatterPlot
from bokeh.sampledata.iris import flowers
scplot = ScatterPlot(flowers)
The data contained in the blocks described in the above diagram can be accessed the following way:
In [2]:
scplot.data.head()
Out[2]:
In [3]:
scplot.tooltip.output.head()
Out[3]:
In [4]:
scplot.mapper.output.head()
Out[4]:
In [5]:
scplot.output.head()
Out[5]:
The scatter plot Dashboard contains the bokeh scatter plot and a widget. That widget is a toggle menu that can display two Dashboards:
The complete plot interface can be displayed by calling the show function.
As you will see, the interface layout has not been customized yet, so any suggestion regarding interface design will be appreciated.
This is the Dashboard that allows us to customize how the data will be plotted. We will color each of its components so it's easier to locate them. This is a good example of a complex Dashboard composed of multiple Dashboards.
In [6]:
mapper = scplot.mapper
mapper.buttons.widget.layout.border = "blue solid"
mapper.buttons.value = 'line_width'
mapper.line_width.data_scaler.widget.layout.border = 'yellow solid'
mapper.line_width.data_slicer.widget.layout.border = 'red solid 0.4em'
mapper.line_width.data_slicer.columns_slicer.widget.layout.border = 'green solid 0.4em'
mapper.line_width.data_slicer.index_slicer.widget.layout.border = 'green solid 0.4em'
mapper.line_width.default_value.widget.layout.border = 'purple solid 0.4em'
mapper.line_width.apply_row.widget.layout.border = "pink solid 0.4em"
scplot.widget
Image(filename='scatter_data/img_1.png')
Out[6]:
A plot mapper has the following components:
It is possible to choose what information from the data attribute of the ScatterPlot will be shown when hovering above a marker.
In the above cell we click on the "tooltip" button of the ToggleButtons widget in order to make the widget visible. As we can see, there is a SelectMultiple widget for every column of the original DataFrame.
In [37]:
scplot.widget
Image(filename='scatter_data/img_2.png')
Out[37]:
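As a rough illustration of what selecting tooltip columns amounts to, the sketch below builds a list of (label, field) pairs in the format that Bokeh's HoverTool accepts, where "@{name}" refers to a column of the plot's data source. The helper `build_tooltips` is hypothetical and is not part of Shaolin; the column names are those of the Iris sample data used above.

```python
def build_tooltips(columns, selected):
    """Return Bokeh-style (label, field) pairs for the selected columns."""
    unknown = [c for c in selected if c not in columns]
    if unknown:
        raise KeyError("columns not in data: {}".format(unknown))
    # "@{name}" refers to a data-source column; the braces also
    # allow column names that contain spaces.
    return [(name, "@{" + name + "}") for name in columns if name in selected]


# Column names of the Iris sample data.
iris_columns = ["sepal_length", "sepal_width", "petal_length",
                "petal_width", "species"]

# Hypothetical selection, as if made in the SelectMultiple widgets.
tooltips = build_tooltips(iris_columns, ["species", "petal_length"])
```

The pairs come out in the order of the original columns rather than the order of selection, which keeps the hover box layout stable when the selection changes.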
Although it is possible to save the Bokeh plot with any of the standard methods that the Bokeh library offers by accessing the plot attribute of the ScatterPlot, Shaolin also offers the possibility of saving a snapshot of the plot as a Shaolin widget compatible with the framework. This way it can be included in a Dashboard for display purposes.
This is done by accessing the snapshot attribute of the ScatterPlot. The current plot is exported, and we can keep working with the ScatterPlot Dashboard in case we need to make more plots. A snapshot is an HTML widget whose value is an exported notebook_div of the plot.
In [39]:
widget_plot = scplot.snapshot
widget_plot.widget
Image(filename='scatter_data/img_3.png')
Out[39]:
It is also possible to plot a pandas Panel or a Panel4D the same way as a DataFrame. The only restriction for now is that the axis used as the index must be the major_axis in the case of a Panel and the items axis in the case of a Panel4D. Tooltips are disabled in this case; custom tooltips will be available in the next release.
It would be nice to have feedback on how you would like to display and select the tooltips.
In [40]:
from pandas.io.data import DataReader  # deprecated, but I can't make pandas_datareader work :P
import datetime
symbols_list = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)
panel = DataReader(symbols_list, start=start, end=end, data_source='yahoo')
In [41]:
panel
Out[41]:
In [ ]:
sc_panel = ScatterPlot(panel)
In [44]:
#sc_panel.show()
Image(filename='scatter_data/img_4.png')
Out[44]: