The Scikit-Data library offers a set of functionalities to help data analysts in their work.
For now, it provides a small set of simple functionalities, such as converting a dataframe into a crosstab dataframe using specific fields.
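To make the crosstab idea concrete, here is a minimal plain-Python sketch. It is not the Scikit-Data API: `crosstab` and the `passengers` rows are hypothetical, made up for illustration. It counts how many rows fall into each combination of an X field and a Y field. With pandas, `pandas.crosstab(df['Pclass'], df['Survived'])` builds the equivalent table as a DataFrame.

```python
from collections import Counter, defaultdict

def crosstab(rows, x_field, y_field):
    """Hypothetical helper: count rows per (x, y) pair, returned as {x: {y: count}}."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[x_field]][row[y_field]] += 1
    # Convert the nested Counters to plain dicts for readable output.
    return {x: dict(counts) for x, counts in table.items()}

# Made-up rows, for illustration only (not real Titanic data).
passengers = [
    {"Pclass": 1, "Survived": 1},
    {"Pclass": 3, "Survived": 0},
    {"Pclass": 3, "Survived": 1},
    {"Pclass": 3, "Survived": 0},
]
print(crosstab(passengers, "Pclass", "Survived"))
# {1: {1: 1}, 3: {0: 2, 1: 1}}
```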
Another interesting feature is a Jupyter widget that offers interactive options for handling the data, with graphical and tabular outputs.
To import the Scikit-Data Jupyter widget, use the following code:
from skdata.widgets import SkDataWidget
In [1]:
from IPython.display import Image
from skdata.widgets import SkDataWidget
from skdata import SkData
The data used in this example was extracted from the Kaggle Titanic challenge.
Special notes: Pclass is a proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.
Age is in years; it is fractional if less than one (1). If the age is estimated, it is in the form xx.5.
With respect to the family relation variables (i.e., sibsp and parch), some relations were ignored. The following are the definitions used for sibsp and parch:
Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic
Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbors in a village; however, the definitions do not support such relations.
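The data notes on Pclass and Age can be encoded as a small data dictionary. The sketch below is hypothetical and not part of Scikit-Data: `PCLASS_TO_SES` mirrors the Pclass note, and `is_estimated_age` flags ages of one year or more that end in .5. That heuristic assumes no true (non-estimated) age of one or more is recorded with a .5 fraction, while ages below one are treated as genuinely fractional.

```python
# Hypothetical mapping from the Pclass code to the SES level it proxies.
PCLASS_TO_SES = {1: "Upper", 2: "Middle", 3: "Lower"}

def is_estimated_age(age: float) -> bool:
    """Hypothetical heuristic: an age >= 1 of the form xx.5 was estimated.

    Ages below one are fractional by definition, so they are not flagged.
    A missing age (NaN) fails the `age >= 1` check and returns False.
    """
    return age >= 1 and age % 1 == 0.5
```

With pandas, these could be applied column-wise, for example `df['Pclass'].map(PCLASS_TO_SES)` and `df['Age'].map(is_estimated_age)`.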
In [2]:
sd = SkData('/tmp/titanic.h5')
sd.import_from(
source='../data/train.csv', index_col='PassengerId',
target_col='Survived'
)
To use the SkDataWidget class, you need an SkData object with data loaded:
w = SkDataWidget(sd)
You can use the display method to set the parameters of a chart that shows a crosstab of the selected fields:
w.display(dset_id='dset_id')
This method uses the given parameters to create and show a chart and a data table.
In [3]:
sd['train'].summary()
Out[3]:
In [4]:
w = SkDataWidget(sd)
w.display(dset_id='train')
This should display the following screen:
In [5]:
Image(filename='../data/img/initial_screen.png')
Out[5]:
To see the chart, click the Chart option; you should see something like this:
In [6]:
Image(filename='../data/img/chart_screen.png')
Out[6]:
By default, each selected X field is crossed with Y in its own chart (chart type individual). If you want to see a single chart with all the selected X fields crossed with the Y field, select the chart type option grouped.
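The difference between the two chart types can be sketched in plain Python, assuming that grouped means the selected X fields are combined into a single row key. That interpretation is an assumption, not confirmed by the library. The helpers `crosstab_counts`, `individual_tables`, and `grouped_table` are hypothetical, and the rows are made up for illustration.

```python
from collections import Counter

def crosstab_counts(rows, x_fields, y_field):
    """Hypothetical helper: count rows per (x-key, y) pair.

    The x-key is a tuple holding the values of all x_fields for that row.
    """
    return Counter((tuple(row[f] for f in x_fields), row[y_field]) for row in rows)

def individual_tables(rows, x_fields, y_field):
    # One table per X field, each crossed with Y (analogous to "individual").
    return {f: crosstab_counts(rows, [f], y_field) for f in x_fields}

def grouped_table(rows, x_fields, y_field):
    # A single table where all X fields together form the row key
    # (one possible reading of "grouped").
    return crosstab_counts(rows, x_fields, y_field)

# Made-up rows, for illustration only.
passengers = [
    {"Pclass": 1, "Sex": "female", "Survived": 1},
    {"Pclass": 3, "Sex": "male", "Survived": 0},
    {"Pclass": 3, "Sex": "female", "Survived": 1},
]
by_field = individual_tables(passengers, ["Pclass", "Sex"], "Survived")
combined = grouped_table(passengers, ["Pclass", "Sex"], "Survived")
print(by_field["Sex"][(("female",), 1)])  # 2
print(combined[((3, "female"), 1)])       # 1
```

In pandas terms, the grouped case corresponds roughly to passing several columns as the index of `pandas.crosstab`, which yields a MultiIndex of row keys.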
These are initial functionalities to help handle and observe data phenomena quickly.