In [1]:
import seaborn as sns
%matplotlib inline
tips = sns.load_dataset('tips')
tips.head()
Out[1]:
In [2]:
sns.barplot(x='sex', y='total_bill', data=tips)
Out[2]:
By default, barplot draws the mean of total_bill for each category. Thus the average bill for men was higher than for women.
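To make explicit what the bar heights represent, here is a minimal sketch of a per-group mean. The records and the helper name mean_by_group are made up for illustration and are not taken from the tips dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sex, total_bill) records, NOT taken from the tips dataset.
records = [
    ("Male", 20.0),
    ("Female", 15.0),
    ("Male", 30.0),
    ("Female", 19.0),
    ("Male", 25.0),
]

def mean_by_group(rows):
    """Group the values by key and return the mean of each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

group_means = mean_by_group(records)
print(group_means)
```

On the real data, the pandas call tips.groupby('sex')['total_bill'].mean() should give the per-group averages that the bars show.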
In [4]:
sns.countplot(x='sex', data=tips)
Out[4]:
Boxplots are very common. They display the distribution of the data as well as its outliers. A boxplot splits the data into four quartiles. The median is drawn as a horizontal line inside a shaded box that spans the first quartile (Q1) to the third quartile (Q3). The whiskers may extend to the ends of the remaining quartiles, that is, to the minimum and maximum. If outliers are calculated, the whiskers are shorter, and values more than 1.5 times the IQR (interquartile range, Q3 - Q1) below Q1 or above Q3 are considered outliers.
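The 1.5 times IQR rule can be sketched in a few lines of plain Python. The bill amounts and the helper name tukey_fences are assumptions made for illustration. Quartile conventions differ between libraries, so the numbers may not match exactly what seaborn draws.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (q1, median, q3, lower_fence, upper_fence) for a list of numbers."""
    # "inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, median, q3, q1 - k * iqr, q3 + k * iqr

# Hypothetical bill amounts, NOT taken from the tips dataset.
bills = [10.0, 12.5, 13.0, 15.0, 16.5, 18.0, 20.0, 22.0, 48.0]

q1, median, q3, low, high = tukey_fences(bills)

# Anything outside the fences would be drawn as an individual outlier point.
outliers = [b for b in bills if b < low or b > high]
print(q1, median, q3, low, high, outliers)
```

Here the box would span Q1 to Q3, the whiskers would stop at the most extreme values still inside the fences, and the single large bill would be plotted on its own.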
In [2]:
sns.boxplot(x='time', y='total_bill', data=tips)
Out[2]:
We can interpret this as follows: people spend more on dinner than on lunch on average, since the median is higher. However, the amount spent on dinner also varies more, and the lowest dinner bill is lower than the lowest lunch bill.
In [6]:
sns.boxplot(x='time',y='total_bill', data=tips, hue='sex')
Out[6]:
In [9]:
sns.violinplot(x='time',y='total_bill', data=tips)
Out[9]:
You can see that lunch bills are clustered more tightly around the median than dinner bills. The upper part of the dinner distribution (above Q3) is long, which shows in the spread of the green violin plot.
In [10]:
sns.violinplot(x='time', y='total_bill', data=tips, hue='sex', split=True)
Out[10]:
This plot confirms what we have seen so far: women's bills tend to be lower than men's. The women's half of the violin is wider at the lower end.
In [12]:
sns.stripplot(x='time', y='total_bill', data=tips)
Out[12]:
To make out the data distribution, you can add some jitter to the plot. Jitter will shift the points laterally in a random manner.
In [13]:
sns.stripplot(x='time', y='total_bill', data=tips, jitter=True)
Out[13]:
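As a rough illustration of what jitter does, the sketch below adds a small random horizontal offset to points that share a category position. The uniform distribution, the width of 0.1, and the helper name jitter_positions are assumptions for illustration, not seaborn's actual implementation.

```python
import random

def jitter_positions(category_index, n_points, width=0.1, seed=0):
    """Return x positions scattered randomly around a category's axis position."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [category_index + rng.uniform(-width, width) for _ in range(n_points)]

# Five points that would otherwise all sit at x = 1.
xs = jitter_positions(category_index=1, n_points=5)
print(xs)
```

Only the position along the categorical axis changes. The total_bill values on the other axis are left untouched, so the plot still shows the real data.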
In [14]:
sns.swarmplot(x='time', y='total_bill', data=tips)
Out[14]:
You can overlay a swarm plot on a violin plot to see how the KDE (kernel density estimate) smooths the underlying data points.
In [15]:
sns.violinplot(x='time', y='total_bill', data=tips)
sns.swarmplot(x='time', y='total_bill', data=tips, color='black')
Out[15]:
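To see why the violin is wide where the swarm points are dense, here is a minimal sketch of a Gaussian kernel density estimate. The sample values and the fixed bandwidth of 2.0 are assumptions for illustration. seaborn chooses its own bandwidth, so this will not reproduce the plotted curve exactly.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each sample contributes one Gaussian bump centred on its own value.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical bill amounts, NOT taken from the tips dataset.
bills = [10.0, 12.0, 12.5, 13.0, 20.0]
density = gaussian_kde(bills, bandwidth=2.0)

# The estimate is higher near the cluster of points than near the lone point.
print(density(12.5), density(20.0), density(30.0))
```

The width of the violin at a given bill amount corresponds to this kind of density value, which is why clusters of swarm points line up with the widest parts of the violin.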