# Seaborn - categorical plotting

``````

In [1]:

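import seaborn as sns
%matplotlib inline
tips = sns.load_dataset('tips')  # load seaborn's example "tips" dataset
tips.head()  # preview the first five rows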

``````
``````

Out[1]:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

``````

## Barplot

A barplot shows a measure of central tendency for each category, by default the mean. Seaborn also draws an error bar on each bar to indicate the uncertainty of that estimate; by default this is a bootstrapped 95% confidence interval. Pass a categorical column as `x` and a numeric column as `y`.
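To make the bar height and error bar concrete, the sketch below approximates the calculation by hand for the five rows shown in `Out[1]`: the mean `total_bill` per `sex`, plus a percentile-bootstrap 95% confidence interval. The helper `mean_and_ci` and its parameters are hypothetical illustrations of the idea, not seaborn's internal code, and five rows are far too few to reproduce the full-dataset plot.

```python
import numpy as np

# First five rows of the tips data, copied from Out[1]
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
sex = np.array(["Female", "Male", "Male", "Male", "Female"])


def mean_and_ci(values, n_boot=1000, ci=95, seed=0):
    """Return the mean and a percentile-bootstrap CI for the mean.

    Hypothetical helper that mimics the idea behind seaborn's error bars.
    """
    rng = np.random.default_rng(seed)
    # Resample with replacement n_boot times and record each resample's mean
    resamples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    boot_means = resamples.mean(axis=1)
    # Keep the central `ci` percent of the bootstrap distribution
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high


results = {group: mean_and_ci(total_bill[sex == group]) for group in ["Female", "Male"]}
for group, (mean, low, high) in results.items():
    print(f"{group}: mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```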

``````

In [2]:

sns.barplot(x='sex', y='total_bill', data=tips)

``````
``````

Out[2]:

<matplotlib.axes._subplots.AxesSubplot at 0x1101e1860>

``````

In this dataset, the mean total bill is higher for men than for women.

## Countplot

For a regular bar chart that shows the number of observations in each category, use `countplot`. It needs only the categorical column.
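The bar heights in a `countplot` are simply the number of rows per category. As a sketch using only the five rows from `Out[1]`, `np.unique` with `return_counts=True` produces those counts; on the full DataFrame, `tips['sex'].value_counts()` does the same job.

```python
import numpy as np

# The `sex` column of the five rows shown in Out[1]
sex = np.array(["Female", "Male", "Male", "Male", "Female"])

# Count how many rows fall into each category (these are the bar heights)
categories, counts = np.unique(sex, return_counts=True)
bar_heights = dict(zip(categories.tolist(), counts.tolist()))
print(bar_heights)  # {'Female': 2, 'Male': 3}
```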

``````

In [4]:

sns.countplot(x='sex', data=tips)

``````
``````

Out[4]:

<matplotlib.axes._subplots.AxesSubplot at 0x1127c1320>

``````

## Boxplot

Boxplots are a common way to display a distribution together with its outliers. The quartiles split the sorted data into four equal-sized parts. The box spans the interquartile range (`IQR`), from the first quartile (`Q1`) to the third quartile (`Q3`), and the line inside the box marks the `median`. The whiskers extend to the most extreme data points that still lie within `1.5` times the `IQR` of the box.

Points beyond the whiskers, that is, below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`, are treated as outliers and drawn individually. Seaborn's `whis` parameter controls the `1.5` multiplier.
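The rules above can be written out directly. The sketch below computes the quartiles, the `1.5 * IQR` fences, the whisker ends, and the outliers for the `total_bill` values of the five rows in `Out[1]`. The function `box_stats` is a hypothetical illustration of the rules, not seaborn's or matplotlib's implementation, and percentile conventions can differ slightly between libraries.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers following the boxplot rules."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points beyond them count as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    # Whiskers stop at the most extreme points still inside the fences
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers.tolist(),
    }


# total_bill values of the five rows shown in Out[1]
stats = box_stats([16.99, 10.34, 21.01, 23.68, 24.59])
print(stats)
```

With only these five rows there are no outliers; a value far above the rest, such as `100` in `[1, 2, 3, 4, 100]`, would land beyond the upper fence.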

``````

In [2]:

sns.boxplot(x='time', y='total_bill', data=tips)

``````
``````

Out[2]:

<matplotlib.axes._subplots.AxesSubplot at 0x1129ecdd8>

``````

Dinner bills have a higher median than lunch bills. They also vary more: the dinner box and whiskers cover a wider range, and the lowest dinner bill is below the lowest lunch bill.

``````

In [6]:

sns.boxplot(x='time', y='total_bill', data=tips, hue='sex')

``````
``````

Out[6]:

<matplotlib.axes._subplots.AxesSubplot at 0x1150d1278>

``````

## Violin plot

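A violin plot builds on the boxplot by also showing a kernel density estimate (KDE) of the data's distribution, mirrored on both sides of each category. By default, seaborn draws a small boxplot inside each violin.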
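The outline of each violin is a KDE: a smooth curve built by placing a small Gaussian bump on every observation and averaging them. The sketch below implements a 1-D Gaussian KDE in NumPy for the five `total_bill` values from `Out[1]`. It assumes Scott's rule for the bandwidth (the default of `scipy.stats.gaussian_kde`); seaborn's exact smoothing may differ, and `gaussian_kde_1d` is a hypothetical name.

```python
import numpy as np


def gaussian_kde_1d(data, grid):
    """Evaluate a Gaussian kernel density estimate of `data` on `grid`."""
    data = np.asarray(data, dtype=float)
    # Scott's rule for 1-D data: bandwidth = sample std * n ** (-1/5)
    bw = data.std(ddof=1) * data.size ** (-1 / 5)
    # One Gaussian bump per observation, averaged into a smooth curve
    z = (grid[:, None] - data[None, :]) / bw
    bumps = np.exp(-0.5 * z**2) / (bw * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)


total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])  # rows from Out[1]
grid = np.linspace(-10, 50, 601)
density = gaussian_kde_1d(total_bill, grid)
# A violin draws this curve mirrored on both sides of the category position
left_edge, right_edge = -density, density
```

Because each bump integrates to 1 and the bumps are averaged, the resulting curve is itself a probability density.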

``````

In [9]:

sns.violinplot(x='time', y='total_bill', data=tips)

``````
``````

Out[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x1155731d0>

``````

Lunch bills are concentrated more tightly around the median than dinner bills. Above `Q3`, the dinner violin (shown in green) has a long upper tail, reflecting a wide spread of high bills.

``````

In [10]:

sns.violinplot(x='time', y='total_bill', data=tips, hue='sex', split=True)

``````
``````

Out[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x115587fd0>

``````

This plot supports what we have seen so far, that women's bills tend to be smaller than men's: the female half of each violin is wider at the lower end.

## Strip plot

A strip plot is a scatter plot for categorical data. You specify a categorical column for `x` and a numeric column for `y`.

``````

In [12]:

sns.stripplot(x='time', y='total_bill', data=tips)

``````
``````

Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x11577bd68>

``````

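Points with similar values overlap, which makes the distribution hard to read. Adding jitter shifts each point sideways along the categorical axis by a small random amount, so overlapping points become visible. (Recent seaborn versions apply jitter by default, so the previous plot may already look jittered.)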
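Jitter only changes the position along the categorical axis; the numeric values are untouched. The sketch below shows the idea: each category sits at an integer position and every point gets a uniform random offset. The helper `jittered_x`, the `order` list, and the jitter width of `0.1` are hypothetical choices for illustration, not seaborn's internal values.

```python
import numpy as np


def jittered_x(categories, order, width=0.1, seed=0):
    """Return x positions for a strip plot, with random sideways jitter."""
    rng = np.random.default_rng(seed)
    # Each category sits at an integer position: 0, 1, 2, ...
    base = np.array([order.index(c) for c in categories], dtype=float)
    # Shift every point by a uniform random amount within [-width, width)
    return base + rng.uniform(-width, width, size=base.size)


# `time` values of the five rows from Out[1]; all of them are dinners
time = ["Dinner", "Dinner", "Dinner", "Dinner", "Dinner"]
x = jittered_x(time, order=["Lunch", "Dinner"])
print(x)
```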

``````

In [13]:

sns.stripplot(x='time', y='total_bill', data=tips, jitter=True)

``````
``````

Out[13]:

<matplotlib.axes._subplots.AxesSubplot at 0x115637518>

``````

## Swarm plot

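A swarm plot is a strip plot whose points are arranged so that they do not overlap. Because points pile up sideways where the data are dense, its outline resembles a violin plot, yet every point is an actual observation.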
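One simple way to build such a layout is a greedy "beeswarm": process points in order of value and give each one the smallest sideways offset (0, +d, -d, +2d, ...) that keeps it at least one marker diameter `d` from every point already placed. The sketch below is a simplified illustration of that idea, not seaborn's exact algorithm; `beeswarm_offsets` is a hypothetical name, and for simplicity it measures distance in data units on both axes, whereas a real plot would use screen units.

```python
import numpy as np


def beeswarm_offsets(y, diameter):
    """Greedy beeswarm layout: nudge points sideways so no two overlap."""
    y = np.asarray(y, dtype=float)
    offsets = np.zeros_like(y)
    placed = []  # (x_offset, y) pairs already laid out
    for i in np.argsort(y):
        k = 0
        while True:
            # Try offsets in the order 0, +d, -d, +2d, -2d, ...
            step = (k + 1) // 2
            x = step * diameter if k % 2 else -step * diameter
            if all(np.hypot(x - px, y[i] - py) >= diameter for px, py in placed):
                break
            k += 1
        offsets[i] = x
        placed.append((x, y[i]))
    return offsets


total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])  # rows from Out[1]
offsets = beeswarm_offsets(total_bill, diameter=1.0)
print(offsets)
```

Only the last value (24.59) sits within one diameter of another point (23.68), so it is the only one pushed sideways.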

``````

In [14]:

sns.swarmplot(x='time', y='total_bill', data=tips)

``````
``````

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x115997978>

``````

Overlaying a swarm plot on a violin plot shows how the KDE smooths the individual observations into a continuous shape.

``````

In [15]:

sns.violinplot(x='time', y='total_bill', data=tips)
sns.swarmplot(x='time', y='total_bill', data=tips, color='black')

``````
``````

Out[15]:

<matplotlib.axes._subplots.AxesSubplot at 0x1159aa278>

``````