First we will import the three packages used to make the plot. Pandas is used to import the data and put it in a pandas dataframe. A pandas dataframe is an object type that works well for building plots because it stores data in rows and columns, like an Excel file. We'll use matplotlib to display the chart. The last import, seaborn, will be used to construct the plot. Seaborn outputs a matplotlib axis object, so we can use standard matplotlib methods to modify the seaborn plot.


In [1]:
# seaborn bar chart

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)

Now we need to import the data. The data in the .csv file represents the density of wood blocks from IKEA. Each block is a rectangle, square, triangle or cylinder, and there is one column for each shape. The data points are density measurements in g/cm3. We use pandas' read_csv() function to read the data and put it in a dataframe. df is a common variable name for a pandas dataframe. Seaborn can't plot .csv files directly, but it can plot pandas dataframes.


In [3]:
df = pd.read_csv('LAB_3_large_data_set_cleaned.csv')

In [4]:
df


Out[4]:
Rectangle Square Triangle Cylinder
0 0.560 0.760 0.244 0.943
1 0.822 0.760 0.252 0.943
2 0.628 0.750 0.299 0.932
3 0.557 0.730 0.308 0.910
4 0.516 0.730 0.323 0.860
5 0.736 0.726 0.326 0.839
6 0.561 0.717 0.327 0.825
7 0.646 0.717 0.332 0.817
8 0.510 0.713 0.338 0.804
9 0.900 0.704 0.365 0.804
10 0.610 0.700 0.405 0.795
11 0.650 0.695 0.412 0.790
12 0.625 0.691 0.430 0.786
13 0.665 0.690 0.435 0.779
14 1.045 0.689 0.454 0.779
15 0.437 0.680 0.456 0.777
16 0.538 0.677 0.460 0.777
17 0.762 0.672 0.467 0.776
18 0.560 0.672 0.471 0.776
19 0.596 0.670 0.480 0.770
20 0.583 0.665 0.484 0.768
21 0.640 0.661 0.484 0.767
22 0.666 0.661 0.485 0.767
23 0.769 0.660 0.486 0.765
24 0.532 0.660 0.491 0.762
25 0.457 0.654 0.491 0.762
26 0.648 0.650 0.495 0.761
27 0.616 0.650 0.497 0.761
28 0.768 0.649 0.497 0.753
29 0.440 0.648 0.499 0.753
... ... ... ... ...
203 NaN NaN 0.745 NaN
204 NaN NaN 0.748 NaN
205 NaN NaN 0.748 NaN
206 NaN NaN 0.750 NaN
207 NaN NaN 0.750 NaN
208 NaN NaN 0.750 NaN
209 NaN NaN 0.751 NaN
210 NaN NaN 0.752 NaN
211 NaN NaN 0.752 NaN
212 NaN NaN 0.752 NaN
213 NaN NaN 0.752 NaN
214 NaN NaN 0.763 NaN
215 NaN NaN 0.770 NaN
216 NaN NaN 0.771 NaN
217 NaN NaN 0.771 NaN
218 NaN NaN 0.775 NaN
219 NaN NaN 0.778 NaN
220 NaN NaN 0.780 NaN
221 NaN NaN 0.780 NaN
222 NaN NaN 0.780 NaN
223 NaN NaN 0.781 NaN
224 NaN NaN 0.782 NaN
225 NaN NaN 0.784 NaN
226 NaN NaN 0.784 NaN
227 NaN NaN 0.806 NaN
228 NaN NaN 0.807 NaN
229 NaN NaN 0.813 NaN
230 NaN NaN 0.816 NaN
231 NaN NaN 0.823 NaN
232 NaN NaN 0.952 NaN

233 rows × 4 columns
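The output above shows NaN in the shorter columns because the shapes do not all have the same number of measurements. As a minimal sketch of how pandas treats that padding, the block below builds a small stand-in dataframe from rows 0, 1, 2 and 203 of the table (it is an assumption that four rows are enough to illustrate the point; the real file has 233 rows) and counts the real measurements in each column.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV: rows 0, 1, 2 and 203 copied from the output above.
# This tiny dataframe is for illustration only, not the full data set.
df = pd.DataFrame({
    "Rectangle": [0.560, 0.822, 0.628, np.nan],
    "Square": [0.760, 0.760, 0.750, np.nan],
    "Triangle": [0.244, 0.252, 0.299, 0.745],
    "Cylinder": [0.943, 0.943, 0.932, np.nan],
})

counts = df.count()        # number of non-NaN measurements per shape
missing = df.isna().sum()  # number of NaN padding cells per shape
means = df.mean()          # NaN values are skipped by default

print(counts)
print(missing)
print(means.round(3))
```

Because pandas skips NaN by default when it computes a mean, the padded rows do not drag the shorter columns toward zero.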

Time to build the plot. We will call seaborn's sns.barplot() function to create a nice-looking plot with error bars already included. The first argument, data=df, tells seaborn which dataset to plot. We don't need to do any further manipulation of our dataframe because the data is already in columns, with the dataframe column headers corresponding to the block shapes. The second argument, palette="pastel", will color our bars with a nice group of pastel colors. You could also try palette="muted", palette="PRGn" or palette="Set3". Seaborn has many different built-in palettes.


In [5]:
sns.barplot(data=df, palette="pastel")


Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a0fe7b9b0>
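To compare the palettes named above before committing to one, a small sketch like the following may help. It asks seaborn for four colors from each palette (four is an assumption, chosen to match the four block shapes) and prints them as hex codes.

```python
import seaborn as sns

# Palette names mentioned in the text above.
palette_names = ["pastel", "muted", "PRGn", "Set3"]

# Request four colors from each palette, one per block shape.
palettes = {name: sns.color_palette(name, n_colors=4) for name in palette_names}

for name, colors in palettes.items():
    # as_hex() converts the RGB tuples to hex strings such as '#a1c9f4'.
    print(name, colors.as_hex())
```

In a notebook, sns.palplot(palettes["pastel"]) would draw the swatches instead of printing the codes.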

Now we can do a little more plot customization. The first thing we are going to customize is the error bars. By default, seaborn's sns.barplot() function uses a 95% confidence interval of the mean as the length of the error bars. From my point of view, this gives a warped sense of confidence in the data. Instead of seaborn's default confidence interval, let's use the standard deviation as the error bar length. The ci="sd" argument sets the confidence interval (ci) to the standard deviation (sd). We are also going to style the error bars. I like caps on top of error bars because I think they make it easier to look across the plot and compare bar and error bar heights. We'll style the error bars with the errwidth=1 and capsize=0.1 arguments. errwidth is the line width of the error bars. Setting errwidth to 1 makes the error bars match the line thickness in the rest of the plot. capsize=0.1 adds the caps and sets them to a good width. You can play around with different cap sizes, but I like 0.1 for this plot.


In [10]:
sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
plt.show()
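To make clear what the standard deviation error bars represent, here is a rough plain-matplotlib sketch that computes the mean and standard deviation of each column with pandas and draws them directly. It is not how seaborn works internally. It uses a small stand-in dataframe built from four rows of the table (an assumption for illustration), and pandas' default sample standard deviation, which may differ slightly from what a given seaborn version draws.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the CSV: rows 0, 1, 2 and 203 of the table shown earlier.
df = pd.DataFrame({
    "Rectangle": [0.560, 0.822, 0.628, np.nan],
    "Square": [0.760, 0.760, 0.750, np.nan],
    "Triangle": [0.244, 0.252, 0.299, 0.745],
    "Cylinder": [0.943, 0.943, 0.932, np.nan],
})

means = df.mean()  # bar heights; NaN values are skipped
stds = df.std()    # error bar half-lengths (sample standard deviation)

fig, ax = plt.subplots()
bars = ax.bar(
    means.index,
    means.values,
    yerr=stds.values,           # error bars extend one sd above and below the mean
    capsize=6,                  # matplotlib cap size is in points, unlike seaborn's
    error_kw={"elinewidth": 1}, # error bar line width
)
fig.canvas.draw()
```

Each bar is the column mean and each error bar reaches one standard deviation above and below it, which is the quantity the ci='sd' argument asks seaborn to show.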


Next we are going to add some labels to the bar plot. We need to add a y-axis label which shows density and the units g/cm3. We are also going to add a title. Seaborn's sns.barplot() function outputs a standard matplotlib axis object. If we assign the plot to an axis object variable (ax is a typical variable name), we can then modify that axis using standard matplotlib methods. We'll use the ax.set_ylabel() method to add the y-axis label, and ax.set_title() to add a title to the plot.


In [12]:
ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density (g/cm3)')
ax.set_title('Density of IKEA wood blocks based on shape')
plt.show()


Now the y-axis label is included on the plot, but it still looks a little funny. The units g/cm3 don't have the cubed (3) as a superscript like they should. Matplotlib allows LaTeX formatting in text. We'll use LaTeX formatting to make the cubed (3) a superscript. To call LaTeX formatting, we surround the LaTeX expression in dollar signs ($). In LaTeX, putting a number in superscript is done with the caret character (^).


In [14]:
ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density ($g/cm^3$)')
ax.set_title('Density of IKEA wood blocks based on shape')
plt.show()
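Everything between the dollar signs is typeset as math, so the letters g and cm come out in italics along with the superscript. If upright unit letters are preferred, a couple of variations are possible. The sketch below is only a suggestion, not part of the original plot, and it uses an empty matplotlib axis instead of the seaborn chart to keep it short. It renders each variant once to confirm that matplotlib can parse it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Raw strings (r"...") keep backslashes intact for the LaTeX-style commands.
labels = [
    r"Density ($g/cm^3$)",           # as in the post: the whole unit is in math italics
    r"Density (g/cm$^3$)",           # only the superscript is inside the dollar signs
    r"Density ($\mathrm{g/cm^3}$)",  # math mode, but forced to upright letters
]

for label in labels:
    ax.set_ylabel(label)
    fig.canvas.draw()  # rendering raises an error if the expression cannot be parsed
```

Which variant looks best is a matter of taste; the label from the post works fine as it is.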


Our final plot looks great! We created a bar chart with seaborn and modified some of the plot elements. We took advantage of seaborn plots being matplotlib axis objects, which allowed us to modify the plot further. We even added some fancy LaTeX to style a superscript. The full code is below.


In [15]:
# seaborn bar chart

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)

df = pd.read_csv('LAB_3_large_data_set_cleaned.csv')

ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density ($g/cm^3$)')
ax.set_title('Density of IKEA wood blocks based on shape')

plt.show()


