First we will import the three packages used to make the plot. Pandas is used to import the data and put it in a pandas dataframe. A pandas dataframe is an object type that works well for building plots; it stores data in rows and columns like an Excel file. We'll use matplotlib to display the chart. The last import, seaborn, will be used to construct the plot. Seaborn outputs a matplotlib axis object, so we can use standard matplotlib methods to modify the seaborn plot.
In [1]:
# seaborn bar chart
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)
Now we need to import the data. The data in the .csv file represents the density of wood blocks from IKEA. The blocks are rectangle, square, triangle or cylinder in shape, and there is a column for each shape. The data points are density measurements in g/cm³. We use the pandas read_csv() function to read the data and put it in a dataframe. df is a common pandas dataframe variable name. Seaborn can't plot .csv files directly, but it can plot pandas dataframes.
In [3]:
df = pd.read_csv('LAB_3_large_data_set_cleaned.csv')
In [4]:
df
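For readers without the .csv file, the sketch below shows the kind of layout described above: one column per block shape, one density measurement per row. The column names and numbers are assumptions made up for illustration; the real file may use different headers and will contain many more rows.

```python
import io
import pandas as pd

# Hypothetical values standing in for the real .csv contents
csv_text = """rectangle,square,triangle,cylinder
0.45,0.47,0.44,0.46
0.46,0.48,0.45,0.47
0.44,0.46,0.43,0.45
"""

# read_csv() accepts a file-like object as well as a file name
demo = pd.read_csv(io.StringIO(csv_text))
print(demo.head())

# Per-column mean and standard deviation: the numbers a bar chart
# with standard-deviation error bars summarizes
summary = demo.agg(["mean", "std"])
print(summary)
```

Data shaped like this is often called "wide-form", and it is the shape that lets seaborn draw one bar per column without any reshaping.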
Time to build the plot. We will call seaborn's sns.barplot() function to create a nice-looking plot with error bars already included. The first argument, data=df, tells seaborn which dataset to plot. We don't need to do any further manipulation of our dataframe because the data is already in columns, with the dataframe column headers corresponding to the block shapes. The second argument, palette="pastel", will color our bars with a nice group of pastel colors. You could also try palette="muted", palette="PRGn" or palette="Set3". Seaborn has many different built-in palettes.
In [5]:
sns.barplot(data=df, palette="pastel")
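To compare the palettes named above without redrawing the chart each time, the colors can be inspected directly. This is a small sketch that is not part of the original notebook; it uses seaborn's `sns.color_palette()` function, and the choice of four colors (one per block shape) is an assumption.

```python
import seaborn as sns

palette_names = ["pastel", "muted", "PRGn", "Set3"]

# Ask each palette for four colors, one per block shape
palettes = {name: sns.color_palette(name, n_colors=4) for name in palette_names}

for name, colors in palettes.items():
    # Each color is an (r, g, b) tuple with components between 0 and 1
    rounded = [tuple(round(c, 2) for c in rgb) for rgb in colors]
    print(name, rounded)
```

In a notebook, `sns.palplot(palettes["pastel"])` draws the same colors as a row of swatches.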
Now we can do a little more plot customization. The first thing we are going to customize is the error bars. By default, seaborn's sns.barplot() function uses a 95% confidence interval of the mean as the length of the error bars. From my point of view, this gives a warped sense of confidence in the data: a confidence interval of the mean shrinks as the number of measurements grows, so it can look much tighter than the spread of the individual measurements. Instead of seaborn's default confidence interval, let's use the standard deviation as the error bar length. The ci="sd" argument sets the error bars to the standard deviation (sd) instead of a confidence interval (ci). (In seaborn 0.12 and later, ci is deprecated in favor of errorbar="sd".) We are also going to style the error bars. I like caps on top of error bars. I think they make it easier to look across the plot and compare bar and error bar heights. We'll add error bar caps with the errwidth=1 and capsize=0.1 arguments. errwidth is the line width of the error bars. Setting errwidth to 1 will style the error bars to match the line thickness in the rest of the plot. capsize=0.1 will set the error bar caps to a good width. You can play around with different cap sizes, but I like 0.1 for this plot.
In [10]:
sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
plt.show()
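The difference between the two kinds of error bars can be seen with a little arithmetic. The sketch below is an illustration, not seaborn's actual implementation: it uses only the Python standard library, a made-up sample of 30 density values, and a simple percentile bootstrap to approximate a 95% confidence interval of the mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical density measurements in g/cm^3 (not the IKEA data)
sample = [rng.gauss(0.45, 0.03) for _ in range(30)]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Percentile bootstrap: resample with replacement, recompute the mean each time
boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(1000)
)
# Approximately the 2.5th and 97.5th percentiles of the 1000 bootstrap means
ci_low, ci_high = boot_means[25], boot_means[974]

print(f"mean = {mean:.4f}")
print(f"standard deviation bar: {mean - sd:.4f} to {mean + sd:.4f}")
print(f"95% CI of the mean bar: {ci_low:.4f} to {ci_high:.4f}")
```

With this many measurements the confidence interval comes out noticeably narrower than the standard deviation bar, which is why the default error bars can make the data look tighter than it is.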
Next we are going to add some labels to the bar plot. We need to add a y-axis label which shows density and the units g/cm3. We are also going to add a title. Seaborn's sns.barplot() function outputs a standard matplotlib axis object. If we assign the plot to an axis object variable (ax is a typical variable name), we can then modify that axis using standard matplotlib methods. We'll use the ax.set_ylabel() method to add the y-axis label, and ax.set_title() to add a title to the plot.
In [12]:
ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density (g/cm3)')
ax.set_title('Density of IKEA wood blocks based on shape')
plt.show()
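The labeling methods used above belong to matplotlib, not seaborn, so they behave the same on any axis object. As a minimal sketch with made-up bar heights and plain matplotlib bars, the two calls can also be combined into one `ax.set()` call and read back with the matching getters. The `Agg` backend is an assumption so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical mean densities, for illustration only
ax.bar(["rectangle", "square"], [0.45, 0.47])

# ax.set() accepts the same properties as set_ylabel() and set_title()
ax.set(ylabel="Density (g/cm3)", title="Density of IKEA wood blocks based on shape")

ylabel = ax.get_ylabel()
title = ax.get_title()
print(ylabel)
print(title)

plt.close(fig)  # free the figure
```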
Now the y-axis label is included on the plot, but it still looks a little funny. The units g/cm3 don't have the cubed (3) as a superscript like they should. Matplotlib allows LaTeX formatting in text, so we'll use LaTeX formatting to make the cubed (3) a superscript. To call LaTeX formatting, we surround the LaTeX expression in dollar signs ($). In LaTeX, putting a number in superscript is done with the caret character (^).
In [14]:
ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density ($g/cm^3$)')
ax.set_title('Density of IKEA wood blocks based on shape')
plt.show()
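One side effect of putting the whole unit between dollar signs is that matplotlib typesets the letters as italic math variables. If upright units are preferred, a possible variation (a suggestion, not something the original notebook does) is to wrap the expression in `\mathrm{}`. The sketch below uses made-up bar heights, and the raw string prefix `r` keeps Python from treating the backslash as an escape character.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["rectangle", "square"], [0.45, 0.47])  # hypothetical values

italic_label = r"Density ($g/cm^3$)"            # the label used in the notebook
upright_label = r"Density ($\mathrm{g/cm^3}$)"  # same units in upright type

ax.set_ylabel(upright_label)
fig.canvas.draw()  # forces matplotlib to parse and render the math text

rendered = ax.get_ylabel()
print(rendered)

plt.close(fig)  # free the figure
```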
Our final plot looks great! We created a bar chart with seaborn and modified some of the plot elements. We took advantage of seaborn plots being matplotlib axis objects, which allowed us to modify the plot further. We even added some fancy LaTeX to style a superscript. The full code is below.
In [15]:
# seaborn bar chart
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)
df = pd.read_csv('LAB_3_large_data_set_cleaned.csv')
ax = sns.barplot(data=df, palette="pastel", ci='sd', errwidth=1, capsize=0.1)
ax.set_ylabel('Density ($g/cm^3$)')
ax.set_title('Density of IKEA wood blocks based on shape')
plt.show()
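A note for newer seaborn releases: to the best of my knowledge, seaborn 0.12 deprecated the ci argument in favor of errorbar, and seaborn 0.13 deprecated errwidth in favor of err_kws. The helper below is a hypothetical sketch (the function name `sd_errorbar_kwargs` is made up, not part of seaborn) that picks the matching keyword arguments from a version string. Cap sizes may also render a little differently between versions, so treat 0.1 as a starting point.

```python
def sd_errorbar_kwargs(seaborn_version: str) -> dict:
    """Return barplot keyword arguments for standard-deviation error bars.

    Hypothetical helper: the version cutoffs reflect my reading of the
    seaborn release notes and are worth double-checking.
    """
    major, minor = (int(part) for part in seaborn_version.split(".")[:2])
    if (major, minor) >= (0, 13):
        # errorbar replaces ci, err_kws replaces errwidth
        return {"errorbar": "sd", "err_kws": {"linewidth": 1}, "capsize": 0.1}
    if (major, minor) >= (0, 12):
        # errorbar replaces ci, errwidth is still accepted
        return {"errorbar": "sd", "errwidth": 1, "capsize": 0.1}
    # Older releases: the arguments used in this post
    return {"ci": "sd", "errwidth": 1, "capsize": 0.1}


old_style = sd_errorbar_kwargs("0.11.2")
new_style = sd_errorbar_kwargs("0.13.2")
print(old_style)
print(new_style)
```

In the notebook it could be used as `ax = sns.barplot(data=df, palette="pastel", **sd_errorbar_kwargs(sns.__version__))`.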