In [15]:
# This cell imports the libraries that we'll need.
# The most important is the second import. After you run it, "plt" will be
# the plotting library in Python.
import numpy
import matplotlib.pyplot as plt
In [18]:
my_data = [1,2.4,-1.3,4]
plt.plot(my_data)
plt.ylabel("yi")
plt.xlabel("hi")
plt.title("A plot!")
plt.show()
Try This
Using the documentation for matplotlib.pyplot found here, make a plot of the following data set. In this case, you want to plot both x and y values.
In [25]:
height = [68, 66, 75, 71, 67, 65, 67, 75, 72, 74, 72, 75, 69, 70, 65, 64, 68, 73, 70, 76, 64, 64, 66, 63, 68, 62, 72, 74, 76, 69, 65, 65, 64, 66, 64, 72, 74]
shoe_size = [12 ,9 ,12 ,11 ,12 ,8.5 ,9 ,13 ,11 ,12 ,12 ,12 ,10 ,11 ,10 ,8 ,9 ,11 ,8 ,12 ,8 ,9 ,11 ,10 ,9 ,9 ,11 ,11 ,12 ,10 ,8 ,7 ,9 ,10 ,13 ,12 ,11]
# put your code here
Improving the plot
When I made the plot above, it came out as a very jumbled line chart, which didn't look very good. Using the tutorial, figure out how to plot the points as dots rather than as a connected line.
In [26]:
# put your new plot here
If you look carefully at the data, you'll find cases where the exact same point shows up two or more times. One trick for handling overlapping data is to use alpha, the transparency of a dot. Use the argument alpha=0.3 in the plot above to make each dot slightly transparent.
In [29]:
# put your code here. Make sure you include axis labels and a title!
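As a rough illustration of the two ideas above (dots instead of a line, and alpha for overlapping points), here is a minimal sketch. The points are made up for this example and are not the height data; the point (2, 3) is repeated three times on purpose so the stacking is visible.

```python
import matplotlib.pyplot as plt

# Made-up points for illustration only; (2, 3) appears three times.
x = [1, 2, 2, 2, 3, 4]
y = [1, 3, 3, 3, 2, 4]

# The format string "o" draws circular markers with no connecting line.
# alpha=0.3 makes each marker mostly transparent, so stacked points look darker.
lines = plt.plot(x, y, "o", alpha=0.3)
plt.xlabel("x value")
plt.ylabel("y value")
plt.title("Overlapping points with alpha=0.3")
plt.show()
```

The repeated point should appear darker than the others, which is how alpha hints at how many points sit on top of each other.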
Challenge
Last week, we processed the congressional district data to produce a data set with the percent of the working population in each industry compared to the mean and median salary in that district. You opened that data in Excel to produce a plot.
Recreate a plot of percent of the district in engineering and the mean salary for that district. Make sure to label your axes.
Hint: all of the Python you have learned will work in the notebook. For instance, you can still use open.
In [31]:
# put your code here.
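The hint about open can be sketched roughly as follows. The file layout here is an assumption, since your file from last week may be organized differently: this sketch writes a tiny stand-in file with one district per line, holding the percent in engineering and the mean salary separated by a tab. The file name and all numbers are made up.

```python
import os
import tempfile
import matplotlib.pyplot as plt

# Hypothetical stand-in for last week's output (made-up numbers):
# each line is "percent_engineering<TAB>mean_salary".
sample_text = "2.1\t48000\n3.4\t52500\n5.0\t61000\n"

percent_engineering = []
mean_salary = []

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "districts_sample.tsv")
    with open(path, "w") as f:
        f.write(sample_text)

    # Read the file back line by line, just like outside the notebook.
    with open(path) as f:
        for line in f:
            fields = line.strip().split("\t")
            percent_engineering.append(float(fields[0]))
            mean_salary.append(float(fields[1]))

plt.plot(percent_engineering, mean_salary, "o")
plt.xlabel("Percent of district in engineering")
plt.ylabel("Mean salary")
plt.title("Mean salary vs. engineering share (made-up sample)")
plt.show()
```

With your real file, you would skip the part that writes the sample and adjust the column positions to match your data.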
The line charts above are great for cardinal data, but Python can make many other kinds of plots. Here, we'll make a bar chart. There are a few differences:
We build the x positions with range and len. What are the values in fake_x_data?
We use xticks to label our groupings.
In [60]:
month = ["jan", "feb", "mar", 'apr', 'may', 'june', 'july', 'aug', 'sept', 'oct', 'nov', 'dec']
high = [45, 48, 52, 58, 64, 69, 72, 73, 67, 59, 51, 47]
low = [36, 37, 39, 43, 47, 52, 54, 55, 52, 47, 41, 38]
fake_x_data = range(len(month))
width = .8
plt.bar(fake_x_data, high, width)
plt.xticks(fake_x_data, month)
plt.show()
Try it
Add another bar to the same chart for the monthly low temperature. Make sure you label your axes!
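One possible way to put two sets of bars in the same chart is to shift each set slightly left or right of the shared group position. The sketch below uses made-up groups and values, not the temperature data, so you can adapt the idea yourself; the variable names are only for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only.
groups = ["a", "b", "c"]
first = [3, 5, 2]
second = [2, 4, 1]

positions = list(range(len(groups)))  # one integer position per group
width = 0.4

# Shift each series half a bar width left or right of the group position.
left = [p - width / 2 for p in positions]
right = [p + width / 2 for p in positions]

plt.bar(left, first, width, label="first")
plt.bar(right, second, width, label="second")
plt.xticks(positions, groups)  # labels sit between the paired bars
plt.xlabel("Group")
plt.ylabel("Value")
plt.legend()
plt.show()
```

Drawing both series at the same positions also works, but the taller bars would then hide part of the shorter ones.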
Now that we've created some simple charts in python, let's see how to make them beautiful!
Matplotlib comes with many pre-built style packages that can be used to make your plots automatically look professional. Try the samples below for a taste.
The full set can be found by running the command below:
In [52]:
list(plt.style.available)
Out[52]:
Use a style like this:
In [59]:
plt.style.use("ggplot")
Try it!
Set a style as shown above and rerun any of the plots in your document.
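If you only want a style for a single plot, one option is plt.style.context, which applies the style inside a with block and leaves later plots alone. This is a minimal sketch that reuses the small data list from the first plot.

```python
import matplotlib.pyplot as plt

my_data = [1, 2.4, -1.3, 4]

# plt.style.use changes the style for every later plot.
# plt.style.context applies it only inside the with block.
with plt.style.context("ggplot"):
    plt.plot(my_data)
    plt.title("Drawn with the ggplot style")
    plt.show()
```

Any name from plt.style.available can be swapped in for "ggplot".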
In [63]:
fake_data = [1, 1, 1, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 6, 7]
plt.hist(fake_data)
plt.show()
Do it!
Read in the mean salary per congressional district and construct a histogram from that data.
Experiment with the bins argument. What happens if you set it too high? What if you set it too low? Generally, sqrt(n) is a good number of bins for n data points.
Challenge
Get the salary data from here (it's the tsv file). Import this file into python and build a histogram of salary.
Hint: use the range argument to hist to help truncate the plot.
In [65]:
# put your code here.
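The bins and range arguments mentioned above can be sketched on the small fake_data list from earlier (not the salary data). The choice of range=(1, 5) is arbitrary and only there to show the truncation.

```python
import math
import matplotlib.pyplot as plt

fake_data = [1, 1, 1, 1, 2, 2, 3, 4, 5, 1, 2, 3, 4, 6, 7]

# Square-root rule of thumb: about sqrt(n) bins for n data points.
# Here sqrt(15) is about 3.87, which rounds to 4 bins.
n_bins = round(math.sqrt(len(fake_data)))

# range=(low, high) leaves out values outside the interval,
# so the 6 and the 7 are not counted.
counts, edges, patches = plt.hist(fake_data, bins=n_bins, range=(1, 5))
plt.xlabel("Value")
plt.ylabel("Count")
plt.title("Histogram with sqrt(n) bins and a truncated range")
plt.show()
```

For data with a few extreme values, such as salaries, narrowing the range keeps the bulk of the distribution from being squeezed into one or two bars.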
The plotting libraries in python are incredibly rich, and we have barely scratched the surface.
The best way to figure out what is possible is to look at the gallery in Matplotlib's documentation. Every plot in the gallery has sample Python code that produces exactly that plot. Usually, when I'm trying to make a plot, I start with a gallery example and modify it to suit my needs.
The key thing to do when you use these examples is to figure out how to structure the data. If you can map your data format to the format in the example, then you can usually just use the examples directly.
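As a small illustration of mapping your data to an example's format: many examples expect parallel lists, while data often arrives as one record per row. The sketch below assumes a hypothetical list of dictionaries (holding the first three months of the temperature data) and reshapes it into the lists that plt.bar expects.

```python
import matplotlib.pyplot as plt

# Hypothetical record layout: one dictionary per row.
records = [
    {"month": "jan", "high": 45},
    {"month": "feb", "high": 48},
    {"month": "mar", "high": 52},
]

# Reshape the records into the parallel lists used in the bar chart example.
labels = [r["month"] for r in records]
heights = [r["high"] for r in records]
positions = range(len(labels))

plt.bar(positions, heights)
plt.xticks(positions, labels)
plt.ylabel("High temperature")
plt.show()
```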
For a comprehensive tutorial on plotting see the official guide from Matplotlib.