The aims of this lab are:
- matplotlib's colormaps, including the awesome viridis.
- basic matplotlib.
First, import the numpy and matplotlib libraries (don't forget the %matplotlib inline magic command if you are using a Jupyter notebook).
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
We discussed colors for categorical and quantitative data. We can further divide the quantitative case into sequential and diverging. "Sequential" means that the underlying values have a natural ordering, and the color only needs to change sequentially and monotonically along with them.
In the "diverging" case, there should be a meaningful anchor point. For instance, correlation values can be positive or negative. Both strong positive and strong negative correlations are important, and the sign of the correlation carries meaning. Therefore, we would like to stitch two sequential colormaps together, one from zero to +1 and the other from zero to -1.
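A minimal sketch of the diverging case, using made-up random data as a stand-in for real correlations: fixing the color range to [-1, 1] with matplotlib.colors.Normalize makes a correlation of 0 land on the neutral middle color of coolwarm, so the colormap's anchor point lines up with the data's meaningful zero.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical data: random columns, so their pairwise correlations lie in [-1, 1]
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
corr = np.corrcoef(data, rowvar=False)   # 5 x 5 correlation matrix

# Pin the color range to [-1, 1] so that a correlation of 0 always maps to the
# middle of the diverging colormap, whatever the actual data range is
norm = Normalize(vmin=-1, vmax=1)

plt.figure(figsize=(4, 3))
plt.pcolormesh(corr, cmap=plt.cm.coolwarm, norm=norm)
plt.colorbar()
print(norm(0.0))   # 0.5, the midpoint of the colormap
```

If the data range is lopsided (say, from -0.2 to 1), matplotlib.colors.TwoSlopeNorm(vcenter=0) is another way to keep 0 at the midpoint.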
numpy is one of the most important packages in Python. As the name suggests, it handles all kinds of numerical manipulations and is the basis of pretty much all scientific packages. In fact, a pandas Series is essentially a numpy array, and a DataFrame is essentially a bunch of numpy arrays grouped together.
If you use it wisely, numpy can easily give you a 10x, 100x, or even 1000x speed-up over plain Python loops, although pandas takes care of such optimizations under the hood in many cases. If you want to study numpy more, check out the official tutorial and the "From Python to Numpy" book.
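To get a feel for the speed-up, here is a rough timing sketch comparing a plain Python loop with the vectorized numpy version of the same computation (a sum of squares, chosen only for illustration). The exact ratio depends on your machine and the array size, so treat the printed number as indicative only.

```python
import timeit
import numpy as np

values = np.arange(200_000, dtype=np.float64)
values_list = values.tolist()   # the same numbers as a plain Python list

def loop_sum_of_squares(numbers):
    # Plain Python: the interpreter processes every element one at a time
    total = 0.0
    for v in numbers:
        total += v * v
    return total

def numpy_sum_of_squares(arr):
    # Vectorized: the loop runs inside numpy's compiled code
    return np.sum(arr * arr)

loop_time = timeit.timeit(lambda: loop_sum_of_squares(values_list), number=5)
numpy_time = timeit.timeit(lambda: numpy_sum_of_squares(values), number=5)
print(f"speed-up on this machine: about {loop_time / numpy_time:.0f}x")
```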
Let's plot the sine and cosine functions. A common trick for plotting a function is to first create an array of x coordinates (evenly spaced numbers over an interval). numpy has a function called linspace for that. By default, it creates 50 evenly spaced numbers spanning the interval you pass, including both endpoints.
In [2]:
np.linspace(0, 3)
Out[2]:
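A quick check of what linspace returns with its defaults: 50 values with both endpoints included, so the spacing is (stop - start) / 49. The endpoint=False option, shown only for comparison, leaves out the stop value.

```python
import numpy as np

xs = np.linspace(0, 3)          # default num=50, both endpoints included
print(len(xs))                  # 50
print(xs[0], xs[-1])            # 0.0 3.0
spacing = np.diff(xs)           # every gap equals (3 - 0) / 49

xs_open = np.linspace(0, 3, endpoint=False)   # still 50 values, but 3 is excluded
# now every gap equals (3 - 0) / 50
```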
You can also tell linspace how many numbers to create.
In [2]:
np.linspace(0, 3, 10) # 10 numbers instead of 50
Out[2]:
A nice thing about numpy is that many operations just work with vectors. If you want to apply a function to every value in a vector, you simply pass that vector to the function.
In [3]:
np.sin(np.linspace(0, 3, 10))
Out[3]:
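To see that np.sin really applies the function to each element, the sketch below compares it with an explicit loop over Python's built-in math.sin. Arithmetic operators are element-wise too, so whole formulas can be written directly on arrays.

```python
import math
import numpy as np

xs = np.linspace(0, 3, 10)
vectorized = np.sin(xs)                        # one call, applied to every element
looped = np.array([math.sin(v) for v in xs])   # the same thing, one element at a time
print(np.allclose(vectorized, looped))         # True

# Operators are element-wise as well, so whole expressions work on vectors
combined = 2 * np.sin(xs) + np.cos(xs) ** 2
```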
Q: Let's plot sin and cos
In [4]:
# TODO: put your code here
Out[4]:
matplotlib picks a pretty good color pair by default! The orange-blue pair is colorblind-safe, and it is practically the color pair of every movie.
matplotlib has many qualitative (categorical) color schemes: https://matplotlib.org/users/colormaps.html
You can access them in the following ways:
In [5]:
plt.cm.Pastel1
Out[5]:
or
In [7]:
pastel1 = plt.get_cmap('Pastel1')
pastel1
Out[7]:
You can also see the colormap's colors as RGB values.
In [8]:
pastel1.colors
Out[8]:
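One detail worth knowing when picking colors this way: calling a colormap with an integer selects that entry directly, while a float is interpreted as a position in the [0, 1] range. For a 9-color qualitative map like Pastel1, pastel1(1) is the second color but pastel1(1.0) is the last one.

```python
import numpy as np
import matplotlib.pyplot as plt

pastel1 = plt.get_cmap('Pastel1')
print(pastel1.N)   # 9 colors in this qualitative colormap

# An integer picks the i-th color directly (as RGBA) ...
first = pastel1(0)
second = pastel1(1)
# ... but a float is read as a fraction of the colormap's range,
# so 1.0 returns the *last* color rather than the second one
last = pastel1(1.0)
print(np.allclose(second, last))   # False
```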
To get the first and second colors, you can use either way:
In [9]:
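x = np.linspace(0, 3*np.pi)   # x coordinates for the curves
plt.plot(x, np.sin(x), color=plt.cm.Pastel1(0))   # first color, via plt.cm
plt.plot(x, np.cos(x), color=pastel1(1))          # second color, via get_cmap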
Out[9]:
Q: Pick a qualitative colormap and then draw four different curves, each with a different color from the colormap.
Note that these color schemes are not necessarily colorblind-safe or varied in lightness! Think about whether the colormap you chose is a good one, based on the criteria that we discussed.
In [13]:
# TODO: put your code here
Take a look at the tutorial about image processing in matplotlib: http://matplotlib.org/users/image_tutorial.html
We can also display an image using quantitative (sequential) colormaps. Download the image of a snake: https://github.com/yy/dviz-course/blob/master/m05-design/sneakySnake.png or use another image you like.
Check out the imread() function, which returns a numpy array.
In [14]:
import matplotlib.image as mpimg
In [15]:
img = mpimg.imread('sneakySnake.png')
In [16]:
plt.imshow(img)
Out[16]:
How is the image stored?
In [17]:
img
Out[17]:
The np.shape() function (or, equivalently, the array's shape attribute, img.shape) tells you the dimensions of the array.
In [18]:
np.shape(img)
Out[18]:
This means that img is a three-dimensional array with 219 x 329 x 4 numbers. If you look at the image, you can easily see that 219 and 329 are the dimensions (height and width in terms of the number of pixels) of the image. What is 4?
We can actually create our own small image to investigate. Let's create a 3x3 image.
In [19]:
myimg = np.array([ [[1,0,0,1], [1,1,1,1], [1,1,1,1]],
[[1,1,1,1], [1,1,1,1], [1,0,0,1]],
[[1,1,1,1], [1,1,1,1], [1,0,1,0.5]] ])
plt.imshow(myimg)
Out[19]:
Q: Play with the values of the array and explain below what each of the four numbers along the last dimension represents (this array is 3x3x4).
Write your answer here
Let's assume that the first of those four values represents some data of interest. You can obtain a height x width matrix (a 2D array) by doing img[:,:,0], which means: give me all of the first dimension (:), all of the second dimension (:), but only the first element of the last dimension (0).
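The difference between indexing with 0 and slicing with 0:1 is easy to check on a small made-up array (fake_img below is just a hypothetical stand-in for img): an integer index drops the last axis, while a slice keeps it with length 1.

```python
import numpy as np

# Hypothetical stand-in for img: height 2, width 3, four values per pixel
fake_img = np.arange(2 * 3 * 4).reshape(2, 3, 4)

first_channel = fake_img[:, :, 0]    # integer index: the last axis disappears
print(first_channel.shape)           # (2, 3)

kept_axis = fake_img[:, :, 0:1]      # slice: the last axis stays, with length 1
print(kept_axis.shape)               # (2, 3, 1)
```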
In [20]:
plt.pcolormesh(img[:,:,0], cmap=plt.cm.viridis)
plt.colorbar()
Out[20]:
Q: Why is it flipped upside down? Look closely at the previous imshow example and compare the axes of the two displays. Then flip the figure so that it is displayed properly; the numpy.flipud() function may be handy.
In [21]:
# TODO: put your code here
Out[21]:
Q: Try another sequential colormap here.
In [22]:
# TODO: put your code here
Out[22]:
Q: Try a diverging colormap, say coolwarm.
In [23]:
# TODO: put your code here
Out[23]:
Although there are clear choices such as viridis for quantitative data, you can design various custom colormaps depending on your application. For instance, take a look at this video about colormaps for oceanography (https://www.youtube.com/watch?v=XjHzLUnHeM0). Among them is a colormap designed specifically for oxygen levels, which has three regimes.
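As an illustration of the idea (not the oceanography colormap from the video), the sketch below stitches slices of three built-in colormaps into one ListedColormap with three regimes: roughly the lowest fifth of the range in reds, the middle in greys, and the top fifth in blues. The regime boundaries, the colors, and the name three_regime are arbitrary choices for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Sample RGBA rows from existing colormaps; 51 + 154 + 51 = 256 entries in total
low = plt.cm.Reds_r(np.linspace(0.3, 0.9, 51))     # low regime: dark to light red
mid = plt.cm.Greys(np.linspace(0.2, 0.6, 154))     # ordinary range: neutral greys
high = plt.cm.Blues(np.linspace(0.4, 1.0, 51))     # high regime: light to dark blue
three_regime = ListedColormap(np.vstack([low, mid, high]), name='three_regime')

# Use it like any built-in colormap (random data for illustration)
data = np.random.default_rng(0).random((20, 20))
plt.pcolormesh(data, cmap=three_regime)
plt.colorbar()
```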
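Now let's turn to other aspects of a plot, starting with axis labels, which you can set with xlabel and ylabel.
In [24]: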
x = np.linspace(0, 3*np.pi)
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
Out[24]:
You can change the size of the whole figure with the figsize option, which takes the width and height in inches.
In [25]:
plt.figure(figsize=(4,3))
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
Out[25]:
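The figure size is stored on the figure object, so you can check it. The pixel dimensions of a saved raster image are the size in inches times the resolution in dots per inch (fig.dpi, or the dpi argument of savefig); the default dpi depends on your matplotlib settings, so this sketch does not assume a particular value.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 3))    # width, height in inches
print(fig.get_size_inches())        # [4. 3.]

# Width in pixels = width in inches x dots per inch
width_px = fig.get_size_inches()[0] * fig.dpi
```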
A very common mistake is making the plot too big compared to the labels and ticks.
In [26]:
plt.figure(figsize=(80, 20))
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
Out[26]:
If you shrink this plot to a reasonable size, you can no longer read the labels! In fact, this is one of the most common comments I give my students!
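Usually the best fix is simply a smaller figsize. If you really need a large canvas, one option is to scale the text up along with it, as in this sketch (the sizes are arbitrary); setting plt.rcParams['font.size'] changes the default for all text at once.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi)
fig, ax = plt.subplots(figsize=(16, 4))
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))

# Enlarge the text together with the figure so it stays readable when shrunk
ax.set_xlabel("Some variable", fontsize=24)
ax.set_ylabel("Another variable", fontsize=24)
ax.tick_params(labelsize=20)    # tick label size on both axes
```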
You can adjust the axis ranges using xlim and ylim.
In [27]:
plt.figure(figsize=(4,3))
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.xlim((0,4))
plt.ylim((-0.5, 1))
Out[27]:
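The same limits can also be set with matplotlib's object-oriented interface, where you call methods on an Axes object instead of relying on the implicit "current" plot. This sketch mirrors the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi)
fig, ax = plt.subplots(figsize=(4, 3))
ax.set_xlabel("Some variable")
ax.set_ylabel("Another variable")
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))
ax.set_xlim(0, 4)       # same effect as plt.xlim((0, 4))
ax.set_ylim(-0.5, 1)    # same effect as plt.ylim((-0.5, 1))
```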
You can also set the tick positions with xticks (and yticks).
In [28]:
plt.figure(figsize=(4,3))
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.xticks(np.arange(0, 10, 4))
Out[28]:
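Besides positions, xticks also accepts tick labels. For a trigonometric plot, for example, ticks at multiples of π may read more naturally; the labels below use matplotlib's mathtext syntax ($\pi$).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi)
plt.figure(figsize=(4, 3))
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

# Place ticks at 0, pi, 2pi, 3pi and give them readable labels
tick_positions = np.arange(4) * np.pi
tick_labels = ["0", r"$\pi$", r"$2\pi$", r"$3\pi$"]
plt.xticks(tick_positions, tick_labels)
```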
You can also change colors, line widths, and so on, and add a legend by giving each line a label and calling plt.legend().
In [29]:
plt.figure(figsize=(7,4))
plt.xlabel("Some variable")
plt.ylabel("Another variable")
plt.plot(x, np.sin(x), color='red', linewidth=5, label="sine")
plt.plot(x, np.cos(x), label='cosine')
plt.legend(loc='lower left')
Out[29]:
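Here is a sketch of the same idea with the object-oriented interface, also adding a dashed line style; the legend entries come from each line's label, in plotting order.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi)
fig, ax = plt.subplots(figsize=(7, 4))
ax.set_xlabel("Some variable")
ax.set_ylabel("Another variable")
ax.plot(x, np.sin(x), color='red', linewidth=5, label='sine')
ax.plot(x, np.cos(x), color='tab:blue', linestyle='--', label='cosine')
legend = ax.legend(loc='lower left')
print([t.get_text() for t in legend.get_texts()])   # ['sine', 'cosine']
```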
For more information, take a look at this excellent tutorial: https://www.labri.fr/perso/nrougier/teaching/matplotlib/matplotlib.html
Q: Now, pick an interesting dataset (e.g., from the vega_datasets package) and create a plot. Adjust the figure size, labels, colors, and any other aspects of the plot to obtain a nicely designed figure. Explain your rationale for each choice.
In [27]:
# TODO: put your code here
First of all, think about various ways to store an image, which can be a beautiful scenery or a geometric shape. How can you store it efficiently in a computer? Consider the pros and cons of different approaches. Which methods would work best for a photograph? Which would work best for a blueprint or a histogram?
There are two main approaches. One is storing the color of each pixel, as shown above. This assumes that every pixel in the image carries some information, which is true for photographs. Obviously, in this case you cannot zoom in beyond the original resolution of the image (unless you're in a movie). Also, if you just want to store some geometric shapes, you will be wasting a lot of space. This is called raster graphics.
Another approach is vector graphics, where you store the instructions for drawing the image rather than the color value of each pixel. For instance, you can store "draw a circle with a radius of 5 at (100,100) with a red line" instead of storing all the red pixels that make up the circle. Unlike raster graphics, vector graphics don't lose quality when you zoom in.
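A back-of-the-envelope comparison of the two approaches for the red circle example, assuming a 200x200 canvas with one byte per RGBA channel (both choices are just for illustration): the raster version needs memory for every pixel no matter what is drawn, while the vector version is one short instruction.

```python
import numpy as np

# Raster: a white 200x200 RGBA image, one byte (0-255) per channel
size = 200
raster = np.full((size, size, 4), 255, dtype=np.uint8)

# "Draw" a red circle outline of radius 5 at (100, 100) by coloring pixels
yy, xx = np.mgrid[0:size, 0:size]
on_circle = np.abs(np.hypot(xx - 100, yy - 100) - 5) <= 0.5
raster[on_circle] = [255, 0, 0, 255]

# Vector: the same circle as a single SVG instruction
vector = '<svg width="200" height="200"><circle cx="100" cy="100" r="5" fill="none" stroke="red"/></svg>'

print(raster.nbytes)            # 160000 bytes, whatever the picture contains
print(len(vector.encode()))     # under 100 bytes
```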
Since a lot of data visualization tasks are about drawing geometric shapes, vector graphics is a common option. Most libraries allow you to save the figures in vector formats.
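For example, matplotlib can save the same figure either as a raster image (PNG) or as vector graphics (SVG); the format is normally inferred from the file name, e.g. fig.savefig('sine.svg'). This sketch writes to in-memory buffers instead of files so the two outputs can be inspected directly.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi)
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x))

png_buffer = io.BytesIO()
fig.savefig(png_buffer, format='png')   # raster: a grid of pixel values
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format='svg')   # vector: drawing instructions as text

svg_text = svg_buffer.getvalue().decode('utf-8')
print(png_buffer.getvalue()[:4])   # b'\x89PNG', the PNG file signature
print('<path' in svg_text)         # True: lines are stored as SVG path elements
```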
On the web, a common standard format is SVG, which stands for "Scalable Vector Graphics". Because it's really a list of instructions for drawing figures, you can create one even with a basic text editor. What many web-based drawing libraries do is simply write these instructions (SVG) into a webpage, so that a web browser can show the figure. SVG files can be edited in many vector graphics programs such as Adobe Illustrator and Inkscape. Although we rarely touch SVG directly when we create data visualizations, I think it's very useful to understand what's going on under the hood. So let's get an intuitive understanding of SVG.
You can add an SVG figure by simply inserting an <svg> tag into an HTML file. It tells the browser to reserve some space for a drawing. For example,
<svg width="200" height="200">
<circle cx="100" cy="100" r="22" fill="yellow" stroke="orange" stroke-width="5"/>
</svg>
This code creates a drawing space of 200x200 pixels and then draws a circle of radius 22 centered at (100,100). The circle is filled with yellow and stroked with a 5-pixel-wide orange line. That's pretty simple, isn't it? Place this code into an HTML file and open it in your browser. Do you see the circle?
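Because SVG is just structured text, you can also generate it from a program. This sketch uses Python's standard xml.etree.ElementTree module to build the same circle and wraps it in a minimal HTML page; the file name circle.html is an arbitrary example.

```python
import xml.etree.ElementTree as ET

# Build the 200x200 drawing above, element by element
svg = ET.Element('svg', {'width': '200', 'height': '200'})
ET.SubElement(svg, 'circle', {
    'cx': '100', 'cy': '100', 'r': '22',
    'fill': 'yellow', 'stroke': 'orange', 'stroke-width': '5',
})
svg_markup = ET.tostring(svg, encoding='unicode')

# Wrap the SVG in a minimal HTML page that a browser can open
with open('circle.html', 'w', encoding='utf-8') as f:
    f.write('<!DOCTYPE html>\n<html><body>\n' + svg_markup + '\n</body></html>\n')
```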
Another cool thing is that, because <svg> is an HTML element, you can use CSS to style your shapes. You can adjust all kinds of styles using CSS:
<head>
<style>
.krypton_sun {
fill: red;
stroke: orange;
stroke-width: 10;
}
</style>
</head>
<body>
<svg width="500" height="500">
<circle cx="200" cy="200" r="50" class="krypton_sun"/>
</svg>
</body>
This code says "draw a circle with a radius of 50 at (200, 200), using the style defined for krypton_sun". The krypton_sun class is styled inside the <style> tag.
There are other shapes in SVG, such as ellipse, line, polygon (this can be used to create triangles), and path (for curved and other complex lines). You can even place text with advanced formatting inside an svg element.
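As an example of using polygon for a triangle, this sketch computes the vertices of an equilateral triangle and prints a matching <polygon> element. The center, circumradius, and the helper name equilateral_triangle_points are illustrative choices; note that SVG's y axis points down, so the top vertex sits at angle -90 degrees.

```python
import math

def equilateral_triangle_points(cx, cy, r):
    """Vertices of an upward-pointing equilateral triangle centered at (cx, cy)
    with circumradius r, in SVG coordinates (y grows downward)."""
    angles = [-90, 30, 150]   # degrees, spaced 120 apart
    return [(cx + r * math.cos(math.radians(a)), cy + r * math.sin(math.radians(a)))
            for a in angles]

pts = equilateral_triangle_points(100, 100, 80)
# SVG's points attribute takes "x,y" pairs separated by spaces
points_attr = " ".join(f"{x:.1f},{y:.1f}" for x, y in pts)
print(f'<polygon points="{points_attr}" fill="none" stroke="black"/>')
```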
Let's reproduce the symbol for the Deathly Hallows (as shown below) with SVG. It doesn't need to be a perfect replica (e.g., a perfectly equilateral triangle); just make it visually as close as you can. What's the most efficient way of drawing it? Color it however you like. Upload this file to Canvas.
In [ ]: