Further to Sympy, this lab sheet will give a brief overview of a number of popular Python libraries that can be used to tackle a variety of different problems: numpy, matplotlib, scipy, networkx, pandas and sklearn.
This lab sheet will not give a detailed overview of each library but aims to give a brief introduction to each.
A video describing the library.
This is a fundamental library for efficient numerical computation. The basic building block is the numpy.array, which can also be used to carry out linear algebraic manipulations.
Let us import numpy
and define two 3 by 3 matrices:
In [1]:
import numpy as np
A = np.array([[5, 1, -1], [-1, 2, 4], [1, 1, 1]])
B = np.array([[1, 2, 0], [-4, 2, 2], [1, 3, 1]])
We can access $A_{02}$ for example:
In [2]:
A[0, 2]
Out[2]:
Or an entire row of $A$:
In [3]:
A[0]
Out[3]:
Or an entire column:
In [4]:
A[:,1]
Out[4]:
We can do scalar multiplication:
In [5]:
5 * A
Out[5]:
We can raise the matrix to a high power:
In [6]:
np.linalg.matrix_power(A, 5)
Out[6]:
We can carry out matrix addition:
In [7]:
A + B
Out[7]:
But also more complex things like matrix multiplication:
In [8]:
np.dot(A, B)
Out[8]:
Recent versions of Python have @
as shorthand for matrix multiplication:
In [9]:
A @ B
Out[9]:
We can also get the inverse and the determinant of $A$:
In [10]:
np.linalg.inv(A), np.linalg.det(A)
Out[10]:
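One quick way to sanity check the inverse (a minimal sketch) is to multiply $A$ by it and confirm that the product is numerically close to the identity matrix:
In [ ]:
np.allclose(A @ np.linalg.inv(A), np.eye(3))  # Should evaluate to True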
Numpy also has numerous helpful functions that can be useful even if we are not doing any linear algebra. For example, let us get an array of 100 values between $-2$ and $1$:
In [11]:
np.linspace(-2, 1, 10 ** 2)
Out[11]:
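As a further illustration (the particular functions chosen here are just examples), numpy arrays come with convenient methods for computing summary statistics:
In [ ]:
values = np.linspace(-2, 1, 10 ** 2)
values.mean(), values.max(), values.min()  # Simple summary statistics of the array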
A video describing the library.
This is the most popular Python library for creating plots. We will illustrate first by creating a plot of the function:
$$-x^4 + 9x^2 + 4x - 12$$
We do this by creating a set of $x$ values and computing the corresponding set of $f(x)$ values:
In [12]:
def f(x):
    return - x ** 4 + 9 * x ** 2 + 4 * x - 12
xs = np.linspace(-4, 5, 10 ** 3) # Create 1000 points
ys = [f(x) for x in xs]
We are now ready to import and use matplotlib:
In [13]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure()
plt.plot(xs, ys)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.show()
Note the %matplotlib inline
command which is a special Jupyter command to ensure plots are displayed in this notebook.
We can also save that image to a file that can be used elsewhere:
In [14]:
plt.figure()
plt.plot(xs, ys)
plt.xlabel("$x$")
plt.ylabel("$y$")
plt.savefig("plot.pdf") # experiment with `.png`, `.jpg` etc...
Note that there are numerous other types of plots that can be obtained, for example here is a histogram of 1000 random values generated by numpy:
In [15]:
np.random.seed(0) # Setting a seed
values = np.random.normal(size=10 ** 3)
plt.figure()
plt.hist(values, bins=20)
plt.xlabel("$Value$")
plt.ylabel("Frequency")
plt.show()
A video describing the library.
Scipy is a library bringing together many different capabilities. One thing it can do is find roots using numeric approximation techniques:
In [16]:
from scipy import optimize
In [17]:
optimize.root(f, x0=0)
Out[17]:
In [18]:
f(3)
Out[18]:
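The object returned by optimize.root has an x attribute holding the approximated root; as a quick sketch we can substitute it back into f:
In [ ]:
result = optimize.root(f, x0=0)
result.x, f(result.x)  # The value of f at the approximated root should be numerically close to 0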
It can also minimize functions (of more than one variable):
In [19]:
def g(x):
    """Here we assume g is a function of 2 variables and x is a vector"""
    return np.cos(x[1]) / (1 + x[0])
In [20]:
optimize.minimize(g, x0=[0, 0])
Out[20]:
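As with optimize.root, the returned object has an x attribute (the approximate location of the minimum) and a fun attribute (the value of the function there), for example:
In [ ]:
result = optimize.minimize(g, x0=[0, 0])
result.x, result.fun  # Approximate minimiser of g and the corresponding value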
We can also fit a function to data. Here we will create some data that follows a curve, add some noise, and then try to recover the original function:
In [20]:
def func(x, a, b):
    return a * x ** 2 + b
xs = np.linspace(0, 10, 50)
a_value, b_value = 1 / 2, 70
np.random.seed(0)
ys = [func(x, a=a_value, b=b_value) + 2 * np.random.random() for x in xs]
Let us take a look at the data:
In [21]:
plt.figure()
plt.scatter(xs, ys)
plt.show()
In [22]:
popt, pcov = optimize.curve_fit(func, xs, ys)
We recover (approximately) the values of a and b that were originally used:
In [23]:
popt
Out[23]:
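The second returned value, pcov, is the estimated covariance matrix of the parameters; taking the square root of its diagonal gives approximate standard errors for the fitted values of a and b:
In [ ]:
np.sqrt(np.diag(pcov))  # Approximate standard errors of the fitted parameters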
In [24]:
fitted_ys = [func(x, *popt) for x in xs]
plt.scatter(xs, ys, label="Original data")
plt.plot(xs, fitted_ys, label="Fitted data", color="red")
plt.show()
A video describing the library.
Networkx is used to handle graph theoretic objects. We can create a graph in a number of ways; one of the simplest is by passing a collection of edges. Here we import the library and create a graph object:
In [25]:
import networkx as nx
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
G = nx.Graph(edges)
G
Out[25]:
It is possible to obtain the adjacency matrix of the graph:
In [26]:
M = nx.to_numpy_array(G)
M
Out[26]:
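The reverse direction is also possible; assuming a reasonably recent version of networkx, from_numpy_array rebuilds a graph from an adjacency matrix:
In [ ]:
H = nx.from_numpy_array(M)
H.edges()  # Should match the edges of G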
It is also possible to draw the graph:
In [27]:
plt.figure()
nx.draw(G)
We can also compute a graph coloring:
In [28]:
coloring = nx.greedy_color(G)
coloring
Out[28]:
We see that this coloring uses 3 colours (and, since the graph contains a triangle, the chromatic number of this graph is in fact 3):
In [29]:
len(set(coloring.values()))
Out[29]:
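A greedy coloring is only guaranteed to be a valid (proper) coloring, not necessarily an optimal one; as a quick check we can confirm that no edge has the same colour at both of its endpoints:
In [ ]:
all(coloring[u] != coloring[v] for u, v in G.edges())  # True if the coloring is proper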
A video describing the library.
Pandas is Python's main library for data manipulation. As an example let us consider this dataset about extramarital affairs: affairs.csv (download). This is data from the following research paper:
Fair, Ray C. "A theory of extramarital affairs." Journal of Political Economy 86.1 (1978): 45-61.
A pdf can be found here: https://fairmodel.econ.yale.edu/rayfair/pdf/1978A200.PDF
Let us import pandas and read the dataset in:
In [30]:
import pandas as pd
df = pd.read_csv("affairs.csv")
We can look at the top of the data set:
In [31]:
df.head()
Out[31]:
From the paper we read that the variables are:
- sex: The reported gender of the individual;
- age: The age of the individual;
- ym: Number of years married;
- child: Whether or not the individual has a child;
- religious: How religious the individual is (5: "very", 1: "not");
- education: Level of education (9: grade school, 20: PhD or MD);
- occupation: Occupation based on a scale called the "Hollingshead classification";
- rate: Individual rating of the marriage (5: "very happy", 1: "very unhappy").

We can obtain a quick overview of the data using the describe() method:
In [32]:
df.describe()
Out[32]:
We can also choose to slice our data set in specific ways, for example how is the mean number of affairs related to the rating of the marriage and whether or not the individual is male or female?
In [33]:
df.groupby(["sex", "rate"])["nbaffairs"].mean()
Out[33]:
We can also choose to slice our data manually, for example let us only look at the data set for males:
In [34]:
df[df["sex"] == "male"]
Out[34]:
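Conditions like this can also be combined; for example (just as an illustration) to look only at the males who rate their marriage as very happy:
In [ ]:
df[(df["sex"] == "male") & (df["rate"] == 5)]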
We can combine this with matplotlib to get a plot of the mean number of affairs based on rating by gender:
In [35]:
plt.figure()
for sex in ["male", "female"]:
    plt.scatter(range(1, 5 + 1), df[df["sex"] == sex].groupby("rate")["nbaffairs"].mean(), label=sex)
plt.legend()
plt.xlabel("rate")
plt.ylabel("mean nbaffairs")
plt.show()
A video describing the library.
The scikit-learn library is a popular library for machine learning. As an example, let us aim to train a model that will predict whether or not someone will have an affair using the data above.
First let us create a new boolean variable male, change the child variable to be a boolean, and store this data in a new dataframe called X, which will be used to predict whether or not an individual has had an affair (stored in y):
In [36]:
df["bool_child"] = df["child"] == "yes"
df["male"] = df["sex"] == "male"
X = df[["age", "ym", "religious", "education", "rate", "occupation", "bool_child", "male"]]
y = df["nbaffairs"] > 0
Now let us import the specific classifier for the algorithm we want to use:
In [37]:
from sklearn.ensemble import RandomForestClassifier
In [38]:
seed = 3
clf = RandomForestClassifier(random_state=seed)
clf.fit(X, y)
Out[38]:
In [39]:
for feature, importance in zip(X.columns, clf.feature_importances_):
    print(feature, round(importance, 5))
The most important feature seems to be the age
of the individual. Let us use our trained model to predict whether or not some given individual is likely to have an affair over their lifetime:
In [40]:
ages = range(35, 100)
probability_of_affair = []
for age in ages:
    ym = age - 24
    vince_knight = [[age, ym, 1, 20, 5, 3, True, True]]
    probability_of_affair.append(clf.predict_proba(vince_knight)[0][1])  # Column 1 of predict_proba corresponds to the True class (an affair)
plt.figure()
plt.plot(ages, probability_of_affair)
plt.xlabel("Vince's age")
plt.ylabel("Vince's probability of having an affair")
plt.ylim(0, 1)
plt.show()
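Note that the model above is fitted and then queried using the same data. A more careful workflow would hold some data back to evaluate the model on; here is a minimal sketch using scikit-learn's train_test_split (the choice of test_size here is arbitrary):
In [ ]:
from sklearn.model_selection import train_test_split

# Hold back a quarter of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)
clf_evaluation = RandomForestClassifier(random_state=seed)
clf_evaluation.fit(X_train, y_train)
clf_evaluation.score(X_test, y_test)  # Proportion of the held back data classified correctly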
This is a brief overview of the type of thing each library can do. Depending on what you want to do, be sure to explore them more fully; there are also many other libraries in the Python ecosystem.