First we add the usual scientific computing modules with the typical abbreviations, including sp for scipy. We could invoke scipy's statistical package as sp.stats, but for the sake of laziness we abbreviate that too.
In [1]:
import numpy as np # import numpy
import scipy as sp # import scipy
from scipy import stats # refer directly to stats rather than sp.stats
import matplotlib as mpl # for visualization
from matplotlib import pyplot as plt # refer directly to pyplot
# rather than mpl.pyplot
# for exporting PDF's we need to import PdfPages
from matplotlib.backends.backend_pdf import PdfPages
Now we create some random data to play with. We generate 100 samples from a Gaussian distribution centered at zero.
In [2]:
s = sp.randn(100)
How many elements are in the set?
In [3]:
print ('There are',len(s),'elements in the set')
What is the mean (average) of the set?
In [4]:
print ('The mean of the set is',s.mean())
What is the minimum of the set?
In [5]:
print ('The minimum of the set is',s.min())
What is the maximum of the set?
In [6]:
print ('The maximum of the set is',s.max())
We can use the scipy functions too. What's the median?
In [7]:
print ('The median of the set is',sp.median(s))
What about the standard deviation and variance?
In [8]:
print ('The standard deviation is',sp.std(s),
'and the variance is',sp.var(s))
Isn't the variance the square of the standard deviation?
In [9]:
print ('The square of the standard deviation is',sp.std(s)**2)
How close are the measures? The differences are close as the following calculation shows
In [10]:
print ('The difference is',abs(sp.std(s)**2 - sp.var(s)))
In [11]:
print ('And in decimal form, the difference is %0.16f' %
(abs(sp.std(s)**2 - sp.var(s))))
How does this look as a histogram?
In [12]:
plt.hist(s) # yes, one line of code for a histogram
plt.show()
Let's add some titles.
In [13]:
plt.clf() # clear out the previous plot
plt.hist(s)
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Typically we do not include titles when we prepare images for inclusion in LaTeX. There we use the caption to describe what the figure is about.
In [14]:
plt.clf() # clear out the previous plot
plt.hist(s)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
Let's try out some linear regression, or curve fitting.
In [15]:
import random
def F(x):
return 2*x - 2
def add_noise(x):
return x + random.uniform(-1,1)
In [16]:
X = range(0,10,1)
Y = []
for i in range(len(X)):
Y.append(add_noise(X[i]))
plt.clf() # clear out the old figure
plt.plot(X,Y,'.')
plt.show()
Now let's try linear regression to fit the curve.
In [17]:
m, b, r, p, est_std_err = stats.linregress(X,Y)
What is the slope and y-intercept of the fitted curve?
In [18]:
print ('The slope is',m,'and the y-intercept is', b)
In [19]:
def Fprime(x): # the fitted curve
return m*x + b
Now let's see how well the curve fits the data. We'll call the fitted curve F'.
In [20]:
X = range(0,10,1)
Yprime = []
for i in range(len(X)):
Yprime.append(Fprime(X[i]))
In [21]:
plt.clf() # clear out the old figure
# the observed points, blue dots
plt.plot(X, Y, '.', label='observed points')
# the interpolated curve, connected red line
plt.plot(X, Yprime, 'r-', label='estimated points')
plt.title("Linear Regression Example") # title
plt.xlabel("x") # horizontal axis title
plt.ylabel("y") # vertical axis title
# legend labels to plot
plt.legend(['obsered points', 'estimated points'])
Out[21]:
To save the figure for inclusion in LaTeX as PDF
In [22]:
plt.savefig("regression.pdf", bbox_inches='tight')
To save it as png you can use the following, but you need to use the PDF format if you like to include it in a document.
In [23]:
plt.savefig('regression.png')
In [24]:
plt.show()
In [ ]: