Scipy Examples

First we add the usual scientific computing modules with the typical abbreviations, including sp for scipy. We could invoke scipy's statistical package as sp.stats, but for the sake of laziness we abbreviate that too.



In [1]:

    
import numpy as np # import numpy
import scipy as sp # import scipy

from scipy import stats # refer directly to stats rather than sp.stats
import matplotlib as mpl # for visualization
from matplotlib import pyplot as plt # refer directly to pyplot 
                                     # rather than mpl.pyplot
# for exporting PDF's we need to import PdfPages
from matplotlib.backends.backend_pdf import PdfPages

Now we create some random data to play with. We generate 100 samples from a Gaussian distribution centered at zero.



In [2]:

    
s = sp.randn(100)

How many elements are in the set?



In [3]:

    
print ('There are',len(s),'elements in the set')









    



There are 100 elements in the set

What is the mean (average) of the set?



In [4]:

    
print ('The mean of the set is',s.mean())









    



The mean of the set is 0.0733915957305

What is the minimum of the set?



In [5]:

    
print ('The minimum of the set is',s.min())









    



The minimum of the set is -2.43251103557

What is the maximum of the set?



In [6]:

    
print ('The maximum of the set is',s.max())









    



The maximum of the set is 2.04262779759

We can use the scipy functions too. What's the median?



In [7]:

    
print ('The median of the set is',sp.median(s))









    



The median of the set is 0.0924289417472

What about the standard deviation and variance?



In [8]:

    
print ('The standard deviation is',sp.std(s),
       'and the variance is',sp.var(s))









    



The standard deviation is 0.940023468383 and the variance is 0.88364412111

Isn't the variance the square of the standard deviation?



In [9]:

    
print ('The square of the standard deviation is',sp.std(s)**2)









    



The square of the standard deviation is 0.88364412111

How close are the measures? The differences are close as the following calculation shows



In [10]:

    
print ('The difference is',abs(sp.std(s)**2 - sp.var(s)))









    



The difference is 1.11022302463e-16



In [11]:

    
print ('And in decimal form, the difference is %0.16f' % 
       (abs(sp.std(s)**2 - sp.var(s))))









    



And in decimal form, the difference is 0.0000000000000001

How does this look as a histogram?



In [12]:

    
plt.hist(s) # yes, one line of code for a histogram
plt.show()

Let's add some titles.



In [13]:

    
plt.clf() # clear out the previous plot

plt.hist(s)
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")

plt.show()

Typically we do not include titles when we prepare images for inclusion in LaTeX. There we use the caption to describe what the figure is about.



In [14]:

    
plt.clf() # clear out the previous plot

plt.hist(s)
plt.xlabel("Value")
plt.ylabel("Frequency")

plt.show()

Let's try out some linear regression, or curve fitting.



In [15]:

    
import random

def F(x):
    return 2*x - 2

def add_noise(x):
    return x + random.uniform(-1,1)



In [16]:

    
X = range(0,10,1)

Y = []
for i in range(len(X)):
    Y.append(add_noise(X[i]))
    
plt.clf() # clear out the old figure
plt.plot(X,Y,'.')
plt.show()

Now let's try linear regression to fit the curve.



In [17]:

    
m, b, r, p, est_std_err = stats.linregress(X,Y)

What is the slope and y-intercept of the fitted curve?



In [18]:

    
print ('The slope is',m,'and the y-intercept is', b)









    



The slope is 0.993412071404 and the y-intercept is 0.160346622317



In [19]:

    
def Fprime(x): # the fitted curve
    return m*x + b

Now let's see how well the curve fits the data. We'll call the fitted curve F'.



In [20]:

    
X = range(0,10,1)

Yprime = []
for i in range(len(X)):
    Yprime.append(Fprime(X[i]))



In [21]:

    
plt.clf() # clear out the old figure

# the observed points, blue dots
plt.plot(X, Y, '.', label='observed points') 
         
# the interpolated curve, connected red line
plt.plot(X, Yprime, 'r-', label='estimated points')  

plt.title("Linear Regression Example") # title
plt.xlabel("x") # horizontal axis title
plt.ylabel("y") # vertical axis title
# legend labels to plot
plt.legend(['obsered points', 'estimated points'])









    Out[21]:





<matplotlib.legend.Legend at 0x1088b61d0>

To save the figure for inclusion in LaTeX as PDF



In [22]:

    
plt.savefig("regression.pdf", bbox_inches='tight')

To save it as png you can use the following, but you need to use the PDF format if you like to include it in a document.



In [23]:

    
plt.savefig('regression.png')



In [24]:

    
plt.show()



In [ ]:

References

Prasanth. "Simple statistics with SciPy." Comfort at 1 AU. Created February 28, 2011. Available online: https://oneau.wordpress.com/2011/02/28/simple-statistics-with-scipy/. Last accessed October 26, 2017.
SciPy Cookbook. Lasted updated: 2015. Available online: http://scipy-cookbook.readthedocs.io/. Last accessed: October 26, 2017.