In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pylab as plt

In [2]:
import scipy.optimize
import pandas as pd

What does the following code do?


In [4]:
d = pd.read_csv("data/dataset_0.csv")
plt.plot(d.x,d.y,'o')


Out[4]:
[<matplotlib.lines.Line2D at 0x117ae6e48>]
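  • Reads data/dataset_0.csv into a pandas DataFrame called d and plots the y column against the x column as unconnected points ('o').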

What does the following code do?


In [5]:
def linear(x,a,b):
    return a + b*x

Defines a linear function of x with intercept a and slope b.
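
Because numpy broadcasts, the same function works on a single number or on a whole array of x values (a quick check, assuming the cell above has been run):

print(linear(2,1,0.5))                    # 1 + 0.5*2 = 2.0
print(linear(np.array([0,1,2]),1,0.5))    # [1.  1.5 2. ]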

What does the following code do?

  • What is the param business?
  • What is the y bit?

In [6]:
def linear(x,a,b):
    return a + b*x

def linear_r(param,x,y):
    return linear(x,param[0],param[1]) - y
  • Defines a residuals function for linear: it subtracts the observed y values from linear's predictions, so y holds the measured data the model is compared against.
  • param holds a and b in a single list (param[0] is a, param[1] is b), which is how scipy's fitters expect the parameters to be packaged.
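
To see it in action on a tiny made-up dataset (hypothetical numbers, assuming the cells above have been run):

x = np.array([0.,1.,2.])
y = np.array([1.0,1.6,1.9])          # pretend measurements
print(linear_r([1,0.5],x,y))         # [ 0.  -0.1  0.1]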

What does the following code do?


In [6]:
def linear_r(param,x,y):                     # copied from previous cell
    return linear(x,param[0],param[1]) - y   # copied from previous cell

param_guesses = [0,0]
fit = scipy.optimize.least_squares(linear_r,param_guesses,
                                   args=(d.x,d.y))
fit_a = fit.x[0]
fit_b = fit.x[1]
sum_of_square_residuals = 2*fit.cost   # fit.cost is 0.5*sum(residuals**2)
print(fit.x)


[1.13006323 0.31994236]
  • Uses scipy.optimize.least_squares to find the values of a and b that minimize the sum of squared residuals returned by linear_r, given d.x and d.y.
  • The fitted parameters end up in fit_a and fit_b.
  • scipy defines fit.cost as half the sum of squared residuals ($0.5\sum r_i^2$), so the $SSR$ is 2*fit.cost, stored here in sum_of_square_residuals.
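
You can check scipy's cost convention directly (assuming the fit from the cell above):

r = linear_r(fit.x,d.x,d.y)          # residuals at the fitted parameters
print(fit.cost,0.5*np.sum(r**2))     # the two numbers match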

What does the following code do?

  • What the heck is linspace?
  • What are we plotting?

In [8]:
x_range = np.linspace(np.min(d.x),np.max(d.x),100)

plt.plot(d.x,d.y,"o")
plt.plot(x_range,linear(x_range,fit_a,fit_b))
plt.plot(x_range,linear(x_range,0,0))


Out[8]:
[<matplotlib.lines.Line2D at 0x118eabd30>]
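  • np.linspace(start,stop,num) returns num evenly spaced values from start to stop; here it builds 100 x values spanning the range of the data.
  • We plot the raw data points, the best-fit line (fit_a, fit_b), and the a = b = 0 line (flat at y = 0) as a baseline for comparison.

For example, on its own:

print(np.linspace(0,1,5))   # [0.   0.25 0.5  0.75 1.  ]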

Putting it all together


In [12]:
def linear(x,a,b):
    """Linear model of x using a (intercept) and b (slope)"""
    return a + b*x

def linear_r(param,x,y):
    """Residuals function for linear"""
    return linear(x,param[0],param[1]) - y

# Read data
d = pd.read_csv("data/dataset_0.csv")
plt.plot(d.x,d.y,'o')

# Perform regression
param_guesses = [1,1]
fit = scipy.optimize.least_squares(linear_r,param_guesses,args=(d.x,d.y))
fit_a = fit.x[0]
fit_b = fit.x[1]
sum_of_square_residuals = 2*fit.cost   # fit.cost is 0.5*SSR

# Plot result
x_range = np.linspace(np.min(d.x),np.max(d.x),100)
plt.plot(x_range,linear(x_range,fit_a,fit_b))
print(fit.cost)   # note: this prints fit.cost itself, i.e. half the SSR


4.261679630541931

For your assigned model:

  • Write a function and residuals function
  • Estimate the parameters of the model
  • Plot the data and the model on the same graph
  • Write the SSR and number of parameters on the board
  • If you finish early: plot the residuals and decide if you like your model (see the sketch after the model list below)
  • If you're still waiting: try to figure out which model best fits dataset_1.csv

$y = a \Big ( \frac{bx}{1 + bx} \Big )$
$y = a \Big ( \frac{bx^{c}}{1 + bx^{c}} \Big )$
$y = a(1 - e^{-bx})$
$y = a + bx^{2} + cx^{3}$
$y = a + bx^{2} + cx^{3} + dx^{4}$
$y = a\sin(bx + c)$
$y = a\ln(x + b)$
$y = a\ln(bx + c)$
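
A minimal residual-plot sketch for the early-finishers item above (assuming d and fit from the "putting it all together" cell; a good model leaves a flat, patternless scatter around zero):

plt.plot(d.x,fit.fun,'o')      # fit.fun holds the residuals at the solution
plt.axhline(0,color="gray")    # reference line at zero
plt.xlabel("x")
plt.ylabel("residual")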

In [17]:
def binding(x,a,b):
    """Single-site binding model: a*(b*x/(1 + b*x))"""
    return a*(b*x/(1 + b*x))

def binding_r(param,x,y):
    """Residuals function for binding"""
    return binding(x,param[0],param[1]) - y

# Read data
d = pd.read_csv("data/dataset_0.csv")
plt.plot(d.x,d.y,'o')

# Perform regression
param_guesses = [5,0.3]
fit = scipy.optimize.least_squares(binding_r,param_guesses,args=(d.x,d.y))
fit_a = fit.x[0]
fit_b = fit.x[1]
sum_of_square_residuals = 2*fit.cost   # fit.cost is 0.5*SSR

# Plot result
x_range = np.linspace(np.min(d.x),np.max(d.x),100)
plt.plot(x_range,binding(x_range,fit_a,fit_b))

print(fit)


 active_mask: array([0., 0.])
        cost: 1.9072136232474737
         fun: array([-0.02038712, -0.48263314,  0.08941157, -0.25232949, -0.36369907,
        0.71703862, -0.08743779,  0.39544063, -0.13417045,  0.22549019,
        0.0652992 , -0.40824685,  0.22458456, -0.00828376, -0.20232713,
        0.07960503,  0.06256882, -0.17741506,  0.38291825, -0.51799545,
        0.06920244,  0.06649557, -0.13772723, -0.06033942,  0.2910477 ,
       -0.09597391,  0.02570988, -0.76140705,  0.3877464 , -0.24929162,
       -0.45408894,  0.38237281,  0.33212765, -0.15604666,  0.08244776,
        0.59261592,  0.01735885, -0.25825004,  0.23259382, -0.09311806,
       -0.13274634])
        grad: array([-3.47540366e-09, -2.10284715e-06])
         jac: array([[0.        , 0.        ],
       [0.06734527, 1.12555666],
       [0.12619209, 1.97600254],
       [0.17805366, 2.62261021],
       [0.22410403, 3.11596479],
       [0.26526812, 3.49263601],
       [0.30228446, 3.77949345],
       [0.33574988, 3.99656503],
       [0.36615194, 4.15897089],
       [0.39389285, 4.2782571 ],
       [0.41930737, 4.36333126],
       [0.44267636, 4.42112884],
       [0.46423723, 4.45709479],
       [0.484192  , 4.47553599],
       [0.5027137 , 4.47988215],
       [0.51995134, 4.47288093],
       [0.53603399, 4.45674583],
       [0.55107397, 4.43326873],
       [0.56516948, 4.40390652],
       [0.57840679, 4.36984828],
       [0.59086196, 4.332068  ],
       [0.60260232, 4.29136544],
       [0.61368769, 4.24839935],
       [0.62417139, 4.20371306],
       [0.63410111, 4.15775579],
       [0.64351963, 4.11089882],
       [0.65246545, 4.06344959],
       [0.66097326, 4.01566258],
       [0.66907448, 3.96774814],
       [0.67679757, 3.91988003],
       [0.6841684 , 3.87220159],
       [0.69121053, 3.82483047],
       [0.69794548, 3.77786306],
       [0.70439291, 3.73137754],
       [0.71057084, 3.68543738],
       [0.71649583, 3.64009303],
       [0.72218311, 3.59538448],
       [0.72764669, 3.5513427 ],
       [0.73289953, 3.50799102],
       [0.73795357, 3.4653464 ],
       [0.7428199 , 3.42342058]])
     message: '`ftol` termination condition is satisfied.'
        nfev: 4
        njev: 4
  optimality: 2.102847148965914e-06
      status: 2
     success: True
           x: array([5.17589617, 0.28883257])
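
Printing the whole fit object is verbose; usually only a few of its fields matter (a short sketch pulling them out of the fit above):

print("a =",fit.x[0])
print("b =",fit.x[1])
print("SSR =",2*fit.cost)      # fit.cost is 0.5*SSR
print("converged:",fit.success)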


In [19]:
def model(a,b,x):
    """Binding model again, with the parameters first and x last"""
    return a*(b*x/(1 + b*x))

# Residuals function
def model_r(param,x,y):
    return model(param[0],param[1],x) - y

# Read data
d = pd.read_csv("data/dataset_0.csv")
plt.plot(d.x,d.y,'o')

# Perform regression
param_guesses = [5,0.3]
fit = scipy.optimize.least_squares(model_r,param_guesses,args=(d.x,d.y))
fit_a = fit.x[0]
fit_b = fit.x[1]
sum_of_square_residuals = 2*fit.cost   # fit.cost is 0.5*SSR

# Plot result
x_range = np.linspace(np.min(d.x),np.max(d.x),100)
plt.plot(x_range,model(fit_a,fit_b,x_range))   # model takes (a,b,x), not (x,a,b)

print(fit.cost)   # half the SSR


1.9072136232474737
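
Note that the binding model's cost (1.907) is less than half the linear model's (4.262) on the same data, even though both models use two parameters, which suggests the binding model describes dataset_0.csv better.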
