FLUORESCENCE BINDING ASSAY ANALYSIS

Experiment date: 2015/09/22

Protein: HSA

Fluorescent ligand : dansyl glycine(lig2)

Xml parsing parts adopted from Sonya's assaytools/examples/fluorescence-binding-assay/Src-gefitinib fluorescence simple.ipynb


In [1]:
import numpy as np
import matplotlib.pyplot as plt
from lxml import etree
import pandas as pd
import os
import matplotlib.cm as cm 
import seaborn as sns
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [2]:
# Get read and position data of each fluorescence reading section
def get_wells_from_section(path):
    reads = path.xpath("*/Well")
    wellIDs = [read.attrib['Pos'] for read in reads]

    data = [(float(s.text), r.attrib['Pos'])
         for r in reads
         for s in r]

    datalist = {
      well : value
      for (value, well) in data
    }
    
    welllist = [
                [
                 datalist[chr(64 + row) + str(col)]          
                 if chr(64 + row) + str(col) in datalist else None
                 for row in range(1,9)
                ]
                for col in range(1,13)
                ]
                
    return welllist

In [5]:
file_lig="MI_FLU_hsa_lig2_20150922_164254.xml"
file_name = os.path.splitext(file_lig1)[0]
label = file_name[0:25]
print label


MI_FLU_hsa_lig2_20150922_

In [6]:
root = etree.parse(file_lig)

#find data sections
Sections = root.xpath("/*/Section")
much = len(Sections)
print "****The xml file " + file_lig + " has %s data sections:****" % much
for sect in Sections:
    print sect.attrib['Name']


****The xml file MI_FLU_hsa_lig2_20150922_164254.xml has 3 data sections:****
ex350_em480_topRead
ex350_em480_bottomRead
Abs_600

In [7]:
#Work with topread
TopRead = root.xpath("/*/Section")[0]
welllist = get_wells_from_section(TopRead)

df_topread = pd.DataFrame(welllist, columns = ['A - HSA','B - Buffer','C - HSA','D - Buffer', 'E - HSA','F - Buffer','G - HSA','H - Buffer'])
df_topread.transpose()


Out[7]:
0 1 2 3 4 5 6 7 8 9 10 11
A - HSA 1302 1087 846 635 509 369 270 172 134 113 105 93
B - Buffer 531 329 200 149 130 115 110 108 108 135 113 116
C - HSA 1395 1248 1000 779 573 408 271 185 140 116 105 110
D - Buffer 573 342 210 151 118 117 110 114 113 109 131 108
E - HSA 1423 1247 991 770 564 407 262 179 134 114 114 96
F - Buffer 529 318 195 144 126 113 110 110 109 112 111 114
G - HSA 1357 1258 999 771 583 406 252 180 137 113 121 96
H - Buffer 554 326 201 336 117 113 105 110 112 112 107 115

In [10]:
# To generate cvs file
# df_topread.transpose().to_csv(label + Sections[0].attrib['Name']+ ".csv")

Calculating Molar Fluorescence (MF) of Free Ligand

1. Maximum likelihood curve-fitting

Find the maximum likelihood estimate, $\theta^*$, i.e. the curve that minimizes the squared error $\theta^* = \text{argmin} \sum_i |y_i - f_\theta(x_i)|^2$ (assuming i.i.d. Gaussian noise)

Y = MF*L + BKG

Y: Fluorescence read (Flu unit)

L: Total ligand concentration (uM)

BKG: background fluorescence without ligand (Flu unit)

MF: molar fluorescence of free ligand (Flu unit/ uM)


In [8]:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
%matplotlib inline

def model(x,slope,intercept):
    ''' 1D linear model in the format scipy.optimize.curve_fit expects: '''
    return x*slope + intercept

# generate some data
#X = np.random.rand(1000)
#true_slope=1.0
#true_intercept=0.0
#noise = np.random.randn(len(X))*0.1
#Y = model(X,slope=true_slope,intercept=true_intercept) + noise

#ligand titration
lig2=np.array([200.0000,86.6000,37.5000,16.2000,7.0200, 3.0400, 1.3200, 0.5700, 0.2470, 0.1070, 0.0462, 0.0200])
lig2


Out[8]:
array([  2.00000000e+02,   8.66000000e+01,   3.75000000e+01,
         1.62000000e+01,   7.02000000e+00,   3.04000000e+00,
         1.32000000e+00,   5.70000000e-01,   2.47000000e-01,
         1.07000000e-01,   4.62000000e-02,   2.00000000e-02])

In [9]:
# Since I have 4 replicates
L=np.concatenate((lig2, lig2, lig2, lig2))
len(L)


Out[9]:
48

In [36]:
# Fluorescence read
df_topread.loc[:,("B - Buffer", "D - Buffer", "F - Buffer", "H - Buffer")]


Out[36]:
B - Buffer D - Buffer F - Buffer H - Buffer
0 540 572 557 651
1 306 311 310 364
2 184 182 183 210
3 122 130 121 139
4 95 98 99 113
5 96 86 87 107
6 84 80 82 96
7 78 82 82 98
8 83 85 79 93
9 80 78 80 89
10 81 81 80 97
11 84 80 81 93

In [10]:
B=df_topread.loc[:,("B - Buffer")]
D=df_topread.loc[:,("D - Buffer")]
F=df_topread.loc[:,("F - Buffer")]
H=df_topread.loc[:,("H - Buffer")]

Y = np.concatenate((B.as_matrix(),D.as_matrix(),F.as_matrix(),H.as_matrix()))

In [11]:
(MF,BKG),_ = optimize.curve_fit(model,L,Y)
print('MF: {0:.3f}, BKG: {1:.3f}'.format(MF,BKG))
print('y = {0:.3f} * L + {1:.3f}'.format(MF, BKG))


MF: 2.207, BKG: 117.187
y = 2.207 * L + 117.187

Curve-fitting to binding saturation curve

Fluorescence intensity vs added ligand

LR= ((X+Rtot+KD)-SQRT((X+Rtot+KD)^2-4XRtot))/2

L= X - LR

Y= BKG + MFL + FRMF*LR

Constants

Rtot: receptor concentration (uM)

BKG: background fluorescence without ligand (Flu unit)

MF: molar fluorescence of free ligand (Flu unit/ uM)

Parameters to fit

Kd: dissociation constant (uM)

FR: Molar fluorescence ratio of complex to free ligand (unitless) complex flurescence = FRMFLR

Experimental data

Y: fluorescence measurement X: total ligand concentration L: free ligand concentration


In [12]:
def model2(x,kd,fr):
    ''' 1D linear model in the format scipy.optimize.curve_fit expects: '''
    # lr =((x+rtot+kd)-((x+rtot+kd)**2-4*x*rtot)**(1/2))/2
    # y = bkg + mf*(x - lr) + fr*mf*lr
    bkg = 86.2
    mf = 2.517
    rtot = 0.5
    return bkg + mf*(x - ((x+rtot+kd)-((x+rtot+kd)**2-4*x*rtot)**(1/2))/2) + fr*mf*(((x+rtot+kd)-((x+rtot+kd)**2-4*x*rtot)**(1/2))/2)

In [13]:
# Total HSA concentration (uM)
Rtot = 0.5
#Total ligand titration
X = L
len(X)


Out[13]:
48

In [14]:
# Fluorescence read
df_topread.loc[:,("A - HSA", "C - HSA", "E - HSA", "G - HSA")]


Out[14]:
A - HSA C - HSA E - HSA G - HSA
0 1302 1395 1423 1357
1 1087 1248 1247 1258
2 846 1000 991 999
3 635 779 770 771
4 509 573 564 583
5 369 408 407 406
6 270 271 262 252
7 172 185 179 180
8 134 140 134 137
9 113 116 114 113
10 105 105 114 121
11 93 110 96 96

In [15]:
A=df_topread.loc[:,("A - HSA")]
C=df_topread.loc[:,("C - HSA")]
E=df_topread.loc[:,("E - HSA")]
G=df_topread.loc[:,("G - HSA")]

Y = np.concatenate((A.as_matrix(),C.as_matrix(),E.as_matrix(),G.as_matrix()))
len(Y)


Out[15]:
48

In [16]:
(Kd,FR),_ = optimize.curve_fit(model2, X, Y, p0=(5,1))

print('Kd: {0:.3f}, Fr: {1:.3f}'.format(Kd,FR))


Kd: 59.663, Fr: 4.150

In [ ]: