Hosted at https://github.com/liuyigh/PyHRM
Please read the very nice introduction to high-resolution melt (HRM) analysis provided by Kapa BioSystems to understand, prepare, and troubleshoot HRM experiments:
http://www.kapabiosystems.com/document/introduction-high-resolution-melt-analysis-guide/
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
See the GitHub repository for the format of the data file. In short, it is a CSV file exported from the melting curve "RFU" table in the CFX Manager data analysis window (right-click to export to CSV). Open it in Excel and delete the empty column. The first column is "Temperature"; each of the following columns represents one sample well.
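If you would rather not edit the export by hand, the empty column can probably be dropped in pandas instead. The snippet below is only a sketch: the miniature CSV, its well names, and its values are made up for illustration, and it assumes the empty column contains no values at all.

```python
import io
import pandas as pd

# Hypothetical miniature export: the trailing comma on each row produces
# the kind of empty column that the text suggests deleting in Excel.
raw = io.StringIO(
    "Temperature,A1,A2,\n"
    "75.0,1200.5,1180.2,\n"
    "75.2,1195.1,1175.8,\n"
)

df = pd.read_csv(raw)
# Drop every column in which all values are missing, wherever it sits.
df = df.dropna(axis=1, how="all")
print(df.columns.tolist())
```

With a real file, replace `raw` with the path to your exported CSV and check that "Temperature" ends up as the first column.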
In [2]:
df = pd.read_csv('Sample-HRM-p50-genotyping.csv')
# Plot every well (columns 1 onward) against temperature (column 0).
plt.plot(df.iloc[:, 0], df.iloc[:, 1:])
plt.show()
Based on the plot above, select the temperature range that covers the melting transition.
In [3]:
# Keep only the rows inside the melting region (75-88 °C here).
df_melt = df.loc[(df.iloc[:, 0] > 75) & (df.iloc[:, 0] < 88)]
df_data = df_melt.iloc[:, 1:]
plt.plot(df_melt.iloc[:, 0], df_data)
plt.show()
In [4]:
# Min-max normalize each well to 0-100 within the selected range.
df_norm = (df_data - df_data.min()) / (df_data.max() - df_data.min()) * 100
plt.plot(df_melt.iloc[:, 0], df_norm)
plt.show()
In [5]:
# Difference plot: subtract the reference well (J14) from every curve.
dfdif = df_norm.sub(df_norm['J14'], axis=0)
plt.plot(df_melt.iloc[:, 0], dfdif)
plt.show()
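The normalization and difference steps can be checked on a tiny example. This is a minimal sketch, not part of the original notebook: the helper names `normalize_curves` and `difference_curves` and the three-point curves are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import pandas as pd

def normalize_curves(curves):
    """Min-max scale each column (well) to the range 0-100."""
    span = curves.max() - curves.min()
    return (curves - curves.min()) / span * 100

def difference_curves(norm, reference):
    """Subtract the reference well's curve from every well, row by row."""
    return norm.sub(norm[reference], axis=0)

# Hypothetical three-point melt curves for two wells.
toy = pd.DataFrame({
    "ref": [1000.0, 600.0, 200.0],
    "sample": [900.0, 300.0, 100.0],
})
norm = normalize_curves(toy)
diff = difference_curves(norm, "ref")
print(norm)
print(diff)
```

After normalization both wells run from 100 down to 0, so the difference curve of the reference well is flat at zero and only the shape of the other curve remains.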
Use the KMeans class from scikit-learn to cluster your samples into three groups (WT, KO, HET). Be careful: your samples may contain fewer than three groups, so always check the difference plots first.
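As a complement to inspecting the difference plots by eye, one possible sanity check is to compare silhouette scores for two and three clusters. This is a sketch rather than part of the original workflow: `suggest_n_clusters` is a hypothetical helper, and the toy matrix stands in for `dfdif.T` (one row per well). Note that the silhouette score is undefined for a single cluster, so it cannot tell you that all samples share one genotype.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def suggest_n_clusters(mat, candidates=(2, 3)):
    """Return the candidate cluster count with the highest silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(mat)
        scores[k] = silhouette_score(mat, labels)
    return max(scores, key=scores.get), scores

# Hypothetical difference curves: 12 wells x 20 temperature points,
# built as three well-separated groups of four wells each.
rng = np.random.default_rng(0)
toy = np.vstack(
    [offset + rng.normal(0, 0.1, size=(4, 20)) for offset in (0, 10, 20)]
)
best_k, scores = suggest_n_clusters(toy)
print(best_k, scores)
```

Treat the suggested count as a hint only; real melt curves are rarely this cleanly separated, and the plots remain the primary check.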
In [6]:
import sklearn.cluster as sc
from IPython.display import display
In [7]:
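mat = dfdif.T.to_numpy()  # one row per well
hc = sc.KMeans(n_clusters=3)
hc.fit(mat)
labels = hc.labels_
# Row 0 holds the well names, row 1 the cluster labels.
results = pd.DataFrame([dfdif.T.index, labels])
# Show the wells assigned to each cluster.
display(results.loc[:0, results.loc[1] == 0])
display(results.loc[:0, results.loc[1] == 1])
display(results.loc[:0, results.loc[1] == 2])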
My controls are
You can then identify each sample's genotype by checking which control it clusters with.
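Because KMeans numbers its clusters arbitrarily, the control wells are what give each cluster a meaning. The lookup could be sketched as follows; `assign_genotypes` is a hypothetical helper, and the well names, labels, and control assignments are invented for illustration (substitute your own controls).

```python
import pandas as pd

def assign_genotypes(wells, labels, controls):
    """Label each well with the genotype of the control sharing its cluster."""
    cluster_of = pd.Series(labels, index=wells)
    # Map cluster id -> genotype using the control wells.
    genotype_of_cluster = {
        int(cluster_of[well]): genotype for genotype, well in controls.items()
    }
    # Wells in a cluster without any control are left as NaN.
    return cluster_of.map(genotype_of_cluster)

# Hypothetical wells, cluster labels, and control wells.
wells = ["A1", "A2", "A3", "B1", "B2", "B3"]
labels = [0, 1, 2, 1, 0, 2]
controls = {"WT": "A1", "KO": "A2", "HET": "A3"}
calls = assign_genotypes(wells, labels, controls)
print(calls)
```

In the notebook above, the equivalent call would take `dfdif.columns` and `hc.labels_` as the first two arguments. If two controls land in the same cluster, the later one silently overwrites the earlier one in this sketch, which is itself a sign that the clustering needs a closer look.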