We will try to compute the probability that a cell comes from a given Strain, $P(Y=y)$, given a feature vector $X$ (i.e. a vector containing the Input resistance, sag ratio, etc.). We will use the “naive” assumption of independence between every pair of features.
Given a class variable $Y$ and a dependent feature vector $X_1$ through $X_n$, Bayes’ theorem states the following relationship:
$$P(Y \mid X_1, \dots, X_n) = \frac{P(Y)\, P(X_1, \dots, X_n \mid Y)}{P(X_1, \dots, X_n)}$$
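Under the “naive” assumption, each feature is conditionally independent of the others given the class, i.e. $P(X_i \mid Y, X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) = P(X_i \mid Y)$, so the posterior simplifies to
$$P(Y \mid X_1, \dots, X_n) \propto P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$$
and the predicted strain is the value of $Y$ that maximizes this product. The GaussianNB classifier used below additionally models each $P(X_i \mid Y)$ as a Gaussian whose mean and variance are estimated per class from the training data.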
In [1]:
    
%pylab inline
    
    
In [2]:
    
import pandas as pd
    
In [3]:
    
# first row contains units
df = pd.read_excel(io='../data/Cell_types.xlsx', sheet_name='PFC', skiprows=1)
del df['CellID'] # remove column with cell IDs
df.head() # show first elements
    
    Out[3]:
We use pandas to split up the matrix into the feature vectors we're interested in. We will also convert the textual category data (Strain, Gender) into numeric codes that we can work with.
In [4]:
    
pd.Categorical(df.Strain).codes # CB57BL is zero, GAD67 is one
    
    Out[4]:
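To double-check which integer corresponds to which strain, we can list the categories in code order (a small sketch reusing the same pd.Categorical call; the category at position 0 gets code 0, and so on):

list(enumerate(pd.Categorical(df.Strain).categories)) # should match the comment above: CB57BL -> 0, GAD67 -> 1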
In [5]:
    
df['Gender'] = pd.Categorical(df.Gender).codes
df['Strain'] = pd.Categorical(df.Strain).codes
df.head()
    
    Out[5]:
In [6]:
    
df.shape # as with NumPy the number of rows first
    
    Out[6]:
In [7]:
    
df.iloc[[0]].values[0] # get a row as NumPy array
    
    Out[7]:
In [8]:
    
# create X and Y
Y = df['Strain'].values
del df['Strain'] # remove Strain
X = [ df.iloc[[i]].values[0] for i in range(df.shape[0]) ]
len(X)==len(Y)
    
    Out[8]:
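As a side note, the list comprehension above builds the feature matrix row by row; since the Strain column has already been removed, the same matrix can be taken directly from the DataFrame (X_alt is just an illustrative name):

X_alt = df.values # 2D NumPy array: one row per cell, one column per feature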
In [9]:
    
X[0] # data from CB57BL
    
    Out[9]:
In [10]:
    
from sklearn.naive_bayes import GaussianNB
    
In [11]:
    
myclassifier = GaussianNB()
myclassifier.fit(X,Y)
    
    Out[11]:
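Once fitted, GaussianNB stores the per-class statistics it uses to evaluate each $P(X_i \mid Y)$; a quick way to inspect them (the variance attribute is called var_ in recent scikit-learn releases and sigma_ in older ones):

myclassifier.class_prior_ # estimated P(Y) for each strain
myclassifier.theta_       # per-class mean of every feature
myclassifier.var_         # per-class variance of every feature (sigma_ in older versions)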
In [12]:
    
df.iloc[[-2]].values # this cell is from a GAD67 mouse
    
    Out[12]:
In [13]:
    
df.iloc[[-1]].values # this cell is from a GAD67 mouse
    
    Out[13]:
We now test the classifier with the training data.
In [14]:
    
def predict(idx):
    # predict() expects a 2D array, so wrap the single feature vector in a list
    if myclassifier.predict([X[idx]])[0]:
        print('Cell %2d is from a GAD67  mouse'%idx)
    else:
        print('Cell %2d is from a CB57BL mouse'%idx)
# test with the training data (similar to myclassifier.score(X,Y))
for i in range(df.shape[0]):
    predict(i)
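
As the comment above suggests, scikit-learn can report the training-set accuracy in one call instead of printing every prediction:

myclassifier.score(X, Y) # fraction of training cells assigned to the correct strain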
    
    
We test with some fictitious data
In [15]:
    
d = np.array([[ -75.50, 98.25, 1.49, 24.75, 90. ,21.5, 24.5 ,60, 1, 85.95, -48.6,
              430.95, 0.5, 385.55]])
test_df = pd.DataFrame(d, columns=df.columns)
test_df
    
    Out[15]:
In [16]:
    
if myclassifier.predict(test_df.iloc[[0]].values)[0]:
    print('Test cell is from a GAD67  mouse')
else:
    print('Test cell is from a CB57BL mouse')
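To see how confident the classifier is about this fictitious cell, we can also ask for the class probabilities (columns are ordered as in myclassifier.classes_, i.e. the same 0/1 codes as above):

myclassifier.predict_proba(test_df.iloc[[0]].values) # posterior P(Y | X) for the two strains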