This is a standard data set available from the UCI Machine Learning Repository. The data set contains patterns obtained by bouncing sonar signals off metal cylinders (mines) and rocks at various angles. Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder).
In [1]:
from pandas import read_csv
filename = 'data/sonar-data.csv'
names = ['var' + str(x) for x in range(60)]
names.append('output')
sonar_data = read_csv(filename, header=None, names=names)
print(sonar_data.shape)
display(sonar_data.head())
In [2]:
sonar_data.groupby('output').size()
Out[2]:
Let's split the data into inputs and output. The inputs are numeric and already lie in the range 0.0 to 1.0, but the output (class attribute) still has to be mapped from a character label to a number.
In [161]:
values_array = sonar_data.values
x = values_array[:,0:60].astype(float)
labels = values_array[:,60]
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(labels)
print(list(le.classes_))
y = le.transform(labels)
print(y[0:10])
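To make the character-to-number mapping concrete, here is a minimal pure-Python sketch of the same idea: collect the distinct labels, sort them, and use each label's position as its code. The helper name fit_label_mapping and the five sample labels are made up for illustration; this only mirrors the sorted-classes behavior of LabelEncoder and is not its implementation.

```python
def fit_label_mapping(labels):
    """Return the sorted distinct classes and a label -> integer mapping."""
    classes = sorted(set(labels))
    mapping = {label: index for index, label in enumerate(classes)}
    return classes, mapping


# Hypothetical sample of class labels ("R" = rock, "M" = mine).
sample_labels = ["R", "R", "M", "R", "M"]

classes, mapping = fit_label_mapping(sample_labels)
encoded = [mapping[label] for label in sample_labels]

print(classes)  # ['M', 'R']
print(encoded)  # [1, 1, 0, 1, 0]
```

Because the classes are sorted, "M" becomes 0 and "R" becomes 1 regardless of the order in which they appear in the data.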
In [144]:
# zero-rule baseline: the accuracy we would get by predicting 'M' (mine) for every record
zeroRuleAccuracy = len(sonar_data.loc[sonar_data.output=='M'])/len(sonar_data)
zeroRuleAccuracy
Out[144]:
In [165]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
In [166]:
# scale the input
scaler = StandardScaler().fit(x)
rescaledX = scaler.transform(x)
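As a rough illustration of what this scaling step does to a single input column, the sketch below subtracts the column mean and divides by the population standard deviation, so the result has mean 0 and standard deviation 1. The helper name standardize_column and the sample values are assumptions made for this example; it is a sketch of the idea, not the StandardScaler code.

```python
from statistics import fmean, pstdev


def standardize_column(values):
    """Rescale one column to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Hypothetical energy readings for one frequency band.
column = [0.02, 0.04, 0.06, 0.08]
scaled = standardize_column(column)

print(scaled)
print(fmean(scaled), pstdev(scaled))
```

Note that the mean and standard deviation here are computed from all rows at once, which is also what the cell above does when it fits the scaler on the full input array.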
In [159]:
# convenience function to evaluate different models
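def evaluate_model(model, x, y):
    # random_state only applies when shuffle=True (recent scikit-learn versions
    # raise an error otherwise), so it is omitted and the folds are taken in row order
    kfold = KFold(n_splits=10)
    results = cross_val_score(model, x, y, cv=kfold, scoring='accuracy')
    # report the mean accuracy and its standard deviation across the folds
    print("Accuracy: %.4f (%.4f)" % (results.mean(), results.std()))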
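The 10-fold split used by evaluate_model can be pictured with a small pure-Python sketch. It is not scikit-learn's implementation: the helper name kfold_indices, its optional seed parameter, and the sample count of 208 are assumptions made for illustration. Without a seed the folds are contiguous blocks of rows; with a seed the row order is shuffled first.

```python
import random


def kfold_indices(n_samples, n_splits, seed=None):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    if seed is not None:
        # Optional shuffle so folds do not follow the original row order.
        random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


# Assumed sample count for illustration.
folds = kfold_indices(208, 10)
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [21, 21, 21, 21, 21, 21, 21, 21, 20, 20]
```

Each record lands in exactly one test fold, so every record is used for evaluation once and for training nine times. The reported accuracy is the mean over the ten held-out folds.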
In [169]:
# try logistic regression
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
evaluate_model(model, rescaledX, y)
In [210]:
# try a perceptron
from sklearn.linear_model import Perceptron
model = Perceptron(penalty='l2', alpha=0.001, max_iter=500)
evaluate_model(model, rescaledX, y)
68% accuracy is not a bad start, but I'm sure we can do better with more optimization!