Predict Objects from Sonar Data

About the Sonar Data Set

This is a standard data set available from the UCI Machine Learning Repository. The data set contains patterns obtained by bouncing sonar signals off metal cylinders (mines) and rocks at various angles. Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder).

Load the Data


In [1]:
from pandas import read_csv
filename = 'data/sonar-data.csv'
names = list('var'+str(x) for x in range(0,60))
names.append('output')
sonar_data = read_csv(filename, header=None, names=names)
print(sonar_data.shape)
display(sonar_data.head())


(208, 61)
var0 var1 var2 var3 var4 var5 var6 var7 var8 var9 ... var51 var52 var53 var54 var55 var56 var57 var58 var59 output
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 ... 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 ... 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 ... 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 ... 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 ... 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

5 rows × 61 columns

Explore and Data Prep

There's probably not much we need to explore in the sonar data set, since the description of the data set (above) gives us a pretty good understanding. We are definitely interested in the distribution of classes (rock vs. mine).


In [2]:
sonar_data.groupby('output').size()


Out[2]:
output
M    111
R     97
dtype: int64

Let's split the data into inputs and output. The inputs are numeric and already standardized to have min=0.0 and max=1.0. But we do need to map the output (class attribute) from character to numeric.


In [161]:
values_array = sonar_data.values
x = values_array[:,0:60].astype(float)
labels = values_array[:,60]

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(labels)
print(list(le.classes_))
y = le.transform(labels)
print(y[0:10])


['M', 'R']
[1 1 1 1 1 1 1 1 1 1]

Predict Mine or Rock

Zero Rule Algorithm

The Zero Rule algorithm gives us a baseline against which we can compare our machine learning algorithms. We ignore all input data and simply predict the majority class which is M as we saw earlier.


In [144]:
zeroRuleAccuracy = len(sonar_data.loc[sonar_data.output=='M'])/len(sonar_data)
zeroRuleAccuracy


Out[144]:
0.5336538461538461

Classification Using Machine Learning


In [165]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

In [166]:
# scale the input
scaler = StandardScaler().fit(x)
rescaledX = scaler.transform(x)

In [159]:
# convenience function to evaluate different models
def evaluate_model(model, x, y):
    kfold = KFold(n_splits=10, random_state=7)
    results = cross_val_score(model, x, y, cv=kfold, scoring='accuracy')
    print("Accuracy: %.4f (%.4f)" % (results.mean(), results.std()))

In [169]:
# try logistic regression
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
evaluate_model(model, rescaledX, y)


Accuracy: 0.5781 (0.1553)

In [210]:
# try a perceptron
from sklearn.linear_model import Perceptron

model = Perceptron(penalty='l2', alpha=0.001, max_iter=500)
evaluate_model(model, rescaledX, y)


Accuracy: 0.6838 (0.1758)

68% accuracy is not a bad start, but I'm sure we can do better with more optimization!