Predict Objects from Sonar Data

About the Sonar Data Set

This is a standard data set available from the UCI Machine Learning Repository. The data set contains patterns obtained by bouncing sonar signals off metal cylinders (mines) and rocks at various angles. Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder).

Load the Data



In [1]:

    
from pandas import read_csv
filename = 'data/sonar-data.csv'
names = list('var'+str(x) for x in range(0,60))
names.append('output')
sonar_data = read_csv(filename, header=None, names=names)
print(sonar_data.shape)
display(sonar_data.head())









    



(208, 61)






    







  
    
      
      var0
      var1
      var2
      var3
      var4
      var5
      var6
      var7
      var8
      var9
      ...
      var51
      var52
      var53
      var54
      var55
      var56
      var57
      var58
      var59
      output
    
  
  
    
      0
      0.0200
      0.0371
      0.0428
      0.0207
      0.0954
      0.0986
      0.1539
      0.1601
      0.3109
      0.2111
      ...
      0.0027
      0.0065
      0.0159
      0.0072
      0.0167
      0.0180
      0.0084
      0.0090
      0.0032
      R
    
    
      1
      0.0453
      0.0523
      0.0843
      0.0689
      0.1183
      0.2583
      0.2156
      0.3481
      0.3337
      0.2872
      ...
      0.0084
      0.0089
      0.0048
      0.0094
      0.0191
      0.0140
      0.0049
      0.0052
      0.0044
      R
    
    
      2
      0.0262
      0.0582
      0.1099
      0.1083
      0.0974
      0.2280
      0.2431
      0.3771
      0.5598
      0.6194
      ...
      0.0232
      0.0166
      0.0095
      0.0180
      0.0244
      0.0316
      0.0164
      0.0095
      0.0078
      R
    
    
      3
      0.0100
      0.0171
      0.0623
      0.0205
      0.0205
      0.0368
      0.1098
      0.1276
      0.0598
      0.1264
      ...
      0.0121
      0.0036
      0.0150
      0.0085
      0.0073
      0.0050
      0.0044
      0.0040
      0.0117
      R
    
    
      4
      0.0762
      0.0666
      0.0481
      0.0394
      0.0590
      0.0649
      0.1209
      0.2467
      0.3564
      0.4459
      ...
      0.0031
      0.0054
      0.0105
      0.0110
      0.0015
      0.0072
      0.0048
      0.0107
      0.0094
      R
    
  

5 rows × 61 columns

Explore and Data Prep

There's probably not much we need to explore in the sonar data set, since the description of the data set (above) gives us a pretty good understanding. We are definitely interested in the distribution of classes (rock vs. mine).



In [2]:

    
sonar_data.groupby('output').size()









    Out[2]:





output
M    111
R     97
dtype: int64

Let's split the data into inputs and output. The inputs are numeric and already standardized to have min=0.0 and max=1.0. But we do need to map the output (class attribute) from character to numeric.



In [161]:

    
values_array = sonar_data.values
x = values_array[:,0:60].astype(float)
labels = values_array[:,60]

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(labels)
print(list(le.classes_))
y = le.transform(labels)
print(y[0:10])









    



['M', 'R']
[1 1 1 1 1 1 1 1 1 1]

Predict Mine or Rock

Zero Rule Algorithm

The Zero Rule algorithm gives us a baseline against which we can compare our machine learning algorithms. We ignore all input data and simply predict the majority class which is M as we saw earlier.



In [144]:

    
zeroRuleAccuracy = len(sonar_data.loc[sonar_data.output=='M'])/len(sonar_data)
zeroRuleAccuracy









    Out[144]:





0.5336538461538461

Classification Using Machine Learning



In [165]:

    
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler



In [166]:

    
# scale the input
scaler = StandardScaler().fit(x)
rescaledX = scaler.transform(x)



In [159]:

    
# convenience function to evaluate different models
def evaluate_model(model, x, y):
    kfold = KFold(n_splits=10, random_state=7)
    results = cross_val_score(model, x, y, cv=kfold, scoring='accuracy')
    print("Accuracy: %.4f (%.4f)" % (results.mean(), results.std()))



In [169]:

    
# try logistic regression
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
evaluate_model(model, rescaledX, y)









    



Accuracy: 0.5781 (0.1553)



In [210]:

    
# try a perceptron
from sklearn.linear_model import Perceptron

model = Perceptron(penalty='l2', alpha=0.001, max_iter=500)
evaluate_model(model, rescaledX, y)









    



Accuracy: 0.6838 (0.1758)

68% accuracy is not a bad start, but I'm sure we can do better with more optimization!

	var0	var1	var2	var3	var4	var5	var6	var7	var8	var9	...	var51	var52	var53	var54	var55	var56	var57	var58	var59	output
0	0.0200	0.0371	0.0428	0.0207	0.0954	0.0986	0.1539	0.1601	0.3109	0.2111	...	0.0027	0.0065	0.0159	0.0072	0.0167	0.0180	0.0084	0.0090	0.0032	R
1	0.0453	0.0523	0.0843	0.0689	0.1183	0.2583	0.2156	0.3481	0.3337	0.2872	...	0.0084	0.0089	0.0048	0.0094	0.0191	0.0140	0.0049	0.0052	0.0044	R
2	0.0262	0.0582	0.1099	0.1083	0.0974	0.2280	0.2431	0.3771	0.5598	0.6194	...	0.0232	0.0166	0.0095	0.0180	0.0244	0.0316	0.0164	0.0095	0.0078	R
3	0.0100	0.0171	0.0623	0.0205	0.0205	0.0368	0.1098	0.1276	0.0598	0.1264	...	0.0121	0.0036	0.0150	0.0085	0.0073	0.0050	0.0044	0.0040	0.0117	R
4	0.0762	0.0666	0.0481	0.0394	0.0590	0.0649	0.1209	0.2467	0.3564	0.4459	...	0.0031	0.0054	0.0105	0.0110	0.0015	0.0072	0.0048	0.0107	0.0094	R