Build a neural network to predict the magnitude of an earthquake, given the date, time, latitude, and longitude as features. This is the dataset. Optimize at least one hyperparameter using random search. See this example for more information.

You can use any library you like; bonus points are given if you do this using only numpy.


In [235]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [236]:
df = pd.read_csv("data/earthquake-database.csv")
print(df.shape)
df.head()


(23412, 21)
Out[236]:
Date Time Latitude Longitude Type Depth Depth Error Depth Seismic Stations Magnitude Magnitude Type ... Magnitude Seismic Stations Azimuthal Gap Horizontal Distance Horizontal Error Root Mean Square ID Source Location Source Magnitude Source Status
0 01/02/1965 13:44:18 19.246 145.616 Earthquake 131.6 NaN NaN 6.0 MW ... NaN NaN NaN NaN NaN ISCGEM860706 ISCGEM ISCGEM ISCGEM Automatic
1 01/04/1965 11:29:49 1.863 127.352 Earthquake 80.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860737 ISCGEM ISCGEM ISCGEM Automatic
2 01/05/1965 18:05:58 -20.579 -173.972 Earthquake 20.0 NaN NaN 6.2 MW ... NaN NaN NaN NaN NaN ISCGEM860762 ISCGEM ISCGEM ISCGEM Automatic
3 01/08/1965 18:49:43 -59.076 -23.557 Earthquake 15.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860856 ISCGEM ISCGEM ISCGEM Automatic
4 01/09/1965 13:32:50 11.938 126.427 Earthquake 15.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860890 ISCGEM ISCGEM ISCGEM Automatic

5 rows × 21 columns

The task calls for date, time, latitude, and longitude as features for predicting the magnitude. Time is left out for now.


In [347]:
#prediction_cols = ["Date", "Time", "Latitude", "Longitude"]
# ignoring time for now
prediction_cols = ["Date", "Latitude", "Longitude"]
x = df[prediction_cols]
x.head()


Out[347]:
Date Latitude Longitude
0 01/02/1965 19.246 145.616
1 01/04/1965 1.863 127.352
2 01/05/1965 -20.579 -173.972
3 01/08/1965 -59.076 -23.557
4 01/09/1965 11.938 126.427

y is the target for the prediction:


In [348]:
y = df["Magnitude"]
y.head()


Out[348]:
0    6.0
1    5.8
2    6.2
3    5.8
4    5.8
Name: Magnitude, dtype: float64

We need to convert the input data into something better suited for prediction. The date and time are strings, which a network cannot use directly, and latitude and longitude could be normalized.

But first, check to see if the input data has any missing values:


In [350]:
x.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23412 entries, 0 to 23411
Data columns (total 3 columns):
Date         23412 non-null object
Latitude     23412 non-null float64
Longitude    23412 non-null float64
dtypes: float64(2), object(1)
memory usage: 548.8+ KB

Every row has a value in each column, so we can move ahead. First, we change the date strings into pandas datetimes:


In [351]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
x = x.assign(Date=x['Date'].apply(pd.to_datetime))
x.head()


Out[351]:
Date Latitude Longitude
0 1965-01-02 19.246 145.616
1 1965-01-04 1.863 127.352
2 1965-01-05 -20.579 -173.972
3 1965-01-08 -59.076 -23.557
4 1965-01-09 11.938 126.427

In [362]:
x.info()
x['Date'].items()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23412 entries, 0 to 23411
Data columns (total 3 columns):
Date         23412 non-null datetime64[ns]
Latitude     23412 non-null float64
Longitude    23412 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 548.8 KB
Out[362]:
<zip at 0x121644488>

Now to normalize the data, starting with the target:


In [359]:
# normalize the target y
y = (y - y.min()) / (y.max() - y.min())
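
The cell above only scales the target. A minimal sketch of how the three input columns could be turned into numbers in the same [0, 1] range follows. The helper name `encode_features`, the choice of "days since the earliest date" for the date column, and the small `sample` frame are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def encode_features(frame):
    """Return a float array with Date, Latitude, Longitude each scaled to [0, 1]."""
    # Assumption: represent each date as days elapsed since the earliest date.
    dates = pd.to_datetime(frame["Date"])
    days = (dates - dates.min()).dt.days.to_numpy(dtype=float)
    span = days.max()
    days_scaled = days / span if span > 0 else np.zeros_like(days)
    # Latitude and longitude have fixed physical ranges, so scale by those.
    lat_scaled = (frame["Latitude"].to_numpy(dtype=float) + 90.0) / 180.0
    lon_scaled = (frame["Longitude"].to_numpy(dtype=float) + 180.0) / 360.0
    return np.column_stack([days_scaled, lat_scaled, lon_scaled])

# Hypothetical three-row sample copied from the first rows shown earlier.
sample = pd.DataFrame({
    "Date": ["01/02/1965", "01/04/1965", "01/05/1965"],
    "Latitude": [19.246, 1.863, -20.579],
    "Longitude": [145.616, 127.352, -173.972],
})
features = encode_features(sample)
print(features.shape)
```

Scaling latitude and longitude by their physical ranges rather than by the observed min and max keeps the transform identical for training and test rows.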

Now to start the prediction.

First, splitting x and y into training and testing sets:


In [304]:
x_train = x[:20000]
y_train = y[:20000]

x_test = x[20000:]
y_test = y[20000:]

len(x_train), len(y_train), len(x_test), len(y_test)


Out[304]:
(20000, 20000, 3412, 3412)
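
The first rows of the data are in ascending date order, so slicing at row 20000 most likely puts the earliest events in the training set and the latest in the test set. If a random split is preferred instead, it could look like the sketch below. The function name `shuffled_split` and the toy arrays are assumptions for illustration; the inputs are expected to be numpy arrays.

```python
import numpy as np

def shuffled_split(features, targets, n_train, seed=0):
    """Split two aligned arrays into train and test parts after shuffling rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))  # one shared row order keeps x and y aligned
    train_idx, test_idx = order[:n_train], order[n_train:]
    return features[train_idx], targets[train_idx], features[test_idx], targets[test_idx]

# Toy data: row i of demo_x is [2*i, 2*i + 1] and its target is i.
demo_x = np.arange(20, dtype=float).reshape(10, 2)
demo_y = np.arange(10, dtype=float)
x_tr, y_tr, x_te, y_te = shuffled_split(demo_x, demo_y, n_train=8)
print(x_tr.shape, x_te.shape)
```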

In [298]:
input_features = 3
output_features = 1
data_length = len(x_train)

In [295]:
# a layer's weight matrix has shape (inputs, outputs); it does not depend on the number of rows
weights = np.random.random([input_features, output_features])
weights.shape


Out[295]:
(3, 1)
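
In a feed-forward network, weight shapes are set by the layer sizes, and the same weights are applied to every row. The sketch below shows a forward pass for one hidden layer in plain numpy. The hidden size of 8, the tanh activation, and the names `init_params` and `forward` are assumptions chosen for illustration.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Create small random weights and zero biases for a one-hidden-layer network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, inputs):
    """Map a (rows, n_in) batch to (rows, n_out) predictions."""
    hidden = np.tanh(inputs @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hidden=8, n_out=1, rng=rng)
batch = rng.random((5, 3))  # five hypothetical rows with three features each
print(forward(params, batch).shape)
```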

In [323]:
# testing how to loop through the data
t = x[:10]
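for _, row in t.iterrows():
    # access by column name; positional row[0] on a labelled Series is deprecated in newer pandas
    print(row['Date'], '|', row['Latitude'], '|', row['Longitude'])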


1965-01-02 00:00:00 | 19.246 | 145.616
1965-01-04 00:00:00 | 1.8630000000000002 | 127.352
1965-01-05 00:00:00 | -20.579 | -173.972
1965-01-08 00:00:00 | -59.076 | -23.557
1965-01-09 00:00:00 | 11.937999999999999 | 126.427
1965-01-10 00:00:00 | -13.405 | 166.62900000000002
1965-01-12 00:00:00 | 27.357 | 87.867
1965-01-15 00:00:00 | -13.309000000000001 | 166.21200000000002
1965-01-16 00:00:00 | -56.452 | -27.043000000000003
1965-01-17 00:00:00 | -24.563000000000002 | 178.487
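
Looping row by row works for inspection, but with numpy the whole feature matrix can be pushed through the network in one matrix product. The sketch below combines that with the random search asked for in the task: it trains a small network with full-batch gradient descent and samples the learning rate at random on a log scale. Everything here is an assumption for illustration, including the function names, the hidden size, the epoch count, the search range of 1e-4 to 1e-1, and the synthetic data standing in for the encoded earthquake features.

```python
import numpy as np

def train_network(x, y, lr, n_hidden=8, epochs=200, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        hidden = np.tanh(x @ w1 + b1)            # all rows at once, no loop over rows
        pred = hidden @ w2 + b2
        grad_pred = 2.0 * (pred - y) / len(x)    # derivative of the mean squared error
        grad_hidden = (grad_pred @ w2.T) * (1.0 - hidden ** 2)
        w2 -= lr * (hidden.T @ grad_pred)
        b2 -= lr * grad_pred.sum(axis=0)
        w1 -= lr * (x.T @ grad_hidden)
        b1 -= lr * grad_hidden.sum(axis=0)
    return lambda inputs: np.tanh(inputs @ w1 + b1) @ w2 + b2

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def random_search_lr(x_tr, y_tr, x_val, y_val, n_trials=10, seed=0):
    """Try randomly sampled learning rates and keep the one with the lowest validation loss."""
    rng = np.random.default_rng(seed)
    best_lr, best_loss = None, np.inf
    for _ in range(n_trials):
        lr = 10.0 ** rng.uniform(-4, -1)         # log-uniform sample between 1e-4 and 1e-1
        predict = train_network(x_tr, y_tr, lr)
        loss = mse(predict(x_val), y_val)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

# Synthetic stand-in data: 200 rows, 3 features in [0, 1], targets of shape (200, 1).
rng = np.random.default_rng(1)
demo_x = rng.random((200, 3))
demo_y = 0.5 * demo_x[:, :1] + 0.2 * demo_x[:, 1:2]
best_lr, best_loss = random_search_lr(demo_x[:150], demo_y[:150], demo_x[150:], demo_y[150:])
print(best_lr, best_loss)
```

Sampling the exponent uniformly, rather than the learning rate itself, gives each order of magnitude an equal chance of being tried.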
