Build a neural network to predict the magnitude of an earthquake given the date, time, latitude, and longitude as features. This is the dataset. Optimize at least one hyperparameter using random search. See this example for more information.

You can use any library you like; bonus points are given if you do this using only numpy.



In [235]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np



In [236]:

    
df = pd.read_csv("data/earthquake-database.csv")
print(df.shape)
df.head()









    



(23412, 21)






    Out[236]:






  
    
      
	Date	Time	Latitude	Longitude	Type	Depth	Depth Error	Depth Seismic Stations	Magnitude	Magnitude Type	...	Magnitude Seismic Stations	Azimuthal Gap	Horizontal Distance	Horizontal Error	Root Mean Square	ID	Source	Location Source	Magnitude Source	Status
0	01/02/1965	13:44:18	19.246	145.616	Earthquake	131.6	NaN	NaN	6.0	MW	...	NaN	NaN	NaN	NaN	NaN	ISCGEM860706	ISCGEM	ISCGEM	ISCGEM	Automatic
1	01/04/1965	11:29:49	1.863	127.352	Earthquake	80.0	NaN	NaN	5.8	MW	...	NaN	NaN	NaN	NaN	NaN	ISCGEM860737	ISCGEM	ISCGEM	ISCGEM	Automatic
2	01/05/1965	18:05:58	-20.579	-173.972	Earthquake	20.0	NaN	NaN	6.2	MW	...	NaN	NaN	NaN	NaN	NaN	ISCGEM860762	ISCGEM	ISCGEM	ISCGEM	Automatic
3	01/08/1965	18:49:43	-59.076	-23.557	Earthquake	15.0	NaN	NaN	5.8	MW	...	NaN	NaN	NaN	NaN	NaN	ISCGEM860856	ISCGEM	ISCGEM	ISCGEM	Automatic
4	01/09/1965	13:32:50	11.938	126.427	Earthquake	15.0	NaN	NaN	5.8	MW	...	NaN	NaN	NaN	NaN	NaN	ISCGEM860890	ISCGEM	ISCGEM	ISCGEM	Automatic
    
  

5 rows × 21 columns

We're using the date, time, latitude, and longitude to predict the magnitude. The time column is left out of this first pass.



In [347]:

    
#prediction_cols = ["Date", "Time", "Latitude", "Longitude"]
# ignoring time for now
prediction_cols = ["Date", "Latitude", "Longitude"]
x = df[prediction_cols]
x.head()
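
Time is ignored in the cell above. If it were brought back in, one possible way to turn the "HH:MM:SS" strings into numbers is sketched here. The sample values, the variable names, and the choice of a sine/cosine encoding are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same "HH:MM:SS" format as the Time column.
times = pd.Series(["13:44:18", "11:29:49", "18:05:58"])

# Seconds since midnight, then the fraction of the day in [0, 1).
seconds = pd.to_timedelta(times).dt.total_seconds()
day_fraction = seconds / 86400.0

# Cyclic encoding, so that 23:59 and 00:00 end up close together.
time_sin = np.sin(2 * np.pi * day_fraction)
time_cos = np.cos(2 * np.pi * day_fraction)

print(pd.DataFrame({"time_sin": time_sin, "time_cos": time_cos}))
```

The two resulting columns could be appended to the other input features; whether the time of day carries any signal for magnitude is a separate question.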

y is the target for the prediction:



In [348]:

    
y = df["Magnitude"]
y.head()









    Out[348]:





0    6.0
1    5.8
2    6.2
3    5.8
4    5.8
Name: Magnitude, dtype: float64

We need to convert the input data into something better suited for prediction. The date and time are strings, which a network cannot use directly, and latitude and longitude could be normalized.

But first, check to see if the input data has any missing values:



In [350]:

    
x.info()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23412 entries, 0 to 23411
Data columns (total 3 columns):
Date         23412 non-null object
Latitude     23412 non-null float64
Longitude    23412 non-null float64
dtypes: float64(2), object(1)
memory usage: 548.8+ KB
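
The non-null counts from `info()` have to be compared against the row count by eye. A more direct check counts the missing values per column. This is a small sketch on a made-up frame (the `sample` data is hypothetical and contains one deliberately missing latitude):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing latitude, for illustration only.
sample = pd.DataFrame({
    "Date": ["01/02/1965", "01/04/1965"],
    "Latitude": [19.246, np.nan],
    "Longitude": [145.616, 127.352],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(sample.isna().any().any())
print(has_missing)
```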

Every row has a value, so we can move ahead. First, change the date strings into pandas datetimes:



In [351]:

    
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
x = x.assign(Date=x['Date'].apply(pd.to_datetime))
x.head()






    Out[351]:






  
    
      
	Date	Latitude	Longitude
0	1965-01-02	19.246	145.616
1	1965-01-04	1.863	127.352
2	1965-01-05	-20.579	-173.972
3	1965-01-08	-59.076	-23.557
4	1965-01-09	11.938	126.427



In [362]:

    
x.info()
x['Date'].items()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23412 entries, 0 to 23411
Data columns (total 3 columns):
Date         23412 non-null datetime64[ns]
Latitude     23412 non-null float64
Longitude    23412 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 548.8 KB






    Out[362]:





<zip at 0x121644488>

Now to normalize the data. So far only the target y is min-max scaled to the range [0, 1]:



In [359]:

    
# normalize the target y
y = (y - y.min()) / (y.max() - y.min())
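
The input features could be scaled in a similar way. The sketch below is one possible approach, not the notebook's own: dates become days since the first event, and latitude and longitude are divided by their known bounds. The `sample` rows, the `denormalize` helper, and the numbers passed to it are hypothetical. To use such a helper on real predictions, `y.min()` and `y.max()` would have to be stored before y is overwritten.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the same layout as x after the date conversion.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["1965-01-02", "1965-01-04", "1965-01-05"]),
    "Latitude": [19.246, 1.863, -20.579],
    "Longitude": [145.616, 127.352, -173.972],
})

# Days since the first event, min-max scaled to [0, 1].
days = (sample["Date"] - sample["Date"].min()).dt.days.astype(float)
date_scaled = days / days.max()

# Latitude lies in [-90, 90] and longitude in [-180, 180].
lat_scaled = sample["Latitude"] / 90.0
lon_scaled = sample["Longitude"] / 180.0

# One row per event, one column per feature.
features = np.column_stack([date_scaled, lat_scaled, lon_scaled])
print(features.shape)  # (3, 3)


def denormalize(y_scaled, y_min, y_max):
    """Map a min-max scaled value back to magnitude units."""
    return y_scaled * (y_max - y_min) + y_min


# Example with made-up bounds: 0.5 lies halfway between 5.5 and 9.1.
print(denormalize(0.5, 5.5, 9.1))
```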

Now to start the prediction.

First, split x and y into training and testing sets:



In [304]:

    
x_train = x[:20000]
y_train = y[:20000]

x_test = x[20000:]
y_test = y[20000:]

len(x_train), len(y_train), len(x_test), len(y_test)









    Out[304]:





(20000, 20000, 3412, 3412)
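
The slice-based split above keeps the first 20,000 rows for training. The rows appear to be in chronological order (the head of the table starts in January 1965), so the test set would then contain only the latest events. If a random split is preferred, a numpy-only sketch could look like this. The helper name `shuffled_split`, the seed, and the stand-in arrays are assumptions for illustration.

```python
import numpy as np


def shuffled_split(features, targets, n_train, seed=0):
    """Shuffle two equally long arrays together, then split them."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (features[train_idx], targets[train_idx],
            features[test_idx], targets[test_idx])


# Hypothetical stand-in data: 10 rows with 3 features each.
features = np.arange(30, dtype=float).reshape(10, 3)
targets = np.arange(10, dtype=float)

x_tr, y_tr, x_te, y_te = shuffled_split(features, targets, n_train=8)
print(x_tr.shape, y_tr.shape, x_te.shape, y_te.shape)  # (8, 3) (8,) (2, 3) (2,)
```

Using one shared permutation for both arrays keeps each feature row paired with its target.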



In [298]:

    
input_features = 3
output_features = 1
data_length = len(x_train)



In [295]:

    
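# NOTE: the (3, 23412) shape recorded below does not match data_length = len(x_train) = 20000;
# this cell (In [295]) presumably ran before data_length was set in In [298].
weights = np.random.random([input_features, data_length])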
weights.shape









    Out[295]:





(3, 23412)
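
In a feed-forward network the weight matrices are usually sized by the layer widths rather than by the number of rows in the data. A minimal numpy sketch of that layout follows. The names `hidden_units` and `forward`, the value 8, and the tanh activation are assumptions, not something the notebook specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

input_features = 3    # date, latitude, longitude
hidden_units = 8      # assumed value; a natural hyperparameter to tune
output_features = 1   # magnitude

# Weight shapes depend on the layer sizes, not on the number of rows.
w1 = rng.normal(scale=0.1, size=(input_features, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(scale=0.1, size=(hidden_units, output_features))
b2 = np.zeros(output_features)


def forward(batch):
    """One hidden tanh layer followed by a linear output layer."""
    hidden = np.tanh(batch @ w1 + b1)
    return hidden @ w2 + b2


batch = rng.random((5, input_features))  # hypothetical batch of 5 rows
print(forward(batch).shape)  # (5, 1)
```

With this layout the same weights can be applied to a batch of any length.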



In [323]:

    
# testing how to loop through the data
t = x[:10]
for idx, row in t.iterrows():
    # access by column label instead of integer position
    print(row['Date'], '|', row['Latitude'], '|', row['Longitude'])









    



1965-01-02 00:00:00 | 19.246 | 145.616
1965-01-04 00:00:00 | 1.8630000000000002 | 127.352
1965-01-05 00:00:00 | -20.579 | -173.972
1965-01-08 00:00:00 | -59.076 | -23.557
1965-01-09 00:00:00 | 11.937999999999999 | 126.427
1965-01-10 00:00:00 | -13.405 | 166.62900000000002
1965-01-12 00:00:00 | 27.357 | 87.867
1965-01-15 00:00:00 | -13.309000000000001 | 166.21200000000002
1965-01-16 00:00:00 | -56.452 | -27.043000000000003
1965-01-17 00:00:00 | -24.563000000000002 | 178.487
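
The task statement asks for random search over at least one hyperparameter. The notebook does not reach that step, so the following is only a numpy sketch of how it might look, run on synthetic stand-in data rather than the earthquake data. The function names, the two searched hyperparameters (learning rate and hidden layer width), their ranges, and the network itself are all assumptions.

```python
import numpy as np


def train_and_score(x_tr, y_tr, x_val, y_val, lr, hidden, epochs=200, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient
    descent on mean squared error and return the validation MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x_tr.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    n = len(x_tr)
    for _ in range(epochs):
        h = np.tanh(x_tr @ w1 + b1)                  # forward pass
        err = (h @ w2 + b2) - y_tr                   # shape (n, 1)
        grad_out = 2.0 * err / n                     # dMSE / dprediction
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)  # back through tanh
        w2 -= lr * (h.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (x_tr.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    val_pred = np.tanh(x_val @ w1 + b1) @ w2 + b2
    return float(np.mean((val_pred - y_val) ** 2))


def random_search(x_tr, y_tr, x_val, y_val, n_trials=10, seed=0):
    """Sample hyperparameters at random and keep the best validation score."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-3, -1)       # log-uniform in [0.001, 0.1)
        hidden = int(rng.integers(2, 17))    # integer in 2..16
        score = train_and_score(x_tr, y_tr, x_val, y_val, lr, hidden)
        if not np.isfinite(score):
            continue                         # skip diverged runs
        if best is None or score < best[0]:
            best = (score, lr, hidden)
    return best


# Synthetic stand-in data: 3 scaled features, target roughly in [0, 1].
rng = np.random.default_rng(1)
x_all = rng.uniform(-1, 1, size=(300, 3))
y_all = 0.5 + 0.2 * np.sin(3 * x_all[:, :1]) + 0.05 * rng.normal(size=(300, 1))

best_score, best_lr, best_hidden = random_search(
    x_all[:200], y_all[:200], x_all[200:], y_all[200:])
print(best_score, best_lr, best_hidden)
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude. The score is computed on rows that were not used for training, so the chosen settings are not simply the ones that fit the training rows best.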


