©2018 Raazesh Sainudiin. Attribution 4.0 International (CC BY 4.0)
Go to https://quakesearch.geonet.org.nz/ and download data on NZ earthquakes.
In my attempt above, I zoomed out to include both islands of New Zealand (NZ) and chose the Last Year button to get one year of data. The Search box on this site then gave the following URLs for downloading data. I used the DOWNLOAD button to get my own data in the Output Format CSV chosen earlier.
https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-06-01&enddate=2018-05-17T14:00:00
https://quakesearch.geonet.org.nz/csv?bbox=163.52051,-49.23912,182.19727,-32.36140&startdate=2017-5-17T13:00:00&enddate=2017-06-01
Try to DOWNLOAD your own CSV data and store it in a file named my_earthquakes.csv inside the folder named data, which is in the same directory as this notebook. (NOTE: rename the file when you download it so that you don't replace the file earthquakes.csv!)
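The download can also be scripted. The following is only a minimal sketch, assuming the query parameters (bbox, startdate, enddate) behave as they appear in the URLs above; the function names build_quake_url and download_quakes are hypothetical helpers introduced here, not part of any GeoNet tooling.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base of the CSV download URLs shown above.
BASE_URL = "https://quakesearch.geonet.org.nz/csv"

def build_quake_url(bbox, startdate, enddate):
    """Hypothetical helper: assemble a CSV download URL from its query parts."""
    # safe=",:" keeps the commas in bbox and the colons in timestamps unescaped.
    query = urlencode(
        {"bbox": bbox, "startdate": startdate, "enddate": enddate},
        safe=",:",
    )
    return BASE_URL + "?" + query

def download_quakes(url, path="data/my_earthquakes.csv"):
    """Hypothetical helper: save the CSV at url to path (needs network access)."""
    urlretrieve(url, path)
    return path

url = build_quake_url(
    "163.52051,-49.23912,182.19727,-32.36140",
    "2017-06-01",
    "2018-05-17T14:00:00",
)
print(url)
# download_quakes(url)  # uncomment to fetch; writes to my_earthquakes.csv, not earthquakes.csv
```

Writing to my_earthquakes.csv by default keeps the provided earthquakes.csv untouched, as the note above asks.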
In [1]:
%%sh
# print working directory
pwd
In [2]:
%%sh
ls # list contents of working directory
In [3]:
%%sh
# after download you should have the following file in directory named data
ls data
In [4]:
%%sh
# first three lines
head -3 data/earthquakes.csv
In [5]:
%%sh
# last three lines
tail -3 data/earthquakes.csv
In [6]:
%%sh
# number of lines in the file; mnemonic from `man wc`: wc = word count, and the option -l counts lines
wc -l data/earthquakes.csv
In [7]:
%%sh
man wc
In [8]:
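# read the whole file into one string; the file is closed when the with-block ends
with open("data/earthquakes.csv") as f:
    reader = f.read()
# split the string into a list with one entry per line
dataList = reader.split('\n')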
In [9]:
len(dataList)
Out[9]:
In [10]:
dataList[0]
Out[10]:
In [11]:
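myDataAccumulatorList = []
# skip the header line (index 0) and the last two entries of dataList
# (the final entry is an empty string if the file ends with a newline)
for data in dataList[1:-2]:
    dataRow = data.split(',')
    # keep the fields in columns 4, 5 and 6
    myData = [dataRow[4], dataRow[5], dataRow[6]]  # , dataRow[7]]
    # convert the three fields to floats and store them as a tuple
    myFloatData = tuple([float(x) for x in myData])
    myDataAccumulatorList.append(myFloatData)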
In [13]:
points(myDataAccumulatorList)
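As an alternative to splitting on commas by hand, the same extraction can be sketched with Python's csv module. This is only an illustration: the helper read_float_columns is hypothetical, and the sample rows below are made up to stand in for data/earthquakes.csv so that the cell runs without the file.

```python
import csv
import io

# Made-up stand-in for data/earthquakes.csv: the header names and the
# values are illustrative only and are NOT real GeoNet records.
sample = io.StringIO(
    "id,type,time,modified,c4,c5,c6,c7\n"
    "a1,earthquake,t0,t1,174.5,-41.2,2.3,10.0\n"
    "a2,earthquake,t0,t1,176.1,-38.7,3.1,5.0\n"
    "\n"
)

def read_float_columns(lines, columns=(4, 5, 6)):
    """Hypothetical helper: return one tuple of floats per data row."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; ignore them
            continue
        rows.append(tuple(float(row[i]) for i in columns))
    return rows

triples = read_float_columns(sample)
print(triples)
```

To use it on the real file, pass an open file object instead of sample, for example inside `with open("data/earthquakes.csv", newline="") as f:`. Skipping blank rows explicitly avoids having to slice off trailing entries by position.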
Recall that a statistic is any measurable function of the data: $T(x): \mathbb{X} \rightarrow \mathbb{T}$.
Thus, a statistic $T$ is also an RV that takes values in the space $\mathbb{T}$.
When $x \in \mathbb{X}$ is the observed data, $T(x)=t$ is the observed statistic of the observed data $x$.
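To make the definition concrete, here is a small sketch of two statistics written as plain Python functions of the data. The data list x is hypothetical and chosen only for illustration; sample_mean and sample_max are names introduced here.

```python
def sample_mean(x):
    """A statistic T mapping the data x to its arithmetic mean."""
    return sum(x) / len(x)

def sample_max(x):
    """Another statistic of the same data: the largest observation."""
    return max(x)

# Hypothetical observed data x (not real draws).
x = [3, 17, 40, 8, 22]

t = sample_mean(x)  # the observed statistic t = T(x)
print(t, sample_max(x))
```

Each function takes the whole data vector as input and returns a single value, so each is one possible choice of $T$ with its own space $\mathbb{T}$.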
Let's go back to our New Zealand lotto data.
We showed that for New Zealand lotto (40 balls in the machine, numbered $1, 2, \ldots, 40$), the number on the first ball out of the machine can be modelled as a de Moivre$(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$ RV.
We have the New Zealand Lotto results for 1114 draws, from 1 August 1987 to 10 November 2008 (retrieved from the NZ lotto web site: http://lotto.nzpages.co.nz/previousresults.html ).
We can think of this data as $x$, the realisation of a random vector $X = (X_1, X_2,\ldots, X_{1114})$ where $X_1, X_2,\ldots, X_{1114} \overset{IID}{\thicksim} \text{de Moivre}(\frac{1}{40}, \frac{1}{40}, \ldots, \frac{1}{40})$.
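The model can be sketched in code by simulating one realisation of such a random vector. Note that this is simulated data under the equi-probable de Moivre model, not the actual Lotto results; the function simulate_first_ball and the seed value are assumptions made for this illustration.

```python
import random
from collections import Counter

def simulate_first_ball(n_draws, n_balls=40, seed=None):
    """Hypothetical helper: IID draws, each ball 1..n_balls equally likely."""
    rng = random.Random(seed)  # private generator so the seed is reproducible
    return [rng.randint(1, n_balls) for _ in range(n_draws)]

# One simulated realisation with the same length as the real data set.
x_sim = simulate_first_ball(1114, seed=2008)

# Tally how often each ball number came up in the simulation.
counts = Counter(x_sim)
print(len(x_sim))
print(counts.most_common(3))
```

Under the model each ball is expected to appear about $1114/40 \approx 27.85$ times, so the tallies give a rough feel for how much variation pure chance produces.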
The data space is the set of every possible sequence of ball numbers that we could have got in these 1114 draws: $\mathbb{X} = \{1, 2, \ldots, 40\}^{1114}$. There are $40^{1114}$ possible sequences, and our data is just one of these $40^{1114}$ possible points in the data space.
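To get a feel for how large this data space is, the following short sketch counts the decimal digits of $40^{1114}$ using Python's exact integer arithmetic. The variable names are chosen here for illustration.

```python
n_balls = 40
n_draws = 1114

# Exact integer: the number of points in the data space.
size_of_data_space = n_balls ** n_draws

# Number of decimal digits needed to write that integer out.
n_digits = len(str(size_of_data_space))
print(n_digits)
```

The count is far too large to enumerate, which is one reason we summarise the data with statistics rather than reasoning about individual points of the data space.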
We will use our hidden function that enables us to get the ball one data in a list. Evaluate the cell below to get the data and confirm that we have data for 1114 draws.
In [ ]: