This tutorial covers several concepts at once: we will read raw mesonet data over the internet and store it in a form that we can plot. To do this, we will:
- Open a file over the internet
- Loop over the file line by line
- Skip over the header information
- Split the data separated by white space into columns
- Store the data in a list
- Account for missing data
- Return the data
We will do this by writing our data reader as a generic function that works for any station in the Oklahoma Mesonet, so we can plot whichever station we like! As always, we start by importing the libraries that we need. This time we will be reading data over the internet, so we need a new library to do that...
In [1]:
## import our libraries
import numpy as np
import matplotlib.pyplot as plt
## this is the library used for opening URLs
## (urllib2 is Python 2 only; in Python 3 the same function lives in urllib.request)
import urllib2
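The `urllib2` module only exists in Python 2; in Python 3 its `urlopen` function lives in `urllib.request`. Below is a minimal sketch of an import that works on either version. The helpers `build_mts_url` and `fetch_lines` are hypothetical names made up for this illustration, and the sketch assumes that the URL layout used later in this tutorial also holds for other dates, which has not been checked here.

```python
from datetime import date

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def build_mts_url(station, day):
    """Build the file URL for one station and one day (hypothetical helper)."""
    base = 'http://www.mesonet.org/data/public/mesonet/mts'
    # layout assumed from this tutorial: <base>/YYYY/MM/DD/YYYYMMDD<station>.mts
    return '{0}/{1:%Y/%m/%d}/{1:%Y%m%d}{2}.mts'.format(base, day, station)


def fetch_lines(url):
    """Download a text file and return its lines as text strings."""
    response = urlopen(url)
    try:
        # in Python 3, read() returns bytes, so decode before splitting
        text = response.read().decode('utf-8')
    finally:
        response.close()
    return text.splitlines()


print(build_mts_url('nrmn', date(2011, 5, 24)))
```

Decoding the bytes to text matters in Python 3, because comparing a bytes value to a string such as `'-999'` is always false there.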
Next, we will create our function to read in the data. We are creating it as a function because we want to be able to reuse the same code in order to read in other mesonet stations.
Functions are declared with the following syntax:
def function_name(argument1, argument2, ..., argumentN):
    stuff_function_does
    return stuff_function_does
A function can have any number of arguments passed to it, or none at all. In addition, it can return any number of values of any data type, or nothing at all. We will pass the argument 'station' through to our function so that we can switch out the station name easily. Also, when coding a function, Python needs to know where the function starts and ends. It knows this because the code inside a function is indented one level to the right of the def line (four spaces by convention). Any code that returns to the previous indentation level is considered outside of the function!
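As a concrete illustration of the syntax and the indentation rule, here is a small sketch. The functions `c_to_f` and `min_and_max` are made-up examples and are not used elsewhere in this tutorial.

```python
def c_to_f(temp_c):
    # these indented lines belong to the function
    temp_f = temp_c * 9.0 / 5.0 + 32.0
    return temp_f


def min_and_max(values):
    lowest = min(values)
    highest = max(values)
    # returning two values packs them into a tuple
    return lowest, highest


# back at the left margin, so this code is outside both functions
print(c_to_f(25.0))
low, high = min_and_max([21.5, 25.0, 19.0])
print(low)
print(high)
```

Returning several values at once is the same pattern our data reader will use to hand back more than one list.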
In [18]:
def data_read( station ):
    '''
    This function will read in data from the Oklahoma Mesonet for any
    station on the date of May 24, 2011. The date is hard-coded, but it is
    relatively simple to do this for real-time data as well.
    This function will fill missing values with NaN.

    Parameters:
        station: a string specifying the name of the mesonet station to download
    Returns:
        tmpc: a list of temperature values in degrees C starting at 00Z
        wspd: a list of wind speed values in m/s starting at 00Z
        wdsd: a list of wind speed standard deviation in m/s starting at 00Z
    '''
    ## this string will hold the full URL to the file we are downloading, where station
    ## is the argument passed through to the function.
    file_link = 'http://www.mesonet.org/data/public/mesonet/mts/2011/05/24/20110524' + station + '.mts'
    ## now we will open the file using urllib2. I found this function by using google...
    ## http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
    data_file = urllib2.urlopen(file_link)
    ## the readlines function will split the file on every newline character, '\n', and return
    ## a list with each line in it
    lines = data_file.readlines()
    ## the total number of lines read in from the file
    num_lines = len(lines)
    ## empty lists for data values
    tmpc = []
    wspd = []
    wdsd = []
    ## loop through each line of the data... we need an index to keep
    ## track of where we are so that we know when we can ignore the
    ## header information in the file.
    index = 0
    for line in lines:
        ## Python indexes data starting at 0, meaning that the first
        ## element of a list is list[0].
        ## The header runs through index 2, so skip until then
        if index <= 2:
            ## increment the index by 1, we are finished with this line
            index += 1
            ## go to the next line of text
            continue
        ## now we have successfully skipped the header information
        ## split the line on white space with a function called .split().
        ## Again, Google is your friend for these things...
        ## http://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python
        line = line.split()
        ## the temperature is in the 5th column of the file, wind speed in the 6th,
        ## and wind speed standard deviation in the 9th. Remember though, data is indexed
        ## starting at 0!
        temperature = line[4]
        wind_speed = line[5]
        wind_stdev = line[8]
        ## fill in missing data with NaN
        ## NaN stands for Not a Number... this is useful in order to prevent things such as
        ## mean values reporting somewhere in the -900s when missing data is present.
        ## Each variable gets its own 'if' (not 'elif') so that more than one
        ## missing value on the same line is caught.
        if temperature == '-999':
            temperature = np.nan
        if wind_speed == '-999':
            wind_speed = np.nan
        if wind_stdev == '-999':
            wind_stdev = np.nan
        ## now we want to append our values to the empty lists we made, as well as convert
        ## the data values from strings to floating point numbers!
        tmpc.append(float(temperature))
        wspd.append(float(wind_speed))
        wdsd.append(float(wind_stdev))
        ## once we are done with the line, increment the index
        index += 1
    ## once we have looped through every line of data, return the lists we made!
    return tmpc, wspd, wdsd
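To try the parsing steps without a network connection, the same logic can be run on a few lines of text held in memory. The sketch below is an illustration only: the rows in `sample_lines` are invented (they are not real Mesonet observations), the helpers `to_float` and `parse_rows` are hypothetical names, and the column positions simply mirror the ones used in `data_read`.

```python
import numpy as np

# Made-up sample text: three placeholder header lines, then data rows.
# These values are invented for illustration, not real observations.
sample_lines = [
    'header line 1',
    'header line 2',
    'header line 3',
    'STN 1 0 60 25.0 5.0 4.8 180 1.0',
    'STN 1 5 61 -999 6.0 5.9 185 -999',
    'STN 1 10 62 27.0 -999 6.5 190 2.0',
]


def to_float(text, missing='-999'):
    """Convert one column to a float, mapping the missing flag to NaN."""
    return np.nan if text == missing else float(text)


def parse_rows(lines, n_header=3):
    """Return three lists built from columns 4, 5 and 8 of each data row."""
    tmpc, wspd, wdsd = [], [], []
    # slicing skips the header lines without needing a counter
    for line in lines[n_header:]:
        cols = line.split()
        # each column is checked on its own, so one missing value
        # does not hide another one on the same row
        tmpc.append(to_float(cols[4]))
        wspd.append(to_float(cols[5]))
        wdsd.append(to_float(cols[8]))
    return tmpc, wspd, wdsd


tmpc, wspd, wdsd = parse_rows(sample_lines)
print(tmpc)
# nanmean ignores the NaN entries; a plain np.mean would return NaN
print(np.nanmean(tmpc))
```

The second data row has two missing values, which shows why each column needs its own check. Using NaN also means that `np.nanmean` averages only the valid temperatures instead of being dragged toward -999.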
In [40]:
## this is an empty dictionary
data_dict = {}
stations = ['elre', 'nrmn', 'arne']
for station in stations:
    ## data_read returns (tmpc, wspd, wdsd); [0] keeps only the temperature list
    data_dict[station] = data_read(station)[0]
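Because `data_read` returns three lists together as a tuple, the `[0]` in the cell above keeps only the first one (the temperatures). Here is a small sketch of the alternatives, using a stand-in function so that it runs without downloading anything. The name `fake_read` and its numbers are made up for this illustration.

```python
def fake_read(station):
    """Stand-in for data_read: returns three short made-up lists."""
    tmpc = [20.0, 21.0]
    wspd = [3.0, 4.0]
    wdsd = [0.5, 0.6]
    return tmpc, wspd, wdsd


# indexing the returned tuple keeps a single list
temps_only = fake_read('nrmn')[0]

# unpacking gives every list its own name
tmpc, wspd, wdsd = fake_read('nrmn')

# a dictionary can also hold the whole tuple for each station
all_data = {}
for station in ['elre', 'nrmn', 'arne']:
    all_data[station] = fake_read(station)

print(temps_only)
print(all_data['elre'][1])  # the wind speed list for 'elre'
```

Storing the whole tuple means each file is only downloaded once, even if you later decide to plot wind speed as well as temperature.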
In [44]:
for station in stations:
    temp = data_dict[station]
    plt.plot(temp, 'r-')
plt.show()