Importing data from LOBO-Buoy server


Objectives:

* Download data from LOBO server
* (if needed) transform data into a format suitable for plotting
* Plot time vs variable (e.g. temperature)
* Turn your code into a function for future (easier) use
* Add docstrings and other relevant comments and documentation

Other relevant information:

time to work...


Getting the data

  • From the "LOBOviz" webpage, select a variable (e.g. temperature) and a date range during the summer.
  • Click "Plot data"
  • Right-click on the "Download tab separated values" button and select "Copy link address"
  • In Python (e.g. in a Notebook or Spyder), paste the URL into a variable

In [55]:
# URL querying the LOBO server for data (in this case, temperature data)
URL = 'http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date=20090610&max_date=20090706&y=temperature'

NOTE: You can see in the URL that the dates are included (i.e. min_date=20090610 and max_date=20090706) as well as the variable to query (i.e. y=temperature).
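
Since the dates and the variable are just parameters in the query string, the URL can also be assembled from its parts instead of being typed by hand. The sketch below uses the standard library's urllib.parse (Python 3). The helper name build_lobo_url is hypothetical, and only the three parameters visible in the URL above are assumed to exist.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = 'http://lobo.satlantic.com/cgi-data/nph-data.cgi'

def build_lobo_url(min_date, max_date, variable):
    """Assemble a LOBO query URL (hypothetical helper, for illustration only)."""
    query = urlencode({'min_date': min_date, 'max_date': max_date, 'y': variable})
    return BASE_URL + '?' + query

# Rebuild the URL used in this notebook
url = build_lobo_url('20090610', '20090706', 'temperature')
print(url)

# Going the other way: split an existing URL back into its parameters
params = parse_qs(urlparse(url).query)
print(params['min_date'], params['max_date'], params['y'])
```

Building the query from a dictionary keeps the dates and the variable name in one place, which makes them easier to change later.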

Now let's use "pandas" to actually download the data...


In [56]:
import pandas as pd

# Import data from LOBO server
data = pd.read_csv(URL,sep='\t')

Exploring the data

Let's see what is in "data"...


In [57]:
data[:10]


Out[57]:
date [AST] temperature [C]
0 2009-06-10 00:00:00 10.26
1 2009-06-10 01:00:00 10.29
2 2009-06-10 02:00:00 10.19
3 2009-06-10 03:00:00 10.12
4 2009-06-10 04:00:00 10.08
5 2009-06-10 05:00:00 10.05
6 2009-06-10 06:00:00 9.86
7 2009-06-10 07:00:00 9.63
8 2009-06-10 08:00:00 9.77
9 2009-06-10 09:00:00 9.66
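
To try the same read_csv call without contacting the server, it can be run on a small in-memory sample. This is only a sketch: it assumes the server's response is a header line followed by tab-separated rows, as the output above suggests, and the sample text is typed in by hand from that output rather than downloaded.

```python
import io
import pandas as pd

# Sample text mimicking the server response (values copied from the output above)
sample = (
    "date [AST]\ttemperature [C]\n"
    "2009-06-10 00:00:00\t10.26\n"
    "2009-06-10 01:00:00\t10.29\n"
    "2009-06-10 02:00:00\t10.19\n"
)

# read_csv accepts a file-like object as well as a URL
sample_data = pd.read_csv(io.StringIO(sample), sep='\t')

print(sample_data.shape)          # (number of rows, number of columns)
print(list(sample_data.columns))  # column titles taken from the header line
```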

Now let's dissect "data" a bit... let's find the titles (or "keys") of its columns...


In [58]:
data.keys()


Out[58]:
Index([u'date [AST]', u'temperature [C]'], dtype='object')

Let's see what is in the temperature column...


In [59]:
data['temperature [C]'][:10]


Out[59]:
0    10.26
1    10.29
2    10.19
3    10.12
4    10.08
5    10.05
6     9.86
7     9.63
8     9.77
9     9.66
Name: temperature [C], dtype: float64

...and in the "date" column...


In [60]:
data['date [AST]'][:10]


Out[60]:
0    2009-06-10 00:00:00
1    2009-06-10 01:00:00
2    2009-06-10 02:00:00
3    2009-06-10 03:00:00
4    2009-06-10 04:00:00
5    2009-06-10 05:00:00
6    2009-06-10 06:00:00
7    2009-06-10 07:00:00
8    2009-06-10 08:00:00
9    2009-06-10 09:00:00
Name: date [AST], dtype: object
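
The "dtype: object" in the output above indicates that the dates were read as plain text rather than as timestamps. As a minimal sketch (using two hand-typed values instead of the downloaded column), pd.to_datetime converts such text into real datetime values, which then expose date parts such as the hour:

```python
import pandas as pd

# Two date strings in the same layout as the "date [AST]" column
date_strings = pd.Series(['2009-06-10 00:00:00', '2009-06-10 01:00:00'])

# Convert the text to timestamps
dates = pd.to_datetime(date_strings)

print(dates.dtype)             # a datetime64 dtype instead of plain text
print(dates.dt.hour.tolist())  # the hour of each timestamp
```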

...and the indices


In [61]:
data.index


Out[61]:
RangeIndex(start=0, stop=647, step=1)

Plotting

OK! Now let's start with a quick plot...

...but first a quick trick. You need to execute %matplotlib inline so that your plots appear in the Notebook.


In [62]:
%matplotlib inline

OK, now let's do the plot...


In [63]:
#import matplotlib.pyplot as plt

# ...just a quick plot
data.plot();


If we change the index from plain integers to a "DatetimeIndex", the x-axis of the plot will show dates instead of row numbers!


In [64]:
# Change indices to DatetimeIndex objects
data.index = pd.DatetimeIndex(data['date [AST]'])

# Now that we made "date indices" we can drop the "date" column
data = data.drop('date [AST]',axis=1)

Let's see what happened...


In [65]:
data.index


Out[65]:
DatetimeIndex(['2009-06-10 00:00:00', '2009-06-10 01:00:00',
               '2009-06-10 02:00:00', '2009-06-10 03:00:00',
               '2009-06-10 04:00:00', '2009-06-10 05:00:00',
               '2009-06-10 06:00:00', '2009-06-10 07:00:00',
               '2009-06-10 08:00:00', '2009-06-10 09:00:00',
               ...
               '2009-07-06 14:00:00', '2009-07-06 15:00:00',
               '2009-07-06 16:00:00', '2009-07-06 17:00:00',
               '2009-07-06 18:00:00', '2009-07-06 19:00:00',
               '2009-07-06 20:00:00', '2009-07-06 21:00:00',
               '2009-07-06 22:00:00', '2009-07-06 23:00:00'],
              dtype='datetime64[ns]', length=647, freq=None)
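
Besides nicer plots, a DatetimeIndex also makes it possible to select and summarize rows by date. A minimal sketch on a toy DataFrame (the four temperature values are made up for illustration and are not LOBO data):

```python
import pandas as pd

# Toy data: two readings per day over two days (values are made up)
toy = pd.DataFrame(
    {'temperature [C]': [10.0, 11.0, 12.0, 13.0]},
    index=pd.DatetimeIndex(['2009-06-10 00:00:00', '2009-06-10 12:00:00',
                            '2009-06-11 00:00:00', '2009-06-11 12:00:00']),
)

# Select every row that falls on a given day using a date string
june_11 = toy.loc['2009-06-11']
print(june_11)

# Average the readings of each calendar day
daily_mean = toy.resample('D').mean()
print(daily_mean)
```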

Let's plot again...


In [66]:
data.plot()


Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0xa2e9ef0>

Let's make it prettier!


Putting it all together

OK. Now it is time to put it all together... Note that I replaced the 'min_date' and 'max_date' values in the URL with a couple of variables, so that we can easily change the dates.


In [67]:
import pandas as pd
import matplotlib.pyplot as plt

# Start and End dates
mindate = '20090610'
maxdate = '20090706'

# URL querying the LOBO server for data (in this case, temperature data)
URL = 'http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date='+mindate+'&max_date='+maxdate+'&y=temperature'

# Import data from LOBO server
data = pd.read_csv(URL,sep='\t')

# Change indices to DatetimeIndex objects
data.index = pd.DatetimeIndex(data['date [AST]'])

# Now that we made "date indices" we can drop the "date" column
data = data.drop('date [AST]',axis=1)

# ...a bit fancier plot
data.plot(style='-r',legend=False)
plt.title('Temperature from LOBO (Halifax, Canada)')
plt.ylabel('Temperature (oC)')
plt.xlabel('Dates')


Out[67]:
<matplotlib.text.Text at 0x9752f28>

Making a function

Now it is time to turn our code into a function. Note that:

  • At the beginning, add: def NameOfFunction(arguments):
  • At the end, add: return Output
  • Indent the contents
  • NOTE that the arguments mindate and maxdate are inserted in the URL string!!!

In [68]:
def load_temp(mindate,maxdate): 
    import pandas as pd
    import matplotlib.pyplot as plt

    # URL querying the LOBO server for data (in this case, temperature data)
    URL = 'http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date='+mindate+'&max_date='+maxdate+'&y=temperature'

    # Import data from LOBO server
    data = pd.read_csv(URL,sep='\t')

    # Change indices to DatetimeIndex objects
    data.index = pd.DatetimeIndex(data['date [AST]'])
    
    # Now that we made "date indices" we can drop the "date" column
    data = data.drop('date [AST]',axis=1)

    # ...a bit fancier plot
    data.plot(style='-r',legend=False)
    plt.title('Temperature from LOBO (Halifax, Canada)')
    plt.ylabel('Temperature (oC)')
    plt.xlabel('Dates')
    plt.show()
    return data

Let's take our new function for a spin...


In [73]:
mydata = load_temp('20090616','20101017')

print(mydata[:20])


                     temperature [C]
2009-06-16 00:00:00            11.61
2009-06-16 01:00:00            11.59
2009-06-16 02:00:00            11.55
2009-06-16 03:00:00            11.30
2009-06-16 04:00:00            11.55
2009-06-16 05:00:00            11.57
2009-06-16 06:00:00            11.25
2009-06-16 07:00:00            11.63
2009-06-16 08:00:00            12.10
2009-06-16 09:00:00            12.06
2009-06-16 10:00:00            12.42
2009-06-16 11:00:00            12.69
2009-06-16 12:00:00            12.89
2009-06-16 13:00:00            13.02
2009-06-16 14:00:00            13.10
2009-06-16 15:00:00            13.07
2009-06-16 16:00:00            13.07
2009-06-16 17:00:00            12.98
2009-06-16 18:00:00            13.09
2009-06-16 19:00:00            12.83

Some final touch-ups, like documentation and a keyword argument to prevent plotting...


In [70]:
def load_temp(mindate,maxdate,plot=True):
    '''
    Downloads data from LOBO and makes a plot.
    
    Arguments:
        mindate (string): Start date
        maxdate (string): End date
        plot (boolean): If True, it produces a plot (default=True)
    
    Returns: 
        DataFrame and a plot (if `plot` argument = True)
    '''
    import pandas as pd
    import matplotlib.pyplot as plt

    # URL querying the LOBO server for data (in this case, temperature data)
    URL = 'http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date='+mindate+'&max_date='+maxdate+'&y=temperature'

    # Import data from LOBO server
    data = pd.read_csv(URL,sep='\t')

    # Change indices to DatetimeIndex objects
    data.index = pd.DatetimeIndex(data['date [AST]'])

    # Now that we made "date indices" we can drop the "date" column
    data = data.drop('date [AST]',axis=1)
    
    if plot:
        # ...a bit fancier plot
        data.plot(style='-r',legend=False)
        plt.title('Temperature from LOBO (Halifax, Canada)')
        plt.ylabel('Temperature (oC)')
        plt.xlabel('Dates')
        plt.show()
        
    return data
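
The docstring only says that mindate and maxdate are strings. Judging from the URLs used throughout this notebook, they follow a YYYYMMDD pattern. The sketch below shows two small helpers for producing and checking such strings. Both names (to_lobo_date and is_lobo_date) are hypothetical and are not part of load_temp.

```python
from datetime import date, datetime

def to_lobo_date(d):
    """Format a date object as a 'YYYYMMDD' string (hypothetical helper)."""
    return d.strftime('%Y%m%d')

def is_lobo_date(text):
    """Return True if text is an 8-digit string that parses as a valid date."""
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, '%Y%m%d')
    except ValueError:
        return False
    return True

print(to_lobo_date(date(2009, 6, 10)))
print(is_lobo_date('20090610'))    # well-formed date
print(is_lobo_date('2009-06-10'))  # wrong layout (contains dashes)
```

A check like is_lobo_date could be called at the top of a function such as load_temp to reject a badly formatted date before any request is sent.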

Let's check that we can see the documentation...


In [71]:
load_temp?
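
The trailing "?" is a shortcut that works in IPython and the Notebook. In a plain Python script or console, the same text is available through the built-in help() function or the function's __doc__ attribute. A self-contained sketch using a stand-in function (demo is hypothetical and is not the real load_temp):

```python
import inspect

def demo(mindate, maxdate, plot=True):
    '''
    Stand-in function used only to show how docstrings are retrieved.
    '''
    return None

# Raw docstring, exactly as written (including indentation)
print(demo.__doc__)

# Cleaned-up docstring with the indentation removed
print(inspect.getdoc(demo))
```

Calling help(demo) prints the same documentation together with the function signature.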

One last spin without making a plot:


In [72]:
mydata = load_temp('20090610','20090706',plot=False)

print(mydata[:10])


                     temperature [C]
2009-06-10 00:00:00            10.26
2009-06-10 01:00:00            10.29
2009-06-10 02:00:00            10.19
2009-06-10 03:00:00            10.12
2009-06-10 04:00:00            10.08
2009-06-10 05:00:00            10.05
2009-06-10 06:00:00             9.86
2009-06-10 07:00:00             9.63
2009-06-10 08:00:00             9.77
2009-06-10 09:00:00             9.66

One last thing: create a new file (in this directory) called lobo_YourName.py and copy-paste your new function into it, so we can use it later.
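
Once a function lives in a .py file, it can be imported by the file's name (without the ".py"). The sketch below demonstrates the mechanism end to end with a stand-in module: the file name lobo_example.py and the function greet are hypothetical, and the file is written to a temporary directory so that nothing in this directory is touched.

```python
import importlib
import os
import sys
import tempfile

# Stand-in module body; in practice the file would hold your load_temp function
module_source = (
    "def greet(name):\n"
    "    '''Return a short greeting (stand-in for load_temp).'''\n"
    "    return 'Hello, ' + name\n"
)

# Write the module to a temporary directory
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'lobo_example.py'), 'w') as handle:
    handle.write(module_source)

# Make that directory importable, then import the module by its file name
sys.path.insert(0, workdir)
lobo_example = importlib.import_module('lobo_example')

print(lobo_example.greet('LOBO'))
```

When the file sits in the same directory as the notebook, the sys.path step is normally unnecessary and a plain import statement naming the module is enough.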