Here's a Python notebook demonstrating how to read in and plot an ROI (Region of Interest) summary. In this case I'm using the 1-day summary file from the alligatorriver site. The summary files are in CSV format and can be read directly from the site via a URL, but before reading from a URL let's make sure we can read a local copy of the file.
In [1]:
%matplotlib inline
import os, sys
import numpy as np
import matplotlib
import pandas as pd
import requests
from io import StringIO
# set matplotlib style
matplotlib.style.use('ggplot')
sitename = 'alligatorriver'
roiname = 'DB_0001'
infile = "{}_{}_1day.csv".format(sitename, roiname)
print(infile)
In [2]:
%%bash
head -30 alligatorriver_DB_0001_1day.csv
While the data can be read directly from a URL, we'll start with the simpler approach of reading the CSV file from our local disk.
In [3]:
with open(infile, 'r') as fd:
    df = pd.read_csv(fd, comment='#', parse_dates=[0])
df.head()
Out[3]:
In [4]:
df.plot('date', ['gcc_90'], figsize=(16, 4),
        grid=True, style=['g'])
Out[4]:
That was pretty simple. Now let's try reading directly from a URL to see if we get the same result. This has the advantage that you always get the latest version of the file, which is updated nightly.
In [5]:
url = "https://phenocam.sr.unh.edu/data/archive/{}/ROI/{}_{}_1day.csv"
url = url.format(sitename, sitename, roiname)
print(url)
Use the requests package to read the CSV file from the URL.
In [6]:
from io import StringIO  # Python 3 replacement for the old StringIO module

response = requests.get(url)
with StringIO(response.text) as fd:
    df = pd.read_csv(fd, comment='#', parse_dates=[0])
df[0:5]
Out[6]:
Next we convert the nodata values (-9999) to NaN.
In [7]:
df.loc[df['gcc_90'] == -9999., 'gcc_90'] = np.nan
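Alternatively (a sketch, not in the original notebook), read_csv can do this conversion at read time via its na_values argument, using a small inline CSV with made-up values in place of the real summary file:

```python
from io import StringIO

import numpy as np
import pandas as pd

# small inline CSV standing in for the real summary file (made-up values)
csv_text = "date,gcc_90\n2013-01-01,-9999\n2013-01-02,0.35\n"

# na_values tells read_csv to treat -9999 as missing on input,
# so no separate replacement step is needed afterwards
df_alt = pd.read_csv(StringIO(csv_text), parse_dates=[0], na_values=[-9999])
```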
In [8]:
df.plot('date', ['gcc_90'], figsize=(16, 4),
        grid=True, style=['g'])
Out[8]:
We can look at other columns and also filter the data in a variety of ways. Recently we had a site where the number of images varied a lot over time. Let's look at how consistent the number of images is for the alligator river site. The image_count reflects our brightness threshold, which eliminates images in the winter when the days are shorter. But there are a number of other ways the image count can be reduced. The ability to reliably extract a 90th percentile value depends on the number of images available for a particular summary period.
In [9]:
df.plot('date','image_count', figsize=(16,4), style='b')
Out[9]:
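To put a number on that consistency, one option (a sketch using made-up data in place of the real summary file) is to summarize image_count by calendar year:

```python
import numpy as np
import pandas as pd

# made-up frame standing in for the real 1-day summary
rng = np.random.RandomState(0)
df_demo = pd.DataFrame({
    "date": pd.date_range("2012-07-01", periods=400, freq="D"),
    "image_count": rng.randint(0, 30, 400),
})

# group by calendar year and look at the spread of daily image counts
yearly = df_demo.groupby(df_demo["date"].dt.year)["image_count"].agg(
    ["mean", "min", "max"])
print(yearly)
```

Years with a low mean or a minimum near zero would be candidates for closer inspection.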
One possibility would be to filter the data, keeping only summary periods with at least 10 images.
In [10]:
df10 = df[df['image_count'] >= 10]
df10.plot('date', ['gcc_90'], figsize=(16, 4),
          grid=True, style=['g'])
Out[10]:
This looks a little cleaner, especially for the 2013 season.
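To see the effect of the filter directly, the raw and filtered series can be overlaid on one axis. This is a sketch with made-up data; the Agg backend line is an assumption for running outside the notebook, where %matplotlib inline already handles rendering:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside the notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up stand-in for the real summary frame
rng = np.random.RandomState(1)
df_demo = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=200, freq="D"),
    "gcc_90": rng.uniform(0.30, 0.45, 200),
    "image_count": rng.randint(0, 30, 200),
})

# plot everything, then overlay only the well-sampled periods
fig, ax = plt.subplots(figsize=(16, 4))
df_demo.plot("date", "gcc_90", ax=ax, style="r", label="all periods")
df_demo[df_demo["image_count"] >= 10].plot(
    "date", "gcc_90", ax=ax, style="g", label="image_count >= 10")
ax.legend()
```

Passing the same ax to both plot calls puts the two series on shared axes, which makes the gaps introduced by the filter easy to spot.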