PhenoCam ROI Summary Files

Here's a Python notebook demonstrating how to read in and plot an ROI (Region of Interest) summary file. In this case I'm using the 1-day summary file from the alligatorriver site. The summary files are in CSV format and can be read directly from the site using a URL. Before reading from a URL, let's make sure we can read directly from a local file.


In [1]:
%matplotlib inline

import numpy as np
import matplotlib
import pandas as pd
import requests
from io import StringIO

# set the matplotlib style
matplotlib.style.use('ggplot')

sitename = 'alligatorriver'
roiname = 'DB_0001'
infile = "{}_{}_1day.csv".format(sitename, roiname)
print(infile)


alligatorriver_DB_0001_1day.csv

In [2]:
%%bash
head -30 alligatorriver_DB_0001_1day.csv


#
# 1-day summary product time series for alligatorriver
#
# Site: alligatorriver
# Veg Type: DB
# ROI ID Number: 0001
# Lat: 35.7879
# Lon: -75.9038
# Elev: 1
# UTC Offset: -5
# Image Count Threshold: 1
# Aggregation Period: 1
# Solar Elevation Min: 5.0
# Time of Day Min: 00:00:00
# Time of Day Max: 23:59:59
# ROI Brightness Min: 100
# ROI Brightness Max: 665
# Creation Date: 2016-10-13
# Creation Time: 11:12:18
# Update Date: 2016-10-13
# Update Time: 11:12:21
#
date,year,doy,image_count,midday_filename,midday_r,midday_g,midday_b,midday_gcc,midday_rcc,r_mean,r_std,g_mean,g_std,b_mean,b_std,gcc_mean,gcc_std,gcc_50,gcc_75,gcc_90,rcc_mean,rcc_std,rcc_50,rcc_75,rcc_90,max_solar_elev,snow_flag,outlierflag_gcc_mean,outlierflag_gcc_50,outlierflag_gcc_75,outlierflag_gcc_90
2012-05-03,2012,124,2,alligatorriver_2012_05_03_120110.jpg,106.30031,115.73730,55.34694,0.41724,0.38322,106.39446,0.09415,115.55613,0.18118,55.95246,0.60553,0.41581,0.00143,0.41581,0.41653,0.41696,0.38285,0.00038,0.38285,0.38304,0.38315,70.15241,NA,NA,NA,NA,NA
2012-05-04,2012,125,0,None,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
2012-05-05,2012,126,0,None,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
2012-05-06,2012,127,18,alligatorriver_2012_05_06_120108.jpg,98.56315,114.36571,58.60785,0.42118,0.36298,99.56063,3.59571,112.30546,2.26816,57.38592,3.95619,0.41716,0.00405,0.41683,0.42104,0.42216,0.36983,0.01198,0.37316,0.37640,0.37951,71.00152,NA,NA,NA,NA,NA
2012-05-07,2012,128,22,alligatorriver_2012_05_07_120109.jpg,104.66830,114.42699,57.99294,0.41296,0.37774,101.61201,3.42694,109.60297,5.09935,56.93328,3.30283,0.40870,0.00760,0.41157,0.41206,0.41258,0.37906,0.00392,0.37819,0.38044,0.38412,71.27531,NA,NA,NA,NA,NA
2012-05-08,2012,129,19,alligatorriver_2012_05_08_120109.jpg,109.42668,117.97895,61.57828,0.40825,0.37866,103.80104,5.38061,111.59026,7.34960,58.74146,6.34041,0.40709,0.00670,0.40976,0.41134,0.41300,0.37905,0.00911,0.38013,0.38524,0.38747,71.54443,NA,NA,NA,NA,NA
2012-05-09,2012,130,21,alligatorriver_2012_05_09_120109.jpg,109.50769,117.67200,57.08726,0.41395,0.38523,100.64798,5.84903,112.13978,5.14872,57.89002,4.89616,0.41436,0.00287,0.41424,0.41590,0.41889,0.37191,0.01344,0.37685,0.38239,0.38647,71.80876,NA,NA,NA,NA,NA
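
The commented header carries the site metadata (location, thresholds, creation dates). If you want that metadata programmatically, here's a minimal sketch that assumes the '# Key: value' layout shown above:

# parse the '# Key: value' comment header into a dict
metadata = {}
with open(infile, 'r') as fd:
    for line in fd:
        if not line.startswith('#'):
            break  # past the header; the CSV data starts here
        line = line.strip('# \n')
        if ': ' in line:
            key, value = line.split(': ', 1)
            metadata[key] = value

print(metadata['Lat'], metadata['Lon'])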

While the data can be read directly from a URL, we'll start with the simple case of reading the CSV file from local disk.


In [3]:
with open(infile,'r') as fd:
    df = pd.read_csv(fd, comment='#', parse_dates=[0])

df.head()


Out[3]:
date year doy image_count midday_filename midday_r midday_g midday_b midday_gcc midday_rcc ... rcc_std rcc_50 rcc_75 rcc_90 max_solar_elev snow_flag outlierflag_gcc_mean outlierflag_gcc_50 outlierflag_gcc_75 outlierflag_gcc_90
0 2012-05-03 2012 124 2 alligatorriver_2012_05_03_120110.jpg 106.30031 115.73730 55.34694 0.41724 0.38322 ... 0.00038 0.38285 0.38304 0.38315 70.15241 NaN NaN NaN NaN NaN
1 2012-05-04 2012 125 0 None NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2012-05-05 2012 126 0 None NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2012-05-06 2012 127 18 alligatorriver_2012_05_06_120108.jpg 98.56315 114.36571 58.60785 0.42118 0.36298 ... 0.01198 0.37316 0.37640 0.37951 71.00152 NaN NaN NaN NaN NaN
4 2012-05-07 2012 128 22 alligatorriver_2012_05_07_120109.jpg 104.66830 114.42699 57.99294 0.41296 0.37774 ... 0.00392 0.37819 0.38044 0.38412 71.27531 NaN NaN NaN NaN NaN

5 rows × 32 columns


In [4]:
df.plot('date', ['gcc_90'], figsize=(16,4),
        grid=True, style=['g'] )


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x108dd6a10>

That was pretty simple. Now let's read directly from the URL and see if we get the same result. This has the advantage that you always get the latest version of the file, which is updated nightly.


In [5]:
url = "https://phenocam.sr.unh.edu/data/archive/{}/ROI/{}_{}_1day.csv"
url = url.format(sitename, sitename, roiname)
print(url)


https://phenocam.sr.unh.edu/data/archive/alligatorriver/ROI/alligatorriver_DB_0001_1day.csv

Use the requests package to read the CSV file from the URL.


In [6]:
response = requests.get(url)
fd = StringIO(response.text)
df = pd.read_csv(fd, comment='#', parse_dates=[0])
fd.close()
df[0:5]


Out[6]:
date year doy image_count midday_filename midday_r midday_g midday_b midday_gcc midday_rcc ... rcc_std rcc_50 rcc_75 rcc_90 max_solar_elev snow_flag outlierflag_gcc_mean outlierflag_gcc_50 outlierflag_gcc_75 outlierflag_gcc_90
0 2012-05-03 2012 124 2 alligatorriver_2012_05_03_120110.jpg 106.30031 115.73730 55.34694 0.41724 0.38322 ... 0.00038 0.38285 0.38304 0.38315 70.15241 NaN NaN NaN NaN NaN
1 2012-05-04 2012 125 0 None NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2012-05-05 2012 126 0 None NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2012-05-06 2012 127 17 alligatorriver_2012_05_06_120108.jpg 98.56315 114.36571 58.60785 0.42118 0.36298 ... 0.00954 0.37324 0.37641 0.37958 71.00152 NaN NaN NaN NaN NaN
4 2012-05-07 2012 128 21 alligatorriver_2012_05_07_120109.jpg 104.66830 114.42699 57.99294 0.41296 0.37774 ... 0.00329 0.37774 0.38016 0.38301 71.27531 NaN NaN NaN NaN NaN

5 rows × 32 columns
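
As an aside, recent versions of pandas can read a URL directly, so the explicit requests/StringIO step above is optional:

# recent pandas versions accept a URL directly
df = pd.read_csv(url, comment='#', parse_dates=[0])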

If present, nodata values (-9999) need to be converted to NaN.


In [7]:
df.loc[df['gcc_90'] == -9999., 'gcc_90'] = np.nan
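
Building on the direct-URL read above, the sentinel can instead be handled when the file is read; read_csv's na_values argument maps it to NaN in every column:

# equivalent fix at read time: treat the -9999 sentinel as NaN
df = pd.read_csv(url, comment='#', parse_dates=[0], na_values=[-9999])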

In [8]:
df.plot('date', ['gcc_90'], figsize=(16,4),
        grid=True, style=['g'] )


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x108f14c50>

We can look at other columns and also filter the data in a variety of ways. Recently we had a site where the number of images varied a lot over time. Let's look at how consistent the number of images is for the alligatorriver site. The image_count reflects our brightness threshold, which will eliminate images in the winter when the days are shorter, but there are a number of other ways the image count can be reduced. The ability to reliably extract a 90th percentile value depends on the number of images available for a particular summary period.


In [9]:
df.plot('date','image_count', figsize=(16,4), style='b')


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x108f92610>
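
Before committing to a cutoff, it's worth quantifying how many summary periods would be dropped. A quick check (the threshold of 10 here is just an example):

# count the summary periods that fall below a 10-image cutoff
low = (df['image_count'] < 10).sum()
print("{} of {} summary periods have fewer than 10 images".format(low, len(df)))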

One possibility would be to filter for summary periods that had at least 10 images.


In [10]:
df10 = df[df['image_count'] >= 10]
df10.plot('date', ['gcc_90'], figsize=(16,4),
        grid=True, style=['g'] )


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ad8bbd0>

This looks a little cleaner, especially for the 2013 season.
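
If you want an even smoother curve, one further option (a sketch, not part of the standard product) is a rolling mean over the filtered series:

# optional: centered rolling mean over 7 rows (~1 week at 1-day aggregation)
smooth = (df10.set_index('date')['gcc_90']
          .rolling(window=7, center=True, min_periods=3)
          .mean())
smooth.plot(figsize=(16,4), grid=True, style='g')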

