Investigating Right Whale Sighting data in Cape Cod Bay

Scott Lindell at MBL was working on aquaculture siting in Cape Cod Bay, and wanted to know when and where endangered Right Whales were found in shallow water.


In [26]:
import pandas as pd
import netCDF4
from mpl_toolkits.basemap import interp

Scott's data was in a simple spreadsheet output, so we saved to CSV and read it into Pandas


In [27]:
df = pd.read_csv('/usgs/data2/notebook/right_whale.csv',index_col='DATE',parse_dates=True)

In [28]:
df.head(4)


Out[28]:
Species Code SPECIES TOTAL Latitude Longitude
DATE
1998-12-13 RIWH 1 42.05167 -70.15833
1998-12-13 RIWH 1 42.05167 -70.14833
1998-12-13 RIWH 1 41.86000 -70.26334
1998-12-13 RIWH 2 41.79167 -70.35333

To see what depths the whales were in, get a high-resolution bathymetry dataset for the Gulf of Maine. We will get just the bathymetric grid we need by determining the bounding box of the whale data, and then using that to extract the data via OPeNDAP. We access OPeNDAP data using NetCDF4-Python, which makes OPeNDAP datasets look like a local NetCDF file.


In [29]:
bathy_url='http://geoport.whoi.edu/thredds/dodsC/bathy/gom03_v1_0'
nc = netCDF4.Dataset(bathy_url).variables
lon=nc['lon'][:]
lat=nc['lat'][:]
bi=(lon>=df['Longitude'].min())&(lon<=df['Longitude'].max())
bj=(lat>=df['Latitude'].min())&(lat<=df['Latitude'].max())
z=nc['topo'][bj,bi]
lon=lon[bi]
lat=lat[bj]
print shape(z)


(368, 722)

Get whale depths by interpolating from bathymetric grid


In [30]:
df['Depth']=interp(-z,lon,lat,df['Longitude'],df['Latitude'])

In [31]:
df.head()


Out[31]:
Species Code SPECIES TOTAL Latitude Longitude Depth
DATE
1998-12-13 RIWH 1 42.05167 -70.15833 6.740654
1998-12-13 RIWH 1 42.05167 -70.14833 3.542641
1998-12-13 RIWH 1 41.86000 -70.26334 28.271041
1998-12-13 RIWH 2 41.79167 -70.35333 23.508510
1999-01-06 RIWH 1 41.83333 -70.15833 6.958285

At what water depths are whales usually found in Cape Cod Bay?


In [32]:
hist(df['Depth']);



In [33]:
# create dataframe of shallow sightings
ishallow = where(df['Depth'] < 10.0)[0]
df_shallow = df.ix[ishallow]
df_shallow.head(10)


Out[33]:
Species Code SPECIES TOTAL Latitude Longitude Depth
DATE
1998-12-13 RIWH 1 42.05167 -70.15833 6.740654
1998-12-13 RIWH 1 42.05167 -70.14833 3.542641
1999-01-06 RIWH 1 41.83333 -70.15833 6.958285
1999-01-21 RIWH 1 41.83333 -70.16167 7.666747
1999-01-21 RIWH 2 41.83333 -70.16167 7.666747
1999-01-26 RIWH 1 41.75167 -70.23000 9.675799
1999-02-01 RIWH 1 41.83167 -70.10500 9.478573
1999-02-17 RIWH 1 41.83333 -70.16666 8.686150
2007-05-13 RIWH 2 41.80426 -70.07568 9.200651
2007-04-25 RIWH 1 42.02145 -70.19609 1.540242

In [34]:
figure(figsize=(12,12))
z = ma.masked_where(z>0,z)
subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
pcolormesh(lon,lat,z,vmax=0)
plot(df_shallow.Longitude,df_shallow.Latitude,'ko');
plot(df_shallow.Longitude,df_shallow.Latitude,'w+');
axis([-70.6, -70.0, 41.75, 42.06])
grid()



In [35]:
# determine stats by year (or month, quarter, etc)
#df_shallow_stats = df_shallow.resample('6M', how='mean')   # monthly means
df_shallow_stats = df_shallow.resample('A-AUG', how='sum')   # annual sum, ending in august

In [36]:
df_shallow_stats['SPECIES TOTAL']


Out[36]:
DATE
1998-08-31     9
1999-08-31     9
2000-08-31     4
2001-08-31     3
2002-08-31   NaN
2003-08-31   NaN
2004-08-31     8
2005-08-31     1
2006-08-31   NaN
2007-08-31     3
2008-08-31     6
2009-08-31     4
2010-08-31     5
2011-08-31    25
Freq: A-AUG, Name: SPECIES TOTAL, dtype: float64

In [37]:
# any significant trend in time?
df_shallow_stats['SPECIES TOTAL'].plot(kind='bar',figsize=(12,5))


Out[37]:
<matplotlib.axes.AxesSubplot at 0x4900f10>

So how many whales on average would we expect to see in shallow water in specific months? We can use the Pandas groupby method to group by month and take the mean


In [64]:
# average number of sightings for each month
df_monthly_sum = df_shallow.groupby(lambda x: x.month).mean()

In [67]:
df_monthly_sum.head(12)


Out[67]:
SPECIES TOTAL Latitude Longitude Depth
1 1.875 41.917880 -70.189633 6.414131
2 1.000 41.926982 -70.141615 4.845008
3 1.800 41.952314 -70.153653 6.878730
4 2.000 41.934977 -70.191988 6.019028
5 1.750 41.925157 -70.116205 7.168017
12 1.000 42.051670 -70.153330 5.141647

In [68]:
# plot 6 frame panel of sightings in each month
iframe=0
figure(figsize=(20,10))
grouped = df_shallow.groupby(lambda x: x.month)
for month,group in grouped:
    iframe = iframe + 1
    subplot(2,3,iframe,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)



In [42]:
for month,group in grouped:
    group.to_csv('/usgs/data2/notebook/%s.csv' % month)

In [43]:
# plot each month separately
iframe=0
for month,group in grouped:
    iframe = iframe + 1
    #subplot(2,3,iframe,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    figure(iframe,figsize=(12,12))
    subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis([-70.6, -70.0, 41.75, 42.06])
    #axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)



In [55]:
df_shallow_stats[['SPECIES TOTAL','Depth']].describe()


Out[55]:
SPECIES TOTAL Depth
count 11.000000 11.000000
mean 7.000000 25.700881
std 6.511528 19.597841
min 1.000000 -0.481648
25% 3.500000 12.481057
50% 5.000000 20.128451
75% 8.500000 29.539113
max 25.000000 60.851178

In [45]:


In [46]:
df_shallow.index


Out[46]:
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-12-13 00:00:00, ..., 2001-03-17 00:00:00]
Length: 45, Freq: None, Timezone: None

In [47]:
type(df_shallow.index)


Out[47]:
pandas.tseries.index.DatetimeIndex

In [ ]: