7-day subsampling on the OceanColor dataset



In [1]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm









    




Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the NetCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we access the values.



In [2]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [3]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB while the daily dataset is ~4.2 GB. Both fit easily in RAM, so let's load them into memory.
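As a sanity check, the reported sizes can be reproduced by hand: every value here occupies 8 bytes, so the size is 8 bytes times the total number of elements across the data variables and coordinates. The sketch below assumes the dimension sizes shown in the dataset summary printed after loading (a 276 x 360 grid with 667 and 5295 time steps); the helper name `dataset_megabytes` is made up for illustration.

```python
BYTES_PER_VALUE = 8  # float64, int64 and datetime64[ns] all use 8 bytes

def dataset_megabytes(n_time, n_lat=276, n_lon=360, n_rgb=3, n_color=256):
    """Estimate Dataset.nbytes / 1e6 from the dimension sizes alone."""
    chlor_a = n_time * n_lat * n_lon      # variable with dims (time, lat, lon)
    palette = n_time * n_rgb * n_color    # variable with dims (time, rgb, eightbitcolor)
    coords = n_time + n_lat + n_lon + n_rgb + n_color  # one 1-D coordinate per dimension
    return (chlor_a + palette + coords) * BYTES_PER_VALUE / 1e6

size_8day = dataset_megabytes(667)
size_daily = dataset_megabytes(5295)
print(size_8day, size_daily)
```

Almost all of the memory is taken by `chlor_a`; the palette and the coordinates contribute well under one percent.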



In [4]:

    
[ds.load() for ds in both_datasets]









    Out[4]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted. So I wrote a function to invert it back and correct the data.



In [5]:

    
def fix_bad_data(ds):
    # on some maps the cloud / land mask is inverted
    # these maps are easy to spot because they contain chlorophyll values below zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through and fix
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values 
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
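The same correction can be tried on a toy array without touching the real dataset. This is a minimal NumPy sketch, not the notebook's function: the name `mask_negative_frames` and the placeholder value -32767 are assumptions chosen for illustration.

```python
import numpy as np

def mask_negative_frames(frames):
    """Return a copy of a (time, lat, lon) array with negative values set to NaN."""
    fixed = frames.copy()
    # NaN < 0 evaluates to False, so frames that are entirely NaN are never flagged
    bad_frames = (fixed < 0).any(axis=(1, 2))
    for n in np.nonzero(bad_frames)[0]:
        frame = fixed[n]            # integer indexing gives a view into `fixed`
        frame[frame < 0] = np.nan   # so this assignment updates `fixed` in place
    return fixed

# two 2x2 maps; the second one carries a negative placeholder where the mask is wrong
toy = np.array([[[0.2, np.nan], [0.5, 1.0]],
                [[-32767.0, 0.3], [np.nan, -32767.0]]])
cleaned = mask_negative_frames(toy)
print(cleaned[1])
```

Working on a copy keeps the input untouched, which makes it easy to compare the maps before and after the fix.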



In [6]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[6]:





[None, None]



In [7]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[7]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately it doesn't come with the dataset. But we can infer it by counting all the points that have at least one non-nan chlorophyll value.
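The idea can be sketched on a toy stack of maps: a pixel is treated as ocean if it holds a valid value in at least one snapshot. The array below is invented for illustration. The notebook's cells test `chlor_a > 0` rather than finiteness, which should give the same mask once the negative values have been removed, assuming no valid value is exactly zero.

```python
import numpy as np

# toy stack of maps with shape (time, lat, lon); NaN marks land or cloud
stack = np.array([[[np.nan, 0.3], [np.nan, np.nan]],
                  [[np.nan, np.nan], [np.nan, 0.8]]])

# a pixel counts as ocean if it is valid in at least one snapshot
ocean_mask = np.isfinite(stack).any(axis=0)
num_ocean_points = int(ocean_mask.sum())
print(ocean_mask, num_ocean_points)
```

Here the left column never has data and is classified as land, while both pixels in the right column are classified as ocean.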



In [8]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[8]:





<matplotlib.collections.QuadMesh at 0x119490cc0>



In [9]:

    
# infer the ocean mask: pixels with at least one positive chlorophyll value over time
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean pixels
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[9]:





<matplotlib.text.Text at 0x109882470>



In [10]:

    
#ds_8day



In [11]:

    
#ds_daily



In [12]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[12]:





<matplotlib.collections.QuadMesh at 0x11b252518>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [13]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
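On a toy array the coverage computation looks as follows. This is a plain NumPy sketch with invented values; the notebook itself does the equivalent with xarray's `groupby('time').count()`.

```python
import numpy as np

# toy stack (time, lat, lon): the left column is land, other NaNs are clouds
stack = np.array([[[np.nan, 0.3], [np.nan, 0.4]],
                  [[np.nan, np.nan], [np.nan, 0.8]],
                  [[np.nan, np.nan], [np.nan, np.nan]]])

num_ocean_points = np.isfinite(stack).any(axis=0).sum()    # pixels valid at least once
valid_per_snapshot = np.isfinite(stack).sum(axis=(1, 2))   # non-NaN values in each map
coverage = valid_per_snapshot / float(num_ocean_points)    # fraction of ocean observed
print(coverage)
```

A coverage of 1.0 means every ocean pixel was observed in that snapshot, and 0.0 means the whole map was masked.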



In [14]:

    
'''
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
'''
ds_daily.groupby('time').count() # information from original data









    Out[14]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...



In [15]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[15]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [16]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]



In [17]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#coundt_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, coundt_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                            for ds in ds_8day, ds_daily] # not work in python 3



In [18]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[18]:





<matplotlib.legend.Legend at 0x11a195908>

Seasonal Climatology



In [19]:

    
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly data
                                     for count in (count_8day, count_daily)]



In [20]:

    
# monthly mean of the fraction of valid (non-NaN) ocean points
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[20]:





<matplotlib.legend.Legend at 0x11dc61978>

From the above figure, we see that data coverage is highest in the winter (especially February) and lowest in summer.
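A monthly climatology of this kind averages all Januaries together, all Februaries together, and so on. The pandas sketch below uses a made-up coverage series (0.6 in winter, 0.1 otherwise) purely to illustrate the grouping step; the numbers are not taken from the MODIS data.

```python
import numpy as np
import pandas as pd

# hypothetical daily coverage series covering two full years
days = pd.date_range('2003-01-01', '2004-12-31', freq='D')
is_winter = days.month.isin([12, 1, 2])
coverage = pd.Series(np.where(is_winter, 0.6, 0.1), index=days)

# group by calendar month (1..12) and average across years
climatology = coverage.groupby(coverage.index.month).mean()
print(climatology)
```

The result has one value per calendar month regardless of how many years the input spans.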

Maps of individual days

Let's grab some data from February and plot it.



In [21]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[21]:





<matplotlib.collections.QuadMesh at 0x11d932da0>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [22]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[22]:





<matplotlib.collections.QuadMesh at 0x11d7adb38>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [23]:

    
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # the time is selected!
#ds_daily.chl_ocx[0].sel_points(time= times, lon=lons, lat=times, method='nearest')









    Out[23]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lon      (points) float64 65.04 70.04
    lat      (points) float64 16.04 18.04
  * points   (points) int64 0 1



In [24]:

    
#ds_daily.chlor_a.sel_points?



In [25]:

    
ds_7day = ds_daily.resample('7D', dim='time')
ds_7day









    Out[25]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 757)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-11 2002-07-18 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
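The `resample('7D', dim='time')` call above uses an older xarray signature. To my knowledge, later xarray releases expect the keyword form followed by an explicit reduction. The sketch below shows that form on a made-up two-week series; it assumes a mean is the intended aggregation, which matches the default of the older call.

```python
import numpy as np
import pandas as pd
import xarray as xr

# two weeks of toy daily values starting on the dataset's first date
times = pd.date_range('2002-07-04', periods=14, freq='D')
values = np.arange(14, dtype=float)
values[3] = np.nan  # a cloudy day with no observation
daily = xr.DataArray(values, coords={'time': times}, dims='time', name='chlor_a')

# keyword form: dimension name = target frequency, then an explicit reduction
weekly = daily.resample(time='7D').mean()
print(weekly)
```

The mean skips NaN values, so a 7-day bin is only NaN when every day in it is missing. This is why the 7-day maps have better coverage than the daily ones.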



In [26]:

    
plt.figure(figsize=(8,6))
ds_7day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[26]:





<matplotlib.collections.QuadMesh at 0x112bd9dd8>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [27]:

    
# check the minimum longitude and latitude of the grid
print(ds_7day.lon.min(),'\n' ,ds_7day.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each (time, lon, lat) float observation to a chlorophyll value



In [28]:

    
# in the following we deal with the data from the gdp float
from buyodata import buoydata
import os



In [29]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [30]:

    
# read them and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # around 4~5 minutes

# chlor_a data starts on 2002-07-04, so keep only float records from that date onward
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # sanity check: should be 0, no earlier records remain









    Out[30]:





0



In [31]:

    
# shift negative longitudes into the 0-360 range
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
dfvvAll = dfvvAll.copy()  # work on an explicit copy so the assignment below targets this frame
mask = dfvvAll.lon<0
dfvvAll.loc[mask, 'lon'] = dfvvAll.loc[mask, 'lon'] + 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(),dfvvAll.lon.max()) )

dfvvAll.describe()



before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000






    Out[31]:






  
    
      
                 id           lat           lon          temp            ve            vn           spd       var_lat       var_lon       var_tmp
count  2.147732e+07  2.131997e+07  2.131997e+07  1.986179e+07  2.129142e+07  2.129142e+07  2.129142e+07  2.147732e+07  2.147732e+07  2.147732e+07
mean   1.765662e+06 -2.263128e+00  2.124412e+02  1.986121e+01  2.454172e-01  4.708192e-01  2.613427e+01  7.326258e+00  7.326555e+00  7.522298e+01
std    9.452835e+06  3.401115e+01  9.746941e+01  8.339498e+00  2.525050e+01  2.052160e+01  1.939087e+01  8.527853e+01  8.527851e+01  2.637454e+02
min    2.578000e+03 -7.764700e+01  0.000000e+00 -1.685000e+01 -2.916220e+02 -2.601400e+02  0.000000e+00  5.268300e-07 -3.941600e-02  1.001300e-03
25%    4.897500e+04 -3.186000e+01  1.490720e+02  1.437300e+01 -1.411400e+01 -1.044700e+01  1.290300e+01  4.366500e-06  7.512600e-06  1.435700e-03
50%    7.141300e+04 -4.920000e+00  2.153940e+02  2.214400e+01 -5.560000e-01  1.970000e-01  2.176700e+01  8.833600e-06  1.495800e-05  1.691700e-03
75%    1.094330e+05  2.756000e+01  3.064370e+02  2.688900e+01  1.356100e+01  1.109300e+01  3.405900e+01  1.833300e-05  3.627900e-05  2.294200e-03
max    6.399288e+07  8.989900e+01  3.600000e+02  4.595000e+01  4.417070e+02  2.783220e+02  4.421750e+02  1.000000e+03  1.000000e+03  1.000000e+03



In [32]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)
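The longitude wrap and the regional selection can be tried on a small frame first. This is a pandas sketch with invented float positions; the variable names are illustrative and the box limits are the ones used in the cell above.

```python
import pandas as pd

# hypothetical float positions with longitudes in the -180..180 convention
floats = pd.DataFrame({'id':  [1, 2, 3, 4],
                       'lon': [-120.0, 60.5, 70.0, 80.0],
                       'lat': [10.0, 15.0, 30.0, 20.0]})

# wrap negative longitudes into the 0..360 convention used by the chlorophyll grid
floats['lon'] = floats['lon'] % 360

# boolean mask for the Arabian Sea box: 45 < lon < 75 and 5 < lat < 28
in_box = ((floats['lon'] > 45) & (floats['lon'] < 75) &
          (floats['lat'] > 5) & (floats['lat'] < 28))
arabian = floats.loc[in_box].copy()  # explicit copy, safe to modify later
print(arabian)
```

Taking an explicit `.copy()` of a filtered frame is one common way to avoid pandas' SettingWithCopyWarning when the subset is modified afterwards.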



In [33]:

    
# avoid running this cell repeatedly
# visualize the floats over the whole globe
fig, ax  = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[33]:





<matplotlib.axes._subplots.AxesSubplot at 0x489be7908>



In [34]:

    
# a pandas DataFrame cannot resample this data directly,
# because we are really indexing on ['time','id'] and DataFrame.resample does not accept that:
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()



In [35]:

    
# convert the surface float data from a pandas DataFrame to an xarray Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']) ) # set time & id as the index; use reset_index to revert this operation
floatsDSAll









    Out[35]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...



In [36]:

    
# resample the xarray Dataset onto a seven-day frequency
floatsDSAll_7D =floatsDSAll.resample('7D', dim='time')
floatsDSAll_7D









    Out[36]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 731)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-11 2002-07-18 ...
Data variables:
    spd      (time, id) float64 nan 9.116 19.57 18.47 18.71 nan 25.8 nan ...
    var_lat  (time, id) float64 nan 0.001587 6.252e-05 6.66e-05 5.442e-05 ...
    ve       (time, id) float64 nan 7.634 13.54 13.13 12.62 nan 24.31 nan ...
    lon      (time, id) float64 nan 66.49 69.8 64.94 69.81 nan 69.26 nan ...
    lat      (time, id) float64 nan 16.2 13.64 16.2 13.66 nan 20.11 nan 18.7 ...
    vn       (time, id) float64 nan -0.9647 -10.47 -11.1 -9.331 nan -2.7 nan ...
    var_lon  (time, id) float64 nan 0.006079 0.0001193 0.000128 0.0001 nan ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003622 0.08638 nan ...
    temp     (time, id) float64 nan nan nan 27.82 28.57 nan 28.99 nan 27.66 ...
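The round trip through xarray is one way to get per-float 7-day averages. As an alternative sketch, pandas can do a similar aggregation directly by grouping on the float id together with a time-based `pd.Grouper`. The frame below is invented, and this is not the method used in this notebook.

```python
import pandas as pd

# hypothetical observations from two floats
obs = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2],
    'time': pd.to_datetime(['2002-07-04 00:00', '2002-07-04 06:00',
                            '2002-07-12 00:00', '2002-07-05 00:00',
                            '2002-07-06 00:00']),
    'temp': [28.0, 29.0, 27.0, 30.0, 32.0],
})

# one row per (float id, 7-day bin); bins start at midnight of the earliest timestamp
weekly = obs.groupby(['id', pd.Grouper(key='time', freq='7D')]).mean()
print(weekly)
```

Unlike the dense (time, id) grid produced by `from_dataframe`, this keeps only the combinations that actually have observations, so there are no all-NaN rows to carry around.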



In [37]:

    
# convert back to a pandas DataFrame for plotting
floatsDFAll_7D = floatsDSAll_7D.to_dataframe()
floatsDFAll_7D
floatsDFAll_7D = floatsDFAll_7D.reset_index()
floatsDFAll_7D
# visualize the 7-day subsampled floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_7D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[37]:





<matplotlib.axes._subplots.AxesSubplot at 0x11e855860>



In [38]:

    
# look up the chlorophyll value for each data entry
floatsDFAll_7Dtimeorder = floatsDFAll_7D.sort_values(['time','id'],ascending=True)
floatsDFAll_7Dtimeorder # check whether it is time ordered
# should we drop the NaN rows to speed this up?









    Out[38]:






  
    
      
              id        time        spd   var_lat         ve        lon        lat          vn   var_lon      var_tmp       temp
     0      7574  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
   731     10206  2002-07-04   9.116143  0.001587   7.633679  66.486143  16.199036   -0.964714  0.006079  1000.000000        NaN
  1462     10208  2002-07-04  19.568179  0.000063  13.538179  69.797393  13.639571  -10.466714  0.000119  1000.000000        NaN
  2193     11089  2002-07-04  18.467286  0.000067  13.125536  64.944321  16.201536  -11.098214  0.000128     0.003622  27.824321
  2924     15703  2002-07-04  18.709607  0.000054  12.621429  69.811536  13.664357   -9.331036  0.000100     0.086380  28.566893
  3655     15707  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
  4386     27069  2002-07-04  25.798821  0.000053  24.312500  69.255893  20.107393   -2.699643  0.000098     0.001709  28.985429
  5117     27139  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
  5848     28842  2002-07-04  17.977607  0.000093   3.972357  60.814786  18.699071   -6.595714  0.000191     0.003340  27.664714
  6579     34159  2002-07-04  34.312536  0.000057  31.120250  59.465607  12.735786   12.632857  0.000106  1000.000000        NaN
  7310     34173  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
  8041     34210  2002-07-04  23.652929  0.000067  -4.493786  56.740607   6.139929  -12.919036  0.000133     0.003693  26.666500
  8772     34211  2002-07-04  27.875143  0.000056  22.426857  68.382250   8.319500  -14.216000  0.000103     0.003483  28.365179
  9503     34212  2002-07-04  45.650536  0.000056  40.174036  65.563929   6.609179   15.416786  0.000104     0.003568  28.568643
 10234     34223  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 10965     34310  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 11696     34311  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 12427     34312  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 13158     34314  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 13889     34315  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 14620     34374  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 15351     34708  2002-07-04  35.758821  0.000052  35.386786  60.444357  10.212750    1.909321  0.000095     0.001803  27.222107
 16082     34709  2002-07-04        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
 16813     34710  2002-07-04  45.884500  0.000053   9.640107  50.248679  13.213071    1.348857  0.000102     0.001863  31.119429
 17544     34714  2002-07-04  39.069893  0.000060  37.898500  64.408214  13.734893    4.796821  0.000115     0.001812  27.732107
 18275     34716  2002-07-04  34.254786  0.000059  32.405357  66.047750   7.661964    8.016571  0.000109     0.001764  28.795321
 19006     34718  2002-07-04  36.438321  0.000057  20.501357  72.805143  15.730036  -29.651821  0.000108     0.001697  29.113821
 19737     34719  2002-07-04  28.049250  0.000058  14.905679  71.282036  17.422571  -20.532143  0.000109     0.001665  28.957786
 20468     34720  2002-07-04  27.975464  0.000061  11.404786  69.387107  14.255964  -23.544607  0.000114     0.001785  28.667893
 21199     34721  2002-07-04  12.971000  0.000063   6.588571  65.513893  17.012393   -9.760893  0.000119     0.001765  27.912107
   ...       ...         ...        ...       ...        ...        ...        ...         ...       ...          ...        ...
168129   3098682  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
168860  60073460  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
169591  60074440  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
170322  60077450  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
171053  60150420  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
171784  60454500  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
172515  60656200  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
173246  60657200  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
173977  60658190  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
174708  60659110  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
175439  60659120  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
176170  60659190  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
176901  60659200  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
177632  60940960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
178363  60940970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
179094  60941960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
179825  60941970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
180556  60942960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
181287  60942970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
182018  60943960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
182749  60943970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
183480  60944960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
184211  60944970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
184942  60945970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
185673  60946960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
186404  60947960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
187135  60947970  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
187866  60948960  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
188597  60950430  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
189328  62321420  2016-06-30        NaN       NaN        NaN        NaN        NaN         NaN       NaN          NaN        NaN
    
  

189329 rows × 11 columns



In [40]:

    
floatsDFAll_7Dtimeorder.lon.dropna().shape  # only 4362 rows have a non-NaN longitude









    Out[40]:





(4362,)



In [41]:

    
# a small test of looping over the DataFrame rows
# df.itertuples is faster than iterrows and preserves the dtypes
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''









    Out[41]:





"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"



In [42]:

    
# this single call replaces the row-by-row loop above
# the nearest-neighbour lookup is slow for this many points: it took about an hour
tmpAll = ds_7day.chlor_a.sel_points(time=list(floatsDFAll_7Dtimeorder.time),lon=list(floatsDFAll_7Dtimeorder.lon), lat=list(floatsDFAll_7Dtimeorder.lat), method='nearest')
print('the count of nan vaues in tmpAll is',tmpAll.to_series().isnull().sum())









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |






    



the count of nan vaues in tmpAll is 187471
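`sel_points` was deprecated and later removed from xarray. The same nearest-neighbour match can be sketched with vectorized indexing through `sel`, as below. The small grid `chl` and the table `floats` are hypothetical stand-ins for `ds_7day.chlor_a` and `floatsDFAll_7Dtimeorder`, and dropping rows with missing positions before the lookup is an assumption of this sketch, not something the cell above does.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds_7day.chlor_a: a tiny (time, lat, lon) grid.
time = pd.date_range('2002-07-04', periods=4, freq='7D')
lat = np.array([10.0, 11.0, 12.0])
lon = np.array([60.0, 61.0, 62.0, 63.0])
values = np.arange(4 * 3 * 4, dtype=float).reshape(4, 3, 4)
chl = xr.DataArray(values, coords={'time': time, 'lat': lat, 'lon': lon},
                   dims=('time', 'lat', 'lon'))

# Hypothetical stand-in for the float table; the second row has no position.
floats = pd.DataFrame({
    'time': pd.to_datetime(['2002-07-04', '2002-07-11', '2002-07-18']),
    'lat': [10.2, np.nan, 11.9],
    'lon': [60.4, 61.0, 62.6],
})

# Assumption: rows without a position cannot be matched, so skip them.
valid = floats[['lat', 'lon']].notnull().all(axis=1)
pts = floats[valid]

# Indexers that share the new dimension 'points' select one value per row
# instead of the outer product of all times, lats and lons.
matched = chl.sel(
    time=xr.DataArray(pts['time'].values, dims='points'),
    lat=xr.DataArray(pts['lat'].values, dims='points'),
    lon=xr.DataArray(pts['lon'].values, dims='points'),
    method='nearest',
)

floats['chlor_a'] = np.nan
floats.loc[valid, 'chlor_a'] = matched.values
print(floats)
```

Filtering first keeps the lookup limited to the rows that actually carry a position, which in this notebook is a small fraction of the table.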



In [43]:

    
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (1858,) good values









    Out[43]:





(1858,)



In [45]:

    
# tmpAll.to_series() converts the xarray DataArray to a pandas Series;
# np.array() strips its index so the values align by position with the DataFrame
floatsDFAll_7Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_7Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_7Dtimeorder.chlor_a.isnull().sum() )  # should match the NaN count printed above

# visualize the float positions in the Arabian Sea region, colored by chlor_a
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_7Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    # base-10 logarithm
    return np.log10(x)

floatsDFAll_7Dtimeorder['chlor_a_log10'] = floatsDFAll_7Dtimeorder['chlor_a'].apply(scale)

# same map, colored by log10(chlor_a)
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_7Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAll_7Dtimeorder.chlor_a_log10.dropna().shape  # (1858,), the same count as chlor_a









    



after editing the dataframe the nan values in 'chlor_a' is 187471






    Out[45]:





(1858,)
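The `scale` helper above is applied through `Series.apply`. As a small sketch on hypothetical values, `np.log10` can also be called on the Series directly; both routes give the same numbers and pass NaN through unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical chlorophyll values, including a missing one.
chl = pd.Series([0.1, 1.0, 10.0, np.nan], name='chlor_a')

def scale(x):
    # same helper as in the cell above
    return np.log10(x)

via_apply = chl.apply(scale)   # route used in the notebook
vectorized = np.log10(chl)     # NumPy ufunc applied to the whole Series

print(vectorized)
```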



In [46]:

    
# take the difference of chlor_a along time; this is done in xarray,
# so the dataframe is converted back into an xarray Dataset first
# and then differenced
floatsDFAll_7Dtimeorder









    Out[46]:






  
    
      
	id	time	spd	var_lat	ve	lon	lat	vn	var_lon	var_tmp	temp	chlor_a	chlor_a_log10
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
731	10206	2002-07-04	9.116143	0.001587	7.633679	66.486143	16.199036	-0.964714	0.006079	1000.000000	NaN	NaN	NaN
1462	10208	2002-07-04	19.568179	0.000063	13.538179	69.797393	13.639571	-10.466714	0.000119	1000.000000	NaN	NaN	NaN
2193	11089	2002-07-04	18.467286	0.000067	13.125536	64.944321	16.201536	-11.098214	0.000128	0.003622	27.824321	NaN	NaN
2924	15703	2002-07-04	18.709607	0.000054	12.621429	69.811536	13.664357	-9.331036	0.000100	0.086380	28.566893	NaN	NaN
3655	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4386	27069	2002-07-04	25.798821	0.000053	24.312500	69.255893	20.107393	-2.699643	0.000098	0.001709	28.985429	NaN	NaN
5117	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5848	28842	2002-07-04	17.977607	0.000093	3.972357	60.814786	18.699071	-6.595714	0.000191	0.003340	27.664714	NaN	NaN
6579	34159	2002-07-04	34.312536	0.000057	31.120250	59.465607	12.735786	12.632857	0.000106	1000.000000	NaN	NaN	NaN
7310	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
8041	34210	2002-07-04	23.652929	0.000067	-4.493786	56.740607	6.139929	-12.919036	0.000133	0.003693	26.666500	NaN	NaN
8772	34211	2002-07-04	27.875143	0.000056	22.426857	68.382250	8.319500	-14.216000	0.000103	0.003483	28.365179	0.105164	-0.978133
9503	34212	2002-07-04	45.650536	0.000056	40.174036	65.563929	6.609179	15.416786	0.000104	0.003568	28.568643	NaN	NaN
10234	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10965	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11696	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12427	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13158	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13889	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
14620	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
15351	34708	2002-07-04	35.758821	0.000052	35.386786	60.444357	10.212750	1.909321	0.000095	0.001803	27.222107	NaN	NaN
16082	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
16813	34710	2002-07-04	45.884500	0.000053	9.640107	50.248679	13.213071	1.348857	0.000102	0.001863	31.119429	NaN	NaN
17544	34714	2002-07-04	39.069893	0.000060	37.898500	64.408214	13.734893	4.796821	0.000115	0.001812	27.732107	NaN	NaN
18275	34716	2002-07-04	34.254786	0.000059	32.405357	66.047750	7.661964	8.016571	0.000109	0.001764	28.795321	0.110992	-0.954708
19006	34718	2002-07-04	36.438321	0.000057	20.501357	72.805143	15.730036	-29.651821	0.000108	0.001697	29.113821	NaN	NaN
19737	34719	2002-07-04	28.049250	0.000058	14.905679	71.282036	17.422571	-20.532143	0.000109	0.001665	28.957786	NaN	NaN
20468	34720	2002-07-04	27.975464	0.000061	11.404786	69.387107	14.255964	-23.544607	0.000114	0.001785	28.667893	NaN	NaN
21199	34721	2002-07-04	12.971000	0.000063	6.588571	65.513893	17.012393	-9.760893	0.000119	0.001765	27.912107	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...
168129	3098682	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
168860	60073460	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
169591	60074440	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
170322	60077450	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
171053	60150420	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
171784	60454500	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
172515	60656200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
173246	60657200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
173977	60658190	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
174708	60659110	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
175439	60659120	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
176170	60659190	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
176901	60659200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
177632	60940960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
178363	60940970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
179094	60941960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
179825	60941970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
180556	60942960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
181287	60942970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
182018	60943960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
182749	60943970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
183480	60944960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
184211	60944970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
184942	60945970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
185673	60946960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
186404	60947960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
187135	60947970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
187866	60948960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
188597	60950430	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
189328	62321420	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
    
  

189329 rows × 13 columns



In [47]:

    
# unstack() will provide a 2d dataframe
# reset_index() will reset all the index as columns



In [49]:

    
# prepare the data in dataset and about to take the diff
tmp = xr.Dataset.from_dataframe(floatsDFAll_7Dtimeorder.set_index(['time','id']) ) # set (time, id) as the index; reset_index reverts this
# difference between consecutive time steps for each float id
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# rename the differenced column
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# merge the two dataframes {floatsDFAll_XDtimeorder; chlor_a_rate} into one dataframe based on the index {id, time} and use the left method
floatsDFAllRate_7Dtimeorder=pd.merge(floatsDFAll_7Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_7Dtimeorder

# check 
print('check the sum of the chlor_a before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of the chlor_a after the merge',floatsDFAllRate_7Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate; this linear scale works better than the log scale below
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_7Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale; log10 is undefined for negative rates, which become NaN
floatsDFAllRate_7Dtimeorder['chl_rate_log10'] = floatsDFAllRate_7Dtimeorder['chl_rate'].apply(scale)
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_7Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAllRate_7Dtimeorder.chl_rate.dropna().shape   # (1127,) data points
#floatsDFAllRate_7Dtimeorder.chl_rate_log10.dropna().shape   # (466,) data points: chl_rate can be negative, so taking log10 discards those rows









    



check the sum of the chlor_a before the merge -68.59935079887515
check the sum of the chlor_a after the merge -68.59935079887515






    Out[49]:





(1127,)
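The rate above is computed by a round trip through an xarray Dataset followed by a merge. As a hedged alternative, the same per-float difference can be sketched in pandas alone with `groupby(...).diff()`. The toy table below is hypothetical; it assumes, as in this notebook, that every float id has a row at every time step.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical table: two float ids on a complete 7-day grid.
df = pd.DataFrame({
    'id': [1, 2, 1, 2, 1, 2],
    'time': pd.to_datetime(['2002-07-04', '2002-07-04', '2002-07-11',
                            '2002-07-11', '2002-07-18', '2002-07-18']),
    'chlor_a': [0.10, 0.50, 0.15, np.nan, 0.12, 0.40],
})

# Route used in the notebook: Dataset -> diff along time -> merge back.
ds = xr.Dataset.from_dataframe(df.set_index(['time', 'id']))
rate_xr = (ds.diff(dim='time', n=1).chlor_a.to_series()
             .reset_index().rename(columns={'chlor_a': 'chl_rate'}))
merged = pd.merge(df, rate_xr, on=['time', 'id'], how='left')

# pandas-only route: sort by time within each id, then difference.
# diff() keeps the original row labels, so sort_index restores df order.
rate_pd = df.sort_values(['id', 'time']).groupby('id')['chlor_a'].diff().sort_index()

print(merged.assign(chl_rate_pd=rate_pd.values))
```

Both routes leave NaN at the first time step of each float and wherever either neighbouring value is missing.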



In [50]:

    
# build a Series indexed by time so the month of each row can be read off for masking
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_7Dtimeorder.time) )

# extract the month of each row
month = ts.index.month 
# keep November through March
selector = ((11==month) | (12==month) | (1==month) | (2==month) | (3==month) )  
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_7Dtimeorder[selector].chl_rate.dropna().shape) # total (723,)
print('all the data count is', floatsDFAllRate_7Dtimeorder.chl_rate.dropna().shape )   # total (1127,)









    



shape of the selector (189329,)
all the data count in [11-01, 03-31]  is (723,)
all the data count is (1127,)
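The chain of `==` comparisons above builds a November-to-March mask. A compact sketch of the same mask, on a hypothetical handful of dates, uses the `.dt.month` accessor with `isin`:

```python
import pandas as pd

# Hypothetical dates on both sides of the Nov 1 to Mar 31 window.
times = pd.Series(pd.to_datetime(
    ['2002-10-31', '2002-11-01', '2003-01-15', '2003-03-31', '2003-04-01']))

# True for rows whose month is November through March.
selector = times.dt.month.isin([11, 12, 1, 2, 3])

print(selector.tolist())
```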



In [51]:

    
# histogram of the non-standardized data
axfloat = floatsDFAllRate_7Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('7-Day chl_rate')









    Out[51]:





<matplotlib.text.Text at 0x12b035898>



In [52]:

    
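# standardized series
ts = floatsDFAllRate_7Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
# note: the range below is now in units of standard deviations, so only the central bins are shown
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])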
axts.set_title('7-Day standardized chl_rate')









    Out[52]:





<matplotlib.text.Text at 0x167124be0>
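For reference, a minimal sketch of the standardization step on hypothetical rates. After the transform the values are measured in standard deviations, so a histogram range of [-0.3, 0.3] covers only the points closest to the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical chl_rate values.
ts = pd.Series([-0.05, 0.02, 0.10, -0.01, 0.04])

# z-score: subtract the mean, divide by the sample standard deviation
# (pandas uses ddof=1 by default).
z = (ts - ts.mean()) / ts.std()

# Number of standardized points that a [-0.3, 0.3] histogram range would keep.
in_window = z.between(-0.3, 0.3).sum()

print(z.round(3).tolist(), in_window)
```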



In [53]:

    
# all the data, one panel per calendar year
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    # rows of calendar year i; both bounds are strict, so a time stamp falling exactly on 1 January belongs to no panel
    tmpyear = floatsDFAllRate_7Dtimeorder[ (floatsDFAllRate_7Dtimeorder.time > str(i))  & (floatsDFAllRate_7Dtimeorder.time < str(i+1)) ]
    print(tmpyear.chl_rate.dropna().shape)   # the panels total 1121, 6 fewer than the 1127 overall, presumably because of the strict bounds
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the unused 16th panel
fig.delaxes(axes.flat[-1])









    



(50,)
(48,)
(4,)
(42,)
(98,)
(96,)
(143,)
(45,)
(79,)
(20,)
(38,)
(39,)
(240,)
(128,)
(51,)
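The yearly counts above add up to 1121, while the full table holds 1127 rates. A possible explanation is that the yearly windows use strict inequalities, so time stamps that fall exactly on 1 January are in no window. The sketch below rebuilds the 7-day grid from the first and last dates shown in the table (an assumption about how the grid was generated) and lists the dates that the yearly windows miss.

```python
import numpy as np
import pandas as pd

# Assumed time grid: 7-day steps from 2002-07-04 to 2016-06-30.
grid = pd.date_range('2002-07-04', '2016-06-30', freq='7D')

# Mark every grid date that falls strictly inside some calendar-year window.
covered = np.zeros(len(grid), dtype=bool)
for year in range(2002, 2017):
    start = pd.Timestamp(year=year, month=1, day=1)
    end = pd.Timestamp(year=year + 1, month=1, day=1)
    covered |= (grid > start) & (grid < end)

dropped = grid[~covered]
print(len(grid), [str(d.date()) for d in dropped])
```

Using `>=` on the lower bound of each window would assign these dates to a year.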



In [54]:

    
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    # rows from 1 November of year i through 31 March of year i+1
    tmpyear = floatsDFAllRate_7Dtimeorder[ (floatsDFAllRate_7Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_7Dtimeorder.time <= (str(i+1)+'-03-31') ) ]
    print(tmpyear.chl_rate.dropna().shape)  # the panels total 723
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)









    



(63,)
(0,)
(6,)
(72,)
(40,)
(122,)
(28,)
(58,)
(6,)
(44,)
(0,)
(118,)
(110,)
(56,)
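The per-season counts above come from a loop over explicit date windows. A sketch of an equivalent grouping, on hypothetical rows, labels each November-to-March window by the year in which it starts and counts the non-missing rates per label; the column name `season` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical rows inside and outside the Nov-Mar windows.
df = pd.DataFrame({
    'time': pd.to_datetime(['2002-10-31', '2002-11-07', '2003-02-13',
                            '2003-04-03', '2003-12-04', '2004-03-25']),
    'chl_rate': [0.01, 0.02, np.nan, 0.03, -0.01, 0.04],
})

month = df['time'].dt.month
year = df['time'].dt.year
in_season = month.isin([11, 12, 1, 2, 3])

# November and December belong to the season starting that year;
# January to March belong to the season that started the year before.
season = year.where(month >= 11, year - 1).rename('season')

# count() ignores NaN, like dropna().shape in the loop above.
counts = df[in_season].groupby(season[in_season])['chl_rate'].count()
print(counts)
```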



In [55]:

    
# collect the Nov 1 to Mar 31 rows so they can be written to disk (CSV or HDF) and reused without rerunning the matching

df_list = []
for i in range(2002,2017) :
    # rows from 1 November of year i through 31 March of year i+1
    tmpyear = floatsDFAllRate_7Dtimeorder[ (floatsDFAllRate_7Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_7Dtimeorder.time <= (str(i+1)+'-03-31') ) ]
    df_list.append(tmpyear)
    
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is   (723,)
df_chl_out_7D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # keep only the rows with a chl_rate value
#list(df_chl_out_XD.groupby(['id']))   # shows the continuity pattern of the Lagrangian difference for each float id

# preview the table that will be written out
df_chl_out_7D_modisa.head()









    



all the data count in [11-01, 03-31]  is  (723,)






    Out[55]:






  
    
      
	id	time	spd	var_lat	ve	lon	lat	vn	var_lon	var_tmp	temp	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
4663	10206	2002-11-07	5.881464	0.000487	-2.351607	67.145571	11.112429	3.113143	0.001486	1000.000000	NaN	0.130267	-0.885166	-0.004264	NaN
4665	11089	2002-11-07	17.183500	0.000067	-16.224571	64.522214	14.321929	-1.954857	0.000133	0.003821	28.931286	0.192224	-0.716192	0.067516	-1.170591
4667	15707	2002-11-07	25.486857	0.000077	-9.886893	67.237571	13.279821	-21.813714	0.000155	1000.000000	NaN	0.164760	-0.783149	0.009444	-2.024855
4685	34710	2002-11-07	16.909357	0.000073	-4.254286	63.074536	17.550536	15.411857	0.000146	0.001906	28.607679	0.392885	-0.405735	0.016794	-1.774846
4691	34721	2002-11-07	16.744036	0.000066	9.964393	68.010643	12.662179	6.091821	0.000130	0.001844	29.422214	0.141941	-0.847893	-0.001058	NaN



In [56]:

    
df_chl_out_7D_modisa.index.name = 'index'  # name the index explicitly

# write to CSV, labelling the index column
df_chl_out_7D_modisa.to_csv('df_chl_out_7D_modisa.csv', sep=',', index_label = 'index')

# read the CSV back to check the output
test = pd.read_csv('df_chl_out_7D_modisa.csv', index_col='index')
test.head()









    Out[56]:






  
    
      
	id	time	spd	var_lat	ve	lon	lat	vn	var_lon	var_tmp	temp	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
index
4663	10206	2002-11-07	5.881464	0.000487	-2.351607	67.145571	11.112429	3.113143	0.001486	1000.000000	NaN	0.130267	-0.885166	-0.004264	NaN
4665	11089	2002-11-07	17.183500	0.000067	-16.224571	64.522214	14.321929	-1.954857	0.000133	0.003821	28.931286	0.192224	-0.716192	0.067516	-1.170591
4667	15707	2002-11-07	25.486857	0.000077	-9.886893	67.237571	13.279821	-21.813714	0.000155	1000.000000	NaN	0.164760	-0.783149	0.009444	-2.024855
4685	34710	2002-11-07	16.909357	0.000073	-4.254286	63.074536	17.550536	15.411857	0.000146	0.001906	28.607679	0.392885	-0.405735	0.016794	-1.774846
4691	34721	2002-11-07	16.744036	0.000066	9.964393	68.010643	12.662179	6.091821	0.000130	0.001844	29.422214	0.141941	-0.847893	-0.001058	NaN
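One detail of the CSV round trip is that the `time` column comes back as text unless pandas is asked to parse it. The sketch below uses an in-memory buffer and two rows copied from the table above (three columns only) to show the difference; passing `parse_dates=['time']` is a suggestion, not something the cell above does.

```python
import io
import pandas as pd

# Two rows from the exported table, reduced to three columns.
df = pd.DataFrame(
    {'id': [10206, 11089],
     'time': pd.to_datetime(['2002-11-07', '2002-11-07']),
     'chl_rate': [-0.004264, 0.067516]},
    index=pd.Index([4663, 4665], name='index'),
)

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, sep=',', index_label='index')

# Read back as in the notebook: 'time' is not a datetime column.
buf.seek(0)
plain = pd.read_csv(buf, index_col='index')

# Read back with date parsing: 'time' is datetime64 again.
buf.seek(0)
parsed = pd.read_csv(buf, index_col='index', parse_dates=['time'])

print(plain['time'].dtype, parsed['time'].dtype)
```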



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03
