In [1]:
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm
We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.
We can read all the netCDF files into one xarray Dataset using the open_mfdataset
function. Note that this does not load the data into memory yet; that only happens when we try to access the values.
In [2]:
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]
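As a quick sanity check that the open really is lazy, we can inspect the type of the array backing chlor_a. This is just an illustrative sketch: open_mfdataset wraps each variable in a chunked dask array until the data are loaded.
print(type(ds_8day.chlor_a.data))  # dask.array.core.Array until we call .load()
print(ds_8day.chlor_a.chunks)      # chunk layout, typically one chunk per input file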
How much data is contained here? Let's get the answer in MB.
In [3]:
print([(ds.nbytes / 1e6) for ds in both_datasets])
The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both fit comfortably in RAM, so let's load them into memory.
In [4]:
[ds.load() for ds in both_datasets]
Out[4]:
In [5]:
def fix_bad_data(ds):
    # For some reason the cloud / land mask is inverted on some granules.
    # This is obvious because chlorophyll values can never be negative.
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through the offending time steps and mask out the negative values
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
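For reference, the same cleanup could be done without the explicit loop. A minimal sketch (not the code we run here) using DataArray.where, whose one-argument form fills NaN wherever the condition is False:
# vectorized alternative to fix_bad_data (sketch, commented out):
# for ds in both_datasets:
#     ds['chlor_a'] = ds.chlor_a.where(ds.chlor_a >= 0)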
In [15]:
[fix_bad_data(ds) for ds in both_datasets]
Out[15]:
In [16]:
ds_8day.chlor_a>0
Out[16]:
In [17]:
(ds_8day.chlor_a>0).sum(dim='time').plot()
Out[17]:
In [18]:
# build a mask for the ocean (True wherever chlorophyll was ever observed)
ocean_mask = (ds_8day.chlor_a > 0).sum(dim='time') > 0
#ocean_mask = (ds_daily.chlor_a > 0).sum(dim='time') > 0
num_ocean_points = ocean_mask.sum().values  # count the total number of ocean points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)
Out[18]:
In [19]:
#ds_8day
In [20]:
#ds_daily
In [21]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())
Out[21]:
In [22]:
#list(ds_daily.groupby('time')) # take a look at what's inside
Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
In [23]:
# For reference, the daily dataset has dimensions:
#   <xarray.Dataset>
#   Dimensions: (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
ds_daily.groupby('time').count()  # count of valid (non-NaN) points at each time
Out[23]:
In [24]:
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)
Out[24]:
In [25]:
count_8day, count_daily = [ds.chlor_a.groupby('time').count() / float(num_ocean_points)
                           for ds in (ds_8day, ds_daily)]
In [26]:
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#count_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)
#count_8day, count_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                           for ds in ds_8day, ds_daily]  # tuple without parentheses does not work in Python 3
In [27]:
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')
plt.legend(['8 day','daily'])
Out[27]:
In [28]:
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology
                                     for count in (count_8day, count_daily)]
In [30]:
# monthly climatology of the fraction of valid ocean points
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])
Out[30]:
In [31]:
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())
Out[31]:
In [32]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())
Out[32]:
In [33]:
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')  # the [0] already selects the time
#ds_daily.chl_ocx[0].sel_points(time=times, lon=lons, lat=lats, method='nearest')
Out[33]:
In [34]:
#ds_daily.chlor_a.sel_points?
In [35]:
ds_15day = ds_daily.resample('15D', dim='time')
ds_15day
Out[35]:
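Note that resample('15D', dim='time') is the older xarray signature. In recent xarray (>= 0.12, an assumption about versions) the equivalent call is keyword-based and requires an explicit reduction:
# modern equivalent of the resample above (sketch, commented out):
# ds_15day = ds_daily.resample(time='15D').mean()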
In [48]:
plt.figure(figsize=(8,6))
ds_15day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())
Out[48]:
In [47]:
# check the minimum longitude and latitude
print(ds_15day.lon.min(), '\n', ds_15day.lat.min())
In [38]:
# In the following we work with data from the GDP (Global Drifter Program) surface floats.
from buyodata import buoydata
import os
In [39]:
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat',
             './gdp_float/buoydata_5001_10000.dat',
             './gdp_float/buoydata_10001_15000.dat',
             './gdp_float/buoydata_15001_jun16.dat']
In [40]:
# read the files and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # takes around 4-5 minutes
#mask = df.time>='2002-07-04'  # we only have chlor_a data after this date
dfvvAll = dfAll[dfAll.time >= '2002-07-04']
sum(dfvvAll.time < '2002-07-04')  # recheck: no records before the cutoff should remain
Out[40]:
In [41]:
# wrap the longitudes so that they are all > 0
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon < 0
dfvvAll.loc[mask, 'lon'] = dfvvAll.loc[mask].lon + 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
dfvvAll.describe()
Out[41]:
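The same wrap can be written branch-free with the modulo operator, since % returns a non-negative result for a positive divisor; a one-line sketch equivalent to the mask above:
# branch-free alternative (sketch, commented out):
# dfvvAll['lon'] = dfvvAll.lon % 360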
In [42]:
# select only the Arabian Sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon < 75) & (dfvvAll.lat > 5) & (dfvvAll.lat < 28)
# arabian_sea = {'lon': slice(45, 75), 'lat': slice(5, 28)}  # the same box, for later use with xarray
floatsAll = dfvvAll.loc[arabian_sea]  # apply the mask directly
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape))
In [43]:
# avoid running this cell repeatedly -- the scatter plots are slow
# visualize the floats globally
fig, ax = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)
# visualize the floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)
Out[43]:
In [44]:
# A pandas DataFrame cannot do this resampling directly:
# we are really indexing on ['time', 'id'], and pandas.DataFrame.resample raises
# "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex,
#  but got an instance of 'MultiIndex'"
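For reference, a pandas-only workaround (a sketch, not what we do here) is to resample each float id separately via groupby, at the cost of reducing every column with the same statistic:
# hypothetical pandas-only route, per-id resample then recombine:
# resampled = (floatsAll.set_index('time')
#                       .groupby('id')
#                       .resample('15D')
#                       .mean())
Instead we route through xarray below, whose resample handles the extra 'id' dimension naturally.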
In [45]:
# convert the surface drifter data from a pandas DataFrame to an xarray Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']))  # set time & id as the index; use reset_index to revert
floatsDSAll
Out[45]:
In [46]:
# resample the xarray Dataset onto a 15-day frequency
floatsDSAll_15D = floatsDSAll.resample('15D', dim='time')
floatsDSAll_15D
Out[46]:
In [49]:
# convert it back to a pandas DataFrame for plotting
floatsDFAll_15D = floatsDSAll_15D.to_dataframe()
floatsDFAll_15D = floatsDFAll_15D.reset_index()
floatsDFAll_15D
# visualize the subsampled floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAll_15D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)
Out[49]:
In [50]:
# look up the chlorophyll value for each data entry
floatsDFAll_15Dtimeorder = floatsDFAll_15D.sort_values(['time','id'], ascending=True)
floatsDFAll_15Dtimeorder  # check that it is time ordered
# should we drop NaNs to speed this up?
Out[50]:
In [51]:
floatsDFAll_15Dtimeorder.lon.dropna().shape  # the longitude column has many valid values: (2195,)
Out[51]:
In [52]:
# a little test of the loop API for the DataFrame
# see df.itertuples? -- it is faster and preserves the dtypes
'''
chl_ocx = []
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time, row.id, row.lon, row.lat))
    tmp = ds_2day.chl_ocx.sel_points(time=[row.time], lon=[row.lon], lat=[row.lat], method='nearest')  # nearest-neighbor lookup
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''
Out[52]:
In [53]:
# this one line avoids the loop above
# the nearest-neighbor selection over every point is slow -- it takes about an hour
tmpAll = ds_15day.chlor_a.sel_points(time=list(floatsDFAll_15Dtimeorder.time),
                                     lon=list(floatsDFAll_15Dtimeorder.lon),
                                     lat=list(floatsDFAll_15Dtimeorder.lat), method='nearest')
print('the count of NaN values in tmpAll is', tmpAll.to_series().isnull().sum())
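sel_points was removed in later xarray releases. For reference, the modern equivalent (a sketch, assuming xarray >= 0.10) is vectorized indexing with DataArray indexers that share a common dimension:
# modern vectorized-indexing equivalent of the sel_points call above (sketch):
# pts = ds_15day.chlor_a.sel(
#     time=xr.DataArray(list(floatsDFAll_15Dtimeorder.time), dims='points'),
#     lon=xr.DataArray(list(floatsDFAll_15Dtimeorder.lon), dims='points'),
#     lat=xr.DataArray(list(floatsDFAll_15Dtimeorder.lat), dims='points'),
#     method='nearest')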
In [54]:
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape # (1296,) good values
Out[54]:
In [56]:
# use .to_series() to convert from an xarray DataArray to a pandas Series
floatsDFAll_15Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_15Dtimeorder.index)
print("after editing the dataframe the NaN count in 'chlor_a' is", floatsDFAll_15Dtimeorder.chlor_a.isnull().sum())  # should match the count above
# take a look at the data
floatsDFAll_15Dtimeorder
# visualize the floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAll_15Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)
def scale(x):
    return np.log10(x)
#print(floatsAll_timeorder['chlor_a'].apply(scale))
floatsDFAll_15Dtimeorder['chlor_a_log10'] = floatsDFAll_15Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_15Dtimeorder
#print("after the transformation the NaN count in 'chlor_a_log10' is", floatsAll_timeorder.chlor_a_log10.isnull().sum())
# visualize again on a log10 scale
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAll_15Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAll_15Dtimeorder.chlor_a.dropna().shape  # (1296,)
floatsDFAll_15Dtimeorder.chlor_a_log10.dropna().shape  # (1296,)
Out[56]:
In [57]:
# take the time difference of chlor_a; this has to be done in xarray,
# so convert the DataFrame into an xarray Dataset again and take the difference
floatsDFAll_15Dtimeorder
Out[57]:
In [58]:
# unstack() would give a 2-D DataFrame
# reset_index() moves all the index levels back to columns
In [61]:
# put the data into a Dataset, ready to take the diff
tmp = xr.Dataset.from_dataframe(floatsDFAll_15Dtimeorder.set_index(['time','id']))  # set time & id as the index; use reset_index to revert
# take the diff of chlor_a along time
chlor_a_rate = tmp.diff(dim='time', n=1).chlor_a.to_series().reset_index()
# give the column a proper name
chlor_a_rate.rename(columns={'chlor_a': 'chl_rate'}, inplace=True)
chlor_a_rate
# merge the two DataFrames {floatsDFAll_15Dtimeorder, chlor_a_rate} on the {time, id} keys with a left join
floatsDFAllRate_15Dtimeorder = pd.merge(floatsDFAll_15Dtimeorder, chlor_a_rate, on=['time','id'], how='left')
floatsDFAllRate_15Dtimeorder
# check that the merge preserved the values
print('check the sum of chl_rate before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of chl_rate after the merge', floatsDFAllRate_15Dtimeorder.chl_rate.sum())
# visualize the chlorophyll rate; it is *better* to visualize at this scale
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAllRate_15Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)
# visualize the chlorophyll rate on the log scale
floatsDFAllRate_15Dtimeorder['chl_rate_log10'] = floatsDFAllRate_15Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_15Dtimeorder
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAllRate_15Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAllRate_15Dtimeorder.chl_rate.dropna().shape  # (853,) data points
#floatsDFAllRate_15Dtimeorder.chl_rate_log10.dropna().shape  # (345,) data points... note: chl_rate can be negative, so do not take log10
Out[61]:
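Keep in mind that chl_rate is the raw difference between consecutive 15-day samples, i.e. a change per 15 days. To express it as an approximate per-day rate (a trivial sketch; the column name is ours), divide by the interval:
# optional normalization to a per-day rate (sketch, commented out):
# floatsDFAllRate_15Dtimeorder['chl_rate_per_day'] = floatsDFAllRate_15Dtimeorder.chl_rate / 15.0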
In [62]:
pd.to_datetime(floatsDFAllRate_15Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_15Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_15Dtimeorder.time))  # create a dummy time series for masking purposes
# extract the month
month = ts.index.month
# month.shape  # a check on the shape of month
selector = ((11 == month) | (12 == month) | (1 == month) | (2 == month) | (3 == month))
selector
print('shape of the selector', selector.shape)
print('all the data count in [11-01, 03-31] is', floatsDFAllRate_15Dtimeorder[selector].chl_rate.dropna().shape)  # total (500,)
print('all the data count is', floatsDFAllRate_15Dtimeorder.chl_rate.dropna().shape)  # total (853,)
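A more compact way to build the same winter mask (a sketch, assuming a pandas version with the .dt accessor) is isin on the month:
# equivalent month mask (sketch, commented out):
# months = pd.to_datetime(floatsDFAllRate_15Dtimeorder.time).dt.month
# selector = months.isin([11, 12, 1, 2, 3]).values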
In [63]:
# histogram of the non-standardized data
axfloat = floatsDFAllRate_15Dtimeorder[selector].chl_rate.dropna().hist(bins=100, range=[-0.3, 0.3])
axfloat.set_title('15-Day chl_rate')
Out[63]:
In [64]:
# standardized series
ts = floatsDFAllRate_15Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean()) / ts.std()
axts = ts_standardized.hist(bins=100, range=[-0.3, 0.3])
axts.set_title('15-Day standardized chl_rate')
Out[64]:
In [65]:
# all the data, one panel per year
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)
for i, ax in zip(range(2002, 2017), axes.flat):
    tmpyear = floatsDFAllRate_15Dtimeorder[(floatsDFAllRate_15Dtimeorder.time > str(i)) & (floatsDFAllRate_15Dtimeorder.time < str(i+1))]  # year i only
    #fig, ax = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the yearly counts sum to 1061
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)
# remove the unused extra panel
ax = plt.subplot(8, 2, 16)
fig.delaxes(ax)
In [66]:
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)
for i, ax in zip(range(2002, 2016), axes.flat):
    # select only the winter months, Nov 1 to Mar 31, of year i
    tmpyear = floatsDFAllRate_15Dtimeorder[(floatsDFAllRate_15Dtimeorder.time >= (str(i) + '-11-01')) & (floatsDFAllRate_15Dtimeorder.time <= (str(i+1) + '-03-31'))]
    #fig, ax = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the yearly counts sum to 692
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)
In [67]:
# write the data to disk as a CSV or HDF file to save experiment time later
df_list = []
for i in range(2002, 2017):
    # select only the winter months, Nov 1 to Mar 31, of year i
    tmpyear = floatsDFAllRate_15Dtimeorder[(floatsDFAllRate_15Dtimeorder.time >= (str(i) + '-11-01')) & (floatsDFAllRate_15Dtimeorder.time <= (str(i+1) + '-03-31'))]
    df_list.append(tmpyear)
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31] is ', df_tmp.chl_rate.dropna().shape)  # again, the total is (500,)
df_chl_out_15D_modisa = df_tmp[~df_tmp.chl_rate.isnull()]  # keep only the non-NaN values
#list(df_chl_out_15D_modisa.groupby(['id']))  # shows the continuity of the Lagrangian differences for each float id
# output to a CSV or HDF file
df_chl_out_15D_modisa.head()
Out[67]:
In [68]:
df_chl_out_15D_modisa.index.name = 'index'  # give the index an explicit name
# write a CSV with the explicit index
df_chl_out_15D_modisa.to_csv('df_chl_out_15D_modisa.csv', sep=',', index_label='index')
# load the CSV back to verify the output
test = pd.read_csv('df_chl_out_15D_modisa.csv', index_col='index')
test.head()
Out[68]: