9-Day Subsampling of the OceanColor Dataset


In [2]:
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)

Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the netCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we access the values.
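A minimal sketch of what the multi-file open amounts to, using two toy in-memory Datasets in place of the real netCDF files (the names and values here are illustrative, not the actual data):

```python
import numpy as np
import xarray as xr

# Two toy "files" worth of chlorophyll data; open_mfdataset opens each file
# lazily and concatenates them along their shared dimension.
ds_a = xr.Dataset({"chlor_a": ("time", np.array([0.1, 0.2]))},
                  coords={"time": [0, 1]})
ds_b = xr.Dataset({"chlor_a": ("time", np.array([0.3]))},
                  coords={"time": [2]})

# xr.concat performs the same combine step explicitly.
combined = xr.concat([ds_a, ds_b], dim="time")
```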


In [3]:
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.


In [4]:
print([(ds.nbytes / 1e6) for ds in both_datasets])


[534.295504, 4241.4716]

The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both easily fit in RAM, so let's load them into memory.


In [5]:
[ds.load() for ds in both_datasets]


Out[5]:
[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data--specifically, they contained large negative chlorophyll concentrations. Looking closer, I realized that the land/cloud mask had been inverted, so I wrote a function to invert it back and correct the data.


In [6]:
def fix_bad_data(ds):
    # for some reason, the cloud/land mask is inverted on some dates;
    # this is obvious because there are chlorophyll values less than zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through the bad snapshots and mask out the negative values
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
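The core of `fix_bad_data` is the masked-array round trip; a minimal sketch on synthetic values:

```python
import numpy as np

# A toy snapshot where the inverted mask produced negative "chlorophyll".
data = np.array([0.5, -1.0, 0.3, -2.5])

# Mask everything below zero, then fill the masked entries with NaN,
# the same np.ma round trip used in fix_bad_data.
fixed = np.ma.masked_less(data, 0).filled(np.nan)
```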

In [7]:
[fix_bad_data(ds) for ds in both_datasets]


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive
Out[7]:
[None, None]

In [8]:
ds_8day.chlor_a>0


Out[8]:
<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately, it doesn't come with the dataset, but we can infer it by counting the points that have at least one positive chlorophyll value.
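The mask inference can be sketched with plain NumPy on a toy (time, lat, lon) stack:

```python
import numpy as np

# Two toy snapshots on a 2x2 grid: NaN = no retrieval, positive = valid.
chl = np.array([
    [[np.nan, 0.2], [np.nan, np.nan]],
    [[0.4, np.nan], [np.nan, 0.1]],
])

# A grid cell counts as ocean if it has at least one positive value over time.
ocean_mask = (chl > 0).sum(axis=0) > 0
num_ocean_points = ocean_mask.sum()
```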


In [9]:
(ds_8day.chlor_a>0).sum(dim='time').plot()


Out[9]:
<matplotlib.collections.QuadMesh at 0x1195a44a8>

In [10]:
# infer the ocean mask (complement of the land mask)
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean grid points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)


Out[10]:
<matplotlib.text.Text at 0x13d6044e0>

In [11]:
#ds_8day

In [12]:
#ds_daily

In [13]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[13]:
<matplotlib.collections.QuadMesh at 0x11a099d30>
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

In [14]:
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
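The same count-and-normalise step, sketched with pandas on a toy long-format table (the values and the toy ocean total are illustrative):

```python
import numpy as np
import pandas as pd

# Toy per-pixel values: three pixels per snapshot, NaN where there was cloud.
df = pd.DataFrame({
    "time": ["2002-07-04"] * 3 + ["2002-07-05"] * 3,
    "chlor_a": [0.3, np.nan, 0.5, np.nan, np.nan, 0.2],
})
num_ocean_points = 3  # hypothetical ocean total for this toy grid

# count() skips NaN, so this is the valid-data fraction per snapshot.
coverage = df.groupby("time")["chlor_a"].count() / float(num_ocean_points)
```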


In [15]:
ds_daily.groupby('time').count() # number of valid points per snapshot


Out[15]:
<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...

In [ ]:


In [16]:
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)


Out[16]:
<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...

In [17]:
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]

In [18]:
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#count_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, count_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                           for ds in (ds_8day, ds_daily)] # an unparenthesized tuple in a comprehension is a syntax error in Python 3

In [19]:
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])


Out[19]:
<matplotlib.legend.Legend at 0x11a3bc080>

Seasonal Climatology


In [20]:
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology
                                     for count in (count_8day, count_daily)]

In [21]:
# mean of the monthly climatology of the valid-data fraction
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])


Out[21]:
<matplotlib.legend.Legend at 0x129c12ba8>

From the above figure, we see that data coverage is highest in winter (especially February) and lowest in summer.
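The `groupby('time.month')` climatology has a direct pandas analogue; a sketch with made-up coverage values:

```python
import pandas as pd

# Coverage fractions for two Januaries and two Februaries (made-up numbers).
idx = pd.to_datetime(["2003-01-10", "2004-01-20", "2003-02-10", "2004-02-20"])
coverage = pd.Series([0.2, 0.4, 0.6, 0.8], index=idx)

# Average over all years for each calendar month.
clim = coverage.groupby(coverage.index.month).mean()
```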

Maps of individual days

Let's grab some data from February and plot it.


In [22]:
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[22]:
<matplotlib.collections.QuadMesh at 0x12977a278>

In [23]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[23]:
<matplotlib.collections.QuadMesh at 0x12af85518>

In [24]:
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # note: [0] selects the first time
#ds_daily.chl_ocx[0].sel_points(time=times, lon=lons, lat=lats, method='nearest')


Out[24]:
<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lat      (points) float64 16.04 18.04
    lon      (points) float64 65.04 70.04
  * points   (points) int64 0 1

In [25]:
#ds_daily.chlor_a.sel_points?

In [26]:
ds_9day = ds_daily.resample('9D', dim='time')
ds_9day


Out[26]:
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 589)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-13 2002-07-22 ...
Data variables:
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
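`resample('9D', dim='time')` averages each nine-day window (mean is the default reduction in this old xarray API); the pandas equivalent on a toy daily series:

```python
import numpy as np
import pandas as pd

# 18 daily values starting at the first MODIS date used above.
s = pd.Series(np.arange(18.0),
              index=pd.date_range("2002-07-04", periods=18, freq="D"))

# Two 9-day bins, each reduced by the mean.
s9 = s.resample("9D").mean()
```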

In [27]:
plt.figure(figsize=(8,6))
ds_9day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[27]:
<matplotlib.collections.QuadMesh at 0x129d992b0>

In [28]:
# check the minimum longitude and latitude
print(ds_9day.lon.min(), '\n', ds_9day.lat.min())


<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each (time, lon, lat) float observation to a chlorophyll value
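The mapping idea is a nearest-neighbour lookup on the regular satellite grid; a sketch with hypothetical grid values:

```python
import numpy as np

# Hypothetical regular lon/lat grids and one drifter position.
lons = np.array([65.04, 65.13, 65.21])
lats = np.array([16.04, 16.12, 16.21])
lon0, lat0 = 65.10, 16.20

# Nearest grid index along each axis, which is what method='nearest' does.
i = np.abs(lons - lon0).argmin()
j = np.abs(lats - lat0).argmin()
```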


In [29]:
# in the following we deal with the GDP drifter data
from buyodata import buoydata
import os

In [30]:
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']

In [31]:
# read the files and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # takes around 4-5 minutes

# we only have chlor_a data from 2002-07-04 onward
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # recheck that no earlier times remain


Out[31]:
0

In [32]:
# wrap the longitudes so they are all >= 0
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon < 0
dfvvAll.loc[mask, 'lon'] = dfvvAll.loc[mask, 'lon'] + 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))

dfvvAll.describe()
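The wrap can also be written without chained indexing (which is what triggers pandas' SettingWithCopyWarning); a toy sketch using modulo arithmetic:

```python
import pandas as pd

# Drifter longitudes in [-180, 180); wrap the negatives into [0, 360).
df = pd.DataFrame({"lon": [-170.0, 10.0, -45.0]})
df["lon"] = df["lon"] % 360  # modulo handles positives and negatives alike
```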


before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000
Out[32]:
id lat lon temp ve vn spd var_lat var_lon var_tmp
count 2.147732e+07 2.131997e+07 2.131997e+07 1.986179e+07 2.129142e+07 2.129142e+07 2.129142e+07 2.147732e+07 2.147732e+07 2.147732e+07
mean 1.765662e+06 -2.263128e+00 2.124412e+02 1.986121e+01 2.454172e-01 4.708192e-01 2.613427e+01 7.326258e+00 7.326555e+00 7.522298e+01
std 9.452835e+06 3.401115e+01 9.746941e+01 8.339498e+00 2.525050e+01 2.052160e+01 1.939087e+01 8.527853e+01 8.527851e+01 2.637454e+02
min 2.578000e+03 -7.764700e+01 0.000000e+00 -1.685000e+01 -2.916220e+02 -2.601400e+02 0.000000e+00 5.268300e-07 -3.941600e-02 1.001300e-03
25% 4.897500e+04 -3.186000e+01 1.490720e+02 1.437300e+01 -1.411400e+01 -1.044700e+01 1.290300e+01 4.366500e-06 7.512600e-06 1.435700e-03
50% 7.141300e+04 -4.920000e+00 2.153940e+02 2.214400e+01 -5.560000e-01 1.970000e-01 2.176700e+01 8.833600e-06 1.495800e-05 1.691700e-03
75% 1.094330e+05 2.756000e+01 3.064370e+02 2.688900e+01 1.356100e+01 1.109300e+01 3.405900e+01 1.833300e-05 3.627900e-05 2.294200e-03
max 6.399288e+07 8.989900e+01 3.600000e+02 4.595000e+01 4.417070e+02 2.783220e+02 4.421750e+02 1.000000e+03 1.000000e+03 1.000000e+03

In [33]:
# select only the Arabian Sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon < 75) & (dfvvAll.lat > 5) & (dfvvAll.lat < 28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # could use this dict for xarray-style selection later
floatsAll = dfvvAll.loc[arabian_sea]   # select rows directly with the boolean mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )


dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)

In [34]:
# visualize the floats over the globe
fig, ax = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)


Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x24f194198>

In [35]:
# pandas cannot do this resampling directly: we are really indexing on
# ['time', 'id'], and DataFrame.resample raises "TypeError: Only valid with
# DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'"
print()
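One way around the MultiIndex limitation in pandas itself is `pd.Grouper`, which lets a time binning coexist with the `id` key; a toy sketch (the drifter ids and temperatures are made up):

```python
import numpy as np
import pandas as pd

# Two drifters reporting daily for 12 days.
times = pd.date_range("2002-07-04", periods=12, freq="D")
df = pd.DataFrame({
    "time": list(times) * 2,
    "id": [1] * 12 + [2] * 12,
    "temp": np.arange(24.0),
})

# 9-day bins per drifter id: resample-like behaviour without a DatetimeIndex.
out = df.groupby(["id", pd.Grouper(key="time", freq="9D")])["temp"].mean()
```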




In [36]:
# move the surface drifter data from a pandas DataFrame to an xarray Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']))  # index on (time, id); use reset_index to revert
floatsDSAll


Out[36]:
<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...
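`Dataset.from_dataframe` on a (time, id) MultiIndex produces the dense 2-D layout seen above, with NaN for every (time, id) pair that has no observation; the pandas `unstack` analogue on a toy table:

```python
import numpy as np
import pandas as pd

# Long-format drifter records: one row per (time, id) observation.
df = pd.DataFrame({
    "time": pd.to_datetime(["2002-07-04", "2002-07-04", "2002-07-05"]),
    "id": [7574, 10206, 7574],
    "temp": [28.0, 28.5, 27.9],
})

# Index on (time, id), then pivot id into columns; absent pairs become NaN.
wide = df.set_index(["time", "id"])["temp"].unstack("id")
```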

In [37]:
# resample the xarray Dataset onto a nine-day frequency
floatsDSAll_9D = floatsDSAll.resample('9D', dim='time')
floatsDSAll_9D


Out[37]:
<xarray.Dataset>
Dimensions:  (id: 259, time: 568)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-13 2002-07-22 ...
Data variables:
    var_lon  (time, id) float64 nan 0.005002 0.0001159 0.000123 9.882e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.00362 0.08884 nan 0.001708 ...
    spd      (time, id) float64 nan 8.463 17.92 20.22 16.84 nan 25.5 nan ...
    lon      (time, id) float64 nan 66.53 69.91 65.04 69.92 nan 69.45 nan ...
    var_lat  (time, id) float64 nan 0.001326 6.127e-05 6.456e-05 5.404e-05 ...
    ve       (time, id) float64 nan 7.056 12.94 11.14 11.71 nan 24.13 nan ...
    vn       (time, id) float64 nan 0.02706 -8.271 -14.36 -7.037 nan -1.798 ...
    lat      (time, id) float64 nan 16.22 13.6 16.07 13.64 nan 20.08 nan ...
    temp     (time, id) float64 nan nan nan 27.8 28.57 nan 28.98 nan 27.62 ...

In [38]:
# transfer it back to a pandas DataFrame for plotting
floatsDFAll_9D = floatsDSAll_9D.to_dataframe()
floatsDFAll_9D = floatsDFAll_9D.reset_index()

# visualize the subsampled floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAll_9D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x1235ff0b8>

In [39]:
# get the chlorophyll value for each data entry
floatsDFAll_9Dtimeorder = floatsDFAll_9D.sort_values(['time','id'], ascending=True)
floatsDFAll_9Dtimeorder # check that it is time-ordered
# should we drop NaNs to speed things up?


Out[39]:
id time var_lon var_tmp spd lon var_lat ve vn lat temp
0 7574 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
568 10206 2002-07-04 0.005002 1000.000000 8.462583 66.533639 0.001326 7.056056 0.027056 16.219000 NaN
1136 10208 2002-07-04 0.000116 1000.000000 17.918639 69.914139 0.000061 12.942722 -8.270944 13.599500 NaN
1704 11089 2002-07-04 0.000123 0.003620 20.217250 65.036972 0.000065 11.143806 -14.363611 16.068778 27.796889
2272 15703 2002-07-04 0.000099 0.088844 16.841889 69.915889 0.000054 11.706389 -7.037222 13.637028 28.572750
2840 15707 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3408 27069 2002-07-04 0.000102 0.001708 25.498500 69.445889 0.000056 24.130583 -1.797889 20.077750 28.981389
3976 27139 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4544 28842 2002-07-04 0.000208 0.003344 18.067944 60.830556 0.000099 5.026861 -8.122111 18.624861 27.620472
5112 34159 2002-07-04 0.000112 1000.000000 37.039111 59.736889 0.000059 31.753083 16.704556 12.894139 NaN
5680 34173 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
6248 34210 2002-07-04 0.000124 0.003696 26.752611 56.763194 0.000063 -3.750167 -17.727861 6.022361 26.452806
6816 34211 2002-07-04 0.000102 0.003512 28.413083 68.565389 0.000055 23.260944 -14.241500 8.210750 28.380222
7384 34212 2002-07-04 0.000100 0.003549 48.849444 65.946111 0.000055 42.864500 7.009000 6.679556 28.577889
7952 34223 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8520 34310 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
9088 34311 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
9656 34312 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
10224 34314 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
10792 34315 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
11360 34374 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
11928 34708 2002-07-04 0.000109 0.001784 30.958611 60.658278 0.000057 30.589861 2.088833 10.225111 27.291500
12496 34709 2002-07-04 0.000189 0.002233 96.889000 52.974000 0.000094 -84.267000 47.817000 5.027000 26.934000
13064 34710 2002-07-04 0.000097 0.001854 47.400444 50.333750 0.000051 0.763722 -8.140111 13.057111 31.149000
13632 34714 2002-07-04 0.000111 0.001799 37.025528 64.694000 0.000059 35.870556 4.682833 13.741167 27.765694
14200 34716 2002-07-04 0.000106 0.001785 37.381000 66.285250 0.000057 34.058694 2.213806 7.716750 28.780944
14768 34718 2002-07-04 0.000103 0.001695 39.619167 72.973500 0.000055 20.389944 -33.327528 15.458417 29.063444
15336 34719 2002-07-04 0.000107 0.001652 27.380083 71.378167 0.000057 13.919778 -20.858056 17.217028 28.959417
15904 34720 2002-07-04 0.000111 0.001771 24.609778 69.482472 0.000060 11.141528 -19.437778 14.142056 28.664167
16472 34721 2002-07-04 0.000113 0.001733 13.113972 65.552333 0.000060 5.753667 -10.328972 16.928722 27.908083
... ... ... ... ... ... ... ... ... ... ... ...
130639 3098682 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
131207 60073460 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
131775 60074440 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
132343 60077450 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
132911 60150420 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
133479 60454500 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
134047 60656200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
134615 60657200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
135183 60658190 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
135751 60659110 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
136319 60659120 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
136887 60659190 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
137455 60659200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
138023 60940960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
138591 60940970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
139159 60941960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
139727 60941970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
140295 60942960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
140863 60942970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
141431 60943960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
141999 60943970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
142567 60944960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
143135 60944970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
143703 60945970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
144271 60946960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
144839 60947960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
145407 60947970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
145975 60948960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
146543 60950430 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN
147111 62321420 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN

147112 rows × 11 columns


In [40]:
floatsDFAll_9Dtimeorder.lon.dropna().shape  # the longitude column has plenty of valid values: (3466,)


Out[40]:
(3466,)

In [42]:
# a little test of the loop API for the DataFrame;
# see df.itertuples? -- it is faster and preserves the dtypes
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''


Out[42]:
"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"

In [43]:
# this one-liner avoids the loop above, but the 2-D nearest-neighbor
# interpolation over all points takes a long time (about an hour)
tmpAll = ds_9day.chlor_a.sel_points(time=list(floatsDFAll_9Dtimeorder.time), lon=list(floatsDFAll_9Dtimeorder.lon), lat=list(floatsDFAll_9Dtimeorder.lat), method='nearest')
print('the count of nan values in tmpAll is', tmpAll.to_series().isnull().sum())


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
the count of nan values in tmpAll is 145481
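For many points at once, the nearest lookup can be vectorised with broadcasting rather than a Python loop; a sketch on a hypothetical 1-D grid:

```python
import numpy as np

# Hypothetical 1-D grid and several drifter longitudes to locate at once.
grid_lon = np.array([45.0, 45.5, 46.0, 46.5])
pts_lon = np.array([45.1, 46.4, 44.9])

# Broadcast |grid - point| to a (points, grid) matrix and argmin per row.
idx = np.abs(grid_lon[None, :] - pts_lon[:, None]).argmin(axis=1)
```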

In [44]:
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (1631,) good values


Out[44]:
(1631,)

In [45]:
# use tmpAll.to_series() to transfer from an xarray DataArray to a pandas Series
floatsDFAll_9Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_9Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_9Dtimeorder.chlor_a.isnull().sum() )  # should match the count above

# take a look at the data
floatsDFAll_9Dtimeorder

# visualize the floats in the Arabian Sea region, colored by chlor_a
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_9Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    # log10 transform for plotting chlorophyll on a log color scale
    return np.log10(x)

#print(floatsDFAll_9Dtimeorder['chlor_a'].apply(scale))
floatsDFAll_9Dtimeorder['chlor_a_log10'] = floatsDFAll_9Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_9Dtimeorder
#print("after the transformation the nan values in 'chlor_a_log10' is", floatsDFAll_9Dtimeorder.chlor_a_log10.isnull().sum() )

# visualize the floats in the Arabian Sea region, colored by log10(chlor_a)
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_9Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAll_9Dtimeorder.chlor_a.dropna().shape  # (1631,)
#floatsDFAll_9Dtimeorder.chlor_a_log10.dropna().shape  # (1631,)


after editing the dataframe the nan values in 'chlor_a' is 145481
Out[45]:
(1631,)

In [46]:
# take the time difference of chlor_a; this has to be done in xarray,
# so transfer the DataFrame back into an xarray Dataset
floatsDFAll_9Dtimeorder


Out[46]:
id time var_lon var_tmp spd lon var_lat ve vn lat temp chlor_a chlor_a_log10
0 7574 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
568 10206 2002-07-04 0.005002 1000.000000 8.462583 66.533639 0.001326 7.056056 0.027056 16.219000 NaN NaN NaN
1136 10208 2002-07-04 0.000116 1000.000000 17.918639 69.914139 0.000061 12.942722 -8.270944 13.599500 NaN NaN NaN
1704 11089 2002-07-04 0.000123 0.003620 20.217250 65.036972 0.000065 11.143806 -14.363611 16.068778 27.796889 NaN NaN
2272 15703 2002-07-04 0.000099 0.088844 16.841889 69.915889 0.000054 11.706389 -7.037222 13.637028 28.572750 NaN NaN
2840 15707 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3408 27069 2002-07-04 0.000102 0.001708 25.498500 69.445889 0.000056 24.130583 -1.797889 20.077750 28.981389 NaN NaN
3976 27139 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4544 28842 2002-07-04 0.000208 0.003344 18.067944 60.830556 0.000099 5.026861 -8.122111 18.624861 27.620472 NaN NaN
5112 34159 2002-07-04 0.000112 1000.000000 37.039111 59.736889 0.000059 31.753083 16.704556 12.894139 NaN NaN NaN
5680 34173 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6248 34210 2002-07-04 0.000124 0.003696 26.752611 56.763194 0.000063 -3.750167 -17.727861 6.022361 26.452806 NaN NaN
6816 34211 2002-07-04 0.000102 0.003512 28.413083 68.565389 0.000055 23.260944 -14.241500 8.210750 28.380222 0.118290 -0.927052
7384 34212 2002-07-04 0.000100 0.003549 48.849444 65.946111 0.000055 42.864500 7.009000 6.679556 28.577889 NaN NaN
7952 34223 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8520 34310 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9088 34311 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9656 34312 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
10224 34314 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
10792 34315 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11360 34374 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11928 34708 2002-07-04 0.000109 0.001784 30.958611 60.658278 0.000057 30.589861 2.088833 10.225111 27.291500 NaN NaN
12496 34709 2002-07-04 0.000189 0.002233 96.889000 52.974000 0.000094 -84.267000 47.817000 5.027000 26.934000 0.288937 -0.539196
13064 34710 2002-07-04 0.000097 0.001854 47.400444 50.333750 0.000051 0.763722 -8.140111 13.057111 31.149000 NaN NaN
13632 34714 2002-07-04 0.000111 0.001799 37.025528 64.694000 0.000059 35.870556 4.682833 13.741167 27.765694 NaN NaN
14200 34716 2002-07-04 0.000106 0.001785 37.381000 66.285250 0.000057 34.058694 2.213806 7.716750 28.780944 NaN NaN
14768 34718 2002-07-04 0.000103 0.001695 39.619167 72.973500 0.000055 20.389944 -33.327528 15.458417 29.063444 NaN NaN
15336 34719 2002-07-04 0.000107 0.001652 27.380083 71.378167 0.000057 13.919778 -20.858056 17.217028 28.959417 NaN NaN
15904 34720 2002-07-04 0.000111 0.001771 24.609778 69.482472 0.000060 11.141528 -19.437778 14.142056 28.664167 NaN NaN
16472 34721 2002-07-04 0.000113 0.001733 13.113972 65.552333 0.000060 5.753667 -10.328972 16.928722 27.908083 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
130639 3098682 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
131207 60073460 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
131775 60074440 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
132343 60077450 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
132911 60150420 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
133479 60454500 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
134047 60656200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
134615 60657200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
135183 60658190 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
135751 60659110 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
136319 60659120 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
136887 60659190 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
137455 60659200 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
138023 60940960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
138591 60940970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
139159 60941960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
139727 60941970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
140295 60942960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
140863 60942970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
141431 60943960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
141999 60943970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
142567 60944960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
143135 60944970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
143703 60945970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
144271 60946960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
144839 60947960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
145407 60947970 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
145975 60948960 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
146543 60950430 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
147111 62321420 2016-06-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

147112 rows × 13 columns


In [47]:
# unstack() pivots the inner index level out to columns, giving a 2-D dataframe
# reset_index() turns all index levels back into ordinary columns
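The two comments above can be illustrated on a toy (time, id)-indexed series (hypothetical labels and values):

```python
import pandas as pd

# Toy (time, id)-indexed series, like the per-float records used below.
s = pd.Series([0.1, 0.2, 0.3, 0.4],
              index=pd.MultiIndex.from_product([['t1', 't2'], [1, 2]],
                                               names=['time', 'id']))

wide = s.unstack()          # 2-D frame: time rows x id columns
flat = wide.reset_index()   # index levels back to ordinary columns
print(wide.shape, flat.columns.tolist())
```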

In [48]:
# prepare the data in dataset and about to take the diff
tmp = xr.Dataset.from_dataframe(floatsDFAll_9Dtimeorder.set_index(['time','id']))  # set time & id as the index; use reset_index to revert this operation
# take the diff on the chlor_a
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# rename the column to a proper name
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# merge the two dataframes {floatsDFAll_9Dtimeorder; chlor_a_rate} into one dataframe on the index {id, time}, using a left join
floatsDFAllRate_9Dtimeorder = pd.merge(floatsDFAll_9Dtimeorder, chlor_a_rate, on=['time','id'], how='left')
floatsDFAllRate_9Dtimeorder

# check that the merge preserved the rate values
print('check the sum of chl_rate before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of chl_rate after the merge', floatsDFAllRate_9Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate; the linear scale with symmetric color limits works well here
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_9Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale
floatsDFAllRate_9Dtimeorder['chl_rate_log10'] = floatsDFAllRate_9Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_9Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_9Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAllRate_9Dtimeorder.chl_rate.dropna().shape   # (1008,) data points
floatsDFAllRate_9Dtimeorder.chl_rate_log10.dropna().shape   # (417,) data points; chl_rate can be negative, so log10 is undefined for those rows


check the sum of chl_rate before the merge -64.3533472159612
check the sum of chl_rate after the merge -64.3533472159612
Out[48]:
(417,)
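The diff-then-merge recipe in the cell above can be sketched end to end on a toy float track (hypothetical ids, times, and chlorophyll values):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy track: two floats ("id") observed at three times.
df = pd.DataFrame({
    'time': pd.to_datetime(['2002-07-04'] * 2 + ['2002-07-13'] * 2 + ['2002-07-22'] * 2),
    'id':   [1, 2, 1, 2, 1, 2],
    'chlor_a': [0.10, 0.20, 0.15, 0.18, 0.25, np.nan],
})

# Same recipe as above: index by (time, id), diff along time, flatten back.
ds = xr.Dataset.from_dataframe(df.set_index(['time', 'id']))
rate = ds.diff(dim='time', n=1).chlor_a.to_series().reset_index()
rate = rate.rename(columns={'chlor_a': 'chl_rate'})

# A left merge keeps every original row; rows with no diff get NaN.
merged = pd.merge(df, rate, on=['time', 'id'], how='left')
print(merged)
```

Because the diff consumes the first time step (and propagates NaN observations), only three of the six rows carry a finite `chl_rate` here.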

In [49]:
pd.to_datetime(floatsDFAllRate_9Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_9Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_9Dtimeorder.time))  # create a target time series for masking purposes

# take the month out
month = ts.index.month 
# month.shape # a check on the shape of the month.
selector = (month == 11) | (month == 12) | (month == 1) | (month == 2) | (month == 3)
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_9Dtimeorder[selector].chl_rate.dropna().shape) # total (672,)
print('all the data count is', floatsDFAllRate_9Dtimeorder.chl_rate.dropna().shape )   # total (1008,)


shape of the selector (147112,)
all the data count in [11-01, 03-31]  is (672,)
all the data count is (1008,)
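The month-mask idea above can be written more compactly with `isin`; a minimal sketch on hypothetical dates:

```python
import pandas as pd

# Hypothetical times spanning the year; same masking idea as the cell above.
times = pd.to_datetime(['2003-01-15', '2003-05-20', '2003-11-07', '2003-12-25'])
ts = pd.Series(0, index=times)

# Equivalent, more compact form of the chained | comparisons:
selector = ts.index.month.isin([11, 12, 1, 2, 3])
print(selector)
```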

In [50]:
# histogram of the non-standardized data
axfloat = floatsDFAllRate_9Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('9-Day chl_rate')


Out[50]:
<matplotlib.text.Text at 0x447f5fe48>

In [51]:
# standardized series
ts = floatsDFAllRate_9Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])
axts.set_title('9-Day standardized chl_rate')


Out[51]:
<matplotlib.text.Text at 0x11a8bf048>
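The z-score recipe used above is just centering and scaling; on a toy series (hypothetical values), the result has mean 0 and sample standard deviation 1 by construction:

```python
import pandas as pd

# Toy series; same standardization recipe applied to chl_rate above.
s = pd.Series([0.1, -0.2, 0.05, 0.3, -0.15])
s_std = (s - s.mean()) / s.std()   # pandas .std() defaults to ddof=1 (sample std)

print(s_std.mean(), s_std.std())
```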

In [52]:
# all the data
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_9Dtimeorder[(floatsDFAllRate_9Dtimeorder.time > str(i)) & (floatsDFAllRate_9Dtimeorder.time < str(i+1))]  # rows falling in calendar year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)   # total is 1001
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the extra figure
ax = plt.subplot(8,2,16)
fig.delaxes(ax)


(47,)
(45,)
(7,)
(38,)
(105,)
(92,)
(140,)
(31,)
(62,)
(16,)
(32,)
(28,)
(198,)
(105,)
(55,)

In [53]:
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_9Dtimeorder[(floatsDFAllRate_9Dtimeorder.time >= (str(i) + '-11-01')) & (floatsDFAllRate_9Dtimeorder.time <= (str(i+1) + '-03-31'))]
    # select only the winter window: Nov 1 of year i through Mar 31 of year i+1
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 672
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)


(58,)
(1,)
(9,)
(66,)
(44,)
(129,)
(28,)
(50,)
(5,)
(32,)
(0,)
(106,)
(91,)
(53,)


In [104]:
# write the data to a csv or hdf file on disk so later experiments can skip this preprocessing

df_list = []
for i in range(2002,2017) :
    tmpyear = floatsDFAllRate_9Dtimeorder[(floatsDFAllRate_9Dtimeorder.time >= (str(i) + '-11-01')) & (floatsDFAllRate_9Dtimeorder.time <= (str(i+1) + '-03-31'))]
    # select only the winter window: Nov 1 of year i through Mar 31 of year i+1
    df_list.append(tmpyear)

df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is (672,)
df_chl_out_9D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # keep only the non-NaN rows
#list(df_chl_out_9D_modisa.groupby(['id']))   # shows the continuity pattern of the Lagrangian difference for each float id

# output to a csv or hdf file
df_chl_out_9D_modisa.head()


all the data count in [11-01, 03-31]  is  (672,)
Out[104]:
id time temp var_lat var_tmp ve var_lon vn lon spd lat chlor_a chlor_a_log10 chl_rate chl_rate_log10
3627 10206 2002-11-07 NaN 0.000494 1000.000000 -2.217083 0.001535 2.990778 67.132000 5.446583 11.126222 0.130267 -0.885166 -0.004264 NaN
3629 11089 2002-11-07 28.829472 0.000064 0.003812 -16.412472 0.000123 -3.991722 64.391056 17.995028 14.279667 0.197237 -0.705012 0.074821 -1.125976
3631 15707 2002-11-07 NaN 0.000074 1000.000000 -12.316611 0.000147 -18.253056 67.155306 24.656417 13.142667 0.152200 -0.817584 -0.004472 NaN
3649 34710 2002-11-07 28.448167 0.000069 0.001857 -2.827667 0.000135 19.539861 63.041861 20.774778 17.717111 0.372568 -0.428795 0.018603 -1.730417
3886 10206 2002-11-16 NaN 0.001033 1000.000000 -1.089083 0.003872 0.501111 67.029167 4.028889 11.179833 0.145233 -0.837935 0.014966 -1.824894
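The winter-window selection above relies on ISO-format date strings comparing correctly in lexicographic order, so plain string bounds work. A minimal sketch with hypothetical times:

```python
import pandas as pd

# Hypothetical frame with ISO-format time strings, standing in for the float records.
df = pd.DataFrame({
    'time': ['2002-10-15', '2002-11-07', '2003-02-01', '2003-04-10', '2003-12-01'],
    'chl_rate': [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Collect each Nov 1 (year i) .. Mar 31 (year i+1) window, then concatenate.
pieces = []
for i in range(2002, 2004):
    window = df[(df.time >= str(i) + '-11-01') & (df.time <= str(i + 1) + '-03-31')]
    pieces.append(window)

winter = pd.concat(pieces)
print(winter.time.tolist())
```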

In [105]:
df_chl_out_9D_modisa.index.name = 'index'  # give the index an explicit name

# write a CSV with that specific index label
df_chl_out_9D_modisa.to_csv('df_chl_out_9D_modisa.csv', sep=',', index_label = 'index')

# read the CSV back in to verify the round trip
test = pd.read_csv('df_chl_out_9D_modisa.csv', index_col='index')
test.head()


Out[105]:
id time temp var_lat var_tmp ve var_lon vn lon spd lat chlor_a chlor_a_log10 chl_rate chl_rate_log10
index
3627 10206 2002-11-07 NaN 0.000494 1000.000000 -2.217083 0.001535 2.990778 67.132000 5.446583 11.126222 0.130267 -0.885166 -0.004264 NaN
3629 11089 2002-11-07 28.829472 0.000064 0.003812 -16.412472 0.000123 -3.991722 64.391056 17.995028 14.279667 0.197237 -0.705012 0.074821 -1.125976
3631 15707 2002-11-07 NaN 0.000074 1000.000000 -12.316611 0.000147 -18.253056 67.155306 24.656417 13.142667 0.152200 -0.817584 -0.004472 NaN
3649 34710 2002-11-07 28.448167 0.000069 0.001857 -2.827667 0.000135 19.539861 63.041861 20.774778 17.717111 0.372568 -0.428795 0.018603 -1.730417
3886 10206 2002-11-16 NaN 0.001033 1000.000000 -1.089083 0.003872 0.501111 67.029167 4.028889 11.179833 0.145233 -0.837935 0.014966 -1.824894
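The `index_label`/`index_col` pairing above makes the CSV round trip lossless; a self-contained sketch with an in-memory buffer and a hypothetical mini-frame:

```python
import io
import pandas as pd

# Hypothetical mini-frame standing in for df_chl_out_9D_modisa.
df = pd.DataFrame({'id': [10206, 11089], 'chl_rate': [-0.004264, 0.074821]},
                  index=pd.Index([3627, 3629], name='index'))

# Round-trip through CSV with an explicit index label, as in the cell above.
buf = io.StringIO()
df.to_csv(buf, sep=',', index_label='index')
buf.seek(0)
back = pd.read_csv(buf, index_col='index')

print(back.equals(df))
```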
