6-Day subsampling on the OceanColor Dataset



In [8]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm

Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the netCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we access the values.



In [9]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [10]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both fit easily in RAM, so let's load them into memory.
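As a sanity check, these totals can be reproduced by hand from the dimension sizes shown when the datasets are loaded. The sketch below assumes that every variable and coordinate uses 8 bytes per element (float64, int64 or datetime64[ns]), that chlor_a is (time, lat, lon) and that palette is (time, rgb, eightbitcolor); dataset_nbytes is a hypothetical helper, not part of xarray.

```python
# Recompute Dataset.nbytes by hand from the dimension sizes,
# assuming 8 bytes per element for every variable and coordinate.
BYTES_PER_ELEMENT = 8
N_LAT, N_LON, N_RGB, N_COLORS = 276, 360, 3, 256


def dataset_nbytes(n_time):
    """Total bytes for one chlorophyll dataset with n_time snapshots (hypothetical helper)."""
    chlor_a = n_time * N_LAT * N_LON                      # (time, lat, lon)
    palette = n_time * N_RGB * N_COLORS                   # (time, rgb, eightbitcolor)
    coords = N_LAT + N_LON + N_RGB + N_COLORS + n_time    # the 1-D coordinate variables
    return (chlor_a + palette + coords) * BYTES_PER_ELEMENT


for name, n_time in [('8-day', 667), ('daily', 5295)]:
    print('%-6s %.6f MB' % (name, dataset_nbytes(n_time) / 1e6))
```

Under these assumptions it prints 534.295504 MB and 4241.471600 MB, matching the values above. Almost all of that is the chlor_a cube; the palette, which is stored once per snapshot, adds only about 4.1 MB and 32.5 MB respectively.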



In [11]:

    
[ds.load() for ds in both_datasets]









    Out[11]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: they contained large negative chlorophyll concentrations. Looking closer, I realized that the land/cloud mask had been inverted in those maps, so I wrote a function that corrects the data by masking the negative values out again.



In [12]:

    
def fix_bad_data(ds):
    # for some reason, the cloud/land mask is inverted in some snapshots;
    # this is obvious because they contain chlorophyll values less than zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through and fix
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values 
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
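The loop above only touches the snapshots whose minimum is negative. Since the good snapshots contain no negative values, the same correction can be sketched as one vectorized where call, assuming (as the function does) that any negative chlorophyll value is invalid. The helper names are hypothetical and the data is a tiny synthetic array rather than the MODIS files.

```python
import numpy as np
import xarray as xr


def find_bad_snapshots(chl):
    """Boolean DataArray over time: True where a snapshot has any negative value."""
    # min() skips NaN by default, so land/cloud gaps do not affect the result
    return chl.min(dim=['lat', 'lon']) < 0


def mask_negative(chl):
    """Return a copy of chl with negative values replaced by NaN."""
    # where() keeps values where the condition holds and inserts NaN elsewhere;
    # existing NaNs stay NaN because NaN >= 0 evaluates to False
    return chl.where(chl >= 0)


# tiny synthetic example: 2 snapshots on a 2 x 2 grid, the second one "bad"
data = np.array([[[0.1, np.nan], [0.3, 0.4]],
                 [[-5.0, 0.2], [np.nan, 0.5]]])
chl = xr.DataArray(data, dims=('time', 'lat', 'lon'), name='chlor_a')
print(find_bad_snapshots(chl).values)
print(mask_negative(chl).values)
```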



In [13]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[13]:





[None, None]



In [14]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[14]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately, it doesn't come with the dataset, but we can infer it by counting the grid points that have at least one positive (valid) chlorophyll value.
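A NumPy sketch of this inference on a small synthetic (time, lat, lon) array: following the code below, which tests chlor_a > 0 rather than non-NaN, a cell is treated as ocean if it is positive in at least one snapshot. The array values are made up for illustration.

```python
import numpy as np

# synthetic (time, lat, lon) chlorophyll cube; NaN marks land or cloud
chl = np.array([[[np.nan, 0.2, np.nan],
                 [np.nan, np.nan, 1.5]],
                [[np.nan, np.nan, 0.7],
                 [np.nan, 0.4, np.nan]]])

# a cell is "ocean" if it holds a positive value in at least one snapshot;
# comparisons with NaN are False, so never-observed cells drop out
with np.errstate(invalid='ignore'):
    ocean_mask = (chl > 0).any(axis=0)
num_ocean_points = int(ocean_mask.sum())
print(ocean_mask)
print(num_ocean_points)
```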



In [15]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[15]:





<matplotlib.collections.QuadMesh at 0x1193b2f60>



In [16]:

    
# infer the ocean mask: grid points with a positive chlorophyll value in at least one snapshot
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean grid points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[16]:





<matplotlib.text.Text at 0x13d4725c0>



In [17]:

    
#ds_8day



In [18]:

    
#ds_daily



In [19]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[19]:





<matplotlib.collections.QuadMesh at 0x11c1835c0>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [20]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
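For a (time, lat, lon) array, groupby('time').count() counts the non-NaN values in each snapshot (in xarray this could equivalently be written chlor_a.count(dim=['lat', 'lon'])). A small NumPy sketch of the resulting coverage fraction, using a made-up cube and ocean mask:

```python
import numpy as np

# synthetic (time, lat, lon) cube and a precomputed ocean mask
chl = np.array([[[0.1, np.nan], [np.nan, np.nan]],
                [[0.2, 0.3], [0.4, np.nan]]])
ocean_mask = np.array([[True, True], [True, False]])
num_ocean_points = ocean_mask.sum()

# number of valid (non-NaN) values in each snapshot
valid_counts = (~np.isnan(chl)).sum(axis=(1, 2))
# fraction of the ocean that was observed in each snapshot
coverage = valid_counts / float(num_ocean_points)
print(coverage)
```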



In [21]:

    
'''
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
'''
ds_daily.groupby('time').count() # information from original data









    Out[21]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...



In [22]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[22]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [23]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]



In [24]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#count_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, count_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                           for ds in ds_8day, ds_daily] # SyntaxError in Python 3: the tuple after "in" must be parenthesized



In [25]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[25]:





<matplotlib.legend.Legend at 0x11dc45080>

Seasonal Climatology



In [26]:

    
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology: mean over all years for each calendar month
                                     for count in (count_8day, count_daily)]



In [27]:

    
# plot the monthly-mean fraction of ocean points with valid data
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[27]:





<matplotlib.legend.Legend at 0x11de45b70>

From the figure above, we see that data coverage is highest in winter (especially February) and lowest in summer.
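count.groupby('time.month').mean() pools every snapshot that falls in the same calendar month, across all years, and averages them. A pandas sketch of the same monthly climatology on a synthetic daily series (the coverage values are invented, with February set higher than the other months):

```python
import numpy as np
import pandas as pd

# synthetic daily coverage fraction over two years
time = pd.date_range('2003-01-01', '2004-12-31', freq='D')
coverage = pd.Series(np.where(time.month == 2, 0.8, 0.3), index=time)

# average all values that share a calendar month, pooling the years
climatology = coverage.groupby(coverage.index.month).mean()
print(climatology)
```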

Maps of individual days

Let's grab some data from February and plot it.



In [28]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[28]:





<matplotlib.collections.QuadMesh at 0x11de527b8>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [29]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[29]:





<matplotlib.collections.QuadMesh at 0x12b5cf940>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [30]:

    
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # [0] selects the first time step (2002-07-04)
#ds_daily.chlor_a.sel_points(time=times, lon=lons, lat=lats, method='nearest')









    Out[30]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lon      (points) float64 65.04 70.04
    lat      (points) float64 16.04 18.04
  * points   (points) int64 0 1
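sel_points was deprecated in later xarray releases in favour of vectorized indexing and is not available in current versions. Assuming a recent xarray, the same pointwise lookup can be sketched by passing DataArray indexers that share a new dimension (here called points), which pairs the i-th lon with the i-th lat instead of taking their outer product. The grid is synthetic; like the MODIS files, its lat coordinate decreases.

```python
import numpy as np
import xarray as xr

# synthetic 3 x 2 grid with a decreasing lat coordinate
lat = np.array([18.04, 17.04, 16.04])
lon = np.array([65.04, 70.04])
da = xr.DataArray(np.arange(6.0).reshape(3, 2),
                  coords={'lat': lat, 'lon': lon}, dims=('lat', 'lon'))

# indexers sharing the 'points' dimension are matched pairwise, so this
# picks (lon=65, lat=16) and (lon=70, lat=18), like sel_points did
points = da.sel(lon=xr.DataArray([65, 70], dims='points'),
                lat=xr.DataArray([16, 18], dims='points'),
                method='nearest')
print(points.values)
```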



In [31]:

    
#ds_daily.chlor_a.sel_points?



In [32]:

    
ds_6day = ds_daily.resample('6D', dim='time')
ds_6day









    Out[32]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 883)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-10 2002-07-16 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
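The resample('6D', dim='time') signature used above comes from older xarray releases. In current xarray the dimension is passed as a keyword and resample returns a resampler object that needs an explicit reduction. A sketch on a synthetic daily dataset, assuming the 6-day mean is the intended aggregation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# 13 days of synthetic daily data on a 1 x 1 grid, with one missing day
time = pd.date_range('2002-07-04', periods=13, freq='D')
values = np.arange(13.0).reshape(13, 1, 1)
values[1] = np.nan
daily = xr.Dataset({'chlor_a': (('time', 'lat', 'lon'), values)},
                   coords={'time': time, 'lat': [10.0], 'lon': [60.0]})

# current xarray API: 6-day bins starting on the first day, reduced with mean();
# mean() skips NaN by default, so a bin is NaN only if every day in it is missing
six_day = daily.resample(time='6D').mean()
print(six_day.time.values)
print(six_day.chlor_a.values.ravel())
```

As a rough consistency check, 5295 daily steps split into 6-day bins give ceil(5295 / 6) = 883 bins, which matches the time dimension of ds_6day above (assuming the daily series has no missing dates).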



In [33]:

    
plt.figure(figsize=(8,6))
ds_6day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[33]:





<matplotlib.collections.QuadMesh at 0x11ad4e630>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [34]:

    
# check the minimum longitude and latitude of the grid
print(ds_6day.lon.min(),'\n' ,ds_6day.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each (time, lon, lat) float position to a chlorophyll value
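Here is a NumPy sketch of that mapping under simple assumptions: each observation is matched to the nearest time step, latitude and longitude of a gridded cube, one axis at a time. nearest_index and lookup_nearest are hypothetical helpers, and the grid is made up; the brute-force distance matrix is fine for a sketch but uses memory proportional to (observations × axis length).

```python
import numpy as np


def nearest_index(coord, targets):
    """Index of the element of 1-D coord closest to each target (brute force)."""
    coord = np.asarray(coord)
    targets = np.asarray(targets)
    # works for increasing or decreasing coords, and for datetime64 values
    return np.abs(coord[np.newaxis, :] - targets[:, np.newaxis]).argmin(axis=1)


def lookup_nearest(cube, times, lats, lons, t, la, lo):
    """Value of cube[time, lat, lon] at the grid point nearest each (t, la, lo)."""
    it = nearest_index(times, t)
    ilat = nearest_index(lats, la)
    ilon = nearest_index(lons, lo)
    return cube[it, ilat, ilon]  # pointwise fancy indexing, one value per observation


times = np.array(['2002-07-04', '2002-07-10', '2002-07-16'], dtype='datetime64[D]')
lats = np.array([20.0, 15.0, 10.0])          # decreasing, like the MODIS grid
lons = np.array([60.0, 65.0])
cube = np.arange(18.0).reshape(3, 3, 2)      # (time, lat, lon)

obs_t = np.array(['2002-07-11', '2002-07-05'], dtype='datetime64[D]')
vals = lookup_nearest(cube, times, lats, lons, obs_t, [14.0, 19.0], [64.0, 61.0])
print(vals)
```

One caveat: np.argmin returns the position of the first NaN when a row of distances contains NaN, so an observation with a missing lon or lat would silently map to index 0; such rows should be removed before the lookup.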



In [35]:

    
# the following cells work with the GDP (Global Drifter Program) float data
from buyodata import buoydata
import os



In [36]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [37]:

    
# read the files and concatenate them into one DataFrame (takes around 4-5 minutes)
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])

# chlor_a data only starts on 2002-07-04, so keep only the float records from that date on
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04')  # sanity check: no earlier records should remain (expect 0)









    Out[37]:





0



In [38]:

    
# shift negative longitudes into the 0-360 range so that all longitudes are >= 0;
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning raised
# by chained assignment (dfvvAll.lon[mask] = ...) on the filtered slice dfvvAll
print('before processing, the minimum longitude is %.3f and maximum is %.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
dfvvAll = dfvvAll.assign(lon=np.where(dfvvAll.lon < 0, dfvvAll.lon + 360, dfvvAll.lon))
print('after processing, the minimum longitude is %.3f and maximum is %.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))

dfvvAll.describe()



    



before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000






    Out[38]:






  
    
      
                  id            lat            lon           temp             ve             vn            spd        var_lat        var_lon        var_tmp
count   2.147732e+07   2.131997e+07   2.131997e+07   1.986179e+07   2.129142e+07   2.129142e+07   2.129142e+07   2.147732e+07   2.147732e+07   2.147732e+07
mean    1.765662e+06  -2.263128e+00   2.124412e+02   1.986121e+01   2.454172e-01   4.708192e-01   2.613427e+01   7.326258e+00   7.326555e+00   7.522298e+01
std     9.452835e+06   3.401115e+01   9.746941e+01   8.339498e+00   2.525050e+01   2.052160e+01   1.939087e+01   8.527853e+01   8.527851e+01   2.637454e+02
min     2.578000e+03  -7.764700e+01   0.000000e+00  -1.685000e+01  -2.916220e+02  -2.601400e+02   0.000000e+00   5.268300e-07  -3.941600e-02   1.001300e-03
25%     4.897500e+04  -3.186000e+01   1.490720e+02   1.437300e+01  -1.411400e+01  -1.044700e+01   1.290300e+01   4.366500e-06   7.512600e-06   1.435700e-03
50%     7.141300e+04  -4.920000e+00   2.153940e+02   2.214400e+01  -5.560000e-01   1.970000e-01   2.176700e+01   8.833600e-06   1.495800e-05   1.691700e-03
75%     1.094330e+05   2.756000e+01   3.064370e+02   2.688900e+01   1.356100e+01   1.109300e+01   3.405900e+01   1.833300e-05   3.627900e-05   2.294200e-03
max     6.399288e+07   8.989900e+01   3.600000e+02   4.595000e+01   4.417070e+02   2.783220e+02   4.421750e+02   1.000000e+03   1.000000e+03   1.000000e+03



In [39]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)



In [40]:

    
# avoid re-running this cell: scatter-plotting all ~21 million records is slow
# visualize the floats over the whole globe
fig, ax  = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[40]:





<matplotlib.axes._subplots.AxesSubplot at 0x11ad5ac18>



In [41]:

    
# pandas cannot resample this DataFrame directly: the data is indexed on ['time','id'],
# and DataFrame.resample does not accept that MultiIndex. It raises
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()
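For reference, pandas can still produce a per-float 6-day mean without converting to xarray, by grouping on id together with a pd.Grouper on the time column. This is a sketch on synthetic data, assuming time is an ordinary column rather than the index; the bin edges are anchored at the first day in the data, and unlike xr.Dataset.from_dataframe it should not build a dense (time, id) grid.

```python
import pandas as pd

# synthetic daily records for two floats with different time coverage
df = pd.DataFrame({
    'id': [1] * 8 + [2] * 4,
    'time': list(pd.date_range('2002-07-04', periods=8, freq='D')) +
            list(pd.date_range('2002-07-10', periods=4, freq='D')),
    'lon': [60.0 + 0.1 * k for k in range(8)] + [65.0, 65.2, 65.4, 65.6],
})

# group by float and by 6-day time bin, then average within each group
six_day = df.groupby(['id', pd.Grouper(key='time', freq='6D')]).mean()
print(six_day)
```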



In [42]:

    
# convert the surface float data from a pandas.DataFrame to an xarray.Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']) ) # time and id become the dimensions; reset_index() reverts set_index()
floatsDSAll









    Out[42]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...
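xr.Dataset.from_dataframe turns the two index levels into dimensions and fills every (time, id) pair without a record with NaN. The result is therefore a dense grid: 17499 × 259 = 4,532,241 cells holding the 111,894 Arabian Sea records, which is why most entries above are NaN. A small synthetic illustration (the ids and positions are made up):

```python
import pandas as pd
import xarray as xr

# three records from two floats at two distinct times
df = pd.DataFrame({
    'time': pd.to_datetime(['2002-07-04', '2002-07-04', '2002-07-05']),
    'id': [7574, 10206, 10206],
    'lon': [66.2, 69.5, 64.6],
})

# the (time, id) pair with no record (2002-07-05, 7574) becomes NaN
ds = xr.Dataset.from_dataframe(df.set_index(['time', 'id']))
print(ds.lon.values)
```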



In [43]:

    
# resample the xarray.Dataset onto a six-day frequency
floatsDSAll_6D =floatsDSAll.resample('6D', dim='time')
floatsDSAll_6D









    Out[43]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 852)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-10 2002-07-16 ...
Data variables:
    lon      (time, id) float64 nan 66.46 69.74 64.89 69.76 nan 69.16 nan ...
    var_lon  (time, id) float64 nan 0.006882 0.0001209 0.000122 0.0001014 ...
    vn       (time, id) float64 nan -1.914 -12.39 -8.704 -11.62 nan -2.414 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003651 0.08467 nan ...
    spd      (time, id) float64 nan 9.289 19.63 16.81 18.83 nan 26.07 nan ...
    var_lat  (time, id) float64 nan 0.001782 6.312e-05 6.401e-05 5.511e-05 ...
    temp     (time, id) float64 nan nan nan 27.84 28.56 nan 28.96 nan 27.67 ...
    lat      (time, id) float64 nan 16.19 13.67 16.25 13.68 nan 20.13 nan ...
    ve       (time, id) float64 nan 7.821 12.69 13.33 11.92 nan 24.46 nan ...



In [44]:

    
# convert back to a pandas.DataFrame for plotting; reset_index() turns the
# (id, time) index levels back into ordinary columns
floatsDFAll_6D = floatsDSAll_6D.to_dataframe().reset_index()
# visualize the 6-day subsampled floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_6D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[44]:





<matplotlib.axes._subplots.AxesSubplot at 0x11b8cd9b0>



In [45]:

    
# sort the rows by time, then id, before looking up the chlorophyll value for each entry
floatsDFAll_6Dtimeorder = floatsDFAll_6D.sort_values(['time','id'],ascending=True)
floatsDFAll_6Dtimeorder  # check that the rows are in time order
# open question: should rows with NaN positions be dropped to speed up the lookup?









    Out[45]:






  
    
      
              id        time        lon   var_lon          vn      var_tmp        spd   var_lat       temp        lat          ve
     0      7574  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
   852     10206  2002-07-04  66.462208  0.006882   -1.913917  1000.000000   9.288542  0.001782        NaN  16.192375    7.820625
  1704     10208  2002-07-04  69.737208  0.000121  -12.390125  1000.000000  19.627375  0.000063        NaN  13.665042   12.693750
  2556     11089  2002-07-04  64.888250  0.000122   -8.703625     0.003651  16.807458  0.000064  27.842458  16.248958   13.326750
  3408     15703  2002-07-04  69.756625  0.000101  -11.618875     0.084672  18.828333  0.000055  28.563958  13.684875   11.921125
  4260     15707  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
  5112     27069  2002-07-04  69.159542  0.000099   -2.413708     0.001707  26.066417  0.000054  28.963625  20.131083   24.459167
  5964     27139  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
  6816     28842  2002-07-04  60.810958  0.000188   -6.515792     0.003334  18.883542  0.000092  27.665333  18.734417    5.594167
  7668     34159  2002-07-04  59.335292  0.000109    9.713042  1000.000000  31.484542  0.000058        NaN  12.677917   29.419333
  8520     34173  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
  9372     34210  2002-07-04  56.760500  0.000141  -15.606167     0.003675  21.520333  0.000070  26.712458   6.184083  -11.194333
 10224     34211  2002-07-04  68.285625  0.000105  -13.066833     0.003488  28.371292  0.000057  28.361250   8.374333   23.969000
 11076     34212  2002-07-04  65.375750  0.000093   17.648625     0.003588  46.028333  0.000051  28.545250   6.542208   39.663333
 11928     34223  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 12780     34310  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 13632     34311  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 14484     34312  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 15336     34314  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 16188     34315  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 17040     34374  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 17892     34708  2002-07-04  60.315542  0.000096    1.757542     0.001768  38.500708  0.000052  27.184000  10.209708   38.111667
 18744     34709  2002-07-04        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
 19596     34710  2002-07-04  50.145667  0.000103    8.075625     0.001875  46.875167  0.000053  31.104542  13.245708   12.005167
 20448     34714  2002-07-04  64.254667  0.000106    6.745875     0.001818  39.295750  0.000057  27.731167  13.726333   38.046958
 21300     34716  2002-07-04  65.924917  0.000099    8.357917     0.001769  37.732375  0.000054  28.801500   7.618917   35.828000
 22152     34718  2002-07-04  72.723917  0.000110  -28.297708     0.001692  35.052875  0.000058  29.128917  15.847625   19.968458
 23004     34719  2002-07-04  71.230250  0.000112  -17.508583     0.001647  26.124958  0.000059  28.950125  17.522292   16.102833
 23856     34720  2002-07-04  69.340333  0.000116  -26.427208     0.001813  29.710708  0.000062  28.669875  14.327542   10.629958
 24708     34721  2002-07-04  65.490667  0.000111   -9.380792     0.001788  12.911625  0.000059  27.910875  17.049667    7.087000
   ...       ...         ...        ...       ...         ...          ...        ...       ...        ...        ...         ...
195959   3098682  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
196811  60073460  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
197663  60074440  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
198515  60077450  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
199367  60150420  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
200219  60454500  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
201071  60656200  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
201923  60657200  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
202775  60658190  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
203627  60659110  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
204479  60659120  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
205331  60659190  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
206183  60659200  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
207035  60940960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
207887  60940970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
208739  60941960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
209591  60941970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
210443  60942960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
211295  60942970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
212147  60943960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
212999  60943970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
213851  60944960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
214703  60944970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
215555  60945970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
216407  60946960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
217259  60947960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
218111  60947970  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
218963  60948960  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
219815  60950430  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN
220667  62321420  2016-06-26        NaN       NaN         NaN          NaN        NaN       NaN        NaN        NaN         NaN

220668 rows × 11 columns



In [47]:

    
floatsDFAll_6Dtimeorder.lon.dropna().shape  # only 5013 of the 220668 rows have a longitude









    Out[47]:





(5013,)



In [48]:

    
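# an earlier row-by-row version of the lookup, kept as a string so it is not executed;
# it uses names (ds_2day, chl_ocx, floats_timeorder) that are not defined in this notebook.
# DataFrame.itertuples is faster than iterrows and preserves the column dtypes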
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''









    Out[48]:





"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"



In [49]:

    
# this single vectorized call replaces the row-by-row loop above
# method='nearest' takes the closest grid cell in time, lat and lon (no interpolation); it took about an hour to run
tmpAll = ds_6day.chlor_a.sel_points(time=list(floatsDFAll_6Dtimeorder.time), lon=list(floatsDFAll_6Dtimeorder.lon), lat=list(floatsDFAll_6Dtimeorder.lat), method='nearest')
print('the count of nan values in tmpAll is', tmpAll.to_series().isnull().sum())









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |






    



the count of nan values in tmpAll is 218748
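`sel_points(..., method='nearest')` is a nearest-neighbour lookup rather than an interpolation: each float position takes the value of the closest grid cell in time, latitude and longitude. `sel_points` was later deprecated and removed from xarray; in recent versions the same pointwise selection is written with `.sel()` and `DataArray` indexers that share one dimension. The numpy-only sketch below illustrates the lookup itself. The helper names `nearest_index` and `sample_points` are hypothetical, and the sketch assumes 1-D monotonic coordinate grids with at least two points, times already converted to numbers (e.g. days), and that a NaN coordinate should give a NaN result.

```python
import numpy as np


def nearest_index(grid, values):
    """Index of the nearest point in a monotonic 1-D grid (at least 2 points) for each value."""
    grid = np.asarray(grid, dtype=float)
    values = np.asarray(values, dtype=float)
    if grid[0] > grid[-1]:
        # descending grid (latitude here): search the reversed grid, then map indices back
        return len(grid) - 1 - nearest_index(grid[::-1], values)
    # index of the first grid point >= value, clipped so both neighbours exist
    right = np.clip(np.searchsorted(grid, values), 1, len(grid) - 1)
    left = right - 1
    # keep whichever neighbour is closer (ties go to the left one)
    choose_left = (values - grid[left]) <= (grid[right] - values)
    return np.where(choose_left, left, right)


def sample_points(field, t_grid, y_grid, x_grid, t, y, x):
    """Pointwise nearest-neighbour lookup field[t_k, y_k, x_k]; NaN where any coordinate is NaN."""
    t, y, x = (np.asarray(a, dtype=float) for a in (t, y, x))
    valid = ~(np.isnan(t) | np.isnan(y) | np.isnan(x))
    out = np.full(t.shape, np.nan)
    out[valid] = field[nearest_index(t_grid, t[valid]),
                       nearest_index(y_grid, y[valid]),
                       nearest_index(x_grid, x[valid])]
    return out


# toy 2 x 3 x 4 (time, lat, lon) field with a descending latitude axis
field = np.arange(24, dtype=float).reshape(2, 3, 4)
t_grid = [0.0, 6.0]                  # days
y_grid = [28.0, 27.0, 26.0]          # latitude, descending
x_grid = [45.0, 46.0, 47.0, 48.0]    # longitude

values = sample_points(field, t_grid, y_grid, x_grid,
                       t=[5.0, 0.4, 1.0], y=[26.9, 27.6, np.nan], x=[47.2, 45.1, 46.0])
print(values)
```

All points are looked up in one fancy-indexing step, the same idea as passing whole lists to `sel_points` instead of looping over rows.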



In [50]:

    
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (1920,) valid values, i.e. 220668 - 218748









    Out[50]:





(1920,)



In [52]:

    
# tmpAll.to_series() converts the xarray DataArray into a pandas Series (one value per input point, in input order)
floatsDFAll_6Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_6Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_6Dtimeorder.chlor_a.isnull().sum() )  # should match the count above

# take a look at the data
floatsDFAll_6Dtimeorder

# visualize the float positions around the Arabian Sea, colored by chlor_a
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_6Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    """Return log10(x); zero gives -inf and negative values give NaN."""
    return np.log10(x)

#print(floatsAll_timeorder['chlor_a'].apply(scale))
floatsDFAll_6Dtimeorder['chlor_a_log10'] = floatsDFAll_6Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_6Dtimeorder
#print("after the transformation the nan values in 'chlor_a_log10' is", floatsAll_timeorder.chlor_a_log10.isnull().sum() )

# same map, colored by log10(chlor_a)
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_6Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAll_6Dtimeorder.chlor_a.dropna().shape  # (1920,)
#floatsDFAll_6Dtimeorder.chlor_a_log10.dropna().shape  # (1920,)









    



after editing the dataframe the nan values in 'chlor_a' is 218748






    Out[52]:





(1920,)
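The `scale` helper goes through `apply`, which calls `np.log10` one element at a time, while `np.log10` already works on a whole Series. A minimal sketch of the vectorized form, assuming non-positive values should become NaN (plain `np.log10` gives -inf for zero and NaN with a RuntimeWarning for negatives); the first toy value is the `chlor_a` of float 10206 on 2002-11-01 shown later in this notebook:

```python
import numpy as np
import pandas as pd

# toy chlorophyll values: one positive, one zero, one negative, one missing
chl = pd.Series([0.132351, 0.0, -0.011445, np.nan])

# mask non-positive values first, then take log10 of the whole Series in one call
chl_log10 = np.log10(chl.where(chl > 0))
print(chl_log10)
```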



In [53]:

    
# next: take the time difference of chlor_a along each float track
# plan: convert the dataframe back into an xarray Dataset indexed by (time, id) and difference along time
floatsDFAll_6Dtimeorder









    Out[53]:






  
    
      
	id	time	lon	var_lon	vn	var_tmp	spd	var_lat	temp	lat	ve	chlor_a	chlor_a_log10
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
852	10206	2002-07-04	66.462208	0.006882	-1.913917	1000.000000	9.288542	0.001782	NaN	16.192375	7.820625	NaN	NaN
1704	10208	2002-07-04	69.737208	0.000121	-12.390125	1000.000000	19.627375	0.000063	NaN	13.665042	12.693750	NaN	NaN
2556	11089	2002-07-04	64.888250	0.000122	-8.703625	0.003651	16.807458	0.000064	27.842458	16.248958	13.326750	NaN	NaN
3408	15703	2002-07-04	69.756625	0.000101	-11.618875	0.084672	18.828333	0.000055	28.563958	13.684875	11.921125	NaN	NaN
4260	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5112	27069	2002-07-04	69.159542	0.000099	-2.413708	0.001707	26.066417	0.000054	28.963625	20.131083	24.459167	NaN	NaN
5964	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
6816	28842	2002-07-04	60.810958	0.000188	-6.515792	0.003334	18.883542	0.000092	27.665333	18.734417	5.594167	NaN	NaN
7668	34159	2002-07-04	59.335292	0.000109	9.713042	1000.000000	31.484542	0.000058	NaN	12.677917	29.419333	NaN	NaN
8520	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
9372	34210	2002-07-04	56.760500	0.000141	-15.606167	0.003675	21.520333	0.000070	26.712458	6.184083	-11.194333	NaN	NaN
10224	34211	2002-07-04	68.285625	0.000105	-13.066833	0.003488	28.371292	0.000057	28.361250	8.374333	23.969000	NaN	NaN
11076	34212	2002-07-04	65.375750	0.000093	17.648625	0.003588	46.028333	0.000051	28.545250	6.542208	39.663333	NaN	NaN
11928	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12780	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13632	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
14484	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
15336	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
16188	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
17040	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
17892	34708	2002-07-04	60.315542	0.000096	1.757542	0.001768	38.500708	0.000052	27.184000	10.209708	38.111667	NaN	NaN
18744	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
19596	34710	2002-07-04	50.145667	0.000103	8.075625	0.001875	46.875167	0.000053	31.104542	13.245708	12.005167	NaN	NaN
20448	34714	2002-07-04	64.254667	0.000106	6.745875	0.001818	39.295750	0.000057	27.731167	13.726333	38.046958	NaN	NaN
21300	34716	2002-07-04	65.924917	0.000099	8.357917	0.001769	37.732375	0.000054	28.801500	7.618917	35.828000	0.108553	-0.964358
22152	34718	2002-07-04	72.723917	0.000110	-28.297708	0.001692	35.052875	0.000058	29.128917	15.847625	19.968458	NaN	NaN
23004	34719	2002-07-04	71.230250	0.000112	-17.508583	0.001647	26.124958	0.000059	28.950125	17.522292	16.102833	NaN	NaN
23856	34720	2002-07-04	69.340333	0.000116	-26.427208	0.001813	29.710708	0.000062	28.669875	14.327542	10.629958	NaN	NaN
24708	34721	2002-07-04	65.490667	0.000111	-9.380792	0.001788	12.911625	0.000059	27.910875	17.049667	7.087000	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...
195959	3098682	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
196811	60073460	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
197663	60074440	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
198515	60077450	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
199367	60150420	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
200219	60454500	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
201071	60656200	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
201923	60657200	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
202775	60658190	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
203627	60659110	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
204479	60659120	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
205331	60659190	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
206183	60659200	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
207035	60940960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
207887	60940970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
208739	60941960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
209591	60941970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
210443	60942960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
211295	60942970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
212147	60943960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
212999	60943970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
213851	60944960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
214703	60944970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
215555	60945970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
216407	60946960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
217259	60947960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
218111	60947970	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
218963	60948960	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
219815	60950430	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
220667	62321420	2016-06-26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
    
  

220668 rows × 13 columns



In [54]:

    
# unstack() pivots one MultiIndex level into columns, e.g. (time, id) -> a 2-D time x id table
# reset_index() moves the index levels back into ordinary columns
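Both notes can be checked on a toy long-format frame with the same layout as `floatsDFAll_6Dtimeorder` (the values are illustrative): `set_index(['time', 'id'])` builds the (time, id) MultiIndex, `unstack('id')` turns it into a 2-D time-by-id table, and `reset_index()` undoes `set_index`.

```python
import pandas as pd

# toy long-format frame: one row per (time, float id)
long = pd.DataFrame({
    "time": pd.to_datetime(["2002-11-01", "2002-11-01", "2002-11-07", "2002-11-07"]),
    "id": [10206, 11089, 10206, 11089],
    "chlor_a": [0.132351, 0.124708, 0.130267, 0.188381],
})

indexed = long.set_index(["time", "id"])    # MultiIndex (time, id)
wide = indexed["chlor_a"].unstack("id")     # 2-D table: rows = time, columns = float id
restored = indexed.reset_index()            # index levels become ordinary columns again

print(wide)
```

The 2-D time-by-id layout is also what `xr.Dataset.from_dataframe` builds from the same MultiIndex in the next cell.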



In [56]:

    
# prepare the data as an xarray Dataset indexed by (time, id) before taking the difference
tmp = xr.Dataset.from_dataframe(floatsDFAll_6Dtimeorder.set_index(['time','id']))  # use reset_index() to revert set_index()
# difference chlor_a along time for each float; by default (label='upper') each value is stored at the later time step
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# give the column a proper name
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# left-merge the rates back onto floatsDFAll_6Dtimeorder, matching on (time, id)
floatsDFAllRate_6Dtimeorder=pd.merge(floatsDFAll_6Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_6Dtimeorder

# check that the merge kept every rate
print('check the sum of chl_rate before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of chl_rate after the merge',floatsDFAllRate_6Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate on a linear color scale (better than log10 here), clipped to [-0.8, 0.8]
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_6Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# log10 of the rate, kept for comparison only: chl_rate can be negative, and log10 of a negative number is NaN
floatsDFAllRate_6Dtimeorder['chl_rate_log10'] = floatsDFAllRate_6Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_6Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_6Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAllRate_6Dtimeorder.chl_rate.dropna().shape   # (1093,) data points
floatsDFAllRate_6Dtimeorder.chl_rate_log10.dropna().shape   # (458,) data points; the negative rates are lost, so do not use log10 on chl_rate









    



check the sum of chl_rate before the merge -57.68458902500565
check the sum of chl_rate after the merge -57.68458902500565






    Out[56]:





(458,)
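Because `diff` labels each difference with the later time step, `chl_rate` at time t is chlor_a(t) minus chlor_a(t - 6 days) for the same float. The same per-float difference can be sketched in pandas alone, either with `groupby('id').diff()` or on a wide time-by-id table. The frame below is toy data, and the sketch assumes every float has a row at every 6-day step; if rows were missing, `groupby().diff()` would difference across the gap, whereas the gridded table would give NaN.

```python
import numpy as np
import pandas as pd

# toy long-format data: two floats sampled every 6 days
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 1, 2],
    "time": pd.to_datetime(["2002-11-01", "2002-11-01", "2002-11-07",
                            "2002-11-07", "2002-11-13", "2002-11-13"]),
    "chlor_a": [0.10, 0.20, 0.15, np.nan, 0.12, 0.30],
})

# per-float difference: chlor_a(t) minus chlor_a(t - 6 days), stored at the later time t
df = df.sort_values(["id", "time"]).reset_index(drop=True)
df["chl_rate"] = df.groupby("id")["chlor_a"].diff()

# the same numbers from a wide (time x id) table, closer to the gridded xarray version
wide = df.pivot(index="time", columns="id", values="chlor_a")
rate_wide = wide.diff()

print(df)
print(rate_wide)
```

A NaN on either side of a step (float 2 here) makes that difference NaN, which is one reason only 1093 rates survive out of 1920 valid chlor_a values.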



In [57]:

    
# create a dummy time series indexed by the float times, used only to get the month of each row
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_6Dtimeorder.time))

# take the month out
month = ts.index.month
# month.shape # a check on the shape of the month.
# keep November through March
selector = ((11==month) | (12==month) | (1==month) | (2==month) | (3==month) )
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_6Dtimeorder[selector].chl_rate.dropna().shape) # total (745,)
print('all the data count is', floatsDFAllRate_6Dtimeorder.chl_rate.dropna().shape )   # total (1093,)









    



shape of the selector (220668,)
all the data count in [11-01, 03-31]  is (745,)
all the data count is (1093,)
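The chain of `==` comparisons can be written with `isin`, and the `.dt` accessor reads the month straight from a datetime column without building the dummy series `ts`. A sketch on made-up dates, assuming the `time` column holds datetime64 values as it does above:

```python
import pandas as pd

# made-up dates around the edges of the Nov-Mar window
times = pd.Series(pd.to_datetime(
    ["2002-10-27", "2002-11-01", "2003-01-01", "2003-03-31", "2003-04-02"]))

# True for November through March, the same months as the chained == comparisons
winter = times.dt.month.isin([11, 12, 1, 2, 3])
print(winter.tolist())
```

On the real data the mask would be `floatsDFAllRate_6Dtimeorder.time.dt.month.isin([11, 12, 1, 2, 3])`, which keeps the dataframe's own index so it can be used directly with `.loc`.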



In [58]:

    
# histogram of the Nov-Mar chl_rate values (not standardized)
axfloat = floatsDFAllRate_6Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('6-Day chl_rate')









    Out[58]:





<matplotlib.text.Text at 0x11acee1d0>



In [59]:

    
# standardized series (z-scores)
ts = floatsDFAllRate_6Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-3,3])  # z-scores are in units of std, so use a +/-3 std window
axts.set_title('6-Day standardized chl_rate')









    Out[59]:





<matplotlib.text.Text at 0x4b60683c8>
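After standardizing, the values are z-scores measured in standard deviations, so the histogram range has to be chosen on that scale: a ±0.3 window holds only about a quarter of normally distributed values. A sketch on synthetic, normally distributed numbers (a stand-in, not the float data) that checks the mean and standard deviation of the z-scores and compares the two windows:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the Nov-Mar chl_rate values (not real data)
rng = np.random.default_rng(0)
rate = pd.Series(rng.normal(loc=0.01, scale=0.05, size=1000))

# z-score: subtract the mean, divide by the sample standard deviation (pandas std uses ddof=1)
z = (rate - rate.mean()) / rate.std()

print(round(z.mean(), 12), round(z.std(), 12))
# share of z-scores inside a +/-0.3 window versus a +/-3 window
print(z.between(-0.3, 0.3).mean(), z.between(-3, 3).mean())
```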



In [60]:

    
# all the data, one panel per calendar year 2002-2016
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_6Dtimeorder[ (floatsDFAllRate_6Dtimeorder.time > str(i))  & (floatsDFAllRate_6Dtimeorder.time < str(i+1)) ] # rows in year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)   # sums to 1088, not 1093: the strict > and < skip rows dated exactly Jan 1
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)

# remove the unused 16th panel (15 years on an 8 x 2 grid)
fig.delaxes(axes.flat[-1])









    



(56,)
(51,)
(7,)
(38,)
(112,)
(96,)
(154,)
(37,)
(71,)
(22,)
(43,)
(34,)
(188,)
(128,)
(51,)
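The yearly counts above add up to 1088, five fewer than the 1093 non-NaN rates. With strict comparisons, `'2003'` is read as 2003-01-01 00:00, so a row dated exactly 1 January falls in neither year 2002 (`< '2003'`) nor year 2003 (`> '2003'`). A sketch on made-up dates showing the gap and a `.dt.year` alternative:

```python
import pandas as pd

# made-up 6-day sample dates around a new year
times = pd.Series(pd.to_datetime(["2002-12-29", "2003-01-01", "2003-01-07"]))

i = 2003
# the loop's test: 2003-01-01 fails "> '2003'", so it is dropped
strict = times[(times > str(i)) & (times < str(i + 1))]
# comparing the calendar year keeps every row of year i
by_year = times[times.dt.year == i]

print(len(strict), len(by_year))
```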



In [61]:

    
# winter seasons only: one panel per season, Nov 1 of year i to Mar 31 of year i+1
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_6Dtimeorder[ (floatsDFAllRate_6Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_6Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # season starting in year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 745
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)









    



(71,)
(1,)
(7,)
(75,)
(51,)
(119,)
(27,)
(55,)
(7,)
(45,)
(0,)
(125,)
(110,)
(52,)
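The per-season counts can also come from a single `groupby` if each November-March season is labelled by the year it starts in (January to March belong to the previous year's season). For timestamps at midnight, as here, this gives the same windows as the `>= 'YYYY-11-01'` / `<= 'YYYY+1-03-31'` comparisons in the loop. A sketch on hypothetical rates; seasons with no data (such as the empty 2012 season above) simply do not appear in the result:

```python
import numpy as np
import pandas as pd

# hypothetical rates on a few dates
df = pd.DataFrame({
    "time": pd.to_datetime(["2002-11-01", "2003-02-10", "2003-06-01",
                            "2003-12-05", "2004-03-31"]),
    "chl_rate": [0.1, np.nan, 0.2, -0.05, 0.03],
})

month = df["time"].dt.month
in_season = month.isin([11, 12, 1, 2, 3])
# label each Nov-Mar season by its starting year: Jan-Mar belong to the previous year's season
df["season"] = df["time"].dt.year - (month <= 3).astype(int)

# number of non-NaN rates per season, like the shapes printed by the loop
counts = df[in_season].dropna(subset=["chl_rate"]).groupby("season").size()
print(counts)
```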



In [62]:

    
# save the Nov-Mar rates to disk (csv or hdf) so later experiments can start from the file instead of rerunning the steps above

df_list = []
for i in range(2002,2017) :
    # Nov 1 of year i to Mar 31 of year i+1
    tmpyear = floatsDFAllRate_6Dtimeorder[ (floatsDFAllRate_6Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_6Dtimeorder.time <= (str(i+1)+'-03-31') ) ]
    df_list.append(tmpyear)

df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is  (745,)
df_chl_out_6D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # only keep the non-NaN rates
#list(df_chl_out_6D_modisa.groupby(['id']))   # shows the continuity of the Lagrangian difference for each float id

# preview the rows that will be written to disk
df_chl_out_6D_modisa.head()









    



all the data count in [11-01, 03-31]  is  (745,)






    Out[62]:






  
    
      
	id	time	lon	var_lon	vn	var_tmp	spd	var_lat	temp	lat	ve	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
5181	10206	2002-11-01	67.400875	0.001188	6.497542	1000.000000	11.098375	0.000411	NaN	10.819333	-6.816792	0.132351	-0.878273	-0.011445	NaN
5183	11089	2002-11-01	65.187083	0.000106	5.029292	0.003775	12.775208	0.000057	28.979875	14.236667	-9.695500	0.124708	-0.904106	-0.006008	NaN
5203	34710	2002-11-01	63.136583	0.000115	12.004000	0.001725	12.873292	0.000061	28.993542	16.952292	1.252542	0.404965	-0.392582	0.069651	-1.157071
5440	10206	2002-11-07	67.149208	0.001453	3.659208	1000.000000	6.336958	0.000476	NaN	11.107000	-2.266292	0.130267	-0.885166	-0.002084	NaN
5442	11089	2002-11-07	64.589250	0.000133	-1.580333	0.003873	16.956875	0.000068	28.978875	14.336875	-15.959458	0.188381	-0.724962	0.063673	-1.196042



In [63]:

    
df_chl_out_6D_modisa.index.name = 'index'  # name the index so it survives the CSV round trip

# write to CSV with the named index column
df_chl_out_6D_modisa.to_csv('df_chl_out_6D_modisa.csv', sep=',', index_label = 'index')

# load the CSV back to check it
test = pd.read_csv('df_chl_out_6D_modisa.csv', index_col='index')
test.head()









    Out[63]:






  
    
      
	id	time	lon	var_lon	vn	var_tmp	spd	var_lat	temp	lat	ve	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
index
5181	10206	2002-11-01	67.400875	0.001188	6.497542	1000.000000	11.098375	0.000411	NaN	10.819333	-6.816792	0.132351	-0.878273	-0.011445	NaN
5183	11089	2002-11-01	65.187083	0.000106	5.029292	0.003775	12.775208	0.000057	28.979875	14.236667	-9.695500	0.124708	-0.904106	-0.006008	NaN
5203	34710	2002-11-01	63.136583	0.000115	12.004000	0.001725	12.873292	0.000061	28.993542	16.952292	1.252542	0.404965	-0.392582	0.069651	-1.157071
5440	10206	2002-11-07	67.149208	0.001453	3.659208	1000.000000	6.336958	0.000476	NaN	11.107000	-2.266292	0.130267	-0.885166	-0.002084	NaN
5442	11089	2002-11-07	64.589250	0.000133	-1.580333	0.003873	16.956875	0.000068	28.978875	14.336875	-15.959458	0.188381	-0.724962	0.063673	-1.196042
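Reading the CSV back with `index_col='index'` restores the index, but the `time` column comes back as plain text unless it is parsed. A small round trip through an in-memory buffer (`io.StringIO` standing in for `df_chl_out_6D_modisa.csv`, with two rows and three columns copied from the table above) shows the difference `parse_dates=['time']` makes:

```python
import io

import pandas as pd

# two rows copied from the table above (subset of columns)
df = pd.DataFrame(
    {"id": [10206, 11089],
     "time": pd.to_datetime(["2002-11-01", "2002-11-07"]),
     "chl_rate": [-0.011445, 0.063673]},
    index=pd.Index([5181, 5442], name="index"),
)

buf = io.StringIO()   # an in-memory file stands in for the .csv on disk
df.to_csv(buf)        # the index name 'index' becomes the header of the first column

buf.seek(0)
plain = pd.read_csv(buf, index_col="index")                          # 'time' is read as text
buf.seek(0)
parsed = pd.read_csv(buf, index_col="index", parse_dates=["time"])   # 'time' is read as datetime64

print(plain.dtypes["time"], parsed.dtypes["time"])
```

With the parsed column, date comparisons such as `test.time >= '2002-11-01'` compare datetimes rather than strings.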



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03
