4-Day Subsampling on the Ocean Color Dataset


In [8]:
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm

Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the netCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we try to access the values.


In [9]:
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.


In [10]:
print([(ds.nbytes / 1e6) for ds in both_datasets])


[534.295504, 4241.4716]

The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both fit in RAM, so let's load them all into memory.


In [11]:
[ds.load() for ds in both_datasets]


Out[11]:
[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted, so I wrote a function to invert it back and correct the data.


In [12]:
def fix_bad_data(ds):
    # For some reason, the cloud/land mask is inverted on some snapshots.
    # This shows up as chlorophyll values less than zero.
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through the affected snapshots and re-mask them
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
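
As a side note, the same fix can be done without the explicit loop. Here is a minimal vectorized sketch using `DataArray.where` on a small synthetic dataset (the `ds` below is a toy, not the MODIS data):

```python
import numpy as np
import xarray as xr

# toy dataset: the second snapshot has an inverted mask (large negative fill values)
data = np.array([[[0.5, np.nan], [1.2, 0.8]],
                 [[-32767.0, 0.3], [-32767.0, np.nan]]])
ds = xr.Dataset({'chlor_a': (('time', 'lat', 'lon'), data)})

# where() keeps values satisfying the condition and fills the rest with NaN
ds['chlor_a'] = ds.chlor_a.where(ds.chlor_a >= 0)
```

This replaces every negative concentration with NaN in one step, at the cost of also touching snapshots that were already clean (which is harmless, since valid concentrations are non-negative).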

In [13]:
[fix_bad_data(ds) for ds in both_datasets]


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive
Out[13]:
[None, None]

In [14]:
ds_8day.chlor_a>0


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive
Out[14]:
<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately, it doesn't come with the dataset, but we can infer it by counting all the points that have at least one valid (non-NaN) chlorophyll value.


In [15]:
(ds_8day.chlor_a>0).sum(dim='time').plot()


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive
Out[15]:
<matplotlib.collections.QuadMesh at 0x11959eb38>

In [16]:
# build the ocean mask (True where at least one valid value exists)
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive
Out[16]:
<matplotlib.text.Text at 0x13d335ac8>
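
The mask construction above can be sketched on a toy array (synthetic values, not the MODIS files):

```python
import numpy as np
import xarray as xr

# toy chlorophyll stack: 2 times x 2 lat x 2 lon
chl = xr.DataArray(
    np.array([[[np.nan, 0.4], [np.nan, np.nan]],
              [[np.nan, 0.7], [0.2, np.nan]]]),
    dims=('time', 'lat', 'lon'))

# a pixel counts as ocean if it has at least one positive value over the record
ocean_mask = (chl > 0).sum(dim='time') > 0
num_ocean_points = int(ocean_mask.sum())
```

Here two of the four pixels ever see a positive value, so `num_ocean_points` is 2; pixels that are NaN at every time step are treated as land.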

In [17]:
#ds_8day

In [18]:
#ds_daily

In [19]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[19]:
<matplotlib.collections.QuadMesh at 0x11b2e4438>
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

In [20]:
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.


In [21]:
# count the number of valid (non-NaN) observations in each snapshot
ds_daily.groupby('time').count()


Out[21]:
<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...

In [22]:
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)


Out[22]:
<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...

In [23]:
count_8day, count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                           for ds in (ds_8day, ds_daily)]

In [24]:
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#count_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, count_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                           for ds in ds_8day, ds_daily]  # tuple without parentheses does not work in Python 3

In [25]:
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])


Out[25]:
<matplotlib.legend.Legend at 0x12b75e8d0>

Seasonal Climatology


In [26]:
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology
                                     for count in (count_8day, count_daily)]

In [27]:
# mean of the monthly climatology of the valid-data fraction
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])


Out[27]:
<matplotlib.legend.Legend at 0x11dcc00f0>
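
The `'time.month'` grouper builds the climatology by pooling all values with the same calendar month across years. A toy sketch (synthetic daily series, not the chlorophyll counts):

```python
import numpy as np
import pandas as pd
import xarray as xr

# two years of toy daily data
time = pd.date_range('2003-01-01', '2004-12-31', freq='D')
da = xr.DataArray(np.ones(time.size), dims='time', coords={'time': time})

# 'time.month' groups all Januaries together, all Februaries, and so on
clim = da.groupby('time.month').mean()
```

The result has a `month` coordinate of length 12, one entry per calendar month.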

From the above figure, we see that data coverage is highest in winter (especially February) and lowest in summer.

Maps of individual days

Let's grab some data from February and plot it.


In [28]:
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[28]:
<matplotlib.collections.QuadMesh at 0x11dcdc080>
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

In [29]:
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[29]:
<matplotlib.collections.QuadMesh at 0x11e60b978>
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

In [30]:
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # note: this selects the first time slice
#ds_daily.chl_ocx[0].sel_points(time=times, lon=lons, lat=lats, method='nearest')


Out[30]:
<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lat      (points) float64 16.04 18.04
    lon      (points) float64 65.04 70.04
  * points   (points) int64 0 1

In [31]:
#ds_daily.chlor_a.sel_points?

In [32]:
ds_4day = ds_daily.resample('4D', dim='time')
ds_4day


Out[32]:
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 1324)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-08 2002-07-12 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
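
Note that `resample('4D', dim='time')` is the pre-0.10 xarray call signature. In later releases the frequency is passed as a keyword and the reduction is explicit; a sketch on a toy series (assuming xarray >= 0.10):

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2002-07-04', periods=8, freq='D')
da = xr.DataArray(np.arange(8.0), dims='time', coords={'time': time})

# keyword form: bin into 4-day windows, then reduce each bin explicitly
da_4day = da.resample(time='4D').mean()
```

Eight daily values collapse into two 4-day bins, each holding the mean of its four days.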

In [33]:
plt.figure(figsize=(8,6))
ds_4day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())


Out[33]:
<matplotlib.collections.QuadMesh at 0x11e3ce198>
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

In [66]:
# check the lower bounds of the longitude and latitude
print(ds_4day.lon.min(), '\n', ds_4day.lat.min())


<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each (time, lon, lat) sample to a chlorophyll value


In [67]:
# in the following we deal with the data from the GDP floats
from buyodata import buoydata
import os

In [68]:
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']

In [69]:
# read them and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # takes around 4-5 minutes

#mask = df.time>='2002-07-04' # we only have chlor_a data after this date
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # recheck that no earlier times remain


Out[69]:
0

In [70]:
# shift negative longitudes so they are all in [0, 360)
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon < 0
dfvvAll.loc[mask, 'lon'] = dfvvAll.loc[mask, 'lon'] + 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))

dfvvAll.describe()


before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000
Out[70]:
id lat lon temp ve vn spd var_lat var_lon var_tmp
count 2.147732e+07 2.131997e+07 2.131997e+07 1.986179e+07 2.129142e+07 2.129142e+07 2.129142e+07 2.147732e+07 2.147732e+07 2.147732e+07
mean 1.765662e+06 -2.263128e+00 2.124412e+02 1.986121e+01 2.454172e-01 4.708192e-01 2.613427e+01 7.326258e+00 7.326555e+00 7.522298e+01
std 9.452835e+06 3.401115e+01 9.746941e+01 8.339498e+00 2.525050e+01 2.052160e+01 1.939087e+01 8.527853e+01 8.527851e+01 2.637454e+02
min 2.578000e+03 -7.764700e+01 0.000000e+00 -1.685000e+01 -2.916220e+02 -2.601400e+02 0.000000e+00 5.268300e-07 -3.941600e-02 1.001300e-03
25% 4.897500e+04 -3.186000e+01 1.490720e+02 1.437300e+01 -1.411400e+01 -1.044700e+01 1.290300e+01 4.366500e-06 7.512600e-06 1.435700e-03
50% 7.141300e+04 -4.920000e+00 2.153940e+02 2.214400e+01 -5.560000e-01 1.970000e-01 2.176700e+01 8.833600e-06 1.495800e-05 1.691700e-03
75% 1.094330e+05 2.756000e+01 3.064370e+02 2.688900e+01 1.356100e+01 1.109300e+01 3.405900e+01 1.833300e-05 3.627900e-05 2.294200e-03
max 6.399288e+07 8.989900e+01 3.600000e+02 4.595000e+01 4.417070e+02 2.783220e+02 4.421750e+02 1.000000e+03 1.000000e+03 1.000000e+03

In [71]:
# Select only the Arabian Sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon < 75) & (dfvvAll.lat > 5) & (dfvvAll.lat < 28)
# arabian_sea = {'lon': slice(45, 75), 'lat': slice(5, 28)}  # slice form for later use with xarray
floatsAll = dfvvAll.loc[arabian_sea]   # boolean-mask selection
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape))


dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)

In [72]:
# avoid running this cell repeatedly; the global scatter plot is slow
# visualize the floats over the global domain
fig, ax = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)


Out[72]:
<matplotlib.axes._subplots.AxesSubplot at 0x2442ac128>

In [73]:
# pands dataframe cannot do the resamplingn properly
# cause we are really indexing on ['time','id'], pandas.dataframe.resample cannot do this
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()




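One pandas-side workaround (a sketch on toy data; the column names mirror the float DataFrame but the values are made up) is to group on the `id` column together with a `pd.Grouper` on the time column, which emulates per-float resampling:

```python
import numpy as np
import pandas as pd

# toy float tracks: two ids, 6-hourly observations over two days
times = pd.date_range('2002-07-04', periods=8, freq='6h')
df = pd.DataFrame({
    'time': np.tile(times, 2),
    'id': np.repeat([1, 2], 8),
    'temp': np.arange(16, dtype=float),
})

# resample to 4-day bins per float: group on id plus a time-based Grouper
out = (df.groupby(['id', pd.Grouper(key='time', freq='4D')])
         .mean()
         .reset_index())
```

Each float's observations collapse into one 4-day bin here, so `out` has one row per id carrying the bin-mean temperature.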
In [74]:
# convert the surface-float data from a pandas DataFrame to an xarray Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']))  # set time & id as the index (use reset_index to revert)
floatsDSAll


Out[74]:
<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...

In [75]:
# resample the xarray Dataset onto a 4-day frequency
floatsDSAll_4D = floatsDSAll.resample('4D', dim='time')
floatsDSAll_4D


Out[75]:
<xarray.Dataset>
Dimensions:  (id: 259, time: 1278)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-08 2002-07-12 ...
Data variables:
    ve       (time, id) float64 nan 9.425 10.36 12.13 9.833 nan 26.62 nan ...
    vn       (time, id) float64 nan -4.747 -17.28 -5.897 -16.89 nan 0.7761 ...
    lat      (time, id) float64 nan 16.2 13.75 16.31 13.77 nan 20.17 nan ...
    spd      (time, id) float64 nan 10.92 20.58 14.22 19.98 nan 28.07 nan ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003583 0.0803 nan 0.001669 ...
    temp     (time, id) float64 nan nan nan 27.88 28.56 nan 28.95 nan 27.67 ...
    var_lon  (time, id) float64 nan 0.008014 9.423e-05 0.0001387 9.623e-05 ...
    var_lat  (time, id) float64 nan 0.002037 5.228e-05 7.135e-05 5.294e-05 ...
    lon      (time, id) float64 nan 66.41 69.63 64.78 69.66 nan 68.95 nan ...

In [76]:
# convert back to a pandas DataFrame for plotting
floatsDFAll_4D = floatsDSAll_4D.to_dataframe()
floatsDFAll_4D = floatsDFAll_4D.reset_index()

# visualize the subsampled floats in the Arabian Sea region
fig, ax = plt.subplots(figsize=(12,10))
floatsDFAll_4D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)


Out[76]:
<matplotlib.axes._subplots.AxesSubplot at 0x12897f0f0>

In [77]:
# look up the chlorophyll value for each data entry
floatsDFAll_4Dtimeorder = floatsDFAll_4D.sort_values(['time','id'], ascending=True)
floatsDFAll_4Dtimeorder  # confirm that it is time-ordered
# could we drop NaNs here to speed things up?


Out[77]:
id time ve vn lat spd var_tmp temp var_lon var_lat lon
0 7574 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1278 10206 2002-07-04 9.425000 -4.747250 16.196125 10.918125 1000.000000 NaN 0.008014 0.002037 66.409813
2556 10208 2002-07-04 10.355438 -17.277062 13.752000 20.576187 1000.000000 NaN 0.000094 0.000052 69.632875
3834 11089 2002-07-04 12.128187 -5.896938 16.305750 14.222375 0.003583 27.884125 0.000139 0.000071 64.777500
5112 15703 2002-07-04 9.833375 -16.894688 13.766187 19.978313 0.080302 28.558125 0.000096 0.000053 69.657312
6390 15707 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
7668 27069 2002-07-04 26.620125 0.776125 20.173938 28.072937 0.001669 28.946250 0.000104 0.000056 68.953562
8946 27139 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
10224 28842 2002-07-04 8.325375 -5.037438 18.808937 22.278562 0.003263 27.669500 0.000197 0.000095 60.774188
11502 34159 2002-07-04 26.471125 6.662250 12.600438 27.822062 1000.000000 NaN 0.000101 0.000054 59.108062
12780 34173 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14058 34210 2002-07-04 -10.486563 -18.214750 6.333500 21.797125 0.003636 26.731250 0.000129 0.000065 56.863687
15336 34211 2002-07-04 20.471125 -15.337813 8.471000 25.889813 0.003500 28.340375 0.000102 0.000056 68.096688
16614 34212 2002-07-04 32.634313 13.436250 6.398438 38.895812 0.003556 28.492500 0.000095 0.000053 64.999813
17892 34223 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
19170 34310 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
20448 34311 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
21726 34312 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
23004 34314 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
24282 34315 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
25560 34374 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
26838 34708 2002-07-04 40.593875 3.761375 10.188625 40.938563 0.001807 27.175250 0.000095 0.000052 60.022438
28116 34709 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN
29394 34710 2002-07-04 -0.457312 23.502313 13.176250 47.944187 0.001737 30.992562 0.000071 0.000040 49.914500
30672 34714 2002-07-04 37.098250 9.280125 13.676500 38.626187 0.001825 27.723000 0.000115 0.000060 63.951062
31950 34716 2002-07-04 36.210688 5.186000 7.539062 37.364812 0.001768 28.814563 0.000105 0.000057 65.642375
33228 34718 2002-07-04 19.961438 -28.952500 16.075125 35.701000 0.001725 29.147312 0.000121 0.000062 72.572750
34506 34719 2002-07-04 18.884875 -12.003562 17.667063 23.623312 0.001569 28.927000 0.000112 0.000059 71.098500
35784 34720 2002-07-04 9.189375 -33.784563 14.530250 35.320875 0.001818 28.661375 0.000117 0.000062 69.258437
37062 34721 2002-07-04 8.399250 -9.343312 17.121938 13.575938 0.001748 27.916125 0.000122 0.000064 65.436687
... ... ... ... ... ... ... ... ... ... ... ...
293939 3098682 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
295217 60073460 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
296495 60074440 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
297773 60077450 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
299051 60150420 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
300329 60454500 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
301607 60656200 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
302885 60657200 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
304163 60658190 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
305441 60659110 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
306719 60659120 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
307997 60659190 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
309275 60659200 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
310553 60940960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
311831 60940970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
313109 60941960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
314387 60941970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
315665 60942960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
316943 60942970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
318221 60943960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
319499 60943970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
320777 60944960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
322055 60944970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
323333 60945970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
324611 60946960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
325889 60947960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
327167 60947970 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
328445 60948960 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
329723 60950430 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
331001 62321420 2016-06-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN

331002 rows × 11 columns


In [78]:
floatsDFAll_4Dtimeorder.lon.dropna().shape  # the longitude column has plenty of valid values: (7349,)


Out[78]:
(7349,)

In [79]:
# a little test of the row-loop API for the DataFrame
# see df.itertuples? -- it is faster than iterrows and preserves the dtypes
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''


Out[79]:
"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"

In [80]:
# this one line avoid the list above
# it took a really long time for 2D interpolation, it takes an hour
tmpAll = ds_4day.chlor_a.sel_points(time=list(floatsDFAll_4Dtimeorder.time),lon=list(floatsDFAll_4Dtimeorder.lon), lat=list(floatsDFAll_4Dtimeorder.lat), method='nearest')
print('the count of nan vaues in tmpAll is',tmpAll.to_series().isnull().sum())


/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
the count of nan vaues in tmpAll is 328760
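
`sel_points` was later deprecated; in newer xarray versions the equivalent pointwise nearest-neighbour lookup uses vectorized indexing with `DataArray` indexers that share a common dimension. A sketch on a toy grid (synthetic coordinates, not the chlorophyll field):

```python
import numpy as np
import xarray as xr

# toy gridded field
grid = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 4),
    dims=('lat', 'lon'),
    coords={'lat': [10.0, 11.0, 12.0], 'lon': [60.0, 61.0, 62.0, 63.0]})

# DataArray indexers sharing a 'points' dimension trigger pointwise selection
lats = xr.DataArray([10.2, 11.9], dims='points')
lons = xr.DataArray([60.1, 62.8], dims='points')
vals = grid.sel(lat=lats, lon=lons, method='nearest')
```

The result is 1-D along `points`, one value per (lat, lon) pair, rather than the 2-D cross product that plain label indexing would return.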

In [81]:
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (2242,) good values


Out[81]:
(2242,)

In [83]:
# use .to_series() to convert from an xarray DataArray to a pandas Series
floatsDFAll_4Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_4Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_4Dtimeorder.chlor_a.isnull().sum())  # should match the count above

# take a look at the data
floatsDFAll_4Dtimeorder

# visualize the float around the arabian sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_4Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    logged = np.log10(x)
    return logged

#print(floatsAll_timeorder['chlor_a'].apply(scale))
floatsDFAll_4Dtimeorder['chlor_a_log10'] = floatsDFAll_4Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_4Dtimeorder
#print("after the transformation the nan values in 'chlor_a_log10' is", floatsAll_timeorder.chlor_a_log10.isnull().sum() )

# visualize the float around the arabian sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_4Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAll_4Dtimeorder.chlor_a.dropna().shape  # (2242,)
#floatsDFAll_4Dtimeorder.chlor_a_log10.dropna().shape  # (2242,)


after editing the dataframe the nan values in 'chlor_a' is 328760
Out[83]:
(2242,)

In [84]:
# take the diff of chlor_a, and this has to be done in xarray
# convert the dataframe back into an xarray Dataset
# then take the difference
floatsDFAll_4Dtimeorder


Out[84]:
id time ve vn lat spd var_tmp temp var_lon var_lat lon chlor_a chlor_a_log10
0 7574 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1278 10206 2002-07-04 9.425000 -4.747250 16.196125 10.918125 1000.000000 NaN 0.008014 0.002037 66.409813 NaN NaN
2556 10208 2002-07-04 10.355438 -17.277062 13.752000 20.576187 1000.000000 NaN 0.000094 0.000052 69.632875 NaN NaN
3834 11089 2002-07-04 12.128187 -5.896938 16.305750 14.222375 0.003583 27.884125 0.000139 0.000071 64.777500 NaN NaN
5112 15703 2002-07-04 9.833375 -16.894688 13.766187 19.978313 0.080302 28.558125 0.000096 0.000053 69.657312 NaN NaN
6390 15707 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7668 27069 2002-07-04 26.620125 0.776125 20.173938 28.072937 0.001669 28.946250 0.000104 0.000056 68.953562 NaN NaN
8946 27139 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
10224 28842 2002-07-04 8.325375 -5.037438 18.808937 22.278562 0.003263 27.669500 0.000197 0.000095 60.774188 NaN NaN
11502 34159 2002-07-04 26.471125 6.662250 12.600438 27.822062 1000.000000 NaN 0.000101 0.000054 59.108062 NaN NaN
12780 34173 2002-07-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
14058 34210 2002-07-04 -10.486563 -18.214750 6.333500 21.797125 0.003636 26.731250 0.000129 0.000065 56.863687 NaN NaN
15336 34211 2002-07-04 20.471125 -15.337813 8.471000 25.889813 0.003500 28.340375 0.000102 0.000056 68.096688 NaN NaN
16614 34212 2002-07-04 32.634313 13.436250 6.398438 38.895812 0.003556 28.492500 0.000095 0.000053 64.999813 NaN NaN
[truncated DataFrame output: 331002 rows × 13 columns — float observations in time order, 2002-07-04 through 2016-06-28; chlor_a is NaN wherever no satellite match-up exists]


In [85]:
# unstack() pivots one index level into columns, yielding a 2-D DataFrame
# reset_index() turns all index levels back into ordinary columns

In [94]:
# convert the time-ordered DataFrame to an xarray Dataset so we can difference along time
tmp = xr.Dataset.from_dataframe(floatsDFAll_4Dtimeorder.set_index(['time','id']))  # set (time, id) as the index; use reset_index to revert
# first difference of chlor_a along time
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# give the column a proper name
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# left-merge the two dataframes {floatsDFAll_4Dtimeorder, chlor_a_rate} on the (id, time) keys
floatsDFAllRate_4Dtimeorder=pd.merge(floatsDFAll_4Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_4Dtimeorder

# check that the merge preserves the rate values
print('check the sum of the chlor_a before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of the chlor_a after the merge',floatsDFAllRate_4Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate; a fixed symmetric color scale makes the spatial pattern easier to read
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_4Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale
floatsDFAllRate_4Dtimeorder['chl_rate_log10'] = floatsDFAllRate_4Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_4Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_4Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAllRate_4Dtimeorder.chl_rate.dropna().shape   # (1099,) data points
floatsDFAllRate_4Dtimeorder.chl_rate_log10.dropna().shape   # (493,) points; chl_rate can be negative, so log10 drops more than half the values


check the sum of the chlor_a before the merge -102.27282937678196
check the sum of the chlor_a after the merge -102.27282937678196
Out[94]:
(493,)
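The pattern in the cell above — first-difference chlor_a along time for each float, then merge the rate back onto the original rows — can be sketched on a tiny hypothetical DataFrame. Pure pandas is used here instead of the xarray round trip, but the per-float difference is the same:

```python
import pandas as pd

# hypothetical two-float, two-time DataFrame
df = pd.DataFrame({
    'time': pd.to_datetime(['2002-07-04', '2002-07-08', '2002-07-04', '2002-07-08']),
    'id':   [101, 101, 102, 102],
    'chlor_a': [0.10, 0.25, 0.40, 0.30],
})

# per-float difference along time (groupby keeps floats from mixing)
rate = (df.set_index(['time', 'id'])
          .groupby('id')['chlor_a'].diff()
          .rename('chl_rate')
          .reset_index())

# the left merge keeps every original row; rows with no previous
# observation get a NaN rate
merged = pd.merge(df, rate, on=['time', 'id'], how='left')
```

Each float's first observation has no predecessor, so two of the four merged rows carry a NaN rate.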

In [95]:
pd.to_datetime(floatsDFAllRate_4Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_4Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_4Dtimeorder.time) ) # create a target time series for masking purposes

# extract the month from the DatetimeIndex
month = ts.index.month 
# month.shape # sanity check on the shape
selector = ((month == 11) | (month == 12) | (month == 1) | (month == 2) | (month == 3))  
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_4Dtimeorder[selector].chl_rate.dropna().shape) # total (774,)
print('all the data count is', floatsDFAllRate_4Dtimeorder.chl_rate.dropna().shape )   # total (1099,)


shape of the selector (331002,)
all the data count in [11-01, 03-31]  is (774,)
all the data count is (1099,)
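The month-mask trick used above, shown in isolation on a few hypothetical dates:

```python
import pandas as pd

# four toy dates; three fall in the Nov-Mar window
times = pd.to_datetime(['2002-11-15', '2003-01-10', '2003-06-01', '2003-12-25'])
ts = pd.Series(0, index=times)

month = ts.index.month
selector = (month == 11) | (month == 12) | (month == 1) | (month == 2) | (month == 3)
n_winter = selector.sum()
```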

In [96]:
# histogram of the non-standardized data
axfloat = floatsDFAllRate_4Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('4-Day chl_rate')


Out[96]:
<matplotlib.text.Text at 0x12dd98860>

In [97]:
# standardized series
ts = floatsDFAllRate_4Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])
axts.set_title('4-Day standardized chl_rate')


Out[97]:
<matplotlib.text.Text at 0x12d8348d0>
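The standardization above is the usual z-score: subtract the sample mean and divide by the sample standard deviation, so the result has mean 0 and standard deviation 1 (toy values below, hypothetical):

```python
import pandas as pd

# hypothetical chl_rate values
ts = pd.Series([0.1, -0.2, 0.05, 0.3, -0.15])

# z-score: mean becomes 0, sample std becomes 1
ts_standardized = (ts - ts.mean()) / ts.std()
```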

In [98]:
# all the data
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_4Dtimeorder[ (floatsDFAllRate_4Dtimeorder.time > str(i))  & (floatsDFAllRate_4Dtimeorder.time < str(i+1)) ]  # rows from year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)   # total is 1093
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the unused 16th subplot
ax = plt.subplot(8,2,16)
fig.delaxes(ax)


(49,)
(53,)
(5,)
(39,)
(89,)
(77,)
(144,)
(44,)
(62,)
(19,)
(40,)
(43,)
(251,)
(127,)
(51,)

In [99]:
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_4Dtimeorder[ (floatsDFAllRate_4Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_4Dtimeorder.time <= (str(i+1)+'-03-31') ) ]
    # select the winter window: Nov 1 of year i through Mar 31 of year i+1
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 774
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)


(66,)
(0,)
(9,)
(67,)
(30,)
(126,)
(36,)
(55,)
(1,)
(40,)
(0,)
(169,)
(118,)
(57,)
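The window selection relies on pandas coercing the ISO date strings to Timestamps before comparing them against the datetime64 time column. A minimal check on hypothetical dates:

```python
import pandas as pd

# four toy dates straddling one Nov-Mar window
df = pd.DataFrame({'time': pd.to_datetime(
    ['2002-10-31', '2002-11-01', '2003-02-14', '2003-04-01'])})

i = 2002
# the string bounds are coerced to Timestamps by pandas
window = df[(df.time >= str(i) + '-11-01') & (df.time <= str(i + 1) + '-03-31')]
```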

In [ ]:


In [ ]:


In [100]:
# write the data to disk as a csv or hdf file so later experiments can skip the recomputation

df_list = []
for i in range(2002,2017) :
    tmpyear = floatsDFAllRate_4Dtimeorder[ (floatsDFAllRate_4Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_4Dtimeorder.time <= (str(i+1)+'-03-31') ) ]
    # select the winter window: Nov 1 of year i through Mar 31 of year i+1
    df_list.append(tmpyear)
    
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is  (774,)
df_chl_out_4D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # only keep the non-nan values
#list(df_chl_out_4D_modisa.groupby(['id']))   # shows the continuity of the Lagrangian differences for each float id

# preview before writing to disk (the CSV write happens in the next cell)
df_chl_out_4D_modisa.head()


all the data count in [11-01, 03-31]  is  (774,)
Out[100]:
id time ve vn lat spd var_tmp temp var_lon var_lat lon chlor_a chlor_a_log10 chl_rate chl_rate_log10
7793 34710 2002-11-01 1.633062 12.896375 16.864937 13.935000 0.001790 28.994687 0.000128 0.000066 63.124500 0.385674 -0.413780 0.060035 -1.221596
8030 10206 2002-11-05 -7.127375 6.176937 10.969438 11.645312 1000.000000 NaN 0.001244 0.000420 67.246562 0.142620 -0.845818 0.014256 -1.846018
8034 15707 2002-11-05 -19.271875 -17.786375 13.879687 26.887063 1000.000000 NaN 0.000134 0.000069 67.560500 0.154235 -0.811817 -0.025134 NaN
8052 34710 2002-11-05 -0.118437 10.472312 17.212188 10.930375 0.001605 28.945750 0.000118 0.000062 63.165562 0.407654 -0.389708 0.021980 -1.657972
8058 34721 2002-11-05 6.933938 -2.230437 12.594937 14.224375 0.001764 29.537625 0.000098 0.000054 67.715438 0.154256 -0.811758 0.015577 -1.807530

In [101]:
df_chl_out_4D_modisa.index.name = 'index'  # give the index an explicit name

# write the CSV with the named index
df_chl_out_4D_modisa.to_csv('df_chl_out_4D_modisa.csv', sep=',', index_label = 'index')

# load CSV output
test = pd.read_csv('df_chl_out_4D_modisa.csv', index_col='index')
test.head()


Out[101]:
id time ve vn lat spd var_tmp temp var_lon var_lat lon chlor_a chlor_a_log10 chl_rate chl_rate_log10
index
7793 34710 2002-11-01 1.633063 12.896375 16.864937 13.935000 0.001790 28.994688 0.000128 0.000066 63.124500 0.385674 -0.413780 0.060035 -1.221596
8030 10206 2002-11-05 -7.127375 6.176937 10.969438 11.645312 1000.000000 NaN 0.001244 0.000420 67.246562 0.142620 -0.845818 0.014256 -1.846018
8034 15707 2002-11-05 -19.271875 -17.786375 13.879687 26.887063 1000.000000 NaN 0.000134 0.000069 67.560500 0.154235 -0.811817 -0.025134 NaN
8052 34710 2002-11-05 -0.118437 10.472312 17.212188 10.930375 0.001605 28.945750 0.000118 0.000062 63.165562 0.407654 -0.389708 0.021980 -1.657972
8058 34721 2002-11-05 6.933938 -2.230437 12.594938 14.224375 0.001764 29.537625 0.000098 0.000054 67.715438 0.154256 -0.811758 0.015577 -1.807530
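The round trip above can be verified in memory with an `io.StringIO` buffer instead of a file, using the same `index_label`/`index_col` arguments (hypothetical two-row frame):

```python
import io
import pandas as pd

# hypothetical two-row frame with a named integer index
df = pd.DataFrame({'id': [34710, 10206], 'chl_rate': [0.060035, 0.014256]},
                  index=pd.Index([7793, 8030], name='index'))

buf = io.StringIO()
df.to_csv(buf, sep=',', index_label='index')   # same arguments as the cell above
buf.seek(0)
back = pd.read_csv(buf, index_col='index')     # index name and values survive
```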

In [ ]: