8-day subsampling on the OceanColor dataset



In [6]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm

Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the NetCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we try to access the values.



In [7]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [8]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both fit easily in RAM, so let's load them into memory.
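The two sizes can be cross-checked by hand from the dimensions listed in the dataset summaries in Out[9]. The sketch below assumes that every variable and coordinate holds 8-byte elements (float64, int64 or datetime64[ns]), as those summaries indicate; the helper name `expected_nbytes` is made up for this illustration.

```python
def expected_nbytes(n_time, n_lat=276, n_lon=360, n_rgb=3, n_color=256, itemsize=8):
    """Bytes held by one dataset, assuming 8-byte elements throughout."""
    chlor_a = n_time * n_lat * n_lon                     # (time, lat, lon)
    palette = n_time * n_rgb * n_color                   # (time, rgb, eightbitcolor)
    coords = n_time + n_lat + n_lon + n_rgb + n_color    # one value per coordinate label
    return (chlor_a + palette + coords) * itemsize


size_8day = expected_nbytes(667)     # 667 eight-day composites
size_daily = expected_nbytes(5295)   # 5295 daily maps
print(size_8day / 1e6, size_daily / 1e6)
```

Under that assumption the arithmetic reproduces the two printed numbers, and it shows that almost all of the memory is taken by chlor_a.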



In [9]:

    
[ds.load() for ds in both_datasets]









    Out[9]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted. So I wrote a function that corrects these maps by replacing the negative values with NaN.



In [10]:

    
def fix_bad_data(ds):
    # for some reason, the cloud / land mask is backwards on some data
    # this is obvious because there are chlorophyll values less than zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through and fix
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values 
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
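The core of the fix can be tried on a single toy map. This is only a sketch of the masking step, using np.where instead of the masked-array call above; the array values are invented.

```python
import numpy as np

# one toy "map": a negative fill value leaked into the chlorophyll field
frame = np.array([[0.3, -32767.0],
                  [np.nan, 1.2]])

# replace every negative entry with NaN and keep everything else unchanged
# (NaN < 0 evaluates to False, so existing NaNs simply stay NaN)
cleaned = np.where(frame < 0, np.nan, frame)
print(cleaned)
```

Valid positive values pass through untouched, so applying the step to a map that is already clean does no harm.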



In [11]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[11]:





[None, None]



In [12]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[12]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately, it doesn't come with the dataset, but we can infer it: any grid point that has at least one valid (positive, non-NaN) chlorophyll value over the whole record must be ocean.



In [13]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[13]:





<matplotlib.collections.QuadMesh at 0x11948c630>



In [14]:

    
# build an ocean mask: True wherever at least one valid chlorophyll value exists
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean grid points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[14]:





<matplotlib.text.Text at 0x13d566550>
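The same mask logic on a tiny NumPy stack makes the idea easy to check by eye. This is a toy sketch with invented values, not the MODIS data; NaN stands for land or cloud.

```python
import numpy as np

# toy stack: 3 time steps on a 2 x 2 grid
chl = np.array([
    [[np.nan, 0.2], [np.nan, np.nan]],
    [[np.nan, np.nan], [0.5, np.nan]],
    [[np.nan, 0.3], [np.nan, np.nan]],
])

valid_count = np.isfinite(chl).sum(axis=0)   # number of valid observations per pixel
ocean_mask = valid_count > 0                 # True where at least one observation exists
num_ocean_points = int(ocean_mask.sum())
print(valid_count)
print(ocean_mask, num_ocean_points)
```

Pixels that are never observed are classified as land, so a persistently cloudy ocean pixel would be misclassified; a long record makes that unlikely.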



In [15]:

    
#ds_8day



In [16]:

    
#ds_daily



In [17]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[17]:





<matplotlib.collections.QuadMesh at 0x11c2bc550>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [18]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.



In [19]:

    
'''
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
'''
ds_daily.groupby('time').count() # information from original data









    Out[19]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...




In [20]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[20]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [21]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]
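The coverage fraction computed above is simply "valid pixels in this snapshot divided by total ocean pixels". A minimal NumPy sketch with invented values, where pixel (0, 0) plays the role of land:

```python
import numpy as np

nan = np.nan
# toy stack: 3 snapshots on a 2 x 2 grid, pixel (0, 0) is always NaN (land)
chl = np.array([
    [[nan, 0.2], [0.4, 0.1]],   # all three ocean pixels observed
    [[nan, nan], [0.5, nan]],   # one ocean pixel observed
    [[nan, nan], [nan, nan]],   # fully cloudy snapshot
])

num_ocean_points = int(np.isfinite(chl).any(axis=0).sum())
# count valid pixels per snapshot (over both spatial axes), then normalize
coverage = np.isfinite(chl).sum(axis=(1, 2)) / num_ocean_points
print(num_ocean_points, coverage)
```

A value of 1 means every ocean pixel was observed in that snapshot, and 0 means the whole scene was missing.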



In [22]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#coundt_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, coundt_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                            for ds in ds_8day, ds_daily] # not work in python 3



In [23]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[23]:





<matplotlib.legend.Legend at 0x11dc18780>

Seasonal Climatology



In [24]:

    
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology
                                      for count in (count_8day, count_daily)]



In [68]:

    
print()



In [69]:

    
# monthly mean of the fraction of valid ocean points
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[69]:





<matplotlib.legend.Legend at 0x11de12588>

From the above figure, we see that data coverage is highest in the winter (especially February) and lowest in summer.
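The monthly climatology behind that figure averages all values that fall in the same calendar month, regardless of year. A small pandas sketch of the same grouping, with invented coverage values:

```python
import pandas as pd

# toy coverage series spanning two years
coverage = pd.Series(
    [0.10, 0.30, 0.60, 0.20, 0.40, 0.80],
    index=pd.to_datetime(['2003-01-10', '2003-01-20', '2003-02-10',
                          '2004-01-15', '2004-02-05', '2004-02-25']),
)

# group by calendar month (1..12) and average across all years
clim = coverage.groupby(coverage.index.month).mean()
print(clim)
```

Here all January values from both years are pooled into one mean, and likewise for February, which is what groupby('time.month') does for the xarray objects above.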

Maps of individual days

Let's grab some data from February and plot it.



In [70]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[70]:





<matplotlib.collections.QuadMesh at 0x11de26240>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [71]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[71]:





<matplotlib.collections.QuadMesh at 0x11acc6748>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [72]:

    
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # the time is selected!
#ds_daily.chl_ocx[0].sel_points(time= times, lon=lons, lat=times, method='nearest')









    Out[72]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lat      (points) float64 16.04 18.04
    lon      (points) float64 65.04 70.04
  * points   (points) int64 0 1
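sel_points was deprecated and later removed in newer xarray releases. There, the same pointwise nearest-neighbour lookup is written by passing DataArray indexers that share a common dimension to sel. A hedged sketch on a toy grid (values and coordinates are invented; the dimension name 'points' is a free choice):

```python
import numpy as np
import xarray as xr

# toy field on a 3 x 4 grid with decreasing latitude, like the MODIS grid
chl = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 4),
    dims=('lat', 'lon'),
    coords={'lat': [18.0, 17.0, 16.0], 'lon': [64.0, 66.0, 68.0, 70.0]},
    name='chlor_a',
)

# indexers sharing the 'points' dimension are paired up element by element
lon_pts = xr.DataArray([65.2, 70.0], dims='points')
lat_pts = xr.DataArray([16.1, 18.3], dims='points')
picked = chl.sel(lon=lon_pts, lat=lat_pts, method='nearest')
print(picked.values)
```

Passing plain lists instead would select the full 2 x 2 block of all lon/lat combinations, which is not what a float track needs.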



In [73]:

    
#ds_daily.chlor_a.sel_points?



In [74]:

    
ds_8day = ds_daily.resample('8D', dim='time')
ds_8day









    Out[74]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 662)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
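The call above uses the resample signature of the xarray version this notebook ran on, which averages each bin by default; newer releases spell it `ds_daily.resample(time='8D').mean()`. What 8-day binning does to a daily series can be seen on a small pandas example (toy values; pandas is used here only for illustration):

```python
import numpy as np
import pandas as pd

# 16 daily values starting on the first day of the record, one of them missing
days = pd.date_range('2002-07-04', periods=16, freq='D')
daily = pd.Series(np.arange(16, dtype=float), index=days)
daily.iloc[3] = np.nan

# consecutive 8-day bins, each labelled by its first day; NaNs are skipped in the mean
eight_day = daily.resample('8D').mean()
print(eight_day)
```

Because missing days are skipped rather than propagated, an 8-day mean exists wherever at least one of the eight days has data, which is why the composite has better coverage than any single day.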



In [75]:

    
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[75]:





<matplotlib.collections.QuadMesh at 0x11ad8b0f0>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [76]:

    
# check the minimum longitude and latitude
print(ds_8day.lon.min(),'\n' ,ds_8day.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each (time, lon, lat) float position to a chlorophyll value



In [77]:

    
# in the following we deal with the data from the gdp float
from buyodata import buoydata
import os



In [78]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [79]:

    
# read them and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # around 4~5 minutes

#mask = df.time>='2002-07-04' # we only have chlor_a data after this date
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # check that no earlier records remain (should be 0)









    Out[79]:





0



In [80]:

    
# shift any negative longitudes into the 0-360 range
# work on an explicit copy so the assignment below does not raise SettingWithCopyWarning
dfvvAll = dfvvAll.copy()
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon < 0
dfvvAll.loc[mask, 'lon'] += 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))

dfvvAll.describe()



before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000






    Out[80]:






                  id            lat            lon           temp             ve             vn            spd        var_lat        var_lon        var_tmp
count   2.147732e+07   2.131997e+07   2.131997e+07   1.986179e+07   2.129142e+07   2.129142e+07   2.129142e+07   2.147732e+07   2.147732e+07   2.147732e+07
mean    1.765662e+06  -2.263128e+00   2.124412e+02   1.986121e+01   2.454172e-01   4.708192e-01   2.613427e+01   7.326258e+00   7.326555e+00   7.522298e+01
std     9.452835e+06   3.401115e+01   9.746941e+01   8.339498e+00   2.525050e+01   2.052160e+01   1.939087e+01   8.527853e+01   8.527851e+01   2.637454e+02
min     2.578000e+03  -7.764700e+01   0.000000e+00  -1.685000e+01  -2.916220e+02  -2.601400e+02   0.000000e+00   5.268300e-07  -3.941600e-02   1.001300e-03
25%     4.897500e+04  -3.186000e+01   1.490720e+02   1.437300e+01  -1.411400e+01  -1.044700e+01   1.290300e+01   4.366500e-06   7.512600e-06   1.435700e-03
50%     7.141300e+04  -4.920000e+00   2.153940e+02   2.214400e+01  -5.560000e-01   1.970000e-01   2.176700e+01   8.833600e-06   1.495800e-05   1.691700e-03
75%     1.094330e+05   2.756000e+01   3.064370e+02   2.688900e+01   1.356100e+01   1.109300e+01   3.405900e+01   1.833300e-05   3.627900e-05   2.294200e-03
max     6.399288e+07   8.989900e+01   3.600000e+02   4.595000e+01   4.417070e+02   2.783220e+02   4.421750e+02   1.000000e+03   1.000000e+03   1.000000e+03
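The summary shows that the float longitudes span 0 to 360. If a dataset did contain negative longitudes, one common way to wrap them is the modulo operator. This is an alternative to the mask-and-add step used in cell 80, shown on invented values:

```python
import pandas as pd

# toy table mixing the -180..180 and 0..360 longitude conventions
df = pd.DataFrame({'lon': [-170.0, -0.5, 0.0, 65.2, 180.0]})

# wrap everything into [0, 360); values that are already in range are unchanged
df['lon'] = df['lon'] % 360
print(df['lon'].tolist())
```

One difference to keep in mind: the modulo maps a longitude of exactly 360 to 0, whereas the mask-and-add version leaves 360 as it is.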



In [81]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)



In [82]:

    
# avoid running this cell repeatedly
# visualize the floats over the whole globe
fig, ax  = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[82]:





<matplotlib.axes._subplots.AxesSubplot at 0x1fafec7b8>



In [83]:

    
# a pandas DataFrame cannot do this resampling directly,
# because we are really indexing on ['time','id'] and DataFrame.resample does not accept that:
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()



In [84]:

    
# convert the surface float data from a pandas.DataFrame to an xarray.Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']) ) # use time and id as the index; reset_index reverts this
floatsDSAll









    Out[84]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...



In [85]:

    
# resample the xarray.Dataset onto an eight-day frequency
floatsDSAll_8D =floatsDSAll.resample('8D', dim='time')
floatsDSAll_8D









    Out[85]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 639)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
Data variables:
    lat      (time, id) float64 nan 16.21 13.62 16.14 13.65 nan 20.09 nan ...
    spd      (time, id) float64 nan 8.832 18.7 19.48 17.6 nan 25.74 nan ...
    var_lat  (time, id) float64 nan 0.001424 6.2e-05 6.574e-05 5.391e-05 nan ...
    vn       (time, id) float64 nan -0.2949 -9.82 -12.91 -8.593 nan -1.964 ...
    temp     (time, id) float64 nan nan nan 27.81 28.57 nan 28.99 nan 27.65 ...
    lon      (time, id) float64 nan 66.51 69.86 64.99 69.87 nan 69.35 nan ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.00364 0.08777 nan 0.001711 ...
    ve       (time, id) float64 nan 7.335 13.25 12.26 12.13 nan 24.29 nan ...
    var_lon  (time, id) float64 nan 0.005415 0.0001179 0.0001259 9.874e-05 ...
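The TypeError quoted in cell 83 arises because DataFrame.resample needs a datetime-like index. A possible pandas-only alternative, which is not what this notebook does, is to group by float id together with an 8-day pd.Grouper on the time column. A sketch on an invented long-format table:

```python
import pandas as pd

# toy long-format float table: one row per (float id, observation time)
floats = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2],
    'time': pd.to_datetime(['2002-07-04', '2002-07-05', '2002-07-13',
                            '2002-07-04', '2002-07-12']),
    'lat':  [10.0, 12.0, 14.0, 20.0, 22.0],
})

# average each float separately inside consecutive 8-day bins
binned = floats.groupby(['id', pd.Grouper(key='time', freq='8D')])['lat'].mean()
print(binned)
```

Unlike the dense (time, id) grid produced by the xarray route, this result only contains the bins in which a float actually reported, so it avoids the large number of all-NaN rows.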



In [86]:

    
# convert back to a pandas.DataFrame for plotting
floatsDFAll_8D = floatsDSAll_8D.to_dataframe()
floatsDFAll_8D
floatsDFAll_8D = floatsDFAll_8D.reset_index()
floatsDFAll_8D
# visualize the 8-day subsampled floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_8D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[86]:





<matplotlib.axes._subplots.AxesSubplot at 0x11babdf28>



In [87]:

    
# look up the chlorophyll value for each data entry
floatsDFAll_8Dtimeorder = floatsDFAll_8D.sort_values(['time','id'],ascending=True)
floatsDFAll_8Dtimeorder # check that it is ordered by time
# should we drop NaN rows to speed up the lookup?









    Out[87]:






              id        time        lat        spd   var_lat          vn       temp        lon      var_tmp         ve   var_lon
     0      7574  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
   639     10206  2002-07-04  16.208687   8.832125  0.001424   -0.294906        NaN  66.510656  1000.000000   7.334875  0.005415
  1278     10208  2002-07-04  13.617187  18.702906  0.000062   -9.820156        NaN  69.858594  1000.000000  13.248719  0.000118
  1917     11089  2002-07-04  16.140125  19.484250  0.000066  -12.911969  27.807781  64.993937     0.003640  12.260094  0.000126
  2556     15703  2002-07-04  13.648188  17.604156  0.000054   -8.592531  28.569812  69.867031     0.087771  12.131219  0.000099
  3195     15707  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
  3834     27069  2002-07-04  20.090281  25.743625  0.000054   -1.963906  28.985781  69.350187     0.001711  24.285875  0.000099
  4473     27139  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
  5112     28842  2002-07-04  18.663125  17.876969  0.000103   -7.667469  27.649500  60.820937     0.003330   3.765094  0.000218
  5751     34159  2002-07-04  12.808719  35.638313  0.000061   14.802688        NaN  59.602656  1000.000000  31.401250  0.000116
  6390     34173  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
  7029     34210  2002-07-04   6.092000  25.260719  0.000064  -15.458906  26.541750  56.754250     0.003695  -2.609125  0.000125
  7668     34211  2002-07-04   8.265656  27.956094  0.000055  -13.744125  28.372250  68.470625     0.003516  22.912125  0.000102
  8307     34212  2002-07-04   6.659437  46.824937  0.000055   12.766750  28.576844  65.751156     0.003590  41.911531  0.000102
  8946     34223  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
  9585     34310  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 10224     34311  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 10863     34312  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 11502     34314  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 12141     34315  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 12780     34374  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 13419     34708  2002-07-04  10.218219  33.377594  0.000058    2.054469  27.259781  60.558000     0.001789  33.013969  0.000110
 14058     34709  2002-07-04        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
 14697     34710  2002-07-04  13.146719  46.721688  0.000052   -4.710250  31.146688  50.311875     0.001858   5.607969  0.000100
 15336     34714  2002-07-04  13.736031  38.004406  0.000060    3.964156  27.743656  64.554750     0.001808  36.962156  0.000113
 15975     34716  2002-07-04   7.701938  34.903438  0.000058    7.112344  28.789000  66.159969     0.001758  32.958500  0.000107
 16614     34718  2002-07-04  15.600562  38.508094  0.000056  -31.845094  29.088344  72.890563     0.001709  20.738031  0.000105
 17253     34719  2002-07-04  17.318281  27.892406  0.000058  -20.870063  28.957969  71.331469     0.001661  14.599125  0.000109
 17892     34720  2002-07-04  14.194000  26.035375  0.000061  -21.496063  28.665031  69.435531     0.001797  11.059094  0.000113
 18531     34721  2002-07-04  16.971531  13.515531  0.000061  -10.533031  27.911625  65.534344     0.001749   6.204906  0.000115
   ...       ...         ...        ...        ...       ...         ...        ...        ...          ...        ...       ...
146969   3098682  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
147608  60073460  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
148247  60074440  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
148886  60077450  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
149525  60150420  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
150164  60454500  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
150803  60656200  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
151442  60657200  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
152081  60658190  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
152720  60659110  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
153359  60659120  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
153998  60659190  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
154637  60659200  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
155276  60940960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
155915  60940970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
156554  60941960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
157193  60941970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
157832  60942960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
158471  60942970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
159110  60943960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
159749  60943970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
160388  60944960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
161027  60944970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
161666  60945970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
162305  60946960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
162944  60947960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
163583  60947970  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
164222  60948960  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
164861  60950430  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN
165500  62321420  2016-06-24        NaN        NaN       NaN         NaN        NaN        NaN          NaN        NaN       NaN

165501 rows × 11 columns



In [88]:

    
floatsDFAll_8Dtimeorder.lon.dropna().shape  # only 3855 of the 165501 rows have a valid longitude









    Out[88]:





(3855,)
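The gap between 165501 rows and 3855 valid positions follows from the round trip through xarray: indexing on ['time','id'] builds a dense grid with one cell for every combination of the 639 time bins and 259 float ids (639 × 259 = 165501), and every combination that was never observed is filled with NaN. A toy pandas sketch of the same densification, using pivot and invented values:

```python
import pandas as pd

# toy long table: float 2 has no record in the second time bin
long_table = pd.DataFrame({
    'time': pd.to_datetime(['2002-07-04', '2002-07-04', '2002-07-12']),
    'id':   [1, 2, 1],
    'lat':  [10.0, 20.0, 11.0],
})

# a dense time x id grid has a cell for every combination, observed or not
dense = long_table.pivot(index='time', columns='id', values='lat')
n_cells = dense.size
n_valid = int(dense.notna().sum().sum())
print(dense)
print(n_cells, n_valid, 639 * 259)
```

In the toy case 3 observations turn into 4 cells; in the notebook the same effect is far stronger because most floats are present for only a small part of the record.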



In [89]:

    
# a little test for the api in loops for the dataframe   
# check df.itertuples? it is faster and preserves the data format
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''









    Out[89]:





"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"



In [90]:

    
# this single vectorized call replaces the row-by-row loop above
# the nearest-neighbour lookup over (time, lon, lat) is slow for this many points; it took about an hour
tmpAll = ds_8day.chlor_a.sel_points(time=list(floatsDFAll_8Dtimeorder.time),lon=list(floatsDFAll_8Dtimeorder.lon), lat=list(floatsDFAll_8Dtimeorder.lat), method='nearest')
print('the count of nan vaues in tmpAll is',tmpAll.to_series().isnull().sum())









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |






    



the count of nan vaues in tmpAll is 163755



In [91]:

    
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (1746,) good values









    Out[91]:





(1746,)
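`sel_points` was removed from later xarray releases. A minimal sketch of the same pointwise nearest-neighbour lookup, assuming a recent xarray where `sel` accepts `DataArray` indexers that share a dimension; the grid, the float positions and the dimension name `points` are toy stand-ins, not the real data. Rows without a position are filtered out first and left as NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy gridded field standing in for ds_8day.chlor_a (hypothetical values)
time = pd.date_range("2002-07-04", periods=3, freq="8D")
lat = np.array([10.0, 11.0, 12.0])
lon = np.array([60.0, 61.0, 62.0, 63.0])
values = np.arange(3 * 3 * 4, dtype=float).reshape(3, 3, 4)
chl = xr.DataArray(values, coords={"time": time, "lat": lat, "lon": lon},
                   dims=("time", "lat", "lon"))

# Toy float table; the second row has no position, as in the real table
floats = pd.DataFrame({
    "time": [time[0], time[1], time[2]],
    "lat": [10.2, np.nan, 11.9],
    "lon": [61.4, np.nan, 62.6],
})

# Only rows with a known position can be matched to a grid cell
valid = floats[["lat", "lon"]].notnull().all(axis=1)
pts = floats[valid]

# Indexers sharing the dimension "points" select one value per row
# instead of the outer product of all times, lats and lons
picked = chl.sel(
    time=xr.DataArray(pts["time"].values, dims="points"),
    lat=xr.DataArray(pts["lat"].values, dims="points"),
    lon=xr.DataArray(pts["lon"].values, dims="points"),
    method="nearest",
)

floats["chlor_a"] = np.nan
floats.loc[valid, "chlor_a"] = picked.values
```

Note that `method='nearest'` picks the closest grid cell; it does not interpolate between cells.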



In [92]:

    
# tmpAll.to_series() converts the xarray DataArray to a pandas Series; np.array() drops its index so the values align with the dataframe rows by position
floatsDFAll_8Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_8Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_8Dtimeorder.chlor_a.isnull().sum() )  # should match the NaN count of tmpAll above

# take a look at the data
floatsDFAll_8Dtimeorder

# visualize chlor_a at the float positions in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_8Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    # base-10 logarithm; negative input yields NaN and zero yields -inf
    return np.log10(x)

floatsDFAll_8Dtimeorder['chlor_a_log10'] = floatsDFAll_8Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_8Dtimeorder

# visualize log10(chlor_a) at the float positions
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_8Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAll_8Dtimeorder.chlor_a_log10.dropna().shape  # (1746,), the same count as chlor_a









    



after editing the dataframe the nan values in 'chlor_a' is 163755






    Out[92]:





(1746,)
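The log transform can also be applied to the whole column at once instead of element by element with `apply`. A small sketch on toy numbers (two of them taken from the `chl_rate` column shown later in this notebook), which also shows why a signed quantity loses values under `log10`: negative entries become NaN.

```python
import numpy as np
import pandas as pd

# Toy values: two positive, one negative, one missing
rate = pd.Series([0.027142, -0.012681, 0.059694, np.nan])

# np.log10 works on the whole Series; suppress the warning for the negative entry
with np.errstate(invalid="ignore"):
    logged = np.log10(rate)

# Only the two positive entries survive the transform
n_valid = int(logged.notnull().sum())
```

Chlorophyll concentrations are positive, so `chlor_a_log10` keeps every valid value; a difference between two concentrations is not.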



In [93]:

    
# next step: take the difference of chlor_a along time for each float; this is done in xarray below
# so the dataframe is first converted back into an xarray Dataset
# and the difference is taken along the time dimension
floatsDFAll_8Dtimeorder









    Out[93]:






  
    
      
	id	time	lat	spd	var_lat	vn	temp	lon	var_tmp	ve	var_lon	chlor_a	chlor_a_log10
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
639	10206	2002-07-04	16.208687	8.832125	0.001424	-0.294906	NaN	66.510656	1000.000000	7.334875	0.005415	NaN	NaN
1278	10208	2002-07-04	13.617187	18.702906	0.000062	-9.820156	NaN	69.858594	1000.000000	13.248719	0.000118	NaN	NaN
1917	11089	2002-07-04	16.140125	19.484250	0.000066	-12.911969	27.807781	64.993937	0.003640	12.260094	0.000126	NaN	NaN
2556	15703	2002-07-04	13.648188	17.604156	0.000054	-8.592531	28.569812	69.867031	0.087771	12.131219	0.000099	NaN	NaN
3195	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3834	27069	2002-07-04	20.090281	25.743625	0.000054	-1.963906	28.985781	69.350187	0.001711	24.285875	0.000099	NaN	NaN
4473	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5112	28842	2002-07-04	18.663125	17.876969	0.000103	-7.667469	27.649500	60.820937	0.003330	3.765094	0.000218	NaN	NaN
5751	34159	2002-07-04	12.808719	35.638313	0.000061	14.802688	NaN	59.602656	1000.000000	31.401250	0.000116	NaN	NaN
6390	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7029	34210	2002-07-04	6.092000	25.260719	0.000064	-15.458906	26.541750	56.754250	0.003695	-2.609125	0.000125	NaN	NaN
7668	34211	2002-07-04	8.265656	27.956094	0.000055	-13.744125	28.372250	68.470625	0.003516	22.912125	0.000102	0.104210	-0.982091
8307	34212	2002-07-04	6.659437	46.824937	0.000055	12.766750	28.576844	65.751156	0.003590	41.911531	0.000102	NaN	NaN
8946	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
9585	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10224	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10863	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11502	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12141	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12780	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13419	34708	2002-07-04	10.218219	33.377594	0.000058	2.054469	27.259781	60.558000	0.001789	33.013969	0.000110	NaN	NaN
14058	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
14697	34710	2002-07-04	13.146719	46.721688	0.000052	-4.710250	31.146688	50.311875	0.001858	5.607969	0.000100	NaN	NaN
15336	34714	2002-07-04	13.736031	38.004406	0.000060	3.964156	27.743656	64.554750	0.001808	36.962156	0.000113	NaN	NaN
15975	34716	2002-07-04	7.701938	34.903438	0.000058	7.112344	28.789000	66.159969	0.001758	32.958500	0.000107	0.119733	-0.921786
16614	34718	2002-07-04	15.600562	38.508094	0.000056	-31.845094	29.088344	72.890563	0.001709	20.738031	0.000105	NaN	NaN
17253	34719	2002-07-04	17.318281	27.892406	0.000058	-20.870063	28.957969	71.331469	0.001661	14.599125	0.000109	NaN	NaN
17892	34720	2002-07-04	14.194000	26.035375	0.000061	-21.496063	28.665031	69.435531	0.001797	11.059094	0.000113	NaN	NaN
18531	34721	2002-07-04	16.971531	13.515531	0.000061	-10.533031	27.911625	65.534344	0.001749	6.204906	0.000115	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...
146969	3098682	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
147608	60073460	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
148247	60074440	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
148886	60077450	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
149525	60150420	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
150164	60454500	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
150803	60656200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
151442	60657200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
152081	60658190	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
152720	60659110	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
153359	60659120	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
153998	60659190	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
154637	60659200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
155276	60940960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
155915	60940970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
156554	60941960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
157193	60941970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
157832	60942960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
158471	60942970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
159110	60943960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
159749	60943970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
160388	60944960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
161027	60944970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
161666	60945970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
162305	60946960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
162944	60947960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
163583	60947970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
164222	60948960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
164861	60950430	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
165500	62321420	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
    
  

165501 rows × 13 columns



In [94]:

    
# unstack() will provide a 2d dataframe
# reset_index() will reset all the index as columns



In [95]:

    
# convert the dataframe to an xarray Dataset indexed by (time, id) so the difference can be taken along time
tmp = xr.Dataset.from_dataframe(floatsDFAll_8Dtimeorder.set_index(['time','id']) ) # reset_index() would revert the set_index() step
# first difference of chlor_a along time for each float id, i.e. the change over one 8-day step
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# give the column a proper name (inplace must be the boolean True, not the string 'True')
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# left-merge chlor_a_rate into floatsDFAll_8Dtimeorder on the keys {id, time}
floatsDFAllRate_8Dtimeorder=pd.merge(floatsDFAll_8Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_8Dtimeorder

# check that the merge did not change the values: the two sums of chl_rate should agree
print('check the sum of the chlor_a before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of the chlor_a after the merge',floatsDFAllRate_8Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate; the linear scale is the better choice for this signed quantity
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_8Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale; negative rates become NaN and drop out of this plot
floatsDFAllRate_8Dtimeorder['chl_rate_log10'] = floatsDFAllRate_8Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_8Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_8Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAllRate_8Dtimeorder.chl_rate.dropna().shape   # (1062,) data points
#floatsDFAllRate_8Dtimeorder.chl_rate_log10.dropna().shape   # (436,) data points; chl_rate can be negative, so log10 is not appropriate here









    



check the sum of the chlor_a before the merge -25.070548981676524
check the sum of the chlor_a after the merge -25.070548981676524






    Out[95]:





(1062,)
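The same per-float difference can also be sketched in pandas alone, without the round trip through xarray. This is an alternative under an assumption, not the method used above: it relies on each float having one row per time step, so that consecutive rows of a float are consecutive 8-day steps. The table below is a toy example.

```python
import numpy as np
import pandas as pd

# Toy table: two floats, three 8-day time steps each
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": pd.to_datetime(["2002-07-04", "2002-07-12", "2002-07-20"] * 2),
    "chlor_a": [0.10, 0.15, np.nan, 0.30, 0.25, 0.40],
})

# Sort so that rows of one float are in time order, then difference within each float
df = df.sort_values(["id", "time"])
df["chl_rate"] = df.groupby("id")["chlor_a"].diff()
```

The first time step of every float has no predecessor and gets NaN, which matches what the left merge above produces for rows missing from `chlor_a_rate`.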



In [96]:

    
pd.to_datetime(floatsDFAllRate_8Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_8Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_8Dtimeorder.time) ) # create a dummy time series whose DatetimeIndex is used for masking

# take the month out
month = ts.index.month 
# month.shape # a check on the shape of the month.
selector = ((11==month) | (12==month) | (1==month) | (2==month) | (3==month) )  
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_8Dtimeorder[selector].chl_rate.dropna().shape) # total (692,)
print('all the data count is', floatsDFAllRate_8Dtimeorder.chl_rate.dropna().shape )   # total (1062,)









    



shape of the selector (165501,)
all the data count in [11-01, 03-31]  is (692,)
all the data count is (1062,)
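The chained month comparisons can be written more compactly with `isin`. A small sketch on toy dates, assuming the `time` column has already been converted to datetimes:

```python
import pandas as pd

# Toy dates on both sides of the November to March window
times = pd.Series(pd.to_datetime([
    "2002-10-31", "2002-11-01", "2003-01-15", "2003-03-31", "2003-04-01",
]))

# True for rows whose month is November, December, January, February or March
in_season = times.dt.month.isin([11, 12, 1, 2, 3])
```

The resulting boolean Series can be used as a row mask in the same way as `selector` above.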



In [97]:

    
# histogram of the non-standardized data
axfloat = floatsDFAllRate_8Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('8-Day chl_rate')









    Out[97]:





<matplotlib.text.Text at 0x11dd28828>



In [98]:

    
# standardized series (zero mean, unit standard deviation)
ts = floatsDFAllRate_8Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])  # note: this range is now in units of standard deviations
axts.set_title('8-Day standardized chl_rate')









    Out[98]:





<matplotlib.text.Text at 0x1a38564e0>
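A quick check of the standardization on toy numbers: after subtracting the mean and dividing by the standard deviation, the series should have mean 0 and standard deviation 1. pandas uses the sample standard deviation (`ddof=1`) in `Series.std()`.

```python
import pandas as pd

# Toy stand-in for the chl_rate values
ts_toy = pd.Series([-0.2, -0.05, 0.0, 0.05, 0.3])

# z-score: subtract the mean, divide by the sample standard deviation
z = (ts_toy - ts_toy.mean()) / ts_toy.std()

z_mean = z.mean()
z_std = z.std()
```

Because the standardized values are measured in standard deviations, a histogram range chosen for the raw rates generally covers only the centre of the standardized distribution.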



In [99]:

    
# all the data
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_8Dtimeorder[ (floatsDFAllRate_8Dtimeorder.time > str(i))  & (floatsDFAllRate_8Dtimeorder.time < str(i+1)) ] # if year i
    #fig, ax  = plt.subplots(figsize=(12,10))
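    print(tmpyear.chl_rate.dropna().shape)   # total is 1061, one fewer than 1062; likely a point stamped exactly on January 1, which the strict > and < both exclude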
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the extra figure
ax = plt.subplot(8,2,16)
fig.delaxes(ax)









    



(46,)
(43,)
(5,)
(43,)
(103,)
(94,)
(139,)
(37,)
(64,)
(18,)
(37,)
(32,)
(229,)
(113,)
(58,)
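The yearly counts add up to 1061, not 1062. One plausible cause, stated here as an assumption, is the strict comparison against the bare year string: pandas parses `'2003'` as 2003-01-01 00:00, so a row stamped exactly on January 1 satisfies neither `> '2003'` nor `< '2003'`. A toy sketch comparing that filter with a selection on the calendar year:

```python
import pandas as pd

# Toy table with two rows stamped exactly on January 1
df = pd.DataFrame({
    "time": pd.to_datetime(["2003-01-01", "2003-06-10", "2004-01-01"]),
    "chl_rate": [0.1, 0.2, 0.3],
})

# Strict bounds as in the loop above: the 2003-01-01 row is dropped
strict = df[(df.time > "2003") & (df.time < "2004")]

# Selecting on the calendar year keeps it
by_year = df[df.time.dt.year == 2003]
```

Since the 8-day composites restart on January 1 each year, selecting on `dt.year` (or using `>=` for the lower bound) avoids losing those rows.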



In [100]:

    
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_8Dtimeorder[ (floatsDFAllRate_8Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_8Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # season starting in November of year i
    # select only the months from Nov 1 of year i to March 31 of year i+1
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 692
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)









    



(56,)
(0,)
(11,)
(74,)
(47,)
(109,)
(31,)
(49,)
(3,)
(35,)
(0,)
(125,)
(97,)
(55,)



In [ ]:



In [ ]:



In [101]:

    
# write the data to disk as a csv or hdf file so that later experiments do not have to recompute it

df_list = []
for i in range(2002,2017) :
    tmpyear = floatsDFAllRate_8Dtimeorder[ (floatsDFAllRate_8Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_8Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # season starting in November of year i
    # select only the months from Nov 1 of year i to March 31 of year i+1
    df_list.append(tmpyear)
    
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is (692,)
df_chl_out_8D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # only keep the non-nan values
#list(df_chl_out_8D_modisa.groupby(['id']))   # shows the continuity pattern of the Lagrangian difference for each float id

# preview the table that will be written to a csv or hdf file
df_chl_out_8D_modisa.head()









    



all the data count in [11-01, 03-31]  is  (692,)






    Out[101]:






  
    
      
	id	time	lat	spd	var_lat	vn	temp	lon	var_tmp	ve	var_lon	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
3886	10206	2002-11-01	10.873656	11.188906	0.000352	6.509875	NaN	67.351188	1000.000000	-6.823625	0.000996	0.137771	-0.860842	-0.012681	NaN
3888	11089	2002-11-01	14.269219	13.679406	0.000057	4.337844	28.969813	65.099156	0.003679	-11.122000	0.000106	0.152450	-0.816873	0.027142	-1.566358
3908	34710	2002-11-01	17.038563	12.432687	0.000064	11.684344	28.970219	63.145031	0.001698	0.757312	0.000123	0.383868	-0.415819	0.059694	-1.224066
4145	10206	2002-11-09	11.155719	3.428062	0.000984	1.562844	NaN	67.108219	1000.000000	-0.786375	0.003551	0.132682	-0.877188	-0.005089	NaN
4147	11089	2002-11-09	14.220969	19.677781	0.000065	-6.951906	28.742188	64.193281	0.003868	-17.539250	0.000126	0.201879	-0.694909	0.049429	-1.306018



In [102]:

    
df_chl_out_8D_modisa.index.name = 'index'  # name the index explicitly

# write to CSV, labelling the index column 'index'
df_chl_out_8D_modisa.to_csv('df_chl_out_8D_modisa.csv', sep=',', index_label = 'index')

# load CSV output
test = pd.read_csv('df_chl_out_8D_modisa.csv', index_col='index')
test.head()









    Out[102]:






  
    
      
	id	time	lat	spd	var_lat	vn	temp	lon	var_tmp	ve	var_lon	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
index
3886	10206	2002-11-01	10.873656	11.188906	0.000352	6.509875	NaN	67.351188	1000.000000	-6.823625	0.000996	0.137771	-0.860842	-0.012681	NaN
3888	11089	2002-11-01	14.269219	13.679406	0.000057	4.337844	28.969813	65.099156	0.003679	-11.122000	0.000106	0.152450	-0.816873	0.027142	-1.566358
3908	34710	2002-11-01	17.038563	12.432687	0.000064	11.684344	28.970219	63.145031	0.001698	0.757312	0.000123	0.383868	-0.415819	0.059694	-1.224066
4145	10206	2002-11-09	11.155719	3.428062	0.000984	1.562844	NaN	67.108219	1000.000000	-0.786375	0.003551	0.132682	-0.877188	-0.005089	NaN
4147	11089	2002-11-09	14.220969	19.677781	0.000065	-6.951906	28.742188	64.193281	0.003868	-17.539250	0.000126	0.201879	-0.694909	0.049429	-1.306018
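When the CSV is read back without further options, the `time` column comes back as plain strings. A minimal round-trip sketch on a toy two-row table, written to an in-memory buffer instead of a file, that restores the datetimes with `parse_dates`:

```python
import io
import pandas as pd

# Toy stand-in for df_chl_out_8D_modisa with a named index
df_small = pd.DataFrame(
    {
        "id": [10206, 11089],
        "time": pd.to_datetime(["2002-11-01", "2002-11-09"]),
        "chl_rate": [-0.012681, 0.049429],
    },
    index=pd.Index([3886, 4147], name="index"),
)

# Write to an in-memory buffer; a file path works the same way
buf = io.StringIO()
df_small.to_csv(buf, index_label="index")
buf.seek(0)

# parse_dates turns the 'time' column back into datetime64 values
back = pd.read_csv(buf, index_col="index", parse_dates=["time"])
```

With datetimes restored, date comparisons and `dt.month` selections behave the same on the reloaded table as on the original.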



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03

	id	time	lat	spd	var_lat	vn	temp	lon	var_tmp	ve	var_lon
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
639	10206	2002-07-04	16.208687	8.832125	0.001424	-0.294906	NaN	66.510656	1000.000000	7.334875	0.005415
1278	10208	2002-07-04	13.617187	18.702906	0.000062	-9.820156	NaN	69.858594	1000.000000	13.248719	0.000118
1917	11089	2002-07-04	16.140125	19.484250	0.000066	-12.911969	27.807781	64.993937	0.003640	12.260094	0.000126
2556	15703	2002-07-04	13.648188	17.604156	0.000054	-8.592531	28.569812	69.867031	0.087771	12.131219	0.000099
3195	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3834	27069	2002-07-04	20.090281	25.743625	0.000054	-1.963906	28.985781	69.350187	0.001711	24.285875	0.000099
4473	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5112	28842	2002-07-04	18.663125	17.876969	0.000103	-7.667469	27.649500	60.820937	0.003330	3.765094	0.000218
5751	34159	2002-07-04	12.808719	35.638313	0.000061	14.802688	NaN	59.602656	1000.000000	31.401250	0.000116
6390	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7029	34210	2002-07-04	6.092000	25.260719	0.000064	-15.458906	26.541750	56.754250	0.003695	-2.609125	0.000125
7668	34211	2002-07-04	8.265656	27.956094	0.000055	-13.744125	28.372250	68.470625	0.003516	22.912125	0.000102
8307	34212	2002-07-04	6.659437	46.824937	0.000055	12.766750	28.576844	65.751156	0.003590	41.911531	0.000102
8946	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
9585	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10224	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10863	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11502	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12141	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
12780	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13419	34708	2002-07-04	10.218219	33.377594	0.000058	2.054469	27.259781	60.558000	0.001789	33.013969	0.000110
14058	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
14697	34710	2002-07-04	13.146719	46.721688	0.000052	-4.710250	31.146688	50.311875	0.001858	5.607969	0.000100
15336	34714	2002-07-04	13.736031	38.004406	0.000060	3.964156	27.743656	64.554750	0.001808	36.962156	0.000113
15975	34716	2002-07-04	7.701938	34.903438	0.000058	7.112344	28.789000	66.159969	0.001758	32.958500	0.000107
16614	34718	2002-07-04	15.600562	38.508094	0.000056	-31.845094	29.088344	72.890563	0.001709	20.738031	0.000105
17253	34719	2002-07-04	17.318281	27.892406	0.000058	-20.870063	28.957969	71.331469	0.001661	14.599125	0.000109
17892	34720	2002-07-04	14.194000	26.035375	0.000061	-21.496063	28.665031	69.435531	0.001797	11.059094	0.000113
18531	34721	2002-07-04	16.971531	13.515531	0.000061	-10.533031	27.911625	65.534344	0.001749	6.204906	0.000115
...	...	...	...	...	...	...	...	...	...	...	...
146969	3098682	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
147608	60073460	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
148247	60074440	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
148886	60077450	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
149525	60150420	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
150164	60454500	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
150803	60656200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
151442	60657200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
152081	60658190	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
152720	60659110	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
153359	60659120	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
153998	60659190	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
154637	60659200	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
155276	60940960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
155915	60940970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
156554	60941960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
157193	60941970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
157832	60942960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
158471	60942970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
159110	60943960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
159749	60943970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
160388	60944960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
161027	60944970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
161666	60945970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
162305	60946960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
162944	60947960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
163583	60947970	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
164222	60948960	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
164861	60950430	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
165500	62321420	2016-06-24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
