5-day subsampling of the OceanColor dataset



In [1]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm









    




Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the NetCDF files into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we try to access the values.



In [2]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [3]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB while the daily dataset is ~4.2 GB. Both easily fit in RAM, so let's load them into memory.
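As a sanity check on these sizes, the byte counts can be reproduced by hand from the dimension sizes that the dataset summaries in this notebook report. This is only a sketch: the helper name `dataset_nbytes` is made up for illustration, and it assumes every variable and coordinate uses 8 bytes per value.

```python
# Dimension sizes copied from the dataset summaries printed in this notebook.
SPATIAL = {"lat": 276, "lon": 360, "rgb": 3, "eightbitcolor": 256}
BYTES_PER_VALUE = 8  # assumption: float64, int64 and datetime64[ns] all use 8 bytes


def dataset_nbytes(n_time):
    """Estimate the total size in bytes of a dataset with n_time snapshots."""
    chlor_a = n_time * SPATIAL["lat"] * SPATIAL["lon"]            # (time, lat, lon)
    palette = n_time * SPATIAL["rgb"] * SPATIAL["eightbitcolor"]  # (time, rgb, eightbitcolor)
    coords = n_time + sum(SPATIAL.values())                       # one value per coordinate label
    return (chlor_a + palette + coords) * BYTES_PER_VALUE


for name, n_time in [("8-day", 667), ("daily", 5295)]:
    print(name, dataset_nbytes(n_time) / 1e6, "MB")
```

Almost all of the memory is taken by chlor_a; the palette and the coordinates are negligible by comparison.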



In [4]:

    
[ds.load() for ds in both_datasets]









    Out[4]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted. So I wrote a function that corrects the data by replacing the negative values with NaN.



In [5]:

    
def fix_bad_data(ds):
    # for some reason, the cloud / land mask is backwards on some data
    # this is obvious because there are chlorophyll values less than zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through and fix
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values 
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
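The effect of this function can be checked on a small synthetic array. The sketch below uses plain numpy instead of xarray; the array `chl` and the fill value -32767 are invented for illustration and are not taken from the MODIS files.

```python
import numpy as np

# Hypothetical stack of two 2x2 snapshots; the second one has an inverted mask,
# so it carries a large negative fill value where NaN was expected.
chl = np.array([[[0.1, np.nan], [0.3, 0.2]],
                [[-32767.0, 0.5], [-32767.0, np.nan]]])

# Minimum of each snapshot, ignoring NaN (the analogue of groupby('time').min())
snapshot_min = np.nanmin(chl.reshape(chl.shape[0], -1), axis=1)
bad = np.nonzero(snapshot_min < 0)[0]

# Replace the negative values in the bad snapshots with NaN
for n in bad:
    chl[n] = np.ma.masked_less(chl[n], 0).filled(np.nan)

print(bad, np.nanmin(chl))
```

Only the snapshots flagged as bad are touched, and valid values inside them are kept.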



In [6]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[6]:





[None, None]



In [7]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[7]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately it doesn't come with the dataset. But we can infer it by counting all the points that have at least one non-nan chlorophyll value.
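The idea can be sketched with plain numpy on a tiny synthetic record (the array `chl` below is made up for illustration): a grid point counts as ocean if it has a valid value in at least one snapshot.

```python
import numpy as np

# Hypothetical stack of 3 snapshots on a 2x3 grid; NaN means "no retrieval".
chl = np.full((3, 2, 3), np.nan)
chl[0, 0, 0] = 0.2   # a point seen once
chl[1, 0, 0] = 0.4   # the same point seen again
chl[2, 1, 2] = 1.5   # another point seen once

# Number of valid snapshots per grid point (comparisons with NaN are False)
valid_counts = (chl > 0).sum(axis=0)

# Points that were never observed are treated as land
ocean_mask = valid_counts > 0
num_ocean_points = int(ocean_mask.sum())
print(num_ocean_points)
```

An ocean point that happens to be cloudy in every snapshot would be misclassified as land, so a longer record gives a more reliable mask.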



In [8]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[8]:





<matplotlib.collections.QuadMesh at 0x118bf9eb8>



In [9]:

    
# infer the ocean mask: points with at least one valid (positive) chlorophyll value
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean grid points
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[9]:





<matplotlib.text.Text at 0x1134ab358>



In [10]:

    
#ds_8day



In [11]:

    
#ds_daily



In [12]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[12]:





<matplotlib.collections.QuadMesh at 0x11b392a20>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [13]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
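As a minimal sketch of this ratio, here is the same computation in plain numpy on a made-up 3-snapshot record (the values in `chl` are illustrative only):

```python
import numpy as np

# Hypothetical record on a 2x2 grid; the bottom-right point is never observed (land).
chl = np.array([
    [[0.1, 0.2], [np.nan, np.nan]],
    [[np.nan, np.nan], [0.3, np.nan]],
    [[0.1, 0.2], [0.3, np.nan]],
])

# Ocean points: observed at least once over the whole record
num_ocean_points = np.isfinite(chl).any(axis=0).sum()

# Valid (non-NaN) points in each snapshot, divided by the number of ocean points
valid_per_snapshot = np.isfinite(chl).sum(axis=(1, 2))
coverage = valid_per_snapshot / float(num_ocean_points)
print(coverage)
```

A coverage of 1 means every ocean point was observed in that snapshot; a value near 0 means the snapshot is almost entirely missing.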



In [14]:

    
# count the valid (non-NaN) values in each daily snapshot
ds_daily.groupby('time').count()









    Out[14]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...



In [15]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[15]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [16]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]



In [17]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#coundt_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, coundt_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                            for ds in ds_8day, ds_daily] # not work in python 3



In [18]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[18]:





<matplotlib.legend.Legend at 0x11a1ac5c0>

Seasonal Climatology



In [19]:

    
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly data
                                     for count in (count_8day, count_daily)]



In [20]:

    
# monthly mean of the fraction of valid ocean points
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[20]:





<matplotlib.legend.Legend at 0x129c710f0>

From the above figure, we see that data coverage is highest in the winter (especially February) and lowest in summer.
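The monthly climatology behind this figure groups all values by calendar month, across years, and averages each group. A minimal pandas sketch of that grouping, using a made-up coverage series rather than the real counts:

```python
import numpy as np
import pandas as pd

# Hypothetical daily coverage series spanning two years:
# 0.6 in February, 0.2 in every other month.
time = pd.date_range("2003-01-01", "2004-12-31", freq="D")
coverage = pd.Series(np.where(time.month == 2, 0.6, 0.2), index=time)

# Group by calendar month (1..12) across all years, then average each group
climatology = coverage.groupby(coverage.index.month).mean()
print(climatology.idxmax())
```

The result has one value per calendar month, which is why the x axis of the figure runs from 1 to 12.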

Maps of individual days

Let's grab some data from February and plot it.



In [21]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[21]:





<matplotlib.collections.QuadMesh at 0x129cfecf8>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [22]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[22]:





<matplotlib.collections.QuadMesh at 0x129874b38>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [23]:

    
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # the time is selected!
#ds_daily.chl_ocx[0].sel_points(time= times, lon=lons, lat=times, method='nearest')









    Out[23]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lon      (points) float64 65.04 70.04
    lat      (points) float64 16.04 18.04
  * points   (points) int64 0 1



In [24]:

    
#ds_daily.chlor_a.sel_points?



In [25]:

    
ds_5day = ds_daily.resample('5D', dim='time')
ds_5day









    Out[25]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 1059)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-09 2002-07-14 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
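To see what the 5-day resampling does at a single grid point, here is a minimal pandas sketch. It assumes the bins are aggregated with a mean that skips NaN; the series `daily` is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series at one grid point; NaN marks days without a retrieval
time = pd.date_range("2002-07-04", periods=10, freq="D")
daily = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan,
                   np.nan, np.nan, np.nan, np.nan, np.nan], index=time)

# Average over consecutive 5-day bins; NaN values are skipped within each bin
five_day = daily.resample("5D").mean()
print(five_day)
```

A bin is valid as soon as one of its five days has data, which is how the 5-day product fills gaps left by clouds. A bin with no valid days stays NaN. The 5295 daily snapshots collapse to 5295 / 5 = 1059 bins, matching the time dimension above.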



In [26]:

    
plt.figure(figsize=(8,6))
ds_5day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[26]:





<matplotlib.collections.QuadMesh at 0x12a3cae48>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [27]:

    
# check the minimum longitude and latitude of the grid
print(ds_5day.lon.min(),'\n' ,ds_5day.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each float's (time, lon, lat) position to a chlorophyll value.
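The mapping relies on a nearest-neighbour lookup: each float position is matched to the closest grid cell. A minimal numpy sketch of that lookup along one axis, assuming a regular 1/12-degree grid of cell centres that resembles the coordinates printed above (the grid arrays and the helper `nearest_index` are illustrative, not read from the files):

```python
import numpy as np

# Hypothetical cell-centre coordinates: lon ascending from ~45.04, lat descending from ~27.96
lon = 45 + 1 / 24 + np.arange(360) / 12
lat = 28 - 1 / 24 - np.arange(276) / 12


def nearest_index(grid, value):
    """Return the index of the grid coordinate closest to value."""
    return int(np.abs(grid - value).argmin())


# A float located at (65.01 E, 16.02 N)
i = nearest_index(lon, 65.01)
j = nearest_index(lat, 16.02)
print(i, j, round(lon[i], 2), round(lat[j], 2))
```

The same lookup applies to the time axis, so a float record is paired with the 5-day bin closest to its timestamp.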



In [28]:

    
# the following cells work with the GDP float data
from buyodata import buoydata
import os



In [29]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [30]:

    
# read them and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # around 4~5 minutes

#mask = df.time>='2002-07-04' # we only have chlor_a data after this date
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # sanity check: no records before 2002-07-04 should remain









    Out[30]:





0



In [31]:

    
# shift negative longitudes by 360 so that all longitudes are >= 0
print('before processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon<0
dfvvAll.lon[mask] = dfvvAll.loc[mask].lon + 360
print('after processing, the minimum longitude is %4.3f and maximum is %4.3f' % (dfvvAll.lon.min(),dfvvAll.lon.max()) )

dfvvAll.describe()









    



before processing, the minimum longitude is 0.000 and maximum is 360.000






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/core/generic.py:4695: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2881: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)






    



after processing, the minimum longitude is 0.000 and maximum is 360.000






    Out[31]:






  
    
      
                  id            lat            lon           temp             ve             vn            spd        var_lat        var_lon        var_tmp
count   2.147732e+07   2.131997e+07   2.131997e+07   1.986179e+07   2.129142e+07   2.129142e+07   2.129142e+07   2.147732e+07   2.147732e+07   2.147732e+07
mean    1.765662e+06  -2.263128e+00   2.124412e+02   1.986121e+01   2.454172e-01   4.708192e-01   2.613427e+01   7.326258e+00   7.326555e+00   7.522298e+01
std     9.452835e+06   3.401115e+01   9.746941e+01   8.339498e+00   2.525050e+01   2.052160e+01   1.939087e+01   8.527853e+01   8.527851e+01   2.637454e+02
min     2.578000e+03  -7.764700e+01   0.000000e+00  -1.685000e+01  -2.916220e+02  -2.601400e+02   0.000000e+00   5.268300e-07  -3.941600e-02   1.001300e-03
25%     4.897500e+04  -3.186000e+01   1.490720e+02   1.437300e+01  -1.411400e+01  -1.044700e+01   1.290300e+01   4.366500e-06   7.512600e-06   1.435700e-03
50%     7.141300e+04  -4.920000e+00   2.153940e+02   2.214400e+01  -5.560000e-01   1.970000e-01   2.176700e+01   8.833600e-06   1.495800e-05   1.691700e-03
75%     1.094330e+05   2.756000e+01   3.064370e+02   2.688900e+01   1.356100e+01   1.109300e+01   3.405900e+01   1.833300e-05   3.627900e-05   2.294200e-03
max     6.399288e+07   8.989900e+01   3.600000e+02   4.595000e+01   4.417070e+02   2.783220e+02   4.421750e+02   1.000000e+03   1.000000e+03   1.000000e+03
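The SettingWithCopyWarning in the output above comes from assigning into a filtered DataFrame through chained indexing. One way to avoid it, sketched here on a made-up frame (the names `df`, `subset` and `west` are illustrative), is to take an explicit copy of the filtered rows and assign through a single `.loc` call:

```python
import pandas as pd

# Hypothetical drifter positions with longitudes in [-180, 180)
df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "lon": [-170.0, -0.5, 60.0, 179.0]})

# Take an explicit copy so that later assignments target an independent frame
subset = df[df["id"] > 1].copy()

# Shift negative longitudes into [0, 360) with one .loc assignment
west = subset["lon"] < 0
subset.loc[west, "lon"] += 360
print(subset)
```

The original frame is left untouched, and pandas no longer has to guess whether the assignment was meant for a view or a copy.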



In [32]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)



In [33]:

    
# avoid running this cell repeatedly
# visualize the floats over the whole globe
fig, ax  = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[33]:





<matplotlib.axes._subplots.AxesSubplot at 0x233261908>



In [34]:

    
# a pandas DataFrame cannot be resampled directly here,
# because the data are really indexed on ['time','id'] and DataFrame.resample does not accept that:
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()
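For comparison only, and not the route this notebook takes, the same per-float binning can also be sketched inside pandas by grouping on the id column together with a `pd.Grouper` on the time column. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical long-format drifter table: one row per (time, id) observation
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "time": pd.to_datetime(["2002-07-04 00:00", "2002-07-05 06:00",
                            "2002-07-04 12:00", "2002-07-10 00:00"]),
    "lat": [10.0, 12.0, 20.0, 22.0],
})

# Group by drifter id and by 5-day time bin in a single step, then average
binned = df.groupby(["id", pd.Grouper(key="time", freq="5D")])["lat"].mean()
print(binned)
```

Unlike the xarray round trip used below, this only returns the (id, bin) pairs that actually contain data, so it does not produce the dense time-by-id grid filled with NaN.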



In [35]:

    
# convert the surface float data from a pandas DataFrame to an xarray Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']) ) # set time & id as the index; use reset_index to revert this operation
floatsDSAll









    Out[35]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...



In [36]:

    
# resample the xarray Dataset to a five-day frequency
floatsDSAll_5D =floatsDSAll.resample('5D', dim='time')
floatsDSAll_5D









    Out[36]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 1023)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-09 2002-07-14 ...
Data variables:
    lat      (time, id) float64 nan 16.19 13.7 16.28 13.72 nan 20.16 nan ...
    spd      (time, id) float64 nan 9.792 19.85 15.81 19.42 nan 27.59 nan ...
    var_lon  (time, id) float64 nan 0.006856 0.0001267 0.000128 0.0001035 ...
    vn       (time, id) float64 nan -3.258 -14.38 -7.189 -13.63 nan -1.261 ...
    var_lat  (time, id) float64 nan 0.001761 6.558e-05 6.664e-05 5.604e-05 ...
    lon      (time, id) float64 nan 66.44 69.68 64.83 69.7 nan 69.06 nan ...
    ve       (time, id) float64 nan 8.431 11.62 13.13 11.28 nan 26.06 nan ...
    temp     (time, id) float64 nan nan nan 27.86 28.56 nan 28.94 nan 27.67 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.00362 0.08265 nan 0.001685 ...



In [37]:

    
# convert it back to a pandas DataFrame for plotting
floatsDFAll_5D = floatsDSAll_5D.to_dataframe()
floatsDFAll_5D
floatsDFAll_5D = floatsDFAll_5D.reset_index()
floatsDFAll_5D
# visualize the subsampled floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_5D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[37]:





<matplotlib.axes._subplots.AxesSubplot at 0x23323b358>



In [38]:

    
# sort by time before looking up the chlorophyll value for each data entry
floatsDFAll_5Dtimeorder = floatsDFAll_5D.sort_values(['time','id'],ascending=True)
floatsDFAll_5Dtimeorder # check whether it is time ordered!!
# should we drop nan to speed up??









    Out[38]:






  
    
      
              id        time       lat       spd   var_lon         vn   var_lat       lon         ve      temp      var_tmp
     0      7574  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
  1023     10206  2002-07-04  16.19070   9.79220  0.006856   -3.25785  0.001761  66.43755    8.43105       NaN  1000.000000
  2046     10208  2002-07-04  13.70170  19.84795  0.000127  -14.38240  0.000066  69.68000   11.61810       NaN  1000.000000
  3069     11089  2002-07-04  16.28310  15.81310  0.000128   -7.18860  0.000067  64.83270   13.12565  27.86090     0.003620
  4092     15703  2002-07-04  13.71820  19.42420  0.000103  -13.63140  0.000056  69.70245   11.28315  28.56105     0.082646
  5115     15707  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
  6138     27069  2002-07-04  20.15560  27.59125  0.000100   -1.26115  0.000055  69.06060   26.05590  28.94305     0.001685
  7161     27139  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
  8184     28842  2002-07-04  18.76870  20.85820  0.000192   -6.57405  0.000093  60.79630    6.22225  27.67170     0.003329
  9207     34159  2002-07-04  12.63455  28.87670  0.000098    8.09605  0.000053  59.21560   27.17165       NaN  1000.000000
 10230     34173  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 11253     34210  2002-07-04   6.25245  23.24195  0.000137  -18.27155  0.000069  56.81465  -12.81945  26.72150     0.003663
 12276     34211  2002-07-04   8.41740  27.24930  0.000108  -13.14195  0.000058  68.18405   22.85300  28.35265     0.003441
 13299     34212  2002-07-04   6.46140  44.91405  0.000093   17.38350  0.000052  65.18250   38.18655  28.52460     0.003544
 14322     34223  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 15345     34310  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 16368     34311  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 17391     34312  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 18414     34314  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 19437     34315  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 20460     34374  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 21483     34708  2002-07-04  10.20515  40.52780  0.000093    2.85870  0.000051  60.17130   40.13740  27.16585     0.001793
 22506     34709  2002-07-04       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
 23529     34710  2002-07-04  13.23525  47.69970  0.000094   15.57255  0.000050  50.02370    8.34430  31.06135     0.001846
 24552     34714  2002-07-04  13.70670  39.88645  0.000109    8.81165  0.000058  64.10040   38.42605  27.73010     0.001814
 25575     34716  2002-07-04   7.57385  38.37455  0.000101    6.56485  0.000055  65.78450   37.06290  28.80910     0.001781
 26598     34718  2002-07-04  15.95885  34.51250  0.000114  -27.92385  0.000059  72.64835   19.48765  29.12910     0.001700
 27621     34719  2002-07-04  17.60155  24.58105  0.000114  -14.54685  0.000060  71.17005   17.55510  28.93850     0.001604
 28644     34720  2002-07-04  14.41830  32.49445  0.000120  -29.74220  0.000064  69.29585   10.30660  28.66575     0.001826
 29667     34721  2002-07-04  17.08435  12.96220  0.000115   -9.46595  0.000061  65.46475    7.37165  27.90860     0.001799
   ...       ...         ...       ...       ...       ...        ...       ...       ...        ...       ...          ...
235289   3098682  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
236312  60073460  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
237335  60074440  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
238358  60077450  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
239381  60150420  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
240404  60454500  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
241427  60656200  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
242450  60657200  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
243473  60658190  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
244496  60659110  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
245519  60659120  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
246542  60659190  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
247565  60659200  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
248588  60940960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
249611  60940970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
250634  60941960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
251657  60941970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
252680  60942960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
253703  60942970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
254726  60943960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
255749  60943970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
256772  60944960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
257795  60944970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
258818  60945970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
259841  60946960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
260864  60947960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
261887  60947970  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
262910  60948960  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
263933  60950430  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
264956  62321420  2016-06-30       NaN       NaN       NaN        NaN       NaN       NaN        NaN       NaN          NaN
  

264957 rows × 11 columns



In [39]:

    
floatsDFAll_5Dtimeorder.lon.dropna().shape  # number of rows with a valid (non-NaN) longitude









    Out[39]:





(5955,)



In [40]:

    
# a little test for the api in loops for the dataframe   
# check df.itertuples? it is faster and preserves the data format
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''












In [41]:

    
# this single vectorized call replaces the row-by-row loop above
# the 2D nearest-neighbour lookup is slow: it takes about an hour
tmpAll = ds_5day.chlor_a.sel_points(time=list(floatsDFAll_5Dtimeorder.time),lon=list(floatsDFAll_5Dtimeorder.lon), lat=list(floatsDFAll_5Dtimeorder.lat), method='nearest')
print('the count of nan values in tmpAll is',tmpAll.to_series().isnull().sum())









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |






    



the count of nan values in tmpAll is 262906
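
`sel_points` was deprecated and later removed from xarray; pointwise selection is now done by passing `DataArray` indexers that share a dimension to `sel`. The sketch below illustrates that pattern on a tiny synthetic grid. The names `grid`, `floats` and the `points` dimension are made up for illustration and stand in for `ds_5day.chlor_a` and `floatsDFAll_5Dtimeorder`. Rows without a position are dropped before the lookup, which is an assumption about how the NaN warnings above could be avoided.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds_5day.chlor_a: a tiny (time, lat, lon) grid.
times = pd.date_range("2002-07-04", periods=3, freq="5D")
lats = np.array([10.0, 11.0, 12.0])
lons = np.array([60.0, 61.0, 62.0, 63.0])
values = np.arange(3 * 3 * 4, dtype=float).reshape(3, 3, 4)
grid = xr.DataArray(
    values,
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
)

# Hypothetical stand-in for the float table; the middle row has no position.
floats = pd.DataFrame({
    "time": pd.to_datetime(["2002-07-04", "2002-07-09", "2002-07-14"]),
    "lat": [10.2, np.nan, 11.9],
    "lon": [60.4, np.nan, 62.6],
})

# Only look up rows that actually have a position.
valid = floats.dropna(subset=["lat", "lon"])

# Indexers sharing the "points" dimension select one grid cell per row
# instead of the outer product of all times, lats and lons.
picked = grid.sel(
    time=xr.DataArray(valid["time"].values, dims="points"),
    lat=xr.DataArray(valid["lat"].values, dims="points"),
    lon=xr.DataArray(valid["lon"].values, dims="points"),
    method="nearest",
)

# Align back on the original row index; rows without a position stay NaN.
floats["chlor_a"] = pd.Series(picked.values, index=valid.index)
print(floats)
```

Assigning through a Series indexed by `valid.index` keeps the result aligned with the original rows, so the dataframe keeps its full length.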



In [42]:

    
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (2051,) good values









    Out[42]:





(2051,)



In [44]:

    
# tmpAll.to_series() converts the xarray DataArray to a pandas Series
floatsDFAll_5Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_5Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_5Dtimeorder.chlor_a.isnull().sum() )  # they should be the same values as above

# take a look at the data
floatsDFAll_5Dtimeorder

# visualize chlor_a at the float positions around the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_5Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    logged = np.log10(x)
    return logged

#print(floatsAll_timeorder['chlor_a'].apply(scale))
floatsDFAll_5Dtimeorder['chlor_a_log10'] = floatsDFAll_5Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_5Dtimeorder
#print("after the transformation the nan values in 'chlor_a_log10' is", floatsAll_timeorder.chlor_a_log10.isnull().sum() )

# visualize log10(chlor_a) at the float positions around the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_5Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAll_5Dtimeorder.chlor_a.dropna().shape  # (2051,)
floatsDFAll_5Dtimeorder.chlor_a_log10.dropna().shape  # (2051,)









    



after editing the dataframe the nan values in 'chlor_a' is 262906






    Out[44]:





(2051,)
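
The `scale` helper above is applied element by element. As a minimal sketch, the same transform can be written as one vectorized NumPy call on the Series. The values below are illustrative (two of them are taken from the tables in this notebook), and masking non-positive values first is an added assumption, so that they become NaN without a runtime warning.

```python
import numpy as np
import pandas as pd

# Illustrative values; 0.110575 and 0.386388 appear in the tables of this notebook.
chl = pd.Series([0.110575, np.nan, 0.386388, -0.006777])

# Keep only strictly positive values, then take log10 of the whole Series at once.
# Masked and missing entries simply stay NaN.
chl_log10 = np.log10(chl.where(chl > 0))
print(chl_log10)
```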



In [45]:

    
# take the diff of chlor_a along time; here this is done in xarray
# convert the dataframe into an xarray Dataset again
# take the difference
floatsDFAll_5Dtimeorder









    Out[45]:






  
    
      
	id	time	lat	spd	var_lon	vn	var_lat	lon	ve	temp	var_tmp	chlor_a	chlor_a_log10
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1023	10206	2002-07-04	16.19070	9.79220	0.006856	-3.25785	0.001761	66.43755	8.43105	NaN	1000.000000	NaN	NaN
2046	10208	2002-07-04	13.70170	19.84795	0.000127	-14.38240	0.000066	69.68000	11.61810	NaN	1000.000000	NaN	NaN
3069	11089	2002-07-04	16.28310	15.81310	0.000128	-7.18860	0.000067	64.83270	13.12565	27.86090	0.003620	NaN	NaN
4092	15703	2002-07-04	13.71820	19.42420	0.000103	-13.63140	0.000056	69.70245	11.28315	28.56105	0.082646	NaN	NaN
5115	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
6138	27069	2002-07-04	20.15560	27.59125	0.000100	-1.26115	0.000055	69.06060	26.05590	28.94305	0.001685	NaN	NaN
7161	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
8184	28842	2002-07-04	18.76870	20.85820	0.000192	-6.57405	0.000093	60.79630	6.22225	27.67170	0.003329	NaN	NaN
9207	34159	2002-07-04	12.63455	28.87670	0.000098	8.09605	0.000053	59.21560	27.17165	NaN	1000.000000	NaN	NaN
10230	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
11253	34210	2002-07-04	6.25245	23.24195	0.000137	-18.27155	0.000069	56.81465	-12.81945	26.72150	0.003663	NaN	NaN
12276	34211	2002-07-04	8.41740	27.24930	0.000108	-13.14195	0.000058	68.18405	22.85300	28.35265	0.003441	NaN	NaN
13299	34212	2002-07-04	6.46140	44.91405	0.000093	17.38350	0.000052	65.18250	38.18655	28.52460	0.003544	NaN	NaN
14322	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
15345	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
16368	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
17391	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
18414	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
19437	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20460	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
21483	34708	2002-07-04	10.20515	40.52780	0.000093	2.85870	0.000051	60.17130	40.13740	27.16585	0.001793	NaN	NaN
22506	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
23529	34710	2002-07-04	13.23525	47.69970	0.000094	15.57255	0.000050	50.02370	8.34430	31.06135	0.001846	NaN	NaN
24552	34714	2002-07-04	13.70670	39.88645	0.000109	8.81165	0.000058	64.10040	38.42605	27.73010	0.001814	NaN	NaN
25575	34716	2002-07-04	7.57385	38.37455	0.000101	6.56485	0.000055	65.78450	37.06290	28.80910	0.001781	0.110575	-0.956343
26598	34718	2002-07-04	15.95885	34.51250	0.000114	-27.92385	0.000059	72.64835	19.48765	29.12910	0.001700	NaN	NaN
27621	34719	2002-07-04	17.60155	24.58105	0.000114	-14.54685	0.000060	71.17005	17.55510	28.93850	0.001604	NaN	NaN
28644	34720	2002-07-04	14.41830	32.49445	0.000120	-29.74220	0.000064	69.29585	10.30660	28.66575	0.001826	NaN	NaN
29667	34721	2002-07-04	17.08435	12.96220	0.000115	-9.46595	0.000061	65.46475	7.37165	27.90860	0.001799	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...
235289	3098682	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
236312	60073460	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
237335	60074440	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
238358	60077450	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
239381	60150420	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
240404	60454500	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
241427	60656200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
242450	60657200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
243473	60658190	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
244496	60659110	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
245519	60659120	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
246542	60659190	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
247565	60659200	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
248588	60940960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
249611	60940970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
250634	60941960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
251657	60941970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
252680	60942960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
253703	60942970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
254726	60943960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
255749	60943970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
256772	60944960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
257795	60944970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
258818	60945970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
259841	60946960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
260864	60947960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
261887	60947970	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
262910	60948960	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
263933	60950430	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
264956	62321420	2016-06-30	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
    
  

264957 rows × 13 columns



In [46]:

    
# unstack() will provide a 2d dataframe
# reset_index() will turn all index levels back into columns



In [48]:

    
# put the data into an xarray Dataset before taking the diff
tmp = xr.Dataset.from_dataframe(floatsDFAll_5Dtimeorder.set_index(['time','id']))  # set time & id as the index; use reset_index to revert this operation
# take the diff of chlor_a along time
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# give the column a proper name (inplace must be the boolean True, not the string 'True')
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace=True)
chlor_a_rate


# merge the two dataframes {floatsDFAll_5Dtimeorder; chlor_a_rate} into one dataframe on the keys {time, id}, using a left join
floatsDFAllRate_5Dtimeorder=pd.merge(floatsDFAll_5Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_5Dtimeorder

# check 
print('check the sum of the chlor_a before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of the chlor_a after the merge',floatsDFAllRate_5Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate, it is *better* to visualize at this scale
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_5Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale
floatsDFAllRate_5Dtimeorder['chl_rate_log10'] = floatsDFAllRate_5Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_5Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_5Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAllRate_5Dtimeorder.chl_rate.dropna().shape   # (1101,) data points
#floatsDFAllRate_5Dtimeorder.chl_rate_log10.dropna().shape   # (488,) data points; note that chl_rate can be negative, so log10 discards those points









    



check the sum of the chlor_a before the merge 121.3619077630341
check the sum of the chlor_a after the merge 121.3619077630341






    Out[48]:





(1101,)
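
The round trip through xarray is one way to difference `chlor_a` along time for each float. As a hedged alternative sketch, the same per-float difference can be taken directly in pandas with `groupby(...).diff()`. This assumes every float id has one row for every 5-day step, as the regular index spacing in the table above suggests. The small dataframe `df` is synthetic and only illustrates the pattern.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for floatsDFAll_5Dtimeorder: two floats, three 5-day steps.
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 1, 2],
    "time": pd.to_datetime([
        "2002-11-01", "2002-11-01",
        "2002-11-06", "2002-11-06",
        "2002-11-11", "2002-11-11",
    ]),
    "chlor_a": [0.30, 0.10, 0.36, np.nan, 0.35, 0.12],
})

# Sort so that consecutive rows of one float are consecutive time steps,
# then difference within each float. The result is labelled at the later step.
df = df.sort_values(["id", "time"])
df["chl_rate"] = df.groupby("id")["chlor_a"].diff()

# Restore the original time-major row order.
df = df.sort_index()
print(df)
```

A difference is NaN whenever either of the two consecutive values is missing, so a single gap removes two rate values. That is consistent with 2051 valid `chlor_a` values giving only 1101 valid `chl_rate` values above.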



In [49]:

    
pd.to_datetime(floatsDFAllRate_5Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_5Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_5Dtimeorder.time) ) # create a target time series for masking purposes

# extract the month
month = ts.index.month 
# month.shape # a check on the shape of the month.
selector = ((11==month) | (12==month) | (1==month) | (2==month) | (3==month) )  
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_5Dtimeorder[selector].chl_rate.dropna().shape) # total (754,)
print('all the data count is', floatsDFAllRate_5Dtimeorder.chl_rate.dropna().shape )   # total (1101,)









    



shape of the selector (264957,)
all the data count in [11-01, 03-31]  is (754,)
all the data count is (1101,)
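
The chain of `==` comparisons used for `selector` can also be written with the `.dt` accessor and `isin`. The sketch below uses a short synthetic `times` Series in place of `floatsDFAllRate_5Dtimeorder.time`, so the name and the dates are illustrative only.

```python
import pandas as pd

# Synthetic dates: one outside the season, three inside, one just after it.
times = pd.Series(pd.to_datetime([
    "2002-07-04", "2002-11-01", "2003-01-15", "2003-03-31", "2003-04-01",
]))

# True for November through March, False otherwise.
selector = times.dt.month.isin([11, 12, 1, 2, 3])
print(selector.tolist())
```

Because the result is a boolean Series with the same index as `times`, it can be used directly to mask the dataframe the dates came from.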



In [50]:

    
# histogram for non-standardized data
axfloat = floatsDFAllRate_5Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('5-Day chl_rate')









    Out[50]:





<matplotlib.text.Text at 0x11a9c9320>



In [51]:

    
# standardized series
ts = floatsDFAllRate_5Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])
axts.set_title('5-Day standardized chl_rate')









    Out[51]:





<matplotlib.text.Text at 0x4ed837780>
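
A small sketch of the standardization step, using five `chl_rate` values from the output table of this notebook as illustrative input. After subtracting the mean and dividing by the standard deviation (pandas uses the sample standard deviation, `ddof=1`), the series has mean 0 and standard deviation 1. A histogram range of [-0.3, 0.3] then covers only values within 0.3 standard deviations of the mean, which may be worth keeping in mind when comparing it with the unstandardized histogram.

```python
import pandas as pd

# Illustrative chl_rate values taken from the output table in this notebook.
ts = pd.Series([0.060749, 0.005581, -0.006777, 0.009522, -0.008845])

# z-score: zero mean, unit (sample) standard deviation.
z = (ts - ts.mean()) / ts.std()

# Number of standardized values that a histogram with range=[-0.3, 0.3] would show.
inside = int(z.between(-0.3, 0.3).sum())
print(z.mean(), z.std(), inside, "of", len(z))
```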



In [52]:

    
# all the data
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_5Dtimeorder[ (floatsDFAllRate_5Dtimeorder.time > str(i))  & (floatsDFAllRate_5Dtimeorder.time < str(i+1)) ] # if year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)   # total is 1101
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the extra figure
ax = plt.subplot(8,2,16)
fig.delaxes(ax)









    



(53,)
(59,)
(1,)
(44,)
(104,)
(76,)
(151,)
(34,)
(60,)
(14,)
(49,)
(44,)
(226,)
(129,)
(57,)



In [54]:

    
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_5Dtimeorder[ (floatsDFAllRate_5Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_5Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # if year i
    # select only particular month, Nov 1 to March 31
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 754
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)









    



(73,)
(1,)
(6,)
(71,)
(39,)
(120,)
(30,)
(45,)
(5,)
(46,)
(0,)
(153,)
(105,)
(60,)
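
The per-season counts printed above come from a loop over date windows. As a sketch under the same definition of a season (1 November of year i to 31 March of year i+1), each row can instead be labelled with its season's starting year and counted with one `groupby`. The dataframe `df` and the `season` label are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for floatsDFAllRate_5Dtimeorder.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2002-11-01", "2003-03-31", "2003-07-01", "2003-12-15", "2004-02-01",
    ]),
    "chl_rate": [0.06, -0.01, 0.20, None, 0.03],
})

month = df["time"].dt.month
year = df["time"].dt.year

# Rows inside the November-March window.
in_season = month.isin([11, 12, 1, 2, 3])

# November and December belong to the season starting that year;
# January to March belong to the season that started the year before.
season = year.where(month >= 11, year - 1)

# count() ignores NaN, like dropna().shape in the loop above.
counts = df[in_season].groupby(season[in_season])["chl_rate"].count()
print(counts)
```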



In [55]:

    
# write the data to disk as a csv or hdf file so the experiment does not have to be rerun

df_list = []
for i in range(2002,2017) :
    tmpyear = floatsDFAllRate_5Dtimeorder[ (floatsDFAllRate_5Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_5Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # if year i
    # select only particular month, Nov 1 to March 31
    df_list.append(tmpyear)
    
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is  (754,)
df_chl_out_5D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # only keep the non-nan values
#list(df_chl_out_XD.groupby(['id']))   # can see the continuity pattern of the Lagrangian difference for each float id

# output to a csv or hdf file
df_chl_out_5D_modisa.head()









    



all the data count in [11-01, 03-31]  is  (754,)






    Out[55]:






  
    
      
	id	time	lat	spd	var_lon	vn	var_lat	lon	ve	temp	var_tmp	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
6239	34710	2002-11-01	16.90790	13.19585	0.000120	12.28250	0.000063	63.13095	1.02405	28.99885	0.001754	0.386388	-0.412976	0.060749	-1.216459
6476	10206	2002-11-06	11.04970	9.69775	0.001602	6.51645	0.000517	67.17675	-4.16140	NaN	1000.000000	0.133946	-0.873070	0.005581	-2.253287
6498	34710	2002-11-06	17.33905	11.97215	0.000159	10.54180	0.000079	63.14845	-2.06740	28.83270	0.001884	0.379611	-0.420661	-0.006777	NaN
6504	34721	2002-11-06	12.58915	15.20435	0.000143	0.91040	0.000071	67.82805	10.47775	29.49700	0.001856	0.148202	-0.829147	0.009522	-2.021272
6735	10206	2002-11-11	11.16030	2.94440	0.001463	1.00360	0.000474	67.11145	-0.92305	NaN	1000.000000	0.125101	-0.902739	-0.008845	NaN



In [56]:

    
df_chl_out_5D_modisa.index.name = 'index'  # name the index explicitly

# write to CSV with an explicit index label
df_chl_out_5D_modisa.to_csv('df_chl_out_5D_modisa.csv', sep=',', index_label = 'index')

# load CSV output
test = pd.read_csv('df_chl_out_5D_modisa.csv', index_col='index')
test.head()









    Out[56]:






  
    
      
	id	time	lat	spd	var_lon	vn	var_lat	lon	ve	temp	var_tmp	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
index
6239	34710	2002-11-01	16.90790	13.19585	0.000120	12.28250	0.000063	63.13095	1.02405	28.99885	0.001754	0.386388	-0.412976	0.060749	-1.216459
6476	10206	2002-11-06	11.04970	9.69775	0.001602	6.51645	0.000517	67.17675	-4.16140	NaN	1000.000000	0.133946	-0.873070	0.005581	-2.253287
6498	34710	2002-11-06	17.33905	11.97215	0.000159	10.54180	0.000079	63.14845	-2.06740	28.83270	0.001884	0.379611	-0.420661	-0.006777	NaN
6504	34721	2002-11-06	12.58915	15.20435	0.000143	0.91040	0.000071	67.82805	10.47775	29.49700	0.001856	0.148202	-0.829147	0.009522	-2.021272
6735	10206	2002-11-11	11.16030	2.94440	0.001463	1.00360	0.000474	67.11145	-0.92305	NaN	1000.000000	0.125101	-0.902739	-0.008845	NaN
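
When the CSV is read back, the `time` column comes in as plain strings unless it is parsed explicitly. The sketch below shows a round trip that restores the datetime dtype with `parse_dates`. It writes to an in-memory buffer instead of `df_chl_out_5D_modisa.csv`, and the two-row dataframe `df` is a made-up stand-in for the real output.

```python
import io

import pandas as pd

# Small stand-in for df_chl_out_5D_modisa with a named index.
df = pd.DataFrame(
    {
        "id": [34710, 10206],
        "time": pd.to_datetime(["2002-11-01", "2002-11-06"]),
        "chl_rate": [0.060749, 0.005581],
    },
    index=pd.Index([6239, 6476], name="index"),
)

# Write to an in-memory buffer; a file path works the same way.
buf = io.StringIO()
df.to_csv(buf, sep=",", index_label="index")
buf.seek(0)

# parse_dates turns the time column back into datetimes on load.
restored = pd.read_csv(buf, index_col="index", parse_dates=["time"])
print(restored.dtypes)
```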



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03
