3Day subsampling on the OceanColor Dataset



In [8]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm

Load data from disk

We already downloaded a subsetted MODIS-Aqua chlorophyll-a dataset for the Arabian Sea.

We can read all the netcdf files into one xarray Dataset using the open_mfsdataset function. Note that this does not load the data into memory yet. That only happens when we try to access the values.



In [9]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [10]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB while the daily dataset is 4.2 GB. These both easily fit in RAM. So let's load them all into memory



In [11]:

    
[ds.load() for ds in both_datasets]









    Out[11]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...]

Fix bad data

In preparing this demo, I noticed that small number of maps had bad data--specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted. So I wrote a function to invert it back and correct the data.



In [12]:

    
def fix_bad_data(ds):
    # for some reason, the cloud / land mask is backwards on some data
    # this is obvious because there are chlorophyl values less than zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through and fix
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values 
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)



In [13]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[13]:





[None, None]



In [14]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[14]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately it doesn't come with the dataset. But we can infer it by counting all the points that have at least one non-nan chlorophyll value.



In [15]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[15]:





<matplotlib.collections.QuadMesh at 0x11949c358>



In [16]:

    
#  find a mask for the land
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # compute the total nonzeros regions(data point)
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[16]:





<matplotlib.text.Text at 0x13d340c88>



In [17]:

    
#ds_8day



In [18]:

    
#ds_daily



In [19]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[19]:





<matplotlib.collections.QuadMesh at 0x1292f55c0>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [20]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.



In [21]:

    
'''
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
'''
ds_daily.groupby('time').count() # information from original data









    Out[21]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...



In [22]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[22]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [23]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day, ds_daily)]



In [24]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#coundt_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, coundt_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                            for ds in ds_8day, ds_daily] # not work in python 3



In [25]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[25]:





<matplotlib.legend.Legend at 0x129b690f0>

Seasonal Climatology



In [26]:

    
count_8day_clim, coundt_daily_clim = [count.groupby('time.month').mean()  # monthly data
                                      for count in (count_8day, count_daily)]



In [27]:

    
# mean value of the monthly data on the count of nonzeros
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
coundt_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[27]:





<matplotlib.legend.Legend at 0x128d4ada0>

From the above figure, we see that data coverage is highest in the winter (especially Feburary) and lowest in summer.

Maps of individual days

Let's grab some data from Febrauary and plot it.



In [28]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[28]:





<matplotlib.collections.QuadMesh at 0x129e85c88>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [29]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[29]:





<matplotlib.collections.QuadMesh at 0x12b43b9e8>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [30]:

    
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')   # the time is selected!
#ds_daily.chl_ocx[0].sel_points(time= times, lon=lons, lat=times, method='nearest')









    Out[30]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lon      (points) float64 65.04 70.04
    lat      (points) float64 16.04 18.04
  * points   (points) int64 0 1



In [31]:

    
#ds_daily.chlor_a.sel_points?



In [32]:

    
ds_3day = ds_daily.resample('3D', dim='time')
ds_3day









    Out[32]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 1765)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-07 2002-07-10 ...
Data variables:
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...



In [33]:

    
plt.figure(figsize=(8,6))
ds_3day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[33]:





<matplotlib.collections.QuadMesh at 0x13d2d3240>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [34]:

    
# check the range for the longitude
print(ds_3day.lon.min(),'\n' ,ds_3day.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map a (time, lon, lat) to a value on the cholorphlly value



In [35]:

    
# in the following we deal with the data from the gdp float
from buyodata import buoydata
import os



In [36]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [37]:

    
# read them and cancatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # around 4~5 minutes

#mask = df.time>='2002-07-04' # we only have data after this data for chlor_a
dfvvAll = dfAll[dfAll.time>='2002-07-04']

sum(dfvvAll.time<'2002-07-04') # recheck whether the time is









    Out[37]:





0



In [38]:

    
# process the data so that the longitude are all >0
print('before processing, the minimum longitude is%f4.3 and maximum is %f4.3' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon<0
dfvvAll.lon[mask] = dfvvAll.loc[mask].lon + 360
print('after processing, the minimum longitude is %f4.3 and maximum is %f4.3' % (dfvvAll.lon.min(),dfvvAll.lon.max()) )

dfvvAll.describe()









    



before processing, the minimum longitude is0.0000004.3 and maximum is 360.0000004.3






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/core/generic.py:4695: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2881: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)






    



after processing, the minimum longitude is 0.0000004.3 and maximum is 360.0000004.3






    Out[38]:






  
    
      
      id
      lat
      lon
      temp
      ve
      vn
      spd
      var_lat
      var_lon
      var_tmp
    
  
  
    
      count
      2.147732e+07
      2.131997e+07
      2.131997e+07
      1.986179e+07
      2.129142e+07
      2.129142e+07
      2.129142e+07
      2.147732e+07
      2.147732e+07
      2.147732e+07
    
    
      mean
      1.765662e+06
      -2.263128e+00
      2.124412e+02
      1.986121e+01
      2.454172e-01
      4.708192e-01
      2.613427e+01
      7.326258e+00
      7.326555e+00
      7.522298e+01
    
    
      std
      9.452835e+06
      3.401115e+01
      9.746941e+01
      8.339498e+00
      2.525050e+01
      2.052160e+01
      1.939087e+01
      8.527853e+01
      8.527851e+01
      2.637454e+02
    
    
      min
      2.578000e+03
      -7.764700e+01
      0.000000e+00
      -1.685000e+01
      -2.916220e+02
      -2.601400e+02
      0.000000e+00
      5.268300e-07
      -3.941600e-02
      1.001300e-03
    
    
      25%
      4.897500e+04
      -3.186000e+01
      1.490720e+02
      1.437300e+01
      -1.411400e+01
      -1.044700e+01
      1.290300e+01
      4.366500e-06
      7.512600e-06
      1.435700e-03
    
    
      50%
      7.141300e+04
      -4.920000e+00
      2.153940e+02
      2.214400e+01
      -5.560000e-01
      1.970000e-01
      2.176700e+01
      8.833600e-06
      1.495800e-05
      1.691700e-03
    
    
      75%
      1.094330e+05
      2.756000e+01
      3.064370e+02
      2.688900e+01
      1.356100e+01
      1.109300e+01
      3.405900e+01
      1.833300e-05
      3.627900e-05
      2.294200e-03
    
    
      max
      6.399288e+07
      8.989900e+01
      3.600000e+02
      4.595000e+01
      4.417070e+02
      2.783220e+02
      4.421750e+02
      1.000000e+03
      1.000000e+03
      1.000000e+03



In [39]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)



In [40]:

    
# visualize the float around global region
fig, ax  = plt.subplots(figsize=(12,10))
dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the float around the arabian sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[40]:





<matplotlib.axes._subplots.AxesSubplot at 0x4dfa202b0>



In [41]:

    
# pands dataframe cannot do the resamplingn properly
# cause we are really indexing on ['time','id'], pandas.dataframe.resample cannot do this
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
print()



In [42]:

    
# dump the surface floater data from pandas.dataframe to xarray.dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']) ) # set time & id as the index); use reset_index to revert this operation
floatsDSAll









    Out[42]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...



In [51]:

    
# resample on the xarray.dataset onto two-day frequency
floatsDSAll_3D =floatsDSAll.resample('3D', dim='time')
floatsDSAll_3D









    Out[51]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 1704)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-07 2002-07-10 ...
Data variables:
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003607 0.07764 nan ...
    lon      (time, id) float64 nan 66.38 69.59 64.73 69.62 nan 68.84 nan ...
    vn       (time, id) float64 nan -6.362 -19.1 -7.017 -18.71 nan 4.231 nan ...
    lat      (time, id) float64 nan 16.21 13.82 16.33 13.83 nan 20.18 nan ...
    var_lon  (time, id) float64 nan 0.006395 9.575e-05 0.0001482 9.875e-05 ...
    temp     (time, id) float64 nan nan nan 27.92 28.56 nan 28.97 nan 27.66 ...
    spd      (time, id) float64 nan 12.92 21.86 14.67 21.2 nan 27.52 nan ...
    var_lat  (time, id) float64 nan 0.001675 5.309e-05 7.571e-05 5.407e-05 ...
    ve       (time, id) float64 nan 10.94 9.9 12.24 9.378 nan 26.28 nan ...



In [44]:

    
# transfer it back to pandas.dataframe for plotting
floatsDFAll_3D = floatsDSAll_3D.to_dataframe()
floatsDFAll_3D
floatsDFAll_3D = floatsDFAll_3D.reset_index()
floatsDFAll_3D
# visualize the subsamping of floats around arabian region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_3D.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[44]:





<matplotlib.axes._subplots.AxesSubplot at 0x11a4a18d0>



In [45]:

    
# get the value for the chllorophy for each data entry
floatsDFAll_3Dtimeorder = floatsDFAll_3D.sort_values(['time','id'],ascending=True)
floatsDFAll_3Dtimeorder # check whether it is time ordered!!
# should we drop nan to speed up??









    Out[45]:






  
    
      
      id
      time
      var_tmp
      lon
      vn
      lat
      var_lon
      temp
      spd
      var_lat
      ve
    
  
  
    
      0
      7574
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1704
      10206
      2002-07-04
      1000.000000
      66.375833
      -6.362417
      16.208333
      0.006395
      NaN
      12.924000
      0.001675
      10.941333
    
    
      3408
      10208
      2002-07-04
      1000.000000
      69.589833
      -19.104583
      13.816917
      0.000096
      NaN
      21.864250
      0.000053
      9.899750
    
    
      5112
      11089
      2002-07-04
      0.003607
      64.731500
      -7.016583
      16.331167
      0.000148
      27.917667
      14.670833
      0.000076
      12.239583
    
    
      6816
      15703
      2002-07-04
      0.077642
      69.617667
      -18.706167
      13.829750
      0.000099
      28.555167
      21.204917
      0.000054
      9.378000
    
    
      8520
      15707
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      10224
      27069
      2002-07-04
      0.001681
      68.844083
      4.231083
      20.177000
      0.000107
      28.973000
      27.516167
      0.000058
      26.284000
    
    
      11928
      27139
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      13632
      28842
      2002-07-04
      0.003285
      60.748083
      -5.116833
      18.852000
      0.000225
      27.663833
      24.501167
      0.000106
      11.585000
    
    
      15336
      34159
      2002-07-04
      1000.000000
      59.009333
      5.826667
      12.568250
      0.000109
      NaN
      26.245667
      0.000058
      25.174250
    
    
      17040
      34173
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      18744
      34210
      2002-07-04
      0.003603
      56.899000
      -16.234583
      6.409500
      0.000146
      26.715250
      19.873667
      0.000072
      -9.563583
    
    
      20448
      34211
      2002-07-04
      0.003496
      68.015250
      -15.920167
      8.539083
      0.000098
      28.316167
      26.681000
      0.000054
      20.941750
    
    
      22152
      34212
      2002-07-04
      0.003571
      64.844583
      18.941000
      6.327167
      0.000096
      28.476000
      32.034083
      0.000053
      23.924750
    
    
      23856
      34223
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      25560
      34310
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      27264
      34311
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      28968
      34312
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      30672
      34314
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      32376
      34315
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      34080
      34374
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      35784
      34708
      2002-07-04
      0.001796
      59.870667
      2.843583
      10.175333
      0.000093
      27.167000
      43.217000
      0.000050
      42.975000
    
    
      37488
      34709
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      39192
      34710
      2002-07-04
      0.001769
      49.858667
      28.993750
      13.062167
      0.000066
      30.956917
      47.145833
      0.000037
      -17.011917
    
    
      40896
      34714
      2002-07-04
      0.001840
      63.802250
      11.571500
      13.643167
      0.000110
      27.707000
      39.495917
      0.000058
      37.529500
    
    
      42600
      34716
      2002-07-04
      0.001765
      65.514500
      3.266917
      7.507417
      0.000105
      28.814583
      36.961917
      0.000057
      36.070250
    
    
      44304
      34718
      2002-07-04
      0.001739
      72.491917
      -29.327667
      16.206417
      0.000082
      29.149750
      37.194417
      0.000046
      22.008917
    
    
      46008
      34719
      2002-07-04
      0.001578
      71.027333
      -10.221667
      17.720833
      0.000088
      28.921667
      22.969250
      0.000049
      19.046167
    
    
      47712
      34720
      2002-07-04
      0.001779
      69.224833
      -36.747667
      14.669917
      0.000118
      28.653417
      38.318250
      0.000063
      9.947083
    
    
      49416
      34721
      2002-07-04
      0.001746
      65.406167
      -9.409917
      17.159250
      0.000120
      27.919250
      14.014583
      0.000063
      9.293667
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      391919
      3098682
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      393623
      60073460
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      395327
      60074440
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      397031
      60077450
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      398735
      60150420
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      400439
      60454500
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      402143
      60656200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      403847
      60657200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      405551
      60658190
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      407255
      60659110
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      408959
      60659120
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      410663
      60659190
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      412367
      60659200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      414071
      60940960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      415775
      60940970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      417479
      60941960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      419183
      60941970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      420887
      60942960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      422591
      60942970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      424295
      60943960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      425999
      60943970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      427703
      60944960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      429407
      60944970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      431111
      60945970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      432815
      60946960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      434519
      60947960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      436223
      60947970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      437927
      60948960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      439631
      60950430
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      441335
      62321420
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

441336 rows × 11 columns



In [46]:

    
floatsDFAll_3Dtimeorder.lon.dropna().shape  # the longitude data has lots of values (9689,)









    Out[46]:





(9689,)



In [47]:

    
# a little test for the api in loops for the dataframe   
# check df.itertuples? it is faster and preserves the data format
'''
chl_ocx=[]
for row in floats_timeorder.itertuples():
    #print(row)
    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )
    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation
    chl_ocx.append(tmp)
floats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)
chl_ocx[0].to_series
'''









    Out[47]:





"\nchl_ocx=[]\nfor row in floats_timeorder.itertuples():\n    #print(row)\n    #print('row.time = %s, row.id=%d, row.lon=%4.3f, row.lat=%4.3f' % (row.time,row.id,row.lon,row.lat)  )\n    tmp=ds_2day.chl_ocx.sel_points(time=[row.time],lon=[row.lon], lat=[row.lat], method='nearest') # interpolation\n    chl_ocx.append(tmp)\nfloats_timeorder['chl_ocx'] = pd.Series(chl_ocx, index=floats_timeorder.index)\nchl_ocx[0].to_series\n"



In [48]:

    
# this one line avoid the list above
# it took a really long time for 2D interpolation, it takes an hour
tmpAll = ds_3day.chlor_a.sel_points(time=list(floatsDFAll_3Dtimeorder.time),lon=list(floatsDFAll_3Dtimeorder.lon), lat=list(floatsDFAll_3Dtimeorder.lat), method='nearest')
print('the count of nan vaues in tmpAll is',tmpAll.to_series().isnull().sum())









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less
  indexer = np.where(op(left_distances, right_distances) |
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/pandas/indexes/base.py:2352: RuntimeWarning: invalid value encountered in less_equal
  indexer = np.where(op(left_distances, right_distances) |






    



the count of nan vaues in tmpAll is 438948



In [49]:

    
#print(tmpAll.dropna().shape)
tmpAll.to_series().dropna().shape  # (2388,) good values









    Out[49]:





(2388,)



In [50]:

    
# tmp.to_series() to transfer it from xarray dataset to series
floatsDFAll_3Dtimeorder['chlor_a'] = pd.Series(np.array(tmpAll.to_series()), index=floatsDFAll_3Dtimeorder.index)
print("after editing the dataframe the nan values in 'chlor_a' is", floatsDFAll_3Dtimeorder.chlor_a.isnull().sum() )  # they should be the same values as above

# take a look at the data
floatsDFAll_3Dtimeorder

# visualize the float around the arabian sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_3Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a', cmap='RdBu_r', edgecolor='none', ax=ax)

def scale(x):
    logged = np.log10(x)
    return logged

#print(floatsAll_timeorder['chlor_a'].apply(scale))
floatsDFAll_3Dtimeorder['chlor_a_log10'] = floatsDFAll_3Dtimeorder['chlor_a'].apply(scale)
floatsDFAll_3Dtimeorder
#print("after the transformation the nan values in 'chlor_a_log10' is", floatsAll_timeorder.chlor_a_log10.isnull().sum() )

# visualize the float around the arabian sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_3Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chlor_a_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
#floatsDFAll_3Dtimeorder.chlor_a.dropna().shape  # (2388,)
floatsDFAll_3Dtimeorder.chlor_a_log10.dropna().shape  # (2388,)









    



after editing the dataframe the nan values in 'chlor_a' is 438948






    Out[50]:





(2388,)



In [59]:

    
# take the diff of the chlor_a, and this has to be done in xarray
# transfer the dataframe into xarry dataset again
# take the difference
floatsDFAll_3Dtimeorder









    Out[59]:






  
    
      
      id
      time
      var_tmp
      lon
      vn
      lat
      var_lon
      temp
      spd
      var_lat
      ve
      chlor_a
      chlor_a_log10
    
  
  
    
      0
      7574
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1704
      10206
      2002-07-04
      1000.000000
      66.375833
      -6.362417
      16.208333
      0.006395
      NaN
      12.924000
      0.001675
      10.941333
      NaN
      NaN
    
    
      3408
      10208
      2002-07-04
      1000.000000
      69.589833
      -19.104583
      13.816917
      0.000096
      NaN
      21.864250
      0.000053
      9.899750
      NaN
      NaN
    
    
      5112
      11089
      2002-07-04
      0.003607
      64.731500
      -7.016583
      16.331167
      0.000148
      27.917667
      14.670833
      0.000076
      12.239583
      NaN
      NaN
    
    
      6816
      15703
      2002-07-04
      0.077642
      69.617667
      -18.706167
      13.829750
      0.000099
      28.555167
      21.204917
      0.000054
      9.378000
      NaN
      NaN
    
    
      8520
      15707
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      10224
      27069
      2002-07-04
      0.001681
      68.844083
      4.231083
      20.177000
      0.000107
      28.973000
      27.516167
      0.000058
      26.284000
      NaN
      NaN
    
    
      11928
      27139
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      13632
      28842
      2002-07-04
      0.003285
      60.748083
      -5.116833
      18.852000
      0.000225
      27.663833
      24.501167
      0.000106
      11.585000
      NaN
      NaN
    
    
      15336
      34159
      2002-07-04
      1000.000000
      59.009333
      5.826667
      12.568250
      0.000109
      NaN
      26.245667
      0.000058
      25.174250
      NaN
      NaN
    
    
      17040
      34173
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      18744
      34210
      2002-07-04
      0.003603
      56.899000
      -16.234583
      6.409500
      0.000146
      26.715250
      19.873667
      0.000072
      -9.563583
      NaN
      NaN
    
    
      20448
      34211
      2002-07-04
      0.003496
      68.015250
      -15.920167
      8.539083
      0.000098
      28.316167
      26.681000
      0.000054
      20.941750
      NaN
      NaN
    
    
      22152
      34212
      2002-07-04
      0.003571
      64.844583
      18.941000
      6.327167
      0.000096
      28.476000
      32.034083
      0.000053
      23.924750
      NaN
      NaN
    
    
      23856
      34223
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      25560
      34310
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      27264
      34311
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      28968
      34312
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      30672
      34314
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      32376
      34315
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      34080
      34374
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      35784
      34708
      2002-07-04
      0.001796
      59.870667
      2.843583
      10.175333
      0.000093
      27.167000
      43.217000
      0.000050
      42.975000
      NaN
      NaN
    
    
      37488
      34709
      2002-07-04
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      39192
      34710
      2002-07-04
      0.001769
      49.858667
      28.993750
      13.062167
      0.000066
      30.956917
      47.145833
      0.000037
      -17.011917
      NaN
      NaN
    
    
      40896
      34714
      2002-07-04
      0.001840
      63.802250
      11.571500
      13.643167
      0.000110
      27.707000
      39.495917
      0.000058
      37.529500
      NaN
      NaN
    
    
      42600
      34716
      2002-07-04
      0.001765
      65.514500
      3.266917
      7.507417
      0.000105
      28.814583
      36.961917
      0.000057
      36.070250
      NaN
      NaN
    
    
      44304
      34718
      2002-07-04
      0.001739
      72.491917
      -29.327667
      16.206417
      0.000082
      29.149750
      37.194417
      0.000046
      22.008917
      NaN
      NaN
    
    
      46008
      34719
      2002-07-04
      0.001578
      71.027333
      -10.221667
      17.720833
      0.000088
      28.921667
      22.969250
      0.000049
      19.046167
      NaN
      NaN
    
    
      47712
      34720
      2002-07-04
      0.001779
      69.224833
      -36.747667
      14.669917
      0.000118
      28.653417
      38.318250
      0.000063
      9.947083
      NaN
      NaN
    
    
      49416
      34721
      2002-07-04
      0.001746
      65.406167
      -9.409917
      17.159250
      0.000120
      27.919250
      14.014583
      0.000063
      9.293667
      NaN
      NaN
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      391919
      3098682
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      393623
      60073460
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      395327
      60074440
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      397031
      60077450
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      398735
      60150420
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      400439
      60454500
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      402143
      60656200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      403847
      60657200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      405551
      60658190
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      407255
      60659110
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      408959
      60659120
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      410663
      60659190
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      412367
      60659200
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      414071
      60940960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      415775
      60940970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      417479
      60941960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      419183
      60941970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      420887
      60942960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      422591
      60942970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      424295
      60943960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      425999
      60943970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      427703
      60944960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      429407
      60944970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      431111
      60945970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      432815
      60946960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      434519
      60947960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      436223
      60947970
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      437927
      60948960
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      439631
      60950430
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      441335
      62321420
      2016-06-29
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

441336 rows × 13 columns



In [72]:

    
# unstack() will provide a 2d dataframe
# reset_index() will reset all the index as columns



In [74]:

    
# prepare the data in dataset and about to take the diff
tmp = xr.Dataset.from_dataframe(floatsDFAll_3Dtimeorder.set_index(['time','id']) ) # set time & id as the index); use reset_index to revert this operation
# take the diff on the chlor_a
chlor_a_rate = tmp.diff(dim='time',n=1).chlor_a.to_series().reset_index()
# make the column to a proper name
chlor_a_rate.rename(columns={'chlor_a':'chl_rate'}, inplace='True')
chlor_a_rate


# merge the two dataframes {floatsDFAll_XDtimeorder; chlor_a_rate} into one dataframe based on the index {id, time} and use the left method
floatsDFAllRate_3Dtimeorder=pd.merge(floatsDFAll_3Dtimeorder,chlor_a_rate, on=['time','id'], how = 'left')
floatsDFAllRate_3Dtimeorder

# check 
print('check the sum of the chlor_a before the merge', chlor_a_rate.chl_rate.sum())
print('check the sum of the chlor_a after the merge',floatsDFAllRate_3Dtimeorder.chl_rate.sum())


# visualize the chlorophyll rate, it is *better* to visualize at this scale
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_3Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.8, vmax=0.8, edgecolor='none', ax=ax)

# visualize the chlorophyll rate on the log scale
floatsDFAllRate_3Dtimeorder['chl_rate_log10'] = floatsDFAllRate_3Dtimeorder['chl_rate'].apply(scale)
floatsDFAllRate_3Dtimeorder
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAllRate_3Dtimeorder.plot(kind='scatter', x='lon', y='lat', c='chl_rate_log10', cmap='RdBu_r', edgecolor='none', ax=ax)
floatsDFAllRate_3Dtimeorder.chl_rate.dropna().shape   # (1018,) data points
#floatsDFAllRate_3Dtimeorder.chl_rate_log10.dropna().shape   # (452,)data points..... notice, chl_rate can be negative, so do not take log10









    



check the sum of the chlor_a before the merge -25.318965535610925
check the sum of the chlor_a after the merge -25.318965535610925






    Out[74]:





(1018,)



In [75]:

    
pd.to_datetime(floatsDFAllRate_3Dtimeorder.time)
type(pd.to_datetime(floatsDFAllRate_3Dtimeorder.time))
ts = pd.Series(0, index=pd.to_datetime(floatsDFAllRate_3Dtimeorder.time) ) # creat a target time series for masking purpose

# take the month out
month = ts.index.month 
# month.shape # a check on the shape of the month.
selector = ((11==month) | (12==month) | (1==month) | (2==month) | (3==month) )  
selector
print('shape of the selector', selector.shape)

print('all the data count in [11-01, 03-31]  is', floatsDFAllRate_3Dtimeorder[selector].chl_rate.dropna().shape) # total  (739,)
print('all the data count is', floatsDFAllRate_3Dtimeorder.chl_rate.dropna().shape )   # total (1018,)









    



shape of the selector (441336,)
all the data count in [11-01, 03-31]  is (739,)
all the data count is (1018,)



In [76]:

    
# histogram for non standarized data
axfloat = floatsDFAllRate_3Dtimeorder[selector].chl_rate.dropna().hist(bins=100,range=[-0.3,0.3])
axfloat.set_title('3-Day chl_rate')









    Out[76]:





<matplotlib.text.Text at 0x1335064a8>



In [77]:

    
# standarized series
ts = floatsDFAllRate_3Dtimeorder[selector].chl_rate.dropna()
ts_standardized = (ts - ts.mean())/ts.std()
axts = ts_standardized.hist(bins=100,range=[-0.3,0.3])
axts.set_title('3-Day standardized chl_rate')









    Out[77]:





<matplotlib.text.Text at 0x123682be0>



In [78]:

    
# all the data
fig, axes = plt.subplots(nrows=8, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2017), axes.flat) :
    tmpyear = floatsDFAllRate_3Dtimeorder[ (floatsDFAllRate_3Dtimeorder.time > str(i))  & (floatsDFAllRate_3Dtimeorder.time < str(i+1)) ] # if year i
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)   # total is 1016
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r',vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)     
    
# remove the extra figure
ax = plt.subplot(8,2,16)
fig.delaxes(ax)









    



(47,)
(56,)
(3,)
(39,)
(92,)
(75,)
(123,)
(44,)
(50,)
(18,)
(38,)
(46,)
(227,)
(118,)
(40,)



In [79]:

    
fig, axes = plt.subplots(nrows=7, ncols=2, figsize=(12, 10))
fig.subplots_adjust(hspace=0.05, wspace=0.05)

for i, ax in zip(range(2002,2016), axes.flat) :
    tmpyear = floatsDFAllRate_3Dtimeorder[ (floatsDFAllRate_3Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_3Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # if year i
    # select only particular month, Nov 1 to March 31
    #fig, ax  = plt.subplots(figsize=(12,10))
    print(tmpyear.chl_rate.dropna().shape)  # the total is 739
    tmpyear.plot(kind='scatter', x='lon', y='lat', c='chl_rate', cmap='RdBu_r', vmin=-0.6, vmax=0.6, edgecolor='none', ax=ax)
    ax.set_title('year %g' % i)









    



(76,)
(0,)
(5,)
(65,)
(38,)
(108,)
(35,)
(44,)
(3,)
(36,)
(0,)
(160,)
(119,)
(50,)



In [ ]:



In [ ]:



In [81]:

    
# let's output the data as a csv or hdf file to disk to save the experiment time

df_list = []
for i in range(2002,2017) :
    tmpyear = floatsDFAllRate_3Dtimeorder[ (floatsDFAllRate_3Dtimeorder.time >= (str(i)+ '-11-01') )  & (floatsDFAllRate_3Dtimeorder.time <= (str(i+1)+'-03-31') ) ] # if year i
    # select only particular month, Nov 1 to March 31
    df_list.append(tmpyear)
    
df_tmp = pd.concat(df_list)
print('all the data count in [11-01, 03-31]  is ', df_tmp.chl_rate.dropna().shape) # again, the total is (739,)
df_chl_out_3D_modisa = df_tmp[~df_tmp.chl_rate.isnull()] # only keep the non-nan values
#list(df_chl_out_XD.groupby(['id']))   # can see the continuity pattern of the Lagarangian difference for each float id

# output to a csv or hdf file
df_chl_out_3D_modisa.head()









    



all the data count in [11-01, 03-31]  is  (739,)






    Out[81]:






  
    
      
      id
      time
      var_tmp
      lon
      vn
      lat
      var_lon
      temp
      spd
      var_lat
      ve
      chlor_a
      chlor_a_log10
      chl_rate
      chl_rate_log10
    
  
  
    
      10620
      10206
      2002-11-04
      1000.000000
      67.315250
      6.904000
      10.885583
      0.001747
      NaN
      11.224333
      0.000579
      -6.069667
      0.145567
      -0.836937
      0.017202
      -1.764421
    
    
      10648
      34721
      2002-11-04
      0.001778
      67.626250
      -0.428083
      12.628833
      0.000122
      29.590750
      13.099250
      0.000064
      6.291000
      0.129693
      -0.887083
      -0.024359
      NaN
    
    
      10879
      10206
      2002-11-07
      1000.000000
      67.174083
      6.697417
      11.064250
      0.000558
      NaN
      10.497583
      0.000221
      -5.759333
      0.129001
      -0.889407
      -0.016566
      NaN
    
    
      10881
      11089
      2002-11-07
      0.003795
      64.770000
      1.865000
      14.365167
      0.000151
      28.995083
      16.718083
      0.000075
      -15.957833
      0.192121
      -0.716425
      0.033696
      -1.472422
    
    
      10883
      15707
      2002-11-07
      1000.000000
      67.346250
      -24.346083
      13.640333
      0.000132
      NaN
      29.831500
      0.000068
      -15.104667
      0.158005
      -0.801329
      -0.008466
      NaN



In [82]:

    
df_chl_out_3D_modisa.index.name = 'index'  # make it specific for the index name

# CSV CSV CSV CSV with specfic index
df_chl_out_3D_modisa.to_csv('df_chl_out_3D_modisa.csv', sep=',', index_label = 'index')

# load CSV output
test = pd.read_csv('df_chl_out_3D_modisa.csv', index_col='index')
test.head()









    Out[82]:






  
    
      
      id
      time
      var_tmp
      lon
      vn
      lat
      var_lon
      temp
      spd
      var_lat
      ve
      chlor_a
      chlor_a_log10
      chl_rate
      chl_rate_log10
    
    
      index
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      10620
      10206
      2002-11-04
      1000.000000
      67.315250
      6.904000
      10.885583
      0.001747
      NaN
      11.224333
      0.000579
      -6.069667
      0.145567
      -0.836937
      0.017202
      -1.764421
    
    
      10648
      34721
      2002-11-04
      0.001778
      67.626250
      -0.428083
      12.628833
      0.000122
      29.590750
      13.099250
      0.000064
      6.291000
      0.129693
      -0.887083
      -0.024359
      NaN
    
    
      10879
      10206
      2002-11-07
      1000.000000
      67.174083
      6.697417
      11.064250
      0.000558
      NaN
      10.497583
      0.000221
      -5.759333
      0.129001
      -0.889407
      -0.016566
      NaN
    
    
      10881
      11089
      2002-11-07
      0.003795
      64.770000
      1.865000
      14.365167
      0.000151
      28.995083
      16.718083
      0.000075
      -15.957833
      0.192121
      -0.716425
      0.033696
      -1.472422
    
    
      10883
      15707
      2002-11-07
      1000.000000
      67.346250
      -24.346083
      13.640333
      0.000132
      NaN
      29.831500
      0.000068
      -15.104667
      0.158005
      -0.801329
      -0.008466
      NaN



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03

	id	time	var_tmp	lon	vn	lat	var_lon	temp	spd	var_lat	ve
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1704	10206	2002-07-04	1000.000000	66.375833	-6.362417	16.208333	0.006395	NaN	12.924000	0.001675	10.941333
3408	10208	2002-07-04	1000.000000	69.589833	-19.104583	13.816917	0.000096	NaN	21.864250	0.000053	9.899750
5112	11089	2002-07-04	0.003607	64.731500	-7.016583	16.331167	0.000148	27.917667	14.670833	0.000076	12.239583
6816	15703	2002-07-04	0.077642	69.617667	-18.706167	13.829750	0.000099	28.555167	21.204917	0.000054	9.378000
8520	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
10224	27069	2002-07-04	0.001681	68.844083	4.231083	20.177000	0.000107	28.973000	27.516167	0.000058	26.284000
11928	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
13632	28842	2002-07-04	0.003285	60.748083	-5.116833	18.852000	0.000225	27.663833	24.501167	0.000106	11.585000
15336	34159	2002-07-04	1000.000000	59.009333	5.826667	12.568250	0.000109	NaN	26.245667	0.000058	25.174250
17040	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
18744	34210	2002-07-04	0.003603	56.899000	-16.234583	6.409500	0.000146	26.715250	19.873667	0.000072	-9.563583
20448	34211	2002-07-04	0.003496	68.015250	-15.920167	8.539083	0.000098	28.316167	26.681000	0.000054	20.941750
22152	34212	2002-07-04	0.003571	64.844583	18.941000	6.327167	0.000096	28.476000	32.034083	0.000053	23.924750
23856	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25560	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
27264	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
28968	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
30672	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
32376	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
34080	34374	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
35784	34708	2002-07-04	0.001796	59.870667	2.843583	10.175333	0.000093	27.167000	43.217000	0.000050	42.975000
37488	34709	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
39192	34710	2002-07-04	0.001769	49.858667	28.993750	13.062167	0.000066	30.956917	47.145833	0.000037	-17.011917
40896	34714	2002-07-04	0.001840	63.802250	11.571500	13.643167	0.000110	27.707000	39.495917	0.000058	37.529500
42600	34716	2002-07-04	0.001765	65.514500	3.266917	7.507417	0.000105	28.814583	36.961917	0.000057	36.070250
44304	34718	2002-07-04	0.001739	72.491917	-29.327667	16.206417	0.000082	29.149750	37.194417	0.000046	22.008917
46008	34719	2002-07-04	0.001578	71.027333	-10.221667	17.720833	0.000088	28.921667	22.969250	0.000049	19.046167
47712	34720	2002-07-04	0.001779	69.224833	-36.747667	14.669917	0.000118	28.653417	38.318250	0.000063	9.947083
49416	34721	2002-07-04	0.001746	65.406167	-9.409917	17.159250	0.000120	27.919250	14.014583	0.000063	9.293667
...	...	...	...	...	...	...	...	...	...	...	...
391919	3098682	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
393623	60073460	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
395327	60074440	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
397031	60077450	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
398735	60150420	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
400439	60454500	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
402143	60656200	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
403847	60657200	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
405551	60658190	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
407255	60659110	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
408959	60659120	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
410663	60659190	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
412367	60659200	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
414071	60940960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
415775	60940970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
417479	60941960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
419183	60941970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
420887	60942960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
422591	60942970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
424295	60943960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
425999	60943970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
427703	60944960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
429407	60944970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
431111	60945970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
432815	60946960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
434519	60947960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
436223	60947970	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
437927	60948960	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
439631	60950430	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
441335	62321420	2016-06-29	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

	id	time	var_tmp	lon	vn	lat	var_lon	temp	spd	var_lat	ve	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
10620	10206	2002-11-04	1000.000000	67.315250	6.904000	10.885583	0.001747	NaN	11.224333	0.000579	-6.069667	0.145567	-0.836937	0.017202	-1.764421
10648	34721	2002-11-04	0.001778	67.626250	-0.428083	12.628833	0.000122	29.590750	13.099250	0.000064	6.291000	0.129693	-0.887083	-0.024359	NaN
10879	10206	2002-11-07	1000.000000	67.174083	6.697417	11.064250	0.000558	NaN	10.497583	0.000221	-5.759333	0.129001	-0.889407	-0.016566	NaN
10881	11089	2002-11-07	0.003795	64.770000	1.865000	14.365167	0.000151	28.995083	16.718083	0.000075	-15.957833	0.192121	-0.716425	0.033696	-1.472422
10883	15707	2002-11-07	1000.000000	67.346250	-24.346083	13.640333	0.000132	NaN	29.831500	0.000068	-15.104667	0.158005	-0.801329	-0.008466	NaN

	id	time	var_tmp	lon	vn	lat	var_lon	temp	spd	var_lat	ve	chlor_a	chlor_a_log10	chl_rate	chl_rate_log10
index
10620	10206	2002-11-04	1000.000000	67.315250	6.904000	10.885583	0.001747	NaN	11.224333	0.000579	-6.069667	0.145567	-0.836937	0.017202	-1.764421
10648	34721	2002-11-04	0.001778	67.626250	-0.428083	12.628833	0.000122	29.590750	13.099250	0.000064	6.291000	0.129693	-0.887083	-0.024359	NaN
10879	10206	2002-11-07	1000.000000	67.174083	6.697417	11.064250	0.000558	NaN	10.497583	0.000221	-5.759333	0.129001	-0.889407	-0.016566	NaN
10881	11089	2002-11-07	0.003795	64.770000	1.865000	14.365167	0.000151	28.995083	16.718083	0.000075	-15.957833	0.192121	-0.716425	0.033696	-1.472422
10883	15707	2002-11-07	1000.000000	67.346250	-24.346083	13.640333	0.000132	NaN	29.831500	0.000068	-15.104667	0.158005	-0.801329	-0.008466	NaN