2-Day Subsampling on the OceanColor Dataset

LDS-fitting framework:
After resampling at the chosen frequency, we drop NaNs in the float dataset for ['id', 'lat', 'lon', 'time'] before carrying out the multilinear interpolation.

-> {e.g. row_case4.dropna(subset=['id', 'lat', 'lon', 'time'], how = 'any') # these four fields are critical}

-> {indexing only on ['id', 'time'] yields mostly empty (NaN) rows: the time axis covers the whole record, while a typical float lives only 1~2 years}

-> {after dropping NaNs in ['id', 'lat', 'lon', 'time'], one can assume the float data are of good quality and continuous during the float's lifetime}

-> {therefore, the LDS is applied to the chl-a obtained after dropping the NaNs above}
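A minimal sketch of the drop-NaN rule on a toy table (the float ids, times and values are invented for illustration): a row is removed if any of the four critical fields is missing, while NaNs in other columns are kept.

```python
import numpy as np
import pandas as pd

# toy float records on a 2-day grid; ids, times and values are invented for illustration
records = pd.DataFrame({
    'id':   [101, 101, 101, 202, 202, 202],
    'time': pd.to_datetime(['2003-01-01', '2003-01-03', '2003-01-05',
                            '2003-01-01', '2003-01-03', '2003-01-05']),
    'lat':  [15.0, np.nan, 15.4, 12.0, 12.1, 12.3],
    'lon':  [65.0, 65.2, 65.5, np.nan, 60.1, 60.4],
    'temp': [27.5, 27.6, np.nan, 28.0, 28.1, 28.3],
})

# drop a row if ANY of the four critical fields is missing;
# NaNs in other columns (here: temp) are left untouched
critical = ['id', 'lat', 'lon', 'time']
clean = records.dropna(subset=critical, how='any')

# number of usable records left for each float
per_float = clean.groupby('id').size()
print(per_float)
```

Here each toy float keeps two of its three records; the missing temperature of float 101 survives, since temp is not one of the critical fields.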
The data are processed (interpolated) to generate only {chl-a} and {distance to coast}; more features later...
{more features to be added at this step if needed}
{fill-in preprocessing here, if needed}
#not# taking any temporal difference to compute the chl-rate (previously done with chl_rates.add_chl_rates)
#not# restricting the data to the November-March period, so that the time index stays intact
visualize the chl-a for one float
output the dataframe for LDS
fit LDS for one float
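The last step fits an LDS (linear dynamical system) to one float. As a hedged illustration only, not the model used in this notebook, here is a Kalman filter for the simplest LDS: a scalar random walk ("local level") observed with noise. The model structure and the noise variances `q` and `r` are assumptions; estimating them (fitting, e.g. by EM) is not shown. NaN observations are treated as missing, which matches the gappy chl-a series expected after interpolation.

```python
import numpy as np

def kalman_local_level(y, q, r, x0=0.0, p0=1e6):
    """Kalman filter for a scalar local-level LDS (illustrative sketch).

    State:        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:  y_t = x_t + v_t,      v_t ~ N(0, r)
    NaN observations are treated as missing: only the predict step runs.
    Returns the filtered means and variances of x_t.
    """
    y = np.asarray(y, dtype=float)
    means = np.empty_like(y)
    variances = np.empty_like(y)
    x, p = x0, p0  # vague prior: large initial variance
    for t, obs in enumerate(y):
        # predict: the random walk adds process noise
        p = p + q
        # update: skipped when the observation is missing
        if not np.isnan(obs):
            k = p / (p + r)          # Kalman gain
            x = x + k * (obs - x)
            p = (1.0 - k) * p
        means[t] = x
        variances[t] = p
    return means, variances

# usage on a short made-up series with a gap
m, v = kalman_local_level([0.5, 0.6, np.nan, np.nan, 0.7], q=0.01, r=0.05)
print(m, v)
```

During the gap the mean is carried forward while the variance grows by `q` per step, which is how an LDS expresses increasing uncertainty where no chl-a is observed.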



In [1]:

    
import xarray as xr
import numpy as np
import pandas as pd
%matplotlib inline
from matplotlib import pyplot as plt
from dask.diagnostics import ProgressBar
import seaborn as sns
from matplotlib.colors import LogNorm



In [2]:

    
# resampling frequency in number of days
freq=2

Load data from disk

We have already downloaded a subset of the MODIS-Aqua chlorophyll-a data (9 km, 8-day and daily) for the Arabian Sea.

We can read all the netCDF files of each product into one xarray Dataset using the open_mfdataset function. Note that this does not load the data into memory yet; that only happens when we access the values.



In [3]:

    
ds_8day = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_8D.nc')
ds_daily = xr.open_mfdataset('./data_collector_modisa_chla9km/ModisA_Arabian_Sea_chlor_a_9km_*_D.nc')
both_datasets = [ds_8day, ds_daily]

How much data is contained here? Let's get the answer in MB.



In [4]:

    
print([(ds.nbytes / 1e6) for ds in both_datasets])









    



[534.295504, 4241.4716]

The 8-day dataset is ~534 MB, while the daily dataset is ~4.2 GB. Both fit in RAM on a reasonably equipped machine, so let's load them into memory.
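The sizes printed above can be reproduced from the dimensions alone, because every variable and coordinate in these Datasets is stored with 8 bytes per element (float64, int64 or datetime64[ns], as the reprs in the next output show). A small plain-Python check, where the function name is only illustrative:

```python
# bytes per element: all variables and coordinates are 8-byte types
ITEM = 8

def dataset_nbytes(n_time, n_lat=276, n_lon=360, n_rgb=3, n_color=256):
    """Total bytes of one MODIS chlor_a Dataset, summed over variables and coordinates."""
    chlor_a = n_time * n_lat * n_lon * ITEM        # chlor_a(time, lat, lon)
    palette = n_time * n_rgb * n_color * ITEM      # palette(time, rgb, eightbitcolor)
    coords = (n_lat + n_lon + n_rgb + n_color + n_time) * ITEM  # 1-D coordinates
    return chlor_a + palette + coords

# 8-day product has 667 time steps, daily product has 5295
print([dataset_nbytes(t) / 1e6 for t in (667, 5295)])
```

This gives the same 534.295504 and 4241.4716 MB as `ds.nbytes` above; almost all of it is the chlor_a cube, so memory grows linearly with the number of time steps.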



In [5]:

    
[ds.load() for ds in both_datasets]









    Out[5]:





[<xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 667)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...,
 <xarray.Dataset>
 Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 5295)
 Coordinates:
   * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
   * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
   * rgb            (rgb) int64 0 1 2
   * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
   * time           (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
 Data variables:
     chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
     palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...]

Fix bad data

In preparing this demo, I noticed that a small number of maps had bad data: specifically, they contained large negative values of chlorophyll concentration. Looking closer, I realized that the land/cloud mask had been inverted. So I wrote a function that finds the affected maps and masks out their negative values.



In [6]:

    
def fix_bad_data(ds):
    # on some maps the cloud/land mask is inverted;
    # this is obvious because there are chlorophyll values below zero
    bad_data = ds.chlor_a.groupby('time').min() < 0
    # loop through the affected time steps and set their negative values to NaN (in place)
    for n in np.nonzero(bad_data.values)[0]:
        data = ds.chlor_a[n].values
        ds.chlor_a.values[n] = np.ma.masked_less(data, 0).filled(np.nan)
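The masking step inside `fix_bad_data` can be checked on a tiny made-up map. The sketch compares the masked-array round trip used above with `np.where`, shown here only as an equivalent alternative, not as the notebook's code.

```python
import numpy as np

# a tiny 2x3 "map" with one NaN (cloud/land) pixel and two negative (bad) pixels; values invented
data = np.array([[0.2, -109.0, np.nan],
                 [1.5, 0.05, -3.0]])

# approach used in fix_bad_data: mask values < 0, then fill the masked entries with NaN
via_masked = np.ma.masked_less(data, 0).filled(np.nan)

# equivalent one-liner: negative values become NaN, everything else is kept
# (the existing NaN stays NaN because NaN < 0 evaluates to False)
via_where = np.where(data < 0, np.nan, data)

print(via_where)
```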



In [7]:

    
[fix_bad_data(ds) for ds in both_datasets]









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in less
  if not reflexive






    Out[7]:





[None, None]



In [8]:

    
ds_8day.chlor_a>0









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[8]:





<xarray.DataArray 'chlor_a' (time: 667, lat: 276, lon: 360)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ...,  True, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False,  True],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]], dtype=bool)
Coordinates:
  * lat      (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 27.37 ...
  * lon      (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 45.63 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-12 2002-07-20 ...

Count the number of ocean data points

First we have to figure out the land mask. Unfortunately, it doesn't come with the dataset, but we can infer it by counting all the points that have at least one valid (positive) chlorophyll value.
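The same inference can be sketched on a small synthetic (time, lat, lon) cube in plain NumPy (the array is invented): a pixel counts as ocean if at least one time step has a positive chlorophyll value, and pixels that are NaN at every time step are treated as land.

```python
import numpy as np

nan = np.nan
# chl-a cube with shape (time=3, lat=2, lon=3); pixels [:, 0, 0] and [:, 1, 1] are always NaN (land)
chl = np.array([
    [[nan, 0.3, nan], [nan, nan, 1.2]],
    [[nan, nan, 0.4], [0.9, nan, nan]],
    [[nan, 0.2, nan], [nan, nan, 1.1]],
])

# same logic as (chlor_a > 0).sum(dim='time') > 0: NaN > 0 is False, so NaNs never count
valid = chl > 0
ocean_mask = valid.sum(axis=0) > 0
num_ocean_points = int(ocean_mask.sum())
print(ocean_mask, num_ocean_points)
```

Note that a pixel which is only ever cloudy over the record would also be classified as land by this rule.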



In [9]:

    
(ds_8day.chlor_a>0).sum(dim='time').plot()









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[9]:





<matplotlib.collections.QuadMesh at 0x123b29f60>



In [10]:

    
# infer the ocean mask: a pixel is ocean if it has at least one positive chl-a value over time
ocean_mask = (ds_8day.chlor_a>0).sum(dim='time')>0
#ocean_mask = (ds_daily.chlor_a>0).sum(dim='time')>0
num_ocean_points = ocean_mask.sum().values  # total number of ocean pixels
ocean_mask.plot()
plt.title('%g total ocean points' % num_ocean_points)









    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/xarray/core/variable.py:1046: RuntimeWarning: invalid value encountered in greater
  if not reflexive






    Out[10]:





<matplotlib.text.Text at 0x146d48940>



In [11]:

    
#ds_8day



In [12]:

    
#ds_daily



In [13]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time='2002-11-18',method='nearest').plot(norm=LogNorm())
#ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[13]:





<matplotlib.collections.QuadMesh at 0x124b217b8>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [14]:

    
#list(ds_daily.groupby('time')) # take a look at what's inside

Now we count up the number of valid points in each snapshot and divide by the total number of ocean points.
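In array terms, the coverage of each snapshot is the number of non-NaN pixels divided by the number of ocean pixels. A NumPy sketch on a synthetic cube (shape, values and the ocean mask are invented):

```python
import numpy as np

nan = np.nan
# synthetic chl-a cube (time=2, lat=2, lon=2) and an ocean mask with 3 ocean pixels
chl = np.array([
    [[0.5, nan], [nan, nan]],
    [[0.4, 0.7], [1.0, nan]],
])
ocean_mask = np.array([[True, True], [True, False]])
num_ocean_points = ocean_mask.sum()

# like chlor_a.groupby('time').count(): number of non-NaN pixels in each snapshot
count_per_time = np.isfinite(chl).sum(axis=(1, 2))
coverage = count_per_time / float(num_ocean_points)
print(coverage)
```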



In [15]:

    
'''
<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 144, lon: 276, rgb: 3, time: 4748)
'''
ds_daily.groupby('time').count() # information from original data









    Out[15]:





<xarray.Dataset>
Dimensions:  (time: 5295)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...
Data variables:
    chlor_a  (time) int64 658 1170 1532 2798 2632 1100 1321 636 2711 1163 ...
    palette  (time) int64 768 768 768 768 768 768 768 768 768 768 768 768 ...



In [16]:

    
ds_daily.chlor_a.groupby('time').count()/float(num_ocean_points)









    Out[16]:





<xarray.DataArray 'chlor_a' (time: 5295)>
array([ 0.01053255,  0.01872809,  0.02452259, ...,  0.        ,
        0.        ,  0.        ])
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-05 2002-07-06 ...



In [17]:

    
count_8day,count_daily = [ds.chlor_a.groupby('time').count()/float(num_ocean_points)
                            for ds in (ds_8day,ds_daily)]



In [18]:

    
#count_8day = ds_8day.chl_ocx.groupby('time').count()/float(num_ocean_points)
#coundt_daily = ds_daily.chl_ocx.groupby('time').count()/float(num_ocean_points)

#count_8day, coundt_daily = [ds.chl_ocx.groupby('time').count()/float(num_ocean_points)
#                            for ds in ds_8day, ds_daily] # not work in python 3



In [19]:

    
plt.figure(figsize=(12,4))
count_8day.plot(color='k')
count_daily.plot(color='r')

plt.legend(['8 day','daily'])









    Out[19]:





<matplotlib.legend.Legend at 0x123b26a20>

Seasonal Climatology



In [20]:

    
count_8day_clim, count_daily_clim = [count.groupby('time.month').mean()  # monthly climatology
                                     for count in (count_8day, count_daily)]



In [21]:


# monthly climatology of the fraction of ocean points with valid data
plt.figure(figsize=(12,4))
count_8day_clim.plot(color='k')
count_daily_clim.plot(color='r')
plt.legend(['8 day', 'daily'])









    Out[21]:





<matplotlib.legend.Legend at 0x12722e358>

From the above figure, we see that data coverage is highest in winter (especially February) and lowest in summer.
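The climatology step, `groupby('time.month').mean()`, averages every value that falls in the same calendar month regardless of the year. The same operation on a toy pandas series (dates and coverage values invented):

```python
import pandas as pd

# toy coverage fractions spanning two Februaries and two Julys
times = pd.to_datetime(['2003-02-01', '2003-02-02', '2004-02-10',
                        '2003-07-01', '2004-07-15'])
coverage = pd.Series([0.6, 0.8, 0.7, 0.05, 0.15], index=times)

# group by calendar month (1..12) across all years, then average
monthly_clim = coverage.groupby(coverage.index.month).mean()
print(monthly_clim)
```

Months with more samples (e.g. from the daily product) simply contribute more values to their mean; each month's climatology is still a plain average of all its samples.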

Maps of individual days

Let's grab some data from February and plot it.



In [22]:

    
target_date = '2003-02-15'
plt.figure(figsize=(8,6))
ds_8day.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[22]:





<matplotlib.collections.QuadMesh at 0x127044048>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [23]:

    
plt.figure(figsize=(8,6))
ds_daily.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[23]:





<matplotlib.collections.QuadMesh at 0x1272d9f28>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [24]:

    
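# [0] selects the first time step (2002-07-04); sel_points picks the grid cell nearest to each (lon, lat) pair
# note: sel_points is deprecated in later xarray versions in favour of vectorized indexing with .sel
ds_daily.chlor_a[0].sel_points(lon=[65, 70], lat=[16, 18], method='nearest')
#ds_daily.chl_ocx[0].sel_points(time=times, lon=lons, lat=lats, method='nearest')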









    Out[24]:





<xarray.DataArray 'chlor_a' (points: 2)>
array([ nan,  nan])
Coordinates:
    time     datetime64[ns] 2002-07-04
    lat      (points) float64 16.04 18.04
    lon      (points) float64 65.04 70.04
  * points   (points) int64 0 1
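`sel_points(..., method='nearest')` returns, for each (lon, lat) pair, the value of the closest grid cell. The same pointwise nearest-neighbour lookup in plain NumPy, on a made-up grid with a descending latitude axis like the MODIS files (the function name is illustrative):

```python
import numpy as np

# made-up regular grid; latitude is descending, as in the MODIS files
lat = np.arange(20.0, 10.0, -0.5)   # 20.0, 19.5, ..., 10.5
lon = np.arange(60.0, 75.0, 0.5)    # 60.0, 60.5, ..., 74.5
# a field whose value encodes its own grid position, so the lookup is easy to check
field = lat[:, None] * 1000 + lon[None, :]

def nearest_points(field, lat, lon, lats, lons):
    """Return the field values (and grid coordinates) nearest to each (lat, lon) pair."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    # index of the nearest grid coordinate for each query point (any axis ordering works)
    i = np.abs(lat[:, None] - lats[None, :]).argmin(axis=0)
    j = np.abs(lon[:, None] - lons[None, :]).argmin(axis=0)
    # pointwise (not outer) indexing: one value per (lat, lon) pair
    return field[i, j], lat[i], lon[j]

values, lat_sel, lon_sel = nearest_points(field, lat, lon, lats=[16.1, 18.2], lons=[65.1, 70.3])
print(values, lat_sel, lon_sel)
```

In recent xarray versions the equivalent call passes DataArrays that share a new dimension, e.g. `lon=xr.DataArray([65, 70], dims='points')`, to `.sel(..., method='nearest')`.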



In [25]:

    
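# toy test: how does a row-wise mean treat NaNs?
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
df.mean(axis=1, skipna=True)  # NaNs are skipped, an all-NaN row gives NaN (try skipna=False); skipna=None behaved the same in the pandas version used here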









    Out[25]:





a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64
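The toy test above matters because the 2-day resampling below averages each bin. A minimal pandas sketch (daily values invented) of a 2-day mean that skips NaNs, so a bin becomes NaN only when all of its days are missing:

```python
import numpy as np
import pandas as pd

days = pd.date_range('2003-02-01', periods=6, freq='D')
chl = pd.Series([0.4, np.nan, np.nan, np.nan, 0.8, 1.0], index=days)

# 2-day bins starting on 2003-02-01; mean() skips NaNs within each bin
chl_2d = chl.resample('2D').mean()
print(chl_2d)
```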



In [26]:

    
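freq_resample = str(freq) + 'D'
# average the daily maps into 2-day bins (mean over each bin, NaNs skipped; see the toy test above)
ds_resample = ds_daily.resample(time=freq_resample).mean()  # older xarray syntax: ds_daily.resample(freq_resample, dim='time'), which defaulted to how='mean'
ds_resample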









    Out[26]:





<xarray.Dataset>
Dimensions:        (eightbitcolor: 256, lat: 276, lon: 360, rgb: 3, time: 2648)
Coordinates:
  * lat            (lat) float64 27.96 27.87 27.79 27.71 27.62 27.54 27.46 ...
  * lon            (lon) float64 45.04 45.13 45.21 45.29 45.38 45.46 45.54 ...
  * rgb            (rgb) int64 0 1 2
  * eightbitcolor  (eightbitcolor) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...
  * time           (time) datetime64[ns] 2002-07-04 2002-07-06 2002-07-08 ...
Data variables:
    chlor_a        (time, lat, lon) float64 nan nan nan nan nan nan nan nan ...
    palette        (time, rgb, eightbitcolor) float64 -109.0 0.0 108.0 ...



In [27]:

    
plt.figure(figsize=(8,6))
ds_resample.chlor_a.sel(time=target_date, method='nearest').plot(norm=LogNorm())









    Out[27]:





<matplotlib.collections.QuadMesh at 0x1cefa3be0>






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/colors.py:1022: RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0



In [28]:

    
# check the minimum longitude and latitude of the chlorophyll grid
print(ds_resample.lon.min(), '\n', ds_resample.lat.min())









    



<xarray.DataArray 'lon' ()>
array(45.04166793823242) 
 <xarray.DataArray 'lat' ()>
array(5.041661739349365)

++++++++++++++++++++++++++++++++++++++++++++++

All GDP Floats

Load the float data

Map each float position (time, lon, lat) to a chlorophyll-a value
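The multilinear (here trilinear) interpolation mentioned at the top can be sketched in NumPy for a single query point. This is an illustration under stated assumptions, not the notebook's implementation: the axes are assumed regular-enough, strictly ascending 1-D arrays (the MODIS latitude axis is descending and would need flipping first), the query point is assumed to lie inside the grid, and the function name and toy grid are hypothetical.

```python
import numpy as np

def trilinear(t_axis, lat_axis, lon_axis, values, t, la, lo):
    """Trilinear interpolation of values[t, lat, lon] at one point.

    Assumes strictly ascending 1-D axes and a query point inside the grid.
    A NaN at any of the 8 surrounding corners propagates to the result.
    """
    idx, frac = [], []
    for axis, x in ((t_axis, t), (lat_axis, la), (lon_axis, lo)):
        # lower corner index, clipped so that i + 1 stays inside the axis
        i = int(np.clip(np.searchsorted(axis, x) - 1, 0, len(axis) - 2))
        idx.append(i)
        frac.append((x - axis[i]) / (axis[i + 1] - axis[i]))
    out = 0.0
    # weighted sum over the 8 corners of the enclosing grid cell
    for dt in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((frac[0] if dt else 1 - frac[0]) *
                     (frac[1] if dy else 1 - frac[1]) *
                     (frac[2] if dx else 1 - frac[2]))
                out += w * values[idx[0] + dt, idx[1] + dy, idx[2] + dx]
    return out

# toy grid holding a linear function, which trilinear interpolation reproduces exactly
t_axis = np.array([0.0, 2.0, 4.0])
lat_axis = np.array([10.0, 11.0, 12.0])
lon_axis = np.array([60.0, 61.0, 62.0, 63.0])
T, Y, X = np.meshgrid(t_axis, lat_axis, lon_axis, indexing='ij')
vals = 2 * T + 3 * Y - X + 1
result = trilinear(t_axis, lat_axis, lon_axis, vals, 1.0, 11.5, 62.25)
print(result)  # 2*1 + 3*11.5 - 62.25 + 1
```

Because NaNs propagate, a float position next to a cloudy pixel gets no chl-a value; that is one reason the later cells discuss filling gaps.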



In [29]:

    
# in the following we deal with the data from the gdp float
from buyodata import buoydata
import os



In [30]:

    
# a list of files
fnamesAll = ['./gdp_float/buoydata_1_5000.dat','./gdp_float/buoydata_5001_10000.dat','./gdp_float/buoydata_10001_15000.dat','./gdp_float/buoydata_15001_jun16.dat']



In [31]:

    
# read them and concatenate them into one DataFrame
dfAll = pd.concat([buoydata.read_buoy_data(f) for f in fnamesAll])  # takes around 4~5 minutes

# the chlor_a data start on 2002-07-04, so keep only float records from that date on
dfvvAll = dfAll[dfAll.time>='2002-07-04']

(dfvvAll.time<'2002-07-04').sum()  # sanity check: should be 0









    Out[31]:





0



In [32]:

    
# make sure all longitudes lie in [0, 360], the convention of the 45-75E chlorophyll grid
dfvvAll = dfvvAll.copy()  # explicit copy, so the assignment below does not act on a slice of dfAll
print('before processing, the minimum longitude is %.3f and maximum is %.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))
mask = dfvvAll.lon < 0
dfvvAll.loc[mask, 'lon'] = dfvvAll.loc[mask, 'lon'] + 360
print('after processing, the minimum longitude is %.3f and maximum is %.3f' % (dfvvAll.lon.min(), dfvvAll.lon.max()))

dfvvAll.describe()



before processing, the minimum longitude is 0.000 and maximum is 360.000
after processing, the minimum longitude is 0.000 and maximum is 360.000






    Out[32]:

                 id           lat           lon          temp            ve  \
count  2.147732e+07  2.131997e+07  2.131997e+07  1.986179e+07  2.129142e+07
mean   1.765662e+06 -2.263128e+00  2.124412e+02  1.986121e+01  2.454172e-01
std    9.452835e+06  3.401115e+01  9.746941e+01  8.339498e+00  2.525050e+01
min    2.578000e+03 -7.764700e+01  0.000000e+00 -1.685000e+01 -2.916220e+02
25%    4.897500e+04 -3.186000e+01  1.490720e+02  1.437300e+01 -1.411400e+01
50%    7.141300e+04 -4.920000e+00  2.153940e+02  2.214400e+01 -5.560000e-01
75%    1.094330e+05  2.756000e+01  3.064370e+02  2.688900e+01  1.356100e+01
max    6.399288e+07  8.989900e+01  3.600000e+02  4.595000e+01  4.417070e+02

                 vn           spd       var_lat       var_lon       var_tmp
count  2.129142e+07  2.129142e+07  2.147732e+07  2.147732e+07  2.147732e+07
mean   4.708192e-01  2.613427e+01  7.326258e+00  7.326555e+00  7.522298e+01
std    2.052160e+01  1.939087e+01  8.527853e+01  8.527851e+01  2.637454e+02
min   -2.601400e+02  0.000000e+00  5.268300e-07 -3.941600e-02  1.001300e-03
25%   -1.044700e+01  1.290300e+01  4.366500e-06  7.512600e-06  1.435700e-03
50%    1.970000e-01  2.176700e+01  8.833600e-06  1.495800e-05  1.691700e-03
75%    1.109300e+01  3.405900e+01  1.833300e-05  3.627900e-05  2.294200e-03
max    2.783220e+02  4.421750e+02  1.000000e+03  1.000000e+03  1.000000e+03
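As the summary shows, the float longitudes already lie in [0, 360], so the mask in the cell above is empty. For data stored in the [-180, 180) convention, the conversion to the 0-360 convention of the chlorophyll grid (and back) can be written with the modulo operator; a small NumPy sketch with invented longitudes:

```python
import numpy as np

lon_180 = np.array([-179.5, -90.0, 0.0, 45.5, 179.9])

# [-180, 180) -> [0, 360): negative longitudes are shifted by +360
lon_360 = np.mod(lon_180, 360.0)

# and back: [0, 360) -> [-180, 180)
lon_back = (lon_360 + 180.0) % 360.0 - 180.0
print(lon_360, lon_back)
```

Unlike the mask-based version above, `np.mod` maps a longitude of exactly 360 to 0; both describe the same meridian, but only the latter keeps all values strictly below 360.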



In [33]:

    
# Select only the arabian sea region
arabian_sea = (dfvvAll.lon > 45) & (dfvvAll.lon< 75) & (dfvvAll.lat> 5) & (dfvvAll.lat <28)
# arabian_sea = {'lon': slice(45,75), 'lat': slice(5,28)} # later use this longitude and latitude
floatsAll = dfvvAll.loc[arabian_sea]   # directly use mask
print('dfvvAll.shape is %s, floatsAll.shape is %s' % (dfvvAll.shape, floatsAll.shape) )









    



dfvvAll.shape is (21477317, 11), floatsAll.shape is (111894, 11)



In [34]:

    
# takes too long: visualize the floats over the global region
# fig, ax  = plt.subplots(figsize=(12,10))
# dfvvAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)

# visualize the floats in the Arabian Sea region
# fig, ax  = plt.subplots(figsize=(12,10))
# floatsAll.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)



In [35]:

    
# pandas.DataFrame.resample cannot do this resampling directly:
# the records are really indexed on ['time', 'id'], and resample needs a single datetime-like index, e.g.
# TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
# so the data are converted to an xarray.Dataset and resampled there instead
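Pandas can still bin each float separately if the id is handled by `groupby` instead of the index. A sketch with `pd.Grouper` on a toy table (ids, times and values invented); this is an alternative to the xarray route the notebook takes, not what it does:

```python
import pandas as pd

# toy 6-hourly drifter records for two floats; values invented
records = pd.DataFrame({
    'id':   [1, 1, 1, 1, 2, 2],
    'time': pd.to_datetime(['2003-01-01 00:00', '2003-01-01 06:00',
                            '2003-01-03 00:00', '2003-01-03 12:00',
                            '2003-01-01 00:00', '2003-01-02 18:00']),
    'lat':  [15.0, 15.2, 15.6, 15.8, 12.0, 12.4],
})

# group by float id and, within each float, by 2-day time bins; average each bin
binned = records.groupby(['id', pd.Grouper(key='time', freq='2D')])['lat'].mean()
print(binned)
```

The result is indexed by (id, left edge of the time bin), which is the long-format counterpart of the (time, id) grid built below.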




In [36]:

    
# convert the surface drifter data from a pandas.DataFrame to an xarray.Dataset
floatsDSAll = xr.Dataset.from_dataframe(floatsAll.set_index(['time','id']))  # time & id become the dimensions; use reset_index to revert this operation
floatsDSAll









    Out[36]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 17499)
Coordinates:
  * time     (time) datetime64[ns] 2002-07-04 2002-07-04T06:00:00 ...
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
Data variables:
    lat      (time, id) float64 nan 16.3 14.03 16.4 14.04 nan 20.11 nan ...
    lon      (time, id) float64 nan 66.23 69.48 64.58 69.51 nan 68.55 nan ...
    temp     (time, id) float64 nan nan nan 28.0 28.53 nan 28.93 nan 27.81 ...
    ve       (time, id) float64 nan 8.68 5.978 6.286 4.844 nan 32.9 nan ...
    vn       (time, id) float64 nan -13.18 -18.05 -7.791 -17.47 nan 15.81 ...
    spd      (time, id) float64 nan 15.78 19.02 10.01 18.13 nan 36.51 nan ...
    var_lat  (time, id) float64 nan 0.0002661 5.01e-05 5.018e-05 5.024e-05 ...
    var_lon  (time, id) float64 nan 0.0006854 8.851e-05 9.018e-05 8.968e-05 ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003733 0.0667 nan 0.001683 ...
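`Dataset.from_dataframe` turns the (time, id) index into a dense 2-D grid and fills every (time, id) combination without a record with NaN, which is why most entries above are NaN. The pandas analogue is `unstack`; a toy sketch (values invented):

```python
import pandas as pd

records = pd.DataFrame({
    'time': pd.to_datetime(['2003-01-01', '2003-01-01', '2003-01-02', '2003-01-03']),
    'id':   [7, 9, 7, 9],
    'lat':  [15.0, 12.0, 15.1, 12.2],
})

# one row per (time, id) record -> dense (time x id) table; missing pairs become NaN
grid = records.set_index(['time', 'id'])['lat'].unstack('id')
print(grid)
```

Four records become a 3 x 2 grid with two NaN cells; with 259 floats that each live only part of the record, the same effect makes the grid mostly empty.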



In [37]:

    
floatsDSAll.dims









    Out[37]:





Frozen(SortedKeysDict({'id': 259, 'time': 17499}))



In [38]:

    
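# resample the xarray.Dataset onto a two-day frequency (mean over each bin)
floatsDSAll_resample = floatsDSAll.resample(time=freq_resample).mean()  # older xarray syntax: floatsDSAll.resample(freq_resample, dim='time')
print(floatsDSAll_resample.dims)   # 6-hourly records binned into 2-day bins; every bin in the time range is kept, so gaps become all-NaN bins
floatsDSAll_resample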









    



Frozen(SortedKeysDict(OrderedDict([('id', 259), ('time', 2556)])))






    Out[38]:





<xarray.Dataset>
Dimensions:  (id: 259, time: 2556)
Coordinates:
  * id       (id) int64 7574 10206 10208 11089 15703 15707 27069 27139 28842 ...
  * time     (time) datetime64[ns] 2002-07-04 2002-07-06 2002-07-08 ...
Data variables:
    ve       (time, id) float64 nan 13.06 8.505 12.17 8.686 nan 26.96 nan ...
    temp     (time, id) float64 nan nan nan 27.95 28.55 nan 29.01 nan 27.7 ...
    lon      (time, id) float64 nan 66.33 69.55 64.68 69.58 nan 68.74 nan ...
    vn       (time, id) float64 nan -8.651 -20.75 -7.287 -20.2 nan 7.892 nan ...
    var_lat  (time, id) float64 nan 0.0007236 5.197e-05 7.609e-05 5.339e-05 ...
    var_lon  (time, id) float64 nan 0.002281 9.329e-05 0.0001497 9.757e-05 ...
    lat      (time, id) float64 nan 16.23 13.89 16.35 13.9 nan 20.17 nan ...
    spd      (time, id) float64 nan 15.92 22.6 14.58 22.25 nan 28.45 nan ...
    var_tmp  (time, id) float64 nan 1e+03 1e+03 0.003688 0.07467 nan ...



In [39]:

    
# convert back to a pandas.DataFrame for plotting
floatsDFAll_resample = floatsDSAll_resample.to_dataframe()
floatsDFAll_resample = floatsDFAll_resample.reset_index()  # turn the (id, time) index back into columns
# visualize the subsampled floats in the Arabian Sea region
fig, ax  = plt.subplots(figsize=(12,10))
floatsDFAll_resample.plot(kind='scatter', x='lon', y='lat', c='temp', cmap='RdBu_r', edgecolor='none', ax=ax)









    Out[39]:





<matplotlib.axes._subplots.AxesSubplot at 0x180bb19e8>



In [40]:

    
# first step toward a chlorophyll value for each data entry: sort the records by time, then by float id
floatsDFAll_resample_timeorder = floatsDFAll_resample.sort_values(['time','id'],ascending=True)
floatsDFAll_resample_timeorder[:20]  # check that the rows are time-ordered
# should we drop NaNs here to speed things up?









    Out[40]:

          id        time         ve       temp        lon         vn  \
0       7574  2002-07-04        NaN        NaN        NaN        NaN
2556   10206  2002-07-04  13.064500        NaN  66.330375  -8.650875
5112   10208  2002-07-04   8.505125        NaN  69.552375 -20.755000
7668   11089  2002-07-04  12.168000  27.954125  64.683750  -7.286875
10224  15703  2002-07-04   8.685875  28.552250  69.583125 -20.195125
12780  15707  2002-07-04        NaN        NaN        NaN        NaN
15336  27069  2002-07-04  26.958750  29.012000  68.737500   7.891750
17892  27139  2002-07-04        NaN        NaN        NaN        NaN
20448  28842  2002-07-04  10.499125  27.701750  60.694625  -0.333875
23004  34159  2002-07-04  27.354250        NaN  58.914250   4.591750
25560  34173  2002-07-04        NaN        NaN        NaN        NaN
28116  34210  2002-07-04  -9.666750  26.694875  56.925000 -13.510125
30672  34211  2002-07-04  20.618125  28.278000  67.929125 -12.301625
33228  34212  2002-07-04  14.641875  28.470750  64.750250  13.108750
35784  34223  2002-07-04        NaN        NaN        NaN        NaN
38340  34310  2002-07-04        NaN        NaN        NaN        NaN
40896  34311  2002-07-04        NaN        NaN        NaN        NaN
43452  34312  2002-07-04        NaN        NaN        NaN        NaN
46008  34314  2002-07-04        NaN        NaN        NaN        NaN
48564  34315  2002-07-04        NaN        NaN        NaN        NaN

         var_lat   var_lon        lat        spd      var_tmp
0            NaN       NaN        NaN        NaN          NaN
2556    0.000724  0.002281  16.229625  15.915625  1000.000000
5112    0.000052  0.000093  13.891875  22.603500  1000.000000
7668    0.000076  0.000150  16.354375  14.582375     0.003688
10224   0.000053  0.000098  13.903250  22.251500     0.074665
12780        NaN       NaN        NaN        NaN          NaN
15336   0.000058  0.000108  20.169750  28.453250     0.001705
17892        NaN       NaN        NaN        NaN          NaN
20448   0.000118  0.000254  18.878875  23.813750     0.003176
23004   0.000063  0.000119  12.548125  28.099500  1000.000000
25560        NaN       NaN        NaN        NaN          NaN
28116   0.000081  0.000168   6.476750  17.964875     0.003670
30672   0.000047  0.000084   8.602375  24.182875     0.003506
33228   0.000050  0.000090   6.232000  20.799125     0.003616
35784        NaN       NaN        NaN        NaN          NaN
38340        NaN       NaN        NaN        NaN          NaN
40896        NaN       NaN        NaN        NaN          NaN
43452        NaN       NaN        NaN        NaN          NaN
46008        NaN       NaN        NaN        NaN          NaN
48564        NaN       NaN        NaN        NaN          NaN



In [41]:

    
# the resampled Dataset has dims (id: 259, time: 2556), so the DataFrame has 259 * 2556 = 662004 rows
floatsDFAll_resample_timeorder.lon.shape









    Out[41]:





(662004,)



In [42]:

    
# can we fill in the missing values at this step?
floatsDFAll_resample_timeorder.lon.dropna().shape  # only 14340 of the 662004 rows (about 2%) have a longitude









    Out[42]:





(14340,)



In [43]:

    
############ interpolation starts from here



In [44]:

    
# to understand the float data better:
# a: look into floatsDFAll_resample_timeorder in more detail
#    - check the NaN counts for each id
#    - plot the trajectory {time, lat, lon, temperature} of each float id;
#      this step helps to understand the float dataset and, if needed, to improve it
# b: take the float data as they are and interpolate; wherever there is a NaN value, use the nearest neighbours
#    check whether the quality of the interpolation improves; if not, fall back to task a
# c: vectorization

# panel-style view of the DataFrame:
# floatsDFAll_resample_timeorder.set_index(['id','time'])
# the inverse operation: floatsDFAll_resample_timeorder.reset_index()
# look into the data
print(floatsDFAll_resample_timeorder[100:105])
# so far there is no need to convert it into a panel (pandas.Panel was removed in pandas 1.0)
# floatsDFAll_resample_timeorderPanel = floatsDFAll_resample_timeorder.to_panel()

# plot the temperature of one float; the temperature does show a trend
#maskid = (floatsDFAll_resample_timeorder.id == 63069) & (floatsDFAll_resample_timeorder.time > '2007-01-01') & (floatsDFAll_resample_timeorder.time < '2009-01-01')
maskid = (floatsDFAll_resample_timeorder.id == 63069)
one_float = floatsDFAll_resample_timeorder[maskid].dropna(subset=['id', 'lat', 'lon', 'time'])
print(one_float)
one_float.plot(x='time', y='temp')

# set of all float ids
floatsDFAll_resample_timeorder.id.unique()









    



           id       time  ve  temp  lon  vn  var_lat  var_lon  lat  spd  \
255600  62558 2002-07-04 NaN   NaN  NaN NaN      NaN      NaN  NaN  NaN   
258156  63036 2002-07-04 NaN   NaN  NaN NaN      NaN      NaN  NaN  NaN   
260712  63067 2002-07-04 NaN   NaN  NaN NaN      NaN      NaN  NaN  NaN   
263268  63068 2002-07-04 NaN   NaN  NaN NaN      NaN      NaN  NaN  NaN   
265824  63069 2002-07-04 NaN   NaN  NaN NaN      NaN      NaN  NaN  NaN   

        var_tmp  
255600      NaN  
258156      NaN  
260712      NaN  
263268      NaN  
265824      NaN  
           id       time          ve       temp        lon         vn  \
266761  63069 2007-08-21   62.497000  25.931500  54.434000   8.793000   
266762  63069 2007-08-23   62.043625  25.712375  55.111875  -3.057000   
266763  63069 2007-08-25   86.457375  25.809375  56.167375 -19.289750   
266764  63069 2007-08-27  100.464875  25.981875  57.796250 -34.631875   
266765  63069 2007-08-29   47.875875  25.657875  59.034125  18.670750   
266766  63069 2007-08-31   17.336375  25.688750  59.493125  27.887875   
266767  63069 2007-09-02    6.499750  25.716875  59.689250  13.692000   
266768  63069 2007-09-04   -5.587375  25.792375  59.693750   9.786375   
266769  63069 2007-09-06  -18.758375  25.741125  59.476375   4.272625   
266770  63069 2007-09-08  -33.324000  25.694250  59.048250  21.010625   
266771  63069 2007-09-10  -19.777250  25.947125  58.594125  54.314500   
266772  63069 2007-09-12   20.128750  26.394875  58.594375  56.158125   
266773  63069 2007-09-14   31.869125  26.731875  59.073500  22.064375   
266774  63069 2007-09-16   19.684000  26.819875  59.511625  -7.610375   
266775  63069 2007-09-18   15.900625  27.005500  59.756500 -31.353625   
266776  63069 2007-09-20   27.798875  27.175000  60.125750 -52.872750   
266777  63069 2007-09-22   35.371875  27.439500  60.638375 -25.948250   
266778  63069 2007-09-24   21.463625  27.380250  61.137000 -16.527375   
266779  63069 2007-09-26   12.366000  27.394750  61.402625 -12.149250   
266780  63069 2007-09-28    2.623500  27.533625  61.517375 -14.678375   
266781  63069 2007-09-30   -6.608125  28.111125  61.477625 -11.801125   
266782  63069 2007-10-02   -9.302000  28.141625  61.353000  -8.771500   
266783  63069 2007-10-04  -11.738625  28.212750  61.169250  -5.729375   
266784  63069 2007-10-06  -12.871625  28.226750  60.973625  -3.174375   
266785  63069 2007-10-08  -13.055500  28.065750  60.752250   0.419375   
266786  63069 2007-10-10   -6.561875  28.111750  60.587500   8.985625   
266787  63069 2007-10-12   -2.124750  28.227000  60.532250  14.419500   
266788  63069 2007-10-14   -3.912625  28.561750  60.499125  24.301625   
266789  63069 2007-10-16   -3.105000  28.519500  60.429000  30.229750   
266790  63069 2007-10-18    3.205000  28.458250  60.433250  38.891625   
...       ...        ...         ...        ...        ...        ...   
266818  63069 2007-12-13  -11.753125  26.552875  60.273500  -4.315125   
266819  63069 2007-12-15  -26.595125  26.316875  59.991750   0.149500   
266820  63069 2007-12-17  -36.194875  26.176500  59.454375   7.990000   
266821  63069 2007-12-19  -49.710625  26.217000  58.770500  29.922375   
266822  63069 2007-12-21  -51.801250  26.122375  57.927625  47.622625   
266823  63069 2007-12-23  -46.688250  25.890125  57.167250  51.015500   
266824  63069 2007-12-25  -40.835000  25.177625  56.433875  -7.196500   
266825  63069 2007-12-27  -27.695375  24.789625  55.906875 -43.860750   
266826  63069 2007-12-29  -20.685750  24.479000  55.506875 -44.101500   
266827  63069 2007-12-31  -25.550375  24.398500  55.139000 -17.643500   
266828  63069 2008-01-02  -23.455375  24.708625  54.749750  -5.210500   
266829  63069 2008-01-04  -47.252500  25.292375  54.230875   1.245750   
266830  63069 2008-01-06 -101.494875  25.735375  52.939250   4.512375   
266831  63069 2008-01-08  -42.520125  25.410375  51.694125 -44.294625   
266832  63069 2008-01-10   -7.634625  25.422125  51.430500 -85.098375   
266833  63069 2008-01-12  -32.802375  25.897500  51.122375 -35.695125   
266834  63069 2008-01-14  -73.010875  25.800125  50.337875  -9.684625   
266835  63069 2008-01-16  -64.138750  25.796000  49.097750  50.450250   
266836  63069 2008-01-18  -30.949000  25.623500  48.397875  41.206750   
266837  63069 2008-01-20  -34.547000  25.692750  47.899000 -11.641375   
266838  63069 2008-01-22  -10.114625  25.727000  47.556000 -61.297875   
266839  63069 2008-01-24  -52.465375  26.040875  47.113375 -47.916000   
266840  63069 2008-01-26  -56.218250  25.832875  46.092000   7.179000   
266841  63069 2008-01-28  -37.569875  26.163625  45.464250  18.926125   
266842  63069 2008-01-30  -53.074000  26.013000  45.084000  10.170000   
266845  63069 2008-02-05   11.639667  25.749333  45.083500  18.906167   
266846  63069 2008-02-07    6.857000  25.542000  45.198875   2.783625   
266847  63069 2008-02-09   -3.911875  25.503750  45.224375  -2.769125   
266848  63069 2008-02-11   -3.053000  25.508000  45.183625  35.035875   
266849  63069 2008-02-13  -29.751667  25.644333  45.067333  52.212333   

         var_lat   var_lon        lat         spd   var_tmp  
266761  0.000010  0.000019  16.042000   63.113000  0.002250  
266762  0.000022  0.000045  16.107375   64.619875  0.002193  
266763  0.000027  0.000061  15.777000   92.233375  0.013653  
266764  0.000029  0.000068  15.466000  106.636125  0.002322  
266765  0.000019  0.000038  15.254375   54.356250  0.002089  
266766  0.000015  0.000029  15.709875   33.199750  0.001945  
266767  0.000020  0.000040  16.004750   16.304375  0.001764  
266768  0.000011  0.000019  16.208375   15.382000  0.001962  
266769  0.000011  0.000021  16.309000   20.458500  0.001848  
266770  0.000019  0.000041  16.464250   40.368125  0.001852  
266771  0.000010  0.000017  17.038750   58.870625  0.001852  
266772  0.000008  0.000014  17.980875   60.820000  0.001785  
266773  0.000020  0.000043  18.590875   39.308000  0.001877  
266774  0.000022  0.000045  18.686250   22.396375  0.002000  
266775  0.000035  0.000082  18.417125   35.362625  0.002109  
266776  0.000021  0.000045  17.687875   59.864250  0.001942  
266777  0.000008  0.000013  17.072250   44.270625  0.001814  
266778  0.000028  0.000067  16.729250   27.663500  0.001818  
266779  0.000015  0.000029  16.526125   19.001500  0.001796  
266780  0.000013  0.000024  16.294500   16.548125  0.001848  
266781  0.000010  0.000018  16.102750   15.596375  0.002010  
266782  0.000035  0.000088  15.941875   13.415125  0.001869  
266783  0.000010  0.000018  15.846000   13.463000  0.001742  
266784  0.000012  0.000022  15.774250   13.674750  0.001759  
266785  0.000010  0.000017  15.732000   14.071875  0.001806  
266786  0.000007  0.000012  15.810250   12.853000  0.001762  
266787  0.000013  0.000025  16.006500   16.758875  0.001872  
266788  0.000011  0.000019  16.309375   25.457750  0.001750  
266789  0.000008  0.000013  16.727875   30.847875  0.003181  
266790  0.000008  0.000014  17.241250   39.448500  0.001832  
...          ...       ...        ...         ...       ...  
266818  0.000015  0.000029  14.477875   13.366250  0.001822  
266819  0.000008  0.000014  14.477625   26.925250  0.001751  
266820  0.000010  0.000018  14.517875   37.694125  0.001761  
266821  0.000013  0.000023  14.807000   58.449875  0.002058  
266822  0.000011  0.000020  15.411500   70.698250  0.002173  
266823  0.000025  0.000059  16.233375   69.631625  0.002194  
266824  0.000013  0.000023  16.594250   43.528500  0.002180  
266825  0.000013  0.000024  16.202250   52.876125  0.002065  
266826  0.000012  0.000022  15.441250   49.114000  0.002560  
266827  0.000008  0.000014  14.959625   31.734250  0.001985  
266828  0.000012  0.000025  14.782125   24.298250  0.001748  
266829  0.000011  0.000021  14.760875   47.405625  0.001685  
266830  0.000015  0.000033  14.847750  102.120625  0.002701  
266831  0.000009  0.000017  14.612125   71.641375  0.001779  
266832  0.000007  0.000013  13.495000   85.505625  0.001801  
266833  0.000021  0.000053  12.463000   50.852125  0.001727  
266834  0.000009  0.000018  12.146000   74.592625  0.001815  
266835  0.000012  0.000025  12.461250   83.254000  0.002085  
266836  0.000011  0.000024  13.329000   53.796875  0.001845  
266837  0.000016  0.000034  13.501000   37.928750  0.001813  
266838  0.000014  0.000029  12.941000   62.712375  0.001778  
266839  0.000010  0.000019  11.937750   75.409250  0.001767  
266840  0.000012  0.000026  11.697750   57.082250  0.001803  
266841  0.000038  0.000104  11.901125   42.349375  0.003592  
266842  0.000017  0.000035  12.074000   54.040000  0.002023  
266845  0.000012  0.000024  10.847500   22.588833  0.001889  
266846  0.000009  0.000019  11.001375   10.278375  0.001925  
266847  0.000014  0.000030  10.952625    8.016375  0.001820  
266848  0.000016  0.000035  11.165625   35.757875  0.001753  
266849  0.000014  0.000028  11.653333   61.050667  0.001732  

[87 rows x 11 columns]
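In the listing above for float 63069, the index jumps from 266842 to 266845 and the time from 2008-01-30 to 2008-02-05. Two 2-day steps are missing there. The sketch below flags such gaps per float before the series is treated as continuous. It assumes a DataFrame with `id` and `time` columns, as in `floatsDFAll_resample_timeorder`. `find_time_gaps` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def find_time_gaps(df, freq_days=2):
    """Return the records of each float that follow a gap longer than the
    resampling step (hypothetical helper, not part of the notebook)."""
    df = df.sort_values(['id', 'time'])
    step = df.groupby('id')['time'].diff()          # NaT for the first record of each float
    is_gap = step > pd.Timedelta(days=freq_days)    # NaT compares as False
    gaps = df.loc[is_gap, ['id', 'time']].copy()
    gaps['gap_days'] = step[is_gap].dt.days
    return gaps

# toy rows mirroring float 63069 around the gap in the listing above
toy = pd.DataFrame({
    'id': [63069] * 4,
    'time': pd.to_datetime(['2008-01-28', '2008-01-30', '2008-02-05', '2008-02-07']),
})
print(find_time_gaps(toy))
```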






    Out[44]:





array([    7574,    10206,    10208,    11089,    15703,    15707,
          27069,    27139,    28842,    34159,    34173,    34210,
          34211,    34212,    34223,    34310,    34311,    34312,
          34314,    34315,    34374,    34708,    34709,    34710,
          34714,    34716,    34718,    34719,    34720,    34721,
          34722,    34723,    36530,    36537,    37192,    37194,
          37200,    37213,    37214,    37641,    39204,    40084,
          40273,    40552,    41317,    42519,    43598,    43740,
          43743,    43744,    43746,    43748,    43995,    45748,
          46470,    46471,    46472,    53352,    53358,    53363,
          53364,    53365,    53366,    53367,    53404,    54017,
          54038,    54371,    57939,    57940,    59213,    59215,
          59362,    59363,    59365,    59366,    59367,    59368,
          59369,    59371,    59372,    59373,    59394,    60492,
          62193,    62195,    62196,    62197,    62198,    62199,
          62200,    62201,    62202,    62207,    62552,    62553,
          62554,    62555,    62556,    62557,    62558,    63036,
          63067,    63068,    63069,    63070,    63071,    63072,
          63073,    63074,    63075,    63076,    63926,    63928,
          63929,    63930,    63934,    63935,    70695,    70696,
          70697,    70699,    70952,    71138,    71139,    71140,
          71141,    71142,    71158,    71159,    71160,    71161,
          71162,    72633,    72634,    72638,    73076,    73077,
          73079,    75137,    75138,    75140,    75141,    75142,
          79184,    79185,    79188,    79322,    81824,    81828,
          81851,    81938,    81996,    81997,    82625,    83499,
          88651,    88652,    88659,    88671,    90485,    90513,
          92626,    92627,    92630,    92631,    98672,    98673,
          98674,    98675,    98676,    98679,   101609,   101833,
         109290,   109377,   109382,   109402,   109404,   109544,
         109551,   114553,   114556,   114559,   114575,   114873,
         114874,   114875,   114876,   114917,   114945,   114948,
         116006,   116184,   116187,   116212,   116345,   116463,
         116464,   116465,   116466,   116467,   116468,   126933,
         126935,   126950,   127055,   127406,   127429,   133654,
         133659,   135776,   135780,   135781,   135782,   135784,
         135785,   135786,   135787,   135788,   135789,   135790,
         145074,   147127,  2134712,  2343739,  2444350,  3098671,
        3098678,  3098682, 60073460, 60074440, 60077450, 60150420,
       60454500, 60656200, 60657200, 60658190, 60659110, 60659120,
       60659190, 60659200, 60940960, 60940970, 60941960, 60941970,
       60942960, 60942970, 60943960, 60943970, 60944960, 60944970,
       60945970, 60946960, 60947960, 60947970, 60948960, 60950430, 62321420])



In [45]:

    
# this is a float that explains the need for temperature data
maskid2 = floatsDFAll_resample_timeorder.id == 10208
floatsDFAll_resample_timeorder[maskid2].head()



In [46]:

    
################
# test case 1: take a single entry (southeast corner for valid values)
row_case1 =  pd.DataFrame(data = {'time':'2002-07-13 00:00:00', 'id': 10206, 'lon':74.7083358765, 'lat':5.20833349228},index=[1])
print(row_case1)

################
# test case 2
# take a {time-list, id-list, lon-list, lat-list}, index-list 
# carry out the interpolation
#row_case2 =  pd.DataFrame(data = {'time':['2002-07-13 00:00:00','2002-07-22 00:00:00'] , 'id': [10206, 10206], 'lon':[74.7083358765, 74.6250076294], 'lat':[5.20833349228, 5.29166173935]},index=[2,3])
#print(row_case2)
################
# test case 3: three points, the first and third identical (stored as row_case2, replacing test case 2)
row_case2 =  pd.DataFrame(data = {'time':['2002-07-13 00:00:00', '2002-07-22 00:00:00', '2002-07-13 00:00:00'] , 'id': [10206, 10206, 10206], 'lon':[74.7083358765, 74.6250076294,74.7083358765], 'lat':[5.20833349228, 5.29166173935, 5.20833349228]},index=[1,2,3])
print(row_case2)



####
## get the indices of time, lat, lon
idx_time = ds_resample.indexes['time']
idx_lat = ds_resample.indexes['lat']
idx_lon = ds_resample.indexes['lon']

#### 
#interpolation on the time dimension
time_len = len(row_case2.time.values)
xtime_test = list([ np.datetime64(row_case2.time.values[i]) for i in range(0,time_len)  ] )  # for delta 
print('\n xtime_test \n', xtime_test)

'''caution: cannot do this inside the function get_loc,
see https://github.com/pandas-dev/pandas/issues/3488
'''
itime_nearest = [idx_time.get_loc(xtime_test[i], method='nearest') for i in range(0, time_len)]
print('\n itime_nearest \n', itime_nearest)  # [5, 9, 5]

xtime_nearest =  ds_resample.time[itime_nearest].values  # ['2002-07-14' '2002-07-22' '2002-07-14']
print('\n xtime_nearest\n', xtime_nearest)
print('xtime_nearest', type(xtime_nearest)) # <class 'numpy.ndarray'>

# the time distance in days
delta_xtime = (xtime_test - xtime_nearest) / np.timedelta64(1, 'D')
print('\n delta_xtime in days \n', delta_xtime)  # [-1.  0. -1.]
print(type(delta_xtime))

# step one grid point away from the nearest one, towards the query time
itime_next = [itime_nearest[i]+1 if  delta_xtime[i] >=0  else itime_nearest[i]-1  for i in range(0, time_len) ]
print('\n itime_next \n',itime_next)  # [4, 10, 4]

# find the next coordinate values
xtime_next = ds_resample.time[itime_next].values
print('\n xtime_next \n', xtime_next) # ['2002-07-12' '2002-07-24' '2002-07-12']

# prepare for the trilinear interpolation
base_time = (xtime_next - xtime_nearest) / np.timedelta64(1, 'D')  # [-2.  2. -2.]
print('\n base_time \n', base_time)
w_time = delta_xtime / base_time  # weight of the "next" time slice
print('\n w_time \n', w_time) # [ 0.5  0.   0.5]


#### 
#interpolation on the lat dimension
xlat_test = row_case2.lat.values + 0.06   # base [ 5.20833349  5.29166174 ...]; grid spacing is ~0.083 deg (9 km), offsets of 0.02 and 0.06 serve as two tests
print('\n xlat_test \n', xlat_test)       # xlat_test [ 5.26833349  5.35166174]

ilat_nearest = [idx_lat.get_loc(xlat_test[i], method='nearest') for i in range(0, time_len)]
print('\n ilat_nearest \n', ilat_nearest) # [272, 271]

xlat_nearest = ds_resample.lat[ilat_nearest].values  
print('\n xlat_nearest \n', xlat_nearest) # [ 5.29166174  5.37499762]

delta_xlat = xlat_test - xlat_nearest
print("\n delta_xlat \n",delta_xlat)      #  [-0.02332825 -0.02333588]


# latitude is stored in descending order, so the neighbour at larger latitude is at index - 1
ilat_next = [ilat_nearest[i]-1 if  delta_xlat[i] >=0  else ilat_nearest[i]+1  for i in range(0, time_len) ]
print('\n ilat_next \n', ilat_next)  # [273, 272]

# find the next coordinates value
xlat_next = ds_resample.lat[ilat_next].values
print('\n xlat_next \n', xlat_next)  # [ 5.20833349  5.29166174]

# prepare for the Tri-linear interpolation
w_lat = delta_xlat / (xlat_next - xlat_nearest)
print('\n w_lat \n', w_lat) # [ 0.27995605  0.28002197]

#### 
#interpolation on the lon dimension
xlon_test = row_case2.lon.values + 0.06 # base [74.7083358765, 74.6250076294, ...]; grid spacing is ~0.083 deg (9 km), offsets of 0.02 and 0.06 serve as two tests
print('\n xlon_test \n', xlon_test)  # [ 74.76833588  74.68500763]

ilon_nearest = [idx_lon.get_loc(xlon_test[i], method='nearest') for i in range(0, time_len)]
print('\n ilon_nearest \n', ilon_nearest) # [357, 356]

xlon_nearest = ds_resample.lon[ilon_nearest].values  
print('\n xlon_nearest \n', xlon_nearest) # [ 74.79166412  74.70833588]

delta_xlon = xlon_test - xlon_nearest     
print("\n delta_xlon \n", delta_xlon)     #  [-0.02332825 -0.02332825]

ilon_next = [ilon_nearest[i]+1 if  delta_xlon[i] >=0  else ilon_nearest[i]-1  for i in range(0, time_len) ]
print('\n ilon_next \n',ilon_next)  # [356, 355]

# find the next coordinate values
xlon_next = ds_resample.lon[ilon_next].values
print("\n xlon_next \n", xlon_next) # [ 74.70833588  74.62500763]

# prepare for the Tri-linear interpolation
w_lon = delta_xlon / (xlon_next - xlon_nearest)
print("\n w_lon \n", w_lon) # [ 0.27995605  0.27995605]

####
# local tensor product for trilinear interpolation
# caution: NaN values; stack the eight corner arrays into one 2-D array (8 x n_points) first, then take the weighted sum

# no casting to list needed here, the index variables are already lists
tmp = np.array([
         ds_resample.chlor_a.isel_points(time=itime_nearest, lat=ilat_nearest, lon=ilon_nearest).values,
         ds_resample.chlor_a.isel_points(time=itime_nearest, lat=ilat_nearest, lon=ilon_next).values,
         ds_resample.chlor_a.isel_points(time=itime_nearest, lat=ilat_next, lon=ilon_nearest).values,
         ds_resample.chlor_a.isel_points(time=itime_nearest, lat=ilat_next, lon=ilon_next).values,
         ds_resample.chlor_a.isel_points(time=itime_next, lat=ilat_nearest, lon=ilon_nearest).values,
         ds_resample.chlor_a.isel_points(time=itime_next, lat=ilat_nearest, lon=ilon_next).values,
         ds_resample.chlor_a.isel_points(time=itime_next, lat=ilat_next, lon=ilon_nearest).values,
         ds_resample.chlor_a.isel_points(time=itime_next, lat=ilat_next, lon=ilon_next).values ])

weights =  np.array([(1-w_time)*(1-w_lat)*(1-w_lon), 
                     (1-w_time)*(1-w_lat)*w_lon,
                     (1-w_time)*w_lat*(1-w_lon), 
                     (1-w_time)*w_lat*w_lon,
                        w_time*(1-w_lat)*(1-w_lon),
                        w_time*(1-w_lat)*w_lon,
                        w_time*w_lat*(1-w_lon),
                        w_time*w_lat*w_lon ])


# how to deal with NaN values: replace each missing entry of tmp with the mean of the
# finite entries in its column (the unweighted neighbour values of that point)
# http://stackoverflow.com/questions/18689235/numpy-array-replace-nan-values-with-average-of-columns

print('\n neighbouring tensor used \n', tmp)
'''
 neighbouring tensor used 
 [[        nan  0.181841  ]
 [ 0.245878           nan]
 [        nan         nan]
 [        nan         nan]
 [ 0.19680101         nan]
 [        nan         nan]
 [        nan         nan]
 [ 0.18532801         nan]]
'''

# column mean ignoring NaNs: (0.245878 + 0.19680101 + 0.18532801)/3 = 0.20933567333
col_mean = np.nanmean(tmp, axis=0)
print('\n its mean along axis 0(column) \n', col_mean)  #  [ 0.20933567  0.181841  ]


# filling the missing values.
inds = np.where(np.isnan(tmp))
print('\n nan index\n', inds)
tmp[inds]=np.take(col_mean, inds[1])
print('\n values after the fill \n', tmp)

print('\n weighting tensor used \n', weights)

print("weights.shape", weights.shape) # (8, 3)
print("tmp.shape", tmp.shape)  # (8, 3)

nrow_w, ncol_w = weights.shape
nrow_t, ncol_t = tmp.shape
assert nrow_w == nrow_t, "the row counts of weights and values differ!"
assert ncol_w == ncol_t, "the column counts of weights and values differ!"
print('\n tensor product\n', np.dot(weights[:,0], tmp[:,0]) ) # 0.216701896135 should be [ 0.2167019]

# new interpolation process of the Chl_a
chl_new = np.empty(ncol_w)
for i in range(0, ncol_w, 1):
    chl_new[i] =  np.dot(weights[:,i], tmp[:,i])

print('chl_newInt',  chl_new) #  [ 0.2167019  0.181841   0.2167019]
# validate by 1D array
# val = np.array([0.20933567, 0.245878,  0.20933567,
#                0.20933567, 0.19680101, 0.20933567,
#               0.20933567,0.18532801]) 
# np.dot(val, weights) # 0.21670189702309739


## output an xarray.DataArray of points, see the examples below
# this is how xarray.Dataset works: to get an xarray.DataArray,
# first build an xarray.Dataset, then select the DataArray from it.
# the np.random.randn(3) values are placeholders; chl_new holds the interpolated values
chl_newInt = xr.Dataset({'chl': (['points'], np.random.randn(3))},
                        coords={
                                'time':(['points'],['2002-07-13 00:00:00', '2002-07-22 00:00:00', '2002-07-13 00:00:00']) , 
                                'id': (['points'], [10206, 10206, 10206]), 
                                'lon': (['points'], [74.7083358765, 74.6250076294,74.7083358765]),
                                'lat':(['points'], [5.20833349228, 5.29166173935, 5.20833349228]), 
                                'points': (['points'], [0,1,2])}) # this way the dims is set to point

print('\n',chl_newInt.chl)
'''
### Task: output xarray.dataarray as points
## example 1
arr = xr.DataArray(np.random.rand(4,3), [('time', pd.date_range('2000-01-01', periods=4)), ('space', ['IA', 'IL', 'IN'])] )
print("first example",arr)
print("\n \n")

## example2 -- concrete xr.DataArray
data = np.random.rand(4, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=4)
brr = xr.DataArray(data, coords={'time': times, 'space': locs, 'const': 42, \
                          'ranking': ('space', [1, 2, 3])}, \
             dims=['time', 'space'])
print("second example",brr)
'''

'''
### the target output
#the output generated by the xarray.DataSet => xarray.DataArray
<xarray.DataArray 'chlor_a' (points: 147112)>
array([ nan,  nan,  nan, ...,  nan,  nan,  nan])
Coordinates:
    time     (points) datetime64[ns] 2002-07-04 2002-07-04 2002-07-04 ...
    lon      (points) float64 74.96 66.54 69.88 65.04 69.88 74.96 69.46 ...
    lat      (points) float64 27.96 16.21 13.62 16.04 13.62 27.96 20.04 ...
  * points   (points) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  
### the approach we tried: first, generate an xarray.Dataset;
                           second, generate an xarray.DataArray from the Dataset if needed.
 
# 1. judging from this example, building a DataArray directly seems to work.

# works here
da_test1 = xr.DataArray(np.random.rand(3), dims=['x'],
                        coords={'x': np.array([10206, 10206, 10206]), 'y':  2} )
print('\n da_test1', da_test1)

##########
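# but this fails: a bare 1-D array for the non-dimension coordinate 'y' is
# treated as a new dimension 'y', which is not in dims=['x'];
# passing 'y': ('x', np.array([...])) attaches it along 'x' instead
da_test1 = xr.DataArray(np.random.rand(3), dims=['x'],
                        coords={'x': np.array([10206, 10206, 10206]), 'y':  np.array([10206, 10206, 10206])} )
print('\n da_test1', da_test1)

# raises a ValueError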
##########

##########
# works here
da_test1 = xr.DataArray(np.random.rand(3), dims=['x'],
                        coords={'x': np.array([10206, 10206, 10206]), 'y':  2} )
print('\n da_test1', da_test1)

# da_test1 <xarray.DataArray (x: 3)>
# array([ 0.90386212,  0.21516239,  0.44707272])
# Coordinates:
#   * x        (x) int64 10206 10206 10206
#     y        int64 2
##########

##########
# so we do this instead:
# build an xarray.Dataset first, then select the xarray.DataArray from it
chl_newInt = xr.Dataset({'chl': (['id'], np.random.randn(3))},
                        coords={ 
                           'id':  (['id'], np.array([10206, 10206, 10206])), 
                           'lon': (['id'], np.array([74.7083358765, 74.6250076294,74.7083358765])), 
                           'lat': (['id'], np.array([5.20833349228, 5.29166173935, 5.20833349228]))})
print('chl_newInt', chl_newInt) # generate the xarray.DataSet
print('\n \n')
print('chl_newInt.chl', chl_newInt.chl) # xarray.DataSet contains many xarray.DataArray


#chl_newInt <xarray.Dataset>
#Dimensions:  (id: 3)
#Coordinates:
#    lon      (id) float64 74.71 74.63 74.71
#  * id       (id) int64 10206 10206 10206
#    lat      (id) float64 5.208 5.292 5.208
#Data variables:
#    chl      (id) float64 0.783 -0.9714 -0.3206



#chl_newInt.chl <xarray.DataArray 'chl' (id: 3)>
#array([ 0.78301614, -0.97144208, -0.3206447 ])
#Coordinates:
#    lon      (id) float64 74.71 74.63 74.71
#  * id       (id) int64 10206 10206 10206
#    lat      (id) float64 5.208 5.292 5.208

'''
print()









    



      id       lat        lon                 time
1  10206  5.208333  74.708336  2002-07-13 00:00:00
      id       lat        lon                 time
1  10206  5.208333  74.708336  2002-07-13 00:00:00
2  10206  5.291662  74.625008  2002-07-22 00:00:00
3  10206  5.208333  74.708336  2002-07-13 00:00:00

 xtime_test 
 [numpy.datetime64('2002-07-13T00:00:00'), numpy.datetime64('2002-07-22T00:00:00'), numpy.datetime64('2002-07-13T00:00:00')]

 itime_nearest 
 [5, 9, 5]

 xtime_nearest
 ['2002-07-14T00:00:00.000000000' '2002-07-22T00:00:00.000000000'
 '2002-07-14T00:00:00.000000000']
xtime_nearest <class 'numpy.ndarray'>

 delta_xtime in days 
 [-1.  0. -1.]
<class 'numpy.ndarray'>

 itime_next 
 [4, 10, 4]

 xtime_next 
 ['2002-07-12T00:00:00.000000000' '2002-07-24T00:00:00.000000000'
 '2002-07-12T00:00:00.000000000']

 base_time 
 [-2.  2. -2.]

 w_time 
 [ 0.5  0.   0.5]

 xlat_test 
 [ 5.26833349  5.35166174  5.26833349]

 ilat_nearest 
 [272, 271, 272]

 xlat_nearest 
 [ 5.29166174  5.37499762  5.29166174]

 delta_xlat 
 [-0.02332825 -0.02333588 -0.02332825]

 ilat_next 
 [273, 272, 273]

 xlat_next 
 [ 5.20833349  5.29166174  5.20833349]

 w_lat 
 [ 0.27995605  0.28002197  0.27995605]

 xlon_test 
 [ 74.76833588  74.68500763  74.76833588]

 ilon_nearest 
 [357, 356, 357]

 xlon_nearest 
 [ 74.79166412  74.70833588  74.79166412]

 delta_xlon 
 [-0.02332825 -0.02332825 -0.02332825]

 ilon_next 
 [356, 355, 356]

 xlon_next 
 [ 74.70833588  74.62500763  74.70833588]

 w_lon 
 [ 0.27995605  0.27995605  0.27995605]

 neighbouring tensor used 
 [[ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]]

 its mean along axis 0(column) 
 [ nan  nan  nan]

 nan index
 (array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7,
       7]), array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1,
       2]))

 values after the fill 
 [[ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]
 [ nan  nan  nan]]

 weighting tensor used 
 [[ 0.25923164  0.51841582  0.25923164]
 [ 0.10079033  0.20156221  0.10079033]
 [ 0.10079033  0.20162813  0.10079033]
 [ 0.0391877   0.07839385  0.0391877 ]
 [ 0.25923164  0.          0.25923164]
 [ 0.10079033  0.          0.10079033]
 [ 0.10079033  0.          0.10079033]
 [ 0.0391877   0.          0.0391877 ]]
weights.shape (8, 3)
tmp.shape (8, 3)

 tensor product
 nan
chl_newInt [ nan  nan  nan]

 <xarray.DataArray 'chl' (points: 3)>
array([-2.3970807 ,  0.54459705,  2.73304803])
Coordinates:
    id       (points) int64 10206 10206 10206
  * points   (points) int64 0 1 2
    lat      (points) float64 5.208 5.292 5.208
    lon      (points) float64 74.71 74.63 74.71
    time     (points) <U19 '2002-07-13 00:00:00' '2002-07-22 00:00:00' ...







    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:703: RuntimeWarning: Mean of empty slice
  warnings.warn("Mean of empty slice", RuntimeWarning)
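The cell above finds the nearest grid node with `get_loc(method='nearest')` and steps to the neighbour on the query side. It then weights the 8 surrounding values with a tensor product. The NumPy sketch below implements the same idea under one assumption: the chl-a cube is a plain array `values[time, lat, lon]` with time expressed in days. It uses `np.searchsorted` to find the bracketing pair directly, and the helper names `bracket` and `trilinear` are hypothetical. Newer pandas and xarray releases no longer provide `get_loc(..., method='nearest')` or `isel_points`, so a plain-array version is easier to test in isolation.

```python
import numpy as np

def bracket(coords, x):
    """Indices (i0, i1) of the grid nodes enclosing x and the weight w such that
    x = (1 - w) * coords[i0] + w * coords[i1]. Works for ascending or descending
    1-D coordinates; points outside the grid get w < 0 or w > 1 (extrapolation)."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    ascending = coords[-1] > coords[0]
    c = coords if ascending else coords[::-1]
    j = np.clip(np.searchsorted(c, x), 1, len(c) - 1)   # right-hand neighbour
    i0, i1 = j - 1, j
    w = (x - c[i0]) / (c[i1] - c[i0])
    if not ascending:                                   # map back to the original order
        i0, i1 = len(c) - 1 - i0, len(c) - 1 - i1
    return i0, i1, w

def trilinear(values, t_coords, lat_coords, lon_coords, t, lat, lon):
    """Trilinear interpolation of values[time, lat, lon] at query points.
    A NaN at any of the 8 corners makes that result NaN."""
    t0, t1, wt = bracket(t_coords, t)
    a0, a1, wa = bracket(lat_coords, lat)
    o0, o1, wo = bracket(lon_coords, lon)
    # corners in the same (time, lat, lon) layout as `tmp` in the notebook cell
    corners = np.array([values[t0, a0, o0], values[t0, a0, o1],
                        values[t0, a1, o0], values[t0, a1, o1],
                        values[t1, a0, o0], values[t1, a0, o1],
                        values[t1, a1, o0], values[t1, a1, o1]])
    weights = np.array([(1-wt)*(1-wa)*(1-wo), (1-wt)*(1-wa)*wo,
                        (1-wt)*wa*(1-wo),     (1-wt)*wa*wo,
                        wt*(1-wa)*(1-wo),     wt*(1-wa)*wo,
                        wt*wa*(1-wo),         wt*wa*wo])
    # column-wise dot product: same as np.dot(weights[:, i], corners[:, i]) per point i
    return np.einsum('ij,ij->j', weights, corners)

# toy grid check: a function linear in each variable is reproduced exactly
t_coords = np.arange(0.0, 10.0, 2.0)             # days, ascending, 2-day step
lat_coords = np.linspace(10.0, 0.0, 11)          # descending, like ds_resample.lat
lon_coords = np.linspace(50.0, 60.0, 11)         # ascending
T, LAT, LON = np.meshgrid(t_coords, lat_coords, lon_coords, indexing='ij')
values = 2 * T + 3 * LAT - LON

tq = np.array([3.0, 5.5])
latq = np.array([4.3, 7.9])
lonq = np.array([52.25, 58.1])
print(trilinear(values, t_coords, lat_coords, lon_coords, tq, latq, lonq))
print(2 * tq + 3 * latq - lonq)
```

The interpolant is multilinear, so the two printed rows should agree. The `einsum` call replaces the explicit `for` loop over `np.dot` in the cell.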


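The `RuntimeWarning: Mean of empty slice` above comes from `np.nanmean` on columns whose 8 neighbours are all NaN. Those columns stay NaN after the fill, which is why every interpolated value printed as `nan`. The sketch below performs the same column-mean fill but reports all-NaN columns explicitly instead of relying on the warning. `fill_nan_with_column_mean` is a hypothetical helper. The example matrix is the "neighbouring tensor" written in the cell's docstring, plus one all-NaN column.

```python
import warnings
import numpy as np

def fill_nan_with_column_mean(tmp):
    """Fill NaNs in each column (one column per query point) with the mean of
    that column's finite entries. Returns the filled copy and a boolean flag
    per column that is True where every neighbour was NaN (still NaN after)."""
    tmp = np.array(tmp, dtype=float)              # copy, leave the input untouched
    all_nan = np.isnan(tmp).all(axis=0)
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', RuntimeWarning)   # "Mean of empty slice"
        col_mean = np.nanmean(tmp, axis=0)
    rows, cols = np.where(np.isnan(tmp))
    tmp[rows, cols] = col_mean[cols]
    return tmp, all_nan

nan = np.nan
tmp_example = np.array([[nan,        0.181841, nan],
                        [0.245878,   nan,      nan],
                        [nan,        nan,      nan],
                        [nan,        nan,      nan],
                        [0.19680101, nan,      nan],
                        [nan,        nan,      nan],
                        [nan,        nan,      nan],
                        [0.18532801, nan,      nan]])
filled, all_nan = fill_nan_with_column_mean(tmp_example)
print(filled[0])      # first row after the fill
print(all_nan)        # only the third column had no finite neighbour
```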

In [47]:

    
### output benchmark
### output the dataset ds_9day and output the dataframe  
#ds_9day.to_netcdf("ds_9day.nc")

#row_case4 = pd.DataFrame(data={'time':list(floatsDFAll_9Dtimeorder.time), 'lon':list(floatsDFAll_9Dtimeorder.lon), 'lat':list(floatsDFAll_9Dtimeorder.lat), 'id':list(floatsDFAll_9Dtimeorder.id) } )
##print('\n before dropping nan \n', row_case4)
## process to drop nan in any of the columns [id], [lat], [lon], [time]
#row_case4 = row_case4.dropna(subset=['id', 'lat', 'lon', 'time'], how = 'any') # these four fields are critical
#row_case4.to_csv("row_case4.csv")
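The commented-out cell keeps only rows where all four critical fields (`id`, `lat`, `lon`, `time`) are present, as does the interpolation cell below. The toy example below illustrates what `dropna(subset=..., how='any')` keeps; the values are made up. A single NaN forces the `id` column to float, so the sketch also casts it back to integer. That cast is an optional extra step, not something the notebook does.

```python
import numpy as np
import pandas as pd

# toy rows; the values are illustrative only
row_case4 = pd.DataFrame({
    'id':   [10206, 10206, np.nan, 10208],
    'lat':  [5.21, np.nan, 5.29, 13.89],
    'lon':  [74.71, 74.63, 74.63, 69.55],
    'time': pd.to_datetime(['2002-07-13', '2002-07-15', '2002-07-17', None]),
})
# how='any': drop a row if ANY of the four critical fields is missing
clean = row_case4.dropna(subset=['id', 'lat', 'lon', 'time'], how='any')
clean = clean.astype({'id': int})   # optional: restore integer float ids
print(clean)
```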



In [48]:

    
#### the approach using Linear Interpolations with 3D tensors
# Keyword Arguments
# approach 1 
# the indexer components might each be ordered differently
#############
# def sel_points_multilinear(dset, dims='points', out = 'str', **indexers):
## test case
# def sel_points_multilinear(ds_9day, dims = 'points', out ='chlor_a', 
#                            time = list(['2002-07-13 00:00:00']),
#                            lat = list([5]),  lon = list([70]) ):
############
# e.g. time-ascending, lat-descending, need to tell 'time' from 'lat'
# use different parameters for inputs
## approach 2 
## from dataframe to dataset
## input:
##     dframe: columns {time}, {lat}, {lon}, {id}; bounds-aware
##     dset:   carry out the interpolation use dset data structure and its component 
## output:
##     a list or dataframe with chl_newInt.chl

# remember to take log_e instead of log_10

# clean up this notebook: separate, clean, and take notes

# test case 4: use the full real data
#del(interpolate)
#del(sel_points_multilinear)
# from dir import src  # to call src.functions
from tools.time_lat_lon_interpolate import interpolate
import importlib
importlib.reload(interpolate)

floatsDF_tmp = floatsDFAll_resample_timeorder

row_case4 = pd.DataFrame(data={'time':list(floatsDF_tmp.time), 'lon':list(floatsDF_tmp.lon), 'lat':list(floatsDF_tmp.lat), 'id':list(floatsDF_tmp.id) } )
# process to drop nan in any of the columns [id], [lat], [lon], [time]
row_case4 = row_case4.dropna(subset=['id', 'lat', 'lon', 'time'], how = 'any') # these four fields are critical
# print('\n after dropping nan \n', row_case4)
result_out4 = interpolate.sel_points_multilinear_time_lat_lon(ds_resample, row_case4, dims = 'points', col_name ='chlor_a')
print('\n *** after the interpolation *** \n', result_out4)
# important: keep the id column; the function modifies the dataframe in a bounds-aware way and can drop rows, so the output may have fewer points than the input
print('\n *** this two length should be equal *** %d >= %d?' %(len(row_case4.index), len(result_out4.points) ) )









    



 *** after the interpolation *** 
 <xarray.Dataset>
Dimensions:  (points: 14273)
Coordinates:
    id       (points) int64 10206 10208 11089 15703 27069 28842 34159 34210 ...
  * points   (points) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
    lat      (points) float64 16.23 13.89 16.35 13.9 20.17 18.88 12.55 6.477 ...
    lon      (points) float64 66.33 69.55 64.68 69.58 68.74 60.69 58.91 ...
    time     (points) datetime64[ns] 2002-07-04 2002-07-04 2002-07-04 ...
Data variables:
    chlor_a  (points) float64 nan nan nan nan nan nan nan 0.2092 0.1224 nan ...

 *** this two length should be equal *** 14340 >= 14273?






    



/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:703: RuntimeWarning: Mean of empty slice
  warnings.warn("Mean of empty slice", RuntimeWarning)
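The printout reports 14340 input rows and 14273 interpolated points, so some rows are dropped inside `interpolate.sel_points_multilinear_time_lat_lon`. Its source is not shown here. The sketch below assumes only the simplest rule: keep rows whose time, lat, and lon all fall within the extent of the grid coordinates. The real function may be stricter near the edges. `inside_grid_mask` is a hypothetical name. Keeping `id` alongside lets the surviving points be matched back to their floats.

```python
import numpy as np
import pandas as pd

def inside_grid_mask(df, time_coords, lat_coords, lon_coords):
    """True for rows whose time, lat and lon all lie within the extent of the
    grid coordinates (hypothetical helper; ascending or descending axes)."""
    def within(values, coords):
        return (values >= np.min(coords)) & (values <= np.max(coords))
    return (within(df['time'].to_numpy(), time_coords)
            & within(df['lat'].to_numpy(), lat_coords)
            & within(df['lon'].to_numpy(), lon_coords))

# toy grid: 2-day time steps, descending latitude, ascending longitude
time_coords = pd.date_range('2002-07-04', periods=5, freq='2D').to_numpy()
lat_coords = np.array([28.0, 20.0, 10.0, 0.0])
lon_coords = np.array([45.0, 60.0, 75.0])

pts = pd.DataFrame({
    'id':   [10206, 10208, 11089],
    'time': pd.to_datetime(['2002-07-06', '2002-07-06', '2002-07-01']),
    'lat':  [16.2, 30.5, 16.4],     # 30.5 lies north of the grid
    'lon':  [66.3, 69.6, 64.7],
})
mask = inside_grid_mask(pts, time_coords, lat_coords, lon_coords)
kept = pts[mask]
print(len(pts), '>=', len(kept))
print(kept)
```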



In [171]:

    
plt.close("all")



In [ ]:



In [58]:

    
### output data for fitting LDS
result_out4.to_dataframe().to_csv("df_LDS_2d.csv")
print(pd.read_csv("df_LDS_2d.csv"))

### plot float id 135776, which will be fit by LDS
# pass ax explicitly: DataFrame.plot otherwise opens a second, new figure
fig, ax = plt.subplots(figsize=(8, 6))
tmp_df_result4[tmp_df_result4.id == 135776].plot(x='time', y='chlor_a', ax=ax, title=('id - %d' % 135776))
plt.show()
plt.close("all")









    



       points   chlor_a       id        lat        lon        time
0           0       NaN    10206  16.229625  66.330375  2002-07-04
1           1       NaN    10208  13.891875  69.552375  2002-07-04
2           2       NaN    11089  16.354375  64.683750  2002-07-04
3           3       NaN    15703  13.903250  69.583125  2002-07-04
4           4       NaN    27069  20.169750  68.737500  2002-07-04
5           5       NaN    28842  18.878875  60.694625  2002-07-04
6           6       NaN    34159  12.548125  58.914250  2002-07-04
7           7  0.209177    34210   6.476750  56.925000  2002-07-04
8           8  0.122430    34211   8.602375  67.929125  2002-07-04
9           9       NaN    34212   6.232000  64.750250  2002-07-04
10         10       NaN    34708  10.167500  59.691500  2002-07-04
11         11       NaN    34710  12.933625  49.905250  2002-07-04
12         12       NaN    34714  13.594750  63.649625  2002-07-04
13         13       NaN    34716   7.491000  65.384500  2002-07-04
14         14       NaN    34718  16.328750  72.396375  2002-07-04
15         15       NaN    34719  17.764375  70.946250  2002-07-04
16         16       NaN    34720  14.825375  69.187000  2002-07-04
17         17       NaN    34721  17.190250  65.375250  2002-07-04
18         18       NaN    34722  11.729625  70.472250  2002-07-04
19         19       NaN    34723  16.749625  66.254750  2002-07-04
20         20  0.236013  2134712   9.742125  63.583375  2002-07-04
21         21       NaN    10206  16.162625  66.489250  2002-07-06
22         22       NaN    10208  13.612125  69.713375  2002-07-06
23         23       NaN    11089  16.257125  64.871250  2002-07-06
24         24       NaN    15703  13.629125  69.731500  2002-07-06
25         25       NaN    27069  20.178125  69.169625  2002-07-06
26         26       NaN    28842  18.739000  60.853750  2002-07-06
27         27       NaN    34159  12.652750  59.301875  2002-07-06
28         28       NaN    34210   6.190250  56.802375  2002-07-06
29         29       NaN    34211   8.339625  68.264250  2002-07-06
...       ...       ...      ...        ...        ...         ...
14243   14243       NaN   114945  11.688000  65.274375  2016-06-16
14244   14244       NaN   147127  13.198500  66.470375  2016-06-16
14245   14245       NaN   114873   5.227750  52.343750  2016-06-18
14246   14246       NaN   114917  11.139250  72.663375  2016-06-18
14247   14247       NaN   114945  11.392000  65.253000  2016-06-18
14248   14248       NaN   147127  12.935625  66.789875  2016-06-18
14249   14249       NaN   114873   5.066000  52.513000  2016-06-20
14250   14250       NaN   114917  11.015000  72.678625  2016-06-20
14251   14251       NaN   114945  11.008375  65.172000  2016-06-20
14252   14252       NaN   147127  12.696000  67.062875  2016-06-20
14253   14253       NaN   114873   5.453000  52.665500  2016-06-22
14254   14254       NaN   114917  10.841875  72.786125  2016-06-22
14255   14255       NaN   114945  10.697125  65.110500  2016-06-22
14256   14256       NaN   147127  12.434625  67.440750  2016-06-22
14257   14257       NaN   114873   6.321500  51.996125  2016-06-24
14258   14258       NaN   114917  10.595625  72.947000  2016-06-24
14259   14259       NaN   114945  10.395375  65.087000  2016-06-24
14260   14260       NaN   147127  12.158125  67.803750  2016-06-24
14261   14261       NaN   114873   7.966125  51.618750  2016-06-26
14262   14262       NaN   114917  10.492250  73.100625  2016-06-26
14263   14263       NaN   114945  10.138125  65.141750  2016-06-26
14264   14264       NaN   147127  11.973875  68.088875  2016-06-26
14265   14265       NaN   114873   9.434875  53.205125  2016-06-28
14266   14266       NaN   114917  10.388375  73.351375  2016-06-28
14267   14267       NaN   114945   9.948125  65.369750  2016-06-28
14268   14268       NaN   147127  11.787375  68.386125  2016-06-28
14269   14269       NaN   114873   9.218000  54.750667  2016-06-30
14270   14270       NaN   114917  10.354000  73.458667  2016-06-30
14271   14271       NaN   114945   9.877750  65.566250  2016-06-30
14272   14272       NaN   147127  11.679250  68.542250  2016-06-30

[14273 rows x 6 columns]
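Every row shown in the frame above has NaN in `chlor_a`. A per-float summary of how many rows have no chl-a value could help decide which floats have enough data for an LDS fit. The sketch below is one way to compute it. It assumes only the `id` and `chlor_a` column names used elsewhere in this notebook. `chl_gap_summary` is a hypothetical helper, and the toy values are invented for illustration.

```python
import numpy as np
import pandas as pd


def chl_gap_summary(df, id_col="id", value_col="chlor_a"):
    """Per-float row count and fraction of rows whose value_col is NaN."""
    n_rows = df.groupby(id_col).size()
    # isna() gives a boolean Series; its per-group mean is the NaN fraction
    nan_fraction = df[value_col].isna().groupby(df[id_col]).mean()
    summary = pd.DataFrame({"n_rows": n_rows, "nan_fraction": nan_fraction})
    return summary.sort_values("nan_fraction", ascending=False)


# toy frame with the same column names as the printed output (values invented)
toy = pd.DataFrame({
    "chlor_a": [np.nan, 0.3, np.nan, np.nan, 1.2, 0.8],
    "id": [10206, 10206, 10208, 10208, 11089, 11089],
    "time": pd.to_datetime(["2002-07-04", "2002-07-06"] * 3),
})
print(chl_gap_summary(toy))
```

On the real data, this would be called as `chl_gap_summary(result_out4.to_dataframe())`.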






    





In [50]:

    
id_set = result_out4.to_dataframe().id.unique()
id_set









    Out[50]:





array([   10206,    10208,    11089,    15703,    27069,    28842,
          34159,    34210,    34211,    34212,    34708,    34710,
          34714,    34716,    34718,    34719,    34720,    34721,
          34722,    34723,  2134712,    34311,    34312,    34709,
          34314,    34315,    15707,    27139,    36530,    34223,
          36537,    34374,    43743,    43740,  2343739,     7574,
          43746,    43744,    43995,    53363,    43748,    53364,
          53365,    53366,    46472,    53367,    46470,    46471,
          53358,    54371,    53352,    57939,    57940,    59371,
          59372,    59373,    59362,    43598,    62200,    62201,
          62202,    62556,    62557,    62558,    62552,    62196,
          62555,    62198,    63036,    62195,    62193,    62553,
          59365,    59367,    70697,    70699,    59363,    70695,
          59366,    70696,    62199,    62554,    62197,    40552,
          63068,    63069,    63070,    63072,    63074,    63075,
          63076,    63926,    63934,    63929,    63928,    63073,
          63935,    63071,    63930,    53404,    59368,    59369,
          63067,    73076,    40273,    72634,    73077,    72633,
          75142,    72638,    71160,    59213,    59215,    59394,
          71158,    71159,    71162,    73079,    71161,    45748,
          71138,    75137,    75138,    75141,    71139,    71140,
          71141,    71142,    75140,    70952,    60492,    79184,
          79188,    79322,    79185,    92626,    92627,    92630,
          92631,    81938,    81996,    81997,    62207,    34173,
          98679,  3098678,  3098682,  3098671,    98674,    98672,
          98673,    83499,    98675,    90485,    98676,    90513,
          81828,    81824,    37214,    39204,    88651,    37194,
          88659,    88671,    37213,    88652,    40084,    37192,
          37200,    54038,    54017,    82625,    81851,   109377,
         109404,   109382,   109402,    37641,    42519,    41317,
         101609, 60656200, 60941960, 60942970, 60946960, 60947970,
         116006, 60077450, 60073460, 60074440, 60657200,   116463,
         116467,   116468,   116464,   116465,   116187,   116466,
       60658190, 60454500,   109544, 60659110, 60659120, 60942960,
       60944970, 60659190, 60659200, 60940960, 60947960,   109551,
       60943960, 60945970, 60948960, 60950430,   135781,   135782,
         135784,   135785,   135786,   135787,   135788,   135789,
         135790,   133654,   133659,   126950, 60944960, 60940970,
       60941970, 60943970,   109290,   116184,   126933,   135780,
         126935,   135776,   114553,   114556,   114559,   114575,
         116212,   116345,   114917,   114948,   114945,   145074,
         101833,   127055,   114873,   114874,   147127,   114875,
         114876, 60150420,   127429,   127406, 62321420])
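The plotting cell below selects each float with `tmp_df_result4[tmp_df_result4.id == i]`, which scans the whole frame once per id. Its output also contains matplotlib's "identical left==right" UserWarning, which most likely comes from floats that have a single time stamp (a zero-width x-range). The sketch below does one `groupby` instead. It counts distinct time stamps per float and splits the ids into those worth plotting and those that are too short. `floats_by_sample_count` and the `min_times=2` threshold are assumptions for illustration, not part of the original workflow.

```python
import pandas as pd


def floats_by_sample_count(df, id_col="id", time_col="time", min_times=2):
    """Split float ids by whether they have at least `min_times` distinct time stamps."""
    n_times = df.groupby(id_col)[time_col].nunique()
    plottable = n_times[n_times >= min_times].index.tolist()
    too_short = n_times[n_times < min_times].index.tolist()
    return plottable, too_short


# toy frame: float 34173 has a single time stamp (values invented)
toy = pd.DataFrame({
    "id": [10206, 10206, 10206, 34173, 114873, 114873],
    "time": pd.to_datetime(["2002-07-04", "2002-07-06", "2002-07-08",
                            "2002-07-04", "2016-06-28", "2016-06-30"]),
})
plottable, too_short = floats_by_sample_count(toy)
print(plottable, too_short)
```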



In [51]:

    
### chl-a concentration plots for each float ###
print("\n ****** chl-a concentration plots for each float ****** \n")
tmp_df_result4 = result_out4.to_dataframe()

# DataFrame.plot opens a new figure on every call, so pass figsize to each plot
# instead of creating an extra, empty figure with plt.figure() before the loop
for i in id_set:
    tmp_df_result4[tmp_df_result4.id == i].plot(x='time', y='chlor_a', title=('id - %d' % i), figsize=(8, 6))
    plt.show()
plt.close("all")

 ****** chl-a concentration plots for each float ****** 

[Figures: one plot of chlor_a against time for each float in id_set, titled "id - <float id>"; the images are not preserved in this export.]

/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/axes/_base.py:2782: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=732915.0, right=732915.0
  'left=%s, right=%s') % (left, right))
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/axes/_base.py:2782: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=735047.0, right=735047.0
  'left=%s, right=%s') % (left, right))
/Users/vyan2000/local/miniconda3/envs/condapython3/lib/python3.5/site-packages/matplotlib/axes/_base.py:2782: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=735349.0, right=735349.0
  'left=%s, right=%s') % (left, right))
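With roughly 250 ids in `id_set`, one figure per float produces a very long output. The sketch below offers a more compact alternative: it draws small multiples, one panel per float on a shared figure, and hides any unused panels. `plot_floats_grid`, the grid size, and the panel dimensions are illustrative choices, not the notebook's method. The toy values are invented.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_floats_grid(df, ids, ncols=4, id_col="id", time_col="time", value_col="chlor_a"):
    """Draw one small chl-a panel per float id on a single figure."""
    ids = list(ids)
    nrows = max(1, int(np.ceil(len(ids) / ncols)))
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 2.5 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, float_id in zip(flat_axes, ids):
        sub = df[df[id_col] == float_id].sort_values(time_col)
        ax.plot(sub[time_col], sub[value_col], marker=".")
        ax.set_title("id - %d" % float_id)
        ax.tick_params(axis="x", labelrotation=30)
    # hide the panels left over when len(ids) is not a multiple of ncols
    for ax in flat_axes[len(ids):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


toy = pd.DataFrame({
    "id": [10206, 10206, 114873, 114873, 147127, 147127],
    "time": pd.to_datetime(["2002-07-04", "2002-07-06", "2016-06-28",
                            "2016-06-30", "2016-06-26", "2016-06-28"]),
    "chlor_a": [0.4, 0.5, np.nan, 1.1, 0.9, 0.7],
})
fig, axes = plot_floats_grid(toy, [10206, 114873, 147127], ncols=2)
```

In the notebook, a call such as `plot_floats_grid(tmp_df_result4, id_set[:16])` would show the first 16 floats on one figure. Slicing `id_set` in batches keeps each figure readable.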



In [ ]:

	id	lat	lon	temp	ve	vn	spd	var_lat	var_lon	var_tmp
count	2.147732e+07	2.131997e+07	2.131997e+07	1.986179e+07	2.129142e+07	2.129142e+07	2.129142e+07	2.147732e+07	2.147732e+07	2.147732e+07
mean	1.765662e+06	-2.263128e+00	2.124412e+02	1.986121e+01	2.454172e-01	4.708192e-01	2.613427e+01	7.326258e+00	7.326555e+00	7.522298e+01
std	9.452835e+06	3.401115e+01	9.746941e+01	8.339498e+00	2.525050e+01	2.052160e+01	1.939087e+01	8.527853e+01	8.527851e+01	2.637454e+02
min	2.578000e+03	-7.764700e+01	0.000000e+00	-1.685000e+01	-2.916220e+02	-2.601400e+02	0.000000e+00	5.268300e-07	-3.941600e-02	1.001300e-03
25%	4.897500e+04	-3.186000e+01	1.490720e+02	1.437300e+01	-1.411400e+01	-1.044700e+01	1.290300e+01	4.366500e-06	7.512600e-06	1.435700e-03
50%	7.141300e+04	-4.920000e+00	2.153940e+02	2.214400e+01	-5.560000e-01	1.970000e-01	2.176700e+01	8.833600e-06	1.495800e-05	1.691700e-03
75%	1.094330e+05	2.756000e+01	3.064370e+02	2.688900e+01	1.356100e+01	1.109300e+01	3.405900e+01	1.833300e-05	3.627900e-05	2.294200e-03
max	6.399288e+07	8.989900e+01	3.600000e+02	4.595000e+01	4.417070e+02	2.783220e+02	4.421750e+02	1.000000e+03	1.000000e+03	1.000000e+03
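In the summary above, `var_lat`, `var_lon` and `var_tmp` all reach a maximum of exactly 1.000000e+03. In the frame that follows, `var_tmp` equals 1000 on rows where `temp` is NaN. This suggests (an assumption, not confirmed by the dataset documentation here) that 1000 is a fill value meaning "variance unavailable" rather than a real variance. If so, it would inflate the mean and std shown for these columns. The sketch below masks that value to NaN before computing statistics. The two toy rows reuse values from rows 2556 and 7668 of the printed frame.

```python
import numpy as np
import pandas as pd

# Assumption: 1000 in the var_* columns is a fill value, not a measured variance.
VAR_FILL = 1000.0
VAR_COLS = ["var_lat", "var_lon", "var_tmp"]


def mask_variance_fill(df, cols=VAR_COLS, fill=VAR_FILL):
    """Return a copy of df with the assumed fill value replaced by NaN in the variance columns."""
    out = df.copy()
    present = [c for c in cols if c in out.columns]
    out[present] = out[present].mask(out[present] == fill)
    return out


toy = pd.DataFrame({
    "temp": [np.nan, 27.954125],
    "var_lat": [0.000724, 0.000076],
    "var_lon": [0.002281, 0.000150],
    "var_tmp": [1000.0, 0.003688],
})
clean = mask_variance_fill(toy)
print(clean.describe().loc[["count", "max"]])
```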

	id	time	ve	temp	lon	vn	var_lat	var_lon	lat	spd	var_tmp
0	7574	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2556	10206	2002-07-04	13.064500	NaN	66.330375	-8.650875	0.000724	0.002281	16.229625	15.915625	1000.000000
5112	10208	2002-07-04	8.505125	NaN	69.552375	-20.755000	0.000052	0.000093	13.891875	22.603500	1000.000000
7668	11089	2002-07-04	12.168000	27.954125	64.683750	-7.286875	0.000076	0.000150	16.354375	14.582375	0.003688
10224	15703	2002-07-04	8.685875	28.552250	69.583125	-20.195125	0.000053	0.000098	13.903250	22.251500	0.074665
12780	15707	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
15336	27069	2002-07-04	26.958750	29.012000	68.737500	7.891750	0.000058	0.000108	20.169750	28.453250	0.001705
17892	27139	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20448	28842	2002-07-04	10.499125	27.701750	60.694625	-0.333875	0.000118	0.000254	18.878875	23.813750	0.003176
23004	34159	2002-07-04	27.354250	NaN	58.914250	4.591750	0.000063	0.000119	12.548125	28.099500	1000.000000
25560	34173	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
28116	34210	2002-07-04	-9.666750	26.694875	56.925000	-13.510125	0.000081	0.000168	6.476750	17.964875	0.003670
30672	34211	2002-07-04	20.618125	28.278000	67.929125	-12.301625	0.000047	0.000084	8.602375	24.182875	0.003506
33228	34212	2002-07-04	14.641875	28.470750	64.750250	13.108750	0.000050	0.000090	6.232000	20.799125	0.003616
35784	34223	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
38340	34310	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
40896	34311	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
43452	34312	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
46008	34314	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
48564	34315	2002-07-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
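Several floats in the frame above (7574, 15707, 27139, ...) have no position at all on 2002-07-04. The LDS-fitting framework described at the top of the notebook drops such rows with `dropna(subset=['id', 'lat', 'lon', 'time'], how='any')` before interpolation. The sketch below applies that step to a few rows copied from the printed frame. Because only the critical fields are checked, a missing `temp` alone does not discard a fix. `drop_incomplete_fixes` is a hypothetical wrapper name.

```python
import numpy as np
import pandas as pd

CRITICAL = ["id", "lat", "lon", "time"]


def drop_incomplete_fixes(df, critical=CRITICAL):
    """Keep only rows where every critical field is present."""
    return df.dropna(subset=critical, how="any")


# rows taken from the printed frame for 2002-07-04
toy = pd.DataFrame({
    "id": [7574, 10206, 11089, 15707],
    "time": pd.to_datetime(["2002-07-04"] * 4),
    "lat": [np.nan, 16.229625, 16.354375, np.nan],
    "lon": [np.nan, 66.330375, 64.683750, np.nan],
    "temp": [np.nan, np.nan, 27.954125, np.nan],
})
kept = drop_incomplete_fixes(toy)
print(kept.id.tolist())  # [10206, 11089]
```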