Build local cache file from Argo data sources - first in a series of Notebooks

Execute commands to pull data from the Internet into a local HDF cache file so that we can better interact with the data

If this Notebook is running on a development system (where "pip install oxyfloat" has not been executed), oxyfloat's parent directory needs to be added to the Python search path.


In [1]:
import sys
sys.path.insert(0, '../')

Import the ArgoData class and instantiate an ArgoData object (ad) with verbosity set to 2 so that we get INFO messages.


In [2]:
from oxyfloat import ArgoData
ad = ArgoData(verbosity=2)

You can now explore what methods the ad object has by typing "ad." in a cell and pressing the tab key. One of the methods is get_oxy_floats_from_status(); to see what it does, select it and press shift-tab with the cursor in the parentheses of "ad.get_oxy_floats_from_status()". Let's get a list of all the oxygen floats that have been out for at least 340 days and print the length of that list.


In [3]:
%%time
floats340 = ad.get_oxy_floats_from_status(age_gte=340)
print('{} floats at least 340 days old'.format(len(floats340)))


INFO:root:Reading data from http://argo.jcommops.org/FTPRoot/Argo/Status/argo_all.txt
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): argo.jcommops.org
563 floats at least 340 days old
CPU times: user 309 ms, sys: 177 ms, total: 486 ms
Wall time: 35.1 s
../oxyfloat/ArgoData.py:249: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block2_values] [items->['WMO', 'TELECOM', 'TTYPE', 'MY_ID', 'SERIAL_NO', 'DATE0', 'NOTIF_DATE', 'SHIP', 'CRUISE', 'DATE_', 'MODEL', 'FULL_NAME', 'EMAIL', 'PROGRAM', 'COUNTRY']]

  self._put_df(self._status_to_df(), self._STATUS)

If this is the first time you've executed the cell it will take a minute or so to read the Argo status information from the Internet (the PerformanceWarning can be ignored; for this small table it doesn't matter much).

Once the status information is read it is cached locally, and further calls to get_oxy_floats_from_status() will execute much faster. To demonstrate, let's count all the oxygen-labeled floats that have been out for at least 2 years (730 days).


In [4]:
%%time
floats730 = ad.get_oxy_floats_from_status(age_gte=730)
print('{} floats at least 730 days old'.format(len(floats730)))


400 floats at least 730 days old
CPU times: user 32 ms, sys: 2 ms, total: 34 ms
Wall time: 71.7 ms
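The speedup comes from a read-through cache: the first call downloads the status table and writes it to the local file, and later calls read from disk instead of the network. A minimal sketch of the pattern follows; oxyfloat itself caches to an HDF file, but pandas' pickle I/O is used here only to keep the example free of the PyTables dependency, and the file name and fetch function are illustrative.

```python
import os
import tempfile
import pandas as pd

# Illustrative cache location (oxyfloat uses its own HDF cache file).
cache_file = os.path.join(tempfile.gettempdir(), 'argo_status_cache.pkl')
if os.path.exists(cache_file):
    os.remove(cache_file)  # start cold so both code paths are exercised

def get_status(fetch):
    """Return the status table, fetching remotely only on a cache miss."""
    if os.path.exists(cache_file):
        return pd.read_pickle(cache_file)  # cache hit: fast local read
    df = fetch()                           # cache miss: slow remote read
    df.to_pickle(cache_file)               # save for subsequent calls
    return df

# Stand-in for the remote download of the Argo status table:
fetch = lambda: pd.DataFrame({'WMO': ['1900650'], 'AGE_DAYS': [340]})
first = get_status(fetch)   # miss: fetches and writes the cache
second = get_status(fetch)  # hit: served from the local file
```

Both calls return the same table; only the first one pays the download cost, which is why the wall time drops from tens of seconds to milliseconds.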

Now let's find the Data Assembly Center URL for each of the floats in our list. (The returned dictionary of URLs is also locally cached.)


In [5]:
%%time
dac_urls = ad.get_dac_urls(floats340)
print(len(dac_urls))


INFO:root:Reading data from ftp://ftp.ifremer.fr/ifremer/argo/ar_index_global_meta.txt
562
CPU times: user 782 ms, sys: 15 ms, total: 797 ms
Wall time: 4.84 s
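Conceptually, get_dac_urls() maps each WMO number to the URL of the Data Assembly Center directory that serves that float's files, derived from the global metadata index read above. A hedged sketch of such a lookup follows; the two index rows, the column layout, and the base URL are illustrative stand-ins, not the real schema of ar_index_global_meta.txt.

```python
import io
import pandas as pd

# Illustrative two-row excerpt in the spirit of the global metadata index;
# the real file has more columns and header comment lines.
index_text = """file,date_update
aoml/1900650/1900650_meta.nc,20150101
coriolis/6900240/6900240_meta.nc,20150102
"""
meta = pd.read_csv(io.StringIO(index_text))

# Hypothetical base URL for a DAC mirror.
base = 'http://example.org/argo/dac/'

# Map WMO number (second path component) to its DAC directory URL.
dac_urls = {f.split('/')[1]: base + '/'.join(f.split('/')[:2])
            for f in meta['file']}
```

With a dictionary like this in the local cache, finding where to download a given float's profile files is a constant-time lookup rather than another pass over the index.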

Now, whenever we need to get profile data our lookups for status and Data Assembly Centers will be serviced from the local cache. Let's get a Pandas DataFrame (df) of 20 profiles from the float with WMO number 1900650.


In [6]:
%%time
wmo_list = ['1900650']
ad.set_verbosity(0)
df = ad.get_float_dataframe(wmo_list, max_profiles=20)


CPU times: user 2.16 s, sys: 46 ms, total: 2.2 s
Wall time: 58.4 s

Profile data is also cached locally. To demonstrate, execute the same command as in the previous cell and note the time difference.


In [7]:
%%time
df = ad.get_float_dataframe(wmo_list, max_profiles=20)


CPU times: user 872 ms, sys: 7 ms, total: 879 ms
Wall time: 2.91 s

Examine the first 5 records of the float data.


In [8]:
df.head()


Out[8]:
DOXY_ADJUSTED PSAL_ADJUSTED TEMP_ADJUSTED
wmo time lon lat pressure
1900650 2010-03-12 01:39:40.003200 -14.026 6.031 4.3 206.490005 34.827457 29.790001
5.9 206.380005 34.827457 29.789000
9.1 206.300003 34.827457 29.790001
13.9 206.850006 34.826454 29.787001
19.5 206.860001 34.847443 29.681999
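The output above shows the DataFrame's structure: a five-level MultiIndex of (wmo, time, lon, lat, pressure) over the measured variables. A toy frame with the same shape, built from two of the values shown above, illustrates how that index works and why query() can filter on "pressure" by name:

```python
import pandas as pd

# Toy frame mirroring the oxyfloat DataFrame's index structure.
idx = pd.MultiIndex.from_tuples(
    [('1900650', '2010-03-12', -14.026, 6.031, 4.3),
     ('1900650', '2010-03-12', -14.026, 6.031, 5.9)],
    names=['wmo', 'time', 'lon', 'lat', 'pressure'])
toy = pd.DataFrame({'TEMP_ADJUSTED': [29.790001, 29.789000]}, index=idx)

# query() resolves 'pressure' to the index level of that name,
# so filtering by depth needs no column of its own.
shallow = toy.query('pressure < 5')  # keeps only the 4.3 dbar row
```

Because every coordinate lives in the index, the columns hold nothing but the measured parameters, which keeps selections and group operations concise.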

There's a lot that can be done with the profile data in this DataFrame structure. We can construct a time_range string and query for all the values measured at pressures less than 10 decibars:


In [9]:
time_range = '{} to {}'.format(df.index.get_level_values('time').min(), 
                               df.index.get_level_values('time').max())
df.query('pressure < 10')


Out[9]:
DOXY_ADJUSTED PSAL_ADJUSTED TEMP_ADJUSTED
wmo time lon lat pressure
1900650 2010-03-12 01:39:40.003200000 -14.026 6.031 4.3 206.490005 34.827457 29.790001
5.9 206.380005 34.827457 29.789000
9.1 206.300003 34.827457 29.790001
2010-02-20 01:52:38.035200000 -14.866 5.551 4.5 204.110001 35.149502 29.492001
6.3 204.399994 35.153496 29.457001
9.2 204.110001 35.152500 29.434000
2010-02-10 01:57:26.956800000 -14.929 5.492 4.5 190.630005 34.764793 29.674000
6.2 190.190002 34.765797 29.673000
9.4 187.940002 34.786785 29.617001
2010-01-31 01:02:30.019199999 -15.038 5.725 4.4 195.669998 34.660965 29.191999
6.4 194.520004 34.664963 29.132000
9.5 192.750000 34.712933 29.115000
2010-01-21 02:05:11.011200000 -15.077 5.719 4.6 198.949997 35.037857 29.114000
6.4 198.880005 35.034855 29.025999
9.2 199.139999 35.023865 28.968000
2010-01-11 01:07:43.996800000 -15.128 5.117 4.5 199.960007 34.390392 29.048000
6.2 199.539993 34.390392 29.048000
9.1 199.520004 34.395390 29.048000
2010-01-01 01:10:18.998400000 -15.162 5.184 4.4 181.919998 34.216640 28.839001
6.0 181.240005 34.228630 28.862000
8.8 179.059998 34.553371 28.836000
2009-12-22 01:12:05.011200000 -15.207 5.685 4.8 117.779999 33.655815 28.965000
6.7 133.360001 33.865715 29.115999
9.5 177.119995 34.214714 29.010000
2009-12-11 23:35:43.987200000 -15.374 5.600 4.3 165.770004 34.808380 28.674000
6.1 163.679993 34.810379 28.673000
9.0 159.630005 34.811378 28.671000
2009-12-01 23:41:39.004800000 -15.373 6.082 4.5 145.250000 34.510677 28.465000
6.3 142.309998 34.510677 28.465000
9.2 138.949997 34.549644 28.485001
2009-12-01 23:41:39.004800000 -15.373 6.082 ... ... ... ...
2009-11-11 23:15:41.040000000 -15.862 6.140 4.6 129.660004 34.482895 28.788000
6.4 124.559998 34.609802 28.797001
9.2 116.660004 34.615799 28.775000
2009-11-02 02:46:59.980800000 -16.458 5.963 4.0 148.809998 34.665897 28.396999
6.2 145.509995 34.667896 28.402000
9.2 140.770004 34.669895 28.412001
2009-10-23 01:23:59.971200000 -16.597 5.742 4.7 43.430000 34.756943 28.409000
6.2 44.910000 34.756943 28.372000
9.0 64.629997 34.758938 28.368000
2009-10-13 03:00:00.000000000 -16.724 5.618 4.4 161.570007 34.562168 28.069000
6.3 160.309998 34.563168 28.084999
9.3 158.979996 34.574162 28.105000
2009-10-02 23:46:59.980800000 -17.357 5.626 4.6 183.649994 34.812119 27.764000
6.2 182.570007 34.814114 27.766001
8.9 181.380005 34.814114 27.768000
2009-09-23 01:30:00.000000000 -17.463 6.054 4.2 192.330002 34.911167 27.465000
6.0 191.320007 34.911167 27.466000
9.0 191.229996 34.910168 27.465000
2009-09-12 23:12:59.990400000 -17.252 7.049 4.1 202.529999 35.067207 27.099001
6.1 202.009995 35.077198 27.117001
8.8 201.490005 35.087196 27.129999
2009-09-03 01:39:59.961600000 -17.122 7.552 4.4 200.639999 34.633526 27.277000
6.0 200.589996 34.633526 27.278000
9.0 200.229996 34.633526 27.275999
2009-08-24 01:36:59.990400000 -17.952 7.558 4.3 196.679993 34.635628 27.177000
6.1 196.300003 34.635628 27.177000
9.4 195.520004 34.634624 27.173000
2009-08-14 01:38:00.038400000 -18.703 7.582 4.0 195.809998 35.160473 26.906000
5.9 194.820007 35.160473 26.905001
8.9 193.339996 35.160473 26.907000

63 rows × 3 columns

In one command we can take the mean of all the values from the upper 10 decibars:


In [10]:
df.query('pressure < 10').groupby(level=['wmo', 'time']).mean()


Out[10]:
DOXY_ADJUSTED PSAL_ADJUSTED TEMP_ADJUSTED
wmo time
1900650 2009-08-14 01:38:00.038400000 194.656667 35.160473 26.906000
2009-08-24 01:36:59.990400000 196.166667 34.635293 27.175667
2009-09-03 01:39:59.961600000 200.486664 34.633526 27.277000
2009-09-12 23:12:59.990400000 202.010000 35.077201 27.115334
2009-09-23 01:30:00.000000000 191.626668 34.910834 27.465333
2009-10-02 23:46:59.980800000 182.533335 34.813449 27.766000
2009-10-13 03:00:00.000000000 160.286667 34.566499 28.086333
2009-10-23 01:23:59.971200000 50.989999 34.757608 28.383000
2009-11-02 02:46:59.980800000 145.029999 34.667896 28.403667
2009-11-11 23:15:41.040000000 123.626668 34.569499 28.786667
2009-11-22 01:18:26.035200000 118.360001 34.545078 29.077667
2009-12-01 23:41:39.004800000 142.169998 34.523666 28.471667
2009-12-11 23:35:43.987200000 163.026667 34.810046 28.672667
2009-12-22 01:12:05.011200000 142.753332 33.912081 29.030333
2010-01-01 01:10:18.998400000 180.740000 34.332881 28.845667
2010-01-11 01:07:43.996800000 199.673335 34.392058 29.048000
2010-01-21 02:05:11.011200000 198.990000 35.032192 29.036000
2010-01-31 01:02:30.019199999 194.313334 34.679620 29.146333
2010-02-10 01:57:26.956800000 189.586670 34.772458 29.654667
2010-02-20 01:52:38.035200000 204.206665 35.151833 29.461000
2010-03-12 01:39:40.003200000 206.390004 34.827457 29.789667
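The groupby(level=...) call collapses the pressure (and lon/lat) levels, averaging the roughly three near-surface samples of each profile into one row per (wmo, time). The arithmetic is easy to verify on a small hand-built frame; the values below are made up for the check:

```python
import pandas as pd

# Two profiles from one float: 't1' has two samples, 't2' has one.
idx = pd.MultiIndex.from_tuples(
    [('1900650', 't1', 4.3),
     ('1900650', 't1', 6.0),
     ('1900650', 't2', 4.5)],
    names=['wmo', 'time', 'pressure'])
toy = pd.DataFrame({'DOXY_ADJUSTED': [200.0, 202.0, 190.0]}, index=idx)

# Averaging over the remaining levels yields one row per (wmo, time).
means = toy.groupby(level=['wmo', 'time']).mean()
```

Each profile's samples are averaged independently, so 't1' becomes (200 + 202) / 2 = 201 while 't2' keeps its single value.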

We can plot the profiles:


In [11]:
%pylab inline
import pylab as plt
# Parameter long_name and units copied from attributes in NetCDF files
parms = {'TEMP_ADJUSTED': 'SEA TEMPERATURE IN SITU ITS-90 SCALE (degree_Celsius)', 
         'PSAL_ADJUSTED': 'PRACTICAL SALINITY (psu)',
         'DOXY_ADJUSTED': 'DISSOLVED OXYGEN (micromole/kg)'}

plt.rcParams['figure.figsize'] = (18.0, 8.0)
fig, ax = plt.subplots(1, len(parms), sharey=True)
ax[0].invert_yaxis()
ax[0].set_ylabel('SEA PRESSURE (decibar)')

for i, (p, label) in enumerate(parms.iteritems()):
    ax[i].set_xlabel(label)
    ax[i].plot(df[p], df.index.get_level_values('pressure'), '.')
    
plt.suptitle('Float(s) ' + ' '.join(wmo_list) + ' from ' + time_range)


Populating the interactive namespace from numpy and matplotlib
Out[11]:
<matplotlib.text.Text at 0x7f5279243590>
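The cell above was executed under Python 2 (note dict.iteritems()). Under Python 3 the same three-panel layout can be sketched without the %pylab magic; the toy pressure/value arrays below stand in for the real df columns:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; in a notebook use %matplotlib inline
import matplotlib.pyplot as plt

parms = {'TEMP_ADJUSTED': 'SEA TEMPERATURE IN SITU ITS-90 SCALE (degree_Celsius)',
         'PSAL_ADJUSTED': 'PRACTICAL SALINITY (psu)',
         'DOXY_ADJUSTED': 'DISSOLVED OXYGEN (micromole/kg)'}

# Illustrative stand-ins for df[p] and the 'pressure' index level.
pressure = [4.3, 5.9, 9.1]
values = {'TEMP_ADJUSTED': [29.79, 29.789, 29.79],
          'PSAL_ADJUSTED': [34.827, 34.827, 34.827],
          'DOXY_ADJUSTED': [206.49, 206.38, 206.30]}

fig, ax = plt.subplots(1, len(parms), sharey=True, figsize=(18, 8))
ax[0].invert_yaxis()                       # depth increases downward
ax[0].set_ylabel('SEA PRESSURE (decibar)')
for i, (p, label) in enumerate(parms.items()):  # items() replaces iteritems()
    ax[i].set_xlabel(label)
    ax[i].plot(values[p], pressure, '.')
```

Passing figsize to subplots() also replaces the global rcParams change, keeping the figure size local to this one plot.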

We can plot the location of these profiles on a map:


In [12]:
from mpl_toolkits.basemap import Basemap

m = Basemap(llcrnrlon=15, llcrnrlat=-90, urcrnrlon=390, urcrnrlat=90, projection='cyl')
m.fillcontinents(color='0.8')

m.scatter(df.index.get_level_values('lon'), df.index.get_level_values('lat'), latlon=True)
plt.title('Float(s) ' + ' '.join(wmo_list) + ' from ' + time_range)


Out[12]:
<matplotlib.text.Text at 0x7f52770fe450>
/home/mccann/VirtualEnvs/oxyfloat/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):