Exploring multidimensional data using xray

The diagram below shows one way to think about this kind of data. For more on how xray represents multidimensional data, see: http://xray.readthedocs.org/en/latest/



In [1]:

    
from IPython.display import Image
Image(url='http://xray.readthedocs.org/en/latest/_images/dataset-diagram.png', embed=True, width=950, height=300)









    Out[1]:

Loading an example file into a dataset



In [2]:

    
import numpy as np
import pandas as pd
import xray

This is an example of what our soil moisture data from the radio tower install will look like. Each site has a latitude, longitude, and elevation. At each site we will record rainfall, plus soil temperature and soil moisture at two depths. The data therefore vary along up to three dimensions: site, depth, and time.



In [3]:

    
temp = 15 + 8 * np.random.randn(2, 2, 3)
VW = 15 + 10 * abs(np.random.randn(2, 2, 3))
precip = 10 * np.random.rand(2, 3)
depths = [5, 20]
lons = [-99.83, -99.79]
lats = [42.63, 42.59]
elevations = [1600, 1650]

ds = xray.Dataset({'temperature': (['site', 'depth', 'time'],  temp, {'units':'C'}),
                   'soil_moisture': (['site', 'depth', 'time'],  VW, {'units':'percent'}),
                   'precipitation': (['site', 'time'], precip, {'units':'mm'})},
                   coords={'lon': (['site'], lons, {'units':'degrees east'}),
                           'lat': (['site'], lats, {'units':'degrees north'}),
                           'elevation': (['site'], elevations, {'units':'m'}),
                           'site': ['Acacia', 'Riverine'],
                           'depth': (['depth'], depths, {'units': 'cm'}),
                           'time': pd.date_range('2015-05-19', periods=3)})



In [4]:

    
ds









    Out[4]:





<xray.Dataset>
Dimensions:        (depth: 2, site: 2, time: 3)
Coordinates:
    elevation      (site) int32 1600 1650
    lon            (site) float64 -99.83 -99.79
  * site           (site) |S8 'Acacia' 'Riverine'
  * depth          (depth) int32 5 20
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
    lat            (site) float64 42.63 42.59
Data variables:
    soil_moisture  (site, depth, time) float64 25.75 18.77 17.04 17.03 18.43 20.84 16.28 20.15 ...
    precipitation  (site, time) float64 6.991 5.378 3.453 6.781 9.125 5.587
    temperature    (site, depth, time) float64 11.69 23.79 28.45 14.41 4.535 -3.386 16.2 13.67 ...

Inspecting and selecting from dataset

To select the data for a specific site we just write:



In [5]:

    
ds.sel(site='Acacia')









    Out[5]:





<xray.Dataset>
Dimensions:        (depth: 2, time: 3)
Coordinates:
    elevation      int32 1600
    lon            float64 -99.83
    site           |S8 'Acacia'
  * depth          (depth) int32 5 20
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
    lat            float64 42.63
Data variables:
    soil_moisture  (depth, time) float64 25.75 18.77 17.04 17.03 18.43 20.84
    precipitation  (time) float64 6.991 5.378 3.453
    temperature    (depth, time) float64 11.69 23.79 28.45 14.41 4.535 -3.386

Now if we are only interested in soil moisture at the upper depth at a specific time, we can pull out just that one data point:



In [6]:

    
print(ds.soil_moisture.sel(site='Acacia', time='2015-05-19', depth=5).values)









    



25.7471624799

Precipitation has no depth dimension, so a single data point can be pulled out by selecting on time and site alone:



In [7]:

    
print(ds.precipitation.sel(site='Acacia', time='2015-05-19').values)









    



6.99050820472
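
The same kind of selection can also be made by integer position with `isel`, and any dimension left out of `sel` is kept in the result. The sketch below is illustrative only. It assumes the `xarray` package (the name under which xray is now distributed) and uses made-up sequential values instead of the random data above, so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: xarray, the renamed successor of xray

# Made-up values 0..11 arranged as (site, depth, time); not the notebook's data.
values = np.arange(12, dtype=float).reshape(2, 2, 3)
da = xr.DataArray(
    values,
    dims=['site', 'depth', 'time'],
    coords={'site': ['Acacia', 'Riverine'],
            'depth': [5, 20],
            'time': pd.date_range('2015-05-19', periods=3)},
    name='soil_moisture')

# Label-based selection, as in the cells above.
by_label = da.sel(site='Acacia', depth=5, time='2015-05-19')

# Position-based selection of the same element.
by_position = da.isel(site=0, depth=0, time=0)

# Omitting depth keeps it as a dimension: one value per depth.
both_depths = da.sel(site='Riverine', time='2015-05-20')

print(by_label.item(), by_position.item())
print(both_depths.dims, both_depths.values)
```

Label-based selection is usually the safer choice for site data, since it does not depend on the order in which sites or depths were stored.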

Test what this dataset looks like in pandas and netCDF



In [8]:

    
ds.to_dataframe()









    Out[8]:






  
    
      
      
      
                           elevation    lat     lon  precipitation  soil_moisture  temperature
depth site     time
5     Acacia   2015-05-19       1600  42.63  -99.83       6.990508      25.747162    11.693499
               2015-05-20       1600  42.63  -99.83       5.378432      18.766930    23.785339
               2015-05-21       1600  42.63  -99.83       3.452580      17.036814    28.453882
      Riverine 2015-05-19       1650  42.59  -99.79       6.781172      16.283327    16.204980
               2015-05-20       1650  42.59  -99.79       9.125452      20.152783    13.669500
               2015-05-21       1650  42.59  -99.79       5.586781      18.621126    12.947183
20    Acacia   2015-05-19       1600  42.63  -99.83       6.990508      17.028542    14.405504
               2015-05-20       1600  42.63  -99.83       5.378432      18.429830     4.534717
               2015-05-21       1600  42.63  -99.83       3.452580      20.835446    -3.385908
      Riverine 2015-05-19       1650  42.59  -99.79       6.781172      19.487425    20.924311
               2015-05-20       1650  42.59  -99.79       9.125452      15.338002    13.283215
               2015-05-21       1650  42.59  -99.79       5.586781      23.741205    14.730681
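
In the dataframe above, each precipitation value appears once per depth: a variable with no depth dimension is repeated along that dimension when the dataset is flattened. The sketch below rebuilds part of that frame by hand using only pandas, with values copied from the output above and the temperature and site-coordinate columns left out for brevity. It shows one way to pull a single point from the MultiIndex and to recover precipitation without the repetition.

```python
import pandas as pd

# Index levels in the same order as the dataframe above: depth, site, time.
index = pd.MultiIndex.from_product(
    [[5, 20], ['Acacia', 'Riverine'], pd.date_range('2015-05-19', periods=3)],
    names=['depth', 'site', 'time'])

# Precipitation per (site, time), copied from the output above.
precip_by_site = [6.990508, 5.378432, 3.452580, 6.781172, 9.125452, 5.586781]

df = pd.DataFrame({
    # Repeated once per depth, as in the flattened dataset.
    'precipitation': precip_by_site * 2,
    'soil_moisture': [25.747162, 18.766930, 17.036814,
                      16.283327, 20.152783, 18.621126,
                      17.028542, 18.429830, 20.835446,
                      19.487425, 15.338002, 23.741205],
}, index=index)

# A single soil moisture value, addressed by a full (depth, site, time) key.
point = df.loc[(5, 'Acacia', pd.Timestamp('2015-05-19')), 'soil_moisture']

# Taking one depth leaves a frame indexed by (site, time) only.
shallow = df.xs(5, level='depth')

# Check that precipitation really is identical across depths.
same_at_all_depths = (
    df['precipitation'].groupby(level=['site', 'time']).nunique() == 1).all()

print(point)
print(shallow['precipitation'])
print(same_at_all_depths)
```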



In [9]:

    
ds.to_netcdf('test.nc')

Going back and forth between datasets and dataframes



In [10]:

    
sites = ['MainTower']  # can be replaced if there are more specific sites
lons = [36.8701]       # degrees east
lats = [0.4856]        # degrees north
elevations = [1610]    # m above sea level

coords={'site': (['site'], sites),
        'lon': (['site'], lons, dict(units='degrees east')),
        'lat': (['site'], lats, dict(units='degrees north')),
        'elevation': (['site'], elevations, dict(units='m')),
        'time': pd.date_range('2015-05-19', periods=3)}

precip = 10 * np.random.rand(1, 3)
ds = xray.Dataset({'precipitation': (['site', 'time'], precip, {'units':'mm'})},
                   coords=coords)
ds









    Out[10]:





<xray.Dataset>
Dimensions:        (site: 1, time: 3)
Coordinates:
    lat            (site) float64 0.4856
    elevation      (site) int32 1610
    lon            (site) float64 36.87
  * site           (site) |S9 'MainTower'
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
Data variables:
    precipitation  (site, time) float64 0.741 1.297 7.212



In [11]:

    
df = ds.to_dataframe()
df









    Out[11]:






  
    
      
      
                      elevation     lat      lon  precipitation
site      time
MainTower 2015-05-19       1610  0.4856  36.8701       0.740951
          2015-05-20       1610  0.4856  36.8701       1.297482
          2015-05-21       1610  0.4856  36.8701       7.212289



In [12]:

    
df.index









    Out[12]:





MultiIndex(levels=[[u'MainTower'], [2015-05-19 00:00:00, 2015-05-20 00:00:00, 2015-05-21 00:00:00]],
           labels=[[0, 0, 0], [0, 1, 2]],
           names=[u'site', u'time'])
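
The cells above only go from dataset to dataframe. For the return trip, a minimal sketch follows. It assumes the `xarray` package (the renamed successor of xray) and rebuilds the dataframe by hand from the values shown in Out[11], so that it runs on its own. Each level of the MultiIndex becomes a dimension and every column comes back as a data variable, so `set_coords` is used here to mark the site metadata as coordinates again.

```python
import pandas as pd
import xarray as xr  # assumption: xarray, the renamed successor of xray

# Rebuild the (site, time) dataframe with the values shown in Out[11].
index = pd.MultiIndex.from_product(
    [['MainTower'], pd.date_range('2015-05-19', periods=3)],
    names=['site', 'time'])
df = pd.DataFrame({'elevation': 1610,
                   'lat': 0.4856,
                   'lon': 36.8701,
                   'precipitation': [0.740951, 1.297482, 7.212289]},
                  index=index)

# Index levels become dimensions; all columns become data variables.
ds_back = xr.Dataset.from_dataframe(df)

# Mark the site metadata as coordinates rather than data variables.
ds_back = ds_back.set_coords(['elevation', 'lat', 'lon'])

print(ds_back)
```

The round trip is not lossless: a dataframe carries no `units` attributes, and `lat`, `lon`, and `elevation` return with dimensions `(site, time)` rather than `(site)` alone.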

Loading dataframes and transferring to datasets



In [13]:

    
from __init__ import *
from TOA5_to_netcdf import *



In [14]:

    
lons = [36.8701]       # degrees east
lats = [0.4856]        # degrees north
elevations = [1610]    # m above sea level

coords={'lon': (['site'], lons, dict(units='degrees east')),
        'lat': (['site'], lats, dict(units='degrees north')),
        'elevation': (['site'], elevations, dict(units='m'))}



In [15]:

    
path = os.getcwd().replace('\\','/')+'/current_data/'
input_file = path + 'CR3000_SN4709_flux.dat'
input_dict = {'has_header': True,
              'header_file': input_file,
              'datafile': 'soil',
              'path': path,
              'filename': 'CR3000_SN4709_flux.dat'}



In [16]:

    
df = createDF(input_file, input_dict, attrs)[0]
attrs, local_attrs = get_attrs(input_dict['header_file'], attrs)
ds = createDS(df, input_dict, attrs, local_attrs, site, coords_vals)
ds.to_netcdf(path='test2.nc', format='NETCDF3_64BIT')
xray.open_dataset('test2.nc')









    Out[16]:





<xray.Dataset>
Dimensions:                (site: 1, time: 34)
Coordinates:
    lon                    (site) float32 36.8701
  * site                   (site) |S9 'MainTower'
    elevation              (site) int32 1610
    lat                    (site) float32 0.4856
  * time                   (time) datetime64[ns] 2014-11-14T07:00:00 2014-11-14T07:30:00 ...
Data variables:
    batt_volt_Std          (site, time) float32 0.016 0.016 0.016 0.016 0.016 0.016 0.016 0.016 ...
    agc_Avg                (site, time) float32 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 ...
    WVIA_H2Oppm_Std        (site, time) float32 224.66 248.511 183.528 209.882 383.109 158.485 ...
    cov_h2o_Uz             (site, time) float32 0.0378247 0.0349794 0.035367 0.0510279 0.0520805 ...
    cov_h2o_Ux             (site, time) float32 -0.0849143 -0.0074305 -0.0363381 -0.055827 ...
    cov_h2o_Uy             (site, time) float32 -0.0219625 0.00759393 -0.0273952 0.00218496 ...
    stdev_h2o              (site, time) float32 0.155446 0.184698 0.151865 0.169678 0.318237 ...
    e_hmp_Std              (site, time) float32 0.017 0.023 0.018 0.021 0.042 0.021 0.03 0.031 ...
    co2_wpl_LE             (site, time) float32 0.0366397 0.0338781 0.0342262 0.0493333 ...
    Uz_Std                 (site, time) float32 0.57 0.63 0.607 0.786 0.73 0.766 0.832 0.796 ...
    amp_h_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    batt_volt_Avg          (site, time) float32 13.3874 13.3855 13.3855 13.3827 13.383 13.3841 ...
    WVIA_CavTempC_Std      (site, time) float32 0.0226088 0.0560195 0.0229745 0.00420734 ...
    WVIA_PressTorr_Std     (site, time) float32 0.0123983 0.00463331 0.00469628 0.00472634 ...
    H                      (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
    Uz_Avg                 (site, time) float32 0.149465 0.171783 0.112868 0.203524 0.153066 ...
    e_hmp_Avg              (site, time) float32 1.465 1.452 1.431 1.424 1.375 1.335 1.295 1.285 ...
    h2o_wpl_H              (site, time) float32 10.4263 18.8259 11.4737 19.5092 16.6185 21.8298 ...
    std_wnd_dir            (site, time) float32 19.1276 22.9743 20.1864 16.6435 17.7829 17.5051 ...
    rho_a_mean             (site, time) float32 0.992901 0.991127 0.99032 0.988077 0.986364 ...
    co2_wpl_H              (site, time) float32 0.218513 0.408513 0.248213 0.426289 0.374028 ...
    WVIA_delD_Avg          (site, time) float32 -71.4143 -73.5358 -75.6481 -74.6131 -75.3503 ...
    press_mean             (site, time) float32 83.8841 83.8784 83.8594 83.8305 83.7991 83.7527 ...
    WVIA_del18O_Std        (site, time) float32 1.61055 0.891676 0.861896 0.862261 1.02595 ...
    WVIA_O18overO16_Avg    (site, time) float32 0.00197438 0.00197301 0.00197577 0.00197649 ...
    chopper_f_Tot          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    Ux_Std                 (site, time) float32 1.069 0.849 0.962 1.307 1.403 1.453 1.546 1.38 ...
    irga_warnings          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    panel_temp_Avg         (site, time) float32 29.1243 28.211 27.1383 26.4963 26.1689 26.7043 ...
    Hs                     (site, time) float32 113.18 206.478 127.846 218.719 193.391 261.991 ...
    WVIA_CavTempC_Avg      (site, time) float32 48.5153 48.3382 48.2206 48.181 48.1828 48.2329 ...
    WVIA_HDOoverH2O_Std    (site, time) float32 7.14242e-07 8.69306e-07 7.55491e-07 7.61553e-07 ...
    stdev_Uz               (site, time) float32 0.570468 0.629907 0.606553 0.785866 0.730341 ...
    Hc                     (site, time) float32 106.175 198.892 121.038 208.491 183.32 250.135 ...
    stdev_Ux               (site, time) float32 1.06949 0.849402 0.96229 1.30665 1.40336 1.45279 ...
    stdev_Uy               (site, time) float32 1.14111 1.36759 1.58961 1.62282 1.61014 1.77302 ...
    WVIA_H2Oppm_Avg        (site, time) float32 14931.3 14773.1 14480.8 14446.7 14042.0 13701.7 ...
    WVIA_del18O_Avg        (site, time) float32 -15.3671 -16.0518 -14.6781 -14.3197 -14.8288 ...
    del_T_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    t_hmp_Std              (site, time) float32 0.171 0.313 0.233 0.195 0.246 0.27 0.294 0.148 ...
    t_hmp_Avg              (site, time) float32 19.58 20.1 20.3 20.87 21.33 21.65 22.24 22.54 ...
    WVIA_HDOoverH2O_Avg    (site, time) float32 0.000287411 0.000286755 0.000286101 0.000286421 ...
    co2_mean               (site, time) float32 590.464 589.368 588.514 586.629 585.641 585.341 ...
    n_Tot                  (site, time) int32 18000 18000 18000 18000 18000 18000 18000 18000 ...
    LE_irga                (site, time) float32 92.2922 85.3496 86.2955 124.508 127.076 145.108 ...
    LE_wpl                 (site, time) float32 104.358 105.679 99.267 146.169 145.815 169.288 ...
    cov_Ts_Uy              (site, time) float32 0.00766579 -0.0514139 -0.0210965 -0.0798197 ...
    cov_Ts_Ux              (site, time) float32 -0.175227 -0.149267 -0.119752 -0.224446 ...
    cov_Ts_Uz              (site, time) float32 0.11346 0.207358 0.128495 0.220329 0.195153 ...
    cov_Uy_Uz              (site, time) float32 0.0526281 0.0746325 0.101163 -0.0307808 ...
    WVIA_MirrorTempUS_Avg  (site, time) float32 8.0557 8.05161 8.05301 8.04756 8.06418 8.061 ...
    KH_rho_w_Avg           (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
    WVIA_H2O_O18_16_Avg    (site, time) float32 0.0019863 0.00198492 0.00198769 0.00198841 ...
    t_hmp_mean             (site, time) float32 19.5751 20.0955 20.2964 20.869 21.3349 21.6509 ...
    h2o_Avg                (site, time) float32 11.439 11.3134 11.1473 11.0878 10.6933 10.4035 ...
    WVIA_DoverH_Avg        (site, time) float32 0.000144627 0.000144297 0.000143968 0.000144129 ...
    WVIA_PressTorr_Avg     (site, time) float32 39.5321 39.5507 39.5548 39.556 39.6093 39.6107 ...
    WVIA_O18overO16_Std    (site, time) float32 3.22977e-06 1.78816e-06 1.72746e-06 1.72873e-06 ...
    detector_f_Tot         (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    h2o_hmp_mean           (site, time) float32 10.832 10.7223 10.5556 10.4884 10.1089 9.80297 ...
    kh_mV_Avg              (site, time) float32 -2.57945 -2.48721 -2.49531 -2.33123 -2.38383 ...
    WVIA_H2O_O18_16_Std    (site, time) float32 3.24921e-06 1.79927e-06 1.73822e-06 1.73927e-06 ...
    Uy_Std                 (site, time) float32 1.141 1.368 1.59 1.623 1.61 1.773 1.366 1.516 ...
    Ux_Avg                 (site, time) float32 3.31151 2.82275 3.60617 4.31085 4.57026 4.72029 ...
    cov_Ux_Uy              (site, time) float32 0.00495589 -0.0266461 0.109639 0.317565 0.461053 ...
    Uy_Avg                 (site, time) float32 0.802545 1.16517 1.79512 2.48395 1.66233 1.85764 ...
    cov_Ux_Uz              (site, time) float32 -0.248603 -0.121524 -0.137646 -0.288274 ...
    Fc_wpl                 (site, time) float32 -0.266433 -0.454724 -0.253893 -0.344292 ...
    stdev_Ts               (site, time) float32 0.388251 0.559185 0.464061 0.583613 0.647009 ...
    WVIA_DoverH_Std        (site, time) float32 3.59551e-07 4.37687e-07 3.8039e-07 3.83365e-07 ...
    Ts_mean                (site, time) float32 21.3199 21.749 22.0668 22.6476 23.1166 23.3476 ...
    tau                    (site, time) float32 0.252309 0.141346 0.169169 0.286456 0.323494 ...
    rh_hmp_Avg             (site, time) float32 64.35 61.79 60.11 57.77 54.2 51.6 48.3 47.04 ...
    u_star                 (site, time) float32 0.504096 0.377639 0.413307 0.538436 0.572683 ...
    wnd_spd                (site, time) float32 3.60859 3.32093 4.29502 5.19459 5.10947 5.32119 ...
    cov_co2_Uy             (site, time) float32 -0.0842502 0.447055 0.374189 0.317796 1.01696 ...
    cov_co2_Ux             (site, time) float32 0.732 0.594694 0.540849 1.00302 1.55531 1.7098 ...
    cov_co2_Uz             (site, time) float32 -0.521585 -0.897115 -0.536332 -0.819914 -0.71077 ...
    RECORD                 (site, time) int32 29214 29215 29216 29217 29218 29219 29220 29221 ...
    sig_lck_f_Tot          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    wnd_dir_compass        (site, time) float32 86.377 77.5702 73.5362 70.049 80.0123 78.5182 ...
    csat_warnings          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    amp_l_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    WVIA_delD_Std          (site, time) float32 2.30835 2.8101 2.44208 2.46221 2.59453 2.78947 ...
    rh_hmp_Std             (site, time) float32 1.003 1.717 0.909 0.946 1.939 0.869 1.289 1.043 ...
    stdev_co2              (site, time) float32 1.81483 2.47358 1.95025 2.27201 2.39509 2.26651 ...
    WVIA_MirrorTempUS_Std  (site, time) float32 0.0160431 0.0151895 0.0153116 0.0148065 ...
    panel_temp_Std         (site, time) float32 0.228 0.287 0.305 0.121 0.048 0.323 0.476 0.453 ...
    pll_f_Tot              (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    sync_f_Tot             (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    h2o_wpl_LE             (site, time) float32 1.64005 1.50387 1.49787 2.15217 2.11998 2.35022 ...
    rslt_wnd_spd           (site, time) float32 3.40736 3.05377 4.02826 4.97528 4.8632 5.07267 ...
    wnd_dir_csat3          (site, time) float32 13.6229 22.4298 26.4637 29.951 19.9877 21.4817 ...
    Fc_irga                (site, time) float32 -0.521585 -0.897115 -0.536332 -0.819914 -0.71077 ...
    ln_kh_Avg              (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    featureType: timeSeries
    datafile: flux
    logger: CR3000_SN4709
    creator_name: Kelly Caylor
    license: MIT License
    acknowledgement: Funded by NSF and Princeton University
    creator_email: kcaylor@princeton.edu
    format: TOA5
    local_timezone: Africa/Nairobi
    Conventions: CF-1.6
    source: Flux tower sensor data CR3000_SN4709_flux.dat, MainTowerCR3000_V11.cr3, 39879
    program: MainTowerCR3000_V11.cr3
    keywords: eddy covariance, isotope hydrology, land surface flux
    title: Flux Tower Data from MPALA
    station_name: MPALA Tower
    summary: This raw data comes from the MPALA Flux Tower, which is maintained by the Ecohydrology Lab at Mpala Research Centre in Laikipia, Kenya. It is part of a long-term monitoring project that was originally funded by NSF and now runs with support from Princeton. Its focus is on using stable isotopes to better understand water balance in drylands, particularly transpiration and evaporation fluxes.
    institution: Princeton University
    naming_authority: caylor.princeton.edu
