Exploring multidimensional data using xray

The diagram below shows how to think about this kind of data. For more on how xray represents multidimensional data, see the documentation: http://xray.readthedocs.org/en/latest/


In [1]:
from IPython.display import Image
Image(url='http://xray.readthedocs.org/en/latest/_images/dataset-diagram.png', embed=True, width=950, height=300)


Out[1]:

Building an example dataset


In [2]:
import numpy as np
import pandas as pd
import xray

This is an example of what our soil moisture data from the radio tower install will look like. Each site has a latitude, longitude, and elevation. At each site we will record rainfall, plus soil temperature and soil moisture at two depths. The data are therefore recorded along up to three dimensions: site, depth, and time.


In [3]:
temp = 15 + 8 * np.random.randn(2, 2, 3)
VW = 15 + 10 * abs(np.random.randn(2, 2, 3))
precip = 10 * np.random.rand(2, 3)
depths = [5, 20]
lons = [-99.83, -99.79]
lats = [42.63, 42.59]
elevations = [1600, 1650]

ds = xray.Dataset({'temperature': (['site', 'depth', 'time'],  temp, {'units':'C'}),
                   'soil_moisture': (['site', 'depth', 'time'],  VW, {'units':'percent'}),
                   'precipitation': (['site', 'time'], precip, {'units':'mm'})},
                   coords={'lon': (['site'], lons, {'units':'degrees east'}),
                           'lat': (['site'], lats, {'units':'degrees north'}),
                           'elevation': (['site'], elevations, {'units':'m'}),
                           'site': ['Acacia', 'Riverine'],
                           'depth': (['depth'], depths, {'units': 'cm'}),
                           'time': pd.date_range('2015-05-19', periods=3)})

In [4]:
ds


Out[4]:
<xray.Dataset>
Dimensions:        (depth: 2, site: 2, time: 3)
Coordinates:
    elevation      (site) int32 1600 1650
    lon            (site) float64 -99.83 -99.79
  * site           (site) |S8 'Acacia' 'Riverine'
  * depth          (depth) int32 5 20
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
    lat            (site) float64 42.63 42.59
Data variables:
    soil_moisture  (site, depth, time) float64 25.75 18.77 17.04 17.03 18.43 20.84 16.28 20.15 ...
    precipitation  (site, time) float64 6.991 5.378 3.453 6.781 9.125 5.587
    temperature    (site, depth, time) float64 11.69 23.79 28.45 14.41 4.535 -3.386 16.2 13.67 ...

Inspecting and selecting from dataset

To select the data for a specific site we just write:


In [5]:
ds.sel(site='Acacia')


Out[5]:
<xray.Dataset>
Dimensions:        (depth: 2, time: 3)
Coordinates:
    elevation      int32 1600
    lon            float64 -99.83
    site           |S8 'Acacia'
  * depth          (depth) int32 5 20
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
    lat            float64 42.63
Data variables:
    soil_moisture  (depth, time) float64 25.75 18.77 17.04 17.03 18.43 20.84
    precipitation  (time) float64 6.991 5.378 3.453
    temperature    (depth, time) float64 11.69 23.79 28.45 14.41 4.535 -3.386

Now if we are only interested in soil moisture at the upper depth at a specific time, we can pull out just that one data point:


In [6]:
print(ds.soil_moisture.sel(site='Acacia', time='2015-05-19', depth=5).values)


25.7471624799

Precipitation has no depth dimension, so a single data point can be pulled out by selecting on time and site alone:


In [7]:
print(ds.precipitation.sel(site='Acacia', time='2015-05-19').values)


6.99050820472
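
Selecting by label with sel has a positional counterpart, isel, which takes integer indices instead of coordinate values. The sketch below uses made-up, deterministic values rather than the random data above, so the result is easy to check by eye. The import fallback is an assumption for newer installs, where xray is distributed under its later name, xarray.

```python
import numpy as np
import pandas as pd

try:
    import xarray as xray  # xray was later renamed to xarray
except ImportError:
    import xray

# Illustrative values only: 0..11 laid out as (site, depth, time).
vw = np.arange(12, dtype=float).reshape(2, 2, 3)
ds = xray.Dataset(
    {'soil_moisture': (['site', 'depth', 'time'], vw)},
    coords={'site': ['Acacia', 'Riverine'],
            'depth': [5, 20],
            'time': pd.date_range('2015-05-19', periods=3)})

# Label-based selection: pass coordinate values.
by_label = ds.soil_moisture.sel(site='Riverine', depth=20)

# Position-based selection: pass integer indices along each dimension.
by_position = ds.soil_moisture.isel(site=1, depth=1)

# Both pick the same slice; only the time dimension is left.
same = bool((by_label.values == by_position.values).all())
remaining_dims = by_label.dims
print(by_label.values)
```

Each dimension that is selected down to a single value is dropped from the result, which is why only time remains here.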

Test what this dataset looks like in pandas and netCDF


In [8]:
ds.to_dataframe()


Out[8]:
                           elevation    lat     lon  precipitation  soil_moisture  temperature
depth site     time
5     Acacia   2015-05-19       1600  42.63  -99.83       6.990508      25.747162    11.693499
               2015-05-20       1600  42.63  -99.83       5.378432      18.766930    23.785339
               2015-05-21       1600  42.63  -99.83       3.452580      17.036814    28.453882
      Riverine 2015-05-19       1650  42.59  -99.79       6.781172      16.283327    16.204980
               2015-05-20       1650  42.59  -99.79       9.125452      20.152783    13.669500
               2015-05-21       1650  42.59  -99.79       5.586781      18.621126    12.947183
20    Acacia   2015-05-19       1600  42.63  -99.83       6.990508      17.028542    14.405504
               2015-05-20       1600  42.63  -99.83       5.378432      18.429830     4.534717
               2015-05-21       1600  42.63  -99.83       3.452580      20.835446    -3.385908
      Riverine 2015-05-19       1650  42.59  -99.79       6.781172      19.487425    20.924311
               2015-05-20       1650  42.59  -99.79       9.125452      15.338002    13.283215
               2015-05-21       1650  42.59  -99.79       5.586781      23.741205    14.730681
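
Note that precipitation has no depth dimension, so in the dataframe its values are repeated for every depth. A small sketch with illustrative values (not the random data above) makes this visible. The import fallback assumes a newer install where xray goes by its later name, xarray.

```python
import numpy as np
import pandas as pd

try:
    import xarray as xray  # xray was later renamed to xarray
except ImportError:
    import xray

# Illustrative values only.
vw = np.arange(12, dtype=float).reshape(2, 2, 3)     # (site, depth, time)
precip = np.arange(6, dtype=float).reshape(2, 3)     # (site, time)

ds = xray.Dataset(
    {'soil_moisture': (['site', 'depth', 'time'], vw),
     'precipitation': (['site', 'time'], precip)},
    coords={'site': ['Acacia', 'Riverine'],
            'depth': [5, 20],
            'time': pd.date_range('2015-05-19', periods=3)})

# One row per (site, depth, time) combination: 2 * 2 * 3 = 12 rows.
df = ds.to_dataframe()

# For each (site, time) pair, count the distinct precipitation values
# found across depths. A count of 1 means the value was simply repeated.
distinct = df.groupby(level=['site', 'time'])['precipitation'].nunique()
print(distinct)
```

This is worth keeping in mind before summing or averaging the precipitation column, since each measurement appears once per depth.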

In [9]:
ds.to_netcdf('test.nc')

Going back and forth between datasets and dataframes


In [10]:
sites = ['MainTower']  # can be replaced if there are more specific sites
lons = [36.8701]       # degrees east
lats = [0.4856]        # degrees north
elevations = [1610]    # m above sea level

coords={'site': (['site'], sites),
        'lon': (['site'], lons, dict(units='degrees east')),
        'lat': (['site'], lats, dict(units='degrees north')),
        'elevation': (['site'], elevations, dict(units='m')),
        'time': pd.date_range('2015-05-19', periods=3)}

precip = 10 * np.random.rand(1, 3)
ds = xray.Dataset({'precipitation': (['site', 'time'], precip, {'units':'mm'})},
                   coords=coords)
ds


Out[10]:
<xray.Dataset>
Dimensions:        (site: 1, time: 3)
Coordinates:
    lat            (site) float64 0.4856
    elevation      (site) int32 1610
    lon            (site) float64 36.87
  * site           (site) |S9 'MainTower'
  * time           (time) datetime64[ns] 2015-05-19 2015-05-20 2015-05-21
Data variables:
    precipitation  (site, time) float64 0.741 1.297 7.212

In [11]:
df = ds.to_dataframe()
df


Out[11]:
                      elevation     lat      lon  precipitation
site      time
MainTower 2015-05-19       1610  0.4856  36.8701       0.740951
          2015-05-20       1610  0.4856  36.8701       1.297482
          2015-05-21       1610  0.4856  36.8701       7.212289

In [12]:
df.index


Out[12]:
MultiIndex(levels=[[u'MainTower'], [2015-05-19 00:00:00, 2015-05-20 00:00:00, 2015-05-21 00:00:00]],
           labels=[[0, 0, 0], [0, 1, 2]],
           names=[u'site', u'time'])
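
The other direction, from a dataframe back to a dataset, can be sketched with Dataset.from_dataframe, which turns each level of the MultiIndex into a dimension. The values below are illustrative, and the import fallback assumes a newer install where xray goes by its later name, xarray.

```python
import pandas as pd

try:
    import xarray as xray  # xray was later renamed to xarray
except ImportError:
    import xray

# A small dataframe indexed by (site, time), with illustrative values.
index = pd.MultiIndex.from_product(
    [['MainTower'], pd.date_range('2015-05-19', periods=3)],
    names=['site', 'time'])
df = pd.DataFrame({'elevation': [1610, 1610, 1610],
                   'precipitation': [0.7, 1.3, 7.2]},
                  index=index)

# Each index level becomes a dimension; each column becomes a data variable.
ds = xray.Dataset.from_dataframe(df)

# Columns that are really site metadata come back as data variables with
# dims (site, time); set_coords marks them as coordinates again.
ds = ds.set_coords('elevation')

# And back to a dataframe.
df_back = ds.to_dataframe()
print(ds)
```

The round trip keeps the values, but the distinction between coordinates and data variables is lost in the dataframe and has to be restored by hand.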

Loading dataframes and transferring to datasets


In [13]:
from __init__ import *
from TOA5_to_netcdf import *

In [14]:
lons = [36.8701]       # degrees east
lats = [0.4856]        # degrees north
elevations = [1610]    # m above sea level

coords={'lon': (['site'], lons, dict(units='degrees east')),
        'lat': (['site'], lats, dict(units='degrees north')),
        'elevation': (['site'], elevations, dict(units='m'))}

In [15]:
import os  # imported explicitly rather than relying on the star imports above

path = os.getcwd().replace('\\','/')+'/current_data/'
input_file = path + 'CR3000_SN4709_flux.dat'
input_dict = {'has_header': True,
              'header_file': input_file,
              'datafile': 'soil',
              'path': path,
              'filename': 'CR3000_SN4709_flux.dat'}

In [16]:
df = createDF(input_file, input_dict, attrs)[0]
attrs, local_attrs = get_attrs(input_dict['header_file'], attrs)
ds = createDS(df, input_dict, attrs, local_attrs, site, coords_vals)
ds.to_netcdf(path='test2.nc', format='NETCDF3_64BIT')
xray.open_dataset('test2.nc')


Out[16]:
<xray.Dataset>
Dimensions:                (site: 1, time: 34)
Coordinates:
    lon                    (site) float32 36.8701
  * site                   (site) |S9 'MainTower'
    elevation              (site) int32 1610
    lat                    (site) float32 0.4856
  * time                   (time) datetime64[ns] 2014-11-14T07:00:00 2014-11-14T07:30:00 ...
Data variables:
    batt_volt_Std          (site, time) float32 0.016 0.016 0.016 0.016 0.016 0.016 0.016 0.016 ...
    agc_Avg                (site, time) float32 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 ...
    WVIA_H2Oppm_Std        (site, time) float32 224.66 248.511 183.528 209.882 383.109 158.485 ...
    cov_h2o_Uz             (site, time) float32 0.0378247 0.0349794 0.035367 0.0510279 0.0520805 ...
    cov_h2o_Ux             (site, time) float32 -0.0849143 -0.0074305 -0.0363381 -0.055827 ...
    cov_h2o_Uy             (site, time) float32 -0.0219625 0.00759393 -0.0273952 0.00218496 ...
    stdev_h2o              (site, time) float32 0.155446 0.184698 0.151865 0.169678 0.318237 ...
    e_hmp_Std              (site, time) float32 0.017 0.023 0.018 0.021 0.042 0.021 0.03 0.031 ...
    co2_wpl_LE             (site, time) float32 0.0366397 0.0338781 0.0342262 0.0493333 ...
    Uz_Std                 (site, time) float32 0.57 0.63 0.607 0.786 0.73 0.766 0.832 0.796 ...
    amp_h_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    batt_volt_Avg          (site, time) float32 13.3874 13.3855 13.3855 13.3827 13.383 13.3841 ...
    WVIA_CavTempC_Std      (site, time) float32 0.0226088 0.0560195 0.0229745 0.00420734 ...
    WVIA_PressTorr_Std     (site, time) float32 0.0123983 0.00463331 0.00469628 0.00472634 ...
    H                      (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
    Uz_Avg                 (site, time) float32 0.149465 0.171783 0.112868 0.203524 0.153066 ...
    e_hmp_Avg              (site, time) float32 1.465 1.452 1.431 1.424 1.375 1.335 1.295 1.285 ...
    h2o_wpl_H              (site, time) float32 10.4263 18.8259 11.4737 19.5092 16.6185 21.8298 ...
    std_wnd_dir            (site, time) float32 19.1276 22.9743 20.1864 16.6435 17.7829 17.5051 ...
    rho_a_mean             (site, time) float32 0.992901 0.991127 0.99032 0.988077 0.986364 ...
    co2_wpl_H              (site, time) float32 0.218513 0.408513 0.248213 0.426289 0.374028 ...
    WVIA_delD_Avg          (site, time) float32 -71.4143 -73.5358 -75.6481 -74.6131 -75.3503 ...
    press_mean             (site, time) float32 83.8841 83.8784 83.8594 83.8305 83.7991 83.7527 ...
    WVIA_del18O_Std        (site, time) float32 1.61055 0.891676 0.861896 0.862261 1.02595 ...
    WVIA_O18overO16_Avg    (site, time) float32 0.00197438 0.00197301 0.00197577 0.00197649 ...
    chopper_f_Tot          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    Ux_Std                 (site, time) float32 1.069 0.849 0.962 1.307 1.403 1.453 1.546 1.38 ...
    irga_warnings          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    panel_temp_Avg         (site, time) float32 29.1243 28.211 27.1383 26.4963 26.1689 26.7043 ...
    Hs                     (site, time) float32 113.18 206.478 127.846 218.719 193.391 261.991 ...
    WVIA_CavTempC_Avg      (site, time) float32 48.5153 48.3382 48.2206 48.181 48.1828 48.2329 ...
    WVIA_HDOoverH2O_Std    (site, time) float32 7.14242e-07 8.69306e-07 7.55491e-07 7.61553e-07 ...
    stdev_Uz               (site, time) float32 0.570468 0.629907 0.606553 0.785866 0.730341 ...
    Hc                     (site, time) float32 106.175 198.892 121.038 208.491 183.32 250.135 ...
    stdev_Ux               (site, time) float32 1.06949 0.849402 0.96229 1.30665 1.40336 1.45279 ...
    stdev_Uy               (site, time) float32 1.14111 1.36759 1.58961 1.62282 1.61014 1.77302 ...
    WVIA_H2Oppm_Avg        (site, time) float32 14931.3 14773.1 14480.8 14446.7 14042.0 13701.7 ...
    WVIA_del18O_Avg        (site, time) float32 -15.3671 -16.0518 -14.6781 -14.3197 -14.8288 ...
    del_T_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    t_hmp_Std              (site, time) float32 0.171 0.313 0.233 0.195 0.246 0.27 0.294 0.148 ...
    t_hmp_Avg              (site, time) float32 19.58 20.1 20.3 20.87 21.33 21.65 22.24 22.54 ...
    WVIA_HDOoverH2O_Avg    (site, time) float32 0.000287411 0.000286755 0.000286101 0.000286421 ...
    co2_mean               (site, time) float32 590.464 589.368 588.514 586.629 585.641 585.341 ...
    n_Tot                  (site, time) int32 18000 18000 18000 18000 18000 18000 18000 18000 ...
    LE_irga                (site, time) float32 92.2922 85.3496 86.2955 124.508 127.076 145.108 ...
    LE_wpl                 (site, time) float32 104.358 105.679 99.267 146.169 145.815 169.288 ...
    cov_Ts_Uy              (site, time) float32 0.00766579 -0.0514139 -0.0210965 -0.0798197 ...
    cov_Ts_Ux              (site, time) float32 -0.175227 -0.149267 -0.119752 -0.224446 ...
    cov_Ts_Uz              (site, time) float32 0.11346 0.207358 0.128495 0.220329 0.195153 ...
    cov_Uy_Uz              (site, time) float32 0.0526281 0.0746325 0.101163 -0.0307808 ...
    WVIA_MirrorTempUS_Avg  (site, time) float32 8.0557 8.05161 8.05301 8.04756 8.06418 8.061 ...
    KH_rho_w_Avg           (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
    WVIA_H2O_O18_16_Avg    (site, time) float32 0.0019863 0.00198492 0.00198769 0.00198841 ...
    t_hmp_mean             (site, time) float32 19.5751 20.0955 20.2964 20.869 21.3349 21.6509 ...
    h2o_Avg                (site, time) float32 11.439 11.3134 11.1473 11.0878 10.6933 10.4035 ...
    WVIA_DoverH_Avg        (site, time) float32 0.000144627 0.000144297 0.000143968 0.000144129 ...
    WVIA_PressTorr_Avg     (site, time) float32 39.5321 39.5507 39.5548 39.556 39.6093 39.6107 ...
    WVIA_O18overO16_Std    (site, time) float32 3.22977e-06 1.78816e-06 1.72746e-06 1.72873e-06 ...
    detector_f_Tot         (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    h2o_hmp_mean           (site, time) float32 10.832 10.7223 10.5556 10.4884 10.1089 9.80297 ...
    kh_mV_Avg              (site, time) float32 -2.57945 -2.48721 -2.49531 -2.33123 -2.38383 ...
    WVIA_H2O_O18_16_Std    (site, time) float32 3.24921e-06 1.79927e-06 1.73822e-06 1.73927e-06 ...
    Uy_Std                 (site, time) float32 1.141 1.368 1.59 1.623 1.61 1.773 1.366 1.516 ...
    Ux_Avg                 (site, time) float32 3.31151 2.82275 3.60617 4.31085 4.57026 4.72029 ...
    cov_Ux_Uy              (site, time) float32 0.00495589 -0.0266461 0.109639 0.317565 0.461053 ...
    Uy_Avg                 (site, time) float32 0.802545 1.16517 1.79512 2.48395 1.66233 1.85764 ...
    cov_Ux_Uz              (site, time) float32 -0.248603 -0.121524 -0.137646 -0.288274 ...
    Fc_wpl                 (site, time) float32 -0.266433 -0.454724 -0.253893 -0.344292 ...
    stdev_Ts               (site, time) float32 0.388251 0.559185 0.464061 0.583613 0.647009 ...
    WVIA_DoverH_Std        (site, time) float32 3.59551e-07 4.37687e-07 3.8039e-07 3.83365e-07 ...
    Ts_mean                (site, time) float32 21.3199 21.749 22.0668 22.6476 23.1166 23.3476 ...
    tau                    (site, time) float32 0.252309 0.141346 0.169169 0.286456 0.323494 ...
    rh_hmp_Avg             (site, time) float32 64.35 61.79 60.11 57.77 54.2 51.6 48.3 47.04 ...
    u_star                 (site, time) float32 0.504096 0.377639 0.413307 0.538436 0.572683 ...
    wnd_spd                (site, time) float32 3.60859 3.32093 4.29502 5.19459 5.10947 5.32119 ...
    cov_co2_Uy             (site, time) float32 -0.0842502 0.447055 0.374189 0.317796 1.01696 ...
    cov_co2_Ux             (site, time) float32 0.732 0.594694 0.540849 1.00302 1.55531 1.7098 ...
    cov_co2_Uz             (site, time) float32 -0.521585 -0.897115 -0.536332 -0.819914 -0.71077 ...
    RECORD                 (site, time) int32 29214 29215 29216 29217 29218 29219 29220 29221 ...
    sig_lck_f_Tot          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    wnd_dir_compass        (site, time) float32 86.377 77.5702 73.5362 70.049 80.0123 78.5182 ...
    csat_warnings          (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    amp_l_f_Tot            (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    WVIA_delD_Std          (site, time) float32 2.30835 2.8101 2.44208 2.46221 2.59453 2.78947 ...
    rh_hmp_Std             (site, time) float32 1.003 1.717 0.909 0.946 1.939 0.869 1.289 1.043 ...
    stdev_co2              (site, time) float32 1.81483 2.47358 1.95025 2.27201 2.39509 2.26651 ...
    WVIA_MirrorTempUS_Std  (site, time) float32 0.0160431 0.0151895 0.0153116 0.0148065 ...
    panel_temp_Std         (site, time) float32 0.228 0.287 0.305 0.121 0.048 0.323 0.476 0.453 ...
    pll_f_Tot              (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    sync_f_Tot             (site, time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    h2o_wpl_LE             (site, time) float32 1.64005 1.50387 1.49787 2.15217 2.11998 2.35022 ...
    rslt_wnd_spd           (site, time) float32 3.40736 3.05377 4.02826 4.97528 4.8632 5.07267 ...
    wnd_dir_csat3          (site, time) float32 13.6229 22.4298 26.4637 29.951 19.9877 21.4817 ...
    Fc_irga                (site, time) float32 -0.521585 -0.897115 -0.536332 -0.819914 -0.71077 ...
    ln_kh_Avg              (site, time) float32 nan nan nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    featureType: timeSeries
    datafile: flux
    logger: CR3000_SN4709
    creator_name: Kelly Caylor
    license: MIT License
    acknowledgement: Funded by NSF and Princeton University
    creator_email: kcaylor@princeton.edu
    format: TOA5
    local_timezone: Africa/Nairobi
    Conventions: CF-1.6
    source: Flux tower sensor data CR3000_SN4709_flux.dat, MainTowerCR3000_V11.cr3, 39879
    program: MainTowerCR3000_V11.cr3
    keywords: eddy covariance, isotope hydrology, land surface flux
    title: Flux Tower Data from MPALA
    station_name: MPALA Tower
    summary: This raw data comes from the MPALA Flux Tower, which is maintained by the Ecohydrology Lab at Mpala Research Centre in Laikipia, Kenya. It is part of a long-term monitoring project that was originally funded by NSF and now runs with support from Princeton. Its focus is on using stable isotopes to better understand water balance in drylands, particularly transpiration and evaporation fluxes.
    institution: Princeton University
    naming_authority: caylor.princeton.edu
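
The createDF and createDS helpers come from TOA5_to_netcdf and are not shown here. As a rough idea of the kind of step involved, the sketch below wraps a time-indexed dataframe in a dataset with a length-1 site dimension and site-level coordinates. The function name frame_to_site_dataset, its arguments, and the sample values are all hypothetical, and this is not the actual createDS implementation. The import fallback assumes a newer install where xray goes by its later name, xarray.

```python
import numpy as np
import pandas as pd

try:
    import xarray as xray  # xray was later renamed to xarray
except ImportError:
    import xray


def frame_to_site_dataset(df, site, lon, lat, elevation):
    """Hypothetical helper: wrap a time-indexed frame in a (site, time) dataset."""
    data_vars = {}
    for name in df.columns:
        # Add a leading axis of length 1 so each column has dims (site, time).
        data_vars[name] = (['site', 'time'], df[name].values[np.newaxis, :])
    coords = {'site': [site],
              'time': df.index.values,
              'lon': (['site'], [lon], {'units': 'degrees east'}),
              'lat': (['site'], [lat], {'units': 'degrees north'}),
              'elevation': (['site'], [elevation], {'units': 'm'})}
    return xray.Dataset(data_vars, coords=coords)


# Illustrative half-hourly records; the values are made up.
times = pd.date_range('2014-11-14 07:00', periods=4, freq='30min')
df = pd.DataFrame({'t_hmp_Avg': [19.5, 20.0, 20.5, 21.0],
                   'rh_hmp_Avg': [64.0, 62.0, 60.0, 58.0]},
                  index=times)

ds = frame_to_site_dataset(df, 'MainTower', 36.8701, 0.4856, 1610)
print(ds)
```

Global attributes such as title or creator_name could then be attached through ds.attrs before writing the file with to_netcdf.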