xframe is a dataframe for C++, based on xtensor and xtl


In [ ]:
#include <string>
#include <iostream>

#include "xtensor/xrandom.hpp"
#include "xtensor/xmath.hpp"

#include "xframe/xio.hpp"
#include "xframe/xvariable.hpp"
#include "xframe/xvariable_view.hpp"
#include "xframe/xvariable_masked_view.hpp"
#include "xframe/xreindex_view.hpp"

Let's first define some useful type aliases to reduce the amount of typing:


In [ ]:
using coordinate_type = xf::xcoordinate<xf::fstring>;
using variable_type = xf::xvariable<double, coordinate_type>;
using data_type = variable_type::data_type;

1. Variables

1.1. Creating a variable

In the following, we define a 2D variable called dry_temperature. A variable in xframe is the composition of tensor data and a coordinate system; it is the equivalent of DataArray in xarray. The tensor data can be any valid xtensor expression whose value_type is xoptional. Common types are xarray_optional, xtensor_optional and xoptional_assembly, the last of which makes it possible to create an optional expression from existing regular tensor expressions.


In [ ]:
data_type dry_temperature_data = xt::eval(xt::random::rand({6, 3}, 15., 25.));
dry_temperature_data(0, 0).has_value() = false;
dry_temperature_data(2, 1).has_value() = false;

In [ ]:
dry_temperature_data
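
To give an intuition of what an optional value type means for computations, here is a minimal standalone sketch that uses std::optional from the standard library instead of xoptional. The alias maybe_double and the function multiply are hypothetical names, and the rule that a result is missing as soon as one operand is missing is an assumption of this sketch, not a statement about xoptional's implementation.

```cpp
#include <optional>

// Hypothetical stand-in for an optional tensor element (not xoptional itself).
using maybe_double = std::optional<double>;

// Product of two optional values. Assumption of this sketch: the result is
// missing as soon as one of the operands is missing.
maybe_double multiply(const maybe_double& a, const maybe_double& b)
{
    if (a.has_value() && b.has_value())
    {
        return *a * *b;
    }
    return std::nullopt;
}
```

Calling multiply(2., 3.) yields a value, while multiply(std::nullopt, 3.) yields a missing result.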

Once the data is defined, we can define the coordinate system. A coordinate system is a mapping from dimension names to label axes. It is possible to build everything step by step: create an axis from a vector of labels, then the coordinate system from a map of dimension names and axes, and finally the variable from this coordinate system and the previously created data. However, xframe makes use of the initializer-list syntax so that everything can be created in place with a very expressive syntax:


In [ ]:
auto time_axis = xf::axis({"2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05", "2018-01-06"});

In [ ]:
auto dry_temperature = variable_type(
    dry_temperature_data,
    {
        {"date", time_axis},
        {"city", xf::axis({"London", "Paris", "Brussels"})}
    }
);

In [ ]:
dry_temperature
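
To make the idea of a coordinate system more concrete, the following standalone sketch models an axis as an ordered list of labels with a reverse lookup from label to position, and a coordinate system as a map from dimension names to axes. It only relies on the standard library; toy_axis and toy_coordinate are hypothetical names chosen for illustration and are not the classes xframe actually uses.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified label axis: it stores the labels in order and
// remembers the position of each label.
class toy_axis
{
public:

    explicit toy_axis(std::vector<std::string> labels)
        : m_labels(std::move(labels))
    {
        for (std::size_t i = 0; i < m_labels.size(); ++i)
        {
            m_index[m_labels[i]] = i;
        }
    }

    std::size_t size() const { return m_labels.size(); }

    // Label -> position; throws if the label is not part of the axis.
    std::size_t position(const std::string& label) const
    {
        auto it = m_index.find(label);
        if (it == m_index.end())
        {
            throw std::out_of_range("unknown label: " + label);
        }
        return it->second;
    }

    // Position -> label; throws if the position is out of bounds.
    const std::string& label(std::size_t pos) const { return m_labels.at(pos); }

private:

    std::vector<std::string> m_labels;
    std::unordered_map<std::string, std::size_t> m_index;
};

// Hypothetical coordinate system: dimension name -> axis.
using toy_coordinate = std::map<std::string, toy_axis>;
```

With such a structure, translating a label such as "Paris" into a position along the "city" dimension is a simple lookup, which is what makes label-based indexing possible.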

1.2. Indexing and selecting data

Like xarray, xframe supports four different kinds of indexing as described below:

Dimension lookup: Positional - Index lookup: By integer


In [ ]:
dry_temperature(3, 0)

Dimension lookup: Positional - Index lookup: By label


In [ ]:
dry_temperature.locate("2018-01-04", "London")

Dimension lookup: By name - Index lookup: By integer


In [ ]:
dry_temperature.iselect({{"date", 3}, {"city", 0}})

Dimension lookup: By name - Index lookup: By label


In [ ]:
dry_temperature.select({{"date", "2018-01-04"}, {"city", "London"}})

Contrary to xarray, these methods return a single value; they do not create views of the variable by selecting many data points. Such views are possible in xframe, though, by using the free-function counterparts of the methods described above, which are covered in a later section.
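
The four kinds of lookup differ only in how a dimension and a position along it are identified. The standalone sketch below illustrates this on a hypothetical 2D labelled array built with the standard library only. The type toy_variable and the helper find_pos are illustrative names, the member functions merely mirror the names used above, and nothing here reflects how xframe is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Position of `item` in `items`; throws if it is absent.
inline std::size_t find_pos(const std::vector<std::string>& items, const std::string& item)
{
    auto it = std::find(items.begin(), items.end(), item);
    if (it == items.end())
    {
        throw std::out_of_range("not found: " + item);
    }
    return static_cast<std::size_t>(it - items.begin());
}

// Hypothetical 2D labelled array (exactly two dimensions are assumed).
struct toy_variable
{
    std::vector<std::string> dims;                 // dimension names, in storage order
    std::vector<std::vector<std::string>> labels;  // one label list per dimension
    std::vector<double> data;                      // row-major values

    // Dimension: positional - index: by integer
    double at(std::size_t i, std::size_t j) const
    {
        return data.at(i * labels.at(1).size() + j);
    }

    // Dimension: positional - index: by label
    double locate(const std::string& l0, const std::string& l1) const
    {
        return at(find_pos(labels.at(0), l0), find_pos(labels.at(1), l1));
    }

    // Dimension: by name - index: by integer
    double iselect(const std::map<std::string, std::size_t>& index) const
    {
        std::size_t pos[2] = {0, 0};
        for (const auto& entry : index)
        {
            pos[find_pos(dims, entry.first)] = entry.second;
        }
        return at(pos[0], pos[1]);
    }

    // Dimension: by name - index: by label
    double select(const std::map<std::string, std::string>& selector) const
    {
        std::size_t pos[2] = {0, 0};
        for (const auto& entry : selector)
        {
            std::size_t d = find_pos(dims, entry.first);
            pos[d] = find_pos(labels.at(d), entry.second);
        }
        return at(pos[0], pos[1]);
    }
};
```

Since the name-based lookups resolve each dimension name first, the order in which the dimensions are given does not matter for iselect and select.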

1.3. Maths and broadcasting

Variables support all the common mathematical operations and functions; as in xtensor, these operations are lazy and return expressions. xframe supports operations on variables with different dimensions and labels thanks to broadcasting, which is performed according to the dimension names rather than the dimension positions, as shown below.

Let's first define a variable containing the relative humidity for cities:


In [ ]:
data_type relative_humidity_data = xt::eval(xt::random::rand({3}, 50.0, 70.0));

auto relative_humidity = variable_type(
    relative_humidity_data,
    {
        {"city", xf::axis({"Paris", "London", "Brussels"})}
    }
);

relative_humidity

We will use it and the previously defined dry_temperature variable (shown again below) to compute water_vapour_pressure:


In [ ]:
dry_temperature

In [ ]:
auto water_vapour_pressure = 0.01 * relative_humidity * 6.1 * xt::exp((17.27 * dry_temperature) / (237.7 + dry_temperature));

In [ ]:
water_vapour_pressure
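
The standalone sketch below spells out what this computation amounts to element by element: the humidity is looked up by city label, so it is matched with the right column whatever the order of the labels, and it is reused for every date. It is an eager, standard-library-only illustration. The functions vapour_pressure and broadcast_by_city are hypothetical names, and the sketch does not reproduce the lazy evaluation of xframe.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same formula as in the cell above, for a single temperature and a single
// relative humidity value.
double vapour_pressure(double temperature, double humidity)
{
    return 0.01 * humidity * 6.1 * std::exp((17.27 * temperature) / (237.7 + temperature));
}

// `temperature` is indexed as [date][city], `cities` holds the labels of its
// second dimension, and `humidity` is keyed by city label only.
std::vector<std::vector<double>> broadcast_by_city(
    const std::vector<std::vector<double>>& temperature,
    const std::vector<std::string>& cities,
    const std::map<std::string, double>& humidity)
{
    std::vector<std::vector<double>> result(temperature.size(),
                                            std::vector<double>(cities.size()));
    for (std::size_t i = 0; i < temperature.size(); ++i)
    {
        for (std::size_t j = 0; j < cities.size(); ++j)
        {
            // The humidity is found by label, not by position, and is
            // repeated for every date i.
            result[i][j] = vapour_pressure(temperature[i][j], humidity.at(cities[j]));
        }
    }
    return result;
}
```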

The relative humidity has been broadcast so that its values are repeated for each date. When the labels of the variables involved in an operation are not the same, the result contains the intersection of the label sets:


In [ ]:
data_type coeff_data = xt::eval(xt::random::rand({6, 3}, 0.7, 0.9));
dry_temperature_data(0, 0).has_value() = false;
dry_temperature_data(2, 1).has_value() = false;

auto coeff = variable_type(
    coeff_data,
    {
        {"date", time_axis},
        {"city", xf::axis({"London", "New York", "Brussels"})}
    }
);
coeff

In [ ]:
auto res = coeff * dry_temperature;
res
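
The alignment on the intersection of the label sets can be illustrated with a small standalone sketch on one-dimensional data. The type labelled_values and the function multiply_aligned are hypothetical names. Keeping the common labels in the order of the left operand is an assumption of this sketch and is not necessarily the order xframe produces.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical 1D labelled data: labels[i] is the label of values[i].
struct labelled_values
{
    std::vector<std::string> labels;
    std::vector<double> values;
};

// Multiplies the values that share a label; labels present in only one
// operand are dropped. The result follows the label order of `a`
// (an assumption of this sketch).
labelled_values multiply_aligned(const labelled_values& a, const labelled_values& b)
{
    labelled_values out;
    for (std::size_t i = 0; i < a.labels.size(); ++i)
    {
        auto it = std::find(b.labels.begin(), b.labels.end(), a.labels[i]);
        if (it != b.labels.end())
        {
            std::size_t j = static_cast<std::size_t>(it - b.labels.begin());
            out.labels.push_back(a.labels[i]);
            out.values.push_back(a.values[i] * b.values[j]);
        }
    }
    return out;
}
```

With the city axes used above, only "London" and "Brussels" are common to both operands, so "New York" and "Paris" do not appear in the result.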

1.4. Higher-dimensional variables

The following code creates and displays a three-dimensional variable.


In [ ]:
data_type pressure_data = {{{ 1.,  2., 3. },
                            { 4.,  5., 6. },
                            { 7.,  8., 9. }},
                           {{ 1.3, 1.5, 1.},
                            { 2., 2.3, 2.4},
                            { 3.1, 3.8, 3.}},
                           {{ 8.5, 8.2, 8.6},
                            { 7.5, 8.6, 9.7},
                            { 4.5, 4.4, 4.3}}};

In [ ]:
auto pressure = variable_type(
    pressure_data,
    {
        {"x", xf::axis(3)},
        {"y", xf::axis(3, 6, 1)},
        {"z", xf::axis(3)},
    }
);

In [ ]:
pressure

2. Views

2.1. Multiselection

Views can be used to select many data points in a variable. The syntax is similar to the one used for selecting a single data point, except that it uses free functions instead of methods of the variable.


In [ ]:
dry_temperature

Dimension lookup: Positional - Index lookup: By integer


In [ ]:
auto v1 = ilocate(dry_temperature, xf::irange(0, 5, 2), xf::irange(1, 3));
v1

Dimension lookup: Positional - Index lookup: By label


In [ ]:
auto v2 = locate(dry_temperature, xf::range("2018-01-01", "2018-01-06", 2), xf::range("Paris", "Brussels"));
v2

Dimension lookup: By name - Index lookup: By integer


In [ ]:
auto v3 = iselect(dry_temperature, {{"city", xf::irange(1, 3)}, {"date", xf::irange(0, 5, 2)}});
v3

Dimension lookup: By name - Index lookup: By label


In [ ]:
auto v4 = select(dry_temperature, 
                 {{"city", xf::range("Paris", "Brussels")},
                  {"date", xf::range("2018-01-01", "2018-01-06", 2)}});
v4

2.2. Keeping and dropping labels

The previous selections made use of ranges (label ranges from xframe and index ranges from xtensor). It is also possible to select data points by explicitly specifying a list of labels to keep or to drop.

Dimension lookup: Positional - Index lookup: By integer


In [ ]:
auto v5 = ilocate(dry_temperature, xf::ikeep(0, 2, 4), xf::idrop(0));
v5

Dimension lookup: Positional - Index lookup: By label


In [ ]:
auto v6 = locate(dry_temperature, xf::keep("2018-01-01", "2018-01-03", "2018-01-05"), xf::drop("London"));
v6

Dimension lookup: By name - Index lookup: By integer


In [ ]:
auto v7 = iselect(dry_temperature, {{"city", xf::idrop(0)}, {"date", xf::ikeep(0, 2, 4)}});
v7

Dimension lookup: By name - Index lookup: By label


In [ ]:
auto v8 = select(dry_temperature,
                 {{"city", xf::drop("London")},
                  {"date", xf::keep("2018-01-01", "2018-01-03", "2018-01-05")}});
v8
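
Keeping and dropping labels both boil down to computing a list of positions along an axis. The following standalone sketch shows one way to do it with the standard library. The functions keep_positions and drop_positions are hypothetical helpers written for this illustration, not part of xframe, and rejecting unknown labels with an exception is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Positions of the labels to keep, in the order they were requested.
// Throws if a requested label is not part of the axis.
std::vector<std::size_t> keep_positions(const std::vector<std::string>& axis,
                                        const std::vector<std::string>& kept)
{
    std::vector<std::size_t> result;
    for (const auto& label : kept)
    {
        auto it = std::find(axis.begin(), axis.end(), label);
        if (it == axis.end())
        {
            throw std::out_of_range("unknown label: " + label);
        }
        result.push_back(static_cast<std::size_t>(it - axis.begin()));
    }
    return result;
}

// Positions of every label of the axis that is not in `dropped`,
// in axis order.
std::vector<std::size_t> drop_positions(const std::vector<std::string>& axis,
                                        const std::vector<std::string>& dropped)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < axis.size(); ++i)
    {
        if (std::find(dropped.begin(), dropped.end(), axis[i]) == dropped.end())
        {
            result.push_back(i);
        }
    }
    return result;
}
```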

2.3. Masking views

Masking views select data points based on conditions expressed on labels. These conditions can be complicated boolean expressions.


In [ ]:
pressure

In [ ]:
auto masked_pressure = xf::where(
    pressure,
    not_equal(pressure.axis<int>("x"), 2) && pressure.axis<int>("z") < 2
);

In [ ]:
masked_pressure

When assigning to a masking view, masked values are not changed. Like other views, a masking view is a proxy on its underlying expression: no copy is made, so changing an unmasked value actually changes the corresponding value in the underlying expression.


In [ ]:
masked_pressure = 1.;
masked_pressure

In [ ]:
pressure
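
The proxy behaviour described above can be mimicked in a few lines of standard C++. In the sketch below, toy_masked_view is a hypothetical class that holds a reference to a plain vector together with a predicate on the element index. It is only meant to illustrate the semantics of assignment through a mask, not the way xframe implements masking views.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical proxy over a vector: no copy of the data is made, and
// assignments only reach the elements whose index satisfies the predicate.
class toy_masked_view
{
public:

    toy_masked_view(std::vector<double>& data, std::function<bool(std::size_t)> selected)
        : m_data(data), m_selected(std::move(selected))
    {
    }

    // Assigns `value` to every selected element; masked elements are left
    // untouched in the underlying vector.
    toy_masked_view& operator=(double value)
    {
        for (std::size_t i = 0; i < m_data.size(); ++i)
        {
            if (m_selected(i))
            {
                m_data[i] = value;
            }
        }
        return *this;
    }

private:

    std::vector<double>& m_data;
    std::function<bool(std::size_t)> m_selected;
};
```

Because the view stores a reference, the change is visible in the original vector as soon as the assignment returns.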

2.4. Reindexing views

Reindexing views give a variable a new set of coordinates for the corresponding dimensions. Like other views, no copy is involved. Asking for values corresponding to new labels not found in the original set of coordinates returns missing values. In the next example, we reindex the city dimension.


In [ ]:
dry_temperature

In [ ]:
auto temp = reindex(dry_temperature, {{"city", xf::axis({"London", "New York", "Brussels"})}});
temp

reindex_like is a shortcut that reindexes a variable using the set of coordinates of another variable:


In [ ]:
auto dry_temp2 = variable_type(
    dry_temperature_data,
    {
        {"date", time_axis},
        {"city", xf::axis({"London", "New York", "Brussels"})}
    }
);
auto temp2 = reindex_like(dry_temperature, dry_temp2);
temp2
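
The lookup rule behind reindexing can be summarized by the standalone sketch below: each new label is searched in the original axis, and a missing value is produced when it is not found. The function reindex_values is a hypothetical helper based on std::optional. Unlike the views shown above, it materializes a new container, so it only illustrates the lookup rule and not the copy-free behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// For each label of `new_labels`, returns the value attached to that label in
// the original data, or a missing value if the label is unknown.
std::vector<std::optional<double>> reindex_values(
    const std::vector<std::string>& old_labels,
    const std::vector<double>& values,
    const std::vector<std::string>& new_labels)
{
    std::vector<std::optional<double>> result;
    result.reserve(new_labels.size());
    for (const auto& label : new_labels)
    {
        auto it = std::find(old_labels.begin(), old_labels.end(), label);
        if (it == old_labels.end())
        {
            result.push_back(std::nullopt);
        }
        else
        {
            result.push_back(values.at(static_cast<std::size_t>(it - old_labels.begin())));
        }
    }
    return result;
}
```

Reindexing the labels "London", "Paris", "Brussels" with "London", "New York", "Brussels" keeps the first and last values and yields a missing value for "New York".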
