Import the LArray library:
In [ ]:
from larray import *
Import the test array population:
In [ ]:
# let's start by loading the example 'population' array
population = load_example_data('demography_eurostat').population
population
In [ ]:
population['Belgium', 'Female', 2017]
As long as there is no ambiguity (i.e. no two axes sharing one or more labels), the order of indexing does not matter. You therefore usually do not need to remember axis positions during computation; they only matter for output.
In [ ]:
# the order of the labels doesn't matter
population['Female', 2017, 'Belgium']
Selecting a subset is done by using slices or lists of labels:
In [ ]:
population[['Belgium', 'Germany'], 2014:2016]
Slice bounds are optional: if omitted, the start defaults to the first label and the stop to the last one.
In [ ]:
# select all years starting from 2015
population[2015:]
In [ ]:
# select all years up to 2015 (included)
population[:2015]
Slices can also have a step (defaults to 1) to take every Nth label:
In [ ]:
# select all even years starting from 2014
population[2014::2]
In [ ]:
immigration = load_example_data('demography_eurostat').immigration
# the 'immigration' array has two axes (country and citizenship) which share the same labels
immigration
In [ ]:
# LArray doesn't use the position of the labels inside the brackets
# to determine the corresponding axes. Instead, LArray tries to guess the
# corresponding axis for each label, whatever its position.
# If a label is shared by two or more axes, LArray cannot
# choose between the candidate axes and raises an error.
try:
    immigration['Belgium', 'Netherlands']
except Exception as e:
    print(type(e).__name__, ':', e)
In [ ]:
# the solution is simple: specify the axes on which you make the selection
immigration[immigration.country['Belgium'], immigration.citizenship['Netherlands']]
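The axis-guessing rule described above can be pictured with a small pure-Python sketch. It only illustrates the idea and is not LArray's actual implementation; the `axes` dictionary, its labels and the `find_axis` helper are hypothetical names introduced for this example.

```python
# Hypothetical axes: two of them share the same labels,
# as in the 'immigration' array above.
axes = {
    'country': ['Belgium', 'Luxembourg', 'Netherlands'],
    'citizenship': ['Belgium', 'Luxembourg', 'Netherlands'],
    'gender': ['Male', 'Female'],
}


def find_axis(label, axes):
    """Return the name of the single axis containing `label`.

    Raise ValueError if the label is unknown or belongs to several axes.
    """
    matches = [name for name, labels in axes.items() if label in labels]
    if not matches:
        raise ValueError(f"{label!r} is not a label of any axis")
    if len(matches) > 1:
        raise ValueError(f"{label!r} is ambiguous: it belongs to axes {matches}")
    return matches[0]


# 'Female' exists on one axis only, so its position in the key is irrelevant
print(find_axis('Female', axes))

# 'Belgium' exists on two axes, so the axis has to be given explicitly
try:
    find_axis('Belgium', axes)
except ValueError as e:
    print(type(e).__name__, ':', e)
```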
When selecting, assigning or using aggregate functions, an axis can be referred to via the special variable X.
This gives you access to the axes of the array you are manipulating. The main drawback of using X is that you lose the autocompletion available in many editors. It only works with non-anonymous axes whose names do not contain whitespace or special characters.
In [ ]:
# the previous example can also be written as
immigration[X.country['Belgium'], X.citizenship['Netherlands']]
Sometimes it is more practical to use indices (positions) along the axis instead of labels.
To do so, add the character i before the brackets: .i[indices].
As for selection with labels, you can use a single index, a slice or a list of indices.
Indices can also be negative (-1 represents the last element of an axis).
In [ ]:
# select the last year
population[X.time.i[-1]]
In [ ]:
# same but for the last 3 years
population[X.time.i[-3:]]
In [ ]:
# using a list of indices
population[X.time.i[0, 2, 4]]
In [ ]:
year = 2015
# with labels
population[X.time[:year]]
In [ ]:
# with indices (i.e. using the .i[indices] syntax)
index_year = population.time.index(year)
population[X.time.i[:index_year]]
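One point worth checking when comparing the two cells above: in LArray a slice of labels includes its stop label, whereas a slice of indices, as everywhere in Python, excludes its stop position. The two selections are therefore not expected to return the same years. The pure-Python sketch below mimics this with a plain list; the `time_labels` values are assumed for illustration and `label_slice` is a hypothetical helper, not part of LArray.

```python
# Assumed labels of a time axis (illustration only)
time_labels = [2013, 2014, 2015, 2016, 2017]


def label_slice(labels, stop_label):
    """Mimic a label-based slice [:stop_label]: the stop label is included."""
    stop_index = labels.index(stop_label)
    return labels[:stop_index + 1]


year = 2015
index_year = time_labels.index(year)        # position of the label 2015

by_label = label_slice(time_labels, year)   # stop label included
by_index = time_labels[:index_year]         # stop position excluded

print(by_label)   # [2013, 2014, 2015]
print(by_index)   # [2013, 2014]
```

To obtain the same selection with indices, the stop position would have to be `index_year + 1`.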
You can use .i[] selection directly on an array instead of on its axes.
In this context, if you want to select a subset of the first and third axes for example, you must use a full slice : for the second one.
In [ ]:
# select first country and last three years
population.i[0, :, -3:]
In [ ]:
even_years = population.time[2014::2]
population[even_years]
In [ ]:
# select even years
population[X.time % 2 == 0]
Boolean filters can also be applied to the data itself:
In [ ]:
# select population for the year 2017
population_2017 = population[2017]
# select all data with a value greater than 30 million
population_2017[population_2017 > 30e6]
Arrays can also be used to create boolean filters:
In [ ]:
start_year = Array([2015, 2016, 2017], axes=population.country)
start_year
In [ ]:
population[X.time >= start_year]
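To see which cells the filter above keeps, the comparison can be spelled out by hand. This is a minimal pure-Python sketch, assuming the country axis holds Belgium, France and Germany and the time axis runs from 2013 to 2017 (both are assumptions made for this illustration). Each country gets its own starting year, and only the (country, year) pairs at or after it are kept.

```python
countries = ['Belgium', 'France', 'Germany']   # assumed labels
years = [2013, 2014, 2015, 2016, 2017]         # assumed labels

# one starting year per country, like the 'start_year' array
start_year = dict(zip(countries, [2015, 2016, 2017]))

# keep a (country, year) pair only when the year has reached
# the starting year of that country
selected = [(country, year)
            for country in countries
            for year in years
            if year >= start_year[country]]

for pair in selected:
    print(pair)
```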
In [ ]:
for year in population.time:
    print(year)
In [ ]:
population[2017] = 0
population
Now, let's store a subset in a new variable and modify it:
In [ ]:
# store the data associated with the year 2016 in a new variable
population_2016 = population[2016]
population_2016
In [ ]:
# now, we modify the new variable
population_2016['Belgium'] = 0
# and we can see that the original array has also been modified
population
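One very important gotcha though: as the cells above show, modifying a subset stored in a new variable may also modify the original array.
Remember: use the .copy() method when you need a subset that can be changed independently of the original array.
The same warning applies to entire arrays: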
In [ ]:
# reload the 'population' array
population = load_example_data('demography_eurostat').population
# create a second 'population2' variable
population2 = population
population2
In [ ]:
# set all data corresponding to the year 2017 to 0
population2[2017] = 0
population2
In [ ]:
# and now take a look at what happened to the original array 'population'
# after modifying the 'population2' array
population
In [ ]:
# reload the 'population' array
population = load_example_data('demography_eurostat').population
# copy the 'population' array and store the copy in a new variable
population2 = population.copy()
# modify the copy
population2[2017] = 0
population2
In [ ]:
# the data from the original array have not been modified
population
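The difference between the two situations above comes from plain Python semantics: `population2 = population` only binds a second name to the same object, whereas `.copy()` creates a new one. A minimal sketch with a built-in list instead of an array (the variable names are illustrative):

```python
original = [1, 2, 3]

# plain assignment: both names refer to the same object
alias = original
alias[0] = 0
print(alias is original)   # True
print(original)            # [0, 2, 3]

# explicit copy: a new, independent object
original = [1, 2, 3]
duplicate = original.copy()
duplicate[0] = 0
print(duplicate is original)   # False
print(original)                # [1, 2, 3]
```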
In [ ]:
# select population for the year 2015
population_2015 = population[2015]
# propagate the population of the year 2015 to all following years
population[2016:] = population_2015
population
In [ ]:
# replace 'Male' and 'Female' labels by 'M' and 'F'
population_2015 = population_2015.set_labels('gender', 'M,F')
population_2015
In [ ]:
# now let's try to repeat the assignment operation above with the new labels.
# An error is raised because of incompatible axes
try:
    population[2016:] = population_2015
except Exception as e:
    print(type(e).__name__, ':', e)
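The reason for the error can be pictured as a label-by-label comparison of the axes common to both sides of the assignment. The sketch below is a pure-Python illustration under that assumption, not LArray's actual check; `check_labels` and the two dictionaries are hypothetical.

```python
def check_labels(target_axes, value_axes):
    """Raise ValueError when an axis common to both has different labels."""
    for name, labels in value_axes.items():
        if name in target_axes and target_axes[name] != labels:
            raise ValueError(
                f"incompatible axis {name!r}: "
                f"{labels} vs {target_axes[name]}"
            )


# hypothetical axes of the target subset and of the assigned value
target_axes = {'country': ['Belgium', 'France', 'Germany'],
               'gender': ['Male', 'Female']}
value_axes = {'country': ['Belgium', 'France', 'Germany'],
              'gender': ['M', 'F']}

try:
    check_labels(target_axes, value_axes)
except ValueError as e:
    print(type(e).__name__, ':', e)
```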