Import the LArray library:
In [ ]:
from larray import *
Load the population array from the demography_eurostat dataset:
In [ ]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')
# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time
# extract the 'population' array
population = demography_eurostat.population
# show the 'population' array
population
All the usual arithmetic operations can be applied to an array; each operation is applied to every element individually:
In [ ]:
# 'true' division
population_in_millions = population / 1_000_000
population_in_millions
In [ ]:
# 'floor' division
population_in_millions = population // 1_000_000
population_in_millions
In [ ]:
# % means modulo (aka remainder of division)
population % 1_000_000
In [ ]:
# ** means raising to the power
print(ndtest(4))
ndtest(4) ** 3
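As a rough mental model, and not a description of LArray's internals, the element-wise behaviour above can be mimicked in plain Python with list comprehensions. The values below are made up for illustration; the sketch also checks the usual identity linking floor division and modulo.

```python
# Hypothetical counts, chosen only for illustration
values = [1_234_567, 25_000_000, 999_999]

# each operator is applied to every element independently
in_millions = [v / 1_000_000 for v in values]       # 'true' division
whole_millions = [v // 1_000_000 for v in values]   # 'floor' division
remainders = [v % 1_000_000 for v in values]        # modulo

# floor division and modulo recombine into the original value
recombined = [q * 1_000_000 + r for q, r in zip(whole_millions, remainders)]

print(whole_millions)
print(remainders)
print(recombined == values)
```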
More interestingly, the binary operators above also work between two arrays.
Let us imagine a rate of population growth which is constant over time but different by gender and country:
In [ ]:
growth_rate = Array(data=[[1.011, 1.010], [1.013, 1.011], [1.010, 1.009]], axes=[country, gender])
growth_rate
In [ ]:
# we store the population of the year 2017 in a new variable
population_2017 = population[2017]
population_2017
In [ ]:
# perform an arithmetic operation between two arrays
population_2018 = population_2017 * growth_rate
population_2018
In [ ]:
# force the resulting array to be an integer array
population_2018 = (population_2017 * growth_rate).astype(int)
population_2018
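One detail worth keeping in mind: if `astype(int)` follows NumPy's conversion rules (an assumption here, since this section does not state it), the fractional part is dropped rather than rounded. Python's built-in `int()` truncates in the same way, so a plain-Python sketch with made-up numbers shows the difference:

```python
# Hypothetical projected values, for illustration only
projected = [1234.2, 1234.9, 99.7]

truncated = [int(v) for v in projected]   # drops the fractional part
rounded = [round(v) for v in projected]   # rounds to the nearest integer

print(truncated)
print(rounded)
```

If rounding is what is wanted, rounding before the conversion avoids systematically underestimating positive values.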
In [ ]:
# let's change the order of the axes of the 'growth_rate' array
transposed_growth_rate = growth_rate.transpose()
# look at the order of the new 'transposed_growth_rate' array:
# 'gender' is the first axis while 'country' is the second
transposed_growth_rate
In [ ]:
# look at the order of the 'population_2017' array:
# 'country' is the first axis while 'gender' is the second
population_2017
In [ ]:
# LArray does not care about the order of axes when performing
# arithmetic operations between arrays
population_2018 = population_2017 * transposed_growth_rate
population_2018
In [ ]:
# show 'population_2017'
population_2017
In [ ]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are in a different order
# than in the 'population_2017' array
reordered_growth_rate = growth_rate.reindex('country', ['Germany', 'Belgium', 'France'])
reordered_growth_rate
In [ ]:
# when doing arithmetic operations,
# the order of labels counts
try:
    population_2018 = population_2017 * reordered_growth_rate
except Exception as e:
    print(type(e).__name__, e)
In [ ]:
# let us imagine that the 'country' axis of
# the 'growth_rate' array has an extra
# label 'Netherlands' compared to the same axis of
# the 'population_2017' array
growth_rate_netherlands = Array([1.012, 1.], population.gender)
growth_rate_extra_country = growth_rate.append('country', growth_rate_netherlands, label='Netherlands')
growth_rate_extra_country
In [ ]:
# when doing arithmetic operations,
# no extra or missing labels are permitted
try:
    population_2018 = population_2017 * growth_rate_extra_country
except Exception as e:
    print(type(e).__name__, e)
In [ ]:
# let us imagine that the labels of the 'country' axis
# of the 'growth_rate' array are the
# country codes instead of the full country names
growth_rate_country_codes = growth_rate.set_labels('country', ['BE', 'FR', 'DE'])
growth_rate_country_codes
In [ ]:
# use the .ignore_labels() method on axis 'country'
# to avoid the incompatible axes error (risky)
population_2018 = population_2017 * growth_rate_country_codes.ignore_labels('country')
population_2018
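Why matching by position is risky can be seen in a plain-Python sketch. The dictionaries and numbers below are hypothetical stand-ins for labelled arrays; this is not how LArray is implemented.

```python
# Hypothetical values keyed by country label
population = {'Belgium': 100, 'France': 600, 'Germany': 800}
# same labels, but listed in a different order
growth = {'Germany': 1.5, 'Belgium': 2.0, 'France': 1.0}

# matching by label: each country gets its own rate
by_label = {c: population[c] * growth[c] for c in population}

# matching by position: rates are silently paired with the wrong countries
by_position = {c: p * g
               for (c, p), g in zip(population.items(), growth.values())}

print(by_label)
print(by_position)
```

Ignoring labels is therefore only safe when both axes are known to list the same items in the same order, as is the case for the country codes above.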
The condition that axes must be compatible applies only to common axes.
Making arithmetic operations between two arrays having the same axes is intuitive.
However, arithmetic operations between two arrays can be performed even if the second array has extra and/or missing axes compared to the first one. This mechanism is called broadcasting. It makes it possible to perform many arithmetic operations without writing any loop. This is a great advantage, since loops in Python (especially nested loops) can be very slow and should be avoided as much as possible.
To understand how broadcasting works, let us start with a simple example. We assume we have the total population (men and women combined) for each country:
In [ ]:
population_by_country = population_2017['Male'] + population_2017['Female']
population_by_country
We also assume we have the proportion of each gender in the population and that proportion is supposed to be the same for all countries:
In [ ]:
gender_proportion = Array([0.49, 0.51], gender)
gender_proportion
Using the two 1D arrays above, we can naively compute the population by country and gender as follows:
In [ ]:
# define a new variable with both 'country' and 'gender' axes to store the result
population_by_country_and_gender = zeros([country, gender], dtype=int)
# loop over the 'country' and 'gender' axes
for c in country:
    for g in gender:
        population_by_country_and_gender[c, g] = population_by_country[c] * gender_proportion[g]
# display the result
population_by_country_and_gender
Relying on the broadcasting mechanism, the calculation above becomes:
In [ ]:
# the outer product is done automatically.
# No need to use any loop -> saves a lot of computation time
population_by_country_and_gender = population_by_country * gender_proportion
# display the result
population_by_country_and_gender.astype(int)
In the calculation above, LArray automatically creates a resulting array with axes given by the union of the axes of the two arrays involved in the arithmetic operation.
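The rule for the axes of the result can be sketched in a few lines of plain Python. `result_axes` is a hypothetical helper written for this illustration, not part of LArray, and the ordering it produces (axes of the first operand, then the extra axes of the second) is an assumption of the sketch.

```python
def result_axes(left, right):
    # hypothetical helper: axes of `left`, then axes that only `right` has
    return list(left) + [name for name in right if name not in left]

# no axis in common: the result gets both axes (an outer product)
print(result_axes(['country'], ['gender']))

# one axis in common: it appears only once in the result
print(result_axes(['country', 'gender'], ['gender']))
```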
Let us repeat the same calculation, this time with a common time axis:
In [ ]:
population_by_country_and_year = population['Male'] + population['Female']
population_by_country_and_year
In [ ]:
gender_proportion_by_year = Array([[0.49, 0.485, 0.495, 0.492, 0.498],
[0.51, 0.515, 0.505, 0.508, 0.502]], [gender, time])
gender_proportion_by_year
Without the broadcasting mechanism, the computation of the population by country, gender and year would have been:
In [ ]:
# define a new variable to store the result.
# Its axes are the union of the axes of the two arrays
# involved in the arithmetic operation
population_by_country_gender_year = zeros([country, gender, time], dtype=int)
# loop over axes which are not present in both arrays
# involved in the arithmetic operation
for c in country:
    for g in gender:
        # all subsets below have the same 'time' axis
        population_by_country_gender_year[c, g] = population_by_country_and_year[c] * gender_proportion_by_year[g]
population_by_country_gender_year
Once again, the above calculation can be simplified as:
In [ ]:
# No need to use any loop -> saves a lot of computation time
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year
# display the result
population_by_country_gender_year.astype(int)
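Broadcasting matches axes by name, so two axes that are meant to be the same but carry different names are treated as distinct axes. For example, imagine that the name of the time axis is time for the first array but period for the second: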
In [ ]:
gender_proportion_by_year = gender_proportion_by_year.rename('time', 'period')
gender_proportion_by_year
In [ ]:
population_by_country_and_year
In [ ]:
# the two arrays below have a "time" axis with two different names: 'time' and 'period'.
# LArray will treat the "time" axis of the two arrays as two different "time" axes
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year
# as a consequence, the result of the multiplication of the two arrays is not what we expected
population_by_country_gender_year.astype(int)
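The size of the problem is easy to quantify. With 3 countries, 2 genders and 5 years, as in the arrays above, the intended result has 30 cells, whereas treating `time` and `period` as separate axes yields 150. A small sketch, using a hypothetical `result_shape` helper over dictionaries of axis lengths rather than real arrays:

```python
from math import prod

def result_shape(left, right):
    # hypothetical helper: merge two {axis name: length} mappings by name
    merged = dict(left)
    for name, length in right.items():
        merged.setdefault(name, length)
    return merged

population_axes = {'country': 3, 'time': 5}

# axes share the name 'time': it is counted once
expected = result_shape(population_axes, {'gender': 2, 'time': 5})
# the second array calls it 'period': both axes end up in the result
actual = result_shape(population_axes, {'gender': 2, 'period': 5})

print(expected, prod(expected.values()))
print(actual, prod(actual.values()))
```

Renaming the axis back with `rename('period', 'time')` before multiplying restores the expected result.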
In [ ]:
# test which values are greater than 10 million
population > 10e6
Comparison operations can be combined using Python bitwise operators:
| Operator | Meaning |
|---|---|
| `&` | and |
| `\|` | or |
| `~` | not |
In [ ]:
# test which values are greater than 10 million and less than 40 million
(population > 10e6) & (population < 40e6)
In [ ]:
# test which values are less than 10 million or greater than 40 million
(population < 10e6) | (population > 40e6)
In [ ]:
# test which values are not less than 10 million
~(population < 10e6)
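The parentheses in the expressions above are required: in Python, `&`, `|` and `~` bind more tightly than comparison operators. The effect can be checked on plain integers, without any array:

```python
x = 3

# intended meaning: both comparisons are evaluated first, then combined
with_parentheses = (x > 1) & (x < 2)

# without parentheses this parses as the chained comparison x > (1 & x) < 2
without_parentheses = x > 1 & x < 2

print(with_parentheses)
print(without_parentheses)
```

The two expressions give different answers for the same `x`, which is why each comparison should be wrapped in its own parentheses before combining.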
The returned boolean array can then be used in selections and assignments:
In [ ]:
population_copy = population.copy()
# set all values greater than 40 million to 40 million
population_copy[population_copy > 40e6] = 40e6
population_copy
Comparison operations can also be performed between two arrays:
In [ ]:
# test where the two arrays have the same values
population == population_copy
To test whether all values of two arrays are equal, use the equals method:
In [ ]:
population.equals(population_copy)
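The difference between the two comparisons can be sketched with plain Python lists. The numbers are hypothetical, and the real `equals` method may check more than the values (for instance the axes), so this is only an analogy:

```python
# Hypothetical values, for illustration only
original = [5_000_000, 45_000_000, 80_000_000]
# cap every value at 40 million, as done with the boolean selection earlier
capped = [min(v, 40_000_000) for v in original]

# element-wise comparison: one boolean per element
elementwise = [a == b for a, b in zip(original, capped)]
# overall comparison: a single boolean for the whole sequence
all_equal = all(elementwise)

print(elementwise)
print(all_equal)
```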