Arithmetic Operations

Import the LArray library:


In [ ]:
from larray import *

Load the population array from the demography_eurostat dataset:


In [ ]:
# load the 'demography_eurostat' dataset
demography_eurostat = load_example_data('demography_eurostat')

# extract the 'country', 'gender' and 'time' axes
country = demography_eurostat.country
gender = demography_eurostat.gender
time = demography_eurostat.time

# extract the 'population' array
population = demography_eurostat.population

# show the 'population' array
population

Basics

One can do all usual arithmetic operations on an array, it will apply the operation to all elements individually


In [ ]:
# 'true' division
population_in_millions = population / 1_000_000
population_in_millions

In [ ]:
# 'floor' division
population_in_millions = population // 1_000_000
population_in_millions
**Warning:** Python has two different division operators: - the 'true' division (/) always returns a float. - the 'floor' division (//) returns an integer result (discarding any fractional result).

In [ ]:
# % means modulo (aka remainder of division)
population % 1_000_000

In [ ]:
# ** means raising to the power
print(ndtest(4))
ndtest(4) ** 3

More interestingly, binary operators as above also works between two arrays.

Let us imagine a rate of population growth which is constant over time but different by gender and country:


In [ ]:
growth_rate = Array(data=[[1.011, 1.010], [1.013, 1.011], [1.010, 1.009]], axes=[country, gender])
growth_rate

In [ ]:
# we store the population of the year 2017 in a new variable
population_2017 = population[2017]
population_2017

In [ ]:
# perform an arithmetic operation between two arrays
population_2018 = population_2017 * growth_rate
population_2018
**Note:** Be careful when mixing different data types. You can use the method [astype](../_generated/larray.Array.astype.rst#larray.Array.astype) to change the data type of an array.

In [ ]:
# force the resulting matrix to be an integer matrix
population_2018 = (population_2017 * growth_rate).astype(int)
population_2018

Axis order does not matter much (except for output)

You can do operations between arrays having different axes order. The axis order of the result is the same as the left array


In [ ]:
# let's change the order of axes of the 'constant_growth_rate' array
transposed_growth_rate = growth_rate.transpose()

# look at the order of the new 'transposed_growth_rate' array:
# 'gender' is the first axis while 'country' is the second
transposed_growth_rate

In [ ]:
# look at the order of the 'population_2017' array:
# 'country' is the first axis while 'gender' is the second
population_2017

In [ ]:
# LArray doesn't care of axes order when performing 
# arithmetic operations between arrays
population_2018 = population_2017 * transposed_growth_rate
population_2018

Axes must be compatible

Arithmetic operations between two arrays only works when they have compatible axes (i.e. same list of labels in the same order).


In [ ]:
# show 'population_2017'
population_2017

Order of labels matters


In [ ]:
# let us imagine that the labels of the 'country' axis 
# of the 'constant_growth_rate' array are in a different order
# than in the 'population_2017' array
reordered_growth_rate = growth_rate.reindex('country', ['Germany', 'Belgium', 'France'])
reordered_growth_rate

In [ ]:
# when doing arithmetic operations, 
# the order of labels counts
try:
    population_2018 = population_2017 * reordered_growth_rate
except Exception as e:
    print(type(e).__name__, e)

No extra or missing labels are permitted


In [ ]:
# let us imagine that the 'country' axis of 
# the 'constant_growth_rate' array has an extra 
# label 'Netherlands' compared to the same axis of 
# the 'population_2017' array
growth_rate_netherlands = Array([1.012, 1.], population.gender)
growth_rate_extra_country = growth_rate.append('country', growth_rate_netherlands, label='Netherlands')
growth_rate_extra_country

In [ ]:
# when doing arithmetic operations, 
# no extra or missing labels are permitted 
try:
    population_2018 = population_2017 * growth_rate_extra_country
except Exception as e:
    print(type(e).__name__, e)

Ignoring labels (risky)

**Warning:** Operations between two arrays only works when they have compatible axes (i.e. same labels) but this behavior can be override via the [ignore_labels](../_generated/larray.Array.ignore_labels.rst#larray.Array.ignore_labels) method. In that case only the position on the axis is used and not the labels. Using this method is done at your own risk and SHOULD NEVER BEEN USED IN A MODEL. Use this method only for quick tests or rapid data exploration.

In [ ]:
# let us imagine that the labels of the 'country' axis 
# of the 'constant_growth_rate' array are the 
# country codes instead of the country full names
growth_rate_country_codes = growth_rate.set_labels('country', ['BE', 'FR', 'DE'])
growth_rate_country_codes

In [ ]:
# use the .ignore_labels() method on axis 'country'
# to avoid the incompatible axes error (risky)
population_2018 = population_2017 * growth_rate_country_codes.ignore_labels('country')
population_2018

Extra Or Missing Axes (Broadcasting)

The condition that axes must be compatible only applies on common axes. Making arithmetic operations between two arrays having the same axes is intuitive. However, arithmetic operations between two arrays can be performed even if the second array has extra and/or missing axes compared to the first one. Such mechanism is called broadcasting. It allows to make a lot of arithmetic operations without using any loop. This is a great advantage since using loops in Python can be highly time consuming (especially nested loops) and should be avoided as much as possible.

To understand how broadcasting works, let us start with a simple example. We assume we have the population of both men and women cumulated for each country:


In [ ]:
population_by_country = population_2017['Male'] + population_2017['Female']
population_by_country

We also assume we have the proportion of each gender in the population and that proportion is supposed to be the same for all countries:


In [ ]:
gender_proportion = Array([0.49, 0.51], gender)
gender_proportion

Using the two 1D arrays above, we can naively compute the population by country and gender as follow:


In [ ]:
# define a new variable with both 'country' and 'gender' axes to store the result
population_by_country_and_gender = zeros([country, gender], dtype=int)

# loop over the 'country' and 'gender' axes 
for c in country:
    for g in gender:
        population_by_country_and_gender[c, g] = population_by_country[c] * gender_proportion[g]

# display the result
population_by_country_and_gender

Relying on the broadcasting mechanism, the calculation above becomes:


In [ ]:
# the outer product is done automatically.
# No need to use any loop -> saves a lot of computation time
population_by_country_and_gender = population_by_country * gender_proportion

# display the result
population_by_country_and_gender.astype(int)

In the calculation above, LArray automatically creates a resulting array with axes given by the union of the axes of the two arrays involved in the arithmetic operation.

Let us do the same calculation but we add a common time axis:


In [ ]:
population_by_country_and_year = population['Male'] + population['Female']
population_by_country_and_year

In [ ]:
gender_proportion_by_year = Array([[0.49, 0.485, 0.495, 0.492, 0.498], 
                                   [0.51, 0.515, 0.505, 0.508, 0.502]], [gender, time])
gender_proportion_by_year

Without the broadcasting mechanism, the computation of the population by country, gender and year would have been:


In [ ]:
# define a new variable to store the result.
# Its axes is the union of the axes of the two arrays 
# involved in the arithmetic operation
population_by_country_gender_year = zeros([country, gender, time], dtype=int)

# loop over axes which are not present in both arrays
# involved in the arithmetic operation
for c in country:
    for g in gender:
        # all subsets below have the same 'time' axis
        population_by_country_gender_year[c, g] = population_by_country_and_year[c] * gender_proportion_by_year[g]
        
population_by_country_gender_year

Once again, the above calculation can be simplified as:


In [ ]:
# No need to use any loop -> saves a lot of computation time
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year

# display the result
population_by_country_gender_year.astype(int)
**Warning:** Broadcasting is a powerful mechanism but can be confusing at first. It can lead to unexpected results. In particular, if axes which are supposed to be common are not, you will get a resulting array with extra axes you didn't want.

For example, imagine that the name of the time axis is time for the first array but period for the second:


In [ ]:
gender_proportion_by_year = gender_proportion_by_year.rename('time', 'period')
gender_proportion_by_year

In [ ]:
population_by_country_and_year

In [ ]:
# the two arrays below have a "time" axis with two different names: 'time' and 'period'.
# LArray will treat the "time" axis of the two arrays as two different "time" axes
population_by_country_gender_year = population_by_country_and_year * gender_proportion_by_year

# as a consequence, the result of the multiplication of the two arrays is not what we expected
population_by_country_gender_year.astype(int)

Boolean Operations

Python comparison operators are:

Operator Meaning
== equal
!= not equal
> greater than
>= greater than or equal
< less than
<= less than or equal

Applying a comparison operator on an array returns a boolean array:


In [ ]:
# test which values are greater than 10 millions
population > 10e6

Comparison operations can be combined using Python bitwise operators:

Operator Meaning
& and
| or
~ not

In [ ]:
# test which values are greater than 10 millions and less than 40 millions
(population > 10e6) & (population < 40e6)

In [ ]:
# test which values are less than 10 millions or greater than 40 millions
(population < 10e6) | (population > 40e6)

In [ ]:
# test which values are not less than 10 millions
~(population < 10e6)

The returned boolean array can then be used in selections and assignments:


In [ ]:
population_copy = population.copy()

# set all values greater than 40 millions to 40 millions
population_copy[population_copy > 40e6] = 40e6
population_copy

Boolean operations can be made between arrays:


In [ ]:
# test where the two arrays have the same values
population == population_copy

To test if all values between are equals, use the equals method:


In [ ]:
population.equals(population_copy)