Import the LArray library:
In [ ]:
from larray import *
If you are not yet comfortable with creating, saving and loading sessions, please read first the Creating Sessions section of the tutorial before going further.
In [ ]:
# create a new Session object and load all arrays, axes, groups and metadata
# from all CSV files located in the passed directory
csv_dir = get_example_filepath('demography_eurostat')
demography_session = Session(csv_dir)
# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed Excel file
filepath_excel = get_example_filepath('demography_eurostat.xlsx')
demography_session = Session(filepath_excel)
# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed HDF5 file
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)
print(demography_session.summary())
2) Call the load
method on an existing session and pass the path to the Excel/HDF5 file or to the directory containing CSV files as first argument:
In [ ]:
# create a session containing 3 axes, 2 groups and one array 'population'
filepath = get_example_filepath('population_only.xlsx')
demography_session = Session(filepath)
print(demography_session.summary())
In [ ]:
# call the load method on the previous session and add the 'births' and 'deaths' arrays to it
filepath = get_example_filepath('births_and_deaths.xlsx')
demography_session.load(filepath)
print(demography_session.summary())
The load
method offers some options:
1) Using the names
argument, you can specify which items to load:
In [ ]:
births_and_deaths_session = Session()
# use the names argument to only load births and deaths arrays
births_and_deaths_session.load(filepath_hdf, names=['births', 'deaths'])
print(births_and_deaths_session.summary())
2) Setting the display
argument to True, the load
method will print a message each time a new item is loaded:
In [ ]:
demography_session = Session()
# with display=True, the load method will print a message
# each time a new item is loaded
demography_session.load(filepath_hdf, display=True)
In [ ]:
# save items of a session in CSV files.
# Here, the save method will create a 'demography' directory in which CSV files will be written
demography_session.save('demography')
# save the session to an HDF5 file
demography_session.save('demography.h5')
# save the session to an Excel file
demography_session.save('demography.xlsx')
# load session saved in 'demography.h5' to see its content
Session('demography.h5')
The save
method has several arguments:
1) Using the names
argument, you can specify which items to save:
In [ ]:
# use the names argument to only save births and deaths arrays
demography_session.save('demography.h5', names=['births', 'deaths'])
# load session saved in 'demography.h5' to see its content
Session('demography.h5')
2) By default, dumping a session to an Excel or HDF5 file will overwrite it. By setting the overwrite
argument to False, you can choose to update the existing Excel or HDF5 file:
In [ ]:
population = read_csv('./demography/population.csv')
population_session = Session([('population', population)])
# by setting overwrite to False, the destination file is updated instead of overwritten.
# The items already stored in the file but not present in the session are left intact.
# On the contrary, the items that exist in both the file and the session are completely overwritten.
population_session.save('demography.h5', overwrite=False)
# load session saved in 'demography.h5' to see its content
Session('demography.h5')
3) Setting the display
argument to True, the save
method will print a message each time an item is dumped:
In [ ]:
# with display=True, the save method will print a message
# each time an item is dumped
demography_session.save('demography.h5', display=True)
To get the list of items names of a session, use the names shortcut (be careful that the list is sorted alphabetically and does not follow the internal order!):
In [ ]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)
# print the content of the session
print(demography_session.names)
To get more information of items of a session, the summary will provide not only the names of items but also the list of labels in the case of axes or groups and the list of axes, the shape and the dtype in the case of arrays:
In [ ]:
# print the content of the session
print(demography_session.summary())
In [ ]:
demography_session['population']
A simpler way consists in the use the syntax <session_var>.<item_name>
:
In [ ]:
demography_session.population
To return a new session with selected items, use the syntax <session_var>[list, of, item, names]
:
In [ ]:
demography_session_new = demography_session['population', 'births', 'deaths']
demography_session_new.names
The filter method allows you to select all items of the same kind (i.e. all axes, or groups or arrays) or all items with names satisfying a given pattern:
In [ ]:
# select only arrays of a session
demography_session.filter(kind=Array)
In [ ]:
# selection all items with a name starting with a letter between a and k
demography_session.filter(pattern='[a-k]*')
In [ ]:
# iterate over item names
for key in demography_session.keys():
print(key)
In [ ]:
# iterate over items
for value in demography_session.values():
if isinstance(value, Array):
print(value.info)
else:
print(repr(value))
print()
In [ ]:
# iterate over names and items
for key, value in demography_session.items():
if isinstance(value, Array):
print(key, ':')
print(value.info)
else:
print(key, ':', repr(value))
print()
In [ ]:
# get population, births and deaths in millions
demography_session_div = demography_session / 1e6
demography_session_div.population
with an array (please read the documentation of the random.choice function first if you don't know it):
In [ ]:
from larray import random
random_increment = random.choice([-1, 0, 1], p=[0.3, 0.4, 0.3], axes=demography_session.population.axes) * 1000
random_increment
In [ ]:
# add some variables of a session by a common array
demography_session_rand = demography_session['population', 'births', 'deaths'] + random_increment
demography_session_rand.population
with another session:
In [ ]:
# compute the difference between each array of the two sessions
s_diff = demography_session - demography_session_rand
s_diff.births
In addition to the classical arithmetic operations, the apply method can be used to apply the same function on all arrays. This function should take a single element argument and return a single value:
In [ ]:
# add the next year to all arrays
def add_next_year(array):
if 'time' in array.axes.names:
last_year = array.time.i[-1]
return array.append('time', 0, last_year + 1)
else:
return array
demography_session_with_next_year = demography_session.apply(add_next_year)
print('population array before calling apply:')
print(demography_session.population)
print()
print('population array after calling apply:')
print(demography_session_with_next_year.population)
It is possible to pass a function with additional arguments:
In [ ]:
# add the next year to all arrays.
# Use the 'copy_values_from_last_year flag' to indicate
# whether or not to copy values from the last year
def add_next_year(array, copy_values_from_last_year):
if 'time' in array.axes.names:
last_year = array.time.i[-1]
value = array[last_year] if copy_values_from_last_year else 0
return array.append('time', value, last_year + 1)
else:
return array
demography_session_with_next_year = demography_session.apply(add_next_year, True)
print('population array before calling apply:')
print(demography_session.population)
print()
print('population array after calling apply:')
print(demography_session_with_next_year.population)
It is also possible to apply a function on non-Array objects of a session. Please refer the documentation of the apply method.
Session objects provide the two methods to compare two sessions: equals and element_equals:
equals
method will return True if all items from both sessions are identical, False otherwise.element_equals
method will compare items of two sessions one by one and return an array of boolean values.
In [ ]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)
# create a copy of the original session
demography_session_copy = demography_session.copy()
In [ ]:
# 'element_equals' compare arrays one by one
demography_session.element_equals(demography_session_copy)
In [ ]:
# 'equals' returns True if all items of the two sessions have exactly the same items
demography_session.equals(demography_session_copy)
In [ ]:
# slightly modify the 'population' array for some labels combination
demography_session_copy.population += random_increment
In [ ]:
# the 'population' array is different between the two sessions
demography_session.element_equals(demography_session_copy)
In [ ]:
# 'equals' returns False if at least one item of the two sessions are different in values or axes
demography_session.equals(demography_session_copy)
In [ ]:
# reset the 'copy' session as a copy of the original session
demography_session_copy = demography_session.copy()
# add an array to the 'copy' session
demography_session_copy.gender_ratio = demography_session_copy.population.ratio('gender')
In [ ]:
# the 'gender_ratio' array is not present in the original session
demography_session.element_equals(demography_session_copy)
In [ ]:
# 'equals' returns False if at least one item is not present in the two sessions
demography_session.equals(demography_session_copy)
The ==
operator return a new session with boolean arrays with elements compared element-wise:
In [ ]:
# reset the 'copy' session as a copy of the original session
demography_session_copy = demography_session.copy()
# slightly modify the 'population' array for some labels combination
demography_session_copy.population += random_increment
In [ ]:
s_check_same_values = demography_session == demography_session_copy
s_check_same_values.population
This also works for axes and groups:
In [ ]:
s_check_same_values.time
The !=
operator does the opposite of ==
operator:
In [ ]:
s_check_different_values = demography_session != demography_session_copy
s_check_different_values.population
A more visual way is to use the compare function which will open the Editor
.
compare(demography_session, demography_session_alternative, names=['baseline', 'lower_birth_rate'])
Please go to the Session section of the API Reference to get the list of all methods of Session objects.