Working With Sessions

Import the LArray library:


In [ ]:
from larray import *

Before To Continue

If you are not yet comfortable with creating, saving and loading sessions, please read first the Creating Sessions section of the tutorial before going further.

Loading and Dumping Sessions

One of the main advantages of grouping arrays, axes and groups in session objects is that you can load and save all of them in one shot. Like arrays, it is possible to associate metadata to a session. These can be saved and loaded in all file formats.

Loading Sessions (CSV, Excel, HDF5)

To load the items of a session, you have two options:

1) Instantiate a new session and pass the path to the Excel/HDF5 file or to the directory containing CSV files to the Session constructor:


In [ ]:
# create a new Session object and load all arrays, axes, groups and metadata 
# from all CSV files located in the passed directory
csv_dir = get_example_filepath('demography_eurostat')
demography_session = Session(csv_dir)

# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed Excel file
filepath_excel = get_example_filepath('demography_eurostat.xlsx')
demography_session = Session(filepath_excel)

# create a new Session object and load all arrays, axes, groups and metadata
# stored in the passed HDF5 file
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)

print(demography_session.summary())

2) Call the load method on an existing session and pass the path to the Excel/HDF5 file or to the directory containing CSV files as first argument:


In [ ]:
# create a session containing 3 axes, 2 groups and one array 'population'
filepath = get_example_filepath('population_only.xlsx')
demography_session = Session(filepath)

print(demography_session.summary())

In [ ]:
# call the load method on the previous session and add the 'births' and 'deaths' arrays to it
filepath = get_example_filepath('births_and_deaths.xlsx')
demography_session.load(filepath)

print(demography_session.summary())

The load method offers some options:

1) Using the names argument, you can specify which items to load:


In [ ]:
births_and_deaths_session = Session()

# use the names argument to only load births and deaths arrays
births_and_deaths_session.load(filepath_hdf, names=['births', 'deaths'])

print(births_and_deaths_session.summary())

2) Setting the display argument to True, the load method will print a message each time a new item is loaded:


In [ ]:
demography_session = Session()

# with display=True, the load method will print a message
# each time a new item is loaded
demography_session.load(filepath_hdf, display=True)

Dumping Sessions (CSV, Excel, HDF5)

To save a session, you need to call the save method. The first argument is the path to a Excel/HDF5 file or to a directory if items are saved to CSV files:


In [ ]:
# save items of a session in CSV files.
# Here, the save method will create a 'demography' directory in which CSV files will be written 
demography_session.save('demography')

# save the session to an HDF5 file
demography_session.save('demography.h5')

# save the session to an Excel file
demography_session.save('demography.xlsx')

# load session saved in 'demography.h5' to see its content
Session('demography.h5')
Note: Concerning the CSV and Excel formats, the metadata is saved in one Excel sheet (CSV file) named `__metadata__(.csv)`. This sheet (CSV file) name cannot be changed.

The save method has several arguments:

1) Using the names argument, you can specify which items to save:


In [ ]:
# use the names argument to only save births and deaths arrays
demography_session.save('demography.h5', names=['births', 'deaths'])

# load session saved in 'demography.h5' to see its content
Session('demography.h5')

2) By default, dumping a session to an Excel or HDF5 file will overwrite it. By setting the overwrite argument to False, you can choose to update the existing Excel or HDF5 file:


In [ ]:
population = read_csv('./demography/population.csv')
population_session = Session([('population', population)])

# by setting overwrite to False, the destination file is updated instead of overwritten.
# The items already stored in the file but not present in the session are left intact. 
# On the contrary, the items that exist in both the file and the session are completely overwritten.
population_session.save('demography.h5', overwrite=False)

# load session saved in 'demography.h5' to see its content
Session('demography.h5')

3) Setting the display argument to True, the save method will print a message each time an item is dumped:


In [ ]:
# with display=True, the save method will print a message
# each time an item is dumped
demography_session.save('demography.h5', display=True)

Exploring Content

To get the list of items names of a session, use the names shortcut (be careful that the list is sorted alphabetically and does not follow the internal order!):


In [ ]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)

# print the content of the session
print(demography_session.names)

To get more information of items of a session, the summary will provide not only the names of items but also the list of labels in the case of axes or groups and the list of axes, the shape and the dtype in the case of arrays:


In [ ]:
# print the content of the session
print(demography_session.summary())

Selecting And Filtering Items

Session objects work like ordinary dict Python objects. To select an item, use the usual syntax <session_var>['<item_name>']:


In [ ]:
demography_session['population']

A simpler way consists in the use the syntax <session_var>.<item_name>:


In [ ]:
demography_session.population
**Warning:** The syntax ``session_var.item_name`` will work as long as you don't use any special character like ``, ; :`` in the item's name.

To return a new session with selected items, use the syntax <session_var>[list, of, item, names]:


In [ ]:
demography_session_new = demography_session['population', 'births', 'deaths']

demography_session_new.names

The filter method allows you to select all items of the same kind (i.e. all axes, or groups or arrays) or all items with names satisfying a given pattern:


In [ ]:
# select only arrays of a session
demography_session.filter(kind=Array)

In [ ]:
# selection all items with a name starting with a letter between a and k
demography_session.filter(pattern='[a-k]*')

Iterating over Items

Like the built-in Python dict objects, Session objects provide methods to iterate over items:


In [ ]:
# iterate over item names
for key in demography_session.keys():
    print(key)

In [ ]:
# iterate over items
for value in demography_session.values():
    if isinstance(value, Array):
        print(value.info)
    else:
        print(repr(value))
    print()

In [ ]:
# iterate over names and items
for key, value in demography_session.items():
    if isinstance(value, Array):
        print(key, ':')
        print(value.info)
    else:
        print(key, ':', repr(value))
    print()

Arithmetic Operations On Sessions

Session objects accept binary operations with a scalar:


In [ ]:
# get population, births and deaths in millions
demography_session_div = demography_session / 1e6

demography_session_div.population

with an array (please read the documentation of the random.choice function first if you don't know it):


In [ ]:
from larray import random
random_increment = random.choice([-1, 0, 1], p=[0.3, 0.4, 0.3], axes=demography_session.population.axes) * 1000
random_increment

In [ ]:
# add some variables of a session by a common array
demography_session_rand = demography_session['population', 'births', 'deaths'] + random_increment

demography_session_rand.population

with another session:


In [ ]:
# compute the difference between each array of the two sessions
s_diff = demography_session - demography_session_rand

s_diff.births

Applying Functions On All Arrays

In addition to the classical arithmetic operations, the apply method can be used to apply the same function on all arrays. This function should take a single element argument and return a single value:


In [ ]:
# add the next year to all arrays
def add_next_year(array):
    if 'time' in array.axes.names:
        last_year = array.time.i[-1] 
        return array.append('time', 0, last_year + 1)
    else:
        return array

demography_session_with_next_year = demography_session.apply(add_next_year)

print('population array before calling apply:')
print(demography_session.population)
print()
print('population array after calling apply:')
print(demography_session_with_next_year.population)

It is possible to pass a function with additional arguments:


In [ ]:
# add the next year to all arrays.
# Use the 'copy_values_from_last_year flag' to indicate 
# whether or not to copy values from the last year
def add_next_year(array, copy_values_from_last_year):
    if 'time' in array.axes.names:
        last_year = array.time.i[-1]
        value = array[last_year] if copy_values_from_last_year else 0
        return array.append('time', value, last_year + 1)
    else:
        return array

demography_session_with_next_year = demography_session.apply(add_next_year, True)

print('population array before calling apply:')
print(demography_session.population)
print()
print('population array after calling apply:')
print(demography_session_with_next_year.population)

It is also possible to apply a function on non-Array objects of a session. Please refer the documentation of the apply method.

Comparing Sessions

Being able to compare two sessions may be useful when you want to compare two different models expected to give the same results or when you have updated your model and want to see what are the consequences of the recent changes.

Session objects provide the two methods to compare two sessions: equals and element_equals:

  • The equals method will return True if all items from both sessions are identical, False otherwise.
  • The element_equals method will compare items of two sessions one by one and return an array of boolean values.

In [ ]:
# load a session representing the results of a demographic model
filepath_hdf = get_example_filepath('demography_eurostat.h5')
demography_session = Session(filepath_hdf)

# create a copy of the original session
demography_session_copy = demography_session.copy()

In [ ]:
# 'element_equals' compare arrays one by one
demography_session.element_equals(demography_session_copy)

In [ ]:
# 'equals' returns True if all items of the two sessions have exactly the same items
demography_session.equals(demography_session_copy)

In [ ]:
# slightly modify the 'population' array for some labels combination
demography_session_copy.population += random_increment

In [ ]:
# the 'population' array is different between the two sessions
demography_session.element_equals(demography_session_copy)

In [ ]:
# 'equals' returns False if at least one item of the two sessions are different in values or axes
demography_session.equals(demography_session_copy)

In [ ]:
# reset the 'copy' session as a copy of the original session
demography_session_copy = demography_session.copy()

# add an array to the 'copy' session
demography_session_copy.gender_ratio = demography_session_copy.population.ratio('gender')

In [ ]:
# the 'gender_ratio' array is not present in the original session
demography_session.element_equals(demography_session_copy)

In [ ]:
# 'equals' returns False if at least one item is not present in the two sessions
demography_session.equals(demography_session_copy)

The == operator return a new session with boolean arrays with elements compared element-wise:


In [ ]:
# reset the 'copy' session as a copy of the original session
demography_session_copy = demography_session.copy()

# slightly modify the 'population' array for some labels combination
demography_session_copy.population += random_increment

In [ ]:
s_check_same_values = demography_session == demography_session_copy

s_check_same_values.population

This also works for axes and groups:


In [ ]:
s_check_same_values.time

The != operator does the opposite of == operator:


In [ ]:
s_check_different_values = demography_session != demography_session_copy

s_check_different_values.population

A more visual way is to use the compare function which will open the Editor.

compare(demography_session, demography_session_alternative, names=['baseline', 'lower_birth_rate'])

Session API

Please go to the Session section of the API Reference to get the list of all methods of Session objects.