Presenting LArray objects (Axis, Groups, Array, Session)

Import the LArray library:


In [ ]:
from larray import *

Axis

An Axis represents a dimension of an Array object. It consists of a name and a list of labels.

There are several ways to create an axis:


In [ ]:
# create a wildcard axis
age = Axis(3, 'age')
# labels given as a list
time = Axis([2007, 2008, 2009], 'time')
# create an axis using one string
gender = Axis('gender=M,F')
# labels generated using a special syntax
other = Axis('other=A01..C03')

age, gender, time, other

See the Axis section of the API Reference to explore all methods of Axis objects.

Groups

A Group represents a selection of labels from an Axis. It can optionally have a name (using operator >>). Groups can be used when selecting a subset of an array and in aggregations.

Group objects are created as follows:


In [ ]:
# define an Axis object 'age'
age = Axis('age=0..100')

# create an anonymous Group object 'teens'
teens = age[10:20]
# create a Group object 'pensioners' with a name 
pensioners = age[67:] >> 'pensioners'

teens

It is possible to set a name or to rename a group after its declaration:


In [ ]:
# method 'named' returns a new group with the given name
teens = teens.named('teens')

# the >> operator is just a shortcut for calling the 'named' method
teens = teens >> 'teens'

teens

See the Group section of the API Reference to explore all methods of Group objects.

Array

An Array object represents a multidimensional array with labeled axes.

Create an array from scratch

To create an array from scratch, you need to provide the data and a list of axes. Optionally, metadata (title, description, creation date, authors, ...) can be associated with the array:


In [ ]:
import numpy as np

# list of the axes
axes = [age, gender, time, other]
# data (the shape of data array must match axes lengths)
data = np.random.randint(100, size=[len(axis) for axis in axes])
# metadata
meta = [('title', 'random array')]

arr = Array(data, axes, meta=meta)
arr

Metadata can be added to an array at any time using:


In [ ]:
arr.meta.description = 'array containing random values between 0 and 100'

arr.meta
**Warning:**
  • Currently, only the HDF (.h5) file format supports saving and loading array metadata.
  • Metadata is not kept when actions or methods are applied on an array except for operations modifying the object in-place, such as `population[age < 10] = 0`, and when the method `copy()` is called. Do not add metadata to an array if you know you will apply actions or methods on it before dumping it.

Array creation functions

Arrays can also be generated in an easier way through creation functions:

  • ndtest : creates a test array with increasing numbers as data
  • empty : creates an array but leaves its allocated memory uninitialised (i.e., it contains "garbage"; be careful!)
  • zeros : creates an array filled with 0
  • ones : creates an array filled with 1
  • full : creates an array filled with a given value
  • sequence : creates an array from an axis by iteratively applying a function to a given initial value.

Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:

  • as Axis objects
  • as integers defining the lengths of auto-generated wildcard axes
  • as a string : 'gender=M,F;time=2007,2008,2009' (name is optional)
  • as pairs (name, labels)

Optionally, the type of data stored by the array can be specified using argument dtype.


In [ ]:
# start defines the starting value of data
ndtest(['age=0..2', 'gender=M,F', 'time=2007..2009'], start=-1)

In [ ]:
# start defines the starting value of data
# label_start defines the starting index of labels
ndtest((3, 3), start=-1, label_start=2)

In [ ]:
# empty generates an uninitialised array with the correct axes
# (much faster but use with care!).
# The content is not really random either: it just reuses a portion
# of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data
# will be overwritten.
empty(['age=0..2', 'gender=M,F', 'time=2007..2009'])

In [ ]:
# example with anonymous axes
zeros(['0..2', 'M,F', '2007..2009'])

In [ ]:
# dtype=int stores the data as int instead of the default float
ones(['age=0..2', 'gender=M,F', 'time=2007..2009'], dtype=int)

In [ ]:
full(['age=0..2', 'gender=M,F', 'time=2007..2009'], 1.23)

All the above functions exist in (func)_like variants, which take their axes from another array:


In [ ]:
ones_like(arr)

Create an array using the special sequence function (see the documentation of sequence in the API Reference for more examples):


In [ ]:
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
sequence('gender=M,F', initial=1.0, inc=0.5)

Inspecting Array objects


In [ ]:
# create a test array
arr = ndtest([age, gender, time, other])

Get array summary : metadata + dimensions + description of axes + dtype + size in memory


In [ ]:
arr.info

Get axes


In [ ]:
arr.axes

Get number of dimensions


In [ ]:
arr.ndim

Get length of each dimension


In [ ]:
arr.shape

Get total number of elements of the array


In [ ]:
arr.size

Get type of internal data (int, float, ...)


In [ ]:
arr.dtype

Get size in memory


In [ ]:
arr.memory_used

Display the array in the viewer (graphical user interface) in read-only mode. This opens a new window and blocks execution of the rest of the code until the window is closed! It requires PyQt to be installed.

view(arr)

Or load it in Excel:

arr.to_excel()

Extract an axis from an array

It is possible to extract an axis belonging to an array using its name:


In [ ]:
# extract the 'time' axis belonging to the 'arr' array
time = arr.time
time

More on Array objects

To learn how to save and load arrays in CSV, Excel or HDF format, please refer to the Loading and Dumping Arrays section of the tutorial.

See the Array section of the API Reference to explore all methods of Array objects.

Session

A Session object is a dictionary-like object used to gather several arrays, axes and groups. A session is particularly well suited to gathering all the input objects of a model or the output arrays of different scenarios. As with arrays, it is possible to associate metadata with sessions.

Creating Sessions

To create a session, you can first create an empty session and then populate it with arrays, axes and groups:


In [ ]:
# create an empty session
demography_session = Session()

# add axes to the session
gender = Axis("gender=Male,Female")
demography_session.gender = gender
time = Axis("time=2013..2017")
demography_session.time = time

# add arrays to the session
demography_session.population = zeros((gender, time))
demography_session.births = zeros((gender, time))
demography_session.deaths = zeros((gender, time))

# add metadata after creation
demography_session.meta.title = 'Demographic Model of Belgium'
demography_session.meta.description = 'Models the demography of Belgium'

# print content of the session
print(demography_session.summary())

or you can create and populate a session in one step:


In [ ]:
gender = Axis("gender=Male,Female")
time = Axis("time=2013..2017")

# create and populate a new session in one step
# Python <= 3.5
demography_session = Session([('gender', gender), ('time', time), ('population', zeros((gender, time))),
                              ('births', zeros((gender, time))), ('deaths', zeros((gender, time)))],
                             meta=[('title', 'Demographic Model of Belgium'),
                                   ('description', 'Models the demography of Belgium')])
# Python 3.6+
demography_session = Session(gender=gender, time=time, population=zeros((gender, time)),
                             births=zeros((gender, time)), deaths=zeros((gender, time)),
                             meta=Metadata(title='Demographic Model of Belgium',
                                           description='Models the demography of Belgium'))

# print content of the session
print(demography_session.summary())
**Warning:**
  • Contrary to array metadata, saving and loading session metadata is supported for all current session file formats: Excel, CSV and HDF (.h5).
  • Metadata is not kept when actions or methods are applied on a session except for operations modifying a session in-place, such as: `s.arr1 = 0`. Do not add metadata to a session if you know you will apply actions or methods on it before dumping it.

More on Session objects

To learn how to save and load sessions in CSV, Excel or HDF format, please refer to the Loading and Dumping Sessions section of the tutorial.

To see how to work with sessions, please read the Working With Sessions section of the tutorial.

Finally, see the Session section of the API Reference to explore all methods of Session objects.