"Fruit" example (from Hybrid Sankey diagrams paper)

This notebook works through a fairly complex example: building a Sankey diagram from the sample "fruit" database used in the paper "Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use".

For more explanation of the steps and concepts, see the tutorials.



In [ ]:

    
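# Import only the names this notebook uses, rather than everything.
from floweaver import (Dataset, Partition, ProcessGroup, Waypoint, Bundle,
                       Elsewhere, SankeyDefinition, weave)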

Load the dataset:



In [ ]:

    
dataset = Dataset.from_csv('fruit_flows.csv', 'fruit_processes.csv')

This made-up dataset describes flows from farms to consumers:



In [ ]:

    
dataset._flows.head()

Additional information is available in the process dimension table:



In [ ]:

    
dataset._dim_process.head()

We'll also define some partitions that will be useful:



In [ ]:

    
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]

farm_partition_5 = Partition.Simple('process', [('Other farms', farm_ids[5:])] + farm_ids[:5])
partition_fruit = Partition.Simple('material', ['bananas', 'apples', 'oranges'])
partition_sector = Partition.Simple('process.sector', ['government', 'industry', 'domestic'])
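
To make the farm partition concrete, the sketch below reproduces its grouping in plain Python, without floweaver: the first five farms each form their own group, and the remaining ten are merged under the label 'Other farms'. The `spec` and `groups` names and the (label, members) layout are illustrative assumptions, not floweaver's internal representation.

```python
# Plain-Python illustration of the grouping expressed by farm_partition_5.
# `spec` and `groups` are hypothetical names used only for this sketch.
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]

# Bare ids become single-member groups labelled by the id itself;
# (label, members) tuples merge several ids under one label.
spec = [('Other farms', farm_ids[5:])] + farm_ids[:5]
groups = [item if isinstance(item, tuple) else (item, [item]) for item in spec]

for label, members in groups:
    print(label, len(members))
```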

Now build the Sankey diagram definition, starting with its nodes (process groups and waypoints).

Process groups represent sets of processes in the underlying database. The underlying processes can be specified as a list of ids (e.g. ['inputs']) or as a Pandas query expression (e.g. 'function == "landfill"').
Waypoints allow extra control over the partitioning and placement of flows.
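
The two ways of selecting processes can be sketched in plain Python. This is a simplified stand-in, not floweaver's implementation: a Python predicate is used in place of the Pandas query string, the `select` helper is hypothetical, and the rows are made up rather than taken from fruit_processes.csv.

```python
# Hypothetical rows standing in for the process dimension table; the ids and
# attribute values are made up for illustration.
processes = [
    {'id': 'inputs', 'function': 'inputs', 'location': 'UK'},
    {'id': 'landfill Leeds', 'function': 'landfill', 'location': 'Leeds'},
    {'id': 'landfill London', 'function': 'landfill', 'location': 'London'},
]

def select(selection):
    """Resolve a selection to a list of process ids.

    A list is treated as explicit ids; a callable stands in for the
    Pandas query string and is applied to each row.
    """
    if callable(selection):
        return [p['id'] for p in processes if selection(p)]
    return [p['id'] for p in processes if p['id'] in selection]

by_ids = select(['inputs'])
by_query = select(
    lambda p: p['function'] == 'landfill' and p['location'] != 'London')
```

Either form ends up as a set of process ids, which is why both can be used interchangeably when defining a process group.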



In [ ]:

    
nodes = {
    'inputs':     ProcessGroup(['inputs'], title='Inputs'),
    'compost':    ProcessGroup('function == "composting stock"', title='Compost'),
    'farms':      ProcessGroup('function in ["allotment", "large farm", "small farm"]', farm_partition_5),
    'eat':        ProcessGroup('function == "consumers" and location != "London"', partition_sector,
                               title='consumers by sector'),
    'landfill':   ProcessGroup('function == "landfill" and location != "London"', title='Landfill'),
    'composting': ProcessGroup('function == "composting process" and location != "London"', title='Composting'),

    'fruit':        Waypoint(partition_fruit, title='fruit type'),
    'w1':           Waypoint(direction='L', title=''),
    'w2':           Waypoint(direction='L', title=''),
    'export fruit': Waypoint(Partition.Simple('material', ['apples', 'bananas', 'oranges'])),
    'exports':      Waypoint(title='Exports'),
}

The ordering defines how the process groups and waypoints are arranged in the final diagram. It is structured as a list of vertical layers (from left to right), each containing a list of horizontal bands (from top to bottom), each containing a list of process group and waypoint ids (from top to bottom).



In [ ]:

    
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]
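
The nesting described above (layers, then bands, then ids) can be checked by walking the list in plain Python. The `positions` dictionary and `band_counts` set are illustrative helpers introduced for this sketch, not part of floweaver.

```python
# Walk the nested ordering: layers (left to right) -> bands (top to bottom)
# -> node ids (top to bottom).
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

# Map each node id to its (layer, band, rank within band) position.
positions = {}
for layer_index, layer in enumerate(ordering):
    for band_index, band in enumerate(layer):
        for rank, node_id in enumerate(band):
            positions[node_id] = (layer_index, band_index, rank)

# Collect the number of bands used by each layer.
band_counts = {len(layer) for layer in ordering}
```

In this example every layer has three bands: the top band holds the export nodes, the middle band holds the main flow, and the bottom band holds the return waypoints `w1` and `w2`.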

Bundles represent flows in the underlying database:



In [ ]:

    
bundles = [
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat', waypoints=['fruit']),
    Bundle('farms', 'compost', waypoints=['w2']),
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('composting', 'compost', waypoints=['w1', 'w2']),
    Bundle('farms', Elsewhere, waypoints=['exports', 'export fruit']),
]
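
As a rough mental model, a bundle between two process groups collects the flows that start in the first group and end in the second. The sketch below illustrates only that idea in plain Python. The flows, group memberships and the `flows_in_bundle` helper are made up for illustration, and floweaver's actual matching (waypoints, `Elsewhere`, flow selections) is richer than this.

```python
# Made-up flows and group memberships, for illustration only.
flows = [
    {'source': 'farm1', 'target': 'eat1', 'material': 'apples', 'value': 5},
    {'source': 'farm2', 'target': 'eat1', 'material': 'bananas', 'value': 3},
    {'source': 'eat1', 'target': 'landfill1', 'material': 'apples', 'value': 1},
]
groups = {
    'farms': {'farm1', 'farm2'},
    'eat': {'eat1'},
    'landfill': {'landfill1'},
}

def flows_in_bundle(source_group, target_group):
    """Return flows that start in source_group and end in target_group."""
    return [f for f in flows
            if f['source'] in groups[source_group]
            and f['target'] in groups[target_group]]

farms_to_eat = flows_in_bundle('farms', 'eat')
total = sum(f['value'] for f in farms_to_eat)
```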

Finally, the process groups, waypoints, bundles and ordering are combined into a Sankey diagram definition (SDD). When applied to the dataset, the result is a Sankey diagram!



In [ ]:

    
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('material'))
(weave(sdd, dataset)
    .to_widget(width=570, height=550, margins=dict(left=70, right=90)))