This notebook gives a fairly involved example of building a Sankey diagram from the sample "fruit" database used in the paper "Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use".
For more explanation of the steps and concepts, see the tutorials.
In [ ]:
from floweaver import *
Load the dataset:
In [ ]:
dataset = Dataset.from_csv('fruit_flows.csv', 'fruit_processes.csv')
This made-up dataset describes flows from farms to consumers:
In [ ]:
dataset._flows.head()
Additional information is available in the process dimension table:
In [ ]:
dataset._dim_process.head()
We'll also define some partitions that will be useful:
In [ ]:
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]
farm_partition_5 = Partition.Simple('process', [('Other farms', farm_ids[5:])] + farm_ids[:5])
partition_fruit = Partition.Simple('material', ['bananas', 'apples', 'oranges'])
partition_sector = Partition.Simple('process.sector', ['government', 'industry', 'domestic'])
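To make the grouping behind farm_partition_5 concrete, the sketch below rebuilds only the list of groups in plain Python, without floweaver. It assumes, as in the cell above, that a bare id forms its own group and that a (label, ids) tuple lumps several ids under one label.

```python
# Rebuild the group list that is passed to Partition.Simple for the farms.
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]  # 'farm1' .. 'farm15'

# The first five farms stay separate; the remaining ten are lumped together.
groups = [('Other farms', farm_ids[5:])] + farm_ids[:5]

for group in groups:
    if isinstance(group, tuple):
        label, members = group           # explicit (label, ids) pair
    else:
        label, members = group, [group]  # a bare id is its own group
    print('{:12s} <- {}'.format(label, ', '.join(members)))
```

The result is six groups: "Other farms" holding farm6 to farm15, followed by farm1 to farm5 individually.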
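Now define the Sankey diagram definition. Process groups represent sets of processes in the underlying database. The processes in each group can be specified either as a list of ids (e.g. ['inputs']) or as a Pandas query expression (e.g. 'function == "landfill"').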
In [ ]:
nodes = {
    'inputs': ProcessGroup(['inputs'], title='Inputs'),
    'compost': ProcessGroup('function == "composting stock"', title='Compost'),
    'farms': ProcessGroup('function in ["allotment", "large farm", "small farm"]', farm_partition_5),
    'eat': ProcessGroup('function == "consumers" and location != "London"', partition_sector,
                        title='consumers by sector'),
    'landfill': ProcessGroup('function == "landfill" and location != "London"', title='Landfill'),
    'composting': ProcessGroup('function == "composting process" and location != "London"', title='Composting'),
    'fruit': Waypoint(partition_fruit, title='fruit type'),
    'w1': Waypoint(direction='L', title=''),
    'w2': Waypoint(direction='L', title=''),
    'export fruit': Waypoint(Partition.Simple('material', ['apples', 'bananas', 'oranges'])),
    'exports': Waypoint(title='Exports'),
}
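The query strings in the cell above select rows of the process dimension table. As a rough illustration only, the same conditions can be written as plain Python filters over a small made-up table. This is not how floweaver evaluates them, and the ids and attribute values below are hypothetical rather than taken from fruit_processes.csv.

```python
# Hypothetical process table: id -> attributes (not the real fruit data).
processes = {
    'eat1':      {'function': 'consumers',  'location': 'London'},
    'eat2':      {'function': 'consumers',  'location': 'Cambridge'},
    'landfill1': {'function': 'landfill',   'location': 'Cambridge'},
    'farm1':     {'function': 'small farm', 'location': 'Cambridge'},
}

def select(predicate):
    """Return the sorted ids of processes whose attributes satisfy predicate."""
    return sorted(pid for pid, attrs in processes.items() if predicate(attrs))

# Mirrors the query 'function == "consumers" and location != "London"'
eat = select(lambda p: p['function'] == 'consumers' and p['location'] != 'London')

# Mirrors the query 'function in ["allotment", "large farm", "small farm"]'
farms = select(lambda p: p['function'] in ['allotment', 'large farm', 'small farm'])

print(eat, farms)
```

In this toy table the London consumer is excluded from the "eat" selection, which is the effect the location condition has in the real definition.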
The ordering defines how the process groups and waypoints are arranged in the final diagram. It is structured as a list of vertical layers (from left to right), each containing a list of horizontal bands (from top to bottom), each containing a list of process group and waypoint ids (from top to bottom).
In [ ]:
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]
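The nesting of layers, bands and ids can be hard to read at a glance. The following sketch, in plain Python and independent of floweaver, walks the same ordering list and records where each id sits. The placement dictionary is a helper introduced here for illustration only.

```python
# Same ordering as in the cell above: layers -> bands -> node ids.
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

# Map each id to (layer, band, rank), all counted from zero.
placement = {}
for layer_index, layer in enumerate(ordering):       # left to right
    for band_index, band in enumerate(layer):        # top to bottom
        for rank, node_id in enumerate(band):        # top to bottom within band
            placement[node_id] = (layer_index, band_index, rank)

for node_id, (layer_index, band_index, rank) in placement.items():
    print('{:14s} layer {}  band {}  rank {}'.format(node_id, layer_index, band_index, rank))
```

Every layer here has three bands: the middle band carries the main flow, while the top and bottom bands hold the export and return-loop waypoints.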
Bundles represent flows in the underlying database:
In [ ]:
bundles = [
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat', waypoints=['fruit']),
    Bundle('farms', 'compost', waypoints=['w2']),
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('composting', 'compost', waypoints=['w1', 'w2']),
    Bundle('farms', Elsewhere, waypoints=['exports', 'export fruit']),
]
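Before combining the pieces, it can be useful to check that the bundles and the ordering refer to the same set of ids. The sketch below does this in plain Python. It uses (source, target, waypoints) tuples in place of Bundle objects and the string 'Elsewhere' as a stand-in for floweaver's Elsewhere marker; both are simplifications made for this example.

```python
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

ELSEWHERE = 'Elsewhere'  # stand-in for floweaver's Elsewhere marker

# (source, target, waypoints) tuples mirroring the Bundle list above.
bundles = [
    ('inputs', 'farms', []),
    ('compost', 'farms', []),
    ('farms', 'eat', ['fruit']),
    ('farms', 'compost', ['w2']),
    ('eat', 'landfill', []),
    ('eat', 'composting', []),
    ('composting', 'compost', ['w1', 'w2']),
    ('farms', ELSEWHERE, ['exports', 'export fruit']),
]

# Ids that have a position in the diagram.
placed = {node_id for layer in ordering for band in layer for node_id in band}

# Ids that some bundle passes through.
referenced = set()
for source, target, waypoints in bundles:
    referenced.update([source, target] + waypoints)
referenced.discard(ELSEWHERE)  # Elsewhere is not a node in the ordering

missing = referenced - placed   # used by a bundle but never placed
unused = placed - referenced    # placed but not touched by any bundle
print('missing:', sorted(missing), 'unused:', sorted(unused))
```

For the definition in this notebook both sets are empty: each of the eleven ids is placed exactly once and used by at least one bundle.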
Finally, the process groups, waypoints, bundles and ordering are combined into a Sankey diagram definition (SDD). When applied to the dataset, the result is a Sankey diagram!
In [ ]:
sdd = SankeyDefinition(nodes, bundles, ordering,
                       flow_partition=dataset.partition('material'))
weave(sdd, dataset) \
    .to_widget(width=570, height=550, margins=dict(left=70, right=90))