sand's underlying graph implementation is igraph. igraph offers several ways to load data, but sand provides a few convenience functions that simplify the workflow:
In [1]:
import sand
csv_to_dicts
csv_to_dicts reads a CSV into a list of Python dictionaries. Each column in the CSV becomes a corresponding key in each dictionary.
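To make the column-to-key mapping concrete, here is a minimal sketch built on the standard library's csv.DictReader. The function name csv_to_dicts_sketch is hypothetical and this is not sand's actual implementation. It assumes that passing header means the file has no header row of its own, and it leaves every value as a string.

```python
import csv
import io


def csv_to_dicts_sketch(lines, header=None):
    """Hypothetical stand-in for sand.csv_to_dicts, not the library's code.

    `lines` is any iterable of CSV text lines (an open file works too).
    If `header` is given, it names the columns of a file with no header
    row; otherwise the first row supplies the keys.
    """
    reader = csv.DictReader(lines, fieldnames=header)
    return [dict(row) for row in reader]


# A tiny in-memory file with no header row (made-up values).
sample = io.StringIO("a/f,b/g,1\nb/g,c/h,2\n")
rows = csv_to_dicts_sketch(sample, header=['source', 'target', 'weight'])
print(rows[0])
```

Each row comes back as a dictionary keyed by column name, which is the shape the loading functions in this notebook consume.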
Let's load a CSV with function dependencies in a Clojure library from lein-topology into a list of dictionaries:
In [2]:
edgelist_file = './data/lein-topology-57af741.csv'
edgelist_data = sand.csv_to_dicts(edgelist_file, header=['source', 'target', 'weight'])
edgelist_data[:5]
Out[2]:
In [3]:
functions = sand.from_edges(edgelist_data)
functions.summary()
Out[3]:
from_vertices_and_edges with two lists of dictionaries
A richer network model includes attributes on the vertex and edge collections, including unique identifiers for each vertex.
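As a rough illustration of this model, the sketch below resolves edge foreign keys to positions in the vertex list using plain Python. igraph identifies vertices by consecutive integer index, so some lookup of this kind is needed whenever edges refer to vertices by an external identifier. The data, the key names ('id', 'src', 'dst'), and the helper resolve_edges are all hypothetical; this is not how sand is necessarily implemented.

```python
# Hypothetical sample data; the keys 'id', 'name', 'src', and 'dst' are
# illustrative and not taken from sand.
vertices = [
    {'id': 'u1', 'name': 'Ada'},
    {'id': 'u2', 'name': 'Grace'},
    {'id': 'u3', 'name': 'Edsger'},
]
edges = [
    {'src': 'u1', 'dst': 'u2', 'weight': '1'},
    {'src': 'u3', 'dst': 'u2', 'weight': '1'},
]


def resolve_edges(vertices, edges, id_key, foreign_keys):
    """Map each edge's foreign keys to positions in the vertex list."""
    index_of = {v[id_key]: i for i, v in enumerate(vertices)}
    source_key, target_key = foreign_keys
    return [(index_of[e[source_key]], index_of[e[target_key]]) for e in edges]


pairs = resolve_edges(vertices, edges, 'id', ('src', 'dst'))
print(pairs)  # [(0, 1), (2, 1)]

# In-degree per vertex, counted from the resolved (source, target) pairs.
indegree = [sum(1 for _, t in pairs if t == i) for i in range(len(vertices))]
print(indegree)  # [0, 2, 0]
```

Keeping the identifier separate from the display name means two vertices can share a name without their edges being confused.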
We can use Jupyter's cell magic to generate some sample data. Here we'll represent a network of students reviewing one another's work. Students (vertices) will be in people.csv and reviews (edges) will be in reviews.csv:
In [4]:
people_file = './data/people.csv'
In [5]:
%%writefile $people_file
uuid,name,cohort
6aacd73c-0be5-412d-95a3-ca54149c9952,Mark Taylor,Day 1 - Period 6
5205741f-3ea9-4c30-9c50-4bab229a51ce,Aidin Aslani,Day 1 - Period 6
14a36491-5a3d-42c9-b012-6a53654d9bac,Charlie Brown,Day 1 - Period 2
9dc7633a-e493-4ec0-a252-8616f2148705,Armin Norton,Day 1 - Period 2
In [6]:
review_file = './data/reviews.csv'
In [7]:
%%writefile $review_file
reviewer_uuid,student_uuid,feedback,date,weight
6aacd73c-0be5-412d-95a3-ca54149c9952,14a36491-5a3d-42c9-b012-6a53654d9bac,Awesome work!,2015-02-12,1
5205741f-3ea9-4c30-9c50-4bab229a51ce,9dc7633a-e493-4ec0-a252-8616f2148705,WOW!,2014-02-12,1
We again load this data into lists of dictionaries with csv_to_dicts:
In [8]:
people_data = sand.csv_to_dicts(people_file)
people_data
Out[8]:
In [9]:
review_data = sand.csv_to_dicts(review_file)
review_data
Out[9]:
In [10]:
reviews = sand.from_vertices_and_edges(
    vertices=people_data,
    edges=review_data,
    vertex_name_key='name',
    vertex_id_key='uuid',
    edge_foreign_keys=('reviewer_uuid', 'student_uuid'))
reviews.summary()
Out[10]:
In [11]:
reviews.vs['indegree']
Out[11]:
In [12]:
reviews.vs['outdegree']
Out[12]:
In [13]:
reviews.vs['label']
Out[13]:
In [14]:
reviews.vs['name']
Out[14]:
Groups represent modules or communities in the network. Groups are based on the labels by default.
In [15]:
reviews.vs['group']
Out[15]:
The vertices in the lein-topology data set contain fully qualified namespaces for functions. Grouping by name isn't particularly useful here:
In [16]:
len(set(functions.vs['group']))
Out[16]:
In [17]:
len(functions.vs)
Out[17]:
Because sand was built specifically for analyzing software and system networks, a fqn_to_groups grouping function is built in:
In [18]:
functions.vs['group'] = sand.fqn_to_groups(functions.vs['label'])
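For intuition, here is one way namespace-based grouping could work. It assumes Clojure-style names of the form namespace/function and assigns one integer group per distinct namespace. The helpers namespace_of and group_by_namespace and the sample labels are hypothetical; sand's fqn_to_groups may parse names or number groups differently.

```python
def namespace_of(fqn):
    """Return the part of a Clojure-style name before '/', or the whole name."""
    return fqn.split('/', 1)[0]


def group_by_namespace(labels):
    """Assign one integer group id per distinct namespace, in first-seen order."""
    ids = {}
    # setdefault stores the next free id only when the namespace is new.
    return [ids.setdefault(namespace_of(label), len(ids)) for label in labels]


# Made-up labels for illustration only.
labels = ['app.core/main', 'app.core/run', 'app.util/parse', 'clojure.core/map']
groups = group_by_namespace(labels)
print(groups)  # [0, 0, 1, 2]
```

Functions that share a namespace land in the same group, so the number of groups tracks the number of namespaces rather than the number of functions.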
In [19]:
len(set(functions.vs['group']))
Out[19]:
This is a much more manageable number of groups. We'll see one way that these groups are useful when we render a visualization of the network: