Getting Started with GraphLab Create - Tutorial

Code available via turi-code tutorials


In [1]:
import graphlab as gl

In [2]:
gl.canvas.set_target('ipynb') # use IPython Notebook output for GraphLab Canvas

In [3]:
vertices = gl.SFrame.read_csv('https://static.turi.com/datasets/bond/bond_vertices.csv')
edges = gl.SFrame.read_csv('https://static.turi.com/datasets/bond/bond_edges.csv')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1472322388.log
This non-commercial license of GraphLab Create for academic use is assigned to daina.bouquin@spsmail.cuny.edu and will expire on August 27, 2017.
Downloading https://static.turi.com/datasets/bond/bond_vertices.csv to /var/tmp/graphlab-dainabouquin/3885/36c94d2f-e9d0-49b4-9488-c03f03acfdfa.csv
Finished parsing file https://static.turi.com/datasets/bond/bond_vertices.csv
Parsing completed. Parsed 10 lines in 0.047944 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file https://static.turi.com/datasets/bond/bond_vertices.csv
Parsing completed. Parsed 10 lines in 0.017223 secs.
Downloading https://static.turi.com/datasets/bond/bond_edges.csv to /var/tmp/graphlab-dainabouquin/3885/4c04b0d6-a376-44a1-9a57-af6f10d6d7ca.csv
Finished parsing file https://static.turi.com/datasets/bond/bond_edges.csv
Parsing completed. Parsed 20 lines in 0.016904 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file https://static.turi.com/datasets/bond/bond_edges.csv
Parsing completed. Parsed 20 lines in 0.017081 secs.
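
The log above reports the column types that read_csv inferred and notes that they can be overridden through the column_type_hints argument. As a rough illustration of what such a type list means, here is a minimal pure-Python sketch that applies one type per column to CSV text. It is not GraphLab's parser: the helper read_typed_csv and the two inline sample rows are assumptions made for this example, with column names taken from the vertex table used in this tutorial.

```python
import csv
import io

# Hypothetical inline sample; the real file is downloaded by read_csv.
sample_csv = """name,gender,license_to_kill,villian
James Bond,M,1,0
Elliot Carver,M,0,1
"""

# One type per column, in the same spirit as column_type_hints=[str,str,int,int].
column_types = [str, str, int, int]


def read_typed_csv(text, types):
    """Parse CSV text and cast each column with the matching type."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for raw in reader:
        rows.append({name: cast(value)
                     for name, cast, value in zip(header, types, raw)})
    return rows


rows = read_typed_csv(sample_csv, column_types)
for row in rows:
    print(row)
```

If a hint does not match the data (for example int for a column that holds names), the cast raises an error, which is the situation the log message warns about.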

In [4]:
# SFrame has a number of methods to explore and transform your data
vertices.show()



In [5]:
# Show a summary of the edges SFrame
edges.show()



In [6]:
# Create an empty graph object
g = gl.SGraph()

In [7]:
# Add the vertices to the graph. add_vertices returns a new SGraph
# rather than modifying g in place, so reassign the result.
g = g.add_vertices(vertices=vertices, vid_field='name')

In [8]:
# Add the edges the same way, again reassigning the returned graph
g = g.add_edges(edges=edges, src_field='src', dst_field='dst')

In [9]:
# Show all the vertices
g.get_vertices()


Out[9]:
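+----------------+--------+-----------------+---------+
| __id           | gender | license_to_kill | villian |
+----------------+--------+-----------------+---------+
| Inga Bergstorm | F      | 0               | 0       |
| Moneypenny     | F      | 1               | 0       |
| Henry Gupta    | M      | 0               | 1       |
| Wai Lin        | F      | 1               | 0       |
| Q              | M      | 1               | 0       |
| James Bond     | M      | 1               | 0       |
| M              | M      | 1               | 0       |
| Paris Carver   | F      | 0               | 1       |
| Elliot Carver  | M      | 0               | 1       |
| Gotz Otto      | M      | 0               | 1       |
+----------------+--------+-----------------+---------+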
[10 rows x 4 columns]

In [10]:
# Show all the edges
g.get_edges()


Out[10]:
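+----------------+----------------+------------+
| __src_id       | __dst_id       | relation   |
+----------------+----------------+------------+
| Moneypenny     | M              | managed_by |
| Inga Bergstorm | James Bond     | friend     |
| Moneypenny     | Q              | colleague  |
| Henry Gupta    | Elliot Carver  | killed_by  |
| Q              | Moneypenny     | colleague  |
| M              | Moneypenny     | worksfor   |
| James Bond     | Inga Bergstorm | friend     |
| James Bond     | M              | managed_by |
| Q              | M              | managed_by |
| Wai Lin        | James Bond     | friend     |
+----------------+----------------+------------+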
[20 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [11]:
# Get all the "friend" edges
g.get_edges(fields={'relation': 'friend'})


Out[11]:
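+----------------+----------------+----------+
| __src_id       | __dst_id       | relation |
+----------------+----------------+----------+
| Inga Bergstorm | James Bond     | friend   |
| James Bond     | Inga Bergstorm | friend   |
| Wai Lin        | James Bond     | friend   |
| James Bond     | Wai Lin        | friend   |
+----------------+----------------+----------+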
[4 rows x 3 columns]
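
To make the field filter concrete, the following is a minimal pure-Python sketch of the same idea: keep only the edges whose attributes match every key/value pair in a dictionary. It does not use GraphLab. The helper filter_edges is hypothetical, and the edge list contains only the ten rows printed by g.get_edges() earlier, not the full 20-edge dataset, so it finds three of the four "friend" edges.

```python
# The ten edges printed by g.get_edges() above (only the head of the SFrame).
edges = [
    {"src": "Moneypenny", "dst": "M", "relation": "managed_by"},
    {"src": "Inga Bergstorm", "dst": "James Bond", "relation": "friend"},
    {"src": "Moneypenny", "dst": "Q", "relation": "colleague"},
    {"src": "Henry Gupta", "dst": "Elliot Carver", "relation": "killed_by"},
    {"src": "Q", "dst": "Moneypenny", "relation": "colleague"},
    {"src": "M", "dst": "Moneypenny", "relation": "worksfor"},
    {"src": "James Bond", "dst": "Inga Bergstorm", "relation": "friend"},
    {"src": "James Bond", "dst": "M", "relation": "managed_by"},
    {"src": "Q", "dst": "M", "relation": "managed_by"},
    {"src": "Wai Lin", "dst": "James Bond", "relation": "friend"},
]


def filter_edges(edge_list, fields):
    """Return the edges whose attributes equal every value in `fields`."""
    return [edge for edge in edge_list
            if all(edge.get(key) == value for key, value in fields.items())]


friends = filter_edges(edges, {"relation": "friend"})
for edge in friends:
    print(edge["src"], "->", edge["dst"])
```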

In [12]:
# Run the PageRank algorithm on the graph
pr = gl.pagerank.create(g)


Counting out degree
Done counting out degree
+-----------+-----------------------+
| Iteration | L1 change in pagerank |
+-----------+-----------------------+
| 1         | 6.65833               |
| 2         | 4.65611               |
| 3         | 3.46298               |
| 4         | 2.55686               |
| 5         | 1.95422               |
| 6         | 1.42139               |
| 7         | 1.10464               |
| 8         | 0.806704              |
| 9         | 0.631771              |
| 10        | 0.465388              |
| 11        | 0.364898              |
| 12        | 0.271257              |
| 13        | 0.212255              |
| 14        | 0.159062              |
| 15        | 0.124071              |
| 16        | 0.0935911             |
| 17        | 0.0727674             |
| 18        | 0.0551714             |
| 19        | 0.0427744             |
| 20        | 0.0325555             |
+-----------+-----------------------+
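
The table above reports the L1 change in PageRank after each iteration. The sketch below shows one way such an iteration can be written in plain Python. It is an illustration, not GraphLab's implementation: it assumes the unnormalized update rank = reset_prob + (1 - reset_prob) * sum(rank[src] / out_degree[src]), a reset probability of 0.15, a cap of 20 iterations, and a stopping threshold of 0.01. The four-vertex toy graph and the function name pagerank are hypothetical.

```python
def pagerank(edges, reset_prob=0.15, max_iterations=20, threshold=1e-2):
    """Power-iteration PageRank over a list of (src, dst) pairs.

    Assumes every vertex has at least one outgoing edge.
    """
    nodes = sorted({vertex for edge in edges for vertex in edge})
    out_degree = {vertex: 0 for vertex in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {vertex: 1.0 for vertex in nodes}  # start every vertex at 1.0
    for _ in range(max_iterations):
        # Each vertex splits its current rank evenly across its out-edges.
        incoming = {vertex: 0.0 for vertex in nodes}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]

        new_rank = {vertex: reset_prob + (1.0 - reset_prob) * incoming[vertex]
                    for vertex in nodes}
        # L1 change: total absolute difference from the previous iteration.
        l1_change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if l1_change < threshold:
            break
    return rank


# Hypothetical toy graph: "a" is pointed to by everyone, "d" by no one.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(toy_edges)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, ranks[vertex])
```

Under this formulation the ranks sum to the number of vertices when no vertex lacks outgoing edges, and a vertex with no incoming edges ends up at exactly the reset probability.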

In [14]:
# Show the ten vertices with the highest PageRank (topk returns 10 rows by default)
pr.get('pagerank').topk(column_name='pagerank')
# As expected, James Bond ranks highest, while the four villains rank lowest.


Out[14]:
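+----------------+----------------+-------------------+
| __id           | pagerank       | delta             |
+----------------+----------------+-------------------+
| James Bond     | 2.52743578524  | 0.0132914517076   |
| M              | 1.87718696576  | 0.00666194771763  |
| Moneypenny     | 1.18363921275  | 0.00143637385736  |
| Q              | 1.18363921275  | 0.00143637385736  |
| Wai Lin        | 0.869872717136 | 0.00477951418076  |
| Inga Bergstorm | 0.869872717136 | 0.00477951418076  |
| Elliot Carver  | 0.634064732205 | 0.000113553313724 |
| Henry Gupta    | 0.284762885673 | 1.89255522873e-05 |
| Paris Carver   | 0.284762885673 | 1.89255522873e-05 |
| Gotz Otto      | 0.284762885673 | 1.89255522873e-05 |
+----------------+----------------+-------------------+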
[10 rows x 3 columns]
