Baseball Demo

This notebook walks through a detailed example of the typical workflow Graft aims to support. The dataset used here was constructed by splicing together two separate datasets:

  1. SOCR Data MLB HeightsWeights: heights, ages, and weights of baseball players (vertex data). References:
    • Jarron M. Saint Onge, Patrick M. Krueger, Richard G. Rogers. (2008) Historical trends in height, weight, and body mass: Data from U.S. Major League Baseball players, 1869-1983, Economics & Human Biology, Volume 6, Issue 3, Symposium on the Economics of Obesity, December 2008, Pages 482-488, ISSN 1570-677X, DOI: 10.1016/j.ehb.2008.06.008.
    • Jarron M. Saint Onge, Richard G. Rogers, Patrick M. Krueger. (2008) Major League Baseball Players' Life Expectancies, Southwestern Social Science Association, Volume 89, Issue 3, pages 817–830, DOI: 10.1111/j.1540-6237.2008.00562.x.
  2. Advogato Trust Network: edge weights between 0 and 1 (edge data). References:
    • Advogato network dataset -- KONECT, July 2016. http
    • Paolo Massa, Martino Salvetti, and Danilo Tomasoni. Bowling alone and trust decline in social network sites. In Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pages 658--663, 2009.

The dataset has 6541 vertices and 51127 edges.
  • Vertex properties: Age, Height (cm), Weight (kg)
  • Edge properties: Trust (float)


In [1]:
## Load and summarize the graph.
using Graft
using StatsBase
import LightGraphs

# Download the dataset into Graft's examples directory
download(
    "https://raw.githubusercontent.com/pranavtbhat/Graft.jl/gh-pages/Datasets/baseball.txt",
    joinpath(Pkg.dir("Graft"), "examples/baseball.txt")
);


INFO: Recompiling stale cache file /Users/pranav/.julia/lib/v0.5/Graft.ji for module Graft.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1900k  100 1900k    0     0   163k      0  0:00:11  0:00:11 --:--:--  288k

In [2]:
# Load the graph from the downloaded file
g = loadgraph(joinpath(Pkg.dir("Graft"), "examples/baseball.txt"))


Out[2]:
Graph(6541 vertices, 51127 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)

In [3]:
# Get the graph's size
size(g)


Out[3]:
(6541,51127)

In [ ]:
# Get an iterator over the graph's edges
edges(g)

In [ ]:
# List vertex labels
encode(g)

In [12]:
# Split the graph into vertex and edge descriptors
V,E = g;

In [13]:
# Display the vertex table
V


Out[13]:
│ VertexID │ Labels       │ Age   │ Height │ Weight  │
├──────────┼──────────────┼───────┼────────┼─────────┤
│ 1        │ "gc"         │ 26.03 │ 182.88 │ 104.545 │
│ 2        │ "prigaux"    │ 25.43 │ 175.26 │ 84.0909 │
│ 3        │ "fred"       │ 24.51 │ 182.88 │ 93.6364 │
│ 4        │ "quintela"   │ 31.81 │ 193.04 │ 86.3636 │
│ 5        │ "jgarzik"    │ 27.32 │ 185.42 │ 90.9091 │
│ 6        │ "penso"      │ 25.5  │ 185.42 │ 86.3636 │
│ 7        │ "leviramsey" │ 32.68 │ 182.88 │ 90.9091 │
│ 8        │ "havardk"    │ 30.22 │ 182.88 │ 88.6364 │
│ 9        │ "sh"         │ 28.8  │ 182.88 │ 90.9091 │
│ 10       │ "zappy"      │ 29.54 │ 193.04 │ 104.545 │
│ 11       │ "ollesson"   │ 29.12 │ 180.34 │ 106.818 │
⋮
│ 6530     │ "boerner"    │ 30.46 │ 187.96 │ 88.1818 │
│ 6531     │ "barismetin" │ 32.68 │ 193.04 │ 89.5455 │
│ 6532     │ "baris"      │ 28.11 │ 180.34 │ 88.6364 │
│ 6533     │ "obritim"    │ 26.63 │ 187.96 │ 84.0909 │
│ 6534     │ "arabouma36" │ 24.15 │ 193.04 │ 81.8182 │
│ 6535     │ "MachX"      │ 26.49 │ 190.5  │ 90.9091 │
│ 6536     │ "seahawk"    │ 22.89 │ 190.5  │ 90.9091 │
│ 6537     │ "xchaix"     │ 28.53 │ 177.8  │ 95.4545 │
│ 6538     │ "sktrdie"    │ 25.35 │ 180.34 │ 86.3636 │
│ 6539     │ "KellyHo"    │ 22.34 │ 185.42 │ 95.4545 │
│ 6540     │ "mike1086"   │ 23.45 │ 190.5  │ 86.3636 │
│ 6541     │ "thelema"    │ 29.99 │ 193.04 │ 88.6364 │

In [14]:
# Display the edge table
E


Out[14]:
│ Index │ Source        │ Target        │ Trust     │
├───────┼───────────────┼───────────────┼───────────┤
│ 1     │ "gc"          │ "gc"          │ 0.42739   │
│ 2     │ "gc"          │ "prigaux"     │ 0.978998  │
│ 3     │ "gc"          │ "fred"        │ 0.714178  │
│ 4     │ "gc"          │ "penso"       │ 0.999861  │
│ 5     │ "gc"          │ "leviramsey"  │ 0.993962  │
│ 6     │ "gc"          │ "sh"          │ 0.336044  │
│ 7     │ "gc"          │ "fxn"         │ 0.0949308 │
│ 8     │ "gc"          │ "chromatic"   │ 0.778156  │
│ 9     │ "gc"          │ "strider"     │ 0.874019  │
│ 10    │ "gc"          │ "sdodji"      │ 0.282097  │
│ 11    │ "gc"          │ "Nyco"        │ 0.142455  │
⋮
│ 51116 │ "hulver"      │ "hulver"      │ 0.410183  │
│ 51117 │ "asanders"    │ "asanders"    │ 0.754016  │
│ 51118 │ "Aracnus"     │ "Aracnus"     │ 0.868275  │
│ 51119 │ "billstewart" │ "billstewart" │ 0.976183  │
│ 51120 │ "boerner"     │ "slok"        │ 0.864808  │
│ 51121 │ "baris"       │ "barismetin"  │ 0.934414  │
│ 51122 │ "MachX"       │ "arabouma36"  │ 0.059897  │
│ 51123 │ "seahawk"     │ "seahawk"     │ 0.935393  │
│ 51124 │ "xchaix"      │ "xchaix"      │ 0.966611  │
│ 51125 │ "sktrdie"     │ "sktrdie"     │ 0.323029  │
│ 51126 │ "KellyHo"     │ "KellyHo"     │ 0.404737  │
│ 51127 │ "mike1086"    │ "mike1086"    │ 0.370529  │

In [15]:
# Find the average BMI of baseball players
@query(g |> eachvertex(v.Weight / (v.Height / 100) ^ 2)) |> mean


Out[15]:
26.23778373854929
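
The query above computes BMI as weight in kilograms divided by the square of height in metres; Height is stored in centimetres, hence the division by 100. The same formula can be sketched in plain Julia without Graft. The helper name `bmi` is hypothetical, and the sample values are the first two rows of the vertex table:

```julia
# BMI = weight (kg) / height (m)^2. Heights are stored in cm, so divide by 100.
# `bmi` is an illustrative helper, not part of Graft.
bmi(weight_kg, height_cm) = weight_kg / (height_cm / 100)^2

# (Weight, Height) for "gc" and "prigaux" from the vertex table
players = [(104.545, 182.88), (84.0909, 175.26)]
bmis = [bmi(w, h) for (w, h) in players]

# Mean computed by hand so the sketch needs no statistics package
mean_bmi = sum(bmis) / length(bmis)
println(mean_bmi)
```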

In [16]:
# Find the median height (in feet) of baseball players in their 20s
@query(g |> filter(v.Age < 30,v.Age >= 20) |> eachvertex(v.Height * 0.0328084)) |> median


Out[16]:
6.166666864000001
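
The factor 0.0328084 in the query converts centimetres to feet, so the result above is in decimal feet. A small sketch of that conversion follows, with hypothetical helper names (`cm_to_feet`, `feet_and_inches`) that are not part of Graft:

```julia
# Conversion factor used in the query: feet per centimetre
const FEET_PER_CM = 0.0328084

# Decimal feet, as returned by the query
cm_to_feet(cm) = cm * FEET_PER_CM

# Whole feet and inches, rounding to the nearest inch (1 in = 2.54 cm)
function feet_and_inches(cm)
    total_inches = round(Int, cm / 2.54)
    return divrem(total_inches, 12)
end

println(cm_to_feet(187.96))
println(feet_and_inches(187.96))
```

Under this conversion the median reported above corresponds to a height of 187.96 cm, a value that appears several times in the vertex table.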

In [17]:
# Find the mean absolute age difference in strong relationships (Trust > 0.8)
@query(g |> filter(e.Trust > 0.8) |> eachedge(s.Age - t.Age)) |> abs |> mean


Out[17]:
4.163929464037767
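
The absolute value is taken before averaging; otherwise positive and negative age differences would cancel out. A plain-Julia sketch of the same computation on three edges taken from the tables above ("gc" to "prigaux", "fred", and "penso"); the function name `mean_abs_age_gap` is hypothetical:

```julia
# Each row: (source age, target age, trust), taken from the vertex and edge tables
edge_rows = [
    (26.03, 25.43, 0.978998),  # gc -> prigaux
    (26.03, 24.51, 0.714178),  # gc -> fred
    (26.03, 25.5,  0.999861),  # gc -> penso
]

# Keep edges above the trust threshold, then average the absolute age gaps
function mean_abs_age_gap(rows, min_trust)
    gaps = [abs(s - t) for (s, t, trust) in rows if trust > min_trust]
    return sum(gaps) / length(gaps)
end

println(mean_abs_age_gap(edge_rows, 0.8))
```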

In [18]:
# Find fred's 3 hop neighborhood (friends and friends-of-friends and so on)
fred_nhood = hopgraph(g, "fred", 3)


Out[18]:
Graph(1957 vertices, 29901 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)
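
Conceptually, a hop neighborhood can be found with a breadth-first search that stops after a fixed number of levels. The sketch below is not Graft's implementation: it assumes that hops follow outgoing edges, uses a hypothetical adjacency `Dict`, and returns only the set of reachable labels rather than a subgraph.

```julia
# Illustrative only: labels reachable from `seeds` in at most `hops` steps,
# following outgoing edges. `adj` maps a label to its out-neighbors.
function within_hops(adj, seeds, hops)
    visited = Set(seeds)
    frontier = collect(seeds)
    for _ in 1:hops
        next_frontier = eltype(frontier)[]
        for u in frontier
            for v in get(adj, u, eltype(frontier)[])
                if !(v in visited)
                    push!(visited, v)
                    push!(next_frontier, v)
                end
            end
        end
        frontier = next_frontier
    end
    return visited
end

# Hypothetical chain a -> b -> c -> d -> e
adj = Dict("a" => ["b"], "b" => ["c"], "c" => ["d"], "d" => ["e"])
println(within_hops(adj, ["a"], 3))       # single seed
println(within_hops(adj, ["a", "d"], 1))  # multiple seeds
```

Passing several seeds simply starts the search from all of them at once, which is the idea behind a multi-seed traversal.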

In [19]:
# See how well players over 30 in fred's neighborhood trust each other
@query(fred_nhood |> filter(v.Age > 30) |> eachedge(e.Trust)) |> mean


Out[19]:
0.5495668265206273

In [20]:
# Find the 3-hop neighborhood of two separate vertices (multi-seed traversal)
sg = hopgraph(g, ["nikolay", "jbert"], 3)


Out[20]:
Graph(1615 vertices, 23569 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)

In [22]:
# Define an edge distance property as the inverse of the (normalized) trust
dists = @query(sg |> eachedge(1 / e.Trust ));
seteprop!(sg, :, dists, :Dist);

In [23]:
# Trim edges of very high distance
sg = @query(sg |> filter(e.Dist < 10))


Out[23]:
Graph(1615 vertices, 22108 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust,:Dist] edge properties)
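
Because Dist is defined as 1 / Trust, keeping edges with Dist < 10 is the same as keeping edges with Trust > 0.1 (for positive trust values). A plain-Julia sketch on four trust values that appear in the edge tables of this notebook; the variable names are illustrative:

```julia
# Trust values taken from edge tables in this notebook
trust = [0.753731, 0.103399, 0.0949308, 0.059897]

# Distance is the inverse of trust: high trust means short distance
dist = 1 ./ trust

# Edges that survive the Dist < 10 filter
keep = dist .< 10

println(dist)
println(keep)
```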

In [24]:
# Export the graph's adjacency matrix
M = export_adjacency(sg)
lg = LightGraphs.DiGraph(M)


Out[24]:
{1615, 22108} directed graph

In [26]:
# Export the edge distance property
D = export_edge_property(sg, :Dist);

In [ ]:
# Compute betweenness centrality
centrality = LightGraphs.betweenness_centrality(lg)

In [29]:
# Set the centrality as a vertex property
setvprop!(sg, :, centrality, :Centrality);

In [30]:
# Compute all-pairs shortest paths (Floyd-Warshall) using the edge distances
apsp = LightGraphs.floyd_warshall_shortest_paths(lg, D).dists;
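
For readers unfamiliar with Floyd-Warshall, the following is a minimal textbook sketch of the algorithm on a dense weight matrix. It is not the LightGraphs implementation; the function name `floyd_warshall`, the convention that `W[i, j]` is the weight of the edge from `i` to `j` (with `Inf` for a missing edge), and the 3-vertex example are all assumptions made for illustration.

```julia
# Textbook Floyd-Warshall on a dense matrix.
# Assumption: W[i, j] is the weight of edge i -> j, Inf if there is no edge.
function floyd_warshall(W)
    n = size(W, 1)
    D = copy(W)
    for i in 1:n
        D[i, i] = 0.0            # a vertex is at distance 0 from itself
    end
    # k is the outermost loop: allow paths through intermediate vertices 1..k
    for k in 1:n, i in 1:n, j in 1:n
        via_k = D[i, k] + D[k, j]
        if via_k < D[i, j]
            D[i, j] = via_k
        end
    end
    return D
end

# Hypothetical graph: 1 -> 2 (1.0), 2 -> 3 (2.0), 1 -> 3 (5.0)
W = [Inf 1.0 5.0;
     Inf Inf 2.0;
     Inf Inf Inf]
D = floyd_warshall(W)
println(D[1, 3])   # the route through vertex 2 is shorter than the direct edge
```

In this sketch the shortest distance can be smaller than the direct edge weight, and every vertex is at distance 0 from itself regardless of any self-loop weight.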

In [31]:
# Store the shortest-path distance for each edge as an edge property
eit = edges(sg);
seteprop!(sg, :, [apsp[e.second,e.first] for e in eit], :Shortest_Dists);

In [32]:
# Show the new vertex descriptor
VertexDescriptor(sg)


Out[32]:
│ VertexID │ Labels        │ Age   │ Height │ Weight  │ Centrality  │
├──────────┼───────────────┼───────┼────────┼─────────┼─────────────┤
│ 1        │ "lkcl"        │ 30.51 │ 190.5  │ 95.4545 │ 0.0352864   │
│ 2        │ "chalst"      │ 27.16 │ 187.96 │ 79.5455 │ 0.0180542   │
│ 3        │ "jrf"         │ 27.23 │ 182.88 │ 81.8182 │ 0.0145245   │
│ 4        │ "Astinus"     │ 33.77 │ 190.5  │ 81.8182 │ 1.73845e-5  │
│ 5        │ "halcy0n"     │ 30.8  │ 187.96 │ 90.9091 │ 0.00615578  │
│ 6        │ "mbp"         │ 24.21 │ 182.88 │ 113.182 │ 0.0232976   │
│ 7        │ "sulaiman"    │ 33.15 │ 198.12 │ 100.0   │ 0.00730561  │
│ 8        │ "crackmonkey" │ 27.08 │ 185.42 │ 109.091 │ 0.00691493  │
│ 9        │ "ajv"         │ 32.84 │ 180.34 │ 90.9091 │ 0.00599549  │
│ 10       │ "lukeh"       │ 30.99 │ 185.42 │ 81.8182 │ 0.0074532   │
│ 11       │ "AndreyGolub" │ 29.84 │ 193.04 │ 86.3636 │ 0.0         │
⋮
│ 1604     │ "jwoolley"    │ 40.66 │ 180.34 │ 77.2727 │ 1.26063e-5  │
│ 1605     │ "gozer"       │ 26.75 │ 185.42 │ 104.545 │ 0.0         │
│ 1606     │ "rederpj"     │ 24.76 │ 185.42 │ 103.182 │ 0.0         │
│ 1607     │ "elsharkco"   │ 24.69 │ 182.88 │ 95.9091 │ 0.0         │
│ 1608     │ "netgod"      │ 26.59 │ 185.42 │ 95.4545 │ 0.000582392 │
│ 1609     │ "hadess"      │ 28.48 │ 175.26 │ 95.4545 │ 0.00283106  │
│ 1610     │ "largo"       │ 33.57 │ 185.42 │ 88.6364 │ 0.000318805 │
│ 1611     │ "kazen"       │ 22.52 │ 175.26 │ 84.0909 │ 2.99768e-5  │
│ 1612     │ "bluets"      │ 31.63 │ 180.34 │ 102.273 │ 0.0         │
│ 1613     │ "secabeen"    │ 28.56 │ 193.04 │ 90.9091 │ 0.0         │
│ 1614     │ "nikolay"     │ 23.29 │ 180.34 │ 100.0   │ 0.0         │
│ 1615     │ "jbert"       │ 31.84 │ 193.04 │ 85.9091 │ 0.0         │

In [33]:
# Show the new edge descriptor
EdgeDescriptor(sg)


Out[33]:
│ Index │ Source     │ Target        │ Trust    │ Dist    │ Shortest_Dists │
├───────┼────────────┼───────────────┼──────────┼─────────┼────────────────┤
│ 1     │ "lkcl"     │ "chalst"      │ 0.753731 │ 1.32673 │ 1.32673        │
│ 2     │ "lkcl"     │ "jrf"         │ 0.837243 │ 1.1944  │ 1.1944         │
│ 3     │ "lkcl"     │ "Astinus"     │ 0.620516 │ 1.61156 │ 1.61156        │
│ 4     │ "lkcl"     │ "halcy0n"     │ 0.704766 │ 1.41891 │ 1.41891        │
│ 5     │ "lkcl"     │ "mbp"         │ 0.879317 │ 1.13725 │ 1.13725        │
│ 6     │ "lkcl"     │ "sulaiman"    │ 0.352907 │ 2.83361 │ 2.33345        │
│ 7     │ "lkcl"     │ "crackmonkey" │ 0.223243 │ 4.47942 │ 3.25504        │
│ 8     │ "lkcl"     │ "ajv"         │ 0.427735 │ 2.3379  │ 2.3379         │
│ 9     │ "lkcl"     │ "AndreyGolub" │ 0.896434 │ 1.11553 │ 1.11553        │
│ 10    │ "lkcl"     │ "fxn"         │ 0.187012 │ 5.34724 │ 2.21906        │
│ 11    │ "lkcl"     │ "splork"      │ 0.103399 │ 9.67129 │ 2.17231        │
⋮
│ 22097 │ "largo"    │ "hadess"      │ 0.953549 │ 1.04871 │ 1.04871        │
│ 22098 │ "largo"    │ "largo"       │ 0.849266 │ 1.17749 │ 0.0            │
│ 22099 │ "kazen"    │ "teknix"      │ 0.673166 │ 1.48552 │ 1.48552        │
│ 22100 │ "kazen"    │ "kroah"       │ 0.658092 │ 1.51954 │ 1.51954        │
│ 22101 │ "kazen"    │ "kazen"       │ 0.672931 │ 1.48604 │ 0.0            │
│ 22102 │ "bluets"   │ "teknix"      │ 0.179145 │ 5.58207 │ 5.58207        │
│ 22103 │ "bluets"   │ "Stevey"      │ 0.307009 │ 3.25723 │ 3.25723        │
│ 22104 │ "bluets"   │ "bluets"      │ 0.473882 │ 2.11023 │ 0.0            │
│ 22105 │ "secabeen" │ "teknix"      │ 0.607547 │ 1.64596 │ 1.64596        │
│ 22106 │ "nikolay"  │ "lkcl"        │ 0.335673 │ 2.97909 │ 2.97909        │
│ 22107 │ "nikolay"  │ "chalst"      │ 0.600938 │ 1.66407 │ 1.66407        │
│ 22108 │ "jbert"    │ "jrf"         │ 0.577956 │ 1.73024 │ 1.73024        │