Test Datasets

  • This notebook generates a variety of large datasets that can be used for debugging and performance testing.

Import the necessary libraries


In [ ]:
import random
import graphistry as g
import pandas as pd
from random import choice
from string import ascii_letters
from IPython.display import IFrame

Check the version of the Graphistry module


In [ ]:
g.__version__

Set your API key and Graphistry Server Location

  • To use our public server at labs.graphistry.com, you must have a valid API key

In [ ]:
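API_KEY = 'Go to www.graphistry.com/api-request to get your API key'
SERVER = 'labs.graphistry.com'  # public server named above; change this if you use a different server

In [ ]:
g.register(api=1, key=API_KEY, server=SERVER)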

800K Edges, 1K Nodes (no attributes)


In [ ]:
# random.randint is inclusive at both ends, so (0, 999) yields exactly 1,000 distinct node ids
edges = pd.DataFrame({'src': [random.randint(0, 999) for _ in range(800000)],
                      'dest': [random.randint(0, 999) for _ in range(800000)]})
edges[:3]

In [ ]:
g.edges(edges).bind(source='src', destination='dest').plot()
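
When these datasets are used for debugging, it can help if every run produces the same graph. The following is a minimal sketch of one way to do that with the standard library only; the helper name random_edge_list and its seed parameter are illustrative assumptions, not part of this notebook or of Graphistry.

```python
import random

def random_edge_list(n_edges, n_nodes, seed=0):
    """Return n_edges (src, dest) pairs with node ids in [0, n_nodes).

    Hypothetical helper: a private Random instance is seeded so the same
    arguments always give the same edges, without touching the global
    random state used elsewhere.
    """
    rng = random.Random(seed)
    return [(rng.randrange(n_nodes), rng.randrange(n_nodes))
            for _ in range(n_edges)]

# Small sample; the same call with the same seed reproduces it exactly
sample = random_edge_list(n_edges=1000, n_nodes=50, seed=42)
```

The resulting list can be turned into the same shape as the edges table with pd.DataFrame(sample, columns=['src', 'dest']).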

800K Edges, 1K Nodes (5 integer node and edge attributes)


In [ ]:
edges2 = edges.copy()  # copy, so the attribute columns below are not also added to edges
nodes = pd.DataFrame({'name': list(range(1000))})

In [ ]:
for i in range(5):
    edges2['intFld' + str(i)] = edges2.src.map(lambda x: random.randint(0, 100000))
    
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))

In [ ]:
g.edges(edges2).nodes(nodes).bind(source='src', destination='dest', node='name').plot()
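
Once a separate node table is bound, a quick sanity check is to confirm that every id used by an edge also appears in the node table. The sketch below assumes plain Python iterables; missing_node_ids is a hypothetical helper written for illustration.

```python
def missing_node_ids(edge_pairs, node_ids):
    """Return the set of ids referenced by edges but absent from node_ids."""
    referenced = {node for pair in edge_pairs for node in pair}
    return referenced - set(node_ids)

# Toy example: node 3 is used by an edge but is not in the node list
pairs = [(0, 1), (1, 2), (2, 3)]
missing = missing_node_ids(pairs, range(3))
```

With the tables above, one possible call is missing_node_ids(zip(edges2.src, edges2.dest), nodes.name); an empty result means the node table covers every edge endpoint.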

50K Edges, 100 Nodes (100 integer edge attributes, 5 integer node attributes)


In [ ]:
# random.randint is inclusive at both ends, so (0, 99) yields exactly 100 distinct node ids
edges = pd.DataFrame({'src': [random.randint(0, 99) for _ in range(50000)],
                      'dest': [random.randint(0, 99) for _ in range(50000)]})

In [ ]:
nodes = pd.DataFrame({'name':[x for x in range(0, 100)]})

In [ ]:
for i in range(100):
    edges['intFld' + str(i)] = edges.src.map(lambda x: random.randint(0, 100000))
    
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))

In [ ]:
g.edges(edges).nodes(nodes).bind(source='src', destination='dest', node='name').plot()

10K Edges, 100 Nodes (100 random 32-byte string edge attributes, 5 integer node attributes)


In [ ]:
# random.randint is inclusive at both ends, so (0, 99) yields exactly 100 distinct node ids
edges = pd.DataFrame({'src': [random.randint(0, 99) for _ in range(10000)],
                      'dest': [random.randint(0, 99) for _ in range(10000)]})

In [ ]:
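for i in range(100):
    # Each value is a string of 32 random ASCII letters
    edges['strFld' + str(i)] = edges.src.map(lambda x: ''.join(choice(ascii_letters) for _ in range(32)))

# nodes still holds the 100-node table from the previous section; regenerate its integer attributes
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))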

In [ ]:
g.edges(edges).nodes(nodes).bind(source='src', destination='dest', node='name').plot()
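
The string attributes above are built by joining 32 randomly chosen ASCII letters. As a standalone sketch of that step, the hypothetical helper below uses random.choices from the standard library; the name random_string and the optional rng parameter are assumptions made for illustration.

```python
import random
from string import ascii_letters

def random_string(length=32, rng=random):
    """Return a string of `length` random ASCII letters.

    rng may be the random module itself or a random.Random instance;
    passing a seeded instance makes the output repeatable.
    """
    return ''.join(rng.choices(ascii_letters, k=length))

value = random_string()
repeatable = random_string(rng=random.Random(7))
```

Because every character is an ASCII letter, a 32-character string also occupies 32 bytes when encoded as UTF-8.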

Epinions - 75,877 Nodes, 508,836 Edges

Nodes represent users Edges represe


In [ ]:
url = 'http://' + SERVER + '/graph/graph.html?dataset=Epinions&scene=default&info=true&play=10000&mapper=splunk&splashAfter=1477695505'
IFrame(url, width=700, height=350)

Facebook - 4,039 Nodes, 88,234 Edges

  • Nodes - People. Color indicates community and size shows popularity.
  • Edges - Friendships

In [ ]:
url = 'http://' + SERVER + '/graph/graph.html?dataset=Facebook&scene=default&info=true&play=10000&mapper=opentsdb&splashAfter=1477695505'
IFrame(url, width=700, height=350)

Amazon - 262,111 Nodes, 1,234,877 Edges

  • Nodes - Products or Customers
  • Edges - A customer review

In [ ]:
# Far more than 800,000 nodes + edges, so it does not need to render within 20 seconds.
url = 'http://' + SERVER + '/graph/graph.html?dataset=Amazon&scene=default&info=true&play=10000&mapper=splunk&splashAfter=1477695505'
IFrame(url, width=700, height=350)
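
The three viewer URLs above differ only in the dataset name and the mapper value. A minimal sketch of building them from one place follows, assuming the query parameters shown in those cells; dataset_url is a hypothetical helper and the server name is the public one mentioned earlier in this notebook.

```python
from urllib.parse import urlencode

def dataset_url(server, dataset, mapper, splash_after=1477695505):
    """Build a viewer URL with the same query parameters used above."""
    params = {
        'dataset': dataset,
        'scene': 'default',
        'info': 'true',
        'play': 10000,
        'mapper': mapper,
        'splashAfter': splash_after,
    }
    # urlencode keeps the dictionary's insertion order
    return 'http://' + server + '/graph/graph.html?' + urlencode(params)

example_url = dataset_url('labs.graphistry.com', 'Epinions', 'splunk')
```

The returned string can be passed to IFrame(example_url, width=700, height=350) in the same way as the hand-written URLs.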