Test Datasets

This notebook generates a variety of large datasets that can be used for debugging and performance testing.

Import the necessary libraries



In [ ]:

    
import random
import graphistry as g
import pandas as pd
from random import choice
from string import ascii_letters
from IPython.display import IFrame

Check the version of the Graphistry module



In [ ]:

    
g.__version__

Set your API key and Graphistry Server Location

To use our public server at labs.graphistry.com, you must have a valid API key



In [ ]:

    
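API_KEY = 'Go to www.graphistry.com/api-request to get your API key'
# Hostname of the Graphistry server; the dataset URLs later in this notebook use it.
# Assumed here to be the public server named above; change it for a private deployment.
SERVER = 'labs.graphistry.com'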



In [ ]:

    
g.register(api=1, key=API_KEY)

800K Edges, 1K Nodes (no attributes)



In [ ]:

    
# random.randint is inclusive on both ends, so 0..999 yields exactly 1,000 node ids
edges = pd.DataFrame({'src': [random.randint(0, 999) for _ in range(800000)],
                      'dest': [random.randint(0, 999) for _ in range(800000)]})
edges[:3]



In [ ]:

    
g.edges(edges).bind(source='src', destination='dest').plot()
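
Because the edge list above is drawn from the global `random` state, every run produces a different graph, which can make a performance regression hard to reproduce. A minimal sketch of a seeded variant follows. `random_edge_columns` and its parameters are hypothetical names, not part of Graphistry or pandas, and the sketch uses only the standard library.

```python
import random

def random_edge_columns(n_edges, n_nodes, seed=0):
    """Return {'src': [...], 'dest': [...]} with node ids in 0..n_nodes-1.

    Hypothetical helper: a private random.Random(seed) keeps the output
    repeatable without touching the global random state.
    """
    rng = random.Random(seed)
    return {
        'src': [rng.randrange(n_nodes) for _ in range(n_edges)],
        'dest': [rng.randrange(n_nodes) for _ in range(n_edges)],
    }

# Small sizes here for illustration; the cell above uses 800,000 edges and 1,000 nodes.
cols = random_edge_columns(n_edges=1000, n_nodes=100, seed=42)
```

The resulting dictionary can be passed to `pd.DataFrame(cols)` and bound with `source='src'` and `destination='dest'` as in the cell above.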

800K Edges, 1K Nodes (5 integer node and edge attributes)



In [ ]:

    
# Copy so that adding attribute columns does not also modify `edges`
edges2 = edges.copy()
nodes = pd.DataFrame({'name': list(range(1000))})



In [ ]:

    
for i in range(5):
    edges2['intFld' + str(i)] = edges2.src.map(lambda x: random.randint(0, 100000))
    
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))



In [ ]:

    
g.edges(edges2).nodes(nodes).bind(source='src', destination='dest', node='name').plot()
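
The loops above add one random integer column per iteration. As a rough sketch of the same idea in one step, the attribute columns can be generated up front as a dictionary of lists. `random_int_columns` is a hypothetical helper, and the `intFld` prefix and the 0 to 100000 range are taken from the cells above.

```python
import random

def random_int_columns(n_rows, n_cols, prefix='intFld', high=100000, seed=0):
    """Build n_cols columns of n_rows random integers in 0..high (inclusive).

    Hypothetical helper; the seed makes the generated attributes repeatable.
    """
    rng = random.Random(seed)
    return {
        prefix + str(i): [rng.randint(0, high) for _ in range(n_rows)]
        for i in range(n_cols)
    }

# Five integer attributes for a 1,000-row node table
node_attrs = random_int_columns(n_rows=1000, n_cols=5)
```

Assuming `nodes` has exactly 1,000 rows, `nodes.assign(**node_attrs)` should attach all five columns at once.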

50K Edges, 100 Nodes (100 integer edge attributes, 5 integer node attributes)



In [ ]:

    
# random.randint is inclusive on both ends, so 0..99 yields exactly 100 node ids
edges = pd.DataFrame({'src': [random.randint(0, 99) for _ in range(50000)],
                      'dest': [random.randint(0, 99) for _ in range(50000)]})



In [ ]:

    
nodes = pd.DataFrame({'name':[x for x in range(0, 100)]})



In [ ]:

    
for i in range(100):
    edges['intFld' + str(i)] = edges.src.map(lambda x: random.randint(0, 100000))
    
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))



In [ ]:

    
g.edges(edges).nodes(nodes).bind(source='src', destination='dest', node='name').plot()

10K Edges, 100 Nodes (100 random 32-byte string edge attributes, 5 integer node attributes)



In [ ]:

    
# random.randint is inclusive on both ends, so 0..99 yields exactly 100 node ids
edges = pd.DataFrame({'src': [random.randint(0, 99) for _ in range(10000)],
                      'dest': [random.randint(0, 99) for _ in range(10000)]})



In [ ]:

    
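for i in range(100):
    # Each value is a random 32-character ASCII string (32 bytes)
    edges['strFld' + str(i)] = edges.src.map(lambda x: ''.join(choice(ascii_letters) for _ in range(32)))
# `nodes` is reused from the previous section; regenerate its 5 integer attributes
for i in range(5):
    nodes['intFld' + str(i)] = nodes.name.map(lambda x: random.randint(0, 100000))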



In [ ]:

    
g.edges(edges).nodes(nodes).bind(source='src', destination='dest', node='name').plot()
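
The string attributes above are built one character at a time with `choice`. A possible alternative, sketched here with the standard library only, draws all characters of a string in one call with `random.choices` (available from Python 3.6). `random_string` is a hypothetical helper name.

```python
import random
from string import ascii_letters

def random_string(length=32, rng=random):
    """Return `length` random ASCII letters; rng may be a seeded random.Random."""
    return ''.join(rng.choices(ascii_letters, k=length))

# A seeded generator makes the sample values repeatable
rng = random.Random(7)
values = [random_string(32, rng) for _ in range(5)]
```

Since every character is an ASCII letter, each 32-character string occupies 32 bytes when encoded as ASCII or UTF-8.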

Epinions - 75,877 nodes, 508,836 edges

Nodes represent users Edges represe



In [ ]:

    
url = 'http://' + SERVER + '/graph/graph.html?dataset=Epinions&scene=default&info=true&play=10000&mapper=splunk&splashAfter=1477695505'
IFrame(url, width=700, height=350)
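
The prebuilt dataset cells in this notebook share the same URL shape and differ only in the `dataset` and `mapper` values. A small sketch of building that URL with `urllib.parse.urlencode` follows. `dataset_url` is a hypothetical helper, the query parameters are copied from the cell above, and the hostname in the example is assumed to be the public server mentioned earlier.

```python
from urllib.parse import urlencode

def dataset_url(server, dataset, mapper):
    """Build the viewer URL for a prebuilt dataset (hypothetical helper).

    The parameter names and values mirror the hand-written URL in the
    notebook; only `dataset` and `mapper` vary between cells.
    """
    params = {
        'dataset': dataset,
        'scene': 'default',
        'info': 'true',
        'play': 10000,
        'mapper': mapper,
        'splashAfter': 1477695505,
    }
    return 'http://' + server + '/graph/graph.html?' + urlencode(params)

url = dataset_url('labs.graphistry.com', 'Epinions', 'splunk')
```

The returned string can be passed to `IFrame(url, width=700, height=350)` in the same way as the hand-built URL.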

Facebook - 4,039 Nodes, 88,234 Edges

Nodes - People. Color indicates community and size shows popularity.
Edges - Friendships



In [ ]:

    
url = 'http://' + SERVER + '/graph/graph.html?dataset=Facebook&scene=default&info=true&play=10000&mapper=opentsdb&splashAfter=1477695505'
IFrame(url, width=700, height=350)

Amazon - 262,111 Nodes, 1,234,877 Edges

Nodes - Products or Customers
Edges - A customer review



In [ ]:

    
# Much larger than 800,000 nodes + edges, so it does not need to render within 20 seconds.
url = 'http://' + SERVER + '/graph/graph.html?dataset=Amazon&scene=default&info=true&play=10000&mapper=splunk&splashAfter=1477695505'
IFrame(url, width=700, height=350)