Sparse Datasets

This notebook benchmarks and debugs how Graphistry handles sparse datasets.

Import the necessary libraries



In [ ]:

    
import random
import graphistry as g
import pandas as pd

Check the version of the Graphistry module



In [ ]:

    
g.__version__

Set your API key and Graphistry Server Location

To use our public server at labs.graphistry.com, you must have a valid API key.



In [ ]:

    
API_KEY = 'Go to www.graphistry.com/api-request to get your API key'
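
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal sketch that reads it from an environment variable instead; the variable name GRAPHISTRY_API_KEY and the helper load_api_key are assumptions for illustration, not part of PyGraphistry.

```python
import os

def load_api_key(env_var="GRAPHISTRY_API_KEY"):
    """Return the key stored in env_var, or raise if it is unset or empty.

    GRAPHISTRY_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Graphistry API key")
    return key
```

Usage: set the variable in the shell that launches Jupyter, then use `API_KEY = load_api_key()` in place of the placeholder string before calling `g.register(api=2, key=API_KEY)`.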



In [2]:

    
g.register(api=2, key=API_KEY)

1000 sparse columns with 8K edges, 800K elements

In many datasets there may be ~1000 possible attributes, though each node or edge likely has only 5-100 of them.

Attribute values are drawn from 100 distinct numbers (the integers 0-99; pandas stores them as floats because the missing cells are NaN).

The time to render an uploaded dataset should be under 20s.

Opening the edges panel and seeing its results should also take under 20s.
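
To check these targets in a repeatable way, a cell's call can be timed from Python. A minimal sketch: `timed` and `RENDER_BUDGET_S` are hypothetical names, and the measurement covers only the time until the call returns (for `plot()`, presumably the upload), not the time the browser takes to render the graph or open the edges panel.

```python
import time

RENDER_BUDGET_S = 20.0  # target taken from the text above

def timed(fn, *args, budget_s=RENDER_BUDGET_S, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s

# Hypothetical usage on one of the plot cells below:
# _, secs, ok = timed(g.edges(edges).bind(source='src', destination='dst').plot)
```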



In [ ]:

    
# Records with different keys: pandas fills each missing key with NaN
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 2, 'c': 3}, {'c': 4, 'd': 5}])
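
The cell above is the sparse pattern in miniature. A short sketch that makes its effects explicit: every column ends up with a missing cell, so the integer values are upcast to float64, and only half the cells are filled.

```python
import pandas as pd

# Same three records as the cell above: each dict sets only some keys.
records = [{'a': 1, 'b': 2}, {'b': 2, 'c': 3}, {'c': 4, 'd': 5}]
df = pd.DataFrame(records)

# Keys a record lacks become NaN, so these integer columns are stored as float64.
print(df.dtypes.tolist())

# Density: filled cells divided by total cells (4 columns x 3 rows).
filled = int(df.notna().to_numpy().sum())
total = df.size
print(filled, total, filled / total)  # → 6 12 0.5
```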



In [ ]:

    
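# 8000 edges forming a ring: x -> (x + 1) % 8000
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(8000)]
for i, edge in enumerate(edges):
    # Each edge sets 100 of the 1000 columns fld0..fld999; the window slides by one per edge
    for fld in range(100):
        edge['fld' + str((i + fld) % 1000)] = fld
edges = pd.DataFrame(edges)
edges[:3]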
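
The loop above builds 8000 Python dicts one key at a time, which can be slow at this size. A vectorized sketch with NumPy (already a pandas dependency) that should produce the same table; `make_sparse_edges` and its parameters are hypothetical, and it assumes `per_edge <= n_cols` so that no edge writes the same column twice.

```python
import numpy as np
import pandas as pd

def make_sparse_edges(n_edges=8000, n_cols=1000, per_edge=100):
    """Hypothetical vectorized equivalent of the loop above.

    Edge i links i -> (i + 1) % n_edges and sets column fld{(i + k) % n_cols} = k
    for k in range(per_edge); every other cell is NaN. Assumes per_edge <= n_cols.
    """
    values = np.full((n_edges, n_cols), np.nan)
    rows = np.repeat(np.arange(n_edges), per_edge)  # each edge index, per_edge times
    ks = np.tile(np.arange(per_edge), n_edges)      # k = 0..per_edge-1 for every edge
    values[rows, (rows + ks) % n_cols] = ks         # same placement as the loop
    src = np.arange(n_edges)
    df = pd.DataFrame(values, columns=[f'fld{c}' for c in range(n_cols)])
    df.insert(0, 'dst', (src + 1) % n_edges)
    df.insert(0, 'src', src)
    return df

edges_fast = make_sparse_edges()
print(edges_fast.shape)  # (8000, 1002)
print(int(edges_fast.filter(like='fld').notna().to_numpy().sum()))  # 800000
```

Filling a dense NaN array and wrapping it once avoids growing 8000 dictionaries, at the cost of allocating the full 8000 x 1000 float64 array (about 64 MB).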



In [ ]:

    
g.edges(edges).bind(source='src', destination='dst').plot()

1000 sparse (random float) columns with 8K edges, 800K elements

As before, there are 1000 possible attributes, and each edge has 100 of them.

Attribute values are random floats in [0, 1).
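
The cell below calls `random.random()`, which uses the module-level generator, so each run produces a different dataset. If repeatable benchmark inputs are wanted, a seeded generator can be used instead. A minimal sketch; `make_random_float_edges` and its parameters are hypothetical.

```python
import random
import pandas as pd

def make_random_float_edges(seed, n_edges=8000, n_cols=1000, per_edge=100):
    """Same layout as the cell below, but with values drawn from a seeded generator."""
    rng = random.Random(seed)  # private generator; leaves the global random state alone
    edges = [{'src': x, 'dst': (x + 1) % n_edges} for x in range(n_edges)]
    for i, edge in enumerate(edges):
        for k in range(per_edge):
            edge['fld' + str((i + k) % n_cols)] = rng.random()  # float in [0.0, 1.0)
    return pd.DataFrame(edges)
```

Usage: `edges = make_random_float_edges(seed=0)` returns the same table on every run.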



In [ ]:

    
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        edge['fld' + str((i + fld) % 1000)] = random.random()
edges = pd.DataFrame(edges)
edges[:3]



In [ ]:

    
g.edges(edges).bind(source='src', destination='dst').plot()

1000 sparse (string) columns with 8K edges, 800K elements

As before, there are 1000 possible attributes, and each edge has 100 of them.

Attribute values are drawn from 100 distinct strings ('String: 0' through 'String: 99').
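
The string benchmarks differ mainly in how many distinct values they contain: 100 here, but nearly one per filled cell for the random-string benchmark. A minimal sketch for counting distinct non-null attribute values across the `fld` columns; `distinct_attribute_values` is a hypothetical helper name.

```python
import pandas as pd

def distinct_attribute_values(df, prefix='fld'):
    """Count distinct non-null values across all columns whose name starts with prefix."""
    cols = [c for c in df.columns if str(c).startswith(prefix)]
    flat = pd.Series(df[cols].to_numpy().ravel())
    return flat.nunique()  # nunique() ignores NaN by default
```

Usage: after the next cell runs, `distinct_attribute_values(edges)` should report 100; for the random-string benchmark it should be close to 800,000.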



In [ ]:

    
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        edge['fld' + str((i + fld) % 1000)] = 'String: ' + str(fld)
edges = pd.DataFrame(edges)
edges[:3]



In [ ]:

    
g.edges(edges).bind(source='src', destination='dst').plot()

1000 sparse (random string) columns with 8K edges, 800K elements

As before, there are 1000 possible attributes, and each edge has 100 of them.

Attribute values are randomly generated strings ('String: ' followed by a random float), so nearly every value is distinct.



In [ ]:

    
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        edge['fld' + str((i + fld) % 1000)] = 'String: ' + str(random.random())
edges = pd.DataFrame(edges)
edges[:3]



In [ ]:

    
g.edges(edges).bind(source='src', destination='dst').plot()