Sparse Datasets

  • This notebook is used for benchmarking and debugging sparse datasets

Import the necessary libraries


In [ ]:
import random
import graphistry as g
import pandas as pd

Check the version of the Graphistry module


In [ ]:
g.__version__

Set your API key and Graphistry Server Location

  • To use our public server at labs.graphistry.com, you must have a valid API key

In [ ]:
API_KEY = 'Go to www.graphistry.com/api-request to get your API key'

In [2]:
g.register(api=2, key=API_KEY)

1000 sparse columns with 8K edges, 800K elements

In many datasets, there may be ~1000 possible attributes, though each node/edge likely has only 5-100 of them.

Attribute values are selected from 100 different numbers (0 through 99). pandas stores them as floats because the missing cells are filled with NaN.

The time-to-render for an uploaded dataset should be within 20s.

Opening the edges panel and seeing the results should also take no more than 20s.


In [ ]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 2, 'c': 3}, {'c': 4, 'd': 5}])
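The toy frame above shows the mechanism behind every dataset in this notebook: when rows are dicts with different keys, pandas uses the union of all keys as columns and fills the gaps with NaN. A minimal sketch of how that sparsity could be measured follows. The names toy, populated, total, and density are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same three rows as the cell above; each row defines only two of the four keys
toy = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 2, 'c': 3}, {'c': 4, 'd': 5}])

# Count the cells that actually hold a value (everything else is NaN)
populated = int(toy.notna().sum().sum())
total = toy.shape[0] * toy.shape[1]
density = populated / total

print(toy.shape, populated, total, density)
```

Here half of the cells are populated. The benchmark datasets below are far sparser.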

In [ ]:
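# Build a ring of 8000 edges: node x points to node (x + 1) mod 8000
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    # Give each edge 100 of the 1000 possible 'fld' columns, offset by the edge index
    for fld in range(0, 100):
        edge['fld' + str((i + fld) % 1000)] = fld
# Keys missing from an edge become NaN in the DataFrame
edges = pd.DataFrame(edges)
edges[:3]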

In [ ]:
g.edges(edges).bind(source='src', destination='dst').plot()
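The "800K elements" figure in the heading follows from 8000 edges times 100 attributes per edge, spread over 1000 attribute columns. The sketch below rebuilds the same frame and checks that arithmetic. The helper build_sparse_edges and its parameters are hypothetical names introduced here for illustration, not part of Graphistry or the original notebook.

```python
import pandas as pd

def build_sparse_edges(n_edges=8000, n_cols=1000, per_edge=100):
    """Hypothetical helper mirroring the cell above: a ring of n_edges edges,
    each carrying per_edge of the n_cols possible 'fld' attributes."""
    rows = [{'src': x, 'dst': (x + 1) % n_edges} for x in range(n_edges)]
    for i, row in enumerate(rows):
        for fld in range(per_edge):
            row['fld' + str((i + fld) % n_cols)] = fld
    return pd.DataFrame(rows)

edges = build_sparse_edges()

# Look only at the attribute columns; src and dst are always populated
attrs = edges.drop(columns=['src', 'dst'])
populated = int(attrs.notna().sum().sum())
density = populated / attrs.size

print(attrs.shape, populated, density)
```

Under these assumptions the attribute block has 8000 x 1000 cells, of which 800,000 (10%) hold a value.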

1000 sparse (random float) columns with 8K edges, 800K elements

There are ~1000 possible attributes, though each node/edge likely only has 5-100.

Attribute values are random floats generated with random.random().


In [ ]:
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        # Every populated cell gets its own random float in [0, 1)
        edge['fld' + str((i + fld) % 1000)] = random.random()
edges = pd.DataFrame(edges)
edges[:3]

In [ ]:
g.edges(edges).bind(source='src', destination='dst').plot()
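Because the values above come from random.random(), each run uploads different data, which can make benchmark runs harder to compare. One possible approach, sketched below, is to draw the values from a seeded random.Random instance. This is an assumption added for illustration and not something the original notebook does. The helper build_random_edges is a hypothetical name, and the default of 200 edges is chosen only to keep the sketch quick.

```python
import random
import pandas as pd

def build_random_edges(seed, n_edges=200, n_cols=1000, per_edge=100):
    """Hypothetical helper: same layout as the cell above, but with values
    drawn from a generator seeded with `seed` so the frame is repeatable."""
    rng = random.Random(seed)
    rows = [{'src': x, 'dst': (x + 1) % n_edges} for x in range(n_edges)]
    for i, row in enumerate(rows):
        for fld in range(per_edge):
            row['fld' + str((i + fld) % n_cols)] = rng.random()
    return pd.DataFrame(rows)

first = build_random_edges(seed=42)
second = build_random_edges(seed=42)
other = build_random_edges(seed=7)

# DataFrame.equals treats NaNs in the same position as equal
same = first.equals(second)
differs = not first.equals(other)
print(same, differs)
```

With the same seed the two frames are identical, so any difference in upload or render time comes from the system rather than from the data.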

1000 sparse (string) columns with 8K edges, 800K elements

There are ~1000 possible attributes, though each node/edge likely only has 5-100.

Attributes are selected from a set of 100 different string values.


In [ ]:
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        # Values come from a fixed set of 100 strings: 'String: 0' through 'String: 99'
        edge['fld' + str((i + fld) % 1000)] = 'String: ' + str(fld)
edges = pd.DataFrame(edges)
edges[:3]

In [ ]:
g.edges(edges).bind(source='src', destination='dst').plot()
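The difference between this dataset and the random-string one that follows is value cardinality: here every populated cell holds one of only 100 distinct strings. A small sketch of how that could be checked is shown below. It uses 200 edges instead of 8000 to stay quick, and the names rows, values, and distinct are illustrative.

```python
import pandas as pd

n_edges, n_cols, per_edge = 200, 1000, 100

# Same construction as the cell above, on a smaller ring
rows = [{'src': x, 'dst': (x + 1) % n_edges} for x in range(n_edges)]
for i, row in enumerate(rows):
    for fld in range(per_edge):
        row['fld' + str((i + fld) % n_cols)] = 'String: ' + str(fld)
edges = pd.DataFrame(rows)

# Flatten all attribute cells and count distinct values; nunique() skips NaN
values = edges.drop(columns=['src', 'dst']).to_numpy().ravel()
distinct = pd.Series(values).nunique()
print(distinct)
```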

1000 sparse (random string) columns with 8K edges, 800K elements

There are ~1000 possible attributes, though each node/edge likely only has 5-100.

Attributes are randomly generated strings.


In [ ]:
edges = [{'src': x, 'dst': (x + 1) % 8000} for x in range(0, 8000)]
for i, edge in enumerate(edges):
    for fld in range(0, 100):
        # Every populated cell gets its own randomly generated string
        edge['fld' + str((i + fld) % 1000)] = 'String: ' + str(random.random())
edges = pd.DataFrame(edges)
edges[:3]

In [ ]:
g.edges(edges).bind(source='src', destination='dst').plot()