SQL Example

Get & plot data from the first table in a SQL DB

  • Uses SQL ODBC drivers prepacked with Graphistry
  • Default: visualizes the schema migration table used within Graphistry
  • Shows several viz modes + a convenience function for sql->interactive viz
  • Try: Modify the indicated lines to change to visualize any other table

Further docs

Setup

Graphistry


In [ ]:
import graphistry

#pip install graphistry -q
#graphistry.register(protocol='http', server='my.server.com', key='MY_API_KEY')

SQL connection string

  • Modify with your own db connection strig
  • For heavier use and sharing, see sample for hiding creds from the notebook while still reusing them across sessions

In [ ]:
user = "graphistry"
pwd = "password"
server = "postgres:5432"

##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in
#import json
#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:
#    cfg = json.load(json_file)
#    user = cfg['user']
#    pwd = cfg['pwd']
#    server = cfg['server']
#
## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:
#import json
#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:
#    json.dump({
#        "user": "graphistry",
#        "pwd": "password",
#        "server": "postgres:5432"
#    }, outfile)
## Delete ^^^ after use

db_string = "postgres://" + user + ":" + pwd + "@" + server
### Take care not to save a print of the result

In [ ]:
# OPTIONAL: Install ODBC drivers in other environments:
# ! apt-get update
# ! apt-get install -y g++ unixodbc unixodbc-dev
# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5

Connect to DB


In [ ]:
import pandas as pd
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String
from sqlalchemy.orm import sessionmaker

In [ ]:
engine = create_engine(db_string)
Session = sessionmaker(bind=engine)
session = Session()

Inspect available tables


In [ ]:
table_names = engine.table_names()
", ".join(table_names)

Optional: Modify to pick your own table!


In [ ]:
if 'django_migrations' in table_names:
    table = 'django_migrations'
else:
    table = table_names[0]

Initialize viz: Get data


In [ ]:
result = engine.execute("SELECT * FROM \"" + table + "\" LIMIT 1000")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print("table", table, '# rows', len(df))
df.sample(min(3, len(df)))

Plot

Several variants:

  1. Treat each row & cell value as a node, and connect row<>cell values
  2. Treat each cell value as a node, and connect all cell values together when they occur on the same row
  3. Treat each cell value as an edge, and specify which columns to to connect values together on
  4. Use explict node/edge tables

1. Treat each row & cell value as a node, and connect row<>cell values


In [ ]:
graphistry.hypergraph(df)['graph'].plot()

2. Treat each cell value as a node, and connect all cell values together when they occur on the same row


In [ ]:
graphistry.hypergraph(df, direct=True)['graph'].plot()

3. Treat each cell value as an edge, and specify which columns to to connect values together on


In [ ]:
graphistry.hypergraph(df, direct=True,
    opts={
        'EDGES': {
            'id': ['name'],
            'applied': ['name'],
            'name': ['app']
        }
    })['graph'].plot()

4. Use explict node/edge tables


In [ ]:
g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))
g.plot()

In [ ]:
# Add node bindings..
nodes_df = pd.concat([
    df[['name', 'id', 'applied']],
    df[['app']].drop_duplicates().assign(\
          id = df[['app']].drop_duplicates()['app'], \
        name = df[['app']].drop_duplicates()['app'])
], ignore_index=True, sort=False)

g = g.bind(node='id', point_title='name').nodes(nodes_df)

g.plot()

Convenience function


In [ ]:
def explore(sql, *args, **kvargs):
    result = engine.execute(sql)
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
    print('# rows', len(df))
    g = graphistry.hypergraph(df, *args, **kvargs)['graph']
    return g

Simple use


In [ ]:
explore("SELECT * FROM django_migrations LIMIT 1000").plot()

Pass in graphistry.hypergraph() options


In [ ]:
explore("SELECT * FROM django_migrations LIMIT 1000", direct=True).plot()

Get data back


In [ ]:
explore("SELECT * FROM django_migrations LIMIT 1000")._nodes

In [ ]: