Tutorial: Data Analysis in Graphistry

Load data
Plot:
- Simple: input is a list of edges
- Arbitrary: input is a table (hypergraph transform)
Advanced bindings
Further docs



In [2]:

    
import graphistry
#graphistry.register(key='MY_API_KEY', server='labs.graphistry.com')

1. Load CSV

Graphistry works seamlessly with Pandas dataframes



In [3]:

    
import pandas as pd

df = pd.read_csv('./data/honeypot.csv')
df.sample(3)









    Out[3]:







  
    
      
      attackerIP
      victimIP
      victimPort
      vulnName
      count
      time(max)
      time(min)
    
  
  
    
      168
      59.91.217.236
      172.31.14.66
      445.0
      MS08067 (NetAPI)
      5
      1.416331e+09
      1.416330e+09
    
    
      16
      117.194.34.106
      172.31.14.66
      445.0
      MS08067 (NetAPI)
      9
      1.415973e+09
      1.415972e+09
    
    
      107
      195.189.111.210
      172.31.14.66
      445.0
      MS08067 (NetAPI)
      8
      1.416838e+09
      1.416836e+09

2. Plot

A. Simple graphs

Build up a set of bindings. Simple graphs are for edge lists, or an edge list + node list.
See UI Guide for in-tool activity

Demo graph schema:

Edges: Alerts linking attackerIP -> victimIP
Nodes: Synthesized from attackerIP -> victimIP edges
Default colors: Automatic based on inferred commmunity
Default node size: Number of edges



In [4]:

    
g = graphistry.edges(df).bind(source='attackerIP', destination='victimIP')



In [5]:

    
g.plot()









    Out[5]:

B. Hypergraphs -- Plot arbitrary tables

To quickly understand correlations across all your table's values, hypergraph is a convenient transformation.

A hypergraph will link values occurring in the sample table row to one another. By default, the hypergraph plot does not link values directly to one another, but indirects through a node representing the row.

Approach 1: Each row is a node, and links to each value in it

Demo graph schema:

Edges: row -> attckerIP, row -> victimIP, row -> victimPort, row -> volnName
Nodes: row, attackerIP, victimIP, victimPort, vulnName
Default colors: Automatic based on inferred commmunity
Default node size: Number of edges

To allow nodes from the attackerIP and victimIP columns to merge together when they have the same value, instead of generating distinct nodes such as attackerIP::127.0.0.1 and victimIP::127.0.0.1, we combine them into one category, ip. The result is one node ip::127.0.0.1.



In [6]:

    
hg1 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    opts={
        'CATAGORIES': {
            'ip': ['attackerIP', 'victimIP'] #merge nodes across these columns
        }
    })

hg1_g = hg1['graph']
hg1_g.plot()









    



('# links', 880)
('# events', 220)
('# attrib entities', 221)






    Out[6]:

Approach 2: Link values from entries

For more advanced hypergraph control, we can skip the row node, and control which edges are generated, by enabling direct.

Demo graph schema:

Edges:
- attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
- victimPort -> victimIP
- vulnName -> victimIP
Nodes: attackerIP, victimIP, victimPort, vulnName
Default colors: Automatic based on inferred commmunity
Default node size: Number of edges



In [7]:

    
hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        'EDGES': { ### OPTIONAL, DEFAULTS TO CREATING ALL-TO-ALL
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']         
        },
        'CATAGORIES': {
            'ip': ['attackerIP', 'victimIP'] #merge nodes across these columns
        }
    })

hg2_g = hg2['graph']
hg2_g.plot()









    



('# links', 1100)
('# events', 220)
('# attrib entities', 221)






    Out[7]:

3. Advanced bindings

By default, you do not need to explictly create a table of nodes. However, if you do provide one, you can then drive visual styles based on node attributes.

Demo schema:

Point size based on number of attacks
Point color based on attacker vs victim
- Color palette values: https://labs.graphistry.com/graphistry/docs/palette.html
Save dynamic workbook settings across sessions



In [12]:

    
# 1. Create nodes, tag type as `attacker`

targets_df = df[['victimIP']].drop_duplicates().rename(columns={'victimIP': 'node_id'})\
    .assign(type='victim')

attackers_df = df.groupby(['attackerIP']).agg({'count': {'attacks': 'sum'}}).reset_index()
attackers_df.columns = attackers_df.columns.get_level_values(0)
attackers_df = attackers_df.rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
attackers_df

nodes_df = pd.concat([targets_df, attackers_df], ignore_index=True)
nodes_df.sample(3)









    Out[12]:







  
    
      
      count
      node_id
      type
    
  
  
    
      32
      3.0
      124.123.70.99
      attacker
    
    
      177
      2.0
      85.192.166.151
      attacker
    
    
      2
      6.0
      1.235.32.141
      attacker



In [9]:

    
# 2. Plot nodes, and color based on type `attacker`

g2 = g.nodes(nodes_df).bind(node='node_id')

#optional
nodes_df['my_color'] = nodes_df['type'].apply(lambda v: 0 if v == 'attacker' else 2)
nodes_df = nodes_df.fillna(value={'count': (nodes_df['count'].max() + nodes_df['count'].min()) / 2.0 })
g2 = g2.bind(point_size = 'count', point_color='my_color')
g2 = g2.settings(url_params={'workbook': 'my_analysis_wb_1'})

g2.plot()









    Out[9]:

Advanced bindings work with hypergraphs too



In [10]:

    
nodes = hg2_g._nodes

types = list(nodes['type'].unique())
nodes_with_colors = nodes.assign(color=nodes.type.apply(lambda t: types.index(t)))
nodes_with_colors.sample(3)









    Out[10]:







  
    
      
      attackerIP
      nodeID
      nodeTitle
      type
      victimIP
      victimPort
      vulnName
      category
      color
    
  
  
    
      112
      220.172.133.215
      attackerIP::220.172.133.215
      220.172.133.215
      attackerIP
      NaN
      NaN
      NaN
      attackerIP
      0
    
    
      57
      179.25.208.154
      attackerIP::179.25.208.154
      179.25.208.154
      attackerIP
      NaN
      NaN
      NaN
      attackerIP
      0
    
    
      121
      31.135.61.170
      attackerIP::31.135.61.170
      31.135.61.170
      attackerIP
      NaN
      NaN
      NaN
      attackerIP
      0



In [11]:

    
hg2_g\
  .nodes(nodes_with_colors).bind(point_color='color')\
  .settings(url_params={'workbook': 'my_analysis_wb_2'})\
  .plot()









    Out[11]:

Further docs:



In [ ]:



In [ ]:

	attackerIP	victimIP	victimPort	vulnName	count	time(max)	time(min)
168	59.91.217.236	172.31.14.66	445.0	MS08067 (NetAPI)	5	1.416331e+09	1.416330e+09
16	117.194.34.106	172.31.14.66	445.0	MS08067 (NetAPI)	9	1.415973e+09	1.415972e+09
107	195.189.111.210	172.31.14.66	445.0	MS08067 (NetAPI)	8	1.416838e+09	1.416836e+09

	count	node_id	type
32	3.0	124.123.70.99	attacker
177	2.0	85.192.166.151	attacker
2	6.0	1.235.32.141	attacker

	attackerIP	nodeID	nodeTitle	type	victimIP	victimPort	vulnName	category
112	220.172.133.215	attackerIP::220.172.133.215	220.172.133.215	attackerIP	NaN	NaN	NaN	attackerIP
57	179.25.208.154	attackerIP::179.25.208.154	179.25.208.154	attackerIP	NaN	NaN	NaN	attackerIP
121	31.135.61.170	attackerIP::31.135.61.170	31.135.61.170	attackerIP	NaN	NaN	NaN	attackerIP