Visualize CSV Mini-App

Jupyter: File -> Make a copy
Colab: File -> Save a copy in Drive
Run notebook cells by pressing shift-enter
Either edit and run the top cells one by one, or edit and run the self-contained version at the bottom



In [1]:

    
#!pip install graphistry -q



In [3]:

    
import pandas as pd
import graphistry
#graphistry.register(key='MY_KEY', server='labs.graphistry.com')

1. Upload csv

Use a file by uploading it or via URL.

Run help(pd.read_csv) for more options.

File Upload: Jupyter Notebooks

If the circle at the top right is not green, click Kernel -> Reconnect
Go to file directory (/tree) by clicking the Jupyter logo
Navigate to the directory page containing your notebook
Press the upload button on the top right

File Upload: Google Colab

Open the left sidebar by pressing the right arrow on the left
Go to the Files tab
Press UPLOAD
Make sure the file goes into /content

File Upload: URL

Set file_path in the cell below to the actual data URL; pd.read_csv accepts a URL in place of a local path
Run help(pd.read_csv) for more options
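
As a rough sketch of the URL option: pd.read_csv takes a URL in the same argument as a local path. The helper name load_csv and the example.com URL are placeholders for illustration, not part of this notebook. The demo reads from an in-memory string so it runs without a file or network access.

```python
import io

import pandas as pd


def load_csv(source, max_rows=None):
    """Read a CSV from a local path, a URL, or an open file-like object.

    `load_csv` is a hypothetical convenience wrapper; `nrows` caps how many
    data rows pandas parses (None means all rows).
    """
    return pd.read_csv(source, nrows=max_rows)


# Local file:
#   df = load_csv('./data/honeypot.csv')
# Remote file (placeholder URL, replace with the actual data URL):
#   df = load_csv('https://example.com/honeypot.csv')

# In-memory demo; the row is copied from the sample output in this notebook.
demo = io.StringIO(
    "attackerIP,victimIP,victimPort\n"
    "41.230.211.128,172.31.14.66,445.0\n"
)
demo_df = load_csv(demo)
print('# rows', len(demo_df))
```

Passing max_rows is optional; it can help when a remote file is large and only a preview is needed.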



In [4]:

    
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print('# rows', len(df))
df.sample(min(len(df), 3))









    



('# rows', 220)

    Out[4]:

     attackerIP       victimIP      victimPort  vulnName          count  time(max)     time(min)
145  41.230.211.128   172.31.14.66  445.0       MS08067 (NetAPI)  2      1.421730e+09  1.421729e+09
25   122.121.202.157  172.31.14.66  445.0       MS08067 (NetAPI)  8      1.423612e+09  1.423611e+09
75   182.68.160.230   172.31.14.66  445.0       MS08067 (NetAPI)  9      1.417438e+09  1.417436e+09

2. Optional: Clean up CSV



In [5]:

    
df = df.rename(columns={
#    'attackerIP': 'src_ip',
#    'victimIP': 'dest_ip'
})

df.sample(3)









    Out[5]:

     attackerIP      victimIP      victimPort  vulnName          count  time(max)     time(min)
70   182.161.224.84  172.31.14.66  139.0       MS08067 (NetAPI)  4      1.419954e+09  1.419952e+09
10   115.115.227.82  172.31.14.66  445.0       MS08067 (NetAPI)  2      1.413569e+09  1.413569e+09
152  46.130.76.13    172.31.14.66  445.0       MS08067 (NetAPI)  7      1.421093e+09  1.421092e+09

3. Configure: Visualize with 3 kinds of graphs

Set mode and the corresponding values:

Mode "A". Plot a graph from a table of (src,dst) edges

Mode "B". Plot a hypergraph: draw each row as a node and connect it to the entities in that row

Pick which cols to make nodes
If multiple cols share the same type (e.g., "src_ip", "dest_ip" are both "ip"), unify them

Mode "C". Plot multiple nodes and edges per row, with no node for the row itself

Pick how different column values point to other column values
If multiple cols share the same type (e.g., "src_ip", "dest_ip" are both "ip"), unify them
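
To make the three modes concrete, here is a conceptual sketch in plain Python of which links each mode draws from the same rows. It is not Graphistry's implementation: the function names, the 'event_N' row IDs, and the 'column::value' node naming are assumptions made for illustration. The two sample rows reuse values from the output shown in this notebook.

```python
def edges_mode_a(rows, src_col, dst_col):
    """Mode A: one edge per row, from the source column to the destination column."""
    return [(row[src_col], row[dst_col]) for row in rows]


def edges_mode_b(rows, node_cols):
    """Mode B: one event node per row, linked to each selected entity in that row."""
    links = []
    for i, row in enumerate(rows):
        event = 'event_%d' % i  # hypothetical ID scheme for the row node
        for col in node_cols:
            links.append((event, '%s::%s' % (col, row[col])))
    return links


def edges_mode_c(rows, edges):
    """Mode C: no event node; link entity values directly, column to column."""
    links = []
    for row in rows:
        for src_col, dst_cols in edges.items():
            for dst_col in dst_cols:
                links.append(('%s::%s' % (src_col, row[src_col]),
                              '%s::%s' % (dst_col, row[dst_col])))
    return links


rows = [
    {'attackerIP': '41.230.211.128', 'victimIP': '172.31.14.66',
     'vulnName': 'MS08067 (NetAPI)'},
    {'attackerIP': '125.64.35.68', 'victimIP': '172.31.14.66',
     'vulnName': 'MaxDB Vulnerability'},
]
a = edges_mode_a(rows, 'attackerIP', 'victimIP')
b = edges_mode_b(rows, ['attackerIP', 'victimIP', 'vulnName'])
c = edges_mode_c(rows, {'attackerIP': ['victimIP', 'vulnName'],
                        'vulnName': ['victimIP']})
print(len(a), len(b), len(c))  # 2 6 6
```

In this sketch the same IP appearing in two different columns would become two separate nodes, because the node name is prefixed with the column name. Unifying columns under one type, as described above, is what would merge them into a single node.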



In [6]:

    
#Pick 'A', 'B', or 'C'
mode = 'B' 
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'



### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}



### 'C' == mode
edges = {
      'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
      'victimIP': [ 'victimPort'],
      'vulnName': [ 'victimIP' ]
}
categories = { #optional
      'ip': ['attackerIP', 'victimIP']
       #, 'user': ['owner', 'seller'], ...
}
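
Because every name in node_cols, categories, and edges has to match a CSV column exactly, a quick check before plotting can catch misspellings. The helper below is a minimal sketch; missing_columns is a hypothetical name, not a Graphistry or pandas function. The column list is copied from the CSV used in this notebook.

```python
def missing_columns(available, node_cols=(), categories=None, edges=None):
    """Return the configured column names that are not in `available`, sorted."""
    wanted = set(node_cols)
    for cols in (categories or {}).values():
        wanted.update(cols)
    for src, dsts in (edges or {}).items():
        wanted.add(src)
        wanted.update(dsts)
    return sorted(wanted - set(available))


available = ['attackerIP', 'victimIP', 'victimPort', 'vulnName',
             'count', 'time(max)', 'time(min)']

ok = missing_columns(
    available,
    node_cols=['attackerIP', 'victimIP', 'vulnName'],
    categories={'ip': ['attackerIP', 'victimIP']},
    edges={'attackerIP': ['victimIP', 'victimPort', 'vulnName']},
)
# A misspelled column name is reported back.
bad = missing_columns(available, categories={'ip': ['attacker_IP', 'victimIP']})
print(ok)   # []
print(bad)  # ['attacker_IP']
```

In the notebook itself, the call would be missing_columns(df.columns, node_cols, categories, edges), and an empty list means the configuration matches the loaded data.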

4. Plot: Upload & render!

See UI guide: https://labs.graphistry.com/graphistry/ui.html



In [75]:

    
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
else:
    raise ValueError("mode must be 'A', 'B', or 'C'")
  
#hg
print(len(g._edges))

g.plot()









    



('# links', 1100)
('# events', 220)
('# attrib entities', 221)
1100






    Out[75]:

Alternative: Combined

The same workflow in two cells: one for data loading, and one for cleaning, configuring, and plotting.



In [59]:

    
#!pip install graphistry -q
import pandas as pd
import graphistry
#graphistry.register(key='MY_KEY', server='labs.graphistry.com')


##########
#1. Load
file_path = './data/honeypot.csv'
df = pd.read_csv(file_path)

print(df.columns)
print('rows:', len(df))
print(df.sample(min(len(df),3)))









    



Index([u'attackerIP', u'victimIP', u'victimPort', u'vulnName', u'count',
       u'time(max)', u'time(min)'],
      dtype='object')
('rows:', 220)
         attackerIP      victimIP  victimPort             vulnName  count  \
81  187.143.247.231  172.31.14.66       445.0      MS04011 (LSASS)      1   
47   151.252.204.92  172.31.14.66       139.0     MS08067 (NetAPI)      1   
41     125.64.35.68  172.31.14.66      9999.0  MaxDB Vulnerability      6   

       time(max)     time(min)  
81  1.420657e+09  1.420657e+09  
47  1.422929e+09  1.422929e+09  
41  1.420915e+09  1.417479e+09



In [79]:

    
##########
#2. Clean
#df = df.rename(columns={'attackerIP': 'src_ip', 'victimIP': 'dest_ip', 'victimPort': 'protocol'})

    
##########
#3. Config - Pick 'A', 'B', or 'C'
mode = 'C' 
max_rows = 1000


### 'A' == mode
my_src_col = 'attackerIP'
my_dest_col = 'victimIP'

### 'B' == mode
node_cols = ['attackerIP', 'victimIP', 'victimPort', 'vulnName']
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'],
}

### 'C' == mode
edges = {
    'attackerIP': [ 'victimIP', 'victimPort', 'vulnName'],
    'victimIP': [ 'victimPort' ],
    'vulnName': ['victimIP' ]
}
categories = { #optional
    'ip': ['attackerIP', 'victimIP']
    #, 'user': ['owner', 'seller'], ...
}

##########
#4. Plot
g = None
hg = None
num_rows = min(max_rows, len(df))
if mode == 'A':
    g = graphistry.edges(df.sample(num_rows)).bind(source=my_src_col, destination=my_dest_col)
elif mode == 'B':
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, opts={'CATEGORIES': categories})
    g = hg['graph']
elif mode == 'C':
    nodes = list(edges.keys())
    for dests in edges.values():
        for dest in dests:
            nodes.append(dest)
    node_cols = list(set(nodes))
    hg = graphistry.hypergraph(df.sample(num_rows), node_cols, direct=True, opts={'CATEGORIES': categories, 'EDGES': edges})
    g = hg['graph']
else:
    raise ValueError("mode must be 'A', 'B', or 'C'")
  

g.plot()









    



('# links', 1100)
('# events', 220)
('# attrib entities', 221)






    Out[79]:


