In [2]:
import graphistry
#graphistry.register(key='MY_API_KEY', server='labs.graphistry.com')
In [3]:
import pandas as pd
df = pd.read_csv('./data/honeypot.csv')
df.sample(3)
Out[3]:
Demo graph schema:
attackerIP -> victimIP edges
In [4]:
g = graphistry.edges(df).bind(source='attackerIP', destination='victimIP')
In [5]:
g.plot()
Out[5]:
A hypergraph is a convenient transformation for quickly exploring correlations across all of a table's values.
A hypergraph links values that occur in the same table row. By default, the hypergraph plot does not connect values directly to one another; instead, each value connects to a node representing its row.
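To make the row-node indirection concrete, here is a minimal pure-Python sketch of the idea. It is not graphistry's implementation: the sample rows, the `row::<index>` ID used for row nodes, and the helper name `row_hypergraph_edges` are all assumptions made for illustration.

```python
def row_hypergraph_edges(rows, columns):
    """Link each row to the values it contains via a per-row node."""
    edges = []
    for i, row in enumerate(rows):
        row_node = f"row::{i}"  # hypothetical ID for the node representing row i
        for col in columns:
            value = row.get(col)
            if value is None:
                continue  # a missing value produces no edge
            # Value nodes are named column::value, so equal values share a node
            edges.append((row_node, f"{col}::{value}"))
    return edges


# Hypothetical sample data, not taken from honeypot.csv
sample_rows = [
    {"attackerIP": "1.2.3.4", "victimIP": "10.0.0.1", "victimPort": 445},
    {"attackerIP": "1.2.3.4", "victimIP": "10.0.0.2", "victimPort": 445},
]
edges = row_hypergraph_edges(sample_rows, ["attackerIP", "victimIP", "victimPort"])
value_nodes = {dst for _, dst in edges}
```

Both rows point at the same `attackerIP::1.2.3.4` and `victimPort::445` nodes, which is how shared values become visible as hubs in the plot.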
Demo graph schema:
To allow nodes from the attackerIP and victimIP columns to merge together when they have the same value, instead of generating distinct nodes such as attackerIP::127.0.0.1 and victimIP::127.0.0.1, we combine them into one category, ip. The result is one node ip::127.0.0.1.
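The naming rule described above can be sketched in a few lines of plain Python. This only illustrates the `category::value` convention; the helper `node_id` is hypothetical and is not part of graphistry.

```python
def node_id(column, value, categories):
    """Return 'category::value', falling back to the column name as the category."""
    for category, cols in categories.items():
        if column in cols:
            return f"{category}::{value}"
    return f"{column}::{value}"


categories = {"ip": ["attackerIP", "victimIP"]}

# With the shared category, both columns map to the same node ID
merged_attacker = node_id("attackerIP", "127.0.0.1", categories)
merged_victim = node_id("victimIP", "127.0.0.1", categories)

# Without it, the same address yields two distinct node IDs
separate_attacker = node_id("attackerIP", "127.0.0.1", {})
separate_victim = node_id("victimIP", "127.0.0.1", {})
```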
In [6]:
hg1 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    opts={
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']  # merge nodes across these columns
        }
    })
hg1_g = hg1['graph']
hg1_g.plot()
Out[6]:
For more advanced hypergraph control, set direct=True to skip the row node and link values directly to one another. The optional EDGES setting then controls which column pairs generate edges.
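As a rough sketch of what direct linking means, the plain-Python function below emits value-to-value edges for each row according to a column-to-columns mapping. It illustrates the concept only: `direct_edges`, the sample row, and the `example-vuln` value are hypothetical, and category merging is left out to keep the example short.

```python
def direct_edges(rows, edge_spec):
    """Emit value-to-value edges per row, with no intermediate row node."""
    edges = []
    for row in rows:
        for src_col, dst_cols in edge_spec.items():
            for dst_col in dst_cols:
                src, dst = row.get(src_col), row.get(dst_col)
                if src is None or dst is None:
                    continue  # skip pairs with a missing endpoint
                edges.append((f"{src_col}::{src}", f"{dst_col}::{dst}"))
    return edges


# Mirrors the shape of an edge specification: source column -> destination columns
edge_spec = {
    "attackerIP": ["victimIP", "victimPort", "vulnName"],
    "victimPort": ["victimIP"],
    "vulnName": ["victimIP"],
}
# Hypothetical sample row, not taken from honeypot.csv
sample_rows = [
    {"attackerIP": "1.2.3.4", "victimIP": "10.0.0.1",
     "victimPort": 445, "vulnName": "example-vuln"},
]
edges = direct_edges(sample_rows, edge_spec)
```

One row with four values produces five edges here, compared with the twelve ordered pairs an all-to-all specification would allow.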
Demo graph schema:
In [7]:
hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        'EDGES': {  # optional, defaults to creating all-to-all
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']
        },
        'CATEGORIES': {
            'ip': ['attackerIP', 'victimIP']  # merge nodes across these columns
        }
    })
hg2_g = hg2['graph']
hg2_g.plot()
Out[7]:
By default, you do not need to explicitly create a table of nodes. However, if you do provide one, you can then drive visual styles based on node attributes.
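The shape of such a node table can be sketched without any libraries. The example below derives one record per IP from a list of edge rows, sums attack counts per attacker, and gives victims the midpoint of the attackers' count range so that every node has a size value. The function `build_nodes` and the sample rows are hypothetical; the column names follow the ones used in this notebook.

```python
def build_nodes(edge_rows):
    """Derive attacker and victim node records from edge rows with a 'count' field."""
    attacks = {}
    for row in edge_rows:
        ip = row["attackerIP"]
        attacks[ip] = attacks.get(ip, 0) + row["count"]

    nodes = [
        {"node_id": ip, "type": "attacker", "count": total}
        for ip, total in attacks.items()
    ]

    # Victims have no attack count, so use the midpoint of the observed range
    midpoint = (max(attacks.values()) + min(attacks.values())) / 2.0
    victims = sorted({row["victimIP"] for row in edge_rows})
    nodes += [{"node_id": ip, "type": "victim", "count": midpoint} for ip in victims]
    return nodes


# Hypothetical sample data, not taken from honeypot.csv
edge_rows = [
    {"attackerIP": "1.1.1.1", "victimIP": "10.0.0.1", "count": 3},
    {"attackerIP": "1.1.1.1", "victimIP": "10.0.0.2", "count": 5},
    {"attackerIP": "2.2.2.2", "victimIP": "10.0.0.1", "count": 2},
]
nodes = build_nodes(edge_rows)
```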
Demo schema:
In [12]:
# 1. Create the node table, tagging each node as `victim` or `attacker`
targets_df = df[['victimIP']].drop_duplicates().rename(columns={'victimIP': 'node_id'})\
    .assign(type='victim')
# Sum attack counts per attacker; the nested-dict form of agg() was removed in pandas 1.0
attackers_df = df.groupby('attackerIP', as_index=False).agg({'count': 'sum'})
attackers_df = attackers_df.rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
nodes_df = pd.concat([targets_df, attackers_df], ignore_index=True)
nodes_df.sample(3)
Out[12]:
In [9]:
# 2. Plot nodes, and color based on type `attacker`
# Optional: derive the style columns before binding, so the plotter receives the final table
nodes_df['my_color'] = nodes_df['type'].apply(lambda v: 0 if v == 'attacker' else 2)
# Victims have no attack count; fill with the midpoint of the observed range
nodes_df = nodes_df.fillna(value={'count': (nodes_df['count'].max() + nodes_df['count'].min()) / 2.0})
g2 = g.nodes(nodes_df).bind(node='node_id', point_size='count', point_color='my_color')
g2 = g2.settings(url_params={'workbook': 'my_analysis_wb_1'})
g2.plot()
Out[9]:
In [10]:
nodes = hg2_g._nodes
types = list(nodes['type'].unique())
nodes_with_colors = nodes.assign(color=nodes.type.apply(lambda t: types.index(t)))
nodes_with_colors.sample(3)
Out[10]:
In [11]:
hg2_g\
    .nodes(nodes_with_colors).bind(point_color='color')\
    .settings(url_params={'workbook': 'my_analysis_wb_2'})\
    .plot()
Out[11]: