Splunk<>Graphistry

Graphistry brings modern visual analytics to event data in Splunk. The full platform is intended for enterprise teams, while this tutorial shares visibility techniques for researchers and hunters.

To use:

  • Read along, and start the prebuilt visualizations by clicking on them
  • Plug in your Graphistry API key and Splunk credentials to run it yourself

Further reading:

0. Configure


In [0]:
#graphistry
GRAPHISTRY = {
#    'server': 'MY.graphistry.com',
#    'protocol': 'https',
#    'key': 'MY_GRAPHISTRY_KEY',
#    'api': 2
}    

#splunk
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'   
}
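Instead of typing credentials into the notebook, you can read them from environment variables. Here is a minimal sketch. The variable names `SPLUNK_HOST`, `SPLUNK_SCHEME`, `SPLUNK_PORT`, `SPLUNK_USERNAME`, `SPLUNK_PASSWORD` and the helper `splunk_config_from_env` are hypothetical, and the defaults fall back to the placeholders above:

```python
import os

def splunk_config_from_env(env=os.environ):
    """Build a SPLUNK-style connection dict from environment variables.

    Hypothetical helper: the variable names are illustrative, and missing
    variables fall back to the notebook's placeholder values."""
    return {
        'host': env.get('SPLUNK_HOST', 'MY.SPLUNK.com'),
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        # Environment values are strings; the port is passed as an int
        'port': int(env.get('SPLUNK_PORT', '8089')),
        'username': env.get('SPLUNK_USERNAME', 'MY_SPLUNK_USER'),
        'password': env.get('SPLUNK_PASSWORD', 'MY_SPLUNK_PWD'),
    }
```

With this approach, the cell would use `SPLUNK = splunk_config_from_env()` in place of the literal dict, so the notebook can be shared without secrets in it.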

1. Imports


In [0]:
import pandas as pd

Graphistry


In [0]:
!pip install graphistry

import graphistry
# Uncomment after filling in GRAPHISTRY above
#graphistry.register(**GRAPHISTRY)
graphistry.__version__


Requirement already satisfied: graphistry in /usr/local/lib/python2.7/dist-packages (0.9.56)
Requirement already satisfied: pandas>=0.17.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.22.0)
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from graphistry) (1.14.6)
Requirement already satisfied: requests in /usr/local/lib/python2.7/dist-packages (from graphistry) (2.18.4)
Requirement already satisfied: future>=0.15.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.16.0)
Requirement already satisfied: protobuf>=2.6.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (3.6.1)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2018.5)
Requirement already satisfied: python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2.5.3)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (3.0.4)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (1.11.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (39.1.0)
Out[0]:
u'0.9.56'

Splunk


In [0]:
!pip install splunk-sdk

import splunklib


Collecting splunk-sdk
  Downloading https://files.pythonhosted.org/packages/d4/bb/408c504f4307fcf4a89909cc85bc912d8529c9ca88200682f94a31a06186/splunk-sdk-1.6.5.tar.gz (103kB)
    100% |████████████████████████████████| 112kB 2.6MB/s 
Building wheels for collected packages: splunk-sdk
  Running setup.py bdist_wheel for splunk-sdk ... - done
  Stored in directory: /root/.cache/pip/wheels/87/83/8f/5f78fbc79322715add8f39ba8adc97511f27297852eb4dc270
Successfully built splunk-sdk
Installing collected packages: splunk-sdk
Successfully installed splunk-sdk-1.6.5

In [0]:
#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results

service = client.connect(**SPLUNK)

In [0]:
def extend(o, override):
    # Return a copy of o with the keys of override added or replaced (o itself is not modified)
    out = dict(o)
    out.update(override)
    return out

STEP = 10000  # number of results fetched per page

def splunkToPandas(qry, overrides={}):
    # Run qry as a blocking search job, then page through its results into one DataFrame
    kwargs_blockingsearch = extend({
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }, overrides)
    job = service.jobs.create(qry, **kwargs_blockingsearch)

    print "Search results:\n"
    resultCount = int(job["resultCount"])
    offset = 0

    print 'results', resultCount
    frames = []
    while offset < resultCount:
        print "fetching:", offset, '-', offset + STEP
        kwargs_paginate = extend(kwargs_blockingsearch,
                                 {"count": STEP,
                                  "offset": offset})

        # Read one page of results; ResultsReader also yields results.Message
        # diagnostics, so keep only the dict rows
        blocksearch_results = job.results(**kwargs_paginate)
        reader = results.ResultsReader(blocksearch_results)
        lst = [x for x in reader if isinstance(x, dict)]
        frames.append(pd.DataFrame(lst))
        offset += STEP
    # Return an empty DataFrame (instead of None) when the search found nothing
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
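The fetch loop requests fixed-size windows, starting at offset 0 and advancing by `STEP` while the offset is below the reported result count. A small stdlib-only sketch of that arithmetic lets you check offline which pages a given result count will trigger. The helper name `page_windows` is hypothetical:

```python
def page_windows(result_count, step):
    """Return the (offset, offset + step) windows requested by a loop that
    starts at 0 and advances by step while offset < result_count."""
    windows = []
    offset = 0
    while offset < result_count:
        windows.append((offset, offset + step))
        offset += step
    return windows

print(page_windows(5035, 10000))   # [(0, 10000)]
print(page_windows(25000, 10000))  # [(0, 10000), (10000, 20000), (20000, 30000)]
```

With 5035 results and a step of 10000, the result is a single window. This matches the lone `fetching: 0 - 10000` line printed by the data-loading cell.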

2. Get data


In [0]:
query = 'search index="vast" srcip=* destip=* | rename destip -> dest_ip, srcip -> src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})

#df = splunkToPandas('search index="vast" | head 10')
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)


print 'results', len(df)

df.sample(5)


Search results:

results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035
Out[0]:
      dest_ip         src_ip        protocol  time
4324  10.138.235.111  172.30.0.4    TCP       1505519752
2806  10.0.3.5        10.12.15.152  TCP       1505519767
2630  10.0.4.5        10.12.15.152  TCP       1505519769
20    10.0.4.7        10.6.6.7      TCP       1505519795
866   10.0.2.8        10.17.15.10   TCP       1505519787
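The query works as a pipeline with five stages:

1. `srcip=* destip=*` keeps only events that have both fields.
2. `rename` maps the fields to `src_ip`/`dest_ip`.
3. `fields` keeps the columns of interest.
4. `eval time=_time` copies the epoch timestamp into a regular field.
5. `fields - _*` drops Splunk's underscore-prefixed internal fields.

A minimal sketch that assembles the same string from its parts can make it easier to point at another index or field names. The helper `build_flow_query` and its parameters are hypothetical:

```python
def build_flow_query(index, src='srcip', dst='destip', extra_fields=('protocol',)):
    """Hypothetical helper: assemble the SPL flow query used in this notebook."""
    keep = ['dest_ip', '_time', 'src_ip'] + list(extra_fields)
    # Adjacent string literals are joined before the % formatting is applied
    return ('search index="%s" %s=* %s=* '
            '| rename %s -> dest_ip, %s -> src_ip '
            '| fields %s '
            '| eval time=_time '
            '| fields - _* ') % (index, src, dst, dst, src, ' '.join(keep))

print(build_flow_query('vast'))
```

With the defaults, the output is exactly the `query` string used above.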

3. Visualize!

A) Simple IP<>IP: 1326 nodes, 253K edges


In [0]:
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()


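To see where the node and edge counts of this view come from, here is a minimal pandas sketch. It assumes that, when only an edge table is given, the nodes are the distinct values seen as either source or destination, and that each row is one edge. The helper `ip_graph_size` and the toy data are hypothetical:

```python
import pandas as pd

def ip_graph_size(df, src='src_ip', dst='dest_ip'):
    """Return (node count, edge count) for an edge table:
    nodes are distinct endpoint values, edges are rows (duplicates kept)."""
    nodes = pd.unique(pd.concat([df[src], df[dst]], ignore_index=True))
    return len(nodes), len(df)

toy = pd.DataFrame({
    'src_ip': ['10.0.0.1', '10.0.0.1', '10.0.0.2'],
    'dest_ip': ['10.0.0.2', '10.0.0.3', '10.0.0.1'],
})
print(ip_graph_size(toy))  # (3, 3)
```

Calling `ip_graph_size(df)` on the fetched data gives the corresponding counts for whatever sample was actually loaded.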

B) IP<>IP + src_ip<>protocol: 1328 nodes, 506K edges


In [0]:
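def make_edges(df, src, dst):
  # Copy df and add generic 'src'/'dst' columns, so edge tables built from
  # different column pairs can be stacked and bound the same way
  out = df.copy()
  out['src'] = df[src]
  out['dst'] = df[dst]
  return out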

ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')

combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)


Out[0]:
      dest_ip   src_ip          protocol  time        src             dst
6889  10.0.3.5  10.13.77.49     TCP       1505519777  10.13.77.49     TCP
3440  10.0.2.6  10.12.15.152    TCP       1505519761  10.12.15.152    10.0.2.6
6396  10.0.4.5  10.138.235.111  TCP       1505519782  10.138.235.111  TCP
1394  10.0.4.5  10.138.235.111  TCP       1505519782  10.138.235.111  10.0.4.5
5975  10.0.2.7  10.17.15.10     TCP       1505519786  10.17.15.10     TCP
8683  10.0.2.4  10.12.15.152    TCP       1505519759  10.12.15.152    TCP
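Stacking two edge tables yields one row per input event per column pair, so `combined` has twice as many rows as `df`. The pattern generalizes to any number of column pairs. The sketch below is a hypothetical variant of `make_edges` named `stack_edges`. It also records an `edge_type` label, which makes it easy to filter the two kinds of edges apart afterwards:

```python
import pandas as pd

def stack_edges(df, pairs):
    """Stack several (src_col, dst_col) pairs into one edge table with shared
    'src'/'dst' columns plus an 'edge_type' label naming the source pair."""
    parts = []
    for src_col, dst_col in pairs:
        part = df.copy()
        part['src'] = df[src_col]
        part['dst'] = df[dst_col]
        part['edge_type'] = src_col + '->' + dst_col
        parts.append(part)
    return pd.concat(parts, ignore_index=True)

toy = pd.DataFrame({'src_ip': ['10.0.0.1', '10.0.0.2'],
                    'dest_ip': ['10.0.0.2', '10.0.0.3'],
                    'protocol': ['TCP', 'UDP']})
edges = stack_edges(toy, [('src_ip', 'dest_ip'), ('src_ip', 'protocol')])
print(len(edges))  # 4: two rows per column pair
```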

In [0]:
graphistry.bind(source='src', destination='dst').edges(combined).plot()



C) All<>All via Hypergraph: 254K nodes, 760K edges


In [0]:
hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print hg.keys()
hg['graph'].plot()


('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
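The printed counts follow from how a hypergraph is built. Every event row becomes a node, every distinct entity value becomes a node, and each event links to each of its entity values. With every row carrying all three columns, that gives 5035 × 3 = 15105 links.

The sketch below reproduces that idea in plain pandas. It is a hypothetical illustration named `mini_hypergraph`, not PyGraphistry's implementation. Here entity ids are prefixed with their column name, so an IP seen as both source and destination becomes two entity nodes; PyGraphistry's own choice may differ.

```python
import pandas as pd

def mini_hypergraph(df, entity_types):
    """Pandas-only sketch of a hypergraph transform: one node per event row,
    one node per distinct (column, value), one link per event/value pair."""
    events = df.reset_index(drop=True)
    events['event_id'] = ['event::%d' % i for i in range(len(events))]
    links = []
    for col in entity_types:
        part = events[['event_id', col]].dropna()
        links.append(pd.DataFrame({
            'src': part['event_id'],
            'dst': col + '::' + part[col].astype(str),
            'edge_type': col,
        }))
    edges = pd.concat(links, ignore_index=True)
    return {'events': events, 'edges': edges,
            'attrib_entities': edges['dst'].unique()}

toy = pd.DataFrame({'src_ip': ['10.0.0.1', '10.0.0.1'],
                    'dest_ip': ['10.0.0.2', '10.0.0.3'],
                    'protocol': ['TCP', 'TCP']})
hg_sketch = mini_hypergraph(toy, ['src_ip', 'dest_ip', 'protocol'])
print(len(hg_sketch['edges']))            # 6: 2 events x 3 columns
print(len(hg_sketch['attrib_entities']))  # 4 distinct (column, value) entities
```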

D) Node Colors


In [0]:
# One row per distinct IP; drop_duplicates keeps the first (src_ip) occurrence
nodes = pd.concat([ 
    df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'), 
    df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip') ], 
    ignore_index=True).drop_duplicates(['id'])

#see https://labs.graphistry.com/docs/docs/palette.html
col2color = { 
    "src_ip": 90005,
    "dest_ip": 46005   
}

# Look up each node's palette color from the column it was first seen in
nodes_with_color = nodes.assign(color=nodes['orig_col'].map(col2color))

nodes_with_color.sample(3)


Out[0]:
      id          orig_col  color
4383  172.30.0.3  src_ip    90005
9403  10.0.0.42   dest_ip   46005
4206  172.30.0.4  src_ip    90005
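An IP that appears both as a source and as a destination gets only one node, and its color depends on which row `drop_duplicates` keeps. Because source rows are concatenated first, such an IP is colored as `src_ip`. The toy sketch below makes that visible. Its data and variable names are hypothetical and chosen so they do not overwrite the notebook's `nodes_with_color`:

```python
import pandas as pd

palette = {"src_ip": 90005, "dest_ip": 46005}  # same palette ids as above

toy = pd.DataFrame({'src_ip': ['10.0.0.1', '10.0.0.2'],
                    'dest_ip': ['10.0.0.2', '10.0.0.3']})

toy_nodes = pd.concat([
    toy[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'),
    toy[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip')],
    ignore_index=True).drop_duplicates(['id'])  # keeps the first occurrence

toy_nodes_with_color = toy_nodes.assign(color=toy_nodes['orig_col'].map(palette))
print(toy_nodes_with_color)
```

Here `10.0.0.2` ends up with the `src_ip` color. Passing `keep='last'` to `drop_duplicates` would prefer the destination label instead.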

In [0]:
(graphistry
    .bind(source='src_ip', destination='dest_ip', node='id', point_color='color')
    .edges(df)
    .nodes(nodes_with_color)
    .plot())

