Splunk<>Graphistry

Graphistry brings modern visual analytics to event data in Splunk. The full platform is intended for enterprise teams, while this tutorial shares visibility techniques for researchers and hunters.

To use:

Read along and start the prebuilt visualizations by clicking on them
Plug in your Graphistry API key and Splunk credentials to run it on your own data

0. Configure



In [0]:

    
#graphistry
GRAPHISTRY = {
#    'server': 'MY.graphistry.com',
#    'protocol': 'https',
#    'key': 'MY_GRAPHISTRY_KEY',
#    'api': 2
}    

#splunk
SPLUNK = {
    'host': 'MY.SPLUNK.com',
    'scheme': 'https',
    'port': 8089,
    'username': 'MY_SPLUNK_USER',
    'password': 'MY_SPLUNK_PWD'   
}
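
Hard-coded credentials in a shared notebook are easy to leak. Below is a minimal sketch of one alternative, assuming the values are exported as environment variables. The variable names (`SPLUNK_HOST`, `SPLUNK_SCHEME`, `SPLUNK_PORT`, `SPLUNK_USERNAME`, `SPLUNK_PASSWORD`) and the helper `splunk_config_from_env` are hypothetical and are not part of either library.

```python
import os

def splunk_config_from_env(env=None):
    """Build a SPLUNK-style settings dict from environment variables.

    The variable names used here are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return {
        'host': env['SPLUNK_HOST'],
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        'port': int(env.get('SPLUNK_PORT', '8089')),  # same default port as the config above
        'username': env['SPLUNK_USERNAME'],
        'password': env['SPLUNK_PASSWORD'],
    }

# Example with an explicit mapping instead of the real environment
example = splunk_config_from_env({
    'SPLUNK_HOST': 'MY.SPLUNK.com',
    'SPLUNK_USERNAME': 'MY_SPLUNK_USER',
    'SPLUNK_PASSWORD': 'MY_SPLUNK_PWD',
})
print(example['port'])
```

The returned dict has the same keys as `SPLUNK`, so it can be used in its place.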

1. Imports



In [0]:

    
import pandas as pd

Graphistry



In [0]:

    
!pip install graphistry

import graphistry
#graphistry.register(**GRAPHISTRY)
graphistry.__version__









    



Requirement already satisfied: graphistry in /usr/local/lib/python2.7/dist-packages (0.9.56)
Requirement already satisfied: pandas>=0.17.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.22.0)
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from graphistry) (1.14.6)
Requirement already satisfied: requests in /usr/local/lib/python2.7/dist-packages (from graphistry) (2.18.4)
Requirement already satisfied: future>=0.15.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (0.16.0)
Requirement already satisfied: protobuf>=2.6.0 in /usr/local/lib/python2.7/dist-packages (from graphistry) (3.6.1)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2018.5)
Requirement already satisfied: python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas>=0.17.0->graphistry) (2.5.3)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (2018.8.24)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python2.7/dist-packages (from requests->graphistry) (3.0.4)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (1.11.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=2.6.0->graphistry) (39.1.0)






    Out[0]:





u'0.9.56'

Splunk



In [0]:

    
!pip install splunk-sdk

import splunklib









    



Collecting splunk-sdk
  Downloading https://files.pythonhosted.org/packages/d4/bb/408c504f4307fcf4a89909cc85bc912d8529c9ca88200682f94a31a06186/splunk-sdk-1.6.5.tar.gz (103kB)
    100% |████████████████████████████████| 112kB 2.6MB/s 
Building wheels for collected packages: splunk-sdk
  Running setup.py bdist_wheel for splunk-sdk ... - done
  Stored in directory: /root/.cache/pip/wheels/87/83/8f/5f78fbc79322715add8f39ba8adc97511f27297852eb4dc270
Successfully built splunk-sdk
Installing collected packages: splunk-sdk
Successfully installed splunk-sdk-1.6.5



In [0]:

    
#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results

service = client.connect(**SPLUNK)



In [0]:

    
def extend(o, override):
    # Return a copy of o with the entries of override applied on top
    out = dict(o)
    out.update(override)
    return out

STEP = 10000  # rows requested per page

def splunkToPandas(qry, overrides=None):
    # Default search settings; entries in overrides take precedence
    kwargs_blockingsearch = extend({
        "count": 0,
        "earliest_time": "2010-01-24T07:20:38.000-05:00",
        "latest_time": "now",
        "search_mode": "normal",
        "exec_mode": "blocking"
    }, overrides or {})
    job = service.jobs.create(qry, **kwargs_blockingsearch)

    print("Search results:\n")
    resultCount = int(job["resultCount"])
    print("results %d" % resultCount)

    # Page through the finished job, STEP rows at a time
    pages = []
    offset = 0
    while offset < resultCount:
        print("fetching: %d - %d" % (offset, offset + STEP))
        kwargs_paginate = extend(kwargs_blockingsearch,
                                 {"count": STEP,
                                  "offset": offset})

        # Read this page of results into a DataFrame
        blocksearch_results = job.results(**kwargs_paginate)
        reader = results.ResultsReader(blocksearch_results)
        pages.append(pd.DataFrame([x for x in reader]))
        offset += STEP

    # Return an empty DataFrame rather than None when nothing matched
    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
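
The paging loop above asks for fixed-size windows until the offset passes the reported result count. As a rough illustration of which windows get requested, here is a standalone sketch; `page_windows` is a hypothetical helper written for this example, not part of the Splunk SDK.

```python
def page_windows(total, step):
    """Yield (offset, count) pairs covering `total` rows in pages of `step`."""
    offset = 0
    while offset < total:
        yield (offset, step)
        offset += step

# 5035 results with a page size of 10000 need a single request
print(list(page_windows(5035, 10000)))   # [(0, 10000)]

# A larger result set is split into several windows
print(list(page_windows(25000, 10000)))  # [(0, 10000), (10000, 10000), (20000, 10000)]
```

The last window may reach past the end of the results, in which case it simply comes back partly filled.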

2. Get data



In [0]:

    
query = 'search index="vast" srcip=* destip=* | rename destip -> dest_ip, srcip -> src_ip | fields dest_ip _time src_ip protocol | eval time=_time | fields - _* '
%time df = splunkToPandas(query, {"sample_ratio": 1000})

#df = splunkToPandasAll('search index="vast" | head 10')
#df = pd.concat([ splunkToPandas('search index="vast" | head 10'), splunkToPandas('search index="vast" | head 10') ], ignore_index=True)


print('results %d' % len(df))

df.sample(5)









    



Search results:

results 5035
fetching: 0 - 10000
CPU times: user 4.95 s, sys: 13.3 ms, total: 4.96 s
Wall time: 7.92 s
results 5035






    Out[0]:







  
    
      
             dest_ip        src_ip  protocol        time
4324  10.138.235.111    172.30.0.4       TCP  1505519752
2806        10.0.3.5  10.12.15.152       TCP  1505519767
2630        10.0.4.5  10.12.15.152       TCP  1505519769
20          10.0.4.7      10.6.6.7       TCP  1505519795
866         10.0.2.8   10.17.15.10       TCP  1505519787
3. Visualize!

A) Simple IP<>IP: 1326 nodes, 253K edges



In [0]:

    
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).plot()









    Out[0]:
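
Node and edge counts like the ones in the heading can be checked before plotting. A minimal sketch, assuming the edges are available as plain `(src, dst)` pairs; `graph_size` is a hypothetical helper, and the pairs below are the five sample rows shown earlier, not the full dataset.

```python
def graph_size(edges):
    """Return (node_count, edge_count) for a list of (src, dst) pairs."""
    nodes = set()
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
    return len(nodes), len(edges)

# (src_ip, dest_ip) pairs taken from the sample rows shown earlier
sample_edges = [
    ('172.30.0.4', '10.138.235.111'),
    ('10.12.15.152', '10.0.3.5'),
    ('10.12.15.152', '10.0.4.5'),
    ('10.6.6.7', '10.0.4.7'),
    ('10.17.15.10', '10.0.2.8'),
]
print(graph_size(sample_edges))  # (9, 5)
```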

B) IP<>IP + srcip<>protocol: 1328 nodes, 506K edges



In [0]:

    
def make_edges(df, src, dst):
    # Copy df and add generic 'src'/'dst' columns taken from the named columns
    out = df.copy()
    out['src'] = df[src]
    out['dst'] = df[dst]
    return out



ip2ip = make_edges(df, 'src_ip', 'dest_ip')
srcip2protocol = make_edges(df, 'src_ip', 'protocol')

combined = pd.concat([ip2ip, srcip2protocol], ignore_index=True)
combined.sample(6)









    Out[0]:







  
    
      
       dest_ip          src_ip  protocol        time             src       dst
6889  10.0.3.5     10.13.77.49       TCP  1505519777     10.13.77.49       TCP
3440  10.0.2.6    10.12.15.152       TCP  1505519761    10.12.15.152  10.0.2.6
6396  10.0.4.5  10.138.235.111       TCP  1505519782  10.138.235.111       TCP
1394  10.0.4.5  10.138.235.111       TCP  1505519782  10.138.235.111  10.0.4.5
5975  10.0.2.7     10.17.15.10       TCP  1505519786     10.17.15.10       TCP
8683  10.0.2.4    10.12.15.152       TCP  1505519759    10.12.15.152       TCP


In [0]:

    
graphistry.bind(source='src', destination='dst').edges(combined).plot()









    Out[0]:

C) All<>All via Hypergraph: 254K nodes, 760K edges



In [0]:

    
hg = graphistry.hypergraph(df, entity_types=[ 'src_ip', 'dest_ip', 'protocol'] )
print(list(hg.keys()))
hg['graph'].plot()









    



('# links', 15105)
('# event entities', 5035)
('# attrib entities', 170)
['entities', 'nodes', 'edges', 'events', 'graph']
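
The printed counts are consistent with one event node per row and one link per row per entity column: 5035 rows across 3 entity types gives 15105 links. The following is a simplified sketch of that idea in plain Python. It is not Graphistry's implementation, and the helper name `simple_hypergraph` and the node id formats are assumptions made for illustration.

```python
def simple_hypergraph(rows, entity_types):
    """Turn each row into an event node linked to one node per entity value.

    Simplified illustration only; the node id formats are assumptions.
    """
    event_nodes = []
    attrib_nodes = set()
    links = []
    for i, row in enumerate(rows):
        event_id = 'event::%d' % i
        event_nodes.append(event_id)
        for col in entity_types:
            # Identical values in the same column share a single node
            attrib_id = '%s::%s' % (col, row[col])
            attrib_nodes.add(attrib_id)
            links.append((event_id, attrib_id))
    return event_nodes, attrib_nodes, links

rows = [
    {'src_ip': '172.30.0.4', 'dest_ip': '10.138.235.111', 'protocol': 'TCP'},
    {'src_ip': '10.12.15.152', 'dest_ip': '10.0.3.5', 'protocol': 'TCP'},
]
events, attribs, links = simple_hypergraph(rows, ['src_ip', 'dest_ip', 'protocol'])
print('%d links, %d events, %d attribute nodes' % (len(links), len(events), len(attribs)))
```

Because repeated values collapse into one attribute node, the number of attribute nodes grows much more slowly than the number of events.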






    Out[0]:



Node Colors



In [0]:

    
nodes = pd.concat([ 
    df[['src_ip']].rename(columns={'src_ip': 'id'}).assign(orig_col='src_ip'), 
    df[['dest_ip']].rename(columns={'dest_ip': 'id'}).assign(orig_col='dest_ip') ], 
    ignore_index=True).drop_duplicates(['id'])

#see https://labs.graphistry.com/docs/docs/palette.html
col2color = { 
    "src_ip": 90005,
    "dest_ip": 46005   
}

nodes_with_color = nodes.assign(color=nodes['orig_col'].map(col2color))

nodes_with_color.sample(3)
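
One consequence of concatenating sources before destinations and then dropping duplicate ids is that an IP seen in both roles keeps the first entry, so it gets the `src_ip` color. A minimal plain-Python sketch of that ordering; `assign_colors` is a hypothetical helper and the IP addresses are made up for the example.

```python
def assign_colors(edges, col2color):
    """Map each IP to a color by the first role in which it is listed.

    Mirrors the concat-then-drop_duplicates order: all sources first,
    then all destinations, keeping the first entry per id.
    """
    colors = {}
    for src, _ in edges:
        colors.setdefault(src, col2color['src_ip'])
    for _, dst in edges:
        colors.setdefault(dst, col2color['dest_ip'])
    return colors

col2color = {'src_ip': 90005, 'dest_ip': 46005}
# Made-up edges: 10.0.0.2 is a destination in one and a source in the other
edges = [('10.0.0.1', '10.0.0.2'), ('10.0.0.2', '10.0.0.3')]
colors = assign_colors(edges, col2color)
print(colors['10.0.0.2'])  # 90005
```

If destinations should take priority instead, reversing the order of the two passes (or of the two frames in the concat) is enough.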



In [0]:

    
graphistry.bind(source='src_ip', destination='dest_ip').edges(df).nodes(nodes_with_color).bind(node='id', point_color='color').plot()









    Out[0]:



Sample output of nodes_with_color.sample(3):

              id  orig_col  color
4383  172.30.0.3    src_ip  90005
9403   10.0.0.42   dest_ip  46005
4206  172.30.0.4    src_ip  90005