This notebook demonstrates a particularly cool feature of Workbench: quickly and efficiently going from raw data to a Pandas DataFrame.
Here we're using the Workbench server to look at a specific case captured by ThreatGlass. The exploited website for this exercise is gold-xxx.net (ThreatGlass_Info).
Tools in this Notebook:
More Info:
Run the Workbench server (anywhere; for this demo we'll just start a local one):
$ workbench_server
In [5]:
# Let's start interacting with Workbench. Note there is NO specific Workbench client;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")
Out[5]:
In [6]:
# Load in the PCAP file
with open('../data/pcap/gold_xxx.pcap', 'rb') as f:
    pcap_md5 = c.store_sample(f.read(), 'gold_xxx', 'pcap')
In [10]:
# We can also ask Workbench for a Python dictionary of all the info from this PCAP,
# because sometimes visualizations are useful and sometimes organized data is useful.
output = c.work_request('view_pcap_details', pcap_md5)['view_pcap_details']
output
Out[10]:
In [11]:
# Critical Code: Transition from Bro logs to Pandas Dataframes
# This one line of code populates dataframes from the Bro logs,
# streaming client/server generators, zero-copy, efficient, awesome...
import pandas as pd
dataframes = {name: pd.DataFrame(c.stream_sample(bro_log, None)) for name, bro_log in output['bro_logs'].items()}
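As an aside, the reason that one-liner works is that pd.DataFrame() accepts any iterable of row dictionaries, so a streaming generator drops straight in. Here's a minimal standalone sketch of the pattern, with a hypothetical generator standing in for c.stream_sample():

```python
import pandas as pd

def stream_rows():
    # Hypothetical stand-in for c.stream_sample(): yields one dict per Bro log row
    yield {'query': 'example.com', 'qtype_name': 'A', 'answers': '93.184.216.34'}
    yield {'query': 'gold-xxx.net', 'qtype_name': 'A', 'answers': '10.0.0.1'}

# pandas consumes the generator row by row and infers the columns from the dict keys
df = pd.DataFrame(stream_rows())
print(df.shape)
```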
We're going to use some nice functionality in the Pandas DataFrame to look at our network data; specifically, we're going to group by host, host IP, mime type, and uri. The last column is the aggregated sum of response_body_len.
This kind of operation is really just scratching the surface of what DataFrames can do, so being able to populate a DataFrame quickly and efficiently is a big win.
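To make the groupby/sum pattern concrete before running it on the real http_log, here's a self-contained sketch on a toy DataFrame (made-up values, same column names as the Bro http_log):

```python
import pandas as pd

# Toy http_log-style data (made-up values, real Bro http_log column names)
http_df = pd.DataFrame([
    {'host': 'gold-xxx.net', 'id.resp_h': '10.0.0.1', 'resp_mime_types': 'text/html',
     'uri': '/', 'response_body_len': 500},
    {'host': 'gold-xxx.net', 'id.resp_h': '10.0.0.1', 'resp_mime_types': 'text/html',
     'uri': '/', 'response_body_len': 700},
    {'host': 'ads.example', 'id.resp_h': '10.0.0.2', 'resp_mime_types': 'application/jar',
     'uri': '/x.jar', 'response_body_len': 9000},
])

# Rows sharing all four keys collapse into one, summing response_body_len
grouped = http_df.groupby(['host', 'id.resp_h', 'resp_mime_types', 'uri'])[['response_body_len']].sum()
print(grouped)
```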
In [20]:
# Look at DNS logs
dataframes['dns_log'][['query','answers','qtype_name']].head(10)
Out[20]:
In [21]:
# Look at Conn logs
dataframes['conn_log'].head(10)
Out[21]:
In [24]:
# Simple Stats with Pandas Dataframe
dataframes['conn_log'][['missed_bytes','orig_ip_bytes','resp_ip_bytes','resp_pkts']].describe()
Out[24]:
In [64]:
# Simple Filtering with Pandas Dataframe
not_80_df = dataframes['conn_log'][dataframes['conn_log']['id.resp_p'] != 80]
not_80_df.head(10)
Out[64]:
In [66]:
# Now we group by host and show the different response mime types for each host
group_host = dataframes['http_log'].groupby(['host','id.resp_h','resp_mime_types','uri'])[['response_body_len']].sum()
group_host
Out[66]:
In [75]:
# Plotting defaults
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 12.0
plt.rcParams['figure.figsize'] = 15.0, 8.0
In [82]:
# Plot hosts and mime-types
plot_df = dataframes['http_log'].groupby(['host','resp_mime_types'])[['response_body_len']].sum().unstack()
plot_df['response_body_len'].plot(kind='bar', stacked=True)
plt.xlabel('Domain')
plt.ylabel('Response Bytes')
plt.xticks(rotation=45, ha='right')
Out[82]:
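For reference, unstack() is what pivots the inner index level (the mime type) out into columns, so each host row has one value per mime type and plot(kind='bar', stacked=True) can stack them. A toy illustration with made-up values:

```python
import pandas as pd

# Made-up http_log-style rows
toy = pd.DataFrame([
    {'host': 'a.net', 'resp_mime_types': 'text/html', 'response_body_len': 100},
    {'host': 'a.net', 'resp_mime_types': 'image/png', 'response_body_len': 300},
    {'host': 'b.net', 'resp_mime_types': 'text/html', 'response_body_len': 50},
])

plot_df = toy.groupby(['host', 'resp_mime_types'])[['response_body_len']].sum().unstack()
# plot_df['response_body_len'] now has one row per host and one column per mime type;
# combinations that never occurred (b.net/image/png) show up as NaN
print(plot_df['response_body_len'])
```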
Well, in this short notebook we used 6-7 lines of Python to go from a PCAP file to a Pandas DataFrame, and then did a bunch of cool stuff with that DataFrame. We hope this exercise showed off some of Workbench's functionality; we encourage you to check out the GitHub repository and our other notebooks: