Context: The company's Security Information and Event Management (SIEM) system is raising DNS alerts from 'that' guy's computer. The queried domains look like they might be DGA (Domain Generation Algorithm) based, so we pull some PCAPs to quickly find out what's going on.
Note: This notebook was inspired by the data_hacking notebook called DriveBy PCAP Analysis. Here we're leveraging the workbench server and focusing on a specific case captured by ThreatGlass. The exploited website for this exercise is kitchenboss.com.au ThreatGlass_Info_for_Kitchenboss (arbitrarily chosen).
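Quick aside: if you haven't run into DGAs before, here's a toy sketch of the idea (purely illustrative, not any real malware family's algorithm). A date-seeded generator lets the malware and its operator independently compute the same throwaway rendezvous domains each day, which is what produces the gibberish hostnames you'll see later in this notebook.
# Toy DGA sketch -- purely illustrative, not any real malware's algorithm
import hashlib
import datetime

def toy_dga(seed_date, count=5, tld='.ru'):
    """Generate 'count' pseudo-random domains seeded from a date."""
    domains = []
    for i in range(count):
        seed = '{}-{}'.format(seed_date.isoformat(), i).encode('utf-8')
        domains.append(hashlib.md5(seed).hexdigest()[:16] + tld)
    return domains

toy_dga(datetime.date(2014, 4, 20))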
Tools in this Notebook:
Workbench can be set up to utilize several indexers. Neo4j, for example, incorporates Lucene-based indexing, so not only can we capture a rich set of relationships between our data entities, but searches and queries are also super quick.
Run the workbench server (from somewhere; for the demo we're just going to start a local one):
$ workbench_server
In [10]:
# Let's start interacting with workbench. Please note there is NO specific client to workbench;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")
Out[10]:
In [11]:
# Load in the PCAP file
filename = '../data/pcap/kitchen_boss.pcap'
with open(filename, 'rb') as f:
    pcap_md5 = c.store_sample(f.read(), filename, 'pcap')
Workbench makes running Bro super easy: it manages the PCAPs and the resulting Bro logs. Perhaps most importantly, it allows us to pull back specific logs with super awesome client/server generators for exactly the data we want to analyze.
Note: we only have one in this case, but running sets of PCAPs through Bro is well supported; please see our Batches_and_Sets notebook for more information.
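As a quick sketch of why those generators are nice: because stream_sample yields log entries lazily over ZeroRPC, you can filter a huge Bro log on the client without ever materializing the whole thing in memory. The md5 variable here is a hypothetical placeholder for one of the 'bro_logs' md5s pulled later in this notebook.
# Sketch only: stream_sample returns a generator, so a large log can be
# filtered on the fly. 'dns_log_md5' is a hypothetical placeholder here.
dns_stream = c.stream_sample(dns_log_md5)
long_queries = (row for row in dns_stream if len(row.get('query', '')) > 30)
for row in long_queries:
    print(row['query'])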
In [13]:
# Run the Bro Network Security Monitor on the PCAP (or set of PCAPs) we just loaded.
# We could make several requests here... 'pcap_bro', 'view_pcap', or 'view_pcap_details';
# workbench is super granular, and it's easy to try them all and add your own as well.
output = c.work_request('view_pcap_details', pcap_md5)['view_pcap_details']
output
Out[13]:
In [8]:
# We'll grab the md5s for those files and do some kewl stuff with them later
file_md5s = list(set([item['md5'] for item in output['extracted_files']]))
pe_md5 = '4410133f571476f2e76e29e61767b557'
file_md5s
Out[8]:
In [9]:
# Grab the Bro logs that we want
dns_log = c.stream_sample(output['bro_logs']['dns_log'])
http_log = c.stream_sample(output['bro_logs']['http_log'])
files_log = c.stream_sample(output['bro_logs']['files_log'])
dns_log
Out[9]:
In [10]:
import pandas as pd
# Okay take the generators returned by stream_sample and efficiently create dataframes
# LIKE BUTTER I TELL YOU!
dns_df = pd.DataFrame(dns_log)
http_df = pd.DataFrame(http_log)
files_df = pd.DataFrame(files_log)
files_df.head()
Out[10]:
The image below shows the Workbench database; each worker stores data in a separate collection. The data is transparent, organized, and accessible.
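If you want to poke at that storage yourself, something like the following should work. This is a sketch assuming Workbench's default local MongoDB backend; the database and collection names below are guesses, so check your workbench server config for the real ones.
# Sketch: peek at the backing store directly with pymongo
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['workbench']            # hypothetical database name
print(db.list_collection_names())   # one collection per worker
print(db['dns_log'].find_one())     # hypothetical collection name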
Given that this exercise started because the SIEM was flagging DNS traffic, we start there. Now that our Bro log data has been streamed into a Pandas DataFrame we can do all kinds of wonderful things. See Pandas to behold the awesome.
In [11]:
dns_df.head()
Out[11]:
In [61]:
dns_df[['query','answers','qtype_name']]
Out[61]:
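Those focondteavrt.ru names certainly look machine-generated. One quick (and admittedly crude) screen for DGA-ish domains is the Shannon entropy of the query string; random-looking labels score higher than dictionary words. This is just a rough heuristic we're sketching here, not a real classifier, and plenty of benign CDN hostnames score high too.
# Crude DGA screen: rank queries by character entropy (rough heuristic only)
import math
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character of the string s."""
    counts = Counter(s)
    total = float(len(s))
    return -sum((n / total) * math.log(n / total, 2) for n in counts.values())

dns_df['entropy'] = dns_df['query'].map(shannon_entropy)
dns_df[['query', 'entropy']].sort_values('entropy', ascending=False).head()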
We want to look at the HTTP data to see what kind of data was transferred from the domains of interest.
In [27]:
# Now we group by host, uid, response mime type, and URI, summing the response body length
group_host = http_df.groupby(['host','uid','resp_mime_types','uri'])[['response_body_len']].sum()
group_host.head(10)
Out[27]:
Okay, last but certainly not least, we want to do a deep dive into the files that were downloaded to the computer, so we pull out the list of MD5s that we saved earlier in the notebook from the workbench 'view_pcap_details' output. Note that the batch request below also returns a client/server generator (see the Generator_Pipelines notebook for more information).
Note: Workbench is in desperate need of workers for PDF, SWF, and JAR files. If you'd like to contribute, please contact briford@supercowpowers.com :)
For now we'll just look at some VirusTotal results and take a quick peek at the SWF and PE files.
In [28]:
# Get Meta-data for each of the extracted files from the PCAP
file_views = c.batch_work_request('meta_deep',{'md5_list':file_md5s})
[view for view in file_views]
Out[28]:
In [29]:
# VirusTotal queries (as of 4-20-2014)
vt_output = c.batch_work_request('vt_query', {'md5_list':file_md5s})
[output for output in vt_output]
Out[29]:
In [30]:
# Well VirusTotal only found two of the files (SWF and JAR). The SWF has
# zero positives (we're going to take that with a grain of salt). The PDF
# and PE files don't even show up. So we'll take a closer look at the SWF
# and PE file with some of the workers in workbench.
swf_view = c.work_request('swf_meta','16cf037b8c8caad6759afc8c309de0f9')
swf_view
Out[30]:
In [31]:
# Use the PE md5 we stashed earlier in the notebook
pe_view = c.work_request('pe_indicators', pe_md5)
pe_view
Out[31]:
Okay, we're now pretty sure that at least two of the four files are bad, but where exactly did the files come from within the context of the network information we can extract from the PCAP file? The image on the right shows all of the relevant information gathered by the 'pcap_graph' worker called below. The blue nodes are the four files (a close-up image is given below). All graph images were captured by simply going to the Neo4j graphical interface at http://localhost:7474/browser/.
In [37]:
graph = c.work_request('pcap_graph', pcap_md5)
graph
Out[37]:
The graph image below was generated by going to http://localhost:7474/browser and executing this query:
match (n)-[r]-() return n,r
The graph image below, which focuses on the files themselves and the path they took to reach our infected host (the orange node in the middle), was generated by going to http://localhost:7474/browser and executing this query:
match (s:file),(t{name:'192.168.22.10'}), p=shortestPath((s)--(t)) return p
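You can also run those Cypher queries programmatically instead of through the browser. Here's a sketch using the official neo4j Python driver, which is not part of workbench; the bolt URI and credentials are assumptions, so adjust them for your own Neo4j instance.
# Sketch: run the shortest-path query from Python (pip install neo4j).
# URI and credentials below are assumptions for a default local install.
from neo4j import GraphDatabase

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'neo4j'))
with driver.session() as session:
    result = session.run("match (s:file),(t{name:'192.168.22.10'}), "
                         "p=shortestPath((s)--(t)) return p")
    for record in result:
        print(record['p'])
driver.close()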
Looking at the file graph above, the SWF looks even more suspicious (even though VT shows 0 hits out of 51 as of 4-20-2014). So we want to take a look at the timing of the file downloads and DNS requests, which is admittedly a bit more circumstantial.
In [22]:
# Let's look at the timing of the DNS requests and the file downloads.
# Make a new column in both dataframes with a proper datetime stamp
dns_df['time'] = pd.to_datetime(dns_df['ts'], unit='s')
files_df['time'] = pd.to_datetime(files_df['ts'], unit='s')
# Now make time the new index for both dataframes
dns_df.set_index(['time'], inplace=True)
files_df.set_index(['time'], inplace=True)
In [54]:
# Filter the files log down to just the files extracted from the PCAP
interesting_files = files_df[files_df['md5'].isin(file_md5s)]
In [62]:
# The domains of interest from the DNS analysis above
domains = ['kitchenboss.com.au','www.kitchenboss.com.au','p22x62n0yr63872e-qh6.focondteavrt.ru',
           '2496128308-6.focondteavrt.ru','92.194.4.142.in-addr.arpa']
interesting_dns = dns_df[dns_df['query'].isin(domains)]
In [64]:
# Build a combined timeline of the interesting DNS queries and file downloads
all_time = pd.concat([interesting_dns[['query','answers','qtype_name']],
                      interesting_files[['md5','mime_type','tx_hosts']]])
all_time.sort_index(inplace=True)
all_time
Out[64]:
Well, that's it for this notebook. We hope this exercise showed some neato functionality using Workbench. We encourage you to check out the GitHub repository and our other notebooks: