This notebook demonstrates a particularly kewl feature of Workbench: quickly and efficiently going from raw data to a Pandas DataFrame.
Here we're using the Workbench server to look at a forensic memory image, which Workbench processes with the Rekall Python module (https://github.com/google/rekall). Anything that is kewl in this notebook is because of Rekall; anything that is lame is probably Workbench (our Rekall integration is days old).
Super Big Thanks
Tools in this Notebook:
More Info:
See PCAP_to_Graph for a short notebook on turning a PCAP into a Neo4j graph.
See Workbench Demo Notebook for a lot more info on using workbench.
Run the workbench server (it can run anywhere; for this demo we just start a local one):
$ workbench_server
In [1]:
# Let's start to interact with workbench. Please note there is NO specific client for workbench;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")
Out[1]:
In [2]:
# Load in the Memory Image file
with open('../data/mem_images/exemplar4.vmem','rb') as f:
    mem_md5 = c.store_sample(f.read(), 'exemplar4.vmem', 'mem')
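The name `mem_md5` suggests that `store_sample` hands back an MD5 digest of the uploaded bytes; that is an assumption here, not something this notebook confirms. Under that assumption, the sketch below computes the same kind of digest locally, reading the file in chunks so a large memory image does not have to sit in memory just to be hashed. The helper name `md5_of_file` and the chunk size are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b'' (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file('../data/mem_images/exemplar4.vmem')` and comparing the result with `mem_md5` would be one way to check the assumption.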
In [3]:
# Let's look at the workers that we might invoke
print(c.help_workers())
In [4]:
# Now we invoke the mem_meta worker (all memory workers start with mem_)
output = c.work_request('mem_meta', mem_md5)['mem_meta']
output
Out[4]:
In [5]:
# Now we look at the pslist worker (which is just a big blob of python data)
output = c.work_request('mem_pslist', mem_md5)['mem_pslist']
str(output)[:50]
Out[5]:
We're going to use some nice functionality in the Pandas DataFrame to look at our memory image data. Specifically, we're going to group by parent process ID (PPID) to see which processes came from which parents.
This type of operation only scratches the surface of what DataFrames can do, which is why quickly and efficiently populating one is super awesome.
In [6]:
# Okay, that didn't seem very useful, just a gigantic ugly blob of python.
# Let's push the pslist info section into a pandas dataframe
import pandas as pd
df = pd.DataFrame(output['sections']['Info'])
df.head()
Out[6]:
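The indexing `output['sections']['Info']` above implies that a memory worker returns a dict whose `'sections'` entry maps section names to lists of row dicts. Assuming that shape holds (the sample data and the helper name `summarize_sections` below are made up for illustration), a quick summary of what is inside the blob might look like this:

```python
def summarize_sections(worker_output):
    """Map each section name to (row count, sorted column names)."""
    summary = {}
    for name, rows in worker_output.get('sections', {}).items():
        # Collect every key seen in any row of this section
        columns = sorted({key for row in rows for key in row})
        summary[name] = (len(rows), columns)
    return summary

# Hypothetical stand-in for a worker result; real output has more fields.
sample_output = {
    'sections': {
        'Info': [
            {'Name': 'System', 'PID': 4, 'PPID': 0},
            {'Name': 'smss.exe', 'PID': 516, 'PPID': 4},
        ]
    }
}
print(summarize_sections(sample_output))
# {'Info': (2, ['Name', 'PID', 'PPID'])}
```

A list of row dicts is exactly the kind of input `pd.DataFrame` accepts directly, which is why the conversion above is a one-liner.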
In [7]:
# Now let's use the Pandas groupby methods
df['count'] = 1
df.groupby(['PPID','Name','PID']).sum()
Out[7]:
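The `count` column plus `sum()` trick above can also be written with `size()`, which counts rows per group without adding a helper column. The sketch below uses a small made-up process table (not the real `exemplar4.vmem` data) with the same `Name`, `PID`, and `PPID` columns to show the idea, and also collects the child process names for each parent.

```python
import pandas as pd

# Hypothetical process rows, only for illustration.
procs = pd.DataFrame([
    {'Name': 'System', 'PID': 4, 'PPID': 0},
    {'Name': 'smss.exe', 'PID': 516, 'PPID': 4},
    {'Name': 'csrss.exe', 'PID': 580, 'PPID': 516},
    {'Name': 'winlogon.exe', 'PID': 604, 'PPID': 516},
])

# Rows per (PPID, Name, PID) group; same result as summing a column of ones.
counts = procs.groupby(['PPID', 'Name', 'PID']).size()

# Child process names for each parent PID.
children = procs.groupby('PPID')['Name'].apply(list).to_dict()
print(children)
```

Grouping on `PPID` alone answers the "which processes came from which parents" question most directly; the three-column grouping is mainly a way to get a readable hierarchical index.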
In [8]:
# Now we look at the connscan worker
output = c.work_request('mem_connscan', mem_md5)['mem_connscan']
output
Out[8]:
In [9]:
# Same as above: we'll throw it into a DataFrame and do a groupby
conn_df = pd.DataFrame(output['sections']['Info'])
conn_df['count'] = 1
conn_df.groupby(['Pid','Remote Address']).sum()
Out[9]:
In [10]:
# Now let's look at the DLLs for the various processes
output = c.work_request('mem_dlllist', mem_md5)['mem_dlllist']
In [11]:
# Each process has its own section
output['sections'].keys()
Out[11]:
In [12]:
# Let's look at the process of interest
dll_df = pd.DataFrame(output['sections']['svhost_exe pid: 1936'])
dll_df
Out[12]:
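The section key used above, `'svhost_exe pid: 1936'`, suggests that each key ends with `pid: <number>`. Assuming that naming pattern holds for every section (only this one key is visible in the notebook), a lookup by PID could be sketched as follows; `sections_for_pid` and the sample keys other than the one above are hypothetical.

```python
import re

# Matches a trailing "pid: N" and captures the number.
PID_SUFFIX = re.compile(r'pid:\s*(\d+)\s*$')

def sections_for_pid(section_names, pid):
    """Return the section names whose trailing 'pid: N' matches the given PID."""
    matches = []
    for name in section_names:
        found = PID_SUFFIX.search(name)
        if found and int(found.group(1)) == pid:
            matches.append(name)
    return matches

names = ['svhost_exe pid: 1936', 'smss_exe pid: 516', 'no pid here']
print(sections_for_pid(names, 1936))
# ['svhost_exe pid: 1936']
```

With the real worker output, `sections_for_pid(output['sections'].keys(), 1936)` would avoid typing the full key by hand.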
In [13]:
# Dump PE Files from all the processes
output = c.work_request('mem_procdump', mem_md5)['mem_procdump']
output
Out[13]:
In [14]:
# Okay nice, now let's look deeper at the files with Workbench
# First the file that we're pretty sure is naughty
c.work_request('view', '0374a3a1689771e93432f8803cc2a09c')
Out[14]:
In [20]:
# Now smss_exe_516.exe
c.work_request('view', 'a11279e7f15a0a9f342e0809d3460e26')
Out[20]:
In [16]:
# VirusTotal query (on svhost.exe)
c.work_request('vt_query', '0374a3a1689771e93432f8803cc2a09c')
Out[16]:
In [17]:
# VirusTotal query (on smss_exe_516.exe)
c.work_request('vt_query', 'a11279e7f15a0a9f342e0809d3460e26')
Out[17]:
Well, in this notebook we went from a forensic memory image to a Pandas DataFrame with the power of Rekall (http://www.rekall-forensic.com). We hope this exercise showed some neato functionality in Workbench, and we encourage you to check out the GitHub repository and our other notebooks: