Workbench: Adding a worker

We believe that transparency, flexibility and on-site customization are critical to an agile security framework. In this notebook we illustrate how easy it is to add workers to Workbench and, more importantly, how to codify your team's domain knowledge for on-site customization and agility.

Tools in this Notebook:

More Info:

Let's start up the Workbench server...

Run the Workbench server (it can run anywhere; for this demo we're just going to start a local one)

$ workbench_server

In [1]:
# Let's start to interact with Workbench. Please note there is NO specific client for Workbench;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")


Out[1]:
[None]

So I'm confused, what am I supposed to do with Workbench?


Workbench is often confusing for new users (we're trying to work on that). Please see our GitHub repository https://github.com/SuperCowPowers/workbench for the latest documentation and notebook examples (the notebook examples can really help). New users can start by typing **c.help()** after they connect to Workbench.


In [2]:
# I forgot what stuff I can do with workbench
print(c.help())


Welcome to Workbench: Here's a list of help commands:
	 - Run c.help_basic() for beginner help
	 - Run c.help_commands() for command help
	 - Run c.help_workers() for a list of workers
	 - Run c.help_advanced() for advanced help

See https://github.com/SuperCowPowers/workbench for more information

In [3]:
# Now let's get information about the dynamically loaded workers (your site may have many more!)
# Next to each worker name is the list of dependencies that worker has declared
print(c.help_workers())


Workbench Workers:
	json_meta ['sample', 'meta']
	log_meta ['sample', 'meta']
	meta ['sample']
	meta_deep ['sample', 'meta']
	my_meta ['sample', 'meta']
	pcap_bro ['sample']
	pcap_graph ['pcap_bro']
	pcap_http_graph ['pcap_bro']
	pe_classifier ['pe_features', 'pe_indicators']
	pe_deep_sim ['meta_deep']
	pe_features ['sample']
	pe_indicators ['sample']
	pe_peid ['sample']
	strings ['sample']
	swf_meta ['sample', 'meta']
	unzip ['sample']
	url ['strings']
	view ['meta']
	view_customer ['meta']
	view_log_meta ['log_meta']
	view_meta ['meta']
	view_pcap ['pcap_bro']
	view_pcap_details ['view_pcap']
	view_pdf ['meta', 'strings']
	view_pe ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier']
	view_zip ['meta', 'unzip']
	vt_query ['meta']
	yara_sigs ['sample']

In [4]:
# Let's get the information about the meta worker
print(c.help_worker('meta'))


 Worker: meta ['sample']
	 This worker computes meta data for any file type. 

In [5]:
# Okay let's load up a file and see what this silly meta thing gives back
filename = '../data/pe/bad/9e42ff1e6f75ae3e60b24e48367c8f26'
with open(filename,'rb') as f:
    my_md5 = c.store_sample(f.read(), filename, 'exe')
output = c.work_request('meta', my_md5)
output


Out[5]:
{'meta': {'customer': 'Huge Inc',
  'encoding': 'binary',
  'file_size': 51200,
  'file_type': 'PE32 executable (console) Intel 80386, for MS Windows',
  'filename': '../../data/pe/bad/9e42ff1e6f75ae3e60b24e48367c8f26',
  'import_time': '2014-06-21T23:51:49.122000Z',
  'length': 51200,
  'md5': '9e42ff1e6f75ae3e60b24e48367c8f26',
  'mime_type': 'application/x-dosexec',
  'type_tag': 'exe'}}

In [6]:
# Pfff... my meta data worker will be WAY better!
# Err.. okay I'll just copy the meta worker file and see what happens.
# Note: obviously you'd just go to the shell and cp meta.py my_meta.py
#       but since we're in IPython...
%cd /Users/briford/work/workbench/server/workers
%cp meta.py my_meta.py
%cd /Users/briford/work/workbench/notebooks


/Users/briford/work/workbench/server/workers

In [7]:
# Okay just 'cause I'm feeling crazy let's look at help_workers again
print(c.help_workers())


Workbench Workers:
	json_meta ['sample', 'meta']
	log_meta ['sample', 'meta']
	meta ['sample']
	meta_deep ['sample', 'meta']
	my_meta ['sample', 'meta']
	pcap_bro ['sample']
	pcap_graph ['pcap_bro']
	pcap_http_graph ['pcap_bro']
	pe_classifier ['pe_features', 'pe_indicators']
	pe_deep_sim ['meta_deep']
	pe_features ['sample']
	pe_indicators ['sample']
	pe_peid ['sample']
	strings ['sample']
	swf_meta ['sample', 'meta']
	unzip ['sample']
	url ['strings']
	view ['meta']
	view_customer ['meta']
	view_log_meta ['log_meta']
	view_meta ['meta']
	view_pcap ['pcap_bro']
	view_pcap_details ['view_pcap']
	view_pdf ['meta', 'strings']
	view_pe ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier']
	view_zip ['meta', 'unzip']
	vt_query ['meta']
	yara_sigs ['sample']

In [8]:
# My mind must be playing tricks, let's see if I can run my worker
output = c.work_request('my_meta', my_md5)
output


Out[8]:
{'my_meta': {'entropy': 7.250194413754419,
  'md5': '9e42ff1e6f75ae3e60b24e48367c8f26',
  'sha1': 'e0a6d12499ed16b33c71ddec42ca8aa7bcecaaf9',
  'sha256': '88eea1726a149ac5c08b74547a05177398757f328c0faf821b822789d76863b7',
  'ssdeep': '1536:pTrBy35F8qNwtqKiE/n5zTY+LK9lqB9HtZeV0D:hrEpF8q6qKiE/npi9UDHtZeV4'}}

Errr... wha?... But I'm hitting the Workbench server!? So WTF just happened?

Workbench has a dynamic plugin loader/validator. When a new file is detected in the worker directory, the following steps happen automatically:

  • The plugin goes through several validation checks
  • If the validation succeeds the plugin is dynamically loaded
  • Your new plugin is now running on the local server
  • Also, all of the CI build/test/coverage/docs now include your plugin!
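
For the curious, the validate-then-load steps above might look roughly like the following. This is a minimal sketch, not Workbench's actual loader: the `load_worker` helper and the validation rule (a class that declares a `dependencies` list and an `execute` method) are assumptions made purely for illustration.

```python
import importlib.util
import inspect
import os
import tempfile


def load_worker(path):
    """Load a plugin file and return its worker class, or None if it fails validation.

    Assumed validation rule (hypothetical): the file must import cleanly and
    define a class with a `dependencies` list and a callable `execute`.
    """
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)  # runs the plugin's top-level code
    except Exception:
        return None  # a plugin that cannot even be imported is rejected

    for _, obj in inspect.getmembers(module, inspect.isclass):
        if obj.__module__ != name:
            continue  # ignore classes the plugin merely imported
        has_deps = isinstance(getattr(obj, 'dependencies', None), list)
        has_execute = callable(getattr(obj, 'execute', None))
        if has_deps and has_execute:
            return obj
    return None


GOOD_PLUGIN = '''
class MyMeta(object):
    dependencies = ['sample']

    def execute(self, input_data):
        return {'length': len(input_data['sample']['raw_bytes'])}
'''
BAD_PLUGIN = 'class Oops(object):\n    pass\n'  # no dependencies, no execute

with tempfile.TemporaryDirectory() as worker_dir:
    good_path = os.path.join(worker_dir, 'my_meta.py')
    bad_path = os.path.join(worker_dir, 'oops.py')
    with open(good_path, 'w') as f:
        f.write(GOOD_PLUGIN)
    with open(bad_path, 'w') as f:
        f.write(BAD_PLUGIN)
    good_worker = load_worker(good_path)
    bad_worker = load_worker(bad_path)

good_output = good_worker().execute({'sample': {'raw_bytes': b'abc'}})
```

The point of validating before loading is that a half-written plugin never replaces a working one; it is simply skipped until it passes.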

Okay I'm going to call BS... let's run the tests and see what happens!


In [12]:
# I've been around software... testing, server integration, test coverage all that stuff is
# a complete PITA, heck I spend half my time doing that.. there's no way all that just happened.
!./runtests


<<< Note: Most of these tests require a local server running >>>

............................
Name                      Stmts   Miss  Cover   Missing
-------------------------------------------------------
__init__   NoSource: No source for code: '/Users/briford/work/workbench/server/workers/__init__.py'
json_meta                    33      2    94%   22, 57
log_meta                     30      1    97%   49
meta                         40      1    98%   61
meta_deep                    38      1    97%   58
my_meta                      40      1    98%   61
pcap_bro                    122      9    93%   23-26, 115-116, 119, 121, 123, 197
pcap_graph                  112      6    95%   90, 117, 177, 181, 188, 231
pcap_http_graph              90      5    94%   65, 121, 125, 134, 177
pe_classifier                30      1    97%   54
pe_deep_sim                  39      1    97%   64
pe_features                 208     21    90%   98-100, 148-149, 167-171, 179, 200, 222, 233, 244, 283, 290, 301, 304-305, 349
pe_indicators               240     23    90%   52-53, 80, 103, 151, 159, 163, 173, 181, 193, 205, 217, 250, 260, 283, 301, 331, 349, 358, 386, 396-397, 439
pe_peid                      38      3    92%   23-24, 64
strings                      27      1    96%   44
swf_meta                     23      1    96%   44
unzip                        38      1    97%   60
url                          28      1    96%   46
view                         51      3    94%   25, 27, 86
view_customer                23      1    96%   42
view_log_meta                23      1    96%   40
view_meta                    23      1    96%   41
view_pcap                    31      1    97%   57
view_pcap_details            32      1    97%   78
view_pdf                     28      2    93%   12, 46
view_pe                      39      2    95%   13, 64
view_zip                     36      2    94%   17, 61
vt_query                     55      7    87%   18, 22, 36-37, 50-51, 98
workbench_keys/__init__       1      0   100%   
yara_sigs                    42      2    95%   20, 75
-------------------------------------------------------
TOTAL                      1560    102    93%   
----------------------------------------------------------------------
Ran 28 tests in 29.193s

OK

My new plugin has 98% test coverage...

Alright, are we feeling awesome yet?

Welcome to a higher dimensional plane of awesomeness. Now that we're 'locked and loaded' let's focus on the fun stuff, making the new plugin rock my co-workers' minds!

Okay, my_meta.py is just a copy of meta.py (let's look at it)


We use our favorite editor, make changes to my_meta.py and hit save.


Alright, now it's time to get fabulous!

The server plugin manager has a file system watchdog that has detected changes in your plugin.

The manager 'looks' at the new plugin and as long as it passes all the validation tests, it's automatically reloaded!!!
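
To make the "detect a change, then reload" idea concrete, here is a small sketch that swaps in plain modification-time polling for a real file system watchdog. The `PluginWatcher` class is hypothetical and only reports which files changed; the validation and reload would happen on whatever it returns.

```python
import os
import tempfile


class PluginWatcher(object):
    """Polls a directory and reports .py files that are new or modified.

    Hypothetical helper: uses modification-time polling rather than
    file system events.
    """

    def __init__(self, plugin_dir):
        self.plugin_dir = plugin_dir
        self.seen = {}  # path -> modification time (ns) we last acted on

    def poll(self):
        changed = []
        for name in sorted(os.listdir(self.plugin_dir)):
            if not name.endswith('.py'):
                continue
            path = os.path.join(self.plugin_dir, name)
            mtime = os.stat(path).st_mtime_ns
            if self.seen.get(path) != mtime:
                self.seen[path] = mtime
                changed.append(name)  # candidate for validate + reload
        return changed


with tempfile.TemporaryDirectory() as plugin_dir:
    path = os.path.join(plugin_dir, 'my_meta.py')
    with open(path, 'w') as f:
        f.write('# pretend this is a copy of meta.py\n')

    watcher = PluginWatcher(plugin_dir)
    first = watcher.poll()   # new file is noticed
    second = watcher.poll()  # nothing changed since the last poll

    # Simulate "hit save in the editor" by bumping the modification time.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 2 * 10**9))
    third = watcher.poll()   # the edit is noticed
```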


In [36]:
# You sir are on some sort of needle drug... so you're saying that all the new functionality
# that I just typed in is already available on the server? Help too? 
print(c.help_worker('my_meta'))
output = c.work_request('my_meta', my_md5)
output


 Worker: my_meta ['sample']
	This worker computes my more super awesome meta-data
        Seriously:
               1) All the sha hashes
               2) SSDeep (oh yeah)
               3) Entropy (science!)
    
Out[36]:
{'my_meta': {'entropy': 2.440069216444288,
  'md5': '0cb9aa6fb9c4aa3afad7a303e21ac0f3',
  'sha1': '96e85768a12b2f319f2a4f0c048460e1b73aa573',
  'sha256': '4ecf79302ba0439f62e15d0526a297975e6bb32ea25c8c70a608916a609e5a9c',
  'ssdeep': '192:a8jJIFYrq9ATskBTp2jLDL3P1oynldvSo71nF:oFpNnnX1Tn'}}

Okay, that was spiffy, I'll give you that

But I want my new worker to have access to the output of other workers. For example, I want to look at the mime_type that the 'meta' worker produces and then do some cool stuff based on that.

Let's look at the changes we made to my_meta.py

We changed the dependency line and added the 'meta' worker (could be ANY worker). We also pulled the data from the meta worker and added the line about the file being packed.
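
A worker with that kind of change might look something like the sketch below. Treat all of it as an assumption rather than the real my_meta.py: the class name, the layout of `input_data` (one entry per declared dependency, with the sample bytes under `raw_bytes`), and the 7.0 entropy cutoff are all made up for illustration. The hashes and ssdeep are left out to keep it short.

```python
import collections
import math


def shannon_entropy(raw_bytes):
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 for uniform)."""
    if not raw_bytes:
        return 0.0
    counts = collections.Counter(bytearray(raw_bytes))
    total = float(len(raw_bytes))
    return sum((n / total) * math.log2(total / n) for n in counts.values())


class MyMeta(object):
    """Hypothetical worker that uses the output of the 'meta' worker."""
    dependencies = ['sample', 'meta']  # 'meta' added to the dependency line

    def execute(self, input_data):
        # Assumed layout: input_data has one entry per declared dependency.
        raw_bytes = input_data['sample']['raw_bytes']
        mime_type = input_data['meta']['mime_type']

        output = {'entropy': shannon_entropy(raw_bytes)}
        if mime_type == 'application/x-dosexec':
            # Assumed cutoff: high entropy hints at packing or encryption.
            output['packed'] = 'probably' if output['entropy'] > 7.0 else 'probably not'
        return output


meta_output = {'mime_type': 'application/x-dosexec'}
noisy = MyMeta().execute({'sample': {'raw_bytes': bytes(range(256)) * 4},
                          'meta': meta_output})
boring = MyMeta().execute({'sample': {'raw_bytes': b'A' * 1024},
                           'meta': meta_output})
```

Declaring the dependency is all the worker does; it never calls the 'meta' worker itself, it just reads what shows up in its input.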


In [16]:
# Run my new code
output = c.work_request('my_meta', my_md5)
output


Out[16]:
{'my_meta': {'entropy': 7.250194413754419,
  'md5': '9e42ff1e6f75ae3e60b24e48367c8f26',
  'packed': 'probably',
  'sha1': 'e0a6d12499ed16b33c71ddec42ca8aa7bcecaaf9',
  'sha256': '88eea1726a149ac5c08b74547a05177398757f328c0faf821b822789d76863b7',
  'ssdeep': '1536:pTrBy35F8qNwtqKiE/n5zTY+LK9lqB9HtZeV0D:hrEpF8q6qKiE/npi9UDHtZeV4'}}

Let's enumerate all the neat things that just happened

  • I changed my worker, the plugin manager saw the change, validated my worker and dynamically loaded it
  • Although Workbench utilizes caching (no work is ever recomputed unless it needs to be), in this case it recognized that the 'modification time' of the worker was newer than the work results, so it recomputed the results.
  • Let's look at my new plugin output in MongoDB
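
The caching rule in the list above (reuse a result unless the worker is newer than it) can be sketched in a few lines. This is an in-memory toy under stated assumptions, not how Workbench stores results: the `ResultCache` class is hypothetical and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class ResultCache(object):
    """Hypothetical in-memory cache keyed by (worker name, sample md5)."""

    def __init__(self):
        self.store = {}        # (worker_name, md5) -> (computed_at, result)
        self.computations = 0  # how many times real work was done

    def get(self, worker_name, md5, worker_mtime, now, compute):
        key = (worker_name, md5)
        cached = self.store.get(key)
        if cached is not None and cached[0] >= worker_mtime:
            return cached[1]  # result is at least as new as the worker code
        result = compute(md5)  # missing or stale: do the work again
        self.computations += 1
        self.store[key] = (now, result)
        return result


def fake_my_meta(md5):
    # Stand-in for the real worker
    return {'md5': md5, 'packed': 'probably'}


cache = ResultCache()
md5 = '9e42ff1e6f75ae3e60b24e48367c8f26'

cache.get('my_meta', md5, worker_mtime=100, now=110, compute=fake_my_meta)
after_first = cache.computations   # computed once

cache.get('my_meta', md5, worker_mtime=100, now=120, compute=fake_my_meta)
after_second = cache.computations  # served from cache

# The worker file was edited at t=130, which is newer than the cached result.
cache.get('my_meta', md5, worker_mtime=130, now=140, compute=fake_my_meta)
after_edit = cache.computations    # recomputed
```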


In [17]:
# So let's do a more complicated worker just to hammer home what's happening.
# Workbench uses Directed Acyclic Graphs to pipeline workers together; it recursively
# satisfies dependencies with aggressive caching, shallow memory copies and gevent-based
# cooperative processes on the server side. Basically six slices of awesome...
output = c.work_request('view', my_md5)
output


Out[17]:
{'view': {'md5': '9e42ff1e6f75ae3e60b24e48367c8f26',
  'view_pe': {'classification': 'Evil!',
   'customer': 'Huge Inc',
   'disass': 'plugin_failed',
   'encoding': 'binary',
   'file_size': 51200,
   'file_type': 'PE32 executable (console) Intel 80386, for MS Windows',
   'filename': '../../data/pe/bad/9e42ff1e6f75ae3e60b24e48367c8f26',
   'import_time': '2014-06-21T23:51:49.122000Z',
   'indicators': [{'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 0. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 2. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'attributes': ['queryperformancecounter', 'gettickcount'],
     'category': 'ANTI_DEBUG',
     'description': 'Imported symbols related to anti-debugging',
     'severity': 3},
    {'category': 'MALFORMED',
     'description': 'Checksum of Zero',
     'severity': 1},
    {'category': 'MALFORMED',
     'description': 'Reported Checksum does not match actual checksum',
     'severity': 2},
    {'category': 'MALFORMED',
     'description': 'Image size does not match reported size',
     'severity': 3},
    {'attributes': ['lsicbkg'],
     'category': 'MALFORMED',
     'description': 'Section(s) with a non-standard name, tamper indication',
     'severity': 3},
    {'attributes': ['getmodulehandlea'],
     'category': 'PROCESS_MANIPULATION',
     'description': 'Imported symbols related to process manipulation/injection',
     'severity': 3},
    {'attributes': ['getsystemtimeasfiletime'],
     'category': 'PROCESS_SPAWN',
     'description': 'Imported symbols related to spawning a new process',
     'severity': 2},
    {'attributes': ['findfirstfilew', 'findnextfilew'],
     'category': 'SYSTEM_PROBE',
     'description': 'Imported symbols related to probing the system',
     'severity': 2}],
   'length': 51200,
   'md5': '9e42ff1e6f75ae3e60b24e48367c8f26',
   'mime_type': 'application/x-dosexec',
   'peid_Matches': ['Microsoft Visual C++ v7.0'],
   'type_tag': 'exe'}}}

View is a magic unicorn, it pulls different DAGs based on mime-type
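
Here is a rough sketch of what "recursively satisfying dependencies" means for the view_pe branch. The dependency table is copied from the help_workers listing earlier in this notebook; the `resolve` function, treating 'sample' as a worker with no dependencies, and the placeholder outputs are assumptions for illustration only.

```python
# Dependencies as declared in the help_workers() listing (view_pe branch only).
DEPENDENCIES = {
    'sample': [],  # assumption: treated here as a root with no dependencies
    'meta': ['sample'],
    'strings': ['sample'],
    'pe_peid': ['sample'],
    'pe_features': ['sample'],
    'pe_indicators': ['sample'],
    'pe_classifier': ['pe_features', 'pe_indicators'],
    'view_pe': ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier'],
}


def resolve(worker, results, order):
    """Satisfy a worker's dependencies depth-first, running each worker once."""
    if worker in results:
        return results[worker]  # already computed: reuse it
    inputs = {dep: resolve(dep, results, order) for dep in DEPENDENCIES[worker]}
    # Placeholder for the real work: just record which inputs were available.
    results[worker] = {'worker': worker, 'used': sorted(inputs)}
    order.append(worker)
    return results[worker]


results, order = {}, []
view = resolve('view_pe', results, order)
```

Note that 'sample' and 'pe_indicators' are each needed by more than one worker but only run once; the `results` dictionary plays the role of the cache.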



In [21]:
# Yeah but I want to run my new worker on LOTS of samples and I
# want to put the results into a Pandas dataframe and run some
# statistics, and do some Machine Learning and kewl plots!

# This is just throwing files at Workbench (could be pdfs, swfs, pcap, memory_images, etc)
import os
file_list = [os.path.join('../data/pe/bad', child) for child in os.listdir('../data/pe/bad')]
working_set = []
for filename in file_list:
    with open(filename,'rb') as f:
        md5 = c.store_sample(f.read(), filename, 'exe')
        working_set.append(md5)

In [27]:
# Now just run a batch request against all the samples we just threw in
results = c.batch_work_request('my_meta', {'md5_list':working_set})
results


Out[27]:
<generator object iterator at 0x10684e5f0>

In [28]:
# Now toss that client-server generator into a dataframe (zero-copy and efficient)
import pandas as pd
df_meta = pd.DataFrame(results)
df_meta.head()


Out[28]:
   entropy   md5                               packed        sha1                                      sha256                                             ssdeep
0  7.894680  033d91aae8ad29ed9fbb858179271232  probably      83ab10907b254752f312c89125957f10d35cb9d4  eb107c004e6e1bbd3b32ad7961661bbe28a577b0cb5dac...  1536:h6+LbfPbI5dzmJu9Tgj5aOItvEqRCHW9pjVrs2ryr...
1  2.440069  0cb9aa6fb9c4aa3afad7a303e21ac0f3  probably not  96e85768a12b2f319f2a4f0c048460e1b73aa573  4ecf79302ba0439f62e15d0526a297975e6bb32ea25c8c...  192:a8jJIFYrq9ATskBTp2jLDL3P1oynldvSo71nF:oFpN...
2  5.125292  0e882ec9b485979ea84c7843d41ba36f  probably not  12fb0a1b7d9c2b2a41f4da9ce5bbfb140fb16939  616cf9e729c883d979212eb55178b7aac80dd9f58cb449...  768:5HyLMqtEM1Htz8kDmP9l+nZZYp41oj7EZmJxl/N9j6...
3  6.303055  0e8b030fb6ae48ffd29e520fc16b5641  probably not  82d57b8302b7497b2f6943f18e2d2687b9b0f5eb  feaf72bdad035e198d297bfb0b8d891645f1dacd78f0db...  1536:1uNqjqzs1hQHhInEeJMzcmGqyF7Jwe9pvUo+5TDU4...
4  7.593283  0eb9e990c521b30428a379700ec5ab3e  probably      b778fc55f0538de865d4853099a3faa0b29f311d  dc5e8176a5f012ebdb4835f9b570a12c045d059f6f5bdc...  1536:KcE4iMgXjTJpdGaaJG6Mhawv7r9ZaobsLBq+h5ttB...

5 rows × 6 columns


In [29]:
# Plotting defaults
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 12.0
plt.rcParams['figure.figsize'] = 18.0, 8.0

In [31]:
# Plot stuff (yes this is a silly plot but it's just an example :)
df_meta.boxplot('entropy','packed')
plt.xlabel('Packed')
plt.ylabel('Entropy')
plt.title('Entropy of Sample')
plt.suptitle('')


Out[31]:
<matplotlib.text.Text at 0x10702f2d0>

In [33]:
# Groupby and Statistics (yes silly again but just an example)
df_meta.groupby('packed').describe()


Out[33]:
                      entropy
packed
probably     count  23.000000
             mean    7.612472
             std     0.290949
             min     7.066162
             25%     7.339412
             50%     7.778263
             75%     7.852313
             max     7.919686
probably not count  27.000000
             mean    5.832247
             std     1.017639
             min     2.440069
             25%     5.564519
             50%     5.967721
             75%     6.537146
             max     6.998708

16 rows × 1 columns

Wrap Up

In this notebook we illustrated how simple it is to add a worker to the Workbench project. We hope this exercise showed off some neato Workbench functionality, and we encourage you to check out the GitHub repository and our other notebooks: