A lightweight and flexible approach to task management, execution, and pipelining.
Open source tools have three primary components that are critical for adoption:
### How easy is it to get the system up and running?
### How easy is it to use?
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")
with open('evil.pcap', 'rb') as f:
    md5 = c.store_sample(f.read(), 'evil.pcap', 'pcap')
print c.work_request('pcap_meta', md5)
{'pcap_meta': {'encoding': 'binary',
               'file_size': 54339570,
               'file_type': 'tcpdump (little-endian) - version 2.4 (Ethernet, 65535)',
               'filename': 'evil.pcap',
               'import_time': '2014-02-08T22:15:50.282000Z',
               'md5': 'bba97e16d7f92240196dc0caef9c457a',
               'mime_type': 'application/vnd.tcpdump.pcap'}}
Run the workbench server (for this demo we'll just start a local one):
$ workbench_server
In [35]:
# Let's start to interact with workbench. Please note there is NO workbench-specific client;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")
Out[35]:
Workbench is often confusing for new users (we're trying to work on that). Please see our GitHub repository https://github.com/SuperCowPowers/workbench for the latest documentation and notebook examples (the notebook examples can really help). New users can start by typing **c.help()** after they connect to workbench.
In [36]:
# I forgot what stuff I can do with workbench
print c.help()
In [37]:
print c.help_basic()
In [38]:
# STEP 1:
# Okay get the list of commands from workbench
print c.help_commands()
In [39]:
# STEP 2:
# Let's get the information on a specific command, 'store_sample'
print c.help_command('store_sample')
In [40]:
# STEP 3:
# Now let's get information about the dynamically loaded workers (your site may have many more!)
# Next to each worker name is the list of dependencies that worker has declared
print c.help_workers()
In [41]:
# STEP 4:
# Let's get the information about the meta worker
print c.help_worker('meta')
In [42]:
# STEP 5:
# Okay, when we load up a file, we get the md5 back
filename = '../data/pe/bad/0cb9aa6fb9c4aa3afad7a303e21ac0f3'
with open(filename, 'rb') as f:
    my_md5 = c.store_sample(f.read(), filename, 'exe')
print my_md5
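As a quick aside, here's our own sanity-check sketch (not part of the original demo): assuming store_sample returns the md5 of the raw bytes, a locally computed hash should match.

# Sanity-check sketch (our addition): assumes store_sample returns the md5 of the raw bytes
import hashlib
with open(filename, 'rb') as f:
    local_md5 = hashlib.md5(f.read()).hexdigest()
print local_md5 == my_md5  # expect True under that assumption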
In [43]:
# STEP 6:
# Run a worker on my sample
output = c.work_request('meta', my_md5)
output
Out[43]:
In [44]:
# Let's see what view_pe does
print c.help_worker('view_pe')
In [45]:
# Okay, let's give it a try
c.work_request('view_pe', my_md5)
Out[45]:
In [46]:
# Okay, that worker needed the output of pe_features and pe_indicators.
# So what happened? The worker has a dependency list, and workbench
# recursively satisfies that dependency list. This is powerful: when we're
# interested in one particular analysis we just want to get the darn thing
# without having to worry about a bunch of details (a conceptual sketch of
# this resolution follows this cell). Well, let's do this for a bunch of files!
import os
file_list = [os.path.join('../data/pe/bad', child) for child in os.listdir('../data/pe/bad')]
working_set = []
for filename in file_list:
    with open(filename, 'rb') as f:
        md5 = c.store_sample(f.read(), filename, 'exe')
    results = c.work_request('pe_classifier', md5)
    working_set.append(md5)
    print 'Results: %s' % (results)
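To make that recursive dependency resolution concrete, here's a conceptual sketch (our own illustration, NOT workbench's actual code): 'workers' maps worker names to objects with a dependencies list and an execute() method, and 'datastore' stands in for the MongoDB backend.

# Conceptual sketch of recursive dependency resolution (illustrative only)
def resolve(worker_name, md5, workers, datastore):
    # If results for this (worker, md5) pair are already stored, reuse them
    cached = datastore.get((worker_name, md5))
    if cached is not None:
        return cached

    # Recursively satisfy the worker's declared dependencies first
    worker = workers[worker_name]
    input_data = {}
    for dep in worker.dependencies:
        input_data[dep] = resolve(dep, md5, workers, datastore)

    # Run the worker, push the results into the datastore, then pull them back out
    datastore[(worker_name, md5)] = worker.execute(input_data)
    return datastore[(worker_name, md5)]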
In [47]:
# We just ran the classifier on 50 files, and you'll note that we ONLY got back the
# information we asked for. On a large number of files (100k or greater), if you don't
# have a granular system, something this easy WILL NOT BE POSSIBLE! (dramatic enough?)
# So let's look at the features going into the classifier (btw, the classifier is currently a TOY EXAMPLE)
c.work_request('pe_features', md5)
Out[47]:
In [48]:
c.work_request('pe_indicators', md5)
Out[48]:
On another note, did we just waste some time there? Did workbench have to recompute the features? No. Everything done by workbench is pushed into the MongoDB backend, and if the work results for that md5 are already in the datastore, a very lightweight call is made to get the results. In fact, results are never directly returned: the worker pushes into Mongo, and then we pull the results out and hand them to the client. That way we *ensure* that the bits in the datastore and the bits that you get are the exact same 'gold bits' (seems like overkill, but it's important).
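A quick way to see this caching in action (our own sketch) is to time two identical requests. Note that at this point in the demo pe_features has likely already run for this md5 (view_pe depends on it), so both calls may hit the cache; on a fresh sample the first call would pay the compute cost.

# Timing sketch (our addition): repeated requests for the same md5 hit the datastore
import time
start = time.time()
c.work_request('pe_features', my_md5)  # computes only if results aren't already stored
print 'first call:  %.2f seconds' % (time.time() - start)
start = time.time()
c.work_request('pe_features', my_md5)  # served straight from the datastore
print 'second call: %.2f seconds' % (time.time() - start)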
In [49]:
# Another example: I want to look at strings for different types of files (not just PE files).
# So we can load up a few PDFs (the PEs are already in the datastore)
file_list = [os.path.join('../data/pdf/bad', child) for child in os.listdir('../data/pdf/bad')]
for filename in file_list:
    with open(filename, 'rb') as f:
        md5 = c.store_sample(f.read(), filename, 'pdf')
    working_set.append(md5)
In [50]:
# Now we run the strings worker on them all
for md5 in working_set:
    result = c.work_request('strings', md5)
    print 'results: %s' % (result['strings']['string_list'][:5])  # strings output is large, so just show the first 5
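As a small follow-on (our own sketch, reusing the same calls shown above), you can tally how many strings each sample produced:

# Tally extracted string counts per sample (sketch; reuses working_set from above)
for md5 in working_set:
    result = c.work_request('strings', md5)
    print '%s: %d strings' % (md5, len(result['strings']['string_list']))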
Views exemplify the true power of workbench. They are meta-workers in the broadest sense: they can call any set of workers (and other views, which are just workers, of course). All of the previous notebook code focused on demonstrating the level of control and granularity you can use with workbench; the view example here is for those who don't care about granularity and really just want a big 'GO' button.
Views can also be precise or general (the example below shows the latter):
- Customer billing View
- Sample volume over time View
- All samples that use communications calls View
- DO_EVERYTHING_BECAUSE_I_WANT_TO_PUNCH_GRANULARITY_IN_THE_NUTS! View
So let's look at the last kind. It's called 'view', and like many of the other workers it's 20 lines of code.
But it's deceptively simple: if you think about what must be happening below, over a dozen workers are getting orchestrated and run only when it makes sense for that MIME type. So with a few 'pull' calls the recursive dependency chains are invoked; work is done if and when it's needed, and the whole thing is fantastically elegant and efficient. If your mind isn't a little bit blown by what happens below, then you might not be paying attention.
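For a rough feel of what a view-style worker looks like, here's a minimal sketch (our own illustration, not the actual 'view' source; it assumes the dependencies-list/execute() worker convention, and the field names are illustrative):

# Minimal view-style worker sketch: declare dependencies, let workbench satisfy
# them recursively, then aggregate/subset the combined results in execute()
class ViewSketch(object):
    ''' Aggregate a few pieces of meta data into one small view (illustrative) '''
    dependencies = ['meta']

    def execute(self, input_data):
        meta = input_data['meta']
        return {'md5': meta['md5'],
                'file_type': meta['file_type'],
                'file_size': meta['file_size']}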
In [51]:
# Tag each file's type based on its parent directory, and grab all the file paths recursively
def tag_type(path):
    types = ['bro', 'json', 'log', 'pcap', 'pdf', 'exe', 'swf', 'zip']
    for try_type in types:
        if try_type in os.path.dirname(path):
            return try_type

file_list = []
for p, d, f_list in os.walk('../data'):
    file_list += [os.path.join(p, f) for f in f_list]
In [54]:
# We're going to load in all the files, which include PE files, PCAPs, PDFs, and ZIPs, and run 'view' on them.
# Note: this takes a while :)
import pprint
results = []
for filename in file_list:
    with open(filename, 'rb') as f:
        md5 = c.store_sample(f.read(), os.path.basename(filename), tag_type(filename))
    results.append(c.work_request('view', md5))
pprint.pprint(results[:5])
In [55]:
# Okay, so views can either aggregate results from multiple workers or they
# can subset to just what you want (webpage presentation, for instance)
results = c.batch_work_request('view_customer')
print results
In [56]:
# At this granularity it opens up a new world
import pandas as pd
df = pd.DataFrame(results)
df.head(10)
Out[56]:
In [57]:
# Let's look at the file submission types broken down by customer
df['count'] = 1
df.groupby(['customer','type_tag']).sum()
Out[57]:
In [58]:
# Plotting defaults
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 12.0
plt.rcParams['figure.figsize'] = 18.0, 8.0
In [59]:
# Plot box plots based on customer (PDFs)
df[df['type_tag']=='pdf'].boxplot('length','customer')
plt.xlabel('Customer')
plt.ylabel('File Size')
plt.title('File Length (PDF) by Customer')
plt.suptitle('')
Out[59]:
In [60]:
# Plot box plots based on customer (PEs)
df[df['type_tag']=='exe'].boxplot('length','customer')
plt.xlabel('Customer')
plt.ylabel('File Size')
plt.title('File Length (PE) by Customer')
plt.suptitle('')
Out[60]:
In [65]:
# Okay, now let's do some plots on the file metadata
results = c.batch_work_request('meta_deep')
In [66]:
df_meta = pd.DataFrame(results)
df_meta.head()
Out[66]:
In [67]:
# Plot entropy box plots based on file type
df_meta.boxplot('entropy','type_tag')
plt.xlabel('Mime Type')
plt.ylabel('Entropy')
Out[67]:
In [68]:
# Plot customer submissions based on file type
group_df = df[['customer','type_tag']]
group_df['submissions'] = 1
group_df = group_df.groupby(['customer','type_tag']).sum().unstack()
group_df.head()
Out[68]:
In [80]:
# Plot customer submissions as a stacked bar chart, broken down by file type
my_colors = [(x/9.0, .8, 1.0-x/9.0) for x in range(10)]  # Why the heck doesn't matplotlib have better categorical cmaps?
group_df['submissions'].plot(kind='bar', stacked=True, color=my_colors)
plt.xlabel('Customer')
plt.ylabel('Submissions')
Out[80]:
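As an aside on that categorical-color gripe, one workaround (our own sketch, not part of the original demo) is to sample one of the qualitative colormaps matplotlib does ship, such as Paired:

# Sample 10 evenly spaced colors from the qualitative 'Paired' colormap
from matplotlib import cm
my_colors = [cm.Paired(i / 10.0) for i in range(10)]
group_df['submissions'].plot(kind='bar', stacked=True, color=my_colors)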