WIP: DriveBy PCAP Analysis with Workbench:

Context: The company's Security Information and Event Management (SIEM) system is raising DNS alerts for traffic coming from 'that' guy's computer. The queried domains look like they might be DGA (Domain Generation Algorithm) based, so we pull some PCAPs to quickly find out what's going on.

Note: This notebook was inspired by the data_hacking notebook called DriveBy PCAP Analysis. Here we're leveraging the workbench server and focusing the notebook on a specific case captured by ThreatGlass. The exploited website for this exercise is kitchenboss.com.au ThreatGlass_Info_for_Kitchenboss (arbitrarily chosen).

Tools in this Notebook:

Workbench can be set up to utilize several indexers:

  • Straight up Indexing with ElasticSearch
  • Super awesome Neo4j as both an indexer and graph database.

Neo4j also incorporates Lucene based indexing so not only can we capture a rich set of relationships between our data entities but searches and queries are super quick.

Thanks:

  • Thanks to Eric Chavez (SeaDawg) for patiently telling me how computers work. Super shout out... :)
  • Thanks to ThreatGlass for providing a great service and website.
  • Thanks to Mike Sconzo for putting together the original notebook.

Let's start up the workbench server...

Run the workbench server (from somewhere, for the demo we're just going to start a local one)

$ workbench_server

Okay, so when the server starts up it autoloads any worker plugins in the server/worker directory and then dynamically monitors that directory. If a new Python file shows up, it's validated as a properly formed plugin and, if it passes, is added to the list of workers.


In [10]:
# Let's start to interact with workbench. Please note there is NO specific client for workbench;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")


Out[10]:
[None]

Read in the Data

The data is pulled from [ThreatGlass](http://www.threatglass.com/). The exploited website for this exercise is kitchenboss.com.au [ThreatGlass_Info_for_Kitchenboss](http://www.threatglass.com/malicious_urls/60d4098703770bd93c70dbb2f74ba1fb?process_date=2014-04-09)


In [11]:
# Load in the PCAP file
filename = '../data/pcap/kitchen_boss.pcap'
with open(filename,'rb') as f:
    pcap_md5 = c.store_sample(f.read(), filename, 'pcap')

Run the PCAP through Bro

Workbench makes running Bro super easy: it manages the PCAPs and the resulting Bro logs. Perhaps most importantly, it allows us to pull back specific logs with super awesome client/server generators for the data that we want to analyze.

Note: we only have one PCAP in this case, but running sets of PCAPs through Bro is well supported; please see our Batches_and_Sets notebook for more information.


In [13]:
# Run the Bro Network Security Monitor on the PCAP (or set of PCAPS) we just loaded.
# We could run several requests here... 'pcap_bro','view_pcap' or 'view_pcap_details',
# workbench is super granular and it's easy to try them all and add your own as well
output = c.work_request('view_pcap_details', pcap_md5)['view_pcap_details']
output


Out[13]:
{'bro_logs': {'conn_log': 'bd20b5e153d44bf43f95838540230200',
  'dhcp_log': '55e8aead94d8edf48e636e39c65e846e',
  'dns_log': '14873855a02c7e3952bf75cc2cf74d20',
  'files_log': 'a3e84dcbfecb881ad81123fe109bf9e8',
  'http_log': 'e58c1abb0182e9c48c4e5e37e68fa875',
  'packet_filter_log': 'c89d73c9ff2a12b66cdf74007af8a952',
  'weird_log': 'e872ecdb518caf4dcddc1044b90962c5'},
 'connectionId': 130,
 'err': None,
 'extracted_files': [{'entropy': 7.889679107924797,
   'file_size': 18643,
   'file_type': 'Java Jar file data (zip)',
   'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
   'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
   'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII'},
  {'entropy': 7.515107193655836,
   'file_size': 273920,
   'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',
   'md5': '4410133f571476f2e76e29e61767b557',
   'sha256': 'e4bbdc8f869502183293797f51d6d64cc6c49d39b82effbcb738abe511054b51',
   'ssdeep': '6144:RSxqC+ayi6eWLj622ARbJFMQzynbJDxL3oPlRa:oxqC+ayi6p6EmQz+bf3otA'},
  {'entropy': 7.889679107924797,
   'file_size': 18643,
   'file_type': 'Java Jar file data (zip)',
   'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
   'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
   'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII'},
  {'entropy': 7.9751436379093406,
   'file_size': 7724,
   'file_type': 'Macromedia Flash data (compressed), version 9',
   'md5': '16cf037b8c8caad6759afc8c309de0f9',
   'sha256': '8af130ffe1140894895225e265d5d9a753ad9d85883db742c88f13bae01e2c30',
   'ssdeep': '192:v27CdZAdM5nmNvipKRRkLozlrEKcgTtpEV3+5jxChbNT3:v27QZAAVpKRRkLu/cOEa9ORD'},
  {'entropy': 7.9751436379093406,
   'file_size': 7724,
   'file_type': 'Macromedia Flash data (compressed), version 9',
   'md5': '16cf037b8c8caad6759afc8c309de0f9',
   'sha256': '8af130ffe1140894895225e265d5d9a753ad9d85883db742c88f13bae01e2c30',
   'ssdeep': '192:v27CdZAdM5nmNvipKRRkLozlrEKcgTtpEV3+5jxChbNT3:v27QZAAVpKRRkLu/cOEa9ORD'},
  {'entropy': 7.889679107924797,
   'file_size': 18643,
   'file_type': 'Java Jar file data (zip)',
   'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
   'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
   'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII'},
  {'entropy': 7.889679107924797,
   'file_size': 18643,
   'file_type': 'Java Jar file data (zip)',
   'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
   'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
   'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII'},
  {'entropy': 7.889679107924797,
   'file_size': 18643,
   'file_type': 'Java Jar file data (zip)',
   'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
   'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
   'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII'},
  {'entropy': 7.284164074455011,
   'file_size': 13006,
   'file_type': 'PDF document, version 1.6',
   'md5': '40b8c3c98f50e078251ec272620dfb5b',
   'sha256': '5dac98c66c7440992de9de860dd4312790bcc7bbcbb4aebfdce8b3fdfcfc56af',
   'ssdeep': '384:jW3+XiQMTXgazaO/J9m1DfN6abiiYEVB0T0DM:Ph+fJSE2r0j'}],
 'md5': 'df4f4a2e2bf020be50a12554942edb88',
 'n': 1,
 'ok': 1.0,
 'updatedExisting': False}

Workbench's default mode is to run Bro with file extraction.

We can see that there are quite a few embedded files in this PCAP:

  • 5 Java JAR files (all the same)
  • 2 Flash files (same)
  • 1 PE executable file
  • 1 PDF file
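The tally above can be reproduced with a few lines of Python. This is a small sketch rather than part of the original run: the `file_types` list is hand-copied from the 'file_type' fields in the output above so that the snippet runs on its own. Against a live server you would build it from output['extracted_files'] instead.

```python
from collections import Counter

# 'file_type' values hand-copied, in order, from output['extracted_files'] above.
# With a live server: file_types = [f['file_type'] for f in output['extracted_files']]
file_types = [
    'Java Jar file data (zip)',
    'PE32 executable (GUI) Intel 80386, for MS Windows',
    'Java Jar file data (zip)',
    'Macromedia Flash data (compressed), version 9',
    'Macromedia Flash data (compressed), version 9',
    'Java Jar file data (zip)',
    'Java Jar file data (zip)',
    'Java Jar file data (zip)',
    'PDF document, version 1.6',
]

# Count how many times each file type was extracted from the PCAP
type_counts = Counter(file_types)
for file_type, count in type_counts.most_common():
    print(count, file_type)
```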

In [8]:
# We'll grab the md5s for those files and do some kewl stuff with them later
file_md5s = list(set([item['md5'] for item in output['extracted_files']]))
pe_md5 = '4410133f571476f2e76e29e61767b557'
file_md5s


Out[8]:
['16cf037b8c8caad6759afc8c309de0f9',
 '40b8c3c98f50e078251ec272620dfb5b',
 'c762b6ba4f560692b6b84ac212cd3ec2',
 '4410133f571476f2e76e29e61767b557']

In [9]:
# Grab the Bro logs that we want
dns_log = c.stream_sample(output['bro_logs']['dns_log'])
http_log = c.stream_sample(output['bro_logs']['http_log'])
files_log = c.stream_sample(output['bro_logs']['files_log'])
dns_log


Out[9]:
<generator object iterator at 0x107e46d70>

Yep they are generators

Yes generators are awesome but getting one from a server request! Are u serious?! Yes, dead serious.. like chopping off your head and kicking your body into a shallow grave and putting your head on a stick... serious.

For more on client/server generators and client-constructed/server-executed generator pipelines see our super spiffy Generator Pipelines notebook.

Now that we have a server generator from workbench we can push it into a Pandas Dataframe without a copy, fast and memory efficient...

Data Transformation: One line of code to put workbench output into a Pandas Dataframe!

Putting our data into a Pandas Dataframe opens up a new world and enables tons of functionality for data, temporal and statistical analysis.

df = pd.DataFrame(output)

In [10]:
import pandas as pd

# Okay take the generators returned by stream_sample and efficiently create dataframes
# LIKE BUTTER I TELL YOU!
dns_df = pd.DataFrame(dns_log)
http_df = pd.DataFrame(http_log)
files_df = pd.DataFrame(files_log)
files_df.head()


Out[10]:
analyzers conn_uids depth duration extracted filename fuid is_orig local_orig md5 mime_type missing_bytes overflow_bytes parent_fuid rx_hosts seen_bytes sha1 sha256 source timedout
0 SHA256,MD5,SHA1 CBsm7k2L1ReG6MzFbj 0 0.168808 - - FL607G2jztRHr8Xbz F - bfd039047ebd33f25ffe16d7832d6ceb text/html 0 0 - 192.168.22.10 13944 29fba8185043e6c3748ce6c85f9b1b70fe9c4326 e7517bd0b1654151b5e7284bb8e8b9ec49ce411ff5e1f1... HTTP F ...
1 SHA256,MD5,SHA1 Cz5NkR1gnqOMWUSbB8 0 0.000000 - - FH1gD44gUcIfJgq2J2 F - 14625ee5228c694cf0767e09d12a8d1f text/plain 0 0 - 192.168.22.10 98 5676b357553c6d55f3361dbfab460e3268cb3b55 dfaa8766fad53785e137643e5c685926338f274ec21508... HTTP F ...
2 SHA256,MD5,SHA1 CnHfPx8bQQg2aeWo5 0 0.000056 - - FC8A2X2Jla1xMWhr8 F - 892a543f3abb54e8ec1ada55be3b0649 text/plain 0 0 - 192.168.22.10 10220 5847ed101f55d51c53538a7078971e7de8fb6762 8677971b119ccdb82af697ff0e08f218490d15116f221d... HTTP F ...
3 SHA256,MD5,SHA1 CQUPQx2b0CMrWv1Dag 0 0.000000 - - F3fwSC4WmD0PKQ6HHg F - 0ce8f355891c26c28f057e195e97dcd5 text/html 0 0 - 192.168.22.10 2429 3c7b369485cadd585d24be44701e459c8aa54d60 8c7a9c0470563367ab00307b4fb9bb3052d0a27f0b94e6... HTTP F ...
4 SHA256,MD5,SHA1 CnB5Oj1YktY5UFZu3k 0 0.000047 - - FLK1KW2a1Bwh6FE6hg F - 641cad6161527eb7cdabd4485637634e text/html 0 0 - 192.168.22.10 4022 4bc9306998175f909b167734dad41bd5a6589c82 97b0566bfad0e84bc0eb0db538e66b5dc103a878eb142e... HTTP F ...

5 rows × 23 columns

So I'm confused, what just happened?

  • We sent a PCAP file to the workbench server
  • Workbench stores it in a database (MongoDB now, maybe Vertica later)
  • We made a work request 'view_pcap_details' on the PCAP (runs Bro and other stuff)
  • We pulled back just the parts of the Bro output that we specifically wanted
  • We got a set of client/server generators, we populated Pandas dataframes
  • In like a dozen lines of python!

The image below shows the Workbench database: each worker stores its data in a separate collection. The data is transparent, organized and accessible.

Let's look at the DNS Data

Given that this exercise started because the SIEM was flagging DNS traffic, we start there. Now that our Bro log data has been streamed into a Pandas Dataframe we can do all kinds of wonderful things. See Pandas to behold the awesome.


In [11]:
dns_df.head()


Out[11]:
AA RA RD TC TTLs Z answers id.orig_h id.orig_p id.resp_h id.resp_p proto qclass qclass_name qtype qtype_name query rcode rcode_name rejected
0 F T T F 14372 0 111.223.225.83 192.168.22.10 1035 4.2.2.3 53 udp 1 C_INTERNET 1 A kitchenboss.com.au 0 NOERROR F ...
1 F T T F 14371.000000,14371.000000 0 kitchenboss.com.au,111.223.225.83 192.168.22.10 1035 4.2.2.3 53 udp 1 C_INTERNET 1 A www.kitchenboss.com.au 0 NOERROR F ...
2 F T T F 2601.000000,128.000000 0 googleapis.l.google.com,74.125.128.95 192.168.22.10 1035 4.2.2.3 53 udp 1 C_INTERNET 1 A fonts.googleapis.com 0 NOERROR F ...
3 F T T F 2587.000000,128.000000 0 googleapis.l.google.com,74.125.128.95 192.168.22.10 1035 4.2.2.3 53 udp 1 C_INTERNET 1 A ajax.googleapis.com 0 NOERROR F ...
4 F T T F 36507.000000,271.000000 0 googlecode.l.googleusercontent.com,74.125.128.82 192.168.22.10 1042 4.2.2.3 53 udp 1 C_INTERNET 1 A html5shim.googlecode.com 0 NOERROR F ...

5 rows × 23 columns


In [61]:
dns_df[['query','answers','qtype_name']]


Out[61]:
query answers qtype_name
time
2014-04-09 06:38:54.505653 kitchenboss.com.au 111.223.225.83 A
2014-04-09 06:38:55.370111 www.kitchenboss.com.au kitchenboss.com.au,111.223.225.83 A
2014-04-09 06:38:56.773263 fonts.googleapis.com googleapis.l.google.com,74.125.128.95 A
2014-04-09 06:38:56.814285 ajax.googleapis.com googleapis.l.google.com,74.125.128.95 A
2014-04-09 06:38:56.816699 html5shim.googlecode.com googlecode.l.googleusercontent.com,74.125.128.82 A
2014-04-09 06:38:58.021865 themes.googleusercontent.com googlehosted.l.googleusercontent.com,173.194.1... A
2014-04-09 06:38:58.874743 www.google-analytics.com www-google-analytics.l.google.com,74.125.128.1... A
2014-04-09 06:39:00.805155 fpdownload2.macromedia.com fpdownload2.wip4.adobe.com,fpdownload.macromed... A
2014-04-09 06:39:00.709656 advertdedicated.com 217.12.199.174 A
2014-04-09 06:39:02.607899 p22x62n0yr63872e-qh6.focondteavrt.ru 142.4.194.92 A
2014-04-09 06:39:07.436915 2496128308-6.focondteavrt.ru 142.4.194.92 A
2014-04-09 06:39:10.103892 92.194.4.142.in-addr.arpa - PTR

12 rows × 3 columns

Two things make us nervous about this list

  • The RU domains look like they might be DGA domains (see the DGA Notebook)
  • Also it's a bit weird that the PTR query didn't come back with an answer. Wikipedia: Forward_Confirmed_reverse_DNS has the following: 'Common DNS misconfigurations are outlined in RFC 1912, of particular note is section 2.1 that states, under the heading "Inconsistent, Missing or Bad Data", "Make sure your PTR and A records match".'
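To make the "looks like DGA" hunch a little more concrete, here is a deliberately naive sketch (Python 3) that scores the leftmost label of each queried domain by character entropy and by the fraction of digits. This is not the method used in the DGA Notebook; the function names and both thresholds are made up for illustration and have not been tuned on any data.

```python
import math
from collections import Counter

def label_stats(domain):
    """Return (entropy, digit_ratio) for the leftmost label of a domain."""
    label = domain.split('.')[0]
    counts = Counter(label)
    # Shannon entropy (bits per character) of the label
    entropy = -sum((n / len(label)) * math.log2(n / len(label)) for n in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in label) / len(label)
    return entropy, digit_ratio

def looks_generated(domain, max_entropy=3.5, max_digit_ratio=0.3):
    """Naive flag; the thresholds are arbitrary illustration values."""
    entropy, digit_ratio = label_stats(domain)
    return entropy > max_entropy or digit_ratio > max_digit_ratio

# A-record queries copied from the DNS table above
queries = [
    'kitchenboss.com.au',
    'www.kitchenboss.com.au',
    'fonts.googleapis.com',
    'html5shim.googlecode.com',
    'fpdownload2.macromedia.com',
    'advertdedicated.com',
    'p22x62n0yr63872e-qh6.focondteavrt.ru',
    '2496128308-6.focondteavrt.ru',
]

for query in queries:
    print(looks_generated(query), query)
```

On this short list only the two focondteavrt.ru hosts are flagged, but a heuristic this simple will misfire on plenty of legitimate names (CDN hosts, for example), so treat it as a starting point only.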

Let's look at the HTTP Data

We want to look at the HTTP data to see what kind of data was transferred from the domains of interest.


In [27]:
# Now we group by host and show the different response mime types for each host
group_host = http_df.groupby(['host','uid','resp_mime_types','uri'])[['response_body_len']].sum()
group_host.head(10)


Out[27]:
response_body_len
host uid resp_mime_types uri
2496128308-6.focondteavrt.ru C430GXDwsHcaJQOJa - /f/1397004360/2/2 0
application/jar /1397004360.jar 37286
application/x-dosexec /f/1397004360/2 273920
CqcNvW15fd2wz3n943 application/jar /1397004360.jar 55929
Csbo4X2kd5yinra0Df application/pdf /1397004360.pdf 13006
advertdedicated.com CAPqz5JkJVfWC7h6 text/plain /jQuery.js?id=AJAX&PID=1i&cache=91938.89965358726 13160
ajax.googleapis.com C0jJNB2DFc48BN6IUe text/plain /ajax/libs/swfobject/2.2/swfobject.js 10220
C2Oaxl1r99xWV2Du - /ajax/libs/jquery/1.8.3/jquery.min.js 0
CKE2fs4wR6cxeEpx2g text/plain /ajax/libs/jquery/1.8.3/jquery.min.js 93637
CecMiUQirdvhCW0ja - /ajax/libs/swfobject/2.2/swfobject.js 0

10 rows × 1 columns

Okay so now that we're looking at the http data we're not feeling any better

  • The grouped dataframe above clearly shows three distinct files being downloaded from the '2496128308-6.focondteavrt.ru' host: a JAR file, a PE file and a PDF file.
  • The JAR file was pulled over two separate HTTP connections, and the JAR/PE file combination was pulled in the same HTTP connection (highly suspicious).
  • Also the URIs look super odd: they all share the same pattern, as if they were part of some 'attack bundle'.
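Two of these observations can be checked with a little arithmetic. The sketch below is self-contained: the rows are hand-copied from the grouped table above and the JAR size comes from the 'view_pcap_details' output. It pulls out the long numeric token shared by the URIs and tests whether the summed JAR byte counts are whole multiples of the JAR file size, which would point to repeated downloads of the same file.

```python
import re

JAR_SIZE = 18643  # 'file_size' of the JAR in the view_pcap_details output

# (uri, summed response_body_len) rows for 2496128308-6.focondteavrt.ru,
# hand-copied from the grouped table above
rows = [
    ('/f/1397004360/2/2', 0),
    ('/1397004360.jar', 37286),
    ('/f/1397004360/2', 273920),
    ('/1397004360.jar', 55929),
    ('/1397004360.pdf', 13006),
]

# Collect the long numeric token embedded in each URI
tokens = {re.search(r'\d{6,}', uri).group() for uri, _ in rows}
print(tokens)

# How many copies of the JAR does each summed byte count correspond to?
jar_copies = [size // JAR_SIZE for uri, size in rows
              if uri.endswith('.jar') and size % JAR_SIZE == 0]
print(jar_copies)
print(sum(jar_copies))
```

Every URI carries the same token, and the two JAR totals work out to two and three copies of the 18643-byte file, which lines up with the five JAR entries in the extracted file list.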

Now let's look at the Files Data

Okay, last but certainly not least, we want to do a deep dive into the files that were downloaded to the computer, so we pull out the list of md5s that we saved earlier in the notebook from the workbench 'view_pcap_details' output. Note that the batch request below also returns a client/server generator (see Generator_Pipelines Notebook for more information).

Note: Workbench is in desperate need of workers for PDF, SWF, and JAR files. If you'd like to contribute please contact briford@supercowpowers.com :)

For now we'll just look at some VirusTotal results and take a quick peek at the SWF and PE files.


In [28]:
# Get Meta-data for each of the extracted files from the PCAP
file_views = c.batch_work_request('meta_deep',{'md5_list':file_md5s})
[view for view in file_views]


Out[28]:
[{'encoding': 'binary',
  'entropy': 7.9751436379093406,
  'file_size': 7724,
  'file_type': 'Macromedia Flash data (compressed), version 9',
  'filename': 'HTTP-FhsNCh3n8L16lFIiXe.swf',
  'import_time': '2014-04-15T16:34:48.142000Z',
  'md5': '16cf037b8c8caad6759afc8c309de0f9',
  'mime_type': 'application/x-shockwave-flash',
  'sha1': 'e60071cf2e1460c26449f3a464ef8861043146ed',
  'sha256': '8af130ffe1140894895225e265d5d9a753ad9d85883db742c88f13bae01e2c30',
  'ssdeep': '192:v27CdZAdM5nmNvipKRRkLozlrEKcgTtpEV3+5jxChbNT3:v27QZAAVpKRRkLu/cOEa9ORD',
  'type_tag': 'swf'},
 {'encoding': 'binary',
  'entropy': 7.284164074455011,
  'file_size': 13006,
  'file_type': 'PDF document, version 1.6',
  'filename': 'HTTP-FwHJvw10MLj8tTj9O8.pdf',
  'import_time': '2014-04-15T16:34:48.153000Z',
  'md5': '40b8c3c98f50e078251ec272620dfb5b',
  'mime_type': 'application/pdf',
  'sha1': '1200d521f453fe03596a50cfc401963f3fd15d76',
  'sha256': '5dac98c66c7440992de9de860dd4312790bcc7bbcbb4aebfdce8b3fdfcfc56af',
  'ssdeep': '384:jW3+XiQMTXgazaO/J9m1DfN6abiiYEVB0T0DM:Ph+fJSE2r0j',
  'type_tag': 'pdf'},
 {'encoding': 'binary',
  'entropy': 7.889679107924797,
  'file_size': 18643,
  'file_type': 'Java Jar file data (zip)',
  'filename': 'HTTP-F5XuvS22aJWC2yHiRl.jar',
  'import_time': '2014-04-15T16:34:48.128000Z',
  'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
  'mime_type': 'application/jar',
  'sha1': '81721697e7d4538137bab8efd7e29b4003694294',
  'sha256': 'c776c5f3b979233c8466fc521e38271bbd59081538e126273fe1a75a228bd25d',
  'ssdeep': '384:7SXliKrIvBZFzoceSZNZ2wLk588eHYBAXGIsMeV:AiKrKySZNxg5892SGII',
  'type_tag': 'jar'},
 {'encoding': 'binary',
  'entropy': 7.515107193655836,
  'file_size': 273920,
  'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',
  'filename': 'HTTP-F6daIS1TRjI9X6r873.exe',
  'import_time': '2014-04-15T16:34:48.135000Z',
  'md5': '4410133f571476f2e76e29e61767b557',
  'mime_type': 'application/x-dosexec',
  'sha1': '035db69cc80fc56717a42646911d9aa95b2ff39e',
  'sha256': 'e4bbdc8f869502183293797f51d6d64cc6c49d39b82effbcb738abe511054b51',
  'ssdeep': '6144:RSxqC+ayi6eWLj622ARbJFMQzynbJDxL3oPlRa:oxqC+ayi6p6EmQz+bf3otA',
  'type_tag': 'exe'}]

In [29]:
# VirusTotal Queries (as of 4-20-2014)
vt_output = c.batch_work_request('vt_query', {'md5_list':file_md5s})
# list() avoids rebinding the earlier 'output' variable (Python 2 list comprehensions leak their loop variable)
list(vt_output)


Out[29]:
[{'file_type': 'Macromedia Flash data (compressed), version 9',
  'md5': '16cf037b8c8caad6759afc8c309de0f9',
  'positives': 0,
  'scan_date': '2014-04-14 21:15:29',
  'scan_results': [],
  'total': 51},
 {'file_type': 'PDF document, version 1.6',
  'md5': '40b8c3c98f50e078251ec272620dfb5b',
  'not_found': True},
 {'file_type': 'Java Jar file data (zip)',
  'md5': 'c762b6ba4f560692b6b84ac212cd3ec2',
  'positives': 11,
  'scan_date': '2014-04-11 16:29:27',
  'scan_results': [['Exploit-FUG!C762B6BA4F56', 2],
   ['TROJ_GEN.F47V0408', 1],
   ['Exploit:Java/CVE-2012-1723', 1],
   ['UnclassifiedMalware', 1],
   ['Exploit.Zip.CVE20121723.crxrbn', 1]],
  'total': 50},
 {'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',
  'md5': '4410133f571476f2e76e29e61767b557',
  'not_found': True}]
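A compact way to read these results is as a detection ratio per file. The sketch below is illustrative only: the dictionaries are trimmed, hand-copied versions of the 'vt_query' output above, and `summarize` is a hypothetical helper, not a workbench call.

```python
# Trimmed, hand-copied fields from the vt_query output above
vt_results = [
    {'md5': '16cf037b8c8caad6759afc8c309de0f9', 'positives': 0, 'total': 51},
    {'md5': '40b8c3c98f50e078251ec272620dfb5b', 'not_found': True},
    {'md5': 'c762b6ba4f560692b6b84ac212cd3ec2', 'positives': 11, 'total': 50},
    {'md5': '4410133f571476f2e76e29e61767b557', 'not_found': True},
]

def summarize(result):
    """Return a short verdict string for one vt_query style result (hypothetical helper)."""
    if result.get('not_found'):
        return 'not in VirusTotal'
    percent = 100.0 * result['positives'] / result['total']
    return '{}/{} engines ({:.0f}%)'.format(result['positives'], result['total'], percent)

summaries = {r['md5']: summarize(r) for r in vt_results}
for md5, verdict in summaries.items():
    print(md5, verdict)
```

Keep in mind that "not in VirusTotal" says nothing either way about a file; it only means nobody had submitted that hash when the query ran.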

In [30]:
# Well VirusTotal only found two of the files (SWF and JAR). The SWF has
# zero positives (we're going to take that with a grain of salt). The PDF
# and PE files don't even show up. So we'll take a closer look at the SWF
# and PE file with some of the workers in workbench.
swf_view = c.work_request('swf_meta','16cf037b8c8caad6759afc8c309de0f9')
swf_view


Out[30]:
{'swf_meta': {'compressed': True,
  'encoding': 'binary',
  'file_length': 13215,
  'file_size': 7724,
  'file_type': 'Macromedia Flash data (compressed), version 9',
  'filename': 'HTTP-FhsNCh3n8L16lFIiXe.swf',
  'frame_count': 1,
  'frame_rate': 20.0,
  'frame_size': '[xmin: 0 xmax: 300 ymin: 0 ymax: 250]',
  'import_time': '2014-04-15T16:34:48.142000Z',
  'md5': '16cf037b8c8caad6759afc8c309de0f9',
  'mime_type': 'application/x-shockwave-flash',
  'tags': ['[69:FileAttributes] useDirectBlit: 0, useGPU: 0, hasMetadata: 1, actionscript3: 1, useNetwork: 1',
   '[77:Metadata]',
   '[09:SetBackgroundColor] Color: #ffffffff',
   '[43:FrameLabel]',
   '[39:DefineSprite] ID: 2',
   '[82:DoABC]',
   '[76:SymbolClass]',
   '[01:ShowFrame]',
   '[00:End]'],
  'type_tag': 'swf',
  'version': 9}}

Hmmm... not sure...

We need better SWF workers obviously, but naively we can see that actionscript and useNetwork are enabled, also the DoABC (Actionscript ByteCode) tag is there. The SWF file warrants further investigation.


In [31]:
pe_view = c.work_request('pe_indicators', '4410133f571476f2e76e29e61767b557')
pe_view


Out[31]:
{'pe_indicators': {'indicator_list': [{'attributes': ['gettickcount',
     'queryperformancecounter',
     'isdebuggerpresent'],
    'category': 'ANTI_DEBUG',
    'description': 'Imported symbols related to anti-debugging',
    'severity': 3},
   {'attributes': ['cocreateinstance'],
    'category': 'COM_SERVICES',
    'description': 'Imported symbols related to COM or Services',
    'severity': 3},
   {'attributes': ['.malina', '.ndata', '.mdata'],
    'category': 'MALFORMED',
    'description': 'Section(s) with a non-standard name, tamper indication',
    'severity': 3},
   {'attributes': ['getmodulefilenamea',
     'getmodulefilenamew',
     'getmodulehandlea',
     'getstartupinfow',
     'getmodulehandlew'],
    'category': 'PROCESS_MANIPULATION',
    'description': 'Imported symbols related to process manipulation/injection',
    'severity': 3},
   {'attributes': ['filetimetosystemtime', 'getsystemtimeasfiletime'],
    'category': 'PROCESS_SPAWN',
    'description': 'Imported symbols related to spawning a new process',
    'severity': 2},
   {'attributes': ['loadlibraryw', 'getprocaddress'],
    'category': 'STEALTH_LOAD',
    'description': 'Imported symbols related to loading libraries, resources, etc in a sneaky way',
    'severity': 2},
   {'attributes': ['findfirstfilea', 'findnextfilea'],
    'category': 'SYSTEM_PROBE',
    'description': 'Imported symbols related to probing the system',
    'severity': 2},
   {'attributes': ['setfiletime', 'createfilea', 'createfilew'],
    'category': 'SYSTEM_STATE',
    'description': 'Imported symbols related to changing system state',
    'severity': 1}],
  'md5': '4410133f571476f2e76e29e61767b557'}}

Whoa there's a lot of 'interesting' stuff in that executable

Clearly we're not making a definitive statement here, but with all the indicators above we strongly suspect that the executable is malicious; at the very least it warrants further investigation.
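For triage it can help to group the indicator categories by severity. The sketch below assumes that a higher 'severity' number means a more serious indicator, which the output itself does not state; the (category, severity) pairs are hand-copied from the 'pe_indicators' output above.

```python
from collections import defaultdict

# (category, severity) pairs hand-copied from the pe_indicators output above
indicators = [
    ('ANTI_DEBUG', 3),
    ('COM_SERVICES', 3),
    ('MALFORMED', 3),
    ('PROCESS_MANIPULATION', 3),
    ('PROCESS_SPAWN', 2),
    ('STEALTH_LOAD', 2),
    ('SYSTEM_PROBE', 2),
    ('SYSTEM_STATE', 1),
]

# Group category names under their severity value
by_severity = defaultdict(list)
for category, severity in indicators:
    by_severity[severity].append(category)

# Print the highest severity first (assumption: bigger number = more serious)
for severity in sorted(by_severity, reverse=True):
    print(severity, by_severity[severity])
```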

Network Context

Neo4j: Origin of the four files

Okay we're now pretty sure that at least two of the four files are bad, but exactly where did the files come from within the context of the network information that we can extract from the PCAP file? The image on the right shows all of the relevant information that we gather using the 'pcap_graph' worker called below. The blue nodes are the four files (a close-up image is given below). All graph images were pulled by simply going to the Neo4j graphical interface at http://localhost:7474/browser/.


In [37]:
graph = c.work_request('pcap_graph', pcap_md5)
graph


Out[37]:
{'pcap_graph': {'md5': 'df4f4a2e2bf020be50a12554942edb88',
  'output': 'graph_complete'}}

Full Graph

This graph image below was generated by going to http://localhost:7474/browser and executing this query

match (n)-[r]-() return n,r

File Graph

This graph image below which focuses on the files themselves and the path they took to get to our infected host (orange node in the middle) was generated by going to http://localhost:7474/browser and executing this query

match (s:file),(t{name:'192.168.22.10'}), p=shortestPath((s)--(t)) return p

Timing: File Downloads and DNS requests

Looking at the file graph above, the SWF looks even more suspicious (even though VT has 0 hits out of 51 as of 4-20-2014). So we want to take a look at the timing of the file downloads and DNS requests.



Okay what you'd really do here is an RE analysis of the SWF file to see if it is indeed using something like navigateToURL or one of the many other network functions to hit the focondteavrt.ru domain. We would ^love^ for some super nice person to help us out with this and put the results into this notebook. :) You can get the SWF File here SWF_File (obviously proceed with caution!).

But for now we're going to look at timing data which is admittedly a bit more circumstantial.


In [22]:
# Let's look at the timing of the DNS requests and the file downloads

# Make a new column in both dataframes with a proper datetime stamp
dns_df['time'] = pd.to_datetime(dns_df['ts'], unit='s')
files_df['time'] = pd.to_datetime(files_df['ts'], unit='s')

# Now make time the new index for both dataframes
dns_df.set_index(['time'], inplace=True)
files_df.set_index(['time'], inplace=True)

Dataframes are great for subsetting and filtering/masking

Below we show several examples where we're just interested in particular columns and rows.

  • First: We're only interested in the set of files above (captured in the Python list called 'file_md5s') and the domains of interest (captured below in the 'domains' list).
  • Second: We want to look at only a few columns from both the files dataframe and the dns dataframe.
  • Third: The results of the operations are placed into the 'interesting_files' and 'interesting_dns' dataframes and then concatenated together.

In [54]:
interesting_files = files_df[files_df['md5'].isin(file_md5s)]

In [62]:
domains = ['kitchenboss.com.au','www.kitchenboss.com.au','p22x62n0yr63872e-qh6.focondteavrt.ru',
           '2496128308-6.focondteavrt.ru','92.194.4.142.in-addr.arpa']
interesting_dns = dns_df[dns_df['query'].isin(domains)]

In [64]:
all_time = pd.concat([interesting_dns[['query','answers','qtype_name']], interesting_files[['md5','mime_type','tx_hosts']]])
all_time.sort_index(inplace=True)
all_time


Out[64]:
answers md5 mime_type qtype_name query tx_hosts
time
2014-04-09 06:38:54.505653 111.223.225.83 NaN NaN A kitchenboss.com.au NaN
2014-04-09 06:38:55.370111 kitchenboss.com.au,111.223.225.83 NaN NaN A www.kitchenboss.com.au NaN
2014-04-09 06:39:00.511352 NaN 16cf037b8c8caad6759afc8c309de0f9 application/x-shockwave-flash NaN NaN 111.223.225.83
2014-04-09 06:39:00.511405 NaN 16cf037b8c8caad6759afc8c309de0f9 application/x-shockwave-flash NaN NaN 111.223.225.83
2014-04-09 06:39:02.607899 142.4.194.92 NaN NaN A p22x62n0yr63872e-qh6.focondteavrt.ru NaN
2014-04-09 06:39:07.436915 142.4.194.92 NaN NaN A 2496128308-6.focondteavrt.ru NaN
2014-04-09 06:39:08.690350 NaN c762b6ba4f560692b6b84ac212cd3ec2 application/jar NaN NaN 142.4.194.92
2014-04-09 06:39:09.285793 NaN c762b6ba4f560692b6b84ac212cd3ec2 application/jar NaN NaN 142.4.194.92
2014-04-09 06:39:09.862145 NaN c762b6ba4f560692b6b84ac212cd3ec2 application/jar NaN NaN 142.4.194.92
2014-04-09 06:39:10.103892 - NaN NaN PTR 92.194.4.142.in-addr.arpa NaN
2014-04-09 06:39:12.944153 NaN 40b8c3c98f50e078251ec272620dfb5b application/pdf NaN NaN 142.4.194.92
2014-04-09 06:39:34.991542 NaN c762b6ba4f560692b6b84ac212cd3ec2 application/jar NaN NaN 142.4.194.92
2014-04-09 06:39:35.784163 NaN c762b6ba4f560692b6b84ac212cd3ec2 application/jar NaN NaN 142.4.194.92
2014-04-09 06:39:36.369731 NaN 4410133f571476f2e76e29e61767b557 application/x-dosexec NaN NaN 142.4.194.92

14 rows × 6 columns

Discussion: Sequence of File Downloads and DNS requests

Looking at the table above we see the following sequence of events:

  • (Beginning) DNS queries for kitchenboss.com.au
  • (+5 seconds) SWF file downloaded
  • (+2 seconds) DNS query for p22x62n0yr63872e-qh6.focondteavrt.ru
  • (+5 seconds) DNS query for 2496128308-6.focondteavrt.ru
  • (+1 second) 3 JAR file downloads (same one) very close together
  • (+0.2 seconds) DNS reverse (PTR) query for 142.4.194.92 with no answer
  • (+3 seconds) PDF file download
  • (+22 seconds) 2 JAR file downloads (same one)
  • (+0.6 seconds) PE executable file download


So given that the PCAP was captured on a VM that isn't doing anything else, and looking at the timing above, we could surmise that perhaps the SWF initiated the connection to the focondteavrt.ru domains (circumstantial obviously). As mentioned above, the right thing to do here would be to conduct an RE on the SWF file, and we would ^love^ for some super nice person to help us out with this and put the results into this notebook. :)
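The gaps between events can be recomputed directly from the timestamps in the table. The sketch below is self-contained: the timestamps are hand-copied from the 'all_time' dataframe above, and the event labels are our own shorthand.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S.%f'

# (label, timestamp) pairs hand-copied from the all_time table above
events = [
    ('DNS www.kitchenboss.com.au', '2014-04-09 06:38:55.370111'),
    ('SWF download', '2014-04-09 06:39:00.511352'),
    ('DNS p22x62n0yr63872e-qh6.focondteavrt.ru', '2014-04-09 06:39:02.607899'),
    ('DNS 2496128308-6.focondteavrt.ru', '2014-04-09 06:39:07.436915'),
    ('JAR download (first of 3)', '2014-04-09 06:39:08.690350'),
    ('JAR download (last of 3)', '2014-04-09 06:39:09.862145'),
    ('PTR query for 142.4.194.92', '2014-04-09 06:39:10.103892'),
    ('PDF download', '2014-04-09 06:39:12.944153'),
    ('JAR download (first of 2)', '2014-04-09 06:39:34.991542'),
    ('JAR download (last of 2)', '2014-04-09 06:39:35.784163'),
    ('PE download', '2014-04-09 06:39:36.369731'),
]

# Seconds elapsed between each event and the one before it
times = [datetime.strptime(stamp, FMT) for _, stamp in events]
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

for (label, _), gap in zip(events[1:], gaps):
    print('+{:6.2f}s  {}'.format(gap, label))
```

With the dataframes from this notebook loaded, the same numbers should be obtainable by differencing the sorted time index of all_time.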

Wrap Up

Well that's it for this notebook. We hope this exercise showed some neato functionality using Workbench. We encourage you to check out the GitHub repository and our other notebooks: