Explorations of PCAP files from contagio malware dump

Tools

Data

  • contagio malware dump PCAP collection (http://contagiodump.blogspot.com/2013/04/collection-of-pcap-files-from-malware.html)
</font>

What we did:

  • Data gathered with Bro (default Bro content)
    • bro -C -r <pcap> local
  • Data cleanup
    • Had to remove "s from the various smtp logs due to errors with pandas read_csv expecting " surrounding a field vs. in a field
  • Augmented connection records with Maxmind ASN Database (download and unzip in the same directory as this notebook)
  • Explored the Data! </ul> </font>

Additional notes:

All of this data has been contributed to contagio by various sources. The methods of traffic capture/generation/sandbox setup probably varies widly between most (all?) samples. This underscores the need for good data when doing analysis, but in this case we're going to make the best of what we've got.

Thanks to everybody who selflessly contibutes data to sites like contagio, keep up the great work!


In [1]:
import pandas as pd
import numpy as np
import string
import pylab
import re
import pandas
import time
import os
import collections
import matplotlib
import struct
import socket
import json
from datetime import datetime
from netaddr import IPNetwork, IPAddress

%matplotlib inline

print pd.__version__

pylab.rcParams['figure.figsize'] = (16.0, 5.0)


0.13.0rc1-32-g81053f9

In [2]:
# Mapping of fields of the files we want to read in and initial setup of pandas dataframes
logs_to_process = {
                    'conn.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','proto','service','duration','orig_bytes','resp_bytes','conn_state','local_orig','missed_bytes','history','orig_pkts','orig_ip_bytes','resp_pkts','resp_ip_bytes','tunnel_parents','threat','sample'],
                    'dns.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','proto','trans_id','query','qclass','qclass_name','qtype','qtype_name','rcode','rcode_name','AA','TC','RD','RA','Z','answers','TTLs','rejected','threat','sample'],
                    'files.log' : ['ts','fuid','tx_hosts','rx_hosts','conn_uids','source','depth','analyzers','mime_type','filename','duration','local_orig','is_orig','seen_bytes','total_bytes','missing_bytes','overflow_bytes','timedout','parent_fuid','md5','sha1','sha256','extracted','threat','sample'],
                    'ftp.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','user','password','command','arg','mime_type','file_size','reply_code','reply_msg','data_channel.passive','data_channel.orig_h','data_channel.resp_h','data_channel.resp_p','fuid','threat','sample'],
                    'http.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','trans_depth','method','host','uri','referrer','user_agent','request_body_len','response_body_len','status_code','status_msg','info_code','info_msg','filename','tags','username','password','proxied','orig_fuids','orig_mime_types','resp_fuids','resp_mime_types','threat','sample'],
                    'notice.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','fuid','file_mime_type','file_desc','proto','note','msg','sub','src','dst','p','n','peer_descr','actions','suppress_for','dropped','remote_location.country_code','remote_location.region','remote_location.city','remote_location.latitude','remote_location.longitude','threat','sample'],
                    'signatures.log' : ['ts','src_addr','src_port','dst_addr','dst_port','note','sig_id','event_msg','sub_msg','sig_count','host_count','threat','sample'],
                    'smtp.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','trans_depth','helo','mailfrom','rcptto','date','from','to','reply_to','msg_id','in_reply_to','subject','x_originating_ip','first_received','second_received','last_reply','path','user_agent','fuids','is_webmail','threat','sample'],
                    'ssl.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','version','cipher','server_name','session_id','subject','issuer_subject','not_valid_before','not_valid_after','last_alert','client_subject','client_issuer_subject','cert_hash','validation_status','threat','sample'],
                    'tunnel.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','tunnel_type','action','threat','sample'],
                    'weird.log' : ['ts','uid','id.orig_h','id.orig_p','id.resp_h','id.resp_p','name','addl','notice','peer','threat','sample']
                  }

conndf   = pd.DataFrame(columns=logs_to_process['conn.log'])
dnsdf    = pd.DataFrame(columns=logs_to_process['dns.log'])
filesdf  = pd.DataFrame(columns=logs_to_process['files.log'])
ftpdf    = pd.DataFrame(columns=logs_to_process['ftp.log'])
httpdf   = pd.DataFrame(columns=logs_to_process['http.log'])
noticedf = pd.DataFrame(columns=logs_to_process['notice.log'])
sigdf    = pd.DataFrame(columns=logs_to_process['signatures.log'])
smtpdf   = pd.DataFrame(columns=logs_to_process['smtp.log'])
ssldf    = pd.DataFrame(columns=logs_to_process['ssl.log'])
tunneldf = pd.DataFrame(columns=logs_to_process['tunnel.log'])
weirddf  = pd.DataFrame(columns=logs_to_process['weird.log'])

In [3]:
# Process the directory structure
# If you download the complete PCAP zip from Contagio and unzip a structure like:
# PCAPS_TRAFFIC_PATTERNS
#      |->CRIME
#          |-> <sample>
#      |->APT
#          |-> <sample>
#      |->METASPLOIT
#          |-> <sample>
#
# Will appear and this is the structure that's walk CRIME/APT/METASPLOIT will make their way into the "threat" tag
# while the sample/PCAP name will wind up in "sample"
#
# Bro data generated via the "run_bro.sh" shell script (this places all Bro output in the respective sample directories and 
# contributes to the directory structure above
for dirName, subdirList, fileList in os.walk('.'):
    #print('Found directory: %s' % dirName)
    for fname in fileList:
        tags = dirName.split('/')
        if len(tags) == 4 and fname in logs_to_process:
            #print ('%s/%s' %(dirName, fname))
            logname = fname.split('.')
            try:
                tempdf = pd.read_csv(dirName+'/'+fname, sep='\t',skiprows=8, header=None, 
                                     names=logs_to_process[fname][:-2], skipfooter=1)
                tempdf['threat'] = tags[2]
                tempdf['sample'] = tags[3]
                if tags[2] == "0":
                    print ('%s/%s' %(dirName, fname))
                if fname == 'conn.log':
                    conndf = conndf.append(tempdf)
                if fname == 'dns.log':
                    dnsdf = dnsdf.append(tempdf)
                if fname == 'files.log':
                    filesdf = filesdf.append(tempdf)
                if fname == 'ftp.log':
                    ftpdf = ftpdf.append(tempdf)
                if fname == 'http.log':
                    httpdf = httpdf.append(tempdf)
                if fname == 'notice.log':
                    noticedf = noticedf.append(tempdf)
                if fname == 'signatures.log':
                    sigdf = sigdf.append(tempdf)
                if fname == 'smtp.log':
                    smtpdf = smtpdf.append(tempdf)
                if fname == 'ssl.log':
                    ssldf = ssldf.append(tempdf)
                if fname == 'tunnel.log':
                    tunneldf = tunneldf.append(tempdf)
                if fname == 'weird.log':
                    weirddf = weirddf.append(tempdf)
            except Exception as e:
                print "[*] error: %s, on %s/%s" % (str(e), dirName, fname)

# Read in and configure the maxmind db (free ASN)                
maxmind = pd.read_csv("./GeoIPASNum2.csv", sep=',', header=None, names=['low','high','asn'])
maxmind['low'] = maxmind['low'].astype(int)
maxmind['high'] = maxmind['high'].astype(int)

In [4]:
# Helper Functions
def ip2int(addr):               
    try:
        return struct.unpack("!I", socket.inet_aton(addr))[0]
    except Exception as e:
        pass
        #print "Error: %s - %s" % (str(e), addr)
    return 0

maxcache = {}
def maxmind_lookup(ip):
    if ip in maxcache:
        return maxcache[ip]
    i = ip2int(ip)
    if i == 0:
        return "UNKNOWN"
    results = list(maxmind.loc[(maxmind["low"] < i) & (maxmind['high'] > i)]['asn'])
    if len(results) > 0:
        maxcache[ip] = results[0]
        return results[0]
    maxcache[ip] = "UNKNOWN"
    return "UNKNOWN"

def box_plot_df_setup(series_a, series_b): 
    # Count up all the times that a category from series_a
    # matches up with a category from series_b. This is
    # basically a gigantic contingency table
    cont_table = collections.defaultdict(lambda : collections.Counter())
    for val_a, val_b in zip(series_a.values, series_b.values):
        cont_table[val_a][val_b] += 1
    
    # Create a dataframe
    # A dataframe with keys from series_a as the index, series_b_keys
    # as the columns and the counts as the values.
    dataframe = pd.DataFrame(cont_table.values(), index=cont_table.keys())
    dataframe.fillna(0, inplace=True)
    return dataframe

def is_ip(ip):
    try:
        socket.inet_aton(ip)
        return True
    except socket.error:
        return False

In [5]:
# misc cleanup of the Bro conn.log dataframe
try:
    conndf.orig_bytes[conndf.orig_bytes == '-'] = 0
except Exception as e:
    pass
try:
    conndf.resp_bytes[conndf.resp_bytes == '-'] = 0
except Exception as e:
    pass
conndf['orig_bytes'] = conndf['orig_bytes'].astype(long)
conndf['resp_bytes'] = conndf['resp_bytes'].astype(long)
conndf['total_bytes'] = conndf['orig_bytes'] + conndf['resp_bytes']

# and augmentation (asn)
conndf['maxmind_asn'] = conndf['id.resp_h'].map(maxmind_lookup)
# add date
good_datetime = [datetime.fromtimestamp(float(date)) for date in conndf['ts'].values]
conndf['date'] = pd.Series(good_datetime, index=conndf.index)

# reindex the dataframes
conndf = conndf.reindex()
httpdf = httpdf.reindex()
dnsdf = dnsdf.reindex()
noticedf = noticedf.reindex()
filesdf = filesdf.reindex()
smtpdf = smtpdf.reindex()

High-level overview of the data

The goal here is to get a brief understanding of the amount of data that we're dealing with after reading all of the log files in. This includes understanding the "how-many-of-each", the "when", and the "what".

Times of activity


In [7]:
for threat in ['APT', 'CRIME']:
    subset = conndf[conndf['threat'] == threat][['date','sample']]
    subset['count'] = 1
    pivot = pd.pivot_table(subset, values='count', rows=['date'], cols=['sample'], fill_value=0)
    by = lambda x: lambda y: getattr(y, x)
    grouped = pivot.groupby([by('year'),by('month')]).sum()
    
    ax = grouped.plot()
    pylab.ylabel('Connections')
    pylab.xlabel('Date Recorded')
    patches, labels = ax.get_legend_handles_labels()
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2, title="Sample Name")


General Stats

Here we can just get a feeling of the various high-level dimensions of the data


In [8]:
print "Total Samples:          %s" % conndf['sample'].nunique()
print ""
print "APT Samples:            %s" % conndf[conndf['threat'] == 'APT']['sample'].nunique()
print "Crime Samples:          %s" % conndf[conndf['threat'] == 'CRIME']['sample'].nunique()
print "Metasploit Samples:     %s" % conndf[conndf['threat'] == 'METASPLOIT']['sample'].nunique()
print ""
print "Connection Log Entries: %s" % conndf.shape[0]
print "DNS Log Entries:        %s" % dnsdf.shape[0]
print "HTTP Log Entries:       %s" % httpdf.shape[0]
print "Files Log Entries:      %s" % filesdf.shape[0]
print "SMTP Log Entries:       %s" % smtpdf.shape[0]
print "Weird Log Entries:      %s" % weirddf.shape[0]
print "SSL Log Entries:        %s" % ssldf.shape[0]
print "Notice Log Entries:     %s" % noticedf.shape[0]
print "Tunnel Log Entries:     %s" % tunneldf.shape[0]
print "Signature Log Entries:  %s" % sigdf.shape[0]


Total Samples:          105

APT Samples:            31
Crime Samples:          68
Metasploit Samples:     6

Connection Log Entries: 104413
DNS Log Entries:        108371
HTTP Log Entries:       22927
Files Log Entries:      19289
SMTP Log Entries:       4088
Weird Log Entries:      1081
SSL Log Entries:        351
Notice Log Entries:     252
Tunnel Log Entries:     2
Signature Log Entries:  1

Security Workflow Breakdown

This is an example of how to go from an alert (in this case a Bro signature match) to gathering information about the alert via the other data in just a few lines of code.

We'll want to

  • Find the signatures that fired
  • Grab the destination address for each signature
  • For each destination address gather:
    • Basic flow information
    • High-level HTTP information
    • Get information about the data transferred in the HTTP sessions
    • For each file grab the filename and mime-type
      • Also see if the file was found in the Team Cymru MHR (by looking through the notice dataframe)

In [9]:
# Get all the destination addresses from all the signature hits, in this case it's only one.
sig_dst_ips = sigdf['dst_addr'].tolist()
sigdf[['dst_addr', 'dst_port','sig_id','sub_msg','threat','sample']]


Out[9]:
dst_addr dst_port sig_id sub_msg threat sample
0 199.192.156.134 443 windows_reverse_shell POST /bbs/info.asp HTTP/1.1^M^JHost: 199.192.1... APT Mswab_Yayih_FD1BE09E499E8E380424B3835FC973A8_2...

1 rows × 6 columns


In [10]:
# Let's see what other information we can gather about the network sessions surrounding that signature
for ip in sig_dst_ips:
    print "**** IP: %s ****" %ip
    print "  ** Flow Information **"
    print conndf[conndf['id.resp_h'] == ip][['id.resp_p','proto','service','duration','conn_state','orig_ip_bytes','resp_ip_bytes']]
    print "  ** HTTP Information **"
    print httpdf[httpdf['id.resp_h'] == ip][['method','host','uri','user_agent']]
    files = httpdf[httpdf['id.resp_h'] == ip]['orig_fuids']
    flist = files.append(httpdf[httpdf['id.resp_h'] == ip]['resp_fuids']).tolist()
    # We use SHA1 because that's what gets tossed in the Bro notice.log for the Team Cymru MHR alerts
    print "  ** File SHA1 **"
    for f in flist:
        if f != '-':
            sha1 = filesdf[filesdf['fuid'] == f]['sha1'].tolist()
            for m in sha1:
                print "Sample Hash: %s" % m
                if noticedf[noticedf['sub'].str.contains(m)][['sub','sample']].shape[0] > 0:
                    print noticedf[noticedf['sub'].str.contains(m)][['sub','sample']]
                print "Filename: %s    mime-type: %s" % (filesdf[filesdf['sha1'] == m]['filename'].tolist()[0], filesdf[filesdf['sha1'] == m]['mime_type'].tolist()[0])
                print ""
            #print md5


**** IP: 199.192.156.134 ****
  ** Flow Information **
    id.resp_p proto service  duration conn_state  orig_ip_bytes  resp_ip_bytes
4         443   tcp    http  110.4259         SF            436            346
5         443   tcp    http  7.229701         SF            501            360
6         443   tcp    http  0.750663         SF            377            333
7         443   tcp    http  10.36404         SF            517            361
8         443   tcp    http  0.698435       RSTO            378            293
9         443   tcp       -  8.462028         SF           1682            392
10        443   tcp    http  69.32855       RSTO            363            306

[7 rows x 7 columns]
  ** HTTP Information **
  method             host            uri user_agent
0   POST  199.192.156.134  /bbs/info.asp          -
1   POST  199.192.156.134  /bbs/info.asp          -
2   POST  199.192.156.134  /bbs/info.asp          -
3   POST  199.192.156.134  /bbs/info.asp          -
4   POST  199.192.156.134  /bbs/info.asp          -
5   POST  199.192.156.134  /bbs/info.asp          -

[6 rows x 4 columns]
  ** File SHA1 **
Sample Hash: b0122bc7cedf885857e61764806ada3cf40c0934
Filename: -    mime-type: binary

Sample Hash: 6dac3f9397e89d9a5db495f78f66007533f59fcd
Filename: -    mime-type: binary

Sample Hash: dfc7310ffd971d36fe1d84c1b6eba0c0025cc4db
Filename: -    mime-type: binary

Sample Hash: 2c7a0c7577bfefd2cc52677671a423de81328876
Filename: -    mime-type: binary

Sample Hash: bd201fad1039536e84ce0019a03db6fa2f52f4d3
Filename: -    mime-type: binary

Sample Hash: c21f5b7fa4023f68861c11ec4b8fea98595ea8f6
Filename: -    mime-type: binary

Sample Hash: d01e7d3c2358aea02008176834b5e9f37404a7bc
Filename: -    mime-type: binary

Sample Hash: c83da355ebfc8f842c45216621062719394ff106
Filename: -    mime-type: binary

Sample Hash: abbbfc0234d408498b344bcd899ae5f4081e6da6
Filename: -    mime-type: binary

Sample Hash: 5bbf8274b821c61c578b6d3fe9578b6a169767be
Filename: -    mime-type: binary

DNS Breakdown

Query types


In [11]:
print dnsdf.qtype_name.value_counts()


NB      51487
A       42021
MX      10990
-        3825
PTR        16
*          14
SRV         7
NS          6
AAAA        5
dtype: int64

Query type information


In [12]:
for q in dnsdf['qtype_name'].unique().tolist():
    print "Query Type: %s" % q
    print dnsdf[dnsdf['qtype_name'] == q]['query'].value_counts().head(5)
    print ""


Query Type: A
star-trakers.com           482
oiexgmycrtwirsgcmv.com     308
tthayebvhdmntiyeuxw.com    261
google.com                 246
ymcwineqkj.com             241
dtype: int64

Query Type: -
-    3825
dtype: int64

Query Type: NB
NOLOGO1093.COM    6544
NOLOGO0094.NET    6533
FIBLOLPP.COM       725
XNUQKDWEK.COM      725
UWSCTPIHLT.COM     725
dtype: int64

Query Type: NS
de     2
org    2
com    2
dtype: int64

Query Type: PTR
221.107.164.128.in-addr.arpa.localdomain    4
221.107.164.128.in-addr.arpa                4
lb._dns-sd._udp.0.0.29.172.in-addr.arpa     1
lb._dns-sd._udp.va.comcast.net              1
db._dns-sd._udp.va.comcast.net              1
dtype: int64

Query Type: SRV
*\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00    4
orion [00:50:56:c0:00:08]._workstation._tcp.local            3
dtype: int64

Query Type: MX
cpmwc.com               2864
fluidsystemsbots.com    1261
webworkz.com             748
alltel.net               678
fast-solutions.net       426
dtype: int64

Query Type: *
xps-8300          10
revetonforever     4
dtype: int64

Query Type: AAAA
time.windows.com                2
mfodjf393843218.us              2
javadl-esd-secure.oracle.com    1
dtype: int64

Response Code information


In [13]:
dnsdf['rcode_name'].value_counts()


Out[13]:
-           62842
NXDOMAIN    24067
NOERROR     21003
REFUSED       311
SERVFAIL      146
NOTAUTH         1
NOTZONE         1
dtype: int64

Want to take a guess at which one(s) possibly use a DGA to connect/find to C2 domains?

  • https://www.damballa.com/downloads/r_pubs/WP_PushDo_Evolves_Again.pdf (for the relationship between NXDOMAIN and DGA)
  • https://blogs.mcafee.com/mcafee-labs/ramnit-malware-creates-ftp-network-from-victims-computers

In [14]:
dnsdf[dnsdf['rcode_name'] == 'NXDOMAIN']['sample'].value_counts().head(10)


Out[14]:
BIN_Ramnitpcap_2012-01                                          20651
BIN_Kuluoz-Asprox_9F842AD20C50AD1AAB41F20B321BF84B               1711
BIN_Wordpress_Mutopy_Symmi_20A6EBF61243B760DD65F897236B6AD3-ShortRun      432
cryptolocker_9CBB128E8211A7CD00729C159815CB1C                     258
BIN_Cutwail-Pushdo(2)_582DE032477E099EB1024D84C73E98C1            236
BIN_CitadelPacked_2012-05                                         223
BIN_torpigminiloader_011C1CA6030EE091CE7C20CD3AAECFA0             181
purplehaze                                                        131
BIN_Cutwail-Pushdo(1)_582DE032477E099EB1024D84C73E98C1            128
BIN_torpigminiloader_C3366B6006ACC1F8DF875EAA114796F0              28
dtype: int64

Generally in HTTP traffic we expect to see a value in the 'Host' header. This indicates the virtual host that the client is connecting to on a given IP address. It could be interesting to see what hostnames are present in the HTTP 'Host' header, yet no DNS query was seen. Keep in mind this could be as simple as it wasn't recorded/included in the PCAP, or that it really didn't happen.


In [15]:
intersect_hostnames = set(pd.Series(list(set(httpdf['host']).intersection(set(dnsdf['query'])))))
interesting = []
tempdf = pd.DataFrame()
for hn in list(set(httpdf['host'])):
    if hn not in intersect_hostnames and not is_ip(hn):
        #print hn
        interesting.append(hn)
        tempdf = tempdf.append(httpdf[httpdf['host'] == hn])

In [16]:
tempdf['count'] = 1
tempdf[['host', 'id.resp_h', 'sample', 'count']].groupby(['sample', 'host', 'id.resp_h']).sum().sort('count', ascending=0)


Out[16]:
count
sample host id.resp_h
purplehaze insideentrepreneurs.com 209.114.50.164 20
BIN_ZeroAccess_Sirefef_29A35124ABEAD63CD8DB2BBB469CBC7A_2013-05 www.e-zeeinternet.com 209.68.32.176 9
EK_popads_109.236.80.170_2013-08-13 tqhsy.8taglik.info 109.236.80.170 8
EK_BIN_Blackhole_leadingto_Medfos_0512E73000BCCCE5AFD2E9329972208A_2013-04 autorepairgreeley.info 198.100.45.44 7
EK_Smokekt150(Malwaredontneedcoffee)_2012-09 bigfatcounters.com 213.108.252.185 6
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 dgyqimolcqm.cm 31.184.244.182 5
81.17.26.187 4
EK_popads_109.236.80.170_2013-08-13 qkvuz.12taglik.info 109.236.80.170 4
xrp.8taglik.info 109.236.80.170 3
EK_Smokekt150(Malwaredontneedcoffee)_2012-09 LODKDKD12.INFO 62.76.188.226 3
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 dgyqimolcqm.cm 81.17.18.18 3
BIN_ZeroAccess_3169969E91F5FE5446909BBAB6E14D5D_2012-10 izhsuqbtcsx.cm 31.184.244.180 2
RealPlayer_rmoc3260.dll_ActiveX_Control_Remote_Code_Execution_Exploit freak 192.168.0.15 1
BIN_Wordpress_Mutopy_Symmi_20A6EBF61243B760DD65F897236B6AD3-ShortRun VARNAJALAMARTS.com 198.154.237.48 1
iMesh_7.1.0.x(IMWeb.dll_7.0.0.x)_Remote_Heap_Overflow_Exploit freak 192.168.0.15 1
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 SVRIntl-crl.verisign.com 23.4.181.163 1
Yahoo_Music_Jukebox_2.2-AddImage()_ActiveX_Remote_BOF_Exploit(2) freak 192.168.0.15 1
BIN_sality_CEAF4D9E1F408299144E75D7F29C1810 livelife-eg.com 97.74.182.1 1
NUVICO_DVR_NVDV4__PdvrAtl_Module_(PdvrAt.DLL_1.0.1.25)_BoF_Exploit freak 192.168.0.15 1
purplehaze d.pixel.trafficmp.com 107.20.175.29 1
EK_Smokekt150(Malwaredontneedcoffee)_2012-09 delivery.trafficbroker.com 192.168.186.6 1
Sejoong_Namo_ActiveSquare_6_NamoInstaller.dll-ActiveX_BoF_Exploit freak 192.168.0.15 1
Microsoft_SQL_Server_Distributed_Management_Objects_BoF_Exploit freak 192.168.0.15 1
BIN_DNSWatch_protux_4F8A44EF66384CCFAB737C8D7ADB4BB8_2012-11 vcvcvcvc.dyndns.org 114.244.44.115 1

24 rows × 1 columns

From the looks of the above, it seems that for some of the samples DNS traffic wasn't logged vs. malware doing something tricky. We can verify (below) that there really doesn't appear to be any DNS traffic related to one of the domains.


In [17]:
print dnsdf[dnsdf['query'] == "dgyqimolcqm.cm"]
print dnsdf[dnsdf.answers.str.contains('dgyqimolcqm.cm')]
print dnsdf[dnsdf['sample'] == "BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9"]['query'].value_counts().head(50)


Empty DataFrame
Columns: [ts, uid, id.orig_h, id.orig_p, id.resp_h, id.resp_p, proto, trans_id, query, qclass, qclass_name, qtype, qtype_name, rcode, rcode_name, AA, TC, RD, RA, Z, answers, TTLs, rejected, threat, sample]
Index: []

[0 rows x 25 columns]
Empty DataFrame
Columns: [ts, uid, id.orig_h, id.orig_p, id.resp_h, id.resp_p, proto, trans_id, query, qclass, qclass_name, qtype, qtype_name, rcode, rcode_name, AA, TC, RD, RA, Z, answers, TTLs, rejected, threat, sample]
Index: []

[0 rows x 25 columns]
Series([], dtype: int64)

Well, at least there's always HTTP traffic to look at for the above domain.


In [18]:
httpdf[httpdf['host'] == "dgyqimolcqm.cm"][['id.orig_h','id.orig_p','id.resp_h','id.resp_p','uri','sample','threat']]


Out[18]:
id.orig_h id.orig_p id.resp_h id.resp_p uri sample threat
2 192.168.248.165 1138 81.17.26.187 80 /X11HXlhHWF1bR1hbWUZcXA8KCloKW19QCF0NDF8LXlpZC... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
4 192.168.248.165 1143 81.17.26.187 80 /X1xHXVBHW1pHWF1fRlxcDwoKWgpbX1AIXQ0MXwteWlkLD... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
42 192.168.248.165 1146 81.17.26.187 80 /WFBQR1hYXEdYWFxHWFpfRgoFAAoCVhwbBVQIITtZCi0GH... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
108 192.168.248.165 1204 81.17.26.187 80 /UFxHW1hYR1hQWkdYUUZWCgUADVRaWRgFDFgYAANRDAcTWQ== BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
150 192.168.248.165 1229 81.17.18.18 80 /X19HW1tZR15HW11aRg1bWVoNWVENXloLXwxeW19ZXl5ZC... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
221 192.168.248.165 1257 81.17.18.18 80 /X19HW1tZR15HW11eRg1bWVoNWVENXloLXwxeW19ZXl5ZC... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
258 192.168.248.165 1263 81.17.18.18 80 /WFBQR1hYXEdYWFxHWFpfRgoFAAoCVhwbBVQIITtZCi0GH... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
363 192.168.248.165 1328 31.184.244.182 80 /X19HW1tZR15HW11eRg9YWAxeUF8PWg8MXw9eXVALX1pdW... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
365 192.168.248.165 1332 31.184.244.182 80 /WF5dR1haXkdYXV1HWFFaRgoFAAoCRxkBGVYKBQAKAg0IH... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
446 192.168.248.165 1350 31.184.244.182 80 /UFxHW1hYR1hQWkdYXEZWCgUADVRbGAULXlgYAANQWVATWQ== BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
634 192.168.248.165 1419 31.184.244.182 80 /WFBQR1hYXEdYWFxHWFpfRgoFAAoCVhwbBVQIITtZCi0GH... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME
921 192.168.248.165 1561 31.184.244.182 80 /XFFZWkcaAAcNDAUKBQAKAkcKBgRGVhlUUTtcHCo7ASEjX... BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F9... CRIME

12 rows × 7 columns

HTTP Funtime!

User-Agent, who's saying what?

We all know malware will use fake/spoofed User-Agent string, but maybe with a couple commands we can see if some are more dishonest than others...


In [19]:
print "%s Unique User-Agents in %s samples." % (httpdf['user_agent'].nunique(), httpdf['sample'].nunique())


190 Unique User-Agents in 83 samples.

In [20]:
tempdf = pd.DataFrame(columns=['sample','num_ua'])
for sample in list(set(httpdf['sample'])):
    tempdf = tempdf.append({'sample':sample, 'num_ua':httpdf[httpdf['sample'] == sample]['user_agent'].nunique()}, ignore_index=True)
tempdf.sort('num_ua', ascending=0).head()


Out[20]:
sample num_ua
6 BIN_dirtjumper_2011-10 103
63 purplehaze 7
26 EK_Smokekt150(Malwaredontneedcoffee)_2012-09 7
24 EK_popads_109.236.80.170_2013-08-13 6
79 BIN_ZeusGameover_2012-02 6

5 rows × 2 columns

Let's check one of the ones that doesn't completely stand out, purplehaze


In [21]:
# Well, at least we know what UA this sample uses for C2, and it seems we can see some other OS activity as well
tsample = 'purplehaze'
httpdf[httpdf['sample'] == tsample].user_agent.value_counts()


Out[21]:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)    17179
Mozilla/4.0 (compatible; UPnP/1.0; Windows NT/5.1)                 25
Mozilla/4.0 (Windows XP 5.1) Java/1.6.0_26                         23
-                                                                  16
Mozilla/4.0 (compatible; UPnP/1.0; Windows 9x)                     10
Microsoft-CryptoAPI/5.131.2600.5512                                 4
contype                                                             2
dtype: int64

In [22]:
httpdf['count'] = 1
grouped = httpdf[httpdf['sample'] == tsample][['sample','user_agent','host','count']].groupby(['sample', 'user_agent', 'host']).sum()
grouped.sort('count', ascending = 0).head(10)


Out[22]:
count
sample user_agent host
purplehaze Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1) reallysweetgames.com 1830
webgameroom.com 1587
deadrush.com 1043
ui.mevio.com 558
redirect.xmladfeed.com 354
114337.arb.xmladfeed.com 344
redirect.ad-feeds.com 342
b.scorecardresearch.com 339
static3.filmannex.com 254
log.adap.tv 232

10 rows × 1 columns

Now for something more interesting, Dirtjumper


In [23]:
tsample = 'BIN_dirtjumper_2011-10'
httpdf[httpdf['sample'] == tsample].user_agent.value_counts()


Out[23]:
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 6329) Opera 8.00 [ru]    18
Mozilla/4.1 (compatible; MSIE 5.0; Symbian OS; Nokia 6600;452) Opera 6.20 [ru]    11
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20    10
Mozilla/4.0 (compatible; MSIE 7.0b; Win32)                       9
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 6936) Opera 8.50 [ru]     9
Mozilla/4.0 (compatible; MSIE 6.0; Nitro) Opera 8.50 [en]        8
Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050609 Firefox/1.0.4     8
Mozilla/4.0 (compatible; MSIE 5.17; Mac_PowerPC)                 8
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)                  7
Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98)          7
Opera/9.50 (Windows NT 5.1; U; ru)                               7
Opera/9.23 (Windows NT 5.1; U; ru)                               6
Mozilla/4.0 (compatible; MSIE 5.0; SunOS 5.9 sun4u; X11)         6
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3     6
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.0.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)     6
...
mozilla/4.0 (compatible; msie 7.0; windows nt 5.1; trident/4.0; ...)    2
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.2) Gecko/2008091620 Firefox/3.0.2    2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)         2
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9    2
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6630/4.03.38; 6937) Opera 8.50 [es]    1
Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.2.15 Version/10.10    1
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9) Gecko/2008052906 Firefox/3.0    1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)    1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 SeaMonkey/1.0.4    1
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9a1) Gecko/20061204 GranParadiso/3.0a1    1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)    1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5    1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1    1
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9    1
Mozilla/4.0 (compatible; MSIE 6.0; Nitro) Opera 8.50 [es-es]    1
Length: 103, dtype: int64

In [24]:
grouped = httpdf[httpdf['sample'] == tsample][['sample','user_agent','host','count']].groupby(['sample', 'host']).sum()
grouped.sort('count', ascending = 0)


Out[24]:
count
sample host
BIN_dirtjumper_2011-10 www.tadawulfx.com 386
ukashsepeti.com 4
asdaddddaaaa.com 1

3 rows × 1 columns


In [25]:
grouped = httpdf[httpdf['sample'] == tsample][['sample','user_agent','host','count']].groupby(['sample', 'user_agent', 'host']).sum()
grouped.sort('count', ascending = 0)


Out[25]:
count
sample user_agent host
BIN_dirtjumper_2011-10 Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 6329) Opera 8.00 [ru] www.tadawulfx.com 18
Mozilla/4.1 (compatible; MSIE 5.0; Symbian OS; Nokia 6600;452) Opera 6.20 [ru] www.tadawulfx.com 10
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20 www.tadawulfx.com 10
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 6936) Opera 8.50 [ru] www.tadawulfx.com 9
Mozilla/4.0 (compatible; MSIE 7.0b; Win32) www.tadawulfx.com 9
Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050609 Firefox/1.0.4 www.tadawulfx.com 8
Mozilla/4.0 (compatible; MSIE 5.17; Mac_PowerPC) www.tadawulfx.com 8
Mozilla/4.0 (compatible; MSIE 6.0; Nitro) Opera 8.50 [en] www.tadawulfx.com 8
Opera/9.50 (Windows NT 5.1; U; ru) www.tadawulfx.com 7
Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98) www.tadawulfx.com 7
Opera/9.23 (Windows NT 5.1; U; ru) www.tadawulfx.com 6
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; YPC 3.0.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727) www.tadawulfx.com 6
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) www.tadawulfx.com 6
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3 www.tadawulfx.com 6
Mozilla/4.0 (compatible; MSIE 5.0; SunOS 5.9 sun4u; X11) www.tadawulfx.com 6
Opera/10.00 (Windows NT 6.0; U; en) Presto/2.2.0 www.tadawulfx.com 5
Mozilla/5.0 (X11; U; Linux x86_64; ru; rv:1.9.0.2) Gecko/2008092702 Gentoo Firefox/3.0.2 www.tadawulfx.com 5
mozilla/4.0 (compatible; msie 8.0; windows nt 5.1; trident/4.0; ...) www.tadawulfx.com 5
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 9424) Opera 8.65 [ru] www.tadawulfx.com 5
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) www.tadawulfx.com 5
Opera/8.51 (Windows NT 5.1; U; en) www.tadawulfx.com 5
Mozilla/4.0 (compatible; MSIE 6.0; Nitro) Opera 8.50 [it] www.tadawulfx.com 5
Mozilla/4.0 (compatible; MSIE 6.0; Nitro) Opera 8.50 [de] www.tadawulfx.com 5
Opera/9.50 (Windows NT 6.0; U; en) www.tadawulfx.com 5
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.89 Safari/532.5 www.tadawulfx.com 5
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 www.tadawulfx.com 4
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0) www.tadawulfx.com 4
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060516 SeaMonkey/1.0.2 www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv:1.8) Gecko/20051107 Firefox/1.5 www.tadawulfx.com 4
Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.03 [en] www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1 www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 www.tadawulfx.com 4
Mozilla/2.0 (compatible; MSIE 3.01; Windows 98) www.tadawulfx.com 4
Mozilla/1.22 (compatible; MSIE 1.5; Windows NT) www.tadawulfx.com 4
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2 www.tadawulfx.com 4
Opera/9.80 (Windows NT 5.1; U; ru) Presto/2.2.15 Version/10.20 www.tadawulfx.com 4
Opera/9.0 (Windows NT 5.1; U; en) www.tadawulfx.com 4
Opera/9.00 (Wii; U; ; 1038-58; Wii Shop Channel/1.0; en) www.tadawulfx.com 4
Opera/9.02 (Windows NT 5.1; U; en) www.tadawulfx.com 4
Opera/9.10 (Windows NT 5.1; U; en) www.tadawulfx.com 4
Opera/9.80 (Windows NT 5.1; U; en) Presto/2.5.18 Version/10.50 www.tadawulfx.com 4
Mozilla/5.0 (X11; U; Linux x86_64; ru; rv:1.9.1.1) Gecko/20090730 Gentoo Firefox/3.5.1 www.tadawulfx.com 4
Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.2.15 Version/10.00 www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.65 Safari/525.19 www.tadawulfx.com 4
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3 www.tadawulfx.com 3
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0) www.tadawulfx.com 3
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1 www.tadawulfx.com 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50 www.tadawulfx.com 3
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 www.tadawulfx.com 3
Opera/7.23 (Windows 98; U) [en] www.tadawulfx.com 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727) www.tadawulfx.com 3
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070221 SUSE/2.0.0.2-6.1 Firefox/2.0.0.2 www.tadawulfx.com 3
Opera/8.0 (X11; Linux i686; U; cs) www.tadawulfx.com 3
Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.50 www.tadawulfx.com 3
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20030105 Phoenix/0.5 www.tadawulfx.com 3
Mozilla/4.0 (compatible; MSIE 6.0; Symbian OS; Nokia 6600/5.27.0; 1665) Opera 8.60 [ru] www.tadawulfx.com 3
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text www.tadawulfx.com 3
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13 www.tadawulfx.com 3
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.4.154.25 Safari/525.19 www.tadawulfx.com 3
...

108 rows × 1 columns

Conn.log Information

ASN information

First we can check the coverage of the ASN database (thanks again Maxmind). The coverage appears to be pretty good, only not marking RFC1918 addresses in addtion to broadcast, etc... IPs. However, we can see a few IP addresses that aren't covered, oh well. :)

Keep in mind we're only looking at destination IP addresses.


In [26]:
conndf[conndf['maxmind_asn'] == "UNKNOWN"]['id.resp_h'].value_counts()


Out[26]:
172.29.0.255       4203
10.0.2.255         2774
71.74.56.243        382
71.74.56.244        367
192.168.106.2       341
172.16.165.2        242
172.29.0.1          235
172.29.0.116        136
74.120.140.21       112
208.81.191.111      110
224.0.0.252         100
255.255.255.255      99
ff02::1:3            98
172.16.253.254       92
239.255.255.250      76
...
192.168.2.2             1
ff02::2                 1
74.120.140.23           1
172.16.148.184          1
172.29.0.111            1
31.184.245.202          1
fe80::ffff:ffff:fffe    1
173.23.253.246          1
209.17.74.150           1
115.254.253.254         1
87.120.169.3            1
192.168.106.155         1
172.16.0.255            1
208.93.140.130          1
192.168.254.1           1
Length: 110, dtype: int64

APT Connections

What about if we look at some of the samples marked APT, and see how they map to some of the various ASNs that we now have. It looks like, at first glance a couple of samples are operating out of the same ASN (AS4134 Chinanet and AS3356 L3 Communications)


In [27]:
ax = box_plot_df_setup(conndf[conndf['threat'] == 'APT']['sample'], conndf[conndf['threat'] == 'APT']['maxmind_asn']).T.plot(kind='bar', stacked=True)
pylab.ylabel('Sample Occurrences')
pylab.xlabel('ASN (Autonomous System Number)')
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., title="Sample Name")


Out[27]:
<matplotlib.legend.Legend at 0x11997d210>

Just showing how to look at all the various AS contacted by a single sample. It also points out that the multiple parking above on L3 might be due to DNS requests.


In [29]:
conndf[conndf['sample'] == "BIN_8202_6d2c12085f0018daeb9c1a53e53fd4d1"][['maxmind_asn','id.resp_h']]


Out[29]:
maxmind_asn id.resp_h
0 UNKNOWN 172.16.253.254
1 UNKNOWN 255.255.255.255
2 UNKNOWN 172.16.253.129
3 AS15169 Google Inc. 8.8.8.8
4 AS3356 Level 3 Communications 4.2.2.2
5 AS53850 GorillaServers, Inc. 192.200.99.194
6 AS53850 GorillaServers, Inc. 192.200.99.194
7 AS53850 GorillaServers, Inc. 192.200.99.194
8 UNKNOWN 255.255.255.255
9 UNKNOWN 172.16.253.132
10 AS53850 GorillaServers, Inc. 192.200.99.194
11 UNKNOWN 255.255.255.255
12 UNKNOWN 172.16.253.130

13 rows × 2 columns

Top-N (it's everybody's favorite)

We have to look at them, let's see if we can make it interesting. Starting with the samples who transferred the most amount of bytes, broken down by which ports (in each sample) were used. A brief examination of some of the rows is also performed.


In [30]:
conndf['count'] = 1
grouped = conndf.groupby(['sample', 'id.resp_p']).sum()
grouped.sort('total_bytes', ascending = 0).head(10)


Out[30]:
ts id.orig_p orig_bytes resp_bytes missed_bytes orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes total_bytes count
sample id.resp_p
purplehaze 80 9.584038e+12 21282798 14945798 172420089 29203 124500 20040903 160493 179127211 187365887 7217
1935 6.639908e+09 19212 19172 37680956 8057592 13532 560457 20374 30477819 37700128 5
BIN_LoadMoney_MailRu_dl_4e801b46068b31b82dac65885a58ed9e_2013-04 80 4.242826e+02 15894 2648 28139301 0 14765 593368 28269 29270121 28141949 15
BIN_Kuluoz-Asprox_9F842AD20C50AD1AAB41F20B321BF84B 25 2.326434e+13 51049454 7609979 2814589 0 89551 11325211 79440 6111735 10424568 17241
BIN_ArdamaxKeylogger_E33AF9E602CBB7AC3634C2608150DD18 587 1.359930e+09 1043 10343395 324 0 10074 10749283 10180 407564 10343719 1
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 80 5.559369e+11 563199 663932 8706987 0 4682 854508 9845 9364779 9370919 412
BIN_Cutwail_284Fb18Fab33C93Bc69Ce392D08Fd250_2012-10 80 4.179328e+03 117955 41749 4931282 0 2984 162037 5144 5137374 4973031 105
BIN_ZeusGameover_2012-02 80 9.033005e+10 102617 11477 3721029 0 1943 89581 2792 3833009 3732506 68
BIN_9002_D4ED654BCDA42576FDDFE03361608CAA_2013-01-30 53 1.357444e+09 1143 3580503 117612 0 3073 3703423 3588 261361 3698115 1
XTremeRAT_DAEBFDED736903D234214ED4821EAF99_2013-04-13 336 1.631920e+10 12601 3482315 0 0 2715 3591003 0 0 3482315 12

10 rows × 11 columns


In [31]:
# That port 1935 from above might be interesting, where's it going?
conndf[conndf['id.resp_p'] == 1935][['id.resp_h','proto']]


Out[31]:
id.resp_h proto
5541 69.22.155.28 tcp
5561 68.142.111.111 tcp
6746 184.51.157.60 tcp
7095 184.84.220.133 tcp
7509 184.51.157.60 tcp

5 rows × 2 columns


In [32]:
# Same with port 336
conndf[conndf['id.resp_p'] == 336][['id.resp_h','proto']]


Out[32]:
id.resp_h proto
0 197.163.56.70 tcp
3 197.163.56.70 tcp
4 197.163.56.70 tcp
5 197.163.56.70 tcp
6 197.163.56.70 tcp
7 197.163.56.70 tcp
10 197.163.56.70 tcp
11 197.163.56.70 tcp
12 197.163.56.70 tcp
15 197.163.56.70 tcp
16 197.163.56.70 tcp
17 197.163.56.70 tcp

12 rows × 2 columns

SMTP, because who doesn't love some SPAM


In [33]:
smtpdf.sample.value_counts()


Out[33]:
BIN_Kuluoz-Asprox_9F842AD20C50AD1AAB41F20B321BF84B         4085
BIN_Sanny-Daws_338D0B855421867732E05399A2D56670_2012-10       2
BIN_ArdamaxKeylogger_E33AF9E602CBB7AC3634C2608150DD18         1
dtype: int64

Hosts aka Open Relays

Looks like we've found quite a few (wonder if they still work). I'm also amazed at how many are included with one sample. There are at most 3 other hosts in this list not related to the Asprox sample, but it seems that Asprox sends quite a bit of email and includes a pretty good list of open relays.


In [34]:
print "Unique Hosts found as the HELO portion of SMTP traffic: %s" % smtpdf.helo.nunique()
print ""
print "Some of the examples"
print smtpdf.helo.value_counts().head(10)


Unique Hosts found as the HELO portion of SMTP traffic: 89

Some of the examples
apostille123.com              132
bestcatscratchpost.com        112
costaricaposters.com          109
olgapost.com                  105
highschoolapostles.com        104
kevinpostmotors.com            97
posteraday.com                 92
postroomsupplies.com           89
bancopostaclienteitaly.com     87
alliedpostal.com               87
dtype: int64

Patterns in email

Maybe we can find some patterns in senders and subjects?


In [35]:
smtpdf['count'] = 1
grouped = smtpdf[smtpdf['from'] != "-"][['from','subject','count']].groupby(['from', 'subject']).sum()
grouped.sort('count', ascending = 0).head(20)


Out[35]:
count
from subject
Economy Shipping <support_id81@highperfpostgresql.com> Delivery Notification ID#EN95887556F 6
Next Day Air Saver <message_id98@olgapost.com> Delivery Notification ID#EN79318987H 4
Economy Shipping <support_id55@posturalvertigo.com> Delivery Status Notification 3
One Day Shipping <personal_id86@taskoprupostasi.com> Ship Notification ID#EN05842223A 3
Mail International <contact_id72@bestcatscratchpost.com> Delivery Notification ID#EN18841053F 3
Logistics Services <delivery.id78@kevinpostmotors.com> Ship Notification ID#EN43279293A 3
Next Day Air Saver <us_04@halfpriceposters.com> Delivery Status Notification ID#EN56869729X 3
Standard Shipping <status_id46@goppost.com> Delivery Notification ID#EN28866699H 3
One Day Shipping <personal_id63@taskoprupostasi.com> Delivery Notification 3
Expedited Shipping <federal_id94@scooterspost.com> Ship Notification ID#EN28765320A 3
One Day Shipping <personal_id78@taskoprupostasi.com> Delivery Notification 3
Priority Mail <status_60@hissignpost.com> Delivery Notification ID#EN60271900F 3
One Day Shipping <customer.id15@costaricaposters.com> Delivery Notification ID#EN13576648J 3
One Day Shipping <item_05@npcompost.com> Delivery Status Notification ID#EN75648058P 3
Expedited Shipping <federal_id73@scooterspost.com> Delivery Notification ID#EN92085505H 3
One Day Shipping <personal_id99@taskoprupostasi.com> Delivery Notification ID#EN41600040F 3
Priority Mail <status_80@hissignpost.com> Delivery Notification ID#EN80773754H 3
Mail International <contact_id98@bestcatscratchpost.com> Delivery Status Notification ID#EN58347354P 3
Mail International <help_id50@alexanderapostol.com> Delivery Status Notification ID#EN15607017P 3
Logistics Services <delivery.id07@kevinpostmotors.com> Delivery Status Notification ID#EN45696799P 3

20 rows × 1 columns

Network traffic is fine and dandy, but now it's time for some more eye-candy!

Files

It's nice to have the context around how systems communicate. We've got some great stats/data surrounding the c2 and delivery mechanisms, so let's see how they related to the files that get transferred. Bro can extract files from IRC, SMTP, HTTP, and FTP out of the box.

Summary information

What's the most popular, and what does it look like per-protocol?


In [36]:
ax = box_plot_df_setup(filesdf['source'], filesdf['mime_type']).T.plot(kind='bar', stacked=True)
pylab.xlabel('Mime-Type')
pylab.ylabel('Number of Files')
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, title="Service Type")


Out[36]:
<matplotlib.legend.Legend at 0x11372fcd0>

In [37]:
ax = box_plot_df_setup(filesdf.loc[(filesdf["mime_type"] != 'text/html') & (filesdf['mime_type'] != 'text/plain')]['source'], filesdf.loc[(filesdf["mime_type"] != 'text/html') & (filesdf['mime_type'] != 'text/plain')]['mime_type']).T.plot(kind='bar', stacked=True)
pylab.xlabel('Mime-Type')
pylab.ylabel('Number of Files')
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, title="Service Type")


Out[37]:
<matplotlib.legend.Legend at 0x1192ccc90>

In [38]:
filesdf['count'] = 1
filesdf[filesdf['filename'] != '-'][['source','mime_type','seen_bytes','count']].groupby(['source','mime_type']).sum().sort('count', ascending=0).head(10)


Out[38]:
seen_bytes count
source mime_type
SMTP image/jpeg 7554332 26
HTTP binary 2922872 13
application/x-dosexec 1717057 12
application/pdf 126691 7
image/png 2534 2
image/gif 1082697 2
text/plain 57254 1
image/jpeg 9972 1

8 rows × 2 columns

Filenames

I wonder if filenames get reused, or perhaps there are some super common ones that might be interesting to look for in network traffic.


In [39]:
filesdf[filesdf['filename'] != '-'][['source','mime_type','filename','count']].groupby(['source','mime_type','filename']).sum().sort('count', ascending=0).head(10)


Out[39]:
count
source mime_type filename
HTTP binary COMMON.BIN 6
application/x-dosexec contacts.exe 3
image/png ad516503a11cd5ca435acc9bb6523536.png 2
binary setusating.bin 2
application/x-dosexec readme.exe 2
info.exe 1
image/gif maumauwebtvB.gif 1
maumauwebtvA.gif 1
binary pg.dll.crp 1
fp10na.dll.crp 1

10 rows × 1 columns


In [40]:
filesdf[filesdf['filename'] != '-'][['sample','mime_type','filename','count']].groupby(['sample','mime_type','filename']).sum().sort('count', ascending=0).head(10)


Out[40]:
count
sample mime_type filename
BIN_Kuluoz-Asprox_9F842AD20C50AD1AAB41F20B321BF84B binary COMMON.BIN 6
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 image/png ad516503a11cd5ca435acc9bb6523536.png 2
BIN_ZeusGameover_2012-02 application/x-dosexec contacts.exe 2
BIN_Zeus_b1551c676a54e9127cd0e7ea283b92cc-2012-04 binary setusating.bin 2
purplehaze text/plain jquery-1.3.2.min.js 1
BIN_ArdamaxKeylogger_E33AF9E602CBB7AC3634C2608150DD18 image/jpeg Jun_06_2013__09_46_41.jpg 1
Jun_06_2013__09_46_39.jpg 1
Jun_06_2013__09_46_38.jpg 1
Jun_06_2013__09_46_37.jpg 1
Jun_06_2013__09_46_36.jpg 1

10 rows × 1 columns

Notice

What things should we pay attention to?

We got some interesting coverage of Bro notices, any of these could provide an interesting place to launch an investigation from.


In [41]:
noticedf['count'] = 1
noticedf[['note','msg','count']].groupby(['note','msg']).sum().sort('count', ascending=0)


Out[41]:
count
note msg
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 199
SSL certificate validation failed with (certificate is not yet valid) 14
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 68% Last seen: 2012-12-09 06:16:04 9
SSL::Invalid_Server_Cert SSL certificate validation failed with (self signed certificate) 5
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 41% Last seen: 2013-01-17 10:23:10 3
Malware Hash Registry Detection rate: 32% Last seen: 2012-01-21 16:31:03 3
Malware Hash Registry Detection rate: 36% Last seen: 2013-06-01 07:16:06 2
Malware Hash Registry Detection rate: 50% Last seen: 2012-09-22 14:46:06 1
Malware Hash Registry Detection rate: 23% Last seen: 2012-04-11 18:01:02 1
Malware Hash Registry Detection rate: 60% Last seen: 2013-06-01 03:06:57 1
Malware Hash Registry Detection rate: 55% Last seen: 2012-08-27 07:01:03 1
Scan::Address_Scan 192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 0m30s 1
192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 0m38s 1
192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 3m3s 1
Signatures::Sensitive_Signature 10.0.2.15: ATTACK-RESPONSES Microsoft cmd.exe banner (reverse-shell originator) 1
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 24% Last seen: 2012-02-06 12:06:50 1
Malware Hash Registry Detection rate: 41% Last seen: 2012-04-19 11:16:03 1
Malware Hash Registry Detection rate: 25% Last seen: 2012-02-01 14:40:28 1
Malware Hash Registry Detection rate: 27% Last seen: 2012-12-01 02:31:03 1
Malware Hash Registry Detection rate: 55% Last seen: 2012-04-24 10:46:03 1
Malware Hash Registry Detection rate: 32% Last seen: 2012-12-01 05:16:09 1
Malware Hash Registry Detection rate: 36% Last seen: 2012-02-05 20:46:02 1
Malware Hash Registry Detection rate: 40% Last seen: 2013-06-15 00:00:44 1
Malware Hash Registry Detection rate: 81% Last seen: 2013-10-31 23:33:08 1

24 rows × 1 columns


In [42]:
# We can get a slightly different look at the world by throwing some ports into the mix! Looks like we might have some winners here.
noticedf[['note','msg','id.resp_p','count']].groupby(['note','msg','id.resp_p']).sum().sort('count', ascending=0)


Out[42]:
count
note msg id.resp_p
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 443 120
9001 56
SSL certificate validation failed with (certificate is not yet valid) 443 14
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 68% Last seen: 2012-12-09 06:16:04 80 9
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 80 5
10203 3
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 32% Last seen: 2012-01-21 16:31:03 80 3
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 44945 3
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 41% Last seen: 2013-01-17 10:23:10 80 3
SSL::Invalid_Server_Cert SSL certificate validation failed with (self signed certificate) 443 3
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 36% Last seen: 2013-06-01 07:16:06 80 2
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 9101 2
SSL certificate validation failed with (self signed certificate) 443 2
SSL certificate validation failed with (unable to get local issuer certificate) 5001 1
5251 1
11443 1
7540 1
8443 1
22 1
9002 1
9060 1
6001 1
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 81% Last seen: 2013-10-31 23:33:08 80 1
SSL::Invalid_Server_Cert SSL certificate validation failed with (unable to get local issuer certificate) 39030 1
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 36% Last seen: 2012-02-05 20:46:02 80 1
Malware Hash Registry Detection rate: 60% Last seen: 2013-06-01 03:06:57 80 1
Malware Hash Registry Detection rate: 55% Last seen: 2012-08-27 07:01:03 80 1
Malware Hash Registry Detection rate: 55% Last seen: 2012-04-24 10:46:03 80 1
Malware Hash Registry Detection rate: 50% Last seen: 2012-09-22 14:46:06 8888 1
Malware Hash Registry Detection rate: 41% Last seen: 2012-04-19 11:16:03 80 1
Malware Hash Registry Detection rate: 40% Last seen: 2013-06-15 00:00:44 80 1
Malware Hash Registry Detection rate: 32% Last seen: 2012-12-01 05:16:09 80 1
Scan::Address_Scan 192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 0m38s - 1
TeamCymruMalwareHashRegistry::Match Malware Hash Registry Detection rate: 27% Last seen: 2012-12-01 02:31:03 80 1
Malware Hash Registry Detection rate: 25% Last seen: 2012-02-01 14:40:28 80 1
Malware Hash Registry Detection rate: 24% Last seen: 2012-02-06 12:06:50 80 1
Malware Hash Registry Detection rate: 23% Last seen: 2012-04-11 18:01:02 80 1
Signatures::Sensitive_Signature 10.0.2.15: ATTACK-RESPONSES Microsoft cmd.exe banner (reverse-shell originator) 443 1
Scan::Address_Scan 192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 3m3s - 1
192.168.248.165 scanned at least 25 unique hosts on port 25/tcp in 0m30s - 1

40 rows × 1 columns


In [43]:
noticedf[noticedf['note'] == 'Scan::Address_Scan']['sample']


Out[43]:
2    BIN_Cutwail-Pushdo(1)_582DE032477E099EB1024D84...
1    BIN_Cutwail-Pushdo(2)_582DE032477E099EB1024D84...
0    BIN_Kuluoz-Asprox_9F842AD20C50AD1AAB41F20B321B...
Name: sample, dtype: object

We've come full circle, it looks like we've got more confirmation that we have some malware samples that are really good at SPAM, and disply it by connecting to lots of hosts in rapid succession.

SSL

Stats

We saw some examples above where we had lots of SSL alerts, and some of them were on strange ports. Here we can take a look at the breakdown of SSL traffic and it's associated ports. I suspect some interesting things will fallout.


In [44]:
ssldf['id.resp_p'].value_counts()


Out[44]:
443      271
9001      57
80         5
10203      3
44945      3
9101       2
5001       1
7540       1
6001       1
9060       1
9002       1
8443       1
11443      1
5251       1
39030      1
22         1
dtype: int64

In [45]:
ssldf.subject.value_counts().head(10)


Out[45]:
-                                                               46
emailAddress=marry.smith@ltu.edu,CN=ITU Server,OU=VeriSign Trust Network,O=Internet Widgits Pty Ltd,L=Salisbury,ST=North Carolina,C=US    24
CN=*.google.com,O=Google Inc,L=Mountain View,ST=California,C=US     7
CN=www.3ktww4bg.net                                              6
CN=www.ktq2go444i.net                                            5
CN=eric-office                                                   5
CN=www.ohfe52bk6gyfzojwgts.net                                   4
CN=www.hstk2emyai4yqa5.net                                       4
CN=*.gstatic.com,O=Google Inc,L=Mountain View,ST=California,C=US     4
CN=www.km6ptswm7mo.net                                           4
dtype: int64

In [46]:
ssldf['count'] = 1
ssldf[['version','cipher','count']].groupby(['version','cipher']).sum().sort('count', ascending=0)


Out[46]:
count
version cipher
TLSv10 TLS_DHE_RSA_WITH_AES_256_CBC_SHA 211
SSLv3 TLS_RSA_WITH_RC4_128_MD5 70
TLS_RSA_WITH_RC4_128_SHA 21
TLSv10 TLS_RSA_WITH_RC4_128_SHA 18
TLS_RSA_WITH_RC4_128_MD5 15
- - 13
TLSv10 TLS_RSA_WITH_AES_128_CBC_SHA 3

7 rows × 1 columns

Since we've got a decent grasp on the ports used, the types of ciphers present as well as popular certs that were seen in malware, perhaps there are a couple of ways we can begin to relate that information back to samples to get an idea of what the sample might be doing or how it works.


In [47]:
ssldf[['sample','server_name','id.resp_p','count']].groupby(['sample','id.resp_p','server_name']).sum().sort('count', ascending=0)


Out[47]:
count
sample id.resp_p server_name
purplehaze 443 - 43
PDF_CVE-2011-2462_Pdf_2011-12 443 - 36
BIN_ZeroAccess_Sirefef_C2A9CCC8C6A6DF1CA1725F955F991940_2013-08 443 - 16
BIN_Cutwail-Pushdo(2)_582DE032477E099EB1024D84C73E98C1 443 - 9
BIN_Ramnitpcap_2012-01 443 - 7
BIN_Vobfus_634AA845F5B0B519B6D8A8670B994906_2012-12 443 - 5
BIN_TrojanPage_86893886C7CBC7310F7675F4EFDE0A29 443 - 5
BIN_Enfal_Lurid_0fb1b0833f723682346041d72ed112f9_2013-01 443 - 4
BIN_Googledocs_macadocs_2012-12 443 - 4
BIN_Cutwail-Pushdo(1)_582DE032477E099EB1024D84C73E98C1 443 - 2
EK_Blackholev2_2012-09 443 - 2
EK_Blackholev1_2012-08 443 - 2
BIN_Tbot_FC7C3E087789824F34A9309DA2388CE5_2012-12 10203 www.pcnia4i6e6w.com 1
BIN_Tbot_2E1814CCCF0C3BB2CC32E0A0671C0891_2012-12 443 www.o4rtqjectd6cr7xj2plup.com 1
9001 www.6fwotxu2.com 1
www.54qrxwvimf35.com 1
www.2jh3iq.com 1
443 www.zzrv3tbbn4a.com 1
www.zupgh57porobex5l6rn7gn4b.com 1
www.zkt4.com 1
www.wv5npsememeyqlxeejajjh.com 1
www.tntgu2nvt3x4wguftukjoauw.com 1
www.rx4a.com 1
www.phllv4qobdq66lvikg4.com 1
www.odvlsr75agy44jkafb5.com 1
www.nhoqywktzrxr.com 1
9001 www.h6v4rzfaoh7iwjbwchdkxk5r.com 1
443 www.mdxu5pezm5gctjsiz57jnjlbc.com 1
www.m5467gyzaao3dogqgkgnsjz4.com 1
www.jwjftcuh7svsqg7il5z.com 1
www.jozhagprwwaiayfwtyp.com 1
www.jil7bq.com 1
www.igi4wpls4vqtpv.com 1
www.hoi7duw.com 1
www.gl4fqk3ut2jrhm4hhbn735.com 1
www.fk4pprq42hsvl2wey.com 1
www.enh3nbiuvze2zmjh2e.com 1
9001 www.d6dh.com 1
www.kyswssz.com 1
www.jwrpsthzrih.com 1
443 www.cb4bqglwg.com 1
BIN_Tbot_5375FB5E867680FFB8E72D29DB9ABBD5_2012-12 443 www.czjs7.com 1
www.clwfhegzhknjxrqgo.com 1
www.bt5qn4edtog.com 1
www.amdspuvfnwejdbac3s4eyiiei.com 1
www.a4grdymgccamccd.com 1
www.7f56wbkr.com 1
www.5sja.com 1
www.5gwwuuomvh4aayxc47lnqag.com 1
www.4v2ddyxbnjeeys.com 1
www.4ae7bhbe3vwykaetow67swg.com 1
www.3yksb5uu6h2vacmutmwhlohm5.com 1
80 www.rbgnlzi3jgetoxkzqy75gf.com 1
BIN_Tbot_2E1814CCCF0C3BB2CC32E0A0671C0891_2012-12 44945 www.w6wo5d7enjs3nx3xcsvxhnq7u.com 1
10203 www.sqzhpbncwezqjocze2arciro.com 1
9002 www.7mmi6y7nhxxl3xdtoquu.com 1
9001 www.yvrfwi3jvukj.com 1
www.xy6a.com 1
www.w3woqnnv2hker.com 1
www.p5r5vru7a.com 1
...

228 rows × 1 columns

We can even view the above in a fancy D3 layout!

To run the visualization in your web browser:

  1. Run the code in the next cell
  2. 'cd bsides_austin'
  3. 'python -m SimpleHTTPServer 9999'
  4. Point your brower at the html file http://localhost:9999/ssl_cartesian.html
  5. ^C to exit the webserver after you're done enjoying the D3 graph

In [48]:
data = {'name' : 'ssl'}
samples = list(set(ssldf['sample'].tolist()))
data['children'] = list()
sampleindex = 0
for sample in samples:
    data['children'].append({'name' : sample, 'children' : list()})
    ports = set(ssldf[ssldf['sample'] == sample]['id.resp_p'].tolist())
    portindex = 0
    for port in ports:
        data['children'][sampleindex]['children'].append({'name' : str(port), 'children' : list()})
        hostnames = set(list(ssldf.loc[(ssldf['id.resp_p'] == int(port)) & (ssldf['sample'] == sample)]['server_name']))
        for hostname in hostnames:
            data['children'][sampleindex]['children'][portindex]['children'].append({'name' : hostname, 'size' : 1}) 
        portindex += 1
    sampleindex += 1
json.dump(data, open('ssl.json', 'w'))

To show how easy D3 can be once you have the JSON output, you can also point your browser (after folliwng the steps above) to http://localhost:9999/ssl_cartesian.html

And you'll get some output similar to:




Note: If you run these at home, you can zoom in and zoom out with your browswer hot-keys to get a much nicer view of the graph.


In [49]:
# Ports per sample
ax = box_plot_df_setup(ssldf['id.resp_p'], ssldf['sample']).T.plot(kind='bar', stacked=True)
pylab.ylabel('Total # of connections')
pylab.xlabel('Samples')
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., title="Port")


Out[49]:
<matplotlib.legend.Legend at 0x1197bef10>

In [50]:
# Or as you might see it in an operational sense...
ax = box_plot_df_setup(ssldf['id.resp_p'], ssldf['id.orig_h']).T.plot(kind='bar', stacked=True)
pylab.ylabel('Total # of connections')
pylab.xlabel('Source IP')
patches, labels = ax.get_legend_handles_labels()
ax.legend(patches, labels, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., title="Port")


Out[50]:
<matplotlib.legend.Legend at 0x113347510>

Spoiler Alert!!

Tbot uses TOR for communication. (http://contagiodump.blogspot.com/2012/12/dec-2012-skynet-tor-botnet-trojantbot.html)

Weirdness

The weird.log shows protocol issues/anomalies as well as information pertaining to possible data loss, etc... We weren't able to find anything exciting in there, but that doesn't mean you won't!


In [51]:
weirddf.name.value_counts()


Out[51]:
data_before_established              169
unescaped_special_URI_char           166
possible_split_routing               164
line_terminated_with_single_CR       133
NUL_in_line                          111
unknown_protocol_2                    71
bad_HTTP_request                      65
DNS_Conn_count_too_large              51
inappropriate_FIN                     34
HTTP_version_mismatch                 21
connection_originator_SYN_ack         20
truncated_link_frame                   9
unmatched_HTTP_reply                   9
window_recision                        8
SYN_inside_connection                  6
unescaped_%_in_URI                     5
above_hole_data_without_any_acks       5
DNS_truncated_ans_too_short            5
DNS_truncated_RR_rdlength_lt_len       4
DNS_label_too_long                     4
premature_connection_reuse             4
DNS_label_len_gt_pkt                   3
dns_changed_number_of_responses        3
SYN_with_data                          2
DNS_label_forward_compress_offset      1
SYN_after_close                        1
empty_http_request                     1
double_%_in_URI                        1
malformed_ssh_identification           1
DNS_truncated_len_lt_hdr_len           1
illegal_%_at_end_of_URI                1
truncated_IP                           1
DNS_RR_unknown_type                    1
dtype: int64

Try this at home

What could go wrong?