Like most people, I was wondering: "What is my laptop doing? Has it become a botnet? Should I stop downloading a bunch of weird stuff all the time? Will I ever move out of my Mom's basement?"... but I digress.
This notebook is an exploration of my laptop's network usage using Workbench https://github.com/SuperCowPowers/workbench.git
We wanted to get a 'gist' of the network activity happening from a particular capture point, in this case a laptop, but the capture point could be anywhere. Obviously there are super great tools that already perform exploration and analysis of PCAPs: Wireshark, ChopShop, Scapy, blah, foo, etc. Here we're leveraging Bro IDS to generate our starting-point data and then hopping off in various directions from there. The work here should be viewed as complementary to these other tools :)
Run the workbench server (from somewhere; for this demo we're just going to start a local one):
$ workbench_server
In [20]:
# Let's start interacting with workbench. Please note there is NO specific client to workbench,
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")
Out[20]:
In [21]:
# I forgot what stuff I can do with workbench
print c.help()
Let's look at the PCAPs that are being tossed into workbench. A script in the utils directory called pcap_streamer.py will 'stream' PCAPs into workbench off of a live network interface. We can use the 'get_sample_window' call to have workbench give us the most recent window of streaming PCAPs (50 MB worth in this example).
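Just for reference, here's a rough sketch of what a streamer like pcap_streamer.py is doing under the hood: grab small PCAP slices and push each one into workbench. This is purely illustrative; the directory, filenames, and the store_sample() argument order are assumptions on my part, so check the actual script and the workbench API if you want the real thing.
# Illustrative sketch only, NOT the actual pcap_streamer.py
import glob
import zerorpc

c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")

# Assume something like 'tcpdump -G 30 -w pcap_slices/slice_%s.pcap' is rotating capture files into this directory
for pcap_file in glob.glob('pcap_slices/*.pcap'):
    with open(pcap_file, 'rb') as f:
        raw_bytes = f.read()
    md5 = c.store_sample(raw_bytes, pcap_file, 'pcap')  # argument order is an assumption, verify against the workbench API
    print 'Stored %s as %s' % (pcap_file, md5)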
In [34]:
# Grab a window of PCAPs from workbench (the last 50 MegaBytes worth in this case)
pcap_md5s = c.get_sample_window('pcap', 50)
print 'Number of PCAPs: %d' % len(pcap_md5s)
In [35]:
# Workbench lets you store sample sets
pcap_set = c.store_sample_set(pcap_md5s)
In [36]:
# Now give us an HTTP graph of all the activity within that window of PCAPs.
# Workbench also has DNS and CONN graphs, but for now we're just interested in HTTP.
c.work_request('pcap_http_graph', pcap_set)
Out[36]:
The HTTP graph has quite a bit of info, but you can see that we've conducted a shortest-path search from all nodes of type 'origin' (any node originating HTTP communications) to any node of type 'file'. In other words, we're particularly interested in all of the various files that got downloaded through our network tap in the last few minutes.
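To make the origin-to-file shortest-path idea a bit more concrete, here's a tiny hedged sketch using NetworkX. This is not workbench's pcap_http_graph code, just the same style of query run against a toy graph with 'origin', 'host', and 'file' node types.
# Toy illustration of the origin --> file shortest-path query (not workbench's actual graph code)
import networkx as nx

G = nx.DiGraph()
G.add_node('192.168.1.10', type='origin')   # a node originating HTTP communications
G.add_node('example.com', type='host')
G.add_node('deadbeef.exe', type='file')     # a file pulled down over HTTP
G.add_edge('192.168.1.10', 'example.com')
G.add_edge('example.com', 'deadbeef.exe')

# Shortest paths from every 'origin' node to every reachable 'file' node
origins = [n for n, d in G.nodes(data=True) if d['type'] == 'origin']
files = [n for n, d in G.nodes(data=True) if d['type'] == 'file']
for o in origins:
    for f in files:
        if nx.has_path(G, o, f):
            print nx.shortest_path(G, o, f)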
In [108]:
# We can also ask workbench for a Python dictionary of all the info from this (50 MB) window of PCAPs,
# because sometimes visualizations are useful and sometimes organized data is useful.
output = c.work_request('view_pcap_details', pcap_set)['view_pcap_details']
output
Out[108]:
In [109]:
# Critical Code: Transition from Bro logs to Pandas Dataframes
# This one line of code populates dataframes from the Bro logs,
# streaming client/server generators, zero-copy, efficient, awesome...
import pandas as pd
dataframes = {name:pd.DataFrame(c.stream_sample(bro_log)) for name, bro_log in output['bro_logs'].iteritems()}
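If you want to peek at what stream_sample is handing back before wrapping it in a DataFrame, you can pull a few rows straight off the stream. This assumes (as the dict comprehension above does) that stream_sample yields one dict per Bro log entry.
# Peek at the first few rows coming off the stream_sample generator
# (assumes each row is a dict, which is what pd.DataFrame() above relies on)
import itertools
http_log_md5 = output['bro_logs']['http_log']
for row in itertools.islice(c.stream_sample(http_log_md5), 3):
    print row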
We're going to use some nice functionality in the Pandas DataFrame to look at our network data; specifically, we're going to group by origin, host, host IP, and MIME type. The last column represents the aggregated sum of response_body_len.
This type of operation is really just scratching the surface of what dataframes can do, so being able to populate a dataframe quickly and efficiently is super awesome.
In [110]:
# Group by origin, host, and responder IP, and show the total response bytes for each response MIME type
group_host = dataframes['http_log'].groupby(['id.orig_h','host','id.resp_h','resp_mime_types'])[['response_body_len']].sum()
group_host.head(100)
Out[110]:
In [111]:
# Now group by host, responder IP, MIME type, and URI to see the individual requests
group_host = dataframes['http_log'].groupby(['host','id.resp_h','resp_mime_types','uri'])[['response_body_len']].sum()
group_host.head(50)
Out[111]:
In [112]:
# Take a look at the Bro weird log
dataframes['weird_log'].head(20)
Out[112]:
In [113]:
# Convert the 'ts' field to an official datetime object
dataframes['http_log']['time'] = pd.to_datetime(dataframes['http_log']['ts'],unit='s')
dataframes['http_log']['time'].head()
Out[113]:
In [114]:
# Explore pivoting and resampling
response_bytes = dataframes['http_log'][['time','resp_mime_types','response_body_len']].copy()  # copy so we don't modify the original frame
response_bytes['response_body_len'] = response_bytes['response_body_len'].astype(int)
print response_bytes.head()
pivot = pd.pivot_table(response_bytes, rows='time', values='response_body_len', cols=['resp_mime_types'], aggfunc=sum)
sampled_bytes = pivot.resample('1Min', how='sum')
sampled_bytes.head()
Out[114]:
In [115]:
# Plotting defaults
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 12.0
plt.rcParams['figure.figsize'] = 12.0, 8.0
In [116]:
# Let's plot it!
sampled_bytes.plot()
Out[116]: