Exploration of Live Network Tap

Like most people, I was wondering: "What is my laptop doing? Has it become a botnet? Should I stop downloading a bunch of weird stuff all the time? Will I ever move out of my Mom's basement?" But I digress...

This notebook is an exploration of my laptop's network usage using Workbench https://github.com/SuperCowPowers/workbench.git

Goals of the Exploration

We wanted to get a 'gist' of the network activity happening at a particular capture point, in this case a laptop, but the capture point could be anywhere. Obviously there are super great tools that already perform exploration and analysis of pcaps: Wireshark, Chop Shop, Scapy, blah, foo, etc. Here we're leveraging Bro IDS to generate our starting-point data and then hopping off in various directions from there. The work here should be viewed as complementary to these other tools :)

Let's start up the workbench server...

Run the workbench server (from somewhere; for the demo we're just going to start a local one)

$ workbench_server

In [20]:
# Let's start to interact with workbench. Please note there is NO specific client to workbench,
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")


Out[20]:
[None]

So I'm confused, what am I supposed to do with workbench?

  • Start with c.help(), look at the commands
  • Now add stuff to workbench, PCAPs, PE Files, PDFs, SWFs, whatever..
  • Workbench stores it in a database (MongoDB now, maybe Vertica later)
  • Make a work request; see c.help_workers() for all the options
  • See Workbench Demo Notebook for a lot more info on using workbench.

In [21]:
# I forgot what stuff I can do with workbench
print c.help()


Welcome to Workbench: Here's a list of help commands:
	 - Run c.help_basic() for beginner help
	 - Run c.help_commands() for command help
	 - Run c.help_workers() for a list of workers
	 - Run c.help_advanced() for advanced help

See https://github.com/SuperCowPowers/workbench for more information


In this case we're going to skip a lot of the Workbench intro material; see the Workbench Demo Notebook for a startup guide.


Process the Streaming PCAP Data

Let's look at the PCAPs that are being tossed into workbench. A script in the utils directory called pcap_streamer.py will 'stream' PCAPs into workbench off of a live network interface. We can use the 'get_sample_window' call to ask workbench for the most recent streaming PCAPs, up to a given size in MB.


In [34]:
# Grab a range of pcaps in workbench (last 50 MegaBytes worth in this case)
pcap_md5s = c.get_sample_window('pcap', 50)
print 'Number of PCAPs %d' % len(pcap_md5s)


Number of PCAPs 54
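
To make the idea of a size-bounded sample window a bit more concrete, here is a minimal sketch in plain Python. This is NOT Workbench's implementation: the `window_by_size` helper, the tuple layout, and the toy sample names are all hypothetical, and it simply assumes the window means "newest samples first, until the size budget is used up".

```python
def window_by_size(samples, max_mb):
    """Return md5s of the newest samples whose combined size fits in max_mb.

    samples: iterable of (md5, size_in_bytes, import_time) tuples
             (a hypothetical layout, chosen only for this sketch).
    """
    budget = max_mb * 1024 * 1024
    picked, used = [], 0
    # Walk from newest to oldest and stop once the budget would be exceeded
    for md5, size, _ in sorted(samples, key=lambda s: s[2], reverse=True):
        if used + size > budget:
            break
        picked.append(md5)
        used += size
    return picked

MB = 1024 * 1024
toy_samples = [
    ('pcap_a', 1 * MB, 100),   # oldest
    ('pcap_b', 1 * MB, 200),
    ('pcap_c', 1 * MB, 300),   # newest
]
recent = window_by_size(toy_samples, 2)
```

With a 2 MB budget only the two newest toy samples make the cut, which is the same kind of "give me the latest N MB" behavior the call above is asking for.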

In [35]:
# Workbench lets you store sample sets
pcap_set = c.store_sample_set(pcap_md5s)

In [36]:
# Now give us an HTTP graph of all the activities within that window of PCAPs.
# Workbench also has DNS and CONN graphs, but for now we're just interested in HTTP.
c.work_request('pcap_http_graph', pcap_set)


Out[36]:
{'pcap_http_graph': {'md5': 'b02cf3ec31a0fe27f309c54f77ddc9ac',
  'output': 'go to http://localhost:7474/browser and execute this query "match (s:origin), (t:file), p=allShortestPaths((s)--(t)) return p"'}}

Workbench + Neo4j = Awesome

The HTTP graph has quite a bit of info, but you can see that we've conducted a shortest paths search from all nodes of type 'origin' (any node originating HTTP communications) to any node of type 'file'. So we're particularly interested in all of the various files that got downloaded through our network tap in the last few minutes.
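
If the "all shortest paths" idea is new to you, here is a small sketch of it in plain Python on a toy graph. It is only an illustration of what a query like the one above returns, not how Neo4j computes it; the `all_shortest_paths` helper and the node names are made up for this example, and edges are treated as undirected, as the `(s)--(t)` pattern does.

```python
from collections import deque

def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target.

    graph: dict mapping node -> set of neighbour nodes (undirected).
    """
    # Breadth-first search, remembering every parent that lies on a shortest path
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                parents[nbr].append(node)
    if target not in dist:
        return []

    # Walk the parent links back from the target to rebuild the paths
    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in build(p)]

    return build(target)

# Toy graph: one origin, two hosts, two files (all names hypothetical)
toy_graph = {
    'laptop': {'host_a', 'host_b'},
    'host_a': {'laptop', 'file_x'},
    'host_b': {'laptop', 'file_x', 'file_y'},
    'file_x': {'host_a', 'host_b'},
    'file_y': {'host_b'},
}
paths_to_x = sorted(all_shortest_paths(toy_graph, 'laptop', 'file_x'))
paths_to_y = all_shortest_paths(toy_graph, 'laptop', 'file_y')
```

Here `file_x` is reachable through either host, so two equally short paths come back, while `file_y` has just one.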


In [108]:
# We can also ask workbench for a python dictionary of all the info from this set of PCAPs,
# because sometimes visualizations are useful and sometimes organized data is useful.
output = c.work_request('view_pcap_details', pcap_set)['view_pcap_details']
output


Out[108]:
{'bro_logs': {'conn_log': '53b02c226034fa4508d0ff384cd82381',
  'dns_log': '5056a27297490d61558a6efe6b028ea1',
  'dpd_log': '7ed68bcc653dc97f915f5c7bfb274b2b',
  'files_log': '678daeba4c4cb8839cbef82103d722b2',
  'http_log': '138ecd68a802bd82f8b7ae8852744fa9',
  'packet_filter_log': '53669ae86761ecdfe0304fe5e3123fe6',
  'ssl_log': 'c907419c6c2d929fe66afb8df3943f49',
  'weird_log': 'a97149b9761dff9b524520a1cbc29e36'},
 'extracted_files': [{'entropy': 6.245624377814982,
   'file_size': 42024,
   'file_type': 'TrueType font data',
   'md5': '12be067a6270759b4f861d64cc267166',
   'sha256': 'd5ec46188792388f1ef48c6421d25d73cad6765b9930defa1f28dc7d1790105a',
   'ssdeep': '768:Ricccw6N1wxkOYjked+6OWQOfba+YUMNhY921XS5uUx+qG+qjKylPfKyIreQ+yO6:EWGxkBjkepOEzaigm92XS5uUx+qG+qjm'},
  {'entropy': 7.980726705082393,
   'file_size': 29032,
   'file_type': 'Web Open Font Format, flavor 65536, length 29032, version 1.0',
   'md5': '9b6485fb804ec528b304bc5ba427a52d',
   'sha256': '7480b3771fd67569b3cc75979dfcf8bfdf973af2bc78f331fb93dbf2753dd73e',
   'ssdeep': '768:c8N64HKDhQI5vonULgW5gU154lRGus1uDjUBApCMiDvj:c/5qConULD154lRGugdBAravj'},
  {'entropy': 7.98023762110109,
   'file_size': 28496,
   'file_type': 'Web Open Font Format, flavor 65536, length 28496, version 1.0',
   'md5': '668a147a2d75a58822d1f09e16cf6bba',
   'sha256': 'd1b63bac95978c34df58b5e5afac40ba2c2ff986515dd5a47f86c5bb03e38685',
   'ssdeep': '768:Ma6C1UJ6Gy/J3UbWCUfNhEDaET3OEFTJ/B5FjG4ymDQ6:Ma11Vh/S5Us1jOyFFj/yQQ6'},
  {'entropy': 7.987978971601742,
   'file_size': 32020,
   'file_type': 'Web Open Font Format, flavor 65536, length 32020, version 0.0',
   'md5': 'a188c2f768ce5033d3f5d47be7280e25',
   'sha256': '8c44c3feedae5331a281278ea3ba91d2255928a2f3010d316d6fbb9052e0c2ec',
   'ssdeep': '768:ZeCMB4D5hQRxRkQBtiAN7LrIM0/B5md7YtRZgkyPJxbI6GGS:ZeC04DARxRjoA1fcB5KoRVeJe'},
  {'entropy': 3.907528856075129,
   'file_size': 4286,
   'file_type': 'MS Windows icon resource - 1 icon',
   'md5': 'e788c077dd2498aaeabae414af20e0c0',
   'sha256': '7b174566bbb7e5920ca79912b055757b06bd254ba501e0b273cc9b074cbb3ac1',
   'ssdeep': '48:fWPUND3V5uWQy4FgBwjovZ9u0ZYhdby1FpUl/F+7:u8Dl5uWQy4Fcw8vZ9VwwFpUl9y'},
  {'entropy': 5.308834490275621,
   'file_size': 345,
   'file_type': 'XML document text',
   'md5': 'a7b900bec0b7b386dfd18ad22c9ed411',
   'sha256': 'd9f7e0aa1bff501986995b7c69742a14f373819ab6ecd599af29d67f9d8b4794',
   'ssdeep': '6:TMVBdoIUnWn8FX0wa9Fgc4svquXsLwFcn4mc4sVI/iHI0aXgsoGH4CmL0Xgs0JQL:TMHdoIWWnMEwKFcuX4wp57iwsoTCmL0P'},
  {'entropy': 4.936019556724105,
   'file_size': 1363,
   'file_type': 'XML document text',
   'md5': 'ecc6377d393d709dcbcdae8798fd06d8',
   'sha256': '8fce4a8b4f926a3fc202d65d8dbf1d70975bb91f7b06bfb5b149b9a020b8f36f',
   'ssdeep': '24:zAaxNjZnQNjGnQNjrnQNjQnKsGVmvmg8B9CUm8p:FNqNpNcNSemvmtB9+m'},
  {'entropy': 6.297298526914034,
   'file_size': 37336,
   'file_type': 'TrueType font data',
   'md5': '2e98fc3ce85f31f63010b706259cb604',
   'sha256': '3fc333eb3107febd406586ee8206bc0ee2aeb7f6c7a77f3923a353b72b0ca080',
   'ssdeep': '768:7X+cccw6N1sXef6IkZKhGfzKWGgSy3jnXqOvKz669MLJcS//9dG:TKW2vIGkqzKWGgJ3jnXdvKz66eVcS/1Y'},
  {'entropy': 3.2961625571831026,
   'file_size': 5430,
   'file_type': 'MS Windows icon resource - 2 icons, 16x16, 256-colors',
   'md5': 'a300693728f5caa531a6886d9b8f38c2',
   'sha256': 'aab089af3b8390a350352b5b7900f5747ba57ef1caf4120cced745518e8b5477',
   'ssdeep': '24:EmiJT5aysE6HpisMN3dhZxpvjEAPtxnDb/xORpcF8G+stFMPl33i62gyFWahTjjm:/a5hspKBxGAPtlXERp7G+stF2MWIul'},
  {'entropy': 6.258940975957493,
   'file_size': 3382,
   'file_type': 'MS Windows icon resource - 2 icons, 32x32, 256-colors',
   'md5': '35f17aa38ef89a37c4013de2d17aab6f',
   'sha256': 'fcea740992d0bd5ba0d3e9dbc91d11d0ed192a6c52fd5ecb39138c8bd927ca1f',
   'ssdeep': '96:gSLCFcGDINyncI+vLduRAqKpERL3LiNKPT0i:BLCFcGDINynT+vZUAqKeRLmNKPJ'},
  {'entropy': 3.6954813851810218,
   'file_size': 1150,
   'file_type': 'MS Windows icon resource - 1 icon',
   'md5': '1a1eb5bb0cc75b6506f0019da0c7f21f',
   'sha256': '336368f42787721f0ef3a21dc9a4cfa3a4fd648e2dd4c7c4340119304612b3b1',
   'ssdeep': '12:HaECUuYe4444sY6QBFsb4m/1swMx+1NbXzOO:ril4444sY7BFvwj19X'},
  {'entropy': 7.9883310766863005,
   'file_size': 41752,
   'file_type': 'Web Open Font Format, flavor 65536, length 41752, version 1.0',
   'md5': '04b9bfc362dcb9bc999c7d1bcb44a942',
   'sha256': 'd45f5fb1fb4e1a101a8ad8722af443272f6c3d409d912e8175e6268d48e0b091',
   'ssdeep': '768:wMxLyqPx8N3lhJB7jDUCj+4jaXJ+I0c2dY924OqtVWviGvNhzfD2iij4:IqPx+lhv7jQCj+4GZ+fc2d+O+VWFpD04'},
  {'entropy': 3.6844259005340536,
   'file_size': 1150,
   'file_type': 'MS Windows icon resource - 1 icon',
   'md5': '386297e91ea17bbc79f08166fbb33efd',
   'sha256': '3a62a1c2bc55c2c51b61addc834aa8061f164b8f2a8fa84d2047a0990cf9ca18',
   'ssdeep': '12:XBwHYDk/lBYpvVmczEY7cQyyyoX7x9H82v:XIYg/lBWEXY7yyye7XP'},
  {'entropy': 7.97373633026029,
   'file_size': 21444,
   'file_type': 'Web Open Font Format, flavor 65536, length 21444, version 1.1',
   'md5': '9766098054494741d153b60206e33f89',
   'sha256': '546c3593de0333eccfef2d0ddb7c1331e456f2304446ed386aacbc5b3bad64bf',
   'ssdeep': '384:+ywm9gOSC7IEHv2gkpN1nC1w6QCj0WRnEalg5:L9gOS4HCFChxjLRO5'},
  {'entropy': 5.908739848449175,
   'file_size': 11078,
   'file_type': 'MS Windows icon resource - 4 icons, 16x16, 256-colors',
   'md5': '943882776674997378513c8831392441',
   'sha256': '0bc5af02a6c79f9cff2c743cbb54e73ca554b038eef49e6d66550617f96ad884',
   'ssdeep': '192:6TaVJfo1nHW/W0GyJpGHx9qVE8Oc93xn2JPHlVC:6OzgBHWu+3GHx9qVYcdOdV'},
  {'entropy': 3.8918131277677763,
   'file_size': 4286,
   'file_type': 'MS Windows icon resource - 1 icon',
   'md5': 'b5e4f8f388ba118d602564b52309a262',
   'sha256': '33fe1bce4ef9232926e34d236bddc8cc4bea7a54484918311f7f1ee48b4765e0',
   'ssdeep': '48:ixXCvHXti29ZBFpIwvC0gGlACnI7t978AZi0vx0Fuqoexnp:iVC/999pIwvWJ978B0vx0FEeX'},
  {'entropy': 7.988941525087981,
   'file_size': 23118,
   'file_type': 'Macromedia Flash data (compressed), version 9',
   'md5': '2e841d012d10130d8656ad6482ad88a4',
   'sha256': 'a054812bfac083ac0343834abdfb0804054735de27a4a3d2be910a0a20ec136b',
   'ssdeep': '384:qn+Gc/3LiVzD1u85FL5BC1H5ic7KAuiNmpWN32UZt+6LBSy4x:i4uVf1l5FL5B4X7fvPNGut+6LcyY'},
  {'entropy': 7.089124296559591,
   'file_size': 463,
   'file_type': 'data',
   'md5': '21a156f207dfcde876af3830372307de',
   'sha256': '904881c90a884a2614b4221b04aca7528222a89f96bc471e13ac23fe861ee742',
   'ssdeep': '12:hPeVGSoPv95yP6VGjP3I9a9pCHFYa8YLXOpHNKY+URwQgR5Dw:hPeczPltcjP31CHFd3OLDhgfE'},
  {'entropy': 6.309014565293686,
   'file_size': 38232,
   'file_type': 'TrueType font data',
   'md5': '488d5cc145299ba07b75495100419ee6',
   'sha256': 'dee2d2b7658161d7efa0dede8298b64bf88c8bc1fea782fc10468c9269e78d4a',
   'ssdeep': '768:JF4rcccw6N1QvZSWOMcvnnPCJXQ4ezeWBT2KDHqNRqW:b4lW+v8WOMcfqJXBezeWBT2KDHqNp'},
  {'entropy': 5.101549039629513,
   'file_size': 9662,
   'file_type': 'MS Windows icon resource - 1 icon',
   'md5': '173ce0b3b05f0743fd082723179d3fbf',
   'sha256': '4ab6aef8efee4fb811d1bee5b39f4ab11f8282856576978037bedfa99e804ac0',
   'ssdeep': '96:9FX7Qn/QF8h2+NEMsBTrSrBOtOxObSFSjCOYzYYsu2U0rWm6MfZ7aFa:PMSk2GIBTrSr/cVyPsrWm6MxaFa'},
  {'entropy': 5.475841948176977,
   'file_size': 21207,
   'file_type': 'C source, ASCII text, with very long lines',
   'md5': '382a9d8819ad21143fd690782cd1e8ca',
   'sha256': '5a5e5f83b7367c19a7ce084e6b0ee80d953ed8ed8909db48ed067d60ca434b6b',
   'ssdeep': '384:qr26X4PPpp3EE/xIyDYT/Nj7zmjUj+jcjfjfjujujzjc8XQ8BTnHXuoad2eSHn/R:7p8jHm4K4Lj6yvwf8HydmN'},
  {'entropy': 5.308834490275621,
   'file_size': 345,
   'file_type': 'XML document text',
   'md5': 'a7b900bec0b7b386dfd18ad22c9ed411',
   'sha256': 'd9f7e0aa1bff501986995b7c69742a14f373819ab6ecd599af29d67f9d8b4794',
   'ssdeep': '6:TMVBdoIUnWn8FX0wa9Fgc4svquXsLwFcn4mc4sVI/iHI0aXgsoGH4CmL0Xgs0JQL:TMHdoIWWnMEwKFcuX4wp57iwsoTCmL0P'},
  {'entropy': 5.310373741546879,
   'file_size': 163,
   'file_type': 'data',
   'md5': '2024ea458402395d7ce97d7becda1be9',
   'sha256': 'e32534b3191baebcbc72c680b5cd4e0329ea1b241dc15714cb32bb81c41b5b6f',
   'ssdeep': '3:yivnt6/ly7tFffMLts5OCAadCmy42/uDlhlbqYKAdXm/3l81xqCKup:pn0/8xfMRfC19s/6TFXm/W1xqC5p'},
  {'entropy': -0.0,
   'file_size': 1,
   'file_type': 'very short file (no magic)',
   'md5': '7215ee9c7d9dc229d2921a40e899ec5f',
   'sha256': '36a9e7f1c95b82ffb99743e0c5c4ce95d83c9a430aac59f84ef3cbfab6145068',
   'ssdeep': '3:F:F'},
  {'entropy': 4.845301933698079,
   'file_size': 6636,
   'file_type': 'SVG Scalable Vector Graphics image',
   'md5': 'c01cc1cbf05ba8f782bf709168425535',
   'sha256': '4a7df98b2963b14290e39b88cf700121f6c84c106f169b14cb491a0dbf94a580',
   'ssdeep': '96:QXQSfRNqaeYdPrwekQNeymAqX4keWDWk3a8ALbeZ6bppeUg4vO8ZnjE:cQARNqaeQrIkeyhYrylCsnw'},
  {'entropy': 2.8024474140215836,
   'file_size': 1150,
   'file_type': 'floppy with old FAT filesystem, Media descriptor 0xfb',
   'md5': 'de6a154e38bd211276e193a523e8bc98',
   'sha256': 'd2fa61107c4e7bfca982ea0427a2ea37c8826391e515d6868c576db5cb4a2e06',
   'ssdeep': '12:WA2gCZ4HoOateadateapeatataqWKnotlRXi5vgw:WAk54Ggw'},
  {'entropy': 1.2843393639542855,
   'file_size': 3638,
   'file_type': 'MS Windows icon resource - 2 icons, 32x32, 256-colors',
   'md5': '59a0c7b6e4848ccdabcea0636efda02b',
   'sha256': 'a1495da3cf3db37bf105a12658636ff628fee7b73975b9200049af7747e60b1f',
   'ssdeep': '6:NXulKltegZ//OekukCS4kdxpHIWvUkt/ctmnzteghFnUtC+i/T2MWFetk/m+:NaKXe2m5CREDssfnxeo/2XUKu+'},
  {'entropy': 7.970454777158688,
   'file_size': 4955490,
   'file_type': 'ISO Media, MPEG v4 system, version 2',
   'md5': '0f9e7b97671775a3fb29c36557cf662b',
   'sha256': '7dc55fa7a6e60a63ace66639a1b148e1733647044ee9b0ade0c4b7d87bc074cf',
   'ssdeep': '98304:Fxc89GCIa5m7IAXPu5m7vq8njsZN5rL/g/1FCF9bW5UDpRAiqFMyPEO5sb:DH/II6nvqCj2N5rL/gHCFpepEOE'}],
 'md5': '8518f69c4674db948c5e157bd8010adb'}
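
For a feel of where fields like 'entropy', 'md5' and 'sha256' could come from, here is a hedged sketch using only the standard library. It assumes 'entropy' means Shannon entropy in bits per byte (the values above topping out just under 8 are consistent with that, but this is an assumption, not Workbench's code); `shannon_entropy` and `file_summary` are hypothetical helper names, and 'file_type' and 'ssdeep' are left out because they need extra libraries.

```python
import hashlib
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(bytearray(data))
    total = float(len(data))
    return -sum((n / total) * math.log(n / total, 2) for n in counts.values())

def file_summary(data):
    """Build a dict shaped like one 'extracted_files' entry (minus file_type/ssdeep)."""
    return {
        'entropy': shannon_entropy(data),
        'file_size': len(data),
        'md5': hashlib.md5(data).hexdigest(),
        'sha256': hashlib.sha256(data).hexdigest(),
    }

summary = file_summary(b'hello world')
```

Compressed or encrypted content (fonts, Flash, video) lands near 8 bits per byte, while small icons and text sit much lower. A one-byte input comes out as -0.0 with this formula, the same value shown for the 1-byte file above, though that may just be a coincidence of how the number is computed.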

If the next line of code doesn't blow your mind, you aren't paying attention!

All of the Bro logs are streamed from server to client with NETWORK STREAMING GENERATORS. Those highly efficient generators are zero-copy and stream data directly into Pandas DataFrames.

For more on client/server generators and client-constructed/server-executed generator pipelines see our super spiffy Generator Pipelines notebook.


In [109]:
# Critical Code: Transition from Bro logs to Pandas Dataframes
# This one line of code populates dataframes from the Bro logs, 
# streaming client/server generators, zero-copy, efficient, awesome...
import pandas as pd
dataframes = {name:pd.DataFrame(c.stream_sample(bro_log)) for name, bro_log in output['bro_logs'].iteritems()}
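
To show the generator idea without a server, here is a minimal local sketch that lazily turns Bro's tab-separated ASCII log lines into row dicts. It is not what `c.stream_sample` does internally (that part is an assumption-free zone: we only know it hands back something DataFrame can consume); `bro_log_rows` is a hypothetical helper and the log lines are toy data.

```python
def bro_log_rows(lines):
    """Lazily yield one dict per record from Bro ASCII log lines."""
    fields = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('#fields'):
            # Header line naming the tab-separated columns
            fields = line.split('\t')[1:]
        elif line and not line.startswith('#'):
            # Data line: pair each value with its column name
            yield dict(zip(fields, line.split('\t')))

# Toy log in the Bro ASCII layout (values are made up)
toy_log = [
    '#separator \\x09\n',
    '#fields\tts\tid.orig_h\tname\n',
    '#types\ttime\taddr\tstring\n',
    '1401381218.783838\t192.168.1.104\tbad_HTTP_request\n',
    '1401381278.973012\t192.168.1.104\tunescaped_%_in_URI\n',
    '#close\t2014-05-29-16-40-00\n',
]
rows = bro_log_rows(iter(toy_log))
first = next(rows)   # nothing is parsed until a row is asked for
rest = list(rows)    # drain whatever is left
```

A generator of dicts like this can be handed straight to `pd.DataFrame(...)`, which is the same shape of call used in the cell above.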

Let's look at the Data

We're going to use some nice functionality in the Pandas dataframe to look at our network data, specifically we're going to group by origin, host, host-ip, and mime_type. The last column represents the aggregated sum of response_body_len.

This type of operation is really just scratching the surface when it comes to dataframes, so quickly and efficiently populating a dataframe is super awesome.


In [110]:
# Now we group by host and show the different response mime types for each host
group_host = dataframes['http_log'].groupby(['id.orig_h','host','id.resp_h','resp_mime_types'])[['response_body_len']].sum()
group_host.head(100)


Out[110]:
id.orig_h      host                          id.resp_h        resp_mime_types  response_body_len
192.168.1.104  16518638.log.optimizely.com   174.129.203.102  image/gif        35
                                                              text/plain       4
               664902255.log.optimizely.com  107.22.231.18    image/gif        35
                                                              text/plain       2
               773-gon-065.mktoresp.com      199.15.215.178   image/gif        731
               ad.doubleclick.net            74.125.225.187   text/plain       2195
                                             74.125.225.188   text/plain       2270
               akamaicovers.oreilly.com      23.4.141.110     image/jpeg       118472
               app-sjl.marketo.com           23.4.154.76      -                0
                                                              text/html        1657
                                                              text/plain       12905
               assets.neo4j.org              54.230.4.132     image/gif        214656
                                                              image/png        1369054
                                             54.230.4.169     -                0
                                                              image/gif        32454
                                                              image/png        245753
               av.vimeo.com                  96.17.111.57     video/mp4        4955490
               avpa.dzone.com                208.91.135.45    text/html        506
                                                              text/plain       117
               b.scorecardresearch.com       184.84.180.43    -                0
                                             23.3.12.195      -                0
                                             23.3.12.201      -                0
                                             96.17.111.121    -                0
               b2c-mlm.marketo.com           173.203.143.212  -                0
                                                              image/gif        258
               badge.stumbleupon.com         199.30.80.32     text/html        1035
               beacon-1.newrelic.com         50.31.164.168    text/plain       21
               beta.sylvadb.com              129.100.65.142   image/png        9555
               book.py2neo.org               162.209.114.75   -                0
                                                              image/png        35626
                                                              text/html        174762
                                                              text/plain       55408
               bs.serving-sys.com            12.129.210.71    -                0
                                                              text/plain       14003
               c.betrad.com                  23.4.150.212     -                0
               c.statcounter.com             216.59.38.124    image/gif        49
               careers.stackoverflow.com     198.252.206.17   -                0
                                                              text/html        28505
               cdn.atdmt.com                 23.3.68.179      image/gif        16690
                                             93.184.215.201   image/gif        8704
               cdn.dzone.com                 108.161.188.128  -                0
                                                              binary           99280
                                                              image/gif        2359
                                                              image/jpeg       86344
                                                              image/png        338086
                                                              text/plain       688872
               cdn.flashtalking.com          216.38.160.128   image/gif        42739
                                                              image/jpeg       147692
                                                              text/plain       29718
               cdn.optimizely.com            72.21.91.19      text/plain       327815
               cdn.slidesharecdn.com         23.4.155.216     image/jpeg       138500
               cdn.softpedia.com             68.142.123.254   image/png        1677
               cdn.sstatic.net               190.93.246.58    text/plain       3725
               cdn.stumble-upon.com          199.30.80.32     image/png        1985
               data.cmcore.com               204.77.31.254    image/gif        86
                                             74.121.133.1     image/gif        43
               doug1izaerwt3.cloudfront.net  54.230.6.114     -                0
                                             54.230.7.29      -                0
               ds.serving-sys.com            23.3.12.32       -                0
                                                              image/jpeg       55043
...

100 rows × 1 columns
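
If groupby is unfamiliar, the aggregation above boils down to "sum response_body_len for every distinct combination of key columns". Here is that idea spelled out in plain Python as a rough sketch; `group_sum` is a hypothetical helper and the rows are toy data that only borrow the http_log column names.

```python
from collections import defaultdict

# Toy rows with http_log-style column names (values are made up)
toy_http = [
    {'host': 'a.example', 'resp_mime_types': 'image/png', 'response_body_len': 100},
    {'host': 'a.example', 'resp_mime_types': 'image/png', 'response_body_len': 250},
    {'host': 'a.example', 'resp_mime_types': 'text/html', 'response_body_len': 40},
    {'host': 'b.example', 'resp_mime_types': 'image/gif', 'response_body_len': 35},
]

def group_sum(rows, keys, value):
    """Sum `value` for every distinct combination of `keys`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += int(row[value])
    return dict(totals)

by_host_mime = group_sum(toy_http, ['host', 'resp_mime_types'], 'response_body_len')
```

The keys of `by_host_mime` play the role of the hierarchical index in the table above, and the values are the summed byte counts.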


In [111]:
# Now we group by host, response IP, mime type and URI to see exactly what was requested
group_host = dataframes['http_log'].groupby(['host','id.resp_h','resp_mime_types','uri'])[['response_body_len']].sum()
group_host.head(50)


Out[111]:
response_body_len
host id.resp_h resp_mime_types uri
0.gravatar.com 2606:2800:220:bf1:95:a65:51f:1a94 - /css/hovercard.css?ver=201422x 0
/css/services.css?ver=201422x 0
/js/gprofiles.js?ver=201422x 0
image/jpeg /avatar/65dc2ca9ccb63a7004ff934f5501d576?s=32&d=identicon&r=G 988
/avatar/65dc2ca9ccb63a7004ff934f5501d576?s=64&d=identicon&r=G 1477
image/png /avatar/31c24d77a26eb5e3d35b537f19cbe360?s=32&d=identicon&r=G 2284
/avatar/31c24d77a26eb5e3d35b537f19cbe360?s=64&d=identicon&r=G 7860
/avatar/335410d8b3c50577cf0ed33567a3862e?s=32&d=identicon&r=G 646
/avatar/335410d8b3c50577cf0ed33567a3862e?s=64&d=identicon&r=G 1300
/avatar/3dc0bab46bc3dfa2b7175d6d629abf3f?s=32&d=identicon&r=G 810
/avatar/3dc0bab46bc3dfa2b7175d6d629abf3f?s=64&d=identicon&r=G 1797
/avatar/60b0ef6e1bbdbac2049c933c7c4e7fdd?s=32&d=identicon&r=G 1279
/avatar/60b0ef6e1bbdbac2049c933c7c4e7fdd?s=64&d=identicon&r=G 2885
/avatar/9e32ff422dd2c6155bbb43800b98c4d3?s=32&d=identicon&r=G 645
/avatar/9e32ff422dd2c6155bbb43800b98c4d3?s=64&d=identicon&r=G 1207
/avatar/fd9c7194bbb73ea2f186928424b721f6?s=32&d=identicon&r=G 1017
/avatar/fd9c7194bbb73ea2f186928424b721f6?s=64&d=identicon&r=G 1887
1.bp.blogspot.com 2607:f8b0:400f:800::100b image/png /-TK4TVif-FGU/T0Gtlv4zk7I/AAAAAAAAAMw/F-OvNpHcByM/s320/neo4j_sample.png 36931
1.gravatar.com 2606:2800:220:bf1:95:a65:51f:1a94 - /avatar/ad516503a11cd5ca435acc9bb6523536?s=25&d=identicon&forcedefault=y&r=G 0
/avatar/ad516503a11cd5ca435acc9bb6523536?s=54&d=identicon&forcedefault=y&r=G 0
image/jpeg /avatar/a805ec88c0754455cac9b5655dea230e?s=32&d=identicon&r=G 1294
/avatar/a805ec88c0754455cac9b5655dea230e?s=64&d=identicon&r=G 2720
image/png /avatar/11ec004db919032b93195fc5d2555fea?s=32&d=identicon&r=G 276
/avatar/11ec004db919032b93195fc5d2555fea?s=64&d=identicon&r=G 391
/avatar/43618511463276f21c363c8079ad5a5b?s=32&d=identicon&r=G 622
/avatar/43618511463276f21c363c8079ad5a5b?s=64&d=identicon&r=G 1445
/avatar/a614ce207462e4f3fe33273c0deccde7?s=32&d=identicon&r=G 946
/avatar/a614ce207462e4f3fe33273c0deccde7?s=64&d=identicon&r=G 2105
/blavatar/babef735719ccc14fa9a281e6de2fe08?s=32 2497
/blavatar/babef735719ccc14fa9a281e6de2fe08?s=64 8650
image/x-icon /blavatar/b8c1a3c32665c00621a27ee5fc4e51ce?s=16 11078
16518638.log.optimizely.com 174.129.203.102 image/gif /event?a=16518638&d=13765676&y=false&n=engagement&g=28357548&u=oeu1401383749478r0.00880544213578105&t=1401383754150&f=1026380119 35
text/plain /event?a=16518638&d=13765676&y=false&n=http://java.dzone.com/articles/storm-neo4j-and-python-real&u=oeu1401383749478r0.00880544213578105&wxhr=true&t=1401383749484&f= 2
/event?a=16518638&d=13765676&y=false&n=http://java.dzone.com/articles/storm-neo4j-and-python-real&u=oeu1401383749478r0.00880544213578105&wxhr=true&t=1401383750032&f=1026380119 2
2.bp.blogspot.com 2607:f8b0:400f:801::100b image/png /--ghj75EWkyc/UnjtpNV657I/AAAAAAAALiU/UYua5dc4fPk/s1600/Screenshot_11_5_13_2_07_PM.png 779602
/-yz4sG2of89Y/UG4TQ5QsgDI/AAAAAAAAAV8/apaI68-NH5U/s1600/ftd-banner.png 45874
2.gravatar.com 2606:2800:220:bf1:95:a65:51f:1a94 image/jpeg /avatar/86006d83b5bf07a88ef95dfd001609f8?s=32&d=identicon&r=G 1078
/avatar/86006d83b5bf07a88ef95dfd001609f8?s=64&d=identicon&r=G 1810
/avatar/e8dd547cc3ce614c9662b891791349d4?s=32&d=identicon&r=G 1266
/avatar/e8dd547cc3ce614c9662b891791349d4?s=64&d=identicon&r=G 2488
image/png /avatar/58750f2179edbd650b471280aa66fee5?s=32&d=identicon&r=G 2501
/avatar/58750f2179edbd650b471280aa66fee5?s=64&d=identicon&r=G 8411
/avatar/b6e56aafe01b6dfb69550be92e8db136?s=32&d=identicon&r=G 633
/avatar/b6e56aafe01b6dfb69550be92e8db136?s=64&d=identicon&r=G 1225
4.bp.blogspot.com 2607:f8b0:400f:801::100c image/jpeg /_NtoTtHZadHE/SZcxuOA6QmI/AAAAAAAAAAk/Y9hjo6lyCI4/S45/hendy-sitting_square.jpg 1619
664902255.log.optimizely.com 107.22.231.18 image/gif /event?a=664902255&d=13765676&y=false&n=engagement&g=634652246&u=oeu1401383749478r0.00880544213578105&t=1401383754146&f= 35
text/plain /event?a=664902255&d=13765676&y=false&n=http://java.dzone.com/articles/storm-neo4j-and-python-real&u=oeu1401383749478r0.00880544213578105&wxhr=true&t=1401383749484&f= 2
773-gon-065.mktoresp.com 199.15.215.178 image/gif /webevents/clickLink?_mchNc=1401381694487&_mchCn=&_mchHr=http://www.neo4j.org/learn/licensing&_mchId=773-GON-065&_mchTk=_mch-neo4j.org-1395320035524-69364&_mchHo=www.neo4j.org&_mchPo=&_mchRu=/develop/shell&_mchPc=http:&_mchVr=142 43
/webevents/clickLink?_mchNc=1401381750537&_mchCn=&_mchHr=http://www.neo4j.org/learn/apps&_mchId=773-GON-065&_mchTk=_mch-neo4j.org-1395320035524-69364&_mchHo=www.neo4j.org&_mchPo=&_mchRu=/develop/shell&_mchPc=http:&_mchVr=142 43
/webevents/clickLink?_mchNc=1401381765145&_mchCn=&_mchHr=http://maxdemarzi.com/2012/08/17/neosocial-connecting-to-facebook-with-neo4j/&_mchId=773-GON-065&_mchTk=_mch-neo4j.org-1395320035524-69364&_mchHo=www.neo4j.org&_mchPo=&_mchRu=/learn/apps&_mchPc=http:&_mchVr=142 43

50 rows × 1 columns


In [112]:
# Look at Weird logs
dataframes['weird_log'].head(20)


Out[112]:
addl id.orig_h id.orig_p id.resp_h id.resp_p name notice peer ts uid
0 - - - - - unknown_protocol_2 F bro 1.401381e+09 -
1 - 192.168.1.104 63522 96.17.111.50 80 above_hole_data_without_any_acks F bro 1.401381e+09 Cr8erB4kODWfLymHxg
2 - 192.168.1.104 63553 107.21.109.142 80 unescaped_special_URI_char F bro 1.401382e+09 CInmEQ2qPTLFUaUJgd
3 - 192.168.1.104 63561 69.172.216.56 80 unescaped_%_in_URI F bro 1.401382e+09 CsPWi63OSkkqS5bkT2
4 - 192.168.1.104 63565 69.172.216.56 80 unescaped_%_in_URI F bro 1.401382e+09 CpUvD0416gnFEY3S27
5 - 192.168.1.104 63565 69.172.216.56 80 unescaped_special_URI_char F bro 1.401382e+09 CpUvD0416gnFEY3S27
6 - 192.168.1.104 63567 69.172.216.111 80 unescaped_special_URI_char F bro 1.401382e+09 CPz0bF1lYoQ8MMjdZc
7 - 192.168.1.104 63570 74.125.225.187 80 unescaped_%_in_URI F bro 1.401382e+09 CyPWtb4Btqfxo9Rbv7
8 - 192.168.1.104 63585 69.172.216.111 80 unescaped_special_URI_char F bro 1.401382e+09 Cfjf9F4oM1IXnypuo4
9 - 192.168.1.104 63586 69.172.216.111 80 unescaped_special_URI_char F bro 1.401382e+09 ChFsjm27dRTqQaHEH
10 - 192.168.1.104 63588 69.172.216.111 80 unescaped_special_URI_char F bro 1.401382e+09 Ccdrtu3v0mZw804E81
11 - 192.168.1.104 63607 69.172.216.111 80 unescaped_special_URI_char F bro 1.401382e+09 C0vA5B35l8eojeAAQ
12 - 192.168.1.104 63545 190.104.31.170 80 bad_HTTP_request F bro 1.401382e+09 Ch8tcl22l2EgiV7dwh
13 - 192.168.1.104 63558 107.21.109.142 80 unescaped_special_URI_char F bro 1.401382e+09 C8gJEv1kMJhwG85Hmj
14 - 192.168.1.104 63782 192.0.80.175 80 above_hole_data_without_any_acks F bro 1.401382e+09 CsAIRmdNfA18AqWGi
15 - 192.168.1.104 63780 192.0.80.175 80 above_hole_data_without_any_acks F bro 1.401382e+09 CFkBvRNG5dNLudepe
16 - - - - - unknown_protocol_2 F bro 1.401382e+09 -
17 - 192.168.1.104 63867 179.24.226.184 80 bad_HTTP_request F bro 1.401382e+09 CSzfaoC9bkQ6q93je
18 - 192.168.1.104 63925 192.0.80.175 80 above_hole_data_without_any_acks F bro 1.401382e+09 CHjPrEuxwlcImsYqk
19 - 192.168.1.104 63922 192.0.80.175 80 above_hole_data_without_any_acks F bro 1.401382e+09 Cswz9T28tbPQ7kyBX3

20 rows × 10 columns
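
A quick tally makes it easier to see which weird events dominate. The sketch below just counts the 'name' values from the 20 rows shown above using the standard library; on the real dataframe something like `dataframes['weird_log']['name'].value_counts()` should give the same kind of summary.

```python
from collections import Counter

# The 'name' column from the 20 rows shown above, grouped by value
weird_names = (
    ['unknown_protocol_2'] * 2 +
    ['above_hole_data_without_any_acks'] * 5 +
    ['unescaped_special_URI_char'] * 8 +
    ['unescaped_%_in_URI'] * 3 +
    ['bad_HTTP_request'] * 2
)
weird_counts = Counter(weird_names)
top_name, top_count = weird_counts.most_common(1)[0]
```

In this sample the URI-escaping complaints are the most common entries, with the 'above_hole_data_without_any_acks' events next.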

So what's my laptop doing?

Now that we're looking at both the graph and the organized dataframe, let's investigate the traffic we're seeing from the network tap.

Put awesome description of all the stuff going on here :)


In [113]:
# Convert the 'ts' field to an official datetime object
dataframes['http_log']['time'] = pd.to_datetime(dataframes['http_log']['ts'],unit='s')
dataframes['http_log']['time'].head()


Out[113]:
0   2014-05-29 16:33:38.783838
1   2014-05-29 16:34:38.973012
2   2014-05-29 16:35:39.162410
3   2014-05-29 16:36:39.353959
4   2014-05-29 16:37:40.297491
Name: time, dtype: datetime64[ns]
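
The 'ts' field is just seconds since the Unix epoch, so the same conversion can be sanity-checked with the standard library. This is a small sketch for Python 3 (it uses `datetime.timezone`, which the Python 2 used elsewhere in this notebook lacks); `epoch_to_utc` is a hypothetical helper, and the sample value is the whole-second part of an epoch time that should correspond to the first row above.

```python
from datetime import datetime, timezone

def epoch_to_utc(ts):
    """Convert a Bro 'ts' value (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)

stamp = epoch_to_utc(1401381218)
print(stamp.isoformat())
```

The times in the column above are therefore UTC, not laptop-local time, which is worth keeping in mind when lining them up with what you remember clicking on.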

In [114]:
# Explore pivoting and resampling
response_bytes = dataframes['http_log'][['time','resp_mime_types','response_body_len']].copy()  # copy so the assignment below doesn't touch a slice
response_bytes['response_body_len'] = response_bytes['response_body_len'].astype(int)
print response_bytes.head()
pivot = pd.pivot_table(response_bytes, rows='time', values='response_body_len', cols=['resp_mime_types'], aggfunc=sum)
sampled_bytes = pivot.resample('1Min', how='sum')
sampled_bytes.head()


                        time resp_mime_types  response_body_len
0 2014-05-29 16:33:38.783838      text/plain                 21
1 2014-05-29 16:34:38.973012      text/plain                 21
2 2014-05-29 16:35:39.162410      text/plain                 21
3 2014-05-29 16:36:39.353959      text/plain                 21
4 2014-05-29 16:37:40.297491      text/plain                 21

[5 rows x 3 columns]
Out[114]:
resp_mime_types - application/octet-stream application/x-font-ttf application/x-shockwave-flash application/xml binary image/gif image/jpeg image/png image/svg+xml image/x-icon text/html text/plain text/x-c video/mp4
time
2014-05-29 16:33:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 21 NaN NaN
2014-05-29 16:34:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 21 NaN NaN
2014-05-29 16:35:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 21 NaN NaN
2014-05-29 16:36:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 21 NaN NaN
2014-05-29 16:37:00 0 NaN NaN NaN NaN NaN 68841 962 23623 NaN 3382 55238 417659 NaN NaN

5 rows × 15 columns
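
A note for anyone rerunning this on a newer pandas: as far as I know, the `rows=`/`cols=` arguments to `pivot_table` and the `how=` argument to `resample` belong to the pandas of that era and have since been replaced. Below is a small sketch of the same pivot-then-resample step in the newer spelling, on toy data (the values are made up and only the column names match `response_bytes`).

```python
import pandas as pd

# Toy data in the same shape as response_bytes (values are made up)
toy = pd.DataFrame({
    'time': pd.to_datetime(['2014-05-29 16:33:10', '2014-05-29 16:33:50',
                            '2014-05-29 16:34:05', '2014-05-29 16:34:30']),
    'resp_mime_types': ['text/plain', 'image/png', 'text/plain', 'text/plain'],
    'response_body_len': [21, 100, 5, 7],
})

# index=/columns= play the role of rows=/cols= in the cell above
pivot = pd.pivot_table(toy, index='time', columns='resp_mime_types',
                       values='response_body_len', aggfunc='sum')

# Calling .sum() on the resampler replaces resample(..., how='sum')
per_minute = pivot.resample('1min').sum()
```

One difference to watch for: depending on the pandas version, minutes with no traffic for a mime type may show up as 0 instead of the NaN seen in the output above.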


In [115]:
# Plotting defaults
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['font.size'] = 12.0
plt.rcParams['figure.figsize'] = 12.0, 8.0

In [116]:
# Let's plot it!
sampled_bytes.plot()


Out[116]:
<matplotlib.axes.AxesSubplot at 0x110aa4c90>