Analysing network traffic with Pandas

Original: Dirk Loss, http://dirk-loss.de, @dloss. v1.1, 2013-06-02

Modified for Python 3 on Win32, with further modifications by William George, 2015-04-20


In [16]:
# This whole business is totally unnecessary if your path is set up right.  But if it's not,
#  this is probably easier than actually fixing it.
%load_ext autoreload
import os
# Prepend the separator so the Wireshark directory isn't glued onto the last existing PATH entry.
wireshark_path = os.pathsep + "C:\\Program Files\\Wireshark\\"
# or, if it's under 'Program Files (x86)'...
# wireshark_path = os.pathsep + "C:\\Program Files (x86)\\Wireshark\\"
os.environ['path'] += wireshark_path


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
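
To confirm that tshark is now reachable on the modified path, an optional sanity check (not one of the original cells) is simply to ask it for its version:

In [ ]:
!tshark -v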

In [17]:
from utilities import *
from pprint import *

In [18]:
%autoreload

In [19]:
pcap_folder = 'C:\\Users\\william.george\\Desktop\\SUA-Test-Data\\'
os.chdir(pcap_folder)
os.getcwd()
!dir


 Volume in drive C is Windows
 Volume Serial Number is D4FE-4E05

 Directory of C:\Users\william.george\Desktop\SUA-Test-Data

04/21/2015  06:48 AM    <DIR>          .
04/21/2015  06:48 AM    <DIR>          ..
04/21/2015  07:07 AM           468,926 frame.len
04/15/2015  04:00 PM            18,178 iperf3output.txt
04/15/2015  03:18 PM        10,485,851 test_1_1_210103.pcap
04/15/2015  03:16 PM         6,128,745 test_1_2_210348.pcap
04/15/2015  04:43 PM         8,255,292 test_1_filtered.pcap
04/16/2015  06:48 AM        16,614,572 test_1_merge.pcap
04/15/2015  03:28 PM        10,486,299 test_2_1_212413.pcap
04/15/2015  03:28 PM        10,486,692 test_2_2_212526.pcap
04/15/2015  03:28 PM        10,487,112 test_2_3_212601.pcap
04/15/2015  03:29 PM         6,708,306 test_2_4_212632.pcap
04/15/2015  04:46 PM        16,029,313 test_2_filtered.pcap
04/16/2015  06:49 AM        38,168,337 test_2_merge.pcap
04/15/2015  04:03 PM        10,486,633 test_3_1_215918.pcap
04/15/2015  04:03 PM        10,486,078 test_3_2_220006.pcap
04/15/2015  04:03 PM         2,248,341 test_3_3_220044.pcap
04/15/2015  04:48 PM        19,801,173 test_3_filtered.pcap
04/16/2015  06:50 AM        23,221,004 test_3_merge.pcap
              17 File(s)    200,580,852 bytes
               2 Dir(s)  169,674,063,872 bytes free

In [77]:
pcap_file = pcap_folder + 'test_2_merge.pcap'
output_file = pcap_folder + 'frame.len'

In [ ]:
!tshark -n -r $pcap_file -T fields -Eheader=y -e frame.number -e frame.len > $output_file

Let's have a look at the file:
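
Since this notebook runs on Windows there is no head command, so a couple of lines of Python will do. This is just a quick peek at the tab-separated export created above (a sketch, not one of the original cells):

In [ ]:
# Print the header row and the first few data rows of the tshark export
with open(output_file) as f:
    for _ in range(5):
        print(f.readline().rstrip())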


In [21]:
import pandas as pd

Plotting

For a better overview, we plot the frame length over time.

We initialise IPython to show inline graphics:


In [22]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

Set a figure size in inches:


In [23]:
figsize(17,10)

Pandas automatically uses Matplotlib for plotting. We plot with small dots and an alpha channel of 0.2:
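
The plotting cell itself is missing from this copy of the notebook. Here is a minimal sketch of what it presumably did, reading the exported frame.number/frame.len table back in and plotting it (frame_lengths is a hypothetical name; the original cell may have differed):

In [ ]:
# Read the tshark export (frame.number, frame.len) and plot the length of each frame
frame_lengths = pd.read_table(output_file, index_col="frame.number")
frame_lengths["frame.len"].plot(style=".", alpha=0.2)
ylabel("Frame length (bytes)")
xlabel("Frame number")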

So there are always lots of small packets (< 100 bytes) and lots of large packets (> 1400 bytes). Some bursts of packets with other sizes (around 400 bytes, 1000 bytes, etc.) can be clearly seen.

A Python function to read PCAP files into Pandas DataFrames

Passing all those arguments to tshark is quite cumbersome. Here is a convenience function that reads the given fields into a Pandas DataFrame:


In [60]:
import subprocess
import datetime
import pandas as pd

def read_pcap(filename, fields=[], display_filter=[], 
              timeseries=False, strict=False, outfile=None):
    """ Read PCAP file into Pandas DataFrame object. 
    Uses tshark command-line tool from Wireshark.

    filename:       Name or full path of the PCAP file to read
    fields:         List of fields to include as columns
    display_filter: Additional filter to restrict frames
    strict:         Only include frames that contain all given fields 
                    (Default: false)
    timeseries:     Create DatetimeIndex from frame.time_epoch 
                    (Default: false)
    outfile:        Write tshark's raw output to this file and return the
                    filename instead of a DataFrame (Default: None)

    Syntax for fields and display_filter is specified in
    Wireshark's Display Filter Reference:
 
      http://www.wireshark.org/docs/dfref/
    """
    if timeseries:
        fields = ["frame.time_epoch"] + fields
    fieldspec = " ".join("-e %s" % f for f in fields)

    display_filters = fields if strict else ['']
    if display_filter:
        display_filters += display_filter
        display_filters = list(filter(None, display_filters))
    
    # display_filter is concatenated with ' and '.  If one or more filters 
    #     need to be 'ORed' together, then supply them as a single string
    #     e.g. ['frame.len > 60', '(ip.addr == 10.10.10.10 or ip.addr == 20.20.20.20)'] 
    #     gives '-2 -R "frame.len > 60 and (ip.addr == 10.10.10.10 or ip.addr == 20.20.20.20)"'
    
    # Skip the read filter entirely if no filters were supplied
    filterspec = '-2 -R "%s"' % " and ".join(display_filters) if any(display_filters) else ''

    options = "-r %s -n -T fields -Eheader=y" % filename
    cmd = "tshark %s %s %s" % (options, filterspec, fieldspec)
    
    print('filterspec:{0}\n'.format(filterspec),
          'display_filters:{0}\n'.format(display_filters),
          'options:{0}\n'.format(options),
          'cmd:{0}\n'.format(cmd)
         )
    
    proc_arguments = {'shell': True}
    if outfile is not None:
        with open(outfile, 'w') as f:
            proc_arguments['stdout'] = f
            proc = subprocess.Popen(cmd, **proc_arguments)
            proc.wait()  # make sure tshark has finished writing before handing the file back
        return outfile
    else:
        proc_arguments['stdout'] = subprocess.PIPE
        proc = subprocess.Popen(cmd, **proc_arguments)
    
    if timeseries:
        df = pd.read_table(proc.stdout, 
                        index_col = "frame.time_epoch", 
                        parse_dates=True, 
                        date_parser=datetime.datetime.fromtimestamp)
    else:
        # frame.time_epoch was not requested in this case, so read the table as-is
        df = pd.read_table(proc.stdout)
    return df

We will use this function in our further analysis.
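
As the comment inside the function notes, individual display filters are joined with ' and ', while filters that must be ORed go in as a single string. A hypothetical example call (the addresses are the placeholder ones from that comment, not hosts in this capture):

In [ ]:
# Frames longer than 60 bytes to or from either of two hosts, indexed by capture time
example = read_pcap(pcap_file,
                    fields=["frame.len", "ip.src", "ip.dst"],
                    display_filter=['frame.len > 60',
                                    '(ip.addr == 10.10.10.10 or ip.addr == 20.20.20.20)'],
                    timeseries=True)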

Bandwidth

By summing up the frame lengths we can calculate the complete (Ethernet) bandwidth used. First use our convenience function to read the PCAP into a DataFrame:


In [ ]:
# # original read call
# df=read_pcap(pcap_file, fields = ["frame.len", "ip.src", "ip.dst", 'tcp.stream', 'tcp.srcport', 'tcp.dstport'], timeseries=True).dropna()
# df

df=read_pcap(pcap_file, fields = ["frame.len", "ip.src", "ip.dst", 'tcp.stream', 'tcp.srcport', 'tcp.dstport'], display_filter=['ip', 'tcp'], timeseries=True, outfile=output_file)

In [154]:
df = pd.read_table(output_file, names=['time','len','ip.src','ip.dst','stream','tcp.src', 'tcp.dst'], skiprows=1)
# Sanity-check the epoch-to-datetime conversion on a single sample timestamp
sample_time = 1429133053.239977000
print(pd.to_datetime(sample_time, unit='s'))
df.time = pd.to_datetime(df.time, unit='s')
# Show every frame whose TCP stream number is not in the excluded list
df[~df['stream'].isin([0, 1, 2, 3, 145, 141])]


2015-04-15 21:24:13.239977
Out[154]:
time len ip.src ip.dst stream tcp.src tcp.dst
450 2015-04-15 21:24:19.037992 1145 161.217.20.38 54.165.0.203 4 59565 80
451 2015-04-15 21:24:19.156989 60 54.165.0.203 161.217.20.38 4 80 59565
452 2015-04-15 21:24:19.161978 306 54.165.0.203 161.217.20.38 4 80 59565
454 2015-04-15 21:24:19.368968 60 161.217.20.38 54.165.0.203 4 59565 80
455 2015-04-15 21:24:19.394968 1414 161.217.20.38 50.19.211.103 5 59570 80
456 2015-04-15 21:24:19.398966 1414 161.217.20.38 50.19.211.103 5 59570 80
457 2015-04-15 21:24:19.398966 478 161.217.20.38 50.19.211.103 5 59570 80
458 2015-04-15 21:24:19.403970 1414 161.217.20.38 198.23.64.18 6 59566 80
459 2015-04-15 21:24:19.408975 1414 161.217.20.38 198.23.64.18 6 59566 80
460 2015-04-15 21:24:19.413964 522 161.217.20.38 198.23.64.18 6 59566 80
461 2015-04-15 21:24:19.417977 1414 161.217.20.38 50.19.211.103 7 59569 80
462 2015-04-15 21:24:19.421975 966 161.217.20.38 50.19.211.103 7 59569 80
463 2015-04-15 21:24:19.426964 1414 161.217.20.38 198.23.64.18 8 59567 80
464 2015-04-15 21:24:19.430977 918 161.217.20.38 198.23.64.18 8 59567 80
465 2015-04-15 21:24:19.435966 1050 161.217.20.38 54.165.0.203 4 59565 80
468 2015-04-15 21:24:19.449973 60 50.19.211.103 161.217.20.38 5 80 59570
469 2015-04-15 21:24:19.458975 60 198.23.64.18 161.217.20.38 6 80 59566
470 2015-04-15 21:24:19.471960 60 50.19.211.103 161.217.20.38 7 80 59569
471 2015-04-15 21:24:19.476965 477 198.23.64.18 161.217.20.38 6 80 59566
472 2015-04-15 21:24:19.499974 60 198.23.64.18 161.217.20.38 8 80 59567
473 2015-04-15 21:24:19.504963 60 50.19.211.103 161.217.20.38 5 80 59570
474 2015-04-15 21:24:19.515964 477 198.23.64.18 161.217.20.38 8 80 59567
475 2015-04-15 21:24:19.551958 60 54.165.0.203 161.217.20.38 4 80 59565
476 2015-04-15 21:24:19.576966 306 54.165.0.203 161.217.20.38 4 80 59565
477 2015-04-15 21:24:19.610960 419 50.19.211.103 161.217.20.38 7 80 59569
478 2015-04-15 21:24:19.622953 419 50.19.211.103 161.217.20.38 5 80 59570
479 2015-04-15 21:24:19.686960 60 161.217.20.38 198.23.64.18 6 59566 80
481 2015-04-15 21:24:19.726952 60 161.217.20.38 198.23.64.18 8 59567 80
483 2015-04-15 21:24:19.786946 60 161.217.20.38 54.165.0.203 4 59565 80
484 2015-04-15 21:24:19.816943 60 161.217.20.38 50.19.211.103 7 59569 80
... ... ... ... ... ... ... ...
39237 2015-04-15 21:27:23.851945 920 161.217.20.42 161.217.188.146 151 61871 8014
39258 2015-04-15 21:27:23.926938 670 161.217.188.146 161.217.20.42 151 8014 61871
39353 2015-04-15 21:27:24.282974 60 161.217.20.42 161.217.188.146 151 61871 8014
40563 2015-04-15 21:27:28.913938 60 161.217.188.146 161.217.20.42 151 8014 61871
40590 2015-04-15 21:27:29.010985 60 161.217.20.42 161.217.188.146 151 61871 8014
40608 2015-04-15 21:27:29.070995 60 161.217.20.42 161.217.188.146 151 61871 8014
40963 2015-04-15 21:27:32.158988 60 161.217.20.29 161.217.189.225 149 56993 135
40964 2015-04-15 21:27:32.159980 60 161.217.20.29 161.217.189.225 150 56994 54966
40965 2015-04-15 21:27:32.209980 60 161.217.189.225 161.217.20.29 149 135 56993
40966 2015-04-15 21:27:32.209980 60 161.217.189.225 161.217.20.29 150 54966 56994
40967 2015-04-15 21:27:32.209980 60 161.217.189.225 161.217.20.29 150 54966 56994
40968 2015-04-15 21:27:32.214985 60 161.217.20.29 161.217.189.225 149 56993 135
40969 2015-04-15 21:27:32.214985 60 161.217.20.29 161.217.189.225 150 56994 54966
40988 2015-04-15 21:27:32.494969 60 161.217.20.38 74.125.227.233 13 59385 443
41002 2015-04-15 21:27:32.580963 66 74.125.227.233 161.217.20.38 13 443 59385
41016 2015-04-15 21:27:34.067989 60 93.184.215.200 161.217.20.29 148 80 56992
41017 2015-04-15 21:27:34.072994 60 161.217.20.29 93.184.215.200 148 56992 80
41018 2015-04-15 21:27:34.072994 60 161.217.20.29 93.184.215.200 148 56992 80
41020 2015-04-15 21:27:34.115991 60 93.184.215.200 161.217.20.29 148 80 56992
41025 2015-04-15 21:27:36.106989 60 161.217.20.29 161.217.188.90 104 56961 80
41026 2015-04-15 21:27:36.110986 60 161.217.20.29 74.125.227.233 101 56958 80
41027 2015-04-15 21:27:36.110986 60 161.217.20.29 74.125.227.233 102 56959 80
41028 2015-04-15 21:27:36.110986 60 161.217.20.29 161.217.188.90 97 56954 80
41030 2015-04-15 21:27:36.152991 60 74.125.227.233 161.217.20.29 102 80 56959
41031 2015-04-15 21:27:36.152991 60 74.125.227.233 161.217.20.29 102 80 56959
41032 2015-04-15 21:27:36.157981 60 161.217.188.90 161.217.20.29 104 80 56961
41033 2015-04-15 21:27:36.157981 60 161.217.188.90 161.217.20.29 97 80 56954
41034 2015-04-15 21:27:36.157981 60 161.217.20.29 74.125.227.233 102 56959 80
41035 2015-04-15 21:27:36.157981 60 161.217.20.29 161.217.188.90 104 56961 80
41036 2015-04-15 21:27:36.162985 60 161.217.20.29 161.217.188.90 97 56954 80

18526 rows × 7 columns


In [155]:
df2 = df.head(100)

In [158]:
df.head(100).to_json(date_unit='us')


Out[158]:
'{"time":{"0":1429133053239977,"1":1429133053245974,"2":1429133053250977,"3":1429133053254975,"4":1429133053254975,"5":1429133053260971,"6":1429133053265977,"7":1429133053265977,"8":1429133053269975,"9":1429133053274979,"10":1429133053274979,"11":1429133053279984,"12":1429133053284973,"13":1429133053288971,"14":1429133053297973,"15":1429133053297973,"16":1429133053297973,"17":1429133053302978,"18":1429133053306975,"19":1429133053311980,"20":1429133053315977,"21":1429133053320982,"22":1429133053320982,"23":1429133053327970,"24":1429133053331968,"25":1429133053335981,"26":1429133053335981,"27":1429133053340970,"28":1429133053344968,"29":1429133053344968,"30":1429133053348980,"31":1429133053353970,"32":1429133053357967,"33":1429133053362972,"34":1429133053366970,"35":1429133053366970,"36":1429133053366970,"37":1429133053373973,"38":1429133053377971,"39":1429133053377971,"40":1429133053382975,"41":1429133053387965,"42":1429133053391978,"43":1429133053396967,"44":1429133053400964,"45":1429133053404977,"46":1429133053409967,"47":1429133053409967,"48":1429133053415963,"49":1429133053420968,"50":1429133053424965,"51":1429133053429970,"52":1429133053433968,"53":1429133053438972,"54":1429133053442970,"55":1429133053442970,"56":1429133053446967,"57":1429133053455970,"58":1429133053460974,"59":1429133053464972,"60":1429133053469961,"61":1429133053473974,"62":1429133053478963,"63":1429133053478963,"64":1429133053484960,"65":1429133053488973,"66":1429133053492970,"67":1429133053492970,"68":1429133053497960,"69":1429133053501972,"70":1429133053501972,"71":1429133053509968,"72":1429133053515964,"73":1429133053519962,"74":1429133053525958,"75":1429133053529970,"76":1429133053534960,"77":1429133053534960,"78":1429133053538958,"79":1429133053543962,"80":1429133053543962,"81":1429133053547960,"82":1429133053551958,"83":1429133053551958,"84":1429133053557969,"85":1429133053566956,"86":1429133053575958,"87":1429133053575958,"88":1429133053579956,"89":1429133053579956,"90":1429133053584961,"91":1429133053588958,"92":1429133053588958,"93":1429133053588958,"94":1429133053598968,"95":1429133053602965,"96":1429133053606963,"97":1429133053612959,"98":1429133053616957,"99":1429133053616957},"len":{"0":60,"1":1414,"2":60,"3":1414,"4":1414,"5":1414,"6":60,"7":1414,"8":1414,"9":1414,"10":60,"11":1414,"12":1414,"13":60,"14":1414,"15":1414,"16":60,"17":1414,"18":1414,"19":60,"20":1414,"21":1414,"22":60,"23":1414,"24":1414,"25":60,"26":1414,"27":1414,"28":1414,"29":60,"30":1414,"31":60,"32":1414,"33":1414,"34":60,"35":1414,"36":60,"37":1414,"38":1414,"39":60,"40":1414,"41":60,"42":1414,"43":1414,"44":1414,"45":60,"46":1414,"47":1414,"48":60,"49":1414,"50":1414,"51":1414,"52":1414,"53":60,"54":1414,"55":1414,"56":60,"57":60,"58":1414,"59":1414,"60":1414,"61":1414,"62":60,"63":1414,"64":1414,"65":60,"66":1414,"67":1414,"68":60,"69":1414,"70":1414,"71":1414,"72":1414,"73":60,"74":60,"75":1414,"76":1414,"77":1414,"78":60,"79":1414,"80":1414,"81":60,"82":1414,"83":1414,"84":60,"85":1414,"86":1414,"87":1414,"88":60,"89":1414,"90":1414,"91":60,"92":1414,"93":1414,"94":1414,"95":1414,"96":60,"97":1414,"98":1414,"99":60},"ip.src":{"0":"161.217.20.38","1":"54.230.5.97","2":"161.217.20.38","3":"54.230.5.97","4":"54.230.5.97","5":"54.230.5.97","6":"161.217.20.38","7":"54.230.5.97","8":"54.230.5.97","9":"54.230.5.97","10":"161.217.20.38","11":"54.230.5.97","12":"54.230.5.97","13":"161.217.20.38","14":"54.230.5.97","15":"54.230.5.97","16":"161.217.20.38","17":"54.230.5.97","18":"54.230.5.97","19":"161.217.20.38","20":"54.230.5.97","21":"5
4.230.5.97","22":"161.217.20.38","23":"54.230.5.97","24":"54.230.5.97","25":"161.217.20.38","26":"54.230.5.97","27":"54.230.5.97","28":"54.230.5.97","29":"161.217.20.38","30":"54.230.5.97","31":"161.217.20.38","32":"54.230.5.97","33":"54.230.5.97","34":"161.217.20.38","35":"54.230.5.97","36":"161.217.104.82","37":"54.230.5.97","38":"54.230.5.97","39":"161.217.20.38","40":"54.230.5.97","41":"161.217.20.38","42":"54.230.5.97","43":"54.230.5.97","44":"54.230.5.97","45":"161.217.20.38","46":"54.230.5.97","47":"54.230.5.97","48":"161.217.20.38","49":"54.230.5.97","50":"54.230.5.97","51":"54.230.5.97","52":"54.230.5.97","53":"161.217.20.38","54":"54.230.5.97","55":"54.230.5.97","56":"161.217.20.38","57":"161.217.20.38","58":"54.230.5.97","59":"54.230.5.97","60":"54.230.5.97","61":"54.230.5.97","62":"161.217.20.38","63":"54.230.5.97","64":"54.230.5.97","65":"161.217.20.38","66":"54.230.5.97","67":"54.230.5.97","68":"161.217.20.38","69":"54.230.5.97","70":"54.230.5.97","71":"54.230.5.97","72":"54.230.5.97","73":"161.217.20.38","74":"161.217.20.38","75":"54.230.5.97","76":"54.230.5.97","77":"54.230.5.97","78":"161.217.20.38","79":"54.230.5.97","80":"54.230.5.97","81":"161.217.20.38","82":"54.230.5.97","83":"54.230.5.97","84":"161.217.20.38","85":"54.230.5.97","86":"54.230.5.97","87":"54.230.5.97","88":"161.217.20.38","89":"54.230.5.97","90":"54.230.5.97","91":"161.217.20.38","92":"54.230.5.97","93":"54.230.5.97","94":"54.230.5.97","95":"54.230.5.97","96":"161.217.20.38","97":"54.230.5.97","98":"54.230.5.97","99":"161.217.20.38"},"ip.dst":{"0":"54.230.5.97","1":"161.217.20.38","2":"54.230.5.97","3":"161.217.20.38","4":"161.217.20.38","5":"161.217.20.38","6":"54.230.5.97","7":"161.217.20.38","8":"161.217.20.38","9":"161.217.20.38","10":"54.230.5.97","11":"161.217.20.38","12":"161.217.20.38","13":"54.230.5.97","14":"161.217.20.38","15":"161.217.20.38","16":"54.230.5.97","17":"161.217.20.38","18":"161.217.20.38","19":"54.230.5.97","20":"161.217.20.38","21":"161.217.20.38","22":"54.230.5.97","23":"161.217.20.38","24":"161.217.20.38","25":"54.230.5.97","26":"161.217.20.38","27":"161.217.20.38","28":"161.217.20.38","29":"54.230.5.97","30":"161.217.20.38","31":"54.230.5.97","32":"161.217.20.38","33":"161.217.20.38","34":"54.230.5.97","35":"161.217.20.38","36":"161.217.20.25","37":"161.217.20.38","38":"161.217.20.38","39":"54.230.5.97","40":"161.217.20.38","41":"54.230.5.97","42":"161.217.20.38","43":"161.217.20.38","44":"161.217.20.38","45":"54.230.5.97","46":"161.217.20.38","47":"161.217.20.38","48":"54.230.5.97","49":"161.217.20.38","50":"161.217.20.38","51":"161.217.20.38","52":"161.217.20.38","53":"54.230.5.97","54":"161.217.20.38","55":"161.217.20.38","56":"54.230.5.97","57":"54.230.5.97","58":"161.217.20.38","59":"161.217.20.38","60":"161.217.20.38","61":"161.217.20.38","62":"54.230.5.97","63":"161.217.20.38","64":"161.217.20.38","65":"54.230.5.97","66":"161.217.20.38","67":"161.217.20.38","68":"54.230.5.97","69":"161.217.20.38","70":"161.217.20.38","71":"161.217.20.38","72":"161.217.20.38","73":"54.230.5.97","74":"54.230.5.97","75":"161.217.20.38","76":"161.217.20.38","77":"161.217.20.38","78":"54.230.5.97","79":"161.217.20.38","80":"161.217.20.38","81":"54.230.5.97","82":"161.217.20.38","83":"161.217.20.38","84":"54.230.5.97","85":"161.217.20.38","86":"161.217.20.38","87":"161.217.20.38","88":"54.230.5.97","89":"161.217.20.38","90":"161.217.20.38","91":"54.230.5.97","92":"161.217.20.38","93":"161.217.20.38","94":"161.217.20.38","95":"161.217.20.38","96":"54.230.5.97","97":"161.217.20.38","98":"161.2
17.20.38","99":"54.230.5.97"},"stream":{"0":0,"1":0,"2":0,"3":0,"4":0,"5":0,"6":0,"7":0,"8":0,"9":0,"10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":0,"19":0,"20":0,"21":0,"22":0,"23":0,"24":0,"25":0,"26":0,"27":0,"28":0,"29":0,"30":0,"31":0,"32":0,"33":0,"34":0,"35":0,"36":1,"37":0,"38":0,"39":0,"40":0,"41":0,"42":0,"43":0,"44":0,"45":0,"46":0,"47":0,"48":0,"49":0,"50":0,"51":0,"52":0,"53":0,"54":0,"55":0,"56":0,"57":0,"58":0,"59":0,"60":0,"61":0,"62":0,"63":0,"64":0,"65":0,"66":0,"67":0,"68":0,"69":0,"70":0,"71":0,"72":0,"73":0,"74":0,"75":0,"76":0,"77":0,"78":0,"79":0,"80":0,"81":0,"82":0,"83":0,"84":0,"85":0,"86":0,"87":0,"88":0,"89":0,"90":0,"91":0,"92":0,"93":0,"94":0,"95":0,"96":0,"97":0,"98":0,"99":0},"tcp.src":{"0":59356,"1":80,"2":59356,"3":80,"4":80,"5":80,"6":59356,"7":80,"8":80,"9":80,"10":59356,"11":80,"12":80,"13":59356,"14":80,"15":80,"16":59356,"17":80,"18":80,"19":59356,"20":80,"21":80,"22":59356,"23":80,"24":80,"25":59356,"26":80,"27":80,"28":80,"29":59356,"30":80,"31":59356,"32":80,"33":80,"34":59356,"35":80,"36":50652,"37":80,"38":80,"39":59356,"40":80,"41":59356,"42":80,"43":80,"44":80,"45":59356,"46":80,"47":80,"48":59356,"49":80,"50":80,"51":80,"52":80,"53":59356,"54":80,"55":80,"56":59356,"57":59356,"58":80,"59":80,"60":80,"61":80,"62":59356,"63":80,"64":80,"65":59356,"66":80,"67":80,"68":59356,"69":80,"70":80,"71":80,"72":80,"73":59356,"74":59356,"75":80,"76":80,"77":80,"78":59356,"79":80,"80":80,"81":59356,"82":80,"83":80,"84":59356,"85":80,"86":80,"87":80,"88":59356,"89":80,"90":80,"91":59356,"92":80,"93":80,"94":80,"95":80,"96":59356,"97":80,"98":80,"99":59356},"tcp.dst":{"0":80,"1":59356,"2":80,"3":59356,"4":59356,"5":59356,"6":80,"7":59356,"8":59356,"9":59356,"10":80,"11":59356,"12":59356,"13":80,"14":59356,"15":59356,"16":80,"17":59356,"18":59356,"19":80,"20":59356,"21":59356,"22":80,"23":59356,"24":59356,"25":80,"26":59356,"27":59356,"28":59356,"29":80,"30":59356,"31":80,"32":59356,"33":59356,"34":80,"35":59356,"36":3389,"37":59356,"38":59356,"39":80,"40":59356,"41":80,"42":59356,"43":59356,"44":59356,"45":80,"46":59356,"47":59356,"48":80,"49":59356,"50":59356,"51":59356,"52":59356,"53":80,"54":59356,"55":59356,"56":80,"57":80,"58":59356,"59":59356,"60":59356,"61":59356,"62":80,"63":59356,"64":59356,"65":80,"66":59356,"67":59356,"68":80,"69":59356,"70":59356,"71":59356,"72":59356,"73":80,"74":80,"75":59356,"76":59356,"77":59356,"78":80,"79":59356,"80":59356,"81":80,"82":59356,"83":59356,"84":80,"85":59356,"86":59356,"87":59356,"88":80,"89":59356,"90":59356,"91":80,"92":59356,"93":59356,"94":59356,"95":59356,"96":80,"97":59356,"98":59356,"99":80}}'

Later we will re-sample the timeseries into buckets of 1 second, summing over the lengths of all frames captured in each second. First, though, let's inspect a single TCP stream, stream 1:


In [161]:
df[df.stream == 1]


Out[161]:
time len ip.src ip.dst stream tcp.src tcp.dst
36 2015-04-15 21:24:13.366970 60 161.217.104.82 161.217.20.25 1 50652 3389
113 2015-04-15 21:24:13.674952 171 161.217.20.25 161.217.104.82 1 3389 50652
182 2015-04-15 21:24:13.929944 60 161.217.104.82 161.217.20.25 1 50652 3389
235 2015-04-15 21:24:14.126992 251 161.217.20.25 161.217.104.82 1 3389 50652
305 2015-04-15 21:24:14.383967 60 161.217.104.82 161.217.20.25 1 50652 3389
374 2015-04-15 21:24:14.687952 107 161.217.20.25 161.217.104.82 1 3389 50652
397 2015-04-15 21:24:14.875946 304 161.217.20.25 161.217.104.82 1 3389 50652
398 2015-04-15 21:24:14.925946 66 161.217.104.82 161.217.20.25 1 50652 3389
399 2015-04-15 21:24:15.140983 267 161.217.20.25 161.217.104.82 1 3389 50652
400 2015-04-15 21:24:15.390970 60 161.217.104.82 161.217.20.25 1 50652 3389
401 2015-04-15 21:24:15.701959 171 161.217.20.25 161.217.104.82 1 3389 50652
406 2015-04-15 21:24:15.950939 60 161.217.104.82 161.217.20.25 1 50652 3389
410 2015-04-15 21:24:16.154990 251 161.217.20.25 161.217.104.82 1 3389 50652
411 2015-04-15 21:24:16.404977 60 161.217.104.82 161.217.20.25 1 50652 3389
412 2015-04-15 21:24:16.715951 123 161.217.20.25 161.217.104.82 1 3389 50652
413 2015-04-15 21:24:16.963939 60 161.217.104.82 161.217.20.25 1 50652 3389
415 2015-04-15 21:24:17.168982 251 161.217.20.25 161.217.104.82 1 3389 50652
438 2015-04-15 21:24:17.419976 60 161.217.104.82 161.217.20.25 1 50652 3389
439 2015-04-15 21:24:17.583969 139 161.217.104.82 161.217.20.25 1 50652 3389
440 2015-04-15 21:24:17.587967 123 161.217.20.25 161.217.104.82 1 3389 50652
441 2015-04-15 21:24:17.588958 123 161.217.20.25 161.217.104.82 1 3389 50652
442 2015-04-15 21:24:17.592956 123 161.217.20.25 161.217.104.82 1 3389 50652
443 2015-04-15 21:24:17.636960 60 161.217.104.82 161.217.20.25 1 50652 3389
444 2015-04-15 21:24:17.729957 139 161.217.20.25 161.217.104.82 1 3389 50652
445 2015-04-15 21:24:17.983942 60 161.217.104.82 161.217.20.25 1 50652 3389
446 2015-04-15 21:24:18.182989 267 161.217.20.25 161.217.104.82 1 3389 50652
447 2015-04-15 21:24:18.432976 60 161.217.104.82 161.217.20.25 1 50652 3389
448 2015-04-15 21:24:18.743949 139 161.217.20.25 161.217.104.82 1 3389 50652
449 2015-04-15 21:24:18.989939 60 161.217.104.82 161.217.20.25 1 50652 3389
453 2015-04-15 21:24:19.196980 299 161.217.20.25 161.217.104.82 1 3389 50652
... ... ... ... ... ... ... ...
40991 2015-04-15 21:27:32.506962 123 161.217.20.25 161.217.104.82 1 3389 50652
40992 2015-04-15 21:27:32.506962 123 161.217.20.25 161.217.104.82 1 3389 50652
40993 2015-04-15 21:27:32.507969 123 161.217.20.25 161.217.104.82 1 3389 50652
40994 2015-04-15 21:27:32.507969 123 161.217.20.25 161.217.104.82 1 3389 50652
40995 2015-04-15 21:27:32.511966 123 161.217.20.25 161.217.104.82 1 3389 50652
40996 2015-04-15 21:27:32.554963 60 161.217.104.82 161.217.20.25 1 50652 3389
40997 2015-04-15 21:27:32.558961 60 161.217.104.82 161.217.20.25 1 50652 3389
40998 2015-04-15 21:27:32.558961 123 161.217.20.25 161.217.104.82 1 3389 50652
40999 2015-04-15 21:27:32.558961 123 161.217.20.25 161.217.104.82 1 3389 50652
41000 2015-04-15 21:27:32.563966 123 161.217.20.25 161.217.104.82 1 3389 50652
41001 2015-04-15 21:27:32.563966 123 161.217.20.25 161.217.104.82 1 3389 50652
41003 2015-04-15 21:27:32.608962 60 161.217.104.82 161.217.20.25 1 50652 3389
41004 2015-04-15 21:27:32.612959 123 161.217.20.25 161.217.104.82 1 3389 50652
41005 2015-04-15 21:27:32.612959 123 161.217.20.25 161.217.104.82 1 3389 50652
41006 2015-04-15 21:27:32.612959 123 161.217.20.25 161.217.104.82 1 3389 50652
41007 2015-04-15 21:27:32.612959 60 161.217.104.82 161.217.20.25 1 50652 3389
41008 2015-04-15 21:27:32.617964 123 161.217.20.25 161.217.104.82 1 3389 50652
41009 2015-04-15 21:27:32.617964 123 161.217.20.25 161.217.104.82 1 3389 50652
41010 2015-04-15 21:27:32.659954 60 161.217.104.82 161.217.20.25 1 50652 3389
41011 2015-04-15 21:27:32.664958 60 161.217.104.82 161.217.20.25 1 50652 3389
41012 2015-04-15 21:27:33.021986 379 161.217.20.25 161.217.104.82 1 3389 50652
41013 2015-04-15 21:27:33.269975 60 161.217.104.82 161.217.20.25 1 50652 3389
41014 2015-04-15 21:27:33.943936 603 161.217.20.25 161.217.104.82 1 3389 50652
41015 2015-04-15 21:27:34.041990 123 161.217.20.25 161.217.104.82 1 3389 50652
41019 2015-04-15 21:27:34.090983 60 161.217.104.82 161.217.20.25 1 50652 3389
41021 2015-04-15 21:27:35.033994 331 161.217.20.25 161.217.104.82 1 3389 50652
41022 2015-04-15 21:27:35.281983 60 161.217.104.82 161.217.20.25 1 50652 3389
41023 2015-04-15 21:27:35.969935 411 161.217.20.25 161.217.104.82 1 3389 50652
41024 2015-04-15 21:27:36.068981 123 161.217.20.25 161.217.104.82 1 3389 50652
41029 2015-04-15 21:27:36.117990 60 161.217.104.82 161.217.20.25 1 50652 3389

1485 rows × 7 columns


In [ ]:
# THIS WHOLE BLOCK IS COMMENTED OUT BECAUSE I DON'T TRUST IT RIGHT NOW.  THIS IS THE OLD WAY.

# flows = framelen.groupby(('tcp.stream', 'ip.src'))
# keys = sorted(list(flows.groups.keys()), key=lambda x: x[0])

# #list_streams = []
# #for key in keys:(   # zip (iter(x),...)
# def f(x):
#     print('running one time!')
#     return pd.Series({'frame.len':x[0],'ip.src':x[1]})

# def extract_flow(flow):
#     ipdst = flow['ip.dst'][0]
#     tcpstrm = flow['tcp.stream'][0]
#     tcpsrc = flow['tcp.srcport'][0]
#     tcpdst = flow['tcp.dstport'][0]
    
#     flow_Bps = flow.resample("S", how="sum")
#     flow_filter = np.isnan(flow_Bps['tcp.dstport']) == False
#     flow_Bps.loc[flow_filter, "tcp.stream" : "tcp.dstport"] = (tcpstrm, tcpsrc, tcpdst)

#     return flow_Bps.loc[flow_filter]
# flow_list = []
# for key in keys:
#     flow_list.append(extract_flow(flows.get_group(key)))

    
# pprint(flow_list[0].head(2))

# #stream_df = pd.DataFrame.from_records(stream_list)


# # stream1 = streams.get_group(keys[4])
# # extract_stream(stream1)
        
# # stream1 = streams.get_group(keys[3])
# # ostrm = stream1['tcp.stream'][0]
# # tcpsrc = stream1['tcp.srcport'][0]
# # tcpdst = stream1['tcp.dstport'][0]
# # ipdst = stream1['ip.dst'][0]
# # stream_Bps = stream1.resample("S", how="sum")
# # stream_filter  = np.isnan(stream_Bps['tcp.dstport']) == False

# # stream_filter# is np.float64(np.nan))
# # #stream_Bps['tcp.srcport'] = 80
# # stream_Bps.loc[stream_filter, "tcp.stream" :"tcp.dstport"] = (ostrm, tcpsrc, tcpdst)
# # stream_Bps.loc[stream_filter]
# # # #help(streams)
# # # #stream1
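
The next few cells operate on a framelen DataFrame that is never defined in this copy of the notebook. Presumably it was read with the convenience function above; the following reconstruction is an assumption, not the author's original cell:

In [ ]:
# Hypothetical reconstruction: a timeseries DataFrame with the columns the
# resample/groupby cells below expect
framelen = read_pcap(pcap_file,
                     fields=["frame.len", "ip.src", "ip.dst",
                             "tcp.stream", "tcp.srcport", "tcp.dstport"],
                     display_filter=['tcp'],
                     timeseries=True).dropna()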

In [ ]:
bytes_per_second=framelen.resample("S", how="sum")
help(framelen.resample)

Here are the first 5 rows. We get NaN for those timestamps where no frames were captured:
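
The cell that displayed those first rows is not present in this copy of the notebook; presumably it was simply:

In [ ]:
bytes_per_second.head()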


In [ ]:
bytes_per_second.sort('tcp.stream')

In [ ]:
framelen.sort('tcp.stream', inplace=False).dropna()

In [ ]:
#bytes_per_second.groupby("tcp.stream")["frame.len"].sum().sort('tcp.len',ascending=False,inplace=False).head(10)
#bytes_per_second.groupby('tcp.stream')['frame.len'].sum()

In [ ]:
plt = (bytes_per_second.groupby('tcp.stream')).plot()
ylabel('kbps')
xlabel('Time')
axhline(linewidth=2, color='r', y=2048)
time_zero = bytes_per_second.index[0]
annotate("2048 kbps",xy=(time_zero,2048), xycoords='data', xytext=(-30,30), textcoords='offset points', size=10,
        bbox=dict(boxstyle="round", fc="0.8"),
        arrowprops=dict(arrowstyle="simple"))

#plt.set_xlim(-1,100)

TCP Time-Sequence Graph

Let's try to replicate the TCP Time-Sequence Graph that is known from Wireshark (Statistics > TCP Stream Analysis > Time-Sequence Graph (Stevens)).


In [ ]:
filters = []
fields=["tcp.stream", "ip.src", "ip.dst", "tcp.seq", "tcp.ack", "tcp.window_size", "tcp.len"]
#filters=["ip.addr eq 161.217.20.5"]
ts=read_pcap(pcap_file, fields, display_filter = filters, timeseries=True, strict=True)
ts

Now we have to select a TCP stream to analyse. As an example, we just pick stream number 0:


In [ ]:
stream=ts[ts["tcp.stream"] == 0]

In [ ]:
stream

Pandas only prints an overview because the table is too wide. So we force a full display:


In [ ]:
print(stream.to_string())

Add a column that shows who sent each packet (client or server).

The lambda expression distinguishes the client side from the server side of the stream by comparing each packet's source IP address with the source IP address of the first packet in the stream (for TCP streams, the first packet should have been sent by the client).


In [ ]:
stream["type"] = stream.apply(lambda x: "client" if x["ip.src"] == stream.irow(0)["ip.src"] else "server", axis=1)

In [ ]:
print(stream.to_string())

In [ ]:
client_stream=stream[stream.type == "client"]

In [ ]:
client_stream["tcp.seq"].plot(style="r-o")

Notice that the x-axis shows the real timestamps.

For comparison, change the x-axis to be the packet number in the stream:


In [ ]:
client_stream.index = arange(len(client_stream))
client_stream["tcp.seq"].plot(style="r-o")

Looks different of course.

Bytes per stream


In [ ]:
def most_bytes_per_stream(df):
    # Sum tcp.len per stream and keep the ten largest totals
    # (on the older pandas of this era, use .order(ascending=False) instead of .sort_values)
    return df.groupby("tcp.stream")["tcp.len"].sum().sort_values(ascending=False).head(10)

bytes_per_stream = most_bytes_per_stream(ts)
print(bytes_per_stream.index)
df_filter = ts['tcp.stream'].isin(bytes_per_stream.index)#[row in bytes_per_stream.index for row in ts['tcp.stream']]
streams = ts[df_filter]
streams.pivot(columns='tcp.stream', values='tcp.seq')  # pivot against the existing time index
#df[str(df.index) in str(bytes_per_stream.index)]
#bytes_per_stream.sort('tcp.len', inplace=False,ascending=False).head(5)

In [ ]:
per_stream=ts.groupby("tcp.stream")
per_stream.head()

In [ ]:
bytes_per_stream = per_stream["tcp.len"].sum()
bytes_per_stream.head()

In [ ]:
bytes_per_stream.plot(kind='bar')

In [ ]:
bytes_per_stream.max()

In [ ]:
biggest_stream=bytes_per_stream.idxmax()
biggest_stream

In [ ]:
bytes_per_stream.ix[biggest_stream]