notebook.community

Edit and run

To analyze a QoF-generated IPFIX file for timestamp frequencies, set the filename in the following code block, then run it:



In [ ]:

    
ipfix_file = "../test/mawi-0330-3hours.ipfix"

Run the following code to initialize required modules and define some useful functions:



In [ ]:

    
import ipfix
import qof
import pandas as pd
import numpy as np

ipfix.ie.use_iana_default() # loads IANA default IEs from module core definitions
ipfix.ie.use_5103_default() # loads reverse IEs for RFC5103 biflows
ipfix.ie.use_specfile("qof.iespec") # loads enterprise-specific IEs for QoF

ipfix.types.use_integer_ipv4() # accelerate dataframe processing of per-IP stuff

and run this to load the file into the dataframe (for biflow data):



In [ ]:

    
df = qof.dataframe_from_ipfix(ipfix_file, ("tcpTimestampFrequency", "reverseTcpTimestampFrequency",
                                           "meanTcpChirpMilliseconds", "reverseMeanTcpChirpMilliseconds",
                                           "packetDeltaCount", "reversePacketDeltaCount",
                                           "tcpSequenceLossCount", "reverseTcpSequenceLossCount"))

or this one (for uniflow data):



In [ ]:

    
df = qof.dataframe_from_ipfix(ipfix_file, ("tcpTimestampFrequency",
                                           "meanTcpChirpMilliseconds",
                                           "packetDeltaCount",
                                           "tcpSequenceLossCount",))

Classify timestamp distribution by probable frequency:



In [ ]:

    
df["ts10"] = (df["tcpTimestampFrequency"] >= 8) & (df["tcpTimestampFrequency"] <= 12)
df["ts100"] = (df["tcpTimestampFrequency"] >= 80) & (df["tcpTimestampFrequency"] <= 120)
df["ts250"] = (df["tcpTimestampFrequency"] >= 200) & (df["tcpTimestampFrequency"] <= 300)
df["ts1000"] = (df["tcpTimestampFrequency"] >= 800) & (df["tcpTimestampFrequency"] <= 1200)
df["tsvalid"] = df["ts10"] | df["ts100"] | df["ts250"] | df["ts1000"]
df["tserror"] = np.where(df["ts10"] , abs(10 - df["tcpTimestampFrequency"]) / 10, 
                 np.where(df["ts100"], abs(100 - df["tcpTimestampFrequency"]) / 100, 
                 np.where(df["ts250"], abs(250 - df["tcpTimestampFrequency"]) / 250,
                 np.where(df["ts1000"], abs(1000 - df["tcpTimestampFrequency"]) / 1000, 0))))

(and the same thing for the reverse):



In [ ]:

    
df["rts10"] = (df["tcpTimestampFrequency"] >= 8) & (df["reverseTcpTimestampFrequency"] <= 12)
df["rts100"] = (df["tcpTimestampFrequency"] >= 80) & (df["reverseTcpTimestampFrequency"] <= 120)
df["rts250"] = (df["tcpTimestampFrequency"] >= 200) & (df["reverseTcpTimestampFrequency"] <= 300)
df["rts1000"] = (df["tcpTimestampFrequency"] >= 800) & (df["reverseTcpTimestampFrequency"] <= 1200)
df["rtsvalid"] = df["rts10"] | df["rts100"] | df["rts250"] | df["rts1000"]
df["rtserror"] = np.where(df["rts10"] , abs(10 - df["reverseTcpTimestampFrequency"]) / 10, 
                 np.where(df["rts100"], abs(100 - df["reverseTcpTimestampFrequency"]) / 100, 
                 np.where(df["rts250"], abs(250 - df["reverseTcpTimestampFrequency"]) / 250,
                 np.where(df["rts1000"], abs(1000 - df["reverseTcpTimestampFrequency"]) / 1000, 0))))

A graphical display of the timestamp frequency spectrum:



In [ ]:

    
df["tcpTimestampFrequency"].hist(bins=150,range=(900,1050),weights=df["packetDeltaCount"])

and the error spectrum:



In [ ]:

    
df[df["tsvalid"]]["tserror"].hist(bins=100,weights=df[df["tsvalid"]]["packetDeltaCount"])



In [ ]:

    
df[df["tsvalid"]]["tserror"].describe()

with proportion of how many flows are valid:



In [ ]:

    
df["tsvalid"].value_counts()[True]/len(df)



In [ ]: