Apache Spot's IPython Advanced Mode

Netflows

This guide shows how to request data and how to explore it with libraries such as pandas.

Import Libraries

The next cell imports the libraries that the functions below depend on. Do not remove it.


In [ ]:
import datetime
import pandas as pd
import numpy as np
import linecache, bisect
import os

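# The notebook runs from a directory named after the day (YYYYMMDD),
# so the last component of the working directory is used as the date.
spath = os.getcwd()
path = spath.split("/")
date = path[-1]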
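
The cell above assumes the working directory is named after the day in YYYYMMDD form. A minimal sketch of a more defensive variant follows; `date_from_path` is a hypothetical helper (not part of Spot) and the example paths are made up. It returns None when the directory name is not a valid date, instead of failing later when the query variables are built.

```python
import datetime
import os


def date_from_path(path):
    """Return the last path component as 'YYYY-MM-DD', or None if it is not a YYYYMMDD date."""
    # normpath drops a trailing slash so basename always yields the directory name
    name = os.path.basename(os.path.normpath(path))
    try:
        return datetime.datetime.strptime(name, '%Y%m%d').strftime('%Y-%m-%d')
    except ValueError:
        return None


# Hypothetical paths, for illustration only
print(date_from_path('/tmp/flow/20161008'))
print(date_from_path('/tmp/flow/notadate'))
```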

Request Data

Data is requested through GraphQL, a query language for APIs (more info at: http://graphql.org/).

The function below makes a data request; all it needs is a query and its variables.


In [ ]:
def makeGraphqlRequest(query, variables):
    return GraphQLClient.request(query, variables)

Now that we have a function, we can run a query like this:

Note: There is no need to set the date for the query manually; by default the code reads the date from the current path.


In [ ]:
suspicious_query = """query($date:SpotDateType) {
                            flow {
                              suspicious(date:$date)
                                  {
                                      srcIp
                                      dstIp
                                      srcPort
                                      dstPort
                                      score
                                      srcIp_domain
                                      dstIp_rep
                                      protocol
                                      outBytes
                                      inPkts
                                      srcIp_rep
                                      inBytes
                                      srcIp_isInternal  
                                      rank 
                                      dstIp_geoloc
                                      tstart
                                      outPkts  
                                      dstIp_isInternal
                                      dstIp_domain
                                  }
                            }
                    }"""

## To use a different date for your query, swap which of the
## two 'date' lines below is commented out

variables={
    'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
#     'date': "2016-10-08"
    }
 
suspicious_request = makeGraphqlRequest(suspicious_query,variables)

## The variable suspicious_request contains the data returned by the query.
results = suspicious_request['data']['flow']['suspicious']
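
Indexing straight into `suspicious_request['data']` fails with an unhelpful error when the server reports a problem. A GraphQL response may carry an `errors` list alongside, or instead of, `data`. The sketch below shows one way to guard against that; `extract` is a hypothetical helper, the sample dictionaries are made up, and it assumes the client returns the decoded response as a dictionary, as the cell above implies.

```python
def extract(response, *keys):
    """Walk response['data'] along keys; raise if the server reported errors."""
    if response.get('errors'):
        messages = [e.get('message', str(e)) for e in response['errors']]
        raise RuntimeError('GraphQL errors: ' + '; '.join(messages))
    node = response.get('data') or {}
    for key in keys:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            # A missing branch is treated as "no results"
            return []
    return node


# Made-up responses, for illustration only
ok = {'data': {'flow': {'suspicious': [{'srcIp': '10.0.0.1', 'score': 0.01}]}}}
bad = {'errors': [{'message': 'Invalid date'}], 'data': None}

print(extract(ok, 'flow', 'suspicious'))
try:
    extract(bad, 'flow', 'suspicious')
except RuntimeError as err:
    print(err)
```

With a helper like this, the last line of the cell above could read `results = extract(suspicious_request, 'flow', 'suspicious')`.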

Pandas Dataframes

The following cell loads the results into a pandas dataframe

For more information on how to use pandas, see: https://pandas.pydata.org/pandas-docs/stable/10min.html


In [ ]:
import json  # needed for json.dumps below

df = pd.read_json(json.dumps(results))
## Print only the selected columns from the dataframe
## By default it will only print the first 15 results
print(df[['srcIp','dstIp','srcPort','dstPort','score']])
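
Since `results` is already a list of dictionaries, the JSON round trip is optional: `pd.DataFrame` accepts such a list directly. The sketch below uses made-up sample rows (illustrative values, not real Spot output) and also prints the column types, which is worth checking before filtering on numeric fields such as ports.

```python
import pandas as pd

# Made-up rows shaped like entries of the suspicious query result
sample = [
    {'srcIp': '10.0.0.1', 'dstIp': '192.168.1.5', 'srcPort': 51000, 'dstPort': 3389, 'score': 0.0001},
    {'srcIp': '10.0.0.2', 'dstIp': '192.168.1.9', 'srcPort': 52000, 'dstPort': 443, 'score': 0.0020},
]

sample_df = pd.DataFrame(sample)

# head(n) limits the output explicitly instead of relying on display defaults
print(sample_df[['srcIp', 'dstIp', 'srcPort', 'dstPort', 'score']].head(15))
print(sample_df.dtypes)
```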

Additional operations

Additional operations can be performed on the dataframe, such as sorting, filtering, and grouping the data.

Filtering the data


In [ ]:
## Filter results where the destination port is 3389
## The resulting data will be stored in df2
## Ports are compared as numbers here; if your dataframe holds them as strings, compare with '3389'

df2 = df[df['dstPort'] == 3389]
print(df2[['srcIp','dstIp','srcPort','dstPort','score']])
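
Several conditions can be combined with `&` (and) and `|` (or), each wrapped in parentheses. The sketch below uses made-up rows and an arbitrary score threshold chosen only for illustration.

```python
import pandas as pd

# Made-up rows, for illustration only
flows = pd.DataFrame({
    'srcIp': ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    'dstPort': [3389, 3389, 443],
    'score': [0.0001, 0.5, 0.0002],
})

# Keep rows that target port 3389 AND have a score below the threshold
mask = (flows['dstPort'] == 3389) & (flows['score'] < 0.001)
print(flows[mask])
```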

Ordering the data


In [ ]:
srtd = df.sort_values(by="rank")
print(srtd[['rank','srcIp','dstIp','srcPort','dstPort','score']])
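
`sort_values` also accepts a list of columns with a matching list of directions, which helps when ties need a secondary key. A small sketch on made-up rows, sorting by input bytes (largest first) and breaking ties by rank:

```python
import pandas as pd

# Made-up rows, for illustration only
flows = pd.DataFrame({
    'rank': [2, 1, 3],
    'srcIp': ['10.0.0.2', '10.0.0.1', '10.0.0.3'],
    'inBytes': [500, 1200, 500],
})

# Largest inBytes first; equal inBytes are ordered by ascending rank
top = flows.sort_values(by=['inBytes', 'rank'], ascending=[False, True]).head(2)
print(top)
```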

Grouping the data


In [ ]:
## Group the results by source-destination IP pair,
## summing all other columns
grpd = df.groupby(['srcIp','dstIp']).sum()
## Print the resulting dataframe, showing the input bytes and input packets columns
print(grpd[["inBytes","inPkts"]])
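
Summing every column also touches text columns, which depending on the pandas version may be dropped, concatenated, or rejected. One way to sidestep that is to select the numeric columns before aggregating. The sketch below does so on made-up rows and adds a per-pair connection count; the `connections` column name is an assumption introduced here for illustration.

```python
import pandas as pd

# Made-up rows, for illustration only
flows = pd.DataFrame({
    'srcIp': ['10.0.0.1', '10.0.0.1', '10.0.0.2'],
    'dstIp': ['192.168.1.5', '192.168.1.5', '192.168.1.5'],
    'inBytes': [100, 250, 40],
    'inPkts': [2, 5, 1],
})

pairs = flows.groupby(['srcIp', 'dstIp'])

# Sum only the numeric columns of interest
totals = pairs[['inBytes', 'inPkts']].sum()
# Number of rows (connections) seen for each source-destination pair
totals['connections'] = pairs.size()

print(totals.sort_values(by='inBytes', ascending=False))
```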

Reset Scored Connections

Uncomment and execute the following cell to reset all scored connections for this day.


In [ ]:
# reset_scores = """mutation($date:SpotDateType!) {
#                   flow{
#                       resetScoredConnections(date:$date){
#                       success
#                       }
#                   }
#               }"""


# variables={
#     'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
#     }
 
# request = makeGraphqlRequest(reset_scores,variables)

# print(request['data']['flow']['resetScoredConnections']['success'])

Sandbox

At this point you can perform your own analysis using the previously provided functions as a guide.

Happy threat hunting!


In [ ]:
#Your code here