Import Libraries
The next cell imports the libraries that the functions in this notebook need. Do not remove it.
In [ ]:
import datetime
import pandas as pd
import numpy as np
import linecache, bisect
import os
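# The working directory is expected to end in a YYYYMMDD folder;
# that folder name is used as the query date
spath = os.getcwd()
path = spath.split("/")
date = path[-1]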
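The last lines of the cell above take the query date from the name of the working directory, which is expected to be in YYYYMMDD form. As a minimal sketch of that step, the hypothetical helper below (`date_from_path` is not part of the notebook) does the same for any path and returns the date in the YYYY-MM-DD form that the queries later in this notebook pass as a variable. The example path is made up.

```python
import datetime
import os


def date_from_path(path):
    """Return the last path component, read as YYYYMMDD, formatted as YYYY-MM-DD."""
    # normpath drops a trailing slash so basename is never empty
    folder = os.path.basename(os.path.normpath(path))
    # strptime raises ValueError if the folder name is not a valid YYYYMMDD date
    return datetime.datetime.strptime(folder, '%Y%m%d').strftime('%Y-%m-%d')


# Made-up path, for illustration only
print(date_from_path('/some/notebooks/flow/20161008'))
```

Parsing with `strptime` also acts as a check: if the notebook is started from a folder whose name is not a date, the error appears here rather than inside the query.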
Request Data
Data is requested through GraphQL, a query language for APIs (more information at http://graphql.org/).
The function below makes a data request; all it needs is a query and its variables.
In [ ]:
def makeGraphqlRequest(query, variables):
    return GraphQLClient.request(query, variables)
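`GraphQLClient` is supplied by the notebook environment and its internals are not shown here. As general background only, a GraphQL request sent over HTTP is commonly a POST whose JSON body carries the query text and its variables. The sketch below builds such a body without sending anything; `build_graphql_payload` is a hypothetical name, and this is not necessarily how `GraphQLClient.request` works.

```python
import json


def build_graphql_payload(query, variables=None):
    """Serialize a GraphQL query and its variables into a JSON request body."""
    body = {'query': query, 'variables': variables or {}}
    return json.dumps(body)


# Shortened version of the kind of query used in this notebook
example_query = """query($date:SpotDateType) {
  flow {
    suspicious(date:$date) { srcIp dstIp score }
  }
}"""

payload = build_graphql_payload(example_query, {'date': '2016-10-08'})
print(payload)
```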
Now that makeGraphqlRequest is defined, we can run a query like this:
Note: There is no need to set the date for the query manually. By default, the code reads the date from the current path.
In [ ]:
suspicious_query = """query($date:SpotDateType) {
    flow {
        suspicious(date:$date) {
            srcIp
            dstIp
            srcPort
            dstPort
            score
            srcIp_domain
            dstIp_rep
            protocol
            outBytes
            inPkts
            srcIp_rep
            inBytes
            srcIp_isInternal
            rank
            dstIp_geoloc
            tstart
            outPkts
            dstIp_isInternal
            dstIp_domain
        }
    }
}"""
## To query a different date, comment out the first 'date' line below
## and uncomment the second one
variables={
    'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
    # 'date': "2016-10-08"
}
suspicious_request = makeGraphqlRequest(suspicious_query,variables)
##The variable suspicious_request will contain the resulting data from the query.
results = suspicious_request['data']['flow']['suspicious']
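The last line of the cell above assumes the request succeeded. Under the GraphQL specification, a failed request reports its problems under a top-level `errors` key, and `data` may be null. As a hedged sketch, assuming `makeGraphqlRequest` returns the parsed response as a dictionary and does not raise on its own, a hypothetical helper such as `extract_suspicious` could check for that before indexing. The responses below are made up.

```python
def extract_suspicious(response):
    """Return the list of suspicious flows from a parsed GraphQL response."""
    # The GraphQL specification reports failures under a top-level "errors" key
    errors = response.get('errors')
    if errors:
        messages = [str(e.get('message', e)) for e in errors]
        raise RuntimeError('GraphQL request failed: ' + '; '.join(messages))
    data = response.get('data') or {}
    flow = data.get('flow') or {}
    # A missing or null field becomes an empty list
    return flow.get('suspicious') or []


# Made-up response, for illustration only
ok = {'data': {'flow': {'suspicious': [{'srcIp': '10.0.0.1', 'score': 0.001}]}}}
print(extract_suspicious(ok))
```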
The following cell loads the results into a pandas DataFrame.
For more information on pandas, see https://pandas.pydata.org/pandas-docs/stable/10min.html
In [ ]:
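## results is expected to be a list of records, so it can be passed
## straight to the DataFrame constructor
df = pd.DataFrame(results)
## Print only the selected columns from the dataframe
## Long output is truncated according to the pandas display.max_rows option
print(df[['srcIp','dstIp','srcPort','dstPort','score']])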
In [ ]:
## Filter results where the destination port is 3389
## (compared as a number, assuming dstPort is loaded as an integer column)
## The resulting data will be stored in df2
df2 = df[df['dstPort'] == 3389]
print(df2[['srcIp','dstIp','srcPort','dstPort','score']])
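The same pattern extends to several values or several conditions at once. The sketch below uses a small made-up DataFrame that only mimics the shape of the query results, and the threshold is arbitrary; it shows `isin` with a list of ports and two conditions combined with `&`.

```python
import pandas as pd

# Made-up rows that mimic the shape of the query results
sample = pd.DataFrame([
    {'srcIp': '10.0.0.1', 'dstIp': '10.0.0.9', 'srcPort': 51000, 'dstPort': 3389, 'score': 0.0001},
    {'srcIp': '10.0.0.2', 'dstIp': '10.0.0.9', 'srcPort': 51001, 'dstPort': 22, 'score': 0.0200},
    {'srcIp': '10.0.0.3', 'dstIp': '10.0.0.8', 'srcPort': 51002, 'dstPort': 443, 'score': 0.0005},
])

# Keep rows whose destination port is one of several values
remote_access = sample[sample['dstPort'].isin([22, 3389])]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
low_score_rdp = sample[(sample['dstPort'] == 3389) & (sample['score'] < 0.001)]

print(remote_access[['srcIp', 'dstPort']])
print(low_score_rdp[['srcIp', 'dstPort', 'score']])
```

Note that the values given to `isin` should have the same type as the column: a list of strings will not match a column of integers.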
Ordering the data
In [ ]:
srtd = df.sort_values(by="rank")
print(srtd[['rank','srcIp','dstIp','srcPort','dstPort','score']])
Grouping the data
In [ ]:
## This command groups the results by source-destination IP pair,
## summing all other columns
grpd = df.groupby(['srcIp','dstIp']).sum()
## This prints the resulting dataframe, showing the inBytes and inPkts columns
print(grpd[["inBytes","inPkts"]])
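Calling `sum()` on every column also touches text columns, and how pandas handles those depends on the version. A minimal sketch of a narrower alternative, using made-up rows, selects the numeric columns first and picks one aggregation per column with `agg`. The choice of `max` for inPkts is only for illustration.

```python
import pandas as pd

# Made-up rows that mimic the shape of the query results
sample = pd.DataFrame([
    {'srcIp': '10.0.0.1', 'dstIp': '10.0.0.9', 'inBytes': 100, 'inPkts': 2},
    {'srcIp': '10.0.0.1', 'dstIp': '10.0.0.9', 'inBytes': 300, 'inPkts': 4},
    {'srcIp': '10.0.0.2', 'dstIp': '10.0.0.8', 'inBytes': 50, 'inPkts': 1},
])

# Select the numeric columns first, then choose one aggregation per column
summary = sample.groupby(['srcIp', 'dstIp'])[['inBytes', 'inPkts']].agg(
    {'inBytes': 'sum', 'inPkts': 'max'}
)

# reset_index turns the group keys back into ordinary columns
flat = summary.reset_index()
print(flat)
```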
Reset Scored Connections
Uncomment and execute the following cell to reset all scored connections for the date being queried.
In [ ]:
# reset_scores = """mutation($date:SpotDateType!) {
# flow{
# resetScoredConnections(date:$date){
# success
# }
# }
# }"""
# variables={
# 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
# }
# request = makeGraphqlRequest(reset_scores,variables)
# print(request['data']['flow']['resetScoredConnections']['success'])
In [ ]:
#Your code here