Import Libraries
The next cell imports the libraries needed to run the functions in this notebook. Do not remove it.
In [ ]:
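import datetime
import json  # needed by json.dumps() when loading the results into pandas
import pandas as pd
import numpy as np
import linecache, bisect
import os

# The working directory is expected to end in the date folder (YYYYMMDD)
spath = os.getcwd()
path = spath.split("/")
date = path[-1]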
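The date is taken from the last component of the working directory, which the notebook later parses as `YYYYMMDD`. A minimal sketch of that derivation, using a hypothetical helper `date_from_path` and an illustrative path (neither is part of the notebook):

```python
import datetime
import os


def date_from_path(path):
    """Return the trailing YYYYMMDD directory name as a YYYY-MM-DD string.

    Hypothetical helper for illustration; raises ValueError if the last
    path component is not a valid YYYYMMDD date.
    """
    folder = os.path.basename(os.path.normpath(path))
    return datetime.datetime.strptime(folder, '%Y%m%d').strftime('%Y-%m-%d')


# Illustrative path only; the notebook itself reads os.getcwd().
print(date_from_path('/some/dir/20161008'))  # prints 2016-10-08
```

Parsing the folder name up front fails early with a `ValueError` when the notebook is run from a directory that is not a date folder.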
Request Data
To request data we use GraphQL, a query language for APIs (more info at: http://graphql.org/).
We provide a function that makes the data request; all you need is a query and its variables.
In [ ]:
def makeGraphqlRequest(query, variables):
    return GraphQLClient.request(query, variables)
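`GraphQLClient` is supplied by the notebook environment, and how it transports the request is not shown here. As a rough sketch of the common GraphQL-over-HTTP convention (not necessarily what `GraphQLClient` does internally), a query and its variables are usually serialized into a single JSON body. The helper name `build_graphql_payload` is hypothetical:

```python
import json


def build_graphql_payload(query, variables=None):
    """Serialize a query and its variables into a JSON request body.

    Hypothetical helper; follows the common convention of a JSON object
    with "query" and "variables" keys.
    """
    return json.dumps({'query': query, 'variables': variables or {}})


payload = build_graphql_payload(
    'query($date:SpotDateType) { proxy { suspicious(date:$date) { clientIp } } }',
    {'date': '2016-10-08'},
)
print(payload)
```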
Now that we have a function, we can run a query like this:
Note: There is no need to set the date for the query manually. By default, the code reads the date from the current path.
In [ ]:
suspicious_query = """query($date:SpotDateType) {
    proxy {
        suspicious(date:$date) {
            clientIp
            clientToServerBytes
            datetime
            duration
            host
            networkContext
            referer
            requestMethod
            responseCode
            responseCodeLabel
            responseContentType
            score
            serverIp
            serverToClientBytes
            uri
            uriPath
            uriPort
            uriQuery
            uriRep
            userAgent
            username
            webCategory
        }
    }
}"""
## To query a different date, swap which of the two 'date' lines
## below is commented out
variables = {
    'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
    # 'date': "2016-10-08"
}
suspicious_request = makeGraphqlRequest(suspicious_query, variables)
## The variable suspicious_request will contain the resulting data from the query.
results = suspicious_request['data']['proxy']['suspicious']
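Indexing straight into `suspicious_request['data']` raises an unhelpful `KeyError` or `TypeError` if the query failed. GraphQL responses conventionally report failures under an `errors` key, so a defensive sketch could check it first. Whether this endpoint populates `errors` is an assumption, and `extract_suspicious` and the sample response are hypothetical:

```python
def extract_suspicious(response):
    """Return the list of suspicious proxy records from a response dict.

    Hypothetical helper. Assumes the conventional GraphQL response shape:
    an optional "errors" list of dicts and a "data" object.
    """
    if response.get('errors'):
        messages = [str(e.get('message', e)) for e in response['errors']]
        raise RuntimeError('GraphQL query failed: ' + '; '.join(messages))
    data = response.get('data') or {}
    proxy = data.get('proxy') or {}
    # An empty list means the query succeeded but returned no records.
    return proxy.get('suspicious') or []


# Made-up response for illustration.
ok = {'data': {'proxy': {'suspicious': [{'clientIp': '10.0.0.1', 'host': 'example.com'}]}}}
print(extract_suspicious(ok))
```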
The following cell loads the results into a pandas DataFrame.
For more information on how to use pandas, see: https://pandas.pydata.org/pandas-docs/stable/10min.html
In [ ]:
df = pd.read_json(json.dumps(results))
##Printing only the selected column list from the dataframe
##Unless specified otherwise,
print(df[['clientIp','uriQuery','datetime','clientToServerBytes','serverToClientBytes', 'host']])
In [ ]:
##Filter results where the client IP is 10.173.202.136
##The resulting data will be stored in df2
df2 = df[df['clientIp'].isin(['10.173.202.136'])]
print(df2[['clientIp','uriQuery','datetime','host']])
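The same boolean-mask pattern works for other columns, and conditions can be combined with `&`. A small sketch on made-up records (the rows and the 1000-byte threshold are illustrative, not data returned by the query):

```python
import pandas as pd

# Made-up rows using a few of the columns returned by the query.
sample = pd.DataFrame([
    {'clientIp': '10.0.0.1', 'host': 'a.example', 'clientToServerBytes': 500},
    {'clientIp': '10.0.0.2', 'host': 'b.example', 'clientToServerBytes': 1500},
    {'clientIp': '10.0.0.1', 'host': 'c.example', 'clientToServerBytes': 4000},
])

# Keep rows from one client that also sent more than 1000 bytes.
# Each condition needs its own parentheses because & binds tighter than == and >.
mask = (sample['clientIp'] == '10.0.0.1') & (sample['clientToServerBytes'] > 1000)
filtered = sample[mask]
print(filtered[['clientIp', 'host', 'clientToServerBytes']])
```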
Ordering the data
In [ ]:
srtd = df.sort_values(by="host")
print(srtd[['host','clientIp','uriQuery','datetime']])
Grouping the data
In [ ]:
## This command will group the results by pairs of client IP and host,
## summing all other columns
grpd = df.groupby(['clientIp','host']).sum()
## This will print the resulting dataframe displaying the input and output bytes columns
print(grpd[["clientToServerBytes","serverToClientBytes"]])
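Calling `sum()` on the whole group also tries to aggregate non-numeric columns, which can behave differently across pandas versions. A sketch that selects the byte columns before aggregating and then ranks the client/host pairs by outbound volume, using made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame([
    {'clientIp': '10.0.0.1', 'host': 'a.example', 'clientToServerBytes': 100, 'serverToClientBytes': 900},
    {'clientIp': '10.0.0.1', 'host': 'a.example', 'clientToServerBytes': 300, 'serverToClientBytes': 100},
    {'clientIp': '10.0.0.2', 'host': 'b.example', 'clientToServerBytes': 50, 'serverToClientBytes': 20},
])

byte_cols = ['clientToServerBytes', 'serverToClientBytes']
# Sum only the byte counters for each (clientIp, host) pair.
totals = sample.groupby(['clientIp', 'host'])[byte_cols].sum()
# Largest senders first.
ranked = totals.sort_values(by='clientToServerBytes', ascending=False)
print(ranked)
```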
Reset Scored Connections
Uncomment and execute the following cell to reset all scored connections for this day.
In [ ]:
# reset_scores = """mutation($date:SpotDateType!) {
#     proxy {
#         resetScoredConnections(date:$date) {
#             success
#         }
#     }
# }"""
# variables = {
#     'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
# }
# request = makeGraphqlRequest(reset_scores, variables)
# print(request['data']['proxy']['resetScoredConnections']['success'])
In [ ]:
#Your code here