Import Libraries
The next cell imports the libraries that the rest of the notebook needs. Do not remove it.
In [ ]:
import datetime
import pandas as pd
import numpy as np
import linecache, bisect
import os
# The date is taken from the name of the current working directory (YYYYMMDD)
spath = os.getcwd()
path = spath.split("/")
date = path[-1]
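The last lines of the cell above take the date from the final component of the working directory, which the later cells parse with the '%Y%m%d' format. A minimal sketch of that derivation, using a hypothetical helper `date_from_path` and an example path that is an assumption rather than part of the notebook's environment:

```python
import datetime
import os


def date_from_path(path):
    """Return the last path component parsed as a YYYYMMDD date.

    Hypothetical helper for illustration; raises ValueError when the
    directory name is not a valid date.
    """
    name = os.path.basename(os.path.normpath(path))
    return datetime.datetime.strptime(name, '%Y%m%d').date()


# Example path (an assumption, not taken from the notebook's environment).
example = date_from_path('/home/user/ipynb/dns/20161008')
print(example.strftime('%Y-%m-%d'))  # prints 2016-10-08
```

Parsing the name up front means a notebook started from an unexpected directory fails with a clear ValueError instead of sending a malformed date to the API.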
Request Data
To request data we use GraphQL (a query language for APIs; more info at http://graphql.org/).
The function below makes a data request; all you need is a query and its variables.
In [ ]:
def makeGraphqlRequest(query, variables):
    return GraphQLClient.request(query, variables)
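`GraphQLClient` is not defined in this notebook, so it is presumably supplied by the environment. As a rough illustration of what such a request typically carries, the sketch below assumes the common GraphQL-over-HTTP convention of a JSON body with `query` and `variables` keys. The helper `build_graphql_payload` is hypothetical and may not match what the real client does:

```python
import json


def build_graphql_payload(query, variables=None):
    """Serialize a query and its variables into a JSON request body.

    Hypothetical helper; the real GraphQLClient may build its request
    differently.
    """
    body = {'query': query, 'variables': variables or {}}
    return json.dumps(body, sort_keys=True)


payload = build_graphql_payload(
    'query($date:SpotDateType) { dns { suspicious(date:$date) { clientIp } } }',
    {'date': '2016-10-08'}
)
print(payload)
```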
Now that we have a function, we can run a query like this:
*Note: There is no need to set the date for the query manually. By default, the code reads the date from the current path.
In [ ]:
suspicious_query = """query($date:SpotDateType) {
    dns {
        suspicious(date:$date) {
            clientIp
            clientIpSev
            dnsQuery
            dnsQueryClass
            dnsQueryClassLabel
            dnsQueryRcode
            dnsQueryRcodeLabel
            dnsQueryRep
            dnsQuerySev
            dnsQueryType
            dnsQueryTypeLabel
            frameLength
            frameTime
            networkContext
            score
            tld
            unixTimestamp
        }
    }
}"""
## To use a different date for your query, swap the commented and
## uncommented 'date' lines below
variables = {
    'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
    # 'date': "2016-10-08"
}
suspicious_request = makeGraphqlRequest(suspicious_query, variables)
## The variable suspicious_request contains the data returned by the query.
results = suspicious_request['data']['dns']['suspicious']
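The cell above indexes straight into `['data']['dns']['suspicious']`, which raises a KeyError or TypeError if the request failed. A minimal sketch of a more defensive extraction, assuming the response is a dict shaped like a standard GraphQL response (a `data` key and an optional `errors` list). The helper `extract_suspicious` and the sample response are made up for illustration:

```python
def extract_suspicious(response):
    """Return the list under data.dns.suspicious, or raise on errors.

    Hypothetical helper; assumes a dict shaped like a standard GraphQL
    response with 'data' and an optional 'errors' list.
    """
    if response.get('errors'):
        messages = [err.get('message', 'unknown error') for err in response['errors']]
        raise RuntimeError('GraphQL request failed: ' + '; '.join(messages))
    data = response.get('data') or {}
    return (data.get('dns') or {}).get('suspicious') or []


# Made-up response used only to exercise the helper.
sample = {'data': {'dns': {'suspicious': [{'clientIp': '10.0.0.1', 'tld': 'example.com'}]}}}
print(len(extract_suspicious(sample)))  # prints 1
```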
The following cell loads the results into a pandas dataframe.
To learn more about pandas, see https://pandas.pydata.org/pandas-docs/stable/10min.html
In [ ]:
## results is a list of records, so it can be loaded into a dataframe directly
## (the original call used json.dumps, but json is never imported)
df = pd.DataFrame(results)
## Print only the selected columns from the dataframe
print(df[['clientIp', 'unixTimestamp','tld', 'dnsQuery','dnsQueryRcode','dnsQueryRcodeLabel']])
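To try the dataframe step without a live query, the sketch below builds a dataframe from a few made-up records shaped like the query results and converts `unixTimestamp` to readable timestamps. It assumes the field holds epoch seconds, which the source does not state; the record values are illustrative only:

```python
import pandas as pd

# Made-up records shaped like the query results (values are illustrative).
records = [
    {'clientIp': '10.0.0.1', 'unixTimestamp': 1475884800,
     'tld': 'example.com', 'dnsQuery': 'a.example.com'},
    {'clientIp': '10.0.0.2', 'unixTimestamp': 1475888400,
     'tld': 'example.org', 'dnsQuery': 'b.example.org'},
]

sample_df = pd.DataFrame(records)
# Assumption: unixTimestamp is in epoch seconds (UTC).
sample_df['time'] = pd.to_datetime(sample_df['unixTimestamp'], unit='s')
print(sample_df[['clientIp', 'time', 'tld']])
```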
In [ ]:
## Filter results where the tld column matches one of the listed values
## The resulting data will be stored in df2
df2 = df[df['tld'].isin(['sjc04-login.dotomi.com'])]
print(df2[['clientIp', 'unixTimestamp','tld', 'dnsQuery','dnsQueryRcode','dnsQueryRcodeLabel']])
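Filters like the one above are boolean masks, so they can be combined. A small sketch on made-up data (the column names match the query, the values do not come from the source) shows an exact match with `isin` next to a compound condition:

```python
import pandas as pd

# Made-up data with the same column names as the query results.
sample_df = pd.DataFrame({
    'clientIp': ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    'tld': ['example.com', 'example.org', 'example.com'],
    'dnsQueryRcode': [0, 3, 0],
})

# Exact match against a list of values, as in the cell above.
exact = sample_df[sample_df['tld'].isin(['example.com'])]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
failed = sample_df[(sample_df['tld'] == 'example.org') & (sample_df['dnsQueryRcode'] != 0)]

print(exact)
print(failed)
```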
Ordering the data
In [ ]:
srtd = df.sort_values(by="tld")
print(srtd[['clientIp', 'unixTimestamp','tld', 'dnsQuery','dnsQueryRcode','dnsQueryRcodeLabel']])
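Sorting also accepts several keys with a direction for each. The sketch below, on made-up data, orders rows by `tld` and then shows the most recent query first within each value:

```python
import pandas as pd

# Made-up data with the same column names as the query results.
sample_df = pd.DataFrame({
    'clientIp': ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    'tld': ['example.org', 'example.com', 'example.com'],
    'unixTimestamp': [1475884800, 1475884800, 1475888400],
})

# tld ascending, then unixTimestamp descending within each tld.
ordered = sample_df.sort_values(by=['tld', 'unixTimestamp'], ascending=[True, False])
print(ordered)
```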
Grouping the data
In [ ]:
## Group the results by (clientIp, tld) pairs,
## counting the non-null values in every other column
grpd = df.groupby(['clientIp','tld']).count()
## Print the number of DNS queries for each pair
print(grpd[["dnsQuery"]])
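`count()` reports non-null values per column, so the same number is repeated across columns. When only the number of rows per group matters, `size()` is a common alternative. A sketch on made-up data that also ranks the pairs by query volume (the `queries` column name is an assumption introduced here):

```python
import pandas as pd

# Made-up data with the same column names as the query results.
sample_df = pd.DataFrame({
    'clientIp': ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.1'],
    'tld': ['example.com', 'example.com', 'example.com', 'example.org'],
    'dnsQuery': ['a.example.com', 'b.example.com', 'a.example.com', 'c.example.org'],
})

# size() counts rows per group regardless of missing values.
counts = (
    sample_df.groupby(['clientIp', 'tld'])
    .size()
    .reset_index(name='queries')
    .sort_values(by='queries', ascending=False)
)
print(counts)
```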
Reset Scored Connections
Uncomment and execute the following cell to reset all scored connections for this day.
In [ ]:
# reset_scores = """mutation($date:SpotDateType!) {
# dns{
# resetScoredConnections(date:$date){
# success
# }
# }
# }"""
# variables={
# 'date': datetime.datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d')
# }
# request = makeGraphqlRequest(reset_scores,variables)
# print(request['data']['dns']['resetScoredConnections']['success'])
In [ ]:
#Your code here