In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
First, create a DataFrame from the dailybots.csv file, which can be found in the data directory. You can do this with the pd.read_csv() function. Take a minute to look at the DataFrame, because we will be using it for this entire worksheet.
In [3]:
data = pd.read_csv( '../data/dailybots.csv' )
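One way to get a feel for a freshly loaded DataFrame is sketched below. It reads a tiny in-memory CSV rather than the real file, and the column names (date, botfam, industry, hosts, orgs) are assumptions about what dailybots.csv contains, so check them against your own output.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for dailybots.csv; column names are assumed.
csv_text = """date,botfam,industry,hosts,orgs
2016-06-01,Ramnit,Finance,5,2
2016-06-01,Zeus,Retail,3,1
2016-06-02,Ramnit,Retail,7,3
"""

# parse_dates converts the date column from text to datetime64.
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(toy.head())              # first few rows
print(toy.dtypes)              # column types
print(toy["botfam"].unique())  # distinct values in one column
```

The same three calls work on the data variable created above.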
Count the number of infected days for "Ramnit" in each industry. How:
groupby() function
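The filter-then-group pattern can be sketched on a toy frame as follows. This is an illustration rather than the worksheet's answer: the column names (botfam, industry, date, hosts) are assumed, and the toy values are made up.

```python
import pandas as pd

# Toy stand-in for the real data; column names and values are assumptions.
toy = pd.DataFrame({
    "date": ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-02", "2016-06-02"],
    "botfam": ["Ramnit", "Zeus", "Ramnit", "Ramnit", "Zeus"],
    "industry": ["Finance", "Finance", "Finance", "Retail", "Retail"],
    "hosts": [5, 3, 7, 2, 4],
})

# Keep only the rows for one family, then count rows per industry.
ramnit = toy[toy["botfam"] == "Ramnit"]
rows_per_industry = ramnit.groupby("industry")["date"].count()
print(rows_per_industry)
```

This counts rows, which equals the number of days only if each industry has at most one row per day. If dates can repeat within a group, nunique() in place of count() would count distinct days instead.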
In [ ]:
In this exercise, you are asked to calculate the min, max, median and mean of infected orgs for each bot family sorted by median. HINT:
groupby() function, create a grouped data frame
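A grouped object can compute several statistics at once with agg(). The sketch below uses a toy frame with assumed column names (botfam, orgs) and made-up family labels "A" and "B".

```python
import pandas as pd

# Toy data; "A" and "B" are hypothetical family names.
toy = pd.DataFrame({
    "botfam": ["A", "A", "A", "B", "B", "B"],
    "orgs": [10, 20, 30, 1, 2, 6],
})

# One row per family, one column per statistic, ordered by median.
summary = (
    toy.groupby("botfam")["orgs"]
       .agg(["min", "max", "median", "mean"])
       .sort_values("median")
)
print(summary)
```

Without sort_values() the rows would come back in alphabetical order of the group key; sorting by the median column puts "B" first here.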
In [ ]:
In [ ]:
In this exercise you're going to plot the daily infected hosts for three infection types. In order to do this, you'll need to do the following steps:
groupby() to aggregate the data by date and family, then sum up the hosts in each group
unstack() function to prepare the data for plotting.
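To see what the group-then-unstack steps produce, here is a minimal sketch on toy data. The column names (date, botfam, industry, hosts) are assumed and the numbers are made up.

```python
import pandas as pd

# Toy data: two days, two families, two industries (all values invented).
toy = pd.DataFrame({
    "date": ["2016-06-01"] * 4 + ["2016-06-02"] * 4,
    "botfam": ["Ramnit", "Ramnit", "Zeus", "Zeus"] * 2,
    "industry": ["Finance", "Retail"] * 4,
    "hosts": [5, 2, 3, 4, 7, 1, 6, 2],
})

# Sum hosts per (date, family); the result is a Series with a two-level index.
totals = toy.groupby(["date", "botfam"])["hosts"].sum()

# unstack() moves the inner index level (botfam) into columns,
# giving one row per date and one column per family.
daily = totals.unstack()
print(daily)
```

A frame shaped like this is convenient for plotting, since calling daily.plot() draws one line per column.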
In [ ]:
Hint: try a box plot and/or violin plot. In order to do this, there are two steps:
.boxplot() method to plot the data. This has grouping built in, so you don't have to group by first.
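The built-in grouping of .boxplot() works through its by argument. The sketch below is a toy illustration with assumed column names (botfam, hosts) and invented values, not the worksheet's solution.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data; column names and values are assumptions.
toy = pd.DataFrame({
    "botfam": ["Ramnit"] * 4 + ["Zeus"] * 4,
    "hosts": [5, 7, 6, 20, 1, 2, 2, 3],
})

# One box per family: `column` picks the values, `by` picks the grouping key.
fig, ax = plt.subplots()
toy.boxplot(column="hosts", by="botfam", ax=ax)
ax.set_ylabel("hosts")
```

pandas has no violin plot method of its own, so a violin plot would need another library.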
In [ ]: