Q4

Here, we'll do some basic anomaly detection!

A

Anomaly detection is a huge area of data science and cybersecurity. Even on a single computer, there are hundreds of little programs running simultaneously, all generating log files that record their behavior. Parsing these log files is tricky enough on its own, and detecting from its logs that a program may be misbehaving is harder still: at what threshold does behavior go from normal to malicious?

In this first part, you'll write code that flags certain sequences of numbers. Your log file will be a list of 1s and 0s. A run of four or more consecutive 1s is considered "suspicious"; flag each such run by appending the index at which it starts to the flag_indices list.

For example, if the input log file is [1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1], you should return [2, 8].
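The rule above can be sketched as a single pass over the list that tracks where the current run of 1s began. This is only an illustrative sketch, not the reference solution: the helper name find_run_starts and its min_length parameter are hypothetical, and it assumes that a run longer than four is flagged only once, at its first index.

```python
def find_run_starts(logs, min_length=4):
    """Return the start index of every run of 1s at least min_length long.

    Hypothetical helper for illustration; assumes each maximal run is
    reported once, at its first index.
    """
    starts = []
    run_start = None  # index where the current run of 1s began, if any
    for i, value in enumerate(logs):
        if value == 1:
            if run_start is None:
                run_start = i
        else:
            # A 0 closes the current run; keep it if it was long enough.
            if run_start is not None and i - run_start >= min_length:
                starts.append(run_start)
            run_start = None
    # A run that reaches the end of the list is closed here.
    if run_start is not None and len(logs) - run_start >= min_length:
        starts.append(run_start)
    return starts


example = [1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(find_run_starts(example))  # [2, 8]
```

The check after the loop matters because the second run in the example ends at the last element, so no trailing 0 ever closes it.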


In [ ]:
def count_flags(logs):
    """Return the starting index of each run of four or more consecutive 1s in logs."""
    flag_indices = []
    
    ### BEGIN SOLUTION
    
    ### END SOLUTION
    
    return flag_indices

In [ ]:
import numpy as np

np.random.seed(583945)
l1 = np.random.randint(2, size=1000).tolist()
a1 = {39, 87, 96, 132, 137, 169, 174, 185, 235, 257, 269, 292, 323, 472, 564, 583, 610, 628, 653, 695, 735, 783, 808, 865, 872, 880, 905, 933, 957, 963, 990}
assert set(count_flags(l1)) == a1

np.random.seed(49854)
l2 = np.random.randint(2, size=1000).tolist()
a2 = {61, 74, 90, 117, 124, 132, 151, 163, 179, 198, 229, 265, 297, 302, 354, 420, 479, 546, 582, 597, 632, 694, 778, 791, 923}
assert set(count_flags(l2)) == a2

B

On average, how many consecutive 0s immediately precede a flagged run of suspicious 1s? Finish the code below to compute this average for a given log file and its flagged indices, and store your value in avg_zeros.
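One way to read this is to walk backwards from each flagged index, counting 0s until a 1 or the start of the list is reached, and then divide the total by the number of flags. The sketch below follows that reading. The helper name mean_preceding_zeros is hypothetical, and two choices are assumptions rather than part of the problem statement: a flag at index 0 contributes zero to the total, and an empty list of flags yields 0.0.

```python
def mean_preceding_zeros(logs, flags):
    """Average count of consecutive 0s directly before each flagged index.

    Hypothetical helper for illustration. Assumes a flag at index 0
    contributes zero and that an empty flags list averages to 0.0.
    """
    if not flags:
        return 0.0
    total = 0
    for start in flags:
        i = start - 1
        # Walk left while still inside the list and still seeing 0s.
        while i >= 0 and logs[i] == 0:
            total += 1
            i -= 1
    return total / len(flags)


example = [1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(mean_preceding_zeros(example, [2, 8]))  # 1.5
```

In the example, the run starting at index 2 is preceded by one 0 and the run starting at index 8 by two, giving (1 + 2) / 2 = 1.5.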


In [ ]:
def preceding_zeros(logs, flags):
    """Return the average number of consecutive 0s immediately before each index in flags."""
    avg_zeros = 0.0
    
    ### BEGIN SOLUTION
    
    ### END SOLUTION
    
    return avg_zeros

In [ ]:
import numpy as np

np.random.seed(8959384)
l1 = np.random.randint(2, size=1000).tolist()
f1 = [25, 86, 104, 157, 180, 215, 259, 321, 346, 430, 518, 523, 537, 636, 657, 678, 687, 714, 771, 796, 820, 828, 850, 894, 902, 926, 954, 959]
a1 = 2.357143
np.testing.assert_allclose(preceding_zeros(l1, f1), a1)

np.random.seed(94721)
l2 = np.random.randint(2, size=1000).tolist()
f2 = [0, 13, 28, 48, 53, 72, 78, 102, 125, 132, 139, 155, 166, 206, 229, 319, 391, 418, 463, 532, 566, 574, 636, 661, 697, 732, 785, 830, 863, 912, 944, 980]
a2 = 1.875000
np.testing.assert_allclose(preceding_zeros(l2, f2), a2)