Anomaly detection is a major area of data science and cybersecurity. Even on a single computer, hundreds of small programs run simultaneously, each generating log files that record its behavior. Parsing these log files is tricky in itself, but deciding from the logs alone when a program is misbehaving is harder still: where is the threshold at which behavior goes from normal to malicious?
In this first part, you'll write code that flags certain sequences of numbers. Your log file will be a list of 1s and 0s. A run of four or more consecutive 1s is considered "suspicious." Flag each such run by saving the index at which it starts in the log file to the flag_indices list.
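For example, if the input log file is [1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1], you should return [2, 8]: counting from index 0, the first run of four 1s starts at index 2 and the second at index 8. The lone 1 at index 0 is too short to be flagged.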
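One way to approach the rule described above is a single pass that remembers where the current run of 1s began. The sketch below is only an illustration, not the reference solution: the name find_run_starts and the min_length parameter are hypothetical, and it assumes that a run longer than four 1s is flagged once, at its first index.

```python
def find_run_starts(logs, min_length=4):
    """Return the start index of every run of at least min_length 1s.

    Hypothetical helper for illustration; assumes each run is
    reported once, no matter how far past min_length it extends.
    """
    starts = []
    run_start = None  # index where the current run of 1s began, if any
    for i, value in enumerate(logs):
        if value == 1:
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            # A 0 ends the current run; keep it if it was long enough.
            if run_start is not None and i - run_start >= min_length:
                starts.append(run_start)
            run_start = None
    # Handle a run that reaches the end of the list.
    if run_start is not None and len(logs) - run_start >= min_length:
        starts.append(run_start)
    return starts


print(find_run_starts([1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]))  # [2, 8]
```

The check after the loop matters because the second run in the example ends at the last element, so no trailing 0 ever closes it.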
In [ ]:
def count_flags(logs):
    flag_indices = []
    ### BEGIN SOLUTION
    ### END SOLUTION
    return flag_indices
In [ ]:
import numpy as np
np.random.seed(583945)
l1 = np.random.randint(2, size = 1000).tolist()
a1 = set([39,87,96,132,137,169,174,185,235,257,269,292, 323, 472, 564, 583, 610, 628, 653, 695, 735, 783, 808, 865, 872, 880, 905,933,957,963,990])
assert set(count_flags(l1)) == a1
np.random.seed(49854)
l2 = np.random.randint(2, size = 1000).tolist()
a2 = set([61, 74, 90, 117, 124, 132, 151, 163, 179, 198, 229, 265, 297, 302, 354, 420, 479, 546, 582, 597, 632, 694, 778, 791, 923])
assert set(count_flags(l2)) == a2
In [ ]:
def preceding_zeros(logs, flags):
    avg_zeros = 0.0
    ### BEGIN SOLUTION
    ### END SOLUTION
    return avg_zeros
In [ ]:
import numpy as np
np.random.seed(8959384)
l1 = np.random.randint(2, size = 1000).tolist()
f1 = [25, 86, 104, 157, 180, 215, 259, 321, 346, 430, 518, 523, 537, 636, 657, 678, 687, 714, 771, 796, 820, 828, 850, 894, 902, 926, 954, 959]
a1 = 2.357143
np.testing.assert_allclose(preceding_zeros(l1, f1), a1)
np.random.seed(94721)
l2 = np.random.randint(2, size = 1000).tolist()
f2 = [0, 13, 28, 48, 53, 72, 78, 102, 125, 132, 139, 155, 166, 206, 229, 319, 391, 418, 463, 532, 566, 574, 636, 661, 697, 732, 785, 830, 863, 912, 944, 980]
a2 = 1.875000
np.testing.assert_allclose(preceding_zeros(l2, f2), a2)