During the Israel-Hamas summer of 2014 military clash (also known as operation Protective Edge), Hamas carried thousands of rocket attacks on Israeli cities and towns. Luckily, Tel Aviv, the city were I live, was relatively spared, with about one or two rocket attacks per day. This allowed me and a few others to notice a strange patterns in those attacks. It seemed that those attack, and the air-raid sirens that preceded them, were more prone to happen at the top of the hour. We came with two explanations for it:
We will probably never know what's the true reason for that pattern to emerge, but can we show that it exists in the first place, rather than leave it to some intuition? Thankfully, there are many services, apps and Twitter accounts that report and log every rocket attack. I pulled the data from one such Twitter account, @Alarmdarom, and below I clean it up, and show that indeed the pattern exist, and moreover, it seems like Hamas' clocks are running 4 minutes fast.
In [187]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from numpy import round, log, diff
import itertools
from collections import Counter, defaultdict
import time
import scipy
import dateutil
import re
In [104]:
def parse_timestamp(line):
# 'Date: Sat Jul 26 04:34:09 +0000 2014\n'
line = line[6:].strip()
return dateutil.parser.parse(line)
def parse_alarm(line):
# Alarm Ashkelon, Beer Ganim, Nitzan, Nitzanim 09:19:31
# Alarm Gan Sorek, Netaim, Rishon Lezion, Givat Brenner, Gibbethon, Gan Shlomo, Netzer Sereni, Rehovot, Gealya,... http://t.co/IkJc3yIsyr
line = line.strip()[6:]
if re.search("\d{1,2}:\d{1,2}(:\d{1,2})?$", line):
line = re.sub("\d{1,2}:\d{1,2}(:\d{1,2})?$", "", line)
elif re.search("http://t.co/\w+$", line):
line = re.sub(" tgithttp://t.co/\w+$", "", line)
else:
raise ValueError(line)
locations = [loc.strip().lower() for loc in line.split(',') if loc.strip()]
return locations
In [122]:
def read_file(filename):
result = []
for line in open(filename):
if line.strip().startswith('Date:'):
date = parse_timestamp(line)
elif line.strip().startswith('Alarm '):
locations = parse_alarm(line)
result.append((date, locations))
return pd.DataFrame(data=result, columns=['datetime', 'locations'])
In [123]:
alarms = read_file("/home/rouli/data/redalert.txt")
In [132]:
alarms['date'] = alarms.datetime.apply(lambda x:x.date())
alarms['time'] = alarms.datetime.apply(lambda x:x.time())
alarms['minute'] = alarms.time.apply(lambda x:x.minute)
In [145]:
alarms=alarms[alarms.date>datetime.date(2014,7,8)]
In [146]:
alarms.minute.hist(bins=60)
Out[146]:
Looking at the data, one can immediately see rokcet-fire "events". These consists of several consecutive rocket attacks, which can all be thought of as one inseperatable event. In order to avoid counting each event several times, in the next step I clean the data a bit, by removing any rocket attack that followed another attack within five minutes.
In [210]:
alarms['delta']=alarms.datetime-alarms.datetime.shift(-1)
alarms.dropna(inplace=True)
In [211]:
unique_alarms = alarms[alarms.delta>np.timedelta64(5,'m')]
In [212]:
len(alarms), len(unique_alarms)
Out[212]:
I think the following histogram really shows that our intuition was correct - rocket attacks are much more likely to start on the 56 minutes mark. It's almost 5 times the standard deviation from the mean.
In [233]:
unique_alarms.minute.hist(bins=60)
plt.title('Number of rocket attacks per given minute')
Out[233]:
In [214]:
hist = unique_alarms.minute.value_counts()
In [215]:
hist.mean()
Out[215]:
In [216]:
hist.std()
Out[216]:
In [217]:
(hist[56]-hist.mean())/hist.std()
Out[217]:
In [246]:
from scipy.stats import chisquare, binom_test
Chi square test gives low p-value to the null assumption that the distribution of minutes when rocket attacks happen is uniformly distributed
In [247]:
chisquare(hist.values)[1]
Out[247]:
Specifically, are the five minutes prior to the top of the hour and the five minutes following it more prone to rocket attacks? There are 608 attacks (remember, each attack can contain more than one rocket!), 126 of those are when the minute mark is at least at 55 or at most at 05. This is highly unlikely as can be seen when using a binomial test
In [248]:
sum(hist[55:])+sum(hist[:5])
Out[248]:
In [249]:
sum(hist)
Out[249]:
In [250]:
binom_test(126, 608, 1/6.0)
Out[250]:
In [ ]: