Quick Analysis of Hamas Rocket Attacks

During the Israel-Hamas summer of 2014 military clash (also known as operation Protective Edge), Hamas carried thousands of rocket attacks on Israeli cities and towns. Luckily, Tel Aviv, the city were I live, was relatively spared, with about one or two rocket attacks per day. This allowed me and a few others to notice a strange patterns in those attacks. It seemed that those attack, and the air-raid sirens that preceded them, were more prone to happen at the top of the hour. We came with two explanations for it:

Hamas wanted to fire many missiles at the same time (one reason to do so - to overwhelm Israel's anti-rocket system Iron Dome). Synchronizing the attacks to happen at the top of the hour was, for some reason or another, easier for Hamas militants.
Hamas hoped to attack Israel during a shift-rotation among the soldiers that operate the Iron Dome batteries, and thus making it less likely that its rocket are shot down.

We will probably never know what's the true reason for that pattern to emerge, but can we show that it exists in the first place, rather than leave it to some intuition? Thankfully, there are many services, apps and Twitter accounts that report and log every rocket attack. I pulled the data from one such Twitter account, @Alarmdarom, and below I clean it up, and show that indeed the pattern exist, and moreover, it seems like Hamas' clocks are running 4 minutes fast.



In [187]:

    
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from numpy import round, log, diff
import itertools
from collections import Counter, defaultdict
import time
import scipy
import dateutil
import re

Reading the data



In [104]:

    
def parse_timestamp(line):
    # 'Date: Sat Jul 26 04:34:09 +0000 2014\n'
    line = line[6:].strip()
    return dateutil.parser.parse(line)

def parse_alarm(line):
    # Alarm Ashkelon, Beer Ganim, Nitzan, Nitzanim 09:19:31
    
    # Alarm Gan Sorek, Netaim, Rishon Lezion, Givat Brenner, Gibbethon, Gan Shlomo, Netzer Sereni, Rehovot, Gealya,... http://t.co/IkJc3yIsyr
    line = line.strip()[6:]
    if re.search("\d{1,2}:\d{1,2}(:\d{1,2})?$", line):
        line = re.sub("\d{1,2}:\d{1,2}(:\d{1,2})?$", "", line)
    elif re.search("http://t.co/\w+$", line):
        line = re.sub("       tgithttp://t.co/\w+$", "", line)
    else:
        raise ValueError(line)
    locations = [loc.strip().lower() for loc in line.split(',') if loc.strip()]
    return locations



In [122]:

    
def read_file(filename):
    result = []
    for line in open(filename):
        if line.strip().startswith('Date:'):
            date = parse_timestamp(line)
        elif line.strip().startswith('Alarm '):
            locations = parse_alarm(line)
            result.append((date, locations))
    return pd.DataFrame(data=result, columns=['datetime', 'locations'])



In [123]:

    
alarms = read_file("/home/rouli/data/redalert.txt")



In [132]:

    
alarms['date'] = alarms.datetime.apply(lambda x:x.date())
alarms['time'] = alarms.datetime.apply(lambda x:x.time())
alarms['minute'] = alarms.time.apply(lambda x:x.minute)



In [145]:

    
alarms=alarms[alarms.date>datetime.date(2014,7,8)]



In [146]:

    
alarms.minute.hist(bins=60)









    Out[146]:





<matplotlib.axes.AxesSubplot at 0x45b6890>

Cleaning the data

Looking at the data, one can immediately see rokcet-fire "events". These consists of several consecutive rocket attacks, which can all be thought of as one inseperatable event. In order to avoid counting each event several times, in the next step I clean the data a bit, by removing any rocket attack that followed another attack within five minutes.



In [210]:

    
alarms['delta']=alarms.datetime-alarms.datetime.shift(-1)
alarms.dropna(inplace=True)



In [211]:

    
unique_alarms = alarms[alarms.delta>np.timedelta64(5,'m')]



In [212]:

    
len(alarms), len(unique_alarms)









    Out[212]:





(1628, 608)

I think the following histogram really shows that our intuition was correct - rocket attacks are much more likely to start on the 56 minutes mark. It's almost 5 times the standard deviation from the mean.



In [233]:

    
unique_alarms.minute.hist(bins=60)
plt.title('Number of rocket attacks per given minute')









    Out[233]:





<matplotlib.text.Text at 0x5b0a050>



In [214]:

    
hist = unique_alarms.minute.value_counts()



In [215]:

    
hist.mean()









    Out[215]:





10.133333333333333



In [216]:

    
hist.std()









    Out[216]:





3.8552920558834707



In [217]:

    
(hist[56]-hist.mean())/hist.std()









    Out[217]:





4.8937062077760594

Let's test for significance!



In [246]:

    
from scipy.stats import chisquare, binom_test

Chi square test gives low p-value to the null assumption that the distribution of minutes when rocket attacks happen is uniformly distributed



In [247]:

    
chisquare(hist.values)[1]









    Out[247]:





0.011254809294087394

Specifically, are the five minutes prior to the top of the hour and the five minutes following it more prone to rocket attacks? There are 608 attacks (remember, each attack can contain more than one rocket!), 126 of those are when the minute mark is at least at 55 or at most at 05. This is highly unlikely as can be seen when using a binomial test



In [248]:

    
sum(hist[55:])+sum(hist[:5])









    Out[248]:





126



In [249]:

    
sum(hist)









    Out[249]:





608



In [250]:

    
binom_test(126, 608, 1/6.0)









    Out[250]:





0.0089167383590111077



In [ ]: