The microdata from ANAC (National Civil Aviation Agency) contains all flights that depart from Brazilian airports. The flights are grouped by year and month, from January 2000 to October 2017 (last accessed in November 2017).
According to this local publication, an estimated 20% of the roughly 700,000 flights per year in Brazil have some kind of delay: http://infograficos.oglobo.globo.com/economia/raio-x-dos-atrasos-dos-voos.html
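To put that estimate in absolute terms, the short sketch below turns the two quoted figures into delayed flights per year and per day. The variable names are illustrative, and the per-day figure assumes delays are spread evenly over 365 days, which is a simplification.

```python
# Figures quoted from the publication above
flights_per_year = 700_000
delayed_share = 0.20

# Absolute number of delayed flights implied by the estimate
delayed_per_year = round(flights_per_year * delayed_share)

# Assumption: delays are spread evenly across a 365-day year
delayed_per_day = delayed_per_year / 365

print(f"{delayed_per_year:,} delayed flights per year")
print(f"about {delayed_per_day:.0f} delayed flights per day")
```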
In [34]:
# Pandas, Numpy, Matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
## plt.style.use('ggplot')
In [24]:
# Flights (Aug 2017)
flightsAug17 = 'data/flightAug2017.csv'
# Note: parse_dates=True only tries to parse the index, so no date columns are converted here
flights = pd.read_csv(flightsAug17, sep=';', parse_dates=True)
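Since the microdata is split by year and month, analysing more than one month means reading several files and stacking them. The sketch below is one possible way to do that. It assumes every monthly file follows the same naming pattern as `data/flightAug2017.csv` and uses the same `;` separator; both are assumptions, and `monthly_paths` and `load_months` are hypothetical helper names.

```python
from pathlib import Path

import pandas as pd

# English month abbreviations, spelled out to avoid locale-dependent formatting
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def monthly_paths(start, end, data_dir="data"):
    """Build one CSV path per month between start and end (inclusive).

    Assumes the hypothetical pattern data/flight<Mon><YYYY>.csv.
    """
    months = pd.period_range(start=start, end=end, freq="M")
    return [Path(data_dir) / f"flight{MONTH_ABBR[p.month - 1]}{p.year}.csv"
            for p in months]


def load_months(paths):
    """Read every file that exists and stack the rows into one DataFrame."""
    frames = [pd.read_csv(p, sep=";") for p in paths if p.exists()]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


paths = monthly_paths("2017-08", "2017-10")
print([p.name for p in paths])

# Files that are missing on disk are skipped, so this is safe to run anywhere
flights_multi = load_months(paths)
```

Calling `monthly_paths("2000-01", "2017-10")` would cover the full period described at the top of the notebook.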
In [25]:
flights.describe()
Out[25]:
In [26]:
flights.head(5)
Out[26]:
In [27]:
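# Keep only rows that have a delay motivation code
# (comparing strings against 'NaN' would also drop codes that sort before it)
flights[flights['DelayMotivationCode'].notna()].describe()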
Out[27]:
In [28]:
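# Keep only rows that have a delay motivation code
# (comparing strings against 'NaN' would also drop codes that sort before it)
flights[flights['DelayMotivationCode'].notna()].head(5)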
Out[28]:
In [29]:
# Total number of delays per motivation code
flights['DelayMotivationCode'].value_counts()
Out[29]:
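Raw counts are easier to compare as proportions. The sketch below shows, on a small made-up frame, how `value_counts(normalize=True)` gives the share of each code and how `notna().mean()` gives the fraction of rows that carry any code at all. The codes `AA` and `XY` are placeholders, not real ANAC codes, and treating a non-missing code as a delayed flight is an assumption about the data.

```python
import numpy as np
import pandas as pd

# Toy data: the codes are placeholders for illustration only
toy = pd.DataFrame({
    "DelayMotivationCode": ["AA", "XY", "AA", np.nan, np.nan, "AA", "XY", np.nan]
})
codes = toy["DelayMotivationCode"]

# Absolute counts per code (missing values are excluded by default)
counts = codes.value_counts()

# Share of each code among the rows that have a code
shares = codes.value_counts(normalize=True)

# Fraction of all rows that carry a code
delayed_fraction = codes.notna().mean()

print(counts)
print(shares)
print(f"{delayed_fraction:.1%} of rows have a motivation code")
```

On the real data the same three lines can be applied to `flights['DelayMotivationCode']`.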