IS360 Project 1 - Data Analysis of Formed Flights Database

  • Before I get started, I need to import the tools and data into my environment.

I begin by importing pandas, and I import the DataFrame object explicitly because it is used frequently. I also import pyplot from matplotlib explicitly, under the conventional name plt, to visualize the data.


In [20]:
import pandas as pd

In [21]:
from pandas import DataFrame

In [22]:
import matplotlib.pyplot as plt

In [23]:
data = pd.read_csv('project_1.csv')

Now that the data is accessible, I pull each column out into its own list so that I can build a smaller, more manageable data object for analyzing the contents of the CSV file.


In [24]:
status = [n for n in data['status']]

In [25]:
airline = [n for n in data['airline']]

In [26]:
la = [n for n in data['LosAngeles']]

In [27]:
phx = [n for n in data['Phoenix']]

In [28]:
sandg = [n for n in data['SanDiego']]

In [29]:
sanfrn = [n for n in data['SanFrancisco']]

In [30]:
seatl = [n for n in data['Seattle']]

I will use the zip function to combine the lists above into a list of tuples, one tuple per row, from which I can create a DataFrame object to work with.


In [31]:
# list() keeps this a real list on Python 3, where zip returns a lazy iterator
flight_status = list(zip(status, la, phx, sandg, sanfrn, seatl))

In [32]:
flight_status


Out[32]:
[('ontime', 497, 221, 212, 503, 1841),
 ('delayed', 62, 12, 20, 102, 305),
 ('ontime', 694, 4840, 383, 320, 201),
 ('delayed', 117, 415, 65, 129, 61),
 ('ontime', 513, 140, 243, 564, 1002),
 ('delayed', 78, 34, 39, 72, 167),
 ('ontime', 813, 3411, 312, 467, 206),
 ('delayed', 72, 301, 12, 90, 9)]

In [33]:
flight_df = DataFrame(data = flight_status, columns = ['status', 'la','phx','sandiego','sanfrancisco','seattle'],
              index = airline)

In [34]:
flight_df


Out[34]:
                    status   la   phx  sandiego  sanfrancisco  seattle
ALASKA              ontime  497   221       212           503     1841
ALASKA             delayed   62    12        20           102      305
AMWEST              ontime  694  4840       383           320      201
AMWEST             delayed  117   415        65           129       61
United Airlines     ontime  513   140       243           564     1002
United Airlines    delayed   78    34        39            72      167
American Airlines   ontime  813  3411       312           467      206
American Airlines  delayed   72   301        12            90        9
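
As a side note, the same table can probably be built without the intermediate lists and zip. This is only a sketch: the CSV text below is a stand-in rebuilt from the values in Out[34], the column order of the real project_1.csv is an assumption, and flight_df_alt is a name I made up for illustration.

```python
import io
import pandas as pd

# Stand-in for project_1.csv, rebuilt from the table in Out[34].
# The column order of the real file is an assumption.
csv_text = """airline,status,LosAngeles,Phoenix,SanDiego,SanFrancisco,Seattle
ALASKA,ontime,497,221,212,503,1841
ALASKA,delayed,62,12,20,102,305
AMWEST,ontime,694,4840,383,320,201
AMWEST,delayed,117,415,65,129,61
United Airlines,ontime,513,140,243,564,1002
United Airlines,delayed,78,34,39,72,167
American Airlines,ontime,813,3411,312,467,206
American Airlines,delayed,72,301,12,90,9
"""
data = pd.read_csv(io.StringIO(csv_text))

# Rename the city columns to the short names used in flight_df,
# then move the airline column into the index.
short_names = {
    'LosAngeles': 'la',
    'Phoenix': 'phx',
    'SanDiego': 'sandiego',
    'SanFrancisco': 'sanfrancisco',
    'Seattle': 'seattle',
}
flight_df_alt = data.rename(columns=short_names).set_index('airline')
print(flight_df_alt)
```

The one visible difference from flight_df is that the index keeps the name airline, which pandas prints as an extra header line.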

In [35]:
flight_df.plot(kind='bar', title='Flights Database Visualization')


Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe8442e7750>

The bar graph above lets me see which cities have the most flights, on-time and delayed, for each airline, with the cities differentiated by color.

AMWEST and American Airlines both have their highest number of flights in Phoenix, followed by ALASKA and United Airlines with flights to Seattle. I would like to see how this relates to on-time and delayed flights, since that is difficult to tell from this bar graph because of how the data is presented.

I want to see how the on-time flights compare with the delayed flights, so I will select each airline by its index label and plot its on-time and delayed counts for one city in comparison.
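
To check the reading of the bar graph numerically, one option is to add each airline's on-time and delayed rows together and look for the largest total. The sketch below rebuilds flight_df from the values in Out[34] so that it runs on its own; totals is a name introduced only for this example.

```python
import pandas as pd

# flight_df rebuilt from the values shown in Out[34].
flight_df = pd.DataFrame(
    {
        'status': ['ontime', 'delayed'] * 4,
        'la': [497, 62, 694, 117, 513, 78, 813, 72],
        'phx': [221, 12, 4840, 415, 140, 34, 3411, 301],
        'sandiego': [212, 20, 383, 65, 243, 39, 312, 12],
        'sanfrancisco': [503, 102, 320, 129, 564, 72, 467, 90],
        'seattle': [1841, 305, 201, 61, 1002, 167, 206, 9],
    },
    index=['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST',
           'United Airlines', 'United Airlines',
           'American Airlines', 'American Airlines'],
)

# Drop the text column, then sum the two rows that share an airline label.
totals = flight_df.drop(columns='status').groupby(level=0).sum()
print(totals)

# Column label of the largest total in each row, i.e. the busiest city.
print(totals.idxmax(axis=1))
```

With these figures the busiest city comes out as phx for AMWEST and American Airlines and seattle for ALASKA and United Airlines.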
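
An alternative I could try is to reshape one city so that on-time and delayed counts sit in separate columns, one row per airline. This is a sketch rather than what the cells below do; flight_df is rebuilt from Out[34], and the names phx_by_status and airline are my own choices.

```python
import pandas as pd

# flight_df rebuilt from the values shown in Out[34] (Phoenix column only).
flight_df = pd.DataFrame(
    {
        'status': ['ontime', 'delayed'] * 4,
        'phx': [221, 12, 4840, 415, 140, 34, 3411, 301],
    },
    index=['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST',
           'United Airlines', 'United Airlines',
           'American Airlines', 'American Airlines'],
)

# Name the index, turn it into a regular column, and pivot so that each
# airline becomes one row with a 'delayed' and an 'ontime' column.
phx_by_status = (
    flight_df.rename_axis('airline')
    .reset_index()
    .pivot(index='airline', columns='status', values='phx')
)
print(phx_by_status)
```

Calling phx_by_status.plot(kind='bar') on a table shaped like this should draw the on-time and delayed bars next to each other for every airline in a single graph.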


In [105]:
flight_df.loc['AMWEST', 'phx'].plot(kind='bar', title='Phoenix AMWEST on-time vs delayed')


Out[105]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe83e5e9c10>

AMWEST does relatively well in Phoenix: its on-time flights far outnumber its delayed flights.

American Airlines also does well in Phoenix. Its delays are slightly higher in relation to its on-time flights than AMWEST's, but when looking at the numbers more closely the two are comparable, at a little under 100 delays per 1000 on-time flights (about 86 for AMWEST and 88 for American Airlines).
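
The delays per 1000 on-time flights can be worked out directly from the table. The sketch below is one way to do it, with flight_df rebuilt from Out[34] so it runs on its own; ontime, delayed and per_1000 are names introduced for this example.

```python
import pandas as pd

# flight_df rebuilt from the values shown in Out[34].
flight_df = pd.DataFrame(
    {
        'status': ['ontime', 'delayed'] * 4,
        'la': [497, 62, 694, 117, 513, 78, 813, 72],
        'phx': [221, 12, 4840, 415, 140, 34, 3411, 301],
        'sandiego': [212, 20, 383, 65, 243, 39, 312, 12],
        'sanfrancisco': [503, 102, 320, 129, 564, 72, 467, 90],
        'seattle': [1841, 305, 201, 61, 1002, 167, 206, 9],
    },
    index=['ALASKA', 'ALASKA', 'AMWEST', 'AMWEST',
           'United Airlines', 'United Airlines',
           'American Airlines', 'American Airlines'],
)

# Split the rows by status; within each half the airline labels are unique,
# so the division below lines up airline with airline and city with city.
ontime = flight_df[flight_df['status'] == 'ontime'].drop(columns='status')
delayed = flight_df[flight_df['status'] == 'delayed'].drop(columns='status')

# Delayed flights per 1000 on-time flights, rounded to one decimal place.
per_1000 = (delayed / ontime * 1000).round(1)
print(per_1000)
```

For Phoenix this gives 85.7 for AMWEST (415 / 4840) and 88.2 for American Airlines (301 / 3411). The same table also covers the other cities, so the Seattle figures can be read off it as well.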


In [114]:
flight_df.loc['ALASKA', 'seattle'].plot(kind='bar', title='ALASKA Seattle on-time vs delayed')


Out[114]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe83fc22f90>

The same observation can be made here: ALASKA has far more on-time flights into Seattle than any other airline, but it also has about 300 delays against roughly 1,800 on-time flights.


In [116]:
flight_df.loc['United Airlines', 'seattle'].plot(kind='bar', title='Seattle United Airlines on-time vs delayed')


Out[116]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe8413af6d0>


In [119]:
flight_df.loc['American Airlines', 'phx'].plot(kind='bar', title='Phoenix American Airlines on-time vs delayed')


Out[119]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe837432090>

United Airlines has relatively fewer on-time flights in Seattle than ALASKA, and its delays are lower as well. Looking at the actual numbers, the same observation applies: United Airlines and ALASKA both have about 165 delayed flights per 1000 on-time flights.