I begin by importing pandas, and I import the DataFrame object explicitly because it is used frequently. I also import pyplot from matplotlib under its conventional name, plt, to visualize the data.
In [20]:
import pandas as pd
In [21]:
from pandas import DataFrame
In [22]:
import matplotlib.pyplot as plt
In [23]:
data = pd.read_csv('project_1.csv')
Now that the data is loaded, I will slice it column by column so that I can build a more manageable data object for further analysis of the CSV file.
In [24]:
status = [n for n in data['status']]
In [25]:
airline = [n for n in data['airline']]
In [26]:
la = [n for n in data['LosAngeles']]
In [27]:
phx = [n for n in data['Phoenix']]
In [28]:
sandg = [n for n in data['SanDiego']]
In [29]:
sanfrn = [n for n in data['SanFrancisco']]
In [30]:
seatl = [n for n in data['Seattle']]
I will use the zip function to combine the sliced lists above into a sequence of row tuples, and then use those tuples to create a DataFrame object I can work with.
In [31]:
# list() materializes the tuples; in Python 3, zip alone returns a one-shot iterator
flight_status = list(zip(status, la, phx, sandg, sanfrn, seatl))
In [32]:
flight_status
Out[32]:
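One detail worth keeping in mind: in Python 3, zip returns a lazy, single-use iterator rather than a list, so it has to be wrapped in list(...) before the tuples can be displayed or reused. The following is a minimal sketch of that behavior; the short lists are made-up stand-ins, not values from project_1.csv.

```python
# Hypothetical miniature versions of the sliced columns (values are made up).
status = ['ontime', 'delayed']
la = [10, 1]
phx = [20, 2]

# zip pairs up the i-th element of every list into one tuple per row.
pairs = zip(status, la, phx)

# Materialize the iterator so the rows can be inspected and reused.
rows = list(pairs)
print(rows)

# The iterator is now exhausted: a second pass yields nothing.
leftover = list(pairs)
print(leftover)
```

Each tuple corresponds to one row of the eventual DataFrame, which is why the column names passed to the constructor must follow the same order as the arguments to zip.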
In [33]:
flight_df = DataFrame(data=flight_status,
                      columns=['status', 'la', 'phx', 'sandiego', 'sanfrancisco', 'seattle'],
                      index=airline)
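As an aside, the same table can probably be built without the intermediate lists by re-indexing and renaming the loaded frame directly. This is only a sketch under assumptions: the inline CSV text is a hypothetical stand-in for project_1.csv (the column names match those used above, the counts are made up), and it assumes every row has its airline filled in.

```python
import io

import pandas as pd

# Hypothetical stand-in for project_1.csv; counts are invented for illustration.
csv_text = """airline,status,LosAngeles,Phoenix,SanDiego,SanFrancisco,Seattle
ALASKA,ontime,10,20,30,40,50
ALASKA,delayed,1,2,3,4,5
AMWEST,ontime,60,70,80,90,100
AMWEST,delayed,6,7,8,9,10
"""

data = pd.read_csv(io.StringIO(csv_text))

# Map the CSV's city columns to the short names used in this notebook.
short_names = {
    'LosAngeles': 'la',
    'Phoenix': 'phx',
    'SanDiego': 'sandiego',
    'SanFrancisco': 'sanfrancisco',
    'Seattle': 'seattle',
}

# Use the airline column as the index and rename the city columns in one chain.
flight_df = data.set_index('airline').rename(columns=short_names)
print(flight_df)
```

The result has the same shape as the frame built from the zipped tuples: one row per airline and status, indexed by airline, so each airline label appears twice in the index.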
In [34]:
flight_df
Out[34]:
In [35]:
flight_df.plot(kind='bar', title='Flights Database Visualization')
Out[35]:
The bar graph above lets me see which cities have the most flights (on-time and delayed) for each airline, with the cities differentiated by color.
AMWEST and American Airlines both have their highest number of flights in Phoenix, followed by ALASKA and United Airlines with flights to Seattle. I would like to see how this relates to on-time and delayed flights, since that is difficult to read from this bar graph because of how the data is presented.
I want to see how on-time flights compare with delayed flights, so I will select individual airlines from the index and plot their on-time and delayed counts for one city at a time.
In [105]:
flight_df.loc['AMWEST', 'phx'].plot(kind='bar', title='Phoenix AMWEST on-time vs delayed')
Out[105]:
AMWEST does relatively well in Phoenix in terms of the number of on-time flights compared with delayed flights.
American Airlines also does well in Phoenix, although its number of flights relative to its on-time flights is higher than AMWEST's. Looking at the numbers more closely, the two perform about the same, at roughly 100 delays per 1000 on-time flights.
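Because each airline appears twice in the index (once per status), selecting a single airline and city returns two values that share the same label. A small sketch of one way to label them by status instead is given here; the frame and its counts are hypothetical, and only the index layout mirrors flight_df.

```python
import pandas as pd

# Hypothetical miniature of flight_df: airline labels repeat, one row per status.
flight_df = pd.DataFrame(
    {'status': ['ontime', 'delayed', 'ontime', 'delayed'],
     'phx': [70, 7, 20, 2]},
    index=['AMWEST', 'AMWEST', 'ALASKA', 'ALASKA'],
)

# Select both AMWEST rows, then re-index them by status and keep the Phoenix counts.
amwest_phx = flight_df.loc['AMWEST'].set_index('status')['phx']
print(amwest_phx)
```

Calling .plot(kind='bar') on a Series like amwest_phx would then label the two bars with the status values rather than repeating the airline name.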
In [114]:
flight_df.loc['ALASKA', 'seattle'].plot(kind='bar', title='ALASKA Seattle on-time vs delayed')
Out[114]:
The same observation can be made here: ALASKA has more on-time flights to Seattle than any other airline, but it comes close to 400 delays for nearly 2000 on-time flights.
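The "delays per 1000 on-time flights" figure used in these comparisons is simple arithmetic, sketched here with a hypothetical helper function. The inputs are the rounded figures quoted for ALASKA in Seattle (about 400 delays, about 2000 on-time flights), not exact values from the CSV.

```python
def delays_per_thousand(delayed, ontime):
    """Return the number of delayed flights per 1000 on-time flights."""
    if ontime <= 0:
        raise ValueError("ontime must be a positive count")
    return 1000 * delayed / ontime


# Rounded figures quoted in the text for ALASKA in Seattle.
alaska_seattle = delays_per_thousand(delayed=400, ontime=2000)
print(alaska_seattle)  # 200.0
```

Normalizing by the on-time count puts airlines with very different traffic volumes on the same scale, which is what makes the comparisons between cities and airlines meaningful.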
In [116]:
flight_df.loc['United Airlines', 'seattle'].plot(kind='bar', title='Seattle United Airlines on-time vs delayed')
Out[116]:
In [119]:
flight_df.loc['American Airlines', 'phx'].plot(kind='bar', title='Phoenix American Airlines on-time vs delayed')
Out[119]:
United Airlines has relatively fewer on-time flights in Seattle than ALASKA, but its delays are also a bit lower. Looking at the actual numbers, the same observation can be made: the comparison between United Airlines and ALASKA comes out at about 150 delayed flights per 1000 on-time flights.