CASE STUDY: AIRCRAFT DATA

Lets say we're interested in the characteristics of various aircraft, for the purpose of analyzing noise levels associated therewith. We may have several reasons for improving our fluency regarding what aircraft fly in and out of the nearest airport.

Citizens for Quieter Neighbordhood

Below is a typical pandas function call. We're using Python to read an Excel spreadsheet, but maybe we don't want all the columns or have specific conversions we would like to perform.

pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)

Resources:


In [21]:
import pandas as pd
import numpy as np


URL = "https://www.faa.gov/airports/engineering/aircraft_char_database/media/FAA-Aircraft-Char-Database-v2-201810.xlsx"

df = pd.read_excel(URL, sheet_name="Aircraft Database", 
                   usecols=['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'])

In [22]:
df.head()


Out[22]:
Manufacturer Model Physical Class (Engine) # Engines
0 Acro Sport Acro Sport Piston 1
1 Acro Sport Acro Sport II Piston 1
2 Acro Sport Cougar Piston 1
3 Acro Sport Junior Ace Piston 1
4 Acro Sport Super Ace Piston 1

In [19]:
df.tail()


Out[19]:
Manufacturer Model Physical Class (Engine) # Engines
2759 Zlin Zlin Z-50 tbd tbd
2760 Zlin Zlin Z-526 Akrobat tbd tbd
2761 Zlin Zlin Z-526 Trener Master tbd tbd
2762 Zlin Zlin Z-626 tbd tbd
2763 Zlin Zlin Z-726 Universal tbd tbd

In [20]:
df.columns


Out[20]:
Index(['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'], dtype='object')

In [13]:
len(df)


Out[13]:
2764

In [14]:
df.shape


Out[14]:
(2764, 25)

In [26]:
import IPython

# Grouping by one factor
df_man = df.groupby('Manufacturer')

# Getting all methods from the groupby object:
meth = [method_name for method_name in dir(df_man)
 if callable(getattr(df_man, method_name)) & ~method_name.startswith('_')]

# Printing the result
print(IPython.utils.text.columnize(meth))


agg        corr      cumsum     get_group  mean     pct_change  sem    transform
aggregate  corrwith  describe   head       median   pipe        shift  tshift   
all        count     diff       hist       min      plot        size   var      
any        cov       expanding  idxmax     ngroup   prod        skew 
apply      cumcount  ffill      idxmin     nth      quantile    std  
backfill   cummax    fillna     last       nunique  rank        sum  
bfill      cummin    filter     mad        ohlc     resample    tail 
boxplot    cumprod   first      max        pad      rolling     take 


In [27]:
from keyword import kwlist
print(IPython.utils.text.columnize(kwlist))


False  as      break     del     finally  if      lambda    pass    while
None   assert  class     elif    for      import  nonlocal  raise   with 
True   async   continue  else    from     in      not       return  yield
and    await   def       except  global   is      or        try   


In [31]:
list(df_man.groups.items())[:10]


Out[31]:
[('AESL', Int64Index([106, 107], dtype='int64')),
 ('AIDC', Int64Index([121, 122, 123, 124], dtype='int64')),
 ('AMX', Int64Index([216, 217], dtype='int64')),
 ('AMX International', Int64Index([218], dtype='int64')),
 ('ATR (Aérospatiale/Alenia)',
  Int64Index([248, 249, 250, 251, 252, 253, 254, 255], dtype='int64')),
 ('AVRO', Int64Index([300, 301, 302, 303, 304, 305, 306], dtype='int64')),
 ('Acro Sport', Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')),
 ('Adam Aircraft Industries', Int64Index([6, 7], dtype='int64')),
 ('Aeritalia', Int64Index([8], dtype='int64')),
 ('Aermacchi / Macchi',
  Int64Index([9, 10, 11, 12, 13, 14, 15, 16], dtype='int64'))]

In [33]:
df_man.get_group("Acro Sport")


Out[33]:
Model Physical Class (Engine) # Engines
0 Acro Sport Piston 1
1 Acro Sport II Piston 1
2 Cougar Piston 1
3 Junior Ace Piston 1
4 Super Ace Piston 1
5 Super Acro Sport Piston 1

In [34]:
df_man.size()


Out[34]:
Manufacturer
AESL                          2
AIDC                          4
AMX                           2
AMX International             1
ATR (Aérospatiale/Alenia)     8
                             ..
Wing Aircraft                 1
XTI Aircraft                  1
Yakovlev / Jakovlev          24
Zivko Aeronautics Inc.        2
Zlin                         42
Length: 424, dtype: int64

In [35]:
df_man.count()  # how many not-missing values per column per group


Out[35]:
Model Physical Class (Engine) # Engines
Manufacturer
AESL 2 2 2
AIDC 4 4 4
AMX 2 2 2
AMX International 1 1 1
ATR (Aérospatiale/Alenia) 8 8 8
... ... ... ...
Wing Aircraft 1 1 1
XTI Aircraft 1 1 1
Yakovlev / Jakovlev 24 24 24
Zivko Aeronautics Inc. 2 2 2
Zlin 42 42 42

424 rows × 3 columns


In [36]:
df_man.nunique()  # how many unique values per column per group


Out[36]:
Model Physical Class (Engine) # Engines
Manufacturer
AESL 2 1 1
AIDC 4 1 1
AMX 2 1 1
AMX International 1 1 1
ATR (Aérospatiale/Alenia) 8 1 1
... ... ... ...
Wing Aircraft 1 1 1
XTI Aircraft 1 1 1
Yakovlev / Jakovlev 21 2 2
Zivko Aeronautics Inc. 2 1 1
Zlin 42 1 1

424 rows × 3 columns


In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo("MjHpMCIvwsY")