Introduction to data

Complete all **Exercises**, and submit answers to **Questions** on the Coursera platform.

Some define statistics as the field that focuses on turning information into knowledge. The first step in that process is to summarize and describe the raw information: the data. In this lab we explore flights, specifically a random sample of domestic flights that departed from the three major New York City airports in 2013. We will generate simple graphical and numerical summaries of data on these flights and explore delay times. Because this is a large data set, along the way you'll also learn the indispensable skills of data processing and subsetting.

Getting started

Load packages

In this lab we will explore the data using the dplyr package and visualize it using the ggplot2 package. The data can be found in statsr, the companion package for this course.

Let's load the packages.



```r
library(statsr)
library(dplyr)
library(ggplot2)
```

```
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning message:
package ‘ggplot2’ was built under R version 3.2.4
```

Data

The Bureau of Transportation Statistics (BTS) is a statistical agency that is a part of the Research and Innovative Technology Administration (RITA). As its name implies, BTS collects and makes available transportation data, such as the flights data we will be working with in this lab.

We begin by loading the nycflights data frame. Type the following in your console to load the data:



```r
data(nycflights)
```

The nycflights data frame that now shows up in your workspace contains 32735 flights. It is a data matrix: each row represents an observation and each column represents a variable. R calls this data format a data frame, a term we will use throughout the labs.

To view the names of the variables, type the command



```r
names(nycflights)
```

```
'year' 'month' 'day' 'dep_time' 'dep_delay' 'arr_time' 'arr_delay' 'carrier'
'tailnum' 'flight' 'origin' 'dest' 'air_time' 'distance' 'hour' 'minute'
```



```r
str(nycflights)
```

```
Classes ‘tbl_df’ and 'data.frame':	32735 obs. of  16 variables:
 $ year     : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
 $ month    : int  6 5 12 5 7 1 12 8 9 4 ...
 $ day      : int  30 7 8 14 21 1 9 13 26 30 ...
 $ dep_time : int  940 1657 859 1841 1102 1817 1259 1920 725 1323 ...
 $ dep_delay: num  15 -3 -1 -4 -3 -3 14 85 -10 62 ...
 $ arr_time : int  1216 2104 1238 2122 1230 2008 1617 2032 1027 1549 ...
 $ arr_delay: num  -4 10 11 -34 -8 3 22 71 -8 60 ...
 $ carrier  : chr  "VX" "DL" "DL" "DL" ...
 $ tailnum  : chr  "N626VA" "N3760C" "N712TW" "N914DL" ...
 $ flight   : int  407 329 422 2391 3652 353 1428 1407 2279 4162 ...
 $ origin   : chr  "JFK" "JFK" "JFK" "JFK" ...
 $ dest     : chr  "LAX" "SJU" "LAX" "TPA" ...
 $ air_time : num  313 216 376 135 50 138 240 48 148 110 ...
 $ distance : num  2475 1598 2475 1005 296 ...
 $ hour     : num  9 16 8 18 11 18 12 19 7 13 ...
 $ minute   : num  40 57 59 41 2 17 59 20 25 23 ...
```

The names command returns the names of the variables in this data frame. The codebook (description of the variables) is included below. This information can also be found in the help file for the data frame, which can be accessed by typing ?nycflights in the console.

- year, month, day: Date of departure.
- dep_time, arr_time: Departure and arrival times, local time zone.
- dep_delay, arr_delay: Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
- carrier: Two-letter carrier abbreviation.
  - 9E: Endeavor Air Inc.
  - AA: American Airlines Inc.
  - AS: Alaska Airlines Inc.
  - B6: JetBlue Airways
  - DL: Delta Air Lines Inc.
  - EV: ExpressJet Airlines Inc.
  - F9: Frontier Airlines Inc.
  - FL: AirTran Airways Corporation
  - HA: Hawaiian Airlines Inc.
  - MQ: Envoy Air
  - OO: SkyWest Airlines Inc.
  - UA: United Air Lines Inc.
  - US: US Airways Inc.
  - VX: Virgin America
  - WN: Southwest Airlines Co.
  - YV: Mesa Airlines Inc.
- tailnum: Plane tail number.
- flight: Flight number.
- origin, dest: Airport codes for origin and destination. (Google can help you look up which airport each code stands for.)
- air_time: Amount of time spent in the air, in minutes.
- distance: Distance flown, in miles.
- hour, minute: Time of departure broken into hour and minute.

The str function used above, short for structure, is very useful for taking a quick peek at a data frame: it shows the dimensions of the data frame and the data type of each variable.

The nycflights data frame is a massive trove of information. Let’s think about some questions we might want to answer with these data:

We might want to find out how delayed flights headed to a particular destination tend to be. We might want to evaluate how departure delays vary over months. Or we might want to determine which of the three major NYC airports has the best on-time percentage for departing flights.

Seven verbs

The dplyr package offers seven verbs (functions) for basic data manipulation:

filter()
arrange()
select()
distinct()
mutate()
summarise()
sample_n()

We will use some of these functions in this lab, and learn about others in a future lab.
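
To make the list concrete, here is a minimal sketch that applies each verb once to a small made-up data frame (the name `toy` and its values are hypothetical, not taken from nycflights). In dplyr 1.0 and later, `sample_n()` is superseded by `slice_sample()`, although it still works.

```r
library(dplyr)

# Hypothetical stand-in for nycflights: five made-up flights
toy <- data.frame(
  origin    = c("JFK", "LGA", "JFK", "EWR", "LGA"),
  dest      = c("RDU", "RDU", "SFO", "SFO", "RDU"),
  dep_delay = c(15, -3, 40, 0, 7)
)

toy %>% filter(dest == "RDU")                          # keep rows meeting a condition
toy %>% arrange(desc(dep_delay))                       # sort rows, largest delay first
toy %>% select(origin, dep_delay)                      # keep only some columns
toy %>% distinct(origin)                               # one row per unique origin
toy %>% mutate(late = dep_delay >= 5)                  # add a new column
toy %>% summarise(mean_dd = mean(dep_delay), n = n())  # collapse to one summary row
set.seed(1)                                            # make the random draw repeatable
toy %>% sample_n(2)                                    # random sample of 2 rows
```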

Analysis

Departure delays in flights to Raleigh-Durham (RDU)

We can examine the distribution of departure delays of all flights with a histogram.



```r
ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram()
```

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

This code tells ggplot to plot the dep_delay variable from the nycflights data frame on the x-axis. The + geom_histogram() layer then adds a geom (short for geometric object), which describes the type of plot you will produce.

Histograms are generally a very good way to see the shape of a single distribution, but that shape can change depending on how the data is split between the different bins. You can easily define the binwidth you want to use:



```r
ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 15)
```

```r
ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 150)
```

Exercise

How do these three histograms with the various binwidths compare?
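
A histogram simply counts how many observations fall in each bin, so the bin width decides how much detail survives. The sketch below counts a handful of made-up delays with base R's `cut()` and `table()`; the helper `bin_counts()` is hypothetical, and its left-closed bins only approximate how `geom_histogram()` places its bins.

```r
# Made-up departure delays in minutes (not from nycflights)
delays <- c(-10, -5, -3, 0, 2, 8, 14, 20, 35, 60, 95, 180, 300)

# Hypothetical helper: count observations in left-closed bins [a, a + width)
bin_counts <- function(x, width) {
  lo <- floor(min(x) / width) * width        # first break at or below the minimum
  hi <- (floor(max(x) / width) + 1) * width  # last break strictly above the maximum
  table(cut(x, breaks = seq(lo, hi, by = width), right = FALSE))
}

bin_counts(delays, 15)   # 22 narrow bins: detail near zero, many empty bins in the tail
bin_counts(delays, 150)  # 4 wide bins holding 3, 8, 1 and 1 flights
```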

If we want to focus on departure delays of flights headed to RDU only, we first need to filter the data for flights headed to RDU (dest == "RDU") and then make a histogram of the departure delays of just those flights.



```r
rdu_flights <- nycflights %>%
  filter(dest == "RDU")
ggplot(data = rdu_flights, aes(x = dep_delay)) +
  geom_histogram()
```

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Let's decipher these two commands:

First command: Take the nycflights data frame, filter for flights headed to RDU, and save the result as a new data frame called rdu_flights.
- == means "if it's equal to".
- RDU is in quotation marks since it is a character string.
Second command: Basically the same ggplot call from earlier for making a histogram, except that it uses the data frame for flights headed to RDU instead of all flights.

Logical operators:

Filtering for certain observations (e.g. flights from a particular airport) is often of interest in data frames where we might want to examine observations with certain characteristics separately from the rest of the data. To do so we use the filter function and a series of logical operators. The most commonly used logical operators for data analysis are as follows:

== means "equal to"
!= means "not equal to"
> or < means "greater than" or "less than"
>= or <= means "greater than or equal to" or "less than or equal to"
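
These operators work element by element and return TRUE or FALSE for each value, which is what filter uses to decide which rows to keep. A minimal base R sketch with made-up vectors; the `&` (and) and `|` (or) operators at the end combine conditions.

```r
# Made-up destinations and departure delays for four flights
dest  <- c("RDU", "SFO", "RDU", "LAX")
delay <- c(15, -3, 5, 62)

dest == "RDU"                # TRUE FALSE TRUE FALSE
dest != "RDU"                # FALSE TRUE FALSE TRUE
delay > 5                    # TRUE FALSE FALSE TRUE
delay >= 5                   # TRUE FALSE TRUE TRUE
dest == "RDU" & delay >= 5   # TRUE FALSE TRUE FALSE (both conditions)
dest == "RDU" | delay >= 5   # TRUE FALSE TRUE TRUE (either condition)
```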

We can also obtain numerical summaries for these flights:



```r
rdu_flights %>%
  summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())
```

```
mean_dd  sd_dd    n
11.69913 35.55567 801
```

Note that in the summarise function we created three summary values. Their names (here mean_dd, sd_dd and n) are user defined, and you can customize them as you like (just don't use spaces in the names). Calculating these summary statistics also requires that you know the function calls. Note that n() reports the sample size.

Summary statistics:

Some useful function calls for summary statistics for a single numerical variable are as follows:

mean
median
sd
var
IQR
range
min
max
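
Each of these functions takes a numeric vector and returns a single number (range returns two). A quick base R sketch on seven made-up arrival delays; the IQR value assumes R's default quantile definition.

```r
# Made-up arrival delays in minutes
x <- c(-12, -5, -3, 0, 4, 10, 90)

mean(x)    # 12
median(x)  # 0
sd(x)      # standard deviation
var(x)     # variance, the square of sd(x)
IQR(x)     # 11: 75th percentile (7) minus 25th percentile (-4)
range(x)   # -12 90
min(x)     # -12
max(x)     # 90
```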

We can also filter based on multiple criteria. Suppose we are interested in flights headed to San Francisco (SFO) in February:



```r
sfo_feb_flights <- nycflights %>%
  filter(dest == "SFO", month == 2)
```

Note that separating the conditions with commas keeps only the flights that satisfy all of them, here flights that are both headed to SFO and in February. If we are interested in flights that are either headed to SFO or in February, we can use | instead of the comma.
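
A minimal sketch of the difference, using a four-row made-up data frame (`toy` is hypothetical). Commas inside filter combine conditions with "and", so they behave like `&`.

```r
library(dplyr)

# Hypothetical flights: every combination of two destinations and two months
toy <- data.frame(
  dest  = c("SFO", "SFO", "RDU", "RDU"),
  month = c(2, 7, 2, 7)
)

toy %>% filter(dest == "SFO", month == 2)   # 1 row: SFO and February
toy %>% filter(dest == "SFO" & month == 2)  # the same row as the comma version
toy %>% filter(dest == "SFO" | month == 2)  # 3 rows: SFO or February (or both)
```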

1. Create a new data frame that includes flights headed to SFO in February, and save this data frame as sfo_feb_flights. How many flights meet these criteria?

68
1345
2286
3563
32735



```r
nrow(sfo_feb_flights)
```

```
68
```

2. Make a histogram and calculate appropriate summary statistics for arrival delays of sfo_feb_flights. Which of the following is false?

The distribution is unimodal.
The distribution is right skewed.
No flight is delayed more than 2 hours.
The distribution has several extreme values on the right side.
More than 50% of flights arrive on time or earlier than scheduled.



```r
ggplot(data = sfo_feb_flights, aes(x = arr_delay)) +
  geom_histogram()  # the false statement is option 3, "No flight is delayed more than 2 hours"
```

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```




Another useful functionality is being able to quickly calculate summary statistics for various groups in your data frame. For example, we can modify the above command using the group_by function to get the same summary stats for each origin airport:



```r
rdu_flights %>%
  group_by(origin) %>%
  summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())
```

```
origin mean_dd  sd_dd    n
EWR    13.36552 32.08492 145
JFK    15.39667 40.30535 300
LGA    7.904494 32.1862  356
```

Here, we first grouped the data by origin, and then calculated the summary statistics.
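
This is a split-apply-combine operation: split the rows by origin, apply the summary function to each piece, and combine the results. The same idea in base R, swapped in here with `tapply()` purely for illustration on made-up vectors:

```r
# Made-up departure delays for six flights from three origins
origin    <- c("EWR", "JFK", "LGA", "JFK", "EWR", "LGA")
dep_delay <- c(10, 30, -2, 0, 20, 4)

# Split dep_delay by origin, then apply mean() or length() to each group
tapply(dep_delay, origin, mean)    # EWR 15, JFK 15, LGA 1
tapply(dep_delay, origin, length)  # 2 flights per origin, like n()
```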

3. Calculate the median and interquartile range for arr_delay of flights in the sfo_feb_flights data frame, grouped by carrier. Which carrier has the highest IQR of arrival delays?

American Airlines
JetBlue Airways
Virgin America
Delta and United Airlines
Frontier Airlines



```r
sfo_feb_flights %>%
  group_by(carrier) %>%
  summarise(median_ad = median(arr_delay), IQR_ad = IQR(arr_delay)) %>%
  arrange(desc(IQR_ad))
# Delta (DL) and United (UA) tie for the highest IQR (22 minutes)
```

```
carrier median_ad IQR_ad
DL      -15       22
UA      -10       22
VX      -22.5     21.25
AA      5         17.5
B6      -10.5     12.25
```

Departure delays over months

Which month would you expect to have the highest average delay departing from an NYC airport?

Let's think about how we would answer this question:

First, calculate monthly averages for departure delays. With the new language we are learning, we need to
- group_by months, then
- summarise mean departure delays.
Then, we need to arrange these average delays in descending order.



```r
nycflights %>%
  group_by(month) %>%
  summarise(mean_dd = mean(dep_delay)) %>%
  arrange(desc(mean_dd))
```

```
month mean_dd
7     20.75456
6     20.35029
12    17.36819
4     14.55448
3     13.5176
5     13.2648
8     12.6191
2     10.68723
1     10.23333
9     6.872436
11    6.103183
10    5.880374
```

4. Which month has the highest average departure delay from an NYC airport?

January
March
July
October
December

5. Which month has the highest median departure delay from an NYC airport?

January
March
July
October
December



```r
nycflights %>%
  group_by(month) %>%
  summarise(mean_dd = mean(dep_delay), median_dd = median(dep_delay)) %>%
  arrange(desc(median_dd))  # December (month 12) has the highest median
```

```
month mean_dd  median_dd
12    17.36819 1
6     20.35029 0
7     20.75456 0
3     13.5176  -1
5     13.2648  -1
8     12.6191  -1
1     10.23333 -2
2     10.68723 -2
4     14.55448 -2
11    6.103183 -2
9     6.872436 -3
10    5.880374 -3
```



```r
nycflights %>%
  group_by(month) %>%
  summarise(mean_dd = mean(dep_delay), IQR_dd = IQR(dep_delay)) %>%
  arrange(desc(IQR_dd))  # July (month 7) has the largest IQR
```

```
month mean_dd  IQR_dd
7     20.75456 26
6     20.35029 25
12    17.36819 25
5     13.2648  19
3     13.5176  17
4     14.55448 16
2     10.68723 15
8     12.6191  15
1     10.23333 12
11    6.103183 10
10    5.880374 9
9     6.872436 8
```

6. Is the mean or the median a more reliable measure for deciding which month(s) to avoid flying if you really dislike delayed flights, and why?

Mean would be more reliable as it gives us the true average.
Mean would be more reliable as the distribution of delays is symmetric.
Median would be more reliable as the distribution of delays is skewed.
Median would be more reliable as the distribution of delays is symmetric.
Both give us useful information.
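
A small made-up example shows why the shape of the distribution matters here: one extreme delay moves the mean a lot but barely moves the median.

```r
# Made-up delays: six flights near on time, plus one extreme delay
delays <- c(-5, -3, -2, 0, 1, 2)
with_outlier <- c(delays, 300)

mean(delays)          # -1.166667
median(delays)        # -1
mean(with_outlier)    # 41.85714: pulled far to the right by one flight
median(with_outlier)  # 0: still describes the typical flight
```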

We can also visualize the distributions of departure delays across months using side-by-side box plots:



```r
ggplot(nycflights, aes(x = factor(month), y = dep_delay)) +
  geom_boxplot()
```

There is some new syntax here: we want departure delays on the y-axis and the months on the x-axis to produce side-by-side box plots. Side-by-side box plots require a categorical variable on the x-axis; however, in the data frame month is stored as a numerical variable (numbers 1 - 12). We therefore use factor(month) to force R to treat this variable as categorical, what R calls a factor variable.
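
A minimal base R sketch of what factor changes, using a made-up vector of month numbers (the names `month` and `f` are illustrative only):

```r
# Made-up month numbers, stored as numbers like the month column in nycflights
month <- c(12, 1, 3, 12)
is.numeric(month)  # TRUE: ggplot2 treats a numeric x as continuous

# factor() turns each distinct value into a category ("level")
f <- factor(month)
levels(f)          # "1" "3" "12": sorted numerically, then stored as labels
nlevels(f)         # 3
is.factor(f)       # TRUE
```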

On-time departure rate for NYC airports

Suppose you will be flying out of NYC and want to know which of the three major NYC airports has the best on-time rate for departing flights. Suppose also that for you a flight that is delayed for less than 5 minutes is basically "on time". You consider any flight delayed for 5 minutes or more to be "delayed".

In order to determine which airport has the best on time departure rate, we need to

first classify each flight as "on time" or "delayed",
then group flights by origin airport,
then calculate on time departure rates for each origin airport,
and finally arrange the airports in descending order for on time departure percentage.

Let's start with classifying each flight as "on time" or "delayed" by creating a new variable with the mutate function.



```r
nycflights <- nycflights %>%
  mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))
```

The first argument in the mutate call names the new variable we want to create, in this case dep_type. If dep_delay < 5, the flight is classified as "on time"; otherwise, i.e. if the flight is delayed by 5 or more minutes, it is classified as "delayed".
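
ifelse tests every element at once, so the boundary matters: a delay of exactly 5 minutes counts as "delayed". A minimal sketch on made-up delays; a missing delay (NA) stays NA rather than being classified.

```r
# Made-up departure delays in minutes, including one missing value
dep_delay <- c(-4, 0, 4, 4.9, 5, 30, NA)

dep_type <- ifelse(dep_delay < 5, "on time", "delayed")
dep_type
# "on time" "on time" "on time" "on time" "delayed" "delayed" NA
```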

Note that we are also overwriting the nycflights data frame with the new version of this data frame that includes the new dep_type variable.

We can handle all the remaining steps in one code chunk:



```r
nycflights %>%
  group_by(origin) %>%
  summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>%
  arrange(desc(ot_dep_rate))  # LGA has the highest on-time departure rate
```

```
origin ot_dep_rate
LGA    0.7279229
JFK    0.6935854
EWR    0.6369892
```
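
The expression sum(dep_type == "on time") / n() works because R counts TRUE as 1 and FALSE as 0, so it is simply the mean of a logical vector. A base R sketch on made-up flights (the vectors here are hypothetical):

```r
# Made-up origins and departure classifications for six flights
origin   <- c("LGA", "LGA", "JFK", "JFK", "JFK", "EWR")
dep_type <- c("on time", "delayed", "on time", "on time", "delayed", "delayed")

is_on_time <- dep_type == "on time"   # TRUE FALSE TRUE TRUE FALSE FALSE
sum(is_on_time) / length(is_on_time)  # 0.5
mean(is_on_time)                      # 0.5, the same number

# Per-origin rates, best first (like group_by + summarise + arrange(desc()))
sort(tapply(is_on_time, origin, mean), decreasing = TRUE)
# JFK 0.667, LGA 0.5, EWR 0
```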

We can also visualize the distribution of on-time departure rate across the three airports using a segmented bar plot.



```r
ggplot(data = nycflights, aes(x = origin, fill = dep_type)) +
  geom_bar()
```

8. Mutate the data frame so that it includes a new variable, avg_speed, that contains the average speed traveled by the plane for each flight (in mph). What is the tail number of the plane with the fastest avg_speed? Hint: Average speed can be calculated as distance divided by number of hours of travel, and note that air_time is given in minutes. If you just want to show the avg_speed and tailnum and none of the other variables, use the select function at the end of your pipe to select just these two variables with select(avg_speed, tailnum). You can Google this tail number to find out more about the aircraft.

N666DN
N755US
N779JB
N947UW
N959UW
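
Because air_time is recorded in minutes, it must be converted to hours (divide by 60) before dividing distance by it; distance / air_time * 60 is the same calculation. A base R sketch with three made-up flights and hypothetical tail numbers, using which.max() to pick out the fastest one:

```r
# Made-up flights: distance in miles, air_time in minutes, hypothetical tail numbers
tailnum  <- c("N0001A", "N0002B", "N0003C")
distance <- c(2475, 296, 1005)
air_time <- c(330, 50, 135)

avg_speed <- distance / (air_time / 60)  # miles per hour
avg_speed                                # about 450, 355.2 and 446.7 mph
tailnum[which.max(avg_speed)]            # "N0001A", the fastest flight
```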



```r
nycflights <- nycflights %>%
  mutate(avg_speed = distance / air_time * 60)
```



```r
nycflights %>%
  arrange(desc(avg_speed)) %>%
  select(avg_speed, tailnum)
```

```
avg_speed         tailnum
703.384615384615  N666DN
557.441860465116  N779JB
554.219653179191  N571JB
547.885714285714  N568JB
547.885714285714  N5EHAA
547.885714285714  N656JB
544.772727272727  N789JB
538.651685393258  N516JB
535.642458100559  N648JB
535.642458100559  N510JB
533.038674033149  N38268
533.038674033149  N53442
533.038674033149  N75858
532.666666666667  N624JB
532.666666666667  N569JB
532.666666666667  N3749D
531.23595505618   N523JB
529.723756906077  N504JB
529.723756906077  N637JB
529.723756906077  N506JB
529.120879120879  N571UA
527.213114754098  N37267
526.813186813187  N608JB
526.813186813187  N3771K
526.813186813187  N595JB
526.378378378378  N5ETAA
526.113074204947  N76065
525.414364640884  N26208
525.414364640884  N37419
524.508196721311  N439UA
⋮                 ⋮
148.148148148148  N37252
147.741935483871  N828AS
147.692307692308  N950UW
147.692307692308  N713UW
147.692307692308  N950UW
144               N756US
144               N950UW
144               N958UW
140.487804878049  N946UW
140.487804878049  N956UW
140.487804878049  N947UW
140.487804878049  N945UW
140.25            N3FEAA
137.142857142857  N952UW
137.142857142857  N950UW
137.142857142857  N957UW
133.953488372093  N963UW
133.953488372093  N954UW
131.752577319588  N813MQ
130.46511627907   N3DRAA
128               N760US
124.285714285714  N13913
122.608695652174  N8943A
122.553191489362  N947UW
120               N956UW
117.551020408163  N957UW
115.2             N957UW
112.8             N8623A
110.769230769231  N959UW
76.8              N755US
```



```r
nycflights %>%
  group_by(tailnum) %>%
  summarise(mean_as = mean(avg_speed)) %>%
  arrange(desc(mean_as))
```

```
tailnum mean_as
N526AS  509.257950530035
N637DL  505.994764397906
N66051  504.71186440678
N907JB  504.631578947368
N522VA  502.941176470588
N5BTAA  499.375
N654UA  498.582089552239
N382HA  494.831019007655
N75861  494.769230769231
N5DRAA  493.115976519118
N374AA  491.278195488722
N389HA  490.337866153747
N537AS  490.204081632653
N69063  489.724569000401
N68061  488.965517241379
N76065  487.693908344559
N77871  487.181249870808
N526VA  485.306939916162
N399AA  484.875
N380HA  484.688134465044
N391HA  484.524084464903
N388HA  482.20229466452
N385HA  481.87447919347
N376AA  479.345512796064
N5EJAA  478.496500234831
N386HA  478.21573120198
N56859  477.387862837861
N7BGAA  476.934306569343
N177DZ  475.966386554622
N199DN  475.373357781046
⋮       ⋮
N965UW  278.755486161109
N506MJ  275.792854275391
N8837B  274.979288557528
N755US  274.921681499977
N521LR  274.8
N8968E  274.32319501285
N959UW  272.739146108763
N8847A  272.695782078208
N835MQ  272.6262505002
N8390A  272.247163435217
N949UW  270.887567769774
N8943A  270.780746020137
N8891A  270.254496855872
N756US  269.620501378321
N501MJ  269.499430706903
N923MQ  268.64940869528
N668MQ  268.423998466552
N963UW  267.834729305356
N858MQ  267.286821705426
N958UW  266.683971229054
N8409N  262.857142857143
N635MQ  258.227848101266
N950UW  251.928468657486
N8905F  250.343107883682
N504MJ  249.818181818182
N823AY  248.893984962406
N513MJ  244.266666666667
N956UW  241.554835818005
N8588D  240
N819MQ  236.666666666667
```

9. Make a scatterplot of avg_speed vs. distance. Which of the following is true about the relationship between average speed and distance?

As distance increases the average speed of flights decreases.
The relationship is linear.
There is an overall positive association between distance and average speed.
There are no outliers.
The distribution of distances is uniform over 0 to 5000 miles.



```r
ggplot(data = nycflights, aes(x = distance, y = avg_speed)) +
  geom_point()
```

10. Suppose you define a flight to be "on time" if it gets to the destination on time or earlier than expected, regardless of any departure delays. Mutate the data frame to create a new variable called arr_type with levels "on time" and "delayed" based on this definition. Then, determine the on time arrival percentage based on whether the flight departed on time or not. What percent of flights that were "delayed" departing arrive "on time"?
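
This question asks for a conditional proportion: among flights that departed "delayed", what share arrived on time? That differs from the share of all flights that both departed delayed and arrived on time. A sketch on five made-up flights using table() and prop.table(), where margin = 2 divides each column by its column total:

```r
# Made-up classifications for five flights (not from nycflights)
dep_type <- c("delayed", "delayed", "delayed", "on time", "on time")
arr_type <- c("on time", "delayed", "delayed", "on time", "delayed")

tab <- table(arr_type, dep_type)  # rows: arrival type, columns: departure type

# Joint proportion: share of ALL flights that departed delayed and arrived on time
tab["on time", "delayed"] / sum(tab)               # 1 / 5 = 0.2

# Conditional proportion: share of departure-delayed flights that arrived on time
prop.table(tab, margin = 2)["on time", "delayed"]  # 1 / 3
```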



```r
nycflights <- nycflights %>%
  mutate(arr_type = ifelse(arr_delay <= 0, "arr on time", "delayed"))
```



```r
table(nycflights$arr_type, nycflights$dep_type)
```

```
              delayed on time
  arr on time    1898   17375
  delayed        8453    5009
```



```r
# Among flights that departed "delayed" (the "delayed" column of the table),
# the share that still arrived on time, about 18.3%
1898 / (1898 + 8453)
```

```
0.1833639
```



```r
nycflights %>%
  filter(dep_type == "delayed") %>%
  summarise(ot_arr_rate = sum(arr_type == "arr on time") / n())
```

```
ot_arr_rate
0.1833639
```






```r
ggplot(data = nycflights, aes(x = arr_delay)) +
  geom_histogram(binwidth = 15)
```
