5-2 pandas DataFrame advaced read_csv, read_html, binning, categorical data, more grouping, Apply, Transforming, Aggregate functions



In [1]:
%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt


3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2
0.16.2

Alright in this video we're going to dive a bit deeper into grouping and grouping like functions. These are certainly more complicated topics but my hope is that it should give you a strong intuition to what you can do with python.

The data set that we're going to work with is a slice of http://stat-computing.org/dataexpo/2009/


In [2]:
pd.read_csv("../data/june_airplane_data.csv", header=False).head()


Out[2]:
1994 6 5 7 1542 1540 1819 1815 US 236 ... NA.2 NA.3 0 NA.4 0.1 NA.5 NA.6 NA.7 NA.8 NA.9
0 1994 6 6 1 1549 1540 1831 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
1 1994 6 7 2 1540 1540 1803 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
2 1994 6 8 3 1541 1540 1808 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
3 1994 6 9 4 1541 1540 1835 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
4 1994 6 10 5 1548 1540 1821 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN

5 rows × 29 columns


In [3]:
pd.read_html("http://stat-computing.org/dataexpo/2009/the-data.html")[1]


Out[3]:
0 1 2
0 NaN Name Description
1 1 Year 1987-2008
2 2 Month 1-12
3 3 DayofMonth 1-31
4 4 DayOfWeek 1 (Monday) - 7 (Sunday)
5 5 DepTime actual departure time (local, hhmm)
6 6 CRSDepTime scheduled departure time (local, hhmm)
7 7 ArrTime actual arrival time (local, hhmm)
8 8 CRSArrTime scheduled arrival time (local, hhmm)
9 9 UniqueCarrier unique carrier code
10 10 FlightNum flight number
11 11 TailNum plane tail number
12 12 ActualElapsedTime in minutes
13 13 CRSElapsedTime in minutes
14 14 AirTime in minutes
15 15 ArrDelay arrival delay, in minutes
16 16 DepDelay departure delay, in minutes
17 17 Origin origin IATA airport code
18 18 Dest destination IATA airport code
19 19 Distance in miles
20 20 TaxiIn taxi in time, in minutes
21 21 TaxiOut taxi out time in minutes
22 22 Cancelled was the flight cancelled?
23 23 CancellationCode reason for cancellation (A = carrier, B = weat...
24 24 Diverted 1 = yes, 0 = no
25 25 CarrierDelay in minutes
26 26 WeatherDelay in minutes
27 27 NASDelay in minutes
28 28 SecurityDelay in minutes
29 29 LateAircraftDelay in minutes

In [4]:
pd.read_html("http://stat-computing.org/dataexpo/2009/the-data.html")[1][1]


Out[4]:
0                  Name
1                  Year
2                 Month
3            DayofMonth
4             DayOfWeek
5               DepTime
6            CRSDepTime
7               ArrTime
8            CRSArrTime
9         UniqueCarrier
10            FlightNum
11              TailNum
12    ActualElapsedTime
13       CRSElapsedTime
14              AirTime
15             ArrDelay
16             DepDelay
17               Origin
18                 Dest
19             Distance
20               TaxiIn
21              TaxiOut
22            Cancelled
23     CancellationCode
24             Diverted
25         CarrierDelay
26         WeatherDelay
27             NASDelay
28        SecurityDelay
29    LateAircraftDelay
Name: 1, dtype: object

In [5]:
df = pd.read_csv('../data/june_airplane_data.csv', names=pd.read_html("http://stat-computing.org/dataexpo/2009/the-data.html")[1][1])

In [6]:
df.head()


Out[6]:
Name Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier ... TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 1994 6 5 7 1542 1540 1819 1815 US 236 ... NaN 0 NaN 0 NaN NaN NaN NaN NaN NaN
1 1994 6 6 1 1549 1540 1831 1815 US 236 ... NaN 0 NaN 0 NaN NaN NaN NaN NaN NaN
2 1994 6 7 2 1540 1540 1803 1815 US 236 ... NaN 0 NaN 0 NaN NaN NaN NaN NaN NaN
3 1994 6 8 3 1541 1540 1808 1815 US 236 ... NaN 0 NaN 0 NaN NaN NaN NaN NaN NaN
4 1994 6 9 4 1541 1540 1835 1815 US 236 ... NaN 0 NaN 0 NaN NaN NaN NaN NaN NaN

5 rows × 30 columns

Unfortunately we can see that our columns are off by one, we don't have the name column however the fix is straight forward, all that we need to do is shift over our names list by one


In [7]:
df = pd.read_csv('../data/june_airplane_data.csv', names=pd.read_html("http://stat-computing.org/dataexpo/2009/the-data.html")[1][1][1:])

In [8]:
df.head()


Out[8]:
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum ... TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 1994 6 5 7 1542 1540 1819 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
1 1994 6 6 1 1549 1540 1831 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
2 1994 6 7 2 1540 1540 1803 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
3 1994 6 8 3 1541 1540 1808 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
4 1994 6 9 4 1541 1540 1835 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN

5 rows × 29 columns


In [9]:
df.describe()


Out[9]:
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime FlightNum TailNum ... TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
count 426490 426490 426490.000000 426490.000000 423805.000000 426490.000000 422641.000000 426490.000000 426490.000000 0 ... 0 0 426490.000000 0 426490.000000 0 0 0 0 0
mean 1994 6 15.520458 3.923185 1343.071375 1337.525309 1489.413914 1493.821825 901.690736 NaN ... NaN NaN 0.006296 NaN 0.002729 NaN NaN NaN NaN NaN
std 0 0 8.670839 1.930781 474.274010 465.902464 489.868444 474.554100 593.512783 NaN ... NaN NaN 0.079095 NaN 0.052171 NaN NaN NaN NaN NaN
min 1994 6 1.000000 1.000000 1.000000 5.000000 1.000000 1.000000 1.000000 NaN ... NaN NaN 0.000000 NaN 0.000000 NaN NaN NaN NaN NaN
25% 1994 6 8.000000 2.000000 930.000000 930.000000 1114.000000 1117.000000 407.000000 NaN ... NaN NaN 0.000000 NaN 0.000000 NaN NaN NaN NaN NaN
50% 1994 6 16.000000 4.000000 1330.000000 1328.000000 1517.000000 1520.000000 792.000000 NaN ... NaN NaN 0.000000 NaN 0.000000 NaN NaN NaN NaN NaN
75% 1994 6 23.000000 6.000000 1731.000000 1722.000000 1913.000000 1910.000000 1429.000000 NaN ... NaN NaN 0.000000 NaN 0.000000 NaN NaN NaN NaN NaN
max 1994 6 30.000000 7.000000 2400.000000 2359.000000 2400.000000 2400.000000 3219.000000 NaN ... NaN NaN 1.000000 NaN 1.000000 NaN NaN NaN NaN NaN

8 rows × 26 columns

Now we've got a new issue, we've got a problem with our date time that we'd like to merge. There are two ways to do this, I'll show you the easiest one - the other one would involve combining them into one column then parsing the results. Let's do it the easier way with read_csv. We can specify parse dates as a dictionary where we map the columns to a date time list.


In [10]:
df = pd.read_csv('../data/june_airplane_data.csv', parse_dates={'Date':['Year','Month','DayofMonth']}, names=pd.read_html("http://stat-computing.org/dataexpo/2009/the-data.html")[1][1][1:])

That will automatically parse it for us and print it out correctly. We can see that it's a bit slow because it's going to try and read all those columns exactly.


In [11]:
df.head()


Out[11]:
Date DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime ... TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 1994-06-05 7 1542 1540 1819 1815 US 236 NaN 157 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
1 1994-06-06 1 1549 1540 1831 1815 US 236 NaN 162 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
2 1994-06-07 2 1540 1540 1803 1815 US 236 NaN 143 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
3 1994-06-08 3 1541 1540 1808 1815 US 236 NaN 147 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
4 1994-06-09 4 1541 1540 1835 1815 US 236 NaN 174 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN

5 rows × 27 columns

Now one that can be helpful in analysis is binning, we've gotten our dates sorted out, but we might want to take a look at some specific groups of data. That's where binning comes in. Let's give it a try. Now this is different than grouping because we're not working on a continuous scale so we've got to be a bit arbitary.

Now let's start by setting up something to group things by time. We can see we've got a departure time and arrival time column that is going to be useful. Right now they're in Army time format. I think it'd be nice to break them up into 6 hour chunks, from midnight to 6 am, 6 to noon, noon to 1800, then 1800 to midnight

I'll start by creating a range of values, these are basically the edges of my bins.


In [12]:
ranges = [0,600,1200,1800,2400]
labels = ['Early Morning','Morning','Early Afternoon','Evening']

Now we use the cut function to cut them up into groups.


In [13]:
?pd.cut()

In [14]:
df['DepTime2'] = pd.cut(df.DepTime, ranges, labels=labels).astype('category')

Now we can do the same with the arrival time, since it's more or less the same


In [15]:
df['ArrTime2'] = pd.cut(df.ArrTime, ranges, labels=labels).astype('category')

In [16]:
df.ArrTime2


Out[16]:
0                 Evening
1                 Evening
2                 Evening
3                 Evening
4                 Evening
5                 Evening
6                 Evening
7                 Evening
8                 Evening
9                 Evening
10                Evening
11                Evening
12                Evening
13                Evening
14                    NaN
15                Evening
16                Evening
17                Evening
18                Evening
19                Evening
20                Evening
21                Evening
22                Evening
23                Evening
24                Evening
25                Evening
26                Evening
27                Evening
28                Evening
29                Evening
               ...       
426460            Evening
426461            Evening
426462            Evening
426463            Evening
426464            Evening
426465                NaN
426466            Evening
426467            Evening
426468            Evening
426469            Evening
426470            Evening
426471            Evening
426472            Evening
426473            Evening
426474            Evening
426475    Early Afternoon
426476    Early Afternoon
426477    Early Afternoon
426478    Early Afternoon
426479    Early Afternoon
426480    Early Afternoon
426481    Early Afternoon
426482    Early Afternoon
426483    Early Afternoon
426484    Early Afternoon
426485    Early Afternoon
426486    Early Afternoon
426487    Early Afternoon
426488    Early Afternoon
426489    Early Afternoon
Name: ArrTime2, dtype: category
Categories (4, object): [Early Morning, Morning, Early Afternoon, Evening]

Now categorical data is different that what we're used to, we can see that it has a hierarchy. There's things that are less than other things and therefore is ordered. This can come in handy in certain types of analysis.

Really what this allows us to do is describe the data in terms of those categories


In [17]:
df[['DepTime2','ArrTime2']].describe()


Out[17]:
DepTime2 ArrTime2
count 423805 422641
unique 4 4
top Morning Early Afternoon
freq 163514 156806

Let's run a group by to see answer a specific question. When flights leave and arrive, what is the average delay time? Now these are for actual departure and arrival times, not the scheduled ones. That's a different time entirely.

IE. If a flight leaves in the early morning and arrives in the early morning, what is the typical delay? Is there a longer departure delay or arrival delay?

We can construct this with a groupby, where we group our new categories together, then select the relevant columns and get the average.


In [18]:
df.groupby(['DepTime2', 'ArrTime2'])[['DepDelay','ArrDelay']].mean()


Out[18]:
DepDelay ArrDelay
DepTime2 ArrTime2
Early Morning Early Morning 43.804627 42.069923
Morning 2.578566 2.109582
Early Afternoon NaN NaN
Evening 64.000000 61.333333
Morning Early Morning NaN NaN
Morning 2.367489 0.664017
Early Afternoon 3.769618 4.102513
Evening 5.411565 12.923469
Early Afternoon Early Morning 8.890995 14.450237
Morning 4.896104 2.480519
Early Afternoon 5.166529 4.571237
Evening 7.632955 11.082200
Evening Early Morning 36.956347 42.037799
Morning 5.464702 7.629198
Early Afternoon 29.142857 22.500000
Evening 13.668750 12.756813

Now when I first say this data, I asked myself why am I getting those NaN values. And This is due to the fact that I have a sample of the data, not all of it. In this dataset, I don't have a any flights that take off in the early morning and land in the Early Afternoon.

We can run a sanity check as follows


In [19]:
sum((df.DepTime < 600) & (df.ArrTime > 1200) & (df.ArrTime < 1800))


Out[19]:
0

In [20]:
df.groupby(['DepTime2', 'ArrTime2']).agg({'ArrDelay':[np.mean, np.std], 'DepDelay':[np.mean, np.std]})


Out[20]:
ArrDelay DepDelay
mean std mean std
DepTime2 ArrTime2
Early Morning Early Morning 42.069923 91.241073 43.804627 91.836738
Morning 2.109582 18.436171 2.578566 16.595306
Evening 61.333333 5.507571 64.000000 8.888194
Morning Morning 0.664017 13.453101 2.367489 10.216042
Early Afternoon 4.102513 17.172703 3.769618 12.482494
Evening 12.923469 24.192638 5.411565 15.704599
Early Afternoon Early Morning 14.450237 42.653210 8.890995 27.347542
Morning 2.480519 11.084140 4.896104 8.718680
Early Afternoon 4.571237 18.710938 5.166529 14.861790
Evening 11.082200 28.073863 7.632955 19.324282
Evening Early Morning 42.037799 74.136225 36.956347 65.635363
Morning 7.629198 17.129908 5.464702 12.298888
Early Afternoon 22.500000 30.933552 29.142857 32.229602
Evening 12.756813 34.126398 13.668750 39.551075

Now that we've explored binning, let's try and answer another question, which flights are worst for delays. How can we rank the flights by those that have, on average, the worst delay time. Which, flight, in each departure time group has the worse delay time.

So let's go back to the data.


In [21]:
df.head()


Out[21]:
Date DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime ... Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay DepTime2 ArrTime2
0 1994-06-05 7 1542 1540 1819 1815 US 236 NaN 157 ... 0 NaN 0 NaN NaN NaN NaN NaN Early Afternoon Evening
1 1994-06-06 1 1549 1540 1831 1815 US 236 NaN 162 ... 0 NaN 0 NaN NaN NaN NaN NaN Early Afternoon Evening
2 1994-06-07 2 1540 1540 1803 1815 US 236 NaN 143 ... 0 NaN 0 NaN NaN NaN NaN NaN Early Afternoon Evening
3 1994-06-08 3 1541 1540 1808 1815 US 236 NaN 147 ... 0 NaN 0 NaN NaN NaN NaN NaN Early Afternoon Evening
4 1994-06-09 4 1541 1540 1835 1815 US 236 NaN 174 ... 0 NaN 0 NaN NaN NaN NaN NaN Early Afternoon Evening

5 rows × 29 columns

first we're going to need a way of ranking the flights by the average delay time. Basically we'll take in a dataframe of all the flights in a set and return a dataframe that has a ranked column.

First We'll need a total delay column then we'll have to sort it by the total delay time.

I'm also going to create a new column which is the scheduled departure time so we can see when planes are supposed to go out and group it by those.


In [22]:
df['TotalDelay'] = df.ArrDelay + df.DepDelay
df['ScheduledDepTime'] = pd.cut(df.CRSDepTime, ranges, labels=labels)

In [23]:
def average_delay(dataframe):
    dataframe['AvgFlightDelay'] = dataframe.TotalDelay.mean()
    return dataframe

In [24]:
avg_delayed = df.groupby('FlightNum').apply(average_delay)

worst ranking...


In [25]:
def ranking(dataframe, column):
    dataframe.sort(column, ascending=False,inplace=True)
    dataframe[column + "Rank"] = np.arange(len(dataframe)) + 1
    return dataframe

In [26]:
avg_delayed_ranked = avg_delayed.groupby('ScheduledDepTime').apply(lambda x: ranking(x, 'AvgFlightDelay'))

In [27]:
avg_delayed_ranked[avg_delayed_ranked.AvgFlightDelayRank == 1]


Out[27]:
Date DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime ... WeatherDelay NASDelay SecurityDelay LateAircraftDelay DepTime2 ArrTime2 TotalDelay ScheduledDepTime AvgFlightDelay AvgFlightDelayRank
ScheduledDepTime
Early Morning 269530 1994-06-02 4 51 110 109 131 DL 1490 NaN 18 ... NaN NaN NaN NaN Early Morning Early Morning -41 Early Morning 36.875000 1
Morning 267020 1994-06-28 2 1158 1151 1253 1250 DL 1428 NaN 55 ... NaN NaN NaN NaN Morning Early Afternoon 10 Morning 173.780142 1
Early Afternoon 60742 1994-06-12 7 1605 1600 1703 1659 US 2849 NaN 58 ... NaN NaN NaN NaN Early Afternoon Early Afternoon 9 Early Afternoon 183.333333 1
Evening 266946 1994-06-09 4 1953 2005 2022 2039 DL 1428 NaN 29 ... NaN NaN NaN NaN Evening Evening 1411 Evening 173.780142 1

4 rows × 33 columns

We can't just do it on our entire data frame right now because we've got a mix of a lot of different flights, we've got to group it by each flight then apply the ranker

Now we're going to first get the average flight delay for each flight


In [28]:
zscore = lambda x: (x - x.mean()) / x.std()

In [29]:
df.groupby('DayOfWeek').transform(zscore)


Out[29]:
DepTime CRSDepTime ArrTime CRSArrTime FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay ... TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay TotalDelay
0 0.371092 0.385981 0.628003 0.632618 -1.110867 NaN 0.654662 0.606524 NaN -0.030753 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.118991
1 0.426897 0.430839 0.690721 0.673286 -1.124572 NaN 0.723576 0.638382 NaN 0.236233 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN 0.142587
2 0.423158 0.432551 0.645238 0.674042 -1.128920 NaN 0.465739 0.637131 NaN -0.845749 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN -0.573104
3 0.417945 0.431557 0.651190 0.674650 -1.126593 NaN 0.496688 0.638025 NaN -0.465561 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN -0.364275
4 0.410037 0.433709 0.700429 0.675495 -1.127322 NaN 0.894021 0.638267 NaN 0.325904 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN 0.037293
5 0.427105 0.433554 0.672448 0.675952 -1.127048 NaN 0.597031 0.636965 NaN -0.062388 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN -0.025152
6 0.481837 0.499775 0.723677 0.737666 -1.105561 NaN 0.592816 0.585611 NaN -0.232932 ... NaN -0.076129 NaN -0.052506 NaN NaN NaN NaN NaN -0.274223
7 0.576642 0.385981 0.833285 0.632618 -1.110867 NaN 0.700718 0.606524 NaN 2.903098 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN 2.903649
8 0.410150 0.430839 0.668534 0.673286 -1.124572 NaN 0.677558 0.638382 NaN -0.138925 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN -0.215463
9 0.425277 0.432551 0.684458 0.674042 -1.128920 NaN 0.745960 0.637131 NaN 0.245374 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN 0.046963
10 0.438900 0.431557 0.699869 0.674650 -1.126593 NaN 0.709727 0.638025 NaN 0.313153 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN 0.240523
11 0.428751 0.433709 0.686479 0.675495 -1.127322 NaN 0.651383 0.638267 NaN 0.099418 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN 0.072631
12 0.605069 0.433554 0.869392 0.675952 -1.127048 NaN 0.781100 0.636965 NaN 2.278184 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN 2.257262
13 0.486162 0.499775 0.730031 0.737666 -1.105561 NaN 0.607958 0.585611 NaN -0.086066 ... NaN -0.076129 NaN -0.052506 NaN NaN NaN NaN NaN -0.143664
14 0.366809 0.385981 NaN 0.632618 -1.110867 NaN NaN 0.606524 NaN NaN ... NaN -0.055545 NaN 18.856676 NaN NaN NaN NaN NaN NaN
15 0.408057 0.430839 0.692738 0.673286 -1.124572 NaN 0.876969 0.638382 NaN 0.270338 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN -0.008171
16 0.427396 0.432551 0.703036 0.674042 -1.128920 NaN 0.870503 0.637131 NaN 0.762222 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN 0.356997
17 0.415850 0.431557 0.663360 0.674650 -1.126593 NaN 0.603208 0.638025 NaN -0.270882 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN -0.275334
18 0.412116 0.433709 0.656586 0.675495 -1.127322 NaN 0.545229 0.638267 NaN -0.385907 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.333752
19 0.422918 0.433554 0.674479 0.675952 -1.127048 NaN 0.643048 0.636965 NaN -0.021325 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN -0.047529
20 0.481837 0.499775 0.700380 0.737666 -1.105561 NaN 0.426254 0.585611 NaN -0.771441 ... NaN -0.076129 NaN -0.052506 NaN NaN NaN NaN NaN -0.561453
21 0.377515 0.385981 0.628003 0.632618 -1.110867 NaN 0.608606 0.606524 NaN -0.030753 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.040140
22 0.565060 0.430839 0.702822 0.673286 -1.124572 NaN 0.416789 0.638382 NaN 0.440864 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN 0.745620
23 0.429516 0.432551 0.684458 0.674042 -1.128920 NaN 0.714824 0.637131 NaN 0.245374 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN 0.108970
24 1.061262 0.431557 1.352977 0.674650 -1.126593 NaN 1.090154 0.638025 NaN 6.867327 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN 6.982245
25 1.071288 0.433709 1.523483 0.675495 -1.127322 NaN 1.728088 0.638267 NaN 8.511735 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN 8.005933
26 1.633332 1.672168 1.452363 1.493111 -1.124886 NaN -1.101103 -1.072482 NaN -0.303329 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN -0.275334
27 1.632728 1.673452 1.469676 1.493639 -1.125617 NaN -0.956091 -1.075467 NaN 0.196484 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN 0.107969
28 1.628885 1.671176 1.464282 1.492411 -1.125342 NaN -0.998240 -1.075725 NaN -0.062388 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN -0.159412
29 1.642937 1.649011 1.465719 1.467068 -1.109226 NaN -1.034050 -1.087601 NaN 0.665415 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN 0.616956
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
426460 1.264525 1.276456 1.097413 1.104867 1.406848 NaN -1.085886 -1.072482 NaN 0.021135 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN 0.062641
426461 1.229323 1.278017 1.045195 1.105545 1.403551 NaN -1.244224 -1.075467 NaN -0.709458 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.545778
426462 1.237364 1.276417 1.064305 1.105116 1.404414 NaN -1.151631 -1.075725 NaN -0.555140 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN -0.472684
426463 1.210424 1.246148 1.026126 1.071239 1.324973 NaN -1.187569 -1.087601 NaN -0.677194 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.539532
426464 1.251852 1.276542 1.066339 1.103928 1.406225 NaN -1.200021 -1.073755 NaN -0.903177 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN -0.728121
426465 NaN 1.276456 NaN 1.104867 1.406848 NaN NaN -1.072482 NaN NaN ... NaN 10.875924 NaN -0.045182 NaN NaN NaN NaN NaN NaN
426466 1.245958 1.278017 1.077081 1.105545 1.403551 NaN -1.122905 -1.075467 NaN -0.191777 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.121726
426467 1.411141 1.276417 1.242975 1.105116 1.404414 NaN -1.074936 -1.075725 NaN 1.415868 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN 1.563587
426468 1.218988 1.246148 1.034420 1.071239 1.324973 NaN -1.187569 -1.087601 NaN -0.478289 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.329262
426469 1.208529 1.224580 1.041209 0.969188 1.403551 NaN -1.122905 -1.075467 NaN 0.034708 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN 0.125637
426470 1.208053 1.223071 1.048062 0.969040 1.404414 NaN -1.059596 -1.075725 NaN 0.142926 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN 0.153861
426471 1.150472 1.191707 0.893418 0.932164 1.324973 NaN -1.126162 -1.087601 NaN -0.627468 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.592100
426472 1.182606 1.220889 1.035619 0.966773 1.403493 NaN -0.963753 -1.075193 NaN -0.070714 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN -0.196619
426473 1.279404 1.223125 1.099367 0.967623 1.406225 NaN -1.153318 -1.073755 NaN 1.451353 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN 1.721144
426474 1.185656 1.224580 1.029252 0.969188 1.403551 NaN -1.047080 -1.075467 NaN -0.159422 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.174733
426475 0.210491 0.206964 0.099496 0.076544 1.406848 NaN -0.918498 -0.916981 NaN -0.043757 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN -0.026300
426476 0.208334 0.209273 0.102569 0.077621 1.403551 NaN -0.971256 -0.919673 NaN -0.159422 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.086389
426477 0.156976 0.157329 0.032893 0.022827 1.324973 NaN -0.987994 -0.933590 NaN -0.130205 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.040140
426478 0.217559 0.206636 0.127992 0.075830 1.403493 NaN -0.795020 -0.919414 NaN 0.440864 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN 0.349879
426479 0.213335 0.208198 0.098219 0.076394 1.406225 NaN -0.873096 -0.918220 NaN 0.302802 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN 0.232983
426480 0.332030 0.206964 0.136005 0.076544 1.406848 NaN -0.918498 -0.916981 NaN 0.540278 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN 0.614075
426481 0.210414 0.209273 0.096591 0.077621 1.403551 NaN -1.031916 -0.919673 NaN -0.256487 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.121726
426482 0.196799 0.209501 0.083649 0.079308 1.404414 NaN -0.936883 -0.920026 NaN -0.267701 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN -0.248918
426483 0.141988 0.157329 0.016305 0.022827 1.324973 NaN -1.003346 -0.933590 NaN -0.528015 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN -0.434397
426484 0.184065 0.206636 0.067484 0.075830 1.403493 NaN -1.009771 -0.919414 NaN -0.582293 ... NaN -0.085912 NaN -0.058296 NaN NaN NaN NaN NaN -0.516980
426485 0.200618 0.208198 0.077577 0.076394 1.406225 NaN -0.935368 -0.918220 NaN -0.271474 ... NaN -0.075105 NaN -0.039442 NaN NaN NaN NaN NaN -0.263070
426486 0.218873 0.206964 0.101524 0.076544 1.406848 NaN -0.964149 -0.916981 NaN -0.011311 ... NaN -0.091945 NaN -0.045182 NaN NaN NaN NaN NaN 0.062641
426487 0.200017 0.209273 0.092605 0.077621 1.403551 NaN -0.986421 -0.919673 NaN -0.321197 ... NaN -0.090548 NaN -0.061789 NaN NaN NaN NaN NaN -0.245408
426488 0.234486 0.209501 0.130347 0.079308 1.404414 NaN -0.860188 -0.920026 NaN 0.676740 ... NaN -0.067870 NaN -0.051885 NaN NaN NaN NaN NaN 0.668523
426489 0.159117 0.157329 0.049482 0.022827 1.324973 NaN -0.880531 -0.933590 NaN 0.267605 ... NaN -0.055545 NaN -0.053031 NaN NaN NaN NaN NaN 0.196415

426490 rows × 23 columns