Project 2

In this project, you will implement the exploratory analysis plan developed in Project 1. This will lay the groundwork for our first modeling exercise in Project 3.

Step 1: Load the python libraries you will need for this project


In [1]:
#imports
from __future__ import division
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pylab as pl
%matplotlib inline
import seaborn as sns

Step 2: Read in your data set


In [2]:
#Read in data from source 
df_raw = pd.read_csv("../assets/admissions.csv")
print(df_raw.head())


   admit    gre   gpa  prestige
0      0  380.0  3.61       3.0
1      1  660.0  3.67       3.0
2      1  800.0  4.00       1.0
3      1  640.0  3.19       4.0
4      0  520.0  2.93       4.0

Questions

Question 1. How many observations are in our dataset?


In [3]:
df_raw.count()


Out[3]:
admit       400
gre         398
gpa         398
prestige    399
dtype: int64

In [4]:
df_raw.count().sum()


Out[4]:
1595

Answer: There are 400 observations (rows) in our data set. Note that df_raw.count().sum() returns 1595, which is the total number of non-null cells across all columns, not the number of observations; df_raw.shape[0] gives the row count directly.
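
For reference, a direct way to count observations rather than non-null cells (a small sketch, assuming df_raw as loaded above; this cell was not part of the original run):

# Rows are observations; shape[0] (or len) counts them directly.
n_obs = df_raw.shape[0]
print(n_obs)  # 400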

Question 2. Create a summary table


In [47]:
summary_stats_admissions = df_raw.describe()
summary_stats_admissions


Out[47]:
            admit         gre         gpa    prestige
count  397.000000  397.000000  397.000000  397.000000
mean     0.317380  587.858942    3.392242    2.488665
std      0.466044  115.717787    0.380208    0.947083
min      0.000000  220.000000    2.260000    1.000000
25%      0.000000  520.000000    3.130000    2.000000
50%      0.000000  580.000000    3.400000    2.000000
75%      1.000000  660.000000    3.670000    3.000000
max      1.000000  800.000000    4.000000    4.000000

In [154]:
# Compute quantiles of gre
gre_quantiles = pd.qcut(df_raw['gre'], 4)
gre_quantiles.value_counts().sort_index()


Out[154]:
[220, 520]    122
(520, 580]     79
(580, 660]     98
(660, 800]     99
Name: gre, dtype: int64

In [155]:
# Compute quantiles of gpa
gpa_quantiles = pd.qcut(df_raw['gpa'], 4)
gpa_quantiles.value_counts().sort_index()


Out[155]:
[2.26, 3.13]     103
(3.13, 3.395]     96
(3.395, 3.67]    102
(3.67, 4]         97
Name: gpa, dtype: int64

In [156]:
# What is the sample size distribution among quantiles of gre and gpa by prestige level?
df_raw.pivot_table(values=['gre'], index=['admit', gre_quantiles],
                   columns=[gpa_quantiles, 'prestige'], aggfunc=[len])


Out[156]:
Counts (len of gre). Row index: (admit, gre quartile). Column index: 16 columns,
(gpa quartile, prestige), with gpa quartiles [2.26, 3.13], (3.13, 3.395],
(3.395, 3.67], (3.67, 4], each split by prestige 1.0, 2.0, 3.0, 4.0, in that order:

admit  gre
0 [220, 520] 5.0 12.0 18.0 12.0 3.0 6.0 11.0 5.0 NaN 8.0 5.0 3.0 NaN 1.0 2.0 3.0
(520, 580] 2.0 9.0 3.0 1.0 4.0 4.0 5.0 5.0 NaN 6.0 5.0 4.0 2.0 3.0 3.0 2.0
(580, 660] NaN 5.0 2.0 3.0 3.0 6.0 5.0 3.0 NaN 9.0 7.0 2.0 1.0 4.0 9.0 1.0
(660, 800] NaN 6.0 NaN 2.0 3.0 4.0 2.0 4.0 1.0 4.0 6.0 2.0 4.0 8.0 10.0 3.0
1 [220, 520] 2.0 5.0 1.0 NaN NaN 3.0 1.0 1.0 2.0 1.0 3.0 1.0 2.0 2.0 2.0 2.0
(520, 580] 1.0 NaN 2.0 1.0 2.0 2.0 1.0 NaN 2.0 4.0 NaN 1.0 1.0 3.0 1.0 NaN
(580, 660] NaN 2.0 1.0 NaN 1.0 5.0 2.0 1.0 4.0 5.0 4.0 NaN 3.0 6.0 2.0 1.0
(660, 800] 2.0 3.0 1.0 1.0 1.0 3.0 NaN NaN 4.0 5.0 1.0 3.0 6.0 4.0 6.0 NaN

In [157]:
# What is the standard deviation distribution among quantiles of gre and gpa by prestige level?
df_raw.pivot_table(values=['gre'], index=['admit', gre_quantiles],
                   columns=[gpa_quantiles, 'prestige'], aggfunc=[np.std])


Out[157]:
Standard deviations (np.std of gre). Row index: (admit, gre quartile). Column index:
16 columns, (gpa quartile, prestige), with gpa quartiles [2.26, 3.13], (3.13, 3.395],
(3.395, 3.67], (3.67, 4], each split by prestige 1.0, 2.0, 3.0, 4.0, in that order:

admit  gre
0 [220, 520] 53.665631 47.097578 87.372690 67.149765 80.000000 43.204938 70.478881 46.043458 NaN 33.806170 57.619441 61.101009 NaN NaN 28.284271 46.188022
(520, 580] 0.000000 15.634719 11.547005 NaN 19.148542 20.000000 10.954451 14.142136 NaN 19.663842 10.954451 19.148542 0.000000 11.547005 20.000000 28.284271
(580, 660] NaN 14.142136 14.142136 11.547005 30.550505 24.221203 22.803509 30.550505 NaN 22.607767 23.094011 14.142136 NaN 16.329932 17.638342 NaN
(660, 800] NaN 54.772256 NaN 14.142136 20.000000 10.000000 28.284271 48.989795 NaN 50.000000 56.213877 14.142136 64.031242 54.248107 31.902630 30.550505
1 [220, 520] 0.000000 92.736185 NaN NaN NaN 61.101009 NaN NaN 14.142136 NaN 23.094011 NaN 0.000000 0.000000 42.426407 28.284271
(520, 580] NaN NaN 14.142136 NaN 14.142136 0.000000 NaN NaN 28.284271 16.329932 NaN NaN NaN 0.000000 NaN NaN
(580, 660] NaN 28.284271 NaN NaN NaN 21.908902 14.142136 NaN 19.148542 30.331502 30.000000 NaN 11.547005 16.329932 0.000000 NaN
(660, 800] 56.568542 30.550505 NaN NaN NaN 52.915026 NaN NaN 52.599113 49.799598 NaN 40.000000 48.989795 52.915026 56.095157 NaN

Question 3. Why would GRE have a larger STD than GPA?


In [9]:
# Inspect gre, gpa std
df_raw.std()[['gre', 'gpa']]


Out[9]:
gre    115.628513
gpa      0.380630
dtype: float64

Answer: The most direct reason is scale: in this sample GRE scores range from 220 to 800, while GPA ranges from 2.26 to 4.00, so even similar relative variation produces a far larger standard deviation on the GRE scale. Beyond scale, the GRE consists of three parts (quantitative reasoning, verbal reasoning, and analytical writing), and test takers come from different academic backgrounds, so there is additional variation in how well applicants are prepared for each part. I would expect skills in quantitative reasoning, verbal reasoning, and analytical writing to vary by academic institution, college, department, degree program, and specialization, because these programs place varied emphasis on building each of those skills.

e.g., Theatre arts majors might not need a strong background in quantitative reasoning, so they may not take classes focusing on it (unless they attended a strong engineering school). Similarly, a computer engineering major may not need a strong background in English literature, so the emphasis is not on building analytical writing skills (unless they attended a strong liberal arts college). Therefore, I would expect more variation in GRE scores, because the many different degree programs focus on acquiring different skills, in varying amounts.
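
One scale-free way to compare the two variables' variability (an addition, not part of the original run) is the coefficient of variation, the standard deviation divided by the mean:

# GRE and GPA variability on a comparable, unitless footing
cv = df_raw[['gre', 'gpa']].std() / df_raw[['gre', 'gpa']].mean()
print(cv)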

Question 4. Drop data points with missing data


In [10]:
# Which columns have missing data?
df_raw.isnull().sum()


Out[10]:
admit       0
gre         2
gpa         2
prestige    1
dtype: int64

In [11]:
# Which records are null?
df_raw[df_raw.isnull().any(axis=1)]


Out[11]:
     admit    gre   gpa  prestige
187      0    NaN   NaN       2.0
212      0    NaN  2.87       2.0
236      1  660.0   NaN       NaN

In [12]:
# What is shape of dataframe before dropping records?
shape_before_dropna = df_raw.shape
print(shape_before_dropna)


(400, 4)

In [13]:
# Inspect shape before dropping missing values
shape_after_dropna = df_raw.dropna(how='any').shape
print(shape_after_dropna)


(397, 4)

In [14]:
# Now, drop missing values
df_raw.dropna(how='any', inplace=True)

Question 5. Confirm that you dropped the correct data. How can you tell?

Answer: Before dropping missing values the dataframe shape was (400, 4); after dropping, it was (397, 4). The isnull() check above showed exactly three records with at least one value missing in a row (axis=1), so dropping three rows is correct.
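
The same claim can be checked programmatically (a sketch, not in the original run; it re-reads the raw file since df_raw was modified in place):

# Rows dropped should equal rows with at least one missing value (3 here).
raw = pd.read_csv("../assets/admissions.csv")
n_missing_rows = raw.isnull().any(axis=1).sum()
assert raw.shape[0] - raw.dropna().shape[0] == n_missing_rows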

Question 6. Create box plots for GRE and GPA


In [178]:
#boxplot 1
#df_raw.boxplot('gre')
sns.boxplot(x='gre', data=df_raw)
plt.title('GRE: Box and Whiskers Plot')


Out[178]:
<matplotlib.text.Text at 0x7fbb1ccd2e10>

In [179]:
#boxplot 2 
#df_raw.boxplot('gpa')
sns.boxplot(x='gpa', data=df_raw)
plt.title('GPA: Box and Whiskers Plot')


Out[179]:
<matplotlib.text.Text at 0x7fbb1c084e10>

Question 7. What do these plots show?

Answer: They show the data's spread, or how far from the center the data tend to range. Specifically, box plots show the middle fifty percent of the data and its range.

The idea is to divide the data into four equal groups and see how far apart the extreme groups are. The data is first divided into equal low and high halves at the median, which is called the second quartile, or Q2. The median of the low half is the first quartile, Q1; the median of the high half is the third quartile, Q3. The box's ends are Q1 and Q3, and its midline is Q2, the median of the data. The interquartile range (IQR) is the distance between the box's ends, Q3 - Q1. These plots are especially good at highlighting differences between the low and high groups, as well as outliers. For our data: the GRE box runs from 520 to 660 with its midline at 580, and the GPA box from 3.13 to 3.67 with its midline at 3.40; each shows a few low-end outliers, which we identify and remove below.
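
To make this concrete, the five-number summary underlying each box can be pulled directly (a small addition, assuming df_raw as above):

# Min, Q1, median (Q2), Q3, and max for each plotted variable
print(df_raw[['gre', 'gpa']].quantile([0, 0.25, 0.5, 0.75, 1.0]))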

Question 8. Describe each distribution


In [20]:
# plot the distribution of each variable 
df_raw.plot(kind='density', subplots=True, layout=(2, 2), sharex=False)
plt.show()


The admit distribution is bimodal (two modes, 0 and 1), as expected for a binary variable. Both the GRE and GPA distributions are approximately symmetrical. The prestige distribution is multimodal (four modes: 1, 2, 3, 4), as expected for a variable with four discrete levels.

Question 9. If our model had an assumption of a normal distribution would we meet that requirement?


In [121]:
# Test for normality using the Kolmogorov-Smirnov Test

# GRE normal?
print('GRE:      ', stats.kstest(df_raw.gre, 'norm'))
print('Kurtosis: ', df_raw.gre.kurt())
print('Skew:     ', df_raw.gre.skew())

print('~~~~~~~~~~~')

# GPA normal?
print('GPA      : ', stats.kstest(df_raw.gpa, 'norm'))
print('Kurtosis: ', df_raw.gpa.kurt())
print('Skew:     ', df_raw.gpa.skew())

print('~~~~~~~~~~~')

# Admit normal?
print('Admit:    ', stats.kstest(df_raw.admit, 'norm'))
print('Kurtosis: ', df_raw.admit.kurt())
print('Skew:     ', df_raw.admit.skew())

print('~~~~~~~~~~~')

# Prestige normal?
print('Prestige: ', stats.kstest(df_raw.prestige, 'norm'))
print('Kurtosis: ', df_raw.prestige.kurt())
print('Skew:     ', df_raw.prestige.skew())


('GRE:      ', KstestResult(statistic=1.0, pvalue=0.0))
('Kurtosis: ', -0.33286435465143427)
('Skew:     ', -0.146046988215597)
~~~~~~~~~~~
('GPA      : ', KstestResult(statistic=0.98972085476178895, pvalue=0.0))
('Kurtosis: ', -0.56356989952216807)
('Skew:     ', -0.21688893296924305)
~~~~~~~~~~~
('Admit:    ', KstestResult(statistic=0.5, pvalue=0.0))
('Kurtosis: ', -1.3865881769308692)
('Skew:     ', 0.7876691478505351)
~~~~~~~~~~~
('Prestige: ', KstestResult(statistic=0.84134474606854293, pvalue=0.0))
('Kurtosis: ', -0.90103795489017591)
('Skew:     ', 0.086505552897055041)

Answer: No, we would not meet that requirement. According to the Kolmogorov-Smirnov test, there is essentially zero probability that the test statistic values of D we observed for GRE, GPA, Admit, and Prestige (1.0, 0.9897, 0.5, and 0.8413, respectively) could have arisen if the data had been drawn from a standard normal distribution. We therefore reject the hypothesis at the 5% significance level and conclude that the data is not normally distributed. One caveat: kstest(x, 'norm') compares against the standard normal N(0, 1), so unstandardized variables such as GRE (mean near 588) fail largely because of location and scale rather than shape; the data should be standardized (or the normal's parameters passed via the args keyword) before the statistic says much about shape. Even so, admit is binary and prestige takes only four discrete values, so neither can be normally distributed.
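
A sketch of a fairer normality check (not in the original run): standardize each continuous variable first, so the comparison against N(0, 1) reflects shape rather than location and scale.

# Standardize to z-scores before the KS test against the standard normal
for col in ['gre', 'gpa']:
    z = (df_raw[col] - df_raw[col].mean()) / df_raw[col].std()
    print(col, stats.kstest(z, 'norm'))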

Question 10. Does this distribution need correction? If so, why? How?

Answer: Yes, it needs correction if a normality assumption is required, because the GRE and GPA distributions are not normal: both are slightly left-skewed and platykurtic (negative excess kurtosis). I plan to remove outliers and log transform the data.


In [49]:
# GRE IQR
q3_gre = summary_stats_admissions.gre['75%']
q1_gre = summary_stats_admissions.gre['25%']
iqr_gre = q3_gre - q1_gre
low_fence_gre = q1_gre - 1.5*iqr_gre
high_fence_gre = q3_gre + 1.5*iqr_gre
print("GRE IQR: ", iqr_gre)
print("GRE low fence: ", low_fence_gre)
print("GRE high fence: ", high_fence_gre)


('GRE IQR: ', 140.0)
('GRE low fence: ', 310.0)
('GRE high fence: ', 870.0)

In [126]:
# Find GRE outliers
print('Number of outliers: ', df_raw[(df_raw.gre < low_fence_gre) | (df_raw.gre > high_fence_gre)].shape[0])
print('These are the outliers: ')
df_raw[(df_raw.gre < low_fence_gre) | (df_raw.gre > high_fence_gre)]


('Number of outliers: ', 4)
These are the outliers: 
Out[126]:
     admit    gre   gpa  prestige
71       0  300.0  2.92       4.0
179      0  300.0  3.01       3.0
304      0  220.0  2.83       3.0
315      1  300.0  2.84       2.0

In [127]:
# Remove GRE outliers
print('Shape before outlier removal is: ', df_raw.shape)
df = df_raw[(df_raw.gre >= low_fence_gre) & (df_raw.gre <= high_fence_gre)]
print('Shape after outlier removal is: ', df.shape)


('Shape before outlier removal is: ', (397, 4))
('Shape after outlier removal is: ', (393, 4))

In [140]:
# Plot to visually inspect distribution, still looks skewed
df.gre.plot.density()
plt.title('GRE density')
plt.show()



In [50]:
# GPA IQR
q3_gpa = summary_stats_admissions.gpa['75%']
q1_gpa = summary_stats_admissions.gpa['25%']
iqr_gpa = q3_gpa - q1_gpa
low_fence_gpa = q1_gpa - 1.5*iqr_gpa
high_fence_gpa = q3_gpa + 1.5*iqr_gpa
print("GPA IQR: ", round(iqr_gpa, 1))
print("GPA low fence: ", round(low_fence_gpa, 1))
print("GPA high fence: ", round(high_fence_gpa, 1))


('GPA IQR: ', 0.5)
('GPA low fence: ', 2.3)
('GPA high fence: ', 4.5)

In [129]:
# Now, find GPA Outliers
print('Number of outliers: ', df[(df.gpa < low_fence_gpa) | (df.gpa > high_fence_gpa)].shape[0])
print('These are the outliers: ')
df[(df.gpa < low_fence_gpa) | (df.gpa > high_fence_gpa)]


('Number of outliers: ', 1)
These are the outliers: 
Out[129]:
     admit    gre   gpa  prestige
289      0  420.0  2.26       4.0

In [130]:
print('Shape before outlier removal is: ', df.shape)
df = df[(df.gpa >= low_fence_gpa) & (df.gpa <= high_fence_gpa)]
print('Shape after outlier removal is: ', df.shape)


('Shape before outlier removal is: ', (393, 4))
('Shape after outlier removal is: ', (392, 4))

In [142]:
# Plot to visually inspect distribution, still looks skewed!
df.gpa.plot.density()
plt.title('GPA density')
plt.show()



In [186]:
# Removed outliers: re-test for normality using the Kolmogorov-Smirnov Test

# Observation: skew got better, kurtosis got worse!

# GRE
print('GRE:      ', stats.kstest(df.gre, 'norm'))
print('Kurtosis: ', df.gre.kurt())
print('Skew:     ', df.gre.skew())

print('~~~~~~~~~~~')

# GPA
print('GPA      : ', stats.kstest(df.gpa, 'norm'))
print('Kurtosis: ', df.gpa.kurt())
print('Skew:     ', df.gpa.skew())


('GRE:      ', KstestResult(statistic=1.0, pvalue=0.0))
('Kurtosis: ', -0.55667361627723366)
('Skew:     ', -0.028408720277087456)
~~~~~~~~~~~
('GPA      : ', KstestResult(statistic=0.99223974644944635, pvalue=0.0))
('Kurtosis: ', -0.62390073209492058)
('Skew:     ', -0.19724629618780309)

In [168]:
# Standardize GRE to z-scores (rescales location and scale; note this
# does not change the distribution's shape)
sns.distplot( (df.gre - df.gre.mean()) / df.gre.std(), bins=5, kde_kws={'bw': 1} )
plt.title('GRE: Standardized (z-scores)')
plt.show()



In [170]:
# Standardize GPA to z-scores (rescales location and scale; note this
# does not change the distribution's shape)
sns.distplot( (df.gpa - df.gpa.mean()) / df.gpa.std(), bins=10, kde_kws={'bw': 1} )
plt.title('GPA: Standardized (z-scores)')
plt.show()



In [185]:
# Log transform the data: re-test for normality using the Kolmogorov-Smirnov Test

# Observation: Skew got worse, Kurtosis got better

# GRE
print('GRE:      ', stats.kstest(np.log(df.gre), 'norm'))
print('Kurtosis: ', np.log(df.gre).kurt())
print('Skew:     ', np.log(df.gre).skew())

print('~~~~~~~~~~~')

# GPA
print('GPA      : ', stats.kstest(np.log(df.gpa), 'norm'))
print('Kurtosis: ', np.log(df.gpa).kurt())
print('Skew:     ', np.log(df.gpa).skew())


('GRE:      ', KstestResult(statistic=0.99999999721106625, pvalue=0.0))
('Kurtosis: ', -0.18677728430420748)
('Skew:     ', -0.47287836787183568)
~~~~~~~~~~~
('GPA      : ', KstestResult(statistic=0.81696385197730359, pvalue=0.0))
('Kurtosis: ', -0.31189663172165183)
('Skew:     ', -0.43724961849141997)

Answer: I don't know how to fully correct for the skewness and kurtosis inherent in this data set.

But here's what I found:

  1. After removing outliers, the skew improved, but the kurtosis got worse.
  2. After removing outliers and log transforming the data, the skew got worse, but the kurtosis improved.
  3. Subtracting the mean and dividing by the standard deviation rescales the data to mean 0 and standard deviation 1, which is useful for comparing against the standard normal, but it does not change the distribution's shape, so it cannot correct skew or kurtosis.
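
Another option worth trying (a sketch, not attempted in the original run) is a Box-Cox power transform, which searches for the power that best symmetrizes positive-valued data:

# Box-Cox chooses lambda to maximize normality of the transformed data.
# It requires strictly positive values, which gre and gpa satisfy.
gre_bc, lam = stats.boxcox(df['gre'])
print('chosen lambda:', lam)
print('skew after Box-Cox:', pd.Series(gre_bc).skew())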

Question 11. Which of our variables are potentially collinear?

Answer: GPA and GRE are potentially collinear; they are moderately positively correlated (r ≈ 0.38).


In [28]:
# create a correlation matrix for the data
df_raw.corr()


Out[28]:
             admit       gre       gpa  prestige
admit     1.000000  0.181202  0.174116 -0.243563
gre       0.181202  1.000000  0.382408 -0.124533
gpa       0.174116  0.382408  1.000000 -0.060976
prestige -0.243563 -0.124533 -0.060976  1.000000

In [29]:
sns.heatmap(df_raw.corr(), annot=True, cmap='RdBu')


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f104f4c9550>

In [30]:
pd.plotting.scatter_matrix(df_raw)
plt.show()


Question 12. What did you find?

Answer: GPA and GRE are potentially collinear; they are moderately positively correlated (r ≈ 0.38), which is visible both in the correlation matrix and in the scatter matrix.
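
A common follow-up diagnostic (a sketch, not part of the original run) is the variance inflation factor, using the statsmodels import above:

# VIF near 1 suggests little collinearity; values above roughly 5-10
# are commonly treated as problematic.
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df_raw[['gre', 'gpa', 'prestige']])
for i in range(1, X.shape[1]):  # skip the constant term
    print(X.columns[i], variance_inflation_factor(X.values, i))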

Question 13. Write an analysis plan for exploring the association between grad school admissions rates and prestige of undergraduate schools.

Answer:

  1. Inspect the spread of the data both graphically and in tabular form, looking at the counts in each factor-level combination to see how each is represented.
  2. Generate summary statistics for each predictor to quality-check the count, variance, standard deviation, quartiles, and number of null values.
  3. Identify null values, if any, and either remove them or impute values.
  4. Create box plots, histograms, and density plots to inspect the shape of the distributions.
  5. Check for outliers, identify which data instances are outliers, and remove them.
  6. Check the data distributions for normality with the Kolmogorov-Smirnov test, and at the same time evaluate skew and kurtosis.
  7. If necessary, log transform the data to help move a non-normal distribution toward normality (this won't always work, but it's worth trying).
  8. Finally, rescale the data to standard (z-score) form if necessary and graph it to observe how it compares with the standard normal distribution.

Question 14. What is your hypothesis?

Answer: We hypothesize that GRE, GPA, and undergraduate school prestige may be used to predict whether a student will be admitted to graduate school; the correlations above suggest higher GRE, higher GPA, and lower prestige rank numbers are associated with admission.
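
As a preview of the Project 3 modeling exercise (a minimal sketch, not required here), the hypothesis could be tested with a logistic regression via the statsmodels import above:

# Logistic regression of admit on gre, gpa, and prestige.
# Prestige is treated as numeric here for brevity; Project 3 may
# dummy-code it instead.
X = sm.add_constant(df_raw[['gre', 'gpa', 'prestige']])
logit_model = sm.Logit(df_raw['admit'], X).fit()
print(logit_model.summary())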

Bonus/Advanced

1. Bonus: Explore alternatives to dropping observations with missing data

2. Bonus: Log transform the skewed data

3. Advanced: Impute missing data (see the sketch below)
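
As a starting point for bonus items 1 and 3, here is a minimal sketch, assuming the same raw CSV as above; median and mode fills are one simple choice among many (this cell was not part of the original run):

# Re-read the raw file and fill gaps instead of dropping rows:
# median for the continuous columns, mode for the ordinal prestige.
raw = pd.read_csv("../assets/admissions.csv")
raw['gre'] = raw['gre'].fillna(raw['gre'].median())
raw['gpa'] = raw['gpa'].fillna(raw['gpa'].median())
raw['prestige'] = raw['prestige'].fillna(raw['prestige'].mode()[0])
print(raw.isnull().sum())  # all zeros after imputation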

Performance Evaluation

Mark boxes with an 'X'

Requirements Incomplete (0) Does Not Meet Expectations (1) Meets Expectations (2) Exceeds Expectations (3)
Read in your dataset, determine how many samples are present, and ID any missing data X
Create a table of descriptive statistics for each of the variables (n, mean, median, standard deviation) X
Describe the distributions of your data X
Plot box plots for each variable X
Create a covariance matrix X
Determine any issues or limitations, based on your exploratory analysis X
Outline exploratory analysis methods X

Notes:

  • "Samples" means rows of data, not individual cells
  • Great summary tables. Make sure you can explain them, this quickly becomes a lot of data in itself!
  • Excellent analysis of the distributions! You show a strong grasp of statistical terms and concepts, and a propensity to rely on tests rather than "eyeballing" it - which would often lead you astray. Great work!
  • I'd like to see more explanation of your box plots. You define them generically, but what about for our dataset?
  • Your covariance matrix is well done. It can be good to include an explanation of your cutoff point for deciding if things are collinear or not.
  • You do a good job of attempting to correct for outliers and awkward distributions. One thing to include might be an explanation of what this is doing to your model. You can't always correct data, but you can note the caveats it brings with it.
  • Solid and clear exploratory analysis plan.

Overall, excellent work. I'm very impressed with your knowledge of statistics and your application in this context. Remember to keep your graphs organized and concise - don't include more than you need or the audience can get overwhelmed. Also always include an explanation with your visuals. Excellent job, you're very much on the right track.


Score:

Based on the requirements, you can earn a maximum of 21 points on this project.

Your total score is: 16


In [ ]: