Week 12 - Pandas II

Today's Agenda

  • Useful functions when using Pandas

Review

See Week 8 for a review on Pandas

RESOLVE Dataset

Today we'll be using a dataset by the Resolve Survey. RESOLVE is a volume-limited census of stellar, gas, and dynamical mass as well as star formation and merging within >50,000 cubic Mpc of the nearby cosmic web, reaching down to the dwarf galaxy regime and up to structures on tens of Mpc scales such as filaments, walls, and voids.


In [1]:
# Importing modules
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_context("notebook")

First, we need to read in the data from RESOLVE.


In [2]:
## Reading in data
# Column Names
colnames = ['RA','DEC','CZobs','Mr','HaloID','logMHalo',
            'NGalH','FLAG','CZreal','DISTC','Vptot',
            'Vptang','Morph','logMstar','rmag','umag',
            'FSMGR','GalMatchFlag','ur_col','MhiMass','GroupID',
            'GroupNGals','GroupRproj','GroupCZdisp',
            'GroupLogMHalo','GroupGalType']

# We use `pd.read_csv` to read CSV files
# The argument in `sep`, i.e. "\s+" tells pandas to read white spaces
RS = pd.read_csv('./data/Resolve_Catalogue.dat',
                      sep='\s+', 
                      skiprows=2, 
                      names=colnames)

In [3]:
RS.head()


Out[3]:
RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal DISTC ... FSMGR GalMatchFlag ur_col MhiMass GroupID GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType
0 222.361029 13.152031 7462.85561 -21.264 198732 15.141 469 0 9255.587090 1.098726 ... 0.200007 ECO 2.091679 2.041738e+09 0 1 0.000000 0.00 12.225 1
1 222.808069 12.397480 7396.39554 -22.864 198732 15.141 469 0 9313.492255 1.300715 ... 0.089128 ECO 2.181833 8.912519e+09 1 1 0.000000 0.00 13.172 1
2 222.219307 12.979642 6653.37177 -19.477 198732 15.141 469 0 9293.469960 0.802355 ... 0.133072 ECO 2.220791 3.548133e+08 2 18 0.240688 277.11 13.621 0
3 222.153140 13.081357 5945.69782 -20.666 198732 15.141 469 0 9292.222436 1.018345 ... 0.372099 ECO 1.339117 8.511383e+09 3 1 0.000000 0.00 11.989 1
4 222.540987 12.821932 6795.38705 -21.867 198732 15.141 469 0 9288.730044 0.404541 ... 0.368589 ECO 1.815042 2.137960e+09 4 5 0.154728 235.29 13.138 1

5 rows × 26 columns


In [4]:
RS.shape


Out[4]:
(5080, 26)

Let's show some plots for this dataset


In [5]:
sns.jointplot("logMHalo","logMstar",data=RS, color='green')


Out[5]:
<seaborn.axisgrid.JointGrid at 0x10f1d9690>

In [6]:
g = sns.lmplot(x="logMHalo", y="logMstar", hue="FLAG", data=RS, size=8)



In [7]:
sns.lmplot(x="logMHalo", y="GroupLogMHalo", hue="FLAG", data=RS, size=8)


Out[7]:
<seaborn.axisgrid.FacetGrid at 0x10fff6050>

Now that we have looked at some of the data, calculating some statistics for it. Let's first create a subsample of the dataset


In [31]:
RS1 = RS.loc[ (RS.logMHalo >= 11) & (RS.logMHalo <= 12)]
print(RS1.shape)

# Resetting Indices
RS1.reset_index(inplace=True)

RS1.head()


(2730, 26)
Out[31]:
index RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal ... FSMGR GalMatchFlag ur_col MhiMass GroupID GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType
0 32 134.634782 -0.513661 2560.74945 -19.378 281739 11.606 1 1 2389.978581 ... 0.462300 ECO 1.659651 2.570394e+09 11 1 0.000000 0.00 11.512 1
1 36 135.134024 46.477611 3658.44177 -20.172 282776 11.675 1 1 4155.831904 ... 1.293450 ECO 1.210722 9.332556e+09 14 1 0.000000 0.00 11.801 1
2 37 135.812900 46.573182 3503.62320 -18.793 282779 11.360 1 1 4126.863319 ... 0.752197 ECO 1.395662 2.951213e+09 15 2 0.090258 73.24 12.077 0
3 38 135.791237 46.371980 3608.43558 -20.721 282781 11.892 1 1 4136.444575 ... 0.514906 ECO 1.762735 2.454707e+09 15 2 0.090258 73.24 12.077 1
4 39 141.414228 47.734262 3592.98311 -19.252 283291 11.806 3 0 4055.758985 ... 1.687002 ECO 1.203329 3.548138e+09 16 3 0.054441 14.31 12.221 0

5 rows × 27 columns

Map

You can use the unique function to map some function or apply some mask to your data.

Let's say I want to convert RA and DEC to radians.


In [54]:
RS1.loc[:,'RArad' ] = RS1.RA.map( lambda x: np.radians(x))
RS1.loc[:,'DECrad'] = RS1.DEC.map(lambda x: np.radians(x))
RS1[['RA','RArad','DEC','DECrad']].head()


Out[54]:
RA RArad DEC DECrad
0 134.634782 2.349820 -0.513661 -0.008965
1 135.134024 2.358534 46.477611 0.811187
2 135.812900 2.370382 46.573182 0.812855
3 135.791237 2.370004 46.371980 0.809344
4 141.414228 2.468144 47.734262 0.833120

apply

The apply is a function that applies a function along axis of the DataFrame


In [58]:
RS1.loc[:,'RArad'].apply(np.degrees)
RS1['RArad'].head()


Out[58]:
0    2.349820
1    2.358534
2    2.370382
3    2.370004
4    2.468144
Name: RArad, dtype: float64

Unique

The unique function will return unique entries for a specified column. This is analogous to the np.unique function from NumPy.


In [10]:
# Obtaining unique list of `HaloID`
print(RS1['HaloID'].shape)
RS1['HaloID'].head()


(2730,)
Out[10]:
0    281739
1    282776
2    282779
3    282781
4    283291
Name: HaloID, dtype: int64

Number of unique elements in HaloID


In [11]:
HaloID_unq = RS1.HaloID.unique()
HaloID_unq.shape


Out[11]:
(2405,)

In [12]:
HaloID_unq


Out[12]:
array([ 281739,  282776,  282779, ..., 1769274, 1769811, 1769883])

In-place Plotting

You can also plot distributions, x-y plots, etc using Pandas. Let's plot the distribution of logMstar.

See more here: http://pandas.pydata.org/pandas-docs/stable/visualization.html

Group by

This is a really useful function in Pandas. It allows you to group your data set into groups, and returns them as dictionaries


In [20]:
RS1.head()


Out[20]:
index RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal ... ur_col MhiMass GroupID GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType RArad DECrad
0 32 134.634782 -0.513661 2560.74945 -19.378 281739 11.606 1 1 2389.978581 ... 1.659651 2.570394e+09 11 1 0.000000 0.00 11.512 1 2.349820 -0.008965
1 36 135.134024 46.477611 3658.44177 -20.172 282776 11.675 1 1 4155.831904 ... 1.210722 9.332556e+09 14 1 0.000000 0.00 11.801 1 2.358534 0.811187
2 37 135.812900 46.573182 3503.62320 -18.793 282779 11.360 1 1 4126.863319 ... 1.395662 2.951213e+09 15 2 0.090258 73.24 12.077 0 2.370382 0.812855
3 38 135.791237 46.371980 3608.43558 -20.721 282781 11.892 1 1 4136.444575 ... 1.762735 2.454707e+09 15 2 0.090258 73.24 12.077 1 2.370004 0.809344
4 39 141.414228 47.734262 3592.98311 -19.252 283291 11.806 3 0 4055.758985 ... 1.203329 3.548138e+09 16 3 0.054441 14.31 12.221 0 2.468144 0.833120

5 rows × 29 columns

Let's group data by the GroupID


In [25]:
Groups_dict = RS1.groupby('GroupID')
Groups_dict


Out[25]:
<pandas.core.groupby.DataFrameGroupBy object at 0x111141a90>

You can calcualte the sum of all columns, even though sometimes it would not make sense to do it


In [27]:
Groups_dict.sum()


Out[27]:
index RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal ... FSMGR ur_col MhiMass GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType RArad DECrad
GroupID
2 1376 221.840954 13.170086 6471.97371 -20.334 329126 11.902 1 1 6718.518758 ... 0.140674 1.980040 1.412539e+09 18 0.240688 277.11 13.621 0 3.871855 0.229861
11 32 134.634782 -0.513661 2560.74945 -19.378 281739 11.606 1 1 2389.978581 ... 0.462300 1.659651 2.570394e+09 1 0.000000 0.00 11.512 1 2.349820 -0.008965
14 36 135.134024 46.477611 3658.44177 -20.172 282776 11.675 1 1 4155.831904 ... 1.293450 1.210722 9.332556e+09 1 0.000000 0.00 11.801 1 2.358534 0.811187
15 75 271.604137 92.945162 7112.05878 -39.514 565560 23.252 2 2 8263.307894 ... 1.267103 3.158397 5.405920e+09 4 0.180516 146.48 24.154 1 4.740386 1.622199
16 120 424.030674 142.999764 10817.20762 -59.562 849873 35.418 9 1 12174.507866 ... 3.244401 5.414899 8.241571e+09 9 0.163323 42.93 36.663 1 7.400731 2.495817
17 85 313.547838 32.546649 5204.49813 -39.424 567438 23.460 4 1 5596.463143 ... 6.690735 3.393892 1.044847e+10 4 0.002866 71.72 23.856 1 5.472442 0.568046
18 89 281.848772 91.648853 7269.08157 -39.732 567674 23.530 4 1 8135.351431 ... 0.493270 3.312586 1.028967e+10 4 0.134670 99.96 24.118 1 4.919189 1.599574
19 46 157.586636 48.572398 3573.40651 -19.150 284054 11.680 1 1 3967.525066 ... 0.814564 1.449894 2.454705e+09 1 0.000000 0.00 11.429 1 2.750406 0.847748
20 47 140.765926 1.976329 2868.57700 -19.408 284061 11.420 1 1 2872.762118 ... 0.122841 2.304186 3.890450e+08 1 0.000000 0.00 11.529 1 2.456829 0.034493
21 48 138.073734 2.411931 2778.31482 -19.504 284291 11.582 1 1 2835.919200 ... 0.306532 1.811672 1.621812e+09 1 0.000000 0.00 11.566 1 2.409841 0.042096
22 99 300.873734 90.025563 7178.04751 -37.705 568728 23.054 4 1 7894.825551 ... 0.831315 3.768253 1.798101e+09 4 0.014326 14.30 23.200 1 5.251237 1.571242
23 51 143.089444 46.579813 3592.03982 -18.780 284394 11.561 1 1 4071.099481 ... 0.598305 1.689678 7.762471e+08 1 0.000000 0.00 11.194 1 2.497382 0.812971
24 105 268.856882 47.360939 5160.58882 -38.200 568894 23.104 4 1 5546.536342 ... 0.912425 3.118435 3.519127e+09 4 0.117478 19.98 23.374 1 4.692438 0.826604
25 54 157.348751 41.615085 3574.88872 -19.603 284582 11.493 1 1 3837.559625 ... 0.183973 1.952976 1.023292e+09 1 0.000000 0.00 11.599 1 2.746254 0.726320
26 55 162.693188 1.961879 2597.58034 -19.294 284615 11.479 1 1 2601.087229 ... 0.226535 2.245443 2.691535e+08 1 0.000000 0.00 11.486 1 2.839532 0.034241
27 56 157.460001 49.488156 3591.90825 -20.171 284695 11.456 1 1 3976.642566 ... 1.150765 1.377150 6.760832e+09 1 0.000000 0.00 11.800 1 2.748195 0.863731
28 57 154.988872 34.509459 2571.39141 -19.508 284863 11.437 1 1 2858.866877 ... 0.415138 1.852596 9.332558e+08 1 0.000000 0.00 11.569 1 2.705066 0.602304
29 58 141.940558 17.033201 2650.67690 -18.800 284912 11.390 1 1 2796.632604 ... 0.153223 2.180119 1.949847e+08 1 0.000000 0.00 11.246 1 2.477330 0.297285
30 59 142.293349 44.636027 3536.51570 -19.459 285027 11.421 1 1 4011.339702 ... 0.197674 1.325869 5.370314e+09 1 0.000000 0.00 11.549 1 2.483487 0.779046
31 121 421.183035 0.429715 7131.96169 -39.266 603912 22.952 4 1 6586.841058 ... 2.385056 2.584658 1.026811e+10 4 0.051576 36.00 23.834 1 7.351031 0.007500
34 68 212.282651 2.338598 3085.75016 -19.072 302035 11.253 1 1 3385.519790 ... 1.063081 1.290844 5.370319e+09 1 0.000000 0.00 11.399 1 3.705031 0.040816
35 69 211.798464 3.995041 2810.03796 -19.208 302069 11.613 1 1 3248.862125 ... 0.190598 1.381033 4.677355e+09 1 0.000000 0.00 11.453 1 3.696581 0.069727
36 673 212.174133 5.381462 3114.65284 -19.228 307184 11.380 1 1 2937.728600 ... 0.394274 1.700705 2.238722e+09 17 0.236390 257.23 13.859 0 3.703137 0.093924
38 86 211.087234 3.402649 3375.15985 -18.947 302148 11.774 1 1 3271.091928 ... 5.917758 1.092800 5.370311e+09 1 0.000000 0.00 11.346 1 3.684167 0.059387
39 176 422.701455 7.998941 7017.20687 -39.768 604466 23.095 2 2 6580.283489 ... 0.149476 4.821240 1.677295e+09 6 0.272206 378.68 24.390 1 7.377532 0.139608
40 88 212.197331 2.983055 3377.05188 -20.878 302274 11.990 1 1 3439.729113 ... 0.157861 2.236076 1.288250e+09 1 0.000000 0.00 12.076 1 3.703542 0.052064
41 4686 1069.342879 22.153717 22868.84505 -97.327 1613918 58.577 7 4 24817.288333 ... 9.706564 8.588776 1.182789e+10 650 1.840975 3534.55 71.750 0 18.663554 0.386655
47 136 218.110631 -0.474472 3298.73044 -20.337 302654 11.881 1 1 3713.675833 ... 1.560939 1.238695 5.888431e+09 2 0.020057 102.47 12.154 1 3.806749 -0.008281
48 137 217.408884 1.073528 3689.52809 -19.039 302686 11.258 1 1 3791.755414 ... 0.390536 1.508501 2.951212e+09 1 0.000000 0.00 11.386 1 3.794501 0.018737
50 302 437.596073 0.876629 7124.28258 -39.600 605431 23.455 2 2 7557.338338 ... 1.125030 3.078099 6.399664e+09 4 0.126074 263.94 23.916 1 7.637492 0.015300
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3021 5042 181.540914 38.772784 7455.67089 -19.343 1747151 11.312 1 1 7478.296457 ... 0.659876 1.286974 3.388442e+09 1 0.000000 0.00 11.500 1 3.168487 0.676713
3022 5043 196.480535 39.110130 6284.05072 -19.134 1747163 11.338 1 1 6383.299571 ... 0.204553 2.113695 4.897794e+08 1 0.000000 0.00 11.425 1 3.429232 0.682601
3023 5044 180.138126 43.598741 5160.45686 -19.252 1747165 11.324 1 1 5396.837248 ... 0.053944 2.153711 3.235936e+08 1 0.000000 0.00 11.471 1 3.144003 0.760942
3024 5045 170.442172 28.923516 6073.01371 -19.454 1747185 11.226 1 1 6211.177928 ... 0.906933 1.294022 4.897787e+09 1 0.000000 0.00 11.545 1 2.974777 0.504811
3025 5046 189.293884 26.504427 7278.13218 -19.227 1747192 11.354 1 1 7204.081386 ... 1.955318 1.193350 4.265791e+09 1 0.000000 0.00 11.463 1 3.303802 0.462590
3026 5047 194.256326 31.051162 6777.01163 -19.015 1747195 11.320 1 1 6826.537972 ... 2.169908 1.620665 1.778280e+09 1 0.000000 0.00 11.377 1 3.390412 0.541945
3027 5048 186.561296 30.761089 5891.61225 -18.909 1747211 11.323 1 1 5926.337095 ... 0.595122 1.330795 1.949845e+09 1 0.000000 0.00 11.326 1 3.256109 0.536882
3028 5050 178.617072 46.740076 6716.89863 -19.467 1747267 11.306 1 1 6991.574758 ... 1.463564 1.177580 5.128619e+09 1 0.000000 0.00 11.551 1 3.117456 0.815768
3029 5051 179.935088 34.079822 6448.46123 -19.023 1747268 11.344 1 1 6536.768504 ... 0.795007 1.466522 2.041740e+09 1 0.000000 0.00 11.381 1 3.140460 0.594805
3030 5052 191.954131 34.376651 5983.22710 -18.915 1747272 11.257 1 1 5975.862182 ... 0.136626 1.296644 2.951211e+09 1 0.000000 0.00 11.331 1 3.350232 0.599986
3031 5053 188.798977 35.992546 7252.91671 -19.121 1747320 11.327 1 1 7239.302066 ... 0.218624 1.832817 2.137965e+09 1 0.000000 0.00 11.422 1 3.295164 0.628188
3032 5054 197.703967 31.057696 6184.44301 -18.883 1747322 11.311 1 1 6172.350613 ... 0.052827 1.703667 1.288250e+09 1 0.000000 0.00 11.313 1 3.450585 0.542059
3033 5055 198.738687 47.101130 6878.82904 -18.779 1747380 11.305 1 1 7003.185527 ... 1.084265 1.368940 1.412536e+09 1 0.000000 0.00 11.186 1 3.468644 0.822070
3034 5056 177.805632 38.976741 7319.89246 -19.115 1747385 11.242 1 1 7407.814068 ... 0.661404 1.217119 4.073803e+09 1 0.000000 0.00 11.420 1 3.103294 0.680272
3035 5057 205.518590 28.272752 6852.72107 -18.975 1747390 11.288 1 1 6795.855810 ... 0.266980 2.240971 2.137960e+08 1 0.000000 0.00 11.360 1 3.586976 0.493453
3036 5058 183.424051 34.075996 6469.06771 -18.903 1747409 11.293 1 1 6571.591618 ... 1.639086 1.229792 3.090297e+09 1 0.000000 0.00 11.323 1 3.201354 0.594738
3037 5059 189.737700 35.372671 7386.18756 -19.000 1747616 11.249 1 1 7353.243883 ... 0.071320 2.101870 4.897794e+08 1 0.000000 0.00 11.369 1 3.311548 0.617370
3038 5060 194.535351 39.521058 7106.22670 -19.036 1747646 11.236 1 1 7120.347680 ... 0.213695 1.622625 9.772379e+08 1 0.000000 0.00 11.386 1 3.395282 0.689773
3039 5061 189.455632 40.706769 6535.03388 -18.880 1747659 11.226 1 1 6697.603189 ... 2.280949 1.126925 2.691540e+09 1 0.000000 0.00 11.312 1 3.306625 0.710467
3040 5062 188.822804 35.106492 5989.50850 -18.798 1747701 11.229 1 1 6021.227993 ... 0.540425 1.716230 7.079457e+08 1 0.000000 0.00 11.238 1 3.295580 0.612724
3041 5063 178.790815 45.958268 7043.30642 -18.913 1747762 11.250 1 1 7266.875328 ... 0.335774 1.780758 7.413110e+08 1 0.000000 0.00 11.330 1 3.120488 0.802123
3042 5064 206.961728 29.870739 6938.13588 -19.294 1766927 11.491 1 1 6937.666670 ... 0.093860 2.263285 3.388445e+08 4 0.098854 116.12 12.799 0 3.612164 0.521343
3044 10139 415.800925 56.850251 13965.42429 -40.212 3536000 23.410 2 2 13936.910932 ... 0.550681 4.273449 2.721539e+09 4 0.151862 63.34 24.362 1 7.257095 0.992224
3045 10143 420.583605 62.950970 14897.36483 -39.325 3536502 23.227 2 2 14825.958141 ... 2.328639 3.173143 6.170059e+09 4 0.272206 34.76 23.792 1 7.340569 1.098702
3046 5073 208.252977 33.128485 7400.77434 -18.877 1768410 11.604 2 0 7487.473602 ... 5.218676 1.155014 2.454709e+09 1 0.000000 0.00 11.311 1 3.634700 0.578201
3047 5074 211.572491 24.948825 7240.41820 -19.068 1768843 11.624 1 1 7263.552369 ... 2.149163 0.977274 4.897791e+09 1 0.000000 0.00 11.397 1 3.692637 0.435439
3048 5075 208.408550 30.109803 6979.92873 -19.789 1769134 11.625 1 1 7024.615893 ... 0.583560 1.787069 1.023292e+09 1 0.000000 0.00 11.662 1 3.637415 0.525515
3049 10153 411.450877 56.076102 14141.70787 -38.107 3538548 23.050 4 1 14139.803707 ... 3.205733 2.428322 5.588691e+09 4 0.068768 21.22 23.342 1 7.181173 0.978713
3050 5078 208.674537 27.215947 6940.82469 -18.801 1769811 11.424 1 1 6868.269133 ... 0.100881 1.860897 5.623415e+08 1 0.000000 0.00 11.247 1 3.642058 0.475008
3051 5079 211.909499 25.671006 7396.22195 -18.877 1769883 11.274 1 1 7384.348483 ... 5.218676 1.155014 2.454709e+09 1 0.000000 0.00 11.311 1 3.698518 0.448044

2244 rows × 26 columns

You can also get back a particular group


In [48]:
# Group with the most number of Galaxies
GroupID_max = int(RS1.GroupID.loc[RS1.GroupNGals==RS1.GroupNGals.max()].unique())

# Getting the stats for that Group
Groups_dict.get_group(GroupID_max)


Out[48]:
CZobs CZreal DEC DECrad DISTC FLAG FSMGR GalMatchFlag GroupCZdisp GroupGalType ... RA RArad Vptang Vptot index logMHalo logMstar rmag umag ur_col
244 4184.84668 4433.164364 4.731734 0.082584 0.000000 1 2.215381 ECO 706.91 0 ... 212.719898 3.712663 627.97968 675.29264 538 11.628 10.032284 -20.288746 -19.064381 1.224365
410 4924.61353 5034.595634 4.314873 0.075309 0.000000 1 0.442543 ECO 706.91 0 ... 214.448100 3.742825 553.74793 564.56429 778 11.941 10.599734 -19.646400 -17.251831 2.394569
413 4882.03830 4945.885136 4.393863 0.076687 0.000000 1 0.203016 ECO 706.91 0 ... 213.896606 3.733200 464.00287 468.37493 801 11.578 10.293238 -18.854952 -16.671577 2.183374
501 4472.13537 5202.238398 4.334079 0.075644 0.000000 1 6.603787 ECO 706.91 0 ... 214.155222 3.737714 323.99750 798.76455 1284 11.715 10.287772 -19.358240 -18.197712 1.160528
502 4405.21117 5201.404801 4.379168 0.076431 0.072612 0 0.241837 ECO 706.91 0 ... 214.123053 3.737152 287.37216 846.46739 1285 11.715 9.280000 -19.149672 -17.523731 1.625940

5 rows × 28 columns

You can compute different statistics for the different columns


In [49]:
Groups_dict.get_group(GroupID_max).describe()


Out[49]:
CZobs CZreal DEC DECrad DISTC FLAG FSMGR GroupCZdisp GroupGalType GroupLogMHalo ... RA RArad Vptang Vptot index logMHalo logMstar rmag umag ur_col
count 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.00 5.0 5.00 ... 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
mean 4573.769010 4963.457667 4.430743 0.077331 0.014522 0.800000 1.941313 706.91 0.0 14.35 ... 213.868576 3.732711 451.420028 670.692760 937.200000 11.715400 10.098606 -19.459602 -17.741846 1.717755
std 319.422742 316.312926 0.171308 0.002990 0.032473 0.447214 2.737242 0.00 0.0 0.00 ... 0.671383 0.011718 145.731638 157.730799 333.350716 0.139116 0.499827 0.546421 0.920608 0.556157
min 4184.846680 4433.164364 4.314873 0.075309 0.000000 0.000000 0.203016 706.91 0.0 14.35 ... 212.719898 3.712663 287.372160 468.374930 538.000000 11.578000 9.280000 -20.288746 -19.064381 1.160528
25% 4405.211170 4945.885136 4.334079 0.075644 0.000000 1.000000 0.241837 706.91 0.0 14.35 ... 213.896606 3.733200 323.997500 564.564290 778.000000 11.628000 10.032284 -19.646400 -18.197712 1.224365
50% 4472.135370 5034.595634 4.379168 0.076431 0.000000 1.000000 0.442543 706.91 0.0 14.35 ... 214.123053 3.737152 464.002870 675.292640 801.000000 11.715000 10.287772 -19.358240 -17.523731 1.625940
75% 4882.038300 5201.404801 4.393863 0.076687 0.000000 1.000000 2.215381 706.91 0.0 14.35 ... 214.155222 3.737714 553.747930 798.764550 1284.000000 11.715000 10.293238 -19.149672 -17.251831 2.183374
max 4924.613530 5202.238398 4.731734 0.082584 0.072612 1.000000 6.603787 706.91 0.0 14.35 ... 214.448100 3.742825 627.979680 846.467390 1285.000000 11.941000 10.599734 -18.854952 -16.671577 2.394569

8 rows × 26 columns

Covariance

Or even calculate the covariance for the columns


In [60]:
RS1.cov()


Out[60]:
index RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal ... ur_col MhiMass GroupID GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType RArad DECrad
index 2.181596e+06 -2.298427e+04 5.586957e+03 7.772780e+05 1.931854e+01 6.154578e+08 -1.441836e+01 -2.628546e+01 8.913818e+00 8.106301e+05 ... 2.619069e+01 -2.665743e+11 1.311627e+06 -8.450746e+02 -8.782327e+00 -1.349381e+04 -8.693728e+01 5.784895e+01 -4.011512e+02 9.751079e+01
RA -2.298427e+04 1.089537e+03 -3.420840e+01 2.757614e+03 -2.955545e-01 -6.679839e+06 1.474443e-01 -2.572352e-01 1.350572e-01 1.508786e+03 ... -4.593881e-02 6.882014e+08 -1.330206e+04 1.396574e+01 1.476375e-01 2.422790e+02 1.521332e+00 -8.423405e-01 1.901602e+01 -5.970492e-01
DEC 5.586957e+03 -3.420840e+01 2.319493e+02 -2.092093e+03 1.229002e-02 3.753073e+06 -2.776626e-02 2.496057e-01 -7.613347e-02 -1.448510e+03 ... -7.458507e-04 -5.427480e+08 3.905507e+03 -6.081837e+00 -1.928684e-02 -8.218754e+01 -4.941584e-01 2.200978e-01 -5.970492e-01 4.048280e+00
CZobs 7.772780e+05 2.757614e+03 -2.092093e+03 1.654774e+06 -2.310160e+01 1.476149e+08 9.432305e+00 -4.304231e+00 -1.158965e+00 1.575706e+06 ... 3.806679e+01 -1.371571e+11 4.454930e+05 -2.168341e+02 -9.466808e-01 -2.640655e+03 1.055219e+01 -1.348847e-01 4.812945e+01 -3.651391e+01
Mr 1.931854e+01 -2.955545e-01 1.229002e-02 -2.310160e+01 3.456844e-01 2.438294e+03 -7.860392e-02 -4.982106e-03 -2.889540e-02 -2.440613e+01 ... -7.208798e-02 -4.127290e+08 1.591201e+01 6.463370e-03 -1.080272e-03 -1.447294e+00 -1.095948e-01 -4.498804e-02 -5.158399e-03 2.145012e-04
HaloID 6.154578e+08 -6.679839e+06 3.753073e+06 1.476149e+08 2.438294e+03 2.304506e+11 -2.017190e+03 2.410209e+02 -4.727293e+01 1.623230e+08 ... 1.829660e+03 -5.093372e+13 3.804580e+08 -2.043365e+05 -1.349997e+03 -3.194161e+06 -1.997173e+04 1.213782e+04 -1.165852e+05 6.550347e+04
logMHalo -1.441836e+01 1.474443e-01 -2.776626e-02 9.432305e+00 -7.860392e-02 -2.017190e+03 4.046042e-02 3.145101e-02 -1.402054e-02 9.859384e+00 ... 1.709318e-02 8.438130e+07 -1.129491e+01 6.450592e-02 1.211647e-03 1.595945e+00 3.725773e-02 -9.509890e-03 2.573389e-03 -4.846127e-04
NGalH -2.628546e+01 -2.572352e-01 2.496057e-01 -4.304231e+00 -4.982106e-03 2.410209e+02 3.145101e-02 2.507830e-01 -1.088891e-01 -7.419269e+00 ... 5.959124e-03 -1.206168e+07 -1.685324e+01 2.462191e-01 4.749852e-03 6.838984e+00 6.283496e-02 -9.178099e-02 -4.489601e-03 4.356440e-03
FLAG 8.913818e+00 1.350572e-01 -7.613347e-02 -1.158965e+00 -2.889540e-02 -4.727293e+01 -1.402054e-02 -1.088891e-01 1.065840e-01 -2.967087e-01 ... 2.494139e-03 6.993297e+07 6.058182e+00 -1.120744e-01 -2.192924e-03 -3.117333e+00 -2.805117e-02 7.410140e-02 2.357192e-03 -1.328780e-03
CZreal 8.106301e+05 1.508786e+03 -1.448510e+03 1.575706e+06 -2.440613e+01 1.623230e+08 9.859384e+00 -7.419269e+00 -2.967087e-01 1.536303e+06 ... 3.694438e+01 -1.332002e+11 4.676088e+05 -1.325822e+02 -2.442582e-01 -2.289977e+03 1.287176e+01 -6.949726e-01 2.633328e+01 -2.528127e+01
DISTC -2.850860e-01 -2.433786e-02 7.560089e-04 7.343221e-01 2.474714e-03 1.727109e+02 1.626163e-03 1.014831e-02 -9.851798e-03 4.615477e-01 ... -3.733074e-04 -6.462987e+06 -2.776427e-01 9.311207e-03 3.099434e-04 2.935209e-01 2.728544e-03 -6.825898e-03 -4.247758e-04 1.319484e-05
Vptot -8.204723e+04 2.212872e+03 -6.798065e+02 4.119205e+04 -7.070875e-01 -2.892913e+07 1.618646e+00 1.535922e+00 -1.208064e+00 4.957065e+04 ... 1.543374e+00 -2.023708e+10 -5.143447e+04 1.260621e+02 1.656184e+00 1.684616e+03 1.329682e+01 -8.997785e+00 3.862190e+01 -1.186486e+01
Vptang -9.014971e+04 2.843134e+03 -9.531855e+02 6.529761e+04 -2.970740e-01 -3.435219e+07 1.420597e+00 1.500098e+00 -9.836101e-01 6.382373e+04 ... 1.965117e+00 -1.886259e+10 -5.658740e+04 8.171065e+01 1.294948e+00 1.469719e+03 1.129570e+01 -8.118135e+00 4.962205e+01 -1.663622e+01
logMstar -2.100861e+01 3.433979e-01 -9.543213e-02 1.807609e+01 -1.686256e-01 -4.153282e+03 6.186612e-02 -3.336331e-02 6.182902e-02 1.940680e+01 ... 3.543189e-02 2.202997e+08 -1.728105e+01 1.834790e-02 3.831666e-04 -4.884735e-02 4.254488e-02 4.604768e-02 5.993424e-03 -1.665605e-03
rmag 9.695980e+00 -4.198897e-02 -4.127163e-02 -2.869988e+01 3.524043e-01 -5.526142e+02 -7.891440e-02 -2.965146e-03 -3.047496e-02 -2.973787e+01 ... -7.865790e-02 -3.910549e+08 1.023456e+01 6.105523e-03 -1.189317e-03 -1.444331e+00 -1.109829e-01 -4.856009e-02 -7.328458e-04 -7.203258e-04
umag 3.588669e+01 -8.792803e-02 -4.201727e-02 9.366916e+00 2.803163e-01 1.277058e+03 -6.182122e-02 2.993990e-03 -2.798082e-02 7.206523e+00 ... 1.263307e-01 -1.510898e+09 2.417838e+01 -7.343608e-03 -1.640577e-03 -1.566172e+00 -8.919846e-02 -4.200461e-02 -1.534634e-03 -7.333396e-04
FSMGR -3.949605e+01 -2.077870e-02 -3.316090e-01 -2.043116e+01 1.039979e-01 -8.796008e+03 -1.966349e-02 1.519587e-03 -8.545597e-03 -1.669011e+01 ... -3.078101e-01 2.385122e+09 -2.446775e+01 3.987326e-01 2.861589e-03 2.970075e+00 -8.278981e-03 -2.637713e-02 -3.626567e-04 -5.787669e-03
ur_col 2.619069e+01 -4.593881e-02 -7.458507e-04 3.806679e+01 -7.208798e-02 1.829660e+03 1.709318e-02 5.959124e-03 2.494139e-03 3.694438e+01 ... 2.049886e-01 -1.119843e+09 1.394381e+01 -1.344921e-02 -4.512595e-04 -1.218404e-01 2.178444e-02 6.555485e-03 -8.017834e-04 -1.301755e-05
MhiMass -2.665743e+11 6.882014e+08 -5.427480e+08 -1.371571e+11 -4.127290e+08 -5.093372e+13 8.438130e+07 -1.206168e+07 6.993297e+07 -1.332002e+11 ... -1.119843e+09 1.566881e+19 -1.756067e+11 -1.745911e+07 4.917855e+06 2.866463e+09 1.414368e+08 7.499084e+07 1.201138e+07 -9.472739e+06
GroupID 1.311627e+06 -1.330206e+04 3.905507e+03 4.454930e+05 1.591201e+01 3.804580e+08 -1.129491e+01 -1.685324e+01 6.058182e+00 4.676088e+05 ... 1.394381e+01 -1.756067e+11 7.965887e+05 -6.626612e+02 -6.720297e+00 -9.953393e+03 -6.641786e+01 4.325604e+01 -2.321647e+02 6.816395e+01
GroupNGals -8.450746e+02 1.396574e+01 -6.081837e+00 -2.168341e+02 6.463370e-03 -2.043365e+05 6.450592e-02 2.462191e-01 -1.120744e-01 -1.325822e+02 ... -1.344921e-02 -1.745911e+07 -6.626612e+02 3.722428e+01 1.700104e-01 2.560579e+02 1.309322e+00 -6.030891e-01 2.437481e-01 -1.061481e-01
GroupRproj -8.782327e+00 1.476375e-01 -1.928684e-02 -9.466808e-01 -1.080272e-03 -1.349997e+03 1.211647e-03 4.749852e-03 -2.192924e-03 -2.442582e-01 ... -4.512595e-04 4.917855e+06 -6.720297e+00 1.700104e-01 2.613553e-03 2.293824e+00 1.660395e-02 -1.217465e-02 2.576760e-03 -3.366188e-04
GroupCZdisp -1.349381e+04 2.422790e+02 -8.218754e+01 -2.640655e+03 -1.447294e+00 -3.194161e+06 1.595945e+00 6.838984e+00 -3.117333e+00 -2.289977e+03 ... -1.218404e-01 2.866463e+09 -9.953393e+03 2.560579e+02 2.293824e+00 3.399523e+03 1.823995e+01 -1.241124e+01 4.228566e+00 -1.434443e+00
GroupLogMHalo -8.693728e+01 1.521332e+00 -4.941584e-01 1.055219e+01 -1.095948e-01 -1.997173e+04 3.725773e-02 6.283496e-02 -2.805117e-02 1.287176e+01 ... 2.178444e-02 1.414368e+08 -6.641786e+01 1.309322e+00 1.660395e-02 1.823995e+01 1.792074e-01 -1.010384e-01 2.655225e-02 -8.624691e-03
GroupGalType 5.784895e+01 -8.423405e-01 2.200978e-01 -1.348847e-01 -4.498804e-02 1.213782e+04 -9.509890e-03 -9.178099e-02 7.410140e-02 -6.949726e-01 ... 6.555485e-03 7.499084e+07 4.325604e+01 -6.030891e-01 -1.217465e-02 -1.241124e+01 -1.010384e-01 1.698882e-01 -1.470161e-02 3.841431e-03
RArad -4.011512e+02 1.901602e+01 -5.970492e-01 4.812945e+01 -5.158399e-03 -1.165852e+05 2.573389e-03 -4.489601e-03 2.357192e-03 2.633328e+01 ... -8.017834e-04 1.201138e+07 -2.321647e+02 2.437481e-01 2.576760e-03 4.228566e+00 2.655225e-02 -1.470161e-02 3.318921e-01 -1.042047e-02
DECrad 9.751079e+01 -5.970492e-01 4.048280e+00 -3.651391e+01 2.145012e-04 6.550347e+04 -4.846127e-04 4.356440e-03 -1.328780e-03 -2.528127e+01 ... -1.301755e-05 -9.472739e+06 6.816395e+01 -1.061481e-01 -3.366188e-04 -1.434443e+00 -8.624691e-03 3.841431e-03 -1.042047e-02 7.065581e-02

27 rows × 27 columns

Correlation

The corr method provides the correlation between columns


In [61]:
RS1.corr()


Out[61]:
index RA DEC CZobs Mr HaloID logMHalo NGalH FLAG CZreal ... ur_col MhiMass GroupID GroupNGals GroupRproj GroupCZdisp GroupLogMHalo GroupGalType RArad DECrad
index 1.000000 -0.471436 0.248366 0.409091 0.022246 0.868005 -0.048530 -0.035537 0.018485 0.442789 ... 0.039165 -0.045595 0.994961 -0.093777 -0.116307 -0.156689 -0.139040 0.095023 -0.471436 0.248366
RA -0.471436 1.000000 -0.068048 0.064945 -0.015229 -0.421557 0.022207 -0.015562 0.012533 0.036878 ... -0.003074 0.005267 -0.451524 0.069347 0.087490 0.125888 0.108874 -0.061913 1.000000 -0.068048
DEC 0.248366 -0.068048 1.000000 -0.106786 0.001373 0.513336 -0.009064 0.032727 -0.015312 -0.076734 ... -0.000108 -0.009003 0.287319 -0.065452 -0.024771 -0.092555 -0.076646 0.035062 -0.068048 1.000000
CZobs 0.409091 0.064945 -0.106786 1.000000 -0.030544 0.239041 0.036453 -0.006682 -0.002760 0.988251 ... 0.065360 -0.026936 0.388020 -0.027628 -0.014395 -0.035207 0.019377 -0.000254 0.064945 -0.106786
Mr 0.022246 -0.015229 0.001373 -0.030544 1.000000 0.008639 -0.664644 -0.016921 -0.150537 -0.033490 ... -0.270806 -0.177340 0.030323 0.001802 -0.035940 -0.042219 -0.440324 -0.185642 -0.015229 0.001373
HaloID 0.868005 -0.421557 0.513336 0.239041 0.008639 1.000000 -0.020890 0.001003 -0.000302 0.272805 ... 0.008418 -0.026804 0.887975 -0.069766 -0.055008 -0.114119 -0.098276 0.061344 -0.421557 0.513336
logMHalo -0.048530 0.022207 -0.009064 0.036453 -0.664644 -0.020890 1.000000 0.312227 -0.213503 0.039545 ... 0.187691 0.105977 -0.062914 0.052562 0.117827 0.136080 0.437545 -0.114704 0.022207 -0.009064
NGalH -0.035537 -0.015562 0.032727 -0.006682 -0.016921 0.001003 0.312227 1.000000 -0.666023 -0.011953 ... 0.026283 -0.006085 -0.037707 0.080586 0.185531 0.234225 0.296397 -0.444654 -0.015562 0.032727
FLAG 0.018485 0.012533 -0.015312 -0.002760 -0.150537 -0.000302 -0.213503 -0.666023 1.000000 -0.000733 ... 0.016874 0.054115 0.020791 -0.056266 -0.131390 -0.163768 -0.202968 0.550679 0.012533 -0.015312
CZreal 0.442789 0.036878 -0.076734 0.988251 -0.033490 0.272805 0.039545 -0.011953 -0.000733 1.000000 ... 0.065833 -0.027149 0.422695 -0.017532 -0.003855 -0.031687 0.024531 -0.001360 0.036878 -0.076734
DISTC -0.005257 -0.020082 0.001352 0.015548 0.114639 0.009799 0.220190 0.551942 -0.821899 0.010142 ... -0.022457 -0.044470 -0.008473 0.041566 0.165126 0.137113 0.175550 -0.451053 -0.020082 0.001352
Vptot -0.275506 0.332498 -0.221382 0.158817 -0.005965 -0.298882 0.039911 0.015212 -0.018353 0.198353 ... 0.016907 -0.025356 -0.285819 0.102477 0.160674 0.143300 0.155784 -0.108270 0.332498 -0.221382
Vptang -0.305539 0.431187 -0.313307 0.254108 -0.002529 -0.358225 0.035355 0.014995 -0.015082 0.257771 ... 0.021728 -0.023855 -0.317389 0.067043 0.126802 0.126187 0.133575 -0.098597 0.431187 -0.313307
logMstar -0.030929 0.022622 -0.013625 0.030555 -0.623644 -0.018813 0.668791 -0.144868 0.411812 0.034046 ... 0.170170 0.121018 -0.042102 0.006539 0.016298 -0.001822 0.218536 0.242929 0.022622 -0.013625
rmag 0.010307 -0.001997 -0.004255 -0.035031 0.941120 -0.001807 -0.616006 -0.009297 -0.146569 -0.037672 ... -0.272786 -0.155118 0.018005 0.001571 -0.036528 -0.038896 -0.411644 -0.184987 -0.001997 -0.004255
umag 0.036088 -0.003957 -0.004098 0.010815 0.708145 0.003951 -0.456495 0.008880 -0.127300 0.008636 ... 0.414436 -0.566932 0.040237 -0.001788 -0.047664 -0.039897 -0.312963 -0.151366 -0.003957 -0.004098
FSMGR -0.021495 -0.000506 -0.017502 -0.012767 0.142185 -0.014729 -0.078580 0.002439 -0.021041 -0.010824 ... -0.546496 0.484352 -0.022037 0.052534 0.044995 0.040947 -0.015721 -0.051442 -0.000506 -0.017502
ur_col 0.039165 -0.003074 -0.000108 0.065360 -0.270806 0.008418 0.187691 0.026283 0.016874 0.065833 ... 1.000000 -0.624848 0.034506 -0.004869 -0.019496 -0.004615 0.113659 0.035128 -0.003074 -0.000108
MhiMass -0.045595 0.005267 -0.009003 -0.026936 -0.177340 -0.026804 0.105977 -0.006085 0.054115 -0.027149 ... -0.624848 1.000000 -0.049706 -0.000723 0.024302 0.012420 0.084405 0.045963 0.005267 -0.009003
GroupID 0.994961 -0.451524 0.287319 0.388020 0.030323 0.887975 -0.062914 -0.037707 0.020791 0.422695 ... 0.034506 -0.049706 1.000000 -0.121692 -0.147284 -0.191269 -0.175788 0.117584 -0.451524 0.287319
GroupNGals -0.093777 0.069347 -0.065452 -0.027628 0.001802 -0.069766 0.052562 0.080586 -0.056266 -0.017532 ... -0.004869 -0.000723 -0.121692 1.000000 0.545063 0.719807 0.506939 -0.239821 0.069347 -0.065452
GroupRproj -0.116307 0.087490 -0.024771 -0.014395 -0.035940 -0.055008 0.117827 0.185531 -0.131390 -0.003855 ... -0.019496 0.024302 -0.147284 0.545063 1.000000 0.769547 0.767216 -0.577775 0.087490 -0.024771
GroupCZdisp -0.156689 0.125888 -0.092555 -0.035207 -0.042219 -0.114119 0.136080 0.234225 -0.163768 -0.031687 ... -0.004615 0.012420 -0.191269 0.719807 0.769547 1.000000 0.738987 -0.516446 0.125888 -0.092555
GroupLogMHalo -0.139040 0.108874 -0.076646 0.019377 -0.440324 -0.098276 0.437545 0.296397 -0.202968 0.024531 ... 0.113659 0.084405 -0.175788 0.506939 0.767216 0.738987 1.000000 -0.579065 0.108874 -0.076646
GroupGalType 0.095023 -0.061913 0.035062 -0.000254 -0.185642 0.061344 -0.114704 -0.444654 0.550679 -0.001360 ... 0.035128 0.045963 0.117584 -0.239821 -0.577775 -0.516446 -0.579065 1.000000 -0.061913 0.035062
RArad -0.471436 1.000000 -0.068048 0.064945 -0.015229 -0.421557 0.022207 -0.015562 0.012533 0.036878 ... -0.003074 0.005267 -0.451524 0.069347 0.087490 0.125888 0.108874 -0.061913 1.000000 -0.068048
DECrad 0.248366 -0.068048 1.000000 -0.106786 0.001373 0.513336 -0.009064 0.032727 -0.015312 -0.076734 ... -0.000108 -0.009003 0.287319 -0.065452 -0.024771 -0.092555 -0.076646 0.035062 -0.068048 1.000000

27 rows × 27 columns

Handling Missing Values

You can handle missing values in DataFrames without a problem.

Check "Handling of missing values" to learn more about this.

Drop missing values

For this example, we'll use a new DataFrame


In [63]:
df = pd.DataFrame({'int_col' : [1,2,6,8,-1], 'float_col' : [0.1, 0.2,0.2,10.1,None], 'str_col' : ['a','b',None,'c','a']})
df


Out[63]:
float_col int_col str_col
0 0.1 1 a
1 0.2 2 b
2 0.2 6 None
3 10.1 8 c
4 NaN -1 a

We can use the dropna function to drop NaN numbers


In [66]:
df.dropna()


Out[66]:
float_col int_col str_col
0 0.1 1 a
1 0.2 2 b
3 10.1 8 c

Fill missing values

The fillna method can be used to fill missing data (NaN). This example will replae the missing values with the mean of the available values.


In [67]:
df3 = df.copy()

mean = df3['float_col'].mean()

df3


Out[67]:
float_col int_col str_col
0 0.1 1 a
1 0.2 2 b
2 0.2 6 None
3 10.1 8 c
4 NaN -1 a

In [68]:
df3['float_col'].fillna(mean)


Out[68]:
0     0.10
1     0.20
2     0.20
3    10.10
4     2.65
Name: float_col, dtype: float64

In [69]:
df3


Out[69]:
float_col int_col str_col
0 0.1 1 a
1 0.2 2 b
2 0.2 6 None
3 10.1 8 c
4 NaN -1 a

In [ ]: