In [1]:
# Importing modules
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set_context("notebook")
import matplotlib
matplotlib.rc("text", usetex=False)
In [2]:
np.random.seed(1)
np.random.random(10)
Out[2]:
In [3]:
series_1 = pd.Series(np.random.random(10))
series_1
Out[3]:
One can get a NumPy array from the Series by typing:
In [4]:
series_1.values
Out[4]:
One can also get the indices of each element by typing:
In [5]:
series_1.index.values
Out[5]:
One can also have a custom set of indices:
In [6]:
# Alternatively, one could build this list from the string module:
# import string
# alphabet = list(string.ascii_lowercase[:10])
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
alphabet
Out[6]:
In [7]:
series_2 = pd.Series(np.random.random(len(alphabet)), index=alphabet)
series_2
Out[7]:
One can select only a subsample of the Series:
In [8]:
series_1[[0, 1, 2]]
Out[8]:
In [9]:
series_1[[1,3,4]]
Out[9]:
In [10]:
series_2[['a','d','j']]
Out[10]:
You can also perform numerical operations on a Series
In [11]:
series_1**2
Out[11]:
In [12]:
series_1[1]**2
Out[12]:
Or select only the values within some range, e.g. between some value 'x' and 0.8
In [13]:
x = 0.5
series_1[(series_1 >= x) & (series_1 < 0.8)]
Out[13]:
You can apply functions to a Series, and save the result as a new Series
In [14]:
import sys

def exponentials(arr, basis=10.):
    """
    Uses the array `arr` as the exponents for `basis`

    Parameters
    ----------
    arr: numpy array, list, pandas Series; shape (N,)
        array to be used as exponents of `basis`
    basis: int or float, optional (default = 10)
        number used as the basis

    Returns
    -------
    exp_arr: numpy array, pandas Series, or list; shape (N,)
        array of values for `basis`**`arr`
    """
    if isinstance(arr, list):
        exp_arr = [basis**x for x in arr]
        return exp_arr
    elif isinstance(arr, (np.ndarray, pd.Series)):
        exp_arr = basis**arr
        return exp_arr
    else:
        cmd = ">>>> `arr` is not a list, numpy array, or pandas Series"
        cmd += "\n>>>> Please give the correct type of object"
        print(cmd)
        sys.exit(1)
In [15]:
exponentials(series_1[(series_1 >= x) & (series_1 > 0.6)]).values
Out[15]:
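For simple element-wise transformations like this one, Series.apply offers a one-line alternative; here is a minimal sketch reproducing exponentials with the default basis of 10:
In [ ]:
# Apply a lambda to every element (same result as exponentials() with basis 10)
series_1[(series_1 >= x) & (series_1 > 0.6)].apply(lambda val: 10.**val)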
You can also create a Series using a dictionary (we talked about these in Week 4)
In [16]:
labels_arr = ['foo', 'bar', 'baz']
data_arr = [100, 200, 300]
dict_1 = dict(zip(labels_arr, data_arr))
dict_1
Out[16]:
In [17]:
series_3 = pd.Series(dict_1)
series_3
Out[17]:
One of the most useful features of pandas is that it can handle missing data quite easily:
In [18]:
index = ['foo', 'bar', 'baz', 'qux']
series_4 = pd.Series(dict_1, index=index)
series_4
Out[18]:
In [19]:
pd.isnull(series_4)
Out[19]:
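If you would rather discard the missing entries, dropna removes them; a short sketch:
In [ ]:
# Remove the missing entry ('qux') from the Series
series_4.dropna()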
In [20]:
series_3
Out[20]:
In [21]:
series_3 + series_4
Out[21]:
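The addition produces NaN for 'qux', since that label is missing from series_3 and holds NaN in series_4. One way to get a numeric result anyway is to fill the missing values afterwards, e.g. with fillna (the replacement value 0 is just an illustrative choice):
In [ ]:
# Replace the NaN produced by the addition with 0 (illustrative value only)
(series_3 + series_4).fillna(0)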
So a Series is powerful, but the DataFrame is probably what gets used the most.
A DataFrame is a "tabular data structure" containing an ordered collection of columns and rows. Each column can have a different data type.
Row and column operations are treated roughly symmetrically. One can obtain a DataFrame from an ordinary dictionary, or by reading a file with columns and rows.
Creating a DataFrame
In [22]:
data_1 = {'state' : ['VA', 'VA', 'VA', 'MD', 'MD'],
'year' : [2012, 2013, 2014, 2014, 2015],
'popu' : [5.0, 5.1, 5.2, 4.0, 4.1]}
df_1 = pd.DataFrame(data_1)
df_1
Out[22]:
This DataFrame has 5 rows and 3 columns named "popu", "state", and "year".
The way to access a DataFrame is quite similar to that of accessing a Series.
To access a column, one writes the name of the column, as in the following examples:
In [23]:
df_1['popu']
Out[23]:
In [24]:
df_1.popu
Out[24]:
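As mentioned above, each column keeps its own data type, which you can inspect with the dtypes attribute:
In [ ]:
# Data type of each column: 'state' holds strings (object), 'year' integers, 'popu' floats
df_1.dtypes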
One can also handle missing data with DataFrames. As with Series, columns that are not present in the data are filled with NaN:
In [25]:
df_2 = pd.DataFrame(data_1, columns=['year', 'state', 'popu', 'unempl'])
df_2
Out[25]:
In [26]:
df_2['state']
Out[26]:
One can retrieve rows by position using iloc:
In [27]:
df_2.iloc[1:4]
Out[27]:
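The label-based counterpart is .loc. With the default integer index, the labels coincide with the positions, but note that a .loc slice includes the end label:
In [ ]:
# Label-based row selection: rows with labels 1 through 3 (end label included)
df_2.loc[1:3]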
Editing a DataFrame is quite easy to do. One can assign a Series to a column of the DataFrame, and pandas aligns it on the index. If one assigns a list or an array instead, its length must match the length of the DataFrame.
In [28]:
unempl = pd.Series([1.0, 2.0, 10.], index=[1,3,5])
unempl
Out[28]:
In [29]:
df_2['unempl'] = unempl
df_2
Out[29]:
In [30]:
df_2.unempl.isnull()
Out[30]:
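Those missing values can be replaced, for instance with fillna; a short sketch using 0.0 as an illustrative replacement:
In [ ]:
# Replace the missing unemployment values with 0.0 (illustrative only)
df_2['unempl'].fillna(0.0)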
You can also transpose a DataFrame, i.e. swap its rows and columns
In [31]:
df_2.T
Out[31]:
Now, let's say you want to show only the 'year' and 'unempl' columns. You can do it by:
In [32]:
df_2
Out[32]:
In [33]:
df_2[['year', 'unempl']]
Out[33]:
Let's say you only need a subset of the table you have, and you want to drop a column from the DataFrame. You can do that by using the 'drop' option:
In [34]:
df_2
Out[34]:
In [35]:
df_3 = df_2.drop('unempl', axis=1)
df_3
Out[35]:
In [36]:
df_2
Out[36]:
You can also drop certain rows:
In [37]:
df_2
Out[37]:
In [38]:
df_4 = df_2.drop([1,2])
df_4
Out[38]:
Look at this carefully! The DataFrame preserved the same indices as df_2.
If you want to reset the indices, you can do that by:
In [39]:
df_4.reset_index(inplace=True)
df_4
Out[39]:
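Notice that reset_index kept the old index as a new column named 'index'. Passing drop=True discards it instead (we will use this option later with the Gaia data); a minimal sketch starting again from df_2:
In [ ]:
# Same row removal, but the old index is discarded rather than kept as a column
df_2.drop([1, 2]).reset_index(drop=True)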
Pandas is great at reading data tables, CSV files, and other kinds of documents. For the remainder of this notebook, we will be using the Gaia DR2 catalogue.
In [40]:
# Path to online file
url_path = 'http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/GaiaSource_1000172165251650944_1000424567594791808.csv.gz'
# Converting data to DataFrame
gaia_df = pd.read_csv(url_path, compression='gzip')
In [41]:
gaia_df.head()
Out[41]:
You can get the shape of the "gaia_df" DataFrame by typing:
In [42]:
gaia_df.shape
Out[42]:
That means there are 14209 rows and 94 columns.
To get an array of the columns available, one could write:
In [43]:
# np.sort returns a sorted copy of the column names
# (calling .sort() on `.values` would modify the index in place)
np.sort(gaia_df.columns.values)
Out[43]:
Let's say you only want a DataFrame with the columns 'ra', 'dec', 'l', and 'b'.
You do this by using the loc option for the DataFrame:
In [44]:
gaia_df_2 = gaia_df.loc[:,['ra','dec','l','b']]
# Displaying the first 15 lines
gaia_df_2.head(15)
Out[44]:
This selects all of the rows, and only the selected columns in the list.
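loc also accepts a row selection together with the column list; a short sketch (the label slice 0:4 is inclusive of the end label):
In [ ]:
# Rows with labels 0 through 4 (inclusive) and only the 'ra' and 'dec' columns
gaia_df.loc[0:4, ['ra', 'dec']]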
You can also select only a subsample of the rows, as in the following example. Let's say I just want a random subsample of 10% of the sources in the Gaia DR2 catalogue. I can do that by:
In [45]:
import random
random.sample
Out[45]:
In [46]:
# Randomly selecting rows from the DataFrame
# Number of rows to keep (10% of the total)
nrows = len(gaia_df_2) // 10
# Randomly selecting `nrows` rows from `gaia_df_2`
gaia_df_3 = gaia_df_2.sample(nrows)
gaia_df_3.shape
Out[46]:
I'm resetting the indices of this DataFrame
In [47]:
gaia_df_3.reset_index(inplace=True, drop=True)
gaia_df_3
Out[47]:
You can produce plots directly from the DataFrame
In [48]:
title_txt = 'Right Ascension and Declination for Gaia'
gaia_df_3.plot('ra','dec', # Columns to plot
kind='scatter', # Kind of plot. In this case, it's `scatter`
label='Gaia', # Label of the points
title=title_txt, # Title of the figure
color='#4c72b0', # Color of the points
figsize=(12,8)) # Size of the figure
Out[48]:
Or even Scatterplot Matrices:
In [49]:
sns.pairplot(gaia_df_3, plot_kws={'color': '#4c72b0'}, diag_kws={'color': '#4c72b0'})
Out[49]:
In [50]:
sns.jointplot(x='l', y='b', data=gaia_df_3, color='#3c8f40')
Out[50]:
In [51]:
ra_arr = gaia_df.ra.values
dec_arr = gaia_df.dec.values
In [52]:
# Just showing the first 25 elements
np.column_stack((ra_arr, dec_arr))[0:25]
Out[52]:
In [53]:
## Numpy way of finding the stars that meet the criteria
ra_min, ra_max = (102, 104)
dec_min, dec_max = (56.4, 56.7)
# RA criteria
ra_idx = np.where((ra_arr >= ra_min) & (ra_arr <= ra_max))[0]
# Dec criteria
dec_idx = np.where((dec_arr >= dec_min) & (dec_arr <= dec_max))[0]
# Finding intersecting indices that meet both criteria
radec_idx = np.intersect1d(ra_idx, dec_idx)
# Selecting the values from only those indices
ra_new = ra_arr[radec_idx]
dec_new = dec_arr[radec_idx]
# Printing out ra and dec for corresponding indices
print(np.column_stack((ra_new, dec_new)))
This is rather convoluted and long, and one can easily make a mistake by losing track of which arrays one is using!
In Pandas, this is much easier!
In [54]:
gaia_df_4 = gaia_df.loc[(
(gaia_df.ra >= ra_min) & (gaia_df.ra <= ra_max) &
(gaia_df.dec >= dec_min) & (gaia_df.dec <= dec_max))]
gaia_df_4[['ra','dec']]
Out[54]:
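The same selection can also be written with DataFrame.query, which takes the condition as a string and can reference local variables with the '@' prefix; a minimal sketch:
In [ ]:
# Equivalent selection with query(); `@` refers to local Python variables
gaia_df.query('ra >= @ra_min and ra <= @ra_max and dec >= @dec_min and dec <= @dec_max')[['ra', 'dec']]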
Pandas is great for handling data, especially comma-delimited or space-separated data. Pandas is also compatible with many other packages, like seaborn, astropy, NumPy, etc.
We will have another lecture on Pandas that will cover much more advanced aspects of Pandas. Make sure you keep checking the schedule!