In [1]:
# pandas is commonly imported as `pd`
import pandas as pd
import numpy as np
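The simplest pandas data structure is the Series: a one-dimensional array of values with an associated index of labels. If you don't supply an index, pandas uses the integers 0 through N-1: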
In [2]:
s = pd.Series([4, 7, -5, 3])
s
Out[2]:
In [3]:
s.index
Out[3]:
In [4]:
s.values
Out[4]:
In [5]:
s[1:3]
Out[5]:
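With this default integer index, slicing behaves positionally, just like a numpy array: s[1:3] returns the second and third elements.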
In [6]:
s2 = s**2
s2
Out[6]:
In [7]:
s2+s
Out[7]:
In [8]:
# numpy functions work directly on Series
print(np.sum(s))
print(np.mean(s))
print(np.std(s))
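Series also provide these reductions as methods. One caveat worth a quick sketch: s.std() defaults to the sample standard deviation (ddof=1), while np.std defaults to the population standard deviation (ddof=0), so the two can disagree.
print(s.sum(), s.mean())   # same results as the numpy calls above
print(s.std())             # sample standard deviation, ddof=1
print(np.std(s))           # population standard deviation, ddof=0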
In [9]:
s3 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
s3
Out[9]:
In [10]:
# !!! watch what happens when the indexes don't match
s2+s3
Out[10]:
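This is pandas index alignment at work: when two Series are added, values are matched by index label rather than by position. s2 is indexed 0 through 3 while s3 is indexed by letters, so no labels match and every element of the result is NaN. When the labels do line up, as with s3 and s4 below, the values are added pairwise by label.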
In [11]:
s4 = pd.Series({'d':10, 'b':12, 'a':3, 'c':9})
s4
Out[11]:
In [12]:
s3+s4
Out[12]:
The most important feature pandas gives us access to is the DataFrame. DataFrames are two-dimensional structures that you can think of very much like a spreadsheet with named columns and rows. In fact, pandas can read CSV and Excel files directly.
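To make the spreadsheet analogy concrete, here is a minimal sketch with toy values (not from our dataset) that builds a DataFrame from a dictionary mapping column names to columns of values:
# Each dictionary key becomes a column name, each list a column of values
pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1, 2, 3]})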
First, let's read a CSV containing population information for the states of the USA.
In [13]:
# Create a DataFrame, `df`, from the csv located in `data/population.csv`
df = pd.read_csv('../../data/population.csv')
To see what our data looks like, we can use df.head(n) to see the first n rows, with n=5 the default:
In [14]:
df.head()
Out[14]:
We see that for each state we have two IDs and then six columns of years; in this case each column holds the population of that state in the given year.
We can access columns by referencing the column name, just like a Python dictionary:
In [15]:
df['2010'].head()
Out[15]:
We can get multiple columns at once by passing a list of column names:
In [16]:
df[['2010', '2011']].head()
Out[16]:
And then we can access groups of rows using a range of row IDs:
In [17]:
df[5:10]
Out[17]:
Accessing individual rows is different. You can't just use df[i]. Instead you need to use df.loc:
In [18]:
df.loc[20]
Out[18]:
In [19]:
type(df.loc[20])
Out[19]:
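df.loc selects rows by index label; here the labels just happen to be the default integers. For purely positional access pandas also provides df.iloc, sketched here:
df.iloc[20]    # 21st row, by position
df.iloc[0:3]   # first three rows, by position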
Pandas gives us an easy way to get summary statistics of our data:
In [20]:
df.describe()
Out[20]:
One thing you might notice is that describe only lists the numeric columns, and Id2 is included even though it would be better to treat it as a string. pandas tries to guess the datatype of each column, and in this case all of the values in Id2 are integers, so the column gets treated as an integer.
We can see the datatype details:
In [21]:
df.dtypes
Out[21]:
We can cast Id2 to a string using astype and then overwrite the original column:
In [22]:
df['Id2'] = df['Id2'].astype(str)
df.dtypes
Out[22]:
Or we could have specified the data type when we originally read the CSV:
In [23]:
# Pass a dictionary to the dtype parameter with `'column': dtype`
df = pd.read_csv('../../data/population.csv', dtype={'Id2': str})
df.dtypes
Out[23]:
Column arithmetic works element-wise, just like Series arithmetic. For example, each state's population change from 2014 to 2015:
In [24]:
df['2015'] - df['2014']
Out[24]:
Rather than continually computing that value, we can save it to a new column in the DataFrame. Let's make a new column called 'pop_change':
In [25]:
df['pop_change'] = df['2015'] - df['2014']
df.head()
Out[25]:
Just like numpy, we can also do element-wise comparisons. For example, to find out whether a state's population decreased over that year:
In [26]:
df['pop_change'] < 0
Out[26]:
The True values are the states with negative population change (decrease). But that boolean array by itself isn't very useful. We can use that array as an index to filter our DataFrame:
In [27]:
df[df['pop_change'] < 0]
Out[27]:
Now we have a subset of the DataFrame with only the decreasing populations.
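Boolean masks can also be combined with & (and) and | (or), wrapping each condition in parentheses. A sketch, with an arbitrary population cutoff:
# States that shrank despite having over a million residents in 2010 (threshold is arbitrary)
df[(df['pop_change'] < 0) & (df['2010'] > 1000000)]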
We can also compute aggregate statistics over the whole DataFrame:
In [28]:
# numeric_only=True skips the non-numeric columns
print(df.mean(numeric_only=True))
print(df.std(numeric_only=True))
print(df.max(numeric_only=True))
print(df.sum(numeric_only=True))
In [29]:
# or over a subset of columns
print(df[['2010','2011']].mean())
pandas also makes it easy to combine datasets. Let's read a second CSV containing housing data for the same states:
In [30]:
housing = pd.read_csv('../../data/housing.csv', dtype={'Id2': str})
housing.head()
Out[30]:
Since the Id column is shared, it can easily be merged with our original DataFrame:
In [31]:
merged = df.merge(housing, on='Id')
merged.head()
Out[31]:
Since the column names are all shared, pandas appends '_x' and '_y' to columns from the left and right DataFrames, respectively.
This isn't very user-friendly, so we can use the suffixes parameter to specify custom labels to append. Furthermore, we can also include Id2 and Geography in on so those columns aren't duplicated:
In [32]:
merged = df.merge(housing, on=['Id', 'Id2', 'Geography'], suffixes=('_population', '_housing'))
merged.head()
Out[32]:
We can also see that the merge lost one row: the housing dataset doesn't contain data for Puerto Rico.
In [33]:
print('Population:', len(df), 'Merged:', len(merged))
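merge defaults to an inner join, keeping only rows whose keys appear in both DataFrames. If we instead wanted to keep Puerto Rico with missing housing values, passing how='outer' would keep unmatched rows, for example:
# An outer merge keeps rows from both sides, filling gaps with NaN
outer = df.merge(housing, on=['Id', 'Id2', 'Geography'],
                 suffixes=('_population', '_housing'), how='outer')
len(outer)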
Now we can do something involving both datasets. For example, finding the ratio of people to houses:
In [34]:
merged['ratio'] = merged['2015_population']/merged['2015_housing']
merged['ratio'].head()
Out[34]:
Now let's use sort_values to view the states with the lowest ratio of people to houses, looking at just the state name and ratio columns:
In [35]:
# Sort the data by ratio
merged_sorted = merged.sort_values(by=['ratio'])
# Just get the `Geography` and `ratio` columns
merged_subset = merged_sorted[['Geography', 'ratio']]
# View the first 5
merged_subset.head()
Out[35]:
And now to view the top 5, use ascending=False:
In [36]:
merged.sort_values(by=['ratio'], ascending=False)[['Geography', 'ratio']].head()
Out[36]:
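As an aside, nlargest (and its counterpart nsmallest) is a shortcut for this sort-then-head pattern:
merged.nlargest(5, 'ratio')[['Geography', 'ratio']]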
Sometimes you'd like to aggregate groups of similar rows. For instance, let's compare the change in housing stock between states with decreasing population and states with increasing population.
First, let's make a column for the housing change, and a boolean column for whether the population is increasing.
In [37]:
merged['housing_change'] = merged['2015_housing'] - merged['2014_housing']
merged['pop_change_increasing'] = merged['pop_change'] > 0
Then use groupby to group our rows by whether their population increased or decreased from 2014 to 2015:
In [38]:
grouped = merged.groupby('pop_change_increasing')
Then we can run aggregate functions on our groups, or use describe to get the same summary statistics we computed before:
In [39]:
grouped.describe()['housing_change']
Out[39]:
We can see that the average housing increase for states with decreasing population is lower. But the change in housing for all those states is still positive.
In [40]:
grouped.mean(numeric_only=True)
Out[40]:
In [41]:
grouped.mean(numeric_only=True)['housing_change']
Out[41]:
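If we want specific aggregations rather than the full describe output, agg accepts a list of function names, for example:
# Several aggregates of housing_change per group
grouped['housing_change'].agg(['mean', 'median', 'min', 'max'])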