General pandas Concepts

``````

In [2]:

import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)

``````
``````

3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2
0.16.2

``````

Now that we’ve covered numpy, the basis for pandas, along with some of the more advanced Python concepts like list comprehensions and lambda functions, let’s jump back to our roadmap.

We’ve covered the general ecosystem and a lot of numpy. Now let’s get our hands dirty with some real data and actually use pandas. I hope you’ve watched the numpy videos that we covered earlier. They may seem academic, but they provide a fantastic foundation for what we’re going to learn now.

I’m going to breeze through a couple of subjects right now. Don’t feel the need to take notes or even try this code yourself. You can if you like, but it’s mainly to introduce you to the power of pandas, not for you to copy.

Pandas is made up of a couple of core types.

We’ve got the Index. The Index holds the labels that we use to query the data in a Series or DataFrame.

``````

In [3]:

pd.Index

``````
``````

Out[3]:

pandas.core.index.Index

``````
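To make the idea of an Index a little more concrete, here is a minimal sketch. The labels and variable names (`idx`, `has_b`, `pos_c`) are made up for illustration. It only shows that an Index is a collection of labels that you can test for membership and look up by position.

```python
import pandas as pd

# Build an Index directly from a list of labels (labels are illustrative).
idx = pd.Index(["a", "b", "c"])

# Membership test: is this label present in the Index?
has_b = "b" in idx

# Position lookup: at which integer position does the label sit?
pos_c = idx.get_loc("c")

print(idx)
print(has_b, pos_c)
```

In practice you rarely build an Index by hand like this. A Series or DataFrame usually creates one for you.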

We’ve got the Series. The Series is like a one-dimensional array in numpy. It has some helper functions and an index that lets us query the data in simple ways.

We can make a simple Series from a numpy array.

``````

In [4]:

pd.Series

``````
``````

Out[4]:

pandas.core.series.Series

``````
``````

In [5]:

series_ex = pd.Series(np.arange(26))
series_ex

``````
``````

Out[5]:

0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
15    15
16    16
17    17
18    18
19    19
20    20
21    21
22    22
23    23
24    24
25    25
dtype: int64

``````

Now that we’ve created it, we can see it has an index, which we just talked about, as well as values. When we print these out, they should look familiar, just like numpy arrays. Now here is where the Series gets powerful.

``````

In [6]:

series_ex.index

``````
``````

Out[6]:

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25],
dtype='int64')

``````
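As a small sketch of the two parts of a Series, the snippet below pulls out the index and the values separately. The names `s`, `labels`, and `data` are just for illustration, and the exact Index subclass you see printed depends on your pandas version (recent versions typically show a `RangeIndex` instead of the `Int64Index` in the output above).

```python
import numpy as np
import pandas as pd

# A short Series so the output stays small.
s = pd.Series(np.arange(5))

labels = s.index   # the labels; the concrete Index type varies by pandas version
data = s.values    # the underlying data as a plain numpy array

print(labels)
print(data)
```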

We can replace the index with our own index. In this example I’ll use the lowercase ASCII letters.

``````

In [9]:

import string
lcase = string.ascii_lowercase
ucase = string.ascii_uppercase
print(lcase, ucase)

``````
``````

abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

``````
``````

In [10]:

lcase = list(lcase)
ucase = list(ucase)
print(lcase)
print(ucase)

``````
``````

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

``````
``````

In [11]:

series_ex.index = lcase

``````
``````

In [12]:

series_ex.index

``````
``````

Out[12]:

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
dtype='object')

``````
``````

In [13]:

series_ex

``````
``````

Out[13]:

a     0
b     1
c     2
d     3
e     4
f     5
g     6
h     7
i     8
j     9
k    10
l    11
m    12
n    13
o    14
p    15
q    16
r    17
s    18
t    19
u    20
v    21
w    22
x    23
y    24
z    25
dtype: int64

``````
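One detail worth knowing when you swap in your own index: the new labels have to match the length of the data. Here is a minimal sketch of that, using a three-element Series and made-up labels. The flag `mismatch_rejected` is only there to record what happened.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3))

# Three values, three labels: this assignment works.
s.index = ["x", "y", "z"]
y_val = s.loc["y"]

# Three values, two labels: pandas refuses with a ValueError.
try:
    s.index = ["only", "two"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(y_val, mismatch_rejected)
```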

Now we can query just like we would with an array. You can think of the Series as an extremely powerful array.

We can query either sections or specific values.

``````

In [14]:

series_ex.loc['d':'k']

``````
``````

Out[14]:

d     3
e     4
f     5
g     6
h     7
i     8
j     9
k    10
dtype: int64

``````
``````

In [15]:

series_ex.loc['f']

``````
``````

Out[15]:

5

``````
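Here is a small sketch of the same kind of query done two ways, by label with `.loc` and by position with `.iloc`. It rebuilds a Series like the one above under the name `s` so the snippet runs on its own. The thing to notice is that a label slice includes both endpoints, while a positional slice leaves out the stop position, just like a Python list.

```python
import string

import numpy as np
import pandas as pd

# Same shape of data as above: 0..25 labelled 'a'..'z'.
s = pd.Series(np.arange(26), index=list(string.ascii_lowercase))

by_label = s.loc["d":"k"]     # label slice: 'd' and 'k' are both included
by_position = s.iloc[3:11]    # positional slice: position 11 is excluded
single = s.loc["f"]           # a single label returns a single value

print(by_label)
print(single)
```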

Now don’t worry about the functions that I’m using. We’re going to go over those in detail - I just wanted to introduce the concept.

We’ve got the DataFrame, which is like a matrix or a collection of Series. It also has an index (or multiple indexes).

``````

In [16]:

pd.DataFrame

``````
``````

Out[16]:

pandas.core.frame.DataFrame

``````

Let’s go ahead and create one. We’ll make it from the lowercase letters, the uppercase letters, and a number range.

``````

In [19]:

letters = pd.DataFrame([lcase, ucase, list(range(26))])
letters

``````
``````

Out[19]:

   0  1  2  3  4  5  6  7  8  9 ...  16  17  18  19  20  21  22  23  24  25
0  a  b  c  d  e  f  g  h  i  j ...   q   r   s   t   u   v   w   x   y   z
1  A  B  C  D  E  F  G  H  I  J ...   Q   R   S   T   U   V   W   X   Y   Z
2  0  1  2  3  4  5  6  7  8  9 ...  16  17  18  19  20  21  22  23  24  25

3 rows × 26 columns

``````

Just like a numpy array, we can transpose it.

``````

In [20]:

letters = letters.transpose()
letters.head()

``````
``````

Out[20]:

   0  1  2
0  a  A  0
1  b  B  1
2  c  C  2
3  d  D  3
4  e  E  4

``````
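To see what the transpose is doing, here is a tiny sketch with only four letters. The names `rows` and `cols` are illustrative. A list of lists gives you one row per inner list, and transposing swaps rows and columns.

```python
import pandas as pd

# Each inner list becomes one ROW of the frame.
rows = pd.DataFrame([
    ["a", "b", "c", "d"],
    ["A", "B", "C", "D"],
    [0, 1, 2, 3],
])

# Transposing swaps rows and columns; rows.T is shorthand for the same thing.
cols = rows.transpose()

print(rows.shape)
print(cols.shape)
print(cols)
```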
``````

In [21]:

letters.columns

``````
``````

Out[21]:

Int64Index([0, 1, 2], dtype='int64')

``````
``````

In [22]:

letters.index

``````
``````

Out[22]:

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25],
dtype='int64')

``````

But now that we have columns as well as an index, we can rename the columns to better describe and query the data.

``````

In [23]:

letters.columns = ['lowercase','uppercase','number']

``````
``````

In [24]:

letters.lowercase

``````
``````

Out[24]:

0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: lowercase, dtype: object

``````
``````

In [25]:

letters['lowercase']

``````
``````

Out[25]:

0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
19    t
20    u
21    v
22    w
23    x
24    y
25    z
Name: lowercase, dtype: object

``````
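As an alternative sketch, you can skip the transpose and rename steps by handing pandas a dict that maps column names to lists. This is not how the frame above was built. The name `letters_alt` is used here so it doesn’t get confused with `letters`. The snippet also shows that attribute access and bracket access return the same column.

```python
import string

import pandas as pd

# Build the frame column by column; the dict keys become the column names.
letters_alt = pd.DataFrame({
    "lowercase": list(string.ascii_lowercase),
    "uppercase": list(string.ascii_uppercase),
    "number": list(range(26)),
})

by_attr = letters_alt.lowercase      # attribute-style access
by_key = letters_alt["lowercase"]    # bracket-style access

print(letters_alt.head())
print(by_attr.equals(by_key))
```

Bracket access is the safer habit. Attribute access only works when the column name is a valid Python identifier and doesn’t clash with an existing DataFrame attribute or method. Building column by column also lets the number column keep an integer dtype, whereas transposing a frame with mixed types generally leaves every column as `object`.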

We can even set up a date range to associate each letter with a date. Now obviously this isn’t too helpful for the alphabet, but this allows you to do some amazing things once you are analyzing real data.

``````

In [26]:

letters.index = pd.date_range('9/1/2012',periods=26)

``````
``````

In [27]:

letters

``````
``````

Out[27]:

           lowercase uppercase number
2012-09-01         a         A      0
2012-09-02         b         B      1
2012-09-03         c         C      2
2012-09-04         d         D      3
2012-09-05         e         E      4
2012-09-06         f         F      5
2012-09-07         g         G      6
2012-09-08         h         H      7
2012-09-09         i         I      8
2012-09-10         j         J      9
2012-09-11         k         K     10
2012-09-12         l         L     11
2012-09-13         m         M     12
2012-09-14         n         N     13
2012-09-15         o         O     14
2012-09-16         p         P     15
2012-09-17         q         Q     16
2012-09-18         r         R     17
2012-09-19         s         S     18
2012-09-20         t         T     19
2012-09-21         u         U     20
2012-09-22         v         V     21
2012-09-23         w         W     22
2012-09-24         x         X     23
2012-09-25         y         Y     24
2012-09-26         z         Z     25

``````
``````

In [28]:

letters['9-10-2012':'9-15-2012']

``````
``````

Out[28]:

           lowercase uppercase number
2012-09-10         j         J      9
2012-09-11         k         K     10
2012-09-12         l         L     11
2012-09-13         m         M     12
2012-09-14         n         N     13
2012-09-15         o         O     14

``````
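Here is the same idea boiled down to a Series, as a minimal sketch. The names `dates`, `s`, and `window` are illustrative, and the slice uses ISO-style date strings with `.loc`. Just like the letter labels earlier, a date slice includes both endpoints.

```python
import pandas as pd

# 26 consecutive days starting on 1 September 2012.
dates = pd.date_range("9/1/2012", periods=26)
s = pd.Series(range(26), index=dates)

# Slice by date strings; both the start and end dates are included.
window = s.loc["2012-09-10":"2012-09-15"]

print(window)
```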

Now if you don’t have any experience with pandas, this is going to seem like a lot! Don’t worry, we’re going to cover everything in the coming videos. I just wanted to give you an introduction to the amazingly expressive power of pandas and Python. We’ve seen the building blocks: the Index, the Series, and the DataFrame.

Now let’s dive deeper into each one.