``````

In :

%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt

``````
``````

3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2
0.16.2

``````

Fundamentally, the DataFrame is just an abstraction, but it provides a ton of useful tools that you’re going to get to see. This video goes over the basic idea of the DataFrame as well as how to create one.

``````

In :

import string
upcase = [x for x in string.ascii_uppercase]
lcase = [x for x in string.ascii_lowercase]

``````
``````

In :

print(upcase[:5], lcase[:5])

``````
``````

['A', 'B', 'C', 'D', 'E'] ['a', 'b', 'c', 'd', 'e']

``````

You can create DataFrames by passing in NumPy arrays, lists of Series, or dictionaries.

``````

In :

pd.DataFrame([upcase, lcase])

``````
``````

Out:

   0  1  2  3  4  5  6  7  8  9  ...  16  17  18  19  20  21  22  23  24  25
0  A  B  C  D  E  F  G  H  I  J  ...   Q   R   S   T   U   V   W   X   Y   Z
1  a  b  c  d  e  f  g  h  i  j  ...   q   r   s   t   u   v   w   x   y   z

2 rows × 26 columns

``````
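
The other inputs mentioned above, NumPy arrays and lists of Series, behave much the same way as the list of lists. Here is a minimal sketch; the names `from_array` and `from_series` are made up for illustration.

```python
import string

import numpy as np
import pandas as pd

upcase = list(string.ascii_uppercase)
lcase = list(string.ascii_lowercase)

# A 2-D NumPy array: each row of the array becomes a row of the DataFrame.
from_array = pd.DataFrame(np.array([upcase, lcase]))

# A list of Series: each Series becomes a row, and the Series index
# supplies the column labels.
from_series = pd.DataFrame([pd.Series(upcase), pd.Series(lcase)])

print(from_array.shape)
print(from_series.shape)
```

In both cases each inner sequence ends up as a row, just as with the list of lists.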

We’ll be covering a lot of different aspects here, but as always we’re going to start with the simple stuff. A simple way to think of a DataFrame is as an Excel table or SQL table: you’ve got columns and rows.

In more specific pandas terms, it's like a more powerful collection of Series. Each column is a Series, and the columns share a common index, so the values in a given row are related to each other.

You can see that if we just pass in a list of lists, it treats each inner list as a row. Of course, if that’s an issue we can just transpose it and we’ll get them as columns.

``````

In :

pd.DataFrame([upcase, lcase]).T

``````
``````

Out:

    0  1
0   A  a
1   B  b
2   C  c
3   D  d
4   E  e
5   F  f
6   G  g
7   H  h
8   I  i
9   J  j
10  K  k
11  L  l
12  M  m
13  N  n
14  O  o
15  P  p
16  Q  q
17  R  r
18  S  s
19  T  t
20  U  u
21  V  v
22  W  w
23  X  x
24  Y  y
25  Z  z

``````

This should be familiar because it’s the same way that we transpose ndarrays in NumPy.
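
As a quick check of that claim, the sketch below transposes the same data once as an ndarray and once as a DataFrame and compares the results. The variable names are illustrative.

```python
import string

import numpy as np
import pandas as pd

upcase = list(string.ascii_uppercase)
lcase = list(string.ascii_lowercase)

arr = np.array([upcase, lcase])
df = pd.DataFrame([upcase, lcase])

# .T swaps rows and columns for both objects.
print(arr.shape, arr.T.shape)
print(df.shape, df.T.shape)

# The transposed DataFrame holds the same values as the transposed array.
same_values = bool((df.T.to_numpy() == arr.T).all())
print(same_values)
```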

Of course, we can also specify them as explicit columns by passing in a dictionary where the keys are the column names and the values are lists holding each column's entries (one per row).

``````

In :

letters = pd.DataFrame({'lowercase':lcase, 'uppercase':upcase})
letters.head()

``````
``````

Out:

    lowercase  uppercase
0           a          A
1           b          B
2           c          C
3           d          D
4           e          E

``````
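
If the column order matters, one option is the `columns` argument of the constructor. This is a small sketch rather than the only way to do it; `data` and `reordered` are names introduced here.

```python
import string

import pandas as pd

upcase = list(string.ascii_uppercase)
lcase = list(string.ascii_lowercase)

# Dictionary keys become column names; each list supplies that column's values.
data = {'lowercase': lcase, 'uppercase': upcase}

# `columns` picks which keys to use and in what order.
reordered = pd.DataFrame(data, columns=['uppercase', 'lowercase'])

print(list(reordered.columns))
print(reordered.shape)
```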

Now you’ll see that if these lengths are not the same, we’ll get a ValueError, so it’s worth checking that your data is clean before importing it or using it to create a DataFrame.

``````

In :

pd.DataFrame({'lowercase':lcase + [0], 'uppercase':upcase})

``````
``````

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-7de5286cc816> in <module>()
----> 1 pd.DataFrame({'lowercase':lcase + [0], 'uppercase':upcase})

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
212                                  dtype=dtype, copy=copy)
213         elif isinstance(data, dict):
--> 214             mgr = self._init_dict(data, index, columns, dtype=dtype)
216             import numpy.ma.mrecords as mrecords

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
339
340         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 341                               dtype=dtype)
342
343     def _init_ndarray(self, values, index, columns, dtype=None,

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
4796     # figure out the index, if necessary
4797     if index is None:
-> 4798         index = extract_index(arrays)
4799     else:
4800         index = _ensure_index(index)

/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas/core/frame.py in extract_index(data)
4844             lengths = list(set(raw_lengths))
4845             if len(lengths) > 1:
-> 4846                 raise ValueError('arrays must all be same length')
4847
4848             if have_dicts:

ValueError: arrays must all be same length

``````
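
One way to avoid the error above is to compare the lengths before building the DataFrame. The helper `column_lengths` below is hypothetical, written just for this sketch, and `'extra'` is an arbitrary stand-in for a stray value.

```python
import string

import pandas as pd

upcase = list(string.ascii_uppercase)
lcase = list(string.ascii_lowercase)


def column_lengths(data):
    """Map each column name to the number of values it holds."""
    return {name: len(values) for name, values in data.items()}


good = {'lowercase': lcase, 'uppercase': upcase}
bad = {'lowercase': lcase + ['extra'], 'uppercase': upcase}

for data in (good, bad):
    lengths = column_lengths(data)
    if len(set(lengths.values())) == 1:
        # All columns have the same length, so construction is safe.
        print('ok:', pd.DataFrame(data).shape)
    else:
        print('length mismatch:', lengths)
```

Printing the lengths per column makes it easier to see which input is off than reading the traceback alone.
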
``````

In :

letters.head()

``````
``````

Out:

    lowercase  uppercase
0           a          A
1           b          B
2           c          C
3           d          D
4           e          E

``````

We can rename the columns easily and even add a new one through a relatively simple dictionary-like assignment. I'll go over some more complex methods later on.

``````

In :

letters.columns = ['LowerCase','UpperCase']

``````
``````

In :

np.random.seed(25)
letters['Number'] = np.random.random_integers(1,50,26)

``````
``````

In :

letters

``````
``````

Out:

    LowerCase  UpperCase  Number
0           a          A       5
1           b          B      27
2           c          C      16
3           d          D      24
4           e          E      45
5           f          F       9
6           g          G      29
7           h          H       5
8           i          I      26
9           j          J      32
10          k          K       6
11          l          L       2
12          m          M      40
13          n          N       4
14          o          O      25
15          p          P       4
16          q          Q      21
17          r          R      46
18          s          S       4
19          t          T       2
20          u          U      23
21          v          V      32
22          w          W      49
23          x          X      48
24          y          Y      10
25          z          Z      17

``````
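
For readers on newer library versions, here is a hedged sketch of the same two steps. It assumes a recent NumPy, where `np.random.random_integers` is deprecated in favor of `np.random.randint`, and it uses `DataFrame.rename` as an alternative to assigning to `columns`. The generated numbers are not guaranteed to match the table above.

```python
import string

import numpy as np
import pandas as pd

letters = pd.DataFrame({'lowercase': list(string.ascii_lowercase),
                        'uppercase': list(string.ascii_uppercase)})

# rename returns a new DataFrame and leaves `letters` untouched.
renamed = letters.rename(columns={'lowercase': 'LowerCase',
                                  'uppercase': 'UpperCase'})

# randint excludes the upper bound, so 51 yields values from 1 to 50.
np.random.seed(25)
renamed['Number'] = np.random.randint(1, 51, 26)

print(list(renamed.columns))
print(list(letters.columns))
```

Because `rename` takes a mapping from old names to new names, it does not depend on the order of the columns the way a plain list assignment does.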

Just like Series, DataFrames have data types. We can get those by accessing the `dtypes` attribute of the DataFrame, which lists the data type of each column.

``````

In :

letters.dtypes

``````
``````

Out:

LowerCase    object
UpperCase    object
Number        int64
dtype: object

``````
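
The per-column types can also be used to pick out columns. The sketch below is one way to do that with `select_dtypes`; the stand-in `letters` frame uses placeholder numbers so that it runs on its own, and the exact dtype names printed may differ between pandas versions.

```python
import string

import numpy as np
import pandas as pd

letters = pd.DataFrame({'LowerCase': list(string.ascii_lowercase),
                        'UpperCase': list(string.ascii_uppercase),
                        'Number': np.arange(1, 27)})

# One dtype per column.
print(letters.dtypes)

# Keep only the numeric columns, or only the non-numeric ones.
numeric = letters.select_dtypes(include='number')
non_numeric = letters.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```
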
``````

In :

letters.index = lcase
letters

``````
``````

Out:

    LowerCase  UpperCase  Number
a           a          A       5
b           b          B      27
c           c          C      16
d           d          D      24
e           e          E      45
f           f          F       9
g           g          G      29
h           h          H       5
i           i          I      26
j           j          J      32
k           k          K       6
l           l          L       2
m           m          M      40
n           n          N       4
o           o          O      25
p           p          P       4
q           q          Q      21
r           r          R      46
s           s          S       4
t           t          T       2
u           u          U      23
v           v          V      32
w           w          W      49
x           x          X      48
y           y          Y      10
z           z          Z      17

``````
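
Assigning to `index` replaces the default 0 to 25 labels with our own, and the new labels must match the number of rows. A minimal sketch, using a stand-in `letters` frame without the random column:

```python
import string

import pandas as pd

upcase = list(string.ascii_uppercase)
lcase = list(string.ascii_lowercase)

letters = pd.DataFrame({'LowerCase': lcase, 'UpperCase': upcase})

# Replace the default integer labels with the lowercase letters.
letters.index = lcase

# Rows can now be looked up by label.
upper_c = letters.loc['c', 'UpperCase']
print(upper_c)

# A list of the wrong length is rejected with a ValueError.
try:
    letters.index = lcase[:5]
except ValueError as err:
    print('rejected:', err)
```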

Of course, we can sort by a specific column or by the index (the default).

``````

In :

letters.sort('Number')

``````
``````

Out:

    LowerCase  UpperCase  Number
t           t          T       2
l           l          L       2
s           s          S       4
p           p          P       4
n           n          N       4
a           a          A       5
h           h          H       5
k           k          K       6
f           f          F       9
y           y          Y      10
c           c          C      16
z           z          Z      17
q           q          Q      21
u           u          U      23
d           d          D      24
o           o          O      25
i           i          I      26
b           b          B      27
g           g          G      29
v           v          V      32
j           j          J      32
m           m          M      40
e           e          E      45
r           r          R      46
x           x          X      48
w           w          W      49

``````
``````

In :

letters.sort()

``````
``````

Out:

    LowerCase  UpperCase  Number
a           a          A       5
b           b          B      27
c           c          C      16
d           d          D      24
e           e          E      45
f           f          F       9
g           g          G      29
h           h          H       5
i           i          I      26
j           j          J      32
k           k          K       6
l           l          L       2
m           m          M      40
n           n          N       4
o           o          O      25
p           p          P       4
q           q          Q      21
r           r          R      46
s           s          S       4
t           t          T       2
u           u          U      23
v           v          V      32
w           w          W      49
x           x          X      48
y           y          Y      10
z           z          Z      17

``````
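
The `sort` calls above match the pandas version printed at the top of this notebook. In later pandas releases `DataFrame.sort` was removed, and `sort_values` and `sort_index` cover the two cases. A minimal sketch, using made-up numbers in place of the random column:

```python
import string

import pandas as pd

lcase = list(string.ascii_lowercase)

# Stand-in data: the numbers count down from 26 so the two sorts differ.
letters = pd.DataFrame({'LowerCase': lcase,
                        'UpperCase': list(string.ascii_uppercase),
                        'Number': list(range(26, 0, -1))},
                       index=lcase)

# Sort by the values in one column.
by_number = letters.sort_values('Number')

# Sort by the index labels.
by_index = by_number.sort_index()

print(by_number.index[0], by_index.index[0])
```

Both methods return a new DataFrame by default, so `letters` itself keeps its original order.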

We've seen how to query for one column, and querying for multiple columns isn't much more difficult.

We can get the upper and lower case columns:

``````

In :

letters[['LowerCase','UpperCase']].head()

``````
``````

Out:

    LowerCase  UpperCase
a           a          A
b           b          B
c           c          C
d           d          D
e           e          E

``````
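
To make the one-column versus multi-column difference concrete, here is a small sketch. It rebuilds a stand-in `letters` frame with placeholder numbers so that it runs on its own.

```python
import string

import pandas as pd

lcase = list(string.ascii_lowercase)

letters = pd.DataFrame({'LowerCase': lcase,
                        'UpperCase': list(string.ascii_uppercase),
                        'Number': list(range(1, 27))},
                       index=lcase)

# A single column name returns a Series.
one = letters['Number']

# A list of column names returns a DataFrame with just those columns.
two = letters[['LowerCase', 'UpperCase']]

print(type(one).__name__)
print(type(two).__name__, list(two.columns))
```

Note the double brackets in the second lookup: the outer pair is the indexing operation and the inner pair is the list of column names.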

We can also query by the index. We went over a lot of that in the Series section, and much of the same applies here.

We can query by index location or by index label:

``````

In :

letters.iloc[5:10]

``````
``````

Out:

    LowerCase  UpperCase  Number
f           f          F       9
g           g          G      29
h           h          H       5
i           i          I      26
j           j          J      32

``````
``````

In :

letters["f":"k"]

``````
``````

Out:

    LowerCase  UpperCase  Number
f           f          F       9
g           g          G      29
h           h          H       5
i           i          I      26
j           j          J      32
k           k          K       6

``````
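
Notice that the two slices above return different numbers of rows. The sketch below, again with a stand-in `letters` frame, spells out why: positional slices exclude the end point, while label slices include it. It uses `.loc` for the label slice, which is the explicit form of the same lookup.

```python
import string

import pandas as pd

lcase = list(string.ascii_lowercase)

letters = pd.DataFrame({'LowerCase': lcase,
                        'UpperCase': list(string.ascii_uppercase)},
                       index=lcase)

# Positions 5 through 9: the end position (10) is excluded.
by_position = letters.iloc[5:10]

# Labels 'f' through 'k': the end label is included.
by_label = letters.loc['f':'k']

print(list(by_position.index))
print(list(by_label.index))
```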

Now we’ve covered the basic concepts of pandas.

We covered how indexes integrate with both Series and DataFrames, and how NumPy underlies a lot of the power we've got. To be honest, we've covered a lot of the fundamentals for doing data analysis with Python and pandas.

Although these videos have been using fabricated data, we have covered a lot of the methods that you’re going to be using on a regular basis during your analysis of data.

Let's go ahead and dive into our first data set.