The first main data type we will learn about in pandas is the Series. Let's import pandas and explore the Series object.
A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
Let's explore this concept through some examples:
In [1]:
import numpy as np
import pandas as pd
In [2]:
labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10,'b': 20,'c': 30}
Using Lists
In [3]:
pd.Series(data = my_list)
Out[3]:
In [4]:
pd.Series(data = my_list,
          index = labels)
Out[4]:
In [5]:
pd.Series(my_list, labels)
Out[5]:
NumPy Arrays
In [6]:
pd.Series(arr)
Out[6]:
In [7]:
pd.Series(arr, labels)
Out[7]:
Dictionary
In [8]:
pd.Series(d)
Out[8]:
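As a minimal sketch of what happens when a dictionary is passed in: the keys become the index and the values become the data. The second call below also passes an explicit index, in which case pandas looks each label up in the dictionary. The label 'z' and the variable names from_dict and reordered are made up for illustration.

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Keys become the index, values become the data.
from_dict = pd.Series(d)

# With an explicit index, each label is looked up in the dict.
# 'z' (a hypothetical label) is not a key, so its value is NaN.
reordered = pd.Series(d, index=['c', 'a', 'z'])

print(from_dict)
print(reordered)
```

Because NaN is a floating-point value, the second Series is stored as floats even though the dictionary holds integers.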
Data in a Series
A Series can hold a variety of object types, not only numbers:
In [9]:
pd.Series(data = labels)
Out[9]:
In [10]:
# Even functions (although you are unlikely to use this)
pd.Series([sum, print, len])
Out[10]:
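One way to see how the kind of data affects a Series is to inspect its dtype attribute. The sketch below assumes nothing beyond the examples above; the names numbers and funcs are illustrative only.

```python
import pandas as pd

# A Series of integers gets an integer dtype.
numbers = pd.Series([10, 20, 30])
print(numbers.dtype)

# A Series of arbitrary Python objects (here, built-in functions)
# falls back to the generic 'object' dtype.
funcs = pd.Series([sum, print, len])
print(funcs.dtype)

# The stored objects are unchanged, so a stored function can still be called.
length = funcs.iloc[2]('abc')
print(length)
```

An object-dtype Series is flexible, but it generally loses the fast vectorized arithmetic that numeric dtypes provide.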
The key to using a Series is understanding its index. pandas uses the index labels (names or numbers) for fast lookups, much like the keys of a hash table or dictionary.
Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:
In [11]:
ser1 = pd.Series([1, 2, 3, 4],
                 index = ['USA', 'Germany', 'USSR', 'Japan'])
In [12]:
ser1
Out[12]:
In [13]:
ser2 = pd.Series([1, 2, 5, 4],
                 index = ['USA', 'Germany', 'Italy', 'Japan'])
In [14]:
ser2
Out[14]:
In [15]:
ser1['USA']
Out[15]:
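The bracket lookup above is one of several ways to retrieve values. The sketch below recreates ser1 and shows explicit label-based access with .loc, position-based access with .iloc, selecting several labels at once, and two ways to handle a label that may be missing. 'France' is a hypothetical label chosen because it is not in the index.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4],
                 index=['USA', 'Germany', 'USSR', 'Japan'])

by_label = ser1.loc['USA']          # explicit label-based lookup
by_position = ser1.iloc[0]          # explicit position-based lookup
subset = ser1[['USA', 'Japan']]     # a list of labels returns a Series

# Membership tests check the index labels, not the values.
has_usa = 'USA' in ser1
has_france = 'France' in ser1

# .get() returns None (or a supplied default) instead of raising KeyError.
missing = ser1.get('France')

print(by_label, by_position, has_usa, has_france, missing)
print(subset)
```

Using .loc and .iloc makes the intent explicit, which helps when the index itself contains integers.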
Operations are also aligned on the index: values are matched by label, and a label present in only one Series yields NaN in the result:
In [16]:
ser1 + ser2
Out[16]:
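To make the alignment concrete, here is a small sketch that recreates both Series and adds them two ways. The + operator leaves NaN wherever a label exists in only one operand ('USSR' and 'Italy' here). The add method with fill_value=0 is one possible way to treat a missing label as zero instead; whether that is appropriate depends on the data.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4],
                 index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4],
                 index=['USA', 'Germany', 'Italy', 'Japan'])

# Values are matched by label; unmatched labels produce NaN.
total = ser1 + ser2
print(total)

# Treat a label missing from one Series as 0 before adding.
total_filled = ser1.add(ser2, fill_value=0)
print(total_filled)
```

Note that the result is stored as floats, since NaN cannot be represented in an integer Series.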