In [2]:
%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt
NumPy and pandas treat NaN values differently. In NumPy, as we saw earlier, if an array contains a NaN value, summary statistics such as the mean are calculated as NaN.
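NumPy does have NaN-aware reductions, such as `np.nanmean` and `np.nansum`, that skip missing values. This small sketch (the variable names are illustrative) contrasts them with the plain mean:

```python
import numpy as np

arr = np.array([1, 2, 3, np.nan])

# A single NaN propagates through an ordinary reduction.
plain_mean = arr.mean()

# The NaN-aware variants ignore missing values instead.
nan_aware_mean = np.nanmean(arr)
nan_aware_sum = np.nansum(arr)

print(plain_mean)      # nan
print(nan_aware_mean)  # 2.0
print(nan_aware_sum)   # 6.0
```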
In [3]:
np_array = np.array([1,2,3,np.nan])
np_array
Out[3]:
In [4]:
np_array.mean()
Out[4]:
In [5]:
pd_series = pd.Series([1,2,3,np.nan])
pd_series
Out[5]:
A pandas Series treats NaN differently: reductions such as mean simply skip the missing value. We'll cover filling in those empty values later.
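The skipping behavior is controlled by the `skipna` parameter, which defaults to `True`. A minimal sketch showing both settings, plus `count()` for how many values were actually used:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan])

# Default (skipna=True): the NaN is ignored.
default_mean = s.mean()

# skipna=False restores NumPy-style propagation.
strict_mean = s.mean(skipna=False)

# count() reports the number of non-missing values.
n_valid = s.count()

print(default_mean)  # 2.0
print(strict_mean)   # nan
print(n_valid)       # 3
```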
In [6]:
pd_series.mean()
Out[6]:
In [7]:
np.random.seed(567)
Sometimes you'll need to build a new index. For example, suppose we have two Series.
In [8]:
s1 = pd.Series(np.random.randn(5))
s1
Out[8]:
In [9]:
s2 = pd.Series(np.random.randn(5))
s2
Out[9]:
At times you'll want to replace or rebuild the index of a Series, for example after combining Series whose labels clash. Let's walk through a practical example.
In [10]:
combo = pd.concat([s1, s2])
combo
Out[10]:
After concatenating them, we can see that the index contains repeated values (0 through 4 each appear twice). We can still query by these labels as usual (combo[0] returns both rows labeled 0), but in all likelihood we'll want to replace them with a new, unique index.
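As an alternative to assigning a new range to the index by hand, `pd.concat` accepts `ignore_index=True`, and `reset_index(drop=True)` does the same after the fact. A small sketch with made-up values:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([4.0, 5.0, 6.0])

stacked = pd.concat([a, b])                   # index: 0, 1, 2, 0, 1, 2
fresh = pd.concat([a, b], ignore_index=True)  # index: 0 through 5
reset = stacked.reset_index(drop=True)        # same result, built afterwards

print(stacked.index.is_unique)  # False
print(list(fresh.index))        # [0, 1, 2, 3, 4, 5]
print(fresh.equals(reset))      # True
```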
In [11]:
combo[0]
Out[11]:
In [12]:
# len() counts every row; count() skips NaN and could give the wrong length
combo.index = range(len(combo))
combo
Out[12]:
However, assigning to .index is rather limited: it just overwrites the labels we have now, and the new index must have the same length as the Series. What if we want to select a specific set of labels, some of which don't exist yet, and have the missing ones filled with NaN? For that we use reindex, which returns a new Series.
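To make the semantics concrete: labels that already exist keep their values, new labels become NaN, and the original Series is left untouched. A minimal sketch with illustrative data:

```python
import pandas as pd

s = pd.Series([10, 20, 30])  # default index: 0, 1, 2

# Labels 0 and 2 exist and keep their values; label 5 is new and becomes NaN.
r = s.reindex([0, 2, 5])

print(r.tolist())  # [10.0, 30.0, nan]
print(r.dtype)     # float64, because NaN cannot be stored in an int64 Series
print(s.tolist())  # [10, 20, 30], the original is unchanged
```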
In [13]:
new_combo = combo.reindex([0,2,15,21])
new_combo
Out[13]:
We can supply a constant for the missing entries with fill_value, or we can specify a method by which they should be filled. Method-based filling can be performed during reindexing with the method parameter (passed to reindex just like fill_value), or after the fact with methods such as ffill and bfill.
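Here is a sketch of the `method` parameter on made-up data. Note that `method=` requires the Series being reindexed to have a monotonic (sorted) index, and that filling during reindexing can give a different answer from filling afterwards: `method=` looks values up in the original Series, while `.ffill()` only sees the values that survived the reindex.

```python
import pandas as pd

# method= needs a monotonic (sorted) index on the Series being reindexed.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])

# 'ffill': each new label takes the value at the closest existing label <= it.
forward = s.reindex([0, 2, 4], method="ffill")

# 'bfill': each new label takes the value at the closest existing label >= it.
backward = s.reindex([-1, 0, 3], method="bfill")

# During vs. after: label 4 looks back to label 2 in the original Series,
# but after a plain reindex only label 0 is left to carry forward.
during = s.reindex([0, 4], method="ffill")
after = s.reindex([0, 4]).ffill()

print(forward.tolist())   # [1.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, nan]
print(during.tolist())    # [1.0, 3.0]
print(after.tolist())     # [1.0, 1.0]
```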
In [14]:
combo.reindex([0,2,15,21], fill_value=0)
Out[14]:
In [15]:
new_combo
Out[15]:
Here's an example of ffill, or forward fill, which carries the last valid value forward into the gaps:
In [16]:
new_combo.ffill()
Out[16]:
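and of bfill, or backward fill, which carries the next valid value backward. Because the last two entries of new_combo are NaN, there is no later value to propagate yet; once we assign a value at label 21, bfill can fill label 15 with it.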
In [17]:
new_combo.bfill()
Out[17]:
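Both fill methods also accept a `limit` argument that caps how many consecutive NaNs get filled. A short sketch on an illustrative Series, including the trailing-NaN case where bfill has nothing to propagate:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()        # [1.0, 1.0, 1.0, 4.0]
backward = s.bfill()       # [1.0, 4.0, 4.0, 4.0]
capped = s.ffill(limit=1)  # [1.0, 1.0, nan, 4.0]: at most one NaN filled per gap

# A trailing NaN has no later value, so bfill leaves it as NaN.
trailing = pd.Series([1.0, np.nan]).bfill()  # [1.0, nan]
```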
In [18]:
new_combo[21] = 5
In [19]:
new_combo
Out[19]:
In [20]:
new_combo.bfill()
Out[20]:
In [21]:
new_combo
Out[21]:
The fillna method simply fills the blanks with whatever value you specify.
In [22]:
new_combo.fillna(12)
Out[22]:
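The fill value does not have to be a hard-coded constant. One common pattern, sketched here on made-up data, is to fill with a statistic computed from the non-missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Fill with a constant...
zeros = s.fillna(0)

# ...or with the mean of the observed values (s.mean() skips NaN, giving 2.0).
mean_filled = s.fillna(s.mean())

print(zeros.tolist())        # [1.0, 0.0, 3.0, 0.0]
print(mean_filled.tolist())  # [1.0, 2.0, 3.0, 2.0]
```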
Next, I want to cover how pandas lines up different Series on their index labels when we perform simple arithmetic operations.
When s1 and s2 have the same index, adding them together gives what we expect: the values are summed label by label.
In [23]:
s1
Out[23]:
In [24]:
s2
Out[24]:
In [25]:
s1 + s2
Out[25]:
However, things get more complicated when the indices differ. When we add them, values are only summed where the index labels overlap; any label that appears in just one of the Series gets NaN in the result. Often this is what we want when analyzing data, but other times it's not. To handle that, we reindex both Series over a common range and use a fill value.
In [26]:
s2.index = list(range(3,8))
s2
Out[26]:
In [27]:
s1 + s2
Out[27]:
In [28]:
s1.reindex(range(10),fill_value=0) + s2.reindex(range(10),fill_value=0)
Out[28]:
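The arithmetic methods (`add`, `sub`, `mul`, `div`) also accept a `fill_value` directly, which treats a label missing on one side as that value. A sketch with illustrative data; unlike the reindex approach, the result only covers the union of the two indexes rather than a range you choose:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# The + operator aligns on labels; labels found in only one Series become NaN.
plain = a + b

# Series.add with fill_value=0 treats a label missing on one side as 0.
filled = a.add(b, fill_value=0)

print(plain.tolist())   # [nan, 12.0, 23.0, nan]
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```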
In [29]:
s2.index = range(5)
In [30]:
s1 = pd.Series(range(1,4), index= ['a','a','c'])
s1
Out[30]:
In [31]:
s2 = pd.Series(range(1,4), index=['a','a','b'])
s2
Out[31]:
Finally, consider what happens when an index contains duplicate labels and we combine two such Series with an operation. For example, multiplying them performs a Cartesian product of the two Series on the shared label, in this example 'a': each of the two 'a' values in s1 is multiplied by each of the two 'a' values in s2, giving four 'a' entries. Labels found in only one Series ('b' and 'c') get NaN.
In [32]:
s1 * s2
Out[32]:
Addition works the same way: each 'a' value in s1 is added to each 'a' value in s2.
In [33]:
s1 + s2
Out[33]:
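A sketch generalizing the rule, on made-up data: a label that appears m times in one Series and n times in the other yields m × n result rows. One caveat worth knowing: if the two indexes are exactly identical (same labels in the same order), pandas skips the join and combines values position by position instead.

```python
import pandas as pd

left = pd.Series([1, 2], index=["a", "a"])
right = pd.Series([10, 20, 30], index=["a", "a", "a"])

# Indexes differ, so pandas joins on labels: every 'a' in left pairs with
# every 'a' in right, giving 2 * 3 = 6 rows (row order not relied on here).
paired = left + right

# Identical indexes need no join, so values line up positionally.
same_index = pd.Series([10, 20], index=["a", "a"])
positional = left + same_index

print(len(paired))              # 6
print(sorted(paired.tolist()))  # [11, 12, 21, 22, 31, 32]
print(positional.tolist())      # [11, 22]
```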
In [34]:
s1
Out[34]:
Sometimes you'll want to experiment with modifications to a Series or DataFrame without altering the original. The copy method returns a copy of the data, so changes made to the copy leave the original untouched.
In [35]:
s1_copy = s1.copy()
In [36]:
s1_copy['a'] = 3
In [37]:
s1_copy
Out[37]:
In [38]:
s1
Out[38]:
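It helps to contrast copy with plain assignment, which only creates a second name for the same object. A minimal sketch with illustrative values:

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

alias = original          # plain assignment: same object, new name
copied = original.copy()  # independent copy (deep=True by default)

copied["a"] = 100  # affects only the copy
alias["b"] = 200   # affects original too, since alias is original

print(alias is original)  # True
print(original.tolist())  # [1, 200, 3]
print(copied.tolist())    # [100, 2, 3]
```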
There is one more method I want to touch on: map.
In [39]:
s1.map(lambda x: x ** 2)
Out[39]:
map will feel familiar from our raw Python section, but the pandas Series version can do something a bit more special: we can also pass it a dictionary. Each value in the Series is looked up as a key in the dictionary, and the corresponding dictionary value is returned.
In [40]:
s1.map({1:2,2:3,3:12})
Out[40]:
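map also accepts another Series as the lookup table, in which case that Series' index plays the role of the dictionary keys. A small sketch with hypothetical codes and labels:

```python
import pandas as pd

codes = pd.Series([1, 2, 3, 2])

# The lookup Series' index holds the keys; its values are what map returns.
lookup = pd.Series({1: "low", 2: "mid", 3: "high"})

labels = codes.map(lookup)
print(labels.tolist())  # ['low', 'mid', 'high', 'mid']
```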
If a value isn't found among the dictionary's keys, map returns NaN for it.
In [41]:
s1.map({2:3,3:12})
Out[41]:
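If NaN is not what you want for unmatched values, two options are sketched below on illustrative data: pass a dict subclass that defines `__missing__` (such as `collections.defaultdict`), which pandas uses to supply a default, or fill the gaps afterwards with the original values.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series([1, 2, 3])

# Plain dict: 1 is not a key, so it maps to NaN.
plain = s.map({2: 3, 3: 12})

# defaultdict defines __missing__, so unmatched values get its default (0).
with_default = s.map(defaultdict(lambda: 0, {2: 3, 3: 12}))

# Or keep the original value wherever the lookup found nothing.
keep_original = s.map({2: 3, 3: 12}).fillna(s)

print(plain.tolist())          # [nan, 3.0, 12.0]
print(with_default.tolist())   # [0, 3, 12]
print(keep_original.tolist())  # [1.0, 3.0, 12.0]
```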