pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=True)
merge
and update
have a lot in common.
There's a whole family of methods.
I can understand where a busy IT worker, with the job of helping numerous IT workers, might get frustrated with the redundancy of it all. How many times must we reinvent the same wheels? That's the skeptic's view.
The good news the same as the bad news: redundancy means generic fluency is possible. Some background in SQL helps with pandas. Some background in pandas helps with SQL.
In [ ]:
import pandas as pd
import numpy as np
In [ ]:
dfA = pd.DataFrame({'A':[1,1,1,1,1,1,1,1],
'B':[2,2,2,2,2,2,2,2]})
In [ ]:
dfA
In [ ]:
dfB = pd.DataFrame({'C':[3,3,3,3,3,3,3,3],
'D':[4,4,4,4,4,4,4,4]})
You get to choose which columns, left and right, serve as "gear teeth" for synchronizing rows (sewing them together). Or choose the index, not a column.
In the expression below, we go with the one synchronizing element: the index, on both input tables.
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True)
In [ ]:
import string
dfA.index = list(string.ascii_lowercase[:8]) # new index, of letters instead
In [ ]:
dfA
In [ ]:
dfB.index = list(string.ascii_lowercase[5:8+5]) # overlapping letters
In [ ]:
dfB
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True) # intersection, not the union
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True, how="left") # left side governs
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True, how="right") # right side governs
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True, how="outer") # the full union
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_index=True, how="inner") # same as intersection
In [ ]:
dfA = pd.DataFrame({'A':[1,2,3,4,5,6,7,8],
'B':[2,2,2,2,2,2,2,2],
'key':['dog', 'pig', 'rooster', 'monkey',
'hen', 'cat', 'slug', 'human']})
In [ ]:
from numpy.random import shuffle
keys = dfA.key.values.copy() # copy or dfA key will also reorder
shuffle(keys) # in place
dfB = pd.DataFrame({'C':[1,2,3,4,5,6,7,8],
'D':[4,4,4,4,4,4,4,4],
'key': keys})
keys
In [ ]:
dfA
In [ ]:
dfB
In [ ]:
pd.merge(dfA, dfB, on='key') # like "zipping together" on a common column
In [ ]:
dfB.rename({"C":"A", "D":"B"}, axis=1, inplace = True)
dfB
In [ ]:
dfA
In [ ]:
pd.merge(dfA, dfB, on='key', sort=False) # sort on key if sort is True
In [ ]:
pd.merge(dfA, dfB, on='key', sort=True) # sort on key if sort is True
In [ ]:
pd.merge(dfA, dfB, on='A')
In [ ]:
pd.merge(dfA, dfB, left_index=True, right_on="A")
In [ ]:
merged = pd.merge(dfA, dfB, left_index=True, right_on="A")
merged.to_json("merged.json") # save for later