Title: Join And Merge Pandas Dataframe
Slug: pandas_join_merge_dataframe
Summary: Join And Merge Pandas Dataframe
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon
In [1]:
import pandas as pd
from IPython.display import display
from IPython.display import Image
In [2]:
raw_data = {
'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a
Out[2]:
In [3]:
raw_data = {
'subject_id': ['4', '5', '6', '7', '8'],
'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b
Out[3]:
In [4]:
raw_data = {
'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n
Out[4]:
In [5]:
df_new = pd.concat([df_a, df_b])
df_new
Out[5]:
In [6]:
pd.concat([df_a, df_b], axis=1)
Out[6]:
In [7]:
pd.merge(df_new, df_n, on='subject_id')
Out[7]:
In [8]:
pd.merge(df_new, df_n, left_on='subject_id', right_on='subject_id')
Out[8]:
"Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null." - source
In [9]:
pd.merge(df_a, df_b, on='subject_id', how='outer')
Out[9]:
"Inner join produces only the set of records that match in both Table A and Table B." - source
In [10]:
pd.merge(df_a, df_b, on='subject_id', how='inner')
Out[10]:
In [11]:
pd.merge(df_a, df_b, on='subject_id', how='right')
Out[11]:
"Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null." - source
In [12]:
pd.merge(df_a, df_b, on='subject_id', how='left')
Out[12]:
In [13]:
pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_left', '_right'))
Out[13]:
In [14]:
pd.merge(df_a, df_b, right_index=True, left_index=True)
Out[14]: