Example on the use of correspondence tables

This simple example shows how a vector classified according to one classification is converted to another classification using a correspondence table.

  • The first classification has four categories: A, B, C, D
  • The second classification has three categories: 1, 2, 3

In [30]:
import numpy as np
import pandas as pd

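Let's create an arbitrary classification matrix (CM). Each row is a category of the first classification and each column is a category of the second; a 1 means that the row category maps to the column category.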


In [59]:
# Rows: first classification (A-D); columns: second classification (1-3)
CM = pd.DataFrame.from_dict({'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 1], 'D': [0, 0, 1]},
                            orient='index', columns=['1', '2', '3'])
display(CM)


   1  2  3
A  1  0  0
B  0  1  0
C  0  0  1
D  0  0  1

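Notice that moving from the first classification to the second one is possible because every row total equals 1: each category of the first classification is assigned in full to exactly one category of the second (the reverse direction is discussed further below).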


In [54]:
# Work on a copy so that the 'total' column is not added to CM itself
C2 = CM.copy()
C2['total'] = C2.sum(axis=1)
display(C2)


   1  2  3  total
A  1  0  0      1
B  0  1  0      1
C  0  0  1      1
D  0  0  1      1
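
The row-total condition above can also be checked programmatically. The following is a minimal sketch that rebuilds the same matrix; the names row_totals and rows_ok are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same correspondence matrix as in the example:
# rows are the first classification, columns the second.
CM = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]],
    index=list("ABCD"),
    columns=["1", "2", "3"],
)

# One total per category of the first classification
row_totals = CM.sum(axis=1)

# True only if every row total is exactly 1
rows_ok = bool((row_totals == 1).all())
print(rows_ok)
```

If rows_ok is False, at least one category of the first classification is either not mapped at all or mapped more than once, and the conversion would lose or duplicate amounts.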

Let's create an arbitrary vector classified according to the first classification


In [56]:
# Four random integer amounts in [0, 10), one per category of the first classification
V1 = np.random.randint(0, 10, size=4).reshape(4, 1)
Class_A = list('ABCD')
V1_A = pd.DataFrame(V1, index=Class_A, columns=['amount'])
display(V1_A)


   amount
A       8
B       1
C       3
D       3

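This vector is converted to the second classification by multiplying its transpose by CM. Categories C and D both map to category 3, so their amounts are added together.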


In [60]:
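# Row vector (1 x 4) times CM (4 x 3) gives the amounts in the second classification (1 x 3)
V1_A_transp = V1_A.T
# Select only the category columns of CM so that any extra column is ignored
V1_B = pd.DataFrame(np.dot(V1_A_transp, CM[['1', '2', '3']]), index=['amount'], columns=['1', '2', '3'])
display(V1_B)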


        1  2  3
amount  8  1  6

Moving from the second classification back to the first one may cause problems, since the column totals are not all equal to 1.


In [61]:
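# Column totals: number of first-classification categories mapped to each category of the second
sum_rCM = CM[['1', '2', '3']].sum().to_frame('Total').T
# Stack the totals row under the category rows of CM
CM_tot = pd.concat([CM[['1', '2', '3']], sum_rCM])
display(CM_tot)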


       1  2  3
A      1  0  0
B      0  1  0
C      0  0  1
D      0  0  1
Total  1  1  2

In this case, moving from the second classification to the first one would duplicate the value in category "3": the full amount would be assigned to both C and D, so the overall total would no longer be preserved.
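
The duplication can be made concrete with a small sketch. It assumes the reverse conversion is attempted naively by multiplying the converted vector by the transpose of CM; V_B and V_back are illustrative names, and the amounts (8, 1, 6) are taken from the converted vector in the example.

```python
import pandas as pd

CM = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]],
    index=list("ABCD"),
    columns=["1", "2", "3"],
)

# Vector in the second classification, as obtained in the example
V_B = pd.DataFrame({"1": [8], "2": [1], "3": [6]}, index=["amount"])

# Naive reverse conversion: (1 x 3) times CM transposed (3 x 4)
V_back = V_B.dot(CM.T)

print(V_back)
print(int(V_back.loc["amount"].sum()))
```

The amount 6 from category "3" lands in full on both C and D, so the total grows from 15 to 21. A reliable reverse conversion would need extra information on how to split category "3" between C and D, which the 0/1 matrix alone does not provide.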

