In [30]:
import numpy as np
import pandas as pd
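Let's create an arbitrary classification matrix (CM). Its rows are the categories of the first classification (A to D) and its columns are the categories of the second classification (1 to 3).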
In [59]:
# DataFrame.from_items was removed in pandas 1.0; from_dict builds the same table.
CM = pd.DataFrame.from_dict(
    {'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 1], 'D': [0, 0, 1]},
    orient='index', columns=['1', '2', '3'])
display(CM)
Notice that moving from the first classification to the second one is possible because every row total equals 1: each category of the first classification maps to exactly one category of the second. (The reverse direction is examined further below.)
In [54]:
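CM_tot2 = CM.sum(axis=1)
# Work on a copy: C2 = CM would only alias CM, so the 'total' column would be added to CM as well.
C2 = CM.copy()
C2['total'] = CM_tot2
display(C2)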
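The row-total condition can also be checked programmatically. The following is a minimal sketch that rebuilds the same matrix so it runs on its own; the names `CM_check`, `row_totals` and `rows_ok` are illustrative and not part of the notebook.

```python
import pandas as pd

# Same classification matrix as above, rebuilt so this cell is self-contained.
CM_check = pd.DataFrame.from_dict(
    {'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 1], 'D': [0, 0, 1]},
    orient='index', columns=['1', '2', '3'])

# A row total of 1 means the category maps to exactly one target category.
row_totals = CM_check.sum(axis=1)
rows_ok = bool((row_totals == 1).all())
print(rows_ok)
```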
Let's create an arbitrary vector classified according to the first classification
In [56]:
V1 = np.random.randint(0, 10, size=4).reshape(4, 1)
Class_A = list('ABCD')
V1_A = pd.DataFrame(V1, index=Class_A, columns=['amount'])
display(V1_A)
This vector is converted into the second classification
In [60]:
V1_A_transp = V1_A.T
# Select the three category columns explicitly so that any helper column is ignored.
V1_B = pd.DataFrame(np.dot(V1_A_transp, CM[['1', '2', '3']]),
                    index=['amount'], columns=['1', '2', '3'])
display(V1_B)
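Because every row of the matrix sums to 1, the conversion should preserve the grand total. A small sketch with fixed amounts makes this easy to verify by hand; the amounts and the names `CM_demo`, `v_first` and `v_second` are assumptions chosen for illustration, since the notebook draws its vector at random.

```python
import pandas as pd

CM_demo = pd.DataFrame.from_dict(
    {'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 1], 'D': [0, 0, 1]},
    orient='index', columns=['1', '2', '3'])

# Illustrative amounts in the first classification (not taken from the notebook).
v_first = pd.Series([2, 5, 3, 4], index=list('ABCD'), name='amount')

# Series.dot aligns the Series index with the rows of the matrix and
# returns a Series indexed by the matrix columns.
v_second = v_first.dot(CM_demo)
print(v_second)
print(v_first.sum() == v_second.sum())
```

Here C and D are both mapped to category 3, so their amounts are added together, and the total is unchanged.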
Moving from the second classification back to the first one may cause problems, since the column totals are not always 1.
In [61]:
# Column totals as a one-row DataFrame labelled "Total".
sum_row = CM.sum()
sum_rCM = sum_row.to_frame("Total").T
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead.
CM_tot = pd.concat([CM, sum_rCM])
display(CM_tot)
In this case, moving from the second classification to the first one will duplicate the value in category "3", because that value is assigned to both C and D.
Here is an example of a correspondence table:
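The duplication can be made concrete with a small sketch. The amounts below are illustrative, and the equal split shown at the end is only one possible remedy, introduced here as an assumption; the notebook itself does not prescribe how a value should be shared between C and D.

```python
import pandas as pd

CM_demo = pd.DataFrame.from_dict(
    {'A': [1, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 1], 'D': [0, 0, 1]},
    orient='index', columns=['1', '2', '3'])

# Illustrative amounts in the second classification (not from the notebook).
v_second = pd.Series([2, 5, 7], index=['1', '2', '3'], name='amount')

# Naive reverse conversion: category 3 is copied to both C and D.
v_back = CM_demo.dot(v_second)
print(v_back.sum(), v_second.sum())

# Assumed remedy: divide each column by its total so the value in
# category 3 is shared equally between C and D instead of duplicated.
weights = CM_demo / CM_demo.sum(axis=0)
v_split = weights.dot(v_second)
print(v_split)
```

With the naive conversion the total grows by the amount in category 3; with the normalised weights the total is preserved, at the cost of assuming an equal share for C and D.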
In [ ]: