SMI - Similarity of Matrices Index

SMI is a measure of the similarity between the dominant subspaces of two matrices. It comes in two flavours (projections):

  • OP - Orthogonal Projections
  • PR - Procrustes Rotations.

The former (default) compares subspaces using ordinary least squares and can be formulated as the explained variance when predicting one matrix's subspace from the other's. PR is a restriction where only rotation and scaling are allowed in the similarity calculations.

Subspaces are by default computed using Principal Component Analysis (PCA). When the number of components extracted from one matrix is smaller than from the other, the explained variance is calculated by predicting the smaller subspace from the larger one.
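
To make the definition concrete, here is a minimal NumPy sketch of the OP flavour (an illustration, not hoggorm's implementation; the function name `smi_op` is our own): the dominant subspaces are taken as the leading left singular vectors of the column-centred matrices, and the SMI is the average explained variance when the smaller subspace is predicted from the larger one.

```python
import numpy as np

def smi_op(X1, X2, ncomp1, ncomp2):
    """SMI with Orthogonal Projections: a minimal sketch, not hoggorm's code."""
    # Orthonormal bases for the dominant subspaces (PCA scores via SVD
    # of the column-centred matrices).
    U1 = np.linalg.svd(X1 - X1.mean(axis=0), full_matrices=False)[0][:, :ncomp1]
    U2 = np.linalg.svd(X2 - X2.mean(axis=0), full_matrices=False)[0][:, :ncomp2]
    # The squared singular values of U1'U2 are the explained variances when
    # regressing one basis on the other subspace; average over the smaller one.
    return np.sum((U1.T @ U2) ** 2) / min(ncomp1, ncomp2)
```

For identical (or linearly equivalent) subspaces the value is 1; for unrelated subspaces it drops towards 0.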

Example: Sensory and Fluorescence data


Import packages and prepare data

First import hoggorm for analysis of the data and hoggormplot for plotting of the analysis results. We'll also import pandas so that we can read the data into a data frame, and numpy for checking the dimensions of the data.


In [18]:
import hoggorm as ho
import hoggormplot as hop
import pandas as pd
import numpy as np

Next, load the data that we are going to analyse using hoggorm. After the data has been loaded into the pandas data frame, we'll display it in the notebook.


In [19]:
# Load fluorescence data
X1_df = pd.read_csv('cheese_fluorescence.txt', index_col=0, sep='\t')
X1_df


Out[19]:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V283 V284 V285 V286 V287 V288 V289 V290 V291 V292
Pr 1 19222.109 19937.834 20491.777 20994.000 21427.500 21915.891 22273.834 22750.279 23215.609 23497.221 ... 1338.0557 1311.9445 1275.1666 1235.7777 1204.6666 1184.944500 1140.500000 1109.888800 1099.666600 1070.500000
Pr 2 18965.945 19613.334 20157.277 20661.557 21167.334 21554.057 22031.391 22451.889 22915.334 23311.611 ... 1244.5555 1217.1666 1183.9445 1156.5000 1130.0555 1084.000000 1066.500000 1039.944500 1018.500000 992.083313
Pr 3 19698.221 20438.279 21124.721 21740.666 22200.445 22709.725 23222.111 23646.225 24047.389 24519.111 ... 1409.5000 1366.9445 1319.8888 1289.7778 1258.2223 1235.166600 1200.611000 1173.277800 1126.555700 1097.250000
Pr 4 20037.334 20841.779 21510.889 22096.443 22605.889 23077.834 23547.725 23974.445 24490.889 24896.945 ... 1374.5000 1332.3334 1287.5000 1252.9445 1228.8334 1195.944300 1159.166600 1153.611200 1117.222300 1088.333400
Pr 5 19874.889 20561.834 21248.500 21780.889 22328.834 22812.057 23266.111 23723.334 24171.221 24601.943 ... 1329.0000 1291.9445 1256.7778 1226.6110 1209.7777 1169.888800 1144.555500 1123.333400 1084.888800 1081.500000
Pr 6 19529.391 20157.834 20847.500 21308.111 21716.443 22165.775 22583.166 22993.779 23520.779 24015.221 ... 1737.3888 1696.5000 1635.5000 1580.3334 1556.8334 1501.222200 1463.555500 1419.277800 1365.388800 1343.416600
Pr 7 18795.582 19485.582 20139.584 20644.668 21013.668 21480.668 21873.666 22302.418 22662.500 23097.000 ... 1323.3333 1286.9167 1261.0000 1235.0833 1190.0833 1174.666700 1129.166700 1095.416600 1070.416600 1049.500000
Pr 8 20052.943 20839.445 21569.221 22150.221 22662.389 23160.389 23589.943 24117.500 24484.334 24971.666 ... 1140.2778 1113.1112 1075.8334 1055.7778 1037.1112 1025.777800 986.277832 969.388855 944.944397 936.083313
Pr 9 19001.391 19709.943 20368.443 20939.111 21383.111 21879.111 22335.221 22758.834 23213.443 23688.891 ... 1119.1666 1076.7777 1045.3888 1033.1112 1021.3333 994.222229 962.111084 943.000000 920.166687 899.083313
Pr 10 20602.834 21406.389 22144.611 22775.000 23407.443 23940.609 24486.111 24976.275 25480.779 25966.279 ... 1248.2777 1226.7778 1195.0000 1169.5000 1135.9445 1120.888800 1069.555500 1062.833400 1034.722200 1016.750000
Pr 11 20116.443 20880.611 21584.834 22137.775 22667.166 23144.557 23592.889 24122.225 24518.221 25007.000 ... 1237.1112 1196.5000 1164.8334 1152.7223 1118.1666 1104.277800 1057.555700 1046.666600 1021.611100 1007.166700
Pr 12 20282.721 21016.500 21678.279 22241.555 22751.779 23257.945 23730.000 24221.221 24638.834 25100.055 ... 1192.2778 1177.8334 1130.8889 1121.0555 1099.6112 1068.722200 1053.277700 1034.388900 993.444397 992.583313
Pr 13 19508.000 20124.445 20701.057 21145.500 21529.389 21974.389 22338.834 22726.611 23156.000 23600.000 ... 1710.4445 1675.3334 1589.2778 1568.9445 1515.2778 1480.611200 1424.500000 1404.777800 1358.333400 1334.250000
Pr 14 18739.391 19444.275 20072.555 20603.500 21035.389 21470.834 21912.889 22356.279 22747.225 23205.889 ... 1158.2778 1155.4445 1102.9443 1081.4445 1060.2778 1044.388800 999.722229 985.222168 954.722229 935.083313

14 rows × 292 columns


In [20]:
# Load sensory data
X2_df = pd.read_csv('cheese_sensory.txt', index_col=0, sep='\t')
X2_df


Out[20]:
Att 01 Att 02 Att 03 Att 04 Att 05 Att 06 Att 07 Att 08 Att 09 Att 10 Att 11 Att 12 Att 13 Att 14 Att 15 Att 16 Att 17
Product
Pr 01 6.19 3.33 3.43 2.14 1.29 3.11 6.70 3.22 2.66 5.10 4.57 3.34 2.93 1.89 1.23 3.15 4.07
Pr 02 6.55 2.50 4.32 2.52 1.24 3.91 6.68 2.57 2.42 4.87 4.75 4.13 3.09 2.29 1.51 3.93 4.07
Pr 03 6.23 3.43 3.42 2.03 1.28 2.93 6.61 3.39 2.56 5.00 4.73 3.44 3.08 1.81 1.37 3.19 4.16
Pr 04 6.14 2.93 3.96 2.13 1.08 3.12 6.51 2.98 2.50 4.66 4.68 3.92 2.93 1.99 1.19 3.13 4.29
Pr 05 6.70 1.97 4.72 2.43 1.13 4.60 7.01 2.07 2.32 5.29 5.19 4.52 3.14 2.47 1.34 4.67 4.03
Pr 06 6.19 5.28 1.59 1.07 1.00 1.13 6.42 5.18 2.82 5.02 4.49 2.05 2.54 1.18 1.18 1.29 4.11
Pr 07 6.17 3.45 3.32 2.04 1.47 2.69 6.39 3.81 2.76 4.58 4.32 3.22 2.72 1.81 1.33 2.52 4.26
Pr 08 6.90 2.58 4.24 2.58 1.70 4.19 7.11 2.06 2.47 4.58 5.09 4.44 3.25 2.62 1.73 4.87 3.98
Pr 09 6.70 2.53 4.53 2.32 1.22 4.16 6.91 2.42 2.41 4.52 4.96 4.49 3.37 2.47 1.64 4.54 4.01
Pr 10 6.35 3.14 3.64 2.17 1.17 2.57 6.50 2.77 2.66 4.76 4.64 4.06 3.11 2.21 1.46 3.35 3.93
Pr 11 5.97 3.34 3.46 1.67 1.15 1.43 6.31 3.15 2.56 4.57 4.36 3.65 2.66 1.56 1.19 2.23 4.01
Pr 12 6.29 2.99 4.03 2.06 1.17 3.06 6.76 2.37 2.44 4.69 4.97 4.28 3.16 2.56 1.53 4.23 4.03
Pr 13 5.91 4.88 2.04 1.00 1.00 1.08 6.34 4.79 2.44 5.48 4.54 1.98 2.57 1.00 1.03 1.03 4.16
Pr 14 6.75 1.91 4.36 2.95 1.43 4.83 7.14 1.53 2.47 4.72 5.06 4.54 3.43 2.80 1.87 5.65 3.98

Orthogonal Projections

The default comparison of two matrices with SMI uses Orthogonal Projections, i.e. ordinary least squares regression is used to relate the dominant subspaces of the two matrices.

In contrast to PLSR, SMI does not perform a prediction of sensory properties from fluorescence measurements, but rather treats the two sets of measurements symmetrically, focusing on the major variation in each of them.

More details regarding the use of the SMI are found in the documentation.


In [21]:
# Get the values from the data frame
X1 = X1_df.values
X2 = X2_df.values

smiOP = ho.SMI(X1, X2, ncomp1=10, ncomp2=10)
print(np.round(smiOP.smi, 2))


[[0.21 0.31 0.32 0.58 0.59 0.65 0.66 0.67 0.83 0.83]
 [0.65 0.56 0.62 0.76 0.77 0.8  0.8  0.81 0.9  0.9 ]
 [0.72 0.61 0.54 0.65 0.66 0.69 0.74 0.76 0.82 0.85]
 [0.73 0.65 0.61 0.59 0.6  0.63 0.68 0.76 0.81 0.86]
 [0.74 0.66 0.62 0.6  0.51 0.55 0.6  0.71 0.77 0.84]
 [0.86 0.8  0.76 0.71 0.66 0.59 0.63 0.72 0.78 0.84]
 [0.89 0.82 0.78 0.75 0.69 0.64 0.64 0.72 0.78 0.83]
 [0.94 0.88 0.83 0.84 0.84 0.76 0.75 0.74 0.8  0.85]
 [0.97 0.94 0.88 0.9  0.89 0.8  0.79 0.78 0.79 0.84]
 [0.99 0.96 0.89 0.91 0.91 0.83 0.84 0.83 0.83 0.8 ]]

A hypothesis test can be formulated regarding the similarity of two subspaces, where the null hypothesis is that they are equal and the alternative is that they are not. Permutation testing yields the following P-values (the probability of observing an equally large or larger difference, given that the null hypothesis is true).
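
The general idea can be sketched as follows (a generic illustration, not hoggorm's exact procedure; both function names are our own): repeatedly shuffle the rows of one matrix, recompute the SMI each time, and compare the observed value against the resulting null distribution.

```python
import numpy as np

def smi_op(X1, X2, ncomp1, ncomp2):
    # Minimal OP-flavour SMI: average explained variance between the
    # dominant PCA subspaces of the column-centred matrices.
    U1 = np.linalg.svd(X1 - X1.mean(axis=0), full_matrices=False)[0][:, :ncomp1]
    U2 = np.linalg.svd(X2 - X2.mean(axis=0), full_matrices=False)[0][:, :ncomp2]
    return np.sum((U1.T @ U2) ** 2) / min(ncomp1, ncomp2)

def smi_permutation_null(X1, X2, ncomp1, ncomp2, B=100, seed=0):
    """Null distribution of the SMI under row permutations of X2 (a sketch)."""
    rng = np.random.default_rng(seed)
    return np.array([smi_op(X1, X2[rng.permutation(len(X2))], ncomp1, ncomp2)
                     for _ in range(B)])
```

The observed SMI can then be ranked within this distribution; hoggorm's significance() wraps the whole procedure and returns the P-values directly.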


In [22]:
print(np.round(smiOP.significance(), 2))


[[0.   0.   0.   0.19 0.29 0.57 0.58 0.53 0.78 0.59]
 [0.09 0.   0.09 0.71 0.9  0.99 0.99 0.96 0.98 0.92]
 [0.33 0.08 0.   0.29 0.66 0.94 0.98 0.94 0.94 0.84]
 [0.53 0.32 0.18 0.07 0.41 0.87 0.96 0.98 0.95 0.92]
 [0.69 0.59 0.5  0.4  0.02 0.5  0.81 0.94 0.92 0.9 ]
 [0.96 0.99 0.99 0.99 0.97 0.8  0.95 0.98 0.96 0.94]
 [0.98 0.99 0.99 1.   0.99 0.96 0.98 0.98 0.97 0.94]
 [0.99 0.99 0.99 1.   1.   1.   1.   1.   1.   0.99]
 [0.99 1.   0.99 1.   1.   0.99 0.99 0.99 1.   0.99]
 [0.99 0.99 0.94 0.99 1.   0.9  0.96 0.96 0.98 0.9 ]]

Finally we visualize the SMI values and their corresponding P-values.


In [23]:
# Plot similarities
hop.plotSMI(smiOP, [10, 10], X1name='fluorescence', X2name='sensory')


The significance symbols in the diamond plot above indicate whether a chosen subspace from one matrix can be found inside the subspace from the other matrix ($\supset$, $\subset$, =), or whether there is a significant difference (P-values: < 0.001 ***, < 0.01 **, < 0.05 *, < 0.1 ., no symbol for >= 0.1).
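
The threshold-to-symbol mapping can be expressed as a small helper (our own illustration, not hoggormplot's code):

```python
def significance_symbol(p):
    # Map a P-value to the significance symbols used in the diamond plot.
    for threshold, symbol in [(0.001, '***'), (0.01, '**'), (0.05, '*'), (0.1, '.')]:
        if p < threshold:
            return symbol
    return ''  # P-values >= 0.1 get no symbol
```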

From the P-values and the plot we can observe that there is a significant difference between the sensory data and the fluorescence data in the first of the dominant subspaces of the matrices. Looking only at the diagonal, we see that 6 components are needed before we lose the significance completely. Looking at the one-dimensional subspaces, we can observe that four sensory components are needed before there is no significant difference to the first fluorescence component.

This can be interpreted as a fundamental difference in the information spanned by fluorescence measurements and sensory perceptions, which is masked only when large proportions of the subspaces are included.

Procrustes Rotations

Similarities computed using PR are always less than or equal to those using OP, and in this simple case OP$^2$ = PR. Otherwise the pattern stays the same.
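
In terms of the singular values of the product of the two subspace bases (the cosines of the principal angles), PR amounts to averaging the cosines and squaring the result, whereas OP averages the squared cosines. A minimal sketch under that assumed formulation (our own illustration, not hoggorm's code):

```python
import numpy as np

def smi(X1, X2, ncomp1, ncomp2, projection="Orthogonal"):
    """SMI sketch with selectable projection, mirroring hoggorm's two flavours."""
    # Orthonormal bases for the dominant subspaces of the centred matrices.
    U1 = np.linalg.svd(X1 - X1.mean(axis=0), full_matrices=False)[0][:, :ncomp1]
    U2 = np.linalg.svd(X2 - X2.mean(axis=0), full_matrices=False)[0][:, :ncomp2]
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)  # cosines of principal angles
    if projection == "Procrustes":
        # Rotation and scaling only: average the cosines, then square.
        return (np.sum(s) / min(ncomp1, ncomp2)) ** 2
    # Orthogonal Projections: average the squared cosines.
    return np.sum(s ** 2) / min(ncomp1, ncomp2)
```

Since the cosines lie in [0, 1], the square of their mean never exceeds the mean of their squares, which is why PR never exceeds OP.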


In [4]:
smiPR = ho.SMI(X1, X2, ncomp1=10, ncomp2=10, projection="Procrustes")
print(np.round(smiPR.smi, 2))


[[0.21 0.31 0.32 0.58 0.59 0.65 0.66 0.67 0.83 0.83]
 [0.65 0.52 0.57 0.75 0.76 0.79 0.79 0.8  0.89 0.9 ]
 [0.72 0.56 0.5  0.62 0.63 0.68 0.72 0.75 0.81 0.85]
 [0.73 0.61 0.58 0.51 0.52 0.56 0.64 0.74 0.79 0.85]
 [0.74 0.62 0.59 0.53 0.41 0.48 0.55 0.66 0.73 0.82]
 [0.86 0.79 0.75 0.69 0.59 0.48 0.55 0.64 0.7  0.82]
 [0.89 0.81 0.78 0.74 0.63 0.54 0.55 0.64 0.7  0.81]
 [0.94 0.88 0.82 0.83 0.83 0.72 0.7  0.67 0.72 0.83]
 [0.97 0.94 0.87 0.9  0.89 0.76 0.75 0.7  0.71 0.81]
 [0.99 0.96 0.88 0.91 0.91 0.79 0.8  0.76 0.77 0.72]]

The number of permutations can be controlled, trading off quick (e.g. 100) against accurate (e.g. > 10000) computation of the significance.


In [14]:
print(np.round(smiPR.significance(B=100), 2))


[[0.   0.   0.   0.25 0.33 0.65 0.61 0.49 0.74 0.53]
 [0.11 0.   0.05 0.56 0.84 0.96 0.99 0.96 0.99 0.89]
 [0.36 0.01 0.   0.02 0.26 0.84 0.97 0.98 0.95 0.85]
 [0.59 0.22 0.02 0.   0.01 0.21 0.88 0.99 0.96 0.92]
 [0.73 0.41 0.13 0.01 0.   0.   0.51 0.93 0.92 0.9 ]
 [0.95 0.97 0.98 0.82 0.3  0.   0.44 0.93 0.89 0.91]
 [0.98 0.96 1.   0.99 0.87 0.41 0.34 0.98 0.92 0.93]
 [0.96 0.96 1.   1.   1.   1.   1.   0.99 0.97 0.97]
 [0.97 0.99 1.   1.   1.   0.98 0.98 0.94 0.98 0.98]
 [0.99 0.99 0.96 0.99 1.   0.85 0.9  0.82 0.95 0.88]]

In [17]:
hop.plotSMI(smiPR, X1name='fluorescence', X2name='sensory')


The SMI values in the Procrustes Rotations case are mostly very similar to those in the Orthogonal Projections case. This means that the differences between the two matrices can largely be attributed to rotation and scaling. With a few exceptions, we therefore see the same patterns in the significances too.

Reference:
Ulf Geir Indahl, Kristian Hovde Liland, Tormod Næs,
"A similarity index for comparing coupled matrices", Journal of Chemometrics, 32, e3049 (2018).

